These are the notes of the Skype conference call held on Wednesday, January 23, 2013 (11:30-12:45), on the progress and activities of the Social Network Benchmark task force.
- Attendees: Larri, Norbert (UPC); Orri (OGL); Peter, Renzo (VUA); Alex Averbuch (NEO); Andrey (TUM)
Peter asked Alex to explain his remarks on the possibility, or need, to change the schema to make it better suited for graph database systems. Alex responded with the example of posts in a forum: should every message node be connected to the forum by its own edge, or should the messages form, e.g., a linked list? This kind of detail shows a difference from relational thinking, where relationships between entities are computed (via joins) rather than explicitly created as edges. The bigger question is how precisely the SIB should specify the exact shape of the data, since one shape may well favor one kind of system over another. Alex argued that traversal patterns would be incomparable if performed over two different implementations of the same schema. Norbert and Peter were of the opinion that each implementation should encode all structural information (in the example, that forum posts form a tree), but that the exact mapping should be tunable to each system's strengths.
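To make the trade-off concrete, here is a minimal Python sketch assuming a toy edge-list representation; the edge labels (CONTAINS, FIRST, NEXT, REPLY_OF), post IDs and timestamps are hypothetical and not part of the SIB schema. It encodes the same forum thread in the two shapes Alex mentioned and checks that both preserve the same structural information (the set of posts and the reply tree), which is the condition Norbert and Peter proposed. The traversal needed to reach the posts still differs: one hop from the forum in the first shape, a walk along a chain in the second.

```python
from collections import defaultdict

# Hypothetical forum thread: (post_id, parent_post_id or None, timestamp).
POSTS = [
    ("p1", None, 100),
    ("p2", "p1", 105),
    ("p3", "p1", 110),
    ("p4", "p2", 120),
]

def reply_edges(posts):
    # Root posts have no parent, so they get no REPLY_OF edge.
    return [(pid, "REPLY_OF", parent) for pid, parent, _ in posts if parent]

def hub_encoding(forum, posts):
    """Shape A (hypothetical): every post has its own edge from the forum node."""
    edges = [(forum, "CONTAINS", pid) for pid, _, _ in posts]
    return edges + reply_edges(posts)

def linked_list_encoding(forum, posts):
    """Shape B (hypothetical): the forum points to its first post; posts are chained by NEXT in time order."""
    ordered = sorted(posts, key=lambda p: p[2])
    edges = [(forum, "FIRST", ordered[0][0])]
    edges += [(a[0], "NEXT", b[0]) for a, b in zip(ordered, ordered[1:])]
    return edges + reply_edges(posts)

def posts_of_hub(forum, edges):
    # A single hop from the forum node reaches every post.
    return {dst for src, label, dst in edges if src == forum and label == "CONTAINS"}

def posts_of_list(forum, edges):
    # Walk the NEXT chain starting at the forum's FIRST post.
    nxt = {src: dst for src, label, dst in edges if label == "NEXT"}
    cur = next(dst for src, label, dst in edges if src == forum and label == "FIRST")
    seen = set()
    while cur is not None:
        seen.add(cur)
        cur = nxt.get(cur)
    return seen

def reply_tree(edges):
    # Map each post to the set of posts that reply to it.
    children = defaultdict(set)
    for src, label, dst in edges:
        if label == "REPLY_OF":
            children[dst].add(src)
    return dict(children)

if __name__ == "__main__":
    a = hub_encoding("f1", POSTS)
    b = linked_list_encoding("f1", POSTS)
    print(posts_of_hub("f1", a) == posts_of_list("f1", b))  # True
    print(reply_tree(a) == reply_tree(b))  # True
```

Both shapes answer the same structural questions, but "all posts of a forum" costs one hop in shape A and a chain walk in shape B, which illustrates why traversal measurements taken on different shapes may not be directly comparable.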
Norbert explained the latest work on computing graph measures over the SIB dataset. He also promised to provide more detailed feedback on the current SIB queries. Discussions with the Havas/Accesso users largely validated the SIB schema. However, the users suggested some adaptations: treating posts and photos similarly, and adding Twitter-like retweet functionality for posts. The concept of followers might be mapped onto friends, but Norbert is going to check this again with the users.
Andrey explained that he had shifted focus towards YAGO for certain choke point examples, as he found the RDF-H data too regular. Orri said that RDF-H still provides useful choke points. Andrey said that query patterns with many self-joins can also be obtained in YAGO, and that YAGO's large sub-type hierarchies can play a role in certain choke points. Peter expressed some disappointment at abandoning SIB for the choke points, but Andrey said that YAGO or DBpedia could still become part of the SIB dataset, for data enrichment. A discussion ensued with the graph vendor members as to whether it is realistic to ask graph database systems to load and query datasets such as YAGO or DBpedia. This turned out not to be too far-fetched.
(off topic) Finally, the discussion turned to the TUC meeting. Peter asked Alex whether NEO would be bringing some users to Munich; he said they were trying. We would also like to get Havas/Accesso to Munich, and all partners were encouraged to start thinking about this.
- will come up with a detailed fine-tuning of the SIB schema as NEO would like it
- will try to approach some social network users for the TUC meeting in Munich
- UPC to continue working with the S3G2 data generator and adapt it (document of work here)
- to analyze the graph metrics computed on SIB data
- provide pre-generated datasets, with a README, to NEO
- together with Duc, create a true 'scale factor' that makes it possible to generate a dataset of a predictable size
- UPC to approach Accesso and Havas Media (partly done; now postponed)
- show the social network schema and query sets and ask for feedback (done)
- try to obtain real datasets (Facebook, Twitter, etc.)
- run existing DEX code to compute graph metrics and compare with S3G2
- timestamps in S3G2 [postponed]
- Duc to explain the current situation (which timestamps are generated, and with what constraints and correlations)
- Duc to explain and share his stream data generator
- UPC to enhance the stream generator to generate an update query workload
- UPC to design a mechanism to split an S3G2 dataset into a snapshot and a subsequent stream of updates
- query choke points
- Andrey to modify the transactional queries so they include optionals and are affected by correlations (add more parameters)
- include work on a mechanism to "learn" bindings of correlated parameters that yield the same selectivities
- Andrey to encode the analytical SIB queries in SPARQL
- provide more choke point ideas
- Orri offered to provide more; input from others is also welcome (Peter?, Andrey/Thomas?, ...)