These are the notes sent round by Miquel summarizing the internal UPC meeting on the Social Network Benchmark task force.
  • Miquel, Larri, David?, Norbert?, ??

1 Task Force Re-Cap

Current definition of the task force (meeting minutes) includes:

1.A Data Generator

Start with the Social Network Intelligence Benchmark (SIB) data generator (S3G2).

1.B Query Workloads

The TF will produce two different benchmarks:
     1. Transactional benchmark with updates (add user, friend, discussion, discussion post, photo, photo tag, add/delete friend), lookups and simple analytics.
     2. Analytical benchmark with global graph analytics and possibly data load and enrichment (to be discussed).

1.C Metrics

     1. The transactional benchmark has a Throughput Score
        1.1 Measure the maximum number of queries/sec, given a concurrency level of choice
        1.2 Does it also have a Power Score?
     2. The analytical benchmark has both a Power Score and a Throughput Score
        2.1 Benchmark bulk load time may be incorporated into the scores
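As a rough illustration of the throughput idea above (queries/sec at a chosen concurrency level), here is a minimal Python sketch. It is not a proposed benchmark driver: `measure_throughput` and `fake_query` are hypothetical names, and the 1 ms sleep is only a stand-in for a real query against a system under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(run_query, num_queries, concurrency):
    """Run `num_queries` calls of `run_query` on `concurrency` worker threads.

    Returns (completed_queries, queries_per_second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # list() forces every query to finish before the clock is stopped.
        results = list(pool.map(run_query, range(num_queries)))
    elapsed = time.perf_counter() - start
    return len(results), len(results) / elapsed


def fake_query(i):
    # Hypothetical stand-in for a real lookup: sleep 1 ms to simulate latency.
    time.sleep(0.001)
    return i


if __name__ == "__main__":
    # Try a few concurrency levels; the score would be taken at the chosen one.
    for level in (1, 4, 16):
        done, qps = measure_throughput(fake_query, 200, level)
        print(f"concurrency={level:2d} completed={done} throughput={qps:.0f} q/s")
```

A real driver would also need warm-up, a realistic query mix and error handling, all of which are still to be discussed.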

2 Points that emerged during the internal meeting

2.A) Of the three axes (Data, Queries and Metrics), we think data generation is the most important at the beginning, and we should put more effort into it over the next months. For the benchmarks to have credibility and impact, the data used should be as close as possible to real data. This includes not only topological properties (such as degree distribution, large component, etc.), but also content distribution, that is, how the attributes of the nodes/edges are distributed and correlated over the whole network, and the temporal dimension, which is an important component in social networks. We have a good starting point in the S3G2 data generator from Peter and Orri, which already handles some of these points. A further discussion on these data generation issues could therefore be useful in order to: a) see exactly what characteristics we expect of the data, and b) see whether the existing data generation tools meet these requirements and, if not, what to do. Another point that emerged during the discussion was to ask real social network providers such as Facebook, Twitter, Tuenti, … for real (anonymized) data, in order to extract and model characteristics to be included in the data generation process.
2.B) Although the task force comprises two benchmarks (transactional and analytical), the possibility of focusing on just one of them during the first year was on the table during the meeting. The reason is that we would concentrate all our efforts in one direction and avoid having to manage too many things at the same time. However, we could also develop the two benchmarks in parallel.
2.C) For each of the three axes, we think that a first period of discussion among all the TF members, followed by an implementation phase, would be a good way to achieve the objectives.
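To make the topological properties mentioned in 2.A a bit more concrete, the following Python sketch computes a degree distribution and the size of the largest connected component from an edge list. It assumes a simple undirected graph (no self-loops or duplicate edges); the function names and the toy friendship data are made up for illustration and say nothing about how S3G2 works.

```python
from collections import Counter, defaultdict, deque


def degree_distribution(edges):
    """Map each degree value to the number of nodes having that degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return Counter(degree.values())


def largest_component_size(edges):
    """Size of the largest connected component, found by breadth-first search."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        best = max(best, size)
    return best


# Toy friendship graph (made-up data): a triangle with one extra friend,
# plus a separate pair of users.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("e", "f")]

if __name__ == "__main__":
    print(dict(degree_distribution(edges)))
    print(largest_component_size(edges))
```

The same two statistics could be computed on both real (anonymized) data and generated data to compare them; attribute correlations and the temporal dimension would need separate measures.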

3 Possible calendar

3.A Data generation

Given that there is a task related to this point and the associated deliverable (D2.2.1) is planned for month 12 (September 2013), we could leave 3 months for discussion and data acquisition and analysis (this means March/April 2013) and then 3 more months for the implementation (June-July 2013). Specific tasks would include:
 * Discussion and acquisition:
   * Look at S3G2 data generator -> LDBC members of the TF
   * Investigate the data generator generator from TU Berlin? (I'm not sure about that)
   * Ask Santiago Murillo for data (Carlos?)
   * Ask Facebook, Twitter, … for data
 * Data analysis and characterization
 * Data generator implementation
 * TBD

3.B Query definition

For each of the proposed benchmarks, whether or not we decide to develop both in parallel, a first discussion/definition period could cover March-May 2013 and a second period of implementation could cover June-August 2013. Specific tasks would include:
 * Look at the SIB W3C queries (all TF members)
 * Look at the queries to make sure they are affected by data correlations (Peter and Orri)
 * Look at the TPCTC 2012 paper on graph database benchmark
 * Review specs for query definition (Santiago + Carlos)
 * Implementation
   * Transactional queries (Peter and Orri?)
   * Analytical queries (UPC and NEO?)

3.C Metrics

This last part could be split into a first definition part (June-August 2013) and an implementation part (September-November 2013).
 * Define metrics for the transactional case:
   * Throughput score: queries/sec given a concurrency level
   * Power score?
 * Define metrics for the analytical case:
   * Throughput score: queries/sec given a concurrency level
   * Power score
 * Review specs for metric definition (Santiago + Carlos)
 * Implementation
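
The notes above leave the power score undefined. Purely as a discussion aid, the sketch below shows one possible shape for it, assuming a TPC-H-like convention in which the score is inversely proportional to the geometric mean of single-stream query times. The function names and the 3600 (seconds per hour) scaling are assumptions made here, not decisions of the task force.

```python
import math


def geometric_mean(times):
    """Geometric mean of positive query execution times (in seconds)."""
    if not times or any(t <= 0 for t in times):
        raise ValueError("times must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(t) for t in times) / len(times))


def power_score(times):
    """Assumed power score: queries per hour implied by the geometric mean time."""
    return 3600.0 / geometric_mean(times)


if __name__ == "__main__":
    # Made-up single-stream timings for two queries, in seconds.
    timings = [1.0, 4.0]
    print(geometric_mean(timings))
    print(power_score(timings))
```

The geometric mean keeps one very slow query from dominating the score, which is one reason it is often preferred over the arithmetic mean for this kind of metric.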

