|TG1||16/Jan||23/Jan||Quantization of population||UPC|
This task consists of quantizing the populations of the countries so that they become more predictable and stable. The goal is to reduce the variation in execution times when different workloads with the same parameters are executed.
The result is a new version of the generator in which the populations of the countries have been quantized, so the generated data respects these new populations.
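As an illustration of the idea (a sketch only; the bucket size and rounding rule are assumptions, not the generator's actual code), quantizing could mean rounding each country's population to the nearest multiple of a fixed bucket size, so runs with the same parameters always produce the same counts:

```python
def quantize_population(population, bucket=10_000):
    """Round a population to the nearest multiple of `bucket`,
    keeping at least one bucket so no country becomes empty."""
    return max(bucket, round(population / bucket) * bucket)

# Hypothetical raw country populations and their quantized versions.
raw = {"ES": 46_423_064, "NL": 16_829_289, "IS": 325_671}
quantized = {country: quantize_population(p) for country, p in raw.items()}
```

A coarser bucket makes the populations more stable across runs at the cost of fidelity to the raw figures.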
|DONE. To be validated by OGL.|
|TG2||23/Jan||20/Mar||Quantization of the rest of characteristics||UPC|
Like the previous task, this task consists of quantizing the remaining characteristics of the graph, if any, for the same goal as stated above. For example, the number of friends (still not approved).
|Similar to the previous task, but with the rest of the characteristics.||Changes will be developed as they become a requirement. Some of the potential new quantizations will depend on the substitution parameters.|
|TG3||6/Feb||13/Feb||Proposal of changes for Flashmob post generation||UPC|
The goal of this task is, first, to research how the generation of posts correlated with real-world flashmob events can be implemented.
|The result of this task is a proposal of changes to the generator.||DONE. There is a wiki page explaining the generation process.|
|TG4||13/Feb||27/Feb||Implementation of Flashmob post generation||UPC||This task consists of implementing the proposal of the previous task.|
The result is a new version of the generator where the changes proposed in the previous task have been implemented, and hence the generated post data is correlated with flashmob events.
DONE. Some validation of the produced results has been performed by Arnau Prat.
|TG9||12/Mar||20/Mar||Deterministic||Implementation of determinism||The generator has to be deterministic, independently of the number of machines used.|
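One common way to achieve this (sketched below under assumed names; not the generator's actual code) is to derive each entity's random seed from a global seed and the entity's id alone, so the generated attributes do not depend on which machine or thread produces the entity:

```python
import random

GLOBAL_SEED = 1234  # hypothetical benchmark-wide seed

def entity_rng(entity_id):
    """Return an RNG whose stream depends only on the global seed and
    the entity id, never on the machine that generates the entity."""
    return random.Random(GLOBAL_SEED * 1_000_003 + entity_id)

# Two different "machines" generating entity 42 draw identical values.
a = entity_rng(42).randint(0, 100)
b = entity_rng(42).randint(0, 100)
```

Because the seed is a pure function of the entity id, any partitioning of the id space across machines yields the same data.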
Implementation of the update stream workload.
A new version of the generator, where the update streams (or the data required) are generated along with the current data.
Implement the corresponding changes in the generator, so that the metadata needed by the workload generator to generate the workload is provided for a given data set.
A new version of the generator which produces the metadata required by the workload generator.
|DISCARDED. The workload generators will use their own metadata and setup files.|
|TG7||27/Mar||30/Mar||Documentation||UPC||Finish all the documentation regarding the data generation process.||A polished and complete documentation of the generator.|
|TG8||-||-||New IDs (URIs) with timestamp prefix||UPC||All entities with creationDate (e.g. Person, Post, ...) will have new IDs (URIs) which are the concatenation of the creationDate and the current ID||New long IDs (URIs) that can be sorted by creation date|
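A plausible encoding of such IDs (the bit widths here are assumptions for illustration, not the actual layout) packs the creation timestamp into the high bits of a 64-bit long, so that numeric order coincides with creation order:

```python
LOCAL_ID_BITS = 22  # hypothetical number of bits reserved for the current id

def make_id(creation_millis, old_id):
    """Concatenate creationDate (epoch millis) and the current id
    into one long whose numeric order follows the creation date."""
    assert old_id < (1 << LOCAL_ID_BITS)
    return (creation_millis << LOCAL_ID_BITS) | old_id

def creation_millis_of(new_id):
    """Recover the creation timestamp from a packed id."""
    return new_id >> LOCAL_ID_BITS

ids = [make_id(t, i) for t, i in [(2000, 7), (1000, 9), (1000, 3)]]
```

Sorting `ids` numerically then orders entities by creation date, with the old id as a tie-breaker.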
|TG10||01/Apr||02/Apr||Facebook-like||UPC||Implementation of a Facebook-like degree distribution.||The data generator produces a degree distribution similar to that expected in Facebook.|
Interactive Workload & QGEN
Scale factors for the interactive query set
This task will identify the scale factors for the interactive benchmark, such as the number of users, the number of years, the required storage space, the number of clients, etc. Regarding the transactions in the interactive queries, this task defines the ratio between the transaction load presented to the SUTs, the cardinality of the tables accessed by the transactions, etc.
|The basic unit of scaling and scaling factors are defined.|
|TQ2.1||9/Feb||21/Feb||Finish Interactive workload||OGL||12 queries done, 8 queries to go; they will be delivered in both SPARQL and SQL. We expect to contribute more to the interactive mix, specifically 2 more short queries and some enlargement of the updates involving some precomputation; otherwise, updates will be insignificant in the interactive workload.|
|TQ2.2||9/Feb||30/Mar||Prototype of BI workload||OGL||14 queries|
|TQ3||30/Mar||Determining the SNB interactive mix ratios||OGL||This depends in part on TUM's implementation of the update driver but will proceed in part without it. As suggested earlier, we aim at 5% update, 60% short and 35% long queries in the interactive mix, and using a Virtuoso SQL-based implementation we will set the frequencies so that this mix is obtained. After this we will look for feedback towards adjusting the mix components.||Query probabilities in the interactive mix are defined|
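The stated ratios could be realized with a simple weighted draw per issued query; the sketch below uses the 5/60/35 split from the task description (the function and constants are illustrative, not the driver's actual code):

```python
import random

# Target interactive mix from the task description.
MIX = [("update", 0.05), ("short", 0.60), ("long", 0.35)]

def draw_query_kind(rng):
    """Pick a query category with probability proportional to its mix weight."""
    r = rng.random()
    cumulative = 0.0
    for kind, weight in MIX:
        cumulative += weight
        if r < cumulative:
            return kind
    return MIX[-1][0]  # guard against floating-point round-off

rng = random.Random(7)
sample = [draw_query_kind(rng) for _ in range(10_000)]
```

In practice the driver would fix per-query frequencies rather than draw at run time, but the observed mix is the same in expectation.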
|TQ4||31/Jan||30/Mar||Update driver||TUM||The objective of the task is to design the driver that would generate transactional workload for the interactive SN benchmark.||Walking skeleton of the update driver.|
|TQ5||10/Mar||21/Mar||Interactive benchmark execution rules & metrics||VUA|
This task will define the execution rules and the methods for calculating the benchmark metrics. This may require learning from the rules and metrics of TPC-C and TPC-E.
Execution rules and metrics are defined
|(Renzo) Draft of the Specification (April 16).|
|TQ6||7/Feb||30/Mar||Substitution parameters||VUA||The objective of this task is to define methods for selection and generation of test data for the interactive queries. Test data is the data used to replace the substitution parameters in the query templates and create the instance queries. The selected test data must ensure that the instance queries are comparable in the sense of having similar execution complexity.||Methods for selecting test data for the interactive queries||The problem of selecting test data is being investigated from two points of view (Andrey/Peter and Renzo).|
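One hedged sketch of such a selection method (the names and the complexity proxy are assumptions, not the method under investigation): pre-compute a complexity proxy per candidate value, e.g. the result cardinality, and keep only candidates close to the most frequent proxy value, so the resulting instance queries behave similarly:

```python
from collections import Counter

def select_parameters(candidates, cardinality, tolerance=0.1):
    """Keep candidates whose complexity proxy (here: a pre-computed result
    cardinality) lies within `tolerance` of the most common value, so the
    instance queries built from them have similar execution complexity."""
    most_common_card, _ = Counter(cardinality[c] for c in candidates).most_common(1)[0]
    lo, hi = most_common_card * (1 - tolerance), most_common_card * (1 + tolerance)
    return [c for c in candidates if lo <= cardinality[c] <= hi]

# Hypothetical candidates with their pre-computed result cardinalities.
cards = {"a": 100, "b": 100, "c": 500, "d": 98, "e": 101}
chosen = select_parameters(list(cards), cards, tolerance=0.1)
```

Here the outlier `"c"` is dropped, since a query instantiated with it would run against a result set five times larger than the others.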
|TQ7||2/Mar||15/Mar||Interactive benchmark validation||VUA|
This task will define the rules for validating the performance results (e.g., the required precision of the output results).
|Rules for validating the results are defined.|
This period will be spent writing and revising the paper about the SNB benchmark, which may be submitted to VLDB.
|An industrial paper submitted|
|TQ9||23/Mar||30/Mar||Final documentation of the interactive query set||NEO|
Including description, parameters and results, validation setup, etc. (We have to be sure that the definitions of the queries correspond with the sample queries and results in GitHub.)
|TQ10||30/Mar||SQL implementation of the SIB workloads||OGL||SQL queries will be used as a baseline for performance measures, while SPARQL queries will be used as examples and to generate the validation results.|
|TQ11||30/Mar||Integration of the BIBM driver and the new SIB update driver||OGL||This should be minimal, as the BIBM query part should be an easy cut and paste into the update driver.|
|TQ12||30/Mar||Analysis of the execution and desired query plans for SIB interactive and BI||OGL|
|TD1||30/Mar||NEO||Definition and implementation of a common presentation/documentation style for all benchmarks|
|TD2||30/Mar||NEO||Final organization and documentation of tools in GitHub based on the previous task|