
This page explains how the substitution parameters for each query are created: dictionaries, distributions, and so on.

It is also meant to be a common place for future discussions about query parameters in QGEN and their relationship with DBGEN.

In this first phase, all substitution parameters are taken from text files in the td_data folder on GitHub. These files are either generated by DBGEN, produced by a simple query executed over the loaded data, or created manually.

 

  • Q1:
    Name - retrieve all unique first names from the dataset, then select one of them uniformly at random (files used: personNames.txt and personNames.sql)
  • Q2:
    Person - select a person URI uniformly at random (files used: personNumber.txt and personNumber.sql)
    Date0 - select a post creation date uniformly at random from the middle third of the post creation timeline, between 33% and 66%, so that it falls in neither the first nor the last third (files used: creationPostDate.txt and creationPostDate.sql)
  • Q3:
    Person - select a person URI uniformly at random (files used: personNumber.txt and personNumber.sql)
    Date0 - a random post creation date between 0% and 66% of the whole post creation timeline (files used: creationPostDate.txt and creationPostDate.sql)
    Duration - a number of days, in this case fixed at 33% of the length of the post creation timeline (files used: creationPostDate.txt and creationPostDate.sql)
    Country1 - the first country of a country pair (file: countryPairs.txt)
    Country2 - the second country of the same country pair (file: countryPairs.txt)
  • Q4:
    Person - select a person URI uniformly at random (files used: personNumber.txt and personNumber.sql)
    Date0 - a random post creation date between 0% and 95% of the whole post creation timeline (files used: creationPostDate.txt and creationPostDate.sql)
    Duration - a random number of days, in this case from 2% to 4% of the length of the post creation timeline (files used: creationPostDate.txt and creationPostDate.sql)
  • Q5:
    Person - select a person URI uniformly at random (files used: personNumber.txt and personNumber.sql)
    Date0 - a random post creation date between 0% and 95% of the whole post creation timeline (files used: creationPostDate.txt and creationPostDate.sql)
    Duration - a random number of days, in this case from 2% to 4% of the length of the post creation timeline (files used: creationPostDate.txt and creationPostDate.sql)
  • Q6:
    Person - select a person URI uniformly at random (files used: personNumber.txt and personNumber.sql)
    Tag - select a tag URI uniformly at random (files used: tagURI.txt and tagURI.sql)
  • Q7:
    Person - select a person URI uniformly at random (files used: personNumber.txt and personNumber.sql)
  • Q8:
    Person - select a person URI uniformly at random (files used: personNumber.txt and personNumber.sql)
  • Q9:
    Person - select a person URI uniformly at random (files used: personNumber.txt and personNumber.sql)
    Date0 - select a post creation date uniformly at random from the middle third of the post creation timeline, between 33% and 66%, so that it falls in neither the first nor the last third (files used: creationPostDate.txt and creationPostDate.sql)
  • Q10:
    Person - select a person URI uniformly at random (files used: personNumber.txt and personNumber.sql)
    HS0 - select a horoscope sign uniformly at random (a random number between 1 and 12)
    HS1 - HS0 + 1, wrapping around so that 12 + 1 = 1
  • Q11:
    Person - select a person URI uniformly at random (files used: personNumber.txt and personNumber.sql)
    Country - select a country URI uniformly at random (files used: orgLocations.txt and orgLocations.sql)
    Date0 - a random date anywhere in the workFrom timeline, from 0% to 100% (files used: workFromDate.txt and workFromDate.sql)
  • Q12:
    Person - select a person URI uniformly at random (files used: personNumber.txt and personNumber.sql)
    TagType - select a tag type URI uniformly at random (files used: tagTypes.txt and tagTypes.sql)
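
The date, duration and horoscope rules above can be sketched in a few lines of Python. This is only an illustration, not the QGEN code: the function names, the START and END bounds, and the choice to round durations down to whole days are all assumptions. In practice the two bounds would come from creationPostDate.txt (or workFromDate.txt for Q11).

```python
import random
from datetime import datetime, timedelta

# Hypothetical timeline bounds; the real ones would be read from creationPostDate.txt.
START = datetime(2010, 1, 1)
END = datetime(2013, 1, 1)


def pick_date(start, end, lo_frac, hi_frac, rng):
    """Pick a date uniformly between lo_frac and hi_frac of the [start, end] timeline."""
    span = (end - start).total_seconds()
    offset = rng.uniform(lo_frac * span, hi_frac * span)
    return start + timedelta(seconds=offset)


def pick_duration_days(start, end, lo_frac, hi_frac, rng):
    """Pick a whole number of days between lo_frac and hi_frac of the timeline length.

    Fractions are rounded down to whole days (an assumption of this sketch).
    Passing lo_frac == hi_frac gives a fixed duration, as in Q3.
    """
    total_days = (end - start).days
    lo = int(total_days * lo_frac)
    hi = int(total_days * hi_frac)
    return rng.randint(lo, hi)


def pick_horoscope_pair(rng):
    """Return (HS0, HS1) with HS0 in 1..12 and HS1 = HS0 + 1, wrapping 12 to 1."""
    hs0 = rng.randint(1, 12)
    hs1 = hs0 % 12 + 1
    return hs0, hs1


rng = random.Random()
q2_date0 = pick_date(START, END, 0.33, 0.66, rng)               # Q2 and Q9
q3_date0 = pick_date(START, END, 0.0, 0.66, rng)                # Q3
q3_duration = pick_duration_days(START, END, 0.33, 0.33, rng)   # Q3, fixed at 33%
q4_date0 = pick_date(START, END, 0.0, 0.95, rng)                # Q4 and Q5
q4_duration = pick_duration_days(START, END, 0.02, 0.04, rng)   # Q4 and Q5
hs0, hs1 = pick_horoscope_pair(rng)                             # Q10
```

For Q3, Q4 and Q5 the upper limit for Date0 plus the largest Duration stays within 100% of the timeline (66% + 33% and 95% + 4%), so under this reading the interval defined by Date0 and Duration never runs past the last post creation date.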

 

List of data files that QGEN currently uses, but that should be generated by DBGEN:

  • personNames.txt (for first names of persons)
  • personNumber.txt (the ID of the first person is always 0; the last ID can easily be generated by DBGEN)
  • creationPostDate.txt (the first and last post creation dates can also easily be generated by DBGEN)
  • countryPairs.txt (a manually created list of country pairs; a pair must not consist of two small countries that are far apart, because Q3 would then return 0 results)
    • should be generated at runtime by QGEN, based on information coming from DBGEN, such as:
      • population
      • continent
      • languages
      • geoposition
  • tagURI.txt (this file contains a list of TagURIs)
  • orgLocations.txt (list of countries)
  • workFromDate.txt (the first and last dates that can appear with the workFrom predicate; these can easily be generated by DBGEN)
  • tagTypes.txt (list of the most common tag types)
  • cities.txt (list of cities)
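
As a rough sketch of how the dictionary-style files in this list could feed the uniform selections described earlier: the example below assumes one value per line, which is a guess about the file layout and not something this page confirms. The function names and the sample values are hypothetical.

```python
import random


def parse_values(lines):
    """Strip each line, skip blank ones, and drop duplicates while keeping order."""
    stripped = (line.strip() for line in lines)
    return list(dict.fromkeys(value for value in stripped if value))


def pick_value(values, rng):
    """Select one value uniformly at random (names, tag URIs, countries, tag types)."""
    return rng.choice(values)


def pick_person_id(last_id, rng):
    """Select a person ID uniformly from 0..last_id; the first ID is always 0."""
    return rng.randint(0, last_id)


rng = random.Random()
# Hypothetical sample content standing in for the lines of personNames.txt.
names = parse_values(["Ana", "", "Bob ", "Ana"])
name = pick_value(names, rng)
# Hypothetical last person ID; the real value would come from personNumber.txt.
person_id = pick_person_id(999, rng)
```

With a real file, the lines could be obtained by opening it and passing the file object to parse_values. Turning the selected person ID into a person URI is not covered here, since the URI scheme is not described on this page.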