

  • Irini (FORTH)
  • Renzo (VUA)
  • Norbert (UPC)
  • Peter (VUA)
  • Larri (UPC) - in the beginning
  • Duc (VUA)
  • Orri (OGL)
  • Alex (NEO)


  • Downloaded, compiled, and tried to use ldbc_socialnet_bm, to generate a data set and to see how easy it was to use
    • Encountered problems, but was able to resolve them with assistance


  • Initiated discussion in response to Orri's request to enrich the generated data 
    • Problem
      • Current RDF format is too simple, not using references to DBPedia
    • Proposal
      • Add identifier to each Tag, and a reference that points to a multi-dimensional entity (i.e. based on data extracted from DBPedia)
        • This means that the set of all Tags in the data set will be known a priori (before importing the entire data set)
      • Identifier can be a URI from DBPedia
      • Peter suggested keeping full URI from DBPedia in the generated data set (fair to RDF vendors)
    • Discussion
    • Conclusions
      • Tags will contain simple references to richer entities
      • References will be URIs
      • Generated data set will be self-contained, i.e., a subset of DBPedia will be exported as extra dimension CSV tables (not as RDF)
      • Dimensions to consider
        • URI as attribute to Tag
        • URI as attribute to Location
        • URI as attribute to Organization
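
As a rough illustration of the conclusions above, the sketch below writes a Tag table whose rows carry a DBPedia URI, plus a separate dimension table keyed by that URI. The column names, the `|` delimiter, the sample rows, and the helper names are all assumptions made for illustration; they are not the generator's actual output format.

```python
import csv
import io

# Hypothetical sample rows: (tag id, tag name, DBPedia URI used as the reference).
TAGS = [
    (1, "Mozart", "http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart"),
    (2, "Barcelona", "http://dbpedia.org/resource/Barcelona"),
]

# Hypothetical dimension rows: (URI, attribute, value), standing in for the
# subset of DBPedia that would be exported alongside the data set.
TAG_DIMENSION = [
    ("http://dbpedia.org/resource/Wolfgang_Amadeus_Mozart", "type", "Person"),
    ("http://dbpedia.org/resource/Barcelona", "type", "City"),
]


def write_csv(header, rows):
    """Serialize rows to CSV text (delimiter chosen arbitrarily for this sketch)."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="|", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def dangling_references(tags, dimension):
    """Return Tag URIs that have no row in the dimension table.

    An empty result means the export is self-contained.
    """
    known = {uri for uri, _, _ in dimension}
    return [uri for _, _, uri in tags if uri not in known]


tag_csv = write_csv(["id", "name", "uri"], TAGS)
dimension_csv = write_csv(["uri", "attribute", "value"], TAG_DIMENSION)
```

The same shape would apply to Location and Organization if they also receive a URI attribute.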

Peter & Orri 

  • Alex - Please correct this section, RDF guys, I found this discussion difficult to follow
  • Do we need reification?
  • Instances have one-to-many relationships with attributes. How should they be specified in RDF? 
  • What was the outcome here? 
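
The notes do not record an outcome for these questions. For reference only, plain RDF allows the same subject and predicate to appear in several triples, which is one common way to express a one-to-many attribute without reification. The sketch below emits such triples in N-Triples syntax; the `example.org` URIs and the `to_ntriples` helper are hypothetical and do not reflect any decision from the meeting.

```python
def to_ntriples(subject, predicate, values):
    """Emit one N-Triples line per value.

    Repeating the same subject and predicate with different objects
    expresses a one-to-many attribute.
    """
    lines = []
    for value in values:
        # Escape backslashes and double quotes inside the literal.
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> <{predicate}> "{literal}" .')
    return lines


# Hypothetical instance with two values for the same attribute.
triples = to_ntriples(
    "http://example.org/person/1",
    "http://example.org/vocab/email",
    ["a@example.org", "b@example.org"],
)
```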


  • Orri mentioned that the population of Canada is smaller than that of other countries, and sees this as a problem
  • Norbert pointed out that Canada has 33M people, the same as Burma. The generator assigns people to countries based on population, not on the proportion of accounts per country. This means that there will be approximately the same number of Canadians as Burmese, and fewer Canadians than Spaniards. No change to the current algorithm or the distributions has been proposed.
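
A minimal sketch of population-proportional assignment, to make the effect concrete. It is not the generator's code: the function names are hypothetical, the 33M figures are the ones quoted above, and the Spain figure is an assumed value included only so that it is larger than Canada's.

```python
import random

# Populations in millions. Canada and Burma use the 33M quoted in the notes;
# the Spain value is an assumption for illustration.
POPULATION_M = {"Canada": 33, "Burma": 33, "Spain": 46}


def expected_share(population):
    """Fraction of generated persons each country should receive."""
    total = sum(population.values())
    return {country: pop / total for country, pop in population.items()}


def assign_countries(num_persons, population, seed=42):
    """Draw a country for each person with probability proportional to population."""
    rng = random.Random(seed)
    countries = list(population)
    weights = [population[c] for c in countries]
    return rng.choices(countries, weights=weights, k=num_persons)


shares = expected_share(POPULATION_M)
persons = assign_countries(1000, POPULATION_M)
```

Under this scheme Canada and Burma receive equal expected shares regardless of how many social network accounts each country actually has.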


  • What has changed in data generator over last two weeks?
  • Response
    • Data 
    • Dictionaries
      • Problems exporting DBPedia abstracts: wikiPageWikiLink is not available in the public interface 
      • => Renzo, download a snapshot to run Duc's queries
    • Serializers
      • export Photo as Post.image  
      • validate RDF output format using JENA 
      • fix errors in TURTLE and N3 
      • PhotoAlbum as Forum 
      • change RDF namespace to dbp for Tags and Locations
    • GITHUB 
      • create README file (but it is in the wrong folder at present)
      • create CHANGELOG file  
      • configure .gitignore  
      • split parameters: only NUM_PERSON and NUM_YEARS  
      • out file names
      • clean up unused modules and functions


  • Please download the generator, try to compile it, and provide feedback

Peter & Norbert

  • Goal
    • We want a Power Law distribution for Tag frequency _AND_ for Tags to be correlated with Persons/Location
  • With reference to the following documents
  • Discussion
    • Renzo explained the plots in tags_distribution.pdf, as well as how Persons and Tags are correlated at present
    • Possibility of modifying the tag distributions to mimic those presented in the Flickr paper
    • Orri initiated discussion about "country specific interests" vs. "local specific interests" (based on friends' interests, etc.)
  • Conclusion
    • More discussion needed
    • Orri will download the new generated data file, then write online queries against that, in order to define them more clearly
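
One possible way to combine the two goals stated above, sketched under stated assumptions: give each country its own ranking of tags and draw tags with weight 1/rank^alpha. Within a country the frequencies follow a power law, and the ranking ties tag choice to location. The tag names, rankings, and function names are hypothetical, and this is not a description of how the generator currently correlates Persons and Tags.

```python
import random


def zipf_weights(n, alpha=1.0):
    """Unnormalized power-law weights: rank r gets weight 1 / r**alpha."""
    return [1.0 / (rank ** alpha) for rank in range(1, n + 1)]


# Hypothetical per-country rankings: the same tags, ordered differently,
# so that popularity depends on location.
TAG_RANKING = {
    "Spain": ["football", "flamenco", "hockey", "maple_syrup"],
    "Canada": ["hockey", "maple_syrup", "football", "flamenco"],
}


def sample_tags(country, k, alpha=1.0, seed=0):
    """Draw k tags for a person in `country` using that country's ranking."""
    rng = random.Random(seed)
    ranking = TAG_RANKING[country]
    weights = zipf_weights(len(ranking), alpha)
    return rng.choices(ranking, weights=weights, k=k)


canada_tags = sample_tags("Canada", 2000)
spain_tags = sample_tags("Spain", 2000)
```

Note that the overall tag frequency across all countries is a mixture of the per-country distributions, so whether it still looks like a power law depends on how the rankings overlap.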