SIB Data Schema (v4 10/Jul/2013)

The following file contains the UML model of the SIB schema: sib_uml.pdf

The source file is in Gaphor format, a GPL tool for UML 2.0 data modeling: ldbc_sna.gaphor

A tabular representation of the data schema is available in the file schema.xlsx

Description by example

 

Instance diagram example: sib_person.jpg

 

  • A person is a unique entity in the social network (SN) that is created when the person joins the SN. She has a first and last name, and her profile consists of her gender and birthday.
  • A person also has one or more email addresses.
  • Each person is based near a city (a place). There is a hierarchy of places: cities, countries and continents. Places are used to establish regional correlation between entities.
  • People in the social graph are connected to other people by establishing friendships. Two friends know each other through a symmetric pair of relationships. A person can also be the follower of another person who is an authority or a celebrity.
  • The first connection of the person to the SN registers the IP address and the browser used when she joined the SN. Each IP address is located in a specific country.
  • A person tags several topics of interest that she likes. There is a hierarchy of classes of tags.
  • Each person speaks one or more languages that are official or commonly spoken in her country, or even foreign languages such as English.
  • There are two kinds of organizations: universities and companies. Universities are located in cities, while companies are based in countries. A person may have studied at several universities, each in a given year. A person may also have worked at several companies, each since a given year.
  • A person is the moderator of a forum or wall. The moderator tags the forum to indicate the interests of the forum members.

  • People with similar interests join forums to become members.

  • Each forum has many posts created by its members.

  • A post can contain a text, written in a certain language, and/or an image file. A post contains some tags which provide the main ideas of the post.

  • A post contains a list of users who liked it.

  • People can comment on posts or on other comments. A comment is just a simple text.

  • Posts and comments track the country where they were created, plus the browser and the IP address used.
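
As an illustration of the description above, the following Python sketch models a small part of the schema: places with their hierarchy, persons, and friendship as a symmetric pair of relationships. The class, field, and function names (Place, Person, part_of, knows, add_friendship, country_of) are assumptions made for this example and are not taken from the UML model or from the generator.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Place:
    name: str
    kind: str  # "city", "country", or "continent"
    part_of: Optional["Place"] = None  # parent in the place hierarchy


# eq=False keeps identity comparison, which avoids recursing through
# the cyclic "knows" lists when two persons are compared.
@dataclass(eq=False)
class Person:
    first_name: str
    last_name: str
    gender: str
    birthday: date
    city: Place
    emails: List[str] = field(default_factory=list)
    languages: List[str] = field(default_factory=list)
    knows: List["Person"] = field(default_factory=list)


def add_friendship(a: Person, b: Person) -> None:
    """Record a friendship as a symmetric pair of 'knows' relationships."""
    if b not in a.knows:
        a.knows.append(b)
    if a not in b.knows:
        b.knows.append(a)


def country_of(person: Person) -> Optional[Place]:
    """Walk up the place hierarchy from the person's city to its country."""
    place = person.city
    while place is not None and place.kind != "country":
        place = place.part_of
    return place
```

Storing the friendship on both persons mirrors the symmetric pair of relationships in the schema; a single undirected edge would be an equally valid representation.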

Data Generator

The data generator is available on GitHub:

https://github.com/ldbc/ldbc_socialnet_bm

The current implementation allows the following parameters:

  • numtotalUser: The number of users the social network will have. It should be greater than 1000.
  • startYear: The first year of the period covered by the generated data.
  • numYears: The length of that period, in years.
  • serializerType: The serializer type must be one of these three values: ttl (Turtle format), n3 (N3 format), csv (comma-separated values).
  • rdfOutputFileName: The base name for the files generated in RDF format (Turtle and N3).
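
The constraints on these parameters can be checked before running the generator. The sketch below is only an illustration: the function validate_params is hypothetical and not part of the generator, and it assumes the parameters have already been read into a dictionary keyed by the names above. It also assumes that rdfOutputFileName is only needed for the two RDF serializers.

```python
from typing import Any, Dict, Mapping

VALID_SERIALIZERS = ("ttl", "n3", "csv")


def validate_params(params: Mapping[str, Any]) -> Dict[str, Any]:
    """Check generator parameters against the constraints listed above."""
    num_users = int(params["numtotalUser"])
    if num_users <= 1000:
        raise ValueError("numtotalUser should be bigger than 1000")

    serializer = str(params["serializerType"])
    if serializer not in VALID_SERIALIZERS:
        raise ValueError(f"serializerType must be one of {VALID_SERIALIZERS}")

    checked: Dict[str, Any] = {
        "numtotalUser": num_users,
        "startYear": int(params["startYear"]),
        "numYears": int(params["numYears"]),
        "serializerType": serializer,
    }
    # Assumption: the base name is only required for the RDF serializers.
    if serializer in ("ttl", "n3"):
        checked["rdfOutputFileName"] = str(params["rdfOutputFileName"])
    return checked
```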

A description of additional features of the data schema and of the methods for data generation is presented in the file schema-features.docx

CSV output

When the generator is configured to generate the data as CSV files, it creates one CSV file for each entity and relationship.

The file CSV-files.docx contains a description of the files created by the generator. 
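
Since the output is one file per entity and relationship, a consumer can load a whole output directory into a dictionary keyed by file name. The following Python sketch shows one way to do this. The function load_csv_tables is hypothetical, and the ".csv" extension, the UTF-8 encoding, and the default delimiter are assumptions; the actual file names, columns, and delimiter are the ones documented in CSV-files.docx.

```python
import csv
from pathlib import Path
from typing import Dict, List


def load_csv_tables(directory: str, delimiter: str = ",") -> Dict[str, List[List[str]]]:
    """Read every *.csv file in a directory into a list of rows.

    The result maps each file name (without extension) to its rows,
    so each entity or relationship file becomes one entry.
    """
    tables: Dict[str, List[List[str]]] = {}
    for path in sorted(Path(directory).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            tables[path.stem] = list(csv.reader(handle, delimiter=delimiter))
    return tables
```

Rows are returned as plain lists of strings because this sketch makes no assumption about whether the files carry a header row.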

RDF output (last update: July 30, 2013)

When the generator is configured to generate RDF data, the output consists of two files:

  • A first file that contains the data corresponding to the entities Person, Group, Forum, Post and Comment. A sample file is sndata.n3.
  • A second file that contains the data corresponding to the entities Organization, Tag and Location. This data is obtained from DBpedia but it is part of the data schema. A sample file is dbpedia.n3.

 

Changes

May 30 version: The file was modified to be consistent with the example used in the description of the data schema. The structure was verified by Norbert and Renzo.

July 5: The data schema was updated with small changes. The summary table was complemented with the details about the schema. The samples of the RDF output were updated.

July 30: Minor modifications to the RDF encoding. We ensure that the file dbpedia.n3 contains only triples occurring in DBpedia. The triples describing the hierarchy of locations (Places) are in the file sndata.n3, because this hierarchy does not exist in DBpedia.
