


  • Peter Boncz (CWI)
  • Irini Fundulaki (FORTH)
  • Alex Averbuch (NEO)
  • Barry Bishop (ONTO)
  • Venelin Kotsev (ONTO)
  • Orri Erling (OPL)


  • Dave Rogers (BBC)
  • Jem Rayfield (BBC)


The benchmark software is being developed in Java (for portability) and can be found in the project's Subversion repository here:

Requirements and motivation

What this benchmark tests
The compulsory part of this benchmark boils down to a raw measurement of concurrent query and update throughput based on modifications of 'objects', where an object is a set of RDF triples (probably in a named graph) describing all of its properties.
Characteristics
The benchmark is organised around a 'publishing' scenario on top of an RDF database. It involves CRUD operations on 'objects' (usually a publisher's journalistic assets), where data about objects is read (for publication/aggregation) much more frequently than it is updated. Metadata about each object links the asset, via a tagging ontology, to reference and domain-specific ontologies and datasets.
Data Model

The data used for this benchmark will fall under the following categories:

  • Reference data (combination of several LOD datasets, e.g. GeoNames, MusicBrainz, etc.)
  • Domain ontologies - specialist ontologies that describe the publisher's areas of expertise, e.g. Sport, Education
  • Publication asset ontologies - that describe the structure and form of the assets that are published, e.g. news stories, photos, video, audio, etc.
  • Tagging ontologies and the metadata that links assets with reference/domain ontologies

BBC and Press Association ontologies are available online: and

Query and update language
SPARQL 1.1 is used throughout; however, variations are allowed for the full-text search, ranking and geo-spatial requirements.
Required features for compliance
  • Full ACID guarantees
  • Atomic CRUD operations on asset metadata - these are typically achieved in a drop/replace style at the 'object' level, i.e. the collection of RDF statements about an object is stored in a unique named graph. Any update to the properties of this object (no matter how minor) is achieved by dropping the named graph and re-inserting all of the object's property values in a single transaction.
  • Non-trivial and consistent inference - answers to queries should at all times be consistent with the chosen rule-set, which is based on RDFS with some language elements from OWL, i.e. owl:TransitiveProperty, owl:SymmetricProperty, owl:FunctionalProperty, owl:InverseFunctionalProperty, owl:inverseOf, owl:equivalentClass, owl:equivalentProperty, owl:sameAs, owl:intersectionOf, owl:unionOf
  • Full-text search - probably should be optional
  • Geo-spatial - probably optional
  • Ranking mechanism - probably optional, but need to define allowed mechanisms
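The drop/replace style of update can be expressed as one SPARQL 1.1 Update request containing two operations. The following is a minimal sketch, assuming the annotation agent already knows the object's named graph IRI and holds the new property values as pre-serialised N-Triples statements; the class and method names are hypothetical. Atomicity is not provided by this code: it relies on the store executing the whole request as a single transaction, which the ACID requirement above demands.

```java
import java.util.List;

/** Hypothetical helper: builds a drop/replace update for one 'object' named graph. */
final class ObjectGraphUpdate {

    /**
     * Returns a single SPARQL 1.1 Update request that drops the object's named
     * graph and re-inserts every property value. Each triple is assumed to be a
     * complete N-Triples statement ending in " .".
     */
    static String dropAndReplace(String graphIri, List<String> triples) {
        StringBuilder sb = new StringBuilder();
        // DROP SILENT does not fail when the graph does not exist yet (a brand-new object).
        sb.append("DROP SILENT GRAPH <").append(graphIri).append("> ;\n");
        // Re-insert the complete description of the object into the same graph.
        sb.append("INSERT DATA {\n  GRAPH <").append(graphIri).append("> {\n");
        for (String t : triples) {
            sb.append("    ").append(t).append('\n');
        }
        sb.append("  }\n}");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Example IRIs are illustrative only.
        String g = "http://example.org/graphs/asset/42";
        System.out.println(dropAndReplace(g, List.of(
            "<http://example.org/asset/42> <http://purl.org/dc/terms/title> \"Match report\" .",
            "<http://example.org/asset/42> <http://example.org/tag/about> <http://example.org/team/7> .")));
    }
}
```

Sending both operations in one request, rather than as two separate requests, avoids a window in which concurrent aggregation queries could see the object with no properties at all.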

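As an illustration of what "consistent with the chosen rule-set" means for one of the listed OWL elements, the sketch below forward-chains the owl:TransitiveProperty rule, (x p y) and (y p z) entail (x p z), for a single property. It is a naive breadth-first closure written for clarity, not a description of how any particular store implements inference; the class name and the place-name data are hypothetical. Note that under the drop/replace update model, a store that materialises entailments must also retract those that depended on a dropped graph, which this sketch does not attempt.

```java
import java.util.*;

/** Sketch: computes the entailments of owl:TransitiveProperty for one property p. */
final class TransitiveClosure {

    /**
     * Given asserted (subject -> objects) pairs for a transitive property p,
     * returns, for each subject, every object reachable through chains of p.
     */
    static Map<String, Set<String>> closure(Map<String, Set<String>> asserted) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String start : asserted.keySet()) {
            // Breadth-first walk from 'start'; the 'reached' set also stops cycles.
            Set<String> reached = new LinkedHashSet<>();
            Deque<String> queue = new ArrayDeque<>(asserted.get(start));
            while (!queue.isEmpty()) {
                String next = queue.poll();
                if (reached.add(next)) {
                    queue.addAll(asserted.getOrDefault(next, Set.of()));
                }
            }
            result.put(start, reached);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> partOf = new HashMap<>();
        partOf.put("Cardiff", Set.of("Wales"));
        partOf.put("Wales", Set.of("UK"));
        // Cardiff is entailed to be part of both Wales and the UK.
        System.out.println(closure(partOf).get("Cardiff"));
    }
}
```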

Implementation details

  • Data generation or datasets, semi-synthetic data?
  • How many connected clients (and of what type)?
    • Aggregation agents - execute queries only for identifying content to aggregate
    • Annotation agents - CRUD operations on metadata, e.g. a new media asset requires a new RDF description
    • Fixed ratio of 20 aggregation agents to 1 annotation agent
  • Standard unit of dataset size: 1 billion statements
  • Define a required rate of updates that is proportional to the dataset size (fairly low for this use-case, e.g. 20 updates per second per billion statements)
  • Define parameterised query patterns
  • The test sponsor can tune the query rate to maintain the required update rate - success criteria include maintaining the required update rate without any catastrophic latency issues
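The sizing rules above reduce to simple arithmetic. The sketch below encodes the draft's example values (20 updates per second per billion statements, and 20 aggregation agents per annotation agent) as constants; these are proposals from this page rather than final benchmark parameters, and the class and method names are hypothetical. Only the update-rate part of the success criteria is checked; latency is not modelled.

```java
/** Hypothetical sizing helper; constants are the draft's example values, not final rules. */
final class WorkloadSizing {
    static final double UPDATES_PER_SEC_PER_BILLION = 20.0; // example rate proposed above
    static final int AGGREGATION_PER_ANNOTATION = 20;        // fixed 20:1 agent ratio

    /** Required sustained update rate (updates/s) for a dataset of the given size. */
    static double requiredUpdateRate(long statements) {
        return UPDATES_PER_SEC_PER_BILLION * statements / 1_000_000_000.0;
    }

    /** Number of aggregation agents to run alongside the given number of annotation agents. */
    static int aggregationAgents(int annotationAgents) {
        return AGGREGATION_PER_ANNOTATION * annotationAgents;
    }

    /** True if the measured update rate meets the size-proportional requirement. */
    static boolean meetsUpdateRate(long statements, double measuredUpdatesPerSec) {
        return measuredUpdatesPerSec >= requiredUpdateRate(statements);
    }

    public static void main(String[] args) {
        long size = 5_000_000_000L; // a 5-billion-statement dataset
        System.out.println(requiredUpdateRate(size)); // 100.0 updates/s
        System.out.println(aggregationAgents(3));     // 60 aggregation agents
    }
}
```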

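A parameterised query pattern can be as simple as a SPARQL template with named placeholders that are filled with values drawn from the dataset, e.g. for the random sequence used during warm-up. The sketch below assumes a hypothetical %name% placeholder syntax and IRIs taken from the benchmark's own data (so no escaping is attempted); the template, predicate IRI and class name are illustrative, not part of the benchmark definition.

```java
import java.util.List;
import java.util.Random;

/** Hypothetical query pattern with %name% placeholders substituted by IRIs. */
final class QueryPattern {
    private final String template;

    QueryPattern(String template) {
        this.template = template;
    }

    /** Replaces every occurrence of %name% with the given IRI wrapped in angle brackets. */
    String bind(String name, String iri) {
        return template.replace("%" + name + "%", "<" + iri + ">");
    }

    /** Picks one candidate parameter value; a seeded Random makes runs reproducible. */
    static String pick(List<String> candidates, Random rnd) {
        return candidates.get(rnd.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        QueryPattern assetsAboutTag = new QueryPattern(
            "SELECT ?asset WHERE { GRAPH ?g { ?asset <http://example.org/tag/about> %tag% } } LIMIT 10");
        List<String> tags = List.of("http://example.org/team/7", "http://example.org/team/9");
        Random rnd = new Random(42);
        System.out.println(assetsAboutTag.bind("tag", pick(tags, rnd)));
    }
}
```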

Workflow and measurement points

  • Reference data and metadata load (time is not relevant for this benchmark, although a maximum time to load could be mandated)
  • Warm-up period - a random sequence of query patterns is executed
  • Tuning phase - selection of target query mix rate
  • Test phase
    • Sample some query results for correctness allowing for:
      • inference
      • effects of previous completed updates
      • transaction isolation
      • accuracy (full-text search, ranking, geo-spatial considerations)
  • Results:
    • Sustained query rate
    • Price (total cost of ownership of software and hardware over 3 years) per kiloquery per second
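The price/performance result divides the 3-year total cost of ownership by the sustained query rate expressed in thousands of queries per second. A minimal sketch of that arithmetic follows; the class name and the cost and throughput figures in the example are hypothetical, chosen only to show the units.

```java
/** Hypothetical helper for the price per kiloquery-per-second metric. */
final class PricePerformance {

    /**
     * @param totalCostOfOwnership   3-year TCO of hardware and software
     * @param sustainedQueriesPerSec sustained query rate measured in the test phase
     * @return cost per 1,000 queries/s of sustained throughput
     */
    static double pricePerKqps(double totalCostOfOwnership, double sustainedQueriesPerSec) {
        if (sustainedQueriesPerSec <= 0) {
            throw new IllegalArgumentException("throughput must be positive");
        }
        return totalCostOfOwnership / (sustainedQueriesPerSec / 1000.0);
    }

    public static void main(String[] args) {
        // Illustrative figures: a system costing 150,000 that sustains 500 queries/s.
        System.out.println(pricePerKqps(150_000, 500)); // 300000.0 per kq/s
    }
}
```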

Optional test phases

  • Option 1 - backup:
    • Re-run with online backup executed during test phase
    • Record relative drop in query and update rates (update rate allowed to drop below threshold, but must remain above zero)
  • Option 2 - resilience:
    • During test-phase simulate power-off in 25% to 50% of nodes
    • Record relative drop in query and update rates (update rate allowed to drop below threshold, but must remain above zero)
  • Option 3 - update reference data:
    • Replace one of the reference datasets (e.g. GeoNames)
    • Several ways to do this, allow test sponsor to choose approach:
      • Compute delta (different environment) - probably forbid this
      • Drop and reload graph in one transaction or chunks
      • Load new graph and drop the old one (in one go or in chunks)
    • Record relative drop in query and update rates
  • Option 4 - schema/ontology restructure:
    • Introduce a sequence of class hierarchy modifications
    • Record relative drop in query and update rates
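Each optional phase records the relative drop in query and update rates against the main test phase. The sketch below assumes "relative drop" means the fractional reduction from the baseline rate, and encodes the rule for options 1 and 2 that the update rate may fall below the threshold but must stay above zero; the class and method names are hypothetical.

```java
/** Hypothetical helper for reporting the optional test phases. */
final class RelativeDrop {

    /** Fractional drop from the baseline rate (0.25 means the rate fell by 25%). */
    static double relativeDrop(double baselineRate, double optionPhaseRate) {
        if (baselineRate <= 0) {
            throw new IllegalArgumentException("baseline rate must be positive");
        }
        return (baselineRate - optionPhaseRate) / baselineRate;
    }

    /** Options 1 and 2: updates may fall below the required rate but must not stop entirely. */
    static boolean updatesSurvived(double optionPhaseUpdateRate) {
        return optionPhaseUpdateRate > 0;
    }

    public static void main(String[] args) {
        // e.g. query rate fell from 200 to 150 queries/s during an online backup
        System.out.println(relativeDrop(200, 150)); // 0.25
        System.out.println(updatesSurvived(3.5));   // true
    }
}
```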

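Two of the reference-data replacement approaches in option 3 map directly onto SPARQL 1.1 Update operations. The sketch below builds the update strings, assuming the new dataset is available as a dump at a URL the store can LOAD from; the graph and dump IRIs, staging graph and class name are hypothetical. For "load new graph and drop the old one" it uses SPARQL 1.1 MOVE, a substitution chosen here because MOVE replaces the target graph's contents and removes the staging graph, so queries can keep using the same graph IRI. The chunked variants are not shown.

```java
import java.util.List;

/** Hypothetical builder for reference-dataset replacement updates. */
final class ReferenceDataSwap {

    /** Drop and reload the graph in one update request (one transaction). */
    static String dropAndReload(String graphIri, String dumpUrl) {
        return "DROP SILENT GRAPH <" + graphIri + "> ;\n"
             + "LOAD <" + dumpUrl + "> INTO GRAPH <" + graphIri + ">";
    }

    /**
     * Load into a staging graph first, then replace the old graph.
     * Returned as two separate requests, so the long-running load does not
     * hold a transaction that also removes the live reference data.
     */
    static List<String> loadThenReplace(String graphIri, String stagingIri, String dumpUrl) {
        return List.of(
            "LOAD <" + dumpUrl + "> INTO GRAPH <" + stagingIri + ">",
            "MOVE GRAPH <" + stagingIri + "> TO GRAPH <" + graphIri + ">");
    }

    public static void main(String[] args) {
        System.out.println(dropAndReload(
            "http://example.org/graphs/geonames", "http://example.org/dumps/geonames.nt"));
    }
}
```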

Benchmark variations

Instead of using a separate content store, a variation of this use case stores the textual content itself, e.g. documents or news stories, in the RDF database, possibly broken down into smaller units, e.g. paragraphs.
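One way to break a document into paragraph-level units is to split on blank lines and give each paragraph its own node carrying the text as a literal. The sketch below does this and emits N-Triples-style statements; the vocabulary (hasParagraph, text), the #pN fragment scheme and the class name are hypothetical, since this page does not define a content ontology. Only backslashes, double quotes and newlines are escaped in literals.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical splitter that turns a document into paragraph-level RDF statements. */
final class ParagraphTriples {
    // Hypothetical vocabulary, not defined by the benchmark.
    static final String HAS_PARAGRAPH = "http://example.org/content#hasParagraph";
    static final String TEXT = "http://example.org/content#text";

    /** Splits text on blank lines and emits two statements per paragraph. */
    static List<String> toTriples(String docIri, String text) {
        List<String> out = new ArrayList<>();
        String[] paras = text.strip().split("\\n\\s*\\n");
        for (int i = 0; i < paras.length; i++) {
            String paraIri = docIri + "#p" + (i + 1);
            // Escape backslashes first, then quotes and newlines, for an N-Triples literal.
            String literal = paras[i].strip()
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n");
            out.add("<" + docIri + "> <" + HAS_PARAGRAPH + "> <" + paraIri + "> .");
            out.add("<" + paraIri + "> <" + TEXT + "> \"" + literal + "\" .");
        }
        return out;
    }

    public static void main(String[] args) {
        toTriples("http://example.org/story/1", "First paragraph.\n\nSecond paragraph.")
            .forEach(System.out::println);
    }
}
```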


