The data schema of a benchmark (also called conceptual schema or conceptual data model) defines the structure of the data used by the benchmark in terms of entities, relationships between entities, and attributes of entities and relationships. Additionally, the description of the data schema covers the features of the data imposed by the workloads and the choke points of the benchmark (e.g., data correlations and statistical distributions).
The description of the data schema should include the elements presented below.
UML Class diagram
The Unified Modeling Language (UML) is a widely accepted language for data and process modeling. To describe the static structure of the data schema used by a benchmark, we consider the use of UML class diagrams. Next, we describe such diagrams and include some examples.
A UML class diagram describes the structure of a system by showing the classes participating in the system, the attributes and operations (or methods) of the classes, and the relationships among the classes. Since we use UML class diagrams to model the structure of the data used by a benchmark, we will talk about classes of entities (entity classes).
A class of entity is represented with a box containing three blocks: the upper block contains the name of the entity class; the middle block holds the attributes; and the bottom block presents the methods of the class (the third block will not be used, as we are not modeling methods). The definition of an attribute includes its name, datatype and cardinality.
For example, the following figure shows the UML representation of the entity class “Person”, which includes the attributes name, age and email. Note that all attributes define a datatype, and the attribute “email” defines its cardinality (i.e., it is multivalued).
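As a complement to the figure, the entity class Person can also be sketched in code. The following is a minimal illustration, not part of the UML notation itself; the datatypes (str, int, list of str) are assumptions, since the text above only names the attributes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    """Entity class Person; the datatypes are assumed for illustration."""
    name: str                                        # single-valued attribute
    age: int                                         # single-valued attribute
    email: List[str] = field(default_factory=list)   # multivalued attribute


# A data instance with two values for the multivalued attribute email.
alice = Person(name="Alice", age=30, email=["alice@example.org", "a@example.com"])
print(alice)
```

A list is used here only to convey that email may hold several values per person.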
A relationship (or association) is represented by a line connecting two entity classes, labeled with the name of the relationship. If the relationship is directed, then arrowheads or the symbols > and < must be used. The cardinality of a class in the relationship must be placed on the opposite side of the line (close to the other class). The types of cardinality are: zero or one (0..1), one only (1), zero or more (0..*), and one or more (1..*). Additionally, the role that a class plays in the relationship can be included on the class’s side of the line.
The following figure shows the representation of the relationship “hasTag” between the entity classes “Post” and “Tag”. This diagram establishes that a post can be related to zero or one tag, and that a tag can be related to zero or many posts.
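The cardinalities of hasTag can be illustrated with a small sketch. The attribute names (content, name) and the helper has_tag are hypothetical; only the entity classes Post and Tag and the cardinalities (0..1 on the Tag side, 0..* on the Post side) come from the example above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Tag:
    name: str                                            # assumed attribute
    posts: List["Post"] = field(default_factory=list)    # 0..* posts per tag


@dataclass(eq=False)
class Post:
    content: str                                         # assumed attribute
    tag: Optional[Tag] = None                            # 0..1 tag per post


def has_tag(post: Post, tag: Tag) -> None:
    """Link a post to a tag, enforcing the 0..1 cardinality on the Tag side."""
    if post.tag is not None:
        raise ValueError("a post can be related to at most one tag")
    post.tag = tag
    tag.posts.append(post)


music = Tag("music")
first, second = Post("first post"), Post("second post")
has_tag(first, music)
has_tag(second, music)   # allowed: a tag can be related to many posts
```

Attempting to call has_tag a second time on the same post raises an error, which is how this sketch expresses the upper bound of the 0..1 cardinality.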
Standard notation for names
Consider the following standard notation rules for naming elements in a diagram:
- Entity names: upper camel case notation. That is, each word in the name begins with a capital letter (e.g., Person, TagClass).
- Attribute names: lower camel case notation. This is the same as upper camel case, except that the first letter of the name is in lowercase (e.g., content, creationDate).
- Relationship names: lower camel case, plus a complementary prefix/suffix when necessary to clarify the role of the entity classes participating in the relationship (e.g., isLocatedIn).
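The naming rules above can be checked mechanically. The following sketch uses regular expressions that are one possible reading of "upper camel case" and "lower camel case" (letters and digits only, no separators); the function names are hypothetical.

```python
import re

# Upper camel case: one or more words, each starting with a capital letter.
UPPER_CAMEL = re.compile(r"^(?:[A-Z][a-z0-9]*)+$")
# Lower camel case: same, except the first letter is in lowercase.
LOWER_CAMEL = re.compile(r"^[a-z][a-z0-9]*(?:[A-Z][a-z0-9]*)*$")


def is_entity_name(name: str) -> bool:
    """Entity names use upper camel case (e.g., Person, TagClass)."""
    return bool(UPPER_CAMEL.match(name))


def is_attribute_or_relationship_name(name: str) -> bool:
    """Attribute and relationship names use lower camel case."""
    return bool(LOWER_CAMEL.match(name))


for candidate in ["Person", "TagClass", "creationDate", "isLocatedIn", "tag_class"]:
    print(candidate, is_entity_name(candidate),
          is_attribute_or_relationship_name(candidate))
```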
There are additional features of UML class diagrams that can be used in data modeling. Among them we can mention inheritance, interfaces, and association classes.
The notion of inheritance can be used to model a hierarchy of entity classes, after an abstraction step of specialization or generalization. An inheritance relationship is represented with a solid line drawn from the child class to the superclass, ending in an unfilled arrowhead at the superclass. The following figure shows an example of inheritance where the objective is to model a classification of organizations (universities and companies).
In some cases, we can also use inheritance to simplify the representation of two or more entity classes sharing some attributes and relationships. An interface is a virtual entity class: it represents inheritance, but the database does not contain data instances of that class.
The following diagram shows the interface Message, which is used to generalize the entity classes Post and Comment, by means of encapsulating the attributes creationDate and content. In this example, the database does not contain data about messages.
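Both uses of inheritance can be sketched in code. The example below is an illustration under assumptions: the attribute name of Organization and all datatypes are hypothetical, and the guard in __post_init__ is just one way to express that an interface has no data instances of its own.

```python
from dataclasses import dataclass


@dataclass
class Organization:
    name: str            # assumed attribute shared by all organizations


@dataclass
class University(Organization):
    pass                 # specialization of Organization


@dataclass
class Company(Organization):
    pass                 # specialization of Organization


@dataclass
class Message:
    """Interface: encapsulates attributes shared by Post and Comment."""
    creationDate: str    # datatype assumed; name follows the schema notation
    content: str

    def __post_init__(self) -> None:
        # The database holds no instances of the interface itself.
        if type(self) is Message:
            raise TypeError("Message is an interface and has no data instances")


@dataclass
class Post(Message):
    pass


@dataclass
class Comment(Message):
    pass


post = Post(creationDate="2013-01-01", content="Hello")
comment = Comment(creationDate="2013-01-02", content="Hi there")
```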
There are cases when we need to represent attributed relationships, that is, relationships that contain attributes providing valuable information about the relationship. An attributed relationship is represented by means of an association class.
The following example shows the use of an association class to represent the attributed relationship studyAt.
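An association class can be read as a class whose instances link one instance of each participating entity class and carry the attributes of the relationship. The sketch below assumes that studyAt connects Person and University, and the attribute classYear is hypothetical, since the text does not list the attributes of the relationship.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str            # assumed attribute


@dataclass(frozen=True)
class University:
    name: str            # assumed attribute


@dataclass(frozen=True)
class StudyAt:
    """Association class for the attributed relationship studyAt."""
    person: Person
    university: University
    classYear: int       # hypothetical attribute of the relationship


alice = Person("Alice")
uni = University("Example University")
enrolment = StudyAt(person=alice, university=uni, classYear=2010)
print(enrolment)
```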
UML class diagrams allow the representation of additional features in the data, for example composition and aggregation. However, we concentrate on the features most relevant for our purposes. For additional features, we recommend reviewing the standard specification of UML class diagrams or the tutorials available on the Web [2,3].
Description by example
The UML class diagram COULD be complemented with a textual description of the elements of the schema, including some examples of data entities following the data schema.
Features of the data defined by the data schema
A UML class diagram is well suited to describing the static structure of the data used in the benchmark. However, such data usually includes additional features imposed by the workloads and the choke points of the benchmark (e.g., data correlations).
Consider the following properties and features that must be described for each of the elements in the data schema.
- Unique attributes: some attributes can be defined as unique for an entity class. It means that each data instance of the entity class contains a different value for the attribute.
- Values for attributes: we can restrict the datatype of an attribute to specific values or data patterns. For example, the attribute gender of the entity class Person can be restricted to the values “male” and “female”.
- Generation method: we can specify the methods to generate the values for attributes and relationships. For example, the values for an attribute can be obtained randomly from a dictionary. In this case, a description of the dictionary must be included.
- Data correlations: the data values of the elements in the schema can be correlated. For example, persons living in a given country have a higher probability of being related to the universities of that country.
- Statistical distributions: a relationship can follow a statistical (or probability) distribution. For example, the relationship of friendship among persons can be defined to satisfy a power law distribution.
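The features listed above can be illustrated with a small data generator. This is a sketch under stated assumptions, not a prescribed generation method: the dictionaries, the probability same_country_prob, the Pareto shape parameter, and all attribute names other than gender are hypothetical. It only draws a target number of friends per person from a power-law (Pareto) distribution; it does not build the friendship graph itself.

```python
import random

GENDERS = ("male", "female")                   # restricted attribute values
FIRST_NAMES = ["Ana", "Bo", "Chen", "Dee"]     # hypothetical dictionary
UNIVERSITIES = {                               # hypothetical dictionary
    "Chile": ["U1", "U2"],
    "Spain": ["U3", "U4"],
}


def generate_persons(n, seed=42, same_country_prob=0.9):
    """Generate n person records exhibiting the described data features."""
    rng = random.Random(seed)                  # seeded for reproducibility
    countries = sorted(UNIVERSITIES)
    persons = []
    for i in range(n):
        country = rng.choice(countries)
        # Data correlation: the university is usually in the person's country.
        if rng.random() < same_country_prob:
            uni_country = country
        else:
            uni_country = rng.choice([c for c in countries if c != country])
        persons.append({
            "id": i,                                     # unique attribute
            "gender": rng.choice(GENDERS),               # restricted values
            "firstName": rng.choice(FIRST_NAMES),        # dictionary-based
            "country": country,
            "university": rng.choice(UNIVERSITIES[uni_country]),
            # Statistical distribution: power-law target number of friends.
            "friendCount": min(n - 1, int(rng.paretovariate(2.0))),
        })
    return persons


persons = generate_persons(200)
print(persons[0])
```

Seeding the random generator keeps the generated data reproducible across runs with the same parameters.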
[2] UML basics: The class diagram. http://www.ibm.com/developerworks/rational/library/content/RationalEdge/sep04/bell/
[3] UML 2: Class Diagrams. http://www.agilemodeling.com/artifacts/classDiagram.htm