Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Aggregation Performance
    • SNBI-1.1/TCPH-1.2: QOPT Interesting Orders
      • This choke-point tests the ability of the query optimizer to exploit the interesting orders induced by some operators. Apart from clustered indexes providing key order, other operators also preserve or even induce tuple orderings.
        Sort-based operators create new orderings, typically the probe-side of a hash join conserves its order, etc.
    • SNBI-1.2/TCPH-1.1: QEXE High Cardinality group-by performance
      • This choke-point tests the ability of the execution engine to parallelize group-by’s with a large number of groups. Some queries require performing large group-by’s.
        In such a case, if an aggregation produces a significant number of groups, intra query parallelization can be exploited as each thread may make its own partial aggregation.
        Then, to produce the result, these have to be re-aggregated. In order to avoid this, the tuples entering the aggregation operator may be partitioned by a hash of the grouping key and be sent to the appropriate partition.
        Each partition would have its own thread so that only that thread would write the aggregation, hence avoiding costly critical sections as well. A high cardinality distinct modifier in a query is a special case of this choke point.
        It is amenable to the same solution with intra query parallelization and partitioning as the group-by.
        We further note that scale-out systems have an extra incentive for partitioning since this will distribute the CPU and memory pressure over multiple machines, yielding better platform utilization and scalability.
    • SNBI-1.3: QEXE Complex aggregate performance
      • This choke-point test the performance of the execution engine to perform complex aggregates. Many databases offer user defined aggregates and more complex aggregation operations than the basic count, sum, max and min, for example string concatenation aggregation operator. These types of aggregates are expected to benefit from the same optimizations as the basic built-in ones, for example partitioning.
    • SNBI-1.4: QOPT Top-k push down
      • Top-k push down. This choke-point tests the ability of the query optimizer to perform optimizations based on top-k selections. Many times queries demand for returning the top-k elements.
        Once k results are obtained, extra restrictions in a selection can be added based on the properties of the kth element currently in the top-k, being more restrictive as the query advances, instead of sorting all elements and picking the highest k.
    • SNBI-1.5/TCPH-1.4: QEXE Dependent group-by keys
      • This choke-point tests the ability of the query optimizer to exclude those functionally dependent group-bys. Sometimes queries require for group-by’s on a set of columns and a key, where the value of the key determines the columns.
        In this situation, the aggregation operator should be able to exclude certain group-by attributes from the key matching.
    • SNBI-1.6/TCPH-1.3: QEXE Low Cardinality group-by performance
      • This choke-point tests the ability to efficiently perform group by evaluation when only a very limited set of groups is available.  This can require special strategies for parallelization, e.g. pre-aggregation when possible. This case also allows using special strategies for grouping like using array lookup if the domain of keys is small.

  • Join Performance
    • SNBI-2.1/TCPH-2.3: QOPT Rich join order optimization
      • This choke-point tests the ability of the query optimizer to find optimal join orders. A graph can be traversed in different ways. In the relational model, this is equivalent as different join orders.
        The execution time of these orders may differ by orders of magnitude. Therefore, finding an efficient join (traversal) order is important, which in general, requires enumeration of all the possibilities.
        The enumeration is complicated by operators that are not freely re-orderable like semi-, anti-, and outer- joins. Because of this difficulty most join enumeration algorithms do not enumerate all possible plans, and therefore can miss the optimal join order. Therefore, these chokepoint tests the ability of the query optimizer to find optimal join (traversal) orders.
    • SNBI-2.2/TCPH-2.4: QOPT Late projection
      • This choke-point tests the ability of the query optimizer to delay the projection of unneeded attributes until late in the execution. Queries where certain columns are only needed late in the query.
        In such a situation, it is better to omit them from initial table scans, as fetching them later by row-id with a separate scan operator, which is joined to the intermediate query result, can save temporal space, and therefore I/O.
        Late projection does have a trade-off involving locality, since late in the plan the tuples may be in a different order, and scattered I/O in terms of tuples/second is much more expensive than sequential I/O.
        Late projection specifically makes sense in queries where the late use of these columns happens at a moment where the amount of tuples involved has been considerably reduced;  
        for example after an aggregation with only few unique group-by keys, or a top-k operator.
    • SNBI-2.3/TCPH-2.1: QOPT Join type selection
      • This choke-point tests the ability of the query optimizer to select the proper join operator type, which implies accurate estimates of cardinalities.
        Depending on the cardinalities of both sides of a join, a hash or an index index based join operator is more appropriate.
        This is especially important with column stores, where one usually has an index on everything. Deciding to use a hash join requires a good estimation of cardinalities on both the probe and build sides.
        In TPC-H, the use of hash join is almost a foregone conclusion in many cases, since an implementation will usually not even define an index on foreign key columns.
        There is a break even point between index and hash based plans, depending on the cardinality on the probe and build sides
    • SNBI-2.4/TCPH-2.2: QOPT Sparse foreign key joins
      • This choke-point tests the performance of join operators when the join is sparse. Sometimes joins involve relations where only a small percentage of rows in one of the tables is required to satisfy a join. When tables are larger, typical join methods can be sub-optimal. Partitioning the sparse table, using Hash Clustered indexes or implementing bloom filter tests inside the join are techniques to improve the performance in such situations [7].
  • Data Access Locality
    • SNBI-3.1/TCPH-3.3: QOPT Detecting correlation
      • This choke-point tests the ability of the query optimizer to detect data correlations and exploiting them. If a schema rewards creating clustered indexes, the question then is which of the date or data columns to use as key.
        In fact it should not matter which column is used, as range- propagation between correlated attributes of the same table is relatively easy. One way is through the creation of multi-attribute histograms after detection of attribute correlation.
        With MinMax indexes, range-predicates on any column can be translated into qualifying tuple position ranges. If an attribute value is correlated with tuple position, this reduces the area to scan roughly equally to predicate selectivity.
    • SNBI-3.2: STORAGE Dimensional clustering
      • This chokepoint tests suitability of the identifiers assigned to entities by the storage system to better exploit data locality. A data model where each entity has a unique synthetic identifier,
        e.g. RDF or graph models, has some choice in assigning a value to this identifier. The properties of the entity being identified may affect this, e.g. type (label), other dependent properties,
        e.g. geographic location, date, position in a hierarchy etc, depending on the application. Such identifier choice may create locality which in turn improves efficiency of compression or index access.
    • SNBI-3.3: QEXE Scattered Index Access patterns
      • This choke-point tests the performance of indexes when scattered accesses are performed. The efficiency of index lookup is very different depending on the locality of keys coming to the indexed access.
        Techniques like vectoring non-local index accesses by simply missing the cache in parallel on multiple lookups vectored on the same thread may have high impact.
        Also detecting absence of locality should turn off any locality dependent optimizations if these are costly when there is no locality. A graph neighborhood traversal is an example of an operation with random access without predictable locality.
  • Expression Calculation
    • SNBI-4.1/TPCH-4.2a: QOPT Common subexpression elimination
      • This choke-point tests the ability of the query optimizer to detect common sub-expressions and reuse their results. A basic technique helpful in multiple queries is common subexpression elimination (CSE).
        CSE should recognize also that average aggregates can be derived afterwards by dividing a SUM by the COUNT when those have been computed.
    • SNBI-4.2/TCPH-4.2d: QOPT Complex boolean expression joins and selections
      • This choke-point tests the ability of the query optimizer to reorder the execution of boolean expressions to improve the performance. Some boolean expressions are complex, with possibilities for alternative optimal evaluation orders.
        For instance, the optimizer may reorder conjunctions to test first those conditions with larger selectivity [19].
    • SNBI-4.3 QEXE Low overhead expressions interpretation
      • This choke-point tests the ability to efficiently evaluate simple expressions on a large number of values. A typical example could be simple arithmetic expressions, mathematical functions like floor and absolute or date functions like extracting a year.

  • Correlated Sub-queries
    • SNBI-5.1/TCPH-5.1: QOPT Flattening sub-queries
      • This choke-point tests the ability of the query optimizer to flatten execution plans when there are correlated sub-queries. Many queries have correlated sub-queries and their query plans can be flattened,
        such that the correlated sub-query is handled using an equi-join, outer-join or anti-join. To execute queries well, systems need to flatten both sub-queries, the first into an equi-join plan, the second into an anti-join plan.
        Therefore, the execution layer of the database system will benefit from implementing these extended join variants.
        The ill effects of repetitive tuple-at-a-time sub-query execution can also be mitigated if execution systems by using vectorized, or block-wise query execution, allowing to run sub-queries with thousands of input parameters instead of one.
        The ability to look up many keys in an index in one API call creates the opportunity to benefit from physical locality, if lookup keys exhibit some clustering.
    • SNBI-5.2/TCPH-5.3: QEXE Overlap between outer and sub-query
      • This choke-point tests the ability of the execution engine to reuse results when there is an overlap between the outer query and the sub-query. In some queries, the correlated sub-query and the outer query have the same joins and selections.
        In this case, a non-tree, rather DAG-shaped [21] query plan would allow to execute the common parts just once, providing the intermediate result stream to both the outer query and correlated sub-query,
        which higher up in the query plan are joined together (using normal query decorrelation rewrites).
        As such, the benchmark rewards systems where the optimizer can detect this and the execution engine supports an operator that can buffer intermediate results and provide them to multiple parent operators.
    • SNBI-5.3/¿TCPH-5.2?: QEXE Intra-query result reuse
      • This choke-point tests the ability of the execution engine to reuse sub-query results when two sub-queries are mostly identical.
        Some queries have almost identical sub-queries, where some of their internal results can be reused in both sides of the execution plan, thus avoiding to repeat computations.
  • Parallelism and Concurrency
    • SNBI-6.1/TCPH-6.3: QEXE Inter-query result reuse
      • This choke-point tests the ability of the query execution engine to reuse results from different queries. Sometimes with a high number of streams a significant amount of identical queries emerge in the resulting workload.
        The reason is that certain parameters, as generated by the workload generator, have only a limited amount of parameters bindings.
        This weakness opens up the possibility of using a query result cache, to eliminate the repetitive part of the workload.
        A further opportunity that detects even more overlap is the work on recycling, which does not only cache final query results, but also intermediate query results of a "high worth".
        Here, worth is a combination of partial-query result size, partial-query evaluation cost, and observed (or estimated) frequency of the partial-query in the workload. 
  • RDF and Graph Specifics
    • SNBI-7.1: QOPT Translation of internal ids to external ones [KILL WITH FIRE]
      • This choke-point tests the ability of the query optimizer to delay the translation between internal and external entity ids to late in the query. Translate at point of minimum cardinality,
        e.g. after top k order by RDF and possibly graph models often use a synthetic integer identifier for entities, e.g. URI’s . For presentation to the client applications, these identifiers must be translated to their original form,
        e.g. the URI string that was used when loading the data. This should be done as late as possible, or at the point of minimal cardinality in the plan.
    • SNBI-7.2: QOPT Cardinality Estimation of transitive paths
      • This choke-point tests the ability of the query optimizer to properly estimate the cardinality of intermediate results when executing transitive paths. A transitive path may occur in a “fact table” or a “dimension table” position.
        A transitive path may cover a tree or a graph, e.g. descendants in a geographical hierarchy vs. graph neighborhood or transitive closure in a many-to-many connected social network.
        In order to decide proper join order and type, the cardinality of the expansion of the transitive path needs to be correctly estimated.
        This could for example take the form of executing on a sample of the data in the cost model or of gathering special statistics, e.g. the depth and fan-out of a tree. In the case of hierarchical dimensions,
        e.g. geographic locations or other hierarchical classifications, detecting the cardinality of the transitive path will allow one to go to a star schema plan with scan of a fact table with a selective hash join.
        Such a plan will be on the other hand very bad for example if the hash table is much larger than the “fact table” being scanned.
    • SNBI-7.3: QEXE Execution of a transitive step
      • This choke-point tests the ability of the query execution engine to efficiently execute transitive steps. Graph workloads may have transitive operations, for example finding a shortest path between vertices.
        This involves repeated execution of a short lookup, often on many values at the same time, while usually having an end condition, e.g. the target vertex being reached or having reached the border of a search going in the opposite direction.
        For the best efficiency, these operations can be merged or tightly coupled to the index operations themselves. Also parallelization may be possible but may need to deal with a global state, e.g. set of visited vertices.
        There are many possible tradeoffs between generality and performance
    • SNBI 7.4: QEXE Efficient evaluation of termination criteria for transitive queries
      • This tests the ability of a system to express termination criteria for transitive queries so that not the whole transitive relation has to be evaluated as well as efficient testing for termination.
    • SNBI 7.5: QEXE Path pattern reuse
      • This choke-point tests the ability of the execution engine to reuse work across graph traversals [TODO: complete in more detail]
        E.g., when computing paths within a range of distances, it is often possible to incrementally compute longer paths by reusing paths of shorter distances that were already computed

TPC-H Interactive Chokepoints

...

Paper:

...

Slides:

  • Aggregation Performance: Performance of aggregate calculations
    • TPCH-P1.1 QEXE: Ordered Aggregation
    • TPCH-1.2 QOPT: Interesting Orders
    • TPCH-1.3 QOPT: Small Group-by Keys (array lookup)
    • TPCH-1.4 QEXE: Dependent Group-By Keys (removal of)
  • Join Performance: Voluminous joins, with or without selections
    • TPCH-2.1 QEXE: Large Joins (out-of-core)
    • TPCH-2.2 QEXE: Sparse Foreign Key Joins (bloom filters)
    • TPCH-2.3 QOPT: Rich Join Order Optimization
    • TPCH-2.4 QOPT: Late Projection (column stores)
  • Data Access Locality: Non-full-scan access to (correlated) table data
    • TPCH-3.1 STORAGE: Columnar Locality (favors column storage)
    • TPCH-3.2 STORAGE: Physical Locality by Key (clustered index, partitioning)
    • TPCH-3.3 QOPT: Detecting Correlation (ZoneMap,MinMax,multi-attribute histograms)
  • Expression Calculation: Efficiency in evaluating (complex) expressions
    • TPCH-4.1 Raw Expression Arithmetic
      • TPCH-4.1a QEXE: Arithmetic Operation Performance 
      • TPCH-4.1b QEXE: Overflow Handling (in arithmetic operations)
      • TPCH-4.1c QEXE: Compressed Execution
      • TPCH-4.1d QEXE: Interpreter Overhead (vectorization; CPU/GPU/FPGA JIT compile)
    • TPCH-4.2 Complex Boolean Expressions in Joins and Selections
      • TPCH-4.2a QOPT: Common Subexpression Elimination (CSE)
      • TPCH-4.2b QOPT: Join-Dependent Expression Filter Pushdown
      • TPCH-4.2c QOPT: Large IN Clauses (invisible join)
      • TPCH-4.2d QEXE: Evaluation Order in Conjunctions and Disjunctions
    • TPCH-4.3 String Matching Performance
        • TPCH-4.3a QOPT: Rewrite LIKE(X%) into a Range Query 
        • TPCH-4.3b QEXE: Raw String Matching Performance (e.g. using SSE4.2)
    • TPCH-4.3c QEXE: Regular Expression Compilation (JIT/FSA generation)
  • Correlated Subqueries: Efficiently handling dependent subqueries.
    • TPCH-5.1 QOPT: Flattening Subqueries (into join plans).
    • TPCH-5.2 QOPT: Moving Predicates into a Subquery
    • TPCH-5.3 QEXE: Overlap between Outer- and Subquery
  • Parallelism and Concurrency: Making use of parallel computing resources
    • TPCH-6.1 QOPT: Query Plan Parallelization
    • TPCH-6.2 QEXE: Workload Management
    • TPCH-6.3 QEXE: Result Re-use

...