Tabulify - TPC-DS (Benchmark)

About

Tabulify supports the TPC-DS database benchmark.

TPC-DS is a widely recognized benchmark for evaluating the performance of data warehouses and analytical databases. It involves a dataset spread across 24 tables.

The benchmark includes 99 complex queries designed to test various aspects of database performance, such as joins, aggregations, and subqueries.

The TPC-DS schema is based on a snowflake schema that represents real-world scenarios.

Size 1TB

TPC-DS 1TB involves a dataset of approximately 1TB in size, containing around 6.35 billion records spread across 24 tables.

The 1TB scale is considered a moderate size for a data warehouse, but it is still challenging because of the complexity of the queries and the large number of records.
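To put the two figures above side by side, here is a small back-of-the-envelope calculation. It is only a sketch: it assumes that "1TB" means 10**12 bytes and it uses the approximate record count quoted above, so the result is an order of magnitude, not a measurement.

```python
# Rough average row width at the 1TB scale.
# Assumption: 1TB = 10**12 bytes (decimal terabyte).
dataset_bytes = 10**12
# Approximate record count quoted in the text above.
record_count = 6.35e9

avg_bytes_per_record = dataset_bytes / record_count

print(f"about {avg_bytes_per_record:.0f} bytes per record on average")
```

The average hides a very uneven distribution: the fact tables hold most of the records, while many dimension tables are small.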

Operations

Schema Management

This section shows you how to manage the sub-schemas of TPC-DS.

All tables

tpcds - all TPC-DS tables

tabli data list *@tpcds
tabli data create *@tpcds @targetConnection
tabli data fill *@tpcds @targetConnection

Dwh

The data-warehouse tables: all tables except those whose name starts with an s (i.e. without the staging tables)

tabli data list [!s]*@tpcds
tabli data create [!s]*@tpcds @targetConnection
tabli data fill [!s]*@tpcds @targetConnection
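To see what a pattern such as [!s]* selects, the sketch below applies it to a handful of names with Python's fnmatch module. This assumes ordinary shell-style glob semantics; Tabulify's own matcher is not described here and may behave differently. The table list is illustrative only, and the s_-prefixed names merely stand in for staging tables.

```python
from fnmatch import fnmatchcase

# Illustrative names only; the s_ entries stand in for staging tables.
table_names = [
    "call_center",
    "date_dim",
    "item",
    "s_inventory",
    "s_purchase",
    "store_sales",
    "web_sales",
]


def select(pattern, names):
    """Return the names that match a shell-style glob pattern, in input order."""
    return [name for name in names if fnmatchcase(name, pattern)]


all_tables = select("*", table_names)
non_s_tables = select("[!s]*", table_names)

print(non_s_tables)
```

Under these semantics the character class only tests the first character, so every name that begins with s is left out, store_sales included. Running the list command first is a cheap way to check what a pattern selects before running create or fill.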

Store Sales

The store-sales schema contains the store_sales and store_returns star schemas (a data-warehouse schema).

tabli data list --with-dependencies store*@tpcds

tabli data create --with-dependencies store*@tpcds @targetConnection
tabli data copy --with-dependencies store*@tpcds @targetConnection

This article explains this technique: how to select a star schema
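One plausible reading of selecting tables with their dependencies is "follow the foreign keys from the selected tables until nothing new is found". The sketch below illustrates that idea only; it is not Tabulify's implementation, and the foreign-key map is a simplified, hypothetical subset of the TPC-DS relationships.

```python
# Hypothetical, simplified foreign-key map: table -> tables it references.
foreign_keys = {
    "store_sales": ["date_dim", "item", "customer", "store"],
    "store_returns": ["date_dim", "item", "customer", "store", "reason"],
    "customer": ["customer_address"],
    "web_sales": ["date_dim", "item", "web_site"],
}


def with_dependencies(selected, fk_map):
    """Return the selected tables plus every table reachable through foreign keys."""
    result = set()
    stack = list(selected)
    while stack:
        table = stack.pop()
        if table in result:
            continue  # already visited
        result.add(table)
        stack.extend(fk_map.get(table, []))
    return result


# Tables that a store* pattern would match in this toy catalog.
star = with_dependencies(["store", "store_sales", "store_returns"], foreign_keys)

print(sorted(star))
```

The fact tables pull in their dimensions (and the dimensions of those dimensions), while unrelated tables such as web_sales stay out of the selection.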

Note on the schema

The TPC-DS benchmark does not define the business key column (the B column) as a unique key. Our implementation makes them all unique, except on the item table, where the column is unique only in combination with the start and end dates.
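The item exception can be illustrated with a small uniqueness check. This is a sketch on made-up rows: the column names (i_item_id, i_rec_start_date, i_rec_end_date) are assumed from the TPC-DS item table, and the values are hypothetical.

```python
from collections import Counter

# Hypothetical item rows: (i_item_id, i_rec_start_date, i_rec_end_date).
# The same business key appears twice, once per validity period.
item_rows = [
    ("AAAAAAAABAAAAAAA", "1997-10-27", "2000-10-26"),
    ("AAAAAAAABAAAAAAA", "2000-10-27", None),
    ("AAAAAAAACAAAAAAA", "1997-10-27", None),
]


def duplicates(rows, key):
    """Return the key values that occur more than once in rows."""
    counts = Counter(key(row) for row in rows)
    return [value for value, n in counts.items() if n > 1]


# Business key alone: not unique, because an item has several versions.
dup_business_key = duplicates(item_rows, key=lambda row: row[0])
# Business key plus start and end dates: unique.
dup_versioned_key = duplicates(item_rows, key=lambda row: row)

print(dup_business_key, dup_versioned_key)
```

In this toy data the business key alone repeats, while the combination with the start and end dates does not.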

Why? Because when TPC-DS is used as a sample schema, the data generator then creates data that is consistent with the queries.

For TPC-DS, a business key is neither a primary key nor a foreign key in the context of the data warehouse schema. It is only used to differentiate new data from update data of the source tables during the data maintenance operations.
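As a rough illustration of that role, the sketch below splits incoming source rows into new rows and updates by looking up their business key. The keys, the rows and the function name are hypothetical; the actual data maintenance operations are defined by the TPC-DS specification and are more involved.

```python
# Hypothetical business keys already present in a warehouse dimension.
existing_keys = {"AAAAAAAABAAAAAAA", "AAAAAAAACAAAAAAA"}

# Hypothetical rows from a source table: (business_key, description).
source_rows = [
    ("AAAAAAAABAAAAAAA", "changed description"),
    ("AAAAAAAADAAAAAAA", "brand new item"),
    ("AAAAAAAACAAAAAAA", "another change"),
]


def split_new_and_updates(rows, known_keys):
    """Split rows into (new_rows, update_rows) based on the business key."""
    new_rows, update_rows = [], []
    for row in rows:
        business_key = row[0]
        if business_key in known_keys:
            update_rows.append(row)  # key already known: an update
        else:
            new_rows.append(row)  # unknown key: new data
    return new_rows, update_rows


new_rows, update_rows = split_new_and_updates(source_rows, existing_keys)

print(len(new_rows), "new,", len(update_rows), "updates")
```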