Tabulify - TPC-DS (Benchmark)
About
Tabulify supports the TPC-DS database benchmark in two ways:
- the transfer of data
- the execution of TPC-DS queries
TPC-DS is a widely recognized benchmark for evaluating the performance of data warehouses and analytical databases. It involves a dataset spread across 24 tables.
The benchmark includes 99 complex queries designed to test various aspects of database performance, such as joins, aggregations, and subqueries.
The TPC-DS schema is a snowflake schema that models real-world retail scenarios:
- web sales,
- catalog sales,
- and store sales.
Size 1TB
The 1TB scale of TPC-DS is a dataset of approximately 1TB, containing around 6.35 billion records spread across 24 tables.
The 1TB scale is considered a moderate size for a data warehouse, but it remains challenging because of the complexity of the queries and the large number of records.
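To put these figures in perspective, the short calculation below derives the average record size and the average number of records per table. It is a back-of-the-envelope sketch: it assumes that 1TB means 10**12 bytes, and the averages hide the fact that the fact tables hold far more records than the dimension tables.

```python
# Rough averages for the 1TB scale, using the figures quoted in the text.
# Assumption: 1TB is a decimal terabyte (10**12 bytes).
DATASET_BYTES = 10**12
RECORD_COUNT = 6_350_000_000  # "around 6.35 billion records"
TABLE_COUNT = 24

avg_bytes_per_record = DATASET_BYTES / RECORD_COUNT
avg_records_per_table = RECORD_COUNT / TABLE_COUNT

print(f"{avg_bytes_per_record:.0f} bytes per record on average")
print(f"{avg_records_per_table / 1e6:.0f} million records per table on average")
```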
Operations
Schema Management
This section shows how to manage the sub-schemas of TPC-DS.
All tables
tpcds - all TPC-DS tables
tabli data list *@tpcds
tabli data create *@tpcds @targetConnection
tabli data fill *@tpcds @targetConnection
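The three commands above share the same selector, so they can be generated from one place. The sketch below only builds and prints the command lines, it does not run them. The helper name `schema_commands` is hypothetical; the commands themselves are the ones listed above. One detail it makes visible: the `*` in the selector is a glob character for most shells too, so quoting the selector keeps the shell from expanding it before tabli sees it.

```python
import shlex


def schema_commands(selector, target):
    """Return the list, create and fill command lines for one selector."""
    steps = [
        ["tabli", "data", "list", selector],
        ["tabli", "data", "create", selector, target],
        ["tabli", "data", "fill", selector, target],
    ]
    # shlex.join quotes any argument that a POSIX shell would otherwise expand.
    return [shlex.join(argv) for argv in steps]


commands = schema_commands("*@tpcds", "@targetConnection")
for line in commands:
    print(line)
```

The same helper works for the other selectors on this page, for example `schema_commands("[!s]*@tpcds", "@targetConnection")`.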
Dwh
The data-warehouse tables: all tables except those whose name starts with an s (i.e. without the staging tables).
tabli data list [!s]*@tpcds
tabli data create [!s]*@tpcds @targetConnection
tabli data fill [!s]*@tpcds @targetConnection
Store Sales
The store-sales schema contains the store_sales and store_returns star schemas (data-warehouse schemas).
tabli data list --with-dependencies store*@tpcds
With the same argument, the Tabli data create command creates the tables:
tabli data create --with-dependencies store*@tpcds @targetConnection
tabli data copy --with-dependencies store*@tpcds @targetConnection
This article explains the technique: how to select a star schema
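One way to picture a selection with dependencies is as a walk along foreign keys: start from the tables that match the pattern, then add every table they reference. The sketch below illustrates that idea only. It is not Tabulify's implementation, the function name is hypothetical, and the foreign-key map is a simplified subset written for this example.

```python
from fnmatch import fnmatchcase

# Simplified, illustrative foreign-key map: table -> tables it references.
FOREIGN_KEYS = {
    "store_sales": ["date_dim", "time_dim", "item", "customer", "store", "promotion"],
    "store_returns": ["date_dim", "time_dim", "item", "customer", "store", "reason"],
    "store": ["date_dim"],
    "customer": ["customer_address", "date_dim"],
    "web_sales": ["date_dim", "item", "customer", "web_site"],
}


def select_with_dependencies(pattern, foreign_keys):
    """Return the tables matching the glob pattern plus everything they reference."""
    all_tables = set(foreign_keys)
    for referenced in foreign_keys.values():
        all_tables.update(referenced)

    selected = {table for table in all_tables if fnmatchcase(table, pattern)}
    to_visit = list(selected)
    while to_visit:
        table = to_visit.pop()
        for referenced in foreign_keys.get(table, []):
            if referenced not in selected:
                selected.add(referenced)
                to_visit.append(referenced)
    return sorted(selected)


star_schema = select_with_dependencies("store*", FOREIGN_KEYS)
print(star_schema)
```

With this map, the pattern `store*` pulls in the two fact tables, the store dimension and the dimensions they reference, while web_sales and web_site stay out of the selection.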
Note on the schema
The TPC-DS benchmark does not define the B column (business key) as a unique key. Our implementation makes them all unique (except on the item table, where the column is unique only in combination with the start and end date).
Why? Because when TPC-DS is used as a sample schema, the data generator then creates data that is consistent with the queries.
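The item exception can be illustrated with a composite unique constraint. The sketch below uses an in-memory SQLite database and a cut-down item table. It is an illustration under assumptions, not the DDL that Tabulify generates: the end date is declared NOT NULL with a made-up sentinel value of 9999-12-31 for the current version, and the row values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE item (
        i_item_sk        INTEGER PRIMARY KEY,
        i_item_id        TEXT NOT NULL,  -- business key
        i_rec_start_date TEXT NOT NULL,
        i_rec_end_date   TEXT NOT NULL,  -- assumption: sentinel instead of NULL
        UNIQUE (i_item_id, i_rec_start_date, i_rec_end_date)
    )
    """
)


def try_insert(row):
    """Insert one row and report whether the unique constraint accepted it."""
    try:
        conn.execute("INSERT INTO item VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


# Two versions of the same item: same business key, different validity period.
first = try_insert((1, "AAAAAAAABAAAAAAA", "1997-10-27", "2000-10-26"))
second = try_insert((2, "AAAAAAAABAAAAAAA", "2000-10-27", "9999-12-31"))
# Same business key and same period again: rejected.
duplicate = try_insert((3, "AAAAAAAABAAAAAAA", "2000-10-27", "9999-12-31"))

print(first, second, duplicate)  # True True False
```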
For TPC-DS, a business key is neither a primary key nor a foreign key in the context of the data warehouse schema. It is only used to distinguish new data from updated data in the source tables during the data maintenance operations.
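The role of the business key during data maintenance can be sketched as a simple lookup: a source row whose business key is already known is an update, any other row is new. The function name and the sample rows below are hypothetical and only illustrate the idea. They do not reproduce the maintenance procedure of the benchmark.

```python
def split_new_and_updated(source_rows, known_business_keys, key_column):
    """Split source rows into new rows and updates, based on the business key."""
    new_rows, updated_rows = [], []
    for row in source_rows:
        if row[key_column] in known_business_keys:
            updated_rows.append(row)
        else:
            new_rows.append(row)
    return new_rows, updated_rows


# Business keys already present in the warehouse (made-up values).
known_business_keys = {"AAAAAAAABAAAAAAA", "AAAAAAAACAAAAAAA"}

# Rows arriving from the source system (made-up values).
source_rows = [
    {"c_customer_id": "AAAAAAAABAAAAAAA", "c_last_name": "Smith"},
    {"c_customer_id": "AAAAAAAADAAAAAAA", "c_last_name": "Jones"},
]

new_rows, updated_rows = split_new_and_updated(
    source_rows, known_business_keys, "c_customer_id"
)
print(len(new_rows), len(updated_rows))  # 1 1
```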