Tabulify integrates natively a data generator.
You can generate realistic production data and start working on your project right away.
Because the data is fake but realistic, you don't need to:
The data fill operation is an operation that will select target data resource and fill them with data.
Tabulify supports two mode:
The fill operation is supported by the data fill command.
Let's first delete all data with the data truncate command to get a clean schema.
tabli data truncate *@sqlite
The below fill command will fill all tables with auto-generated data
tabli data fill *@sqlite
Source Target Latency (ms) Row Count Error Message
----------------------------- ----------------------------- ------------ --------- ----- -------
call_center@memgen call_center@sqlite 137 1000
catalog_page@memgen catalog_page@sqlite 48 1000
catalog_sales@memgen catalog_sales@sqlite 130 1000
customer@memgen customer@sqlite 98 1000
customer_address@memgen customer_address@sqlite 94 1000
customer_demographics@memgen customer_demographics@sqlite 62 1000
date_dim@memgen date_dim@sqlite 104 1000
household_demographics@memgen household_demographics@sqlite 32 1000
income_band@memgen income_band@sqlite 37 1000
item@memgen item@sqlite 195 1000
promotion@memgen promotion@sqlite 76 1000
ship_mode@memgen ship_mode@sqlite 45 1000
store@memgen store@sqlite 105 1000
store_sales@memgen store_sales@sqlite 100 1000
time_dim@memgen time_dim@sqlite 56 1000
warehouse@memgen warehouse@sqlite 81 1000
web_page@memgen web_page@sqlite 70 1000
web_sales@memgen web_sales@sqlite 135 1000
web_site@memgen web_site@sqlite 96 1000
The data fill command loads 1000 records for each table because this is the default value of the max-record-count option ( This option defines the number of records generated ).
By running the query 11 (of the query lesson), we don't get any data back.
tabli data print (sqlite/query_11.sql@tpcds_query)@sqlite
customer_id customer_first_name customer_last_name customer_email_address
----------- ------------------- ------------------ ----------------------
Why ? Because the query 11 is based on time data of the year 2001 and unfortunately the auto-generated data does not contain 2001 in the d_year column.
tabli data head --limit 10 date_dim@sqlite
The first 10 rows of the data resource (date_dim@sqlite):
d_date_sk d_date_id d_date d_month_seq d_week_seq d_quarter_seq d_year d_dow d_moy d_dom d_qoy d_fy_year d_fy_quarter_seq d_fy_week_seq d_day_name d_quarter_name d_holiday d_weekend d_following_holiday d_first_dom d_last_dom d_same_day_ly d_same_day_lq d_current_day d_current_week d_current_month d_current_quarter d_current_year
--------- --------- ---------- ----------- ---------- ------------- ------ ----- ----- ----- ----- --------- ---------------- ------------- ---------- -------------- --------- --------- ------------------- ----------- ---------- ------------- ------------- ------------- -------------- --------------- ----------------- --------------
1 a 2020-12-06 8 7 8 3 5 8 2 2 1 3 2 j e r w q 8 4 9 0 x g b k i
2 b 2020-12-04 0 6 7 0 1 5 6 5 4 4 4 t n h t l 3 8 4 5 b j x l a
3 c 2020-12-12 5 3 1 7 9 0 0 7 9 5 9 m c c b g 1 0 7 1 o h h f a
4 d 2020-12-12 9 4 9 9 0 4 9 2 8 10 9 b l b d o 8 2 10 1 d h k n n
5 e 2020-12-02 4 8 3 6 2 2 3 8 8 8 8 i e e g k 3 1 8 0 b l w a p
6 f 2020-12-09 8 2 0 10 3 6 2 0 5 2 6 i u c i o 4 1 3 5 e q p t v
7 g 2020-12-10 5 10 10 8 6 2 1 3 10 7 2 e o h h a 9 5 8 8 q p k n f
8 h 2020-12-06 1 6 10 7 7 5 5 2 1 10 1 t w f o k 10 10 4 4 y j m j h
9 i 2020-12-06 6 2 8 1 5 1 5 3 6 9 10 s q l v d 6 5 0 6 o f v n x
10 j 2020-12-03 6 8 6 8 6 0 3 7 9 6 2 v s w o a 1 10 1 6 w n v x q
To update the column dyear with data from the year 2001, we will use a generator in the next section.
A generator is a file that contains the data generation definition.
For each column, a column data generator is defined that control the data generated.
The below generator generates one year of data with two columns:
LogicalName: date_dim
Columns:
- name: d_date
type: date
comment: A column with a sequence generator that generates a date sequence from 2001-01-01 and after
DataGenerator:
type: sequence
start: 2001-01-01
step: 1
- name: d_year
precision: 4
comment: A column with a expression generator that extract the year from the date column
DataGenerator:
type: expression
ColumnParents: d_date
expression: "x.getFullYear()"
This generator is also a content resource and therefore you can use it as any tabular resource and take a look at the data generated
tabli data head date_dim_2001--datagen.yml@howto
The first 10 rows of the data resource (date_dim_2001--datagen.yml@howto):
d_date d_year
---------- ------
2001-01-01 2001
2001-01-02 2001
2001-01-03 2001
2001-01-04 2001
2001-01-05 2001
2001-01-06 2001
2001-01-07 2001
2001-01-08 2001
2001-01-09 2001
2001-01-10 2001
After having created a generator for the date_dim table, we can pass it to the data fill command with the –generator-selector option to make the data generation more controled.
tabli data fill --generator-selector date_dim_2001--datagen.yml@howto *@sqlite
As the option generator-selector is a resource selector, you can create a generator for each table where you want to customize the generated data and select them with the glob pattern.
Output:
Source Target Latency (ms) Row Count Error Message
----------------------------- ----------------------------- ------------ --------- ----- -------
call_center@memgen call_center@sqlite 131 1000
catalog_page@memgen catalog_page@sqlite 49 1000
catalog_sales@memgen catalog_sales@sqlite 130 1000
customer@memgen customer@sqlite 71 1000
customer_address@memgen customer_address@sqlite 64 1000
customer_demographics@memgen customer_demographics@sqlite 84 1000
date_dim@memgen date_dim@sqlite 1794 1000
household_demographics@memgen household_demographics@sqlite 32 1000
income_band@memgen income_band@sqlite 30 1000
item@memgen item@sqlite 145 1000
promotion@memgen promotion@sqlite 77 1000
ship_mode@memgen ship_mode@sqlite 42 1000
store@memgen store@sqlite 108 1000
store_sales@memgen store_sales@sqlite 94 1000
time_dim@memgen time_dim@sqlite 53 1000
warehouse@memgen warehouse@sqlite 79 1000
web_page@memgen web_page@sqlite 66 1000
web_sales@memgen web_sales@sqlite 131 1000
web_site@memgen web_site@sqlite 122 1000
And the query 11 is now giving back a result. The generated data is minimal and should be further defined.
tabli data print (sqlite/query_11.sql@tpcds_query)@sqlite
customer_id customer_first_name customer_last_name customer_email_address
----------- ------------------- ------------------ ----------------------
bbi g k e
bbx j a w
bhj h r i
cw z w p
ga q e d
kd w f h
sl a x q
Learn how to compare data resource.