Learning Tabulify - Step 9 - How to fill a data resource with generated data ?

Data Generation

Tabulify integrates natively a data generator.

You can generate realistic production data and start working on your project right away.

Because the data is fake but realistic, you don't need to:

  • anonymize production data in your development environment because of the privacy laws
  • or create any acceptance environment.

The fill command

The data fill operation is an operation that will select target data resource and fill them with data.

Tabulify supports two mode:

  • auto - the data generated is automatically chosen
  • generator - the data generated is defined in a file called the generator

The fill operation is supported by the data fill command.

Auto Fill

Let's first delete all data with the data truncate command to get a clean schema.

tabli data truncate *@sqlite

The below fill command will fill all tables with auto-generated data

tabli data fill *@sqlite
Source                          Target                          Latency (ms)   Row Count   Error   Message
-----------------------------   -----------------------------   ------------   ---------   -----   -------
call_center@memgen              call_center@sqlite              137            1000
catalog_page@memgen             catalog_page@sqlite             48             1000
catalog_sales@memgen            catalog_sales@sqlite            130            1000
customer@memgen                 customer@sqlite                 98             1000
customer_address@memgen         customer_address@sqlite         94             1000
customer_demographics@memgen    customer_demographics@sqlite    62             1000
date_dim@memgen                 date_dim@sqlite                 104            1000
household_demographics@memgen   household_demographics@sqlite   32             1000
income_band@memgen              income_band@sqlite              37             1000
item@memgen                     item@sqlite                     195            1000
promotion@memgen                promotion@sqlite                76             1000
ship_mode@memgen                ship_mode@sqlite                45             1000
store@memgen                    store@sqlite                    105            1000
store_sales@memgen              store_sales@sqlite              100            1000
time_dim@memgen                 time_dim@sqlite                 56             1000
warehouse@memgen                warehouse@sqlite                81             1000
web_page@memgen                 web_page@sqlite                 70             1000
web_sales@memgen                web_sales@sqlite                135            1000
web_site@memgen                 web_site@sqlite                 96             1000

The data fill command loads 1000 records for each table because this is the default value of the max-record-count option ( This option defines the number of records generated ).

Query 11

By running the query 11 (of the query lesson), we don't get any data back.

tabli data print (sqlite/query_11.sql@tpcds_query)@sqlite
customer_id   customer_first_name   customer_last_name   customer_email_address
-----------   -------------------   ------------------   ----------------------


Why ? Because the query 11 is based on time data of the year 2001 and unfortunately the auto-generated data does not contain 2001 in the d_year column.

tabli data head --limit 10 date_dim@sqlite
The first 10 rows of the data resource (date_dim@sqlite):
d_date_sk   d_date_id   d_date       d_month_seq   d_week_seq   d_quarter_seq   d_year   d_dow   d_moy   d_dom   d_qoy   d_fy_year   d_fy_quarter_seq   d_fy_week_seq   d_day_name   d_quarter_name   d_holiday   d_weekend   d_following_holiday   d_first_dom   d_last_dom   d_same_day_ly   d_same_day_lq   d_current_day   d_current_week   d_current_month   d_current_quarter   d_current_year
---------   ---------   ----------   -----------   ----------   -------------   ------   -----   -----   -----   -----   ---------   ----------------   -------------   ----------   --------------   ---------   ---------   -------------------   -----------   ----------   -------------   -------------   -------------   --------------   ---------------   -----------------   --------------
        1   a           2020-12-06             8            7               8        3       5       8       2       2           1                  3               2   j            e                r           w           q                               8            4               9               0   x               g                b                 k                   i
        2   b           2020-12-04             0            6               7        0       1       5       6       5           4                  4               4   t            n                h           t           l                               3            8               4               5   b               j                x                 l                   a
        3   c           2020-12-12             5            3               1        7       9       0       0       7           9                  5               9   m            c                c           b           g                               1            0               7               1   o               h                h                 f                   a
        4   d           2020-12-12             9            4               9        9       0       4       9       2           8                 10               9   b            l                b           d           o                               8            2              10               1   d               h                k                 n                   n
        5   e           2020-12-02             4            8               3        6       2       2       3       8           8                  8               8   i            e                e           g           k                               3            1               8               0   b               l                w                 a                   p
        6   f           2020-12-09             8            2               0       10       3       6       2       0           5                  2               6   i            u                c           i           o                               4            1               3               5   e               q                p                 t                   v
        7   g           2020-12-10             5           10              10        8       6       2       1       3          10                  7               2   e            o                h           h           a                               9            5               8               8   q               p                k                 n                   f
        8   h           2020-12-06             1            6              10        7       7       5       5       2           1                 10               1   t            w                f           o           k                              10           10               4               4   y               j                m                 j                   h
        9   i           2020-12-06             6            2               8        1       5       1       5       3           6                  9              10   s            q                l           v           d                               6            5               0               6   o               f                v                 n                   x
       10   j           2020-12-03             6            8               6        8       6       0       3       7           9                  6               2   v            s                w           o           a                               1           10               1               6   w               n                v                 x                   q

To update the column dyear with data from the year 2001, we will use a generator in the next section.

Generator

A generator is a file that contains the data generation definition.

For each column, a column data generator is defined that control the data generated.

The below generator generates one year of data with two columns:

LogicalName: date_dim
Columns:
  - name: d_date
    type: date
    comment: A column with a sequence generator that generates a date sequence from 2001-01-01 and after
    DataGenerator:
      type: sequence
      start: 2001-01-01
      step: 1
  - name: d_year
    precision: 4
    comment: A column with a expression generator that extract the year from the date column
    DataGenerator:
      type: expression
      ColumnParents: d_date
      expression: "x.getFullYear()"


This generator is also a content resource and therefore you can use it as any tabular resource and take a look at the data generated

tabli data head date_dim_2001--datagen.yml@howto
The first 10 rows of the data resource (date_dim_2001--datagen.yml@howto):
d_date       d_year
----------   ------
2001-01-01   2001
2001-01-02   2001
2001-01-03   2001
2001-01-04   2001
2001-01-05   2001
2001-01-06   2001
2001-01-07   2001
2001-01-08   2001
2001-01-09   2001
2001-01-10   2001

Fill with generators

After having created a generator for the date_dim table, we can pass it to the data fill command with the –generator-selector option to make the data generation more controled.

tabli data fill --generator-selector date_dim_2001--datagen.yml@howto   *@sqlite

As the option generator-selector is a resource selector, you can create a generator for each table where you want to customize the generated data and select them with the glob pattern.

Output:

Source                          Target                          Latency (ms)   Row Count   Error   Message
-----------------------------   -----------------------------   ------------   ---------   -----   -------
call_center@memgen              call_center@sqlite              131            1000
catalog_page@memgen             catalog_page@sqlite             49             1000
catalog_sales@memgen            catalog_sales@sqlite            130            1000
customer@memgen                 customer@sqlite                 71             1000
customer_address@memgen         customer_address@sqlite         64             1000
customer_demographics@memgen    customer_demographics@sqlite    84             1000
date_dim@memgen                 date_dim@sqlite                 1794           1000
household_demographics@memgen   household_demographics@sqlite   32             1000
income_band@memgen              income_band@sqlite              30             1000
item@memgen                     item@sqlite                     145            1000
promotion@memgen                promotion@sqlite                77             1000
ship_mode@memgen                ship_mode@sqlite                42             1000
store@memgen                    store@sqlite                    108            1000
store_sales@memgen              store_sales@sqlite              94             1000
time_dim@memgen                 time_dim@sqlite                 53             1000
warehouse@memgen                warehouse@sqlite                79             1000
web_page@memgen                 web_page@sqlite                 66             1000
web_sales@memgen                web_sales@sqlite                131            1000
web_site@memgen                 web_site@sqlite                 122            1000

And the query 11 is now giving back a result. The generated data is minimal and should be further defined.

tabli data print (sqlite/query_11.sql@tpcds_query)@sqlite
customer_id   customer_first_name   customer_last_name   customer_email_address
-----------   -------------------   ------------------   ----------------------
bbi           g                     k                    e
bbx           j                     a                    w
bhj           h                     r                    i
cw            z                     w                    p
ga            q                     e                    d
kd            w                     f                    h
sl            a                     x                    q

Next

Learn how to compare data resource.

Data Resource Comparison




Related Pages
Learning Tabulify - Step 8 - How to modify the content of a table

Tabulify learning guidetransfer step The transfer page should have made you familiar with the transfer concept where we have introduced the copy operation. The copy and move operation does not modify...

Task Runner