Data Generation
Tabulify includes a built-in data generator.
You can generate realistic, production-like data and start working on your project right away.
Because the data is fake but realistic, you don't need to:
- anonymize production data in your development environment to comply with privacy laws
- or create an acceptance environment.
The fill command
The data fill operation selects target data resources and fills them with generated data.
Tabulify supports two modes:
- auto - the data generated is automatically chosen
- generator - the data generated is defined in a file called the generator
The fill operation is supported by the data fill command.
Auto Fill
Let's first delete all data with the data truncate command to get a clean schema.
The fill command below fills all tables with auto-generated data.
The data fill command loads 1000 records into each table because this is the default value of the max-record-count option (this option defines the number of records generated).
Query 11
Running query 11 (from the query lesson) returns no data.
Why? Because query 11 filters on time data from the year 2001, and the auto-generated data does not contain 2001 in the d_year column.
To fill the d_year column with data from the year 2001, we will use a generator in the next section.
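The empty result can be reproduced in miniature. The sketch below is an illustration in plain Python with an in-memory SQLite database, not Tabulify itself; the table contents and the helper `rows_for_year` are hypothetical, and the real query 11 is more involved than this single filter.

```python
import sqlite3

# In-memory database with a reduced, hypothetical date_dim table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE date_dim (d_date TEXT, d_year INTEGER)")

# Assumed auto-generated rows: none of them falls in 2001.
conn.executemany(
    "INSERT INTO date_dim VALUES (?, ?)",
    [("1998-03-14", 1998), ("2015-07-02", 2015)],
)

def rows_for_year(year):
    """Return the dates whose d_year matches, as a filter on 2001 would."""
    cursor = conn.execute(
        "SELECT d_date FROM date_dim WHERE d_year = ?", (year,)
    )
    return cursor.fetchall()

before = rows_for_year(2001)  # no row matches the filter

# Once a row from 2001 exists, the same filter returns data.
conn.execute("INSERT INTO date_dim VALUES (?, ?)", ("2001-01-01", 2001))
after = rows_for_year(2001)

print(before, after)
```

Any query that filters on a value the generated data never contains behaves the same way, which is why the d_year column has to be controlled explicitly.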
Generator
A generator is a file that contains the data generation definition.
For each column, a column data generator is defined that controls the data generated.
The below generator generates one year of data with two columns:
- d_date that has a date sequence generator to generate date data from 2001-01-01 and upwards
- d_year that has an expression generator that extracts the year from the d_date column.
This generator is also a content resource, so you can use it like any tabular resource and take a look at the generated data.
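To make the two column generators concrete, here is a minimal Python sketch of the same idea: a date sequence starting at 2001-01-01 and a year column derived from it by an expression. It is an illustration only and does not show the Tabulify generator file format; the function names `date_sequence` and `generate_date_dim` are hypothetical.

```python
from datetime import date, timedelta

def date_sequence(start, count):
    """Yield `count` consecutive dates, starting at `start` and going upwards."""
    for offset in range(count):
        yield start + timedelta(days=offset)

def generate_date_dim(start, count):
    """Build rows where d_year is computed from d_date, like an expression generator."""
    return [
        {"d_date": d, "d_year": d.year}
        for d in date_sequence(start, count)
    ]

# One year of data: 2001 is not a leap year, so 365 rows cover it exactly.
rows = generate_date_dim(date(2001, 1, 1), 365)

print(rows[0])
print(rows[-1])
```

Because the year is derived from the date rather than generated independently, the two columns can never disagree.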
Fill with generators
After creating a generator for the date_dim table, we can pass it to the data fill command with the --generator-selector option to make the data generation more controlled.
Because the generator-selector option is a resource selector, you can create a generator for each table whose generated data you want to customize and select them all with a glob pattern.
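As a rough illustration of how a glob pattern picks several generators at once, the Python sketch below matches file names against a pattern using the standard `fnmatch` module. The file names and the pattern are hypothetical and do not reflect Tabulify's actual naming convention or matching implementation.

```python
from fnmatch import fnmatchcase

# Hypothetical file names: one generator per table, plus an unrelated file.
candidates = [
    "date_dim--generator.yml",
    "customer--generator.yml",
    "readme.md",
]

# Hypothetical glob pattern: '*' matches any run of characters.
pattern = "*--generator.yml"

# Keep only the names that match the pattern (case-sensitive match).
selected = [name for name in candidates if fnmatchcase(name, pattern)]

print(selected)
```

A single pattern therefore covers every table that has its own generator, and tables without one are left to another fill.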
Output:
Query 11 now returns a result. The generated data is minimal and should be refined further.
Next
Learn how to compare data resources.