How to create a CSV File with generated data

About

This article shows you how to generate a CSV file from a tabular generator file.

Because a generator is itself a file, any other data operation run against a file system data store would treat it as a plain file, and the target created would be a file rather than the generated data.

Therefore, you must use the fill operation.

Step by Step

Prerequisites

Create your generator

You first need to create your generator data resource to define the data that should be generated.

As an example, we use the date_dim--datagen.yml generator, which generates a date dimension.

This generator is explained in the how-to "How to generate a date dimension".

Comment: An example of date dimension generator based on the `date_dim` table of TPCDS
Columns:
  - name: d_date_sk
    comment: A surrogate key
    Type: integer
    DataGenerator:
      type: sequence
  - name: d_date
    comment: A business key in date format
    Type: date
    DataGenerator:
      type: sequence
  - name: d_date_id
    comment: A business key in string
    Type: varchar
    DataGenerator:
      type: expression
      ColumnParents: d_date
      expression: "x.toISOString().substring(0,10)"
  - name: d_month_seq
    comment: An ascending sequence for the month
    Type: integer
    DataGenerator:
      type: expression
      ColumnParents: d_date
      expression: "function pad(number) {if (number < 10) { return '0' + number; } return number; }; x.getFullYear()+''+(pad(x.getMonth()+1))"
  - name: d_day_name
    comment: The name of the day
    Type: varchar
    DataGenerator:
      type: expression
      ColumnParents: d_date
      expression: "var days = ['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday']; days[x.getDay()]"
  - name: d_moy
    comment: The month number in the year
    Type: Integer
    DataGenerator:
      type: expression
      ColumnParents: d_date
      expression: "x.getMonth()+1"
  - name: d_year
    comment: The year number
    Type: Varchar
    Precision: 4
    DataGenerator:
      type: expression
      ColumnParents: d_date
      expression: "x.getFullYear()"
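
To make the expression columns easier to follow, here is a minimal JavaScript sketch that evaluates the same expression strings against a sample date. It assumes that `x` holds the value of the parent column (`d_date`) as a JavaScript `Date` and that the value of the last statement is the result of the expression. The `evaluate` helper and the sample date are illustrative and are not part of tabli.

```javascript
// Expression strings copied from the generator above.
const expressions = {
  d_date_id: "x.toISOString().substring(0,10)",
  d_month_seq:
    "function pad(number) {if (number < 10) { return '0' + number; } return number; }; x.getFullYear()+''+(pad(x.getMonth()+1))",
  d_day_name:
    "var days = ['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday']; days[x.getDay()]",
  d_moy: "x.getMonth()+1",
  d_year: "x.getFullYear()",
};

// Hypothetical helper: evaluates an expression with `x` bound to the parent value.
// A direct eval returns the value of the last statement of the script.
function evaluate(expression, x) {
  return eval(expression);
}

// Sample parent value (an assumption): 15 January 2020 at noon UTC.
const sampleDate = new Date(Date.UTC(2020, 0, 15, 12, 0, 0));

// Build one row of derived columns from the sample parent value.
const row = {};
for (const [column, expression] of Object.entries(expressions)) {
  row[column] = evaluate(expression, sampleDate);
}
console.log(row);
```

Note that `toISOString()` works in UTC, whereas `getFullYear()`, `getMonth()` and `getDay()` use the local time zone. The sample date is set at noon UTC so that both views fall on the same calendar day in most time zones.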


Run the fill command

The following data fill command creates a CSV file named date_dim.csv in the temporary directory.

tabli data fill --generator-selector date_dim--datagen.yml@howto date_dim.csv@temp
Source            Target              Latency (ms)   Row Count   Error   Message   
---------------   -----------------   ------------   ---------   -----   -------   
date_dim@memgen   date_dim.csv@temp   6119           1000

Check the result

Inspect the content of the CSV file with the data head command.

tabli data head --limit 30 date_dim.csv@temp
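
As a complementary check outside tabli, the sketch below summarizes CSV text in JavaScript. It returns the header columns and the number of data rows, so that you can compare them with the generator columns and with the row count reported by the fill command (1000). It assumes that the first line is a header and that no value contains a comma or a line break. The `summarizeCsv` function and the sample text are hypothetical.

```javascript
// Hypothetical helper: returns the header columns and the number of data rows.
// Assumes a header line and values without embedded commas or line breaks.
function summarizeCsv(text) {
  const lines = text.split(/\r?\n/).filter((line) => line.length > 0);
  const [headerLine, ...dataLines] = lines;
  return {
    columns: headerLine ? headerLine.split(",") : [],
    rowCount: dataLines.length,
  };
}

// Illustrative sample (not actual tabli output).
const sample = [
  "d_date_sk,d_date,d_date_id",
  "1,2020-01-15,2020-01-15",
  "2,2020-01-16,2020-01-16",
].join("\n");

const summary = summarizeCsv(sample);
console.log(summary.columns, summary.rowCount);
```

To run it against the generated file, you could pass the text returned by Node's `fs.readFileSync(path, "utf8")`, where `path` is the location of date_dim.csv in your temporary directory.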




Related Pages
How to generate a date dimension?

A date dimension is a typical case for data generation and this article shows you how to generate it.
How to use a data generator in a data operation

This how-to shows you how to use a data generator as data source
