Learning Tabulify - Step 8 - How to modify the content of a table

Learning Tabulify - Step 8 - How to modify the content of a table

This page is part of the Tabulify learning guide and to be able to follow it, you should have followed the transfer step.

Data Modification

The transfer page should have made you familiar with the transfer concept where we have introduced the copy operation.

The copy and move operation does not modify any actual data. They just create a target data resource that is an exact copy of the source data resource.

This page introduces you to the data modification operations that modify the target data resource content. ie

They all works the same way, they accepts a source data resource that defines the data to insert, update, upsert or delete.

Warehouse - The target resource

The tpc-ds warehouse table will be our target data resource for this page.

The warehouse table has one row that we can see with the data print command

tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk   w_warehouse_id     w_warehouse_name      w_warehouse_sq_ft   w_street_number   w_street_name   w_street_type   w_suite_number   w_city   w_county            w_state   w_zip   w_country       w_gmt_offset
--------------   ----------------   -------------------   -----------------   ---------------   -------------   -------------   --------------   ------   -----------------   -------   -----   -------------   ------------
             1   AAAAAAAABAAAAAAA   Conventional childr              977787   651               6th             Parkway         Suite 470        Midway   Williamson County   TN        31904   United States             -5

Warehouse Upsert with a CSV file as Source

The upsert data operation is supported by the data upsert command.

This upsert operation:

  • will try to insert data
  • and if it can't (because there is already a record identified by a primary key), it will update it.

We will:

  • modify the name of the first warehouse to Modified Warehouse Name
  • add a second new warehouse row.

The source CSV file for the upsert

The data to upsert is defined in the warehouse_upsert.csv Csv data resource in the howto connection.

We take a look at its content with the data print command:

tabul data print warehouse_upsert.csv@howto
warehouse_upsert.csv@howto
w_warehouse_sk   w_warehouse_id     w_warehouse_name          w_warehouse_sq_ft   w_street_number   w_street_name   w_street_type   w_suite_number   w_city      w_county            w_state   w_zip      w_country       w_gmt_offset
--------------   ----------------   -----------------------   -----------------   ---------------   -------------   -------------   --------------   ---------   -----------------   -------   --------   -------------   ------------
1                AAAAAAAABAAAAAAA   Modified Warehouse Name   977787              651               6th             Parkway         Suite 470        Midway      Williamson County   TN        31904      United States   -5
2                AAAAAAAABAAAAAAB   Van Gogh Warehouse        100000              6                 Museumplein     Plein                            Amsterdam   Nord-Holland        NH        10171 DJ   Netherlands     1

The upsert command

tabul data upsert warehouse_upsert.csv@howto warehouse@sqlite
  • Run it and you get the below output:
Transfer results
input                        target             latency   record_count   error_code   error_message
--------------------------   ----------------   -------   ------------   ----------   -------------
warehouse_upsert.csv@howto   warehouse@sqlite   0.52s                2

  • You can verify that the target data resource has been changed with the data print command and check that:
    • there is a second new record
    • the new name of the first warehouse is now Modified Warehouse Name
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk   w_warehouse_id     w_warehouse_name          w_warehouse_sq_ft   w_street_number   w_street_name   w_street_type   w_suite_number   w_city      w_county            w_state   w_zip      w_country       w_gmt_offset
--------------   ----------------   -----------------------   -----------------   ---------------   -------------   -------------   --------------   ---------   -----------------   -------   --------   -------------   ------------
             1   AAAAAAAABAAAAAAA   Modified Warehouse Name              977787   651               6th             Parkway         Suite 470        Midway      Williamson County   TN        31904      United States             -5
             2   AAAAAAAABAAAAAAB   Van Gogh Warehouse                   100000   6                 Museumplein     Plein                            Amsterdam   Nord-Holland        NH        10171 DJ   Netherlands                1

Warehouse Update with a SQL query as Source

The source query for the update

Let's say for the sake of simplicity that we want to increase the square footage column w_warehouse_sq_ft of the warehouse by 10 square meter.

The source query needs to include:

  • the primary key w_warehouse_sk
  • the target column with the new value w_warehouse_sq_ft

Example with the warehouse_update.sql file in the howto connection

tabul data print --type text warehouse_update.sql@howto
select
  w_warehouse_sk,
  w_warehouse_sq_ft+10 as w_warehouse_sq_ft, -- the name of the column match the name of the target column
  w_warehouse_sq_ft as w_warehouse_sq_ft_old_value -- just to show the diff
from
  warehouse w2
;


tabul data print '(warehouse_update.sql@howto)@sqlite'
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
(warehouse_update.sql@howto)@sqlite
w_warehouse_sk   w_warehouse_sq_ft   w_warehouse_sq_ft_old_value
--------------   -----------------   ---------------------------
             1              977797                        977787
             2              100010                        100000

The update command

Calling the data update command with the query as source will update the warehouse table w_warehouse_sq_ft columns by 10 square meters.

tabul data update \
    --transfer-mapping-strict false \
    '(warehouse_update.sql@howto)@sqlite' warehouse@sqlite
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
Transfer results
input                                 target             latency   record_count   error_code   error_message
-----------------------------------   ----------------   -------   ------------   ----------   -------------
(warehouse_update.sql@howto)@sqlite   warehouse@sqlite   0.40s                2

where:

  • --transfer-mapping-strict is set to false because the source column w_warehouse_sq_ft_old_value is not a target column.

We can verify it by printing the warehouse table

tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk   w_warehouse_id     w_warehouse_name          w_warehouse_sq_ft   w_street_number   w_street_name   w_street_type   w_suite_number   w_city      w_county            w_state   w_zip      w_country       w_gmt_offset
--------------   ----------------   -----------------------   -----------------   ---------------   -------------   -------------   --------------   ---------   -----------------   -------   --------   -------------   ------------
             1   AAAAAAAABAAAAAAA   Modified Warehouse Name              977797   651               6th             Parkway         Suite 470        Midway      Williamson County   TN        31904      United States             -5
             2   AAAAAAAABAAAAAAB   Van Gogh Warehouse                   100010   6                 Museumplein     Plein                            Amsterdam   Nord-Holland        NH        10171 DJ   Netherlands                1

Warehouse Delete

The source CSV file for the DELETE

The delete command takes as input a source data resource that contains the primary or unique key of a target table.

We want to delete the second record, the CSV file contains then the primary key with the value 2

tabul data print warehouse_delete.csv@howto
  • Run it. You get the below output:
warehouse_delete.csv@howto
w_warehouse_sk
--------------
2

Warehouse Delete

tabul data delete warehouse_delete.csv@howto warehouse@sqlite
Transfer results
input                        target             latency   record_count   error_code   error_message
--------------------------   ----------------   -------   ------------   ----------   -------------
warehouse_delete.csv@howto   warehouse@sqlite   0.45s                1

tabul data list -a name -a count warehouse@sqlite
path        media_type   name        count
---------   ----------   ---------   -----
warehouse   table        warehouse       1

  • Do we still see the first row ?
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk   w_warehouse_id     w_warehouse_name          w_warehouse_sq_ft   w_street_number   w_street_name   w_street_type   w_suite_number   w_city   w_county            w_state   w_zip   w_country       w_gmt_offset
--------------   ----------------   -----------------------   -----------------   ---------------   -------------   -------------   --------------   ------   -----------------   -------   -----   -------------   ------------
             1   AAAAAAAABAAAAAAA   Modified Warehouse Name              977797   651               6th             Parkway         Suite 470        Midway   Williamson County   TN        31904   United States             -5

Applying in-place SQL Data Manipulation

Until now, all data manipulation were out-of-place, meaning that there was a source and a target.

You can also apply in-place Data Manipulation with SQL Data Manipulation Language request (UPDATE/INSERT/UPDATE/MERGE)

For a full example, applying DDL and DML request, see this howto

Next

With the laws on data privacy, it's now mandatory to develop data application with generated data. Tabulify has built-in a data generator that we will explore in the next page.

How to fill a resource with generated data ?




Related Pages
Learning Tabulify - Step 7 - How to execute a SQL Query

For Tabulify, a SQL Query is a runtime resource that when executed returns a data set in tabular format (known as the result set). The selector of a runtime is known as a runtime selector. This selector...

Task Runner