Learning Tabulify - Step 8 - How to modify the content of a table
This page is part of the Tabulify learning guide and to be able to follow it, you should have followed the transfer step.
Data Modification
The transfer page should have made you familiar with the transfer concept where we have introduced the copy operation.
The copy and move operation does not modify any actual data. They just create a target data resource that is an exact copy of the source data resource.
This page introduces you to the data modification operations that modify the target data resource content. ie
They all works the same way, they accepts a source data resource that defines the data to insert, update, upsert or delete.
Warehouse - The target resource
The tpc-ds warehouse table will be our target data resource for this page.
The warehouse table has one row that we can see with the data print command
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ------------------- ----------------- --------------- ------------- ------------- -------------- ------ ----------------- ------- ----- ------------- ------------
1 AAAAAAAABAAAAAAA Conventional childr 977787 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
Warehouse Upsert with a CSV file as Source
The upsert data operation is supported by the data upsert command.
This upsert operation:
- will try to insert data
- and if it can't (because there is already a record identified by a primary key), it will update it.
We will:
- modify the name of the first warehouse to Modified Warehouse Name
- add a second new warehouse row.
The source CSV file for the upsert
The data to upsert is defined in the warehouse_upsert.csv Csv data resource in the howto connection.
We take a look at its content with the data print command:
tabul data print warehouse_upsert.csv@howto
warehouse_upsert.csv@howto
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- --------- ----------------- ------- -------- ------------- ------------
1 AAAAAAAABAAAAAAA Modified Warehouse Name 977787 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
2 AAAAAAAABAAAAAAB Van Gogh Warehouse 100000 6 Museumplein Plein Amsterdam Nord-Holland NH 10171 DJ Netherlands 1
The upsert command
- The upsert command is a transfer command and takes as argument:
tabul data upsert warehouse_upsert.csv@howto warehouse@sqlite
- Run it and you get the below output:
Transfer results
input target latency record_count error_code error_message
-------------------------- ---------------- ------- ------------ ---------- -------------
warehouse_upsert.csv@howto warehouse@sqlite 0.52s 2
- You can verify that the target data resource has been changed with the data print command and check that:
- there is a second new record
- the new name of the first warehouse is now Modified Warehouse Name
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- --------- ----------------- ------- -------- ------------- ------------
1 AAAAAAAABAAAAAAA Modified Warehouse Name 977787 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
2 AAAAAAAABAAAAAAB Van Gogh Warehouse 100000 6 Museumplein Plein Amsterdam Nord-Holland NH 10171 DJ Netherlands 1
Warehouse Update with a SQL query as Source
The source query for the update
Let's say for the sake of simplicity that we want to increase the square footage column w_warehouse_sq_ft of the warehouse by 10 square meter.
The source query needs to include:
- the primary key w_warehouse_sk
- the target column with the new value w_warehouse_sq_ft
Example with the warehouse_update.sql file in the howto connection
tabul data print --type text warehouse_update.sql@howto
select
w_warehouse_sk,
w_warehouse_sq_ft+10 as w_warehouse_sq_ft, -- the name of the column match the name of the target column
w_warehouse_sq_ft as w_warehouse_sq_ft_old_value -- just to show the diff
from
warehouse w2
;
- The result of the select query can be seen with the print command
tabul data print '(warehouse_update.sql@howto)@sqlite'
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
(warehouse_update.sql@howto)@sqlite
w_warehouse_sk w_warehouse_sq_ft w_warehouse_sq_ft_old_value
-------------- ----------------- ---------------------------
1 977797 977787
2 100010 100000
The update command
Calling the data update command with the query as source will update the warehouse table w_warehouse_sq_ft columns by 10 square meters.
tabul data update \
--transfer-mapping-strict false \
'(warehouse_update.sql@howto)@sqlite' warehouse@sqlite
# The quotes are only mandatory in bash because parenthesis are a bash token (ie subshell)
Transfer results
input target latency record_count error_code error_message
----------------------------------- ---------------- ------- ------------ ---------- -------------
(warehouse_update.sql@howto)@sqlite warehouse@sqlite 0.40s 2
where:
- --transfer-mapping-strict is set to false because the source column w_warehouse_sq_ft_old_value is not a target column.
We can verify it by printing the warehouse table
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- --------- ----------------- ------- -------- ------------- ------------
1 AAAAAAAABAAAAAAA Modified Warehouse Name 977797 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
2 AAAAAAAABAAAAAAB Van Gogh Warehouse 100010 6 Museumplein Plein Amsterdam Nord-Holland NH 10171 DJ Netherlands 1
Warehouse Delete
The source CSV file for the DELETE
The delete command takes as input a source data resource that contains the primary or unique key of a target table.
We want to delete the second record, the CSV file contains then the primary key with the value 2
tabul data print warehouse_delete.csv@howto
- Run it. You get the below output:
warehouse_delete.csv@howto
w_warehouse_sk
--------------
2
Warehouse Delete
- You give the CSV source and the target as argument to the data delete command to delete the second row
tabul data delete warehouse_delete.csv@howto warehouse@sqlite
Transfer results
input target latency record_count error_code error_message
-------------------------- ---------------- ------- ------------ ---------- -------------
warehouse_delete.csv@howto warehouse@sqlite 0.45s 1
- How many rows do we have ?
tabul data list -a name -a count warehouse@sqlite
path media_type name count
--------- ---------- --------- -----
warehouse table warehouse 1
- Do we still see the first row ?
tabul data print warehouse@sqlite
warehouse@sqlite
w_warehouse_sk w_warehouse_id w_warehouse_name w_warehouse_sq_ft w_street_number w_street_name w_street_type w_suite_number w_city w_county w_state w_zip w_country w_gmt_offset
-------------- ---------------- ----------------------- ----------------- --------------- ------------- ------------- -------------- ------ ----------------- ------- ----- ------------- ------------
1 AAAAAAAABAAAAAAA Modified Warehouse Name 977797 651 6th Parkway Suite 470 Midway Williamson County TN 31904 United States -5
Applying in-place SQL Data Manipulation
Until now, all data manipulation were out-of-place, meaning that there was a source and a target.
You can also apply in-place Data Manipulation with SQL Data Manipulation Language request (UPDATE/INSERT/UPDATE/MERGE)
For a full example, applying DDL and DML request, see this howto
Next
With the laws on data privacy, it's now mandatory to develop data application with generated data. Tabulify has built-in a data generator that we will explore in the next page.