Learning Tabulify - Step 4 - How to select Data Resources
Concepts
To select data resources such as a file or a database table, Tabulify uses the concepts of:
- data selector
- and dependency (do we also select the dependent data resources)
This page goes through these concepts with explanations and examples.
Data Selector
A data selector is composed of two parts, separated by the @ (at) sign:
- a glob pattern
- and a connection
A data selector looks like this:
globPattern@connection
A glob pattern defines the name or the path of the data resources located in the system of the connection.
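To make the two-part structure concrete, here is a minimal Python sketch that splits a selector string into its glob pattern and its connection. It is only an illustration, not Tabulify's implementation: the names `DataSelector` and `parse_data_selector` are hypothetical, and splitting on the last @ is an assumption.

```python
from typing import NamedTuple, Optional


class DataSelector(NamedTuple):
    """Hypothetical container for the two parts of a data selector."""

    glob_pattern: str
    connection: Optional[str]  # None when the selector has no @connection part


def parse_data_selector(selector: str) -> DataSelector:
    # Assumption: the last @ separates the glob pattern from the connection.
    pattern, at_sign, connection = selector.rpartition("@")
    if not at_sign:
        # No @ found: the whole string is the glob pattern.
        return DataSelector(glob_pattern=selector, connection=None)
    return DataSelector(glob_pattern=pattern, connection=connection)


print(parse_data_selector("*sales@tpcds"))
# DataSelector(glob_pattern='*sales', connection='tpcds')
```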
Normal Selection
For instance, with the internal TPC-DS data store, the list command below selects all tables whose name ends with sales, because the * character matches any sequence of characters.
tabli data list *sales@tpcds
where:
- tabli is the main command line utility
- data is a module (i.e. the data module)
- list is a command
- *sales@tpcds is a data selector that selects data resources.
- tpcds defines the connection
- *sales defines the tables to look for with a glob pattern. In our case, all tables whose name ends with the word sales, because * is the globbing star and matches any sequence of characters.
Output:
path
-------------
catalog_sales
store_sales
web_sales
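The matching rule can be tried outside of Tabulify with Python's standard `fnmatch` module. This is a sketch of the general glob idea only: the table names are copied from the outputs on this page, the `select` helper is hypothetical, and Tabulify's own glob syntax may differ in its details.

```python
from fnmatch import fnmatchcase

# A subset of the tpcds table names shown on this page.
TABLES = [
    "call_center", "catalog_page", "catalog_sales", "customer",
    "date_dim", "item", "store", "store_sales",
    "web_page", "web_sales", "web_site",
]


def select(glob_pattern, names):
    """Return the names matching the glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, glob_pattern)]


print(select("*sales", TABLES))
# ['catalog_sales', 'store_sales', 'web_sales']
```

Note that store and catalog_page are not selected: the star may match any prefix, but the name must still end with sales.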
To get more practice with glob patterns, you can have a look at this page: How to select data resources with a Glob Pattern
Selection with dependencies
When moving data, foreign-key constraints mean that you need to move the data resources together with their dependencies.
That's why Tabulify offers the --with-dependencies flag, which also selects the dependent resources of the selected data resources.
Example: All tables that have a name that ends with sales in the tpcds system and their dependent tables
tabli data list --with-dependencies *sales@tpcds
path
----------------------
call_center
catalog_page
catalog_sales
customer
customer_address
customer_demographics
date_dim
household_demographics
income_band
item
promotion
ship_mode
store
store_sales
time_dim
warehouse
web_page
web_sales
web_site
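One way to picture this selection is as a walk over the foreign-key graph: start from the tables matched by the glob pattern, then keep adding the tables they reference until nothing new appears. The sketch below assumes that model. The `FOREIGN_KEYS` map is a hand-written, partial illustration (it is not read from Tabulify or from the tpcds connection), and `select_with_dependencies` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Hypothetical, partial foreign-key map: table -> tables it references.
FOREIGN_KEYS = {
    "store_sales": ["date_dim", "item", "customer", "store"],
    "web_sales": ["date_dim", "item", "customer", "web_site"],
    "customer": ["customer_address", "household_demographics"],
    "household_demographics": ["income_band"],
}


def select_with_dependencies(glob_pattern, foreign_keys):
    """Select tables matching the pattern plus everything they depend on."""
    # Every table known to the map, whether it references or is referenced.
    all_tables = set(foreign_keys)
    for targets in foreign_keys.values():
        all_tables.update(targets)

    selected = {t for t in all_tables if fnmatchcase(t, glob_pattern)}
    to_visit = list(selected)
    while to_visit:
        table = to_visit.pop()
        for dependency in foreign_keys.get(table, []):
            if dependency not in selected:
                selected.add(dependency)
                to_visit.append(dependency)
    return sorted(selected)


print(select_with_dependencies("*sales", FOREIGN_KEYS))
```

With this partial map, income_band is selected even though no sales table references it directly: it comes in through customer and household_demographics.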
Local File System
The connection part of a data selector is not mandatory as the default connection is the local file system.
Therefore, running the list command with a data selector that has no connection gives you the list of the files in your current directory.
tabli data list *
path
-----------------------
README.md
characters.csv
date_dim--datagen.yml
sequence--datagen.yml
This is the equivalent of the ls command.
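For comparison, a glob over the local file system can be sketched with Python's standard `glob` module. The `list_local` helper is hypothetical and only mirrors the idea of a selector without a connection; it does not call Tabulify.

```python
import glob
import os


def list_local(glob_pattern, directory="."):
    """Return the sorted names in a directory that match the glob pattern."""
    # glob.escape keeps special characters in the directory name literal.
    full_pattern = os.path.join(glob.escape(directory), glob_pattern)
    return sorted(os.path.basename(path) for path in glob.glob(full_pattern))


# Lists the current directory; like ls, the * pattern skips hidden dot-files.
print(list_local("*"))
```

A narrower pattern such as `*.yml` would keep only the names ending with .yml.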
Next
Now that we know how to select data resources, the next page will show you how to print their content.