How to select data resources with a Glob Pattern

About

On this page, you will learn what a glob pattern is and how to use it to select data resources.

A glob pattern, or glob expression, is a string that is matched against another string. If the glob pattern matches a data resource name, the data resource is selected; otherwise, it is not.

To express the pattern, special characters called wildcards are used. The following sections go through each of these wildcards and show you how to use them.

For the hackers, this is like a regular expression, but simplified.
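
To make the comparison with regular expressions concrete, here is a minimal sketch in Python. It relies on the standard fnmatch module rather than on Tabulify, so the glob dialect it implements is an assumption and may differ from Tabulify's in details. The resource names are only illustrative.

```python
import re
from fnmatch import translate

# fnmatch.translate converts a glob pattern into an equivalent regular expression.
# This is Python's glob dialect, used as an illustration only; it is not Tabulify's implementation.
regex = re.compile(translate("*sales"))

# The translated expression is anchored, so the whole name must match.
is_selected = regex.match("web_sales") is not None
is_rejected = regex.match("web_site") is None
print(is_selected, is_rejected)
```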

Steps

Prerequisites

You should have Tabulify installed on your computer. If it is not, see:

Learning Tabulify - Step 1 - Installation

Star

The star character *, also known as the asterisk, matches any number of characters, including none.

If we want to select all data resources whose names end with the term sales, we use the following glob pattern:

*sales

where:

  • * matches all characters before sales
  • sales matches itself.

Example:

tabli data list *sales@tpcds
PATH            
-------------   
catalog_sales   
store_sales     
web_sales
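
The same selection can be sketched outside Tabulify with Python's standard fnmatch module. This assumes that Python's star behaves like the one described here; the list of names is hypothetical and only borrows table names shown on this page.

```python
from fnmatch import fnmatchcase

# Hypothetical resource names, borrowed from the tpcds listings on this page.
names = ["catalog_sales", "store_sales", "web_sales", "item", "store_returns"]


def select(pattern, candidates):
    """Return the candidates that the glob pattern matches in full (case-sensitive)."""
    return [name for name in candidates if fnmatchcase(name, pattern)]


selected = select("*sales", names)
print(selected)
```

Because the star also matches zero characters, a resource named exactly sales would be selected as well.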

Question mark

A question mark, ?, matches exactly one character.

Example:

  • ???? - Matches all data resources whose name has exactly four characters
tabli data list ????@tpcds
PATH   
----   
item

  • w?*sales - Matches any string beginning with w, followed by at least one character, and ending with sales
tabli data list w?*sales@tpcds
PATH        
---------   
web_sales
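
A minimal sketch of both patterns with Python's fnmatch module, assuming its question mark behaves like the one described here. The names in the list are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical resource names used only for this illustration.
names = ["item", "store", "web_sales", "w_sales", "wsales", "date_dim"]

# "????" requires exactly four characters, whatever they are.
four_chars = [n for n in names if fnmatchcase(n, "????")]

# "w?*sales" requires at least one character between the w and the final sales.
w_sales = [n for n in names if fnmatchcase(n, "w?*sales")]

print(four_chars)
print(w_sales)
```

Note that wsales is not selected by w?*sales: the question mark consumes the first s, and the remaining characters can no longer end with sales.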

Braces

Braces {} specify a collection of subpatterns.

For example:

  • {item,store} matches item or store
tabli data list {item,store}@tpcds
PATH    
-----   
item    
store

  • {web*,store*} matches all data resource names that begin with web or store.
tabli data list {web*,store*}@tpcds
PATH            
-------------   
store           
store_returns   
store_sales     
web_page        
web_returns     
web_sales       
web_site
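
Python's fnmatch module has no brace support, so the sketch below adds a small, hypothetical helper (expand_braces) that rewrites a brace pattern into its subpatterns before matching. It is an illustration of the idea under that assumption, not a description of how Tabulify evaluates braces.

```python
import re
from fnmatch import fnmatchcase


def expand_braces(pattern):
    """Expand {a,b,...} groups into the list of plain subpatterns they stand for."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        # Recurse so that a pattern with several groups is fully expanded.
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


def select(pattern, candidates):
    """Keep the candidates matched by at least one expanded subpattern."""
    subpatterns = expand_braces(pattern)
    return [n for n in candidates if any(fnmatchcase(n, p) for p in subpatterns)]


# Hypothetical resource names, borrowed from the listings on this page.
names = ["item", "store", "store_sales", "web_page", "catalog_sales"]
exact = select("{item,store}", names)
prefixed = select("{web*,store*}", names)
print(exact)
print(prefixed)
```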

Square brackets

Square brackets [] match a single character taken from:

  • a set of single characters
  • or a range of characters when the hyphen character (-) is used

Within the square brackets, the characters *, ?, and \ match themselves.

Example:

  • [aeiou] matches any lowercase vowel.
  • [0-9] matches any digit.
  • [A-Z] matches any uppercase letter.
  • [a-zA-Z] matches any uppercase or lowercase letter.
  • [!abc] is an exclusion: it matches any single character except a, b, or c.
  • *[0-9]* matches all strings containing at least one digit.

Example:

  • all data resources that have a w or a y in their name
tabli data list *[wy]*@tpcds
PATH                   
--------------------   
inventory              
s_inventory            
s_warehouse            
s_web_order            
s_web_order_lineitem   
s_web_page             
s_web_returns          
s_web_site             
warehouse              
web_page               
web_returns            
web_sales              
web_site
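
The bracket forms above can be tried with Python's fnmatch module, which supports sets, ranges, and the ! exclusion. This is a sketch assuming the same semantics as described here; the names are hypothetical.

```python
from fnmatch import fnmatchcase

# Hypothetical resource names used only for this illustration.
names = ["item", "inventory", "web_site", "store", "call_center2"]

# Names containing a w or a y anywhere.
with_w_or_y = [n for n in names if fnmatchcase(n, "*[wy]*")]

# Names containing at least one digit.
with_digit = [n for n in names if fnmatchcase(n, "*[0-9]*")]

# Exclusion: a single character that is not a, b, or c.
excluded_ok = fnmatchcase("d", "[!abc]")
excluded_ko = fnmatchcase("a", "[!abc]")

# Inside brackets, * and ? stand for themselves.
literal_star = fnmatchcase("*", "[*?]")
literal_only = fnmatchcase("x", "[*?]")

print(with_w_or_y, with_digit)
```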

Double star

You can also search a container such as a directory or a schema recursively by adding the double star (**) to your glob pattern.

For instance, the pattern below selects the files whose names start with RE in the howto connection directory and in all of its subdirectories:

tabli data list **/RE*@howto
  • Two README.md files match the pattern:
PATH                  
-------------------   
README.md             
recursive\README.md
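
A comparable recursive search can be sketched with Python's pathlib, whose ** also matches the starting directory and every subdirectory. The temporary directory layout and the find_recursive helper are hypothetical; the sketch illustrates the double star in Python, not Tabulify's own traversal.

```python
import tempfile
from pathlib import Path


def find_recursive(root, pattern):
    """Return the sorted relative paths of the files under root that match the pattern."""
    root = Path(root)
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical layout mirroring the listing above.
    (base / "recursive").mkdir()
    (base / "README.md").write_text("top level")
    (base / "recursive" / "README.md").write_text("nested")
    (base / "notes.txt").write_text("not selected")
    found = find_recursive(base, "**/RE*")

print(found)
```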

Escape

The escape character is the backslash \. It turns a wildcard into a literal character.

Syntax

\wildcard

For example:

  • \\ matches a single backslash \
  • \? matches the question mark ?
  • \* matches the star *

Escaping is rarely needed because most wildcard characters are not allowed in the names of resources such as files or tables.
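
Python's fnmatch module does not treat the backslash as an escape character, so the sketch below uses a small hand-written translator instead. The glob_to_regex function is hypothetical and handles only *, ?, and backslash escapes; it shows how escaping can work, not how Tabulify implements it.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob made of *, ? and backslash escapes into a compiled regex."""
    parts = []
    i = 0
    while i < len(pattern):
        char = pattern[i]
        if char == "\\" and i + 1 < len(pattern):
            # Escaped character: the next character is taken literally.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if char == "*":
            parts.append(".*")
        elif char == "?":
            parts.append(".")
        else:
            parts.append(re.escape(char))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def matches(name, pattern):
    """Return True when the whole name is matched by the glob pattern."""
    return glob_to_regex(pattern).fullmatch(name) is not None


print(matches("what?", r"what\?"), matches("whatx", r"what\?"))
print(matches("a*b", r"a\*b"), matches("axb", r"a\*b"))
```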



