About
transfer is a data operation that executes the following data resource operations:
- copy (Default)
In general, if you need to move or process data, you will transfer it.
The data transfer operation lets you:
- copy a local file to an SFTP server
- and more
Usage
pipeline:
  # ...
  - name: "Store"
    operation: transfer
    args:
      target-uri: 'build/${logicalName}.csv@cd'
      transfer-operation: insert
      target-operation: replace
The transfer operation is performed by the tabli data transfer command.
Arguments
Name | Default | Description |
---|---|---|
target-uri | - | The target URI (mandatory) |
target-operation | - | A resource operation that is applied to the target before the transfer |
source-operation | - | A resource operation that is applied to the source after the transfer |
Flow Properties | | |
step-granularity | resource | Whether the operation is performed on the whole resource (resource) or record by record (record) |
output | targets | What the step passes on: targets (the target resources), sources (the source resources) or results (the results of the transfer step) |
Cross Transfer Properties | | |
source-fetch-size | 10000 | The number of records retrieved from the source in one fetch (i.e. the network message size from the source system) |
buffer-size | 2 x target-worker x fetch-size | The maximum number of records held in transit between the source and the target |
target-batch-size | 10000 | The number of records sent to the target system at once (i.e. the network message size to the target system) |
target-commit-frequency | 2^31 (infinite) | The number of batches sent that triggers a commit to the target system |
target-worker | 1 | The number of threads that send batches to the target system |
with-bind-variables | true | If the target system is a SQL database, SQL bind variables are used |
metrics-data-uri | - | A data URI where the transfer metrics will be saved |
Transfer Properties | | |
transfer-operation | copy for a file system, insert otherwise | A transfer operation (copy, insert, upsert, update, …) |
transfer-mapping-method | name | How the source and target columns are mapped (by name, position or map) |
transfer-column-mapping | - | If the mapping method is map, a map of source column names to target column names |
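To make the three mapping methods (name, position, map) concrete, here is a minimal Python sketch of how source columns could be paired with target columns. It is an illustration only, not Tabulify's implementation: the helper `map_columns`, its parameters, and the choice to skip unmatched columns are assumptions made for the example.

```python
from typing import Dict, List, Optional


def map_columns(
    source: List[str],
    target: List[str],
    method: str = "name",
    column_mapping: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Return a source-column -> target-column mapping (hypothetical helper)."""
    if method == "name":
        # Pair columns that carry the same name on both sides.
        # Assumption: columns without a match are skipped.
        target_names = set(target)
        return {col: col for col in source if col in target_names}
    if method == "position":
        # Pair the first source column with the first target column, and so on.
        return dict(zip(source, target))
    if method == "map":
        # Use an explicit map, in the spirit of transfer-column-mapping.
        if column_mapping is None:
            raise ValueError("the map method needs an explicit column mapping")
        unknown = [col for col in column_mapping if col not in source]
        if unknown:
            raise ValueError(f"unknown source columns: {unknown}")
        return dict(column_mapping)
    raise ValueError(f"unknown mapping method: {method}")


source_columns = ["id", "first_name", "amount"]
target_columns = ["id", "name", "amount"]

by_name = map_columns(source_columns, target_columns, "name")
by_position = map_columns(source_columns, target_columns, "position")
by_map = map_columns(source_columns, target_columns, "map", {"first_name": "name"})
```

With these sample columns, the name method pairs only id and amount, the position method pairs all three columns in order, and the map method pairs first_name with name.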
Transfer Operation
The transfer operation supports the following values for the transfer-operation argument.
Name | Alias | Require Same Source/Target Structure and Data | Local Equivalent Metadata Operation |
---|---|---|---|
copy | | Yes | - |
move | | Yes | rename |
insert | append | No | - |
upsert | | No | - |
update | | No | - |
delete | | No | - |
The default transfer operation is system dependent:
- copy is the default for a file system
- insert is the default for a database system (if the target does not exist, it is always created by default)
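The default and alias rules above can be sketched in a few lines of Python. This is a hedged illustration, not Tabulify's code: the function `resolve_transfer_operation` and the boolean `target_is_file_system` are hypothetical names introduced for the example.

```python
from typing import Optional

# Operation names and the single alias listed in the table above.
OPERATIONS = {"copy", "move", "insert", "upsert", "update", "delete"}
ALIASES = {"append": "insert"}


def resolve_transfer_operation(
    requested: Optional[str], target_is_file_system: bool
) -> str:
    """Return the effective transfer operation (hypothetical helper)."""
    if requested is None:
        # No transfer-operation given: the default depends on the target system.
        return "copy" if target_is_file_system else "insert"
    # Replace an alias with its canonical name, then validate.
    name = ALIASES.get(requested, requested)
    if name not in OPERATIONS:
        raise ValueError(f"unknown transfer operation: {requested}")
    return name


file_default = resolve_transfer_operation(None, target_is_file_system=True)
database_default = resolve_transfer_operation(None, target_is_file_system=False)
aliased = resolve_transfer_operation("append", target_is_file_system=False)
```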
Resource Operation
The following values may be used for the target-operation and source-operation arguments.
Value | Description |
---|---|
truncate | Truncate the resource |
drop | Drop the resource |
replace | Replace the resource |
keep | Does not modify the existing resource (i.e. does not replace the target if it exists) |
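The ordering of these operations around the transfer (target-operation before, source-operation after) can be sketched as follows. The helper `plan_steps` and the step labels it returns are hypothetical and exist only to show the sequence; they do not reflect how Tabulify schedules work internally.

```python
from typing import List, Optional

# Values accepted for target-operation and source-operation, per the table above.
RESOURCE_OPERATIONS = ("truncate", "drop", "replace", "keep")


def plan_steps(
    target_operation: Optional[str] = None,
    source_operation: Optional[str] = None,
) -> List[str]:
    """Return the ordered steps of one transfer (hypothetical helper)."""
    for operation in (target_operation, source_operation):
        if operation is not None and operation not in RESOURCE_OPERATIONS:
            raise ValueError(f"unknown resource operation: {operation}")
    steps = []
    if target_operation is not None:
        # The target operation is applied before the transfer.
        steps.append(f"{target_operation} target")
    steps.append("transfer")
    if source_operation is not None:
        # The source operation is applied after the transfer.
        steps.append(f"{source_operation} source")
    return steps


steps = plan_steps(target_operation="replace", source_operation="drop")
```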
Result
If you set the output argument to results, a result data path is returned (instead of the targets) with the following columns:
- latency: the latency in ms
- error and message: an error and a message if any error has occurred.
Metrics
If the argument metrics-data-uri is given, a metrics data resource will be created with the following columns:
- run - the run id
- timestamp - the metrics timestamp
- metric - the name of the metric
- value - the value of the metric
- worker - the name of the worker (thread)
The following metrics will be recorded:
- BufferSize: the size of the memory buffer between the source and the target
- BufferMaxSize: the maximum size of the buffer
- BufferRatio: the ratio of BufferSize to BufferMaxSize
- Commits: the number of commits
- Records: the number of records
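As a worked example of how the buffer metrics relate to each other, the sketch below computes the default buffer size and a buffer ratio in Python. It rests on several assumptions: that fetch-size in the default buffer-size formula means source-fetch-size, that BufferRatio is expressed as a fraction between 0 and 1, and that metric rows can be read as plain dictionaries. The sample rows are made up for illustration.

```python
def default_buffer_size(target_worker: int = 1, fetch_size: int = 10000) -> int:
    """Default buffer-size: 2 x target-worker x fetch-size."""
    return 2 * target_worker * fetch_size


def buffer_ratio(buffer_size: int, buffer_max_size: int) -> float:
    """Ratio of BufferSize to BufferMaxSize (assumed to be a fraction)."""
    return buffer_size / buffer_max_size


max_size = default_buffer_size()

# Made-up metric rows that use the columns listed above.
sample_rows = [
    {"run": "run-1", "timestamp": "2024-01-01T00:00:00",
     "metric": "BufferSize", "value": 5000, "worker": "worker-1"},
    {"run": "run-1", "timestamp": "2024-01-01T00:00:00",
     "metric": "BufferMaxSize", "value": max_size, "worker": "worker-1"},
]

# Index the values by metric name, then derive the ratio.
values = {row["metric"]: row["value"] for row in sample_rows}
ratio = buffer_ratio(values["BufferSize"], values["BufferMaxSize"])
```

A ratio that stays close to 1 would suggest that the target is draining the buffer more slowly than the source fills it.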
Cross Transfer
A transfer can happen:
- on the same connection
- or between two different connections (also called a cross-transfer)
Tabulify Optimization: If two connections share the same credentials and the same system URL, the transfer is optimized and considered local. The transfer is then applied only to the system metadata, which is faster because the data is not moved.
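The rule for treating a transfer as local can be sketched as a simple comparison. This is a minimal illustration, not Tabulify's implementation: the `Connection` class, its fields, and the example URLs are hypothetical, and how Tabulify actually compares credentials is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """Hypothetical connection description used only for this example."""
    name: str
    url: str
    user: str
    password: str


def is_local_transfer(source: Connection, target: Connection) -> bool:
    """True when both connections share the system URL and the credentials."""
    return (
        source.url == target.url
        and source.user == target.user
        and source.password == target.password
    )


sales = Connection("sales", "jdbc:postgresql://db.example.com/app", "etl", "secret")
reporting = Connection("reporting", "jdbc:postgresql://db.example.com/app", "etl", "secret")
archive = Connection("archive", "jdbc:postgresql://archive.example.com/app", "etl", "secret")

same_system = is_local_transfer(sales, reporting)
cross_transfer = not is_local_transfer(sales, archive)
```

In this sketch, sales and reporting have different names but point to the same system with the same credentials, so a transfer between them would count as local, while a transfer to archive would be a cross-transfer.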