(S)FTP Import task

This task lets you import a CSV formatted file from an FTP or SFTP server.

For an FTP server, you need to provide a valid login/password. For an SFTP server, you can connect either with a login/password or with a keyfile.

Here is the documentation for the advanced fields:

Keyfile password

If your keyfile is encrypted, provide its password here.

Folder

Specify the folder on the (S)FTP server where the files will be looked for.

File pattern

If you know the exact name of the file you want to import, enter it here. To match several files, use the * wildcard character. For example, if your files are named my_data-2019-11-01.csv, my_data-2019-11-02.csv, and so on, you can use the pattern my_data-*.csv.

Wildcard pattern matching mode

If the pattern matches several files on the server, the filenames are sorted in alphanumerical order. For example, the files ["my_data-2019-11-02.csv", "my_data-2019-11-03.csv", "my_data-2019-11-01.csv"] are sorted to ["my_data-2019-11-01.csv", "my_data-2019-11-02.csv", "my_data-2019-11-03.csv"]. Then, by default, the last file of the sorted filenames is loaded (my_data-2019-11-03.csv in this example).

If you want to load the first file of the sorted filenames instead, set the Wildcard pattern matching mode option to load the first file (my_data-2019-11-01.csv in the above example).

Note that only one file is loaded per execution. If you need to load multiple files, you need to trigger the task several times.
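The selection logic described above can be sketched in Python with the standard fnmatch module. The listing, function name, and "first"/"last" mode values below are illustrative assumptions, not the task's actual internals:

```python
import fnmatch

# Hypothetical server listing, matching the example above.
listing = ["my_data-2019-11-02.csv", "my_data-2019-11-03.csv", "my_data-2019-11-01.csv"]

def pick_file(filenames, pattern, mode="last"):
    """Return the single file the task would load for one execution."""
    matches = sorted(fnmatch.filter(filenames, pattern))  # alphanumerical sort
    if not matches:
        raise FileNotFoundError(f"no file matches {pattern!r}")
    return matches[0] if mode == "first" else matches[-1]

print(pick_file(listing, "my_data-*.csv"))           # my_data-2019-11-03.csv
print(pick_file(listing, "my_data-*.csv", "first"))  # my_data-2019-11-01.csv
```

Because dates in ISO format (YYYY-MM-DD) sort alphanumerically in chronological order, the default "last" mode naturally picks the most recent file.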

Force types

In some specific cases, you may want to force the data types of some columns. Provide them in JSON format.

Example:

{"mycolumn1": "int32", "mycolumn2": "float64", "mycolumn_that_should_be_a_date": "datetime"}

The provided types must be one of int32, int64, float32, float64, datetime, str.

  • int64 is a 64-bit (8-byte) integer, also referred to as BigInt in some databases.
  • float64 is a double precision float.
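To illustrate what such a mapping does, here is a minimal Python sketch that parses the JSON mapping and casts one CSV row accordingly. The CASTERS table and the row data are assumptions for illustration; the task's real conversion logic is not shown here:

```python
import json
from datetime import datetime

# Hypothetical casters approximating the accepted type names.
CASTERS = {
    "int32": int, "int64": int,
    "float32": float, "float64": float,
    "str": str,
    "datetime": datetime.fromisoformat,
}

forced = json.loads(
    '{"mycolumn1": "int32", "mycolumn2": "float64", '
    '"mycolumn_that_should_be_a_date": "datetime"}'
)

# One raw CSV row: every value starts out as a string.
row = {"mycolumn1": "7", "mycolumn2": "3.14",
       "mycolumn_that_should_be_a_date": "2019-11-01"}
typed = {col: CASTERS[forced[col]](val) if col in forced else val
         for col, val in row.items()}
```

Columns absent from the mapping keep their inferred type; only the listed columns are forced.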

Flow mode

Full (default)

Replace all rows in the storage with the rows from the input file. All data in the existing storage is deleted, then the rows from the input file are loaded. The storage table is (re)created with the columns of the imported file, so you can add or remove columns in the CSV file between imports in this mode.

Append

Append all rows from the input file to the storage. The columns in the loaded file must be exactly the same as in the storage table.

Key-based Update

If you choose this option, you must provide the name of the key column in the Flow key field.

In this mode, for a given row in the input file:

  • if its key already exists in the storage, the row in the storage is replaced by the row from the input file.
  • otherwise, it is appended to the storage.

Note that the columns in the loaded file must be exactly the same as in the storage table.
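The two bullet points above amount to an upsert keyed on the Flow key column. A minimal Python sketch of that behaviour, under the assumption that rows are plain dicts (the task's actual storage layer is not shown):

```python
def key_based_update(storage, incoming, key):
    """Replace rows whose key already exists; append the others."""
    by_key = {row[key]: row for row in storage}
    for row in incoming:
        # Replaces the stored row if the key exists, appends otherwise.
        by_key[row[key]] = row
    return list(by_key.values())

storage = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
incoming = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
result = key_based_update(storage, incoming, "id")
# id 2 is replaced by the incoming row, id 3 is appended.
```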

Tip: use the filename as update key

You can use the special value $$SRN_FILENAME$$ for the Flow key field. With this value, the import updates the rows already in the output storage that come from the imported filename. For example, if you have previously imported a file named 'data_2019_06_04.csv', and this file has since been modified on your server so you need to reimport it: use the special $$SRN_FILENAME$$ update key, and rows previously imported from this file will be updated while new rows will be added.

This option requires the option Add filename as column to be enabled.
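The net effect of this filename-keyed update can be sketched as "drop the old rows from that file, keep everything else, append the reimported rows". This is an illustrative approximation, not the task's actual code, and it assumes the __srn__filename__ column added by the Add filename as column option:

```python
def filename_based_update(storage, incoming, filename):
    """Replace all rows previously imported from `filename` with the new rows."""
    kept = [r for r in storage if r["__srn__filename__"] != filename]
    return kept + list(incoming)

storage = [
    {"__srn__filename__": "data_2019_06_04.csv", "v": "old"},
    {"__srn__filename__": "data_2019_06_05.csv", "v": "other"},
]
incoming = [
    {"__srn__filename__": "data_2019_06_04.csv", "v": "new"},
    {"__srn__filename__": "data_2019_06_04.csv", "v": "extra"},
]
result = filename_based_update(storage, incoming, "data_2019_06_04.csv")
# Rows from other files are untouched; the reimported file's rows are refreshed.
```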

Incremental Update

If you choose this option, you must provide the name of the key column in the Flow key field.

The algorithm gets the max of the flow key column in the output storage, then appends the rows from the update dataset whose key is strictly above this value. This is useful to add new IDs or new data according to a datetime column.

Note that the columns in the loaded file must be exactly the same as in the storage table.
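A minimal Python sketch of this mode, assuming rows are plain dicts (again an illustration, not the task's internals):

```python
def incremental_update(storage, incoming, key):
    """Append only the incoming rows whose key is strictly above the stored max."""
    cutoff = max((row[key] for row in storage), default=None)
    new_rows = [r for r in incoming if cutoff is None or r[key] > cutoff]
    return storage + new_rows

storage = [{"id": 1}, {"id": 2}]
incoming = [{"id": 2}, {"id": 3}]
result = incremental_update(storage, incoming, "id")
# id 2 is already at/below the stored max (2), so only id 3 is appended.
```

Note that rows whose key equals the stored max are skipped, so a row updated in place in the source file will not be reimported by this mode; use Incremental Forced Update for that.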

Incremental Forced Update

If you choose this option, you must provide the name of the key column in the Flow key field.

The algorithm gets the min of the flow key column in the update dataset, removes all rows in the output storage whose key is above or equal to this value, and then appends the full update dataset to the output storage.

Note that the columns in the loaded file must be exactly the same as in the storage table.
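The same dict-based sketch can be written for this mode (illustrative only, under the same assumptions as above):

```python
def incremental_forced_update(storage, incoming, key):
    """Drop stored rows at/above the incoming min key, then append all incoming rows."""
    cutoff = min(row[key] for row in incoming)
    kept = [r for r in storage if r[key] < cutoff]
    return kept + list(incoming)

storage = [{"id": 1}, {"id": 2}, {"id": 3}]
incoming = [{"id": 2}, {"id": 4}]
result = incremental_forced_update(storage, incoming, "id")
# Stored ids 2 and 3 are removed (>= incoming min of 2), then ids 2 and 4 are appended.
```

Unlike Incremental Update, this mode re-loads the overlapping key range, so modified rows in the source file are picked up.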

Delete file on (S)FTP Server

When this option is enabled, the imported file is deleted from the server right after the import.

Add filename as column

When this option is enabled, a column __srn__filename__ is added to the output table. For each row, it contains the name of the file the row comes from. This is especially useful when using an update mode.