(S)FTP Import task

This task lets you import a CSV formatted file or an XLSX file from an FTP or SFTP server.

For an FTP server, you must provide a valid login/password. For an SFTP server, you can connect either with a login/password or with a keyfile.

Date formats for CSV files

When your CSV file has date or datetime columns, check in the output storage that these columns are detected as datetime. If they are not, you won't be able to filter on date ranges or to group rows by date (e.g. by month or by quarter).

If the output columns are not datetime, it means our engine did not detect the content of your columns as datetime. The most common reason is that your date format is not in our list of detected formats, which is:

  • "%Y-%m-%dT%H:%M:%S"
  • "%Y-%m-%dT%H:%M:%S.%fZ"
  • "%Y-%m-%d %H:%M:%S"
  • "%Y-%m-%d %H:%M:%S.%f"
  • "%d-%m-%Y"
  • "%Y-%m-%d"
  • "%d/%m/%Y"
  • "%Y/%m/%d"
  • "%m-%d-%Y"
  • "%Y-%d-%m"
  • "%m/%d/%Y"
  • "%Y/%d/%m"

If your format is not in this list, you need to manually tell our engine that a given column is a datetime. To do this, use the Force types field (see documentation below).

Use ISO format as much as you can

If you can, use the ISO format "%Y-%m-%dT%H:%M:%S" (e.g. 2019-10-21T14:45:23). It will be compatible with most of the tools you need to interact with.
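To illustrate, here is a minimal Python sketch (illustrative only; the import task itself does not run Python) showing how this ISO format is produced and parsed with the standard library:

```python
from datetime import datetime

# Format a timestamp in the recommended ISO format "%Y-%m-%dT%H:%M:%S".
ts = datetime(2019, 10, 21, 14, 45, 23)
iso_string = ts.strftime("%Y-%m-%dT%H:%M:%S")
print(iso_string)  # 2019-10-21T14:45:23

# The same string parses back unambiguously, with no day/month confusion.
parsed = datetime.strptime(iso_string, "%Y-%m-%dT%H:%M:%S")
```

Unlike "%d/%m/%Y" vs "%m/%d/%Y", this format cannot be misread, which is why it is the safest choice.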

Keyfile password

If your keyfile is encrypted, provide its password here.

Folder

Specify the folder on the (S)FTP server where the files to import are located.

File pattern

If you know the exact name of the file you want to import, enter it here. If you want to use a wildcard, use the * character. For example, if your files are my_data-2019-11-01.csv, my_data-2019-11-02.csv, ... you can use the pattern my_data-*.csv.
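The * wildcard behaves like shell-style filename globbing. As a sketch (using Python's fnmatch module to mimic the matching; the exact rules of the task's matcher are an assumption):

```python
import fnmatch

# Hypothetical listing of files found in the (S)FTP folder.
filenames = ["my_data-2019-11-01.csv", "my_data-2019-11-02.csv", "report.txt"]

# "my_data-*.csv" matches any name starting with "my_data-" and ending in ".csv".
matches = fnmatch.filter(filenames, "my_data-*.csv")
print(matches)  # ['my_data-2019-11-01.csv', 'my_data-2019-11-02.csv']
```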

Wildcard pattern matching mode

In case the pattern matches several files on the server, the filenames are sorted in alphanumeric order. For example, files ["my_data-2019-11-02.csv", "my_data-2019-11-03.csv", "my_data-2019-11-01.csv"] will be sorted to ["my_data-2019-11-01.csv", "my_data-2019-11-02.csv", "my_data-2019-11-03.csv"]. Then, by default, the last file of the sorted list is loaded (my_data-2019-11-03.csv in this example).

If you want to load the first file of the sorted list instead, set the Wildcard pattern matching mode option to load the first file (e.g. my_data-2019-11-01.csv in the above example).

Note that only one file is loaded per execution. If you need to load multiple files, trigger the task several times.
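The selection logic can be sketched in a few lines of Python (dates embedded in filenames with zero-padded year-month-day sort correctly as plain strings):

```python
# Files matched by the pattern, in the arbitrary order returned by the server.
filenames = [
    "my_data-2019-11-02.csv",
    "my_data-2019-11-03.csv",
    "my_data-2019-11-01.csv",
]

sorted_names = sorted(filenames)   # alphanumeric sort
last_file = sorted_names[-1]       # default mode: most recent by name
first_file = sorted_names[0]       # "first file" mode: oldest by name
```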

Force types

In some specific cases, you may want to force the data types of some columns. Enter them in JSON format (note that you must use double quotes, not single quotes).

Example:

{"mycolumn1": "int32", "mycolumn2": "float64", "mycolumn_that_should_be_a_date": "datetime"}

The provided types must be one of int32, int64, float32, float64, datetime, str, str(X).

  • int64 is a 64-bit (8-byte) integer, also referred to as BigInt in some databases.
  • float64 is a double-precision float.
  • str(X) is a varchar column with length X.
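If you build this field programmatically, a JSON serializer guarantees the required double quotes. A small Python sketch:

```python
import json

forced_types = {
    "mycolumn1": "int32",
    "mycolumn2": "float64",
    "mycolumn_that_should_be_a_date": "datetime",
}

# json.dumps always emits double quotes, as the field requires.
print(json.dumps(forced_types))
```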

Which column name to use

You may have noticed that the output storage column names may be slightly different from your file column names. When you specify the forced type of a column, you need to use the column name in the output storage, and not the name from your file. For example, if you have a date of creation column in your file, the output column name will be date_of_creation. You need to use this date_of_creation name in the forced type field.
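The exact renaming rules of the engine are not documented here; as a rough sketch consistent with the example above (lowercasing and replacing non-alphanumeric runs with underscores are assumptions), the transformation could look like:

```python
import re

def normalize_column_name(name: str) -> str:
    # Hypothetical sketch: lowercase the name and replace runs of
    # non-alphanumeric characters with underscores. The engine's
    # actual rules may differ.
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")

print(normalize_column_name("date of creation"))  # date_of_creation
```

When in doubt, check the actual column names in the output storage and use those.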

Flow mode

Full (default)

Replace all rows in storage with rows from the input file. All the data from the existing storage is deleted, and then the rows from the input file are loaded. The storage table is (re)created with columns from the imported file. So you can add/remove columns in the CSV file between imports in this mode.

Append

Append all rows from the input file to the storage. The columns in the loaded file must be exactly the same as in the storage table.

Key-based Update

If you choose this option, you must provide the name of the key column in the Flow key field.

In this mode, for a given row in the input file:

  • if its key already exists in the storage, the row in the storage is replaced by the row from the input file.
  • otherwise, the row is appended to the storage.

Note that the columns in the loaded file must be exactly the same as in the storage table.
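This is classic upsert behavior. A minimal in-memory sketch (with "id" as a hypothetical flow key):

```python
# Rows currently in the storage.
storage = [
    {"id": 1, "value": "old"},
    {"id": 2, "value": "old"},
]
# Rows from the input file.
update = [
    {"id": 2, "value": "new"},  # key exists: storage row is replaced
    {"id": 3, "value": "new"},  # key is new: row is appended
]

by_key = {row["id"]: row for row in storage}
for row in update:
    by_key[row["id"]] = row  # replace or append, depending on the key
storage = list(by_key.values())
```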

Tip: use the filename as update key

You can use the special value $$SRN_FILENAME$$ in the Flow key field. With this value, the import updates the rows coming from the imported filename that are already in the output storage. For example, if you previously imported a file named 'data_2019_06_04.csv', and this file has been modified on your server and needs to be reimported: use the special $$SRN_FILENAME$$ update key, and rows previously imported from this file will be updated while new rows will be added.

This option requires the option Add filename as column to be enabled.

Incremental Update

If you choose this option, you must provide the name of the key column in the Flow key field.

The algorithm gets the max of the flow key column in the output storage and appends the rows of the update dataset whose key is strictly above this value. This is useful to add new IDs, or new data according to a datetime column.

Note that the columns in the loaded file must be exactly the same as in the storage table.
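A minimal sketch of this logic (with "id" as a hypothetical flow key):

```python
# Rows currently in the storage, and rows from the input file.
storage = [{"id": 1}, {"id": 2}, {"id": 3}]
update = [{"id": 2}, {"id": 3}, {"id": 4}, {"id": 5}]

# Max key already in the storage.
max_key = max(row["id"] for row in storage)           # 3

# Only rows strictly above the max are appended; ids 2 and 3 are skipped.
new_rows = [row for row in update if row["id"] > max_key]
storage.extend(new_rows)
```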

Incremental Forced Update

If you choose this option, you must provide the name of the key column in the Flow key field.

The algorithm gets the min of the flow key column in the update dataset, removes all rows in the output storage whose key is above (or equal to) this value, and then appends the full update dataset to the output storage.

Note that the columns in the loaded file must be exactly the same as in the storage table.
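A minimal sketch of this logic (with "id" as a hypothetical flow key):

```python
# Rows currently in the storage, and rows from the input file.
storage = [{"id": 1, "v": "old"}, {"id": 2, "v": "old"}, {"id": 3, "v": "old"}]
update = [{"id": 2, "v": "new"}, {"id": 3, "v": "new"}, {"id": 4, "v": "new"}]

# Min key in the update dataset.
min_key = min(row["id"] for row in update)            # 2

# Drop every storage row with key >= min_key, then append the full update.
storage = [row for row in storage if row["id"] < min_key]
storage.extend(update)
```

Unlike Incremental Update, the overlapping rows (ids 2 and 3 here) are replaced rather than skipped.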

Delete file on (S)FTP Server

When this option is enabled, the imported file is deleted from the server right after the import.

Add filename as column

When this option is enabled, a column __srn__filename__ is added to the output table. For each row, it contains the name of the file it comes from. This is especially useful when using an update mode.