Storage data source

A Storage data source is a table within the Serenytics internal data warehouse (i.e. AWS Redshift). It can be used to store and merge data from different systems (e.g. the CRM, the marketing automation system, the purchasing system...). It can also be used to store the results of data processing involving these merged datasets.

When you create a new storage, it is empty by default. There are several ways to load data into a storage:

  • Using an ETL step

    ETL steps are jobs you can create and schedule from the Automation menu. An ETL step takes an input data source of any type, transforms its data, and stores the resulting dataset in a storage. This is useful for preparing your data.

  • Importing a CSV file from a (S)FTP Server

    You can create an FTP Job Import from the Automation menu. This lets you load a CSV file from an FTP or SFTP server into a storage.

  • Using the Serenytics Python client to load a file or any Pandas dataframe

    You can create Python scripts from the Automation menu. In these scripts, the serenytics module lets you load a file into a storage. You can also load a Pandas dataframe. This is useful when you need to perform advanced computations and use the resulting storage in a dashboard (or as the source of another ETL step or Python script). You can also create a Python script to load data from a source for which Serenytics does not provide a plug-and-play connector (e.g. a non-standard REST API): you just need to obtain your data in Python and then store the result in a storage. Then, you can use this data in your dashboards. A minimal sketch is shown after this list.

  • Pushing messages

    You can also push JSON messages into a storage by calling the Serenytics REST API. If a field previously used (i.e. it exists in the current storage model) is not provided by a JSON message, it is set to a default value in the added row. If a field exists in a new JSON message but not in the current model, it is added to your storage (and all existing rows will have the NULL value for this field).
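
Going back to the Python client option above, here is a minimal sketch of a script that computes a Pandas dataframe and stores it. The method names get_or_create_storage_data_source_by_name and reload_data are assumptions used to illustrate the flow; check the Serenytics Python client documentation for the exact calls available in your version.

    import pandas as pd
    import serenytics

    # Connect to the Serenytics API. Inside a Serenytics Python script the API
    # key is typically taken from the environment; otherwise pass it explicitly.
    client = serenytics.Client()

    # Build (or compute) the dataframe you want to store.
    df = pd.DataFrame({
        'customer_id': [1, 2, 3],
        'total_purchases': [120.5, 80.0, 310.9],
    })

    # Hypothetical method names: retrieve (or create) the storage data source
    # and reload its content from the dataframe.
    storage = client.get_or_create_storage_data_source_by_name('merged_customers')
    storage.reload_data(df)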

Tip: how to push a message into a storage with curl

Use:

    curl -d '{"key":"msg1", "key2":"msg2"}' -H "Content-Type: application/json" -X POST STORAGE_URL

where you need to replace STORAGE_URL with the actual push URL of your data source. It is available on the configuration page of your storage, in the Url field. It looks like https://api.serenytics.com/api/data_source/13350b52-68d9-4590-8eed-e3c9ac70c583/push/3d6b284710213ae38a40348d2cf5c944a204fa59.

Of course, any similar HTTP POST done in any language will work.
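
For instance, here is a minimal Python sketch using the requests library; the STORAGE_URL value is a placeholder to replace with your own push URL, and the field names are only illustrative.

    import requests

    # Replace with the push Url shown on your storage's configuration page.
    STORAGE_URL = 'https://api.serenytics.com/api/data_source/YOUR_SOURCE_ID/push/YOUR_PUSH_TOKEN'

    # Each message is appended as a new row in the storage.
    requests.post(STORAGE_URL, json={'key': 'msg1', 'key2': 'msg2'})

    # This message introduces a new field: it is added to the storage model and
    # existing rows get NULL for it, while the missing 'key2' field is set to a
    # default value in the new row.
    requests.post(STORAGE_URL, json={'key': 'msg3', 'new_field': 42})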

Storage model

The model of a storage is the list of its columns and their types. By default, when you import a dataset into a storage, its model is inferred (i.e. automatically generated) from the dataset. For example, if a column of a CSV file contains dates, the corresponding storage column will have the Datetime type. If a new CSV file is imported into the same storage, the model might be modified. For example, if a column in the new CSV file has the same name as in the previously loaded file but contains a mix of dates and strings instead of only dates, the storage column type will be changed from Datetime to String.
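
As a contrived illustration, assume the first CSV imported into the storage is:

    signup_date,name
    2023-01-15,Alice
    2023-01-20,Bob

The signup_date column is inferred as Datetime. If a second CSV is then imported into the same storage with:

    signup_date,name
    2023-02-01,Carol
    unknown,Dave

the signup_date column now mixes dates and strings, so its type is changed from Datetime to String.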

This is convenient when setting up a project, but it is risky in production, as a dashboard (or another ETL step) might rely on a storage column having a particular type. To avoid this issue, we strongly advise you to use the Lock-model option when a storage is used in a production environment.

Lock-model option

When you enable this option on a storage, the model cannot be modified by any operation. We strongly advise you to enable this option once your dataflow is ready to be used in production.

To be precise, even if this option is enabled, you can still add new columns to your model. But any other operations such as modifying the type of an existing column or dropping a column will be refused.