Join connector¶

This connector lets you create a join between several datasources (which must be already connected in Serenytics).

Let's say you have a first SQL datasource with purchases (one row by purchase) and a column id_customer, and another SQL datasource with customers (one row by customer).

With the join connector, you can create a table with one row by purchase, and each row contains the data from its customer.

You can join as many datasource as you want. There is no limitation.

In the above example, the id_customer is called the join key (its name can be different in the two datasources). This key can be a column from each datasource. But it can also be a formula you created on a datasource.

Join on SQL datasources from the same server¶

The most common use case is to create a join between datasources based on the same underlying SQL server (this is true when you create a join between Storages in the Serenytics datawarehouse).

With this type of configuration, the joined data is never physically stored in a table. Our engine only creates the correct SQL query, passes it to the SQL server and gets the result. So the computation is achieved on the SQL server and is optimized.

Join between other datasources types (or between tables from different SQL servers)¶

We speak here of cases where you join:

a XLS file and a SQL server
a REST API and an XLS file
a table from a SQL server and another table from another SQL server
...

In this cases, the queries will be very slow. To achieve this type of join, our engine first gets all the rows from all the datasources from the join. Then it gathers the data in a single table (not stored), and possibly applies the groupby, computes the formulas... and then returns the results.

So this type of join will work (if the datasets are not too large), but it will be slow. If you have to do that, a good turnaround is to create the join, and then to create an ETL step to copy the result in a storage. And then to build your dashboards on this storage. This will be much faster.

Prefixes¶

When you join several datasources, you must choose a prefix for each datasource. If you join a datasource named customers metadata obtained from ads, which has a column named total_spent, and you choose the prefix customer_meta, this column will appear everywhere as customer_meta.total_spent.

Choose your prefixes to be as explicit as possible but keep it short to avoid long column names which will overflow in the dashboard editor.

Join type¶

When you add a new datasource to a join, you have to decide between INNER, LEFT OUTER and FULL OUTER.

This is identical to the same setting in SQL languag.

INNER is the default value. But it implies that your data is clean, and you have no missing values. If a key value is not found in the joined table, the corresponding rows will be removed from the resulting dataset.

If you have missing data (or if you're not sure), it's better to use LEFT OUTER. When a key value is not found in the joined table, the row will still exist in the resulting join, but columns from the joined table will have the NULL value. In a formula or in a dashboard, you can filter this rows using the is_defined filter (e.g. to count the number of missing rows).

Using FULL OUTER is strongly discouraged as it usually creates too much complexity. We recommend to use it only for data quality analysis or for very advanced use cases.

option `Include joined datasources formulas`¶

When a datasource is joined, its formula are available in the join datasource.

For example, if you join a customer datasource and this datasource has a formula named fx_customer_age_category. This formula will be available in the join datasource.

For advanced cases, mostly to avoid name collisions, you can disable this option.

option `Allow join on multiple keys`¶

By default, when you join a datasource to another one, you only have to pick the key column in the first datasource and its corresponding name in the second datasource (i.e. a pair of columns).

In some advanced cases, you need to join on several columns pairs. You need to enable this option to achieve this.

option `Allow a table to be joined several times`¶

Let's say you have a first table person and a table purchase with a column seller_id which is a foreign key to the table person and a column buyer_id which is also a foreign key to the table person.

In this case, you need to create a join with the table purchase and to add twice the table person in the join. Once using the seller_id column as a key. And once using the column buyer_id as a key.

So the table person is joined several times.

To achieve this, you need to enable the option Allow a table to be joined several times. This is an advanced option. Please contact us if you need support to configure it.