# Data source formulas¶

When you add a formula on a datasource, you must choose its type among:

• simple column
• conditional column
• value

A `simple column` or a `conditional column` is a new "virtual" column which is added to your datasource. Then, you can use this formula exactly like a standard column of your datasource, in your dashboards or in your data preparation steps.

A `value` is not a new column, it is a single computed value (usually a single KPI such as the number of sales in the current month).

It is possible to use a formula within another formula. We advise you to create small and simple formulas and then to combine them. This makes it easier to check the results of your formulas.

## Examples¶

Let's say your data contains a column `[Price]`, a column `[Cost]` and a column `[Date]`. Here are some examples of what new columns you can create.

### Simple formulas¶

• Simple calculus
``````[Price] * 10
``````
• Using two columns:
``````[Price] / [Cost] * 100
``````
• With parentheses:
``````([Price] - [Cost]) / [Price] * 100
``````
• Operation on Date
``````extract_date_part([Date], "year")
``````

### Conditional formulas¶

• Simple IF condition:
``````[Price] == 100
``````
• Using and operator:
``````[Price] > 10 and [Country] != "USA"
``````
• Using or operator:
``````[Price] > 10 or [Cost] < 20
``````
• Using in operator
``````[Country] in ("USA", "Germany", "France")
``````
• Using not operator
``````not ([Country] in ("USA", "Germany", "France"))
``````
• Using full dates:
``````[Date] > "2010-10-24" and [Date] < "2010-10-30"
``````
• Using date parts:
``````extract_date_part([Date], "year") >= 2014
``````
• Using string comparison:
``````like([postal_code], "_____")
``````

### Value formulas¶

• Simple total:
``````sum([Price])
``````
• Conditional total:
``````sum_if([Price], [Cost] > 10)
``````
• Ratio:
``````sum_if([Price], [Cost] > 10) / sum([Price])
``````

### Formulas of formulas¶

You can use a formula within another formula. As an example, you can create a simple formula named "GOLD Product":

``````[Category] in ["Jewelry", "Watch", "Leather goods"]
``````

and then use it in a second formula to compute the ratio of the revenue of these GOLD products within the total revenue:

``````sum_if([Price], [GOLD Product]) / sum[Price])
``````

### Using dashboard variables in a formula¶

In a dashboard, you can define dashboard variables in the global dashboard options menu:

These variables can be used in datasource formulas. The value of a variable can be retrieved with the double bracket syntax `{{__myVar__}}`. Here is an example:

See here for more details about using variables in datasource formulas.

## Date-time functions¶

### `extract_date_part()`¶

Extract date part of a datetime column.

Parameters:

• `column`: column, formula or function
• `date_part`: can be `"year"`, `"month"`, `"quarter"`, `"week"`, `"day"`, `"dayofyear"`, `"weekday"`, `"hour"` or `"minute"`.
• `language`[optional]: can be `"en"`, `"fr"`, `"num-fr"`, `"num-en"`. Use this option to get a literal value instead of a numerical value (valid only for `month` and `weekday`). Using the `num` prefix makes the data sortable (e.g. 01-January, 02-February, ...).

Details:

• For `day`: the function returns the day of the month (eg. for "2022-11-25", it returns 25).
• For `dayofweek`: the function returns the day rank since the beginning of the year (eg. for "2022-2-25", it returns 56 as the 25th of February if the 56th days of the year 2022).
• For `weekday`: monday is day 0 and sunday is day 6.
• For `week`: a week starts the monday. Days before the first monday of the year are in week 0.

Example:

``````extract_date_part([MyColumn], "week")
``````

Availability: all data source types.

### `utc_to_timezone()`¶

Convert a column in UTC to the specified timezone.

Parameters:

• `column`: column, formula or function
• `timezone`: a valid timezone such as `"US/Eastern"` or `"Europe/Paris"`. See full list.

Example:

``````utc_to_timezone([MyColumn], "Europe/Paris")
``````

Availability: all data source types.

### `date_diff()`¶

Return difference between 2 date-times in specified time unit. The result is an integer.

More precisely, `date_diff()` determines the number of unit of time boundaries that are crossed between the two datetimes. The result is an integer (e.g.: 1 for the difference in days between "2016-10-15 23:00:00" and "2016-10-16 03:00:00").

Parameters:

• `first_date`: column, formula or function
• `second_date`: column, formula or function
• `time_unit`[optional]: a valid time unit. Default: `"day"`. It can be one of: `"year"`, `"month"`, `"day"`, `"hour"`, `"minute"`, `"second"`.

Example:

``````date_diff([MyDateColumn1], [MyDateColumn2], "hour")
``````

Availability: only MySQL, SQL Server, AWS Redshift and AWS Redshift Storage. On MySQL, the only `time_unit` available is the default one: `"day"`.

### `time_delta()`¶

Return difference between 2 date-times in specified time unit. The result is a float: e.g. 1.083 for the delta between "2016-10-15 14:00:00" and "2016-10-16 16:00:00"

Parameters:

• `first_date`: column, formula or function
• `second_date`: column, formula or function
• `time_unit`[optional]: a valid time unit. Default: `"day"`. It can be one of: `"day"`, `"hour"`, `"minute"`, `"second"`.

Example:

``````time_delta([MyDateColumn1], [MyDateColumn2], "hour")
``````

Availability: All source types.

### `now()`¶

Return current date-time (in UTC).

Availability: All source types.

### `date()`¶

Create a date from year, month and day data.

Parameters:

• `year`: column, formula or function that returns the year of the date to create.
• `month`[optional]: column, formula or function that returns the month of the date to create. Default: 1 (January).
• `day`[optional]: column, formula or function that returns the day of the date to create. Default: 1 (first day of the month).

Example:

``````date([YearColumn], [MonthColumn])
``````

Availability: All source types except Microsoft SQL Server.

### `date_add()`¶

Add a certain interval to a date.

Parameters:

• `myDate`: column, formula or function containing the initial date.
• `interval_type`: type of interval to add. It can be one of: `"day"`, `"hour"`, `"minute"`, `"second"`.
• `nb_intervals`: number of intervals to add. A negative value will substract the interval.

Example:

``````# to add 10 days to a given date
``````

Availability: All source types except files and APIs.

## Date-time filtering¶

For date/time columns, you can use special functions to compute conditions.

### Examples¶

If your data contains a column `[Timestamp]`, you can filter it on:

• today
``````[Timestamp] in day(0)
``````
• yesterday
``````[Timestamp] in day(-1)
``````
• last 24 hours
``````[Timestamp] >= hour(-24)
``````
• current month
``````[Timestamp] In month(0)
``````
• last 2 months with a rolling window (= last 62 days)
``````Timestamp] >= days(-62)
``````
• current month until yesterday but not including today
``````[Timestamp] >= month(0) and [Timestamp] < day(0)
``````
• current year to date (from first day of the year to yesterday included)
``````[Timestamp] In year_to_date(0)
``````
• previous month to date (from first day of previous month to same day as today, last month)
``````[Timestamp] In month_to_date(-1)
``````

### Note on date/time data bounds¶

• `timestamp <= period(arg)` means all timestamps until the end of the period included
• `timestamp < period(arg)` means all timestamps until before the beginning of the period
• `timestamp >= period(arg)` means all timestamps since the beginning of the period included
• `timestamp > period(arg)` means all timestamps since after the end of the period

• `period(0)` means the current period (e.g. today)

• `period(-1)` means the previous period (e.g. yesterday), and so on.

• `period_to_date(0)` means the current period until yesterday (included). For `year_to_date(0)`, if today is `2018-06-20`, the range starts at `2018-01-01 00:00:00` and ends at `2018-06-19 23:59:59`

• `period_to_date(-1)` means from first day of previous period to same day as today in previous period. For `year_to_date(-1)`, if today is `2018-06-20`, the range starts at `2017-01-01 00:00:00` and ends at `2017-06-19 23:59:59`

### Available functions¶

• `minute(arg)`
• `hour(arg)`
• `day(arg)`
• `week(arg)`
• `week_to_date(arg)`
• `month(arg)`
• `month_to_date(arg)`
• `quarter(arg)`
• `quarter_to_date(arg)`
• `year(arg)`
• `year_to_date(arg)`

## String functions¶

### `like()`¶

Compares column values to a pattern. This function is case-sensitive for all the datasources (except for MySQL and SQLServer where it is case-insensitive).

Use ilike to be case-insensitive.

Parameters:

• `column`: column, formula or function
• `pattern`: Pattern to match. Can contain wildcards (`%` matches zero or more characters, `_` matches a single character).

Example:

``````# returns true if zip_code length is five and first two digits are "75"
like([zip_code], "75___")
# returns true if url contains the string "/product/"
like([url], "%/product/%")
``````

Availability: all data source types.

### `ilike()`¶

Compares column values to a pattern. This function is case-insensitive for all the datasources.

Parameters:

• `column`: column, formula or function
• `pattern`: Pattern to match. Can contain wildcards (`%` matches zero or more characters, `_` matches a single character).

Example:

``````# returns true if zip_code length is five and first two digits are "75"
ilike([zip_code], "75___")
# returns true if url contains the string "/product/" or "/PRODUCT/"
ilike([url], "%/product/%")
``````

Availability: all data source types.

### `contains()`¶

Searches a substring in a string. This function is case-sensitive for all the datasources (except for MySQL and SQLServer where it is case-insensitive).

Use icontains to be case-insensitive.

Parameters:

• `column`: column, formula or function
• `substring`: substring to search.

Example:

``````# returns true if zip_code length contains "75"
contains([zip_code], "75")
# returns true if url contains the string "product"
contains([url], "product")
``````

Availability: all data source types.

### `icontains()`¶

Searches a substring in a string. This function is case-insensitive for all the datasources.

Parameters:

• `column`: column, formula or function
• `substring`: substring to search.

Example:

``````# returns true if zip_code length is five and first two digits are "75"
icontains([zip_code], "75")
# returns true if url contains the string "product" or "PRODUCT"
icontains([url], "product")
``````

Availability: all data source types.

### `regexp_match()`¶

Returns True if a string matches a regular expression else returns False.

Parameters:

• `column`: column, formula or function.
• `pattern`: Regular expression to match.
``````# returns true if zip_code length is five and first two digits are "75" and last 3 digits are numbers
regexp_match([zip_code], "^75[0-9]{3}\$")
# details:
# ^ matches the begining of the string
# 75 matches the string "75"
# [0-9] matches a decimal between 0 and 9
# {3} indicates that [0-9] must be matched exactly three times
# \$ matches the end of the string

# returns true if a string is a date with format YYYY-MM-DD
regexp_match([date_str], "^[0-9]{4}-[0-9]{2}-[0-9]{2}\$")
``````

Availability: all data sources types except MS SQLServer.

### `left()`¶

Returns the n first characters of a string.

Parameters:

• `column`: column, formula or function
• `n`: number of characters to extract

Example:

``````left([MyColumn], 5)
``````

Availability: all data source types.

Returns the last n characters of a string.

Parameters:

• `column`: column, formula or function
• `n`: number of characters to extract

Example:

``````right([MyColumn], 5)
``````

Availability: all data source types.

### `str_len()`¶

Returns the length of a string.

Parameters:

• `column`: column, formula or function

Example:

``````str_len([MyColumn])
``````

Availability: all data source types.

### `lower_case()`¶

Returns `column` formatted in lower case.

Parameters:

• `column`: column, formula or function

Example:

``````lower_case([MyColumn])
``````

Availability: all data source types.

### `upper_case()`¶

Returns `column` formatted in upper case.

Parameters:

• `column`: column, formula or function

Example:

``````upper_case([MyColumn])
``````

Availability: all data source types.

### `lpad()`¶

Returns `column`, left-padded to length `nb_chars` characters with `padding_char`, replicated as many times as necessary.

Parameters:

• `column`: column, formula or function to pad
• `nb_chars`: length of output string
• `padding_char`: character to use for padding

Example:

``````lpad([MyColumn], 4, "0")
``````

### `rpad()`¶

Returns `column`, right-padded to length `nb_chars` characters with `padding_char`, replicated as many times as necessary.

Parameters:

• `column`: column, formula or function to pad
• `nb_chars`: length of output string
• `padding_char`: character to use for padding

Example:

``````rpad([MyColumn], 4, " ")
``````

### `split()`¶

Splits a string on the specified delimiter and returns the part at the specified position.

Parameters:

• `column`: column, formula or function
• `delimiter`: delimiter string
• `part`: position of the portion to return (counting from 1).

Example:

``````split([MyStringColumn], "/", 2)
``````

Availability: all data source types except Microsoft SQL Server.

### `substring()`¶

Extract a substring from column values.

Parameters:

• `column`: column, formula or function
• `first_char`: Index of first char extracted. Starts at 1.
• `nb_chars`: Length of extracted substring

Example:

``````substring([MyStringColumn], 1, 5)
``````

Availability: all data source types.

### `replace()`¶

Replace all occurrences of substring_from with substring_to.

Parameters:

• `column`: column, formula or function
• `substring_from`: Substring to replace
• `substring_to`: New substring

Example:

``````replace([MyStringColumn], "ABC", "X")
``````

Availability: all data source types.

## Conversion functions¶

### `float()`¶

Convert string column values to float values. If the string cannot be converted to a numeric value, the function will return an error.

If your input column contains string which are not representation of float values, you can create a conditional formula using `regexp_match([input_column],"[+-]?([0-9]*[.])?[0-9]+")`. If this is true, return `float([input_column])` else return a default value (either `null()` or `0`).

Parameters:

• `column`: column, formula or function

Example:

``````float([MyStringColumn])
``````

Availability: all datasources except files.

### `numeric()`¶

Convert string column values to numeric values. If the string cannot be converted to a numeric value, the result is 0.

Parameters:

• `column`: column, formula or function

Example:

``````numeric([MyStringColumn])
``````

Availability: only files.

### `str()`¶

Convert column values to string values.

Parameters:

• `column`: column, formula or function

Example:

``````str([MyColumn])
``````

Availability: all data source types.

### `round()`¶

Convert `column` values to their closest integer values. Float values are casted to integer values. Undefined values are casted to zero. For example, `round(3.1)` returns `3` and `round(3.9)` returns `4`.

Parameters:

• `column`: column, formula or function

Example:

``````round([MyColumn])
``````

Availability: all data source types.

### `to_int()`¶

Convert `column` values to integer values. Float values are casted to integer values. Undefined values are casted to zero.

`to_int` floors the value, i.e. it returns the largest integer less than or equal to the input value. Use `round` if you need to round a value to the closest integer number.

Returns 0 if the input value is NULL.

Parameters:

• `column`: column, formula or function

Example:

``````to_int([MyColumn])
``````

Availability: all data source types.

### `datetime_from_string()`¶

Convert a string to a Datetime.

The function will return an error if the passed string does not match the given format in a single row. If you are not sure about the quality of your data, use the function `regexp_match` to check the format. And if it doesn't match for a row, return `null()` instead of `datetime_from_string`.

Parameters:

• `string`: column, formula or function.
• `format`: the format of the input string to convert. If your datasource is a file, use the Python `strftime` format (see documentation strftime.org). If your datasource is a PostGreSQL table or a Serenytics storage, use the PostGreSQL syntax (see documentation postgresql.org. If your database is a MySQL table, use the MySQL syntax (see documentation dev.mysql.com). If your database is a BigQuery table, use the BigQuery syntax (see documentation cloud.google.com).

Example:

In this example, the input column contains string such as: `2013--01--23 21:17`

``````# for a file
datetime_from_string([date_str], "%Y--%m--%d %H:%M")

# for a PostGreSQL or a Serenytics storage
datetime_from_string([date_str], "YYYY--MM--DD HH24:MI")

# for a MySQL table
datetime_from_string([date_str], "%Y--%m--%d %H:%i")
``````

Availability: All source types except Microsoft SQL Server.

### `json_extract_path_text()`¶

Extract a value within a string representing a JSON. The parameters specify the path to the value. It returns the empty string if the path does not exist in the JSON.

Parameters:

• `string`: column, formula or function.
• `path1`: first key of the JSON path
• `[path2]`: second key of the JSON path
• up to 5 keys

Availability: Only Serenytics Storages, Redshift, PostGreSQL and MySQL.

Example:

``````# if the value is '{"first_name":"john", "last_name": "doe"}', returns 'john'.
json_extract_path_text([json_str], "fist_name")

# if the value is '{"first_name":"john", "last_name": "doe", "address": {"city" : "paris"}}'
# returns 'paris'.

# if the value is '{"first_name":"john", "last_name": "doe", "address": {"city" : "paris"}}'
# returns an empty string.
``````

## Ranking functions¶

### `is_first()`¶

Return 1 when current value appears for the first time in `column`, 0 otherwise. Can be used to work with unique values or remove doublons.

Parameters:

• `column`: column, formula or function

Example:

``````is_first([MyColumn])
``````

Availability: only files.

### `ntile()`¶

Sort rows according to `column`, split them in `nb_buckets` having the same number of rows and return the id of the bucket each row falls in. First bucket with smallest values has id 1.

Parameters:

• `column`: column, formula or function
• `nb_buckets`: number of buckets

Example:

``````ntile([MyColumn], 100)
``````

Availability: all data source types.

### `rank()`¶

Sort rows according to `column` in `order` (either "asc" or "desc") and return the rows ranking. If several rows have the same value, they get the same ranking and there will be a discontinuity in the result. If `partition_by`is specified, the ranking is computed for each value of this `column`.

Parameters:

• `column`: column, formula or function
• `order`: either `"asc"` or `"desc"`
• `partition_by`[optional]: column, formula or function

Example:

``````rank([MyColumn], "asc")
``````
``````# compute the orders rankings, by customer
rank([order_date], "asc", [order_customer_id])
``````

Availability: all data source types.

### `dense_rank()`¶

Sort rows according to `column` in `order` (either "asc" or "desc") and return the rows ranking. If several rows have the same value, they get the same ranking. The `dense_rank` function differs from `rank` in one respect: if two or more rows tie, there is no gap in the sequence of ranked values. For example, if two rows are ranked 1, the next rank is 2. If `partition_by`is specified, the ranking is computed for each value of this `column`.

Parameters:

• `column`: column, formula or function
• `order`: either `"asc"` or `"desc"`
• `partition_by`[optional]: column, formula or function

Example:

``````dense_rank([MyColumn], "desc")
``````
``````# compute the orders rankings, by customer
rank([order_date], "asc", [order_customer_id])
``````

Availability: all data source types.

### `quantile()`¶

Return the value from `column` at the quantile given by `fraction`.

Parameters:

• `column`: column, formula or function
• `fraction`: quantile fraction (e.g. 0.5 for the median)
• `interpolation`: interpolation method to use when the desired quantile lies between two data points. Choices are: `"linear"` or `"nearest"`

Example:

``````# compute median value:
quantile([MyColumn], 0.5, "linear")
``````

Availability: all data source types except MySQL.

## Value functions¶

### `sum()`¶

Compute the sum of `column`.

Parameters:

• `column`: column, formula or function

Example:

``````sum([MyColumn])
``````

### `sum_if()`¶

Compute the sum of `column` for rows where `condition` is true.

Parameters:

• `column`: column, formula or function
• `condition`: boolean expression

Example:

``````sum_if([column], [MyOtherColumn] == 3)
``````

### `avg(), avg_if()`¶

Return the average. See sum and sum_if for more details.

### `min(), min_if()`¶

Return the min. See sum and sum_if for more details.

### `max(), max_if()`¶

Return the max. See sum and sum_if for more details.

### `count(), count_if()`¶

Return the number of data rows. See sum and sum_if for more details.

### `dcount(), dcount_if()`¶

Return the distinct count (number of different values). See sum and sum_if for more details.

### `last_value()`¶

Get the last value of `value_column` when ordering by `order_column`.

Parameters:

• `value_column`: column, formula or function
• `order_column`: column, formula or function used to order the values of `value_column`

Example:

``````# get the most recent value of a time serie
last_value([value], [date])
``````

### `last_value_if()`¶

Get the last value of `value_column` for rows where `condition` is true when ordering by `order_column`.

Parameters:

• `value_column`: column, formula or function
• `order_column`: column, formula or function used to order the values of `value_column`
• `condition`: boolean expression

Example:

``````# get the last value in the previous month
last_value_if([value], [date], [date] in month(-1))
``````

## Data-quality functions¶

### `is_defined()`¶

Returns `False` if `column` is `NULL` or an empty string, otherwise returns `True`. Use `not is_defined([myCol])` to test if a column is not `NULL`.

Parameters:

• `column`: column, formula or function

Example:

``````is_defined([MyColumn])
``````

is_defined() in conditional formula in SQLServer data-sources

In a conditional formula in a SQLServer data-source, you have to use `is_defined_sqlserver_condition([MyColumn])` instead of `is_defined([MyColumn])`.

### `is_empty()`¶

Returns `True` if `column` is an empty string, `False` otherwise. Column must be of type `String`. Use `not is_empty([myCol])` to test if a column is not empty.

Parameters:

• `column`: column, formula or function

Example:

``````is_empty([MyColumn])
``````

## Misc¶

### `null()`¶

Returns the NULL value.

This can be used when you need to dcount values of a column, but you want to exclude some rows. You can create a conditional formula that will return `null()` for the rows to exclude. And then, apply your dcount on this conditional formula. The `null` rows won't be counted in the `dcount`.

To check if a column is NULL or empty, it's simpler to use the functions `is_defined` or `is_empty`.

Example:

Using a table with a list of purchases, you can count the distinct number of customers who has bought items with a price higher than 100 doing a `dcount` on the below formula. This will also work if you group the purchases by any dimensions (e.g. grouping by the category of products in a table widget).

``````IF [price]>100 THEN [client_id]
ELSE null()
``````