Skip to content

Data source formulas

When you add a formula on a datasource, you must choose its type among:

  • simple column
  • conditional column
  • value

A simple column or a conditional column is a new "virtual" column which is added to your datasource. Then, you can use this formula exactly like a standard column of your datasource, in your dashboards or in your data preparation steps.

A value is not a new column, it is a single computed value (usually a single KPI such as the number of sales in the current month).

It is possible to use a formula within another formula. We advise you to create small and simple formulas and then to combine them. This makes it easier to check the results of your formulas.

Examples

Let's say your data contains a column [Price], a column [Cost] and a column [Date]. Here are some examples of what new columns you can create.

Simple formulas

  • Simple calculus
[Price] * 10
  • Using two columns:
[Price] / [Cost] * 100
  • With parentheses:
([Price] - [Cost]) / [Price] * 100
  • Operation on Date
extract_date_part([Date], "year")

Conditional formulas

  • Simple IF condition:
[Price] == 100
  • Using and operator:
[Price] > 10 and [Country] != "USA"
  • Using or operator:
[Price] > 10 or [Cost] < 20
  • Using in operator
[Country] in ("USA", "Germany", "France")
  • Using not operator
not ([Country] in ("USA", "Germany", "France"))
  • Using full dates:
[Date] > "2010-10-24" and [Date] < "2010-10-30"
  • Using date parts:
extract_date_part([Date], "year") >= 2014
  • Using string comparison:
like([postal_code], "_____")

Value formulas

  • Simple total:
sum([Price])
  • Conditional total:
sum_if([Price], [Cost] > 10)
  • Ratio:
sum_if([Price], [Cost] > 10) / sum([Price])

Formulas of formulas

You can use a formula within another formula. As an example, you can create a simple formula named "GOLD Product":

[Category] in ["Jewelry", "Watch", "Leather goods"]

and then use it in a second formula to compute the ratio of the revenue of these GOLD products within the total revenue:

sum_if([Price], [GOLD Product]) / sum[Price])

Using dashboard variables in a formula

In a dashboard, you can define dashboard variables in the global dashboard options menu:

Global filter example with template

These variables can be used in datasource formulas. The value of a variable can be retrieved with the double bracket syntax {{__myVar__}}. Here is an example:

Global filter example with template

See here for more details about using variables in datasource formulas.

Date-time functions

extract_date_part()

Extract date part of a datetime column.

Parameters:

  • column: column, formula or function
  • date_part: can be "year", "month", "quarter", "week", "day", "dayofyear", "weekday", "hour" or "minute".
  • language[optional]: can be "en", "fr", "num-fr", "num-en". Use this option to get a literal value instead of a numerical value (valid only for month and weekday). Using the num prefix makes the data sortable (e.g. 01-January, 02-February, ...).

Details:

  • For day: the function returns the day of the month (eg. for "2022-11-25", it returns 25).
  • For dayofweek: the function returns the day rank since the beginning of the year (eg. for "2022-2-25", it returns 56 as the 25th of February if the 56th days of the year 2022).
  • For weekday: monday is day 0 and sunday is day 6.
  • For week: a week starts the monday. Days before the first monday of the year are in week 0.

Example:

extract_date_part([MyColumn], "week")

Availability: all data source types.


first_day_of_week()

Extract the first day of a week.

Parameters :

  • column: column, formula or function

Example :

first_day_of_week([MyColumn])

Availability : all data types (in SQL, only available on PostGreSQL servers).


utc_to_timezone()

Convert a column in UTC to the specified timezone.

Parameters:

  • column: column, formula or function
  • timezone: a valid timezone such as "US/Eastern" or "Europe/Paris". See full list.

Example:

utc_to_timezone([MyColumn], "Europe/Paris")

Availability: all data source types.


date_diff()

Return difference between 2 date-times in specified time unit. The result is an integer.

More precisely, date_diff() determines the number of unit of time boundaries that are crossed between the two datetimes. The result is an integer (e.g.: 1 for the difference in days between "2016-10-15 23:00:00" and "2016-10-16 03:00:00").

Parameters:

  • first_date: column, formula or function
  • second_date: column, formula or function
  • time_unit[optional]: a valid time unit. Default: "day". It can be one of: "year", "month", "day", "hour", "minute", "second".

Example:

date_diff([MyDateColumn1], [MyDateColumn2], "hour")

Availability: only MySQL, SQL Server, AWS Redshift and AWS Redshift Storage. On MySQL, the only time_unit available is the default one: "day".


time_delta()

Return difference between 2 date-times in specified time unit. The result is a float: e.g. 1.083 for the delta between "2016-10-15 14:00:00" and "2016-10-16 16:00:00"

Parameters:

  • first_date: column, formula or function
  • second_date: column, formula or function
  • time_unit[optional]: a valid time unit. Default: "day". It can be one of: "day", "hour", "minute", "second".

Example:

time_delta([MyDateColumn1], [MyDateColumn2], "hour")

Availability: All source types.


now()

Return current date-time (in UTC).

Availability: All source types.


date()

Create a date from year, month and day data.

Parameters:

  • year: column, formula or function that returns the year of the date to create.
  • month[optional]: column, formula or function that returns the month of the date to create. Default: 1 (January).
  • day[optional]: column, formula or function that returns the day of the date to create. Default: 1 (first day of the month).

Example:

date([YearColumn], [MonthColumn])

Availability: All source types except Microsoft SQL Server.


date_add()

Add a certain interval to a date.

Parameters:

  • myDate: column, formula or function containing the initial date.
  • interval_type: type of interval to add. It can be one of: "day", "hour", "minute", "second".
  • nb_intervals: number of intervals to add. A negative value will substract the interval.

Example:

# to add 10 days to a given date
date_add([myDate], "day", 10)

Availability: All source types except. For files and APIs, only "day" is accepted.


Date-time filtering

For date/time columns, you can use special functions to compute conditions.

Examples

If your data contains a column [Timestamp], you can filter it on:

  • today
[Timestamp] in day(0)
  • yesterday
[Timestamp] in day(-1)
  • last 24 hours
[Timestamp] >= hour(-24)
  • current month
[Timestamp] In month(0)
  • last 2 months with a rolling window (= last 62 days)
Timestamp] >= days(-62)
  • current month until yesterday but not including today
[Timestamp] >= month(0) and [Timestamp] < day(0)
  • current year to date (from first day of the year to yesterday included)
[Timestamp] In year_to_date(0)
  • previous month to date (from first day of previous month to same day as today, last month)
[Timestamp] In month_to_date(-1)

Note on date/time data bounds

  • timestamp <= period(arg) means all timestamps until the end of the period included
  • timestamp < period(arg) means all timestamps until before the beginning of the period
  • timestamp >= period(arg) means all timestamps since the beginning of the period included
  • timestamp > period(arg) means all timestamps since after the end of the period

  • period(0) means the current period (e.g. today)

  • period(-1) means the previous period (e.g. yesterday), and so on.

  • period_to_date(0) means the current period until yesterday (included). For year_to_date(0), if today is 2018-06-20, the range starts at 2018-01-01 00:00:00 and ends at 2018-06-19 23:59:59

  • period_to_date(-1) means from first day of previous period to same day as today in previous period. For year_to_date(-1), if today is 2018-06-20, the range starts at 2017-01-01 00:00:00 and ends at 2017-06-19 23:59:59

Available functions

  • minute(arg)
  • hour(arg)
  • day(arg)
  • week(arg)
  • week_to_date(arg)
  • month(arg)
  • month_to_date(arg)
  • quarter(arg)
  • quarter_to_date(arg)
  • year(arg)
  • year_to_date(arg)

String functions

like()

Compares column values to a pattern. This function is case-sensitive for all the datasources (except for MySQL and SQLServer where it is case-insensitive).

Use ilike to be case-insensitive.

Parameters:

  • column: column, formula or function
  • pattern: Pattern to match. Can contain wildcards (% matches zero or more characters, _ matches a single character).

Example:

# returns true if zip_code length is five and first two digits are "75"
like([zip_code], "75___")
# returns true if url contains the string "/product/"
like([url], "%/product/%")

Availability: all data source types.


ilike()

Compares column values to a pattern. This function is case-insensitive for all the datasources.

Parameters:

  • column: column, formula or function
  • pattern: Pattern to match. Can contain wildcards (% matches zero or more characters, _ matches a single character).

Example:

# returns true if zip_code length is five and first two digits are "75"
ilike([zip_code], "75___")
# returns true if url contains the string "/product/" or "/PRODUCT/"
ilike([url], "%/product/%")

Availability: all data source types.


contains()

Searches a substring in a string. This function is case-sensitive for all the datasources (except for MySQL and SQLServer where it is case-insensitive).

Use icontains to be case-insensitive.

Parameters:

  • column: column, formula or function
  • substring: substring to search.

Example:

# returns true if zip_code length contains "75"
contains([zip_code], "75")
# returns true if url contains the string "product"
contains([url], "product")

Availability: all data source types.


icontains()

Searches a substring in a string. This function is case-insensitive for all the datasources.

Parameters:

  • column: column, formula or function
  • substring: substring to search.

Example:

# returns true if zip_code length is five and first two digits are "75"
icontains([zip_code], "75")
# returns true if url contains the string "product" or "PRODUCT"
icontains([url], "product")

Availability: all data source types.


regexp_match()

Returns True if a string matches a regular expression else returns False.

Parameters:

  • column: column, formula or function.
  • pattern: Regular expression to match.
# returns true if zip_code length is five and first two digits are "75" and last 3 digits are numbers
regexp_match([zip_code], "^75[0-9]{3}$")
# details:
# ^ matches the begining of the string
# 75 matches the string "75"
# [0-9] matches a decimal between 0 and 9
# {3} indicates that [0-9] must be matched exactly three times
# $ matches the end of the string

# returns true if a string is a date with format YYYY-MM-DD
regexp_match([date_str], "^[0-9]{4}-[0-9]{2}-[0-9]{2}$")

Availability: all data sources types except MS SQLServer.


left()

Returns the n first characters of a string.

Parameters:

  • column: column, formula or function
  • n: number of characters to extract

Example:

left([MyColumn], 5)

Availability: all data source types.


Returns the last n characters of a string.

Parameters:

  • column: column, formula or function
  • n: number of characters to extract

Example:

right([MyColumn], 5)

Availability: all data source types.


str_len()

Returns the length of a string.

Parameters:

  • column: column, formula or function

Example:

str_len([MyColumn])

Availability: all data source types.


lower_case()

Returns column formatted in lower case.

Parameters:

  • column: column, formula or function

Example:

lower_case([MyColumn])

Availability: all data source types.


upper_case()

Returns column formatted in upper case.

Parameters:

  • column: column, formula or function

Example:

upper_case([MyColumn])

Availability: all data source types.


lpad()

Returns column, left-padded to length nb_chars characters with padding_char, replicated as many times as necessary.

Parameters:

  • column: column, formula or function to pad
  • nb_chars: length of output string
  • padding_char: character to use for padding

Example:

lpad([MyColumn], 4, "0")

rpad()

Returns column, right-padded to length nb_chars characters with padding_char, replicated as many times as necessary.

Parameters:

  • column: column, formula or function to pad
  • nb_chars: length of output string
  • padding_char: character to use for padding

Example:

rpad([MyColumn], 4, " ")

split()

Splits a string on the specified delimiter and returns the part at the specified position.

Parameters:

  • column: column, formula or function
  • delimiter: delimiter string
  • part: position of the portion to return (counting from 1).

Example:

split([MyStringColumn], "/", 2)

Availability: all data source types except Microsoft SQL Server.


substring()

Extract a substring from column values.

Parameters:

  • column: column, formula or function
  • first_char: Index of first char extracted. Starts at 1.
  • nb_chars: Length of extracted substring

Example:

substring([MyStringColumn], 1, 5)

Availability: all data source types.


replace()

Replace all occurrences of substring_from with substring_to.

Parameters:

  • column: column, formula or function
  • substring_from: Substring to replace
  • substring_to: New substring

Example:

replace([MyStringColumn], "ABC", "X")

Availability: all data source types.


Conversion functions

float()

Convert string column values to float values. If the string cannot be converted to a numeric value, the function will return an error.

If your input column contains string which are not representation of float values, you can create a conditional formula using regexp_match([input_column],"[+-]?([0-9]*[.])?[0-9]+"). If this is true, return float([input_column]) else return a default value (either null() or 0).

Parameters:

  • column: column, formula or function

Example:

float([MyStringColumn])

Availability: all datasources except files.


numeric()

Convert string column values to numeric values. If the string cannot be converted to a numeric value, the result is 0.

Parameters:

  • column: column, formula or function

Example:

numeric([MyStringColumn])

Availability: only files.


str()

Convert column values to string values.

Parameters:

  • column: column, formula or function

Example:

str([MyColumn])

Availability: all data source types.


round()

Convert column values to their closest integer values. Float values are casted to integer values. Undefined values are casted to zero. For example, round(3.1) returns 3 and round(3.9) returns 4.

Parameters:

  • column: column, formula or function

Example:

round([MyColumn])

Availability: all data source types.


to_int()

Convert column values to integer values. Float values are casted to integer values. Undefined values are casted to zero.

to_int floors the value, i.e. it returns the largest integer less than or equal to the input value. Use round if you need to round a value to the closest integer number.

Returns 0 if the input value is NULL.

Parameters:

  • column: column, formula or function

Example:

to_int([MyColumn])

Availability: all data source types.


datetime_from_string()

Convert a string to a Datetime.

The function will return an error if the passed string does not match the given format in a single row. If you are not sure about the quality of your data, use the function regexp_match to check the format. And if it doesn't match for a row, return null() instead of datetime_from_string.

Parameters:

  • string: column, formula or function.
  • format: the format of the input string to convert. If your datasource is a file, use the Python strftime format (see documentation strftime.org). If your datasource is a PostGreSQL table or a Serenytics storage, use the PostGreSQL syntax (see documentation postgresql.org. If your database is a MySQL table, use the MySQL syntax (see documentation dev.mysql.com). If your database is a BigQuery table, use the BigQuery syntax (see documentation cloud.google.com).

Example:

In this example, the input column contains string such as: 2013--01--23 21:17

# for a file
datetime_from_string([date_str], "%Y--%m--%d %H:%M")

# for a PostGreSQL or a Serenytics storage
datetime_from_string([date_str], "YYYY--MM--DD HH24:MI")

# for a MySQL table
datetime_from_string([date_str], "%Y--%m--%d %H:%i")

Availability: All source types except Microsoft SQL Server.


json_extract_path_text()

Extract a value within a string representing a JSON. The parameters specify the path to the value. It returns the empty string if the path does not exist in the JSON.

Parameters:

  • string: column, formula or function.
  • path1: first key of the JSON path
  • [path2]: second key of the JSON path
  • up to 5 keys

Availability: Only Serenytics Storages, Redshift, PostGreSQL and MySQL.

Example:

# if the value is '{"first_name":"john", "last_name": "doe"}', returns 'john'.
json_extract_path_text([json_str], "fist_name")

# if the value is '{"first_name":"john", "last_name": "doe", "address": {"city" : "paris"}}'
# returns 'paris'.
json_extract_path_text([date_str], "address", "city")

# if the value is '{"first_name":"john", "last_name": "doe", "address": {"city" : "paris"}}'
# returns an empty string.
json_extract_path_text([date_str], "address", "zipcode")

Ranking functions

is_first()

Return 1 when current value appears for the first time in column, 0 otherwise. Can be used to work with unique values or remove doublons.

Parameters:

  • column: column, formula or function

Example:

is_first([MyColumn])

Availability: only files.


ntile()

Sort rows according to column, split them in nb_buckets having the same number of rows and return the id of the bucket each row falls in. First bucket with smallest values has id 1.

Parameters:

  • column: column, formula or function
  • nb_buckets: number of buckets

Example:

ntile([MyColumn], 100)

Availability: all data source types.


rank()

Sort rows according to column in order (either "asc" or "desc") and return the rows ranking. If several rows have the same value, they get the same ranking and there will be a discontinuity in the result. If partition_byis specified, the ranking is computed for each value of this column.

Parameters:

  • column: column, formula or function
  • order: either "asc" or "desc"
  • partition_by[optional]: column, formula or function

Example:

rank([MyColumn], "asc")
# compute the orders rankings, by customer
rank([order_date], "asc", [order_customer_id])

Availability: all data source types.


dense_rank()

Sort rows according to column in order (either "asc" or "desc") and return the rows ranking. If several rows have the same value, they get the same ranking. The dense_rank function differs from rank in one respect: if two or more rows tie, there is no gap in the sequence of ranked values. For example, if two rows are ranked 1, the next rank is 2. If partition_byis specified, the ranking is computed for each value of this column.

Parameters:

  • column: column, formula or function
  • order: either "asc" or "desc"
  • partition_by[optional]: column, formula or function

Example:

dense_rank([MyColumn], "desc")
# compute the orders rankings, by customer
rank([order_date], "asc", [order_customer_id])

Availability: all data source types.


Value functions

sum()

Compute the sum of column.

Parameters:

  • column: column, formula or function

Example:

sum([MyColumn])

sum_if()

Compute the sum of column for rows where condition is true.

Parameters:

  • column: column, formula or function
  • condition: boolean expression

Example:

sum_if([column], [MyOtherColumn] == 3)

avg(), avg_if()

Return the average. See sum and sum_if for more details.

min(), min_if()

Return the min. See sum and sum_if for more details.

max(), max_if()

Return the max. See sum and sum_if for more details.

count(), count_if()

Return the number of data rows. See sum and sum_if for more details.

dcount(), dcount_if()

Return the distinct count (number of different values). See sum and sum_if for more details.

quantile_if()

Return the value from column at the quantile given by fraction, with a condition

Parameters:

  • column: column, formula or function
  • fraction: quantile fraction (e.g. 0.5 for the median)
  • interpolation: interpolation method to use when the desired quantile lies between two data points. Choices are: "linear" or "nearest"
  • condition: condition to apply before the computation is done

Example:

# compute median value:
quantile_if([MyColumn], 0.5, "linear", [other_col]=="PROD")

Availability: only files and API.


last_value()

Get the last value of value_column when ordering by order_column.

Parameters:

  • value_column: column, formula or function
  • order_column: column, formula or function used to order the values of value_column

Example:

# get the most recent value of a time serie
last_value([value], [date])

last_value_if()

Get the last value of value_column for rows where condition is true when ordering by order_column.

Parameters:

  • value_column: column, formula or function
  • order_column: column, formula or function used to order the values of value_column
  • condition: boolean expression

Example:

# get the last value in the previous month
last_value_if([value], [date], [date] in month(-1))

Data-quality functions

is_defined()

Returns False if column is NULL or an empty string, otherwise returns True. Use not is_defined([myCol]) to test if a column is not NULL.

Parameters:

  • column: column, formula or function

Example:

is_defined([MyColumn])

is_defined() in conditional formula in SQLServer data-sources

In a conditional formula in a SQLServer data-source, you have to use is_defined_sqlserver_condition([MyColumn]) instead of is_defined([MyColumn]).


is_empty()

Returns True if column is an empty string, False otherwise. Column must be of type String. Use not is_empty([myCol]) to test if a column is not empty.

Parameters:

  • column: column, formula or function

Example:

is_empty([MyColumn])

Misc

null()

Returns the NULL value.

This can be used when you need to dcount values of a column, but you want to exclude some rows. You can create a conditional formula that will return null() for the rows to exclude. And then, apply your dcount on this conditional formula. The null rows won't be counted in the dcount.

To check if a column is NULL or empty, it's simpler to use the functions is_defined or is_empty.

Example:

Using a table with a list of purchases, you can count the distinct number of customers who has bought items with a price higher than 100 doing a dcount on the below formula. This will also work if you group the purchases by any dimensions (e.g. grouping by the category of products in a table widget).

IF [price]>100 THEN [client_id]
ELSE null()