# Data source formulas¶

Data source formulas are new data columns computed on-demand. There exists three types of such formulas:

• simple column
• conditional column
• value

## Examples¶

Let's say your data contains a column `[Price]`, a column `[Cost]` and a column `[Date]`. Here are some examples of what new columns you can create.

### Simple formulas¶

• Simple calculus
 `1` ```[Price] * 10 ```
• Using two columns:
 `1` ```[Price] / [Cost] * 100 ```
• With parentheses:
 `1` ```([Price] - [Cost]) / [Price] * 100 ```
• Operation on Date
 `1` ```extract_date_part([Date], "year") ```

### Conditional formulas¶

• Simple IF condition:
 `1` ```[Price] == 100 ```
• Using and operator:
 `1` ```[Price] > 10 and [Country] != "USA" ```
• Using or operator:
 `1` ```[Price] > 10 or [Cost] < 20 ```
• Using in operator
 `1` ```[Country] in ("USA", "Germany", "France") ```
• Using full dates:
 `1` ```[Date] > "2010-10-24" and [Date] < "2010-10-30" ```
• Using date parts:
 `1` ```extract_date_part([Date], "year") >= 2014 ```
• Using string comparison:
 `1` ```like([postal_code], "_____") ```

### Value formulas¶

• Simple total:
 `1` ```sum([Price]) ```
• Conditional total:
 `1` ```sum_if([Price], [Cost] > 10) ```
• Ratio:
 `1` ```sum_if([Price], [Cost] > 10) / sum([Price]) ```

### Formulas of formulas¶

You can use a formula within another formula. As an example, you can create a simple formula named "GOLD Product":

 `1` ```[Category] in ["Jewelry", "Watch", "Leather goods"] ```

and then use it in a second formula to compute the ratio of the revenue of these GOLD products within the total revenue:

 `1` ```sum_if([Price], [GOLD Product]) / sum[Price]) ```

### Using dashboard variables in a formula¶

In a dashboard, you can define dashboard variables in the global dashboard options menu: These variables can be used in datasource formulas. The value of a variable can be retrieved with the double bracket syntax `{{__myVar__}}`: ## Date-time functions¶

### `extract_date_part()`¶

Extract date part of a datetime column.

Parameters:

• `column`: column, formula or function
• `date_part`: can be `"year"`, `"month"`, `"quarter"`, `"week"`, `"day"`, `"weekday"`, `"hour"` or `"minute"`.
• `language`[optional]: can be `"en"`, `"fr"`, `"num-fr"`, `"num-en"`. Use this option to get a literal value instead of a numerical value (valid only for months). Using the `num` prefix makes the data sortable (e.g. 01-January, 02-February, ...).

Example:

 `1` ```extract_date_part([MyColumn], "week") ```

Availability: all data source types.

### `utc_to_timezone()`¶

Convert a column in UTC to the specified timezone.

Parameters:

• `column`: column, formula or function
• `timezone`: a valid timezone such as `"US/Eastern"` or `"Europe/Paris"`. See full list.

Example:

 `1` ```utc_to_timezone([MyColumn], "Europe/Paris") ```

Availability: all data source types.

### `date_diff()`¶

Return difference between 2 date-times in specified time unit. The result is an integer.

More precisely, `date_diff()` determines the number of unit of time boundaries that are crossed between the two datetimes. The result is an integer (e.g.: 1 for the difference in days between "2016-10-15 23:00:00" and "2016-10-16 03:00:00").

Parameters:

• `first_date`: column, formula or function
• `second_date`: column, formula or function
• `time_unit`[optional]: a valid time unit. Default: `"day"`. It can be one of: `"year"`, `"month"`, `"day"`, `"hour"`, `"minute"`, `"second"`.

Example:

 `1` ```date_diff([MyDateColumn1], [MyDateColumn2], "hour") ```

Availability: only MySQL, SQL Server, Redshift and Storage. On MySQL, the only `time_unit` available is the default one: `"day"`.

### `time_delta()`¶

Return difference between 2 date-times in specified time unit. The result is a float: e.g. 1.083 for the delta between "2016-10-15 14:00:00" and "2016-10-16 16:00:00"

Parameters:

• `first_date`: column, formula or function
• `second_date`: column, formula or function
• `time_unit`[optional]: a valid time unit. Default: `"day"`. It can be one of: `"day"`, `"hour"`, `"minute"`, `"second"`.

Example:

 `1` ```time_delta([MyDateColumn1], [MyDateColumn2], "hour") ```

Availability: All source types.

### `now()`¶

Return current date-time.

Availability: All source types.

### `date()`¶

Create a date from year, month and day data.

Parameters:

• `year`: column, formula or function that returns the year of the date to create.
• `month`[optional]: column, formula or function that returns the month of the date to create. Default: 1 (January).
• `day`[optional]: column, formula or function that returns the day of the date to create. Default: 1 (first day of the month).

Example:

 `1` ```date([YearColumn], [MonthColumn]) ```

Availability: All source types except Microsoft SQL Server.

### `date_add()`¶

Add a certain interval to a date.

Parameters:

• `myDate`: column, formula or function containing the initial date.
• `interval_type`: type of interval to add. It can be one of: `"day"`, `"hour"`, `"minute"`, `"second"`.
• `nb_intervals`: number of intervals to add. A negative value will substract the interval.

Example:

 ```1 2``` ```# to add 10 days to a given date date_add([myDate], "day", 10) ```

Availability: All source types except files and APIs.

## Date-time filtering¶

For date/time columns, you can use special functions to compute conditions.

### Examples¶

If your data contains a column `[Timestamp]`, you can filter it on:

• today
 `1` ```[Timestamp] in day(0) ```
• yesterday
 `1` ```[Timestamp] in day(-1) ```
• last 24 hours
 `1` ```[Timestamp] >= hour(-24) ```
• current month
 `1` ```[Timestamp] In month(0) ```
• last 2 months with a rolling window (= last 62 days)
 `1` ```Timestamp] >= days(-62) ```
• current month until yesterday but not including today
 `1` ```[Timestamp] >= month(0) and [Timestamp] < day(0) ```
• current year to date (from first day of the year to yesterday included)
 `1` ```[Timestamp] In year_to_date(0) ```
• previous month to date (from first day of previous month to same day as today, last month)
 `1` ```[Timestamp] In month_to_date(-1) ```

### Note on date/time data bounds¶

• `timestamp <= period(arg)` means all timestamps until the end of the period included
• `timestamp < period(arg)` means all timestamps until before the beginning of the period
• `timestamp >= period(arg)` means all timestamps since the beginning of the period included
• `timestamp > period(arg)` means all timestamps since after the end of the period

• `period(0)` means the current period (e.g. today)

• `period(-1)` means the previous period (e.g. yesterday), and so on.

• `period_to_date(0)` means the current period until yesterday (included). For `year_to_date(0)`, if today is `2018-06-20`, the range starts at `2018-01-01 00:00:00` and ends at `2018-06-19 23:59:59`

• `period_to_date(-1)` means from first day of previous period to same day as today in previous period. For `year_to_date(-1)`, if today is `2018-06-20`, the range starts at `2017-01-01 00:00:00` and ends at `2017-06-19 23:59:59`

### Available functions¶

• `minute(arg)`
• `hour(arg)`
• `day(arg)`
• `week(arg)`
• `week_to_date(arg)`
• `month(arg)`
• `month_to_date(arg)`
• `quarter(arg)`
• `quarter_to_date(arg)`
• `year(arg)`
• `year_to_date(arg)`

## String functions¶

### `like()`¶

Compare column values to pattern.

Parameters:

• `column`: column, formula or function
• `pattern`: Pattern to match. Can contain wildcards (`%` matches zero or more characters, `_` matches a single character).

Example:

 ```1 2 3 4``` ```# returns true if zip_code length is five and first two digits are "75" like([zip_code], "75___") # returns true if url contains the string "/product/" like([url], "%/product/%") ```

Availability: all data source types.

### `regexp_match()`¶

Return True if a string matches a regular expression else returns False.

Parameters:

• `column`: column, formula or function.
• `pattern`: Regular expression to match.
 ``` 1 2 3 4 5 6 7 8 9 10 11``` ```# returns true if zip_code length is five and first two digits are "75" and last 3 digits are numbers regexp_match([zip_code], "^75[0-9]{3}\$") # details: # ^ matches the begining of the string # 75 matches the string "75" # [0-9] matches a decimal between 0 and 9 # {3} indicates that [0-9] must be matched exactly three times # \$ matches the end of the string # returns true if a string is a date with format YYYY-MM-DD regexp_match([date_str], "^[0-9]{4}-[0-9]{2}-[0-9]{2}\$") ```

Availability: all data sources types except MS SQLServer.

### `left()`¶

Returns the n first characters of a string.

Parameters:

• `column`: column, formula or function
• `n`: number of characters to extract

Example:

 `1` ```left([MyColumn], 5) ```

Availability: all data source types.

Returns the last n characters of a string.

Parameters:

• `column`: column, formula or function
• `n`: number of characters to extract

Example:

 `1` ```right([MyColumn], 5) ```

Availability: all data source types.

### `str_len()`¶

Returns the length of a string.

Parameters:

• `column`: column, formula or function

Example:

 `1` ```str_len([MyColumn]) ```

Availability: all data source types.

### `lower_case()`¶

Returns `column` formatted in lower case.

Parameters:

• `column`: column, formula or function

Example:

 `1` ```lower_case([MyColumn]) ```

Availability: all data source types.

### `upper_case()`¶

Returns `column` formatted in upper case.

Parameters:

• `column`: column, formula or function

Example:

 `1` ```upper_case([MyColumn]) ```

Availability: all data source types.

### `lpad()`¶

Returns `column`, left-padded to length `nb_chars` characters with `padding_char`, replicated as many times as necessary.

Parameters:

• `column`: column, formula or function to pad
• `nb_chars`: length of output string
• `padding_char`: character to use for padding

Example:

 `1` ```lpad([MyColumn], 4, "0") ```

### `rpad()`¶

Returns `column`, right-padded to length `nb_chars` characters with `padding_char`, replicated as many times as necessary.

Parameters:

• `column`: column, formula or function to pad
• `nb_chars`: length of output string
• `padding_char`: character to use for padding

Example:

 `1` ```rpad([MyColumn], 4, " ") ```

### `split()`¶

Splits a string on the specified delimiter and returns the part at the specified position.

Parameters:

• `column`: column, formula or function
• `delimiter`: delimiter string
• `part`: position of the portion to return (counting from 1).

Example:

 `1` ```split([MyStringColumn], "/", 2) ```

Availability: all data source types except Microsoft SQL Server.

### `substring()`¶

Extract a substring from column values.

Parameters:

• `column`: column, formula or function
• `first_char`: Index of first char extracted. Starts at 1.
• `nb_chars`: Length of extracted substring

Example:

 `1` ```substring([MyStringColumn], 1, 5) ```

Availability: all data source types.

## Conversion functions¶

### `numeric()`¶

Convert string column values to numeric values. If the string cannot be converted to a numeric value, the result is 0.

Parameters:

• `column`: column, formula or function

Example:

 `1` ```numeric([MyStringColumn]) ```

Availability: only files.

### `str()`¶

Convert column values to string values.

Parameters:

• `column`: column, formula or function

Example:

 `1` ```str([MyColumn]) ```

Availability: all data source types.

### `round()`¶

Convert `column` values to their closest integer values. Float values are casted to integer values. Undefined values are casted to zero. For example, `round(3.1)` returns `3` and `round(3.9)` returns `4`.

Parameters:

• `column`: column, formula or function

Example:

 `1` ```round([MyColumn]) ```

Availability: all data source types.

### `to_int()`¶

Convert `column` values to integer values. Float values are casted to integer values. Undefined values are casted to zero.

`to_int` floors the value, i.e. it returns the largest integer less than or equal to the input value. Use `round` if you need to round a value to the closest integer number.

Parameters:

• `column`: column, formula or function

Example:

 `1` ```to_int([MyColumn]) ```

Availability: all data source types.

### `datetime_from_string()`¶

Convert a string to a Datetime.

The function will return an error if the passed string does not match the given format in a single row. If you are not sure about the quality of your data, use the function `regexp_match` to check the format. And if it doesn't match for a row, return `null()` instead of `datetime_from_string`.

Parameters:

• `string`: column, formula or function.
• `format`: the format of the input string to convert. If your datasource is a file, use the Python `strftime` format (see documentation strftime.org). If your datasource is a PostGreSQL table or a Serenytics storage, use the PostGreSQL syntax (see documentation postgresql.org. If your database is a MySQL table, use the MySQL syntax (see documentation dev.mysql.com).

Example:

In this example, the input column contains string such as: `2013--01--23 21:17`

 ```1 2 3 4 5 6 7 8``` ```# for a file datetime_from_string([date_str], "%Y--%m--%d %H:%M") # for a PostGreSQL or a Serenytics storage datetime_from_string([date_str], "YYYY--MM--DD HH24:MI") # for a MySQL table datetime_from_string([date_str], "%Y--%m--%d %H:%i") ```

Availability: All source types except Microsoft SQL Server.

### `json_extract_path_text()`¶

Extract a value within a string representing a JSON. The parameters specify the path to the value. It returns the empty string if the path does not exist in the JSON.

Parameters:

• `string`: column, formula or function.
• `path1`: first key of the JSON path
• `[path2]`: second key of the JSON path
• up to 5 keys

Availability: Only Serenytics Storages, Redshift and PostGreSQL.

Example:

 ``` 1 2 3 4 5 6 7 8 9 10``` ```# if the value is '{"first_name":"john", "last_name": "doe"}', returns 'john'. json_extract_path_text([json_str], "fist_name") # if the value is '{"first_name":"john", "last_name": "doe", "address": {"city" : "paris"}}' # returns 'paris'. json_extract_path_text([date_str], "address", "city") # if the value is '{"first_name":"john", "last_name": "doe", "address": {"city" : "paris"}}' # returns an empty string. json_extract_path_text([date_str], "address", "zipcode") ```

## Ranking functions¶

### `is_first()`¶

Return 1 when current value appears for the first time in `column`, 0 otherwise. Can be used to work with unique values or remove doublons.

Parameters:

• `column`: column, formula or function

Example:

 `1` ```is_first([MyColumn]) ```

Availability: only files.

### `ntile()`¶

Sort rows according to `column`, split them in `nb_buckets` having the same number of rows and return the id of the bucket each row falls in. First bucket with smallest values has id 1.

Parameters:

• `column`: column, formula or function
• `nb_buckets`: number of buckets

Example:

 `1` ```ntile([MyColumn], 100) ```

Availability: all data source types.

### `rank()`¶

Sort rows according to `column` in `order` (either "asc" or "desc") and return the rows ranking. If several rows have the same value, they get the same ranking and there will be a discontinuity in the result. If `partition_by`is specified, the ranking is computed for each value of this `column`.

Parameters:

• `column`: column, formula or function
• `order`: either `"asc"` or `"desc"`
• `partition_by`[optional]: column, formula or function

Example:

 `1` ```rank([MyColumn], "asc") ```
 ```1 2``` ```# compute the orders rankings, by customer rank([order_date], "asc", [order_customer_id]) ```

Availability: all data source types.

### `dense_rank()`¶

Sort rows according to `column` in `order` (either "asc" or "desc") and return the rows ranking. If several rows have the same value, they get the same ranking. The `dense_rank` function differs from `rank` in one respect: if two or more rows tie, there is no gap in the sequence of ranked values. For example, if two rows are ranked 1, the next rank is 2. If `partition_by`is specified, the ranking is computed for each value of this `column`.

Parameters:

• `column`: column, formula or function
• `order`: either `"asc"` or `"desc"`
• `partition_by`[optional]: column, formula or function

Example:

 `1` ```dense_rank([MyColumn], "desc") ```
 ```1 2``` ```# compute the orders rankings, by customer rank([order_date], "asc", [order_customer_id]) ```

Availability: all data source types.

### `quantile()`¶

Return the value from `column` at the quantile given by `fraction`.

Parameters:

• `column`: column, formula or function
• `fraction`: quantile fraction (e.g. 0.5 for the median)
• `interpolation`: interpolation method to use when the desired quantile lies between two data points. Choices are: `"linear"` or `"nearest"`

Example:

 ```1 2``` ```# compute median value: quantile([MyColumn], 0.5, "linear") ```

Availability: all data source types except MySQL.

## Value functions¶

### `sum()`¶

Compute the sum of `column`.

Parameters:

• `column`: column, formula or function

Example:

 `1` ```sum([MyColumn]) ```

### `sum_if()`¶

Compute the sum of `column` for rows where `condition` is true.

Parameters:

• `column`: column, formula or function
• `condition`: boolean expression

Example:

 `1` ```sum_if([column], [MyOtherColumn] == 3) ```

### `avg(), avg_if()`¶

Return the average. See sum and sum_if for more details.

### `min(), min_if()`¶

Return the min. See sum and sum_if for more details.

### `max(), max_if()`¶

Return the max. See sum and sum_if for more details.

### `count(), count_if()`¶

Return the number of data rows. See sum and sum_if for more details.

### `dcount(), dcount_if()`¶

Return the distinct count (number of different values). See sum and sum_if for more details.

### `last_value()`¶

Get the last value of `value_column` when ordering by `order_column`.

Parameters:

• `value_column`: column, formula or function
• `order_column`: column, formula or function used to order the values of `value_column`

Example:

 ```1 2``` ```# get the most recent value of a time serie last_value([value], [date]) ```

### `last_value_if()`¶

Get the last value of `value_column` for rows where `condition` is true when ordering by `order_column`.

Parameters:

• `value_column`: column, formula or function
• `order_column`: column, formula or function used to order the values of `value_column`
• `condition`: boolean expression

Example:

 ```1 2``` ```# get the last value in the previous month last_value_if([value], [date], [date] in month(-1)) ```

## Data-quality functions¶

### `is_defined()`¶

Returns `False` if `column` is `NULL` or an empty string, otherwise returns `True`. Use `not is_defined([myCol])` to test if a column is not `NULL`.

Parameters:

• `column`: column, formula or function

Example:

 `1` ```is_defined([MyColumn]) ```

### `is_empty()`¶

Returns `True` if `column` is an empty string, `False` otherwise. Column must be of type `String`. Use `not is_empty([myCol])` to test if a column is not empty.

Parameters:

• `column`: column, formula or function

Example:

 `1` ```is_empty([MyColumn]) ```

## Misc¶

### `null()`¶

Returns the NULL value.

This can be used when you need to dcount values of a column, but you want to exclude some rows. You can create a conditional formula that will return `null()` for the rows to exclude. And then, apply your dcount on this conditional formula. The `null` rows won't be counted in the `dcount`.

To check if a column is NULL or empty, it's simpler to use the functions `is_defined` or `is_empty`.

Example:

Using a table with a list of purchases, you can count the distinct number of customers who has bought items with a price higher than 100 doing a `dcount` on the below formula. This will also work if you group the purchases by any dimensions (e.g. grouping by the category of products in a table widget).

 ```1 2``` ```IF [price]>100 THEN [client_id] ELSE null() ```