This guide explains the inner workings of Baserow formulas for developers.
See the understanding baserow formulas guide if you instead want a general guide of how to use formulas as a user within Baserow.
In Baserow there is a special formula field type. The user enters a single formula for a whole formula field which is then used to calculate every cell in the formula field.
Baserow formulas are written in the open source Baserow Formula language which is a simple expression based language similar to formulas you will find in other spreadsheet tools. The Baserow Formula language itself is a fully functioning programming language with a :
{git repo root}/formulas/*.g4
.{git repo root}/formulas/build.sh
for the script that generates these using
docker.web-frontend/modules/database/formula/parser
.backend/src/baserow/core/formulas/parser/generated
.backend/src/baserow/core/formulas/ast/tree.py
.backend/src/baserow/core/formulas/types
.backend/src/baserow/core/formulas/expression_generator
.A Formula is just a single expression which can consist of literals (string, int, decimal), functions, operators and nested formulas.
Functions are defined by implementing a BaserowFunctionDefinition and storing it in
the formula_function_registry
(which allows plugins to trivially add their own custom
functions)
Functions have specific or unlimited number of arguments. These arguments can be type checked and forced to have specific types otherwise an error is shown. The function itself has a return type.
A function can change its behaviour at type checking time, for example if a function is given two arguments of different types it could first choose to wrap them in a BaserowToText function call.
Functions define how to transform themselves into a Django Expression to calculate their result.
Plugins can easily add new Baserow formula functions and types by implementing
a BaserowFunctionDefinition
and registering it in the formula_function_registry
.
Hint: Use the various {Zero/One/Two/Three}ArgumentBaserowFunctionDefinition
sub-classes
get a nicer set of functions to implement corresponding to the arguments.
Operators are implemented as a mapping from an operator to a BaserowFunctionDefinition
. So the +
operator is just a fancy way of calling the BaserowAdd
function.
Operators have precedence as defined by the rule ordering in the BaserowParser.g4
grammar file.
Operators can have different implementations depending on the input types. For example
'a'+'b'
concatenates the two strings together, whereas 1+2
performs numeric
addition.
Formulas ultimately compile down to a prepared SQL statement which is used to
calculate and store the formula results in a PostgreSQL column.
To generate the SQL we transform a baserow formula into a Django Expression and then rely on Django to generate the SQL for us.
Formulas can reference other fields including other formula fields. They cannot reference themselves and circular references are disallowed.
Because formula fields can reference other fields, now whenever a field is edited, deleted, restored or created it might also affect other fields which depend on it. In these situations we construct the reference tree of all fields in a table, recalculate what each fields type is and if it has changed refresh that formula fields values.
Formula types can have their options override by user configurable formatting options.
For example:
1+1
is initially calculated to have the type of
BaserowFormulaNumberType(num_decimal_places=0)
.FormulaField
model by setting
its formula_type
field to number
, it’s num_decimal_places
field to 0
and
setting all other type option fields to null.1
.num_decimal_places
to 2
, which changes the
corresponding field on the model and also changes the type of that field to
BaserowFormulaNumberType(num_decimal_places=2)
.1.00
These user supplied type/formatting options will be reset when the overall type of a formula field changes, otherwise they will stick around.
There is an invalid type which stores an error on the FormulaField
model for formulas
which have an invalid type such as (1+'a')
etc.
Generally functions prefer to coerce types in sensible ways that a non-technical user
might expect. For example, CONCAT(field('a date field'),field('a boolean field'))
should work without having to use a function to cast each to text. * However we don’t
allow users to do odd things like compare a boolean to a date and instead provide a type
error rather than always returning false or something.
Renaming a field will rename any references to that field in a formula. This is achieved by the following process:
When deleting a field we:
'name'
which will then fix those
broken formulas.This also can happen when a field is restored from deletion or a field is renamed.
Importantly this means creating, restoring or renaming fields can cause any number of other fields in the same table to go from the invalid type to a valid type and hence we need to check and re-type the table in these situations.
Formula fields can be sorted and filtered using Baserow’s existing view filters based on
the BaserowFormulaType of the field. Simply using
the FormulaFieldType.compatible_with_formula_types
helper function when defining your view filters compatible_field_types
to say which of
the Formula Types work with your view filter.