Pre-Aggregation vs. Post-Aggregation Calculations

Fundamentally, there are two levels of calculation that occur in Birst:

Database executed: pre-aggregation or physical queries. This is used for custom measures and custom attributes. Report expressions can use pre-aggregation if the Push Down Expressions to Database feature is enabled. See Pre-Aggregate Report Expressions.
Note: The Push Down Expressions to Database setting is enabled by default.

Infor logical layer executed: post-aggregation or logical queries, using BQL. This is used for ETL scripts and report expressions.

For this reason, there are two forms of expressions in Birst. Because they serve different purposes in different contexts they must be different in some important ways. This is why the syntax that is used for report expressions, ETL scripts, and custom measures and attributes is not exactly the same.

Pre-Aggregation Calculations

Custom attributes and custom measures are “pre-aggregation” calculations. That means that expressions you enter there are turned into SQL and sent down to the database. This is commonly referred to as “function shipping” – where the load is pushed to the database. The database then can do the calculations row-by-row, aggregate the results and only return a highly aggregated, summarized result to the Birst logical layer.

Non-OLAP tools have you execute a query and then do all their calculations on the resulting data. But when you have large databases, it is completely infeasible to move millions or billions of records to the client for the purpose of aggregation and summary (commonly referred to as the “fire on the wire” problem). So with those tools, you have to hand craft SQL that does whatever pre-summarization you want. In general, you want volume calculations to be as close to the data at rest as possible (the disk or memory storage).

Because these non-OLAP tools require hand-crafting of SQL on large data sets, end-users cannot author reports with calculations in them. OLAP tools, such as Birst, were designed to let end-users assemble sets of columns that they want, and have the engine generate the queries on their behalf – making the process more scalable to end-users.

In addition, some calculations must be done row by row and cannot be done on aggregated results. Consider a “Sales” metric which is “Quantity*Price”. If you want to get total sales, you need to calculate this metric for each row, then sum it – the database has to calculate this. If you perform this calculation at the report level and get a sum of Quantity and multiply by the average of Price, you will get the wrong result.

When you type a formula for a custom attribute or measure, you are creating a pre-aggregation expression. These are different from post-aggregation expressions or logical queries because the columns you are referring to are physical columns in the raw tables, not logical constructs that may exist in many physical tables. There is no notion of a dimension or a measure in this form. A raw table is simply a bunch of columns. You can construct a definition of a measure or attribute with these raw columns.

The range of things a database can do is limited. And the database has no notion of joins, dimensions, or hierarchies. The Birst logical layer has all of that. So when you want to do things like positional calculations, ranks, lookups, and so on, those things must occur in the logical layer of Birstpost-aggregation. This means after the database does 99% of the heavy lifting summarizing your billion records into a 100 row result set, then logical expressions can manipulate those 100 rows to make even more complex calculations.

Post-Aggregation Calculations

In the world of logical queries, you have concepts like dimension and measure and position and hierarchy as well as the BQL which provides programmatic structures to do advanced manipulation of these constructs. In this world, the Birst engine does all of the calculations and you have 100% control over any manipulation of the data. The only downside is that it is data post-aggregation, meaning after the database has done its work. Logical queries that return result sets to the middle tier are restricted in their size because moving data out of the database in volume gets slow after a certain point. This is why Birst has set an upper limit to 100,000 records coming back from the database.

Let’s take a simple example of a custom metric [Sales] defined as “Quantity*Price”. Let’s also consider another base metric “Quantity” and the base attribute “Year”.

If you create the logical query below to calculate average unit price:

SELECT [Time.Year],[Sales]/[Quantity] AS 'Average Unit Price' FROM [ALL]

You can see both types of calculations: one that is being executed in the database and one that is executed in Birst.

That query gets transformed into physical SQL that looks something like:

SELECT DW_DM_TIME_DAY.Year$, SUM(FACT.quantity*FACT.price),SUM(quantity) FROM FACT INNER JOIN DW_DM_TIME_DAY on FACT.DayID=DW_DM_TIME_DAY.DayID

You can see that the calculation for “Sales” is compiled directly into SQL – so therefore the syntax that supports this is restricted only to things that can be sent to the database. The result set comes back and then Birst takes each row and divides the “Sales” value by the “Quantity” value to get an average unit price by year. The expression syntax at the logical layer has a lot more functions because you can use BQL functions.

ETL Services

The language for report-level (post-aggregation) expressions uses the same engine as the ETL scripting engine. Therefore, the BQL is used for ETL as well as report-level expressions. The only difference is that some statements only function when running in a script (WRITERECORD, for example) and others only are allowed when working in a report (lookups, for example). The reason there are restrictions is that some ETL statements make no sense in a report expression and some report-level expressions could (if used incorrectly) deteriorate script level performance. But the fact that they are the same thing allows you to embed scripts in reports. That’s how FunctionLookup, Transform, Let and some of the report-level scripting elements work and is what makes Birst's report-level capabilities far beyond what other BI/OLAP tools can handle.

Where possible, Birst makes the syntax for the two different types of expressions the same. For example, the operators (+,-,/,*) are the same. Some functions like DATEADD are the same syntax because underlying databases support that syntax.

The ability to do pre-aggregation expressions distinguishes Birst from non-OLAP products. The ability to do complex scripting at the logical layer distinguishes Birst from OLAP products.