Aggregates and Pre-Aggregation

Pre-aggregated data speeds up run-time query performance in two ways:

  • An aggregate can be created at a higher level than base data, thus allowing calculations to span fewer rows when running. This reduction in run-time row count is called compression. Generally a compression of at least 10 to 1 creates a meaningful increase in performance.

  • An aggregate can pre-join two large tables, allowing run-time queries to avoid expensive joins. In general, this is only useful for large table to large table joins (e.g., large fact to large dimension).

Aggregates come at a price:

  • Longer load times: The aggregates need to be calculated after every load of data. Load times can be mitigated using incremental refreshes of aggregates.

  • Storage: Aggregates take space and can bloat a database.

Aggregates can be layered. For example, if there is a fact table with 250 million records, you can create an aggregate that has 25 million records and another that has 200,000. Birst will automatically use the highest level aggregate available.

Query aggregation provides a very simple mechanism for creating and instantiating aggregates. The principal is simple: supply a logical query and Birst will instantiate that logical query as a physical table. Birst will then consider using this physical table to satisfy queries. Note that all column aliases in the logical query should be removed.

Consider the following logical query for the Northwind data set:

SELECT [Products.ProductName],[Time.Year/Quarter],[OrderDate: Sum: Quantity] FROM [ALL]

The next time the space is processed, Birst will create the physical table.

See Creating an Aggregate.

Query Navigation

To see how Birst decides what tables to pick, it is helpful to use the Navigate Query function. Query navigation enables you to identify reasons why a logical query may or may not be hitting an aggregate table as opposed to a base table. You can disable an aggregate (so that it is not used) and paste the query you are using for the aggregate into the Query Admin and click Navigate Query then compare those results with the results you get if you enable the aggregate. See Reading Query Navigation Results.

Joins to Dimension Tables

There are situations where a hybrid approach is preferred. Generally this is the case when there is a very wide dimension table and it is not desirable to duplicate the entire dimension within the aggregate. In this case, it is possible to allow the aggregate table to join to a dimension table. In order to do this, you must add the surrogate key for that dimension table to the query aggregate query. Contact your Infor representative to obtain the surrogate key of the table.

Security Filters

Aggregates also support data security filters. If a security filter is applied to a base table, the aggregate will pull any columns from the security filter into the aggregate (whether or not they are in the logical query). If, however, security filters are applied to columns that are below the grain of the logical query, then the aggregate may not filter correctly. In this case, make sure that any columns referenced in a security filter are also in the logical query.

Dynamic Parameters or Session Variables

As with security filters, if an aggregate is based on a custom formula that includes a dynamic parameter or a session variable, then the aggregate needs to compensate for that. Since the value of the formula is dynamic (based on values not known at the time the aggregate is built), the aggregate will pull all the component columns into the aggregate table and preserve the formula.

Partial Aggregates

In some cases a full aggregate is not needed. It is preferable to have a small aggregate that is very fast for recent data, but it is acceptable to have slower performance for much older data. For example, suppose we added a filter to the query aggregate to include data only from 2006 onward:

SELECT [Products.Products1249892458],[Time.Year/Quarter],[OrderDate: Sum: Quantity] FROM [ALL] WHERE [Time.Year/Quarter]>='2006/Q1'

Now, if we have a logical query where the filters are a subset of this data, then the query hits the aggregate. For example:

SELECT [Products.Products1249892458],[Time.Year/Quarter],[OrderDate: Sum: Quantity] FROM [ALL] WHERE [Time.Year/Quarter]>='2006/Q2'>

Processing Groups

Aggregates can be processed either at the end of the load or as part of a pre-defined processing group. In both cases, the aggregate will be executed at the end of processing ensuring that it hits the most recent data.

The processing group for an aggregate can be a processing group associated with a data source. Alternatively, it can be a separate processing group just for aggregates.

Tip: When you initially create an aggregate in Admin- Customize Space - Manage Aggregates, and you want to add it to a processing group, it must be an existing data processing group. Later you can change it to be part of an independent, aggregate-specific processing group.

To assign an aggregate to a processing group, provide the processing group name in the Add Aggregate dialog. See Creating an Aggregate.

The /REBUILD option can be used to drop the aggregate table and build the table entirely from scratch, as opposed to the normal process which adds new records to the existing aggregate table. To rebuild the aggregate from scratch, add /REBUILD to the end of the processing group name (with no spaces). For example:

MY_PROCESSING_GROUP/REBUILD

Incremental Aggregates

Birst supports the ability to add a logical filter as an incremental aggregate filter. During each load, this filter will be applied to a delete against the aggregate (that is, all records that satisfy the filter will be deleted). Then, an insert of new records will be applied for records that satisfy the filter. This gives you flexibility for incrementally updating an aggregate. It allows for cases where you may want to not only update today’s data, but you may want to update the aggregate for the last few weeks or months as hierarchies or values may result in changing of previously aggregated data.

Build Aggregates

A build filter adds an additional filter outside of the logical query to the aggregate process. For example, if you only want the aggregate to have data for the last year, you would use this filter. If you didn’t do this and added the filter to the aggregate query, any time you didn’t put a filter on a query, it would miss the aggregate.