Best practices

This section provides guidelines for using the pre-fetching Rules Engine.

Comments

Rules can have comments. There are two formats:
  • The single line format is //
  • The multiple line format is /* */

Avoid constant values

Previously, a rule like this was used to suppress consolidations on consolidated cells:

['Price']=C:0;

The problem with this construct is that a constant value is used. For users, the value 0 often equates to no value. But OLAP treats the 0 as a value, that is, it allocates memory for the value and starts calculating with it. A better solution is this:

['Price']=C:#NA;

In this case the result is the same, you still have no values, but OLAP understands that there are no values and does not start to work with them.

Avoid error results

Avoid all invalid references that return an #ERROR result. To remember that a cell contains an #ERROR value, OLAP allocates memory. This memory should be preserved.

You can do this by checking dimension boundaries when you are using DE.PREV, DE.NEXT, or DE.SIBLING functions in DB or GETATTR formulas. Additionally, check the existence of elements in the target dimension when you are mapping one dimension into another.

Here are some examples of how a check can be implemented in cases where you must access a value from a previous element:

IF( DE.INDEX( 'SOURCEDIM', !SOURCEDIM )=0, #NA , DB(…,DE.PREV( 'dim' , !dim),…))

IF( DE.INDEX( 'SOURCEDIM', !SOURCEDIM )>=1, DB(…,DE.PREV( 'dim' , !dim),…), #NA )

These rules apply:

[SOURCEDIM:'1st element',…]= #
[…] = DB(…,DE.PREV( 'dim' , !dim),…)

Avoid dynamic references

For best performance, do not use:

  • Variable hypercube names in external cube data references.
  • Variable dimension names in attribute data references.
  • Variable field names in attribute references.
  • Variable table IDs in attribute references.

Always use constant strings or constant numeric IDs.

Avoid unnecessary fixed members in cell references

The definition of a rule is always evaluated in the context of the cell that should be calculated. Therefore, you should omit in internal cell references all dimensions, except the ones where the context should change.

This is an example of bad practice:

['Actual','Q1','Starting Balance'] = ['Actual','January','Starting Balance'];

Instead, the rule should look like this:

['Actual','Q1','Starting Balance'] = ['January'];

Actual and Starting Balance should be omitted as they are taken from the context.

Use single-dimensional conditions

The determination of the exact source area is important for caching values for faster calculation. When you use IF statements, the source area is typically dynamic, dependent on the result of a logical expression. The more dimensions the IF condition includes, the more complex becomes the determination of the exact source area. Therefore, the goal is to keep the conditional expression of an IF statement one-dimensional.

In IF statements, it is normal to check data based on multiple dimensions: IF the year is equal to 2011 AND the month is equal to January, then one value shall be returned, or else another value. The AND operator creates a multidimensional condition.

To avoid this multidimensional condition, you can split an IF statement into multiple IF statements. For example:

[] = IF( !year = '2011' AND !month = 'January' , [Value1] , [Value2] );

This is not optimal because of the multidimensional AND condition. It should be written as:

[] = IF( !year = '2011' , IF( !month = 'January' , [Value1] , [Value2] ) , [Value2] );

This is more effective as it has two one-dimensional conditions.

Simplify rules

Rules can be simplified by mathematical transformations.

Example 1:

['Revenue'] = ['Nominal Revenue']-['Discount Percent']/100*['Nominal Revenue']

Here the value ['Nominal Revenue'] is listed twice. You can transform this into:

['Revenue'] = ['Nominal Revenue'] * (1 -['Discount Percent']/100)

Where the ['Nominal Revenue'] is moved in front of the brackets to be multiplied with each of the parts inside the brackets. Because of this change, the ['Nominal Revenue'] is used only once in the formula.

Example 2:

['Payment'] = IF(!Months='December', ['Salary']*['Bonus'], ['Salary'])

Again, one element is used twice in the definition. Move it out of the IF statement to get the same result:

['Payment'] = ['Salary'] * IF(!Months='December', ['Bonus'], 1)

It is more efficient to write several basic rules than one complex rule.

Pre-fetch values

In certain situations, it cannot be determined in advance which values need to be retrieved. The creator of rules can help to pre-fetch those values that might be needed in advance of the calculation. For example, if you have this rule:

['Account 1'] = B:
IF(['Account 2'] < 1000],
   ['Account 3a'],
   IF(['Account 2'] < 1000000,
      ['Account 3b'],
      ['Account 3c']
   )
)
)
+ #NA * (['Account 3a']+['Account 3b']+['Account 3c'])

A base level cell on Account 1 receives either the value of Account 3a, Account 3b, or Account 3c based on the value of Account 2.

If a billion base values must be calculated using this rule, then it should try to get all necessary source values for the calculation in one step. But this is not easily possible, as different cells should be taken dependent on the value in Account 2. Today, all values for Account 2 are fetched in one step. Based on the evaluation of Account 2, it fetches the values either from Account 3a, Account 3b or Account 3c one by one. This can be inefficient, depending on the kind of cells behind Account 3a-c. Basically, there can be two situations, or a mixture of both:

  1. Account 3a-c are low-cost calculated cells and base cells which require no calculation. In this case the cost of returning the values is low in terms of calculation time needed.
  2. Account 3a-c are calculated cells (complex calculations, expensive in terms of calculation time needed). In this case it depends on the underlying calculation if the calculation is expensive or low-cost.

If the retrieval of the cells is low-cost, then it is efficient to retrieve the values for Account 3a-c together with Account 2 in one quick step. This is done be adding a dummy operation that does not influence the result:

['Account 1'] = B:
IF(['Account 2'] < 1000],
   ['Account 3a'],
   IF(['Account 2'] < 1000000,
      ['Account 3b'],
      ['Account 3c']
   )
)
+ #NA * (['Account 3a']+['Account 3b']+['Account 3c'])

The only difference to the initial rule is in the last line. Here the values Account 3a, Account 3b, and Account 3c are added up and then multiplied by #NA. This does not change the overall result of the cell. But it helps especially in the case that the formula needs to calculate with values that are stored in the OLAP cube, as the OLAP has a fast algorithm to fetch the values from the cube.

The same pre-fetching idea might speed up the system in the case of low-cost rule calculation, while it might decrease the performance in case of complicated that got referenced.

This also applies to the STET rules function:

'Account 1']=B:
IF(
   ['Account 2']>0, 
   ['Account 2']/12, 
   STET
) 
+ #NA*STET

Currently, OLAP is unable to detect this situation and do this optimization by itself.

Use consolidations instead of rules

Calculations can be defined as consolidations inside a dimension or as a rule. It is more efficient to define a calculation as a consolidation inside a dimension than as a rule.

Where possible use consolidations instead of rules. For example, use the consolidation inside the dimension:

Q1
   January
   February
   March

Instead of:

Q1 = January + February + March