Best practices
This section provides guidelines for using the pre-fetching Rules Engine.
Comments
Rules can have comments. There are two formats:- The single line format is //
- The multiple line format is /* */
Avoid constant values
Previously, a rule like this was used to suppress consolidations on consolidated cells:
['Price']=C:0;
The problem with this construct is that a constant value is used. For users, the value 0 often equates to no value. But OLAP treats the 0 as a value, that is, it allocates memory for the value and starts calculating with it. A better solution is this:
['Price']=C:#NA;
In this case the result is the same, you still have no values, but OLAP understands that there are no values and does not start to work with them.
Avoid error results
Avoid all invalid references that return an #ERROR result. To remember that a cell contains an #ERROR value, OLAP allocates memory. This memory should be preserved.
You can do this by checking dimension boundaries when you are using
DE.PREV
, DE.NEXT
, or
DE.SIBLING
functions in DB
or
GETATTR
formulas. Additionally, check the existence of elements
in the target dimension when you are mapping one dimension into another.
Here are some examples of how a check can be implemented in cases where you must access a value from a previous element:
IF( DE.INDEX( 'SOURCEDIM', !SOURCEDIM )=0, #NA , DB(…,DE.PREV( 'dim' ,
!dim),…))
IF( DE.INDEX( 'SOURCEDIM', !SOURCEDIM )>=1, DB(…,DE.PREV( 'dim' ,
!dim),…), #NA )
These rules apply:
[SOURCEDIM:'1st element',…]= #
[…] = DB(…,DE.PREV( 'dim' , !dim),…)
Avoid dynamic references
For best performance, do not use:
- Variable hypercube names in external cube data references.
- Variable dimension names in attribute data references.
- Variable field names in attribute references.
- Variable table IDs in attribute references.
Always use constant strings or constant numeric IDs.
Avoid unnecessary fixed members in cell references
The definition of a rule is always evaluated in the context of the cell that should be calculated. Therefore, you should omit in internal cell references all dimensions, except the ones where the context should change.
This is an example of bad practice:
['Actual','Q1','Starting Balance'] = ['Actual','January','Starting
Balance'];
Instead, the rule should look like this:
['Actual','Q1','Starting Balance'] = ['January'];
Actual
and Starting Balance
should be
omitted as they are taken from the context.
Use single-dimensional conditions
The determination of the exact source area is important for caching values for
faster calculation. When you use IF
statements,
the source area is typically dynamic, dependent on the result of a logical
expression. The more dimensions the IF
condition
includes, the more complex becomes the determination of the exact source area.
Therefore, the goal is to keep the conditional expression of an IF
statement one-dimensional.
In IF
statements, it is normal to check data
based on multiple dimensions: IF
the year is equal
to 2011 AND
the month is equal to January, then
one value shall be returned, or else another value. The AND
operator creates a multidimensional condition.
To avoid this multidimensional condition, you can split an IF
statement into multiple IF
statements. For example:
[] = IF( !year = '2011' AND !month = 'January' , [Value1] , [Value2]
);
This is not optimal because of the multidimensional AND
condition. It should be written as:
[] = IF( !year = '2011' , IF( !month = 'January' , [Value1] , [Value2] ) ,
[Value2] );
This is more effective as it has two one-dimensional conditions.
Simplify rules
Rules can be simplified by mathematical transformations.
Example 1:
['Revenue'] = ['Nominal Revenue']-['Discount Percent']/100*['Nominal
Revenue']
Here the value ['Nominal Revenue']
is listed twice. You can
transform this into:
['Revenue'] = ['Nominal Revenue'] * (1 -['Discount
Percent']/100)
Where the ['Nominal Revenue']
is moved in
front of the brackets to be multiplied with each of the parts inside the brackets.
Because of this change, the ['Nominal Revenue']
is
used only once in the formula.
Example 2:
['Payment'] = IF(!Months='December', ['Salary']*['Bonus'],
['Salary'])
Again, one element is used twice in the definition. Move it out of the
IF
statement to get the same result:
['Payment'] = ['Salary'] * IF(!Months='December', ['Bonus'], 1)
It is more efficient to write several basic rules than one complex rule.
Pre-fetch values
In certain situations, it cannot be determined in advance which values need to be retrieved. The creator of rules can help to pre-fetch those values that might be needed in advance of the calculation. For example, if you have this rule:
['Account 1'] = B:
IF(['Account 2'] < 1000],
['Account 3a'],
IF(['Account 2'] < 1000000,
['Account 3b'],
['Account 3c']
)
)
)
+ #NA * (['Account 3a']+['Account 3b']+['Account 3c'])
A base level cell on Account 1 receives either the value of Account 3a, Account 3b, or Account 3c based on the value of Account 2.
If a billion base values must be calculated using this rule, then it should try to get all necessary source values for the calculation in one step. But this is not easily possible, as different cells should be taken dependent on the value in Account 2. Today, all values for Account 2 are fetched in one step. Based on the evaluation of Account 2, it fetches the values either from Account 3a, Account 3b or Account 3c one by one. This can be inefficient, depending on the kind of cells behind Account 3a-c. Basically, there can be two situations, or a mixture of both:
- Account 3a-c are low-cost calculated cells and base cells which require no calculation. In this case the cost of returning the values is low in terms of calculation time needed.
- Account 3a-c are calculated cells (complex calculations, expensive in terms of calculation time needed). In this case it depends on the underlying calculation if the calculation is expensive or low-cost.
If the retrieval of the cells is low-cost, then it is efficient to retrieve the values for Account 3a-c together with Account 2 in one quick step. This is done be adding a dummy operation that does not influence the result:
['Account 1'] = B:
IF(['Account 2'] < 1000],
['Account 3a'],
IF(['Account 2'] < 1000000,
['Account 3b'],
['Account 3c']
)
)
+ #NA * (['Account 3a']+['Account 3b']+['Account 3c'])
The only difference to the initial rule is in the last line. Here the values Account 3a, Account 3b, and Account 3c are added up and then multiplied by #NA. This does not change the overall result of the cell. But it helps especially in the case that the formula needs to calculate with values that are stored in the OLAP cube, as the OLAP has a fast algorithm to fetch the values from the cube.
The same pre-fetching idea might speed up the system in the case of low-cost rule calculation, while it might decrease the performance in case of complicated that got referenced.
This also applies to the STET
rules function:
'Account 1']=B:
IF(
['Account 2']>0,
['Account 2']/12,
STET
)
+ #NA*STET
Currently, OLAP is unable to detect this situation and do this optimization by itself.
Use consolidations instead of rules
Calculations can be defined as consolidations inside a dimension or as a rule. It is more efficient to define a calculation as a consolidation inside a dimension than as a rule.
Where possible use consolidations instead of rules. For example, use the consolidation inside the dimension:
Q1
January
February
March
Instead of:
Q1 = January + February + March