Best Practices related to rules
This section provides some best practice guidelines for using the new rules engine.
Rules comments
Rules can have comments. There are two formats:- The Single line format is //
- The multiline format is /* */
We recommend that you always use comments for long rules.
You can include a short description of the rule in the description field.
Avoid constant values in rules
Previously, a rule like this was often used to suppress consolidations on consolidated cells:
['Price']=C:0;
The problem with this construct is that a constant value is used. For users, the value 0 often equates to 'no value'. But the OLAP Server treats the 0 as a value, that is, it allocates memory for the value and starts calculating with it.
A better solution is this:
['Price']=C:#NA;
In this case the result is the same - you still sees no values, but the OLAP Server understands that there are no values and does not start to work with them.
Avoid error results
Avoid all invalid references which will return an #ERROR result. To remember that a cell contains an #ERROR value, the OLAP Server allocates memory. This memory should be preserved.
You can do this by checking dimension boundaries when you are using DE.PREV, DE.NEXT, or DE.SIBLING functions in DB or GETATTR formulas. Additionally, check the existence of elements in the target dimension when you are mapping one dimension into another.
Here are some examples of how a check can be implemented in cases where you must access a value from a previous element:
IF( DE.INDEX( 'SOURCEDIM', !SOURCEDIM )=0, #NA , DB(…,DE.PREV( 'dim' , !dim),…))
IF( DE.INDEX( 'SOURCEDIM', !SOURCEDIM )>=1, DB(…,DE.PREV( 'dim' , !dim),…), #NA )
These rules apply:
[SOURCEDIM:'1st element',…]= # […] = DB(…,DE.PREV( 'dim' , !dim),…)
Avoid dynamic references
For best performance, do not use:
- Variable hypercube names in external cube data references
- Variable dimension names in attribute data references
- Variable field names in attribute references
- Variable table IDs in attribute references
Always use constant strings or constant numeric IDs.
Avoid unnecessary fixed members in cell references
The definition of a rule is always evaluated in the context of the cell that should be calculated. Therefore, you should omit in internal cell references all dimensions, except the ones where the context should change.
This is an example of bad practice:
['Actual','Q1','Starting Balance'] = ['Actual','January','Starting Balance'];
Instead, the rule should look like this:
['Actual','Q1','Starting Balance'] = ['January'];
‘Actual’ and ‘Starting Balance’ should be omitted, as they are taken from the context.
Use single-dimensional conditions
The determination of the exact source area is important for caching values for faster calculation. When you use IF-statements, the source area is typically dynamic , dependent on the result of a logical expression. The more dimensions the IF-condition includes, the more complex becomes the determination of the exact source area. Therefore, the goal is to keep the conditional expression of an IF-statement one-dimensional.
In IF-Statements, it is normal to check data based on multiple dimensions: IF the year is equal to 2011 AND the month is equal to January, then one value shall be returned, or else another value. The AND operator creates a multi-dimensional condition.
To avoid this multi-dimensional condition, you can split an IF statement into multiple IF statements. For example:
[] = IF( !year = '2011' AND !month = 'January' , [Value1] , [Value2] );
This is not optimal because of the multi-dimensional AND condition. It should be rewritten as:
[] = IF( !year = '2011' , IF( !month = 'January' , [Value1] , [Value2] ) , [Value2]
);
This is more effective as it has two one-dimensional conditions.
Simplify the rules
Rules can be simplified by mathematical transformations.
Example 1:
['Revenue'] = ['Nominal Revenue']-['Discount Percent']/100*['Nominal Revenue']
Here the value [‘Nominal Revenue’] is listed twice. You can transform this into:
['Revenue'] = ['Nominal Revenue'] * (1 -['Discount Percent']/100)
Where the [‘Nominal Revenue’] is moved in front of the brackets to be multiplied with each of the parts inside the brackets. Because of this change, the [‘Nominal Revenue’] is used only once in the formula.
Example 2:
['Payment'] = IF(!Months='December', ['Salary']*['Bonus'], ['Salary'])
Again, one element is used twice in the definition. Move it out of the IF statement to get the same result:
['Payment'] = ['Salary'] * IF(!Months='December', ['Bonus'], 1)
Help OLAP Server to pre-fetch values
In certain situations OLAP Server cannot determine in advance which values need to be retrieved. The creator of rules can help OLAP Server to pre-fetch those values that might be needed in advance of the calculation.
For example, if you have this rule:
['Account 1'] = B:
IF(['Account 2'] < 1000],
['Account 3a'],
IF(['Account 2'] < 1000000,
['Account 3b'],
['Account 3c']
)
)
)
+ #NA * (['Account 3a']+['Account 3b']+['Account 3c'])
A base level cell on Account 1 will receive either the value of Account 3a, Account 3b or Account 3c based on the value of Account 2.
If OLAP Server must calculate a billion base values using this rule, then it should try to get all necessary the source values for the calculation in one step. But this is not easily possible, as different cells should be taken dependent on the value in Account 2. Today, OLAP Server fetches all values for Account 2 in one step. Based on the evaluation of Account 2, it will fetch the values either from Account 3a, Account 3b or Account 3c one by one. This might be inefficient, dependant on the kind of cells behind Account 3a-c. Basically, there can be two situations, or a mixture of both:
- Account 3a-c are low-cost calculated cells, and base cells, which require no calculation. In this case the cost of returning the values is low in terms of calculation time needed.
- Account 3a-c are calculated cells (complex calculations, expensive in terms of calculation time needed). In this case it depends on the underlying calculation if the calculation is expensive or low-cost.
If the retrieval of the cells is low-cost, then it would be efficient to retrieve the values for Account 3a-c together with Account 2 in one quick step. This can easily be done be adding a dummy operation that does not influence the result:
['Account 1'] = B:
IF(['Account 2'] < 1000],
['Account 3a'],
IF(['Account 2'] < 1000000,
['Account 3b'],
['Account 3c']
)
)
+ #NA * (['Account 3a']+['Account 3b']+['Account 3c'])
The only difference to the initial rule is in the last line. Here the values Account 3a, Account 3b and Account 3c are added up and then multiplied by #NA. This does not change the overall result of the cell. But it helps especially in the case that the formula needs to calculate with values that are stored in the OLAP Server cube, as OLAP Server has a fast algorithm to fetch the values from the cube.
The same pre-fetching idea might speed up the system in the case of low-cost rule calculation, while it might decrease the performance in case of complicated that got referenced.
This also applies to the rules function STET:
'Account 1']=B: IF( ['Account 2']>0, ['Account 2']/12, STET ) + #NA*STET
Currently, OLAP Server is unable to detect this situation and do this optimization by itself. We might find later a way to implement this or give the creator of the model a way to define which values should be pre-fetched.
Use Consolidations instead of Rules
Calculations can be defined in the OLAP Server in different ways, as consolidations inside a dimension or as a rule. It is more efficient to define a calculation as a consolidation inside a dimension than as a rule.
Where possible use consolidations instead of rules. For example:
Use the consolidation inside the dimension:
Q1 January February March
Instead of:
Q1 = January + February + March