Split DAX calculation steps to enhance memory utilization
August 30, 2013 1 Comment
An advantage of DAX measures over calculated columns is that they respond to user interaction. As such, you may be inclined to allocate all calculation logic into measures when there is a need to respond to slicer selection, for example.
From a pure performance point of view, this may not always be optimal however. Take the following example, in which a DAX developer intends to develop a metric for ‘most current sales amount’ per product (really, the last non empty sales amount per product). Using Adventure Works, you can see that each product may have different ‘last non empty dates’:
As such, a DAX measure to render the most current sales amount must first iterate on products prior to determining the sales total. In other words, it must determine – for each product – what was the most current order date and only aggregate the sales amount on that date. As a pure measure, this could be done as:
And that would give the expected result. However, from a computation perspective it may pose efficiency problems, particularly over big fact tables and scarce RAM resources due to the iterative command inside of the CALCULATE function.
If you think about it, the most current date with sales per product is not a changing variable but instead can be predetermined. However, the actual output could be dynamic (as the total could be filtered by user selection). So lets break the calculation in two steps:
- Determining most current sales date per product
- Aggregating based on user selection
The first one can be re-allocated to a calculated column, in effect pre-computing it:
CountRows ( Values ( FactResellerSales[OrderDateKey] ) ),
FactResellerSales[OrderDateKey] = Calculate (
Max ( FactResellerSales[OrderDateKey] ),
AllExcept ( FactResellerSales, FactResellerSales[ProductKey] )
leaving the second part as a measure (to allow aggregation based on user assigned filters):
The interesting part is we are not increasing much the memory footprint of the model itself, as the resulting values on the calculated column are either 1 or blank. However, we gain much on the way of calculation performance while still allowing for user interactive responses. This becomes clear, for example, when calculating ratios based on the DAX measure in question:
The impact in memory use is very positive, while the output accuracy remains the same. Comparing DAX physical query plans on both methods reveals a much simpler approach which relies heavily on the Storage Engine, gaining 73% in performance:
(13 milliseconds, cold cache):
In conclusion: consider pre-computing those calculation steps that do not change due to user interaction. Doing so may allow you to more efficiently compute numbers while more efficiently manage CPU utilization.