Split DAX calculation steps to enhance memory utilization

An advantage of DAX measures over calculated columns is that they respond to user interaction.  As such, you may be inclined to allocate all calculation logic into measures when there is a need to respond to slicer selection, for example.

From a pure performance point of view, this may not always be optimal however.  Take the following example, in which a DAX developer intends to develop a metric for ‘most current sales amount’ per product (really, the last non empty sales amount per product).  Using Adventure Works, you can see that each product may have different  ‘last non empty dates’: 

image

As such, a DAX measure to render the most current sales amount must first iterate on products prior to determining the sales total.  In other words, it must determine – for each product – what was the most current order date and only aggregate the sales amount on that date.  As a pure measure, this could be done as:

image

And that would give the expected result.  However, from a computation perspective it may pose efficiency problems, particularly over big fact tables and scarce RAM resources due to the iterative command inside of the CALCULATE function.

If you think about it, the most current date with sales per product is not a changing variable but instead can be predetermined.  However, the actual output could be dynamic (as the total could be filtered by user selection).  So lets break the calculation in two steps:

  • Determining most current sales date per product
  • Aggregating based on user selection

The first one can be re-allocated to a calculated column, in effect pre-computing it:

Calculate (
  CountRows ( Values ( FactResellerSales[OrderDateKey] ) ),
Filter (
  AllExcept (
        FactResellerSales,
        FactResellerSales[ProductKey],
        FactResellerSales[OrderDateKey] ),
        FactResellerSales[OrderDateKey] = Calculate (
                               Max ( FactResellerSales[OrderDateKey] ),
                               AllExcept ( FactResellerSales, FactResellerSales[ProductKey] )
                             )
               )
)

image 

leaving the second part as a measure (to allow aggregation based on user assigned filters):

image

The interesting part is we are not increasing much the memory footprint of the model itself, as the resulting values on the calculated column are either 1 or blank.  However, we gain much on the way of calculation performance while still allowing for user interactive responses.  This becomes clear, for example, when calculating ratios based on the DAX measure in question:

image

The impact in memory use is very positive, while the output accuracy remains the same.  Comparing DAX physical query plans on both methods reveals a much simpler approach which relies heavily on the Storage Engine, gaining 73% in performance:

(13 milliseconds, cold cache):

image 

In conclusion: consider pre-computing those calculation steps that do not change due to user interaction.  Doing so may allow you to more efficiently compute numbers while more efficiently manage CPU utilization.

About these ads

One Response to Split DAX calculation steps to enhance memory utilization

  1. Pingback: DAX context propagation, inactive relationships and calculated columns | Javier Guillén

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 60 other followers

%d bloggers like this: