Calculating aggregate tables in BISM Tabular
September 28, 2011 Leave a comment
One of the most interesting functions in Denali’s DAX implementation is the Summarize() function. To see how this can be of great use, imagine the following table called Payments:
Suppose we wanted to know what the employee monthly payment average is. To calculate this, we decide we might want to start by calculating the monthly payment average as a first step. In order to do this, we must first add the values for each month and then calculate the average. So we try the following DAX query in SSMS:
The number is obviously incorrect. Yes, it is true that Values() will return a table with only two rows (one representing each date). However, this one-column table has no knowledge of the [PaidAmount] column or how to related it to date values, hence the SUM is executed over the entire table resulting in the following…
… for a final average output of 480. Another option, if using PowerPivot to consume the tabular model, is to create an aggregate table and reimport it into PowerPivot. How? By creating a pivot table based on [Date] and [PaidAmount], the values are automatically aggregated:
Unfortunately, a pivot table cannot be directly re-imported into the PowerPivot window, but we can do this by converting to formulas (In the Excel ribbon, go to PivotTableTools, OLAP Tools, Convert to Formulas). After the pivot table cells have been replaced with a CUBE formula reference, go to the PowerPivot menu and click on ‘Create Linked Table’. For some mysterious reason, doing this will result in measure cells losing its reference as the header is now static text. This is easy to fix though: remap the formula reference to appropriate tabular measure:
Once the table is imported in the PowerPivot model, we can now issue a simple DAX calculation:
AVERAGE( [Sum of PaidAmount] )
The output is 240, which is correct: it worked because the table over which the calculation acts is now at the appropriate level of granularity.
There are a few disadvantages on this approach, however. First, it relies on PowerPivot linked table feature, so it is a client operation. Second, the re-import process required remapping CUBE functions. And third, what if the base table contained hundreds of millions of rows? The solution above would not scale.
In BISM Tabular, we can do all this in-memory as part of a calculation. Let’s try it with the same example illustrated above. Connect SSMS to the Tabular model and try the following DAX query:
That’s all. We have now our calculated aggregate table. Monthly average is now quite simple:
Unlike our first attempt using Values(), our new summarized table has a reference to the aggregated [PaidAmount] column.
Going back to our original attempt of calculating the employee monthly payment average, you may have already noticed there are two entries for Mark on 1/1/2011. In order to properly compute the average, we must take this into account and only count distinct employees by month. Here is the final calculation:
Hopefully this post will have demonstrated how powerful this DAX functionality can be, mainly when requiring technical aggregate tables which are only an intermediate step in the process of arriving to a specific output.