Grouping by an Aggregation in DAX

Business users really like the option of comparing category members against the overall average. Let’s review how to accomplish this in DAX.

image

Suppose you want to find out which months had above-average sales performance in the data above (the tables come from AdventureWorksDW).  We must first create a new table in the model to hold the categories we want. In PowerPivot, these values can be imported as a linked table:

image 
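The linked table itself needs only a single column holding the category labels (for example “Above Average” and “Below Average”). As an illustrative sketch of its contents in newer DAX tools you could also build such a disconnected table with DATATABLE; in the PowerPivot workbook it is simply a linked Excel table:

Categories =
DATATABLE (
    "Category", STRING,
    { { "Above Average" }, { "Below Average" } }
)
// Disconnected table: it has no relationships to the fact or date tables.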

The resulting model is the following:

image

Sales data is stored per day, but we want the monthly average, so we must use an expression that computes this aggregate at the correct level of granularity:

image
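As a rough sketch of what such an expression can look like (using the AdventureWorksDW names FactInternetSales[SalesAmount] and DimDate[EnglishMonthName]; the measure name is illustrative and the original screenshot may differ):

AvgMonthlySales :=
AVERAGEX (
    ALL ( DimDate[EnglishMonthName] ),
    CALCULATE ( SUM ( FactInternetSales[SalesAmount] ) )
)
// ALL() removes any month filter, so every month sees the same overall
// average of the monthly sales totals.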

With this, we can now use a measure leveraging a conditional DAX statement to categorize the monthly output as needed:

image

Notice that here we also compute at the monthly grain by using the SUMX function to iterate over the list of unique month names. This is important in order to allow for proper aggregation at the subtotal level.
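A minimal sketch of such a measure, assuming a disconnected Categories table with a [Category] column and the AvgMonthlySales measure from above (names are illustrative, not necessarily those in the screenshots):

CategorizedSales :=
IF (
    HASONEVALUE ( Categories[Category] ),
    SUMX (
        VALUES ( DimDate[EnglishMonthName] ),    // iterate the months in context
        SWITCH (
            VALUES ( Categories[Category] ),
            "Above Average",
                IF (
                    CALCULATE ( SUM ( FactInternetSales[SalesAmount] ) ) >= [AvgMonthlySales],
                    CALCULATE ( SUM ( FactInternetSales[SalesAmount] ) )
                ),
            "Below Average",
                IF (
                    CALCULATE ( SUM ( FactInternetSales[SalesAmount] ) ) < [AvgMonthlySales],
                    CALCULATE ( SUM ( FactInternetSales[SalesAmount] ) )
                )
        )
    )
)

Because SUMX iterates the individual month names, a subtotal over several months adds up each month’s categorized amount instead of re-evaluating the condition on the combined total.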

image

The data model now allows us to create an interesting visualization, using color as an indicator to emphasize the months that are below average (to graph the dotted line, I added one more value to the linked table so that the computed average value itself can be displayed):

image

Grouping on Quartiles

Another interesting example is grouping data by quartiles.  In order to do this, I modified Colin Banfield’s quartile DAX expression to compute dynamically based on a “utility” linked table, which has no relationships with the other tables in the data model and contains the multipliers needed to compute each percentile value.

image
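The original expression performs the interpolation manually, driven by the multipliers (0.25, 0.50, 0.75) stored in the utility table. As a compact sketch of the same idea using the newer PERCENTILEX.INC function and illustrative names:

Quartile1 Boundary :=
CALCULATE (
    PERCENTILEX.INC (
        VALUES ( DimProduct[EnglishProductName] ),
        CALCULATE ( SUM ( FactInternetSales[SalesAmount] ) ),
        0.25    // in the original post this multiplier comes from the utility table
    ),
    ALL ( DimProduct[EnglishProductName] )    // ignore any single-product filter, so the
                                              // boundary reflects all products of the current category
)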

A final expression allocates the sales amount to the appropriate quartile column and ensures the computation happens at the product level:

image
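A sketch of that allocation for the first quartile, iterating products so the comparison happens at the product level (measure and column names are again illustrative):

Sales 1st Quartile :=
SUMX (
    VALUES ( DimProduct[EnglishProductName] ),
    IF (
        CALCULATE ( SUM ( FactInternetSales[SalesAmount] ) ) <= [Quartile1 Boundary],
        CALCULATE ( SUM ( FactInternetSales[SalesAmount] ) )
    )
)
// Each product's total is kept only if it falls at or below the quartile
// boundary; analogous measures cover the 2nd, 3rd and 4th quartiles.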

Here is the result:

image

Notice that subtotals are computed on quartiles appropriate for the current category in the filter context.  The same applies to the grand total.

In Power View, I cross-referenced the sales amount (on the X axis) with two additional, quartile-based metrics: the count of sold products per category (Y axis) and the percentage of sold products over all available category products (bubble size).

You can see that the “Component” product category had lots of sold products (out of the entire category) in every quartile, whereas “Bikes” had more products with sales in the 4th quartile than in the second or third, and was overall the most profitable category.

image

Leveraging DAX Query Tables for Staging Data Transformations

When designing a data warehousing solution, it is typical to allocate certain data transformations to the ETL layer, particularly those that would be easier or more efficient to do outside of the cube.  At the same time – prior to the Excel 2013 release – multi-step data transformations were something to avoid in a Data Model whenever possible, as they increased the complexity (and lowered the maintainability) of DAX formulas.

Interestingly though, using DAX query tables – a feature Kasper de Jonge blogged about some time ago here – we now have the ability to generate materialized intermediate tables that can be consumed back into the Data Model, allowing us to build a process that includes not only a semantic layer but also a degree of automated data transformation.

So, as an example, here is what we are going to do:

image

To be clear, the output of the transform (the intermediate or *staging* table) also lives in the Data Model.  Nevertheless, those of you familiar with the data warehousing process will notice a similarity to a typical integration process. Also note that I used linked tables here for simplicity, but other sources are possible as well.

The goal is to take the table below (‘Check-Ins’) and generate a report that breaks down total full hours versus partial hours worked on a daily basis.

image 

As you can see, the data is not currently in a shape that makes it easy to count how many full or partial work hours were recorded.  It would be easier to un-pivot this data and cross join it with a reference table that breaks it down by hour. To that end, the following generic and re-usable lookup table (‘Time Ranges’) should be added to the Data Model:

image

After importing them as linked tables, we then generate a placeholder table from any of the tables in the model.  This can be done, from Excel, under Data –> Existing Connections –> Tables tab:

image

At this point, a DAX transformation query can be used to generate a new output.  Right-clicking on the generated table gives us the option to edit the DAX query used to produce the intermediate (or staging) table. We are interested in seeing, for each person in the source table, the breakdown of full and partial hours.  Here is the query (thanks to Laurent C for the ideas on this particular data transformation):

image
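As a minimal sketch (in current DAX) of what such a transformation query can look like, assuming the ‘Check-Ins’ table carries [Name], [CheckIn] and [CheckOut] columns and ‘Time Ranges’ carries one row per hour with [StartTime] and [EndTime] (the actual query in the workbook may differ):

EVALUATE
ADDCOLUMNS (
    FILTER (
        CROSSJOIN ( 'Check-Ins', 'Time Ranges' ),
        'Time Ranges'[StartTime] < 'Check-Ins'[CheckOut]
            && 'Time Ranges'[EndTime] > 'Check-Ins'[CheckIn]
    ),
    // Overlap between the hourly slot and the recorded check-in period;
    // a full hour surfaces as the 12/30/1899 1:00:00 AM value tested below.
    "CheckedIn",
        MIN ( 'Time Ranges'[EndTime], 'Check-Ins'[CheckOut] )
            - MAX ( 'Time Ranges'[StartTime], 'Check-Ins'[CheckIn] )
)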

                                image

image 

                                                                                   image

image

This table breaks down the full and partial hours worked per person.  It can now be re-imported into the Data Model, and enhanced with additional calculated columns:

CalculatedColumn1 (FullHours)

=IF([CheckedIn] = "12/30/1899 1:00:00 AM",1)

CalculatedColumn2 (PartialHours)

=IF([CheckedIn] <> "12/30/1899 1:00:00 AM",1)

From these, two fully additive measures can then be generated:

image 
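A minimal sketch of those two measures, assuming the re-imported staging table is called ‘Staging’ (names are illustrative):

Total Full Hours    := SUM ( Staging[FullHours] )
Total Partial Hours := SUM ( Staging[PartialHours] )

Since the calculated columns return 1 (or blank), plain SUMs aggregate correctly at any level, which is what makes the measures fully additive.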

With data from this intermediate table, the final report is now easy to create:

image

DAX query tables are great, but they don’t – by themselves – offer a way to solve an important problem: the sequencing needed to compute the output.  In addition to that, we have the issue of the number of data refreshes needed:

image

By coding against the sheet object model, however, we can consolidate the refreshes into one while controlling the order of execution required by the calculation.  This is what makes the solution so similar to an ETL process. We get a one-button control that can:

1) import all changes from the linked table sources

2) refresh the intermediate DAX table query and

3) reflect the changes on the report, in one pass while controlling the order of execution.  Here is the code:

image

New data can be added to the linked table source, and by clicking a button (with the VBA RefreshReport macro assigned to it), one can perform the sequential refresh needed for the Data Model to consume its own output through the staging table.

In the particular scenario used here, the whole computation could have been done in a single DAX expression; however, that approach would prevent reuse of the staging table, as the same logic would have to be recomputed for each measure because it is not materialized.
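For comparison, a sketch (in current DAX, with the same illustrative column names as above) of what a single-expression version of one measure could look like; every such measure would have to repeat the cross join, which is exactly the recomputation the materialized staging table avoids:

Total Full Hours (no staging) :=
SUMX (
    FILTER (
        CROSSJOIN ( 'Check-Ins', 'Time Ranges' ),
        'Time Ranges'[StartTime] < 'Check-Ins'[CheckOut]
            && 'Time Ranges'[EndTime] > 'Check-Ins'[CheckIn]
    ),
    // count the hourly slot only when the overlap is a complete hour
    IF (
        MIN ( 'Time Ranges'[EndTime], 'Check-Ins'[CheckOut] )
            - MAX ( 'Time Ranges'[StartTime], 'Check-Ins'[CheckIn] )
            = TIME ( 1, 0, 0 ),
        1
    )
)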

[Notice Jason and Brent have been added to the source linked table]

image

                                image 

image

An outstanding question remains on how to move these ‘Personal BI’ initiatives into a wider Enterprise model.  It is still necessary to consider that BISM Tabular models strip out the Excel dependency of imported PowerPivot models, yet this technique requires that exact dependency to be present.

This article is a companion to Jason Thomas’ most recent blog entry, which uses this technique to generate a data model that allows for cross-dimension filtering.