Leveraging DAX query tables for staging Data Transformations
January 15, 2013 3 Comments
When designing a data warehousing solution, it is typical to allocate certain data transformations to the ETL layer, particularly those that would be easier or more efficient to do outside of the cube. At the same time – prior to Excel 2013 release – developing multi-step data transformations in a Data Model was, whenever possible, a factor that increased the complexity (and lowered maintainability) of DAX formulas.
Interestingly though, using DAX query tables – a feature Kasper de Jonge blogged about some time ago here – we now have the ability to generate materialized intermediate tables that can be consumed back into the Data Model, allowing us to generate a process that includes not only a semantic layer but a degree of automated data transformations.
So, as an example, here is what we are going to do:
To be clear, the output of the transform (the intermediate or *staging* table) also lives in the Data Model. Nevertheless, those of you familiar with the data warehousing process will notice a similarity to a typical integration process. Also note that I used linked tables here for simplicity, but other sources are possible as well.
The goal is to take the table below (‘Check-Ins’) and generate a report that breaks down total full hours versus partial hours worked on a daily basis.
As you notice, data is not currently in a way in which is easy to count how many full or partial work hours where recorded. It would be easier to un-pivot this data, and cross join it with a reference table to break it down by hours. As such, the following generic and re-usable lookup table (‘Time Ranges’) should be added to the Data Model:
After importing them as linked tables, we then generate a place-holder table from any of the tables in the model. This can be done, from Excel, under Data –> Existing Connections – > Tables Tab:
At this point, a DAX transformation query can be used to generate a new output. Right clicking on the table generated gives us the option to edit the DAX query used to generate the intermediate (or staging) table. We are interested in seeing, for each person in the source table, the break down of full and partial hours. Here is the query (thanks to Laurent C for the ideas on this particular data transformation):
This table breaks down the full and partial hours worked per person. It can now be re-imported into the Data Model, and enhanced with additional calculated columns:
CalculatedColumn1 ( FullHours)
=IF([CheckedIn] = "12/30/1899 1:00:00 AM",1)
=IF([CheckedIn] <> "12/30/1899 1:00:00 AM",1)
Out of which two fully additive measures can then be generated:
With data from this intermediate table, the final report is now easy to create:
DAX query tables are great but they don’t – by themselves – offer a way to solve an important question: The sequencing needed to compute the output. In addition to that, we have the issue of the amount of data refreshes needed:
Coding against the sheet object model, however, we can consolidate the refreshes into one while controlling the order of execution required by the calculation. This is what makes the solutions so similar to an ETL process. We get one button control that can:
1) import all changes from the linked table sources
2) refresh the intermediate DAX table query and
3) reflect the changes on the report, in one pass while controlling the other of execution. Here is the code:
New data can be added to the linked table source, and by clicking on a button (with the VBA RefreshReport macro assigned to it), one can perform the sequential refresh needed for the Data Model to consumes itself in the staging table.
With the particular scenario used here, the whole computation could have been done in a single DAX expression, however, this approach would prevent reusability of the staging table, having to recomputed it again for each measure as it is not materialized.
[Notice Jason and Brent have been added to the source linked table]
An outstanding question remains on how to move these ‘Personal BI’ initiatives into a wider Enterprise model. It is still necessary to consider BISM Tabular models strip out the Excel dependency of imported PowerPivot models, yet this technique requires that exact dependency to be present.
This article is a companion to Jason Thomas’ most recent blog entry, which uses this technique to generate a data model that allows for cross-dimension filtering.