Creating lookup tables in PowerPivot using SQL queries
July 30, 2012 3 Comments
On the last post I showed how to model for a scenario in which two datasets must be merged. A custom lookup table was created using the ‘remove duplicates’ functionality in Excel. In this post I will walk through the same scenario, but this time using the SQL query wizard in PowerPivot to achieve the same result. This post is intended for power users who will benefit from creating robust PowerPivot data models.
First of all, lets clarify why this is useful: in order to create a robust PowerPivot report, it is sometimes necessary to create lookup tables to allow PowerPivot relationship rules to work seamlessly (as noted on this post). In some cases you may want to prototype this manually (as described on my prior post). However, once you have a final model you want to re-use, you may want to consider using SQL queries instead of manually removing duplicates, in order to benefit from a fully dynamic PowerPivot model that automatically updates lookup tables upon refresh.
SQL query example
I have used the same extract table samples as in the last post. In this scenario, we don’t want to generate a lookup ‘reference’ table manually, but instead relay on the underlying relational data source.
The modeling solution discussed on the last post is still valid as well: we must create two new lookup tables: Products and Dates.
Lets start with Products. Suppose I have already loaded both datasets from a SQL Server source. By clicking the ‘reuse existing connections’ option in the ribbon under the ‘design’ tab, we can re-connect to the source and use PowerPivot ‘Table Import Wizard’ to create the lookup table. Instead of selecting from a list of tables, I can use the option ‘Write a query that will specify the data to import.’ Even though I am telling PowerPivot I will write a query, there will be no need to write code at all.
Select the ‘design’ button:
At this point the SQL wizard opens. Here are the steps:
1. Select the Product ID field from Table 1 and the Category from Table 2:
2. Click on the ‘Group and Aggregate’ button on the top right corner
3. Expand the relationships tab
4. Deselect “Auto Detect” on that same tab. Click on the ‘add relationship’ button:
5. Click on the Left Table, specify Table 1. Then click on the Right Table and specify Table 2. Change the join type to ‘Full Outer’ (this is needed when creating lookups that span across multiple tables on the same underlying data source. Here is a good explanation of what joins do).
Finally, select the Join Fields by double clicking on the field and selecting Product ID (on Table 1) equals to Product ID (on Table 2). Here is how the window should look like:
What we have done is select all the Product IDs from Table 1 (as that table contained a comprehensive list of products), and all the Categories from Table 2 (as there were no category columns on the first table). The ‘grouped by’ option is needed to generate a distinct list of product/category combinations (remove duplicates), and the join was necessary to grab data from both tables.
Once finished, clicking will show you the query PowerPivot has written for you. At this point, you can rename it so it will have a more user friendly name in your data model:
At this point, I can now join the two initial datasets with this lookup table:
Again, it was necessary to do this in order to allow filtering of both initial datasets by a common category. Without this lookup table, it would not be possible (or really difficult) to report on both sets of data in a consolidated view.
Now lets go for the Dates lookup table. In this case there is an small extra challenge: neither table has the complete list of dates. So after we configure the wizard using the same technique above, we get the following:
which results on this in-memory table:
With DAX, we can easily overcome this. By creating a new calculated column, we can now have the complete list of unique days in only one place:
=IF([Table_1 Date] = BLANK(), [Table_2 Date], [Table_1 Date])
Further calculated columns can add more calendar levels for analysis, like month, quarter, year and so on.
The data model (in the PowerPivot design window) now should look like this (once the relationships have been created):
To be able to slice “Total Sales Amount” by Product or Date, we only need to do one extra step – create a DAX measure to add the Amounts from both extracts (Table_1 and Table_2, just was we did in the last blog post):
With this, you can now slice the consolidated amount (from both extracts) by any relevant Product or Date category with the added advantage that PowerPivot will update the lookup tables once a data refresh has been executed. In the end, not only the DAX calculations will be dynamic (based on query and filter context), but the data model will by dynamic based on the newly imported data.