Automate lookup table maintenance when using multiple Excel sources in a PowerPivot model
October 23, 2012 1 Comment
When developing a PowerPivot model, one of the most important tasks is deciding which tables will serve as lookups. The urgency is due the fact that those tables define important elements as default granularity of aggregation as well as direction of filter context propagation(this last one explained here). Without valid lookups in your model, developing a PowerPivot report can be a bit frustrating as you constantly hit walls that arise when breaking fundamental rules of its calculation engine.
I explained here a solution for creating lookup tables when the model uses data that exists in Excel worksheets. The solution works well to satisfy one-time ad hoc reporting requests, or when building a BISM Tabular prototypes.
However, what if you want to avoid having to manually maintain the values that make up those lookup tables? By using a slight variation of the data merging method detailed by Debra Dalgleis here, we can achieve this automation. In this case, though, we won’t be merging files with identical structures but instead selecting common columns across datasets with the explicit objective of generating lookup tables.
Suppose we have two Excel file sources:
ORDERS1.XLSX
and ORDERS2.XLSX
Note that we have two files with different structure (column names, and total amount of columns). If we wanted to tie both tables in a PowerPivot model (for example, using the common Order Date column on both sides) we would get the famous error:
As already mentioned, what we need is to create a valid lookup table which serves as an intermediary between these two base tables (base, as neither one can serve as a lookup). In this case, instead of using the ‘remove duplicates’ feature in Excel, we are going to leverage the ability to use SQL syntax when querying multiple Excel files.
Add a connection on the Excel window (not the PowerPivot window), by browsing to the Excel file source
In the PowerPivot window, open the connection to the workbook by going to ‘Existing connections’, ‘Workbook Connections’:
When opening that workbook connection, change the friendly name of the in-memory table to ‘Date Lookup’.
Open the ‘Table Properties’ menu on the PowerPivot window, switch from ‘Table Preview’ to ‘Query Editor’ and change the query to:
SELECT [Order Date] FROM [Sheet1$]
UNION
SELECT [Order Date] FROM `C:\temp\Orders2.xlsx`.[Sheet1$]
With the result of this query, we can now designate a valid lookup table that is used to connect all other tables appropriately:
We could now generate DAX artifacts to consolidate values across the model. For example, the following calculated column can compute a Order Count across all dates (in both original tables):
=CALCULATE(
COUNTROWS( VALUES(Orders1[Order Number] ) )
+
COUNTROWS( VALUES( Orders2[Invoice Number] ) )
)
Note at this point that unlike the technique described by Debra, the structure of the files here was not identical: one dataset listed order numbers whereas the other one listed invoice numbers. Using a combination of custom SQL syntax and DAX, we are able to achieve the structure needed to consolidate both lists of values into a comprehensive list that serves as a perfect candidate for a lookup (dimensional) table in our model.
This method provides a way to better automate PowerPivot reports that get data outside of traditional corporate repositories. Many systems can be automated to generate Excel ‘data dumps’. Traditionally, this have been a challenge when dealing with PowerPivot models with clearly defined dimensional schemas. This technique, however, provides a viable method of maintaining values of lookup tables without the need for manual intervention — as new data is added to either file the load query is able to automatically capture all new values from both sides and in doing so, regenerate the lookup table.
Pingback: Profiling data in PowerPivot: comparing two tables « Javier Guillén