Automate lookup table maintenance when using multiple Excel sources in a PowerPivot model

When developing a PowerPivot model, one of the most important tasks is deciding which tables will serve as lookups.  The urgency is due the fact that those tables define important elements as default granularity of aggregation as well as direction of filter context propagation(this last one explained here).  Without valid lookups in your model, developing a PowerPivot report can be a bit frustrating as you constantly hit walls that arise when breaking fundamental rules of its calculation engine.

I explained here a solution for creating lookup tables when the model uses data that exists in Excel worksheets.  The solution works well to satisfy one-time ad hoc reporting requests, or when building a BISM Tabular prototypes.

However, what if you want to avoid having to manually maintain the values that make up those lookup tables?  By using a slight variation of the data merging method detailed by Debra Dalgleis here, we can achieve this automation. In this case, though, we won’t be merging files with identical structures but instead selecting common columns across datasets with the explicit objective of generating lookup tables.

Suppose we have two Excel file sources:

ORDERS1.XLSX

image

and ORDERS2.XLSX

image

Note that we have two files with different structure (column names, and total amount of columns).  If we wanted to tie both tables in a PowerPivot model (for example, using the common Order Date column on both sides) we would get the famous error:

image

As already mentioned, what we need is to create a valid lookup table which serves as an intermediary between these two base tables (base, as neither one can serve as a lookup). In this case, instead of using the ‘remove duplicates’ feature in Excel, we are going to leverage the ability to use SQL syntax when querying multiple Excel files.

Add a connection on the Excel window (not the PowerPivot window), by browsing to the Excel file source

image

In the PowerPivot window, open the connection to the workbook by going to ‘Existing connections’, ‘Workbook Connections’:

image

When opening that workbook connection, change the friendly name of the in-memory table to ‘Date Lookup’.

Open the ‘Table Properties’ menu on the PowerPivot window, switch from ‘Table Preview’ to ‘Query Editor’ and change the query to:

SELECT [Order Date]  FROM [Sheet1$]
UNION
SELECT [Order Date] FROM `C:\temp\Orders2.xlsx`.[Sheet1$]

With the result of this query, we can now designate a valid lookup table that is used to connect all other tables appropriately:

image

We could now generate DAX artifacts to consolidate values across the model.  For example, the following calculated column can compute a Order Count across all dates (in both original tables):

=CALCULATE(

               COUNTROWS( VALUES(Orders1[Order Number] ) )

               +

                COUNTROWS( VALUES( Orders2[Invoice Number] ) )

)

image

Note at this point that unlike the technique described by Debra,  the structure of the files here was not identical: one dataset listed order numbers whereas the other one listed invoice numbers. Using a combination of custom SQL syntax and DAX, we are able to achieve the structure needed to consolidate both lists of values into a comprehensive list that serves as a perfect candidate for a lookup (dimensional) table in our model.

This method provides a way to better automate PowerPivot reports that get data outside of traditional corporate repositories. Many systems can be automated to generate Excel ‘data dumps’.  Traditionally, this have been a challenge when dealing with PowerPivot models with clearly defined dimensional schemas.  This technique, however, provides a viable method of maintaining values of lookup tables without the need for manual intervention — as new data is added to either file the load query is able to automatically capture all new values from both sides and in doing so, regenerate the lookup table.

About these ads

One Response to Automate lookup table maintenance when using multiple Excel sources in a PowerPivot model

  1. Pingback: Profiling data in PowerPivot: comparing two tables « Javier Guillén

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 42 other followers

%d bloggers like this: