Automate lookup table maintenance when using multiple Excel sources in a PowerPivot model

When developing a PowerPivot model, one of the most important tasks is deciding which tables will serve as lookups.  The urgency is due the fact that those tables define important elements as default granularity of aggregation as well as direction of filter context propagation(this last one explained here).  Without valid lookups in your model, developing a PowerPivot report can be a bit frustrating as you constantly hit walls that arise when breaking fundamental rules of its calculation engine.

I explained here a solution for creating lookup tables when the model uses data that exists in Excel worksheets.  The solution works well to satisfy one-time ad hoc reporting requests, or when building a BISM Tabular prototypes.

However, what if you want to avoid having to manually maintain the values that make up those lookup tables?  By using a slight variation of the data merging method detailed by Debra Dalgleis here, we can achieve this automation. In this case, though, we won’t be merging files with identical structures but instead selecting common columns across datasets with the explicit objective of generating lookup tables.

Suppose we have two Excel file sources:

ORDERS1.XLSX

image

and ORDERS2.XLSX

image

Note that we have two files with different structure (column names, and total amount of columns).  If we wanted to tie both tables in a PowerPivot model (for example, using the common Order Date column on both sides) we would get the famous error:

image

As already mentioned, what we need is to create a valid lookup table which serves as an intermediary between these two base tables (base, as neither one can serve as a lookup). In this case, instead of using the ‘remove duplicates’ feature in Excel, we are going to leverage the ability to use SQL syntax when querying multiple Excel files.

Add a connection on the Excel window (not the PowerPivot window), by browsing to the Excel file source

image

In the PowerPivot window, open the connection to the workbook by going to ‘Existing connections’, ‘Workbook Connections’:

image

When opening that workbook connection, change the friendly name of the in-memory table to ‘Date Lookup’.

Open the ‘Table Properties’ menu on the PowerPivot window, switch from ‘Table Preview’ to ‘Query Editor’ and change the query to:

SELECT [Order Date]  FROM [Sheet1$]
UNION
SELECT [Order Date] FROM `C:\temp\Orders2.xlsx`.[Sheet1$]

With the result of this query, we can now designate a valid lookup table that is used to connect all other tables appropriately:

image

We could now generate DAX artifacts to consolidate values across the model.  For example, the following calculated column can compute a Order Count across all dates (in both original tables):

=CALCULATE(

               COUNTROWS( VALUES(Orders1[Order Number] ) )

               +

                COUNTROWS( VALUES( Orders2[Invoice Number] ) )

)

image

Note at this point that unlike the technique described by Debra,  the structure of the files here was not identical: one dataset listed order numbers whereas the other one listed invoice numbers. Using a combination of custom SQL syntax and DAX, we are able to achieve the structure needed to consolidate both lists of values into a comprehensive list that serves as a perfect candidate for a lookup (dimensional) table in our model.

This method provides a way to better automate PowerPivot reports that get data outside of traditional corporate repositories. Many systems can be automated to generate Excel ‘data dumps’.  Traditionally, this have been a challenge when dealing with PowerPivot models with clearly defined dimensional schemas.  This technique, however, provides a viable method of maintaining values of lookup tables without the need for manual intervention — as new data is added to either file the load query is able to automatically capture all new values from both sides and in doing so, regenerate the lookup table.

Advertisements

4 Responses to Automate lookup table maintenance when using multiple Excel sources in a PowerPivot model

  1. Pingback: Profiling data in PowerPivot: comparing two tables « Javier Guillén

  2. Pingback: Target well-formed Data Models when using Power Query | Javier Guillén

  3. Jeroen says:

    -This is only possible if there are no duplicates in the OrderDate of the table Dates Lookup.
    – Isn’t it easier to create a timetable?

  4. javierguillen says:

    Hi Jeroen

    Using the SQL function “UNION” in the query, as described on the post, has the automatic effect of eliminating duplicates.

    Creating a timetable is actually the idea of the technique described, in this case is referred to as Dates Lookup. It is usually a good practice to source your time table from your actual data, so as not to have dates on it that have no fact data. Not doing so can have undesirable effects like, for example, listing all dates on a slicer (regardless of if there was data for them or not).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: