Target well-formed Data Models when using Power Query

Even though Power Query gives us the ability to generate impressive data transformations, is up to the model author to define a well formed Power Pivot Data Model that respects relationship principles. In the past, I blogged about 3 options to ‘merge’ data using Power Pivot:

  • Issuing a SQL-like query to Excel data sources (here)
  • Issuing a SQL query to relational sources (here)
  • Using Excel inherent capabilities (here)
    All of the options above required applying Power Pivot data modeling principles which in essence implied the creation of two additional ‘lookup’ (dimension) tables to connect the existing fact data sets.

Using Power Query, the logic is similar although we use the M language.  In this blog entry, I will be also using the same small dataset which, even though its tiny, exemplifies the typical issues encountered by analysts when attempting to merge data (duplicate key records – absent lookup tables, missing values and so on).

The first step is to generate 2 queries and directly load them into the model:

image

                                                                                                           image

To properly establish a relationship we must now extract two lookup tables from the data given:

  • A date lookup containing all dates across both datasets
  • a product lookup containing all products across both tables and assign an unknown value to missing categories

LOOKUP # 1 (Date Lookup)

The first one can be done with the following M query -

let
    Table1Dates =
        Table.FromList(
            Table.Column(
                Table.TransformColumnTypes( Table1, { "Date", type text } ), "Date" 
                )
            ) ,
    Table2Dates =
        Table.FromList(
            Table.Column(
                Table.TransformColumnTypes( Table2, { "Date ", type text } ), "Date " 
            )
        ) ,
    CombinedTables = Table.Combine ( { Table1Dates, Table2Dates } ),
    DuplicatesRemoved = Table.Distinct( CombinedTables ),
    ChangedType = Table.TransformColumnTypes(DuplicatesRemoved,{{"Column1", type datetime}}),
    RenamedColumns = Table.RenameColumns(ChangedType,{{"Column1", "DateKey"}})
in
    RenamedColumns

Once in the model, we can apply a DAX calculated column to extract the month name that we want to use to slice and dice:

image

 

LOOKUP # 2 (Product Lookup)

The second lookup table can be generated using Power Query with the following script:

let
    Source =
        Table.Join(
            Table1,{"Product ID"},
            Table.RenameColumns(
                Table2,{ "Product ID", "Product ID2"}
                ),{"Product ID2"},
            JoinKind.FullOuter
            ),
    RemovedColumns = Table.RemoveColumns(Source,{"Amount", "Date", "Amount ", "Date "}),
    DuplicatesRemoved = Table.Distinct( RemovedColumns ),
    ConsolidateProductKey = Table.AddColumn(
                                DuplicatesRemoved,
                                "ProductKey",
                                each if ([Product ID] = null) then [Product ID2] else [Product ID]
                            ),
    RemoveOriginalProductKeys = Table.RemoveColumns(ConsolidateProductKey ,{"Product ID", "Product ID2"}),
    AssignUnknownCategory = Table.AddColumn(
                                RemoveOriginalProductKeys ,
                                "Custom",
                                each if( [Category] = null ) then "Uncategorized" else [Category]
                                ),
    FinalTable = Table.RemoveColumns(AssignUnknownCategory ,{"Category"})
in
    FinalTable

With the following result:

image

With this, we can finalize the Power Pivot Data Model using appropriate lookups and one final DAX measure:

image

The resulting model is well-formed and can slice two tables that by themselves could not be reconciled directly in the Data Model:

image

 

Is this the only way?

One can argue that rather that developing a well formed data model including dimension tables it may be simpler to append one table to the other one and have the Data Model consume it as a single table.  In fact, generating a single table off the data shown is quite possible using Power Query.

However, single table data models have challenges that make their use not ideal:

  • Calculations must rely on a single base level of granularity which may not properly reflect the business model and make require workarounds and ill performing computations.
  • Given the limitation outlined above, it may be necessary to create multiple copies of the data – each targeting different granularities – with unwanted consequence of having multiple versions of the same metadata
  • Reporting tools using parameters must execute SQL DISTINCT like expressions over dimension data to remove duplicates on parameter lists, most likely affecting performance and report usability
  • Advanced modeling scenarios like many to many cannot be represented

Finally, in closer examination even appending won’t do the trick in cases like the one discussed as it will fail to properly consolidate dimensions. Using the standard appending functionality, Power Query will leave, on the data used here, two designations for Product A: one with a category assigned to it, and the other one without.

image

An assumption could be that Product A *always* belongs to the same category ("C.A").  Data Analysts, as Subject Matter Experts, can confirm if such assumptions are correct.  In the example above we assumed it was, and further modeling and consolidation was needed.  The problem with the source files was that one of the columns – Category, in this case – was omitted on Dataset # 1. We ‘consolidated’ the product definitions via the M language, issuing a full outer join over the two datasets.

With this in mind, be careful on your approach to modeling of Power BI solutions.  Simplicity on Data Model development is important, but oversimplifying can harm more than help.

You can access the demo workbook here: PQ_and_data_modeling.xlsx 

Update 9/24/13: Added more details to the explanation of why appending tables should be avoided in cases like the one described.

About these ads

2 Responses to Target well-formed Data Models when using Power Query

  1. Pingback: Datenmodelle und Power Query | MS Excel | Power Pivot | DAX

  2. Pingback: Power Query for Excel – on the rise | Microsoft Excel and Access Experts Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 67 other followers

%d bloggers like this: