OFFSET double lookup in PowerPivot

One of those really great functions in the native Excel environment is OFFSET.  In combination with other functions like MATCH, it can generate a type of ‘double lookup’ that can be really helpful when creating dynamic reports.  For example, the table below describes a matrix defining a coordinate of units for each combination of ‘Band’ and ‘Group’.

image

We want to use this table as a lookup reference when generating matches for the following values:

image

Take for example, the first row.   We first look at the ‘Band’ value which is B in this case and find it in our lookup table.  Then we go down the rows and, using an approximate match logic, find the group for which the value of 120 belongs to.   That would be Group 3 (which has unit values going from 70 to 129).

In Excel, we can use a formula like the one below to generate this ‘double lookup’ calculation:

VLOOKUP(A11,

OFFSET($A$1,1,MATCH(B11,$A$1:$D$1)-1,4, 4),(4 – MATCH(B11,$A$1:$C$1)) + 1,TRUE)

image

This formula finds the appropriate columns first (column B) and then uses an approximate match VLOOKUP to find the row (#4) and column ( D ) to retrieve the right group.  The final result is:

image

Pretty powerful.

If we are analyzing data with PowerPivot, we want to be able to generate the same effect within the PowerPivot itself without having to drop all data into Excel and do the OFFSET lookup there. This could be because we are scanning millions of rows or simply because we want to keep all calculations in the same place (the PowerPivot database)

In order to do this we need to first understand one thing: we must unpivot the lookup table so we can successfully generate the lookup in DAX.   Unlike what you can do with the OFFSET function – in which you can specify a column / row coordinate dynamically – in DAX we will be only filtering rows.

If our data is in a database like SQL Server, we can do this by executing the following T-SQL expression:

SELECT Band, [Group], Units FROM

    (SELECT BandA, BandB, BandC, [Group] FROM Table1) t

UNPIVOT

    (Units FOR Band IN (BandA, BandB, BandC)) as unpvt

ORDER BY Band, [Group], Units

The result will be the following:

image

In the case you are relying on data which resides only on your spreadsheet, you can use the ‘Multiple Consolidation Ranges’ wizard to generate the same effect.  This wizard is found by going to File –> Options –> Customize Ribbon.  On the ‘Choose Commands From’ dropdown, select ‘All Commands’ and then find and add to your ribbon the ‘Pivot Table and Pivot Chart Wizard’.  Finally, click on the wizard button we just added to the ribbon.  You should see the following window:

image

Notice that I moved the ‘Group’ column to the left. This is because the wizard expects the categories on the first column. Select ‘Multiple consolidation ranges’ and follow the instructions.  Once complete, you will get a pivot table using the data from our lookup table. Move the ‘Row’ and ‘Column’ fields to the row labels.  Finally, flatten the table out (using the PivotTable Options context menu) and you will end up with the following result:

image

Now you are ready to import the unpivoted lookup data into the PowerPivot window.  Be sure to change the column names to something more meaningful than ‘Column’, ‘Row’ and ‘Total’.  In my case, I have called them ‘Band’, Group’ and ‘Units’.

Finally, import the items that we are going to be using against the lookup (2nd screenshot on this blog entry) – I called it ‘Data’. Notice that this is yet another scenario in which there is no actual relationships between the tables in the data model. All the work will be done using a DAX expression:

DoubleLookup:=IF ( HASONEVALUE( Data[Band] )  ,

CALCULATE(

        LASTNONBLANK( Lookup[Group], 1) ,                   

                       FILTER( Lookup,

                        Lookup[Band] = VALUES(Data[Band] ) &&                     

                        Lookup[Units] <  VALUES( Data[Units] )

                    ) 

            )

)

Notice the expression is very readable and shows how a fairly simple DAX formula can have a powerful dynamic effect. LASTNONBLANK selects the last [Group] value on a filter context for the current Band where the lookup units are less than the current unit value. The output matches the more complex expression we defined using OFFSET, MATCH and VLOOKUP:

image

The formula works even if we have repeated ‘Band’ values.  In that case, I suggest adding an identity column to the table in order to break down the resulting value accordingly –

image

  Output:

image

This post has been featured on Excel’s MVP Bill Jelen “VLOOKUP Week” (March 25 – 31, 2012 ) which groups podcasts and blogs describing the use of this and other interesting Excel-related lookup functions and techniques. For more information, visit Microsoft’s Excel team blog @ http://tinyurl.com/c2aw89y , or directly access VLOOKUP’s week website @ http://vlookupweek.wordpress.com/

Advertisements

USERELATIONSHIP and direction of context propagation

My friend Hrvoje Piasevoli ( twitter ) pointed out to me recently that when using USERELATIONSHIP in a calculated column, the output was not the expected value based on the selected relationship.   I have used USERELATIONSHIP successfully in DAX measures a good number of times so I was surprised to hear the news.

In my effort to replicate the issue so I could better understand it, I used a calculated column on a BISM Tabular model based on the Adventure Works database.  The expression, on the FactInternetSales table, was as follows:

CALCULATE( VALUES(DimDate[DateKey] ) ,

USERELATIONSHIP( FactInternetSales[ShipDateKey], DimDate[DateKey] )

)

When using these tables in a Tabular model, the default ‘active’ relationship on this table is based on Order Date:

image

Clearly, the output I expected was the [DateKey] value associated with the current row through the relationship defined using the [ShipDateKey]. In other words, I expected the output to be the same as what the row showed for Ship Date.

Interestingly, the value I got back completely ignored the USERELATIONSHIP and was based on the active relationship using Order Date:

image

So is it that USERELATIONSHIPS are only supported on Measures and not Calculated Columns?  Not quite.  The problem is not with the type of DAX expression but with the direction of context propagation.  As an example, lets examine the following DAX query on the same Tabular model (click the image to expand it):

image

ADDCOLUMNS is the equivalent expression to adding a Calculated Column in the Grid pane of BISM Tabular or PowerPivot. And as you can see, neither of the expressions using USERELATIONSHIP returned the expected value.  The only expression that returned the correct Ship Date was the one using a calculated relationship based on FILTER, which did not leverage the model’s defined relationships.

However, if we reverse the expression and evaluate it from the lookup table (DimDate), we get:

image

In other words, from the lookup table to the base table, USERELATIONSHIP works just fine.  The issue only emerges when invoking the function from the base table and making a call into the lookup table.

In DAX, context is automatically propagated from the lookup table to the base table.  In other words, when you assign filter context on a lookup table, only the corresponding base table rows are activated and taken into account when resolving the expression.  I described this on a prior blog entry related to Tabular relationships (Point # 2 here.)  Propagating context in the other direction does not happen automatically, and one must use functions like CALCULATE or RELATEDTABLE to generate this effect.

In this case, though, even using CALCULATE is not sufficient to propagate the context appropriately and one must use a calculated relationship.  In general terms, this should not be very problematic. However, the expression cannot benefit from the superior query performance delivered by leveraging the model’s defined relationships.

So in summary, USERELATIONSHIP can be used in both, Measures and Calculated Columns.  However – be mindful of the relationship direction as context will only respect USERELATIONSHIP when the filter is generated from the lookup table. Use a calculated relationship if you must place the filter on the base table.

Detecting Total, Subtotal and Hierarchy Levels in PowerPivot

Recently I participated on a thread on the PowerPivot MSDN forums in which it was asked how to detect the level you are on in the pivot table hierarchy in order to conditionally handle the output of a measure. As I think this is a fairly simple topic but one that can be a bit confusing, I hope this post can help others dealing with the very same issue. Thanks to SQL Server MVP Frederik Vandeputte (twitter) for the topic! 🙂

In general, it has become a standard practice to use the following expression if you want to avoid computing it at the ‘Grand Total’ level:

IF( COUNTROWS( VALUES( Table1[Column] ) ) = 1, <expression> )

For example, if you have a measure that should not be aggregated as in the table below:

image

You could create the following DAX measure to display the value for each product, and avoid any output at the total level:

CalcAtProductLevelOnly:=
IF(
COUNTROWS(VALUES(Table2[Products]))=1,
VALUES(Table2[Size])
)

image

As you can see, the pivot table ignores the grand total even if it was configured to show one.  With PowerPivot 2.0 (Currently in RC0), you can use the HASONEVALUE function to make the expression even more readable:

CalcAtProductLevelOnly:=
IF(
HASONEVALUE( Table2[Products] ) ,
VALUES( Table2[Size] )
)

The output will be the same in both expressions.   The idea here is that we are checking the current filter context for how many Products are currently active in the cell that is evaluating the expression.  It will only yield an output for those pivot table cells that have one and only one Product in context.   From here, you can see why the Grand Total is ignored:  it deals with more than one product.

This is all well known.  The question is, can we apply the same technique to detect levels of a hierarchy?  Here you must remember that hierarchies are ‘display-only’ groupings in PowerPivot (and SSAS BISM Tabular). Even with the first release of PowerPivot, row labels are grouped in a ‘hierarchical’ way based on the order of attributes you place on the row labels:

image

If we follow the logic we used from Grand Totals, we can infer that a subcategory can have more than one product. In a similar way, we are aware that a category can have more than one subcategory.   Here is the dataset:

image

Following this logic, we can create a DAX measure to help us determine which level we are dealing with – product, subcategory or category:

IF(
COUNTROWS(VALUES(Table1[Products]))=1,”Product”,
IF(
COUNTROWS(VALUES(Table1[Subcategories])) = 1,”SubCategory”,”Category”
)
)

Unfortunately, this doesn’t give the correct result:

image

For Category A / SubCategory A the calculation works without a problem as there are two products (which allows us to detect the SubCategory level) and two SubCategories (which allow us to detect the Category level).

However, for Category B there is only one Product.  As the expression detects a level based on count of attribute values below the current location in the display hierarchy, it has no way to know which level is it on in this case.

So how can we fix it?  Fortunately, this is once again a situation in which the ability to manipulate context comes handy. I thank my good friend and colleague Jason Thomas (blog | twitter ) who helped me think of a way to deal with these kind of scenarios.

If we use the following DAX measure, we get the correct values:

IF(
CALCULATE(COUNTROWS(VALUES(Table1[Products])),
ALLEXCEPT(Table1, Table1[Products])) = 1,
“Product”,
IF(
CALCULATE(COUNTROWS(VALUES(Table1[Subcategories])),
ALLEXCEPT(Table1, Table1[Subcategories])) = 1,
“SubCategory”,
“Category”
)
)

image

The inner CALCULATE clears the filter context on all columns except subcategory.  As such, it doesn’t limit itself to count the number of subcategories for the current category; instead it counts all subcategories across all categories.   In this way, even if a category only has one subcategory (like CategoryB in the example above) it won’t be a problem as the count will rely on the subcategories across the entire table.  The same logic is used by the outer CALCULATE when determining if the current level is as product or not.

The only situation in which this wont be the case would be, for example, if there is only row in the table showing one product, one subcategory and one category.  It is unlikely this type of situation will affect most reporting scenarios, but you should be aware of this limitation.

Also, is it interesting to notice that the Grand Total is detected as being on the ‘Category’ level.  This, again, shouldn’t be a problem as normally you would want your most encompassing level in the Total row.  And here once again you can change the behavior by combining the expression used with the technique we discussed at the beginning of this blog entry.

Finally, notice that by detecting the current level we will be able to assign a specific subtotal or total calculation which depends on the level which is active in the filter context.  For example, we can add another DAX measure like the one below which leverages the expression we used to generate conditional outputs:

ConditionalLevelOutput:=
IF([CurrentLevel] <> “Product”,
SUM( Table1[Amount] ) * 0.9,
SUM(Table1[Amount])
)

On it, the category and subcategory levels will be discounted by 10% due to specific reporting conditions (perhaps a wholesale promotion, etc.)

UPDATE: Alberto Ferrari pointed out to the ability to use a simpler syntax in DAX 2.0 leveraging the ISFILTERED function. 

Interestingly, when using this new function in DAX 2.0 we avoid the issue described earlier in this post in which a table with a single row will give the incorrect level output.  First, lets review the syntax using ISFILTERED:

IF(
    ISFILTERED(Table2[Products]),”Product”, IF(
        ISFILTERED(Table2[Subcategories]), “Subcategory”, IF(
            ISFILTERED(Table2[Categories]), “Category”, “Category” )
        )
    )

As you can see, the formula is much readable now. Now, lets see what happens when we compare both expressions (the one using ALLEXCEPT and the one using ISFILTERED):

image

Using ISFILTERED yields the expected output in a table with one row whereas the expression using ALLEXCEPT doesn’t.   This is because we are evaluating the current level in a different way:  ISFILTERED is able to detect direct filters which will yield a more accurate result.

Consequently, if you have the opportunity to use the latest version of PowerPivot (Currently in RC0), this last calculation is definitely the way to go.  Thanks Alberto for the suggestion.

Simulating an "approximate match" VLOOKUP in PowerPivot

It is often said that relationships in PowerPivot work in a similar way to VLOOKUPs.  In reality, this is only partially true. If we examine the claim a bit closer we realize that Excel’s VLOOKUP function has a parameter in which it is possible to use an approximation when matching values against the lookup table:

image

Regular PowerPivot relationships emulate the ‘exact match’ option of the Excel VLOOKUP function And although PowerPivot relationships are a lot more powerful, being able to retrieve lookup values over huge tables at amazing speed, we are still left with the question: how can we achieve same type of behavior used by the ‘approximate matches’ option of the VLOOKUP function? Lets walk through the process.

Imagine we have the following tables:

image

As you can see, the table at the left is the one that will be executing the lookup call against that table on the right.   The first thing we need to ensure is that there are no relationships among those tables in the PowerPivot model.   That’s right, we will be generating the entire lookup as a DAX calculation.  In other words, we will be generating a calculated relationship.

Lets first examine what Excel yields when using the VLOOKUP function with approximate matches:

=VLOOKUP([@TableA],Table2[TableB],1,TRUE)

image

If an exact match is not found, VLOOKUP returns the next largest value that is less than the lookup value.

In PowerPivot, we would generate the same effect with this calculation:

If (
Hasonevalue ( TableA ),
Calculate (
Max ( TableB[TableB] ),
Filter ( TableB, TableB[TableB] <= Values ( TableA[TableA] ) )
)
)

The first condition (HASONEVALUE) forces the computation to be executed only when there is one and only one value on TableA (in other words, it avoids calculating for the grand total row if there was one as this would be meaningless to do so).

The actual computation is done via the CALCULATE expression.  As you can see, it takes the maximum value of TableB in a context modified by the second parameter.  As there is no relationship between TableA and TableB, context is not propagated and the calculation uses all rows in TableB.   The FILTER function then filters the table and returns only those rows which are less than or equal to the current value in TableA.

The result is this:

image

As you can see, we were able to match the output we got with VLOOKUP.   But what will happen if we use are trying to match text values instead of numbers?

image

We get the following error:

Calculation error in measure ‘TableA'[Measure 3]: The function MAX takes an argument that evaluates to numbers or dates and cannot work with values of type String.

A solution is found by using the LASTNONBLANK function:

If (
Hasonevalue( TableA_Text ),
Calculate (
LASTNONBLANK ( TableB_Text[TableB], 1 ),
Filter ( TableB_Text, TableB_Text[TableB] <= Values ( TableA_Text[TableA] ) )
)
)

image

LASTNONBLANK requires two parameters, and we can use the first one for the column for which we want to get the last value in the lookup table.  The second parameter must be an expression that evaluates for blanks;  by using ‘1’  here we retrieve all rows on that table but that is OK as the table is being filtered by the second parameter of the CALCULATE function.  In other words we truly get the last value in context which is exactly what we need.

This last calculation can be used with text values, but it will also work with numbers or dates, and it is a great pattern when simulating VLOOKUPs with approximate matches.

image

This post has been featured on Excel’s MVP Bill Jelen “VLOOKUP Week” (March 25  – 31, 2012 ) which groups podcasts and blogs describing the use of this and other interesting Excel-related lookup functions and techniques.  For more information, visit Microsoft’s Excel team blog @ http://tinyurl.com/c2aw89y , or directly access VLOOKUP’s week website @ http://vlookupweek.wordpress.com/