Detecting Total, Subtotal and Hierarchy Levels in PowerPivot

Recently I participated on a thread on the PowerPivot MSDN forums in which it was asked how to detect the level you are on in the pivot table hierarchy in order to conditionally handle the output of a measure. As I think this is a fairly simple topic but one that can be a bit confusing, I hope this post can help others dealing with the very same issue. Thanks to SQL Server MVP Frederik Vandeputte (twitter) for the topic! 🙂

In general, it has become a standard practice to use the following expression if you want to avoid computing it at the ‘Grand Total’ level:

IF( COUNTROWS( VALUES( Table1[Column] ) ) = 1, <expression> )

For example, if you have a measure that should not be aggregated as in the table below:

image

You could create the following DAX measure to display the value for each product, and avoid any output at the total level:

CalcAtProductLevelOnly:=
IF(
COUNTROWS(VALUES(Table2[Products]))=1,
VALUES(Table2[Size])
)

image

As you can see, the pivot table ignores the grand total even if it was configured to show one.  With PowerPivot 2.0 (Currently in RC0), you can use the HASONEVALUE function to make the expression even more readable:

CalcAtProductLevelOnly:=
IF(
HASONEVALUE( Table2[Products] ) ,
VALUES( Table2[Size] )
)

The output will be the same in both expressions.   The idea here is that we are checking the current filter context for how many Products are currently active in the cell that is evaluating the expression.  It will only yield an output for those pivot table cells that have one and only one Product in context.   From here, you can see why the Grand Total is ignored:  it deals with more than one product.

This is all well known.  The question is, can we apply the same technique to detect levels of a hierarchy?  Here you must remember that hierarchies are ‘display-only’ groupings in PowerPivot (and SSAS BISM Tabular). Even with the first release of PowerPivot, row labels are grouped in a ‘hierarchical’ way based on the order of attributes you place on the row labels:

image

If we follow the logic we used from Grand Totals, we can infer that a subcategory can have more than one product. In a similar way, we are aware that a category can have more than one subcategory.   Here is the dataset:

image

Following this logic, we can create a DAX measure to help us determine which level we are dealing with – product, subcategory or category:

IF(
COUNTROWS(VALUES(Table1[Products]))=1,”Product”,
IF(
COUNTROWS(VALUES(Table1[Subcategories])) = 1,”SubCategory”,”Category”
)
)

Unfortunately, this doesn’t give the correct result:

image

For Category A / SubCategory A the calculation works without a problem as there are two products (which allows us to detect the SubCategory level) and two SubCategories (which allow us to detect the Category level).

However, for Category B there is only one Product.  As the expression detects a level based on count of attribute values below the current location in the display hierarchy, it has no way to know which level is it on in this case.

So how can we fix it?  Fortunately, this is once again a situation in which the ability to manipulate context comes handy. I thank my good friend and colleague Jason Thomas (blog | twitter ) who helped me think of a way to deal with these kind of scenarios.

If we use the following DAX measure, we get the correct values:

IF(
CALCULATE(COUNTROWS(VALUES(Table1[Products])),
ALLEXCEPT(Table1, Table1[Products])) = 1,
“Product”,
IF(
CALCULATE(COUNTROWS(VALUES(Table1[Subcategories])),
ALLEXCEPT(Table1, Table1[Subcategories])) = 1,
“SubCategory”,
“Category”
)
)

image

The inner CALCULATE clears the filter context on all columns except subcategory.  As such, it doesn’t limit itself to count the number of subcategories for the current category; instead it counts all subcategories across all categories.   In this way, even if a category only has one subcategory (like CategoryB in the example above) it won’t be a problem as the count will rely on the subcategories across the entire table.  The same logic is used by the outer CALCULATE when determining if the current level is as product or not.

The only situation in which this wont be the case would be, for example, if there is only row in the table showing one product, one subcategory and one category.  It is unlikely this type of situation will affect most reporting scenarios, but you should be aware of this limitation.

Also, is it interesting to notice that the Grand Total is detected as being on the ‘Category’ level.  This, again, shouldn’t be a problem as normally you would want your most encompassing level in the Total row.  And here once again you can change the behavior by combining the expression used with the technique we discussed at the beginning of this blog entry.

Finally, notice that by detecting the current level we will be able to assign a specific subtotal or total calculation which depends on the level which is active in the filter context.  For example, we can add another DAX measure like the one below which leverages the expression we used to generate conditional outputs:

ConditionalLevelOutput:=
IF([CurrentLevel] <> “Product”,
SUM( Table1[Amount] ) * 0.9,
SUM(Table1[Amount])
)

On it, the category and subcategory levels will be discounted by 10% due to specific reporting conditions (perhaps a wholesale promotion, etc.)

UPDATE: Alberto Ferrari pointed out to the ability to use a simpler syntax in DAX 2.0 leveraging the ISFILTERED function. 

Interestingly, when using this new function in DAX 2.0 we avoid the issue described earlier in this post in which a table with a single row will give the incorrect level output.  First, lets review the syntax using ISFILTERED:

IF(
    ISFILTERED(Table2[Products]),”Product”, IF(
        ISFILTERED(Table2[Subcategories]), “Subcategory”, IF(
            ISFILTERED(Table2[Categories]), “Category”, “Category” )
        )
    )

As you can see, the formula is much readable now. Now, lets see what happens when we compare both expressions (the one using ALLEXCEPT and the one using ISFILTERED):

image

Using ISFILTERED yields the expected output in a table with one row whereas the expression using ALLEXCEPT doesn’t.   This is because we are evaluating the current level in a different way:  ISFILTERED is able to detect direct filters which will yield a more accurate result.

Consequently, if you have the opportunity to use the latest version of PowerPivot (Currently in RC0), this last calculation is definitely the way to go.  Thanks Alberto for the suggestion.

Advertisements

Simulating an "approximate match" VLOOKUP in PowerPivot

It is often said that relationships in PowerPivot work in a similar way to VLOOKUPs.  In reality, this is only partially true. If we examine the claim a bit closer we realize that Excel’s VLOOKUP function has a parameter in which it is possible to use an approximation when matching values against the lookup table:

image

Regular PowerPivot relationships emulate the ‘exact match’ option of the Excel VLOOKUP function And although PowerPivot relationships are a lot more powerful, being able to retrieve lookup values over huge tables at amazing speed, we are still left with the question: how can we achieve same type of behavior used by the ‘approximate matches’ option of the VLOOKUP function? Lets walk through the process.

Imagine we have the following tables:

image

As you can see, the table at the left is the one that will be executing the lookup call against that table on the right.   The first thing we need to ensure is that there are no relationships among those tables in the PowerPivot model.   That’s right, we will be generating the entire lookup as a DAX calculation.  In other words, we will be generating a calculated relationship.

Lets first examine what Excel yields when using the VLOOKUP function with approximate matches:

=VLOOKUP([@TableA],Table2[TableB],1,TRUE)

image

If an exact match is not found, VLOOKUP returns the next largest value that is less than the lookup value.

In PowerPivot, we would generate the same effect with this calculation:

If (
Hasonevalue ( TableA ),
Calculate (
Max ( TableB[TableB] ),
Filter ( TableB, TableB[TableB] <= Values ( TableA[TableA] ) )
)
)

The first condition (HASONEVALUE) forces the computation to be executed only when there is one and only one value on TableA (in other words, it avoids calculating for the grand total row if there was one as this would be meaningless to do so).

The actual computation is done via the CALCULATE expression.  As you can see, it takes the maximum value of TableB in a context modified by the second parameter.  As there is no relationship between TableA and TableB, context is not propagated and the calculation uses all rows in TableB.   The FILTER function then filters the table and returns only those rows which are less than or equal to the current value in TableA.

The result is this:

image

As you can see, we were able to match the output we got with VLOOKUP.   But what will happen if we use are trying to match text values instead of numbers?

image

We get the following error:

Calculation error in measure ‘TableA'[Measure 3]: The function MAX takes an argument that evaluates to numbers or dates and cannot work with values of type String.

A solution is found by using the LASTNONBLANK function:

If (
Hasonevalue( TableA_Text ),
Calculate (
LASTNONBLANK ( TableB_Text[TableB], 1 ),
Filter ( TableB_Text, TableB_Text[TableB] <= Values ( TableA_Text[TableA] ) )
)
)

image

LASTNONBLANK requires two parameters, and we can use the first one for the column for which we want to get the last value in the lookup table.  The second parameter must be an expression that evaluates for blanks;  by using ‘1’  here we retrieve all rows on that table but that is OK as the table is being filtered by the second parameter of the CALCULATE function.  In other words we truly get the last value in context which is exactly what we need.

This last calculation can be used with text values, but it will also work with numbers or dates, and it is a great pattern when simulating VLOOKUPs with approximate matches.

image

This post has been featured on Excel’s MVP Bill Jelen “VLOOKUP Week” (March 25  – 31, 2012 ) which groups podcasts and blogs describing the use of this and other interesting Excel-related lookup functions and techniques.  For more information, visit Microsoft’s Excel team blog @ http://tinyurl.com/c2aw89y , or directly access VLOOKUP’s week website @ http://vlookupweek.wordpress.com/

Can EARLIER be used in DAX measures?

EARLIER is a DAX function that acts exclusively on row context,  and its purpose is to create a reference to a column value on an outer loop of the evaluation of the expression.  It is commonly used in calculated columns during nested row by row iterations.

One of the things that I have been wondering for about a year now (an eternity in this digital era!) is if it would be possible to use it in a measure. An example would perhaps make this more clear:  suppose you want to use EARLIER to calculate running total values on a table like this one:

image

I know, there are special functions like TOTALYTD that can be used for this purpose, but it is always a good idea to explore other calculation options for the pure fun of learning (Yes, I’m a DAX geek).   If you use EARLIER in a DAX calculated column like the one defined below, the Tabular model is able to compute the running total without a problem:

SumX (
Filter ( Table1, Table1[date] <= Earlier ( Table1[date] ) ),
Table1[amount]
)

 

image

This is easy enough.  For each row iteration in the in-memory table, we create a new filter that grabs all the rows with a date prior or equal to the one currently in context in the outer loop  of the nested operation. In other words, the calculation defines a new loop – an inner loop – that can then invoke the value of  [date] on the outer loop and in this way dynamically filter the table passed to the SUMX function.  The effect is that for each row, we are able to add all the values up until the one currently in context.

Can you use the same formula as part of a measure?  You can’t.  At least not in the exact same way.  And here is where I have been stuck for a while now, until Marco Russo ( blog | twitter ) and Alberto Ferrari ( blog | twitter ) helped me with some extra guidance (Thanks so much guys!!!).  In addition the famous CALCULATE wall, there seemed to be an EARLIER wall as well Smile

So lets see what happens if you try to use the same expression as a measure.  You get this error:

image

“EARLIER/EARLIEST refer to an earlier row context which doesn’t exist.”  Hmm.   You might wonder, why?   Doesn’t SUMX generate an iteration, just like FILTER does?   And given the existence of these two loops, shouldn’t we assume a nested iteration has been created?

Not really.   Marco and Alberto explained to me the fact that FILTER evaluates first, and by the time SUMX’s row context is being evaluated FILTER has already returned a table, hence the two contexts do not interact with each other.  In that sense, EARLIER cannot be used.

If we really want to use EARLIER as part of a measure, we must introduce nested iterations of row context that interact with each other.  In other words, the evaluation must happen for each row for each row.

So how can we recreate the running total calculation as a measure and using EARLIER?  Here is what I came up with:

RunningTotal:= CALCULATE(
SUM( Table1[amount] ),
FILTER(  ALL(Table1) ,
SUMX( FILTER( Table1, EARLIER( Table1[date] ) <= Table1[date] ), Table1[amount] )
)
)

image

The first thing to notice is the SUMX expression is very similar to the one used in the calculated column, with one exception:   Instead of using the expression below as in the calculated column

Table1[date] <= EARLIER( Table1[date] )

I used the following one in the measure:

EARLIER( Table1[date] ) <= Table1[date] )

Lets see why.  I introduced an outer loop by wrapping SUMX with another FILTER function, which in turn has cleared the existing filter context by invoking the ALL function.   When this outer FILTER iterates across all values of the table, the inner FILTER uses a reference to the prior row context through the use of EARLIER and then selects only those rows in that have a date up to the one in current context in the outer loop.  How is it different from the use in the calculated column?  The difference is that as a measure we are really filtering the table passed to the FILTER in the outer loop, not the inner loop FILTER as in the calculated column expression.

The other thing to notice is that the inner FILTER didn’t remove any existing filters in the filter context.  By passing the table without the use of the ALL function, context has been refined rather than expanded. However, as the outer FILTER did in fact remove filters by invoking the ALL function. The resulting interaction of filters allows the calculation to evaluate for the current day in context but is still able to operate over a all rows existing in the table up to the current day.

And finally, we have the use of the SUMX aggregate.  It is very interesting that in this calculation that aggregate is irrelevant, in fact I would get the same running sum total if I where to use the following:

CALCULATE(
SUM( Table1[amount] ),
FILTER(  ALL(Table1) ,
         COUNTROWS(  FILTER( Table1, EARLIER( Table1[date] ) <= Table1[date] ) )   
)
)

(Notice the use of COUNTROWS instead of SUMX)

This is because the aggregate in this calculation is really just a placeholder, the actual evaluation context manipulation happens as a consequence of the intersection of filters on the two nested FILTER functions, which generates the required context to enable the running total to occur. And this is the filter context that the outermost function – CALCULATE – uses when evaluating the SUM of [amount]  (In other words, it is this aggregate the one that actually matters in this expression).

Interestingly, the grand total generated with the DAX measures is correct as shown in the screenshot below: it reflects the last running value.   On the other hand, if we do this as a calculated column and then place it in a pivot table, the grand total will incorrectly aggregate the values instead of showing the last one.

image

The use of nested row contexts is definitely an complex topic, probably not suited for self-service BI – but one that SSAS developers need to address in order to generate advanced calculations in the Tabular model.