Approximate match with multiple criteria in DAX, part II

On the prior blog entry, I explained how to mix exact and approximate matches when retrieving a value form  a lookup table. A solution was explained which required a specific approach that must now be explained. This blog entry goes into more advanced topics related to formula context and some not very intuitive DAX behaviors.

As mentioned here, PowerPivot gives you the ability to mimic the behavior of VLOOKUP with approximate matches.  If you must extend the formula to include exact matches, the initial thought is to add more elements to the FILTER function in DAX, as highlighted below:

image

Notice the statement in purple uses a ‘less than or equal’ operator (unlike the exact matching criteria).  As such, DAX will retrieve a number of rows up until the value on the right (Orders[Date]).  What those rows selected, we attempt to select the most recent value using LASTNONBLANK.

Lets apply this expression to the following scenario:

image

We want to retrieve the Product Cost from the lookup table to the base table, and we want to do so by an exact match on Category and Product, and an approximate match on Date.

When we use the expression outlined above we get the following:

image

Clearly, the value for 1/3/2012 and 1/5/2012 is incorrect (Should have been 97 for both, as that was either an exact match or the most approximate value).

What went wrong?  The issue emerges from the intrinsic behavior of the LASTNONBLANK function.  The name may lead you to believe the value will be the last value that is not empty.  However, LASTNONBLANK has an interesting behavior when is acting over numeric values:  it will sort them from small to large, regardless of the actual row order in the table.  We can verify we are in fact retrieving valid rows from the lookup table up to the date in question, by changing the expression to the following:

=CALCULATE(
    COUNTROWS( Lookup ) ,
    FILTER(
        Lookup,
            Lookup[Category] = Orders[Category] &&
            Lookup[Product] = Orders[Product] &&
            Lookup[Date] <= Orders[Date] 
        ) 
    )

image

So what LASTNONBLANK is doing is finding the highest numeric value within the rows that were returned. This can be easily verified by creating a linked table in PowerPivot with the following data:

image

If we use a simple DAX measure with the following definition:

LASTNONBLANK( Table[LastNonEmptyIssue] , 1)

the output is 45.   As you can see, LASTNONBLANK is returning the highest value (MAX), not the last value in the table.  In our solution for approximate matches we must get the last numeric value available in the table, not the max value.

How to solve this? We must observe that LASTNONBLANK is actually able to get correct date value.

If you try the same DAX expression we used above in the following table…

image

the output will be 1/9/2012.

This is because the last value in a date column is normally the maximum date, as we typically assume the date values are ordered in descending way.

So instead of using FILTER to get all rows up until the order’s date, we will use an approximate match to retrieve the last date value and *then* use this single value to do a lookup based on that exact date.  Here is the final – valid – solution (as shown in the last post):

=CALCULATE( 
    LASTNONBLANK( Lookup[Cost] , 1 ),
     FILTER( 
        Lookup,
        Lookup[Date] = CALCULATE(
                                     LASTNONBLANK( Lookup[Date],1 ), 
                                     FILTER(
                                         Lookup,
                                         Lookup[Category] = Orders[Category] &&
                                         Lookup[Product] = Orders[Product] &&
                                         Lookup[Date] <= Orders[Date]
                                       )
                                  )
&& 
        Lookup[Category] = Orders[Category] &&
        Lookup[Product] = Orders[Product]
   )  
)

It is also possible to break down the “approximate date” part of the formula (in red above) into a separate calculated column and have the final calculation refer to that column instead.  This can improve future troubleshooting efforts. For example:

image

 

What if you need this expression as a measure?

The example above should be used when attempting approximate matches as DAX calculated columns.  If we use it as a measure, perhaps because the granularity of the calculation should be different from that of the in-memory table, there is one extra caution we must keep in mind in order to generate the correct output:  we cannot break down the calculation in two as I just suggested you do when acting on calculated columns.  Why?  Because a measure used inside of another measure is automatically surrounded by a CALCULATE statement which affects the output of the two-step expression.

[Approximate Match Date Measure]:=IF(HASONEVALUE(Orders ) , CALCULATE(
    LASTNONBLANK( Lookup[Date],1 ),
    FILTER(
        Lookup,
            Lookup[Category] = VALUES(Orders[Category]) &&
            Lookup[Product] = VALUES(Orders[Product]) &&
            Lookup[Date] <= VALUES(Orders[Date]) 
        ) 
    ) )

Notice that we are using the VALUES function when acting as a DAX measure, as we want to be sure to grab the current single value in context.  Next, we create another measure to call this one:

[RetrievedLookupCostValue_Wrong]:=IF(HASONEVALUE(Orders) , CALCULATE(
    LASTNONBLANK( Lookup[Cost] , 1 ),
    FILTER(
        Lookup,
            Lookup[Date] = [Approximate Match Date Measure] &&
            Lookup[Category] = VALUES(Orders[Category]) &&
            Lookup[Product] = VALUES(Orders[Product] )
        )
) )

The result is incorrectly showing the MAX cost instead of the most approximate cost value:

image

When DAX automatically surrounds a measure reference with the CALCULATE statement, it coverts row context into filter context.  In this specific scenario, what that means is that even though we are passing an exact date, CALCULATE grabs all rows that match the exact criteria (Product and Category) and for each of those rows retrieved it grabs the last date in context for each row.  As this concept is a bit difficult to understand at first, lets try it with another example:

image

In the table above we have a stream of dates (column 1) and two measures.  The [Last Date 1] represents the last date for the whole table, whereas [Last Date 2] represents the last date as of the current date in context (which happens to be the same date value on column 1). Which one is correct? It depends on the business question right?

  • If I am computing ratios over total (% of total), I want to see – in the denominator and for each row – the total value from the first date until the last date for the whole dataset (that would be [Last Date 1] ).
  • On the other hand, If I am calculating a running total, I want to see the value of 1/3/2012, for example, as the value represented *up until* 1/3/2012 and nothing after that (that would be [Last Date 2]).

In other words, each date can be the correct solution depending the business scenario.

When wrapping a measure inside of another measure, DAX assigns a CALCULATE function automatically which turns an output like the one in [Last Date 1] to that one of [Last Date 2].  Why?  because now each row will be taken into account in the computation of how the table is filtered (filter context), so each row’s date value is now filtering the selection of rows to be considered when retrieving the “last value that is not blank”.

The way around it is by computing the whole expression at once — only in that way we avoid the automatically assigned ‘CALCULATE’. So the final calculation, as a DAX measure, is:

IF(HASONEVALUE(Orders) , CALCULATE(
    LASTNONBLANK( Lookup[Cost], 1 ), 
    FILTER(
        Lookup,
            Lookup[Date] = CALCULATE(
                            LASTNONBLANK( Lookup[Date],1 ),
                                FILTER(
                                    Lookup,
                                        Lookup[Category] = VALUES(Orders[Category]) &&
                                        Lookup[Product] = VALUES(Orders[Product]) &&
                                        Lookup[Date] <= VALUES(Orders[Date]) 
                                    ) 
                            )  &&
            Lookup[Category] = VALUES(Orders[Category]) &&
            Lookup[Product] = VALUES(Orders[Product] )
        )
    )
)

Advertisements

6 Responses to Approximate match with multiple criteria in DAX, part II

  1. David Hager says:

    I guess that is why the LASTNONBLANK function is in the Date and Time Intelligence function category 🙂

    • javierguillen says:

      Hi David

      Interesting observation! Is interesting to notice also that in either case (whether a numeric value or a date one), the expression LASTNONBLANK( [column] , 1) will retrieve the maximum value in the table; Even If the most current date is not the last row in the table, it will still be retrieved using this function. Most of the times, we want the maximum date value (the last date) when analyzing data, but in some scenarios an analyst may actually want the last date value in the table even if this value is not the most current date.

      If that is the case, one must have to rely on another column and not the date itself: a row identifier would be needed.

      • David Hager says:

        And I can see a scenario where the EARLIER function can be used with the return value from LASTNONBLANK providing an “anchor”.

  2. As usual Javier, brilliant. Thanks for working through the semantics on this one. Seems like a fairly simple problem that pp should handle. Your dax does just that!

    Thanks again!

  3. Pingback: DAX context propagation, inactive relationships and calculated columns | Javier Guillén

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: