LastNonEmpty in Tabular mode: Part 2, Last Ever Non Empty in DAX

One of the most useful inventory calculations is what Chris Webb referred to as ‘Last Ever Non Empty’.  This calculation differs from the regular MDX Last Non Empty aggregation in that it will yield, for each dimension member at the granularity specified, the fact value on the last date for which there was data even if that date happened before the selected time frame.

image

Jason Thomas (blog | twitter) and I have recently been working on classifying a number of different types of Last Non Empty scenarios that people may encounter when working on inventory control analytics.  Though the examples we use are for inventory reporting, these patterns are clearly applicable to reports beyond this subject area and can be applied anywhere there is a need for Last Non Empty aggregations.

We have come up with the following categorizations:

image

This is part 2 of this series, in which I will explain in more detail a DAX pattern to tackle Last Ever Non Empty scenarios in BISM Tabular.  This is one type of aggregation that is not easy to handle in Multidimensional, so its worth exploring if we are able to achieve a better performing calculation with DAX. If you are interested in the first two Last Non Empty scenarios, please refer to part I of the series available here.

Handling these calculations require leveraging another DAX pattern which we blogged about before, one that allows us to explicitly execute a Tabular calculation at an explicit level of granularity.  These articles can be found here and here.

Lets approach the first scenario of Last Ever Non Empty calculations.  In order to explain this better, lets use an example from Contoso’s database.  Suppose we want to find out what the last amount of inventory in stock across all stores is.  In this case, we want to analyze stores by grouping them by the product manufacturers we distribute. In other words, we want to find out what the last date with inventory data was for each manufacturer, and then add the inventory quantity in stock on that day across all stores that distribute products from the manufacturer in question. In order to compute this, we can start with a simple pivot table like the following:

image

I have selected only a few zip codes (10018 and 10093) to narrow the analysis.   We first want to identify what the latest date (with data) was for each manufacturer.  Using just a CALCULATE function coupled with LASTDATE won’t do the trick.  For example, the following expression will return the last date of the date dimension:

Measure 1:=CALCULATE( VALUES( DimDate[Datekey] ) , LASTDATE( DimDate[Datekey] ) )

image

 

If we tried using the date on the fact table instead, we would still get incorrect results as the last date with activity for those stores happened before the selected year of 2011:

Measure 1:=CALCULATE( VALUES( FactInventory[Datekey] ) , LASTDATE( FactInventory[Datekey] ) )

image

Instead, what we want is to iterate though each store (in effect, changing the granularity of the calculation) and retrieve the last date with activity which is less than or equal the year we selected. In order to do this, we must define an iterator (MAXX, for example) and then use CALCULATE to retrieve the last day with fact data.  The calculation for this is:

LastEverNonEmptyDate:=IF(HASONEVALUE(DimStore[StoreName]) , MAXX(
        SUMMARIZE
        (
        DimStore,
        DimStore[StoreName]
        ,"LastEverNonEmptyDate", CALCULATE( MAX( FactInventory[DateKey] )
                                , DimDate[CalendarYear] <= VALUES( DimDate[CalendarYear])
        ) )
    , [LastEverNonEmptyDate] ) )

In reality, the use of MAXX is not because we want the MAX value, but because we want to iterate through the in-memory table defined by the SUMMARIZE function.   If we used MINXX instead of MAXX, we would still get the same output:

image

With this information, we can now tackle scenario # 1 of the Last Ever Non Empty pattern.   We want to get the value for each manufacturer on the last date in which it had activity (last non empty date across all stores).  By changing the grain of the calculation to iterate over manufacturers instead of stores, we can get that date and retrieve the quantity of inventory in stock on that date:

LastEverNonEmpty-AllStores:=SUMX(
    ADDCOLUMNS(
        ADDCOLUMNS(
        VALUES(DimProduct[Manufacturer])
            ,"LastEverNonEmptyDate",
                CALCULATE( MAX( FactInventory[DateKey] )
                , DimDate[CalendarYear] <= VALUES( DimDate[CalendarYear])
                )
            )
        , "Quantity", CALCULATE ( SUM( FactInventory[OnHandQuantity] )
                        , FILTER(ALL(Dimdate), DimDate[DateKey] = [LastEverNonEmptyDate] ) )
        )
    , [Quantity] )

The inner ADDCOLUMNS function is used to retrieve the last ever non empty date for each manufacturer (up until the selected year).  The outer ADDCOLUMNS function allows us to use this date and dynamically filter the fact table as we iterate across all manufacturers.   In the pivot table, at the store level, we are still able to see what the value was for that store on the last ever non empty date.   However, notice that at the manufacturer total level, only the stores with activity on the last ever non empty date (for the current store) will be aggregated:

image

This calculation performs rather well, and in a fact table of a little more than 8 million rows it is able to generate an output in approximately 2 seconds.

Finally, we want to tackle the more complex scenario in Last Ever Non Empty patterns.  This one requires that the manufacturer total takes into account the last ever non empty value for each store.  In other words, in the example above, the total should take all rows into account, not just those for 12/26/2009.  In DAX, we are able to change the granularity of the calculation to now be a composite of manufacturer and store.   The final expression is:

LastEverNonEmpty-EachStore:=SUMX(
    ADDCOLUMNS(       
        ADDCOLUMNS(   
            CROSSJOIN(
                VALUES(DimStore[StoreName]),
                VALUES(DimProduct[Manufacturer])
            )
    ,"LastEverNonEmptyDate", CALCULATE( MAX( FactInventory[DateKey] )
                                        , DimDate[CalendarYear] <= VALUES( DimDate[CalendarYear]) )
        )   
    , "Quantity", CALCULATE ( SUM( FactInventory[OnHandQuantity] )
                        , FILTER(ALL(Dimdate), DimDate[DateKey] = [LastEverNonEmptyDate] ) )
    )
, [Quantity] )

Here, we are using a nested expression to calculate, first, the crossjoin of store names and manufacturers.  Then, we get the last ever non empty date for each combination.  Finally we get the total quantity for each pair, as we iterate over that table using SUMX.  The result is correct, even at the grand total level and even thought the computation is more complex it executes again in about 2 seconds in the Contoso 8 million row fact table; in other words, I couldn’t identify – during my tests – any performance overhead compared to the prior calculation.  Getting this kind of performance on a last ever non empty calculation in Multidimensional is hard or even impossible to achieve, and it would require tweaks to the ETL which are not always ideal.

image

Notice how the highlighted row is now included on the manufacturer total.   Explicitly specifying the grain of the DAX calculation to use both, manufacturer and store, is necessary to iterate properly at the grand total level as many stores can be classified under multiple manufacturers and simply getting the last ever non empty date at the store level will incorrectly represent the amounts as the context of the calculation would only use the last ever non empty date of each store across all manufacturers which is clearly incorrect.

Also note that when leveraging the iteration used in the expression, we are looping over the values on the lookup table and not the base table.  This is an important point as the date lookup is filtering the fact table, but the product and store lookup tables are not filtered as context is unable to propagate to them by default.  In this way, we are able to grab stores that had activity even before the selected calendar value (a year, in this case).

These examples show the versatility of Tabular and the DAX language.  By respecting the rules of context propagation, we can develop advanced calculations that are not built into  the model.

About these ads

2 Responses to LastNonEmpty in Tabular mode: Part 2, Last Ever Non Empty in DAX

  1. Pingback: A Different Approach To Last-Ever Non-Empty in DAX « Chris Webb's BI Blog

  2. Pingback: A Different Approach To Last-Ever Non-Empty in DAX - Atlas Analytics Inc.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 67 other followers

%d bloggers like this: