DAX context propagation, inactive relationships and calculated columns

Marco Russo wrote recently an excellent blog post detailing the options you face when retrieving values from one table to another one via DAX’s USERELATIONSHIP function.   I found his post particularly interesting given I had written a post about a similar subject some time ago, in which a reached some conclusions that I want to revisit.

One of the solutions he describes (although he doesn’t recommend) uses the following formula signature:

CALCULATE (
    CALCULATE (
        VALUES ( <value on related column to retrieve> ), 
        <many-side table> 
    ),
    USERELATIONSHIP ( <one-side key>, <many-side key> ),
    ALL ( <one-side table> )
)

His recommended approach is instead to use LOOKUPVALUE.  However, I found the above calculation interesting and decided to break it down to understand it a bit better.   And here I ran areas of DAX that may be less than intuitive, yet worthy of a blog post.

 

Breaking it down

Here is the data model we use, which includes inactive relationships:

image

Step 1: To test the behavior of the inner most CALCULATE, is worth starting from a simpler starting point:

FactInternetSales[CalculatedColumn1]=
COUNTROWS( DimDate )

image

Clearly, the number reflects the total amount of records on the date table regardless of the row in context.  This in itself can be non intuitive for people starting to use DAX, as by default calculated columns evaluate under row context.  However, aggregate functions (SUM, MIN, COUNTROWS, etc) even in calculated columns will, by default, operate under filter context:  as there is no current filters in the filter context, we get the total count of rows across all date values.

Step 2: Context Transition

FactInternetSales[CalculatedColumn1]=
CALCULATE( COUNTROWS( DimDate ) )

image

Row context gets transformed into filter context and the only surviving row in the date table follows the active relationship on the data model.

Step 3: Context propagates for ALL rows in fact table at once

FactInternetSales[CalculatedColumn1]=
CALCULATE( COUNTROWS( DimDate ), FactInternetSales )

Internally, DAX generates an extended table using left outer join logic to propagate context. Given the setFilter parameter of the CALCULATE function operates on a filter context including all rows from the fact table, the resulting output is the total count of rows on the date table that have registered internet sales.  Non-intuitively, context transition does not occurs on a per row basis, even though we use CALCULATE.

image

Step 3a (test)-  Context propagates for one table, but still under active relationship:

FactInternetSales[CalculatedColumn1]=
CALCULATE( VALUES( DimDate[FullDateAlternateKey] ), FactInternetSales[ShipDateKey]  )

One may be inclined to think that using the key column of the inactive relationship as the setFilter parameter of CALCULATE will force the context propagation to retrieve the value used by the inactive relationship.  However, this is not the case and even though DAX retrieves only one value it is still the one associated with the active relationship given the internal extended table was already resolved (OrderDate)

image 

Step 3b (test)– CALCUALTETABLE returns standard context transition behavior

FactInternetSales[CalculatedColumn1]=
CALCULATE( COUNTROWS( DimDate ), CALCULATETABLE(FactInternetSales ) )

image

Wrapping CALCULATETABLE around the fact table used as setFilter parameter allows row context transition to happen again using the active relationship.

Step 4 – An outer CALCULATE propagates context using the inactive relationship, yet it collides with the current context transition using the active relationship

FactInternetSales[CalculatedColumn1]=
CALCULATE(
    CALCULATE(
        COUNTROWS( DimDate ), 
        FactInternetSales ),
    USERELATIONSHIP( DimDate[DateKey], FactInternetSales[ShipDateKey] )
)

image

If we take a peek at the DAX query plan, we understand the reason there is no output here.  Profiler trace shows 2 Vertipaq storage engine scans using different join conditions:

SELECT
[DimDate].[DateKey], […]
FROM [FactInternetSales]
    LEFT OUTER JOIN [DimDate] ON [FactInternetSales].[OrderDateKey]=[DimDate].[DateKey]

and

SELECT
[…]
FROM [FactInternetSales]
    LEFT OUTER JOIN [DimDate] ON [FactInternetSales].[ShipDateKey]=[DimDate].[DateKey]
WHERE
    ([DimDate].[DateKey], [DimDate].[FullDateAlternateKey], […]) IN {(20080421, 39559.000000, …[60398 total tuples, not all displayed]}

 

When the formula engine attempts to tie the results of the queries, the resulting context is empty as no date has the same shipdate and orderdate, making the two scan results incompatible.

Step 5 – Clearing the context on the active relationship key

FactInternetSales[CalculatedColumn1]=
CALCULATE(
    CALCULATE(
        COUNTROWS( DimDate ), 
        FactInternetSales ),
    USERELATIONSHIP( DimDate[DateKey], FactInternetSales[ShipDateKey] ) , ALL ( DimDate)
)

 

image

By the use of ALL, this time we force the query plan to remove the scan using a left outer join condition on the Order Date Key (the active relationship).  As the only other join condition that remains uses the Ship Date Key, this is the one that survives. Now we see the value returned is in fact using the inactive relationship (matching Ship Date).

 

Alternate Approaches for non-exact matches

As mentioned by Marco, use LOOKUPVALUE is your intent is to retrieve one value based on an exact matching condition.  In some case, though, you may want to use approximate matches – particularly when creating calculated columns as intermediate steps to measures, a common practice described on my prior post.  In this case, the approach using CALCULATE is still useful.

Two alternate solutions using a slightly more legible approach could be:

1) Using CALCULATETABLE

FactInternetSales[CalculatedColumn1]=
CALCULATE(
    VALUES(  DimDate[FullDateAlternateKey]  ),
    CALCULATETABLE( FactInternetSales,
                    USERELATIONSHIP( DimDate[DateKey], FactInternetSales[ShipDateKey] )  ,
                    ALL( DimDate ) ,
                    DimDate[CalendarYear] > 2007
                )
        )

By moving the USERELATIONSHIP to an inner CALCULATETABLE expression, it could be directly apparent we are manufacturing a table context of our own choosing and setting it as a filter parameter to the outer calculate. The third parameter of the CALCULATETABLE uses a condition not based on exact matches.

2) Using FILTER

FactInternetSales[CalculatedColumn1]=
CALCULATE(
    VALUES( DimDate[FullDateAlternateKey] ),
    FILTER(
        DimDate,
        FactInternetSales[ShipDateKey] = DimDate[DateKey] &&
        DimDate[CalendarYear] > 2007
    )
)

Interestingly, the calculation using FILTER appears to use a very efficient query plan with no extended tables generated and represented in the form Vertipaq SE scans using left outer joins. Given the setFilter parameter uses DimDate instead of FactInternetSales, there is no need to build extended tables to propagate context and the 3 scan are direct queries to tables in the data model:

SELECT
[FactInternetSales].[ShipDateKey]
FROM [FactInternetSales]

+

SELECT
[DimDate].[DateKey], [DimDate].[CalendarYear]
FROM [DimDate]

+

SELECT
[DimDate].[DateKey], [DimDate].[FullDateAlternateKey]
FROM [DimDate]
WHERE
    [DimDate].[DateKey] IN (20080105, 20080219, …[220 total values, not all displayed])

I am curious as to why the query plan did not register, on this last expression, the FILTER condition of ‘greater than’ (as it does when using the CALCULATETABLE expression).  However, due to a simpler query plan, I wonder if it could be possible the FILTER method would perform better than the CALCULATE method using USERELATIONSHIP even in cases in which only the exact match condition is needed.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: