Running Total Techniques in DAX

Running Total (whether sum, average or any other aggregate) is one of the most common calculations used when analyzing business as it offers insights in relation to data trends. This article will describe some of the most common scenarios when developing running total calculations.

Note: The following examples use data from the PowerPivot model ‘Contoso Sample DAX Formulas’ available for download here.

Scenario # 1:  Year to Date (as a measure)

Tabular models offers built in DAX calculations that serve as shortcut for generating running total across a time dimension.

For example, the following calculation computes the YTD running total for any given expression over any particular year:

TOTALYTD( SUM( FactSales[SalesAmount] ) , DimDate[Datekey] )

image

Notice that the running total aggregate automatically resets itself at the beginning of each year.  In reality, this calculation is a less verbose variation of the a fundamental expression using CALCULATE:

CALCULATE( SUM( FactSales[SalesAmount] ), DATESYTD( DimDate[Datekey] ) )

Using CALCULATE has the added advantage that you can further modify the current filters in context when generating the running total calculation.  For example, the following DAX measure will compute the running total including only for products which had a price of under $5.00.

CALCULATE(
                      SUM( FactSales[SalesAmount] ),
                      DATESYTD( DimDate[Datekey] ),
                      DimProduct[UnitPrice] < 5
)

image

Scenario # 2:  Year to Date (as a Calculated Column)

As a calculated column over a table displaying aggregate values at the monthly level, we cannot leverage either TOTALYTD nor DATESYTD functions directly (as these require a column of dates). However, we can leverage the EARLIER function when comparing the current row in context against all other rows in the table: (This table is called FactMonthlySales)

CALCULATE(
    SUM( FactMonthlySales[Sales Amount] ),
    FILTER(
        ALLEXCEPT(
            FactMonthlySales,
            FactMonthlySales[Calendar Year]
        ) ,
    FactMonthlySales[Month] <= EARLIER( FactMonthlySales[Month] )
    )
)

image

Notice the Month has been prefixed with the Month Number.  This is necessary as the expression relies on these numbers when performing the comparison necessary to generate the running total. 

Scenario # 3:  Running Total Since Inception (as a Measure)

Another common calculation is the running sum since the beginning of the dataset.  Here is the DAX formula:

CALCULATE(     SUM( FactSales[SalesAmount] ),
    FILTER(
        ALL( DimDate) ,
        DimDate[Datekey] <= MAX( DimDate[Datekey] )
    )
)

Notice we should use MAX instead of a more intuitive LASTEDATE function.  This is because in this context, LASTDATE would evaluate to the last date of the entire dataset, instead of the last date of any given month.

The running total output can then be sliced by different categories to generate interesting comparisons. Below is the running sales amount total comparing orders associated to promotions of any type versus orders with no promotion.

image

Notice how the gap widens over time:  our promotion dollars are having a tangible impact in sales.

Notice also that, unlike traditional Excel formulas, the calculation does not need to change in order to arrive to this output:  We can keep slicing data by different attributes and the calculation will automatically recalculate based on the new filters (by territory, by product type, etc). This is what makes DAX formulas portable – an property that is not new for Business Intelligence professionals but can be very enlightening to Excel pros.

Scenario # 4: Running Total for Selected Years (as a Measure)

Your analysis may require a running total but only for selected years (or any other user selected date attribute). In this case the expression would be a very similar one:

CALCULATE( SUM( FactSales[SalesAmount])  ,
             FILTER(
                ALLSELECTED( DimDate),
                DimDate[Datekey] <= MAX( DimDate[Datekey] )
            )
)

By using the ALLSELECTED() function, the computation now generates a running total only for the years defined by the user:

image

Scenario # 5:  Running Total across all years (as a Measure)

A useful analysis when comparing trends for multiple years is to benchmark any given year against the average across all years. If we try using built-in YTD functions, however, the grand total will only reflect the most current year which is not what we need in this case:

image

Instead, we need to create a DAX formula that will compute – for the grand total – at the year level of granularity.  Only then we can achieve a real running average across all years:

[Rolling Total across All Years]:
=CALCULATE(SUM( FactSales[SalesAmount] )  ,
FILTER(
    SUMMARIZE(
                    all(DimDate),
                    DimDate[Calendar Month],                    
                    "date anchor", date("1999",left(DimDate[Calendar Month],2), "01")
                ), countrows ( filter (
    SUMMARIZE(
                    DimDate,
                    DimDate[Calendar Month],                    
                    "date anchor", date("1999",left(DimDate[Calendar Month],2), "01")
                ), earlier([date anchor]) <= [date anchor] ) ) )
)

This DAX measure uses a calculated table summarized at the monthly level. It is then able to execute this calculation for all the years in context:

image

The expression relies on a pre-existing column (DimDate[Calendar Month) that appends the month number to the month name (like the one used in scenario # 2), with format: “01 – January”.  This can be easily achieved in a calculated column if not present in the underlying data source.

Finally, we must divide the grand total by the year count in order to generate a valid running average.  For denominator, we can use the following DAX expression:

[Year Count]:
=COUNTROWS(
    FILTER(
        CROSSJOIN(
            VALUES(DimDate[CalendarYear] ) ,
            VALUES( DimDate[CalendarMonthLabel] )
        ) ,
    SUM(FactSales[SalesAmount])  <> BLANK() 
    )
)

With this formula as denominator, we can now generate a grand total row we can use to benchmark all years against:

[Rolling Total]:=[Rolling Total Across All Years] / [Year Count]

image

 

This allows for comparison and identification of yearly trends against overall average values, helping to narrow time periods in which performance was particularly good or bad:

image

Running totals are always interesting metrics in analytical reporting, and DAX allows for great degree of manipulation of calculation context which enables very custom aggregates over a Time dimension.

Profiling data in PowerPivot: comparing two tables

On a prior post, I described the process in which one could use PowerPivot to find duplicates when exploring a dataset.  Another common scenario when first diving into new data is finding common records across two different tables.   

Level: Easy

Problem:  You have two datasets and want to compare attribute values across them.  In concrete terms, you want to understand if a particular key value is present in the other dataset or not.

Relevancy: Comparing datasets is a common issue when modeling solutions in PowerPivot or BISM Tabular, as you must determine which table can and should be used as a lookup in your model.  For lookup tables to be valid dimensions, they must contain all members across tables and only contain unique values on the key column. The process of comparing tables can assist you with selecting the appropriate table to use as lookup.

Scenario: The following tables where sent to you as lists of orders. You want to know which table includes all orders.

image

Solution

One important test is to verify how many unique records are on each table.  This will allow us to quickly understand if there are duplicates on either one.

For this, two DAX measures can be used (two for each table):

CountRowsTable1:=COUNTROWS(Table1)

DistinctCountOrdersTable1:=DISTINCTCOUNT(Table1[Orders])

CountRowsTable2:=COUNTROWS(Table2)

DistinctCountOrdersTable2:=DISTINCTCOUNT(Table2[Orders])

Here are the results:

Table1
image

 

Table2
image

 

We now see there appears to be a duplicate value on Table1. We can now use two calculated columns (one each table) to further investigate:

TABLE 1

=CALCULATE( COUNTROWS( Table2 ),

               FILTER( Table2, Table2[Orders] = Table1[Orders] ) )

image

 

TABLE 2

=CALCULATE( COUNTROWS( Table1 ),

               FILTER( Table1, Table1[Orders] = Table2[Orders] ) )

 image

This calculation counts the rows on the other table, but only those rows that survived the filter applied:  an equality condition based on the Order number in the current row context.

With these results we can now see all values on Table1 are represented on Table2.  However, looking at the results on Table2 we see one value is not represented on Table1: Order # 100.  Not only that, but Order # 116 appears twice.

If we switch back to Table1 in the PowerPivot window, we can confirm this by filtering the column:

image

With this knowledge we can now be certain to select Table2 as our lookup (dimension equivalent) in the model, as it is the only one that has a comprehensive list of all order values and only unique entries.

In case neither table had a full list or order values, other techniques like the ones described here, here or here could be the applied to the same data in order to develop a valid PowerPivot (or BISM Tabular) lookup table, and in turn, a maintainable model.