Calculating Median in PowerPivot / DAX
September 13, 2011 3 Comments
Although PowerPivot – in its currently implementation – lacks a dedicated ‘median’ function, it is possible to recreate it using existing functionality. There is a great blog entry here by Marco Russo where he describes how to calculate median over the values of a PowerPivot column.
His calculation is as follows:
MINX( FILTER( VALUES( People[Age] ),
CALCULATE( COUNTROWS( People ),
People[Age] <= EARLIER( People[Age] ) )
> COUNTROWS( People ) / 2 ),
Recently I tried to adapt this calculation to a PowerPivot application I was working on. I ran into two issues:
1) The calculation can be applied over a regular column on a table in the PowerPivot window or a DAX calculated column. However, what if you must calculated the median over the output of a DAX measure instead of a regular or calculated column? In this case, you won’t be able to use the DAX functions EARLIER and VALUES, as these only take columns as parameters. Due to this, the calculation above won’t work
2) I realized there should be a separate logic depending on whether the set of values is even or odd. For even sets, there is no discrete middle value and one must calculate an average of both values in the middle.
Fortunately, it is fairly easy to deal with both issues. With Alberto Ferrari’s help, I implemented the calculation below. There are three steps for it, the first one which implies the creation of an index, which we will use to select the values in the middle. I then created an alternate execution path for even sets, which simply averages the two values in the middle.
To see how it works, copy and past the data below and import it into PowerPivot (as a linked table):
Create a measure. This is to test that we can calculate a median over a column or measure using the same method.
[Test Measure] =SUM([Amount]) / 2
Create another measure which will serve to rank the months by the value in [Test Measure]. We will use this ranking value to determine which value is the median (in case of odd sets), or to average the two values in the middle (in case of even sets)
[Index] =IF (
COUNTROWS ( VALUES ( Table1[Month] ) ) = 1,
ALL ( Table1[Month] ),
[Test Measure] <= CALCULATE (
[Test Measure], Table1[Month] = VALUES ( Table1[Month] )
Create median expression, reflecting the logic discussed above:
[Median] =IF( COUNTROWS( VALUES( Table1[Month] ) ) > 1,
IF( MOD( COUNTROWS( ALL( Table1[Month] ) ) ,2 ) = 0,
CALCULATE( MAXX( Table1, [Test Measure] ),
[Index] <= ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
CALCULATE( MINX( Table1, [Test Measure] ),
[Index] > ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
) / 2
[Index] = ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
The calculation first uses the MOD function to determine if the set is even (which is the case in our example). It then gets the two values in the middle by:
1) Filtering the months for all that have an index of less or equal to the count of months divided by two. Take the maximum value of [Test Measure] for this set.
2) Filter the months for all that have an index greater than the count of months divided by two. Take the minimum value of [Test Measure] for this set
3) add these values and divide by two (simple average)
The calculation is correct and matches Excel’s regular median function output. One word of caution: this expression relies heavily on row context which can greatly impact performance over big datasets. In the following blog post, I will talk about a potential way around this issue in which we can arrive to correct the Median output by leveraging PowerPivot’s intrinsic compatibility with MDX.