Calculating Median in PowerPivot / DAX

Although PowerPivot – in its currently implementation – lacks a dedicated ‘median’ function, it is possible to recreate it using existing functionality.   There is a great blog entry here by Marco Russo where he describes how to calculate median over the values of a PowerPivot column. 

His calculation is as follows:

MINX( FILTER( VALUES( People[Age] ),
              CALCULATE( COUNTROWS( People ),
                         People[Age] <= EARLIER( People[Age] ) ) 
              > COUNTROWS( People ) / 2 ),
      People[Age] )

Recently I tried to adapt this calculation to a PowerPivot application I was working on.  I ran into two issues:

1) The calculation can be applied over a regular column on a table in the PowerPivot window or a DAX calculated column. However, what if you must calculated the median over the output of a DAX measure instead of a regular or calculated column?  In this case, you won’t be able to use the DAX functions EARLIER and VALUES, as these only take columns as parameters.  Due to this, the calculation above won’t work

2) I realized there should be a separate logic depending on whether the set of values is even or odd.   For even sets, there is no discrete middle value and one must calculate an average of both values in the middle.

Fortunately, it is fairly easy to deal with both issues.  With Alberto Ferrari’s help, I implemented the calculation below.  There are three steps for it, the first one which implies the creation of an index, which we will use to select the values in the middle.  I then created an alternate execution path for even sets, which simply averages the two values in the middle.

To see how it works, copy and past the data below and import it into PowerPivot (as a linked table):

image

Step 1

Create a measure.  This is to test that we can calculate a median over a column or measure using the same method.

[Test Measure] =SUM([Amount]) / 2

image

Step 2

Create another measure which will serve to rank the months by the value in [Test Measure]. We will use this ranking value to determine which value is the median (in case of odd sets), or to average the two values in the middle (in case of even sets)

[Index] =IF (
    COUNTROWS ( VALUES ( Table1[Month] ) ) = 1,
    COUNTROWS (
        FILTER (
            ALL ( Table1[Month] ),
            [Test Measure] <= CALCULATE (
               [Test Measure], Table1[Month] = VALUES ( Table1[Month] )
            )
        )
    )
)

image

Step 3

Create median expression, reflecting the logic discussed above:

[Median] =IF( COUNTROWS( VALUES( Table1[Month] ) ) > 1,
     IF( MOD( COUNTROWS( ALL( Table1[Month] ) ) ,2 ) = 0,
          (
            CALCULATE( MAXX( Table1, [Test Measure] ),
                FILTER(
                 ALL(Table1[Month]),
                 [Index] <= ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
                )
            ) +
            CALCULATE( MINX( Table1, [Test Measure] ),
                FILTER(
                 ALL(Table1[Month]),
                 [Index] > ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
                )
            )
          ) / 2
  ,
          CALCULATE(
           [Test Measure],
            FILTER(
             ALL(Table1[Month]),
             [Index] = ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
            )
          )
        )
    )

The calculation first uses the MOD function to determine if the set is even (which is the case in our example).  It then gets the two values in the middle by:

1) Filtering the months for all that have an index of less or equal to the count of months divided by two.  Take the maximum value of [Test Measure] for this set.

2) Filter the months for all that have an index greater than the count of months divided by two. Take the minimum value of [Test Measure] for this set

3) add these values and divide by two (simple average)

image

The calculation is correct and matches Excel’s regular median function output. One word of caution: this expression relies heavily on row context which can greatly impact performance over big datasets.    In the following blog post, I will talk about a potential way around this issue in which we can arrive to correct the Median output by leveraging PowerPivot’s intrinsic compatibility with MDX.

Advertisements

3 Responses to Calculating Median in PowerPivot / DAX

  1. Marco Russo says:

    Very nice calculation – it will be useful whenever you cannot create a calculated column, for example by connecting to BISM Tabular. Of course, not a solution for big datasets – do you have some numbers about when the calculation starts to be “visible” (i.e. more than one second)?

    • javierguillen says:

      Hi Marco

      I don’t have those numbers yet, but I hope to do some testing on that. It appears we may be able to gain some efficiencies in the calculation by replacing the [Index] measure with the new RankX function in Denali… I will post an entry about it soon.

  2. Pingback: Calculate the median days between events

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: