What-if analysis using the right level of granularity in PowerPivot

PowerPivot offers a big advantage when conducting ‘what-if’ scenario analysis: the ability to specify ad hoc values in a report and use them – in conjunction with the existing data – to generate personalized outputs without ever having to leave the workbook. (As Cathy Dumas pointed out in the comments section of this entry, SSAS Tabular models can also include ad hoc data, as you can paste it directly into the model; PowerPivot, interestingly, makes the job even easier through ‘linked tables’.)

As an example, look at the following data model (based on the Adventure Works database):

[image: PowerPivot data model based on the Adventure Works database]

Imagine that we wanted to motivate our sales force to increase orders by giving them a special bonus computed from average monthly sales. We are investigating what the actual dollar pay-out will be at different rates (I am keeping the scenario very simple).

So we can just type the different percentages we are exploring into Excel. Something like:

[image: candidate pay-out rates typed into Excel]

At this point, we import these values into PowerPivot using the ‘Create Linked Table’ option. This new table will have no relationships to the existing ones in the PowerPivot data model, and we want to keep it that way, as there are no common keys with the other tables.

We can grab the Pay-out Rate value selected by the user using the VALUES function:

User Selected Rate:=VALUES('Pay-out Rate'[Pay-out Rate])

[image: the User Selected Rate measure in the pivot table]

We can then apply this value to the sales amount to compute the bonus pay-out:

Bonus:=IF(
    HASONEVALUE('Pay-out Rate'[Pay-out Rate]) &&
        NOT(HASONEVALUE(DimDate[EnglishMonthName])),
    AVERAGE( FactResellerSales[SalesAmount] ) *
        VALUES('Pay-out Rate'[Pay-out Rate])
)

[image: pivot table showing the initial Bonus measure]

Before we start complaining about how cheap management is, let's examine why this initial calculation is incorrect. Even though the calculation changes based on the user selection, the bonus (as you may recall) should be based on monthly averages. Our fact table stores data at the daily sales by product granularity, and the calculation is incorrect because the average is generated at that level (by product, by day). The correct approach should not take product into account at all: it should compute the monthly average and then apply the rate selected by the user. The sum of each output at the monthly level will determine the year's bonus.

With DAX, this can be achieved by leveraging the power of the SUMMARIZE function. By generating an on-the-fly table of the distinct months currently in context, we can average the data for those months and then apply the given rate.

Bonus - Correct:=IF(
    HASONEVALUE('Pay-out Rate'[Pay-out Rate]) &&
        NOT(HASONEVALUE(DimDate[EnglishMonthName])),
    SUMX(
        SUMMARIZE(
            DimDate,
            DimDate[EnglishMonthName],
            "Sales",
            SUM( FactResellerSales[SalesAmount] ) /
                COUNTROWS( VALUES( DimDate[DateKey] ) )
        ),
        [Sales] * VALUES('Pay-out Rate'[Pay-out Rate])
    )
)

[image: pivot table showing the corrected Bonus measure]

Again, we ensure the calculation is correct by ignoring the actual granularity of the fact table; instead we compute the monthly average as the sum of the amount over the day count, and then apply the user-selected ‘scenario’ rate. The power of this technique is that we can affect the data at the appropriate level and then let it roll up to higher levels.

If more scenarios need to be added, it is just a matter of typing the new value into the spreadsheet table, re-opening the PowerPivot window (which will automatically import the new values) and hitting refresh on the ‘Data’ tab in Excel. The new ‘scenario’ rate will then be available for reporting:

[image: report using the newly added scenario rate]

The combination of these fairly simple techniques can be used to generate very powerful, scenario-driven what-if analytics. This is what self-service BI is all about! 🙂


OFFSET double lookup in PowerPivot

One of those really great functions in the native Excel environment is OFFSET. In combination with other functions like MATCH, it can generate a type of ‘double lookup’ that can be really helpful when creating dynamic reports. For example, the table below describes a matrix defining a range of units for each combination of ‘Band’ and ‘Group’.

[image: lookup matrix of units by Band and Group]

We want to use this table as a lookup reference when generating matches for the following values:

[image: values to be matched against the lookup table]

Take, for example, the first row. We first look at the ‘Band’ value, which is B in this case, and find it in our lookup table. Then we go down the rows and, using approximate match logic, find the group to which the value of 120 belongs. That would be Group 3 (which has unit values going from 70 to 129).

In Excel, we can use a formula like the one below to generate this ‘double lookup’ calculation:

VLOOKUP(A11,
    OFFSET($A$1,1,MATCH(B11,$A$1:$D$1)-1,4,4),
    (4 - MATCH(B11,$A$1:$C$1)) + 1,
    TRUE)

[image: the double lookup formula in the worksheet]

This formula finds the appropriate column first (column B) and then uses an approximate match VLOOKUP to find the row (#4) and column (D) that retrieve the right group. The final result is:

[image: final lookup results]

Pretty powerful.

If we are analyzing data with PowerPivot, we want to be able to generate the same effect within PowerPivot itself, without having to drop all the data into Excel and do the OFFSET lookup there. This could be because we are scanning millions of rows, or simply because we want to keep all calculations in the same place (the PowerPivot database).

In order to do this we first need to understand one thing: we must unpivot the lookup table so we can successfully generate the lookup in DAX. Unlike the OFFSET function, in which you can specify a column/row coordinate dynamically, in DAX we will only be filtering rows.

If our data is in a database like SQL Server, we can do this by executing the following T-SQL expression:

SELECT Band, [Group], Units FROM
    (SELECT BandA, BandB, BandC, [Group] FROM Table1) t
UNPIVOT
    (Units FOR Band IN (BandA, BandB, BandC)) AS unpvt
ORDER BY Band, [Group], Units

The result will be the following:

[image: the unpivoted lookup table]

In case you are relying on data which resides only in your spreadsheet, you can use the ‘Multiple Consolidation Ranges’ wizard to generate the same effect. This wizard is found by going to File –> Options –> Customize Ribbon. On the ‘Choose Commands From’ dropdown, select ‘All Commands’, then find and add to your ribbon the ‘PivotTable and PivotChart Wizard’. Finally, click the wizard button we just added to the ribbon. You should see the following window:

[image: the PivotTable and PivotChart Wizard]

Notice that I moved the ‘Group’ column to the left; this is because the wizard expects the categories in the first column. Select ‘Multiple consolidation ranges’ and follow the instructions. Once complete, you will get a pivot table using the data from our lookup table. Move the ‘Row’ and ‘Column’ fields to the row labels. Finally, flatten the table out (using the PivotTable Options context menu) and you will end up with the following result:

[image: flattened pivot table output]

Now you are ready to import the unpivoted lookup data into the PowerPivot window. Be sure to change the column names to something more meaningful than ‘Column’, ‘Row’ and ‘Total’. In my case, I have called them ‘Band’, ‘Group’ and ‘Units’.

Finally, import the items that we are going to match against the lookup (second screenshot in this blog entry); I called that table ‘Data’. Notice that this is yet another scenario in which there are no actual relationships between the tables in the data model. All the work will be done using a DAX expression:

DoubleLookup:=IF( HASONEVALUE( Data[Band] ),
    CALCULATE(
        LASTNONBLANK( Lookup[Group], 1 ),
        FILTER( Lookup,
            Lookup[Band] = VALUES( Data[Band] ) &&
            Lookup[Units] < VALUES( Data[Units] )
        )
    )
)

Notice the expression is very readable and shows how a fairly simple DAX formula can have a powerful dynamic effect. LASTNONBLANK selects the last [Group] value in a filter context restricted to the current Band, where the lookup units are less than the current unit value. The output matches the more complex expression we defined using OFFSET, MATCH and VLOOKUP:

[image: DoubleLookup output matching the Excel formula]

The formula works even if we have repeated ‘Band’ values. In that case, I suggest adding an identity column to the table in order to break down the resulting value accordingly, as sketched below:

[image: the lookup data with an identity column added]
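One possible reading of that suggestion, as a sketch only: with an [Id] column added to the Data table (a hypothetical name, not part of the original workbook), the measure can be guarded so it only evaluates for a single source row:

DoubleLookupById:=IF( HASONEVALUE( Data[Id] ),   // evaluate only for a single source row
    CALCULATE(
        LASTNONBLANK( Lookup[Group], 1 ),
        FILTER( Lookup,
            Lookup[Band] = VALUES( Data[Band] ) &&
            Lookup[Units] < VALUES( Data[Units] )
        )
    )
)

Since each [Id] maps to exactly one row, the [Band] and [Units] values remain unambiguous even when the same Band appears more than once.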

Output:

[image: output with repeated Band values]

This post has been featured in Excel MVP Bill Jelen's “VLOOKUP Week” (March 25–31, 2012), which groups podcasts and blogs describing the use of this and other interesting Excel-related lookup functions and techniques. For more information, visit Microsoft's Excel team blog @ http://tinyurl.com/c2aw89y, or directly access the VLOOKUP Week website @ http://vlookupweek.wordpress.com/

Detecting Total, Subtotal and Hierarchy Levels in PowerPivot

Recently I participated in a thread on the PowerPivot MSDN forums asking how to detect the level you are on in the pivot table hierarchy, in order to conditionally handle the output of a measure. As this is a fairly simple topic but one that can be a bit confusing, I hope this post can help others dealing with the very same issue. Thanks to SQL Server MVP Frederik Vandeputte (twitter) for the topic! 🙂

In general, it has become standard practice to use the following expression if you want to avoid computing a measure at the ‘Grand Total’ level:

IF( COUNTROWS( VALUES( Table1[Column] ) ) = 1, <expression> )

For example, if you have a measure that should not be aggregated as in the table below:

[image: a measure that should not be aggregated]

You could create the following DAX measure to display the value for each product, and avoid any output at the total level:

CalcAtProductLevelOnly:=
IF(
COUNTROWS(VALUES(Table2[Products]))=1,
VALUES(Table2[Size])
)

[image: pivot table with no grand total output]

As you can see, the pivot table ignores the grand total even if it was configured to show one. With PowerPivot 2.0 (currently in RC0), you can use the HASONEVALUE function to make the expression even more readable:

CalcAtProductLevelOnly:=
IF(
HASONEVALUE( Table2[Products] ) ,
VALUES( Table2[Size] )
)

The output will be the same in both expressions.   The idea here is that we are checking the current filter context for how many Products are currently active in the cell that is evaluating the expression.  It will only yield an output for those pivot table cells that have one and only one Product in context.   From here, you can see why the Grand Total is ignored:  it deals with more than one product.

This is all well known.  The question is, can we apply the same technique to detect levels of a hierarchy?  Here you must remember that hierarchies are ‘display-only’ groupings in PowerPivot (and SSAS BISM Tabular). Even with the first release of PowerPivot, row labels are grouped in a ‘hierarchical’ way based on the order of attributes you place on the row labels:

[image: row labels grouped hierarchically]

If we follow the logic we used from Grand Totals, we can infer that a subcategory can have more than one product. In a similar way, we are aware that a category can have more than one subcategory.   Here is the dataset:

[image: the dataset]

Following this logic, we can create a DAX measure to help us determine which level we are dealing with – product, subcategory or category:

IF(
COUNTROWS(VALUES(Table1[Products]))=1,"Product",
IF(
COUNTROWS(VALUES(Table1[Subcategories])) = 1,"SubCategory","Category"
)
)

Unfortunately, this doesn’t give the correct result:

[image: incorrect level detection output]

For Category A / SubCategory A the calculation works without a problem as there are two products (which allows us to detect the SubCategory level) and two SubCategories (which allow us to detect the Category level).

However, for Category B there is only one Product. As the expression detects a level based on the count of attribute values below the current location in the display hierarchy, it has no way to know which level it is on in this case.

So how can we fix it? Fortunately, this is once again a situation in which the ability to manipulate context comes in handy. I thank my good friend and colleague Jason Thomas (blog | twitter), who helped me think of a way to deal with these kinds of scenarios.

If we use the following DAX measure, we get the correct values:

IF(
CALCULATE(COUNTROWS(VALUES(Table1[Products])),
ALLEXCEPT(Table1, Table1[Products])) = 1,
"Product",
IF(
CALCULATE(COUNTROWS(VALUES(Table1[Subcategories])),
ALLEXCEPT(Table1, Table1[Subcategories])) = 1,
"SubCategory",
"Category"
)
)

[image: correct level detection output]

The inner CALCULATE clears the filter context on all columns except subcategory. As such, it doesn't limit itself to counting the number of subcategories for the current category; instead it counts all subcategories across all categories. In this way, even if a category only has one subcategory (like Category B in the example above) it won't be a problem, as the count relies on the subcategories across the entire table. The same logic is used by the outer CALCULATE when determining whether the current level is a product or not.

The only situation in which this won't be the case would be, for example, if there is only one row in the table, showing one product, one subcategory and one category. It is unlikely this type of situation will affect most reporting scenarios, but you should be aware of this limitation.

Also, it is interesting to notice that the Grand Total is detected as being on the ‘Category’ level. This, again, shouldn't be a problem, as normally you would want your most encompassing level in the Total row. And here once again you can change the behavior by combining the expression with the technique we discussed at the beginning of this blog entry.
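For example, here is a minimal sketch of that combination, assuming the level-detection expression above has been saved as the [CurrentLevel] measure and that the category column is named Table1[Categories] (an assumption on my part):

LevelNoGrandTotal:=
IF(
    COUNTROWS( VALUES( Table1[Categories] ) ) = 1,   // Grand Total sees more than one category
    [CurrentLevel]
)

The Grand Total cell has every category in context, so the guard fails there and the measure returns blank, while all other rows still report their level.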

Finally, notice that by detecting the current level we are able to assign a specific subtotal or total calculation depending on the level active in the filter context. For example, we can add another DAX measure like the one below, which leverages the expression we used to generate conditional outputs:

ConditionalLevelOutput:=
IF([CurrentLevel] <> "Product",
SUM( Table1[Amount] ) * 0.9,
SUM( Table1[Amount] )
)

Here, the category and subcategory levels will be discounted by 10% due to specific reporting conditions (perhaps a wholesale promotion, etc.).

UPDATE: Alberto Ferrari pointed out the ability to use a simpler syntax in DAX 2.0, leveraging the ISFILTERED function.

Interestingly, when using this new function in DAX 2.0 we avoid the issue described earlier in this post, in which a table with a single row gives the incorrect level output. First, let's review the syntax using ISFILTERED:

IF(
    ISFILTERED(Table2[Products]), "Product", IF(
        ISFILTERED(Table2[Subcategories]), "Subcategory", IF(
            ISFILTERED(Table2[Categories]), "Category", "Category" )
        )
    )

As you can see, the formula is much more readable now. Now, let's see what happens when we compare both expressions (the one using ALLEXCEPT and the one using ISFILTERED):

[image: ALLEXCEPT vs ISFILTERED on a one-row table]

Using ISFILTERED yields the expected output in a table with one row, whereas the expression using ALLEXCEPT doesn't. This is because we are evaluating the current level in a different way: ISFILTERED is able to detect direct filters, which yields a more accurate result.

Consequently, if you have the opportunity to use the latest version of PowerPivot (currently in RC0), this last calculation is definitely the way to go. Thanks, Alberto, for the suggestion.

Simulating an "approximate match" VLOOKUP in PowerPivot

It is often said that relationships in PowerPivot work in a similar way to VLOOKUPs. In reality, this is only partially true. If we examine the claim a bit closer, we realize that Excel's VLOOKUP function has a parameter that makes it possible to use an approximation when matching values against the lookup table:

[image: VLOOKUP's range_lookup parameter]

Regular PowerPivot relationships emulate the ‘exact match’ option of the Excel VLOOKUP function. And although PowerPivot relationships are a lot more powerful, being able to retrieve lookup values over huge tables at amazing speed, we are still left with a question: how can we achieve the same type of behavior as the ‘approximate match’ option of the VLOOKUP function? Let's walk through the process.

Imagine we have the following tables:

[image: the two tables used in the example]

As you can see, the table on the left is the one that will be executing the lookup call against the table on the right. The first thing we need to ensure is that there are no relationships between those tables in the PowerPivot model. That's right: we will be generating the entire lookup as a DAX calculation. In other words, we will be generating a calculated relationship.

Let's first examine what Excel yields when using the VLOOKUP function with approximate matches:

=VLOOKUP([@TableA],Table2[TableB],1,TRUE)

[image: VLOOKUP approximate match output]

If an exact match is not found, VLOOKUP returns the next largest value that is less than the lookup value (for example, looking up 25 against the values 10, 20 and 30 returns 20).

In PowerPivot, we would generate the same effect with this calculation:

IF(
    HASONEVALUE( TableA[TableA] ),
    CALCULATE(
        MAX( TableB[TableB] ),
        FILTER( TableB, TableB[TableB] <= VALUES( TableA[TableA] ) )
    )
)

The first condition (HASONEVALUE) forces the computation to be executed only when there is one and only one value of TableA in context (in other words, it avoids calculating for the grand total row, if there is one, as it would be meaningless to do so).

The actual computation is done via the CALCULATE expression. As you can see, it takes the maximum value of TableB in a context modified by the second parameter. As there is no relationship between TableA and TableB, context is not propagated and the calculation starts from all rows in TableB. The FILTER function then filters the table and returns only those rows which are less than or equal to the current value in TableA.

The result is this:

[image: DAX output matching VLOOKUP]

As you can see, we were able to match the output we got with VLOOKUP. But what happens if we are trying to match text values instead of numbers?

[image: text values to be matched]

We get the following error:

Calculation error in measure 'TableA'[Measure 3]: The function MAX takes an argument that evaluates to numbers or dates and cannot work with values of type String.

A solution is found by using the LASTNONBLANK function:

IF(
    HASONEVALUE( TableA_Text[TableA] ),
    CALCULATE(
        LASTNONBLANK( TableB_Text[TableB], 1 ),
        FILTER( TableB_Text, TableB_Text[TableB] <= VALUES( TableA_Text[TableA] ) )
    )
)

[image: LASTNONBLANK output for text values]

LASTNONBLANK requires two parameters, and we use the first one for the column from which we want to get the last value in the lookup table. The second parameter must be an expression that is evaluated for blanks; by using ‘1’ here we retrieve all rows of that table, but that is OK as the table is being filtered by the second parameter of the CALCULATE function. In other words, we truly get the last value in context, which is exactly what we need.

This last calculation can be used with text values, but it will also work with numbers or dates, and it is a great pattern when simulating VLOOKUPs with approximate matches.

[image: the same pattern applied to numbers and dates]

This post has been featured in Excel MVP Bill Jelen's “VLOOKUP Week” (March 25–31, 2012), which groups podcasts and blogs describing the use of this and other interesting Excel-related lookup functions and techniques. For more information, visit Microsoft's Excel team blog @ http://tinyurl.com/c2aw89y, or directly access the VLOOKUP Week website @ http://vlookupweek.wordpress.com/

Working with relationships in PowerPivot & SSAS BISM Tabular

Relationships in PowerPivot and SQL Server Analysis Services BISM Tabular can be a bit tricky.  It is easy to start with a few simple ones only to end up stuck at some point trying to figure out why table values are not rolling up across relationships as you would expect.

Particularly with PowerPivot, it is easy to get the feeling you can just throw some tables at it, connect them, and be ready to go.

Not so fast, though. There are specific rules to be followed when establishing valid relationships in PowerPivot and SSAS BISM Tabular. Understanding them is very important in order to get the most out of these great analytical tools. Here is a list of the most important points to keep in mind:

1. Relationships must be based on a valid lookup.

When you relate two tables, one must be the lookup for the other one. This is really another way of referring to a dimensional structure. For example:

[image: two tables related on Currency]

These two tables can easily be related in the model. PowerPivot will not complain at all, as the lookup table on the right contains unique entries for the key column used to define the relationship (Currency).

In the PowerPivot Window diagram, we can visualize this relationship as follows:

[image: the relationship in the PowerPivot diagram]

If you think about it, this rule matches the way traditional UDM/OLAP cubes work: they also expect dimension tables to have unique key values. And this is really another way of saying: PowerPivot and BISM Tabular love star schemas. However, a star schema is not required, and this technology opens the path to more flexible data modeling structures, as cleverly pointed out by Marco Russo in his presentation on BISM Tabular data modeling: http://sqlbits.com/Sessions/event9/Vertipaq_vs_OLAP_Change_Your_Data_Modeling_Approach

2. Context is automatically propagated from the lookup table to the base table.

I have seen plenty of people getting stuck on this one, particularly when using PowerPivot for self service Business Intelligence.

Suppose you have the following tables:

[image: two tables, each with unique currency entries]

As you can see, all currencies have a unique entry in each of the tables. So if you establish a relationship between them using currency as the key, which table is the lookup for which? PowerPivot will not complain when you create such a relationship.

However, if we try to pivot the data, we will see the following:

[image: pivot table with incorrectly sliced numbers]

The results show numbers that are not being properly sliced when involving both tables. PowerPivot even complains, saying there is a missing relationship even though we have one! Notice in the field list that we selected the Currency from Table1 and Amount from Table2; so let's try reversing this now and select the Currency from Table2 and Amount from Table1. We get the following:

[image: pivot table with correctly sliced numbers]

This time the numbers are sliced appropriately! Why? Let’s examine the relationships again.

[image: the relationship diagram, with Table2 as the lookup]

Table2 is the lookup table. As such, context is automatically propagated from it to the base table (Table1 in this case). This means that if I use Table1 as my categories or ‘dimension’ and Table2 as my metrics or ‘measures’ then relationships work fine without extra work on my side, and reports involving both tables will slice and dice as expected.

However, if I reverse this and use Table1 for my ‘measures’ and Table2 as for my categories (dimension) then context does not automatically propagate from one to the other one, which results in the pivot table giving the total amount for all currencies – and this is repeated for each currency in the row labels. In other words, it cannot grasp what the amount is for USD or for JPY or any other currency. It behaves in a similar way to a UDM cube when there is no specified dimension usage between a measure group and a dimension – it tries to evaluate context by using the [All] member if there is one.

In this model I have chosen to define Table2 as the lookup table but I could have defined it the other way, with Table1 being the lookup. As there are unique values on the key column, either relationship is possible.  However, context will still only automatically propagate in one way: from the lookup table to the base table.

You can force the context to propagate in reverse by using DAX (see point #5 below).

3. Context can propagate through many levels

PowerPivot and BISM Tabular can handle snowflake schema relationships without a problem. Given the one-directional automatic context propagation described above, you can retrieve the value on one base table from a lookup table that is two or more levels away from it. For example:

[image: snowflake schema diagram]

In this snowflake schema, the table on the left contains the measures (Amount) and is the ‘base table’. The other three tables to the right contain descriptions for product, subcategory and category, and they are all ‘lookup tables’. In fact, each table is a lookup for the next one (subcategory is a lookup for product, and category is a lookup for subcategory).

However, the Product table is the only one that has a direct relationship with the base table (FactInternetSales). And without us having to manipulate context, we can easily reach the “English Product Category Name” from the base table on the left in the product category table at the far right, without any problem. For this, we use the following DAX as a calculated column on the base table:

RELATED(DimProductCategory[EnglishProductCategoryName])

This is what the result looks like from the FactInternetSales table. The column highlighted was able to jump across two relationships, reaching a table three levels away:

[image: calculated column reaching a table three levels away]

These calculations are normally used to denormalize the model in order to (among other things) achieve a more user-friendly experience as users browse the model from a client tool (Excel, in the case of PowerPivot).

4. Relationships in the tabular model can be calculated.

Unlike traditional OLAP/UDM cubes, relationships in PowerPivot and BISM Tabular can follow different paths at calculation runtime which can accommodate the equivalent of different versions of the data model based on calculation needs. For example, look at this data model diagram taken from a PowerPivot application:

[image: data model with three relationships between DimDate and FactResellerSales]

As you can see there are three relationships between DimDate and FactResellerSales. This does not work exactly in the same way traditional SSAS role playing dimensions work; in PowerPivot and BISM Tabular, relationships can be active or inactive.

In a data model like the one above, a relationship path can be activated on the fly through the USERELATIONSHIP() function:

CALCULATE( SUM( FactResellerSales[SalesAmount] ),
    USERELATIONSHIP( FactResellerSales[ShipDateKey], DimDate[DateKey] )
)

A relationship will be ‘active’ for the scope of execution of the calculation. By using this approach, calculations that involve relationships can resolve using different paths in the same model, achieving results like this one:

[image: pivot table using default and alternate relationship paths]

The middle column uses the default relationship, so there is no need to invoke the USERELATIONSHIP() function there. However, the third column explicitly activates a different relationship path by using the DAX measure specified above.

Aside from this neat feature, which gives lots of flexibility to the data model, it is also possible to create pure calculated relationships. These do not rely on activating relationships already existing in the data model; they literally generate the entire lookup operation as part of the formula evaluation.

For example, if we use the same tables as in point #1, but delete the relationship from the data model, we get:

[image: data model with no relationships]

Notice there are no relationships at all. However, we can still generate the following pivot table:

[image: pivot table sliced by currency long name]

How is this possible? As you can see, the amount has been properly sliced based on the currency ‘long name’ attribute, which is only present in Table2. Clearly, this operation involves both tables. In order to generate a calculated relationship on the fly, we can define the following DAX expression:

IF(
    HASONEVALUE( Table2[Currency] ),
    CALCULATE( SUM( Table1[Amount_base] ),
        FILTER( Table1, Table1[Currency] = VALUES( Table2[Currency] ) )
    )
)

Here, the aggregate definition of SUM(Table1[Amount_base]) is modified by a filter context in which only the rows of Table1 whose currency matches the current currency in context on Table2 will be used.

In addition to this, you perhaps noticed the aggregate function refers to an [Amount_base] column that did not exist in the original data model. I did this to keep the model user-friendly, by replacing the original measure with a new measure defined by the DAX calculation above. In order to do this, I had to rename the original measure, as two measures cannot have the same name. I called that original measure [Amount_base] and configured it to be hidden, which resulted in the following in-memory data model:

[image: model with the hidden Amount_base measure]

The [Amount] measure was defined as a DAX measure, whereas [Amount_base] now appears grayed out, as client tools won't be able to see it anymore. This avoids the potential user confusion of having to look at two fields with similar names.

5. DAX calculations can modify the way context is propagated by default.

As you may recall from point # 2, context automatically propagates in one direction – from the lookup table to the base table. However, this can be modified by the use of DAX.

Let’s take the same example used on point # 2. By using the RELATED() function, we are able to reach out to a lookup table and grab a value based on the current row context. For example, if we are looking at base table ‘Table1’ and use the RELATED() function in a calculated column, we are able to fetch the related Amount value on Table2 for the currency in current context on Table1.

[image: RELATED() calculated column on Table1]

However, if we try to do the same operation from the lookup table, Table2, we get an error:

[image: error when using RELATED() from Table2]

The error, when expanded, shows this:

[image: the expanded error message]

Remember that in our example each table had unique currency values, so either table could have been defined as the lookup. And this is the same exact behavior we got with the pivot tables in point #2: context automatically propagates only one way, from the lookup table to the base table. In this case, I am unable to reach Table1 from Table2, as Table1 is not defined as the lookup table in the relationship.

[image: the relationship diagram]

As we saw before, this is not the case when trying to get the related values on Table2 (the lookup) from Table1 (the base). Things there work as expected and without errors. Both attempts are reflected in the data model diagram above, in which the last column on Table2 has a warning message next to it, signaling it wasn't able to successfully resolve the computation.

We can override this behavior by using the CALCULATE() function. Replacing the bad calculation in Table2 with the calculation below will allow us to propagate context in the other direction and fetch the right values from Table1 for the current currency in context on Table2, even if Table2 is still defined as the lookup table:

CALCULATE(SUM(Table1[Amount]))

[image: working CALCULATE() column on Table2]

Why is this? Because CALCULATE() automatically turns row context into filter context. What this means is that for the currency in context, as the formula evaluates row by row, it is able to reach out to the other table following the relationship and select the subset of rows on that table that pertain to this currency. In our example there is only one matching row on the other side of the relationship, but even if there were many rows on the other side we would be able to aggregate them properly (we are specifying the aggregate SUM() function as part of the formula expression; clearly, we could also use other aggregate functions like MIN(), MAX(), etc.).

Another way we could have done this is by using RELATEDTABLE(), as internally this function can also propagate context in reverse:

SUMX(RELATEDTABLE(Table1), Table1[Amount])

[image: SUMX over RELATEDTABLE]

I would recommend sticking with the CALCULATE() approach if possible, as it is not only faster than SUMX() but in many cases also easier to troubleshoot.

You may wonder: why use calculated relationships when one could stick to regular relationships? The reason is that there are some business problems that require the use of flexible data models that may not be able to leverage regular relationships, yet with the use of calculated relationships it is possible to find a working – and elegant – solution. A great example of this is the use of calculated relationships to define banding, as explained in this great article by Alberto Ferrari: http://sqlblog.com/blogs/alberto_ferrari/archive/2010/10/12/banding-with-powerpivot.aspx

Hopefully this post helps clarify much of the confusion out there regarding how relationships work. Once you become aware of the principles behind valid relationships and their impact on calculation context, it will be easier to generate insightful PowerPivot reports or versatile BISM Tabular models.

Rolling sum across categories that are not dates

Sometimes you may need to create a PowerPivot report that displays rolling totals over categories that do not reflect years, months or days. You might feel a bit stuck, as most of the examples on rolling aggregates are related to time, for example Year to Date figures. Those can be implemented using DAX functions like TOTALYTD(), but as you examine the required parameters for that function you discover that a date column is necessary.
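For reference, here is a minimal sketch of that function's shape (the measure name and the use of Adventure Works' FullDateAlternateKey column are my assumptions, not part of the report below); the second argument must be a column of dates, which is exactly what our categories lack:

SalesYTD:=TOTALYTD(
    SUM( FactResellerSales[SalesAmount] ),   // expression to accumulate
    DimDate[FullDateAlternateKey]            // a date column is mandatory here
)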

For example, take a look at the following table:

[image: sales by Color and SizeRange]

There are two categories used as row labels: [Color] and [SizeRange] (this data comes from the sample database AdventureWorks and represents sales by product attributes). If you need to display an aggregate figure showing the running total by [SizeRange] in descending order, you of course won't be able to use calendar-based functions like TOTALYTD(). However, the effect can still be created using DAX formulas.

The first thing needed is a numeric value that uniquely identifies the [SizeRange] data. We need this because we will use it in the running sum logic, by telling the PowerPivot engine to aggregate values up to the current value displayed. These sorts of operations are always easier when identifier columns are used.

If you are lucky enough to be using a database engine to populate the PowerPivot window, you can use its inherent functionality to generate a lookup table. For example, if the underlying database is SQL Server we could use the following statement to generate a new in-memory PowerPivot table:

select
    row_number() over (order by sizerange desc) id,
    SizeRange
from (select distinct sizerange from DimProduct) a

Notice the order by clause in row_number(): I am specifically generating the identifier in descending order. This should match the way I order the row labels in the spreadsheet, in order to generate the ‘rolling sum’ effect.

Establishing a relationship to the Product table, the underlying data model looks like this now:

[image: data model with the SizeRangeLookup table related to DimProduct]

To make things easier, we now use the RELATED() function to bring the identity column from the [SizeRangeLookup] table into the DimProduct table:

[image: the RELATED() calculated column on DimProduct]
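The calculated column itself is a one-liner (named [SizeRangeLookupId] here, since that is the name the measure below refers to):

=RELATED( SizeRangeLookup[id] )   // fetches the id from the related lookup row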

We now have all we need to generate the rolling sum effect.  Adding the following DAX measure to the PowerPivot table will display the new aggregate in the spreadsheet:

IF( COUNTROWS( VALUES( DimProduct[SizeRange] ) ) = 1,
    CALCULATE( SUM( FactResellerSales[SalesAmount] ),
        FILTER( SizeRangeLookup,
            SizeRangeLookup[id] <= VALUES( DimProduct[SizeRangeLookupId] ) ),
        ALLEXCEPT( DimProduct, DimProduct[Color] )
    )
)

The measure uses CALCULATE() to affect the filter context by selecting only the subgroup of rows that have a SizeRangeLookup[id] equal to or lower than the one currently displayed. This in effect adds up all rows with [SizeRange] values that have been tagged with a numeric identifier up to the one that is currently in context. The resulting effect is displayed below:

[image: the rolling sum measure in the pivot table]

As you can see, the ‘rolling sum by color’ measure incrementally adds up the categories in the row labels, but only until the last value in the current [Color].  It then restarts from the first value of the following [Color] and so on.

One of the interesting challenges of these types of rollups is the generation of the identifier columns. It would probably be great to have some kind of DAX function that would do this for you in the PowerPivot window, with no need to generate it in the underlying data source or the query. On the other hand, having to use this method gives you more control, as in some scenarios you may want to establish relationships via valid source identifiers, and generating other ones in the PowerPivot window could make any potential relationships invalid.
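As a sketch of one possible workaround (assuming a table of distinct [SizeRange] values is already in the model), a DAX calculated column on the lookup table can generate the identifier itself, reproducing the descending order the T-SQL ROW_NUMBER() produced; the comparison is on text, so the ordering is lexicographic, just like the ORDER BY above:

=COUNTROWS(
    FILTER(
        ALL( SizeRangeLookup ),
        // rows whose SizeRange sorts at or above the current row's value
        SizeRangeLookup[SizeRange] >= EARLIER( SizeRangeLookup[SizeRange] )
    )
)

Each row's id is the count of values greater than or equal to its own, so the largest [SizeRange] gets 1. The usual row-context performance caveat applies here as well.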

PowerPivot Help

As an active contributor to the PowerPivot online community, I was recently wondering how Excel data analysts look for help when they are stuck with PowerPivot. So I ran a Google search for ‘PowerPivot help’, and I got back a few online reference documents that walk you through what you can do with this tool. However, this may not be what you need. What if you could ask your specific question and get an answer to the particular issue you are working on?

Well, such a place exists.  It is called the “MSDN SQL Server PowerPivot for Excel forums”.   Feel free to go there and ask any PowerPivot question you wish.   One note: if your question is about DAX calculations or PowerPivot data modeling, it really helps if you post sample data so that participants can reproduce the issue on their side.

Here is the link:  http://social.msdn.microsoft.com/Forums/en-US/sqlkjpowerpivotforexcel/threads

See you there!

PowerPivot contest

Microsoft has opened a contest to create interesting mashups with PowerPivot CTP3. If you have any interest in mixing/merging data in cool PowerPivot apps (like this one), then you may consider submitting to this contest! There is a $3,000 prize for first place 🙂

Here is the link:   http://www.facebook.com/microsoftbi (look under the MSBI Data MashUp tab)

Quartile, Percentile and Median in PowerPivot / DAX

Let's examine the options for calculating quartiles and percentiles in PowerPivot. In my last post, I explained one way to calculate the median using pure DAX. Though the calculation works, don't expect it to be speedy when dealing with large datasets; the reason for this is that it relies on row context iterations to rank values prior to selecting the median.

A similar technique can be used to calculate quartiles or percentiles, following this blog entry. However, the result is not always optimal: not only can the calculation be slow, but the results do not match Excel's native functions. The reason for this is that Excel uses interpolation to calculate both quartile and percentile, whereas the DAX expression above picks one value out of the set (in other words, it is just an approximation). For example, given the four values 10, 20, 30 and 40, Excel's QUARTILE function with quart 1 interpolates to 17.5, while picking a value out of the set can only ever return one of the four numbers.

Can we use a technique that matches Excel's results within a PowerPivot model? The answer is yes, though the technique discussed here does not use DAX. What I want to share is another way of using the inherent capabilities of PowerPivot: being itself a version of SQL Server Analysis Services, it is actually able to understand MDX. It is quite interesting to notice that PowerPivot can behave in ways that reveal its SSAS nature: from the ability to respond to Excel cube functions to generating a trace file which can be read in SQL Profiler.

So what does this have to do with calculating percentiles? Well, even though DAX does not include any of those functions, you can leverage PowerPivot's multidimensional interface to create an MDX calculated member that encapsulates each of them. Let's see how.

In order to see PowerPivot queries to the Vertipaq engine in MDX format, we can use a free CodePlex add-in called ‘OLAP PivotTable Extensions’ (download it here, and be sure to select the correct bitness version). This add-in was created with SSAS in mind, but, without any further effort from the add-in itself, PowerPivot responds to it. Once installed, create a PowerPivot table and then use the add-in to view the MDX query (select the ‘MDX’ tab on the window that appears when you click on the OLAP PivotTable Extensions menu item).

[image: sample PowerPivot table using AdventureWorks data]

The MDX query we get is:

SELECT
    NON EMPTY Hierarchize(
        {DrilldownLevel({[DimDate].[EnglishMonthName].[All]},,,INCLUDE_CALC_MEMBERS)})
        DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS 
FROM [Sandbox]
    WHERE (
        [DimDate].[CalendarYear].&[2004],
        [Measures].[Sum of SalesAmount]
        )

So let's create a calculated member to compute the median.

Select the ‘Calculations’ tab in the add-in menu. This is used to submit calculated members along with the rest of the pivot table query. Type the following expression:

Median(                                  
        (
         [DimDate].[CalendarYear].CurrentMember *
         [DimDate].[EnglishMonthName].[All].Children
        ) 
        , CoalesceEmpty ( [Measures].[Sum of SalesAmount] , 0 )                          
      )

This should look in the add-in window as follows:

[image: the Median calculated member in the add-in window]

The result matches Excel's regular MEDIAN function:

[image: Median output matching Excel]

Let's examine the actual query. MDX has a built-in Median function (see definition here). As you can see, the expression uses parameters that look like regular SSAS dimensions. This is the PowerPivot data model, though, and we didn't have to use Visual Studio to create it: the PowerPivot engine automatically creates dimensions based on the tabular model loaded into Vertipaq.

The measure [Sum of SalesAmount] is the aggregate PowerPivot created based on the Sales Amount column of the in-memory fact table.  Once it became an actual measure in the PowerPivot model, we can use it as the measure expression of the MDX function.

Let's now try creating quartiles and percentiles. It gets more interesting here: unlike Median, MDX has no built-in Percentile or Quartile function. So here we have it: neither DAX nor MDX will help us. However, we can leverage another feature of SSAS: the ability to perform COM interop calls through the ExcelMDX assembly. You can see this library on a regular SSAS installation by browsing the assemblies folder:

[image: the ExcelMDX assembly in the SSAS assemblies folder]

It is quite interesting when you think about it. We are asking an Excel pivot table to communicate with the PowerPivot engine through a calculated member which in turn invokes Excel functionality not present in the MDX language. Pretty cool indeed. So let's try it: using the following expression we can compute quartiles:

Excel!Quartile(
    SetToArray(
        ( [DimDate].[CalendarYear].CurrentMember *
          [DimDate].[EnglishMonthName].[All].Children ),
        CoalesceEmpty( [Measures].[Sum of SalesAmount], 0 )
    ),
    1
)

The output is the following:

[image: Quartile output matching Excel's native function]

As you can see, the output matches Excel's native quartile function. The calculation is now built into the PowerPivot model, and hence is dynamic: it can respond to slicers, filters and cell context by using MDX properties like CurrentMember. And though PowerPivot is performing a COM interop call to Excel (marshalling calls from one environment to the other), it should still perform better than a DAX calculation based heavily on row context.

If you examine the MDX that is sent to the Vertipaq engine, you will notice the calculated member is declared as a query scoped expression:

WITH
MEMBER [Measures].[Quartile] as Excel!Quartile(
    SetToArray(
        ( [DimDate].[CalendarYear].CurrentMember *
          [DimDate].[EnglishMonthName].[All].Children ),
        CoalesceEmpty( [Measures].[Sum of SalesAmount], 0 )
    ),
    1
)

SELECT
    {
    [Measures].[Sum of SalesAmount],
    [Measures].[Quartile]
    }
    DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON COLUMNS ,
NON EMPTY
    Hierarchize(
        DrilldownMember(
            CrossJoin(
                {
                [DimDate].[CalendarYear].[All],
                [DimDate].[CalendarYear].[CalendarYear].AllMembers
                },
                {([DimDate].[EnglishMonthName].[All])}
                ),
                [DimDate].[CalendarYear].[CalendarYear].AllMembers,
                [DimDate].[EnglishMonthName]))
    DIMENSION PROPERTIES PARENT_UNIQUE_NAME,HIERARCHY_UNIQUE_NAME ON ROWS 
FROM [Sandbox] CELL PROPERTIES VALUE, FORMAT_STRING, LANGUAGE, BACK_COLOR, FORE_COLOR, FONT_FLAGS

The dimension used (DimDate) as well as the measure (Sum of SalesAmount) were automatically created by the PowerPivot engine.  This is what provides compatibility with OLAP interfaces:  for each table in the PowerPivot model, a default ‘count’ measure has been created as well as a dimension, in which each column is an attribute.  In other words, dimensions and measures are somewhat interchangeable in PowerPivot, following a more flexible model than with the traditional SSAS engine.  Also, notice the query is executed against a cube called ‘Sandbox’.  This is the default cube-interface name for Vertipaq PowerPivot databases.  It is the same name you will see in the cubes generated in the dedicated SSAS instance used on PowerPivot for SharePoint (This is an instance of SQL Server Analysis Services 2008 R2 in Vertipaq mode).

One downside of this method is that the PowerPivot field list is not aware of the calculation. However, if you open the standard pivot table field list you will see it there. In other words, and as I said before, this calculation is now part of the data model, which means you can access it outside the constraints of the pivot table through Excel cube functions. As an example, using the following formula in a worksheet cell we can dynamically invoke the MDX calculated member against PowerPivot, with the ability to specify a query context for it:

=CUBEVALUE("PowerPivot Data",CUBESET("PowerPivot Data","Head([DimDate].[CalendarYear].[All].Children)"),"[Measures].[Quartile]")

The result will be 925,015 again. In other words, we were able to dynamically invoke the first member of the CalendarYear attribute of the DimDate dimension and leverage the MDX calculated member, all in one cell: no pivot table needed. Pretty powerful, right?

You can verify that our Quartile calculation is now part of the intrinsic model by browsing through the measure members when constructing the cube formula:

[image: the Quartile measure listed when constructing the cube formula]

What I find pretty interesting is that the list of available measures in the PowerPivot model now includes regular DAX measures ([Measures].[Sum of Sales]) as well as MDX measures ([Measures].[Quartile]).

How about percentiles? Using the same method, it is pretty simple: just replace Quartile with Percentile in the MDX expression… that's it!

Excel!Percentile(
    SetToArray(
        ( [DimDate].[CalendarYear].CurrentMember *
          [DimDate].[EnglishMonthName].[All].Children ),
        CoalesceEmpty( [Measures].[Sum of SalesAmount], 0 )
    ),
    0.30
)

[image: Percentile output]

Be careful not to define your underlying measure as ‘currency’ type. Though I always thought data types in the PowerPivot window were purely decorative (as they don't carry over to the pivot table), the MDX calculated member will not work on currency types. Instead, try the decimal type and you will be fine.

And one last thing… as you may have noticed, this technique opens the possibility of a huge range of calculated expressions. And though it is not the standard PowerPivot/DAX implementation, it still leverages PowerPivot technology by making use of its inherent data model. In other words, PowerPivot applications can benefit not only from the amazing power of DAX, but also from that of MDX (which in turn opens the door to further functionality through interfaces with Excel and VBA). Things like Internal Rate of Return (IRR) are also possible using this method 🙂

Excel!IRR( SetToArray (  [Table3].[b].[All].Children, [Measures].[Sum of a]  ) )

[image: IRR output]

Calculating Median in PowerPivot / DAX

Although PowerPivot, in its current implementation, lacks a dedicated median function, it is possible to recreate it using existing functionality. There is a great blog entry here by Marco Russo where he describes how to calculate the median over the values of a PowerPivot column.

His calculation is as follows:

MINX( FILTER( VALUES( People[Age] ),
              CALCULATE( COUNTROWS( People ),
                         People[Age] <= EARLIER( People[Age] ) ) 
              > COUNTROWS( People ) / 2 ),
      People[Age] )

Recently I tried to adapt this calculation to a PowerPivot application I was working on.  I ran into two issues:

1) The calculation can be applied over a regular column of a table in the PowerPivot window, or over a DAX calculated column. But what if you must calculate the median over the output of a DAX measure instead of a regular or calculated column? In that case, you won't be able to use the DAX functions EARLIER and VALUES, as these only take columns as parameters. Due to this, the calculation above won't work.

2) I realized there should be separate logic depending on whether the set of values is even or odd. For even sets, there is no discrete middle value and one must average the two values in the middle.

Fortunately, it is fairly easy to deal with both issues. With Alberto Ferrari's help, I implemented the calculation below. There are three steps to it, the first of which implies the creation of an index, which we will use to select the values in the middle. I then created an alternate execution path for even sets, which simply averages the two values in the middle.

To see how it works, copy and paste the data below and import it into PowerPivot (as a linked table):

[image: sample data for the linked table]

Step 1

Create a measure.  This is to test that we can calculate a median over a column or measure using the same method.

[Test Measure] =SUM([Amount]) / 2

[image: Test Measure output]

Step 2

Create another measure which will serve to rank the months by the value of [Test Measure]. We will use this ranking value to determine which value is the median (in the case of odd sets), or to average the two values in the middle (in the case of even sets).

[Index] =IF (
    COUNTROWS ( VALUES ( Table1[Month] ) ) = 1,
    COUNTROWS (
        FILTER (
            ALL ( Table1[Month] ),
            [Test Measure] <= CALCULATE (
               [Test Measure], Table1[Month] = VALUES ( Table1[Month] )
            )
        )
    )
)

[image: Index measure output]

Step 3

Create the median expression, reflecting the logic discussed above:

[Median] =IF( COUNTROWS( VALUES( Table1[Month] ) ) > 1,
     IF( MOD( COUNTROWS( ALL( Table1[Month] ) ) ,2 ) = 0,
          (
            CALCULATE( MAXX( Table1, [Test Measure] ),
                FILTER(
                 ALL(Table1[Month]),
                 [Index] <= ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
                )
            ) +
            CALCULATE( MINX( Table1, [Test Measure] ),
                FILTER(
                 ALL(Table1[Month]),
                 [Index] > ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
                )
            )
          ) / 2
  ,
          CALCULATE(
           [Test Measure],
            FILTER(
             ALL(Table1[Month]),
             [Index] = ROUND( COUNTROWS(ALL(Table1[Month])) / 2, 0)
            )
          )
        )
    )

The calculation first uses the MOD function to determine whether the set is even (which is the case in our example). It then gets the two values in the middle by:

1) Filtering the months to those that have an index less than or equal to the count of months divided by two, and taking the maximum value of [Test Measure] for this set.

2) Filtering the months to those that have an index greater than the count of months divided by two, and taking the minimum value of [Test Measure] for this set.

3) Adding these values and dividing by two (a simple average).

[image: Median measure matching Excel's function]

The calculation is correct and matches Excel's regular median function output. One word of caution: this expression relies heavily on row context, which can greatly impact performance over big datasets. In the following blog post, I will talk about a potential way around this issue, in which we can arrive at the correct median output by leveraging PowerPivot's intrinsic compatibility with MDX.