Speaking on PowerPivot this weekend at Carolina Code Camp 2012

If you are in the Charlotte, NC area this weekend and are interested in learning about all the exciting features that new release of PowerPivot offers, check out my session at this year’s Carolina Code Camp!!!  I will be available after the session to answer any questions on PowerPivot features, data modeling, DAX calculations, best practices or any other related  feature you may want to talk about.  You know me: I love this stuff!! Smile

http://codecamp.developersguild.org/

Carolina Code Camp

Active day count in DAX

Alberto Ferrari posted a very nice blog entry some time ago regarding how to count active days in PowerPivot (http://sqlblog.com/blogs/alberto_ferrari/archive/2011/05/12/powerpivot-counting-active-days.aspx ).   His example relies on a data model that specifies ‘Active From’ and ‘Active To’ fields in order to compute the count.

I want to write a follow up entry for when such fields do not exist in the data model.  If all you have is a start date reflected as a transaction recorded in your fact table, counting active days is still possible in DAX.  Lets see how.

If we drop the entire list of products in the Adventure Works database in a pivot table, we are able to determine the first ‘active’ day by using the LOOKUPVALUE function:

First Activity Date:=LOOKUPVALUE(

                 DimDate[FullDateAlternateKey],

                 DimDate[DateKey],

                 MIN(FactInternetSales[OrderDateKey])

)

image

Having tested this, we can now simply use the same technique to count the rows between the first activity date and the last activity date in the fact table.  For this, I use the following expression:

Active Days Since Product Inception:=IF(

    CALCULATE( COUNTROWS( FactInternetSales ) ) ,

    CALCULATE(COUNTROWS(DimDate),

        DATESBETWEEN(

            DimDate[FullDateAlternateKey],

            LOOKUPVALUE(

                DimDate[FullDateAlternateKey],

                DimDate[DateKey],

                MIN(FactInternetSales[OrderDateKey]) 

            ),

        CALCULATE(

            LOOKUPVALUE(

                DimDate[FullDateAlternateKey],

                DimDate[DateKey],

                MAX(FactInternetSales[OrderDateKey])

            ), ALL(FactInternetSales) ) 

        ) 

    ) – 1,

    BLANK()

)

This expression is pretty straight forward:  it uses the DATESBETWEEN function for compute the amount of dates between the first date for which there is transaction data for any given product in the fact table and the last date in the fact table across all products.  The result is the following:

image

The power of this calculation is that it works even if we change the granularity of elements in the row labels.  As an example, if we replace product by category we get:

image

If what we want is to compute the active days between the first and last dates for which there is transaction activity for any given product or category, we can leverage the same technique we used to get the first activity date, but this time we use it to get the last activity date. We then compute the dates between the two:

Active Days:=IF(

    CALCULATE( COUNTROWS( FactInternetSales ) ) ,

    CALCULATE(COUNTROWS(DimDate),

        DATESBETWEEN(

            DimDate[FullDateAlternateKey],

            LOOKUPVALUE(

                DimDate[FullDateAlternateKey],

                DimDate[DateKey],

                MIN(FactInternetSales[OrderDateKey]) 

            ),        

            LOOKUPVALUE(

                DimDate[FullDateAlternateKey],

                DimDate[DateKey],

                MAX(FactInternetSales[OrderDateKey])

            )

        ) 

    ) – 1,

    BLANK()

)

image

Notice that is necessary for these calculations to have a proper there is a need to have Date table. That means there should be a list of dates without gaps, otherwise the count would be incorrect.

We can see how these two formulas compare, by placing them side to side.  I have highlighted the rows in which their outputs are different, indicating the products for which the last active date was not the same as the last transaction date in the table:

image

There is one last scenario that may be of use to know about.  Sometimes we want to calculate the amount of days from the first activity date until the most current date.  In other words, it is a countdown of days until a specific event, for example, the last date for which there was fact data. In this sense, it is the reverse of what we have computed so far.

In order to compute this figure, we can’t rely anymore on the DATESBETWEEN function. Instead, we can simply subtract the two dates, as follows:

Countdown to last fact date:=IF(

    CALCULATE( COUNTROWS( FactInternetSales ) ) ,

    INT(

        CALCULATE( LOOKUPVALUE(

                DimDate[FullDateAlternateKey],

                DimDate[DateKey],

                MAX(FactInternetSales[OrderDateKey]) 

            ), ALL(DimDate) )    –

            LOOKUPVALUE(

                DimDate[FullDateAlternateKey],

                DimDate[DateKey],

                MIN(FactInternetSales[OrderDateKey]) 

            )

    )   

         

    , BLANK()

)

 

image

Clearly, in this last formula we can replace the first date argument with any other arbitrary date for which we need to compute a countdown.  Interestingly, as this expression does not rely on the built-in tabular time intelligence functions, it doesn’t need a date table which was necessary in the prior expressions used in this blog post. 

We can use this last technique to dynamically calculate things like countdown to date when payment is due, for example.   

What-if analysis using the right level of granularity in PowerPivot

PowerPivot offers a big advantage when conducting  ‘what-if’ scenario analysis:  the ability to specify ad hoc values in a report and use them – in conjunction with the existing data – to generate personalized outputs without ever having to leave the workbook.  (As Cathy Dumas pointed out in the comments section of this entry, SSAS Tabular models can also include ‘ad hoc’ data by giving the ability to directly paste it into the model;    PowerPivot, interestingly, makes the job even easier by the use of ‘linked tables’).

As an example, look at the following data model (based on the Adventure Works database):

image

Imagine that we wanted to motivate our sales force to increase orders by giving them a special bonus computed based on the average monthly sales.  We are investigating what the actual dollar pay-out will be, at different rates (I am keeping the scenario very simple)

So we can just type in Excel the different percentages we are exploring.  Something like:

image

At this point, it is necessary to import these values into PowerPivot, by using the ‘Create Linked table’ option. This new table will have no relationships to the other existing ones in the PowerPivot data model: and we want to keep that configuration, as there are no common keys with the other tables.

We can grab the Pay-out Rate value selected by the user using the VALUES function:

User Selected Rate:=VALUES(‘Pay-out Rate'[Pay-out Rate])

image

We can include this value against the sales amount to compute the bonus pay-out:

Bonus:

=IF(HASONEVALUE(‘Pay-out Rate'[Pay-out Rate]) &&

        NOT(HASONEVALUE(DimDate[EnglishMonthName])) ,

              AVERAGE( FactResellerSales[SalesAmount] ) *

              VALUES(‘Pay-out Rate'[Pay-out Rate])

)

image

Before we starting complaining how cheap management is, lets examine why this initial calculation is incorrect. Even thought the calculation changes based on user selection,  as you may recall, the bonus should be based of the monthly averages.  Our fact table stores data at the daily sales by product granularity. The calculation is incorrect as the average is generated at this level (by product by day);  The correct approach, instead, should not take into account product in the computation – it should only compute the monthly average and then apply the rate selected by the user. The sum of each output at the monthly level will determine the year’s bonus.

With DAX, this can be achieved by leveraging the power of the SUMMARIZE function.  By generating an on-the-fly table of the distinct months that are currently in context, we can then average data for those months and then apply the given rate.

Bonus – Correct:=IF(

    HASONEVALUE(‘Pay-out Rate'[Pay-out Rate]) &&

    NOT(HASONEVALUE(DimDate[EnglishMonthName]) ) ,

             SUMX(

                     SUMMARIZE(

                            DimDate,

                            DimDate[EnglishMonthName],

                            “Sales”,

                                     SUM(FactResellerSales[SalesAmount] ) /

                                     COUNTROWS( VALUES (DimDate[DateKey] ) ) ),

        [Sales] * VALUES(‘Pay-out Rate'[Pay-out Rate] )

    )

)

image

Again, we ensure the calculation is correct by ignoring the actual granularity that exists in the fact table; instead we compute the monthly average as the sum of amount over the day count, after which we include the user selected ‘scenario’ rate in the computation.  The power of this technique is that we can affect the data at the appropriate level, and then let it roll-up to higher levels.

If more scenarios need to be added, it is just a matter of typing the new value in the spreadsheet table, re-opening the PowerPivot window (which will automatically import the new values) and hitting refresh in the ‘Data’ tab in Excel.  The new ‘scenario’ rate will now be available for reporting:

image

The combination of these fairly simple techniques can be useful to generate very powerful, scenario driven what-if analytics. This is what self-service BI is all about! Smile