Reframing Business Intelligence: Embracing change in our own discipline

I tend to see Business Intelligence as a field that not only delivers analytics but also generates change. A successful BI solution has the power to influence a company’s culture: the more fact-based the decision making, the less room there is for personal bias. This is good, as decisions grounded in actual data are more likely to improve a business's ability to grow profits or reduce costs.


On the other hand, many practitioners of Business Intelligence have known for a long time that the success ratio of BI/DW projects is not quite what it should be. To be sure we are on the same page, let me define what I consider success: a pervasive implementation, one in which the delivered tools are widely used by the intended target audience.

How many times have you seen technically perfect implementations fall into disuse? The reasons are varied, but in a good number of cases we can trace them back to the human side of the equation. In the shape of corporate politics, misunderstandings or quite simply resistance to change, people at every level of the organization can jeopardize a beautiful ETL/cube/report deployment.

To top this off, BI projects have traditionally been kicked off at a level of the organization that is neither senior management nor front-line data analysts. Sometimes this approach correctly aligns us with the people who manage budgets, but it does not necessarily translate into accurate implementations. Not surprisingly, when deciding what technology to use, purchasers rely heavily on tools that have the potential to woo the highest levels of the organization with fancy visualizations.

There is nothing wrong with eye candy, except when it doesn’t serve the purpose of generating real business insight.

It appears to me we must also embrace change ourselves, so we can better align with real business needs.

What we need is a solution that helps us stay close to the actual subject matter experts from the very first stages of BI development. With this, not only do we ensure accuracy and relevance, but we also gain an important asset: internal sponsorship. Gartner has identified this factor as one of the most important elements contributing to the success of a Business Intelligence project.

Clearly, the end-to-end framework of the xVelocity engine and DAX comes in handy when bridging the gap between business experts and IT folks. Business Intelligence success can occur more naturally when it happens through organic growth based on concrete, small deliverables that come to the aid of information workers. These are people whose main purpose is to produce business insight, yet without proper guidance they can spend many hours per day copying and pasting instead of doing what they are supposed to do: analyze data. With tabular technology, we now have a way to assist them directly and grow with them. It is not unlikely that an engagement that starts as a small, report-specific project will turn into a more involved data mart/ETL development initiative.

As you know, I am fascinated with PowerPivot (well, ok, and Tabular technology). I tend to think that using PowerPivot for active/passive prototyping can be a fantastic way to achieve this precise goal. By including subject matter experts’ PowerPivot data models in the corporate BI infrastructure (and giving credit where credit is due), we ensure accuracy along with a deliverable that is well understood and has been championed in collaboration with key users from all of the business areas involved (which generally translates into sponsorship).

Notice the change in focus here: instead of a “let’s fix the world” approach to data warehousing, which can take months or years to implement (by which time business needs have probably changed), we are focusing on smaller, targeted successes. These in turn can naturally grow into a more comprehensive solution, all under the scrutiny of subject matter experts who naturally strive for the highest level of data quality and measure accuracy they can get.

We must directly target the data analysts from the very first moment of the Business Intelligence engagement. And what better tool to do that than, yes, Microsoft Excel (with PowerPivot, of course). This is a big contrast with the way things have traditionally been done, in which it is common to engage only at higher levels, define an overarching structure, and after months of ETL work deliver SSRS reports that are frequently received with comments like, “Hmmm, that’s not exactly what I was looking for.” A round of expensive refactoring must start at that point.
So how can we engage data analysts early on, yet give decision makers a chance to try out the solution without incurring a big financial risk? Yes, you heard it before: we must prototype. PowerPivot can be an excellent prototyping tool.

Prototyping, as part of a pilot project, can be a great way to sample a solution prior to a full commitment to the endeavor. Even in the case of failure, much will be learned regarding the actual viability of the real project. In addition to this, when using PowerPivot to prototype, it is possible to generate a tabular model that is reusable once the actual project has been approved (PowerPivot data models are extensible).
This is not to say traditional data warehousing should be shelved. What I advocate is a mixed approach: a top-down strategy for data warehousing married to a bottom-up data exploration process can be a powerful combination.

At my company, Mariner, we have embraced this philosophy and have formal programs for pilot projects and prototyping that can help an organization increase the chances of success in Business Intelligence. Please follow this link for more detailed info.


LastNonEmpty in Tabular mode: Part 2, Last Ever Non Empty in DAX

One of the most useful inventory calculations is what Chris Webb referred to as ‘Last Ever Non Empty’. This calculation differs from the regular MDX Last Non Empty aggregation in that it yields, for each dimension member at the specified granularity, the fact value on the last date for which there was data, even if that date falls before the selected time frame.


Jason Thomas (blog | twitter) and I have recently been working on classifying a number of different types of Last Non Empty scenarios that people may encounter when working on inventory control analytics.  Though the examples we use are for inventory reporting, these patterns are clearly applicable to reports beyond this subject area and can be applied anywhere there is a need for Last Non Empty aggregations.

We have come up with the following categorizations:

[Image: table of Last Non Empty scenario categorizations]

This is part 2 of the series, in which I will explain in more detail a DAX pattern to tackle Last Ever Non Empty scenarios in BISM Tabular. This is one type of aggregation that is not easy to handle in Multidimensional, so it's worth exploring whether we can achieve a better performing calculation with DAX. If you are interested in the first two Last Non Empty scenarios, please refer to part 1 of the series, available here.

Handling these calculations requires leveraging another DAX pattern we have blogged about before, one that allows us to execute a Tabular calculation at an explicit level of granularity. Those articles can be found here and here.
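
As a quick reminder of that pattern, here is a minimal sketch of my own (the measure and column names InventoryByStore and StoreQuantity are made up for illustration; the Contoso table names are the same ones used in the examples below): an iterator walks a summarized table that defines the grain, CALCULATE evaluates an expression for each row, and the per-row results are aggregated back up.

-- Minimal sketch of the "explicit granularity" pattern (illustration only).
-- SUMMARIZE defines the grain (one row per store), CALCULATE evaluates the
-- expression for each of those rows, and SUMX adds the per-row results back up.
InventoryByStore:=SUMX(
    SUMMARIZE(
        DimStore,
        DimStore[StoreName],
        "StoreQuantity", CALCULATE( SUM( FactInventory[OnHandQuantity] ) )
    ),
    [StoreQuantity]
)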

Let's approach the first scenario of Last Ever Non Empty calculations. In order to explain it better, let's use an example from the Contoso database. Suppose we want to find out the last amount of inventory in stock across all stores. In this case, we want to analyze stores by grouping them by the product manufacturers we distribute. In other words, we want to find the last date with inventory data for each manufacturer, and then add up the inventory quantity in stock on that day across all stores that carry products from the manufacturer in question. In order to compute this, we can start with a simple pivot table like the following:

[Image: pivot table of inventory quantity by manufacturer and store]

I have selected only a few zip codes (10018 and 10093) to narrow the analysis.   We first want to identify what the latest date (with data) was for each manufacturer.  Using just a CALCULATE function coupled with LASTDATE won’t do the trick.  For example, the following expression will return the last date of the date dimension:

Measure 1:=CALCULATE( VALUES( DimDate[Datekey] ) , LASTDATE( DimDate[Datekey] ) )

[Image: pivot table where Measure 1 returns the last date of the date dimension on every row]

 

If we tried using the date on the fact table instead, we would still get incorrect results as the last date with activity for those stores happened before the selected year of 2011:

Measure 1:=CALCULATE( VALUES( FactInventory[Datekey] ) , LASTDATE( FactInventory[Datekey] ) )

[Image: pivot table showing the incorrect results returned when using the fact table date]

Instead, what we want is to iterate through each store (in effect, changing the granularity of the calculation) and retrieve the last date with activity that is less than or equal to the selected year. In order to do this, we must define an iterator (MAXX, for example) and then use CALCULATE to retrieve the last day with fact data. The calculation is:

LastEverNonEmptyDate:=IF( HASONEVALUE( DimStore[StoreName] ),
    MAXX(
        SUMMARIZE(
            DimStore,
            DimStore[StoreName],
            "LastEverNonEmptyDate",
                CALCULATE( MAX( FactInventory[DateKey] ),
                    DimDate[CalendarYear] <= VALUES( DimDate[CalendarYear] ) )
        ),
        [LastEverNonEmptyDate]
    )
)

In reality, the use of MAXX is not because we want the MAX value, but because we want to iterate through the in-memory table defined by the SUMMARIZE function. If we used MINX instead of MAXX, we would still get the same output:

[Image: pivot table showing the same output when using MINX instead of MAXX]
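
For reference, a sketch of that MINX variant follows (the measure name LastEverNonEmptyDateMin is made up; everything else matches the measure above). The output is identical because each store row in the summarized table carries a single date.

-- Same measure as above, iterating with MINX instead of MAXX; each store row
-- in the summarized table holds exactly one LastEverNonEmptyDate, so the
-- choice of aggregator over that single value does not change the result.
LastEverNonEmptyDateMin:=IF( HASONEVALUE( DimStore[StoreName] ),
    MINX(
        SUMMARIZE(
            DimStore,
            DimStore[StoreName],
            "LastEverNonEmptyDate",
                CALCULATE( MAX( FactInventory[DateKey] ),
                    DimDate[CalendarYear] <= VALUES( DimDate[CalendarYear] ) )
        ),
        [LastEverNonEmptyDate]
    )
)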

With this information, we can now tackle scenario #1 of the Last Ever Non Empty pattern. We want to get the value for each manufacturer on the last date on which it had activity (the last non empty date across all stores). By changing the grain of the calculation to iterate over manufacturers instead of stores, we can get that date and retrieve the quantity of inventory in stock on it:

LastEverNonEmpty-AllStores:=SUMX(
    ADDCOLUMNS(
        ADDCOLUMNS(
            VALUES( DimProduct[Manufacturer] ),
            "LastEverNonEmptyDate",
                CALCULATE( MAX( FactInventory[DateKey] ),
                    DimDate[CalendarYear] <= VALUES( DimDate[CalendarYear] ) )
        ),
        "Quantity",
            CALCULATE( SUM( FactInventory[OnHandQuantity] ),
                FILTER( ALL( DimDate ), DimDate[DateKey] = [LastEverNonEmptyDate] ) )
    ),
    [Quantity]
)

The inner ADDCOLUMNS function is used to retrieve the last ever non empty date for each manufacturer (up until the selected year). The outer ADDCOLUMNS function allows us to use this date to dynamically filter the fact table as we iterate across all manufacturers. In the pivot table, at the store level, we are still able to see what the value was for each store on the last ever non empty date. However, notice that at the manufacturer total level, only the stores with activity on that manufacturer's last ever non empty date are aggregated:

[Image: pivot table showing LastEverNonEmpty-AllStores by manufacturer and store]

This calculation performs rather well: against a fact table of a little more than 8 million rows, it generates an output in approximately 2 seconds.
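
If you want to check the timing yourself outside of Excel, one option (not part of the original workbook, so treat the query below as an assumption on my part) is to run a DAX query against the Analysis Services workspace database from SSMS. It evaluates the measure once per manufacturer under a 2011 filter, similar to the pivot table, although it omits the zip code selection.

-- Hypothetical timing query, run from an MDX query window in SSMS against the
-- tabular/workspace database. CALCULATETABLE mimics the pivot table's 2011
-- selection; ADDCOLUMNS evaluates the measure for every manufacturer.
EVALUATE
CALCULATETABLE(
    ADDCOLUMNS(
        VALUES( DimProduct[Manufacturer] ),
        "LastEverNonEmpty-AllStores", [LastEverNonEmpty-AllStores]
    ),
    DimDate[CalendarYear] = 2011
)
ORDER BY DimProduct[Manufacturer]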

Finally, we want to tackle the more complex Last Ever Non Empty scenario. This one requires that the manufacturer total take into account the last ever non empty value for each store. In other words, in the example above, the total should take all rows into account, not just those for 12/26/2009. In DAX, we are able to change the granularity of the calculation to be a composite of manufacturer and store. The final expression is:

LastEverNonEmpty-EachStore:=SUMX(
    ADDCOLUMNS(
        ADDCOLUMNS(
            CROSSJOIN(
                VALUES( DimStore[StoreName] ),
                VALUES( DimProduct[Manufacturer] )
            ),
            "LastEverNonEmptyDate",
                CALCULATE( MAX( FactInventory[DateKey] ),
                    DimDate[CalendarYear] <= VALUES( DimDate[CalendarYear] ) )
        ),
        "Quantity",
            CALCULATE( SUM( FactInventory[OnHandQuantity] ),
                FILTER( ALL( DimDate ), DimDate[DateKey] = [LastEverNonEmptyDate] ) )
    ),
    [Quantity]
)

Here, we are using a nested expression to calculate, first, the crossjoin of store names and manufacturers. Then we get the last ever non empty date for each combination. Finally, we get the total quantity for each pair as we iterate over that table with SUMX. The result is correct even at the grand total level, and even though the computation is more complex, it again executes in about 2 seconds against the 8 million row Contoso fact table; in other words, during my tests I could not identify any performance overhead compared to the prior calculation. Getting this kind of performance for a last ever non empty calculation in Multidimensional is hard or even impossible, and it would require tweaks to the ETL that are not always ideal.

[Image: pivot table showing LastEverNonEmpty-EachStore results]

Notice how the highlighted row is now included in the manufacturer total. Explicitly specifying the grain of the DAX calculation to use both manufacturer and store is necessary to iterate properly at the grand total level: many stores carry products from multiple manufacturers, and simply getting the last ever non empty date at the store level would misstate the amounts, because the calculation would then use only the last ever non empty date of each store across all manufacturers, which is clearly incorrect.

Also note that in the iteration used in the expression, we are looping over the values of the lookup tables and not the base table. This is an important point: the date lookup filters the fact table, but the product and store lookup tables are not filtered, as filter context does not propagate to them by default. In this way, we are able to pick up stores that had activity even before the selected calendar value (a year, in this case).
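
To make that last point concrete, here is a hypothetical "what not to do" variant of the scenario #1 measure (the name LastEverNonEmpty-NoAll is made up; the only change is dropping ALL from the date filter). Because the CalendarYear filter coming from the pivot table stays in place, any manufacturer whose last date with data falls in an earlier year would simply return blank.

-- Hypothetical variant of LastEverNonEmpty-AllStores shown for illustration:
-- without ALL(DimDate), the year filter from the pivot table still applies when
-- we look up the quantity, so a LastEverNonEmptyDate from an earlier year
-- matches no fact rows and the result is blank.
LastEverNonEmpty-NoAll:=SUMX(
    ADDCOLUMNS(
        ADDCOLUMNS(
            VALUES( DimProduct[Manufacturer] ),
            "LastEverNonEmptyDate",
                CALCULATE( MAX( FactInventory[DateKey] ),
                    DimDate[CalendarYear] <= VALUES( DimDate[CalendarYear] ) )
        ),
        "Quantity",
            CALCULATE( SUM( FactInventory[OnHandQuantity] ),
                FILTER( DimDate, DimDate[DateKey] = [LastEverNonEmptyDate] ) )
    ),
    [Quantity]
)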

These examples show the versatility of Tabular and the DAX language. By respecting the rules of context propagation, we can develop advanced calculations that are not built into the model.