PowerPivot and Beyond: Expanding your analytical repertoire

One of the PowerPivot features that is not very well known is that PowerPivot can be used as a database serving many other tools besides Excel.

I blogged about this a little bit here when I mentioned that PowerPivot can be consumed through other tools like Tableau.  However, I think it is worth expanding on the topic a bit, as this feature can be very useful for people migrating from “Personal BI” to “Team BI” or even “Corporate BI”.

Imagine a scenario in which you have developed a PowerPivot workbook and have then published it to a PowerPivot Gallery in SharePoint.  Your report has proven to be a success and its popularity has been rising across your company.  A business unit manager now wants to distribute this report to his team members, but filtered to only the data that pertains to each of them.  The PowerPivot workbook holds a large amount of data, demanding the 64-bit version of PowerPivot.  Neither the manager nor his team members can download the workbook and use it locally, as they only have the 32-bit version of PowerPivot installed and, due to corporate policies, they do not expect to be upgraded soon.

One option is for the manager to manually copy and paste the pivot table displayed in the browser into a separate Excel file.  He would have to do this every day, for each team member, as he can only send the portion of the data that belongs to that specific employee.  Not a great solution.

The other option would be to create a separate PowerPivot workbook for each team member, containing only his data.  Not a great solution either, as the data model and DAX calculations would have to be replicated, and if you ever needed to change one of them, you would have to remember to change it in each of the replicas.

What to do?  I want to point out that you can read a PowerPivot workbook that has been published to SharePoint from a great number of tools besides Excel.  Some of these tools can enhance the PowerPivot implementation, giving us features to solve the scenario in question.

For this example, let’s stick with Microsoft reporting tools.  Besides Excel itself, you can read PowerPivot data from Report Builder, SQL Server Reporting Services and PerformancePoint.  All of them have the capability to read from published PowerPivot workbooks.  SQL Server Reporting Services (SSRS), for example, will let you leverage your PowerPivot workbook while offering more advanced reporting layouts and delivery mechanisms, this last feature being what we need for our example.

When you read PowerPivot from another tool, you are not reading the pivot table in the workbook but the entire data model available in the PowerPivot window. This is true even for Excel itself, which acts as a client to the PowerPivot database.

Let’s continue with the scenario given.  This is how my PowerPivot workbook looks in SharePoint, once I have published it:

image

So far so good.  This is what most of the business users want to see and interact with.  

But beyond the nice Silverlight interface, something else happens once the PowerPivot workbook is published and someone interacts with it (via slicers or filters), or an automatic data refresh is scheduled:  a real database is generated in the form of a SQL Server Analysis Services BISM Tabular model.  You can verify this by opening the dedicated SQL Server Analysis Services instance which the PowerPivot add-in for SharePoint uses (this instance is a requirement for the add-in to work).  In the case of my published report, I see the following through SQL Server Management Studio (SSMS):

image

The highlighted database is quite literally my published workbook, in database form. If you happen to know some MDX you can then open a new query window and query the PowerPivot database (called a “model”) using this language.  Note that you can be querying, in MDX, a measure that was developed in DAX – like in the case below.  Additionally, you can create your own MDX calculated members as part of the query if necessary.

image

If you feel more familiar with DAX, though, you can query the PowerPivot model with it instead of MDX:

image
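
For illustration, a DAX query against the published model might look like the following sketch; the table, column and measure names below are placeholders for whatever exists in your own workbook:

EVALUATE
SUMMARIZE(
    Sales,                      -- hypothetical table name
    Sales[Employee Email],      -- hypothetical column name
    "Total Sales", [Sum of Sales Amount]    -- hypothetical DAX measure
)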

The point here is that PowerPivot is now stripped of its Excel dependency and can be queried by many other tools. (When SharePoint renders the workbook in the browser, it uses this SQL Server Analysis Services model with an Excel Services front end; the Excel desktop application is no longer needed.)

Going back to our scenario, we will build a solution in SQL Server Reporting Services that takes a parameter and delivers a scheduled report based on filtered data coming from the PowerPivot model.

If you are not familiar with SQL Server Reporting Services (SSRS), I would recommend you start with Report Builder 3.0, which is a more data-analyst-centric version of the full-blown SSRS product, yet leverages the same underlying technology.

When connecting SSRS to PowerPivot, remember to always do it through the URL that hosts the workbook in SharePoint.  This is a very important point: when you decide to enhance your PowerPivot workbook and republish it, a new cached copy will be created in SQL Server Analysis Services.  By using the URL instead of a direct connection to the model, you can be certain to always consume the ‘latest & greatest’ version of the workbook.
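
As a rough sketch, the connection string in such a data source simply points at the workbook URL, along these lines (the server name and path here are hypothetical):

Data Source=http://yoursharepointserver/PowerPivot Gallery/SalesReport.xlsx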

In Visual Studio 2010, you can create a new SSRS data source by selecting the Analysis Services provider:

image

Click on Edit and use the SharePoint location of the PowerPivot workbook to select the Model:

image

With the connection to PowerPivot now created, you can use the MDX query editor to drag and drop the elements needed for the report.  In this case, I have created a parameter based on employee email, as I will be using this to generate a filtered version of the report for each employee I am interested in:

image

Below is how the final PowerPivot-connected SSRS report looks.  The currency amounts shown are associated only with the employee in question (filtered by his email account, selected with the dropdown at the top of the report):

image 

We can now deploy the report to the report server:

image

In order to schedule an automated run that iterates over a number of email values and delivers the report with filtered data, we must configure an SSRS data-driven subscription.  Notice that by doing so we are now using capabilities beyond those of a pure PowerPivot implementation.

There are many websites that explain the process of creating an SSRS subscription, so I won’t go into a lot of detail here, but I do want to show how the process looks in general terms.

What we want to do at this point is select the report and click on ‘Manage’.  That will give you access to a set of features built into the SSRS framework. 

image

First, select ‘Data Sources’.  There, be sure to specify the account that will be used to query the PowerPivot model:

image

After that, click on ‘Subscriptions’ and then on ‘New Data Driven Subscription’.  Give your new subscription a name and specify a data source that contains recipient information.  This data source does *not* need to be in PowerPivot; it can be a table in a SQL Server database that specifies where and how to deliver the automatic snapshot.

In my case, I opted to create a file on a network share instead of email delivery.  As such, the query specified next in the wizard should contain that information, as well as specify how the report parameter (employee email address) should be set on each iteration of the subscription:

image

Here is how my SSRS_Subscription table looks:

image

In other words, I want to create two files:  one called ‘JavierSnapshot’, which is the report filtered for the email address amy0@adventure-works.com, and another one called ‘LeeSnapshot’, which should filter the report for garrett1@adventure-works.com.  These are just examples; in your case you may want to use specific output file mappings, for example, sales manager to sales territory.
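
As a sketch, that table could be as simple as the following; the column names are illustrative and the delivery path is a made-up placeholder:

FileName          DeliveryPath           EmployeeEmail
JavierSnapshot    \\fileserver\reports   amy0@adventure-works.com
LeeSnapshot       \\fileserver\reports   garrett1@adventure-works.com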

The SSRS subscription wizard now instructs us to map the output of the table to the chosen delivery attributes (file delivery in this case), as well as to map output columns to report parameters:

image

At the end, the wizard asks you for the schedule on which you want this process to execute. 

Here is what the output looks like, in my case, after the subscription has automatically executed:

image

In summary, each PDF was generated by the SSRS report querying the PowerPivot model, filtered for a specific employee, with the output automatically delivered to a folder on the network.  Pretty useful functionality that can enhance the options available in your self-service environment.


Determining lost or acquired customers in DAX

I was asked recently how to determine which customers were acquired or lost from last month to the current month.  Here is a small sample of what a fact table registering customer orders over time would look like:

image

Three interesting questions that can be asked are:

  • What is the number of returning customers?
  • Which customers were added in the last month?
  • Which customers were lost in the last month?

We can answer these questions by extending the pattern outlined by Chris Webb here.  But before we apply the pattern, it is necessary to generate a real date column.  This can easily be done by creating a calculated column with the following simple formula

=[Month]

after which we change the data type to ‘date’, which in effect converts the text value into a date field:

image

We can now apply a pattern similar to the one Chris describes:

Returning Customers:=IF( HASONEVALUE( Table1[Date] ),
    IF( DATEADD( VALUES( Table1[Date] ), -1, MONTH ) <> BLANK(),
        COUNTROWS(
            CALCULATETABLE(
                DISTINCT( Table1[Customer Name] ),
                CALCULATETABLE(
                    DISTINCT( Table1[Customer Name] ),
                    DATESBETWEEN( Table1[Date], BLANK(),
                        DATEADD( VALUES( Table1[Date] ), -1, MONTH )
                    ),
                    ALL( Table1 )
                )
            )
        )
    )
)

The result is a DAX measure that gives us the accurate count of customers that had purchased in the past and have decided to buy from us again (Mary Evans and John Rogers):

image

This expression stacks two filtered tables comprising two different contexts:  the outer DISTINCT( Table1[Customer Name] ) gives us the current customers, while the inner CALCULATETABLE gives us the list of customers for the past month.  In DAX, the parameters used in a CALCULATE or CALCULATETABLE statement are intersected, and the resulting output is the set of rows common to both filtered tables.

Let’s move on now to the question of how to determine which customers were acquired this month.  We can use a similar technique, but rather than leveraging the default intersect behavior when modifying the filter context via CALCULATE or CALCULATETABLE, we must now generate an except condition between two sets of data:  we want to count all customers that purchased this month except the ones that had purchases last month:

Acquired Customers:=IF( HASONEVALUE( Table1[Month] ),
    IF( DATEADD( VALUES( Table1[Date] ), -1, MONTH ) <> BLANK(),
        CALCULATE(
            COUNTROWS( VALUES( Table1[Customer Name] ) ),
            FILTER(
                VALUES( Table1[Customer Name] ),
                Table1[Customer Name] <>
                    CALCULATETABLE(
                        VALUES( Table1[Customer Name] ),
                        FILTER( ALL( Table1[Date] ),
                            Table1[Date] = DATEADD( VALUES( Table1[Date] ), -1, MONTH )
                        ),
                        ALL( Table1[Month] )
                    )
            )
        )
    )
)

It is normally assumed that we compare against a single value in the condition passed to FILTER; however, “except” conditions can be generated by comparing two filtered tables as well. The result is the following:

image

Notice that [Returning Customers] + [Acquired Customers] = [Total Customers].  So why not just subtract the [Returning Customers] measure from the total count of distinct customers to arrive at the [Acquired Customers] value? I would say most of the time that is what you need to do. However, what if we want to see exactly who those new customers were?  The expression above can be translated into a DAX query, and whether we use SSMS, DAX Studio or a reporting tool like SSRS, we can then visualize the exact customers that made up that count:

image 

resultset viewed in DAX Studio
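
In case it helps, here is a minimal sketch of what such a query could look like, reusing the Table1 names from above; the two dates are placeholders standing in for “this month” and “last month” in the sample data:

EVALUATE
FILTER(
    CALCULATETABLE(
        VALUES( Table1[Customer Name] ),
        Table1[Date] = DATE( 2012, 2, 1 )    -- placeholder for "this month"
    ),
    -- keep only customers with no rows in the prior month
    CALCULATE(
        COUNTROWS( Table1 ),
        Table1[Date] = DATE( 2012, 1, 1 )    -- placeholder for "last month"
    ) = 0
)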

Now let’s solve the final question.  For this we use the same technique as the one outlined above, but with the parameters reversed:  we want to know which customers had sales last month except the ones that purchased this month. This gives us the count of lost customers:

Lost Customers:=IF( HASONEVALUE( Table1[Date] ),
    COUNTROWS(
        FILTER(
            CALCULATETABLE(
                VALUES( Table1[Customer Name] ),
                FILTER( ALL( Table1[Date] ),
                    Table1[Date] = DATEADD( VALUES( Table1[Date] ), -1, MONTH )
                ),
                ALL( Table1[Month] )
            ),
            Table1[Customer Name] <>
                CALCULATETABLE( VALUES( Table1[Customer Name] ) )
        )
    )
)

The result is the following:

image

Here, again, we can view the exact customers we lost through a modified version of the measure above applied as a DAX query, which could serve as a drilldown view of the totals seen above:

image

resultset viewed in DAX Studio

Using DAX Studio to create PowerPivot measures

The newly released DAX Studio is an excellent tool for developing and troubleshooting DAX calculations, as it provides a way to visualize tables as they are internally represented during a computation.

The more advanced your DAX calculations, the more you will need to use intermediate ‘filtered tables’ in order to arrive at a specific output, and this tool allows you to better understand how your filtering actions are applied across the tables and rows that become active during the evaluation of an expression.

Allow me to give an example. Take the following data:

image

The first table on the left registers orders by store, product and vendor.  The two additional tables are lookups.

Once loaded into PowerPivot, this is how the data model looks:

image

We want to find out how many vendors sold Product ID “A” in Store ID “1” AND Product ID “B” in Store ID “2”. Typically, you would develop and troubleshoot this PowerPivot measure by using pivot tables as your vehicle to test each step of the calculation, leveraging functions like COUNTROWS to understand how many rows are actually being filtered by a given context.  Not a terrible approach, but you are left wondering if there is a better way: a way to take a peek at the in-memory tables and see how they look given the application of a certain set of filters.

This is what DAX Studio can do for us.  Connecting it to the sample data model above, we can start with a simple expression to get a distinct list of StoreIDs:

image

We are using DAX as a query language here so we must start with the EVALUATE clause.
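
That first query might look something like this (the table and column names are my guesses at the sample model, so adjust them to your own):

EVALUATE
DISTINCT( Orders[Store ID] )    -- assumed table and column names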

As a next step, let’s write a slightly more complex DAX query to calculate a column that will count how many times each vendor ordered Product ID “A” in Store ID “1”:

image
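
A sketch of that query, under the same naming assumptions, could be:

EVALUATE
ADDCOLUMNS(
    VALUES( Orders[Vendor ID] ),
    "Store1_ProdA",
        CALCULATE(
            COUNTROWS( Orders ),
            Orders[Store ID] = "1",    -- assuming the IDs are stored as text
            Orders[Product ID] = "A"
        )
)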

As you can see, column [Store1_ProdA] shows which vendors in the list had orders meeting the required condition.  We can double-check this is correct by going back to the PowerPivot window and applying filters in the grid view, where we are able to verify that the vendors flagged above accurately represent the data:

image

Now we are ready to add the second condition in DAX Studio, counting orders for Product ID “B” in Store ID “2”:

image

Remember, once we are done developing the calculation this table will not be shown to the user.  However, being able to look at it allows us to better understand how PowerPivot will represent the data on the fly as it is computing an output for any given context.

As what we want to do is count the vendors for which these two conditions both apply, we can add a FILTER function to wrap our table:

image
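
Putting the pieces together, the full query could look like this sketch (same naming assumptions as before):

EVALUATE
FILTER(
    ADDCOLUMNS(
        VALUES( Orders[Vendor ID] ),
        "Store1_ProdA",
            CALCULATE( COUNTROWS( Orders ),
                Orders[Store ID] = "1", Orders[Product ID] = "A" ),
        "Store2_ProdB",
            CALCULATE( COUNTROWS( Orders ),
                Orders[Store ID] = "2", Orders[Product ID] = "B" )
    ),
    -- keep only vendors meeting both conditions
    [Store1_ProdA] > 0 && [Store2_ProdB] > 0
)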

And we have our result.  We must now convert this DAX query expression into a DAX measure.  For that, we must understand that DAX measures return scalar values, whereas DAX Studio requires tables to display on the grid.  To arrive at a scalar value from here, we can simply wrap the expression with COUNTROWS.

Before we switch back to Excel to do this, though, let’s visualize the resulting scalar value in DAX Studio.  Remember, DAX Studio needs a table to evaluate, so we must wrap our COUNTROWS expression with the ROW function. This part won’t be needed in the measure; we are using it here only to visualize the single count value within the grid window of DAX Studio.

image

If we want to verify the expression is properly allocating context when adding another lookup element, we can add a SUMMARIZE function to the query.  With this, we are able to verify the count is assigned correctly to the one day in which the conditional expression proved to be true:

image

We are now ready to move the expression into a DAX measure.  Here, we will only be taking the part that starts with COUNTROWS, as the rest was needed just for testing.  This is the final expression as it looks in the DAX measure window invoked from the PowerPivot field list:

image
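
A reconstruction of that final measure, following the steps above, would look like this (the measure name is made up, and the table and column names remain assumptions):

Vendors With Both Orders:=COUNTROWS(
    FILTER(
        ADDCOLUMNS(
            VALUES( Orders[Vendor ID] ),
            "Store1_ProdA",
                CALCULATE( COUNTROWS( Orders ),
                    Orders[Store ID] = "1", Orders[Product ID] = "A" ),
            "Store2_ProdB",
                CALCULATE( COUNTROWS( Orders ),
                    Orders[Store ID] = "2", Orders[Product ID] = "B" )
        ),
        [Store1_ProdA] > 0 && [Store2_ProdB] > 0
    )
)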

If we now visualize the final result in a pivot table, we are able to confirm the results from our prior DAX query:

image

In summary, DAX Studio is a great tool for developing (and troubleshooting) calculations that require the use of intermediate tables, which is generally the case for advanced DAX development. In the end, DAX is mainly about filtering (row, query and filter context) and this tool allows us to see those filtered tables as we work our way through a computation.  

Many thanks to Marco Russo, Darren Gosbell, Paul te Braak and Grant Paisley for their effort in developing this awesome tool, available for download here: http://daxstudio.codeplex.com/

Book Review: SQL Server 2012 Analysis Services, The BISM Tabular Model

If you are looking to increase your Business Intelligence ‘IQ’, there is a book I want to recommend to you: “SQL Server 2012 Analysis Services, The BISM Tabular Model” by Marco Russo, Alberto Ferrari and Chris Webb (published by Microsoft Press).

It provides the most in-depth training available on the new Tabular framework in SQL Server Analysis Services 2012.  The coverage of features is truly impressive, ranging from data modeling to development and administration.  It is indeed the best and most comprehensive guide on the topic available at the moment.

As a technologist, you must be facing the interesting question of when to use BISM Tabular and how it compares with Multidimensional cubes. This book will guide you through that decision, comparing both frameworks and pointing out their strengths and weaknesses.

Even after deciding to use BISM Tabular for your project, there is a need to choose between loading data in-memory (using xVelocity) or relying on DirectQuery to benefit from real-time reporting. The book dives deeply into this topic and establishes a comparison of features which will allow you to make the right selection when implementing Tabular technology in your organization.

In addition to this, the in-depth coverage of DAX makes it a fantastic reference book when working through complex queries and formulas in order to get the most out of Tabular technology.  A lot of attention is given to DAX’s conceptual underpinnings, from basic to advanced levels, always from the perspective of a BI professional.

To be a seasoned SSAS developer you must have a good understanding of how to best model the data in order to generate a user-friendly and intuitive semantic layer.  With this book you will also gain that understanding, with particular emphasis on how modeling for xVelocity is different from regular data modeling  and how DAX can be used to handle advanced relationships.

All these are great topics, but the book goes even further.  Alternatives and best practices regarding the use of presentation-layer features are also covered, as well as how a Tabular deployment can benefit from the use of PowerPivot for self-service BI.

Finally, the necessary topics of how to handle security and optimize & administer the environment are explained clearly.

I must say I liked the experience of being part of the peer review team for this book. I particularly enjoyed the collection of useful links it provides.  The authors have mined the best online resources available on the Tabular model, and they share them as they delve into each and every topic.

If you are interested in anything Tabular (SSAS, PowerPivot, DAX, MDX, Power View, AMO and more) I suggest you get a copy of this book. I am certain you will enjoy it!  Here is the Amazon.com link to the book:

http://www.amazon.com/Microsoft-Server-2012-Analysis-Services/dp/0735658188

Announcing a new BI user group and SQL Saturday in Charlotte!

I am super excited to share the news that a few of my friends from the SQL Server community and I have decided to create a new user group:  the local PASS chapter for SQL Server Business Intelligence in Charlotte, North Carolina.  Our motto: we want to make it fun to learn new skills while interacting with others.  If you live in Charlotte or are close to the “Queen City”, I want to hear from you and invite you to attend!

On top of this, we have decided to organize a SQL Saturday BI edition here in Charlotte!  Find out more information at the SQL Saturday website here: http://www.sqlsaturday.com/174/eventhome.aspx

image

The Charlotte SQL Server BI group and the Charlotte SQL Saturday are organized by:

Jason Thomas: Blog | LinkedIn | Twitter

Melissa Coates: Blog | LinkedIn | Twitter

Rafael Salas: Blog | LinkedIn | Twitter

and myself! Blog | LinkedIn | Twitter

If you are interested in speaking or volunteering for the event (or the user group) please do let me know! 🙂

Approximate match with multiple criteria in DAX, part II

In the prior blog entry, I explained how to mix exact and approximate matches when retrieving a value from a lookup table. The solution required a specific approach whose inner workings deserve a closer look. This blog entry goes into more advanced topics related to formula context and some not-very-intuitive DAX behaviors.

As mentioned here, PowerPivot gives you the ability to mimic the behavior of VLOOKUP with approximate matches.  If you must extend the formula to include exact matches, the initial thought is to add more elements to the FILTER function in DAX, as highlighted below:

image
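
For reference, that expression is the first attempt arrived at in part I:

=CALCULATE(
    LASTNONBLANK( Lookup[Cost], 1 ),
    FILTER(
        Lookup,
        Lookup[Category] = Orders[Category] &&
        Lookup[Product] = Orders[Product] &&
        Lookup[Date] <= Orders[Date]
    )
)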

Notice the date condition uses a ‘less than or equal’ operator (unlike the exact matching criteria).  As such, DAX will retrieve all rows up to and including the value on the right (Orders[Date]).  With those rows selected, we attempt to pick the most recent value using LASTNONBLANK.

Let’s apply this expression to the following scenario:

image

We want to retrieve the Product Cost from the lookup table to the base table, and we want to do so by an exact match on Category and Product, and an approximate match on Date.

When we use the expression outlined above we get the following:

image

Clearly, the value for 1/3/2012 and 1/5/2012 is incorrect (it should have been 97 for both, as that was either an exact match or the closest earlier value).

What went wrong?  The issue emerges from the intrinsic behavior of the LASTNONBLANK function.  The name may lead you to believe the result will be the last value that is not empty.  However, LASTNONBLANK has an interesting behavior when it acts over numeric values:  it sorts them from small to large, regardless of the actual row order in the table.  We can verify we are in fact retrieving valid rows from the lookup table up to the date in question by changing the expression to the following:

=CALCULATE(
    COUNTROWS( Lookup ) ,
    FILTER(
        Lookup,
            Lookup[Category] = Orders[Category] &&
            Lookup[Product] = Orders[Product] &&
            Lookup[Date] <= Orders[Date] 
        ) 
    )

image

So what LASTNONBLANK is doing is finding the highest numeric value within the rows that were returned. This can be easily verified by creating a linked table in PowerPivot with the following data:

image

If we use a simple DAX measure with the following definition:

LASTNONBLANK( Table[LastNonEmptyIssue] , 1)

the output is 45.   As you can see, LASTNONBLANK is returning the highest value (MAX), not the last value in the table.  In our solution for approximate matches we must get the last numeric value available in the table, not the max value.

How do we solve this? We must observe that LASTNONBLANK is actually able to get the correct date value.

If you try the same DAX expression we used above in the following table…

image

the output will be 1/9/2012.

This is because the last value in a date column is normally the maximum date, as we typically assume the date values are sorted in ascending order.

So instead of using FILTER to get all rows up until the order’s date, we will use an approximate match to retrieve the last date value and *then* use this single value to do a lookup based on that exact date.  Here is the final – valid – solution (as shown in the last post):

=CALCULATE(
    LASTNONBLANK( Lookup[Cost], 1 ),
    FILTER(
        Lookup,
        Lookup[Date] = CALCULATE(
            LASTNONBLANK( Lookup[Date], 1 ),
            FILTER(
                Lookup,
                Lookup[Category] = Orders[Category] &&
                Lookup[Product] = Orders[Product] &&
                Lookup[Date] <= Orders[Date]
            )
        ) &&
        Lookup[Category] = Orders[Category] &&
        Lookup[Product] = Orders[Product]
    )
)

It is also possible to break down the “approximate date” part of the formula (the inner CALCULATE above) into a separate calculated column and have the final calculation refer to that column instead.  This can simplify future troubleshooting efforts. For example:

image

 

What if you need this expression as a measure?

The example above should be used when attempting approximate matches as DAX calculated columns.  If we use it as a measure, perhaps because the granularity of the calculation should be different from that of the in-memory table, there is one extra caution we must keep in mind in order to generate the correct output:  we cannot break the calculation in two as I just suggested doing for calculated columns.  Why?  Because a measure used inside another measure is automatically surrounded by a CALCULATE statement, which affects the output of the two-step expression.

[Approximate Match Date Measure]:=IF(HASONEVALUE(Orders ) , CALCULATE(
    LASTNONBLANK( Lookup[Date],1 ),
    FILTER(
        Lookup,
            Lookup[Category] = VALUES(Orders[Category]) &&
            Lookup[Product] = VALUES(Orders[Product]) &&
            Lookup[Date] <= VALUES(Orders[Date]) 
        ) 
    ) )

Notice that we are using the VALUES function when acting as a DAX measure, as we want to be sure to grab the current single value in context.  Next, we create another measure to call this one:

[RetrievedLookupCostValue_Wrong]:=IF(HASONEVALUE(Orders) , CALCULATE(
    LASTNONBLANK( Lookup[Cost] , 1 ),
    FILTER(
        Lookup,
            Lookup[Date] = [Approximate Match Date Measure] &&
            Lookup[Category] = VALUES(Orders[Category]) &&
            Lookup[Product] = VALUES(Orders[Product] )
        )
) )

The result is incorrectly showing the MAX cost instead of the most approximate cost value:

image

When DAX automatically surrounds a measure reference with a CALCULATE statement, it converts row context into filter context.  In this specific scenario, that means that even though we are passing an exact date, CALCULATE grabs all rows that match the exact criteria (Product and Category) and, for each of those rows, retrieves the last date in context for that row.  As this concept is a bit difficult to understand at first, let’s try it with another example:

image

In the table above we have a stream of dates (column 1) and two measures.  [Last Date 1] represents the last date for the whole table, whereas [Last Date 2] represents the last date as of the current date in context (which happens to be the same date value in column 1). Which one is correct? It depends on the business question, right?

  • If I am computing ratios over total (% of total), I want to see – in the denominator and for each row – the total value from the first date until the last date for the whole dataset (that would be [Last Date 1] ).
  • On the other hand, if I am calculating a running total, I want to see the value of 1/3/2012, for example, as the value represented *up until* 1/3/2012 and nothing after that (that would be [Last Date 2]).

In other words, each date can be the correct solution depending on the business scenario.

When wrapping a measure inside another measure, DAX applies a CALCULATE function automatically, which turns an output like the one in [Last Date 1] into one like [Last Date 2].  Why?  Because each row is now taken into account in the computation of how the table is filtered (filter context), so each row’s date value now filters the selection of rows to be considered when retrieving the “last value that is not blank”.

The way around it is to compute the whole expression at once; only in that way do we avoid the automatically assigned CALCULATE. So the final calculation, as a DAX measure, is:

IF(HASONEVALUE(Orders) , CALCULATE(
    LASTNONBLANK( Lookup[Cost], 1 ), 
    FILTER(
        Lookup,
            Lookup[Date] = CALCULATE(
                            LASTNONBLANK( Lookup[Date],1 ),
                                FILTER(
                                    Lookup,
                                        Lookup[Category] = VALUES(Orders[Category]) &&
                                        Lookup[Product] = VALUES(Orders[Product]) &&
                                        Lookup[Date] <= VALUES(Orders[Date]) 
                                    ) 
                            )  &&
            Lookup[Category] = VALUES(Orders[Category]) &&
            Lookup[Product] = VALUES(Orders[Product] )
        )
    )
)

Approximate match with multiple criteria in DAX, part I

In a prior blog entry, I described how to use DAX to match the behavior of Excel’s VLOOKUP using approximate matches.   A few readers suggested I expand the example to include scenarios in which the approximate match should be used along with exact matches.

So here is the scenario: We want to calculate profit margin over products sold.  Here is the list of orders we received:

image

In order to calculate margin, we must find the cost for each product.  This information is stored in a separate table. The cost of producing each product changes over time, though, and the cost lookup table only registers a record for the days on which the cost changes:

image

Notice that in the Orders table there are sales for the fifth of January, whereas the Cost Lookup table only stores cost records up to the third.  We assume there are no other changes beyond the third.  As you probably already guessed, we must perform a dual lookup:  an exact match based on Product and Category, and an approximate match based on Date.

Do not create a relationship between these two tables. Relationships are based on exact matches only, and they cannot express the approximate part of this lookup.

Following the same pattern applied in the approximate lookup blog entry, we attempt this solution:

=CALCULATE(
    LASTNONBLANK( Lookup[Cost], 1 ),
    FILTER(
        Lookup,
        Lookup[Category] = Orders[Category] &&
        Lookup[Product] = Orders[Product] &&
        Lookup[Date] <= Orders[Date]
    )
)

Category and Product must be exact between both tables, while the date must be the last one for which there was a cost record in the lookup table.  When using this expression as a calculated column called [Cost], we get:

image

Notice there are two empty values, as those products did not belong to a product/category combination that had a registered cost.  This could be a valid combination of product/category for which there was no data in the lookup table, or simply a bad record.  For all other records, there is a value.  However, looking a bit closer, we realize the numbers are not correct.  For example, if we filter only for records of Category A and Product P1, we get:

Orders

image

Lookup

image

As you can see, the calculated column in the Orders table should display 97 for both 1/3/2012 and 1/5/2012.  So why is this problem happening?  The issue has to do with the LASTNONBLANK function. Let us go directly to the correct calculation:

=CALCULATE(
    LASTNONBLANK( Lookup[Cost], 1 ),
    FILTER(
        Lookup,
        Lookup[Date] = CALCULATE(
            LASTNONBLANK( Lookup[Date], 1 ),
            FILTER(
                Lookup,
                Lookup[Category] = Orders[Category] &&
                Lookup[Product] = Orders[Product] &&
                Lookup[Date] <= Orders[Date]
            )
        ) &&
        Lookup[Category] = Orders[Category] &&
        Lookup[Product] = Orders[Product]
    )
)

Notice the modification to the prior formula:  we execute another lookup internally (the inner CALCULATE that retrieves the last applicable date). For those interested in learning more about the issue and the solution, read my next blog entry available here https://javierguillen.wordpress.com/2012/08/05/approximate-match-with-multiple-criteria-in-dax-part-ii/.

Now we are able to generate the approximate match and retrieve the valid cost.  An extra calculated column allows us to compute the Profit Margin:

=IF( [Cost] <> BLANK(), [Price] - [Cost] )

image