Measure expressions: how performant are they?

They ship parcels domestically, internationally (and between dimensions).  We are only concerned with the operations of their trucks.  The trucks either

·        Collect parcels from depots to deliver to customers

·        Collect parcels from customers to deliver to depots

 

The customer pays for the delivery/collection, so revenue can be associated with a customer.  The job of collecting the parcels and delivering them has costs associated with it – e.g. petrol.  Costs are therefore associated with a job.

 

 

It is common for a truck to be part loaded in order to serve multiple customers.  In this case there would be multiple delivery/collection addresses for the same job.  It is therefore conceivable that we could have the following data in our Revenue table.

 

The parcels for JobID of 1 were delivered to two customers (those with CustomerIDs of 1 and 2).  However, Customer 1 only took up a small amount of space in the truck.  The job took place in December and he was shipping Halloween decorations so there wasn’t much demand.  Customer 2, on the other hand, was shipping Christmas trees, so he used the vast majority of the truck’s space.

How then should we assign costs to a customer?  You have probably already noticed that Customer is a many-to-many dimension with regard to Cost.  A customer can have many jobs performed, and a job can be done for more than one customer at a time.

 

In our example, the total cost of Job 1 (JobID of 1) is £500.  With the default behaviour of many-to-many dimensions (i.e. no measure expression or MDX to split the costs), both customers 1 and 2 would be assigned a cost of £500.  This seems a little unfair – especially for Customer 1 because he only used a small part of the truck space.  It is a business decision as to how to split costs in this type of scenario.  In this hypothetical case, it makes sense to do a proportional split based on revenue.

 

This is what the CostMultiplier field is for.  For a given JobID, the sum of CostMultiplier always equals 1.  So, for the first row, 500 / 5,000 is 0.1.  For the second row, 4,500 / 5,000 is 0.9.  For the third row, 3,000 / 3,000 is 1.

 

Using MDX or a measure expression, we can multiply the cost for a customer by the CostMultiplier.  The following screenshot is therefore how an OLAP client tool would present the costs in our example.

 

 

Note: doing the proportional split based on revenue results in the data being aggregated using a regular sum.  The cost for Customer 1 + Customer 2 + Customer 3 is £800.  £800 is also the actual total cost of Job 1 and Job 2.  Summing the individual customers using the many-to-many dimension without the proportional split would have resulted in £1,300 (£500 + £500 + £300).

 

Note: the proportional split has to be done at leaf level.  This is because the calculation is, as George Spofford would say, “non-commutative”.  It involves addition/subtraction and multiplication/division, so we will get different numbers at the aggregated level depending on whether or not the calculation is done at leaf level.

 

So, let’s do it!  I created a relational database containing the above 4 tables and populated it with (quite a lot of) data.  I then created a cube on top of it to allow testing of how performant the different design options are.

 

You can download a backup of the cube that uses calculated members (Option 1 and 2) from here.  For a backup of the cube that uses the measure expression (Option 3), download from here.

 

Here is the metadata of the cube.

 

 

Here are the numbers of rows for each table

Customer –        53,186 rows

Revenue –    4,370,206 rows

Job –                979,921 rows

Cost –              334,342 rows

 

To populate the CostMultipler field, we can run the following query.

 

UPDATE Revenue

      SET CostMultiplier =

      CASE

            WHEN dt.Amount = 0 or dt.Amount IS NULL THEN 0

            ELSE r.Amount / dt.Amount

      END

      FROM Revenue r

      INNER JOIN

      (

            SELECT JobID, SUM(Amount) Amount

            FROM Revenue r

            GROUP BY JobID

      ) dt ON r.JobID = dt.JobID

 

Now, when we run the following query, we should get back 1 for every single JobID.

 

SELECT

      JobID,

      SUM(CostMultiplier)

FROM

      Revenue

GROUP BY

      JobID

 

I set the attribute relationship’s Cardinality and RelationshipType properties to improve performance (see this post).  I also of course designed aggregations on the 2 measure group partitions.

 

Various options are evaluated for performance in getting the costs by customer including the proportional split.  It is assumed that we do not have a query cache when testing performance.  The AS query cache can be cleared by running an XMLA command as pointed out by Darren Gosbell here.

 

OPTION 1 – TRADITIONAL STYLE MDX

 

I created a calculated measure called “Cost – Traditional MDX”.  It takes the traditional approach of defining the calculation in the member declaration.  See the MDX below.  Obviously only the calculated measure would be exposed in the perspectives; not the base measure.

 

Note: this will only work if the only exposed hierarchy from the [Customer] dimension is [Category – Customer].

 

CREATE MEMBER CURRENTCUBE.[Measures].[Cost - Traditional MDX]

AS

    IIf

    (

        [Customer].[Category - Customer].CurrentMember.Level IS

        [Customer].[Category - Customer].[(All)],

        // If at the all level of customer, just return the hidden base measure

        [Measures].[Cost],

        // Otherwise split the cost at leaf level proportionally to revenue

        Sum

        (

            Descendants

            (

                [Customer].[Category - Customer].CurrentMember, , LEAVES

            ) *

            Descendants

            (

                [Job].[Reference].CurrentMember, , LEAVES

            ),

            ([Measures].[Cost] * [Measures].[Cost Multiplier])

        )

    ),

FORMAT_STRING = "Currency",

NON_EMPTY_BEHAVIOR = { [Cost] },

VISIBLE = 1;

 

Guess how long the following query takes (on my dev VPC)?  About 1 minute 15 seconds.  There are only about 100 rows returned.  The reason it takes so long is that it has to go down to leaf level at runtime for all the Job/Customer combinations.

 

SELECT

      [Measures].[Cost - Traditional MDX] ON 0,

      {

            [Customer].[Category - Customer].[All],

            [Customer].[Category - Customer].[All].Children

      } ON 1

FROM

      [Measure Expressions]

 

OPTION 2 – MDX SCRIPT

 

We can improve on Option 1 using MDX script assignments.

 

CREATE MEMBER CURRENTCUBE.[Measures].[Cost - MDX Script]

AS

    [Measures].[Cost],

FORMAT_STRING = "Currency",

NON_EMPTY_BEHAVIOR = { [Cost] },

VISIBLE = 1;

SCOPE ([Measures].[Cost - MDX Script]);

   

    // Assign to all cells in the [Customer].[Category - Customer]

    // hierarchy except the [All] member

    ( [Customer].[Category - Customer].Members(1) : Null ) =

    Sum

    (

        Descendants

        (

            [Customer].[Category - Customer].CurrentMember, , LEAVES

        ) *

        Descendants

        (

            [Job].[Reference].CurrentMember, , LEAVES

        ),

        ([Measures].[Cost] * [Measures].[Cost Multiplier])

    );

   

END SCOPE;

 

This brought the query time down from over 1 minute 15 seconds to about 50 seconds.  It does of course return the same data.

 

UPDATE: LEAF-LEVEL MDX SCRIPT ASSIGNMENT 

 

Some readers may be asking why we didn't execute something like the following MDX script assigment.

 

SCOPE (Leaves(Customer), Leaves(Job));

    [Measures].[Cost] = [Measures].[Cost] * ValidMeasure([Measures].[Cost Multiplier]);

END SCOPE;

 

This would be taking a similar approach to what is described by Mosha here.  The Leaves function returns a set of tuples representing the lowest level of granularity across all attributes in the dimension that is passed in as a parameter.  ValidMeasure is used purely as a performance optimisation (see Mosha's post above).  As [Measures].[Cost] is a base measure rather than a calculated measure, it should automatically aggregate up the hierarchy (i.e. to customer category level, which is what our query is interested in) right?  Wrong!  Take a look at Page 388 to 390 of the MDX Solutions book.  It states that the outer dimension in a many-to-many relationship (in this case Customer) will not automatically aggregate up the hierarchy.  This is similar behaviour to MDX script assigments involving calculated measures.  This approach is normally very performant as it doesn't need to use the Descendants function; Analysis Services deals with the aggregation automatically.  Unfortunately, we have to rule it out in this case.

 

OPTION 3 – MEASURE EXPRESSIONS 

 

I created another cube and set up a measure expression for this purpose.  It is simply “[Cost]*[Cost Multiplier]”.

 

Note: when using measure expressions, it is often a good idea to set the DirectSlice property for the dimension relationship (in the Dimension Usage tab).  For the case of MDPS, setting the direct slice to Root([Customer]) (as shown by the following screenshot) means that the All member for Job/Customer attributes shows the actual total of costs – not just those which have a corresponding [Cost Multiplier].  If we do not set DirectSlice, Jobs with no revenue would not contribute to the total cost for all Jobs.

 

The Root(<dimension_expression>) function returns a tuple with the All/Default member of every attribute/hierarchy in the dimension.

 

 

 

To re-iterate, it makes sense to set the DirectSlice as above for the case of MDPS because we want to report the total cost for all jobs (e.g. [Job].[Reference].[All]) irrespective of whether revenue has been received for them.

 

Using measure expressions brought the query time down to less than 20 seconds.

 

SUMMARY

 

Considering that querying Cost by Customer without any MDX or measure expressions (i.e. without the proportional split) takes about 12 seconds (due to sheer volume of data), I’d say measure expressions are indeed pretty performant!

 

 

你可能感兴趣的:(express)