The Pros And Cons Of Modelling Measures As A Dimension In Power BI from Chris Webb’s BI Blog
You probably know that dimensional modelling and building a star schema are very important in Power BI. In a star schema you’ll have at least one fact table, and in your fact table you’ll have two main types of column: dimension keys, which link the fact table to dimension tables, and measures, which store the data you want to aggregate and analyse. Here’s a simple example of a fact table called Sales with three dimension key columns (Product, Store and Customer) and three measure columns (Sales, Tax and Volume Sold):
Quite often, though, I see people taking a slightly different approach to modelling their fact tables: instead of having separate columns for each measure they unpivot their data, create one row in their fact table for each measure value, use a single column to store all the measure values and create a new dimension to allow the user to select which measure values they want. Here’s an example of another fact table, called Sales Unpivot, showing how the data from the Sales fact table above can be remodelled using this technique:
In this fact table the dimension keys remain the same, but the Value column stores all the data from the Sales, Tax and Volume Sold measures in the original table and the Measure Name column tells you what type of measure value is stored on any given row. Let’s call this approach the Measures Dimension approach.
There are some advantages to building fact tables using the Measures Dimension approach, for example:
You can now use a slicer in a report to select the measures that appear in a visual
You can now easily add new measures without having to add new columns in your fact table
You can use row-level security to control which measures a user has access to
Generally speaking, though, any time you deviate from a conventional dimensional model you risk running into problems later on and this is no exception. Let’s go through the disadvantages of modelling data using a Measures Dimension.
Notice that the Sales and Tax measure columns from the Sales fact table are currency values and that Volumn Sold contains decimal values with four decimal places. It’s quite easy to set different formats for different measures when each measure is a separate column:
However, when all your values are stored in one column, as in the Measures Dimension example, formatting is not so straightforward. You might be able to get away with using one generic format for all your data:
…but that isn’t ideal. Of course you can create DAX measures and format them appropriately but then you lose some of the flexibility of this approach; you could also use a calculation group and dynamic format strings as Kasper describes here.
More seriously, Power BI does a much more efficient job of storing and compressing the data in a conventional fact table compared to when the Measures Dimension approach is used and this has consequences for query performance. Using the View Metrics button in DAX Studio to see the details of how the data is stored for each table is revealing. Here are some selected highlights:
First of all, notice that the Sales Unpivot table (which uses the Measures dimension approach) is 66% larger than the Sales table. Notice also that in the Sales table the Sales and Tax measure columns, which contain currency values, can use the Currency data type (which shows up Decimal here, confusingly) which in turn means that they can use Value encoding; only the Volume Sold column needs to be stored using the Decimal Number data type (which shows up as Double here), and must use Hash encoding. In the Sales Unpivot table, since all the measure values are stored in the Value column, this column has to use the Decimal Number data type and Hash encoding. As this article explains (the Definitive Guide To DAX goes into a lot more detail) Value encoding can give you a lot of performance benefits.
When you start to build more complex DAX calculations then the disadvantages of the Measures Dimension approach become even more apparent. Let’s say you want a visual in your report that shows Sales, Tax and a measure that subtracts Tax from Sales called Sales After Tax:
Here’s the DAX needed for this visual:
Sales Measure = SUM('Sales'[Sales]) Tax Measure = SUM('Sales'[Tax]) Sales After Tax = [Sales Measure] - [Tax Measure]
To achieve the same result with the Measures Dimension approach, though, you need to know how to use the DAX Calculate() function, something like this:
Sales Measure 2 = CALCULATE ( SUM ( 'Sales Unpivot'[Value] ), KEEPFILTERS ( 'Sales Unpivot'[Measure Name] = "Sales" ) ) Tax Measure 2 = CALCULATE ( SUM ( 'Sales Unpivot'[Value] ), KEEPFILTERS ( 'Sales Unpivot'[Measure Name] = "Tax" ) ) Sales After Tax 2 = [Sales Measure 2] - [Tax Measure 2]
[Note that in most cases I’d create a separate dimension table for the Measures dimension, but to keep things simple here I’ve not done that]
If you expect other people to build measures on your dataset then this additional complexity can be a significant barrier to overcome. Calculate isn’t an easy function to use properly.
Last of all, there’s also also a performance penalty to pay with the Measures dimension. Taking the Sales After Tax example from the previous section, here’s what the Server Timings tab in DAX Studio shows for the query associated with the visual showing Sales, Tax and Sales After Tax:
Notice that there’s just one Storage Engine query: DAX fusion has kicked in so that the Sales and Tax values required can be retrieved in the same scan.
However, here’s what the Server Timings tab shows for the same visual using the Measures Dimension approach and the second set of measures using the Sales Unpivot table shown above:
Not only is this query slower but there are now two Storage Engine queries: one to get the Sales data and one to get the Tax data. Since separate scans are needed to get each measure value, the more measures you have in a visual or the more measures needed by your calculations, the more scans are needed. This can very quickly add up to a big performance problem, especially if each scan is relatively slow – which is more likely to be the case since the Measures Dimension approach means Power BI is less able to compress data effectively.
As you’ve probably guessed by now I’m not a big fan of the Measures Dimension approach. While there are definitely some advantages to using it I think the disadvantages – which aren’t always immediately obvious – outweigh them.