Performance tip to speed up slow pivot operations in Power Query and Power BI from Power BI – The BI
Pivot operations in are a very handy feature in Power Query but they can slow down refresh performance. So with some bittersweet pleasure I can tell that I found a trick to speed them up. The sweetness comes from the fact that the performance improvement is very significant. But the bitterness comes from the fact that I could have used this for almost 4 years now, but was too blind to realize at the time when I first worked with the code.
Trick to speed up a slow pivot table
This might not work everywhere, but for my tests, it worked really well: Don’t use an aggregation function when you want fast pivoting:
Don’t aggregate when you want a fast pivot in Power Query
But if your data isn’t aggregated on the row- & column values already, you’ll get this error message:
Error when the values are not adequately aggregated
So to make this work, you have to aggregate the values on the axis’ values before.
Let’s walk through the steps:
Start is this table:
The pivot shall bring the values from the “Column”-column into the column-area and sum the values from column AB like so:
If I pivot without an aggregation like mentioned above I will get the dreaded error-message like above, because there are multiple rows for each Row- and Column-combination:
Values are not aggregated on the row- and column axis
The step to success is a grouping operation beforehand:
Group on all columns that shall define the row- & column values of the pivot
This returns a table with unique row- & column – combinations:
Aggregated table with just as many rows as the number of fields in the desired pivot table
9 rows for a desired 3×3-matrix looks just about right. So if I pivot here, there will be no further aggregation needed and the desired result will be shown.
Who found it?
Code from Bill Szysz for a fast matrix multiplication (posted by DataChant)
This article is almost 4 years old, and I’ve even played around with the code at that time. Sight.. 4 years wasted time where I didn’t realize that the key for the performance improvement lied in a technique that would significantly improve the refresh speed of so many other applications as well…
Why does it work?
Here is my guess:
It looks as if the group operation creates some primary keys that create partitions for every row (or even every cell?) of the pivot table to be. I tested this guess by adding a primary key on those 2 columns (instead of grouping) and the refresh time sped up just like with the group operation. So if your data is already aggregated to the right level you can just add a key (or remove duplicates – as long as you don’t loose any rows), no need to do the group.
This means that the pivot operation doesn’t have to work on the full table, but just on the partitioned parts. (In this article I have described the performance improvements through partitions the first time).
But this also reminds me of the performance improvement for aggregations after joins, that I’ve blogged about here. Let’s see if there will be more use cases to be found.
Enjoy and stay queryious 😉
The post Performance tip to speed up slow pivot operations in Power Query and Power BI appeared first on The BIccountant.