I was working on a demo for my upcoming Pluralsight course, and I noticed something odd. It used to be that an empty PBIX file was 123 KB, but at some point since May 2017, the file size has dropped to 10 (!) KB. So what’s the cause of the difference?
If you rename a .pbix file to .zip, you can crack it open. If we look at two nearly empty files side by side, we can see the difference comes from the data model. In this example, each data model has a single value that I manually entered.
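If you'd rather not rename anything, here is a minimal sketch of the same inspection in Python. It relies only on the standard zipfile module, which reads the archive by content, so the .pbix extension can stay as it is. The function names are my own, and the entry names you see will depend on your file.

```python
import zipfile


def list_pbix_entries(path):
    """Return (name, uncompressed_size, compressed_size) for each entry in a .pbix file."""
    # A .pbix file is a zip archive, so zipfile can open it directly.
    with zipfile.ZipFile(path) as archive:
        return [
            (info.filename, info.file_size, info.compress_size)
            for info in archive.infolist()
        ]


def print_entries(path):
    """Print the entries of a .pbix file, largest first."""
    entries = sorted(list_pbix_entries(path), key=lambda e: e[1], reverse=True)
    for name, size, packed in entries:
        print(f"{name:<40} {size:>12,} bytes  ({packed:,} compressed)")
```

Calling print_entries on an old file and a new file (for example print_entries("old.pbix"), where the file name is a placeholder) should make it easy to see which entry accounts for the size difference.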
It used to be that you could look at the data model and see a version number.
But now, it’s almost entirely unintelligible. The only thing you can read is “This backup was created using xpress 9 compression.”
A little Googling indicates that it’s a Microsoft-specific compression algorithm used in a number of places.
It seems silly to me to compress something that’s already inside of a zip file. But that new compression does seem to have a sizable effect. In this example, I have a 6.67 MB CSV file with 1 million unique values:
When the data is imported into Power BI Desktop, the new compression model is dramatically more efficient: 184 KB versus 2,288 KB.
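To put a number on "dramatically", here is a quick back-of-the-envelope calculation using only the two file sizes quoted above. The variable names are mine.

```python
old_kb = 2288  # size of the file saved in the older format
new_kb = 184   # size of the file saved in the newer format

ratio = old_kb / new_kb          # how many times smaller the new file is
reduction = 1 - new_kb / old_kb  # fraction of the old size that was saved

print(f"{ratio:.1f}x smaller, a {reduction:.0%} reduction")
# prints "12.4x smaller, a 92% reduction"
```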
What I haven’t figured out yet is whether this affects in-memory use or just the size of the file saved to disk. Still, it’s nice to see Microsoft continuing to make improvements.