Category Archives: Power BI

Power BI Desktop files are smaller now

I was working on a demo for my upcoming Pluralsight image

It used to be that you could look at the data model and see a version number.

image

But now, it’s almost entirely unintelligible. The only thing you can read is “This backup was created using xpress 9 compression.”

image

A little image

When imported into power bi Desktop, the new compression model is dramatically more efficient. 184 KB versus 2,288 KB.

image

What I haven’t figured out yet is if this impacts in-memory use or just when it’s saved to disk. Still it’s nice to see Microsoft continuing to make improvements.

Do you need pro licensing to administer Power BI Premium?

A recent viewer of my new Pluralsight course had a question about data gateways and Power BI Premium. Specifically, do you need a pro license to install and administer data gateways? The short answer is probably not!

Installing data gateways

So when you install a data gateway, you need to log in as a user to register it with your tenant. Well it turns out that whoever is used there is set as the default admin for that gateway. I created a user with just a power BI free license, and I was able to install and administer that gateway just fine. I was also able to assign it to other gateways that already existed.

So, for normal usage you don’t have to be licensed with pro to setup and configure data gateways. I was honestly a bit surprised by this, but in retrospect is makes sense. Pro licensing is all about consuming reports.

What about Premium?

So, the original question was about Power BI Premium. Unfortunately, there’s no developer tier for me to test on, but I have a few guesses.

First, I reviewed the white paper and the distinction it makes between pro users and infrequent users is about producing versus consuming reports. It doesn’t really talk much about administration from what I could tell. Same thing for the faq:

Do I need Power BI Pro to use Power BI Premium?
Yes. Power BI Pro is required to publish reports, share dashboards, collaborate with colleagues in workspaces and engage in other related activities.

Next, I did some searching, and found a page about capacity admins, but that doesn’t relate to data gateways specifically.

So based on what I found, I would assume that you don’t need a pro license to manage data gateways for premium. I would assume it would be a similar experience to normal Power BI.

Why DAX is a PITA: part 1

  • So, I think that DAX is a pain in the butt to use and to learn. I talk about that in my intro to DAX presentation, but I think it boils down to the fact that you need a bunch of mental concepts to have a proper mental model, to simulate what DAX will do. This is very deceptive, because it looks like Excel formulas on steroids, but conceptually it’s very different.

Here is the problem with DAX, in a nutshell:

image

This example below is a perfect example of that sharp rise in learning curve, and dealing with foreign concepts like calculated columns, measures, applied filters, and evaluation contexts.

So, one of the things I’m hoping to catalog are example where DAX is a giant pain if you don’t know what you are doing. People make it look really simple and smooth, and that can be frustrating sometimes. Let’s see more failures!

How do I GROUPBY in DAX?

John Hohengarten asked me a question recently on the SQL Community Slack. He said:

I need to sum an amount column, grouped by a column
Measure 1 :=
GROUPBY (
det,
det[nbr],
    “Total AR Amt Paid calc”SUM ( det[amt] )
)
I’m getting a syntax error

So automatically, something seemed off to me. Measures are designed to return a single value, given the filter context that’s applied to them. That means you almost always need some aggregate function at an outer level. But based on the name, you wouldn’t necessarily expect GROUPBY to return a single value. It would return values for each grouping instance.

If we take a look at the definition for GROUPBY(), we see it returns a table, which makes sense. But if you are new to DAX, this is really unintuitive because DAX works primarily in columns and tables. This is a really hard mental shift, coming from SQL or Excel.

 What do you really want?

None of this made any sense to me. Why would you try to put a GROUPBY in a measure? That’s like trying to return an entire table for a KPI on a dashboard. It just doesn’t make sense. So I asked John what he was trying to do.

He sent me an image of some data he was working with. On the far left is the document id and on the far right is the transaction amount.
Pasted image at 2017_07_06 09_28 AM

He wanted to add another column on the right, that summed up all of the amounts for transactions with the same document. In SQL, you’d probably do this using a Window function with a SUM aggregate, like here.

 Calculated columns versus measures

This highlights another piece of DAX that is unintuitive. You have two ways of adding business logic: calculated columns and measures. The both use DAX, both look similar and are added in slightly different spots.

But semantically and technically, they are very different beasts. Calculated columns are ways of extending the table with new columns. They are very similar to persisted, computed columns in SQL. And they don’t care at all about your filters or front-end, because the data is defined at time of creation or time of refresh. Everything in a calculated column is determined long before you are interacting with them.

Measures on the other hand, are very different. They are kind of like custom aggregate functions, like if you could define your own version of SUM. But to carry the analogy, it would be like if you had a version of SUM that could manipulate the filters you applied in your WHERE clause. It gets weird.

My point is, if you don’t grok the difference between calculated columns and measures, you will never be able to work your way around the problem. You will be forced to grope and stumble, like someone crawling in the dark.

Filter context versus row context

So in this case we’ve determined we actually want to extend the table with a column, not create a free-floating measure. Now we run headlong into our next conceptual problem: evaluation contexts.

In DAX there are two types of evaluation contexts: row contexts and filter contexts. I won’t go too deep here, but they define what a formulas can “see” at any given time, and in DAX there are many ways to manipulate these contexts. This is how a lot of the time intelligence stuff works in DAX.

In this case, because we are dealing with a calculated column, we have only a row context, not filter context. Essentially, the formula can only see stuff in the same row. Additionally, if we use an aggregate like SUM, it only cares about the filter context. But the filter context comes from user interaction. Because this data is defined way before that, there is no filter context.

This is another area, where if you don’t understand these concepts you are SOL. Again, for the newbie, DAX is a pain.

 What’s the solution?

So what is the ultimate solution to his problem? There are probably better ways to do it, but here is a simple solution I figured out.

SUM =
CALCULATE (
    SUM ( Source_data[Amount] ),
    ALL ( Source_data ),
Source_data[Document] = EARLIER ( Source_data[Document] )
)

Walking through it, The CALULATE is used to turn our row context, into a filter context. Then it manipulates that filter context so SUM “sees” only a certain set of rows.

The first manipulation is to run ALL against the table, to undo any filters applied to it. In this case, the only filter is our converted row context. (confused yet?)

The next manipulation is to use EARLIER (which is horribly named) to get the value from the earlier row context. In this case we are filtering ALL the rows, to all of them that have the same document. Then, finally we apply the SUM, which “sees” the newly filtered rows.

Here is what we get as a result:

image

 How do we verify that?

A fourth pain with DAX is that it’s very hard to look at intermediate stages of a process, like you can with SQL or Excel formulas, but in this case we have a way. If we convert our SUM to a CONCATENATEX, we can output all the inputs as a comma separated list. This gives us a slightly better idea of what’s going on.

image (1)

 What’s the point?

My point is, that DAX, despite it’s conciseness and richness is hard to start using. Even basic tasks can require complex concepts, and that was a big frustration point for me. You can’t just google GROUPBY and understand what’s going on.

Again, check out my presentation I did for the PASS BI virtual group. I tried to cover all the annoying parts that people new to DAX will run into. That and buy a book! you’ll need it.

Wrangling GotoWebinar Stats with Power Query: Part one

So, this week I gave my first presentation to image

Ugh.

Power Query to the rescue

Normally this would be a giant pain to work with. When it comes to data quality, this is quite the image

Excel is going to make some assumptions about what is part of the table. This is convenient for our needs, but we’ll have to find a work around when we want to scale to multiple excel files.

image

We can’t tell it we have headers, because it’s going to think that first row is a header. We’ll deal with that later. Once we click OK, we are taken to the Power Query / Power Pivot window.

image

I made a mistake

Hmm, so it looks like I made a mistake. I hope my honesty won’t lose me any image

Trying again

Let’s take a different approach. I’m going to open a blank excel workbook and pull the data into there. Okay, so let’s go to manage under the Power Pivot tab.

image

Next, we are going to click “Get External Data From Other Sources”

image

Then I’m going to scroll to the bottom and select Excel File.

image

Once selected, I only have the whole first sheet as an option. If I had table objects or named ranges, that would be different.

image

Hmmm, I still can’t find a way to edit the Power Query. Fiddlesticks!

Normally, in Power BI it would be right here:

image

Trying to do this in Excel is quite the image

Okay, let’s try opening that Excel file. Ah, much better. Now I want to click Edit at the bottom right.

image

Cleaning the Data

So, First thing we need to do is get rid of all of the non-header rows at the top.

image

To do that, I just select Remove Rows –> Remove Top Rows.

image

Then I specify I want to get rid of the top 7 rows.

image

Next, I want to turn the actual header row into a header.

image

Okay, so now it looks like a real table.

image

Comma Delimited BS

Okay, so now we need to parse out the times someone was watching. The problem is that some people were in and out. Their entries are comma delimited. Ugh.

image

Okay, let’s split them up. I’m going to select Split Column –> By Delimiter

image

Unfortunately, splitting by column a) splits into more columns and b) you have to specify how many.

image

Thankfully, we can select those new columns and unpivot them.

image

Perfect. Now we have a row for every time a person as watching.

image

String parsing

Okay, so now we just need to parse out the dates. First, we are going to split on the dash, and then the parenthesis.

image

This is starting to look good.

image

Now we just need to get rid of the timezone and convert it to a datetime. First we need to select Replace Values.

image

image

Lastly, we select the data type we want.

image

What’s next?

Now that are data is cleaned up, we’ll join to sessions table and do some simple data modeling. But that’s for the next blog post.