I’ve talked before about the difference between the Power Query Formula language, or M, and the DAX language.
I would describe Power Query as the intern you pay minimum wage or the sous chef, and DAX as the $35 per hour analyst or the head chef. That wasn’t meant to be mean; Power Query was simply all about automating repetitive data manipulations. It handled the less exciting, less complicated work.
Last week, however, I presented on Power Query, and I had to update the slide about where it’s available. I used to say that wherever DAX is, Power Query is not far behind, doing all of the grunt work so that DAX could shine. This time the slide had to change, because Power Query is starting to take center stage.
Now instead of just being available in Excel, Power BI and SSAS, Power Query is available in Microsoft Flow, SSIS and ADF! At the time this post was published, these are all in preview. But it’s really exciting to see Power Query no longer trailing behind DAX, ready to take center stage.
When I heard about custom data connectors for Power Query, I had assumed there would be a lot of work involved. While there is definitely quite a bit of work in implementing advanced features like query folding, creating your very first connector is simple.
So, first you need Visual Studio installed and the Power Query SDK installed as well. Once you do that, you will see Power Query as an option when creating a new project. Visual Studio will also have support for .pq or Power Query files.
Once you create a new data connector project, you are presented with two main Power Query files. The first one is simply a test query you can run on demand to test your connector.
The other file is your data connector. It has a bit of boilerplate to specify the types of credentials it accepts and publishing details such as beta status. Otherwise there is just a little bit of code defining the actual functionality. In this case we are defining the Contents function, which acts as a hello world:
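For reference, here is a rough sketch of what that file looks like for a project named SquareNumbers. It is written from memory of the SDK template rather than copied from it, so treat the label, the button text and the exact message string as assumptions; your generated file may differ in detail.

```powerquery
// Sketch of a minimal connector file; names and strings are illustrative.
section SquareNumbers;

// The attributes tie the function to the data source kind and publish records below.
[DataSource.Kind="SquareNumbers", Publish="SquareNumbers.Publish"]
shared SquareNumbers.Contents = (optional message as text) =>
    let
        _message = if message <> null then message else "(no message)",
        result = "Hello from SquareNumbers: " & _message
    in
        result;

// Credential types the connector accepts (here: none required).
SquareNumbers = [
    Authentication = [
        Implicit = []
    ],
    Label = "Square Numbers"
];

// Publishing details shown in the Get Data dialog, including beta status.
SquareNumbers.Publish = [
    Beta = true,
    Category = "Other",
    ButtonText = { "Square Numbers", "Connect to the Square Numbers sample" }
];
```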
If we run it as is, our test query will be run and we’ll see the results in a testing program.
Adding a function
So now, what if we want to add some more functionality? Say maybe a function to square numbers. First, we’ll add a SquareNumbers.Squared function to the main file:
shared SquareNumbers.Squared = (x as number) as number =>
    x * x;
Then we update the sample query to call our function:
result = SquareNumbers.Squared(7)
And it works as expected:
Exporting the connector
Once you have the connector working the way that you want, run a release build in Visual Studio. This will create a .mez file in the bin/Release folder of your solution. Copy that file to the [Documents]\Power BI Desktop\Custom Connectors folder. You will likely have to create that folder.
Whenever you open Power BI Desktop, it will recognize the connector but won’t let you use it because of security settings.
To get around this, go into the options for Power BI Desktop and then security. Under security, select “Allow any extension to load without validation or warning.” Then restart Power BI Desktop.
Now we can see it is available in our list of connectors.
By default it will call the Contents function:
But we can easily modify the M code to call our squared function as well.
Which will give us the output we expect.
If you are interested in going deeper with Custom Data Connectors, such as adding a navigation view or query folding, check out the TripPin tutorials.
So, I thought I’d put together a learning path for Power BI, a technology that changes literally every month. This is a bit of a challenge because there are so many moving parts when it comes to Power BI. Accordingly, let’s break down those moving parts into different categories.
So, when I think about Power BI, I like to think about the flow of data. First we have the data prep piece with Power Query, where we clean up dirty data. Next we model the data with DAX. I’ve written before about the difference between Power Query and DAX. They are like peanut butter and jelly and complement each other well.
Now, if you are a SQL expert, you may not need to worry about Power Query or DAX much. Maybe you do a lot of the work in SQL. But either way, once your data is modeled, you need to visualize it in some way. You need to learn how to create your reports with Power BI Desktop. Once your report is created, you then need to publish it.
Finally, there is what I would call the IT Ops side of Power BI. You have to install an on-premises data gateway to access local data. You need to license your users. You need to lock down security. All of these things might be outside of what a normal BI developer has to deal with, but are still important pieces. However, unlike the data flow model we talked about, the ops pieces happen at all of the stages of development and deployment.
With that overview in place, let’s get on to the individual sections and the learning paths as a whole.
Getting started with Power BI
When it comes to getting started with Power BI, I have two recommendations. First, get your hands dirty, and second, buy a book. Power BI is in many ways an amalgamation of disparate technologies. It took me a long time to understand it and it didn’t really click until I took the edX course and did actual labs.
The reason I say to buy a book is this is a technology that is hard to learn piecemeal. When you are starting out you are much better off having a curated tour of things.
Stacia Misner Varga (b|t) has a solid course on Pluralsight. It’s worth a watch.
Consider reading Applied Power BI by Teo Lachev (b|t). It’s a real deep dive, which is great, but can be a lot to take in if you are just getting started. A neat feature is that it’s organized by job role.
Learning Power Query and M
When it comes to self-service data preparation, Power Query is THE tool. The way I describe it is as a macro language for manual data manipulations. If you can pay someone minimum wage to do it in Excel, you can automate it in Power Query. Again, check out this post for the differences between Power Query and DAX.
Matt Masson has a phenomenal deep dive video on the Power Query formula language, a.k.a. M, from a year ago. It really helps elucidate the guiding principles of Power Query and M.
Blogs to check out:
Imke Feldmann (b|t) regularly has complex functions and interesting transformations on her blog.
Ken Puls (b|t) focuses on Excel and along with that, Power Query.
Gil Raviv (b|t) often has neat examples of things you can do with Power BI and Power Query.
Chris Webb (b|t) regularly dives into the innards of Power Query and what you can do with it.
Ben Howard (b|t) has a Pluralsight course on Power Query. It’s a bit introductory, but great if you are just getting started.
Gil Raviv recently (October 2018) released a book on Power Query. What I really like about this book is it has more of a progression style instead of a cookbook kind of feel.
Ken Puls and Miguel Escobar (b|t) also have a book on Power Query that has a cookbook feel. I found it helpful in learning Power Query, but it’s heavily aimed at Excel users.
Finally, Chris Webb also has a book on Power Query. He goes into a lot of detail with it. However, the 2014 publish date means it’s starting to get a bit old.
I always say that DAX is good at two things: aggregating and filtering. If you aren’t doing those two things, then DAX is the wrong tool for you. DAX provides a way for you to encapsulate quirky business logic into your data model, so that end users don’t have to worry about edge cases and such.
The most prominent piece of Power BI is the visuals. While it’s incredibly easy to get started, I find this area to be the most difficult. If you are heavily experienced in reporting, this shouldn’t be too difficult to learn.
I’ve written before about how to keep up with technology. In the post, I describe 3 currencies we can spend to extend our learning: time, focus and actual money. As you get older, you start to get less time and even less focus, but your pay rate goes up. So, every year it becomes more and more important to lean on curation to find just the good stuff.
As part of that I’m starting my own curated mailing list for BI links. Power BI changes on a monthly basis and it’s such a pain to keep up with it. This week is the 3rd week so far.
So what’s the catch? Well, I’ll also be including whatever things I’m up to at the bottom of each email. So if you don’t like me, maybe don’t sign up, hah. Here is this week’s weekly BI 5:
David Eldersveld talks a bit about #MakeoverMonday. This sounds like a great community program and I always find making things pretty to be the hardest part.
Wolfgang Strasser is keeping track of all the November updates for Power BI. I keep seeing memes about this from Microsoft employees, so I’m expecting something big to drop at PASS Summit.
Ginger Grant continues her series on SSAS best practices. I love seeing posts about how to do things right instead of just how to do the basics. Great stuff.
This week we’ve got another episode of SQLChefs with Bert Wagner, where we talk about the difference between datasets, reports and dashboards in Power BI.
What are datasets?
A Power BI Dataset is a series of Power Query queries that have been shaped in a DAX model. Each dataset can combine different files, database tables and online services all into one tabular model. In our cookie analogy, these are all different “ingredients”.
Unlike SSRS, a dataset in Power BI does not represent a single table or query of data. A dataset should be considered more like a “flavor” of data used to accomplish a specific type of reporting: financial, operational, HR, etc. So in our analogy, the dataset is the “raw dough”.
So in Power Query, you are going to have a set of queries which each combine a data source with a usually linear set of transformations.
Then, in DAX, you are going to take each of those outputs and combine them into a model. This consists of defining relationships between the outputted tables and adding business logic via calculated columns and measures.
For more on the difference between Power Query and DAX, see our previous episode of SQLChefs.
What are reports?
A Power BI report is a series of visualizations, filters and static elements on a canvas. Power BI reports are saved as a single PBIX file and connect to a single dataset. Remember, a Power BI dataset can have many data sources.
Each report can have multiple sheets, just like an Excel workbook. In our analogy, this is us placing our “cookies” on multiple “cookie sheets” making one big batch, all of the same “flavor”.
One report per dataset
A quick aside to something that used to confuse me. In most cases, a report and a dataset are going to have a one-to-one relationship. A dataset can have one report and a report can have one dataset.
Recently this has changed, however. A while back, they added the ability to use an existing dataset as a data source for a report, and at Ignite they announced the ability to share datasets outside of the app workspace they were made in.
That being said, while you are still learning Power BI, it’s easier to remember that in many cases, your dataset and your report are going to have a one-to-one relationship and be tightly linked.
What are dashboards?
In Power BI, dashboards are a way of pulling together visualizations from various reports. When you think dashboard, you are probably thinking something like Microsoft’s definition: “A Power BI dashboard is a single page, often called a canvas, that uses visualizations to tell a story. Because it is limited to one page, a well-designed dashboard contains only the most-important elements of that story.”
However, if you look at the report example above, it probably fits that definition, and yet it is not a Power BI Dashboard. In Power BI, a dashboard is a tool for pinning visuals from different reports and other sources of data.
In my opinion, a Power BI Dashboard is as much a tool for organization and navigation, as it is for actual reporting. I think that’s the real value add with Power BI dashboards.
Last week, I had the pleasure of recording some video with Bert Wagner about Power BI. In the video, I got to use one of my favorite analogies for M versus DAX: Are you chopping broccoli or planning a menu?
One of the challenges with learning Power BI is that you have to learn not 1, but 2 new data manipulation languages. And it’s not always clear what they are good for, especially if you come from the SQL world.
Is M a general purpose knife, or one of those weird egg slicers?
Head Chefs versus Sous Chefs
I have never worked in the restaurant business, but I’m going to make some gross generalizations anyway.
Sous chefs, as far as I can tell, do a lot of the prep work. They are cutting vegetables, cleaning food, making sauces, etc. While this is all important work, much of it doesn’t inform the final outcome. If you are making beef teriyaki or if you are making broccoli salad, you still need to chop the broccoli.
The head chef however, gets paid for her brains just as much as her hands. The head chef is figuring out the menu and how to combine all of the ingredients. She is involved very heavily with what the final result is going to be. A head chef has to think of the broader goals and strategy of the restaurant, not just how to get the immediate task done.
M is the Sous Chef; DAX is the Head Chef
Again this is all a gross generalization, but in the restaurant called Casa De Meidinger this is actually the case! I do a lot of the grunt work when we cook a meal. My wife says, “zest this lemon” and I mindlessly do it. I could probably be replaced with a robot some day, and that would be fine by me.
Annie, however, actually enjoys planning a meal, deciding what to cook, and thinking about how to make the final product. To me, cooking is just a necessary evil for eating. I don’t necessarily get any joy from the process itself.
Working with M
I like to think of M as this sous chef. It does all the grunt work that we’d like to automate. Let’s say that my boss asks for a utilization report for all of the technicians. What steps am I going to do in M?
Extract the data from the line of business system
Remove extraneous columns
Enrich the services table with a Billable / NonBillable column
Generate a date table
This is all important work, but I would have to do the same work for a variety of reports. Many of the steps tell me nothing about the final product. I would generate a date table for most of my reports, for example.
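As a minimal sketch, those steps might look like the query below. The inline table stands in for the line of business system, and every column name (Technician, WorkType, Hours, InternalNote), the billable rule and the 2018 date range are assumptions made up for illustration.

```powerquery
let
    // Extract: hypothetical stand-in for the line of business system.
    // A real query would start from an actual data source instead.
    Source = #table(
        type table [Technician = text, WorkType = text, Hours = number, InternalNote = text],
        {
            {"Sam", "Customer", 2, "n/a"},
            {"Sam", "Internal", 38, "database migration"}
        }
    ),
    // Remove extraneous columns.
    RemovedColumns = Table.RemoveColumns(Source, {"InternalNote"}),
    // Enrich with a Billable / NonBillable column
    // (assumed rule: only customer work is billable).
    AddedBillable = Table.AddColumn(
        RemovedColumns,
        "Billable",
        each if [WorkType] = "Customer" then "Billable" else "NonBillable",
        type text
    ),
    // Generate a date table: one row per day of an assumed 2018 calendar.
    DateList = List.Dates(#date(2018, 1, 1), 365, #duration(1, 0, 0, 0)),
    DateTable = Table.FromList(DateList, Splitter.SplitByNothing(), {"Date"}),
    // Return both tables in one record so the sketch stays a single query.
    Result = [Services = AddedBillable, Dates = DateTable]
in
    Result
```

In a real model the date table would usually be its own query. Notice that nothing here says what the report will measure, which is the point of the sous chef analogy.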
Working with DAX
Now, if I’m working in DAX, what am I going to do?
Ask what the heck “utilization” really means
This was a real-life example that happened to me. What is utilization as a key metric? Well it turns out it depends what you are trying to report on. A simple definition is usage divided by availability. If a technician billed 20 hours and clocked in 40, his utilization would be 50%. Or so you would think.
How do we handle internal projects? Let’s say we have a technician who billed 2 hours to a customer, but spent 38 hours on an internal database migration. What was his utilization? Well, if we are looking for billable utilization, it’s 5%. If we are looking for total utilization, it is 100%. These are questions that you are going to encapsulate in your DAX formulas.
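Worked through with the numbers above, the arithmetic looks like this. It assumes the second technician also clocked 40 hours that week, which the example implies since 2 + 38 = 40.

```latex
\begin{align*}
\text{simple utilization}   &= \frac{\text{billed}}{\text{clocked}} = \frac{20}{40} = 50\% \\
\text{billable utilization} &= \frac{\text{customer hours}}{\text{clocked}} = \frac{2}{40} = 5\% \\
\text{total utilization}    &= \frac{\text{customer hours} + \text{internal hours}}{\text{clocked}} = \frac{2 + 38}{40} = 100\%
\end{align*}
```

The denominator is the same in every case. Only the choice of what counts as usage changes, and that choice is the business logic.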
The whole idea of a BI semantic layer is to hide away the meaning from the end users. When someone orders a cobb salad, they don’t want to have to articulate the ingredient list. They just want a darn salad.
Are you paid for your hands or your brain?
In the SQL Data Partners podcast, episode 114, there was a question: what’s the difference between a contractor and a consultant? One of the answers was this: a contractor is a set of hands, and a consultant is a set of brains.
I think this answer relates to M versus DAX. M is an automated set of hands, able to do work you’d normally do by hand in Excel. DAX lets you take your domain knowledge and encode it into a data model. It’s an externalized representation for your brain.
And if you think about it, which do you want to be paid for? Do you want to get paid to unpivot data by hand every week? Or do you want to get paid for thinking, for understanding the business and for working at a higher level?
M allows you to automate the first step, so you can do more of the latter with DAX.