
Category: Power BI

Append Multiple Tables in Power Query

Power Query transformations can be very powerful, but they only work on one data source at a time. Sometimes data providers will only provide their data in discrete chunks, like one category per table, or data may come from different providers with the same schema. Ultimately, we want to bring these different sources together, distinguished by their attributes, so that they can all be analyzed simultaneously. Power Query supports this requirement through its “Append” function.

Consider the following scenario. We want to analyze alcohol consumption data. The World Health Organization provides extensive data on this, but it is reported separately for each type of alcohol.

image

(source: Global Health Observatory Data Repository)

There is a source for total consumption, but it does not break the consumption down by type. What we need to do is append the four categories (beer, wine, spirits, other) together. To start with, we need to query for each type separately. The data is provided by the WHO as a CSV file, and it’s directly downloadable, so we will use the “From Web” data source (which makes refresh simple and removes a download step). First, we open Excel, click the Power Query tab, and then click on the “From Web” external data source. We then enter the URL of our first category (beer) and click OK. The query editor window will then open.

We don’t need to do much in the way of transformation: just turn the first data row into headers (by clicking on the upper-left grid icon). Then we give the query a name (Beer) and, importantly, deselect the “Load to worksheet” load setting.

image

By default, the “Load to worksheet” option is selected (I’ve griped about this elsewhere), but in this case, we don’t want to load the data into the model OR the worksheet. Why not? We’re going to be using this query as an append source, together with the other category queries, to build a final all-encompassing appended query, so there’s no point in incurring the load or storage overhead of the extra data.
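For reference, here is a minimal sketch of what one of these category queries might look like in M. The URL below is purely illustrative (substitute the actual WHO download address), and the step names are just the kind the editor generates by default.

let
    // Illustrative URL only - use the real WHO CSV address here
    RawCsv = Csv.Document(Web.Contents("https://apps.who.int/example/beer-consumption.csv")),
    // Promote the first data row to column headers (the grid-icon step above)
    PromotedHeaders = Table.PromoteHeaders(RawCsv)
in
    PromotedHeaders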

Once complete, we repeat this procedure for the other categories. Each of these queries has the same schema, so no transformations need to be made, but keep in mind that there may be cases where we need to do extra work to make sure that the schemas match, as sketched below. Once all of the category queries have been defined, we are ready to perform the append.
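Where the schemas don’t quite line up, an extra step or two in the offending query will usually bring them into line before the append. The column names in this sketch are invented for illustration only:

let
    // Start from one of the category queries defined earlier
    Source = Wine,
    // Hypothetical fix: rename a column so it matches the other sources
    Renamed = Table.RenameColumns(Source, {{"Alcohol Type", "Beverage Types"}}),
    // Hypothetical fix: drop a column the other sources don't have
    Trimmed = Table.RemoveColumns(Renamed, {"Notes"})
in
    Trimmed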

From the Power Query tab, we click on the “Append” button which allows us to select two tables.

image

This will create a new query with the result of the append operation. But wait a minute: we have four tables to append, and the UI only gives us an option for two. We could append our two other tables together, create another append destination, and then append the two append results together, but that’s very cumbersome, and it certainly doesn’t scale much beyond four input sources. The ideal scenario would be to append all four sources in one step. Fortunately, that’s possible with Power Query – it’s just not obvious.

From the query that results from the initial append operation, we can see a formula in the formula editor – Table.Combine({Beer,Wine}).

image

This formula uses Power Query’s “M” language, and the good news is that not only can it be easily edited, but the Table.Combine function also accepts more than two arguments. It’s a simple matter to add our other queries to the function to get a single append step.

image

It should be noted that if a query has a space in its name, it must be referred to as #”query name” – e.g. #”beer consumption”. At this point, we give the resulting query a name and change the load options to load it into the data model. Once loaded, we can import any other supporting data, enhance our model, and start analyzing.
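Put together, the edited append query ends up looking something like this. The first three names match the category queries above; the fourth is written with a space purely to show the quoted-name syntax:

let
    // Table.Combine accepts a list of tables, so any number of
    // queries with matching schemas can be appended in one step
    Source = Table.Combine({Beer, Wine, Spirits, #"Other Categories"})
in
    Source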

This single append also demonstrates that even when a particular feature isn’t supported through the user interface, it may be possible to accomplish the goal through some creative M language work. If you’re interested in more things that can be done with M, I suggest you check out these examples on Chris Webb’s BI blog.


Delete a Shared Power Query in Power BI

One of the very compelling new features in Power BI is the ability to share a query. Essentially what happens is that a user builds up a Power Query query with all of the necessary transformations and filters, and when it is complete, the query itself (not the data) can be published to the data catalogue. To do so, from the query window, hover over your query to see the fly-out dialog, click the ellipsis at the bottom, and select “Share”.

image

Once shared, if blessed by the data steward, the query is added to the data catalogue, and can be discovered by others simply by searching for it from the Power Query Online Search interface.

image

You can also see all of the queries that are shared to the organization. To do this, from a Power BI app, first click on the “Manage Data” link at the top of the browser window.

image

Once in the Manage Data section, you can select “my queries”, where you will be presented with a list of all shared queries. You can search for queries, or even analyze your query’s usage patterns.

image

However, one thing that you cannot do here is delete the query. This can’t be done from within the management application; it must be done within Power Query itself. To delete a shared query, first open Excel, click on the Power Query tab, and then click on the Shared button in the ribbon. This will open up your list of shared queries in the data catalogue. Next, hover over the query to be deleted, wait for the query fly-out dialog to open, and then click the delete link.

image

Reopening the data catalogue in the manage data app will show the query to have been deleted. Pretty straightforward, but also fairly well hidden.


Whither Power Pivot for SharePoint? A Comparison With Power BI

Power BI is a hot topic within the Microsoft Business Intelligence community. Since it was announced last July at the Worldwide Partner Conference in Houston, it’s pretty well been the only thing that has gotten discussed with regard to Microsoft BI. There are good reasons for this: the addition of a mobile BI story, its ease of use, and its compelling new features (Power Q&A!) make it the shiny new toy. I’ve certainly been paying a great deal of attention to it, but what about the more traditional products, in particular Power Pivot for SharePoint?

One of the big questions around Power BI is whether or not there will be an on-premises version. Whenever asked, Microsoft responds with “Power BI will initially be available through Office 365”. This answer causes concern for those with requirements that cannot or may not be met by a cloud-based solution. Many worry that Microsoft’s move into devices and services is leaving on-premises installations behind. I’ve been of the opinion that this is evidence of a “cloud first” release strategy, as opposed to a “cloud only” release strategy. Recent statements by Microsoft officials would tend to confirm this, but the question should be asked: does it even make sense to bring Power BI on premises?

A little explanation is in order. The collection of tools that is Power BI is centred around the x-Velocity data model that is part of Excel 2013, and available through Power Pivot in Excel 2010. All of the client based design tools can be used with Excel without the need for a Power BI license. With Power Pivot for SharePoint, it has been possible to interact with these data models through a browser for several years already, and with the 2013 updates to Office 365, it is even possible to interact with these models in the cloud. What has been missing from the Office 365 BI story has been an automatic way to keep on premises data refreshed, and the ability to work with large models.

While these two capabilities have been available on-prem for years through Power Pivot for SharePoint, they are only now becoming available to Office 365 with Power BI. It doesn’t really make sense to bring these capabilities on-prem when they already exist there. However, complicating this picture is the host of new capabilities being introduced by Power BI. In many ways, it’s a “leapfrog” product, filling in gaps in some areas while moving forward in others. A comparison of the two products can be seen below.

image

Let’s walk through these features. Obviously both products work with the embedded x-Velocity data models. Power Pivot for SharePoint from SQL Server 2012 SP1 can render Power View in Excel, as can Power BI. Power View has some interesting variations, however. Through the Power View that is available via the Power Pivot gallery, live Power View reports can be exported to PowerPoint decks. This feature is not available through Excel Power View, or through Power BI. On the flip side, on-prem Power View reports (both types) use Silverlight for rendering, whereas Power BI will allow both Silverlight and HTML5 renderings (confused yet?).

Both Power Pivot for SharePoint and Power BI are powered by an Analysis Services engine. The Power Pivot gallery available on-prem provides the ability to connect to that engine through Excel with an Analysis Services connection. This makes the embedded model created in one workbook available to Excel clients as what appears to be a data cube. This is not available through Power BI, although the OData publishing feature fills that gap somewhat.

The ability to refresh the data in the embedded model is critical and is, to my mind, the most important feature in Power BI. However, at best, this brings it to parity with Power Pivot for SharePoint. For the moment, Power BI supports only SQL Server on-prem data sources for refresh, whereas Power Pivot for SharePoint supports all Power Pivot data sources. As of this writing (December 2013) neither product supports the refresh of Power Query data sources, but this has been promised for Power BI “soon”. No announcement has yet been made as to the refresh of Power Query data sources on-prem.

The default maximum file size for SharePoint 2013 is 200 MB, and the default maximum workbook size for Excel Services is 10 MB. These values can be changed on-prem, making the maximum possible size for a data model equal to the maximum possible file size in SharePoint – 2 GB. This matches the maximum file size in Office 365 as well, but that 10 MB Excel Services limit can’t be changed in Office 365. Power BI supports model sizes up to 250 MB by removing the model portion from the workbook and housing it in an Analysis Services instance, allowing the workbook itself to remain within the 10 MB limit. It’s wonderful to be able to move beyond the 10 MB ceiling that we’ve had, but the new arrangement is not without limits of its own.

Both products have a thumbnail gallery, but the one available through Power BI sites is arguably more sophisticated, and it doesn’t rely on Silverlight for rendering. The rest of the feature set outlined above is all in Power BI’s favour. Power Pivot for SharePoint will optimize workbooks (move the data model into Analysis Services) on first interaction, but Power BI can do that ahead of time, minimizing user inconvenience. The remaining features (OData feeds for on-prem data, Power Q&A, and mobile clients) are only available with Power BI.

The fact that Power BI for SharePoint on premises has not been announced isn’t as disappointing as it may seem. Parts of it, the Data Management Gateway in particular, aren’t even necessary in an on-prem scenario. This is pure speculation on my part, but if I had to bet, I would expect to see the relevant features from Power BI (Power Q&A, OData publishing) put into Power Pivot for SharePoint in a future release of that product. It also wouldn’t surprise me to see it renamed to Power BI for SharePoint. As to when this could happen I have absolutely no idea, but we should keep in mind that this is a product that ships with SQL Server, not with SharePoint, and I haven’t heard of anything like this in the previews of SQL Server 2014.


Changes to the Power BI Data Management Gateway – Nov 2013

Several of the Power BI preview components were updated last week, most notably Power Query and the administration app in BI sites. Without much fanfare, the Data Management Gateway was updated as well. There is a fair bit of documentation on what’s new in Power Query, and I’ve added my own thoughts here. There’s also some documentation outlining the changes to the admin application here. However, although I looked, I haven’t found anything for the DMG.

I therefore decided to poke around a bit myself, and discovered a few things. This should obviously not be taken as a comprehensive list, and if I find anything else, I’ll update it. What follows are the changes that I’ve observed.

Performance Improvements With Azure data sources

In my article on working with the Data Management Gateway, I observed that refreshing a relatively large model (1 million rows, 20 columns) required upwards of 10 minutes. After updating my version of the DMG, the refresh required only about 1.5 minutes. I am told that performance was a focus for the team, and that focus appears to have paid off.

No more Gateway Limits

If you set up an on-premises data source, you can expose it as an OData feed and consume it with Power Query. In early testing, users would bump into a limit with large data sets: as soon as the data stream hit 100 MB, an error would result. This was due to a built-in limit in the gateway. In current testing, it appears that this limit has been completely removed in this version.

Oracle support

Until now, the only data refresh support has been for SQL Server. In this version of the gateway, Oracle support has been added. I haven’t tested it, as I don’t have any Oracle handy, but it’s there.

image

Credential storage and use

When creating a data connection, it is necessary to input credentials. These credentials are used to connect to the original data source when the model is refreshed. Due to security constraints, Microsoft cannot “see” the credentials, and they are stored locally. With this version of the gateway, there is now an option to save the (encrypted) credentials in the cloud. This helps with restoring gateways. 

System Health OData Feed

In the admin center, it’s possible to get quick visibility into the performance of all gateway machines.

image

With the November 2013 update, you can also get this performance information via an OData feed, perform your own analysis, and use tools like Power View to visualize it.
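As a rough sketch, pulling that feed into Power Query looks something like the following. The URL is a placeholder – the real feed address is shown in the admin center:

let
    // Placeholder address - copy the actual system health feed URL from the admin center
    Source = OData.Feed("https://contoso.powerbi.com/odata/systemhealth")
in
    Source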

image

This is what I’ve uncovered so far. I’ll update this post if I learn of any more.


Changes to Data Loading in the November 2013 Power Query Update

Last week, Microsoft released a new version of the Power Query preview for Power BI. It’s a significant update and contains some welcome improvements. I’m not going to list them all out here, as the Power BI team has done so here on their blog. I do however want to focus in on the changes to the basic data load procedure, and how queries are built. The process is easier, and significantly faster, but I still have some concerns, which I will get to.

Most of the changes concern data navigation. One of the major changes is the ability to use T-SQL directly when importing data from SQL Server. This makes the tool significantly more useful when dealing with complex queries, or if you just happen to be handy with T-SQL. The opportunity to use your own query occurs immediately after you select SQL Server as a data source.
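For what it’s worth, a query entered this way resolves to M along these lines – the server, database and SQL statement below are placeholders:

let
    // Placeholder server, database and statement
    Source = Sql.Database("myserver", "AdventureWorks",
        [Query = "SELECT TOP 1000 * FROM Sales.SalesOrderHeader"])
in
    Source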

image

Once you click OK, you are presented with the new Query editor window. Previously, the navigator and the query were contained in this window, but the “meta” query information, such as the query name and load properties, was displayed along with the data in the worksheet. This led to a fair bit of confusion, and this new change is welcome. You can see the new unified dialog below.

image

It’s worth pointing out a few things about this window. First, at the top, we see the new editor ribbon. All of the functions in the ribbon were previously available in context menus, but the ribbon makes them far more discoverable (and therefore easier to use). In the formula bar, you can see that the native T-SQL query has been transformed into the internal Power Query language, and it can be edited further at this point. On the right of the window, in addition to the Applied Steps list, you can edit the name of the query and change the load settings. Previously, the load had to be initiated before the settings could be changed, so this is quite welcome. However, I still have some concerns with how this works, and I’d like to expand on them further.

To start with, I believe that the defaults are backwards. Previously (and with an older version of Power Query) I discussed the different ways that you can bypass Excel and load data directly into the data model. This is important not simply because of the row limit in Excel (1,048,576 rows) but because of the limits imposed on Excel Services by Office 365 (10 MB file size). Power BI allows for data models up to 250 MB, but only if the data is contained in the model – the 10 MB Excel limit still applies. I discuss this at length in this article.  With this default, users will load data into the worksheet without thinking about it, upload the file to Power BI, and run straight into the file size limit. I can see this as being very frustrating, and limiting adoption.

Another option might be to warn the user – “your file size is approaching 10 MB – would you like to load data into the model exclusively?”, or something of that nature.

My other concern is that now you don’t necessarily need to see the Query editor at all when importing data. On the surface, this is a good thing, and if the data load defaults were as I suggest, I would cheer it, but we run directly into the same problem right now. Let’s say that I just want to import a lot of data from a single SQL table. From Excel, I will just select the Power Query tab, and select “From SQL Server Database”. Now if I don’t enter a custom query as above, and click OK, I won’t see the editor window. Instead, I see the data navigator in the worksheet.

image 

One big feature to note here is the checkbox at the top of the navigator, “Select Multiple Items”. With this version of Power Query, you can now select multiple tables at once – something that was previously unavailable. While you can bring up the editor window by selecting the Edit Query button at the bottom, you can also load the data simply by clicking on the Load button. This is simple, but doing so will load all of the data directly into Excel. At best, you wind up with a workbook that is likely too big for Power BI (obviously depending on the amount of data), but at worst you bump into the row limit.
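For comparison, picking a single table through the navigator generates M roughly like the following (server, database, schema and table names are placeholders):

let
    Source = Sql.Database("myserver", "AdventureWorks"),
    // Navigate to one table by schema and name
    SalesOrderHeader = Source{[Schema = "Sales", Item = "SalesOrderHeader"]}[Data]
in
    SalesOrderHeader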

image

Unfortunately, when this happens, it isn’t immediately obvious how to fix the problem – the Edit Query button disappears. You can edit the query by double-clicking on the query name and changing the load options. However, users may just assume that they can’t use Power BI to do what they need, which is dead wrong. Again, even a few warning dialogs here would help significantly.

I like this new Power Query. It’s easier, has more features, and is significantly faster than its predecessors. I would like to see it succeed, and I’d hate for misunderstandings or minor UI issues to get in the way of that.
