Tag Archives: Power Query

Changes to the Default Data Loading Options in Power Query

This past week, the March 2014 update for Power Query became available. As always, I wasted no time in getting it installed. There are a number of significant new features in this release (all outlined at the above link), but one that caught my attention is the new “Selection Well” for multi-table import.

The Selection Well not only lets you see all of the items that have been selected for import, but also allows each query to be edited before it is executed. This is a significant enhancement over the previous behaviour, in which the queries were executed first and could only be edited afterward.

However, the thing that jumped out at me most was the change to the default load settings. “Load to worksheet” is no longer the default, but “Load to Data Model” is.


Hallelujah! As I’ve pointed out before, I strongly feel that this should be the default, given the file size limitations for Excel workbooks in Power BI (still 10 MB outside of the model). On the surface, this is a very welcome development. However, as it turns out, it’s not quite that simple.

This new default only takes effect when the “Select multiple items” option is selected in the query navigator. (For the record, it is possible to select this option and then select only one item, and thus take advantage of the new default, but if you need to think about a default, it sort of defeats the purpose.) If only one item is selected, then edited, the default is still “Load to worksheet”, which I maintain is the wrong default, for all of the reasons outlined in my earlier article.


I presume that the thinking here is that casual users will expect to see their imported data immediately in Excel, but that a user working with multiple tables will understand that they need to use a data model, and will want to create relationships. The fear is that casual users will be confused if they don’t immediately see their data in the workbook.

This behaviour is also similar to that found in the import data dialog. If you choose to select multiple tables:


The data will automatically be added to the data model, whether or not Power Pivot has been enabled. The final screen shows that, and unless the “Table” view is selected, the data will be imported only into the model, not the workbook.


I applaud the move within Power Query, but I still feel that load to model should be the default in all cases. The current behaviour is inconsistent. Sometimes it is the default, and sometimes it isn’t, and if you’re not watching, you’ll wind up with the wrong behaviour. I also feel that we should be encouraging Excel’s use as a data client, and discouraging its use as a database. Leaving import to worksheet as a default further encourages the use of Excel as a “spreadmart” tool.

Friends don’t let friends Load to worksheet.

Scheduled Data Refresh in Power BI

It’s finally here.

Quietly, sometime over the past few days, Microsoft updated the Power BI application in Office 365 along with the Data Management Gateway (get it here). Chief among the changes is the ability to schedule data refresh, which, to my thinking, is the single most important feature for deploying a Business Intelligence solution in the cloud.

Until now, it has been possible to refresh Excel worksheets with embedded data models on demand. In fact, if your data source is also in the cloud (and is one of the supported data sources), you don’t need Power BI to do it; it’s supported natively in Office 365. If your data source is on-premises (and either Oracle or SQL Server), you can do it through the Data Management Gateway. What has been missing is the ability to have the data model refreshed without any user interaction. No longer.

This capability can of course be found in PowerPivot for SharePoint on premises, where it is configured on a per-workbook basis in the PowerPivot Gallery, a PowerPivot-focused view of a document library that contains workbooks. It works in much the same manner with Power BI, with the Power BI application taking the place of the PowerPivot Gallery.

Configuring Scheduled Refresh

To turn on automatic refresh for a workbook, you need to access the workbook’s BI context menu. To do this, first, open the Power BI application, then locate the workbook that you wish to have refreshed automatically. Click on the ellipsis to access the menu.


A number of items have been added to the menu, and to the preview graphic itself. To the left of the ellipsis is information on when the model was last updated, and the context menu adds the ability to edit in Excel and to add to Q&A as well. However, the feature that we’re interested in is the scheduled data refresh, and selecting that option takes us to the scheduled refresh screen.


If refresh has not already been configured, the screen opens to the settings tab; otherwise, the history tab will be opened.

To turn on refresh, simply select the “on” slider. If your model has multiple data sources, you can choose whether or not each one is included. As far as I have seen, you can only have one schedule per workbook, so if a data source isn’t included, it simply won’t be updated.

Next, select your refresh schedule, which will be either daily or weekly. By default, your schedule will have a shelf life of 90 days, and will turn off after that time. You can adjust this period by changing the value of the “Ends By” field. You can then select a time (or a day and time if appropriate) for the refresh to occur. Finally, any errors will be sent to the email address that you specify in the notification field.

That’s really all there is to it. Selecting “save settings” will save the schedule, and “save and refresh report” will save the schedule, and attempt to run an immediate refresh.

If the data source is cloud based, it will be queried directly by Power BI, and if it is on premises, the refresh will contact the appropriate Data Management Gateway process and run through it. I would love for there to be a little more status information for refreshes in the administration portal, but for now, the refresh will either succeed or fail. However, if the data source is on premises, you can open the Resource Monitor on the gateway machine, and monitor the “diawp.exe” process.


Once the refresh kicks in, you’ll notice it using a lot of send bandwidth.

Selecting the “history” tab will of course show the refresh history, along with the refresh schedule for the workbook. At a glance, you can see whether refreshes succeeded or failed, how long they took, and how they were initiated.


I should note here that I have been working with the Power BI preview for several months now, and in order to get scheduled refresh to work with on premises data, I did need to install the latest Data Management Gateway. I’m not sure if this was because scheduled refresh required it, or just because it had expired (it had), but I would recommend installing it in any case. Update 10/02/14 – I have been informed that scheduled refresh does not require the latest data management gateway, but I would recommend getting it all the same – it’s the release version.

One interesting side note. After installing the latest DMG, accessing its configuration shows its version to be 1.0, where previous versions were all point releases (the latest being 0.11). I can’t help but assume that the General Availability of Power BI isn’t far away. UPDATE 10/02/14 – In fact, Power BI went GA today, and this is in fact the GA version of the Data Management Gateway.


There are a number of behaviours and limitations that you should be aware of when using scheduled refresh in Power BI. The below items are by no means exhaustive, but simply things that I have either run into, or been made aware of.

Too much data

As I have outlined previously, the maximum size for an embedded workbook model in Power BI is 250 MB. If a user attempts to enable a larger model, they will receive an error message. However, scheduled refresh now allows for the possibility that the model could start small, and then grow to exceed this limit through refresh. What then happens when the limit is exceeded?

When the model is opened for refresh, its size is checked. If it’s OK, the refresh proceeds, and the model is updated.  If the model now exceeds the limits, the next refresh will fail, as will any attempts to work with the file through a browser, until the size of the model is reduced.


Refreshes can take a fair amount of time. During this period, the file is not checked out exclusively to the refresh process, and if it is edited by a user in that time an edit collision could occur. If this situation arises, scheduled refresh will simply discard its updates and fail.


As mentioned above, the two options for schedule frequency are daily and weekly. I was really hoping to see hourly. Monthly and annually would be great too. As it stands, if your data needs to be more current than daily, then Power BI still won’t work for you (without heavy customization). Of course, the reality is that daily is frequent enough for most situations, and this at least puts data refresh on par with its counterpart in PowerPivot for SharePoint.

If someone from the product team is reading this, hourly updates would be my #2 feature ask, for both Power BI and Power Pivot for SharePoint. (for the #1 ask, read on).

Limited Data Sources

At the moment, the refreshable data sources are those that are currently supported by Office 365 in the cloud (Azure SQL, SQL on Azure VMs, and OData feeds with simple or no authentication), and those supported by the Data Management Gateway (SQL Server 2005+, Oracle 10g+). A full list can be found in the official documentation here.

This is a great starting list, but it is limited. There are quite a number of other data sources that would be great to see on this list, multidimensional sources being right up there. However, to my thinking, the most glaring omission on this list is Power Query.

The above data sources are supported if the data was imported into the model through Power Pivot’s import feature (or the native features in Excel 2013). However, if a user takes advantage of the many excellent features available in Power Query, their model will not be automatically refreshable. I have already seen in the preview forums that this difference confuses users, and given that Power Query is a highly touted integral component of Power BI, it needs to become a first class citizen, and soon. That’s my #1 ask – again, both for Power BI and Power Pivot for SharePoint.

However, for the moment, what you need to know is that if your model is built with Power Query, it can’t be refreshed automatically.

Limitations aside, it appears to me that Power BI is an absolutely compelling value proposition, and the inclusion of scheduled refresh completes the picture. I can’t wait for it to be released into the wild. Let the games begin! 

Append Multiple Tables in Power Query

Power Query transformations can be very powerful, but they only work on one data source at a time. Sometimes data providers will only provide their data in discrete chunks, like one category per table, or data may come from different providers with the same schema. Ultimately, we want to bring these different sources together, along with their distinguishing attributes, so that the data can all be analyzed simultaneously. Power Query supports this requirement through its “Append” function.

Consider the following scenario. We want to analyze alcohol consumption data. The World Health Organization provides extensive data on this, but it is reported separately for each type of alcohol.


(source: Global Health Observatory Data Repository)

There is a source for total consumption, but it does not break the consumption down by type. What we need to do is to append the four categories (beer, wine, spirits, other) together. To start with, we need to query each type separately. The data is provided by the WHO as a CSV file that is directly downloadable, so we will use the “From Web” data source (which makes refresh simple and removes a download step). First we open Excel, click the Power Query tab, and then click on the “From Web” external data source. We then enter the URL of our first category (beer) and click OK. The query editor window will then be opened.

We don’t need to do much in the way of transformation; we just turn the first data row into headers (by clicking on the upper-left grid icon). Then, we give the query a name (Beer), and importantly, we deselect the “Load to worksheet” load setting.
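
For reference, the resulting query is roughly equivalent to the short bit of M below. This is only a sketch: the URL is a placeholder for the actual WHO download address, and the step names are illustrative rather than exactly what Power Query generates.

let
    // Placeholder URL; substitute the actual WHO CSV address for the beer data
    Source = Csv.Document(Web.Contents("http://apps.who.int/gho/data/beer.csv")),
    // Promote the first data row to column headers
    PromotedHeaders = Table.PromoteHeaders(Source)
in
    PromotedHeaders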


By default, the “Load to worksheet” option is selected (I’ve griped about this elsewhere), but in this case, we don’t want to load the data into the model OR the worksheet. Why not? We’re going to be using this query as an append source, combining it with other queries into a final, all-encompassing appended query, so there’s no point in incurring the data load or storage overhead of the extra data.

Once complete, we repeat this procedure for the other categories. Each of these queries has the same schema, so no transformations need to be made, but keep in mind that there may be cases where we need to do extra work to make sure that the schemas match. Once all of the category queries have been defined, we are ready to perform the append.

From the Power Query tab, we click on the “Append” button which allows us to select two tables.


This will create a new query with the result of the append operation. But wait a minute: we have four tables to append, and the UI only gives us an option for two. We could append our two other tables together, create another append destination, and then append the two append results together, but that’s very cumbersome, and it certainly doesn’t scale much beyond 4 input sources. The ideal scenario would be to append all four sources in one step. Fortunately, that’s possible with Power Query – it’s just not obvious.

From the query that results from the initial append operation, we can see a formula in the formula editor – Table.Combine({Beer,Wine}).


This formula uses Power Query’s “M” language, and the good news is that not only can it be easily edited, but the Table.Combine function accepts a list of more than two tables. It’s a simple matter to add our other queries to the list to get a single append operation.
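
Assuming the other two category queries were named Spirits and Other (the names here simply mirror the categories; use whatever names you gave your own queries), the edited query would look something like this:

let
    // Append all four category queries into a single table
    // ("Spirits" and "Other" are assumed names for the remaining category queries)
    Combined = Table.Combine({Beer, Wine, Spirits, Other})
in
    Combined

In practice, you can simply edit the Table.Combine formula in the formula bar to list all four queries; the result is the same.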


It should be noted that if a query has a space in its name, it is necessary to refer to it as #"query name" – e.g. #"beer consumption", etc. At this point, we give the resultant query a name, and change the load options to load into the data model. Once loaded, we can import any other supporting data, enhance our model, and start analyzing.

This single append also demonstrates that even when a particular feature isn’t exposed through the user interface, it may be possible to accomplish the goal through some creative M language work. If you’re interested in some more things that can be done with M, I suggest you check out these examples on Chris Webb’s BI blog.

Delete a Shared Power Query in Power BI

One of the very compelling new features in Power BI is the ability to share a query. Essentially what happens is that a user builds up a Power Query query with all of the necessary transformations and filters, and when it is complete, the query itself (not the data) can be published to the data catalogue. To do so, from the query window, hover over your query to see the fly-out dialog, click the ellipsis at the bottom, and select “Share”.


Once shared, if blessed by the data steward, the query is added to the data catalogue, and can be discovered by others simply by searching for it from the Power Query Online Search interface.


You can also see all of the queries that are shared to the organization. To do this, from a Power BI app, first, click on the “Manage Data” link at the top of the browser window.


Once in the Manage Data section, you can select “my queries”, where you will be presented with a list of all shared queries. You can search for queries, or even analyze your query’s usage patterns.


However, one thing that you cannot do here is delete the query. This can’t be done from within the management application; it must be done within the Power Query application itself. To delete a shared query, first open Excel, click on the Power Query tab, and then click on the Shared button in the ribbon. This will open up your list of shared queries in the data catalogue. Next, hover over the query to be deleted, wait for the query fly-out dialog to open, and then click the delete link.


Reopening the data catalogue in the manage data app will show the query to have been deleted. Pretty straightforward, but also fairly well hidden.

Power Pivot Updates to Excel 2013 – November 2013

Did you notice the update to Power Pivot in Excel 2013? Did you know that one was available? Neither did I until I heard about it two weeks ago. It certainly came without fanfare, and I’ve only just gotten around to getting it on my system (more on that later). The only visible change that I can see is the support for synonyms. You have always been able to rename columns in Power Pivot, but now you can specify alternate names, or synonyms. This is to better support the upcoming natural language query feature called Power Q&A.

You’ll know if you have this update by clicking on the Power Pivot tab, clicking the manage button, and finally selecting the Advanced tab from the Power Pivot window. If you see the synonyms button in the ribbon, then you have the update.


Working with synonyms is pretty straightforward. I have a table of airline codes loaded, and the table has a “comments” column. I may want to refer to that column as “notes”, or “other information”. To do so, I click the synonyms button in the ribbon. Power Pivot flips to diagram view, and opens up the synonyms editor on the right side of the window. For the “Comments” field, I simply enter my alternate terms separated by commas. And that’s about it.


Once it is available, Power Q&A will make use of these synonyms when performing natural language queries, but for now, they’re not really used for anything. What I find interesting here, though, is the way that this update has been delivered.

As I mentioned above, I was unaware of this update until I heard about it through word of mouth. It was not added through Windows Update or WSUS, but was instead streamed out to users with subscription-based Office installs from Office 365. In particular, it was sent to users that installed Office using the new Click-To-Run delivery method. I was using an MSI-based Office installation, so I never saw it. In order to get this update, I literally had to uninstall Office, then reinstall it using Click-To-Run. As far as I am aware, this is the only way to receive this update.

I also find it odd that while client-side changes to Lync were called out in the “What’s New: November 2013” article on the Office 365 Technology blog, no mention whatsoever was made of this change. It’s almost as though this delivery system is being tested with a low-impact feature first. I also find it interesting that the new “cloud first” deployment approach applies not only to the services themselves, as you might expect, but also to the clients that use those services. It makes sense, but may take some getting used to.

I’ve been one of those MSI holdouts. I’m an old dog, and Click-To-Run is a new trick, but I’ll be using it moving forward. There’s really no reason not to.