Changes to Data Loading in the November 2013 Power Query Update

Last week, Microsoft released a new version of the Power Query preview for Power BI. It’s a significant update and contains some welcome improvements. I’m not going to list them all out here, as the Power BI team has done so here on their blog. I do however want to focus in on the changes to the basic data load procedure, and how queries are built. The process is easier, and significantly faster, but I still have some concerns, which I will get to.

Most of the changes concern data navigation. One of the major changes is the ability to directly use T-SQL when importing data from SQL server. This makes the tools significantly more useful when dealing with complex queries, or if you just happen to be handy with T-SQL. The opportunity to use your query occurs immediately after you select SQL Server as a data source.

image

Once you click OK, you are presented with the new Query editor window. Previously, the navigator and the query were contained in this window, but the “meta” query information, such as the query name and load properties were displayed along with the data in the worksheet. This lead to a fair bit of confusion, and this new change is welcome. You can see this new unified dialog below.

image

It’s worth pointing out a few things about this window. Firstly, at the top, we see the new editor ribbon. All of the functions in the ribbon were previously available in context menus, but the ribbon makes them far more discoverable (and therefore easier to use). In the command line, you can see that the native T-SQL query has been transformed into the internal Power Query language. It can be edited further at this point. On the right of the window you can see that in addition to the Applied Steps list, you can edit the name of the query, and change the load settings. Previously, the load had to be initiated before the settings could be changed, and this is therefore quite welcome. However, I still have some concerns with how this works, and I’d like to expand on it further.

To start with, I believe that the defaults are backwards. Previously (and with an older version of Power Query) I discussed the different ways that you can bypass Excel and load data directly into the data model. This is important not simply because of the row limit in Excel (1,048,576 rows) but because of the limits imposed on Excel Services by Office 365 (10 MB file size). Power BI allows for data models up to 250 MB, but only if the data is contained in the model – the 10 MB Excel limit still applies. I discuss this at length in this article.  With this default, users will load data into the worksheet without thinking about it, upload the file to Power BI, and run straight into the file size limit. I can see this as being very frustrating, and limiting adoption.

Another option might be to warn the user – “your file size is approaching 10 MB – would you like to load data into the model exclusively?”, or something of that nature.

My other concern is that now you don’t necessarily need to see the Query editor at all when importing data. On the surface, this is a good thing, and if the data load defaults were as I suggest, I would cheer it, but we run directly into the same problem right now. Let’s say that I just want to import a lot of data from a single SQL table. From Excel, I will just select the Power Query tab, and select “From SQL Server Database”. Now if I don’t enter a custom query as above, and click OK, I won’t see the editor window. Instead, I see the data navigator in the worksheet.

image 

One big feature to note here is the checkbox on the top of the navigator “Select Multiple Items”. With this version of Power Query, you can now select multiple tables at once – something that was previously unavailable. While you can bring up the editor window by selecting the Edit Query button at the bottom, you can load the data simply by clicking on the Load button. This is simple, but doing so will load all of the data directly into Excel. At best, you wind up with a workbook that is likely too big for Power BI (obviously depending on the amount of data), but at worst you bump into the row limit.

image

Unfortunately, when this happens, it isn’t immediately obvious how to fix the problem – the edit query button disappears. You can edit the query by double clicking on the query name, and changing the load options. However, users may just assume that they can’t use Power BI to do what they need, which is dead wrong. Again, even a few warning dialogs here would help significantly.

I like this new Power Query. It’s easier, has more features, and is significantly faster than its predecessors. I would also like to see it succeed, and I’d hate misunderstandings, or minor UI issues to get in the way of that.