In a recent post, I discussed that while Power BI sites allow for data models up to 250 MB, the size of the worksheets portion of any workbook cannot exceed 10 MB. This is because of the fact that while the data model is passed to Analysis Services for processing, the worksheet itself is still subject to the hard 10 MB limit imposed by Excel Services. It is therefore vital to keep a minimum amount of data in the workbook.
The best way to minimize the workbook size is by storing the data exclusively in the data model. There are a number of data import methods in Excel/Power Pivot, including the new Power Query. Doing this is more obvious in some than others.
Power Pivot Import
Prior to Excel 2013 and/or Power Query, importing into the model exclusively was almost default behaviour (yes it was possible to create models from Excel data, but that wasn’t the most common use case). You would invoke the Power Pivot add-in by clicking on the tab, click the manage button, and you would be taken to the Power Pivot management window where you could import data from a wide variety of sources
This approach still works just fine. Data imported using this method will be brought directly into the model, which is what we want to achieve. However, Excel 2013 now supports the data model by default, and allows data to be imported to it through several approaches. The Power Pivot editor is included, but is not enabled by default, so it is not immediately obvious that it is an option.
With Excel 2013, you can now use native Excel functionality to get data into the model. The most familiar way to do this is through the regular Excel data import interface. You click on the Data tab in Excel, choose “From Other Sources”, and select your data source.
Once done, and the connection is selected, you are presented with the Data Connection Wizard. There is a subtle change to the wizard in Excel 2013 when compared to previous versions, the addition of “Enable selection of multiple tables”.
Selecting this option will of course allow you to import multiple tables simultaneously, but what it will also do is to automatically add those table to the data model for you. However if you don’t select this option, there is still an opportunity to add to the data model. After clicking next and creating a connection, you are presented with the Import Data dialog.
At this point, if you only selected one table, you can select the option to add it to the model here. If you had chosen the multi table option earlier, this option would be greyed out and selected – you have no choice. However, the most important part of this dialog is the “Select how you want to view this data” section. By default, Table is selected. If it is left selected, data will be imported directly into the model, but it will also be imported as a table into the worksheet. This is a BAD THING. Yes, it’s what users are accustomed to, but instead of just storing the data in a nice tiny compressed data model, you’re also storing it in a big fat worksheet, and it won’t take much data to exceed your 10 MB Office 365 cap. In addition to that issue, by doing this, you also limit yourself to about a million rows of data (Excel limit) instead of the hundreds of millions of rows that the model can handle.
Any of the other options will allow you to import directly into the data model, bypassing the worksheet altogether. These options are your friends. Use them.
Power Query Import
The latest, and most impressive method of importing data into Excel is Power Query. A description of Power Query is beyond the scope of this article, but it allows for relational drill down, complex transformations, nested queries and a host of other options in addition to a wide variety of new data sources (Facebook, Hadoop, etc). It’s also the ONLY way to consume data served up by the on-premises Data Management Gateway component of Power BI. It provides a very wide array of data import options to the end user.
However, one of the problems with the tool is that it tends to promote the importing of data into worksheets, which is a practice that in my opinion should be strongly discouraged. It is certainly easily possible to avoid this behaviour, as I’m going to demonstrate shortly, but it does require that the user be aware of the importance of this. Importing to the worksheets should be an option, not a default as it currently is. My concern is that far too many people will build a large fancy report that winds up being over the 10 MB workbook limit and publishing it to Office 365, only to have Power BI fail because it’s too large, and then give up in frustration.
To import data directly into the model with Power Query, first click on the Power Query tab in Excel, and then select your data source. After entering the server/database specifics and your credentials if necessary, you are presented with the Power Query Editor dialog.
After performing any necessary transformations, you are ready to import your data. To do so simply click the Done button. You will then be returned to Excel, which will immediately import your data into a worksheet, and open up the Query Setting window. So – what happens when the data source exceeds 1,048,567 rows of data? Pretty much what you would expect – you receive an error. A pretty explicit one at that.
In this case, the data is not brought into the worksheet, but the query is still defined, so the model can still be successfully populated by clicking the “Load to data model” link. However, if the source data does fall within Excel’s parameters, it will be brought in to the worksheet.
In order to load it into only the data model, we must first deselect the Load to worksheet slider, and then click the Load to data model link. (Note: I have no idea why these are two different control styles). The first option can be selected as the data is loading, so you don’t have to wait for the load to complete. If you select the first and not the second, the data is not loaded anywhere. This is a perfectly valid situation. With Power Query, you can base queries on other queries, or append/merge queries. By doing this, you can load only the end result, and not the intermediate queries.
Once done, the worksheet will display the “Load to worksheet disabled” message in place of the result set.
However, opening the Power Pivot management window will display the imported data, and you can work with the model.
In summary, Power Query brings many new capabilities to the data loading and transformation process. However, with great power comes great responsibility. Unless Microsoft makes a change to the default behaviour of the Power Query import process, I’ll be telling anyone that listens to make sure that they always turn off the “Load to Worksheet” option if they’ll be publishing to Office 365.