At the recent SharePoint Conference 2014, I had the opportunity to be interviewed by Karuana Gatimu of Microsoft IT and Channel 9. The original video can be found on Channel 9 here, and I include it below as well. In it, we discuss the Microsoft Business Intelligence stack, the impact of the new Power BI products, and how to get started.
It’s finally here.
Quietly, sometime over the past few days, Microsoft updated the Power BI application in Office 365 along with the Data Management Gateway (get it here). Chief among the changes is the ability to schedule data refresh, which, to my thinking, is the single most important feature for deploying a Business Intelligence solution in the cloud.
Until now, it has been possible to refresh Excel worksheets with embedded data models on demand. In fact, if your data source is also in the cloud (and is one of the supported data sources), you don't need Power BI to do it; it's supported natively in Office 365. If your data source is on-premises (and either Oracle or SQL Server), you can do it through the Data Management Gateway. What has been missing is the ability to have the data model refreshed in the absence of interaction. No longer.
This capability can of course be found in PowerPivot for SharePoint on premises, where it is configured on a per-workbook basis in the PowerPivot Gallery, a PowerPivot-focused view of a document library that contains workbooks. It works in much the same manner with Power BI, with the Power BI application taking the place of the Power Pivot gallery.
Configuring Scheduled Refresh
To turn on automatic refresh for a workbook, you need to access the workbook’s BI context menu. To do this, first, open the Power BI application, then locate the workbook that you wish to have refreshed automatically. Click on the ellipsis to access the menu.
A number of items have been added to the menu, and to the preview graphic itself. To the left of the ellipsis is information on when the model was last updated, and the context menu adds the ability to edit in Excel and to add to Q&A as well. However, the feature that we’re interested in is the scheduled data refresh, and selecting that option takes us to the scheduled refresh screen.
If refresh has not already been configured, the screen opens on the settings tab; otherwise, the history tab will be opened.
To turn on refresh, simply select the "on" slider. If your model has multiple data sources, you can choose which of them to include. As far as I have seen, you can only have one schedule per workbook, so if a data source isn't included, it simply won't be updated.
Next, select your refresh schedule, which will be either daily or weekly. By default, your schedule will have a shelf life of 90 days, and will turn off after that time. You can adjust this period by changing the value of the “Ends By” field. You can then select a time (or a day and time if appropriate) for the refresh to occur. Finally, any errors will be sent to the email address that you specify in the notification field.
That’s really all there is to it. Selecting “save settings” will save the schedule, and “save and refresh report” will save the schedule, and attempt to run an immediate refresh.
If the data source is cloud based, it will be queried directly by Power BI, and if it is on premises, it will contact the appropriate Data Management Gateway process and refresh through it. I would love for there to be a little more status information for refreshes in the administration portal, but for now, the refresh will either succeed or fail. However, if the data source is on premises, you can open the Resource Monitor on the gateway machine, and monitor the "diawp.exe" process.
Once the refresh kicks in you’ll notice it using a lot of send bandwidth.
Selecting the "history" tab will of course show the refresh history, and what the refresh schedule for the workbook is. At a glance, you can see whether refreshes succeeded or failed, how long they took, and how they were initiated.
I should note here that I have been working with the Power BI preview for several months now, and in order to get scheduled refresh to work with on premises data, I did need to install the latest Data Management Gateway. I'm not sure if this was because scheduled refresh required it, or just because it had expired (it had), but I would recommend installing it in any case. Update 10/02/14 – I have been informed that scheduled refresh does not require the latest Data Management Gateway, but I would recommend getting it all the same – it's the release version.
One interesting side note. After installing the latest DMG, accessing its configuration shows its version to be 1.0, where previous versions were all point releases (the latest being 0.11). I can’t help but assume that the General Availability of Power BI isn’t far away. UPDATE 10/02/14 – In fact, Power BI went GA today, and this is in fact the GA version of the Data Management Gateway.
There are a number of behaviours and limitations that you should be aware of when using scheduled refresh in Power BI. The below items are by no means exhaustive, but simply things that I have either run into, or been made aware of.
Too much data
As I have outlined previously, the maximum size for an embedded workbook model in Power BI is 250 MB. If a user attempts to enable a larger model, they will receive an error message. However, scheduled refresh now allows for the possibility that the model could start small, and then grow to exceed this limit through refresh. What then happens when the limit is exceeded?
When the model is opened for refresh, its size is checked. If it’s OK, the refresh proceeds, and the model is updated. If the model now exceeds the limits, the next refresh will fail, as will any attempts to work with the file through a browser, until the size of the model is reduced.
Refreshes can take a fair amount of time. During this period, the file is not checked out exclusively to the refresh process, and if it is edited by a user in that time an edit collision could occur. If this situation arises, scheduled refresh will simply discard its updates and fail.
As mentioned above, the two options for schedule frequency are daily and weekly. I was really hoping to see hourly. Monthly and annually would be great too. As it stands, if your data needs to be more current than daily, then Power BI still won’t work for you (without heavy customization). Of course, the reality is that daily is frequent enough for most situations, and this at least puts data refresh on par with its counterpart in PowerPivot for SharePoint.
If someone from the product team is reading this, hourly updates would be my #2 feature ask, for both Power BI and Power Pivot for SharePoint. (for the #1 ask, read on).
Limited Data Sources
At the moment, the refreshable data sources are those that are currently supported by Office 365 in the cloud (Azure SQL, SQL on Azure VMs, and OData feeds with simple or no authentication), and those supported by the Data Management Gateway (SQL Server 2005+, Oracle 10g+). A full list can be found in the official documentation here.
This is a great starting list, but it is limited. There are quite a number of other data sources that would be great to see on this list, multidimensional sources being right up there. However, to my thinking, the most glaring omission on this list is Power Query.
The above data sources are supported if the data was imported into the model through Power Pivot’s import feature (or the native features in Excel 2013). However, if a user takes advantage of the many excellent features available in Power Query, their model will not be automatically refreshable. I have already seen in the preview forums that this difference confuses users, and given that Power Query is a highly touted integral component of Power BI, it needs to become a first class citizen, and soon. That’s my #1 ask – again, both for Power BI and Power Pivot for SharePoint.
However, for the moment, what you need to know is that if your model is built with Power Query, it can’t be refreshed automatically.
Limitations aside, it appears to me that Power BI is an absolutely compelling value proposition, and the inclusion of scheduled refresh completes the picture. I can’t wait for it to be released into the wild. Let the games begin!
Did you notice the update to Power Pivot in Excel 2013? Did you know that one was available? Neither did I until I heard about it 2 weeks ago. It certainly came without fanfare, and I’ve only finally gotten around to getting it on my system recently (more on that later). The only visible change that I can see is the support for synonyms. You have always been able to rename columns in Power Pivot, but now you can specify alternate names, or synonyms. This is to better support the upcoming natural language query feature called Power Q&A.
You’ll know if you have this update by clicking on the Power Pivot tab, clicking the manage button, and finally selecting the Advanced tab from the Power Pivot window. If you see the synonyms button in the ribbon, then you have the update.
Working with synonyms is pretty straightforward. I have a table of airline codes loaded, and the table has a "comments" column. I may want to refer to that column as "notes", or "other information". To do so, I click the synonyms button in the ribbon. Power Pivot flips to diagram view, and opens up the synonyms editor on the right side of the window. For the "Comments" field, I simply enter my alternate terms separated by commas. And that's about it.
Once it is available, Power Q&A will make use of these synonyms when performing natural language queries, but for now, they’re not really used for anything. What I find interesting here though is the way that this update has been delivered.
As I mentioned above, I was unaware of this update until I heard about it through word of mouth. It was not delivered through Windows Update or WSUS, but was instead streamed out to users running subscription-based Office installations from Office 365. In particular, it was sent to users that installed Office using the new Click-To-Run delivery method. I was using an MSI-based Office installation, so I never saw it. In order to get this update, I literally had to uninstall Office, then reinstall it using Click-To-Run. As far as I am aware, this is the only way to receive this update.
I also find it odd that while client side changes to Lync were called out in the “What’s New: November 2013” article on the Office 365 Technology blog, no mention whatsoever was made of this change. It’s almost as though this delivery system is being tested with a low impact feature first. I also find it interesting in that the new “cloud first” deployment approach applies not only to the services themselves, as you might expect, but to the clients that use those services. It makes sense, but may take some getting used to.
I’ve been one of those MSI holdouts. I’m an old dog, and Click-To-Run is a new trick, but I’ll be using it moving forward. There’s really no reason not to.
I've been writing a fair bit lately about specific aspects of Power BI, and how they all fit together. I decided to put together a "real world" demonstration of as many aspects of the product as possible, and this article is the result. I don't dive too deep on any of the specific areas, but hopefully this walkthrough will help give a sense of the power and simplicity of working with this (these?) tool(s).
In this example, we will explore some publicly available data published by the US National Oceanic and Atmospheric Administration (NOAA). To do so, we will first use Power Query to acquire and transform the data. Next, we will utilize Power Pivot to build our model, and then create a Power View report and a Power Map to visualize it. Finally, we will publish the data to a Power BI enabled site where it can be consumed by anyone with a browser or a mobile client.
Data Acquisition and Transformation with Power Query
The NOAA keeps records from about 800 weather stations around the world. The data collected includes daily maximum and minimum temperatures, rainfall, and a host of other measures. All of this data is available through their web/ftp site here. Well, it should be. As I write this, the site is offline, as are most other US government websites. Luckily, I have a local copy of all the files. Feel free to contact me for a copy if they're not back soon. Update – For the interim, I've made the files available here.
The data is provided as a series of text files. Each text file in a folder represents a single weather station, and each row contains all the data for a given measure for a month. The data is not delimited; values are distinguished by their position in the row, with the data for each day in the month presented as a separate measure. Normally, a tool like SSIS would be required to transform the data into something that could be analyzed, but as you'll shortly see, Power Query is up to the task.
Unfortunately, Power Query cannot work with FTP sources, but it’s a rather simple matter to copy the files from the FTP site to a local drive, and there are tools that can keep the content synchronized for refresh purposes. Note, if anyone from Microsoft is reading this, I believe that FTP and WebDAV folders would be a great supplement to the Folder data source, particularly when it comes to automatic refresh. Once we have the files locally, we can begin the import process. First, we open a new Excel workbook, select the Power Query tab, and then in the “From File” dropdown, we select the new feature “From Folder”.
Next, we select the folder that contains all of the data files. Power Query returns the data for the files in the folder.
In our case, the file name does correspond to the ID of the station, but the rest of the file metadata is not that useful to us. However, what is useful is the fact that using Power Query, we can drill into the files. Depending on the requirement, this can be done one of two ways. By clicking on the word “Binary” in the Content column, you will be able to drill into that file, and subsequent operations will be performed on that file. However, if you click on the little drilldown icon at the right of the Content column header, you will drill into all files in the folder. Subsequent operations will be performed on all files in the folder, and refresh operations will pick up any new files.
It should be noted that for this to work, all files in the folder must have the same schema. (In addition to data like this, this feature is particularly useful for parsing and iterating through server log files.) Since we want all station data, this is the operation that we will perform next. What we get back is a column of seemingly randomly structured data.
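For anyone curious about what those clicks produce behind the scenes, here is a minimal M sketch of an equivalent folder-combine step. This isn't necessarily the code that the preview generates for you – the folder path and step names are placeholders of my own – but it yields the same single column of raw text.

```
let
    // Assumption: the NOAA station files have been copied to this local folder
    Source = Folder.Files("C:\NOAA\ghcnd_all"),
    // Read each file as a list of text lines
    AddedLines = Table.AddColumn(Source, "Line", each Lines.FromBinary([Content])),
    // Expand the lists so that every line of every file becomes its own row
    ExpandedLines = Table.ExpandListColumn(AddedLines, "Line"),
    // Keep only the raw text; the file metadata columns are not needed
    RawRows = Table.SelectColumns(ExpandedLines, {"Line"})
in
    RawRows
```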
Luckily for us, there is a key to the data structure (found in the file readme.txt if you’re following along). The basic key pattern is:
------------------------------
Variable    Columns   Type
------------------------------
ID          1-11      Character
YEAR        12-15     Integer
MONTH       16-17     Integer
ELEMENT     18-21     Character
VALUE1      22-26     Integer
MFLAG1      27-27     Character
QFLAG1      28-28     Character
SFLAG1      29-29     Character
VALUE2      30-34     Integer
MFLAG2      35-35     Character
QFLAG2      36-36     Character
SFLAG2      37-37     Character
  .           .          .
  .           .          .
  .           .          .
VALUE31     262-266   Integer
MFLAG31     267-267   Character
QFLAG31     268-268   Character
SFLAG31     269-269   Character
------------------------------
In plain English, this means that the first 11 characters are the ID (the station ID actually, from which the country can be extracted), the next 4 characters are the year, the next 2 the month, the next 4 the “element” (the measure name). The next 4 “variables” are the value (5 characters), and 3 different flags (one character each, which we are not interested in) for the first day of the month. This pattern of 4 variables is then repeated 30 more times, once for each day of the month. A value of –9999 represents a null value.
We have some parsing to do.
Luckily, Power Query makes this (relatively) easy. Right click on the column header, select split column, and then select number of characters.
Next, for our first column, we set the number of characters to 11, and we tell it to split once, as far left as possible. We’ll use this option for our remaining splits.
Click on OK, and we will now have 2 columns. We then right click on the leftmost column, and rename it to ID.
Next, we right click on our rightmost column header and repeat the process (using 4 characters this time), renaming the new column to Year. We then do the same for the month (2 characters) and element (4 characters) columns. Once this is done, we can click the dropdown on the Element column. Since we are only interested in the maximum and minimum daily temperatures, we deselect all, and reselect these two elements. This will limit the amount of data that we will pull into our model.
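If you prefer to see (or write) the transformation as code, the same splits and the element filter can be expressed directly in M. Treat this as a hedged sketch rather than the steps the query editor records: it builds on the RawRows table from the folder sketch above, and the character positions come straight from the readme.txt key (Text.Range offsets are zero-based).

```
let
    // ID = chars 1-11, YEAR = 12-15, MONTH = 16-17, ELEMENT = 18-21
    AddId      = Table.AddColumn(RawRows, "ID", each Text.Start([Line], 11)),
    AddYear    = Table.AddColumn(AddId, "Year", each Number.FromText(Text.Range([Line], 11, 4))),
    AddMonth   = Table.AddColumn(AddYear, "Month", each Number.FromText(Text.Range([Line], 15, 2))),
    AddElement = Table.AddColumn(AddMonth, "Element", each Text.Range([Line], 17, 4)),
    // Keep only the daily maximum and minimum temperature measures
    TempsOnly  = Table.SelectRows(AddElement, each [Element] = "TMAX" or [Element] = "TMIN")
in
    TempsOnly
```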
At this point, it is also worth noting the steps window on the right hand side of the query editor. This shows all of the transformation steps that have occurred to get us to where we are. Any of these steps can be deleted, which is very useful if a mistake is made, although all subsequent steps will be affected by the change. Clicking on a previous step takes you back to that point in the development of the transformation as well.
Once we have filtered our data, we continue on with the parsing. We then split off the next 5 characters, and rename the resulting column to "1". The next three characters (the flags) we don't need, so we first split them off and then delete the resulting column by right clicking and selecting "Remove". We then repeat this, renaming the next value column to 2. Finally, we continue repeating this process until the value for 31 is complete. This is a tremendously tedious process, but it works rather well. I'll make another feature request here for anyone in a position to do something about it. In the Power Query editor, it would be nice to be able to select a number of steps and either copy them forward, or be able to repeat them x times. This would save a great deal of time in these situations.
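Until a feature like that exists, one workaround is to generate the 31 day columns in code rather than through the UI. The sketch below is my own, building on the TempsOnly table from the previous sketch; it relies on the fact that, per the key, the value for day n starts at character 22 + 8 × (n – 1) and is 5 characters wide.

```
let
    Days = List.Numbers(1, 31),
    // Add one column per day of the month, pulling each 5-character value from its fixed offset
    WithDays = List.Accumulate(
        Days,
        TempsOnly,
        (state, day) => Table.AddColumn(
            state,
            Text.From(day),
            each Number.FromText(Text.Range([Line], 21 + 8 * (day - 1), 5))
        )
    ),
    // The raw text column has now served its purpose
    DayValues = Table.RemoveColumns(WithDays, {"Line"})
in
    DayValues
```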
It's also a good idea to give the query an understandable name, so we right click on "Query 1" and rename it to "Temperature Data".
As mentioned above, the figure –9999 represents a null value. Well, we know this, but the model won’t. At this point we need to convert these to proper nulls. The way that this is done is to select all of the numbered column headers, right click on them and then choose “Replace Value”. We replace the value –9999 with the word null.
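In M, that replacement is a single step, and the list of 31 column names can be generated rather than selected by hand. Again, this is a sketch building on the DayValues table above rather than the editor's own recorded step.

```
let
    DayColumns = List.Transform(List.Numbers(1, 31), Text.From),
    // Replace the -9999 sentinel with a true null in every day column
    WithNulls  = Table.ReplaceValue(DayValues, -9999, null, Replacer.ReplaceValue, DayColumns)
in
    WithNulls
```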
Now that we have our nulls, our data is good. However, from an analysis perspective, the data is hardly in the best shape. What we need is to have each day's data represented by a row, not a column. Once again, Power Query provides a method for us to accomplish this through the unpivot function. Simply select all of the numbered columns, right click, and choose "Unpivot Columns".
Instantly, the numbered columns are transformed into rows, with the column titles in a column named "Attribute". Simply rename it to "Day", and we're ready to import our data. To do so, click "Done" and Power Query will begin the extract, transform, and load process. In this case, it takes quite some time to iterate through all of the files (about 15 minutes), so be patient.
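For completeness, the unpivot and rename are equally compact when written in M. This sketch keeps the four key columns fixed and turns everything else into Day/Value rows; as in the editor, rows whose value is null are dropped during the unpivot.

```
let
    // Turn the 31 day columns into rows, keeping the key columns as they are
    Unpivoted       = Table.UnpivotOtherColumns(WithNulls, {"ID", "Year", "Month", "Element"}, "Day", "Value"),
    // The day numbers arrive as text ("1" .. "31"); type them as whole numbers
    TemperatureData = Table.TransformColumnTypes(Unpivoted, {{"Day", Int64.Type}})
in
    TemperatureData
```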
While the data is loading, you will need to make some adjustments to your Query Settings (to the right of your Excel window). Specifically, you'll want to turn off the "Load to worksheet" option. As I've discussed previously, this is important to keep the size of the actual worksheet small. In this case, it's also important because the sheer volume of data exceeds Excel's ability to store it. When complete, your worksheet should appear as follows.
You can see that there’s a fair amount of data here, about 31.5 million rows. Did I mention that we’re working with Excel here? We’ve disallowed loading to the worksheet, so at this point, we need to load into the embedded data model. Do that by clicking on the link at the bottom of the Query Settings screen.
There is also some more supporting data that we want to bring into our model. The following files need to be imported:
- ghcnd-stations.txt – contains metadata for all of the different weather stations
- ghcnd-countries.txt – contains metadata for all participating countries
- ghcnd-states.txt – contains metadata for US states and Canadian provinces
The keys to these files are also in the readme.txt file, and we bring in their data using the same procedure that we used above for the actual weather data, with the exception that we use the “From file” data source. Finally, we give each query a logical name.
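As an example, a query for ghcnd-countries.txt might look something like the sketch below. The file path is a placeholder, and the character positions reflect my reading of the key in readme.txt, so verify them against your copy before relying on them.

```
let
    // Assumption: the supporting file sits in the same local folder as the station data
    Source    = Lines.FromBinary(File.Contents("C:\NOAA\ghcnd-countries.txt")),
    AsTable   = Table.FromList(Source, Splitter.SplitByNothing(), {"Line"}),
    // Characters 1-2 are the country code; the remainder of the line is the country name
    AddCode   = Table.AddColumn(AsTable, "CountryCode", each Text.Start([Line], 2)),
    AddName   = Table.AddColumn(AddCode, "Country", each Text.Trim(Text.Range([Line], 3))),
    Countries = Table.SelectColumns(AddName, {"CountryCode", "Country"})
in
    Countries
```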
The beauty of all this is that anytime we wish, we can refresh the data. Any new or changed data will be picked up and added to the model following our transformation.
Model Editing In Power Pivot
Once this is complete, we are ready to edit our model. We do so by selecting the Power Pivot tab, and clicking "Manage" in the ribbon. First, we work on the actual temperature data tab. The temperature values reported are in tenths of a degree Celsius. This doesn't mean much to most people, so we should at least report the data in degrees C. To do this, click on Add Column, and add the formula [Value]/10. Rename the column to Temperature.
Since we will always want to work with temperature averages, we add a calculated measure named Average Temperature (C) (in the calculation area at the bottom). Its formula should be Average Temperature (C):=AVERAGE([Temperature]). We no longer need the Value column or the Temperature column for analysis, so we need to hide them from client tools. To do so, select both the Value header and the Temperature header, right click and select "Hide from Client Tools". Finally, we set the formatting of our calculated measure to be Decimal Number.
Next, we want to establish a relationship between the data and the country table. We don't have a column for country, but according to the data key, the first two characters of the station ID correspond to the country code. We can therefore add another calculated column with the formula LEFT([ID],2). We then rename the column to Country Code. We only need this column for the relationship, so we can hide it from client tools as well.
To establish the relationship, change to the diagram view in the ribbon. Drag the Country Code field from the Temperature Data table onto the CountryCode field from the Countries table. While we're in this view, we can drag the ID field from Temperature Data onto the ID field of the Stations table. This will allow us to take advantage of the location of each of the stations. Finally, drag the State column from the Stations table onto the State Code column of the States table. At this point, we can go ahead and hide any unnecessary columns from client tools.
Our model is ready. At this point, we can move forward with analysis.
Create a Simple Power View Report
We want to create a simple Power View report to display temperature trends over time. Move back to the Excel workbook and click the Insert tab in the ribbon. Then select the Power View button – this opens up a new Power View design surface. We won't be needing the Filters window, so we can close that.
Right away, if you look at the Power View Fields window, you will notice that every table is duplicated – one with a "filled in" title bar, and one without. This window is displaying both the defined Power Query queries, and the tables contained within the model (Power Pivot). The "filled in" one is the one that we want to be working with. Using the other will cause duplicate tables to be created in the model – something that we obviously don't want. This is due to an issue with the preview for Power Query and should be fixed in the near future.
We'll be working with "Average Temperature" as a measure, so from the start, we'll pick it and "Year" from the Temperature Data table. Year is a number, so Power View will want to total it by default. We don't want this, so in the Fields selector, select the dropdown for Year and select "Do Not Summarize".
Next, resize the table on the design surface so that it fills the width of the report. Then, with the table selected, from the Design tab in the ribbon, select "Other Chart" and then "Line".
What we get is a graph of overall average of all recorded temperatures starting around 1840. It might be more interesting to see the maximum and minimum temperatures, so to do that, just select the “Element” field, and Power View will add it to the Legend, showing both values, maximum temperature in blue, minimum in red. It would be nice if we could switch those colours, but unfortunately, all you can do at the moment is to select a different theme, none of which start with red.
Next, it would be nice to be able to see this data by individual reporting country. We can do this by first displaying a chart showing the overall average temperature per country. We click on the design surface outside of the temperature chart, and select the "Country" field from the Countries table, and the "Average Temperature" field from the Temperature Data table. Again, we resize the table to the width of the report, and then select "Column Chart", "Clustered Column" from the Design tab. We then get a scrollable list of countries with their overall average temperatures.
We could sort this by temperature, but since it will serve as a picker, alphabetical makes more sense. Being from Canada, I'm naturally interested in our weather patterns, so I click on the Canada data bar (a negative value…shiver). Immediately, the report above is filtered by Canadian data.
It seems that this whole climate change thing isn’t working out as well as I’d hoped….
Finally, we want to give the report a good title, and clean up some formatting. After we do this, we can move on to the next step, creating a Power Map report.
Create a Simple Power Map Report
Creating a Power Map is similar to creating a Power View report. From the Insert tab on the ribbon, select Map, and then "Launch Power Map" from the dropdown.
At this point a new window will be launched where you can build your Power Map report. What we want to do is to show the relative temperatures for each country using a colour scale from low to high, and to show the average maximums and minimums for each recording station. To start with, for our first layer, we are going to use Country from the Countries table for our Geography data, so we select it. Every country with data will get a small point on the map.
Clicking next, we then get to select our data for the first layer. The data that we want to map is Average Temperature, so we select it. The resulting column visualization makes the earth look like a pincushion. What we want is to fill in the country boundaries with relative colour values. To do this, we select the Region type for the visualization. This is better, but it fills the regions with shades of blue, from light blue to dark blue for the warmest averages. It's far more typical for red to represent heat, and we can do this by first selecting the settings icon.
We can then select a colour palette for our data, and red is a more logical color. Now you can spin the globe around to check out your next vacation hotspot. However, we want to get more specific than country, we also want to plot details for each reporting weather station. First, we need to add a layer. To do so, click on the Layers icon and select the add layer icon.
We then repeat the same steps that we did for the first layer, only instead of selecting country, we select Latitude and Longitude for the geography fields. After clicking Next, we leave the report as a Column type, and select Average Temperature as the height. Next, we select Element as the category option. We want the report to use clustered columns, so we now select that option to the right of the category title.
Finally, we want TMAX to be red, and TMIN to be blue, so we again click on the settings icon for this layer, and set the colours appropriately.
From this, it’s easy to see why Canada’s numbers are so low – there are a disproportionate number of stations in the far north! We’re now ready to share our report with Power BI.
Sharing Reports With BI Sites
Sharing your report is almost as simple as saving to a BI Sites enabled Office 365 library. We can either upload it directly, or save it to a SkyDrive Pro enabled folder, which is what I do. At this point, the file is 100 MB. On a side note, this Excel file, which contains over 30 million rows of data, is only 100 MB in size. That's impressive. After the file appears in the O365 library, we can attempt to open it. However, if we do, we'll get a message saying that the file is too big to open in the browser.
That would be true in a regular Office 365 library, but it isn't too large for Power BI. The problem is that first, the file needs to be enabled. More information on file size limits and enablement can be found here. To enable the workbook for Power BI, first navigate to the Power BI app, locate your file (it will be showing a small caution symbol), click the ellipsis, and select "Enable".
Enabling the file will perform a few operations:
- The model will be extracted from the file and stored elsewhere, drastically reducing the file size
- A thumbnail is extracted from the file and used in the tile
It should be noted that if the file is uploaded via the Power BI app, the saving and enablement steps will be performed automatically.
Clicking on the file, either here or within its actual library, will now open the file, and we can also interact with the model (in this case by clicking on the data for Bermuda).
At the top, you’ll notice a warning message. It is letting you know that although this workbook can be viewed in a browser, and interacted with, it cannot be edited in a browser. This is because it is beyond the maximum edit size limit of 30 MB. You may also notice that we are using an HTML5 version of our Power View report. By default, Silverlight is used, but if you click on the new icon on the bottom left of the report, HTML5 will be used. It will also be used if you have a system that does not support Silverlight. This is true anywhere in Office 365, not just in Power BI enabled sites.
Finally, you may also notice that our Power Map report is missing from the equation. This is because for the moment, Power Map is not supported in the browser. A workaround for the time being is to export the Power Map report to an MP4 video file, and store it separately. MP4 files can be rendered in a browser, and on mobile devices.
Viewing In the Power BI Mobile Apps
Speaking of mobile devices, the Power BI app for Windows 8 is currently available in the Windows store, with a version for iOS coming shortly. Once installed and run, you'll be shown a couple of sample reports from a demo site. To navigate to your own site, right click anywhere, or swipe up from the bottom, and select "Browse".
On the next screen, we can select from our registered sites. If our site has not yet been registered, we can do so by again right-clicking, or swiping up. Then, we can enter the URL of your site. The URL to be entered here is that of the parent site of the library in which your report is stored. Once we load from this site, it will be automatically registered.
The next few screens allow us to navigate down to the file in question. Once the file appears in the list, selecting it (not opening it) gives us the option to save it as a favourite. If we mark a report as a favourite, it will appear on the home screen whenever the app is started.
Finally, opening it will open the first named object in the file. You can navigate to other named objects by right clicking, or swiping down from the top.
As you will notice, all named objects in the workbook appear by default. We can control what appears through the Excel publishing options, as outlined here.
As you can see, we have just taken a very large amount of publicly available data, transformed it and analyzed it within Excel, and published it out for everyone in the organization no matter what device they are using. I haven’t even gotten into the Q&A piece – primarily because it’s not available yet for custom data sets. I plan on updating this article when it is.
Power BI is powerful indeed.
Note – this article was written during the preview cycle of Power BI, and certain behaviours and screens may change by final release.
Quick, what’s the biggest file that you can upload into SharePoint? As with anything SharePoint, of course it depends. This article explains some of the file size boundaries in SharePoint and Office 365, how they impact Power BI, how to take advantage of the storage changes that Power BI provides, and some of the intricacies of using the xVelocity model with Power BI.
The maximum file upload size is 250 MB by default with SharePoint 2013, and 50 MB with prior versions. This setting can be changed on a per-web-application basis within Central Administration, and once set, is a hard limit. If your file exceeds it, you won't be able to upload it into SharePoint.
If you use Excel Services, you may also note that the default maximum size for an Excel workbook is 10 MB. This too can be changed in Central Administration (it's a property of trusted file locations). If you upload a workbook that exceeds this limit, Excel Services won't be able to render it, and you'll need to use the Excel client. Depending on performance limitations, I often like to adjust this setting to match the maximum file upload size to avoid end user confusion.
SharePoint Online in Office 365 works a little bit differently. Until recently, the upload limit was unchangeable; now, SharePoint Online has removed the configurable limit altogether, so the maximum file upload size is 2 GB, which represents the maximum file size that SharePoint can store. For Excel Services Online there are settings that can be adjusted, but the maximum workbook size isn't one of them. With Excel Services in Office 365, the maximum workbook size is 10 MB. Period.
If you use Power Pivot to do analysis, data is brought into the data model that is embedded within the workbook. That workbook is then stored in SharePoint, and on first use of the model (clicking a slicer, expanding a dimension, or anything requiring the retrieval of data from the model), an instance of the model is created in a backing Analysis Services engine by Excel Services.
Given that all of this is wrapped up into a single workbook file (xlsx), 10 MB would seem to be pretty constraining for data storage, and it is. The data model is very efficient, and data is highly compressed, but 10MB is simply too small for most involved analyses. This isn’t a problem on premises, where additional resources can be allocated and the limit increased, but in Office 365, you’re simply stuck with the 10 MB limit.
Enter BI Sites, the recently announced Office 365 based Business Intelligence app that is part of the Power BI application suite. BI Sites is a SharePoint app that provides additional capability to workbooks with embedded models stored in SharePoint Online libraries. BI Sites allows for data models as large as 250 MB. The BI Sites app doesn't actually store content; it just renders it, and provides additional capability such as automatic data refresh. BI Sites also relies on Excel Services to render the content, so does a Power BI subscription increase that workbook limit to 250 MB? Nope – that limit is still firmly in place. So how does it get around this limit?
As mentioned above, when a model is accessed by a user by forcing a data retrieval from the model, Excel Services creates an instance of the model in the backing Analysis Services instance, if it hasn't already been created. Subsequent calls just utilize the backing version. What Power BI does is preload this model, and then drop it from the workbook (it is reconstituted on download). Given that the model is the largest part of the workbook, this drops the file size considerably, allowing it to work within the limits imposed by Excel Services.
Notice that the limit of 250 MB above is specified for the model, NOT for the workbook. The workbook limit is still 10 MB, and this is quite visible if you do things in the wrong order, at least in the Power BI preview. To demonstrate, we will work with a model that is similar to the one that I created in this article, which is a rudimentary budget analysis with Great Plains data. There are three versions of the analysis file for demonstration purposes.
In the first version, the data was first imported into Excel worksheets using Power Query, and then loaded into the model before the model was edited. It is obviously the largest of the 3, as it contains both the original data, and the more optimized model. In the second file, the data was loaded directly into the model with Power Query. After model edits, the analysis was created using a simple pivot table and pivot chart. It is the smallest of the three, as the data is contained exclusively within the optimized model. In the last version of the file, the data was imported into Excel worksheets only, using Power Query. Take note of the fact that the file is 12 MB, 50% larger than the model-only version, all of which is counted when considering the Excel Services limit.
After uploading these three files to an Office 365 site, only the EnbGPDataModelOnly file can be accessed via Excel Services directly. This makes sense because the other two are larger than the 10 MB limit, and Excel Services can't do anything with them at all, resulting in the error below:
If we now navigate into the Power BI application, we will see a warning symbol on the tiles for our workbooks. This is because they have not yet been optimized for viewing on the web. What does that mean? It means that the model has not yet been extracted from the workbook, and attached to the Analysis Services engine.
To do this, click the ellipsis on the tile, and then select Enable. After confirming, the model will be extracted, and you receive a confirmation. In our case, the only one that will be successfully enabled is our EnbGPDataModelOnly file, and the reason for failure is different for each of the other two files. In the case of EnbGPExcelOnly, the data model was never populated, which results in "This Workbook could not be enabled for web viewing because it does not contain a data model".
This makes perfect sense, but it does mean that all of the features available through the BI Sites feature are unavailable to workbooks that don’t use data models. There is one exception to this however. The Power BI app for Windows 8 and iOS can render workbooks without models, provided that they are within the 10MB limit.
If we try to enable the EnbGPDataModelandExcel file, which does contain a data model, we get the error "This workbook could not be enabled in Power BI for Office 365 because the worksheet contents (the contents of the workbook that are not contained in the data model) exceed the allowed size."
If we look at the file size with only the model, it's about 8.7 MB. The file without the model is 12 MB, so even with the model extracted, the limit is exceeded, and the enablement process detects this.
On a side note, I think that these error messages have some interesting language. They make reference to “Power BI for Office 365”. This implies, to my mind at least, that there may be a version coming that isn’t for Office 365. No on premises version of Power BI has ever been mentioned by Microsoft, but this may hint at it.
When complete, the failed and successful enablements are easy to spot.
Clicking on the middle tile will successfully render the workbook.
Next, let's work with a file that can benefit from these new features. I created a model that's identical to the "model only" version, with the only difference being that I import all of the data from the 3 tables, not just the selected columns. The resultant file (named EnbGPBigModel.xlsx) is 54 MB on disk – well above the 10 MB Excel Services limit, but below the 250 MB Power BI limit. However, once uploaded, clicking on it directly in the library gives the "file size exceeded" error message. What's up with that?
The reason is that Excel Services just sees that it is too big, and gives up. In order to extract the model, we must first enable it in Power BI before we can work with it in Excel Services. To do that, we simply repeat the above process by navigating to the Power BI app, and enabling it.
Once this has been done, the workbook can be accessed by clicking on it within the Power BI app here, in the Power BI mobile app, or by navigating back to the source library and using it directly with Excel Services.
Therefore, when building models, it is vitally important to distinguish between the size of the model, and the size of the rest of the workbook. While the model can grow to 250 MB, the sum of the rest of the content cannot exceed 10 MB. Note to Microsoft – we could really use a good term here, as opposed to “rest of the workbook”. Let’s call it the spreadsheets for our purposes right now.
So, how do we know the size of the model vs the size of the spreadsheets? Well, an Excel file (xlsx) is really just a zip file in disguise. In fact, if we rename the file to end in a .zip extension, we can crack it open with a ZIP viewer (or as a folder in Windows). If we do so with our file that contains both the spreadsheets and the model, open it up, and then drill down to the xlmodel folder we’ll see the file item.data.
This file is your data model. In this case, it is 7.8 MB in size. Subtract that value from the size of the overall xlsx file, and you have the size of your spreadsheets, which is the part that must fit within the 10 MB limit. When done, just rename the extension back to .xlsx.
If we continue to have a problem, or as a matter of good practice, an excellent tool to use is the Workbook Size Optimizer tool from Microsoft – available here. This is (yet another) Excel add-in that will help to further optimize your model and reduce the file size. Just open your workbook, run the add-in, and follow the prompts.
We can see that although the 250 MB model size in Power BI helps to make Office 365 a viable BI platform, it does still require a certain awareness on the part of users. The most important lesson to learn from this is to try to avoid importing data directly into Excel. Whenever possible, bring the data directly into the model, bypassing any Excel storage. This is a good idea in any event, but Power BI further underscores the need for it. When using Power Pivot, it's fairly straightforward, but the data acquisition interface in Power Query tends to default to loading data into the worksheet. When using Power Query, care must be taken to avoid the Excel import, and I'll be posting another article on how to do this shortly.