
Power BI – Working With the Data Management Gateway

 

Update August 2014 – Version 1.2 of the DMG offers significant changes, which I’ve written about here.

While it isn’t the flashiest, the Data Management Gateway is arguably the most important part of Power BI. Most of the other features of Power BI have either been available already, or could be achieved through alternate means. What was next to impossible was to have a workbook stored in Office 365 automatically update its data from an on-premises data source in a manner similar to Power Pivot for SharePoint. The Data Management Gateway does this, and a fair bit more. This article will attempt to explore some of its intricacies, tricks, and things to look out for. This article is being written while using the invitation-only preview of Power BI, and some of the specifics could change by the public preview, and certainly by General Availability.

The Data Management Gateway is a Windows service, and can run on any Windows PC. Well, not just any PC, it needs Windows 7 or above, but it’s a relatively light service. In my case for the preview, I’m running it on my main workstation. It’s also relatively small, and you can run multiple gateways within your organization. If you want to have a look at it, you can download it here whether or not you have a Power BI enabled Office 365 tenant, but if you don’t, you won’t be able to do much with it.

Installing the gateway is pretty straightforward, and is well documented here, so I won’t go into details on that. Once it is installed, it establishes a communication channel with your Power BI service in the cloud (no firewall holes required), and essentially waits for requests. Once it gets one, it acts as a broker between your cloud service and your on premises data.

Most configuration is done in the cloud, in the Power BI Admin Center. If you’re interested in monitoring its activity on the gateway machine, you can do so using Resource Monitor or Task Manager. The process that you’re looking for is diawp.exe or diahost.exe. I have no idea where those names come from, but I’m going to guess Data Integration Agent Worker Process and host.

image

Once installed, the gateway performs two major, semi-related functions. The first is the aforementioned on-premises data refresh capability. In addition, the gateway also provides the ability to publish your on-premises data sources as an OData feed that is accessible in the cloud. In its current version, the gateway only supports a limited set of data sources – all of them SQL Server. The official list can be found at the bottom of this document. Although it’s not listed specifically, SQL Azure databases are also supported.

I’m going to drill down a little on these two main functions, and share some of my experiences. We’ll start with the OData feed.

Publishing an OData Feed

An OData feed is published by creating a data source. This is done from within the Admin Center by navigating to the data sources tab, and selecting “new data source”.

image

This starts a data source “wizard”. The first question to be answered is what will this data source be used for.

image

It isn’t necessary to create an OData feed in order to refresh on-premises data, but it is necessary to create a data connection (more on this later). The “enable cloud access” option essentially tells the gateway that it’s OK to allow on-demand and automatic data refreshes from the cloud. The “Enable OData Feed” option is pretty self explanatory – if you don’t enable it, the only thing that the connection can be used for is data refresh. These two can be selected independently of each other. After selecting next, you are presented with the connection info screen.

image

You’ll begin by giving the connection a name. Since the connection will typically be to a specific database, the name should reflect that. You may also wish to add something about the database’s location if you deal with similar data sets. The next selection is that of the gateway. You MUST have a gateway in order to create a data connection. You can have multiple gateways registered in your tenant, and the selection of the gateway will dictate the connection provider choices. It’s an easy thing to forget to choose your gateway, and then think that the page is broken because there are no provider choices.

You can choose one of two methods to create your connection – connection string, or (if you’re lazy like me) you can choose Connection Properties, and let the tool build your string for you. When it comes to troubleshooting data refresh, you may find that the connection string method is more helpful, but either way should be equivalent. Once you have completed the bulk of this form, you need to enter credentials. First, select the method of authentication – your choices are Windows Authentication or SQL Authentication. Then select the credentials button which launches the Data Source Manager.
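If you’re curious what the “build it for you” option produces, a SQL Server connection string is just a semicolon-delimited list of key/value pairs. This small sketch assembles one from its parts (the server and database names here are hypothetical, and this is an illustration of the format, not the wizard’s actual code):

```python
def build_connection_string(server, database, windows_auth=True, user=None, password=None):
    """Assemble a SQL Server connection string of the kind the wizard builds."""
    parts = ["Data Source=" + server, "Initial Catalog=" + database]
    if windows_auth:
        # Windows Authentication
        parts.append("Integrated Security=SSPI")
    else:
        # SQL Authentication
        parts.append("User ID=" + user)
        parts.append("Password=" + password)
    return ";".join(parts)

# "SQLSRV01" and "GP2013" are hypothetical names for illustration.
print(build_connection_string("SQLSRV01", "GP2013"))
# Data Source=SQLSRV01;Initial Catalog=GP2013;Integrated Security=SSPI
```

Knowing this format makes it much easier to eyeball connection strings later when troubleshooting refresh.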

image

The Data Source Manager is a ClickOnce application that communicates directly with the gateway and is used to register the connection’s credentials directly with it – they are not stored in the Office 365 tenant or the BI Sites app. Because of this direct connection, you need to be on a network that is local to the gateway. This will not work from a remote location. If, like me, you are not joined to a domain, you will also need to be on the gateway machine itself.

On launch, it will go through a series of checks (this can take a while), verifying the gateway and the tenant. When it’s ready, it will show “Data Source Manager initialized successfully”, and you can go ahead and enter the credentials. Once you do, be sure to use the “test connection” button to verify that everything is working. When ready, click OK to register the credentials, then click save to save the data source. You will then be taken to the data settings page.

image

From here you can choose to not only index the metadata in the data catalogue, but also the data itself, and you can choose how frequently it is updated. The indexing option is currently disabled in the preview, so I have little to say about it at this point. Its purpose is to improve discoverability of the data via Power Query. The second section is the selection of tables to expose to the OData feed. You can choose from any tables or views, but as you can see from the example above, if your table/view contains any unsupported types (in this case geography fields), the entire object will be unavailable for publishing.

Clicking OK brings you to the users and groups page. From here, you can select those that will be able to use these data sources in Power Query, or manually refresh workbooks in the browser. As with all things SharePoint, it’s a good idea to use a group here.

Once done, your data connection is ready, and your OData feed is available. To use it, you’ll need to discover its address. You can do this by clicking on the ellipsis beside its name in the list.

image

image

As you can observe from the URL, this is a cloud service. You should be able to connect to the service from anywhere, and it will connect through the gateway to serve up the data. While this is great from a mobility standpoint, if you happen to be on premises, this would be quite inefficient, as the data would first need to be transferred to the endpoint in the cloud, and then back to the source network.

The good news is that the gateway is able to detect when you are accessing the feed locally, and it will redirect you to the source without sending the data up to the cloud and back. The bad news for us preview users is that this is the only thing working at the moment. Therefore, for the preview period at least, in order to access the OData feed, you must be on a local network. Specifically, you must be able to resolve the server name defined in the connection string.
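A quick way to check that precondition is to see whether the server name from the connection string resolves from your current machine. A minimal sketch (the function name is mine, and an on-premises name like "SQLSRV01" is hypothetical):

```python
import socket

def can_resolve(server_name):
    """Return True if the host name from the connection string resolves here."""
    try:
        socket.gethostbyname(server_name)
        return True
    except socket.gaierror:
        return False

# "localhost" resolves on any machine; an on-premises server name will
# typically only resolve when you are on the local network.
print(can_resolve("localhost"))
```

If this returns False for your data source’s server, the feed redirect described above cannot work from where you are.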

If you meet these conditions, you can test the feed using Power Query. In Excel, go to the Power Query tab, and select “From OData Feed” in the “From Other Data Sources” dropdown.

image

You will then be prompted to log in. You need to use an Organizational Account (Office 365) to do this. This is an account associated with the Power BI license. However on this screen it’s referred to as a “Microsoft Online Services ID”.

image

I personally feel this could easily be confused with a Microsoft Account (Live ID). What’s worse is that Microsoft Accounts can be used with Office 365, so it’s certainly less than clear as to which credentials should be used here.

Clicking on the Sign In button takes you to a standard authentication dialog. In the preview, there is a small bug that requires you to enter your account entirely in lower case. Failing to do so will cache the wrong credentials, and you’ll be denied access from then on. If you encounter this problem, the solution is to close Excel, clear the browser cache, and restart.

Once you have authenticated successfully, you can save the connection (Power Query will save it in its cache), and work with it as with any other data source.

Thus far, I’ve only been able to use Power Query to connect to the OData feed. I have tried using Power Pivot directly, but although it is supposed to support OAuth, I can’t seem to save my Office 365 credentials. I’ve only tried via these two mechanisms – if anyone has tried others, I would love to know about it.

Refreshing Data in a Power BI Enabled Workbook

To start with, at the time of this writing (Power BI preview) quite a number of features that will be available in GA have not been enabled and/or are not supported. Data refresh scheduling is as yet unavailable, and the data sources that can be refreshed are restricted to direct SQL connections. Models created with Power Query cannot yet be refreshed. As these features become available, I will update this article with any relevant findings.

Given the fact that the Gateway is required to support both an OData feed and for data refresh, you might think that you must use the OData feed in your data models in order for them to be refreshable. This is not the case. When a refresh is requested, the model is interrogated for its data connections. The data catalogue is then interrogated for a data source with a matching connection string, and if found, is used. The Gateway is then called to retrieve the data if it is on premises. If the data source is SQL Azure, the Gateway is still used, but the data is loaded directly from SQL Azure – it does not need to be sent to the Gateway first.
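The matching step described above can be pictured as a comparison of the key/value pairs in the two connection strings. This is not the gateway’s actual algorithm (which isn’t documented here), just an illustration of why the strings need to describe the same source:

```python
def normalize(conn_str):
    """Parse a connection string into case-insensitive key/value pairs."""
    pairs = {}
    for part in conn_str.split(";"):
        if "=" in part:
            key, _, value = part.partition("=")
            pairs[key.strip().lower()] = value.strip()
    return pairs

def strings_match(a, b):
    """True if two connection strings describe the same source,
    ignoring key order, key case, and surrounding whitespace."""
    return normalize(a) == normalize(b)

# Hypothetical server/database names for illustration.
a = "Data Source=SQLSRV01;Initial Catalog=GP2013"
b = "initial catalog=GP2013; data source=SQLSRV01"
print(strings_match(a, b))  # True
```

The practical takeaway is that any substantive difference between the model’s connection string and the registered data source (a different server alias, for instance) will prevent the lookup from succeeding.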

As mentioned above, Power Query queries cannot yet be refreshed by the gateway. The only type of connection that I’ve been able to successfully refresh thus far is one created directly in Power Pivot. When creating a connection in Power Pivot, pay close attention to the connection string. You may need it later if you have refresh issues.

In addition to the data sources supported by the gateway, two other data sources can be refreshed. Project Online has a number of OData feeds that can be refreshed directly, and public OData feeds (not requiring authentication) can also be refreshed. I don’t currently have an instance of Project Online, so I don’t have much to add apart from the fact that it is supposed to work. I have tested refresh with public feeds and they do in fact work well. The interesting thing I noticed was that while it worked in my Power BI tenant, it also worked in my regular Office 365 tenant. Apparently this feature has been there for some time.

As I mentioned, scheduled refresh is not yet available. When it is, this will be done from the BI Sites app, the same way that you “enable” a workbook. For now, it must be done manually. This is done the same way that it is with an on premises workbook. First open the workbook in the browser, and then, from the data tab, select “Refresh All Connections”.

image

The workbook will go dim, and you’ll see “Working on it…” or “Still working on it…” for a bit – it does take some time to refresh, depending on the data set. Using the methods that I mentioned at the beginning of this article, you can monitor the progress of the refresh, and the impact on the gateway machine. Also, for the moment at least, if your data source is SQL Azure, prepare for a long wait – refresh takes an exceptionally long time (in my case, about 10 minutes for a 1 million row x 20 column set of simple data types). The Azure refresh time should be addressed by GA.

Monitoring The Gateway

There are already a number of tools in the preview to help with troubleshooting Power BI, chiefly aimed at the Data Management Gateway. They can be found on the “system health” tab in the Admin Center.

image

The default screen shows the CPU utilization of your gateway machines (in this case, I have all of 1…) at the top, and the availability of your gateways at the bottom. At first glance, you would think that while my gateway is good from an availability standpoint, it’s taking a lot of my CPU. The reality is that the top chart shows total CPU utilization on the machine. If you want to see the utilization of the Gateway itself, you need to choose the specific machine from the Display dropdown.

image

Here you can see that while the machine utilization hovers around 40% (blue), the gateway utilization is barely noticeable (red).

Finally, what is likely the most useful part of monitoring, the logs, is well hidden. The logs are useful for troubleshooting data refresh issues, and can be accessed by clicking on “logs” which is at the top of this screen beside “dashboards”.

image

In the browser, you can see basic event information and basic error information if you have errors. However, if you download the logs as a CSV, you will see much more detailed information. If you are having problems with data refresh, particularly in the preview, I strongly recommend downloading them. One of the important pieces of information that they contain is the connection string that is being used:

image
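If you’re sifting through a large downloaded log, a small script can pull the connection strings out for you. The column names and message format below are hypothetical placeholders – the real CSV’s schema may differ, so adjust accordingly:

```python
import csv
import io

# Hypothetical log excerpt for illustration; the actual columns may differ.
log_csv = """Timestamp,EventType,Message
2013-11-01 10:02:11,Refresh,Connection string: Data Source=SQLSRV01;Initial Catalog=GP2013
2013-11-01 10:02:15,Refresh,Refresh completed
"""

def connection_strings(csv_text):
    """Extract any 'Connection string:' entries from the message column."""
    found = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        msg = row.get("Message", "")
        if "Connection string:" in msg:
            found.append(msg.split("Connection string:", 1)[1].strip())
    return found

print(connection_strings(log_csv))
# ['Data Source=SQLSRV01;Initial Catalog=GP2013']
```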

You can compare that to the connection string that is being used by Power Pivot in the workbook. You can find the string in Power Pivot by first clicking on “Manage” in the Power Pivot tab, and then, in the Power Pivot window, choosing “Existing Connections” in the ribbon. You should see your connection under “PowerPivot Data Connections”. Select it, and click the Edit button.

image

In the next screen, click the “Advanced” button. You should then be presented with the data connection property dialog.

image

Here, at the bottom, you will find the connection string being used by Power Pivot, and this is what the Gateway uses to look for one of its registered data connections. If you find any discrepancies, the chances are that they are at the source of your refresh problems, and that they should be addressed.

This is an early, first glance walkthrough of some experiences using the Data Management Gateway. Hopefully it can be of some help for the early adopters. I will try to keep this updated as Power BI moves from preview to General Availability.


How To Load Data Directly Into the Excel (Power Pivot) Data Model

In a recent post, I discussed that while Power BI sites allow for data models up to 250 MB, the size of the worksheets portion of any workbook cannot exceed 10 MB. This is because of the fact that while the data model is passed to Analysis Services for processing, the worksheet itself is still subject to the hard 10 MB limit imposed by Excel Services. It is therefore vital to keep a minimum amount of data in the workbook.

The best way to minimize the workbook size is by storing the data exclusively in the data model. There are a number of data import methods in Excel/Power Pivot, including the new Power Query. Doing this is more obvious in some than others.

Power Pivot Import

Prior to Excel 2013 and/or Power Query, importing exclusively into the model was almost the default behaviour (yes, it was possible to create models from Excel data, but that wasn’t the most common use case). You would invoke the Power Pivot add-in by clicking on the tab, click the Manage button, and you would be taken to the Power Pivot management window, where you could import data from a wide variety of sources.

image

This approach still works just fine. Data imported using this method will be brought directly into the model, which is what we want to achieve. However, Excel 2013 now supports the data model by default, and allows data to be imported to it through several approaches. The Power Pivot editor is included, but is not enabled by default, so it is not immediately obvious that it is an option.

Excel Import

With Excel 2013, you can now use native Excel functionality to get data into the model. The most familiar way to do this is through the regular Excel data import interface. You click on the Data tab in Excel, choose “From Other Sources”, and select your data source.

image

Once done, and the connection is selected, you are presented with the Data Connection Wizard. There is a subtle change to the wizard in Excel 2013 when compared to previous versions: the addition of “Enable selection of multiple tables”.

image

Selecting this option will of course allow you to import multiple tables simultaneously, but it will also automatically add those tables to the data model for you. However, if you don’t select this option, there is still an opportunity to add to the data model. After clicking next and creating a connection, you are presented with the Import Data dialog.

image

At this point, if you only selected one table, you can select the option to add it to the model here. If you had chosen the multi table option earlier, this option would be greyed out and selected – you have no choice. However, the most important part of this dialog is the “Select how you want to view this data” section. By default, Table is selected. If it is left selected, data will be imported directly into the model, but it will also be imported as a table into the worksheet. This is a BAD THING. Yes, it’s what users are accustomed to, but instead of just storing the data in a nice tiny compressed data model, you’re also storing it in a big fat worksheet, and it won’t take much data to exceed your 10 MB Office 365 cap. In addition to that issue, by doing this, you also limit yourself to about a million rows of data (Excel limit) instead of the hundreds of millions of rows that the model can handle.

Any of the other options will allow you to import directly into the data model, bypassing the worksheet altogether. These options are your friends. Use them.

Power Query Import

The latest, and most impressive method of importing data into Excel is Power Query. A description of Power Query is beyond the scope of this article, but it allows for relational drill down, complex transformations, nested queries and a host of other options in addition to a wide variety of new data sources (Facebook, Hadoop, etc). It’s also the ONLY way to consume data served up by the on-premises Data Management Gateway component of Power BI. It provides a very wide array of data import options to the end user.

However, one of the problems with the tool is that it tends to promote the importing of data into worksheets, a practice that in my opinion should be strongly discouraged. It is certainly possible to avoid this behaviour, as I’m going to demonstrate shortly, but it does require that the user be aware of its importance. Importing to the worksheet should be an option, not a default as it currently is. My concern is that far too many people will build a large, fancy report that winds up over the 10 MB workbook limit, publish it to Office 365, only to have Power BI fail because it’s too large, and then give up in frustration.

To import data directly into the model with Power Query, first click on the Power Query tab in Excel, and then select your data source. After entering the server/database specifics and your credentials if necessary, you are presented with the Power Query Editor dialog.

image

 

After performing any necessary transformations, you are ready to import your data. To do so, simply click the Done button. You will then be returned to Excel, which will immediately import your data into a worksheet, and open up the Query Settings window. So – what happens when the data source exceeds 1,048,576 rows of data? Pretty much what you would expect – you receive an error. A pretty explicit one at that.

 

image

In this case, the data is not brought into the worksheet, but the query is still defined, so the model can still be successfully populated by clicking the “Load to data model” link. However, if the source data does fall within Excel’s parameters, it will be brought in to the worksheet.

image

In order to load it into only the data model, we must first deselect the “Load to worksheet” slider, and then click the “Load to data model” link. (Note: I have no idea why these are two different control styles.) The worksheet option can be changed while the data is loading, so you don’t have to wait for the load to complete. If you disable the worksheet load and never load to the data model, the data is not loaded anywhere. This is a perfectly valid situation. With Power Query, you can base queries on other queries, or append/merge queries. By doing this, you can load only the end result, and not the intermediate queries.

Once done, the worksheet will display the “Load to worksheet disabled” message in place of the result set.

image

However, opening the Power Pivot management window will display the imported data, and you can work with the model.
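Putting the pieces together, the load behaviour described in this section can be summarized in a small decision sketch (the function name and return strings are mine, for illustration only):

```python
EXCEL_MAX_ROWS = 1_048_576  # Excel 2013 worksheet row limit

def power_query_load(row_count, load_to_worksheet, load_to_data_model):
    """Sketch of where a Power Query result ends up, per the options above."""
    destinations = []
    if load_to_worksheet:
        if row_count > EXCEL_MAX_ROWS:
            # The worksheet load errors out, but the query remains defined,
            # so the data model can still be populated.
            destinations.append("worksheet load fails (row limit exceeded)")
        else:
            destinations.append("worksheet")
    if load_to_data_model:
        destinations.append("data model")
    # Neither option selected: the query is defined but loads nowhere --
    # valid as an intermediate step for other queries.
    return destinations or ["query only (no load)"]

print(power_query_load(2_000_000, load_to_worksheet=False, load_to_data_model=True))
# ['data model']
```

For Office 365 publishing, the model-only path is the one you want in almost every case.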

In summary, Power Query brings many new capabilities to the data loading and transformation process. However, with great power comes great responsibility. Unless Microsoft makes a change to the default behaviour of the Power Query import process, I’ll be telling anyone that listens to make sure that they always turn off the “Load to Worksheet” option if they’ll be publishing to Office 365.


Working With Power BI, Office 365 and File Size Limits

Note – this article was written during the preview cycle of Power BI, and certain behaviours and screens may change by final release.

Quick, what’s the biggest file that you can upload into SharePoint? As with anything SharePoint, of course it depends. This article explains some of the file size boundaries in SharePoint and Office 365, how they impact Power BI, how to take advantage of the storage changes that Power BI provides, and some of the intricacies of using the xVelocity model with Power BI.

The maximum file upload size is 250 MB by default with SharePoint 2013, and 50 MB with prior versions. This setting can be changed on an application-by-application basis within Central Administration, and once set, is a hard limit. If your file exceeds it, you won’t be able to upload it into SharePoint.

If you use Excel Services, you may also note that the default maximum size for an Excel workbook is 10 MB. This too can be changed in Central Administration (it’s a property of trusted file locations). If you upload a workbook that exceeds this limit, Excel Services won’t be able to render it, and you’ll need to use the Excel client. Depending on performance limitations, I often like to adjust this setting to match the maximum file upload size to avoid end user confusion.

SharePoint Online in Office 365 works a little bit differently. Until recently, the upload limit couldn’t be changed at all; now SharePoint Online has removed the upload limit altogether, so the maximum file upload size is 2 GB, which represents the maximum file size that SharePoint can store. For Excel Services Online there are settings that can be adjusted, but the maximum workbook size isn’t one of them. With Excel Services in Office 365, the maximum workbook size is 10 MB. Period.

If you use Power Pivot to do analysis, data is brought into the data model that is embedded within the workbook. That workbook is then stored in SharePoint, and on first use of the model (clicking a slicer, expanding a dimension, or anything requiring the retrieval of data from the model), an instance of the model is created in a backing Analysis Services engine by Excel Services.

Given that all of this is wrapped up into a single workbook file (xlsx), 10 MB would seem to be pretty constraining for data storage, and it is. The data model is very efficient, and data is highly compressed, but 10MB is simply too small for most involved analyses. This isn’t a problem on premises, where additional resources can be allocated and the limit increased, but in Office 365, you’re simply stuck with the 10 MB limit.

Enter BI Sites, the recently announced Office 365 based Business Intelligence app that is part of the Power BI application suite. BI Sites is a SharePoint app that provides additional capability to workbooks with embedded models stored in SharePoint Online libraries. BI Sites allows for data models as large as 250 MB. The BI Sites app doesn’t actually store content; it just renders it, and provides additional capability such as automatic data refresh. BI Sites also relies on Excel Services to render the content, so does a Power BI subscription increase that workbook limit to 250 MB? Nope – that limit is still firmly in place. So how does it get around this limit?

As mentioned above, when a model is accessed by a user by forcing a data retrieval from the model, Excel Services creates an instance of the model in the backing Analysis Services instance, if it hasn’t already been created. Subsequent calls just utilize the backing version. What Power BI does is preload this model, and then drop the model from the workbook (it is reconstituted on download). Given that the model is the large part of the workbook, this drops the file size considerably, allowing it to work within the limits imposed by Excel Services.
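The interaction between the two limits can be captured in a simple check. This is my own rough model of the rules described in this article, not a documented Microsoft algorithm:

```python
MB = 1024 * 1024
EXCEL_SERVICES_LIMIT = 10 * MB    # worksheet portion limit in Office 365 (fixed)
POWER_BI_MODEL_LIMIT = 250 * MB   # embedded data model limit after enablement

def renderable(spreadsheet_bytes, model_bytes, power_bi_enabled):
    """Rough check of whether Excel Services in Office 365 could render
    a workbook, per the limits described in this article."""
    if power_bi_enabled:
        # The model is extracted and preloaded; only the rest of the
        # workbook counts against the Excel Services limit.
        return (spreadsheet_bytes <= EXCEL_SERVICES_LIMIT
                and model_bytes <= POWER_BI_MODEL_LIMIT)
    # Without enablement, the whole file must fit under 10 MB.
    return spreadsheet_bytes + model_bytes <= EXCEL_SERVICES_LIMIT

# A workbook with 8 MB of spreadsheets and a 54 MB model:
print(renderable(8 * MB, 54 * MB, power_bi_enabled=True))   # True
print(renderable(8 * MB, 54 * MB, power_bi_enabled=False))  # False
```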

Notice that the limit of 250 MB above is specified for the model, NOT for the workbook. The workbook limit is still 10 MB, and this is quite visible if you do things in the wrong order, at least in the Power BI preview. To demonstrate, we will work with a model that is similar to the one that I created in this article, which is a rudimentary budget analysis with Great Plains data. There are three versions of the analysis file for demonstration purposes.

image 

In the first version, the data was first imported into Excel worksheets using Power Query, and then loaded into the model before the model was edited. It is obviously the largest of the three, as it contains both the original data and the more optimized model. In the second file, the data was loaded directly into the model with Power Query. After model edits, the analysis was created using a simple pivot table and pivot chart. It is the smallest of the three, as the data is contained exclusively within the optimized model. In the last version of the file, the data was imported into Excel worksheets using Power Query. Take note of the fact that the file is 12 MB – 50% larger than the model-only version – all of which is counted when considering the Excel Services limit.

After uploading these three files to an Office 365 site, only the EnbGPDataModelOnly file can be accessed via Excel Services directly. This makes sense, because the other two are larger than the 10 MB limit, and Excel Services can’t do anything with them at all, resulting in the error below:

Sorry, we can't open your workbook in Excel Web App because it exceeds the file size limit. You'll need to open this in Excel.

If we now navigate into the Power BI application, we will see a warning symbol on the tiles for our workbooks. This is because they have not yet been optimized for viewing on the web. What does that mean? It means that the model has not yet been extracted from the workbook and attached to the Analysis Services engine.

image

To do this, click the ellipsis on the tile, and then select Enable. After confirming, the model will be extracted, and you receive a confirmation. In our case, the only one that will be successfully enabled is our EnbGPDataModelOnly file, but the reason is different for each of the other two files. In the case of EnbGPExcelOnly, the data model was never populated, which results in “This Workbook could not be enabled for web viewing because it does not contain a data model”.

image

This makes perfect sense, but it does mean that all of the features available through the BI Sites feature are unavailable to workbooks that don’t use data models. There is one exception to this however. The Power BI app for Windows 8 and iOS can render workbooks without models, provided that they are within the 10MB limit.

If we try to enable the EnbGPDataModelandExcel file, which does contain a data model, we get the error “This workbook could not be enabled in Power BI for Office 365 because the worksheet contents (the contents of the workbook that are not contained in the data model) exceed the allowed size”.

image

If we look at the file size with only the model, it’s about 8.7 MB. The file without the model is 12 MB, so even with the model extracted, the limit is exceeded, and the enablement process detects this.

On a side note, I think that these error messages have some interesting language. They make reference to “Power BI for Office 365”. This implies, to my mind at least, that there may be a version coming that isn’t for Office 365. No on premises version of Power BI has ever been mentioned by Microsoft, but this may hint at it.

When complete, the failed and successful enablements are easy to spot.

image

Clicking on the middle tile will successfully render the workbook.

Next, let’s work with a file that can benefit from these new features. I created a model that’s identical to the “model only” version, with the only difference being that I import all of the data from the three tables, not just the selected columns. The resultant file (named EnbGPBigModel.xlsx) is 54 MB on disk – well above the 10 MB Excel Services limit, but below the 250 MB Power BI limit. However, once uploaded, clicking on it directly in the library gives the “file size exceeded” error message. What’s up with that?

The reason is that Excel Services just sees that it is too big, and gives up. In order to extract the model, we must first enable it in Power BI before we can work with it in Excel Services. To do that, we simply repeat the above process by navigating to the Power BI app, and enabling it.

image

Once this has been done, the workbook can be accessed by clicking on it within the Power BI app here, in the Power BI mobile app, or by navigating back to the source library and using it directly with Excel Services.

image

image

Therefore, when building models, it is vitally important to distinguish between the size of the model, and the size of the rest of the workbook. While the model can grow to 250 MB, the sum of the rest of the content cannot exceed 10 MB. Note to Microsoft – we could really use a good term here, as opposed to “rest of the workbook”. Let’s call it the spreadsheets for our purposes right now.

So, how do we know the size of the model vs the size of the spreadsheets? Well, an Excel file (xlsx) is really just a zip file in disguise. In fact, if we rename the file to end in a .zip extension, we can crack it open with a ZIP viewer (or as a folder in Windows). If we do so with our file that contains both the spreadsheets and the model, open it up, and then drill down to the xl\model folder, we’ll see the file item.data.

image

This file is your data model. In this case, it is 7.8 MB in size. Subtract that value from the size of the overall xlsx file, and you have the size of your spreadsheets, which is the part that must fit within the 10 MB limit. When done, just rename the extension back to .xlsx.
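Since an xlsx file is just a zip package, the same inspection can be scripted rather than done by hand. This sketch (the function name is mine) reports the model size and the remainder that counts against the 10 MB limit, without renaming anything:

```python
import os
import zipfile

def model_and_spreadsheet_sizes(xlsx_path):
    """Split an .xlsx file's size into the embedded data model
    (the model/item.data part) and everything else -- the portion
    that must stay within the 10 MB Excel Services limit."""
    total = os.path.getsize(xlsx_path)
    model = 0
    with zipfile.ZipFile(xlsx_path) as zf:
        for info in zf.infolist():
            if info.filename.endswith("model/item.data"):
                model += info.compress_size  # size as stored in the package
    return model, total - model
```

For example, `model_and_spreadsheet_sizes("EnbGPDataModelandExcel.xlsx")` would return the model bytes and the spreadsheet bytes as a pair, and the second number is the one to compare against 10 MB.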

If we continue to have a problem, or as a matter of good practice, an excellent tool to use is the Workbook Size Optimizer tool from Microsoft – available here. This is (yet another) Excel add-in that will help to further optimize your model and reduce the file size. Just open your workbook, run the add-in, and follow the prompts.

We can see that although the 250 MB model size in Power BI helps to make Office 365 a viable BI platform, it does still require a certain awareness on the part of users. The most important lesson to learn from this is to try to avoid importing data directly into Excel. Whenever possible, bring the data directly into the model, bypassing any Excel storage. This is a good idea in any event, but Power BI further underscores the need for it. When using Power Pivot, it’s fairly straightforward, but the data acquisition interface available in Power Query tends to prefer data import. When using Power Query, care must be taken to avoid Excel import, and I’ll be posting another article on how to do this shortly.


Power BI – What Is It, And Why Does It Matter?

Power BI for Office 365 was first shown to the public last week at the Microsoft Worldwide Partner Conference. Power BI is, in my opinion, the most significant advance that Microsoft has made in the area of Business Intelligence in the cloud, and it significantly advances their offering in personal Business Intelligence. I’ve been evangelizing Microsoft’s recent personal and team BI features for some time now, and this announcement builds on that momentum. The demonstration raised a lot of eyebrows, and showed off some of the great strides that Microsoft has made in this space. One little gem in there that was easy to miss was the fact that we got our first glimpse at the HTML 5 version of Power View – at least the renderer. It also raised a number of questions as to how it works and what will be necessary to get it working optimally. I had the opportunity to speak with a project manager and managed to answer a few of these questions. This post is an attempt to explain what this product is all about.

Overview

Firstly, if you haven’t already seen it, you need to see the demonstration that was done by Amir Netz during the Day 1 keynote. It’s really one of the most compelling demos that I’ve ever seen. I include it below.

Amir Netz Announces Power BI at WPC 2013

 

Microsoft has provided a web page to describe the product and let you sign up for a preview. It also contains links to download some of the constituent parts, but it doesn’t necessarily do a great job of explaining exactly what you’re signing up for, or what these downloads are for.

To start with, Power BI requires Excel 2013. This is because all of the constituent components rely on the xVelocity (PowerPivot) engine that is included with Excel 2013. Make no mistake, as far as Microsoft is concerned, the future of multi-dimensional analysis is with the xVelocity in-memory engine. This engine isn’t new – it’s what powers Power Pivot (note the space between the words Power and Pivot… that’s also new).

The “Power” branding is encouraging as well, I think. The marketing of Microsoft’s business intelligence offerings has been confusing, to say the least. Check out this handy “quick reference guide” if you don’t quite understand what I mean. There’s a very large assortment of tools that come from two different product groups (even after the reorg), and it can be difficult and confusing to navigate. Marketing hasn’t made this any easier in the past, but the consistent “Power” prefix goes some way to remedying this.

Power BI itself is the culmination of several different projects – Data Explorer, GeoFlow, BI Sites, and work in the mobility space. The first two of these have been in the public for a little while now, and were given their release names this week – Power Query and Power Map, respectively. In addition, Power Query reached its General Availability milestone, making it acceptable for production use. The BI Sites project has been a very well-kept secret, and last week was the first time that it had been seen in public. Finally, apps were shown for both Windows 8 and iOS that should go a long way to address the deficiencies in the mobile space. Unfortunately, nothing was said about Android or Windows Phone, but I have to think that they’re not far off.

Given that there are several components to Power BI, I think that it’s worth outlining each one and how it will be used. To start with, Power Query can be used to discover, acquire and transform source data and load it into an xVelocity based data model in Excel. It can also register the data connections through a “data steward” service that will reside in Office 365 – this will allow for tenant wide discoverability of the data. Power Pivot will then be used as necessary to modify the model, create calculated measures, format columns, etc. Data analysis can then be performed in Excel, using pivot tables, charts and/or Power View. In addition, Power Map can be used to visualize geospatial data on a rotatable, zoomable 3D map. Once ready, the workbook will be published to an Office 365 document library, where a BI site will find it. BI sites can then be used to perform natural (English) language queries on the data, to add reports to the mobile applications, and to schedule data refreshes, whether the data is public or private. Finally, the mobile apps can be used to consume the reports from the BI site.

Power Query

Power Query went into general release last week along with the Power BI announcement, and you can download it here. It had been available in preview form as project “Data Explorer”. It’s an add-in to Excel 2010 or 2013 that gives it Extract, Transform and Load (ETL) capabilities. I like to call it the personal version of SQL Server Integration Services, in the same way that Power Pivot is the personal version of Analysis Services, and Power View (to a lesser extent) is the personal version of Reporting Services.

As you might expect, it can pull data from a wide variety of sources. Web pages can be parsed, and tables scraped to provide tabular data. The usual suspects are all supported (SQL Server, Oracle, DB2, etc.), as well as any OData feed, Azure Data Market, SharePoint lists, Hadoop and HDInsight, Active Directory, and Facebook. One of the more interesting data sources is a folder (not a file). If you point Power Query at a folder, it will merge the data from all of the files in that folder into a single result. Refreshes will pick up any new files as new data. I can see some real uses here for log files. Yes, the files do all need to have the same schema.
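Power Query’s engine does the folder merge internally, so purely as an illustration of the behavior, here is a rough Python equivalent, assuming a folder of CSV files that share a common header row (the function name and file pattern are my own, not anything from the product):

```python
import csv
import glob
import os

def merge_folder(folder):
    """Merge every CSV file in `folder` into one result set,
    mimicking Power Query's folder data source. All files are
    assumed to share the same schema (same header row)."""
    rows, header = [], None
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)  # skip each file's header row
            if header is None:
                header = file_header
            rows.extend(reader)
    return header, rows
```

Re-running the function is the equivalent of a refresh: any files dropped into the folder since the last run are simply picked up as new rows.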

The killer feature for discovery here is the online search. In preview, online search gave you access to all of the Wikipedia data that has been expressed in tables. Now however, you not only have access to more public data sources, but you can publish your own data sources (on premises or otherwise) to a catalogue that resides in Office 365. Below is a screen shot from Amir’s demo.

image

Clicking on Online Search in the ribbon brings up the window on the right, where you can enter a search term, in this case “Microsoft Azure Datacenters”. You are then presented with a list of candidates, both public and private. Hovering over one of them will give you a preview of the data to be returned, and when satisfied, you can select Add to Worksheet, or if you would like to transform the data, click Filter and shape, where you can modify data types, flatten relational data, merge or split fields, etc. Despite the “Add to Worksheet” button name, the data can be loaded into the worksheet itself, the back end data model, or both. This distinction is very important when working with very large data sets (i.e. > 1 million rows).

You can also see from the result set that the data is coming not only from Wikipedia sources, but from the Azure Data Market, and from on premises data sources. What the demo didn’t make clear was how those data sources get into the catalogue. Power Query is the answer here as well. When BI Sites are implemented in Office 365, Power Query will expose some additional UI elements that will let you publish the query and transform that you have built into the enterprise catalogue. In fact, when you look closely at Amir’s demo screen, you will see these elements.

image

These don’t show up on a standard Power Query install.

image

The transformation capabilities of Power Query are really quite impressive, but a detailed description is beyond the scope of this post. They can be seen in the version available for download. As an aside, models built with Power Pivot can be moved directly into Analysis Services. Given the relationship that I mentioned above between Power Query and Integration Services, I wonder if it’s on the roadmap to allow Power Queries to be brought into SSIS. On the surface, it would seem to make sense, at least from where I’m standing.

Power Map

Power Map is the other component that’s available today, although as of this writing, it’s still in preview form. You can download it from here. Unlike Power Query, Power Map requires Excel 2013 – it won’t run on 2010 at all. Power Map has been available since last fall as a preview under the code name “GeoFlow”. It didn’t make it to the main stage in the announcement, but it’s a pretty impressive piece of technology.

What Power Map does is allow you to take a data model built with Excel and/or Power Pivot, and plot the data on the surface of a 3D map. The map can then be rotated and zoomed, and animations created around different views or time dimensions of the data. It wasn’t shown at the announcement last week, but below is a Power Map visualization of Hurricane Sandy data.

SandyImage

This was put together simply by plotting the location of the measurements, the pressure on one layer as a heat map, and the wind speed on another layer as columns. Storm category was used as the category to color the columns appropriately.

There are a number of limitations to Power Map currently; the big one is that it doesn’t yet work at all in SharePoint – it’s Excel only (and not the Excel Web App). Given that fact, it’s really just a sort of one-off visualization/presentation tool, no matter how cool it is. However, we did just see our first glimpse of an HTML 5 Power View viewer… could an HTML 5 Power Map viewer be far behind?

BI Sites

To my mind, BI Sites is the most exciting part of Power BI. Here is where we find the most significant advances yet in both Microsoft’s mobile and cloud BI strategy. Until now, the cloud strategy was hampered completely by the inability to keep data refreshed automatically, and mobile devices had to settle for mobile web approaches. With BI Sites, we see not only a great first step into these areas, but a solid infrastructure to expand upon. Unfortunately, at the time of this writing, it’s not available. You can however sign up for the preview program here.

BI Sites is a SharePoint App, pure and simple. This fact was not even mentioned during the announcement, but I feel that it’s vitally important to understanding how it works, and what its limitations are. Once created, the app reads Excel content from your Office 365 tenant, and provides a great number of features. Let’s quickly run through a few of these features.

Data Refresh

The Achilles heel of using Office 365, at least for me, has been the inability to schedule data refreshes. The latest 2013 wave brought support for interacting with PowerPivot enabled workbooks, which was great, but data refreshes had to be done manually, and the results re-uploaded. On premises, PowerPivot for SharePoint can be used to perform this function, but PowerPivot for SharePoint is not available in Office 365. Since the data can exist both on premises, or in the public, how does a cloud based service refresh data from inside your firewall?

The answer is that you will install a service, a sort of “data gateway” somewhere in your environment. This gateway will manage the refresh of all the on premises data sources and push updates out to Office 365, NOT the other way around. This will not require any firewall configuration, and will not depend on App Fabric, or any other relatively complex infrastructure.

Once the gateway is configured, automatic refresh can be scheduled separately for each individual workbook, as can be seen below.

image

 

Report Gallery

One of the things that you notice when you enable PowerPivot for SharePoint is a new library template, the “PowerPivot Gallery”. When you use this template, you get a completely different, Silverlight based UI for the library, with previews and access to data refresh capabilities. This is again absent in Office 365 due to the lack of PowerPivot for SharePoint. However, BI Sites gives you the same functionality in a (presumably) HTML 5 based experience.

image

HTML 5

Power View is a self-service analytical reporting tool that was first introduced in SQL Server 2012 Reporting Services, and subsequently added to Excel 2013. It allows users to quickly analyze data from an embedded model, and is supported within SharePoint. It’s a great tool to get answers very quickly, but one of the concerns with it is the fact that it is built with Silverlight. This limits its use, particularly in the mobile space, as Silverlight is not supported on iOS, Android, or (surprisingly) Windows Phone.

BI Sites shows us the first version of a Power View renderer based on HTML 5, which should work on all devices. When the workbooks are opened through the report gallery, you will see the Power View reports, but they are rendered in HTML 5. I suspect that we’ll continue to see Silverlight being used in the designer in Excel, but the rendering through BI Sites is done with HTML 5, which opens up the world of mobile.

Data Catalogue

I hesitate to include this in this section, as it could just as easily be included in the Power Query section, but since it requires the BI Sites capability in Office 365, I include it here. The Data Catalogue is the service that allows you to register data sources from Power Query. The catalogue lives in Office 365, and is managed from there. Data sources in the catalogue can be discovered by Power Query users through the “Online Search” button.

The catalogue also supports features that help support the natural language features of BI Sites. These features are unclear to me at this point, but I will be sure to expand upon them once I get my hands on it.

Finally, to be sure, “Data Catalogue” is what I’m calling it in the absence of any official nomenclature from Microsoft.

Natural (English) Language Query

Natural language query was a very well kept secret. I had no idea that it was being developed until I saw it on stage. Natural language query allows a user to interrogate a data source almost as if they were performing a full text search. Based on their query, Power BI selects an appropriate visualization to display the results.

image

As you can see above, 1 shows that we’re interrogating a data model contained in the “Sales Reporting and …” workbook. 2 shows the query entered by the user in relatively plain English (no Select statements!), and 3 shows how Power BI has interpreted the natural language query. Finally, Power BI determined that a tabular display was the best visualization to use for this result. However, many are available, and it uses the visualizations available to HTML 5 Power View to display them. Number 4 shows other visualizations that may be relevant to the result set, and all you need to do is to select them to see them.

A few more results can be seen below.

image

image

image

New Visualizations

Any user of Power View will be familiar with the visualizations shown above. However, the one that brought the biggest applause at the announcement was the “king of the hill” visualization. This shows you a “winner” and “contenders” over a period of time, based upon a defined criterion. When applied to pop music over time, it looks something like this.

image

To get the full effect of this, I highly recommend watching the video starting at about 9 minutes in. Of course, this is a highly specific case, and I have to think that more interesting visualizations are on the way.

Limitations

As with most products, Microsoft giveth and Microsoft taketh away. Without having my hands on the full product, I have no hope of coming up with anything like a comprehensive list of limitations, but there are two that jump to mind immediately.

Power BI is limited to Office 365 only. That’s right, there will be no on premises version of this, at least not at launch. From a technical standpoint, I can’t see any reason why this must be, apart from either resourcing limitations within Microsoft, or strategic direction (or a combination of both). It remains to be seen whether Power BI is a “cloud first” product, or a “cloud only” product. Time will tell. In the meantime, those organizations that are still leery of cloud computing may miss out, but there is still a lot of goodness in the on premises offerings.

BI Sites is a SharePoint App, not a container. The workbooks themselves are still stored within document libraries in SharePoint. The default file size limit for Excel Services is 10 MB, and for SharePoint itself is 50 MB. It’s unclear to me if BI Sites relies on Excel Services (I assume that it does), and if so, your data models have a hard constraint of 10 MB. Even if Excel Services is not the bottleneck, they won’t be any larger than 50 MB. That is simply too small for some models, and will leave them out in the cold. I was told at WPC by a Product Manager that this is changing, but specifics are currently unavailable.

Mobile Apps

The final component (that is still unavailable) of Power BI is a collection of Mobile BI apps. Reports are selected in various BI sites and flagged for mobile use. Once flagged, they appear in the mobile apps and can be interacted with directly.

8358.Mobile BI.png-550x0

As an aside, I borrowed the above image from the Office 365 technology blog. Anyone that has attended my BI presentations in the past year will be familiar with the first data set in the app, a topic near and dear to my heart.

Apps were initially announced for Windows 8 and for iOS. No mention was made of Android or Windows Phone (?!?!), but I have to think that they aren’t far behind. Blackberry? I have no idea, but I wouldn’t hold my breath.

To finalize, I’m very excited about several of the new capabilities that Power BI brings, and the doors that it now opens. It’s another big step in moving Business Intelligence from the realm of the data gurus, and into the hands of those that can actually benefit from it. Once it is available to play with, I intend to focus on it more thoroughly.
