Power BI has garnered a lot of attention in the past few years as the “place to be” for Microsoft business intelligence. Power BI is of course Microsoft’s cloud platform for dashboarding and reporting. On premises, Microsoft’s reporting platform has traditionally been SQL Server Reporting Services (SSRS) and is now joined by Power BI Reporting Server (PBIRS). While there are differences between the two in the way that they are licensed and distributed, the main technical difference is that PBIRS includes SSRS, plus the ability to render both Power BI reports (pbix) and Excel based reports through Excel Online.
PBIRS is therefore able to render all four report types as defined by Microsoft:
RDL (classic SSRS style reports)
PBIX (Power BI Desktop)
RDLX or PBIX (Datazen and Power BI Desktop)
Given that PBIRS is the on-premises reporting platform, and Power BI is the cloud platform, they should theoretically be able to perform the same task. However, this isn’t the case. Currently, there is no way to render paginated (or as I like to refer to them, operational) reports in the cloud. PBIRS can render all report types, but today, the Power BI Service cannot.
While it has been hinted at before, Riccardo explicitly stated that the ability to render paginated reports in the cloud was not only on the roadmap, but it was actively being worked on. This will complete the picture for BI in both the cloud and on premises. This is important because I the ideal world, the choice of on-prem, cloud, or hybrid should be related to the data in question, not the available features. No more standing up an Azure virtual machine to be able to get printable, paginated structured reports.
Riccardo sees this service rolling out in three distinct phases. Initially for the preview, RDL reports will be able to connect to cloud data sources like Azure SQL Database, Azure Data Warehouse, Azure Analysis Services, and Power BI data models. The next stage will leverage the On Premises Data Gateway in order to connect to on-premises data like SQL Server, Oracle, Teradata and more. The next phase will begin to leverage Power Query to connect to the world of data sources that Power Query Offers.
With paginated reports adopting Power Query, and Excel having moved to Power Query as a default, it is hard to argue with making an investment in learning Power Query. It all seems to be coming together.
One of the top requested features for SSRS on-premises for year has been the ability to authenticate to it with claims-based authentication. This will be the native authentication mechanism for paginated reports in the service. When this is combined with the fact that these reports will ultimately leverage Power Query, and the vast number of data connection options that it brings, it’s not hard to wonder when these features will be coming down to the on-premises product. When asked about this, Riccardo confirmed that this is certainly the vision for the product.
The future looks bright for both cloud based, and on-premises reporting.
Power BI has direct support for reporting on SharePoint lists and documents whether SharePoint is on-premises or in the cloud. Getting at your data is relatively straightforward if you know the URL, but as I’ve written about before, SharePoint data can be idiosyncratic. Text and number fields work right away, but more complex data types require a little more thought, and this is certainly true of multi value fields.
Multi value fields are really SharePoint’s way of approximating the behavior of a one-to-many relationship of a relational database. SharePoint can store list-based data, and even though it has the concept of a lookup field, it is not a relational database. Chances are however that if we’re reporting on this type of field, we want it to behave as though it were. Fortunately, Power BI gives us several options for doing this.
Consider the following list that contains a multi-value choice field named Amenities:
In the view, the different values are flattened, and displayed as a single value using a comma to separate them. How do we go about reporting on this data in Power BI? There’s no right answer, as it depends on the report requirements, but there are several options. In all cases the data first needs to be brought into Power BI Desktop.
Loading the Data
We first launch Power BI Desktop, select “Get Data” and then choose SharePoint Online list (if connecting to SharePoint Online) or SharePoint List (if using SharePoint Server). We are then prompted for the URL of the SharePoint Site. The dialog is titled SharePoint lists, but the value is actually the URL of the site, NOT the list itself. Once this is entered we are prompted for credentials if we haven’t connected to this site before. After entering credentials, we can select the list that we want to report on. In our case, it’s named “Listings”. We select it, and then click on the Edit button.
Once the data loads in, one of the first things that you’ll notice is that there are a lot of columns to choose from, and it’s a good idea to remove the data that you don’t need. In this case, we can remove the ContentTypeId column and everything to the right of it, with one important exception We want to keep the FieldValuesAsText column (we’ll come back to that shortly). Remember, for our purposes here, we want to report on the multi-value column, Amenities. The simplest way to represent this data is as a string of comma separated values, which is the same way it is shown in a SharePoint view. It is also possible to use this data in a more sliceable, or structured way. Let’s start with the simplest.
Extracting a Simple Text Field
If our report is to show all the amenities associated with a listing, then a simple text string will do the job. Examining the data in the Query Editor, you will notice that there are no values displayed for Amenity, the way that there is for simpler field types like “Sold” or “Asking Price”. Instead, each cell contains a “List Value”. Each of these Lists contains the Amenities values for each record and clicking on one will extract those values for that one item. However, of we want to extract the values for all items, so we need to expand the column by clicking on the expand icon in the column header.
We are then given two options, “Expand to New Rows” or “Extract Value”. To extract the values into a single value, we select “Extract Values”
We are then given the opportunity to choose a delimiter, and after that the values in the column are replace with simple text values.
Another way to accomplish this same thing is to use the FieldValuesAsText column (referenced above). The fields in this column contain a single record containing the textual values for all the complex data types in a SharePoint list. If you are doing reporting with SharePoint data in Power BI, FieldValuesAsText is your friend. To use it, we just click the expand icon in its column header, deselect all of the columns that we don’t need (which in our case is everything except Amenities) and click OK. We are returned the contents of the list in a comma separated string for each item.
At this point, we can report on the data.
We click on Close and Apply in the ribbon to return to the report canvas. We add a table visual, and add some of the fields like address, street number, and Amenities to it. You’ll see that the table looks much like it would in a SharePoint list. Next, we add a slicer to the report page, and use one of the other columns, like City. Clicking on a city will filter the table by that city as expected. Now, if we want to filter by Amenity, we can add the Amenity column to the slicer, but it won’t behave the way we want it to. Each combination of amenity values will be represented as a single value. We are unable to slice on a single value like “Garage”.
Flattened multi value field as slicer
To use discrete values for our multi-value field, we need to do a little more work in the Get Data (Power Query) interface.
Extracting Discrete Values from a Multi-Value Field
We follow the initial steps from the “Simple Text Value” section above, but when the expand icon is selected, we select “Expand to new rows” and NOT “Extract values”. When this is done, each row will be copied to accommodate all of the individual Amenity values. If there were 3 Amenity values, we wind up with three rows of identical values, but with each of the values for Amenity.
Expanding to new rows
We can now go to the design surface and add a table, and a slicer that uses the multi value column, and now we can filter on discrete values. This may be sufficient in many cases. However, if we create a measure that based on an aggregate value for the table (like COUNTROWS or SUM), the value will not be correct if there are no slicer values. It works with all of the duplicated rows.
Incorrect number of listings
Again, this approach may work in many situations, but for ultimate flexibility, we need a true one-to-many relationship. Luckily, Power Query gives us a few simple tools to do this.
Creating a One-to-Many Relationship
Again, we follow the initial steps from the “Simple Text Value” section above, but do not click on the expand icon in the Amenities column. A One-to-many relationship requires at least two tables, so now is the point where we need to add a second one. We do so by copying the first. We right click on the query in the Queries pane and select “Duplicate”.
This creates a second query identical to our first, which will load into another table. We can then give the new query a better name, like “Amenities” in our case. Next, we select the Id, and the Amenities columns, right click and select “Remove Other Columns”, and we are left with only the ID and Amenities columns. Now we select the expand icon and select “Expand to New Rows”.
We now go back to the original query and remove the Amenities column. The result is two tables, Listings and Amenities, with Listings containing the bulk of the information. We then click on Close and Apply in the ribbon to load the data into the model, and the relationships icon to take is to the relationship builder. We need to establish a relationship between the two tables, and we do so by clicking on the Id field in one table and dragging it to the Id field of the other. Finally, double click on the relationship line between the tables, and be sure to select “Both” for Cross Filter Direction – that way Amenity values can filter List values. If “Both” is selected, the directional arrow will point two ways.
Now, the same visuals from above can be added using the Amenity column from the second table as a slicer, and everything should work as expected. Listings will be filtered by the value, if no amenities are selected, the correct number of listings will be shown In the data card, and no duplication will appear in the table.
SharePoint multi-value columns are an attempt to replicate the capability of a relational database. With a few simple steps, Power BI can take SharePoint data, normalize it, and analyze it as if it were a true relational database.
Although I typically advise against it, there are valid reasons to report on SharePoint list data directly. Power BI Desktop makes this data quite easy to access – you can use the built in connectors for SharePoint or SharePoint Online, or, due to the fact that any SharePoint list is available via OData, you can also use the OData data connector. Microsoft has recently made improvements to both methods, but the SharePoint connectors bring some significant usability advantages. One of these advantages is what I’m calling “summary columns”.
Consider the following SharePoint list with different field types:
Connecting to this list with Power BI desktop and editing the query returns all of the list fields regardless of their visibility in the user interface. Assuming that we want to work with the above field values for analysis purposes, we can discard the fields that we don’t need, and reorder the remaining, leaving us with only these fields.
However, you can immediately see that we’ll need to do some work in order to get our data in a usable format.
The title field is simple enough – it’s value is immediately available, no issues there, and no changes are necessary. From the Name field, several columns are returned. Two are the ID of the name in the site collection’s user list. It would be possible to connect to the root of the site collection, retrieve the user list, and establish a relationship between the tables, but clicking the expand icon will allow Power Query to do a lookup for each ID and return the desired attribute, in this case, Name.
The same is true for the lookup field type – the column can be expanded in order to include any attribute of the source item. The field using managed metadata works in a similar fashion; it can be expanded in order to retrieve the text value of the managed metadata item. Link fields work the same way – the can be split into two columns, the link itself and the description. However, the field containing multiple values is a little different. The great news here is that it’s possible – previous versions of Power Query couldn’t work with multi value fields, but now the SharePoint data source supports it.
Multiple value field values are returned as a list. With list items, the expand icon will duplicate the entire record for each value on the list. This may or may not be the desired behaviour, but remember, this is Power BI. Everything can be aggregated.
The conclusion to be drawn here is that in order to represent SharePoint list data that is using any sort of control more complex than text or number, we need to do some work. However, the good news is that someone (I’m not sure if it’s the Office Team of the Power BI team) has added a feature that makes this whole process much simpler.
After connecting to your SharePoint list, edit the query. Instead of diving in and performing all of these manual transforms, select your multi value column(s) if you have any (this will make more sense momentarily). Select any rich text columns as well, and then scroll right to find a column named “FieldValuesAsText”. Select this column, then right click on it and select “Remove Other Columns”.
The FieldValuesAsText column is our magic bullet. It automatically converts most (rich text fields being the exception) of the more complex SharePoint data types to simple text that work well within Power BI. Simple click on its expand button, select the columns that you want to include in your analysis. I find it useful to deselect “Use original column name as prefix” as well. We are left with textual representations of our field data.
You will notice that the multiple value fields here have their values separated by commas. For multiple values, I tend to prefer the “raw” approach, which is why we retained the multiple value column above. We can still expand it and create a separate line for each value, and remove the column created by “FieldValuesAsText”.
Finally, you may have noted that the Rich Text field isn’t automatically converted. In order to extract useful text from it, we still need to use Power Queries transformation functions such as Replace Value, Trim, and Clean.
In a nutshell, if you’ve been frustrated by formatting or data type limitations when using SharePoint data in Power BI, have another look, and check out the FieldValuesAsText column. It will make everything a lot simpler.
Excel has been used with external data for… well, as long as I’ve been using Excel. So why would anyone bother to write a blog post about this given that the capability is so mature? In recent years, Excel has adopted a number of new, and frankly better mechanisms for working with external data, while retaining the old. Given that there are now multiple tools in Excel for working with external data, it’s not always clear as to which one is the best, and unfortunately there is no single tool that wins over all, although I believe that that will be the case soon.
The answer, as always is, “it depends”. When it depends, the important thing is to understand the strengths and weaknesses of each approach. With that said, let’s have a look at all of the options.
ODC (Office Data Connections) are the traditional method of accessing data in Excel. You can create or reuse an ODC connection from the Data tab in the Excel ribbon.
When using an ODC connection, you establish a connection with a data source, form some sort of query and import the resultant data directly into the Excel workbook. From there, the data can be manipulated and shaped in order to support whatever the end user is trying to do. The one exception to this behaviour is the connection to SQL Server Analysis Services (SSAS). When a connection is made to SSAS, only the connection is created. No data is returned until an analysis is performed (through a pivot table, chart etc), and then only the query results are retrieved.
When the workbook using an ODC connection is saved, the data is saved within it. In the case of an SSAS connected workbook, the results of the last analysis are saved along with it. For small amounts of data, this is just fine, but any large analysis is bound to quickly run into the data limits in Excel which is 1,048,576 rows by 16,384 columns in Excel 2013. In addition such a file is very large and extremely cumbersome to work with, but even as such, Excel has been the primary tool of choice for business analysts for years.
Data loaded into the workbook can be refreshed on demand, but it can also be altered, shaped, mashed up, and as is too often the case, grow stale. Workbooks such as these have become known as “spreadmarts” and are the scourge of IT and business alike. With these spreadmarts, we have multiple versions of the same data being proliferated, and it becomes harder to discern which data is most accurate/current, not to mention the governance implications.
SharePoint has provided a way to mitigate some of the concerns with these connections. SharePoint itself supports ODC connections, and therefore users can access these workbooks stored within SharePoint and it also allows them to refresh data from the source either on demand or on open. A single point of storage along with a measure of oversight and browser access helps to restore a modicum of sanity to an out of control spreadmart environment, but the core issues remain.
In order to help with the core issues, Microsoft introduced PowerPivot in 2009.
Created in PowerPivot
PowerPivot was originally (and still is) an add-in to Excel 2010, and is a built in add-in to Excel 2013. PowerPivot allows for the analysis of massive amounts of data within Excel, limited only by the memory available to the user’s machine (assuming a 64 bit version). It does this by highly compressing data in memory using columnar compression. The end result is that literally hundreds of millions of rows of data can be analyzed efficiently from within Excel.
You can see that compression at work by comparing the same data imported into an Excel workbook directly, and into a PowerPivot model with a workbook. The following two files contain election data, and represent the maximum number of rows that Excel can handle directly (1,048,576) and 25 columns.
Getting data into the model was originally (and still can be) a completely separate process from bringing it into Excel. PowerPivot has its own data import mechanism, accessed from the Power Pivot window itself. First, click on the PowerPivot tab in Excel and then click manage. If you don’t have a PowerPivot tab, you will need to enable the add-in. If you don’t have the add-in, you have an earlier version of Excel – you’ll need to download it.
Once the PowerPivot window opens, the “Get External Data” option is on the ribbon.
Once the appropriate data source is selected and configured, data will be loaded directly into the data model – there is no option to import that data into a worksheet. Once the data is in, pivot tables and pivot charts can be added to the workbook that connect to the data model much like when creating an ODC connection to Analysis Services. In fact, it’s pretty much exactly like connecting to Analysis services, except that the AS process is running on the workstation.
Created in Excel
PowerPivot, and more importantly the tabular data model was included in Excel 2013. With that addition, Microsoft added a few features to make the process of getting data into the data model a little easier for users that were a little less tech savvy, and may be uncomfortable working with a separate PowerPivot window. That’s actually part of the thinking in leaving the PowerPivot add-on turned off by default.
When a user creates an ODC connection as outlined above, there are a couple of new options in Excel 2013. First, the “Select Table” dialog has a new checkbox – “Enable selection of multiple tables”.
When this option is selected, more than one table from the data source can be selected simultaneously, but more importantly, the data will automatically be sent to the data model in addition to any other import destinations.
Even if the multiple selection option wasn’t chosen, the next dialog in the import process, “Import Data” also has a new check box – “Add this data to the Data Model”.
Its purpose is pretty self-explanatory. It should be noted that if you choose this option, and also choose “Only Create Connection”, the data will ONLY be added to the model, nowhere else in the workbook. This is functionally equivalent to doing the import from the PowerPivot window, without enabling the add-in.
Power Query Connections
When Power BI was originally announced, Power Query was also announced and included as a component. This was very much a marketing distinction, as Power Query exists in its own right, and does not require a Power BI license to use. It is available as an add-on to both Excel 2010 and 2013, and will be included with Excel 2016.
Power Query brings some Extract, Transform and Load (ETL) muscle to the Excel data acquisition story. Data can be not only imported and filtered, but also transformed with Power Query and its powerful M language. Power Query brings many features to the table, but this article is focused on its use as a data acquisition tool.
To use Power Query, it must first be downloaded and installed. Once installed, it is available from the Power Query tab (Excel 2010 and 2013).
Or from the data tab, New Query (Excel 2016)
Once the desired data source is selected, the query can be edited, or loaded into either the workbook, the data model, or both simultaneously. To load without editing the query, the load option at the bottom of the import dialog is selected.
Selecting “Load To” will allow you to select the destination for the data – the workbook, the model or both. Selecting Load will import the data to the default destination, which is by default the workbook. Given the fact that the workbook is an inefficient destination for data, I always recommend that you change their default settings for Power Query.
To do so, select Options from the Power Query tab (2010 and 2013) or the New Query button (2016), click the Data Load section, and then specify your default settings.
Data Refresh Options
In almost every case when external data is analyzed, it will need to be refreshed on a periodic basis. Within the Excel Client, this is simple enough – click on the data tab, and then the Refresh All button, or refresh a specific connection. This works no matter what method was used to import the data in the first place. Excel data connections can also be configured to refresh automatically every time the workbook is opened, or on a periodic basis in the background.
However, workbooks can also be used in a browser through Office Web Apps and Excel Services (SharePoint and Office 365) or as a data source for Power BI dashboards. In these cases the workbooks need to be refreshed automatically in order that the consuming users will see the most up to data when the workbooks are opened. The tricky part is that not all of the connection types listed above are supported by all of the servers or services. Let’s dive in to what works with what.
SharePoint with Excel Services
Excel Services first shipped with SharePoint 2007, is a part of 2007, 2010, and will be included with 2016. From the beginning, Excel Services allowed browser users to view and interact with Excel workbooks, including workbooks that were connected to back end data. The connection type supported by Excel Services is ODC, and ODC only.
Excel Services has no mechanism for maintaining data refresh. However, the data connection refresh options are supported which means that the workbook can be automatically refreshed when opened, or on a scheduled basis (every xxx minutes in the background). Unfortunately, this can come with a significant performance penalty, and once refreshed it is only in memory. The workbook in the library is not updated. The data in the workbook can only be changed by editing the workbook in the client, refreshing it, and re-saving it
Workbooks with embedded data models (PowerPivot) can be opened in the browser, but any attempt to interact with the model (selecting a filter, slicer, etc) will result in an error unless PowerPivot for SharePoint has been configured.
SharePoint with Excel Services and PowerPivot for SharePoint
PowerPivot for SharePoint is a combination of a SharePoint Service application and Analysis Services SharePoint mode. When installed, it allows workbooks that have embedded PowerPivot data models to be interacted with through a browser. The way that it works is that when such a workbook is initially interacted, the embedded model is automatically “promoted” to the Analysis Services instance, and a connection is made with it, thus allowing the consuming user to work with it in the same manner as with a SSAS connected workbook,
The PowerPivot for SharePoint service application runs on a SharePoint server and allows for individual workbooks to be automatically refreshed on a scheduled basis. The schedule can be no more granular than once per day, but the actual data within the model on disk is updated, along with any Excel visualizations connected to it.
When the refresh process runs, it is the functional equivalent of editing the file in the client, selecting refresh all, and saving it back to the library. However, there is one significant difference. The Excel client will refresh all connection types, but the PowerPivot for SharePoint process does not understand Power Query connections. It can only handle those created through the Excel or PowerPivot interfaces.
Power Pivot for SharePoint ships on SQL Server media, and this limitation is still true as of SQL Server 2014. At the Ignite 2015 conference in Chicago, one of the promised enhancements was Power Query support in the SharePoint 2016 timeframe.
Office 365, or more precisely, SharePoint Online supports Excel workbooks with ODC connections and PowerPivot embedded models in a browser. These workbooks can even be refreshed if the data source is online (SQL Azure), but they cannot be refreshed automatically. In addition, only ODC and PowerPivot connections are supported for manual refresh. Power Query connections require Power BI for Office 365. In addition, Office 365 imposes a 30 MB model size limit – beyond that, the Excel client must be used. In short, the Office 365 data refresh options are very limited.
Power BI for Office 365
Power BI for Office brings the ability to automatically refresh workbooks with embedded data models. Data sources can be on premises or in the cloud. On premises refresh is achieved through the use of the Data Management Gateway. It also raises Office 365’s model size limit from 30 MB to 250 MB. With Power BI for Office 365 both manual and automatic refreshes can be performed for both PowerPivot and Power Query connections, however Power Pivot connections are currently restricted to SQL Server and Oracle only.
The automatic refresh of ODC connections is not supported. A workbook must contain a data model in order to be enabled for Power BI.
Power BI Dashboards
Power BI Dashboards is a new service, allowing users to design dashboards without necessarily having Office 365 or even Excel. It is currently in preview form, so anything said here is subject to change. It is fundamentally based on the data model and it works with Excel files as a data source currently, and it is promised to use Excel as a report source as well. The service has the ability to automatically refresh the underlying Excel files on a periodic basis more frequent than daily.
In order for a workbook to be refreshed by Power BI, it must (at present) be stored in a OneDrive or OneDrive for Business container. It also must utilize either a PowerPivot, or a Power Query connection. At present, the data source must also be cloud based (ie SQL Azure) but on premises connectivity has been promised.
SQL Server Analysis Services
Another consideration, while not a platform for workbooks is SQL Server Analysis Services (SSAS). Excel can be used to design and build a data model, and that data model can at any time be imported into SSAS. As of version 2014, SSAS fully supports all connection types for import – ODC, PowerPivot and Power Query. Once a data model has been imported into SSAS, it can be refreshed on a schedule as often as desired, and you can connect to it with Excel, and share it in SharePoint. You can also connect to it in Power BI Dashboards through the SSAS connector. From both a flexibility and power standpoint, this is the best option, but it does require additional resources and complexity.
Refresh Compatibility Summary
For convenience, the table below summarizes the refresh options for the different connection types.
SQL Server Analysis Services Import
Office 365 with Power BI
Power BI Dashboards
M – Manual refresh
A – Both Manual and Automatic Refresh
* only limited data sources
The Right Tool
I started out above by saying that the selection of import tool would depend on circumstances, and that is certainly true. However, based on the capabilities and the restrictions of each, I believe that a few rules of thumb can be derived. As always, these will change over time as technology evolves.
Always use the internal Data Model (PowerPivot) when importing data for analysis.
Power Query is the future – use it wherever possible
All of Microsoft’s energies around ETL and data import are going into Power Query. Power Query is core to Power BI, and announcements at the Ignite Conference indicate that Power Query is being added to both SQL Server Integration Services and to SQL Server Reporting Services. Keep in mind that we have been discussing only the data retrieval side of Power Query – it has a full set of ETL capabilities as well, which should also be considered.
PowerPivot or ODC Connections must be used on premises
PowerPivot for SharePoint does not support Power Query for refresh. This means that you MUST use PowerPivot connections for workbooks with embedded models. If you are already using SSAS, use an ODC connection within Excel.
Power Query or PowerPivot must be used for cloud BI.
PowerPivot connections will work for a few limited cases, but more Power Query support is being added constantly. Where possible, invest in Power Query
If on-premises, consider importing your models into SSAS
SSAS already supports Power Query. If, instead of using PowerPivot for SharePoint, Analysts build their models using Excel and Power Query, they can be “promoted” into SSAS. All that is then required is to connect a new workbook to the SSAS server with an ODC connection for end users. The Power Query workbooks can be used in the cloud, and the SSAS connector in Power BI Dashboard can directly use the SSAS models created.
Choose wisely. Changing the connection type often requires rebuilding the data model, which in many cases is no small feat.
In summary, when importing data into Excel, the preferred destination is the tabular model, and to import data into that model, Power Query is the preferred choice. The only exception to this is on premises deployments. In these environments, consideration should be given to connecting to a SSAS server, and failing that, PowerPivot imports are the best option.
Yesterday Microsoft announced the next step in the evolution of Power BI. It’s getting quite a bit of attention, and rightly so for its aim of bringing Business Intelligence closer to users. Democratizing BI has always proved a challenge – it’s the realm of the gurus in the white coats that hold the keys to the data. Microsoft is aiming to accomplish this democratization through a combination of user focus, and as of yesterday, a drastic change in its pricing model. Power BI just went from about $40 per user per month, to free, or $9.99/user/month for advanced capabilities. That’s quite a drop, and arguably the biggest announcement from yesterday – it will have a massive impact. The detailed price breakdown can be found here.
However, all of the focus around personal BI is, in my opinion, missing a key component. Power BI and its components have always focused squarely on both personal and team BI solutions. That is to say the ability for a power user to model data, visualize it quickly and easily and to share it out with fellow team members. While that capability is certainly retained in the new Power BI, this new version contains the first appearance of enterprise grade BI in the cloud for Microsoft.
To fully understand this, it’s necessary to touch on the Microsoft BI stack as it stands today.
Microsoft BI On Premises
The On-Premises BI story from Microsoft may be confusing, and occasionally difficult to understand, but it is very powerful, and relatively complete. In a nutshell, the story is good from a personal, team and enterprise perspective.
On the enterprise side, there are products from both the SQL Server team, and the Office team. Data warehousing is served by SQL Server and ETL duties fall to SQL Server Integration Services (SSIS). Multidimensional analysis storage is served by SQL Server Analysis Services in both OLAP and Tabular modes, and Reporting is performed by SQL Server Reporting Services (SSRS). The SQL product line doesn’t have much on the client side for analysis apart from SSRS, but this slack is taken up by the analysis tools available in Excel, and through Performance Point services in SharePoint.
Indeed, SharePoint also provides a platform for SSRS via SSRS SharePoint mode, and for Excel based analytical workbooks connected to SQL Server and to SSAS through Excel Services.
On the personal BI side, that role has traditionally fallen to Excel. The pitfalls of importing data into Excel workbooks for analysis are well documented and don’t need to be discussed here, but the bulk of those issues were addressed with the introduction of PowerPivot several years ago. PowerPivot allows for massive amounts of data to be cached within the Excel file for analysis without any data integrity concerns. The addition in recent years of analytic visuals (Power View, Power Map) and ETL capabilities (Power Query) have further rounded out the offering.
Taking that Excel workbook and sharing it brings us into the realm of Team BI. This is to say that the analyses are relatively modest in size, and of interest to a targeted group. These models may not require the rigour or reliability associated with enterprise BI models. Once again, the technology involved here is SharePoint. A user can take a workbook with an embedded PowerPivot model, share it through a SharePoint library, and other users can interact with that embedded model using only a browser. This capability requires PowerPivot for SharePoint, which is really a specialized version of SSAS, along with a SharePoint service application.
One thing to note about these seemingly disparate approaches is that a power user can build a Power Pivot data model with Excel, share it to a team via SharePoint, and when it requires sufficient rigour or management, it can be “upgraded” into SSAS in tabular mode. This common model approach is powerful, and is key to understanding Microsoft’s entire BI strategy. You can also see here that SharePoint straddles the two worlds of team and enterprise BI.
Moving to the cloud
The BI workload is one of the last Microsoft workloads to move to the cloud, and with good reason. Massive amounts of data present problems of scale, and security or data sovereignty concerns tend to keep data on premises. However, there is a very real need to provide BI to users outside of the firewall.
SharePoint is the hub of BI on prem, so it’s logical to assume that with SharePoint Online, it could continue to perform that function in the cloud. The big catch here is that on-prem, SharePoint is simply the display platform. In the enterprise scenario, users connect through SharePoint to the back end servers. This isn’t an option in the cloud, so enterprise BI was left off the table.
With the personal and team BI scenarios, data is cached in a Power Pivot data model, which could be supported in the cloud. When Office 365 moved to the SharePoint 2013 code base for SharePoint online, rudimentary support for embedded Power Pivot models was indeed added. Essentially PowerPivot for SharePoint “light” was added. I call it light for two major reasons. Firstly, data models could be no larger than 10 MB. Secondly, there was no way to update the data contained within the Power Pivot cache, outside of re-uploading the Excel workbook. This is still true without a Power BI license. The inability to refresh the data renders team BI almost useless, except in static data scenarios.
The first generation of Power BI changed all of that. With a Power BI license, it was possible to install a Data Management Gateway on premises that would connect to team BI workbooks in Office 365 and update them on a scheduled basis. Yes, the gateway had many limitations (many of which have been removed over time), but finally, the on-prem refresh story was solved. In addition, the model size limit was increased to 250 MB. However, we were still left with a number of problems or limitations.
Daily data refresh schedule. Automatic data refreshes could be daily at their most frequent. Manual refreshes could be done anytime
Capacity. The maximum size of a data model was increased to 250 MB, which is relatively small for enterprise scenarios. In addition, refreshes aren’t differential, which means that the entire model is re-uploaded on every refresh
Data sensitivity/sovereignty. The refresh problem was solved, but because the data is still cached in the workbooks, there can be reluctance to sending it outside of the corporate firewall
Per User Security – Power Pivot data models have no concept of user security in a workbook (tabular models in SSAS do). Security is at the workbook level
Cost. This initial cost of Power BI was $40 per user per month. A power BI license was required to interact with any workbook that had a data model larger than 10 MB. Considering that a full Office 365 E3 license was around $25 per user per month, this price tended to limit the audience for sharing.
All of this is to say that Power BI in its first (and as yet current) incarnation is suitable for personal and team BI only. There has been no enterprise cloud BI story.
Power BI V2
The announcements yesterday outlined the next generation of Power BI. Going forward, Power BI will be available as a standalone offering, at the price points offered above. Office 365 users will continue to be able to use it from Office 365, but Office 365 will no longer be required to use it. In it’s early days, Power BI was a SharePoint app, but a careful examination of URLs in the current offering quickly reveals that it’s actually two apps currently, both running on Azure (not in SharePoint).
If you’ve signed up for the new Power BI preview, you may notice that the URL is http://app.powerbi.com/…… so this move isn’t a big surprise.
With the new model, Excel is no longer the central container. Users connect to data and publish it directly to Power BI. Behind the scenes, the service is doing a very similar thing as what it does with Power Pivot models – it’s storing them in SSAS. In fact, the same limits still apply – 250 MB per model (at least for now) Excel can still be used, but now it is as a data source.
Visualizations are performed through Power Views, and data is acquired through Power Query. These are no longer add-ons, but available on their own through Power BI Designer. This decoupling is good for those that have not made an investment in SharePoint Online, or Excel.
These changes to the architecture and the cost are great news for adoption, but don’t address the needs of the enterprise. Except for one thing – The SSAS Connector.
One of the data sources available to the new Power BI is the SSAS data connector. This connector is a piece of code that runs on premises (it actually includes the Data Management Gateway). It acts as a bridge between the Power BI service, and an on prem SSAS server.
The biggest distinction worth noting is that with the gateway, data is NOT being uploaded to the service, it remains on prem. The way that it works is that when a user interacts with a visualization from the cloud, a query is sent to the SSAS server through the gateway. That query is run, and its results sent back to the user’s visualization, and the data is not persisted.
In addition, when the query is sent back to the SSAS it is run with the permission of the user making the request. This is accomplished through the EFFECTIVEUSERNAME feature in SSAS. This provides for full user level security, and since tabular models in SSAS can utilize per user security, we no longer need to rely on proxy accounts/document level security.
Finally, because the data is being stored in an on prem SSAS server, it can be refreshed automatically as often as desired. For the same reason, we have no capacity limits – you can grow your own SSAS servers as large as you like.
The SSAS connector removes most of the limitations that prevent cloud based enterprise Business Intelligence, and the new pricing model removes the rest. Certainly there are going to be feature limits in the near term, but it appears to me at least that the back of this thorny problem has finally been cracked.