Tag Archives: Power Map

The New Power BI – Now With Enterprise!

Yesterday Microsoft announced the next step in the evolution of Power BI. It’s getting quite a bit of attention, and rightly so for its aim of bringing Business Intelligence closer to users. Democratizing BI has always proved a challenge – it’s the realm of the gurus in the white coats that hold the keys to the data. Microsoft is aiming to accomplish this democratization through a combination of user focus, and as of yesterday, a drastic change in its pricing model. Power BI just went from about $40 per user per month, to free, or $9.99/user/month for advanced capabilities. That’s quite a drop, and arguably the biggest announcement from yesterday – it will have a massive impact. The detailed price breakdown can be found here.

However, all of the focus around personal BI is, in my opinion, missing a key component. Power BI and its components have always focused squarely on both personal and team BI solutions. That is to say the ability for a power user to model data, visualize it quickly and easily and to share it out with fellow team members. While that capability is certainly retained in the new Power BI, this new version contains the first appearance of enterprise grade BI in the cloud for Microsoft.

To fully understand this, it’s necessary to touch on the Microsoft BI stack as it stands today.

Microsoft BI On Premises

The On-Premises BI story from Microsoft may be confusing, and occasionally difficult to understand, but it is very powerful, and relatively complete. In a nutshell, the story is good from a personal, team and enterprise perspective.

On the enterprise side, there are products from both the SQL Server team, and the Office team. Data warehousing is served by SQL Server and ETL duties fall to SQL Server Integration Services (SSIS). Multidimensional analysis storage is served by SQL Server Analysis Services in both OLAP and Tabular modes, and Reporting is performed by SQL Server Reporting Services (SSRS). The SQL product line doesn’t have much on the client side for analysis apart from SSRS, but this slack is taken up by the analysis tools available in Excel, and through Performance Point services in SharePoint.

Indeed, SharePoint also provides a platform for SSRS via SSRS SharePoint mode, and for Excel based analytical workbooks connected to SQL Server and to SSAS through Excel Services.

On the personal BI side, that role has traditionally fallen to Excel. The pitfalls of importing data into Excel workbooks for analysis are well documented and don’t need to be discussed here, but the bulk of those issues were addressed with the introduction of PowerPivot several years ago. PowerPivot allows for massive amounts of data to be cached within the Excel file for analysis without any data integrity concerns. The addition in recent years of  analytic visuals (Power View, Power Map) and ETL capabilities (Power Query) have further rounded out the offering.

Taking that Excel workbook and sharing it brings us into the realm of Team BI. This is to say that the analyses are relatively modest in size, and of interest to a targeted group. These models may not require the rigour or reliability associated with enterprise BI models. Once again, the technology involved here is SharePoint. A user can take a workbook with an embedded PowerPivot model, share it through a SharePoint library, and other users can interact with that embedded model using only a browser. This capability requires PowerPivot for SharePoint, which is really a specialized version of SSAS, along with a SharePoint service application.

One thing to note about these seemingly disparate approaches is that a power user can build a Power Pivot data model with Excel, share it to a team via SharePoint, and when it requires sufficient rigour or management, it can be “upgraded” into SSAS in tabular mode. This common model approach is powerful, and is key to understanding Microsoft’s entire BI strategy. You can also see here that SharePoint straddles the two worlds of team and enterprise BI.

Moving to the cloud

The BI workload is one of the last Microsoft workloads to move to the cloud, and with good reason. Massive amounts of data present problems of scale, and security or data sovereignty concerns tend to keep data on premises. However, there is a very real need to provide BI to users outside of the firewall.

SharePoint is the hub of BI on prem, so it’s logical to assume that with SharePoint Online, it could continue to perform that function in the cloud. The big catch here is that on-prem, SharePoint is simply the display platform. In the enterprise scenario, users connect through SharePoint to the back end servers. This isn’t an option in the cloud, so enterprise BI was left off the table.

With the personal and team BI scenarios, data is cached in a Power Pivot data model, which could be supported in the cloud. When Office 365 moved to the SharePoint 2013 code base for SharePoint online, rudimentary support for embedded Power Pivot models was indeed added. Essentially PowerPivot for SharePoint “light” was added. I call it light for two major reasons. Firstly, data models could be no larger than 10 MB. Secondly, there was no way to update the data contained within the Power Pivot cache, outside of re-uploading the Excel workbook. This is still true without a Power BI license. The inability to refresh the data renders team BI almost useless, except in static data scenarios.

The first generation of Power BI changed all of that. With a Power BI license, it was possible to install a Data Management Gateway on premises that would connect to team BI workbooks in Office 365 and update them on a scheduled basis. Yes, the gateway had many limitations (many of which have been removed over time), but finally, the on-prem refresh story was solved. In addition, the model size limit was increased to 250 MB. However, we were still left with a number of problems or limitations.

  1. Daily data refresh schedule. Automatic data refreshes could be daily at their most frequent. Manual refreshes could be done anytime
  2. Capacity. The maximum size of a data model was increased to 250 MB, which is relatively small for enterprise scenarios. In addition, refreshes aren’t differential, which means that the entire model is re-uploaded on every refresh
  3. Data sensitivity/sovereignty.  The refresh problem was solved, but because the data is still cached in the workbooks, there can be reluctance to sending it outside of the corporate firewall
  4. Per User Security – Power Pivot data models have no concept of user security in a workbook (tabular models in SSAS do). Security is at the workbook level
  5. Cost. This initial cost of Power BI was $40 per user per month. A power BI license was required to interact with any workbook that had a data model larger than 10 MB. Considering that a full Office 365 E3 license was around $25 per user per month, this price tended to limit the audience for sharing.

All of this is to say that Power BI in its first (and as yet current) incarnation is suitable for personal and team BI only. There has been no enterprise cloud BI story.

Power BI V2

The announcements yesterday outlined the next generation of Power BI. Going forward, Power BI will be available as a standalone offering, at the price points offered above. Office 365 users will continue to be able to use it from Office 365, but Office 365 will no longer be required to use it. In it’s early days, Power BI was a SharePoint app, but a careful examination of URLs in the current offering quickly reveals that it’s actually two apps currently, both running on Azure (not in SharePoint).

If you’ve signed up for the new Power BI preview, you may notice that the URL is http://app.powerbi.com/…… so this move isn’t a big surprise.

With the new model, Excel is no longer the central container. Users connect to data and publish it directly to Power BI. Behind the scenes, the service is doing a very similar thing as what it does with Power Pivot models – it’s storing them in SSAS. In fact, the same limits still apply – 250 MB per model (at least for now) Excel can still be used, but now it is as a data source.

Visualizations are performed through Power Views, and data is acquired through Power Query. These are no longer add-ons, but available on their own through Power BI Designer. This decoupling is good for those that have not made an investment in SharePoint Online, or Excel.

These changes to the architecture and the cost are great news for adoption, but don’t address the needs of the enterprise. Except for one thing – The SSAS Connector.

image

One of the data sources available to the new Power BI is the SSAS data connector. This connector is a piece of code that runs on premises (it actually includes the Data Management Gateway). It acts as a bridge between the Power BI service, and an on prem SSAS server.

The biggest distinction worth noting is that with the gateway, data is NOT being uploaded to the service, it remains on prem. The way that it works is that when a user interacts with a visualization from the cloud, a query is sent to the SSAS server through the gateway. That query is run, and its results sent back to the user’s visualization, and the data is not persisted.

In addition, when the query is sent back to the SSAS it is run with the permission of the user making the request. This is accomplished through the EFFECTIVEUSERNAME feature in SSAS. This provides for full user level security, and since tabular models in SSAS can utilize per user security, we no longer need to rely on proxy accounts/document level security.

Finally, because the data is being stored in an on prem SSAS server, it can be refreshed automatically as often as desired. For the same reason, we have no capacity limits – you can grow your own SSAS servers as large as you like.

The SSAS connector removes most of the limitations that prevent cloud based enterprise Business Intelligence, and the new pricing model removes the rest. Certainly there are going to be feature limits in the near term, but it appears to me at least that the back of this thorny problem has finally been cracked.

Power BI as a Product Today

Recently, I have come across several situations where people are confused about where Power BI fits in a solution scenario. There is a fair bit of confusion as to precisely what the product is and what it does. The problem is that Power BI isn’t really a product at all, but instead a collection of different products and services. Adding to the confusion is the fact that some of these products require a Power BI license, while others do not. In fact some of these products are actually embedded in other products.

Power BI is Microsoft’s cloud based Business Intelligence solution billed as “Self service analytics for all of your data”. In reality, it’s a little more than self service, it also is a great solution for team BI as it’s based on Office 365. That’s all well and good, but what is it really? What does it consist of, and how does it work? If you look at the main product site for Power BI, it’s not immediately obvious at what you get when you purchase it, or what you need to run it. This post is an attempt to demystify the product.

To start, let’s break it down by its constituent components. Today Power BI consists of the following parts.

image

Unfortunately, this can be rather confusing from a product perspective. Looking first at the on-premises components, Power Query, Power View, and Power Map are all Excel plug ins. Excel is therefore a prerequisite for Power BI. All of these add ins also require (or in the case of Power Query, support) the embedded xVelocity data model, and therefore Power Pivot is a prerequisite. Power Pivot is included in Excel 2013 (Professional Plus), but it can also be downloaded for free for Excel 2010.

Also included in Excel 2013 is Power View, and, with Office 2013 SP1, Power Map. Power Query is downloaded separately, but is free. This is where much of the confusion arises. Due to the fact that these three add ins are included in the product definition of Power BI, it is often assumed that a Power BI license is required to use them. It is not. These products have a life of their own, and can be fully (or almost fully) used within Excel without any association with a Power BI license.

Power Query contains a few features that will only work with a Power BI tenant, mostly involved around the creation and maintenance of shared queries. Since this is part of the cloud service, this makes complete sense, but none of the other features of the product are in any way reduced in the absence of a license. Power View is enhanced through a Power BI license, but only because this makes Power View reports available within the mobile client(s). Indeed, Power Map has no use whatsoever of a Power BI license. Power Maps cannot be viewed at all within a browser – they are a client side feature only. In my opinion, they shouldn’t even be included under the Power BI umbrella, but that’s just my opinion.

Thus far, I have been talking about the modelling and visualization creation aspects of the tools, but what about pure consumption clients? The whole idea of power BI is that designers can create these models and users can interact with them. The workbooks containing these models are stored within Office 365, so do casual users need a license?

The answer is of course maybe. If these users are going to take advantage of any of the services specifically offered by Power BI, then the answer is yes. For example, any user can open a workbook in a browser in Office 365. However, if they want to interact with that model, by using a slicer, pivot table, etc, and that model is larger than 10 MB, then the answer is yes. Obviously, if the user wants to use the Power Q&A features, then the answer is also yes.

For the record, I don’t like this answer. To my mind, designers and content creators should require a license, but consumers should not. This would greatly encourage adoption of the product, so I do hope for some changes in this area.

So, precisely what do you get when you purchase a Power BI license? These are the things that you will absolutely need a Power BI license for.

  • Opening workbooks in a browser with models larger than 30 MB on Office 365
  • Interacting with (slicers, pivot tables, etc) workbooks in a browser with models larger that 10 MB on Office 365
  • Automatic refresh of on premises data
  • Sharing of Power Query queries
  • Refresh of Power Query queries
  • Power Q&A – Natural language queries
  • Power BI mobile application

and that’s it.

In fact, if you check out my earlier article “Whither Power Pivot for SharePoint”, you’ll see that many of the features of Power BI are already available in Power Pivot for SharePoint.

To my mind, the product “Power BI” should not include the Excel add ins, but only list them as a requirement, much like Excel itself is a requirement. This would help to reduce confusion. The next version of Power will support their inclusion. If you’re interested in this new version, you can sign up for the preview when it’s ready here. I’ll be writing more about that shortly.

Analyzing Data with Power BI from Beginning to End

I’ve been writing a fair bit lately about specific aspects of Power BI, and how they all fit together. I decided to put together a “real world” demonstration of as many aspects of the product as possible, and this article is the result. I don’t dive too deep on any of the specific areas, but hopefully this walkthrough will help give a sense of the power and simplicity of working with this (these?) tool (s).

In this example, we will explore some publicly available data published by the US National Oceanic and Atmospheric Administration (NOAA). To do so, we will first use Power Query to acquire and transform the data. Next, we will utilize Power Pivot to build our model, and then create a Power Pivot and a Power Map to visualize it. Finally, we will publish the data to a Power BI enabled site where it can be consumed by anyone with a browser or a mobile client.

Data Acquisition and Transformation with Power Query

The NOAA keeps a record from about 800 weather stations around the world. The data collected includes daily maximum and minimum temperatures, rainfall, and a host of other measures. All of this data is available through their web/ftp site here. Well it should be. As I write this, the site is offline, as are most other US government websites. Luckily, I have a local copy of all the files. Feel free to contact me for a copy if they’re not back soon. Update – For the interim, I’ve made the files available here.

The way that the data is provided is via a series of text files. Each text file in a folder represents a single weather station, and each row contains all the data for a given measure for a month. The data are not delimited, they are distinguished by their position in the row, with the data for each day in the month being presented as a separate measure. Normally, a tool like SSIS would be required to transform the data into something that could be analyzed, but as you’ll shortly see Power Query is up to the task.

Unfortunately, Power Query cannot work with FTP sources, but it’s a rather simple matter to copy the files from the FTP site to a local drive, and there are tools that can keep the content synchronized for refresh purposes. Note, if anyone from Microsoft is reading this, I believe that FTP and WebDAV folders would be a great supplement to the Folder data source, particularly when it comes to automatic refresh. Once we have the files locally, we can begin the import process. First, we open a new Excel workbook, select the Power Query tab, and then in the “From File” dropdown, we select the new feature “From Folder”.

image

Next, we select the folder that contains all of the data files. Power Query returns the data for the files in the folder.

image

In our case, the file name does correspond to the ID of the station, but the rest of the file metadata is not that useful to us. However, what is useful is the fact that using Power Query, we can drill into the files. Depending on the requirement, this can be done one of two ways. By clicking on the word “Binary” in the Content column, you will be able to drill into that file, and subsequent operations will be performed on that file. However, if you click on the little drilldown icon at the right of the Content column header, you will drill into all files in the folder. Subsequent operations will be performed on all files in the folder, and refresh operations will pick up any new files.

It should be noted that for this to work, all files in the folder must have the same schema. (In addition to data like this, this feature is particularly useful for parsing and iterating through server log files.) Since we want all station data, this is the operation that we will perform next. What we get back is a column of seemingly randomly structured data.

image

Luckily for us, there is a key to the data structure (found in the file readme.txt if you’re following along). The basic key pattern is:

------------------------------
Variable   Columns   Type
------------------------------
ID            1-11   Character
YEAR         12-15   Integer
MONTH        16-17   Integer
ELEMENT      18-21   Character
VALUE1       22-26   Integer
MFLAG1       27-27   Character
QFLAG1       28-28   Character
SFLAG1       29-29   Character
VALUE2       30-34   Integer
MFLAG2       35-35   Character
QFLAG2       36-36   Character
SFLAG2       37-37   Character
  .           .          .
  .           .          .
  .           .          .
VALUE31    262-266   Integer
MFLAG31    267-267   Character
QFLAG31    268-268   Character
SFLAG31    269-269   Character
------------------------------

In plain English, this means that the first 11 characters are the ID (the station ID actually, from which the country can be extracted), the next 4 characters are the year, the next 2 the month, the next 4 the “element” (the measure name). The next 4 “variables” are the value (5 characters), and 3 different flags (one character each, which we are not interested in) for the first day of the month. This pattern of 4 variables is then repeated 30 more times, once for each day of the month. A value of –9999 represents a null value.

We have some parsing to do.

Luckily, Power Query makes this (relatively) easy. Right click on the column header, select split column, and then select number of characters.

image

Next, for our first column, we set the number of characters to 11, and we tell it to split once, as far left as possible. We’ll use this option for our remaining splits.

image

Click on OK, and we will now have 2 columns. We then right click on the leftmost column, and rename it to ID.

image

Next, we right click on our rightmost column header and repeat the process (using 4 characters this time), this time renaming it to Year. We then do the same for the month (2 characters) and element (4 characters) columns. Once this is done, we can click the dropdown on the elements column. Since we are only interested in the maximum and minimum daily temperatures, we deselect all, and reselect these two elements. This will limit the amount of data that we will pull into our model.

image

At this point, it is also worth noting the steps window on the right hand side of the query editor. This shows all of the transformation steps that have occurred to get us where we “are”. Any of these steps can be deleted, which is very useful if a mistake is made, and all subsequent steps will be affected by it. Clicking on a previous step takes you back to that point in the development of the transformation as well.

Once we have filtered our data, we continue on with the parsing. We then split off the next 5 columns, and rename the column to “1”. The next three columns we don’t need, so we first split them off (3 characters) and then delete the column by right clicking and selecting “Remove”. We then repeat this, renaming the value column to 2. Finally, we continue repeating this process until the value for 31 is complete. This is a tremendously tedious process, but it works rather well. I’ll make another feature request here for anyone in a position to do something about it. In the Power Query editor, it would be nice to be able to select a number of steps and either copy them forward, or be able to repeat them x times. This would save a great deal of time in these situations.

Its also a good idea to give the query an understandable name, so we right click on “Query 1” and rename it to “Temperature Data” .

image

As mentioned above, the figure –9999 represents a null value. Well, we know this, but the model won’t. At this point we need to convert these to proper nulls. The way that this is done is to select all of the numbered column headers, right click on them and then choose “Replace Value”. We replace the value –9999 with the word null.

image

Now that we have our nulls, our data is good. However, from an analysis perspective, the data is hardly in the best shape. What we need is to is to have each day’s data represented by a row, not a column. Once again, Power Query provides a method for us to accomplish this through the unpivot function. Simply select all of the numbered columns, right click, and choose “Unpivot Columns”.

image

Instantly, the numbered rows are transformed into rows, with the column titles in a column named “Attribute”. Simply rename it to “Day”, and we’re ready to import our data. To do so, click “Done” and Power Query will begin the extract, transform, and load process. In this case, it takes quite some time to iterate through all of the files (about 15 minutes) so be patient.

While the data is loading, you will need to make some adjustments to your Query Settings (to the right of your Excel window). Specifically, you’ll want to turn off the “Load to worksheet” option. As I’ve discussed previously this is important to keep the size of the actual worksheet small. in this case, it’s also important as the sheer volume of data exceeds Excel’s ability to store it. When complete, your worksheet should appear as follows.

image

You can see that there’s a fair amount of data here, about 31.5 million rows. Did I mention that we’re working with Excel here? We’ve disallowed loading to the worksheet, so at this point, we need to load into the embedded data model. Do that by clicking on the link at the bottom of the Query Settings screen.

There is also some more supporting data that we want to bring into our model. The following files need to be imported:

  • ghcnd-stations.txt – contains metadata for all of the different weather stations
  • ghcnd-countries.txt – contains metadata for all participating countries
  • ghcnd-states – contains metadata for US states and Canadian provinces

The keys to these files are also in the readme.txt file, and we bring in their data using the same procedure that we used above for the actual weather data, with the exception that we use the “From file” data source. Finally, we give each query a logical name.

The beauty of all this is that anytime we wish, we can refresh the data. Any new or changed data will be picked up and added to the model following our transformation.

Model Editing In Power Pivot

Once this is complete, we are ready to edit our model. We do so by selecting the Power Pivot tab, and clicking “Manage” in the ribbon. First, we work on the actual temperature data tab. The temperature values reported are in tenths of a degree Celsius. This doesn’t mean much to most people, we should at least report the data in degrees C. To do this, Click on Add Column, and add the formula [Value]/10. Rename the column to Temperature.

We always want to be working with temperature averages. To do so, add a calculated measure named Average Temperature (C) (in the calculation area at the bottom). Its formula should be Average Temperature (C):=AVERAGE([Temperature]). We no longer need the value column or the temperature column for analysis, so we need to hide them from client tools. To do so, select both the Value header and the Temperature header, right click and select “Hide from Client Tools”. Finally, we set the formatting of our calculated measure to be Decimal Number.

image

Next, we want to establish a relationship between the data and the country table. We don’t have a column for country, but according to the data key, the first two characters of the country code correspond to the country code. We can therefore add another calculated column with the formula LEFT([ID],2). We then rename the column to Country Code. We only need this column for the relationship, so we can hide this from client tools as well.

image

To establish the relationship, change to the diagram view in the ribbon. Drag the Country Code field from the Temperature Data table onto the CountryCode field from the Countries table. While we’re in this view, we can drag the ID field from Temperature Data onto the ID field of the stations data. this will allow us to take advantage of the location of each of the stations. Finally, drag the State column from the Stations table onto the State Code column of the States table. At this point, we can go ahead and hide any unnecessary columns from client tools.

image

Our model is ready. At this point, we can move forward with analysis.

Create a Simple Power View Report

We want to create a simple Power View report to display temperature trends over time, Move back to the Excel Workbook and click the Insert tab in the ribbon. Then select the Power View button – this opens up a new Power View design surface. We won’t be needing the Filters window, so we can close that.

Right away, if you look at the Power View Fields window, you will notice that every table is duplicated – one with a “filled in” title bar, and one without. This window is displaying both defined Power Query queries, and the table contained within the model (Power Pivot). The “filled in” one is the one that we want to be working with. Using the other will cause duplicate tables to be created in the model – something that we obviously don’t want. This is due to an issue with the preview for Power Query and should be fixed in the near future.

We’ll be working with “Average Temperatures”  as a measure so from the start, we’ll pick it and “Year” from the Temperature Data table. Year is a number, so Power View will want to total it by default. We don’t want this, so in the Fields selector, select the dropdown for year and select “Do Not Summarize”.

image

Next, resize the table on the design surface so that it fills the width of the report. Then, with the table selected, from the Design tab in the ribbon, select “Other Chart” and “Line”

image

What we get is a graph of overall average of all recorded temperatures starting around 1840. It might be more interesting to see the maximum and minimum temperatures, so to do that, just select the “Element” field, and Power View will add it to the Legend, showing both values, maximum temperature in blue, minimum in red. It would be nice if we could switch those colours, but unfortunately, all you can do at the moment is to select a different theme, none of which start with red.

Next, it would be nice to be able to see this data by individual reporting country. We can do this by first displaying a chart showing the overall average temperature per country. We click on the design surface outside of the temperature chart, and select the ”Country” field from the Country table, and the “Average Temperature” field from the Temperature Data table. Again,we resize the table to the width of the report, and then select “Column Chart”, “Clustered Column from the Design tab. We then get a scrollable list of countries with their overall average temperatures.

We could sort this by temperature, but since it will service as a picker, alphabetical makes more sense. Being from Canada, I’m naturally interested in our weather patterns, so I click on the Canada data bar (a negative value…shiver). Immediately the report above is filtered by Canadian data.

image

It seems that this whole climate change thing isn’t working out as well as I’d hoped….

Finally, we want to give the report a good title, and clean up some formatting. After we do this, we can move on to the next step, creating a Power Map report.

Create a Simple Power Map Report

Creating a Power Map is similar to a Power View. From the Insert tab on the ribbon, select Map, and then “Launch Power Map” from the dropdown.

image

At this point a new window will be launched where you can build your Power Map report. What we want to do is to show the relative temperatures for each country using a colour scale from low to high, and to show the average maximums and minimums for each recording station. To start with, for our first layer, we are going to use Country from the Countries table for our Geography data, so we select it. Every country with data will get a small point on the map.

image

Clicking next, we then get to select our data for the first layer. The data that we want to map is Average Temperature, So we select it. The resulting column visualization makes the earth look like a pincushion. What we want is to fill in the country boundaries with relative colour values. To do this, we select the  region type for the visualization. This is better, but it fills the regions with shades of blue from light blue to dark blue for the warmest averages. It’s far more typical for red to represent heat, and we can do this by first selecting the settings icon.

image

We can then select a colour palette for our data, and red is a more logical color. Now you can spin the globe around to check out your next vacation hotspot. However, we want to get more specific than country, we also want to plot details for each reporting weather station. First, we need to add a layer. To do so, click on the Layers icon and select the add layer icon.

image

We then repeat the same steps that we did for the first layer, only instead of selecting country, we select Latitude and Longitude for the geography fields. After clicking Next, we leave the report as a Column type, and select Average Temperature as the height. Next, we select Element as the category option. We want the report to use clustered columns, so we now select that option to the right of the category title.

image

Finally, we want TMAX to be red, and TMIN to be blue, so we again click on the settings icon for this layer, and set the colours appropriately.

image

From this, it’s easy to see why Canada’s numbers are so low – there are a disproportionate number of stations in the far north! We’re now ready to share our report with Power BI.

Sharing Reports With BI Sites

Sharing your report is almost as simple as saving to a BI Sites enabled Office 365 library. We can either upload it directly, or save it to a SkyDrive Pro enabled folder, which is what I do. At this point, the file is 100 MB. On a side note…. this Excel file, that contains over 30 million rows of data is only 100 MB in size. That’s impressive. After the file appears in the O365 library, we can attempt to open it. However if we do, we’ll get a message saying that the file is to big to open in the browser.

That would be true in a regular Office 365 library, but it isn’t too large for Power BI. The problem is that first, the file needs to be enabled. More information of file size limits and enablement can be found here. To enable the workbook for Power BI, first navigate the the Power BI app, locate your file (it will be showing a small caution symbol), click the ellipsis, and select “Enable”.

Enabling the file will perform a few operations:

  • The model will be extracted from the file and stored elsewhere, drastically reducing the file size
  • A thumbnail is extracted from the file and used in the tile

It should be noted that if the file is uploaded via the Power BI app, the saving and enablement steps will be performed automatically.

Clicking on the file, either here, of within its actual library will now open the file, and we can also interact with the model (in this case by clicking on the data for Bermuda).

image

At the top, you’ll notice a warning message. It is letting you know that although this workbook can be viewed in a browser, and interacted with, it cannot be edited in a browser. This is because it is beyond the maximum edit size limit of 30 MB. You may also notice that we are using an HTML5 version of our Power View report. By default, Silverlight is used, but if you click on the new icon on the bottom left of the report, HTML5 will be used. It will also be used if you have a system that does not support Silverlight. This is true anywhere in Office 365, not just in Power BI enabled sites.

Finally, you may also notice that our Power Map report is missing from the equation. This is because for the moment, Power Map is not supported in the browser. A workaround for the time being is to export the Power Map report to an MP4 video file, and store it separately. MP4 files can be rendered in a browser, and on mobile devices.

Viewing In the Power BI Mobile Apps

Speaking of mobile devices the Power BI app for Windows 8 is currently available in the Windows store, with a version for iOS coming shortly. Once installed and run you’ll be shows a couple of sample reports from a demo site. To navigate to your own site, right click anywhere, or swipe up from the bottom, and select “Browse”.

image

On the next screen, we can select from our registered sites. If our site has not yet been registered, we can do so by again right-clicking, or swiping up. Then, we can enter the URL of your site. The URL to be entered here is that of the parent site of the library in which your report is stored. Once we load from this site, it will be automatically registered.

The next few screens allow us to navigate down to the file in question. Once the file appears in the list, selecting it (not opening it) gives us the option to save it as a favourite. If we mark a report as a favourite, it will appear on the home screen whenever the app is started.

image

Finally, opening it will open the first names object in the file. You can navigate to other named objects by right clicking, or swiping down from the top.

image

As you will notice, all names objects in the workbook appear by default. We can control what appears through the Excel publishing options, as outlined here.

As you can see, we have just taken a very large amount of publicly available data, transformed it and analyzed it within Excel, and published it out for everyone in the organization no matter what device they are using. I haven’t even gotten into the Q&A piece – primarily because it’s not available yet for custom data sets. I plan on updating this article when it is.

Power BI is powerful indeed.

Power BI – What Is It, And Why Does It Matter?

Power BI for Office365 was first shown to the public last week at the Microsoft Worldwide Partner Conference. Power BI is in my opinion, the most significance advance that Microsoft has made in the area of Business Intelligence in the cloud, and it significantly advances their offering in personal Business Intelligence. I’ve been evangelizing Microsoft’s recent personal and team BI features for some time now, and this announcement builds on that momentum. The demonstration raised a lot of eyebrows, and showed off some of the great strides that Microsoft has made in this space. One little gem in there that was easy to miss was the fact that we got our first glimpse at the HTML 5 version of PowerView – at least the renderer. It also raised a number of questions as to how it works and what will be necessary to get it working optimally. I had the opportunity to speak with a project manager and managed to answer a few of these questions. This post is attempt to explain what this product is all about.

Overview

Firstly, if you haven’t already seen it, you need to see the demonstration that was done by Amir Netz during the Day 1 keynote. It’s really one of the most compelling demos that I’ve ever seen. I include it below.

Amir Netz Announces Power BI at WPC 2013

 

Microsoft has provided a web page to describe the product and let you sign up for a preview. It also contains links to download some of the constituent parts, but it doesn’t necessarily do a great job of explaining exactly what you’re signing up for, or what these downloads are for.

To start with, Power BI requires Excel 2013. This is because all of the constituent components rely on the xVelocity (PowerPivot) engine that is included with Excel 2013. Make no mistake, as far as Microsoft is concerned, the future of multi-dimensional analysis is with the xVelocity in-memory engine. This engine isn’t new – it’s what powers Power Pivot (note the space between the words Power and Pivot… that’s also new).

The “Power” branding I think is encouraging as well. The business intelligence offerings from marketing have been confusing to say the least. Check out this handy “quick reference guide” if you don’t quite understand what I mean. There’s a very large assortment of tools that come from two different product groups (even after the reorg) and it can be difficult and confusing to navigate. Marketing hasn’t made this any easier in the past, but the consistent “Power” prefix I think goes some way to remedying this.

Power BI itself is the culmination of several different projects – Data Explorer, GeoFlow, BI Sites, and work in the mobility space. The first two of these have been in the public for a little while now, and were given their release names this week – Power Query, and Power Maps respectively. In addition, Power Query reached its General Availability milestone, making it acceptable for production use. The BI Sites project has been a very well-kept secret, and last week was the first time that it has been seen in public. Finally apps were shown for both Windows 8 and iOS that should go a long way to address the deficiencies in the mobile space. Unfortunately, nothing was said about Android or Windows Phone, but I have to think that they’re not far off.

Given that there are several components to Power BI, I think that it’s worth outlining each one and how it will be used. To start with, Power Query can be used to discover, acquire and transform source data and load it into an xVelocity based data model in Excel. It can also register the data connections through a “data steward” service that will reside in Office 365 – this will allow for tenant wide discoverability of the data. Power Pivot will then be used as necessary to modify the model, create calculated measures, format columns, etc. Data analysis can then be performed in Excel, using pivot tables, charts and/or Power View. In addition Power Map can be used to visualize geospatial data on a rotatable, zoomable 3D map. Once ready, the workbook will be published to an Office 365 document library, where a BI site will find it. BI sites can then be used to perform natural (English) language queries on the data, add to the mobile applications, and to schedule data refreshes whether the data is public or private. Finally, the mobile apps can be used to consume the reports from the BI Site.

Power Query

Power Query went into general release last week along with the Power BI announcement, and you can download it here. It has been available in preview form as project “Data Explorer”. It’s an add in to Excel 2010 or 2013 that gives it Extract, Transform and Load (ETL) capabilities. I like to call it the personal version of SQL Server Integration Services, in the same way that Power Pivot is the personal version of Analysis Services, and Power View (to a lesser extent) is the personal version of Reporting Services.

As you might expect, it can pull data from a wide variety of sources. Web pages can be parsed, and tables scraped to provide tabular data. The usual suspects are all supported (SQL Server, Oracle DB2, etc.), as well as any OData feed, Azure Data Market, SharePoint lists, Hadoop and HDInsight, Active Directory, and Facebook. One of the more interesting data sources is a folder (not a file). If you point Power Query at a folder, it will merge the data from all of the files in that folder into a single result. Refreshes will pick up any new files as new data. I can see some real uses here for log files. Yes, the files do all need to be the same schema.

The killer feature for discovery here is the online search. In preview, online search gave you access to all of the Wikipedia data that has been expressed in tables. Now however, you not only have access to more public data sources, but you can publish your own data sources (on premises or otherwise) to a catalogue that resides in Office 365. Below is a screen shot from Amir’s demo.

image

Clicking on Online Search in the ribbon brings up the window on the right, where you can enter a search term, in this case “Microsoft Azure Datacenters”. You are then presented with a list of candidates, both public and private. Hovering over one of them will give you a preview of the data to be returned, and when satisfied, you can select Add to Worksheet, or if you would like to transform the data click Filter and shape, where you can modify data types, flatten relational data, merge or split fields, etc. Despite the “Add to  Worksheet” button name, the data can be loaded into the worksheet itself, the back end data model, or both. This distinction is  very important when working with very large data sets ( i.e. > 1million rows).

You can also see from the result set that the data is coming not only from Wikipedia sources, but from the Azure Data Market, and from On premises data sources. What the demo didn’t make clear was how those data sources get into the catalogue. Power Query is the answer here as well. When BI Sites are implemented in Office 365, Power Query will expose some additional UI elements. that will let you publish the query and transform that you have built into the enterprise catalogue. In fact, when you look closely at Amir’s demo screen, you will see these elements.

image

These don’t show up on a standard Power Query install

image

The transformation capabilities of Power Query are really quite impressive, but a detailed description is beyond the scope of this post. They can be seen in the version available for download. As an aside, models built with Power Pivot can be moved directly into Analysis Services. Given the relationship that I mentioned above between Power Query and Integration Services, I wonder if it’s on the roadmap to allow Power Queries to be brought in to SSIS. On the surface, it would seem to make sense, at least from where I’m standing.

Power Map

Power Map is the other component that’s available today, although as of this writing, it’s still in preview form. You can download it from here. Unlike Power Query, Power Map requires Excel 2013 – it won’t run on 2010 at all. Power Map has been available since last fall in the form of a preview as code name “Geo Flow”. It didn’t make it to the main stage in the announcement, but it’s a pretty impressive piece of technology.

What Power Map does is allow you to take a data model built with Excel and/or Power Pivot, and plot the data on the surface of a 3D map. The map can then be rotated and zoomed, and animations created around different views or time dimensions of the data. It wasn’t shown at the announcement last week, but below is a Power Map visualization of Hurricane Sandy data.

SandyImage

This was put together simply by plotting the location of the measurements, the pressure on one layer as a heat map, and the wind speed on another layer as columns. Storm category was used as the category to color the columns appropriately.

There are a number of limitations to Power Map currently, the big one is that it doesn’t yet work at all in SharePoint – it’s Excel only (and not Excel Web App). Given that fact, it’s really just a sort of one off visualization/presentation tool, no matter how cool it is. However, we did just see our first glimpse of an HTML 5 Power View viewer… could an HTML 5 Power Map viewer be far behind?

BI Sites

To my mind, BI Sites is the most exciting part of Power BI. Here is where we find the most significant advances yet in both Microsoft’s mobile and cloud BI strategy. Until now, the cloud strategy was hampered completely by the inability to keep data refreshed automatically, and mobile devices could settle only for mobile web approaches. With BI sites, we see not only a great first step into these areas, but a solid infrastructure to expand upon. Unfortunately, at the point of this writing, it’s not available. You can however sign up for the preview program here.

BI Sites is a SharePoint App, pure and simple. This fact was not even mentioned during the announcement, but I feel that it’s vitally important to understanding how it works, and what its limitations are. Once created, the app reads Excel content from your Office 365 tenant, and provides a great number of features. Let’s quickly run through a few of these features.

Data Refresh

The Achilles heel of using Office 365, at least for me, has been the inability to schedule data refreshes. The latest 2013 wave brought support for interacting with PowerPivot enabled workbooks, which was great, but data refreshes had to be done manually, and the results re-uploaded. On premises, PowerPivot for SharePoint can be used to perform this function, but PowerPivot for SharePoint is not available in Office 365. Since the data can exist both on premises, or in the public, how does a cloud based service refresh data from inside your firewall?

The answer is that you will install a service, a sort of “data gateway” somewhere in your environment. This gateway will manage the refresh of all the on premises data sources and push updates out to Office 365, NOT the other way around. This will not require any firewall configuration, and will not depend on App Fabric, or any other relatively complex infrastructure.

Once configured, the automatic refresh each individual workbook can be configured separately, as can be seen below.

image

 

Report Gallery

One of the things that you notice when you enable PowerPivot for SharePoint is a new library template, the “PowerPivot Gallery”. When you use this template, you will notice that you get a completely different UI for the library that is Silverlight based, and gives you previews, and access to data refresh capabilities. This is again absent in Office 365 due to the lack of PowerPivot for SharePoint. However BI Sites, gives you the same functionality, in a (presumably) HTML 5 based experience.

image

HTML 5

Power View is a self-service analytical reporting tool that was first introduced in SQL Server 2012 Reporting Services, and subsequently added to Excel 2013. It allows users to quickly analyze data from an embedded model, and is supported within SharePoint. It’s a great tool to get answers very quickly, but one of the concerns with it is the fact that is built with Silverlight. This limits is use, particularly in the mobile space, as Silverlight is not supported on iOS, Anroid, or (surprisingly) Windows Phone.

BI Sites show us the first version of a PowerView renderer based on HTML 5, which should work on all devices. When the workbooks are opened through the report gallery, you will see the Power View reports, but they are rendered in HTML 5. I suspect that we’ll continue to see Silverlight being used in the designer in Excel, but the rendering through BI Sites is done with HTML5, which opens up the world of mobile.

Data Catalogue

I hesitate to include this in this section, as it could just as easily be included in the Power Query section, but since it required the BI Sites capability in Office 365, I include it here. The Data Catalogue is that service that allows you to register data sources from Power Query. The catalogue lives in Office 365, and is managed from there.  Data sources in the catalogue can be discovered by Power Query users through the “Online Search” button.

The catalogue also supports features that help support the natural language features of BI Sites. These features are unclear to me at this point, but I will be sure to expand upon them once I get my hands on it.

Finally, to be sure, “Data Catalogue” is what I’m calling it in the absence of any official nomenclature from Microsoft.

Natural (English) Language Query

Natural language query was a very well kept secret. I had no idea that it was being developed until I saw it on stage. Natural language query allows a user to interrogate a data source almost as if they were performing a full text search. Based on their query, Power BI selects an appropriate visualization to display the results.

image

As you can see above, 1 shows that we’re interrogating a data model contained in the “Sales Reporting and …” workbook. 2 shows the query entered by the user in relatively plain English (no Select statements!), and 3 shows how Power BI has interpreted the natural language query. Finally, it determined that a tabular display was the best visualization to use for this result. However, many are available, and it uses the visualizations available to HTML 5 PowerView to display them. Number 4 shows other visualizations that may be relevant to the result set, and all you need to do is to select them to see them.

A few more results can be seen below.

image

image

image

New Visualizations

Any user of Power View will be familiar with the visualizations shown above. However, the one that brought the biggest applause at the announcement was the “king of the hill” visualization. This shows you a “winner” and “contenders” over a period of time, based upon a defined criteria. When applied to pop music over time, it looks something like this.

image

To get the full effect of this, I highly recommend watching the video starting at about 9 minutes in. Of course, this is a highly specific case, and I have to think that more interesting visualizations are on the way.

Limitations

As with most products, Microsoft giveth and Microsoft taketh away. Without having my hands on the full product, I have no hope of coming up with anything like a comprehensive list of limitations, but there are two that jump to mind immediately.

Power BI is limited to Office 365 only. That’s right, there will be no on premises version of this, at least not at launch. From a technical standpoint, I can’t see any reason why this must be, apart from either resourcing limitations within Microsoft, or strategic direction (or a combination of both). It remains to be seen whether Power BI is a “cloud first” product, or a “cloud only” product. Time will tell. In the meantime, those organizations that are still leery of cloud computing may miss out, but there is still a lot of goodness in the on premises offerings.

BI Sites is a SharePoint App, not a container. The workbooks themselves are still stored within document libraries in SharePoint. the default file size limit for Excel Services is 10 MB, and for SharePoint itself is 50 MB. It’s unclear to me if BI Sites relies on Excel Services (I assume that it does), and if so, your data models have a hard constraint of 10 MB. Even if Excel Services is not the bottleneck, they won’t be any larger than 50 MB. That is simply too small for some models, and will leave them out in the cold. I was told at WPC by a Product Manager that this is changing, but specifics are currently unavailable.

Mobile Apps

The final component (that is still unavailable) of Power BI is a collection of Mobile BI apps. Reports are selected in various BI sites and flagged for mobile use. Once flagged, they appear in the mobile apps and can be interacted with directly.

8358.Mobile BI.png-550x0

As an aside, I borrowed the above image from the Office 365 technology blog. Anyone that has attended my BI presentations in the past year will be familiar with the first data set in the app, a topic near and dear to my heart.

Apps were initially announced for Windows 8 and for iOS. No mention was made of Android or Windows Phone (?!?!), but I have to think that that isn’t far behind. Blackberry? I have no idea, but I wouldn’t hold my breath.

To finalize, I’m very excited about several of the new capabilities that Power BI brings, and the doors that it now opens. It’s another big step in moving Business Intelligence from the realm of the data gurus, and into the hands of those that can actually benefit from it. Once it is available to play with it, I intend to focus on it more thoroughly.