Tag: Power BI

Power BI Data Management Gateway 1.2 Changes the Game

Last week, a new version of the Power BI Data Management Gateway was released. If you’re unfamiliar with it, it is the Power BI component that allows workbooks stored in the cloud to be refreshed on a regular basis with data that exists on-premises, or otherwise outside of the hosting data center.

I’ve been using the gateway since its initial availability in preview form, and in my opinion, this is the most significant functionality change yet. Until this release there were a grand total of three possible data source types that could be refreshed. With this release, the total increases to 18 by my count (you could argue 22, but that’s plenty).

With past versions, I would have written up a quick post on how to configure it, but that has already been done, along with the complete list of supported data sources and a helpful video, in this blog post by the Power BI team. In addition, a very comprehensive (though, amazingly, already in need of an update) white paper on hybrid data scenarios has just been published by Microsoft here.

The big change here is that this multitude of data sources is supported only for data retrieved by Power Query, and NOT by Power Pivot natively. There is absolutely nothing wrong with this, but it does require us to change our approach to using the Data Management Gateway a bit.

As I first mentioned in a post almost a year ago on Using the Data Management Gateway, and in a number of posts since, data connection strings needed to line up with Power Pivot connections. At the time, the only sources supported by the gateway were SQL Server and Oracle, and Power Query wasn’t supported at all. Version 1.1 of the gateway brought Power Query support, but only for those two data sources. With this release, Power Query support includes not only all of the new data sources, but also the three original Power Pivot data sources (note: Power Pivot data connections can be found in the Power Pivot add-in UI, while Power Query connections are available on the Data tab in Excel).

As of this release of the Data Management Gateway, there is almost no reason to use native Power Pivot connections any longer. My recommendation is therefore that unless there is a good reason for not doing so, you should try to use Power Query for all data acquisition tasks. It is quite clearly the way forward, and will only gain in supported capabilities. My suspicion is that Power Pivot connections will be retained for backward compatibility reasons only.

With that said, there are a couple of good reasons for using Power Pivot connections directly. One of these is when your data source is itself in the cloud, whether it is SQL Azure, SharePoint Online, or Project Online. With these data sources, a Data Management Gateway is not required for refresh to work from an embedded Power Pivot connection; however, if Power Query is used to access the same sources, it is.

What this means is that for Power Pivot connections to these sources, a refresh allows the Power BI service in the cloud to directly access these data sources in the cloud. However, because ALL Power Query connections require the Data Management Gateway for refresh operations, a Power Query refresh operation will require all of the data to be first downloaded to the on-premises gateway, and then sent back up to the Power BI service in the cloud. While functional, this is hardly the most efficient approach.

Apart from this one small caveat, this version of the Data Management Gateway points the way forward. Additional data source support should come fast and furious, and the Power Query focus means that we can start to rely on its powerful transformational capabilities without having to sacrifice refreshability (if that’s even a word….).

Changes to Data Loading Features in Power Query – May 2014

The Power BI team continues to deliver new features at a rapid pace, and the May 2014 release of Power Query is no exception. There are quite a few new features packed into this release, which you can read about at your leisure, but I’m particularly interested in the ones pertaining to data loading, as I’ve discussed several of the limitations in this area in the past.

This release is major indeed. There are three significant changes to the data loading features in this build.

Configurable defaults for data loading

I’ve just posted an article on how to do this and why. To put it simply, the default data load behaviour is to load data into a worksheet in most cases. This leads to larger than necessary workbooks, to the point where they may not work properly with Power BI. Now you can change this default behaviour, which will be welcome for anyone doing serious data modeling.

Worksheet Size Warning

If you do decide that you want to load data to the worksheet, or you’re simply unaware of the issues, you will be prompted to consider loading to the data model once your data hits 10 MB, which is the maximum non-model workbook size in Power BI. Prior to this update, the user wouldn’t know that there was a problem until after they tried to use the workbook in Office 365.

Data Model Preservation

Prior to this update, it wasn’t possible to modify a query without losing any changes made to the data model, or any formatting applied to worksheet tables. This release of Power Query remedies this situation. You can now go back and make changes to your query without having to recreate the model.

Power BI is being built in a surprisingly collaborative way, with a large amount of input from the community. I’ve never seen this done to this extent at Microsoft, and it’s very good to see. I know that all of these features have been asked for and discussed by the community in the last few months, and here they are. Kudos to everyone involved, and keep it up. This product keeps getting better and better.

Set Your Data Load Defaults in Power Query

When using Power Query to load data from a data source, the user has three choices as to where that data can be stored. It can go into a worksheet in the workbook, into the data model, or into both. The default behaviour (and therefore the one that will be used most commonly) is to load into a worksheet when one table is being imported, or into the data model when multiple tables are selected.

The reason for the load to worksheet default is understandable, as casual users will expect to see the data immediately after it is imported – they won’t want to open the PowerPivot model editor (which is itself turned off by default in Excel 2013). However, loading to the worksheet introduces quite a few limitations, as I’ve described (OK, whined about) mostly here, but also here and here. These limitations mostly revolve around the million row limit in Excel, the fact that data is stored in the model much more efficiently, and the file size limits when working with Power BI.

The great news is that, as of the May 2014 build of Power Query, these data loading defaults are now user configurable. If you’re taking the time to read this post, chances are that you should set them.

The defaults apply across Power Query, and are therefore located in the Power Query options. Once Excel is open, go to the Power Query tab and click the Options button.

Now, in addition to joining the Customer Experience Program, you can set your data load settings. I always import only to the model, so I select the custom option and check Load to Data Model.

I can’t be completely sure, but in my limited testing, it appears that this option does not sync between instances of Excel on different machines the way that other (non add-in) settings do. You will therefore need to set it for every Excel installation that you have. Now if only there were a Group Policy setting that could be pushed out….

Kudos to the Power Query team for implementing this – it’s a welcome addition to the product.

How to Refresh Data Models in Office 365 from On Premises Using Nintex Workflow – A Hybrid Approach

One of the viable use cases for Power BI is to provide reporting to a mobile workforce, even when the organization has invested in BI on premises. In this scenario, there may be a SharePoint farm using PowerPivot for SharePoint with multiple workbooks connected to a myriad of data sources, including SQL Server Analysis Services cubes and models.

The Problem

In this scenario, the workbooks required by the mobile users can be copied up to the Office 365 tenant and consumed from there. However, how is the data kept current? One way to do this is to set up the Power BI Data Management Gateway (DMG). The DMG is a great solution, but it is relatively new and has a few inherent limitations at the moment. Chief among them is that data can only be refreshed from SQL Server or Oracle data sources; for the moment at least, Analysis Services is left out in the cold. PowerPivot for SharePoint, however, doesn’t share these limitations, and has no problem refreshing data from a wide variety of sources. Unfortunately, if you’re using Power BI and need to refresh from an unsupported data source, you’re out of luck.

Keeping a cloud copy of a local workbook carries the inherent problems of managing two copies of the same workbook. Every modification must be performed twice, and there is always the chance that something will get missed. Ideally, for this scenario, we should have a publishing mechanism, and we can use Nintex Workflow (on premises) to provide one. Whenever the model is refreshed, the workflow will fire and copy the file up to Office 365 to update the model there.

In order to implement this solution, you’ll need a copy of Nintex Workflow. The low end (Team) edition is fine, but what we’re after is the “Office 365 upload file” action, which is available through the Nintex Live action set.

The Solution

To start, we’ll work with an Excel workbook that is using an embedded model created from the standard Contoso Data Warehouse. In this example, we’ve built a simple pivot table to interact with the model. The model has been published to a PowerPivot gallery, and the refresh schedule has been set. Once we’ve tested the refresh to ensure that it’s working, we can create the workflow. To do so, we go to the workflow options section in the library ribbon and create a new Nintex workflow.

We then select the Blank template. Once presented with the design canvas, we select Workflow Settings, give it a good name, and set the workflow to run on create and on edit.

Once the settings have been saved, we can start to build the actual workflow. The first action will be a “Pause for” action. Find it in the Workflow Actions toolbar and drag it onto the design surface. Select Configure from its drop-down menu, and set it to pause for a period of time, in this case 5 minutes.

The amount of time that we need to pause depends on how long our model takes to refresh. When the refresh operation begins, it “edits” the document, which kicks off the workflow. We don’t want to upload the workbook until the refresh operation is complete, hence the pause. In this case, we are working with a data model that is about 100 MB in size and comprises about 3 million rows of data. In this environment, it requires about 2.5 minutes to refresh, so a 5 minute pause is sufficient.

Next, we need the “Office 365 upload file” action. This action is one of the Nintex Live actions that can be added to the Nintex environment at no extra charge. If it hasn’t already been added to the toolbar, we need to add it. Assuming that Nintex Live has been enabled for the farm, we browse the available actions through the Catalog, which is available through the designer ribbon.

Once in the catalog, we can browse the available actions and click the Add button to add them to the toolbox. The action that we need is named “Office 365 upload file”. Once at least one action has been added, a new section named “Nintex Live” will appear in the Workflow Actions toolbar. We then open that section, drag the Office 365 upload file action onto the design surface after the pause action, and configure it.

Walking through the options:

  • File to upload: We want the current file to be uploaded to Office 365, so we select Current Item
  • Destination site URL: This is the URL of the site itself, i.e. everything up to, but not including, the document library
  • Folder path: This is the name of the destination document library and, if appropriate, any subfolders
  • File name: We can create or derive a new name for the file, but in this case, we use the same name as the source file
  • Overwrite existing file: We select this option, as we will be updating existing content
  • Fields: We can add additional metadata values if we wish, but in this case, we leave it empty
  • SharePoint Online URL: This is the URL of the root of the Office 365 tenant for the destination
  • Username and password: These are for a proxy account that will be used to upload the file; secure workflow constants can be used
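
For the curious, here is roughly what this action is doing under the covers. The sketch below is an illustration only, not the Nintex implementation: it uses the SharePoint client object model (CSOM) from PowerShell, it assumes the SharePoint 2013 client components are installed, and the site URL, file path, and proxy account are all placeholders.

    # Illustration only: approximates what the "Office 365 upload file" action does.
    # Assumes the SharePoint 2013 client components (CSOM) are installed locally.
    Add-Type -Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\ISAPI\Microsoft.SharePoint.Client.dll"
    Add-Type -Path "C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\15\ISAPI\Microsoft.SharePoint.Client.Runtime.dll"

    # Placeholder tenant site, proxy account, and file locations
    $siteUrl  = "https://contoso.sharepoint.com/sites/bi"
    $password = Read-Host "Proxy account password" -AsSecureString
    $ctx = New-Object Microsoft.SharePoint.Client.ClientContext($siteUrl)
    $ctx.Credentials = New-Object Microsoft.SharePoint.Client.SharePointOnlineCredentials("proxy@contoso.onmicrosoft.com", $password)

    # Upload the workbook, overwriting any existing copy (the $true argument)
    $stream = [System.IO.File]::OpenRead("C:\Models\ContosoSales.xlsx")
    [Microsoft.SharePoint.Client.File]::SaveBinaryDirect($ctx, "/sites/bi/Shared Documents/ContosoSales.xlsx", $stream, $true)
    $stream.Close()
    $ctx.Dispose()

The Nintex action handles all of this, along with secure credential storage, for us, which is the point of using it.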

Once we configure the action and save it, our workflow is complete, and we are ready to publish it. To do so, select the Publish button, then close the workflow designer.

We are now ready to test our workflow. We could force the workflow to run on demand for our workbook, but a better test is to let the refresh trigger it. The most frequent schedule that PowerPivot for SharePoint allows for refreshes is once per day, but there is a way to force a refresh to happen on demand.

To force a refresh, we need to edit the refresh schedule for the workbook. Open up the refresh history screen by selecting Manage Data Refresh for the workbook. 

From the Refresh history screen, select the Configure Schedule link to edit the schedule. This will allow us to edit the existing schedule, but more importantly, it will also let us force a refresh by selecting the “Also refresh as soon as possible” option.

If we are using a specific account as a credential, we will also need to re-enter that. Once complete, selecting OK will queue the refresh job for execution. The refresh job will happen the next time that the “PowerPivot Data Refresh Timer Job” executes, and it runs every 5 minutes.

If you’re really impatient, and you have access to Central Administration, you can edit this timer job and force it to run immediately on a case-by-case basis.
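
If you’d rather script that than click through Central Administration, something like the following, run from the SharePoint Management Shell on a farm server, should queue the job immediately. Consider this a sketch: the display name filter is an assumption, so verify the exact job name in your own farm first.

    Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

    # Find the PowerPivot data refresh timer job and run it now.
    # Adjust the display name filter if your farm names the job differently.
    $job = Get-SPTimerJob | Where-Object { $_.DisplayName -like "*PowerPivot Data Refresh*" }
    Start-SPTimerJob -Identity $job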

Moving back to the refresh history screen, we will notice that the refresh is currently running.

And in the workflow history screen for the workbook, we can see that the workflow is also running.

Once both operations have completed, we can navigate to our Office 365 site, and we will find the file in the destination folder. If our model is greater than 10 MB in size, we will require Power BI in order to interact with it in the browser, and we will need to enable it for use with Power BI through the Power BI application. If it has already been enabled, then no additional actions are required. If the workbook is less than 10 MB, it will work directly in any Office 365 library that is enabled for Excel Services.
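
Since that 10 MB threshold determines whether Power BI is needed at all, it can be worth checking the workbook’s size before publishing. A trivial sketch, with a placeholder path:

    # Will this workbook render in the browser without Power BI?
    $sizeMB = (Get-Item "C:\Models\ContosoSales.xlsx").Length / 1MB
    if ($sizeMB -gt 10) { "Over 10 MB: Power BI is required for browser interaction" }
    else { "Under 10 MB: any Excel Services enabled library will do" }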

More Possibilities

If you’ve worked with the Power BI Data Management Gateway at all, you’ll know that data refreshes can be scheduled at most daily. This limitation is also shared by PowerPivot for SharePoint, but it is possible to change its behaviour. Using the techniques outlined in this article by Ian Smith, and this article by Tim Laqua, refreshes can be made as granular as 5 minutes. These articles were written for SharePoint 2010, but the principles work with 2013 (hint: the stored procedure that forces a refresh is [DataRefresh].[AddRefreshSchedule]). Be warned: this technique is a hack, and is not supported by Microsoft. However, it can be a way to increase the frequency of refreshes on premises, and, with the above technique, in Office 365 too.
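
If you do go down that road, it’s worth inspecting the procedure’s actual definition in the PowerPivot service application database before calling it, since its parameters aren’t documented. A quick way to do that, assuming the SQL Server PowerShell tools are installed; the server and database names below are placeholders:

    # Dump the definition of the (unsupported) refresh scheduling procedure
    Invoke-Sqlcmd -ServerInstance "SPSQL01" -Database "DefaultPowerPivotServiceApplicationDB" `
        -Query "EXEC sp_helptext '[DataRefresh].[AddRefreshSchedule]'"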

Another way to refresh the data more often is by using SQL Server Integration Services (SSIS). SSIS can update an embedded data model in an Excel workbook through a script task (thanks to Gobán Saor). SSIS packages can be scheduled with SQL Agent jobs, which means that they can run as frequently as every minute. Another advantage of this approach is that with Excel 2013, this method should be fully supported.

Finally…

By relying on our workflow to publish the refreshed model, we work around the current limitations of the Data Management Gateway. We do incur the cost of a Nintex license, but a few days of development would easily exceed that cost. In fact, if our data model doesn’t exceed 10 MB, we don’t even need Power BI for this solution to work.
