Using the SSIS OData Source Connector With SharePoint Online Authentication

Last week, Microsoft released the OData Source for Microsoft SQL Server 2012 . What is it? It allows SQL Server Integration Services (SSIS) to use an OData feed as a first class citizen data source in the same manner as SQL Server, Oracle, etc. Until now it was necessary to code OData connections using the script object.

This matters to those of us in the SharePoint world because any SharePoint list data can be expressed as OData.

I’ve written before about how SharePoint data can be extracted into a data warehouse using SSIS and the SharePoint List Source and Destination Adapters, available from CodePlex. These adapters plug in to SSIS and wrapper the SharePoint SOAP web services, and therefore do not need to be installed on a SharePoint server. We have used them for years, and they work very well, however, they are a CodePlex project, and therefore not fully supported.

These CodePlex adapters have more recently been bumping into another limitation. While SOAP web services are supported in Office 365, the adapters don’t support the Office 365 authentication mechanism, which effectively renders them useless. The OData services require the same authentication, but the the new OData Source supports it. It’s also an official Microsoft product, and is fully supported.

There is, however a trick to getting it working. Once you install the OData Source, you open up SQL Server Data Tools, open an SSIS project, and add or edit a data flow task. In the SSIS Toolbox,  You should see the OData Source.

image

Drag the tool on to the design surface, and double click to configure it. You’ll first need to configure an OData Connection Manager, and you’ll do that by clicking the New button.

image

Give the connection a name. The connection will be common to all lists and libraries within a site, so something based on the name of the site is likely appropriate. The Service document location is the OData endpoint. It takes the form of the URL of the site, along with the suffix /_vti_bin/listdata.svc. If the connection is on premises, you can use Windows Authentication, but if it is Office 365, you must use a stored name and password.

image

If you are using Office 365, and you click Test Connection at this point, you’ll receive an error “Test connection failed –> The remote server returned an error: (400) Bad Request.”

image

This is due to Office 365’s “unique” authentication mechanism. In order to authenticate to Office 365, you must first select the “All” button in the toolbar, and set the value of “Microsoft Online Services Authentication” to true.

image

This option may not be available to you. If the Online Services Authentication option is disabled, or greyed out, as it was for me when I first tried to use it, it’s because a prerequisite is missing. In order to authenticate to Office 365, the machine must have the SharePoint Server 2013 Client Components SDK installed on it.

Once the client components are in place, and the option is selected, the data source should be able to connect to the source, and the connection manager can be closed. Lists are exposed as Collections, so if you want to work with list data, you can then select the list from the list of Collections.

image

At this point, the data source will act like any other SSIS data source, you can select and transform columns at will. More importantly, this will help you get SharePoint data both on-prem and in the cloud into a central data warehouse.

How to Change the Language for a SharePoint Site

If you’ve ever worked with multiple language packs for SharePoint, you’ll know that after you add a language pack to a farm, you’ll have the option of selecting a base language for any new site that is created.

image

The default language will be that of the site collection, but all installed language packs will be available. All of the system generated text in that site will be presented in the language of the site. This has been true since SharePoint 2007. SharePoint 2010 introduced the MUI (Multilingual User Interface), which, if configured, allowed the user to switch the language of the system generated text. SharePoint 2013 retains the MUI, but the way it is used has changed. However, all versions of SharePoint share a common limitation.

Once a site is created, its language cannot be changed. No way, no how. Well, at least not in any supported way.

I recently encountered a situation where a customer wanted to move their Internet facing site to SharePoint 2013. It was a multilingual site that used variations. However, when it was originally set up, no language packs had been installed. Both variation sites (English and French) were based on English. Although the content in the French variation site was in French, all of the system text was in English. This obviously needed to be corrected as any system text would pop up in English. A significant investment had been made into the content, so re-creation wasn’t our first choice.

An attempt was made to use the export function (using stsadm –o export – I’m old school). While the content exported just fine, it couldn’t be imported into a newly created French site, because the source site was in English. A little bit of web searching found Mirjam’s Van Olst’s article from 2008 on how to change a site’s language. This article was written for SharePoint 2007, and described how the content database could be directly updated to change the language for one or many sites.

Unfortunately, as Mirjam correctly points out, monkeying with the content database voids your warranty, and leaves SharePoint in an unsupported state. She also points out that this approach doesn’t work well for publishing sites, which is what we were dealing with. Our goal was to wind up with a clean system, so this wasn’t going to work for us, at least not as a complete solution.

The beauty of this approach however is that if you’re willing to compromise your content database temporarily, you can literally change the language of the site. Using this approach, we were able to set the language to 1036 (French) for all sites, export the French variation, and then change it back. Now technically, we’ve edited the content database, and rendered it unsupported. However, this doesn’t matter, as we wanted to import the content into new (French) variation site in a new, untouched content database,

This approach works, and unless I’m mistaken, should be totally supported. To be clear the steps taken are:

  1. Back up the source content database (always a good idea)
  2. Open SQL Server Management Studio, Connect to the content database in question , and create a new query. Any of Mirjam’s update statements would work, but this one is easiest
    UPDATE dbo.Webs SET Language = 1036
  3. Immediately export the site and all subsites. In my case I used stsadm, but of course PowerShell can be used, as can Central Administration.
    stsadm-o export –url http://xxxx.xxxx.xxx/fr-ca –filename frenchsite 
  4. Once complete, set the source site back to English
    UPDATE dbo.Webs SET Language = 1033
  5. Create a new site collection in NEW content database. Create the destination site using French (or allow the variations system to create it)
  6. Import into the destination French site
    stsadm -o import -url http://yyy.yyyy.yyy/fr-ca -filename frenchsite.cmp

The source content is successfully migrated into the destination site. Now, technically, because the source database has been directly modified, it’s in an unsupported state, and should be discarded. However, I have yet to see any ill effects. The good news is that the destination content database is pristine, and therefore this approach should be supported.

While we technically haven’t changed the language for an existing site, we have achieved the goal of getting the French language content into a proper French language SharePoint site.

Don’t Update Your Power Query After Building a Model

Power Query can be a very powerful data acquisition and transformation tool, and Power Pivot is great for modifying imported models. While it isn’t always crystal clear where to perform certain operations (i.e. both tools can change the type of columns, concatenate, etc.), what is clear is the order of operations. Once you start building your model, there’s really no going back.

Consider the following scenario. You use Power Query to import some data directly into a new data model. You then massage the model, adding a new calculated column that appends a couple of imported columns.  You then realize that one of the columns has some extra trailing spaces, go back to your Power Query to use the TRIM function. Once done, you apply and close to update your model, only to be faced with an error like one of the following:

image

We couldn’t refresh the table xxxx from connection xxxxxxx. Here’s the error message we got: Out of line object ‘DataSource’, referring to ID(s) xxxxxx, has been specified but has not been used……

Or

image

We couldn’t refresh the table xxxx from connection xxxxxxx. Here’s the error message we got: OLE DB or ODBC error: The query xxxxxx or one of its inputs was modified in Power Query after this connection was added…..

I’ve seen these two, but there may be others. The upshot of it is that Power Query can’t update an existing model once structural changes have been made to it. You can overcome this error by turning off “Load to Data Model”, applying your changes, re-editing the query and then turning “Load to Data Model” back on.

image

This severs the connection with the data model, and deletes the data model (or that portion that came from the query), allowing you to create it from scratch, which is great, unless you’ve put any work into changing the model.

Data refreshes are just fine, it’s structural changes that cause a problem. The thing to remember here is that editing a model created by Power Query a one way street. There’s no going back, so put as much effort into the initial query as possible.

Hopefully this saves a few poor souls from losing a fair bit of work, but I imagine that you’ll only be reading this if you’ve experienced one of the errors above. In that case,  maybe you won’t repeat the mistake as many times as I did….

Changes to the Default Data Loading Options in Power Query

This past week, the March 2014 update for Power Query became available. As always, I wasted no time in getting it installed. There are a number of significant new features in this release (all outlined at the above link), but one that caught my attention is the new “Selection Well” for multi table import.

The Selection Well allows you to not only see all of the items that have been selected for import, but also allows each query to be edited before the query is executed. This is a significant enhancement to the previous behaviour in which the queries would be executed once, and only then could be edited.

However, the thing that jumped out at me most was the change to the default load settings. “Load to worksheet” is no longer the default, but “Load to Data Model” is.

image

Hallelujah! As I’ve pointed out before, I strongly feel that this should be the default given the file size limitations for Excel workbooks in Power BI (still 10 MB outside of the model).  This, on the surface is a very welcome development. However, as it turns out, it’s not quite that simple.

This new default only takes effect when the “Select multiple items” option is selected in the query navigator. (For the record, it is possible to select this option, and then select only one item, and thus take advantage of the new default, but if you need to think about a default, it sort of defeats the purpose). If only one item is selected, then edited, the default is still “Load to worksheet” which I maintain is the wrong default, for all of the reasons outlined in my earlier article.

image

I presume that the thinking here is that casual users will expect to see their imported data immediately in Excel, but a user is using multiple tables, then they will understand that they need to use a data model, and will want to create relationships. The fear is that casual users will be confused if they don’t immediately see their data in the workbook.

This behaviour is also similar to that found in the import data dialog. If you choose to select multiple tables:

image

The data will automatically be added to the data model, whether or not Power Pivot has been enabled. The final screen shows that, and unless the “Table” view is selected, the data will be imported only into the model, not the workbook.

image

I applaud the move within Power Query, but I still feel that load to model should be the default in all cases. The current behaviour is inconsistent. Sometimes it is the default, and sometimes it isn’t, and if you’re not watching, you’ll wind up with the wrong behaviour. I also feel that we should be encouraging Excel’s use as a data client, and discouraging its use as a data base. Leaving the import to worksheet as a default further exacerbated the use of Excel as a “spreadmart” tool.

Friends don’t let friends Load to worksheet.

How to Refresh Data Models in Office 365 from On Premises Using Nintex Workflow – A Hybrid Approach

One of the viable use cases for Power BI is to provide reporting to a mobile workforce, even when the organization has invested in BI on premises. In this scenario, there may be a SharePoint farm using PowerPivot for SharePoint with multiple workbooks connected to a myriad of data sources, including SQL Server Analysis Services cubes and models.

The Problem

In this scenario, the workbooks required by the mobile users can be copied up to the Office 365 tenant, and consumed from there. However, how is the data kept current? One way to do so is to set up the Power BI Data Management Gateway (DMG) to do this. The DMG is a great solution, but is relatively new, and has a few inherent limitations at the moment. Chief among them is that data can only be refreshed from SQL Server or Oracle data sources. For the moment at least, Analysis Services is left out in the cold. However, PowerPivot for SharePoint doesn’t share these limitations, and has no problem refreshing data from a wide variety of sources. Unfortunately, if you’re using Power BI, and need to refresh from an unsupported data source, you’re out of luck for the moment, so this is a problem.

Keeping a cloud copy of a local workbook carries the inherent problems of managing two different copies of the same workbooks. Every modification must be performed twice and there is always the chance that something will get missed. Ideally, for this scenario, we should have a publishing mechanism. We can use Nintex Workflow (on premises) to provide this publishing mechanism for us. Whenever the model is refreshed, the workflow will fire, and copy the file up to Office 365 to update the model there.

In order to implement this solution, you’ll need a copy of Nintex Workflow. The low end (Team) edition is fine, but what we’re after is the “Copy document to Office 365” action, and that is available through the Nintex Live action set.

The Solution

To start, we’ll work with an Excel workbook that is using an embedded model created from the standard Contoso Data Warehouse. In this example, we’ve built a simple pivot table to interact with the model. The model has been published to a PowerPivot gallery, and the refresh schedule has been set. Once we’ve tested the refresh to ensure that it’s working, we can create the workflow. To do so, we go to the workflow options section in the library ribbon and create a new Nintex workflow.

image

We then select the Blank template. Once presented with the design canvas, we select Workflow Settings, give it a good name, and set the workflow to run on create and on edit.

image

Once the settings have been saved, we can start to build the actual workflow. The first action will be a “Pause for” action. Find it in the Workflow Actions toolbar, and drag it onto the design surface. Select configure from its drop down menu, and set it to pause for a period of time, in this case, 5 minutes.

image

The amount of time that we need to pause for depends on how long it takes our model to refresh. When the refresh operation begins, it “edits” the document, which kicks off the workflow. We don’t want to upload the workbook until the refresh operation is complete, so we need to pause it. In this case, we are working with a data model that is about 100 MB and comprised of about 3 million rows of data. In this environment, it requires about 2.5 minutes to refresh, so a 5 minute pause is sufficient.

Next, we need to use the “Office 365 upload file” action. This action is one of the Nintex Live actions that can be added at no extra charge to the Nintex environment. If it hasn’t already been added to the toolbar, we need to do so. Assuming that Nintex Live has been enabled for the farm, we browse the available actions through the Catalog which is available through the designer ribbon.

image

Once in the catalog, we can browse the available actions, and click the Add button to add them to the toolbox. The action that we need to add is named “Office 365 upload file”. Once at least one action is added, an new section will appear in the Workflow Actions toolbar named “Nintex Live”. We then open that section and drag the Office 365 file upload action onto the design surface after the pause action, then configure it.

image

Walking through the options:

  • File to upload: We want the current file to be uploaded to Office 365, so we select current item
  • Destination site URL: This is in fact the URL of the site itself. Everything up to, but not including the document library
  • Folder path: This is the name of the destination document library, and if appropriate, any subfolders
  • File name: We can create or derive a new name for the file, but in this case, we will be using the same name as the source file
  • Overwrite existing file: We select this option, as we will be updating existing content.
  • Fields: We can add additional metadata values if we wish, but in this case, we leave it empty
  • SharePoint Online URL: this is the URL to the root of the Office 365 tenant for the destination
  • Username and password are for a proxy account that will be used to upload the file. Secure workflow constants can be used.

Once we configure the action, we save it, our workflow should appear as follows:

image

and we are ready to publish the workflow. To do so, select the Publish button, then close the workflow designer.

We are now ready to test our workflow. We can force the workflow to run on demand for our workbook, but a better test is to let the refresh trigger it. The most frequently that PowerPivot for SharePoint refreshes can be scheduled to run is once per day. but there is a way to force the refresh to happen on demand.

To force a refresh, we need to edit the refresh schedule for the workbook. Open up the refresh history screen by selecting Manage Data Refresh for the workbook. 

image

From the Refresh history screen, select the Configure Schedule link to edit the schedule. This will allow us to edit the existing schedule, but more importantly, it will also let us force a refresh by selecting the “Also refresh as soon as possible” option.

image

If we are using a specific account as a credential, we will also need to re-enter that. Once complete, selecting OK will queue the refresh job for execution. The refresh job will happen the next time that the “PowerPivot Data Refresh Timer Job” executes, and it runs every 5 minutes.

If you’re really impatient, and you have access to central administration, you can edit this timer job and force it to run immediately on a case by case basis.

Moving back to the refresh history screen we will notice that the refresh is currently running.image

And in the workflow history screen for the workbook, we can see that the workflow is also running.

image

Once both operations have completed, we can navigate to our Office 365 site, and we will find the file in the destination folder. If our model is greater than 10 MB in size, we will require Power BI in order to interact with it in the browser, and we will need to enable it for use with Power BI through the Power BI application. If it has already been enabled, then no additional actions are required. If the workbook is less than 10MB, it will work directly in any Office 365 library that is enabled for Excel Services.

More Possibilities

If you’ve worked with the Power BI Data Management Gateway at all, you’ll know that data refreshes can be scheduled at most daily. This limitation is also shared by PowerPivot for SharePoint, but it is possible to change its behaviour. Using the techniques outlined in this article by Ian Smith, and this article by Tim Laqua, refreshes can be made as granular as 5 minutes. These articles are written for SharePoint 2010, but the principles work with 2013 (hint, the stored procedure that forces a refresh is [DataRefresh].[AddRefreshSchedule]). Be warned, this technique is a hack, and not supported by Microsoft. However, it can be a way to up the frequency of refreshes on premises, and with the above technique, in Office 365 too.

Another way to refresh the data more often is by using SQL Server Integration Services (SSIS). SSIS can update an embedded data model in an Excel workbook through a script action (thanks to Gobán Saor).  SSIS packages can be scheduled with SQL agent jobs, which means that they can be run as frequently as every minute. Another advantage of this approach is that with Excel 2013, this method should be fully supported.

Finally…

By relying on our workflow to publish the refreshed model, we work around the current limitations of the Data Management Gateway. We do incur the cost of a Nintex license, but a few days of development would easily exceed that cost. In fact, if our data model doesn’t exceed 10 MB, we don’t even need Power BI for this solution to work.