Skip to content

Tag: Power Query

Using Power Query with SharePoint Lists and Lookup Fields

As I’ve explained many times before, querying SharePoint data directly is a bad idea. The SharePoint data storage mechanisms simply aren’t designed for querying of any scale, hence the lookup limitations that have been imposed upon it. The best approach to querying SharePoint list data is to first load it into a data warehouse or data mart of some sort. However, both Reporting Services (SSRS) and Power Query support direct access to SharePoint lists. While I try to strongly dissuade people from doing this with Reporting Services, properly used, Power Query is a totally viable means of querying SharePoint list data.

Why is this? With SSRS, every query goes back to the data source for retrieval.  Power Query is different – it’s analogous to SQL Server Integration Services, which is an ETL management product. It loads source data into a repository, in this case, an embedded xVelocity, or Power Pivot model which can be considered a “personal data warehouse”. Queries against this mini data warehouse are fast, and don’t rely on SharePoint  retrieval mechanisms, and can be used quite effectively in reports.

There are a couple of subtleties to querying SharePoint list items with Power Query, and I will briefly walk through the process below.

With Excel open, click the Power Query tab, select “From Other Sources” and the select “From SharePoint List”.

image

Next, enter the URL for the SharePoint site (or subsite) that contains the list you wish to query.

image

If it is the first time accessing this site, you will be prompted for credentials. If your site is Office365, be sure to enter organizational credentials. If it is on premise, use Windows credentials.

Once entered, you will be presented with a list of SharePoint lists in the Power Query Navigator window. Select the list that you wish to query, in our case, Announcements. When selected, click the edit button to edit the query.

image

The data, or a subset will load into the query editor window. You will see all of the list item fields expressed as columns, and for the most part, using the correct data type. At this point you can remove any columns that are unnecessary, or filter any undesired rows. There are a couple of SharePoint field types that bear special mention.

Lookup fields are a lookup into another SharePoint list. Internally, the SharePoint item stores this as an ID and display value, but Power Query gives you access to all of the properties of the related item as a one-to-one relationship. Essentially, what you can do is to flatten that relationship by incorporating the related item’s attributes.

If you scroll to a column of this particular type, you will see the value expressed as a hyperlink with the value “Record”. Clicking on it will drill down to one related record, but that’s not what we want to do. We want to expand the properties for all items in the list. The way that you do this is  to click on the expand icon in the column header. In our case, we want to expand the “CreatedBy” field. CreatedBy is a standard list field, of the Person type. Person fields are actually a special case of a lookup field, so it exhibits this behaviour.

image

Here, we are interested in retrieving the user’s name and mobile phone, so we deselect all of the other fields. A new column will be created for every expanded field in the format sourcefieldname.attributename .

image

Attachments are another special case. There can be multiple attachments for a single list item, a one to many relationship. The hyperlink is therefore “Table”. Clicking on the column header expand for this column looks similar, but with an important difference. Options are available to either expand or aggregate the related items.

image

Selecting expand will create a new source record for each related item, and the only columns that will differ will be the items selected from the related table (Name in our case). Aggregate will not create any new records, but will summarize the related fields. For numeric fields, they can be totalled or averaged, and for text fields they can be counted.

Once ready, click “Close and Load” from the Query Editor ribbon, and the list data will load to either your model, or your workbook, depending on what your preferences are. Of course, I always recommend that you load to the model only.

Once loaded, any visualizations and queries will work against the model. The data can be refreshed at any point either manually, or automatically if using the Data Management Gateway. Keep in mind however that refreshes will operate against the source list.

17 Comments

Power BI Data Management Gateway 1.2 Changes the Game

Last week, a new version of the Power BI Data Management Gateway was released. If you’re unfamiliar with it, it is the Power BI component that allows for workbooks stored in the cloud to be refreshed on  a regular basis with data that exists on-premises, or outside of the hosting center.

I’ve been using the gateway since its initial availability in preview form, and in my opinion, this is the most significant functionality change yet. Until this release there were a grand total of three possible data source types that could be refreshed. With this release, the total increases to 18 by my count (you could argue 22, but that’s plenty).

With past versions, I would write up a quick post on how it is configured, but that has been done, along with the complete list of supported data sources and a helpful video on this blog post by the Power BI team. In addition, an very comprehensive (although amazingly already in need of update) white paper on hybrid data scenarios has just been published by Microsoft here.

The big change here is that this multitude of data sources is supported for data retrieved by Power Query, and NOT by Power Pivot natively. The catch is that they’re only supported for Power Query queries. There is absolutely nothing wrong with this, but it does require us to change our approach a bit to using the data management gateway.

As I first mentioned in a post almost a year ago on Using the Data Management Gateway, and in a number of posts since, data connection strings needed to line up with Power Pivot connections. At the time, the only supported sources were SQL Server and Oracle (for the gateway) and Power Query wasn’t supported at all. Version 1.1 of the Gateway brought Power Query support, but only for those 2 supported data sources. With this release, the Power Query support includes not only all of the new data sources, but also the three original Power Pivot data sources (note: Power Pivot data connections can be found in the Power Pivot add-in UI, while Power Query connections are available on the data tab in Excel).

image

image

As of this release of the Data Management Gateway, there is almost no reason to use native Power Pivot connections any longer. My recommendation is therefore that unless there is a good reason for not doing so, you should try to use Power Query for all data acquisition tasks. It is quite clearly the way forward, and will only gain in supported capabilities. My suspicion is that Power Pivot connections will be retained for backward compatibility reasons only.

With that said, there are a couple of good reasons for using Power Pivot connections directly. One of these reasons is if your data source is online, whether it is SQL Azure, SharePoint Online, or Project Server online. With these data sources, a Data Management Gateway is not required for refresh to work from an embedded Power Pivot connection. However, if Power Query is used to access these sources, it is.

What this means is that for Power Pivot connections to these sources, a refresh allows the Power BI service in the cloud to directly access these data sources in the cloud. However, because ALL Power Query connections require the Data Management Gateway for refresh operations, a Power Query refresh operation will require all of the data to be first downloaded to the on-premises gateway, and then sent back up to the Power BI service in the cloud. While functional, this is hardly the most efficient approach.

Apart from this one small caveat, this version of the Data Management Gateway spells the way forward. Additional data source support should come fast a furious, and the Power Query focus means that we can start to rely its powerful transformational capabilities without having to sacrifice refreshability (if that’s even a word….).

7 Comments

Changes to Data Loading Features in Power Query – May 2014

The Power BI team continues to deliver new features at a rapid pace. The May 2014 release of Power Query is no exception. There are quite a few new features packed into this release, which you can read about at your leisure but I’m particularly interested in the ones pertaining to data loading, as I’ve discussed several of the limitation in this area in the past.

This release is major indeed. There are three significant changes to the data loading features in this build.

Configurable defaults for data loading

I’ve just posted an article on how to do this and why. To put it simply, the default data load behaviour is to load data into a worksheet in most cases. This leads to larger than necessary workbooks, to the point where they may not work properly with Power BI. Now you can change this default behaviour, which will be welcome for anyone doing serious data modeling.

Worksheet Size Warning

If you do decide that you want to load data to the worksheet, or you’re simply unaware of the issues, you will be prompted to consider loading to the data model once your data hits 10 MB, which is the maximum non-model workbook size in Power BI. Prior to this update, the user wouldn’t know that there was a problem until after they tried to use the workbook in Office 365.

Data Model Preservation

Prior to this update, it wasn’t possible to modify the query without losing all changes to the data model, or formatting in worksheet tables. This release of Power Query remedies this situation. You can now go back and make changes to your query without having to recreate the model.

Power BI is being built in a surprisingly collaborative way, with a large amount of input from the community. I’ve never seen this done to this extent at Microsoft, and it’s very good to see. I know that all of these features have been asked for and discussed by the community in the last few months, and here they are. Kudos to everyone involved, and keep it up. This product keeps getting better and better.

1 Comment

Set Your Data Load Defaults in Power Query

When using Power Query to load data from a data source, the user has three choices as to where that data can be stored. It can go into a worksheet in the workbook, into the data model, or into both. The default behaviour (and therefore the one that will be used most commonly) is to load into a worksheet when one table is being imported, or into the data model when multiple tables are selected.

The reason for the load to worksheet default is understandable, as casual users will expect to see the data immediately after it is imported – they won’t want to open the PowerPivot model editor (which is itself turned off by default in Excel 2013). However, loading to the worksheet introduces quite a few limitations, as I’ve described (OK, whined about) mostly here, but also here and here. These limitations mostly revolve around the million row limit in Excel, the fact that data is stored in the model much more efficiently, and the file size limits when working with Power BI.

The great news is, that as of the May 2014 build of Power Query . These data loading defaults are now user configurable. If you’re taking the time to read this post, chances are that you should set them.

The defaults are Power Query wide, and are therefore located in the Power Query options. Once Excel is open,  go to the Power Query tab and click the Options button.

image

Now, in addition to joining the Customer Experience Program, you can set your data load settings. I always import only to the model, so I check the custom option, and Load to Data Model.

image

I can’t be completely sure, but in my limited testing, it appears that this option setting does not sync between instances of Excel on different machines the way that other (non add in) settings do. You will therefore need to set this for every Excel installation that you have. Now if there was only a Group Policy setting that could be pushed out….

Kudos to the Power Query team for implementing this – it’s a welcome addition to the product.

1 Comment

Don’t Update Your Power Query After Building a Model

Power Query can be a very powerful data acquisition and transformation tool, and Power Pivot is great for modifying imported models. While it isn’t always crystal clear where to perform certain operations (i.e. both tools can change the type of columns, concatenate, etc.), what is clear is the order of operations. Once you start building your model, there’s really no going back.

Consider the following scenario. You use Power Query to import some data directly into a new data model. You then massage the model, adding a new calculated column that appends a couple of imported columns.  You then realize that one of the columns has some extra trailing spaces, go back to your Power Query to use the TRIM function. Once done, you apply and close to update your model, only to be faced with an error like one of the following:

image

We couldn’t refresh the table xxxx from connection xxxxxxx. Here’s the error message we got: Out of line object ‘DataSource’, referring to ID(s) xxxxxx, has been specified but has not been used……

Or

image

We couldn’t refresh the table xxxx from connection xxxxxxx. Here’s the error message we got: OLE DB or ODBC error: The query xxxxxx or one of its inputs was modified in Power Query after this connection was added…..

I’ve seen these two, but there may be others. The upshot of it is that Power Query can’t update an existing model once structural changes have been made to it. You can overcome this error by turning off “Load to Data Model”, applying your changes, re-editing the query and then turning “Load to Data Model” back on.

image

This severs the connection with the data model, and deletes the data model (or that portion that came from the query), allowing you to create it from scratch, which is great, unless you’ve put any work into changing the model.

Data refreshes are just fine, it’s structural changes that cause a problem. The thing to remember here is that editing a model created by Power Query a one way street. There’s no going back, so put as much effort into the initial query as possible.

Hopefully this saves a few poor souls from losing a fair bit of work, but I imagine that you’ll only be reading this if you’ve experienced one of the errors above. In that case,  maybe you won’t repeat the mistake as many times as I did….

5 Comments