Using Power Query with SharePoint Lists and Lookup Fields

As I’ve explained many times before, querying SharePoint data directly is a bad idea. The SharePoint data storage mechanisms simply aren’t designed for querying of any scale, hence the lookup limitations that have been imposed upon it. The best approach to querying SharePoint list data is to first load it into a data warehouse or data mart of some sort. However, both Reporting Services (SSRS) and Power Query support direct access to SharePoint lists. While I try to strongly dissuade people from doing this with Reporting Services, properly used, Power Query is a totally viable means of querying SharePoint list data.

Why is this? With SSRS, every query goes back to the data source for retrieval.  Power Query is different – it’s analogous to SQL Server Integration Services, which is an ETL management product. It loads source data into a repository, in this case, an embedded xVelocity, or Power Pivot model which can be considered a “personal data warehouse”. Queries against this mini data warehouse are fast, and don’t rely on SharePoint  retrieval mechanisms, and can be used quite effectively in reports.

There are a couple of subtleties to querying SharePoint list items with Power Query, and I will briefly walk through the process below.

With Excel open, click the Power Query tab, select “From Other Sources” and the select “From SharePoint List”.

image

Next, enter the URL for the SharePoint site (or subsite) that contains the list you wish to query.

If it is the first time accessing this site, you will be prompted for credentials. If your site is Office365, be sure to enter organizational credentials. If it is on premise, use Windows credentials.

Once entered, you will be presented with a list of SharePoint lists in the Power Query Navigator window. Select the list that you wish to query, in our case, Announcements. When selected, click the edit button to edit the query.

The data, or a subset will load into the query editor window. You will see all of the list item fields expressed as columns, and for the most part, using the correct data type. At this point you can remove any columns that are unnecessary, or filter any undesired rows. There are a couple of SharePoint field types that bear special mention.

Lookup fields are a lookup into another SharePoint list. Internally, the SharePoint item stores this as an ID and display value, but Power Query gives you access to all of the properties of the related item as a one-to-one relationship. Essentially, what you can do is to flatten that relationship by incorporating the related item’s attributes.

If you scroll to a column of this particular type, you will see the value expressed as a hyperlink with the value “Record”. Clicking on it will drill down to one related record, but that’s not what we want to do. We want to expand the properties for all items in the list. The way that you do this is  to click on the expand icon in the column header. In our case, we want to expand the “CreatedBy” field. CreatedBy is a standard list field, of the Person type. Person fields are actually a special case of a lookup field, so it exhibits this behaviour.

Here, we are interested in retrieving the user’s name and mobile phone, so we deselect all of the other fields. A new column will be created for every expanded field in the format sourcefieldname.attributename .

Attachments are another special case. There can be multiple attachments for a single list item, a one to many relationship. The hyperlink is therefore “Table”. Clicking on the column header expand for this column looks similar, but with an important difference. Options are available to either expand or aggregate the related items.

Selecting expand will create a new source record for each related item, and the only columns that will differ will be the items selected from the related table (Name in our case). Aggregate will not create any new records, but will summarize the related fields. For numeric fields, they can be totalled or averaged, and for text fields they can be counted.

Once ready, click “Close and Load” from the Query Editor ribbon, and the list data will load to either your model, or your workbook, depending on what your preferences are. Of course, I always recommend that you load to the model only.

Once loaded, any visualizations and queries will work against the model. The data can be refreshed at any point either manually, or automatically if using the Data Management Gateway. Keep in mind however that refreshes will operate against the source list.

17 comments

  1. Hi John,
    I uploaded the workbook connected to a sharepoint online list as you described and enable it in Power BI site but I am not able to set the credentials (connection test failure ) in the Data Management Gateway.. ( I have no problems with other types of Power Query Connections – csv-folder)
    I choose both anonymous Authentication (as the DMG 1.2 PowerQuery support sources ) and windows but always failed to verify..Any ideas ?

  2. I’m having an issue expanding a lookup value that’s not the special case of created by or modified by. It’s a separate list, and what appears to be happening is that I’m getting a 500 error (page can’t be found) and when you look at the path it’s not going to the root of the site but inserting the list I’m working in before the list I’m trying to pull data from, so sitename/_vti_bin/listdata.svc/listIamworkingin/lookupList. I also noticed that the list I’m working in has an appended (1) at the end.

    again the error is 500. Any ideas?

  3. One other quirk with the Excel Power Query links to Sharepoint List is that the linked table field headers have spaces, “-“, “/”, “.” characters removed in the Power Query eg :

    “My Header” becomes “MyHeader”
    “My-Header” also becomes “MyHeader”
    “My.Header” also becomes “MyHeader”
    etc

    Also some minor editing of header string case is done eg

    “My header” becomes “MyHeader”
    etc

    Not sure if these are a Sharepoint or Power Query thing or maybe just when they are used together.

    I can understand why Power Query would edit out potentially ‘special’ characters, but not sure why Power Query would care about title case, though Power Query does care about string case in formulas so perhaps it is just a convention being followed by Power Query developers.

    Maybe my memory is wrong but I could swear I have sourced a Power Query to a Sharepoint List and it didn’t do these things above? Maybe this is something that has been introduced in newer Power Query versions?

    Anyways,ideally Power Query wouldn’t do these edits to headers because it adds another level of complexity to working with data eg have to have different names for headers in Power Query vs other data and reporting apps.

    Of course, could follow a strict naming convention eg using “_” instead of space in Sharepoint list, but that is difficult because business users do whatever they want and this level of control is difficult.

    Any ideas on managing these Power Query header renaming issues?

  4. I keep running into 100 records limitation from SharePoint. Is there anyway to boost that? For example, Is there a way to plug in the $top url variable to the query that Power Query sends to SharePoint?

  5. I’ve never come across a 100 item limit – might you be referring to a PowerQuery preview items limit? There is a limite there, but once applied, all items should load in.

  6. When I query by SharePoint site by IE and Power Query in Excel I get 100 row limits as in the following url:
    {site}/_api/Web/Lists(guid’xxx’)/Items
    but I get all of them when I do the following from IE:
    {site}/_api/Web/Lists(guid’xxx’)/Items?$top=1000

  7. For example when this simple query is done against a SharePoint server site we only get 100 rows
    ==================================================================================
    let
    Source = SharePoint.Tables(“https://SPServer.cdc.gov/sites/xxx/xx”, [ApiVersion = 15]),
    #”XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX” = Source{[Id=”XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX”]}[Items]
    in
    #”XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX”
    ==================================================================================
    it doesn’t matter what is done with the query it only filters on the initial 100. I think we need something like the following – note the “?$top=1000”:
    ==================================================================================
    let
    Source = SharePoint.Tables(“https://SPServer.cdc.gov/sites/xxx/xx”, [ApiVersion = 15]),
    #”XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX” = Source{[Id=”XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX”]}[Items]?$top=1000
    in
    #”XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX”
    ==================================================================================

  8. @William – Or more than 1000.. I haven’t yet tried this. Have you? Does it work? I should also ask if you’re using the latest Power Query (aka Get and Transform in Excel 2016..)

  9. jpw – Yes, I tried it and no it doesn’t work. I just downloaded Power Query about a week ago so yes I am pretty sure I have the latest.

  10. Interesting. As a test, could you install Power BI Desktop, and try it that way? That would tell us if the issue in your case is on the server side or the client side.

  11. Hi. John:

    Not sure you can help with this, using above example William provided, the data connection is using List GUID instead of names.

    We are in following sticky situation: we have multi sites (30ish) created using same template, so all same structure. And we created on Excel with a dozen connection to one site, which works great. When this excel is connected to other site with same structure, after switching data source URL. all connections are dead because List GUID is different. Then almost every connection needs be redone.

    Any idea how we can get away with these GUID?

    Thanks,

  12. Hi. John:

    Not sure you can help with this, using above example William provided, the data connection is using List GUID instead of names.

    We are in following sticky situation: we have multi sites (30ish) created using same template, so all same structure. And we created on Excel with a dozen connection to one site, which works great. When this excel is connected to other site with same structure, after switching data source URL. all connections are dead because List GUID is different. Then almost every connection needs be redone.

    Any idea how we can get away with these GUID?

    Thanks,

  13. After a couple tries, seems one only need to replace Source{[Id=”XXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX”]} with Source([Title=”Tasks”]}, then everything works fine when data connection is pointing to a new URL.

  14. When I try to use Power Query within Excel and connect to a list on my on premise SharePoint server I see AuthorID and EditorID instead of the Created By and Modified By fields. And, these field are not showing as record but as list so I don’t have the expand icon in the header where I can select which attributes I want to have.

Leave a comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Exit mobile version