Microsoft Lists debuted in 2020, and it is (yet another) great way to organize list-based data. Now, when someone says data, I think Power BI. Obviously, we’ll want to report on this data, but how do we do that? There is no Power BI connector for Microsoft Lists. The answer is quite simple, if not completely obvious. You need to use the SharePoint Online List connector in Power BI.
Microsoft Lists are the same thing as SharePoint lists. In fact, they ARE SharePoint lists. The Microsoft Lists service is just a new user interface for interacting with them. You can use Lists all you want and never see SharePoint at all, unless you need to change list settings, but I digress. Given this fact, as far as Power BI is concerned, everything that applies to SharePoint lists applies to Microsoft Lists lists (the grammar here gets awfully awkward).
For reference, I wrote a series of articles some time ago about the idiosyncrasies of working with SharePoint data in Power BI, and these articles are still valid today (2021). The most recent of these articles can be found here and includes links to the others.
There is one thing that is worth mentioning about Microsoft Lists. When a new list is created using the Lists interface, the user can save it to any of their SharePoint sites, but another option is to save it to “My lists”.
When you use the SharePoint Online list connector in Power BI, it prompts you to enter the URL for the SharePoint site that contains the list that you want to report on. That is straightforward when your list is stored in a SharePoint site, but what if your list is stored in “My lists”? Where are “My lists” stored?
They are stored in a “personal” SharePoint site. We SharePoint old timers would know it as the MySite, and while usage of MySite has been de-emphasized in Microsoft 365, it is very much still there. Each user has one. In fact, this is where the “personal” OneDrive for Business content is stored – in the Documents library of the very same MySite. By storing personal lists in the MySite, Microsoft Lists is just following the same pattern used by OneDrive for Business, which makes perfect sense.
Given this, what URL should you use in Power BI to connect to your lists stored in “My lists”? You’ll find it in the Microsoft Lists web interface in the URL bar. It’s the portion of the URL up to “/Lists/”, which follows this pattern:

https://TenantName-my.sharepoint.com/personal/LoginID/Lists/ListName

TenantName = the name of your Microsoft 365 tenant, i.e. Contoso
LoginID = the email address used to log in to Microsoft 365, with the “@” and the “.” replaced with underscores, i.e. jpw_contoso_com
ListName = the name of your list
Once you enter this URL, you’ll have access to any of the lists stored in “My lists”. At this point, your personal lists will behave like any list in any other SharePoint site.
Excel is an excellent tool for analyzing data. An analyst can easily connect to and import data, perform analyses, and achieve results quickly. Export to Excel is still one of the most used features of any Business Intelligence tool on the market. The demand for “self-service BI” resulted in a lot of imported data being stored in overly large Excel files. This posed several problems. IT administrators had to deal with storage requirements. Analysts were restricted by the amount of data they could work with, and the proliferation of these “spreadmarts” storing potentially sensitive data created a governance nightmare.
A little history
Power Pivot was created to provide a self-service BI tool that solved these problems. Initially released as an add-in for Excel 2010, it contained a new analytical engine that would soon be introduced to SQL Server Analysis Services as well. Its columnar compression meant that millions of rows of data could be analyzed in Excel without requiring massive amounts of space to store. Data in Power Pivot is read-only and refreshable – ensuring integrity. It allowed analysts to set up their own analytical data sets and analyze them using a familiar-looking language (DAX) and visual reporting canvas (Power View), all from within Excel.
The original version of Power BI brought Power Pivot to Office 365 through Excel, before Power BI’s relaunch gave it its own consumption interface (the service) and design client (Power BI Desktop). Both the Power Pivot engine and Power Query were incorporated into the service and Power BI Desktop, while the Silverlight-based Power View was replaced with a more web-friendly reporting canvas.
Excel support
Throughout all these changes, Excel has continued to be well supported in the Power BI service. Analyze in Excel allows an analyst to connect to a deployed Power BI dataset (built with Power BI Desktop) and analyze it using pivot tables, charts, etc. Recent “connect to dataset” features have made this even simpler. Organizational Data Types allow Excel data to be decorated with related data in Power BI.
Excel workbooks containing Power Pivot models have always been supported by the service. These models can even be refreshed on a regular basis. If the source data resides on premises, it can even be refreshed through the on-premises data gateway. This is all because the data engine in Power BI is essentially Power Pivot.
It’s that word “essentially” that causes a problem.
Datasets that are created and stored within Excel workbooks are functional but can only be accessed by that workbook. Contrast this with a dataset created by Power BI Desktop, which can be accessed by other interactive (pbix) reports, paginated reports, and as mentioned above, by Excel itself. The XMLA endpoint also allows these reports to be accessed by a myriad of third-party products. None of this is true for datasets created and stored in Excel.
So why would anyone continue to create models in Excel? Until now, the reason has been that although Excel can connect to Power BI datasets to perform analysis, those connected workbooks would not be updated when the source dataset changes. This meant that those analysts that really care about Excel needed to work with the Excel-created models. This changed recently with an announcement at Microsoft Ignite Spring 2021. In the session Drive a Data Culture with Power BI: Vision, Strategy and Roadmap, it was announced that very soon, Excel files connected to Power BI datasets will be automatically updated. This removes the last technical reason to continue to use Power Pivot in Excel.
Tooling
Building a dataset with Power BI Desktop is fundamentally the same as building one with Excel. The two core languages and engines (M with Power Query, and DAX with Power Pivot) are equivalent between the two products. The only difference is that the engine versions found in Excel tend to lag behind those found in Power BI Desktop and the Power BI service itself. I’d argue that the interfaces for performing these transforms, and building the models, are far superior in Power BI Desktop, not to mention the third-party add-in capability.
In this “new world” of Excel data analysis, datasets will be created using Power BI Desktop, deployed to the service, and then Excel will connect to them to provide deep analysis. These workbooks can then be published to the Power BI service alongside any other interactive or paginated reports for use by analysts. With this new capability, Excel truly resumes its place as a full-fledged, first-class citizen in the Power BI space.
What to use when
With this change, the decision of what tool to use can be based completely on its suitability to task, and not on technical limitations. There are distinct types of reports, and different sorts of users. The choice of what to use when can now be based completely on these factors. The common element among them all is the dataset.
Typical report usage by tool can be seen below.
| Tool | Used by | Purpose |
| --- | --- | --- |
| Power BI Service | Report consumers | Consuming all types of reports: interactive, paginated and Excel |
| Excel Online | Report consumers | Consuming Excel reports from SharePoint, Teams, or the Power BI service |
| Power BI Desktop | Model builders, interactive report designers | Building Power BI datasets, building interactive reports |
| Power BI Report Builder | Paginated report designers | Building paginated reports |
| Excel | Analysts | Building Excel reports, analyzing Power BI datasets |
Making the move
Moving away from Power Pivot won’t require any new services or infrastructure, and existing reports and models don’t need to be converted. They will continue to work and be supported for the foreseeable future. Microsoft has neither said nor indicated that Power Pivot in Excel is going anywhere. However, by building your new datasets in Power BI Desktop, you will be better positioned moving forward.
If you do want to migrate some or all your existing Excel based Power Pivot datasets, it’s a simple matter of importing the Excel file into Power BI Desktop. This is completely different than connecting to an Excel file as a data source. From the File menu in Power BI Desktop, select Import, then select Power Query, Power Pivot, Power View. You will then select the Excel file that contains your dataset.
Power BI will then import all of your Power Query queries and your Power Pivot dataset, and if you have any Power View reports, it will convert them to Power BI reports. The new report can then replace your existing Excel file. Once deployed to the Power BI service, other Excel files can connect to it if so desired.
Building your datasets with Power BI Desktop allows you to take advantage of a rich set of services, across a broad range of products, including Excel. Building them in Excel locks you into an Excel only scenario. If you already use Power BI, then there’s really no reason to continue to build Power Pivot datasets in Excel.
Azure Data Explorer (ADX) is a great platform for storing large amounts of transactional data. The Incremental Refresh feature (now available for Pro users!) in Power BI makes it much faster to keep data models based on that data current. Unfortunately, if you follow the standard guidance from Microsoft for configuring Incremental Refresh, you’ll quickly bump into a roadblock. Luckily, it’s not that difficult to get around.
Incremental Refresh works by setting up data partitions in the dataset in the service. These partitions are based on time slices. Once data has been loaded into the dataset, only the data in the most recent partition is refreshed.
To set this up in Power BI Desktop, you need to configure two parameters, RangeStart, and RangeEnd. These two parameters must be set as Date/Time parameters. Once set, the parameters are used to filter the Date/Time columns in your tables accordingly, and once published to the service, to define the partitions to load the data into.
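For reference, these parameters are normally created through Power Query’s Manage Parameters dialog, but the M that gets generated for each one looks roughly like this (the initial value is just a placeholder):

```m
// RangeStart - each parameter is its own query, tagged with parameter metadata.
// RangeEnd is defined the same way with a later placeholder value.
#datetime(2021, 1, 1, 0, 0, 0)
    meta [IsParameterQuery = true, Type = "DateTime", IsParameterQueryRequired = true]
```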
When Power Query connects to ADX, all Date/Time fields come in as the Date/Time/Timezone type. This is a bit of a problem. When you use the column filters to filter your dates, the two range parameters won’t show up because they are of a different type (Date/Time). Well, that’s not a big problem, right? Power Query lets us change the data column type simply by selecting the type picker on the column header.
Indeed, doing this does in fact allow you to use your range parameters in the column filters. Unfortunately, data type conversions don’t get folded back to the source ADX query. You can see this by right-clicking on a subsequent step in the Power Query editor. The “View Native Query” option is greyed out, which indicates that the query cannot be folded.
Query folding is critical to incremental refresh. Without it, the entirety of the data is brought locally so that it can be filtered vs having the filter occur at the data source. This would completely defeat the purpose of implementing Incremental Refresh in the first place.
The good news is that you can in fact filter a Date/Time/Timezone column with a Date/Time parameter, but the Power Query user interface doesn’t know that. The solution is to simply remove the type conversion Power Query step AFTER performing the filter in the Power Query UI.
Alternatively, if you’re comfortable with the M language, you can simply insert something like the following line using the Advanced Editor in Power Query (where CreatedLocal is the name of the column being filtered).
#"Filtered Rows" = Table.SelectRows(Source, each [CreatedLocal] >= RangeStart and [CreatedLocal] < RangeEnd),
If the filter step can be folded back into the source, Incremental Refresh should work properly. You can continue setting up Incremental Refresh using the Incremental Refresh dialog in Power BI Desktop. You will likely see some warning messages indicating that folding can’t be detected, but these can safely be ignored.
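Putting the pieces together, a minimal sketch of the finished query might look like the following. The cluster URL, database, table, and column names here are all illustrative:

```m
let
    // Connect directly to the ADX table. No type conversion step follows,
    // so the filter below can fold back to the ADX source.
    Source = AzureDataExplorer.Contents(
        "https://mycluster.westus.kusto.windows.net", "MyDatabase", "MyTable"),
    // Date/Time parameters filtering a Date/Time/Timezone column -
    // this works even though the Power Query UI doesn't offer it
    #"Filtered Rows" = Table.SelectRows(
        Source, each [CreatedLocal] >= RangeStart and [CreatedLocal] < RangeEnd)
in
    #"Filtered Rows"
```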
Application Insights (AI) is a useful way of analyzing your application’s telemetry. Its lightning-fast queries make it ideal for analyzing historical data, but what happens when you start to bump into the limits? The default retention for data is 90 days, but that can be increased (for a fee) to 2 years. However, what happens when that’s not enough? If you query too much or too often, you may get throttled. When you start to bump into these limits, where can you go?
The answer lies in the fact that Application Insights is backed by Azure Data Explorer (ADX or Kusto). Moving your AI data to a full ADX cluster will allow you to continue using AI to collect data, and even to analyze recent data, but the ADX cluster can be sized appropriately and used when the AI instance won’t scale. The fact that it is using the same engine and query language as AI means that your queries can continue to work. This article describes a pattern for doing this.
Requirements
We’ll be working with several Azure components to create this solution. In addition to your AI instance, these components are:
Azure Data Explorer cluster
Azure Storage Account
Azure Event Hubs namespace and at least one Event Hub
Azure Event Grid
The procedure can be broken down into a series of steps:
Enable Continuous Export from AI
Create an Event Grid subscription in the storage account
Create an ADX database and ingestion table
Create an Ingestion rule in ADX
Create relevant query tables and update policies in the ADX database
Enable Continuous Export from Application Insights
AI will retain data for up to 2 years, but for archival purposes, it provides a feature called “Continuous Export”. When this feature is configured, AI will write out any data it receives to Azure blob storage in JSON format.
To enable this, open your AI instance, and scroll down to “Continuous Export” in the “Configure” section. Any existing exports will show here, along with the last time data was written. To add a new destination, select the “Add” button.
You will then need to select which AI data types to export. For this example, we will only be using Page Views, although multiple types can be selected.
Next, you need to select your storage account. First select the subscription (if different from your AI instance), and then select the storage account and container. You will need to know what data region the account is in. Once selected, save the settings.
Initially, the “Last Export” column will display “Never”, but once AI has collected some data, it will be written out to your storage container, and the “Last Export” column will display when that occurred. Once it has occurred, you should be able to open your storage account using Storage Explorer, and then the container to see the output. In the root of the container selected above, you’ll see a folder that is named with the AI Instance name, and the AI instance GUID.
Opening that folder, you’ll find a folder for each data type selected above (if there has been data for them). Each data type will be further organized into folders named for the day and the hour. Multiple files will be contained within, with the .blob extension. These are multiline JSON files and can be downloaded and opened with a simple text editor.
The next step is to raise an event whenever new content is added to this storage container.
Create an Event Grid subscription in the storage account
Prior to this step, ensure that you have created, or have available, an Event Hubs namespace and an Event Hub. You will connect to this hub in this step.
From the Azure portal, open the storage account and then select the “Events” node. Then click the “Event Subscription” button at the top.
On the following screen, you’ll need to provide a name and schema for the subscription. The name can be whatever you wish, and the schema should be “Event Grid Schema”. In the Topic Details section, you will provide a topic name which will pertain to all subscriptions for this storage account. In the “Event Types” section, you select the types of actions that will fire an event. For our purposes, all we want is “Blob Created”. With this selection, the event will fire every time a new blob is added to the container. Finally, under “Endpoint Details”, you will select “Event Hubs” from the dropdown, and then you click on “Select an endpoint” to select your Event Hub.
Once created, an event will fire any time a blob is created in this storage account. If you wish to restrict this to specific folders or containers, you can select the Filters tab, and create a subject filter to restrict it to specific file types, containers, etc. More information on Event Grid filters can be found here. In our case, we do not need a filter.
When ready, click the “Create” button, and the Event subscription will be created. It can be monitored from the storage account and can also be monitored in the Event hub. As new blobs are added to the storage account, more events will fire.
Create an ADX database and ingestion table
From the Azure portal, navigate to your ADX cluster and either select a database or create a new one. Once the database has been created, you need to create at least one table to store the data. Ultimately, Kusto will ingest data from the blobs added above whenever they are added, and you need to do some mapping to get that to work properly. For debugging purposes, I find it useful to create an intermediate ADX table to receive data from the blobs, and then transform the data afterward.
In this case, the intermediate table will have a single column, Body, that will contain the entirety of each ingested record. To create this table, run the following KQL query on your database:
.create-merge table Ingestion (Body: dynamic)
The dynamic data type in ADX can work with JSON content, and each record will go there. For this to work, you also need to add a mapping to the table. The mapping can be very complex, but in our case, we’re doing a simple load in, so we’re matching the entire JSON record to the Body column in our database. To add this mapping, run the following KQL command:
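Something along these lines should work – the mapping name “RawInput” matches the one used later when the data connection is created, and the single “$” path maps the whole JSON document to the Body column:

```kql
.create table Ingestion ingestion json mapping "RawInput"
    '[{"column": "Body", "path": "$", "datatype": "dynamic"}]'
```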
At this point, we are ready for an ingestion rule.
Create an Ingestion rule in ADX
From the Azure portal, open your ADX cluster, and select the “Databases” node in the “Data” section, then click on your database.
The setting that we need is “Data ingestion” in the resulting window. Selecting that takes you to the ingestion rules. Now you want to create a new connection by selecting the “Add data connection” button.
The first selection is the data connection type. The options are Event Hub, Blob storage, or IoT Hub. We need to select Blob storage. Both it and Event Hub will connect to an Event Hub, but the difference is that using “Blob storage”, the contents of the blobs will be delivered, while selecting “Event Hub” will only deliver the metadata of the blob being added.
Once the type is selected, you give it a name and choose the event grid to connect to (the one that you created above) and the event type. Next, you select “Manual” in the Resources creation section. Selecting “Automatic” will create a new Event Hubs namespace, hub, and Event grid, and you won’t have any control over the naming of these resources. Selecting Manual allows you to keep it under control. Select your event grid here.
Next, select the “Ingest properties” tab, and provide the table and mapping that you created above (in our case, the Ingestion table and the “RawInput” mapping). Also, you need to select “MULTILINE JSON” as the data format.
Once these values are complete, press the Create button and the automatic ingestion will commence. Adding a new blob to the storage account will fire an event, which will cause ADX to load the contents of the blob into the Body column of the Ingestion table. This process can take up to 5 minutes after the event fires.
Create relevant query tables and update policies in the ADX database
Once ingestion happens, your “Ingestion” table should have records in it. Running a simple query in ADX using the table name should show several records with data in the “Body” column. Opening a record will show the full structure of the JSON contained within. If records with different schema are being imported, a query filter can be employed to limit the results to only those records.
For example, the pageViews table in AI will always contain a JSON node named “view”. The query below will return only pageView data from the ingestion table:
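Checking for the presence of the “view” node does the job:

```kql
// Return only records that contain a "view" node, i.e. pageView data
Ingestion
| where isnull(Body.view) == false
```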
This ingestion table can be queried in this manner moving forward, but for performance and usability reasons, it is better to “materialize” the views of this table. To do this, we create another table, and set an update policy on it that will add relevant rows to it whenever the ingestion table is updated.
The first step is to create the table. In our case, we want to replicate the schema of the pageViews table in Application Insights. This is because we want to be able to reuse any queries that we have already built against AI. All that should be necessary is to change the source of those queries to the ADX cluster/database. To create a table with the same schema as the AI pageViews table (mostly), the following command can be executed in ADX:
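The column list here is a representative subset of the AI pageViews schema; extend it to match the fields your queries rely on:

```kql
// A (partial) replica of the Application Insights pageViews schema
.create-merge table pageViews (
    timestamp: datetime,
    name: string,
    url: string,
    duration: real,
    operation_Id: string,
    user_Id: string,
    client_Browser: string
)
```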
Once the table is created, we need to create a query against the Ingestion table that will return pageViews records in the schema of the new table. Without getting deep into the nuances of the KQL language, a query that will do this is below:
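The JSON paths in this sketch follow the general shape of the Continuous Export format and will likely need adjusting for your data:

```kql
// Expand the raw JSON in Body into the pageViews schema
Ingestion
| where isnull(Body.view) == false
| extend view = Body.view[0], context = Body.context
| project
    timestamp = todatetime(context.data.eventTime),
    name = tostring(view.name),
    url = tostring(view.url),
    duration = toreal(view.durationMetric.value),
    operation_Id = tostring(context.operation.id),
    user_Id = tostring(context.user.anonId),
    client_Browser = tostring(context.device.browser)
```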
The “where isnull(Body.view) == false” statement above uniquely identifies records from the pageViews table. This is useful if multiple tables use the same Ingestion table.
Next, we need to create a function to encapsulate this query. When we add an update policy to the pageViews table, this function will run this query on any new records in the Ingestion table as they arrive. The output will be added to the pageViews table. To create the function, it’s a simple matter of wrapping the query from above in the code below and running the command:
.create-or-alter function pageViews_Expand {
Query to run
}
This creates a new function named pageViews_Expand. Now that the function has been created, we modify the update policy of the pageViews table to run it whenever new records are added to the Ingestion table, and its output will be added to the pageViews table. The command to do this can be seen below:
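The policy is supplied as a JSON document, with Source naming the Ingestion table and Query naming the function created above:

```kql
// Run pageViews_Expand() over new Ingestion records and append to pageViews
.alter table pageViews policy update
    @'[{"IsEnabled": true, "Source": "Ingestion", "Query": "pageViews_Expand()", "IsTransactional": false, "PropagateIngestionProperties": false}]'
```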
After the next ingestion run, not only will you see records in the Ingestion table, but if there were page views, you should see the results show up in the pageViews table as well.
If you have data already in the Ingestion table that you want to bring into the pageViews table, whether for testing or for historical purposes, you can use the .append command to load rows into the table from the function:
.append pageViews <| pageViews_Expand
Finally, if you don’t want to maintain data in the Ingestion table for very long, or not at all, you can set the retention policy on it. Data will be automatically purged from it at the end of the time limit. Setting the value to zero will purge the data immediately, and in that case, the Ingestion table simply becomes a conduit. To set the retention policy on the Ingestion table to 0, you can run the following command:
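A retention policy with a soft-delete period of zero does the trick:

```kql
// Purge Ingestion records immediately after they have been processed
.alter-merge table Ingestion policy retention softdelete = 0d recoverability = disabled
```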
There are several steps involved, but once everything is wired up, data should flow from Application Insights to Azure Data Explorer within a few minutes. This example only worked with the pageViews table, but any of the AI tables can be used, although of course their schemas will be different.
The combination of Power BI and Application Insights (AI)/Log Analytics (LA) is a powerful one. These tools provide a quick, convenient, and relatively cheap way to collect and analyze telemetry on a wide variety of applications. One drawback of AI/LA is that any data query will return a maximum of 500,000 rows, which can be quite constraining in some cases. This article describes a way to work around this limit.
In this example, we’ll be working with an Application Insights instance that is being populated by the WordPress Application Insights plugin – in fact, it’s the one used on this very blog. There are a couple of ways to connect Power BI Desktop to AI data. The Power Query code is downloadable directly from Application Insights, and you can also use the Azure Data Explorer proxy address as outlined in my post on the topic here. This approach will work for both methods, and for our purposes, we’ll be using the generated Power Query code approach.
To begin, access your Application Insights instance, and open the Logs window. If necessary, dismiss the “Queries” window that pops up. Next, form your query using Kusto Query Language (KQL). In our case, we want a simple dump of all rows in the “pageViews” table, so the query is simple – just pageViews.
Once we have the query the way that we want it, we select the Export button, and choose “Export to Power BI (M query)”. M is the name of the language that Power Query uses. Once chosen, a text file will be downloaded that contains the Power Query code that we will need in Power BI Desktop.
At this point, we launch Power BI Desktop, and choose “Get Data”. Since we already have the query that we need, we will choose “Blank Query”.
Next, we name our query “Page Views” and select the Advanced Editor. This is where we paste in the query generated by Application Insights. We open the file that was downloaded above, copy the contents, and paste them into this window (the top comments can be excluded).
Of note here is the value that will be automatically set for timespan. By default, this will be set to P1D, which means data will be retrieved only for the previous day. In our example above, we have changed it to show data for the past 365 days.
Selecting “Done” will load a preview of our data into Power Query. However, if we want to then load it into the data model, it will do so in a single pull, and we will be subject to the 500,000 row limit. What we need to do is break up our query into multiple queries, and Power Query lets us do this through the use of functions.
The first thing that we’ll need to do is to decide on how to segment the AI data. In our case, it is unlikely that we will have more than 500,000 page views per month, so if we performed one query per month, we should be able to retrieve all of our data. In order to do this, we need to go back to Application Insights, and form up a query that will return a list of year and month for our data. In our case, this query is:
pageViews
| where timestamp > now(-365d)
| summarize by
Year = datetime_part('Year',timestamp),
Month = datetime_part('Month',timestamp)
Note that the number of days in the where clause above should match the number of days in the larger query above. Next, export this query to Power BI, and create another query in Power Query. Leave the name as the default for now. Selecting Done should return a list of years and months for your data. These values are all numbers, and Power Query recognizes them as such. However, we need to work with them as text later on, so we change their types to text.
Now we will return to our original query, and modify it so that it only returns data for a single month. Reopen the advanced editor and replace the query “pageViews” with:
pageViews | where datetime_part('Month',timestamp) == 10 and datetime_part('Year',timestamp) == 2020
The values chosen don’t matter, but they should return data. In the end, the only change to the code is the query text itself.
Selecting Done, we verify that we have data restricted to the specified month. This is where the fun begins. We are now going to turn this query into a function. To do so, we right-click on our pageViews query, and select “Create Function”.
We are then presented with a dialog box that asks if we want to create the function without parameters. We can go ahead and select “Create”. We are then prompted to name the function, and we’ll call it “GetViewsByMonthAndYear”. We now need to edit the function. To do so, with the function selected in the query pane, we select the Advanced Editor once again. We dismiss the following warning, and then we edit the function in two places. First, we need to define two variables to be passed to the function, Month and Year, and then we add them to our query.
In the function declaration we add “Month as text” and “Year as text”. We then replace the explicit month and year that we originally queried for with these new variables, Month and Year. Our function code now appears as below:
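The connector code that Application Insights generates is more verbose than this (it includes type-mapping and table-expansion steps), but the relevant shape of the edited function is roughly the following, with the app ID placeholder standing in for the value from your own exported code:

```m
// Month and Year are passed in as text and spliced into the KQL query
(Month as text, Year as text) =>
let
    Source = Json.Document(Web.Contents(
        "https://api.applicationinsights.io/v1/apps/<your-app-id>/query",
        [Query = [
            #"query" = "pageViews | where datetime_part('Month',timestamp) == "
                & Month & " and datetime_part('Year',timestamp) == " & Year,
            #"x-ms-app" = "AAPBI",
            #"timespan" = "P365D"
        ]])),
    // ...the remaining generated steps expand the result into a table...
    DataTable = Source[tables]{0}
in
    DataTable
```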
Now we are ready to use our function. We select our query that contains the list of years and months, select the “Add Column” tab from the ribbon, and choose “Invoke Custom Function”. We give the new column a name “Views”, select our function from the dropdown, and then we select our column containing years and the column containing months to be passed to the function.
At this point, selecting “OK” will cause the function to be executed for each of the listed months. These are individual queries to AI, not one large one. Each query is still subject to the 500,000 row limit, but provided that no specific month exceeds that limit, all of the data will be returned.
Initially, the data is returned as a single table per month, but selecting the expand icon at the right of the column header allows us to retrieve the row values. It’s also a good idea to turn off the “Use original column name” option.
Selecting OK at this point displays all of the appropriate column values. We can then remove the “Year” and “Month” columns, as well as the original Page Views table that we used to create the function. We also need to set the data types for all of our columns because Power Query is unable to detect them using this approach.
Renaming our combined query to Views gives us the following result:
We still have a single table, but there is no longer a 500,000 row limit. At this point, we can load the data into the model and build our report.