
Author: John White

Continuous export for Azure Application Insights using Azure Data Explorer (Kusto)

If you use Azure, chances are that you’ve used Application Insights. Application Insights collects telemetry data for web applications and allows that telemetry to be queried, analyzed, or used to alert on anomalies. It’s backed by the Kusto engine, which makes it possible to query and aggregate substantial amounts of data very quickly. It is also relatively inexpensive. Depending on the nature of your data, however, you may find yourself bumping into one of its limitations. In most cases these limitations can be overcome by “upgrading” to AI’s big brother, Azure Data Explorer (ADX), which is more commonly known as Kusto.

This article outlines a pattern for continuously streaming data from Application Insights to Kusto.

Application Insights Limitations

Retention – Data in Application Insights is retained for 90 days at no charge and can be retained beyond that for a cost of $0.155 per GB per month. While the price is reasonable, there is a hard cap on retention of 2 years. Data older than 2 years is purged, so if longer retention is needed, another solution is required.

Backup – At present, there is no way to back up and restore Application Insights data. It is certainly possible to export this data to a variety of media, but there is no way to restore it.

Data limits – Application Insights can struggle when a large amount of data is requested. It is not possible under any circumstances to query more than 500,000 rows or 64 MB of compressed data. It is possible to implement paged queries to work around this limitation, but this can be problematic. Queries are also limited to 100 seconds, and unlike in the underlying Kusto engine itself, these limits are absolute.

Scale – Application Insights is a “one size fits all” service. It cannot be scaled either up or down. It is therefore not possible to overcome issues with query performance or service limits by adding power.

Schema – At present, Application Insights collects data into 10 different tables. The schema of these tables is fixed and cannot be changed. It is possible to use custom data in these tables; in fact, many have columns of dynamic type for just this purpose. JSON data can be added to these columns and queried by the engine. This makes Application Insights highly flexible.

The downside of this flexibility is performance. Querying custom data requires the engine to parse data at runtime. The engine is incredibly efficient at doing this, but it cannot compare to more structured columns, particularly when querying massive amounts of data. The fixed nature of Application Insights precludes other approaches for improving query performance like materialized views, etc.

Moving to Kusto

If any of these limitations are an issue, you may wish to consider moving your Application Insights data into Azure Data Explorer, otherwise known as “Kusto”. Kusto is the engine behind all of Azure Monitor (which includes Application Insights and Log Analytics), and it employs the same query language.

When you use your own Kusto cluster, you have complete control over your data. A Kusto cluster contains one or more nodes and can be scaled automatically. It solves the limitations inherent in Application Insights while keeping the same data types and query language (KQL). Specifically, it addresses the AI limits in the following ways:

Retention – Kusto has advanced data retention and caching settings that can be set at both the database level and the table level. Retention can be set to unlimited if necessary.
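
For illustration, retention and caching can be adjusted with control commands like the following (MyDatabase and MyTable are hypothetical names; run each command separately):

// Retain data in the database for roughly 10 years (soft delete), with recoverability enabled
.alter-merge database MyDatabase policy retention softdelete = 3650d recoverability = enabled

// Keep the most recent 31 days of a table in hot cache for faster queries
.alter table MyTable policy caching hot = 31d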

Backup – Kusto can connect to external tables that are connected to Azure storage accounts or to SQL tables. Continuous export can be added to any Kusto tables so that the externalized data is always up to date. Data can be restored from these externalized sources, or by reingesting directly from them. Alternatively, AI data can be simultaneously streamed into Azure storage accounts, and this data can be ingested into Kusto for restoration.
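
As a sketch of the continuous export piece (the export, source table, and external table names below are hypothetical, and the external table must already exist), the command looks like this:

// Continuously export new rows from MyTable to an external table, running every hour
.create-or-alter continuous-export MyBackupExport
over (MyTable)
to table MyExternalTable
with (intervalBetweenRuns = 1h)
<| MyTable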

Data limits – The default query limits in Kusto are the same as those found in Application Insights, but here they are soft limits. They can be overridden, and asynchronous operations can be used to circumvent them when necessary. In most cases, however, the data optimization strategies available in Kusto make these limits much less important.
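
For example, the row and size truncation limits can be lifted for a single query with a set statement (MyLargeTable is a hypothetical table name; use this sparingly on large result sets):

// Removes the 500,000 row / 64 MB result truncation for this query only
set notruncation;
MyLargeTable
| where TimeGenerated > ago(365d)
| summarize count() by bin(TimeGenerated, 1d)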

Scale – Kusto clusters can be as small as 1 node (for development – a single node cluster has no SLA), and as large as 1,000. Each node can be as small as 2 CPUs/14 GB RAM, and as large as 32 CPUs/128 GB RAM. There is no limit to the quantity of data that can be ingested.

Schema – This is where Kusto really shines. Data can be transformed natively at ingestion time using update policies. Custom Application Insights data can be extracted from the dynamic columns into more structured tables, which greatly improves query performance. In addition, materialized views can be created to provide pre-aggregated query targets and further speed up common queries.
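
As a hedged sketch of the materialized view idea, using the pages_Staging_PageViews table created later in this article, a pre-aggregated daily page view count could be defined like this:

// Materialized view that pre-aggregates page views per page, per day
.create materialized-view PageViewsDaily on table pages_Staging_PageViews
{
    pages_Staging_PageViews
    | summarize Views = count() by Name, Day = bin(TimeGenerated, 1d)
}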

By streaming Application Insights data into Kusto, you can continue to take advantage of the rich data collection capabilities of Application Insights, without being constrained by its storage limitations. In this scenario, AI acts as your telemetry collector, and Kusto your storage engine. The remainder of this article outlines how to do this.

Setting it all up

In our scenario, we are collecting data from 3 Application Insights tables: pageViews, customMetrics, and customEvents. To capture this data, we will use Diagnostic Settings, which transforms the table names to AppPageViews, AppMetrics, and AppEvents respectively. The entire process is shown below for reference:

Azure Monitor collects data from Application Insights as it arrives through Diagnostic Settings. The data is then sent to an Azure Event Hub, as well as to an Azure Data Lake Gen2 (ADLG2) account for long-term storage and recoverability. Azure Data Explorer (Kusto) ingests data directly from the Event Hub in near real time. Event Hub data is transformed and delivered to three staging tables through update policies and functions. In addition, external tables are connected to three containers in the storage account for diagnostic purposes or re-ingestion on demand.

Create an Event Hub and (optionally) a storage account

Data will be streamed continuously to an Event Hub and to an Azure Data Lake Gen2 (ADLG2) account.

The Application Insights instance, the ADLG2 account, and the Event Hub namespace must all exist within the same Azure region. This is a limitation of the Azure Monitor service. The Kusto cluster can exist anywhere.

When creating the storage account, be sure to select the option for “Enable hierarchical namespace” from the Advanced page. This is what distinguishes an ordinary storage account from an ADLG2 account.

Configure Application Insights diagnostic settings

Many Azure services can stream usage data through their “Diagnostic Settings” option. In the case of Application Insights, all the collected data can be streamed. It should be noted, however, that the table names do not match those within the Application Insights logs; they are the same as those found in the Log Analytics workspace that backs the AI instance. In the example below, we are collecting data from the AppEvents, AppMetrics, and AppPageViews tables (customEvents, customMetrics, and pageViews in AI).

In this case we are sending data to an Event Hub and to an ADLG2 storage account. Each table will store its data in a separate container, and it is not possible to change that container.

Create the Kusto ingestion table and set up ingestion

The data stream to the Event Hub contains records from three different tables with different schemas. To accommodate this, we will create a temporary holding table, set up policies to automatically distribute the data from this table into three tables with different schemas, and then add a retention policy to purge the holding table after distribution.

The holding table to receive Event Hub data will be named Pages_EventHub, and can be created from a Kusto query window using the following command:

.create table Pages_EventHub (records: dynamic)

This will create a table with one column named records which is of the dynamic data type. Event Hub data will land here.

Next, we create an ingestion mapping to match the incoming Event Hub JSON data to the holding table. This can be done from a query window using the following command:

.create table Pages_EventHub ingestion json mapping "RawRecordsMapping"
'['
' {"column": "records", "Properties":{"Path":"$.records"}}'
']'

When we define an ingestion, we will refer to this mapping by the name RawRecordsMapping. This mapping is a property of the holding table, and it will return the records path from the incoming JSON data and place it in the records column of the Pages_EventHub table.

Next, we define the data connection for the ingestion. To define a connection, navigate to your Kusto cluster, open the Databases node, and then open the database that will receive the data. Finally, select Data connections, then Add data connection, and select Event Hub.

Give the connection a name, select the Event Hub namespace and hub, the default consumer group, and no compression. Use the table name and mapping created above and select JSON as the data format. When finished, save the data connection.

If data is flowing into the Event Hub, it should begin to appear in the ingestion table within a few minutes; a typical lag is about 5 minutes. Once confirmed, it’s time to create the destination tables and update policies.
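
A couple of quick sanity checks can be run from the query window at this point (run each separately); neither is part of the pipeline itself:

// Confirm that raw Event Hub records are landing in the holding table
Pages_EventHub
| take 10

// List any recent ingestion errors for the database
.show ingestion failures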

Create destination tables and update policies

We want to take data from the Event Hub and “reconstitute” it in Kusto. To do that, we will closely copy the data structure from the Log Analytics workspace that is connected to our Application Insights instance, leaving out some unnecessary system data. In our case, we will create three tables using the following Kusto commands (one at a time) in the Query window:

.create table pages_Staging_PageViews (TenantId: string, TimeGenerated: datetime, Id: string, Name: string, Url: string, DurationMs: real, PerformanceBucket: string, Properties: dynamic, Measurements: dynamic, 
OperationName: string, OperationId: string, ParentId: string, SyntheticSource: string, SessionId: string, UserId: string, UserAuthenticatedId: string, UserAccountId: string, AppVersion: string, AppRoleName: string, AppRoleInstance: string, ClientType: string, ClientModel: string, ClientOS: string, ClientIP: string, ClientCity: string, ClientStateOrProvince: string, ClientCountryOrRegion: string, ClientBrowser: string, ResourceGUID: string) 

.create table pages_Staging_Events (TenantId: string, TimeGenerated: datetime, Name: string, Properties: dynamic, Measurements: dynamic, OperationName: string, OperationId: string, ParentId: string, SyntheticSource: string, SessionId: string, UserId: string, UserAuthenticatedId: string, UserAccountId: string, AppVersion: string, AppRoleName: string, AppRoleInstance: string, ClientType: string, ClientModel: string, ClientOS: string, ClientIP: string, ClientCity: string, ClientStateOrProvince: string, ClientCountryOrRegion: string, ClientBrowser: string, ResourceGUID: string) 

.create table pages_Staging_Metrics (TenantId: string, TimeGenerated: datetime, Name: string, ItemCount: int, Sum: real, Min: real, Max: real, Properties: dynamic, OperationName: string, OperationId: string, ParentId: string, SyntheticSource: string, SessionId: string, UserId: string, UserAuthenticatedId: string, UserAccountId: string, AppVersion: string, AppRoleName: string, AppRoleInstance: string, ClientType: string, ClientModel: string, ClientOS: string, ClientIP: string, ClientCity: string, ClientStateOrProvince: string, ClientCountryOrRegion: string, ClientBrowser: string, ResourceGUID: string)

Next, we construct queries that fit the schemas of these three tables and filter the results for the appropriate record type. These queries are then used to create a Kusto function for each of the three tables. The commands to create the three functions, which contain our queries, can be found below.

.create-or-alter function fn_Pages_PageViewsIngest {
Pages_EventHub
| mv-expand records
| where records.Type == "AppPageViews"
| project 
TenantId = tostring(records.Properties.TenantId),
TimeGenerated = todatetime(records.['time']),
Id = tostring(records.Id),
Name = tostring(records.Name),
Url = tostring(records.Url),
DurationMs = toreal(records.DurationMs),
PerformanceBucket = tostring(records.PerformanceBucket),
Properties = todynamic(records.Properties),
Measurements = todynamic(records.Measurements),
OperationName = tostring(records.OperationName),
OperationId = tostring(records.OperationId),
ParentId = tostring(records.ParentId),
SyntheticSource = tostring(records.SyntheticSource),
SessionId = tostring(records.SessionId),
UserId = tostring(records.UserId),
UserAuthenticatedId = tostring(records.UserAuthenticatedId),
UserAccountId = tostring(records.UserAccountId),
AppVersion = tostring(records.AppVersion),
AppRoleName = tostring(records.AppRoleName),
AppRoleInstance = tostring(records.AppRoleInstance),
ClientType = tostring(records.ClientType),
ClientModel  = tostring(records.ClientModel), 
ClientOS  = tostring(records.ClientOS), 
ClientIP  = tostring(records.ClientIP),
ClientCity  = tostring(records.ClientCity), 
ClientStateOrProvince  = tostring(records.ClientStateOrProvince), 
ClientCountryOrRegion  = tostring(records.ClientCountryOrRegion), 
ClientBrowser  = tostring(records.ClientBrowser), 
ResourceGUID  = tostring(records.ResourceGUID)
}

.create-or-alter function fn_Pages_EventsIngest {
Pages_EventHub
| mv-expand records
| where records.Type == "AppEvents"
| project 
TenantId = tostring(records.Properties.TenantId),
TimeGenerated = todatetime(records.['time']),
Name = tostring(records.Name),
Properties = todynamic(records.Properties),
Measurements = todynamic(records.Measurements),
OperationName = tostring(records.OperationName),
OperationId = tostring(records.OperationId),
ParentId = tostring(records.ParentId),
SyntheticSource = tostring(records.SyntheticSource),
SessionId = tostring(records.SessionId),
UserId = tostring(records.UserId),
UserAuthenticatedId = tostring(records.UserAuthenticatedId),
UserAccountId = tostring(records.UserAccountId),
AppVersion = tostring(records.AppVersion),
AppRoleName = tostring(records.AppRoleName),
AppRoleInstance = tostring(records.AppRoleInstance),
ClientType = tostring(records.ClientType),
ClientModel  = tostring(records.ClientModel), 
ClientOS  = tostring(records.ClientOS), 
ClientIP  = tostring(records.ClientIP),
ClientCity  = tostring(records.ClientCity), 
ClientStateOrProvince  = tostring(records.ClientStateOrProvince), 
ClientCountryOrRegion  = tostring(records.ClientCountryOrRegion), 
ClientBrowser  = tostring(records.ClientBrowser), 
ResourceGUID  = tostring(records.ResourceGUID)
}

.create-or-alter function fn_Pages_MetricsIngest {
Pages_EventHub
| mv-expand records
| where records.Type == "AppMetrics"
| project 
TenantId = tostring(records.Properties.TenantId),
TimeGenerated = todatetime(records.['time']),
Name = tostring(records.Name),
ItemCount = toint(records.ItemCount),
Sum = toreal(records.Sum),
Min = toreal(records.Min),
Max = toreal(records.Max),
Properties = todynamic(records.Properties),
OperationName = tostring(records.OperationName),
OperationId = tostring(records.OperationId),
ParentId = tostring(records.ParentId),
SyntheticSource = tostring(records.SyntheticSource),
SessionId = tostring(records.SessionId),
UserId = tostring(records.UserId),
UserAuthenticatedId = tostring(records.UserAuthenticatedId),
UserAccountId = tostring(records.UserAccountId),
AppVersion = tostring(records.AppVersion),
AppRoleName = tostring(records.AppRoleName),
AppRoleInstance = tostring(records.AppRoleInstance),
ClientType = tostring(records.ClientType),
ClientModel  = tostring(records.ClientModel), 
ClientOS  = tostring(records.ClientOS), 
ClientIP  = tostring(records.ClientIP),
ClientCity  = tostring(records.ClientCity), 
ClientStateOrProvince  = tostring(records.ClientStateOrProvince), 
ClientCountryOrRegion  = tostring(records.ClientCountryOrRegion), 
ClientBrowser  = tostring(records.ClientBrowser), 
ResourceGUID  = tostring(records.ResourceGUID)
}

With the three functions in place, we need to create an update policy that will use the results of a function to load a table whenever data is added to the holding table. For our pages_Staging_PageViews table, we run the following command to create the policy.

.alter table [@"pages_Staging_PageViews"] policy update @'[{"Source": "Pages_EventHub", "Query": "fn_Pages_PageViewsIngest()", "IsEnabled": "True", "IsTransactional": true}]'

This command adds an update policy to the pages_Staging_PageViews table. This update policy will be invoked whenever data is added to the Pages_EventHub table. It will execute the fn_Pages_PageViewsIngest function created above against this new data and load the result into the pages_Staging_PageViews table. The function itself filters out all data that did not originate from the original AppPageViews table and transforms the rest to match the destination schema.
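
Two optional commands (run separately) are useful for verifying this behaviour: previewing what the function returns, and confirming that the policy is attached to the destination table:

// Preview the rows that the update policy will produce
fn_Pages_PageViewsIngest()
| take 10

// Show the update policy on the destination table
.show table pages_Staging_PageViews policy update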

The commands for creating the policies on the other two tables are below:

.alter table [@"pages_Staging_Events"] policy update @'[{"Source": "Pages_EventHub", "Query": "fn_Pages_EventsIngest()", "IsEnabled": "True", "IsTransactional": true}]'

.alter table [@"pages_Staging_Metrics"] policy update @'[{"Source": "Pages_EventHub", "Query": "fn_Pages_MetricsIngest()", "IsEnabled": "True", "IsTransactional": true}]'

The last step is to add a retention policy to the Pages_EventHub table that will remove data automatically after it has been processed. This is an optional step and can be done at any point to conserve resources. A retention policy will remove ingested data after a defined time. Setting the period to 0 will delete the data shortly after all update policies have completed.

In this case the policy is added to the holding table by running the following command:

.alter-merge table Pages_EventHub policy retention softdelete = 0d recoverability = disabled

At this point, data should be flowing into the three destination tables shortly after it arrives through the Event Hub.
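
One simple way to confirm this (a sketch, not required for the pipeline) is to compare recent row counts across the three destination tables:

// Recent row counts per destination table
union withsource=SourceTable pages_Staging_PageViews, pages_Staging_Events, pages_Staging_Metrics
| where TimeGenerated > ago(1h)
| summarize Rows = count() by SourceTable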

Connect external tables to the ADLG2 data (optional)

Earlier, we selected both an Event Hub and a storage account to receive data from Application Insights. The reason for the storage account is to provide an authoritative source of persisted data. Data in Application Insights expires by default after 90 days and cannot be retained for longer than 2 years. Data in Kusto can be persisted for an unlimited period, but it too can be configured to expire after a period of time. Storing the data in a storage account ensures permanence and provides a location to re-ingest from should any disaster befall the Kusto data.

Kusto can connect to external data sources as external tables. These sources can be Azure storage accounts or SQL databases. While not strictly required, it is a good idea to create external tables connected to this data so that it can be queried and re-ingested with relative ease whenever necessary.

Connecting Kusto to ADLG2 storage is a two-step process. First you create a shared access signature, and then you create an external table in Kusto using that signature. A shared access signature can be created for the entire account, a container, or even a folder. Since we will be connecting to three different containers, we will create the signature at the account level. To do this, navigate to the storage account in Azure, and then select Shared access signature in the Security + networking section. Select Blob and File from Allowed services, and then Container and Object from Allowed resource types. Set an expiry date applicable to your situation; the external tables will stop working once the expiry date has passed.

When ready, click the Generate SAS and connection string button, and the screen will appear as follows:

Make note of the Blob service SAS URL – it will be needed in the next step. It’s also a good idea to record these settings, as it’s not possible to go back and retrieve them later.

Capturing the three tables above to ADLG2 creates the following three containers in the storage account:

  • insights-logs-appevents
  • insights-logs-appmetrics
  • insights-logs-apppageviews

When creating the external tables below, the Blob service SAS URL values need to be modified to include these containers by adding them before the token in the URL. Therefore:

https://mystorageaccount.blob.core.windows.net/?sv=2021-06-08&……. becomes

https://mystorageaccount.blob.core.windows.net/insights-logs-appevents?sv=2021-06-08&……. and so on.

To create the external tables in Kusto, navigate to a Kusto query window that is connected to the appropriate database. The following commands can be used to create the tables, substituting the sample URLs with the ones from above:

.create-or-alter external table Pages_AppEvents_Ext (['time']:datetime,resourceId:string,ResourceGUID:guid,Type:string,ClientBrowser:string,ClientCity:string,ClientCountryOrRegion:string,ClientIP:string,ClientModel:string,ClientOS:string,ClientStateOrProvince:string,ClientType:string,IKey:guid,_BilledSize:int,OperationName:string,OperationId:guid,ParentId:guid,SDKVersion:string,SessionId:string,UserAccountId:string,UserAuthenticatedId:string,UserId:string,Properties:dynamic,Name:string,ItemCount:int) 
kind=blob 
dataformat=json
( 
   h@'https://mystorageaccount.blob.core.windows.net/insights-logs-appevents?******' 
)

.create-or-alter external table Pages_AppMetrics_Ext (['time']:datetime,resourceId:string,ResourceGUID:guid,Type:string,ClientBrowser:string,ClientCity:string,ClientCountryOrRegion:string,ClientIP:string,ClientModel:string,ClientOS:string,ClientStateOrProvince:string,ClientType:string,IKey:guid,_BilledSize:int,OperationName:string,OperationId:guid,ParentId:guid,SDKVersion:string,SessionId:string,UserAccountId:string,UserAuthenticatedId:string,UserId:string,Properties:dynamic,Name:string,Sum:int,Min:int,Max:int,ItemCount:int) 
kind=blob 
dataformat=json 
( 
    h@'https://mystorageaccount.blob.core.windows.net/insights-logs-appmetrics?******'
)

.create-or-alter external table Pages_AppPageViews_Ext (['time']:datetime,resourceId:string,ResourceGUID:guid,Type:string,ClientBrowser:string,ClientCity:string,ClientCountryOrRegion:string,ClientIP:string,ClientModel:string,ClientOS:string,ClientStateOrProvince:string,ClientType:string,IKey:guid,_BilledSize:int,OperationName:string,OperationId:guid,ParentId:guid,SDKVersion:string,SessionId:string,UserAccountId:string,UserAuthenticatedId:string,UserId:string,Properties:dynamic,Measurements:dynamic,Id:guid,Name:string,Url:string,DurationMs:int,PerformanceBucket:string,ItemCount:int) 
kind=blob 
dataformat=json 
( 
    h@'https://mystorageaccount.blob.core.windows.net/insights-logs-apppageviews?******'
)

Once created, the external tables can be queried like any other table. They can be used for data validation or reingestion as appropriate.
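
For example (run separately; Pages_AppPageViews_Restored is a hypothetical table name), the externalized page view data can be queried with the external_table() function, or pulled back into a regular Kusto table with a set-or-replace command:

// Query the externalized page view data directly
external_table("Pages_AppPageViews_Ext")
| where ['time'] > ago(30d)
| take 100

// Rebuild a regular table from the externalized data if it is ever needed
.set-or-replace Pages_AppPageViews_Restored <|
    external_table("Pages_AppPageViews_Ext")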

In Conclusion

Once the data is flowing, subsequent tables and update policies can be set up to further transform the data, and materialized views can be created to optimize query performance. Moving Application Insights data into Kusto gives you the best of both worlds: the telemetry collection capabilities of Application Insights, and the big data power of Kusto. This approach is not limited to Application Insights either – it can be used with any Azure service that supports Azure Monitor Diagnostic Settings.

Be aware, however, that this migration is a one-way street. Once the data is in Kusto, it can be retained for as long as you like, but it can’t be put back into the source.


Connect Power BI to Azure Monitor data using Direct Query with Azure Data Explorer

Man – that’s a boring title. But it’s accurate.

A few months ago, I posted an article outlining how to connect Power BI to Azure Application Insights and Azure Log Analytics (jointly referred to as Azure Monitor) with Direct Query. That article describes an approach that allows you to use a native Kusto connector to connect to the Azure Monitor instance as if it were an ADX cluster. This allows Direct Query to be used, among other things. The option for connecting Power BI that is available through the Azure Monitor UI uses an HTML connector to query the respective APIs, and that connector doesn’t support Direct Query.

The problem with using this connector is that it’s a bit of a hack. At the time the article was written, you needed to use the old Power BI driver for Kusto to make it work, and that approach isn’t simple. Over time, it stopped working altogether for Application Insights. The ADX connector has since been updated to support connections to Azure Log Analytics (but not Application Insights) and is therefore still valid.

There is however another way to achieve this by using your own ADX cluster. ADX clusters allow for “cross-cluster queries” that permit tables in a database in one cluster to be joined or unioned with tables in a completely different cluster. The same proxy addresses mentioned above can be used in one of these cross-cluster queries; in this way, the ADX cluster simply acts as an intermediary.

Everything that you need to know about this approach can be found in the support article “Query data in Azure Monitor using Azure Data Explorer”.

To create a Power BI report that queries Azure Monitor data using Direct Query, first create a new report, and connect to data using the “Azure Data Explorer (Kusto) connector”. Enter the address of the cluster, and the name of a database within that cluster. The database itself doesn’t matter; it simply provides a scope for the query. Finally, you need to specify the query, and this is where the cross-cluster query comes into the picture. The query takes the following form:

cluster('ProxyURL').database('ResourceName').TableName

The Proxy URLs differ between Log Analytics and Application Insights. The two take the following forms:

Log Analytics:

https://ade.loganalytics.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.operationalinsights/workspaces/<workspace-name>

Application Insights:

https://ade.applicationinsights.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.insights/components/<ai-app-name>

The cross-cluster query for the table named “pageViews” in an Application Insights instance named “WhitePagesLogs”, in a resource group named “MyResourceGroup”, in the subscription “71a90792-474e-5e49-ab4e-da54baa26d5d” is therefore:

cluster('https://ade.applicationinsights.io/subscriptions/71a90792-474e-5e49-ab4e-da54baa26d5d/resourcegroups/MyResourceGroup/providers/microsoft.insights/components/WhitePagesLogs').database('WhitePagesLogs').pageViews

It is worth explicitly noting that the resource name appears twice in the query – once in the cluster address, and again as the database name.
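
For reference, the equivalent query against a Log Analytics workspace follows the same pattern, with the workspace name serving as the database name (the values in angle brackets are placeholders, as above):

cluster('https://ade.loganalytics.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.operationalinsights/workspaces/<workspace-name>').database('<workspace-name>').TableName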

When ready, the Get data dialog box should appear as follows:

If you want to use Direct Query, don’t forget to open the Advanced Options section, and select it here.

At this point, the report can be built, and it will behave as if it were a normal ADX cluster. You can of course build more complex queries, but you cannot build things like functions or materialized views, since you do not have administrative access to the engine behind Azure Monitor.

Compared to using the Power BI ADX connector directly, this approach has the advantage of being explicitly supported, and it also works with both Application Insights and Log Analytics. On the downside, there is a cost to running your own ADX cluster, although it is minimal. The cluster is simply acting as a gateway in this case, and therefore a bare minimum of resources will suffice.


First impressions of the new “Visualize in Power BI” feature for SharePoint Lists

SharePoint Lists have been a data source for Power BI ever since its introduction. Power BI Desktop can connect to those lists to build reports, and once published in the Power BI service, those same reports can be surfaced in SharePoint through the Power BI web part, or in Teams through tabs or the Power BI app. In fact, I wrote a series of articles a few years ago on using Power BI to report on complex field types.

While those articles are still valid, the need for such articles means that report creation has required a high level of sophistication, and the various tools required put report creation out of reach for a whole class of users. With this in mind, the Power BI team has recently rolled out its first iteration of a report creation feature for Lists. Given my SharePoint background, I had to dig in. What follows are my first impressions.

Using Quick Visualize

Currently this feature is part of the Microsoft 365 Targeted Release experience but will roll out more broadly over the coming months.

At the top of each list is an “Integrate” button. This button is available in both Microsoft Lists, and through the standard SharePoint interface. Selecting it allows you to create a Power App or Power Automate Flow from the list, and once the feature is rolled out, you can create a Power BI report from this button. To begin with, I started with a relatively simple list to collect donation pledges:

A specialized Power BI window then opens with a pre-configured report. The service makes a guess at what you’ll want to visualize. Measures are summarized in cards and sliced by various dimensions horizontally. At the bottom is a table laying out the detail. For my list above, the default report appeared as follows:

In my list, monetary value was the only numeric field, so a count was also added. Along the right of the report is the brand new Summarize pane, which is how you select which measures and dimensions are included in the report. In my case, I’d rather slice by year than quarter and Count of Availability doesn’t mean much so I want to deselect it and add Count of rows. Just by using the Summarize pane, the report will appear as follows:

There is a limit of 4 dimensions that can be selected for slicing the measures.

If you want to customize the report further, you can select Edit, and you will be presented with a more classic report editing experience in Power BI. When you do this, you will lose access to the Summarize pane, but you will be able to make a lot of fine-grained changes to the report.

When complete, you can use the “Publish to the list” button to save it and share it.

What’s Different

Reports created in this manner are stored along with the list that created them in a specialized workspace. They cannot be viewed in the Power BI service at all. In addition, it’s not possible to download the report as a .pbix file to publish it elsewhere – it lives in the list. To access it, simply press the Integrate button again, and the reports that have been published from this list will all be available.

Like most other Power BI reports, these reports are driven from a data set. In this case, the data set is created behind the scenes automatically, but like most datasets, it contains a cache of the list data. This means that the report will not always be up to date with respect to the list data, and therefore must be periodically refreshed.

With these types of reports, the refresh happens every 3 hours automatically. This allows the data to be refreshed 8 times per day, which is the maximum allowed for the Power BI Pro license.

You don’t need a Pro license to use this feature, but you will if you want to publish your reports back to the list. The key here is that any sharing features in Power BI require a Pro license, or a dedicated capacity (Premium) backed workspace. At the moment, there is no way to move these reports to a dedicated capacity.

Complex SharePoint Fields

Lists are infamous for their specialized column types. Lookup columns, rich text fields, managed metadata can be tricky to work with from an integration standpoint. Happily, Power Query makes working with them fairly simple as I describe in the article series that I mentioned above (the latest one on location columns with links to the others is here – Using Power BI to Report on Location Columns in SharePoint – The White Pages (unlimitedviz.com)). However, the Quick Visualize feature doesn’t use Power Query. So how does it handle complex SharePoint fields?

Not that well as it turns out. At least not for now.

I decided to turn this loose on my “Properties” list that I use for demos. It contains one of every SharePoint field type. The initial result can be seen below.

To be fair, there is only one numeric column in this list, and it is composed entirely of complex field types. However, it does mean that for the moment at least, if you’re working with complex field types, this feature will only be of limited use to you. Even in the first example above you can see that the donor value was always “[List]” because donor was a person field.

This is a list of all the complex SharePoint field types, and how they are expressed when using Quick Visualize:

Type | Result
Person | [List]
Multi-Value Choice | [List]
Choice | Choice fields work – the value is shown
Managed Metadata | [Record]
Rich Text | The raw HTML value is shown with all markup
Location | [Record]
Picture or Hyperlink | The destination URL is shown, not linkable
Lookup | [List]
Boolean | Boolean fields show True or False
Currency | Currency fields work

Impressions

It’s important to keep the design of this feature in mind. This is for users who do not have a lot of experience with Power BI and who need a way to quickly visualize their list-based data. It is not for report designers who intend to publish reports for a wide audience – Power BI Desktop is there to serve that purpose.

This is a great move in the right direction. The lack of support for complex SharePoint field types means that its usage will be limited to simple lists, but there are a lot of those out there, and I have every confidence that support for them will be coming. The lack of control over the refresh schedule may also prove limiting in some circumstances.

Overall, “Visualize in Power BI” is a great step in the right direction, and I suspect that we’ll see a lot more along this line in the not-too-distant future.


Using Power BI with Microsoft Lists

Microsoft Lists debuted in 2020, and they are (yet another) great way to organize list-based data. Now, when someone says data, I think Power BI. Obviously, we’ll want to report on this data, but how do we do that? There is no Power BI connector for Microsoft Lists. The answer is quite simple, if not completely obvious: you need to use the SharePoint Online List connector in Power BI.

Microsoft Lists are the same thing as SharePoint lists. In fact, they ARE SharePoint lists. The Microsoft Lists service is just a new user interface for interacting with them. You can use Lists all you want and never see SharePoint at all, unless you need to change list settings, but I digress. Given this fact, as far as Power BI is concerned, everything that applies to SharePoint lists applies to Microsoft Lists lists (the grammar here gets awfully awkward).

For reference, I wrote a series of articles some time ago about the idiosyncrasies of working with SharePoint data in Power BI, and these articles are still valid today (2021). The most recent of these articles can be found here and includes links to the others.

There is one thing that is worth mentioning about Microsoft Lists. When a new list is created using the Lists interface, the user can save it to any of their SharePoint sites, but another option is to save it to “My lists”.

When you use the SharePoint Online list connector in Power BI, it prompts you to enter the URL for the SharePoint site that contains the list that you want to report on. That is straightforward when your list is stored in a SharePoint site, but what if your list is stored in “My lists”? Where are “My lists” stored?

They are stored in a “personal” SharePoint site. We SharePoint old timers would know it as the MySite, and while usage of MySite has been de-emphasized in Microsoft 365, it is very much still there. Each user has one. In fact, this is where the “personal” OneDrive for Business content is stored – in the Documents library of the very same MySite. By storing personal lists in the MySite, Microsoft Lists is just following the same pattern used by OneDrive for Business, which makes perfect sense.

Given this, what URL should you use in Power BI to connect to your lists stored in “My lists”? You’ll find it in the Microsoft Lists web interface in the URL bar. It’s the portion of the URL up to “/Lists/”.

In most cases it will take the following form:

https://TenantName-my.sharepoint.com/personal/LoginID/Lists/ListName

Where:

  • TenantName = the name of your Microsoft 365 tenant, i.e. Contoso
  • LoginID = the email address used to log in to Microsoft 365, with the “@” and the “.” replaced with underscores, i.e. jpw_contoso_com
  • ListName = the name of your list

Once you enter in this URL, you’ll have access to any of the lists stored in “My lists”. At this point, your personal lists will behave like any list in any other SharePoint site.


It’s time to stop using Power Pivot

Excel is an excellent tool for analyzing data. An analyst can easily connect to and import data, perform analyses, and achieve results quickly. Export to Excel is still one of the most used features of any Business Intelligence tool on the market. The demand for “self-service BI” resulted in a lot of imported data being stored in overly large Excel files. This posed several problems. IT administrators had to deal with storage requirements. Analysts were restricted by the amount of data they could work with, and the proliferation of these “spreadmarts” storing potentially sensitive data created a governance nightmare.

A little history

Power Pivot was created to provide a self-service BI tool that solved these problems. Initially released as an add-in for Excel 2010, it contained a new analytical engine that would soon be introduced to SQL Server Analysis Services as well. Its columnar compression meant that millions of rows of data could be analyzed in Excel and would not require massive amounts of space to store. Data in Power Pivot is read-only and refreshable – ensuring integrity. It allowed analysts to set up their own analytical data sets and analyze them using a familiar looking language (DAX), and visual reporting canvas (PowerView) all from within Excel.

The original version of Power BI brought Power Pivot to Office 365 through Excel, before Power BI’s relaunch gave it its own consumption interface (the service) and design client (Power BI Desktop). Both the Power Pivot engine and Power Query were incorporated into the service and Power BI Desktop, while the Silverlight-based Power View was replaced with a more web-friendly reporting canvas.

Excel support

Throughout all these changes, Excel has continued to be well supported in the Power BI service. Analyze in Excel allows an analyst to connect to a deployed Power BI dataset (built with Power BI Desktop) and analyze it using pivot tables, charts, etc. Recent “connect to dataset” features have made this even simpler. Organizational Data Types allow Excel data to be decorated with related data in Power BI.

Excel workbooks containing Power Pivot models have always been supported by the service. These models can even be refreshed on a regular basis. If the source data resides on premises, it can be refreshed through the on-premises data gateway. This is all because the data engine in Power BI is essentially Power Pivot.

It’s that word “essentially” that causes a problem.

Datasets that are created and stored within Excel workbooks are functional but can only be accessed by that workbook. Contrast this with a dataset created by Power BI Desktop, which can be accessed by other interactive (pbix) reports, paginated reports, and, as mentioned above, by Excel itself. The XMLA endpoint also allows these datasets to be accessed by a myriad of third-party products. None of this is true for datasets created and stored in Excel.

So why would anyone continue to create models in Excel? Until now, the reason has been that although Excel can connect to Power BI datasets to perform analysis, those connected workbooks would not be updated when the source dataset changes. This meant that analysts who really care about Excel needed to work with the Excel-created models. This changed recently with an announcement at Microsoft Ignite Spring 2021. In the session Drive a data Culture with Power BI: Vision, Strategy and Roadmap, it was announced that very soon, Excel files connected to Power BI datasets will be automatically updated. This removes the last technical reason to continue using Power Pivot in Excel.

Tooling

Building a dataset with Power BI Desktop is fundamentally the same as building one with Excel. The two core languages and engines (M with Power Query, and DAX with Power Pivot) are equivalent between the two products. The only difference is that the engine versions found in Excel tend to lag those found in Power BI Desktop and the Power BI service itself. I’d argue that the interfaces for performing these transforms and building the models are far superior in Power BI Desktop, not to mention the third-party add-in capability.

In this “new world” of Excel data analysis, datasets will be created using Power BI Desktop, deployed to the service, and then Excel will connect to them to provide deep analysis. These workbooks can then be published to the Power BI service alongside any other interactive or paginated reports for use by analysts. With this new capability, Excel truly resumes its place as a full-fledged, first-class citizen in the Power BI space.

What to use when

With this change, the decision of what tool to use can be based completely on its suitability to task, and not on technical limitations. There are distinct types of reports, and different sorts of users. The choice of what to use when can now be based completely on these factors. The common element among them all is the dataset.

With respect to report usage, typical usage can be seen below.

Tool | Used by | Purpose
Power BI Service | Report consumers | Consuming all types of reports: interactive, paginated, and Excel
Excel Online | Report consumers | Consuming Excel reports from SharePoint, Teams, or the Power BI service
Power BI Desktop | Model builders, interactive report designers | Building Power BI datasets, building interactive reports
Power BI Report Builder | Paginated report designers | Building paginated reports
Excel | Analysts | Building Excel reports, analyzing Power BI datasets

Making the move

Moving away from Power Pivot won’t require any new services or infrastructure, and existing reports and models don’t need to be converted. They will continue to work and be supported for the foreseeable future. Microsoft has neither said nor indicated that Power Pivot in Excel is going anywhere. However, by building your new datasets in Power BI Desktop, you will be better positioned moving forward.

If you do want to migrate some or all your existing Excel based Power Pivot datasets, it’s a simple matter of importing the Excel file into Power BI Desktop. This is completely different than connecting to an Excel file as a data source. From the File menu in Power BI Desktop, select Import, then select Power Query, Power Pivot, Power View. You will then select the Excel file that contains your dataset.

Power BI will then import all of your Power Query queries and your Power Pivot dataset, and if you have any Power View reports, it will convert them to Power BI reports. The new report can then replace your existing Excel file. Once deployed to the Power BI service, other Excel files can connect to it if so desired.

Building your datasets with Power BI Desktop allows you to take advantage of a rich set of services, across a broad range of products, including Excel. Building them in Excel locks you into an Excel only scenario. If you already use Power BI, then there’s really no reason to continue to build Power Pivot datasets in Excel.
