
Tag: Azure Data Explorer

Continuous export for Azure Application Insights using Azure Data Explorer (Kusto)

If you use Azure, chances are that you’ve used Application Insights. Application Insights collects telemetry data for web applications, and allows that telemetry to be queried, analyzed, or used to alert on any anomalies. It’s backed by the Kusto engine, which makes it possible to query and aggregate substantial amounts of data very quickly, and it is also relatively inexpensive. Depending on the nature of your data, however, you may find yourself bumping into one of its limitations. In most cases these limitations can be overcome by “upgrading” to AI’s big brother, Azure Data Explorer (ADX), which is more commonly known as Kusto.

This article outlines a pattern for continuously streaming data from Application Insights to Kusto.

Application Insights Limitations

Retention – Data in Application Insights is retained for 90 days at no charge and can be retained beyond that for a cost of $0.155 per GB per month. While the price is reasonable, there is a hard cap on retention of 2 years. Data older than 2 years is purged, so if longer retention is required, another solution is needed.

Backup – At present, there is no way to backup and restore Application Insights data. It is certainly possible to export this data to a variety of media, but there is no way to restore it.

Data limits – Application Insights can struggle when a large amount of data is requested. It is not possible under any circumstances to query more than 500,000 rows or 64 MB of compressed data. It is possible to implement paged queries to work around this limitation, but this can be problematic. Query execution time is also limited to 100 seconds, and unlike the underlying Kusto engine itself, these limits are absolute.

Scale – Application Insights is a “one size fits all” service. It cannot be scaled either up or down. It is therefore not possible to overcome issues with query performance or service limits by adding power.

Schema – At present, Application Insights collects data into 10 different tables. The schema of these tables is fixed and cannot be changed. It is possible to use custom data in these tables; in fact, many of them have columns of dynamic type for just this purpose. JSON data can be added to these columns and queried by the engine. This makes Application Insights highly flexible.

The downside of this flexibility is performance. Querying custom data requires the engine to parse data at runtime. The engine is incredibly efficient at doing this, but it cannot compare to more structured columns, particularly when querying massive amounts of data. The fixed nature of Application Insights precludes other approaches for improving query performance like materialized views, etc.

Moving to Kusto

If any of these limitations are an issue, you may wish to consider moving your Application Insights data into Azure Data Explorer, otherwise known as “Kusto”. Kusto is the engine behind all of Azure Monitor (which includes Application Insights and Log Analytics), and it employs the same query language.

When you use your own Kusto cluster, you have complete control over your data. A Kusto cluster contains one or more nodes and can be scaled automatically. It solves the limitations inherent in Application Insights while retaining the same data types and query language (KQL). It addresses the AI limits in the following ways:

Retention – Kusto has advanced data retention and caching settings that can be set at both the database level and the table level. Retention can be set to unlimited if necessary.
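As a minimal sketch (the database name, table name, and retention periods below are placeholders), these settings can be adjusted with commands along the following lines:

// Retain data in the database for roughly ten years (placeholder name and period)
.alter-merge database MyTelemetryDB policy retention softdelete = 3650d

// Give one table a longer retention period than the rest of the database
.alter-merge table MyLongTermTable policy retention softdelete = 7300d recoverability = enabled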

Backup – Kusto can connect to external tables that are connected to Azure storage accounts or to SQL tables. Continuous export can be added to any Kusto tables so that the externalized data is always up to date. Data can be restored from these externalized sources, or by reingesting directly from them. Alternatively, AI data can be simultaneously streamed into Azure storage accounts, and this data can be ingested into Kusto for restoration.
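As a rough sketch (the export name, source table, external table, and interval are placeholders, and the external table must already exist), a continuous export might look like this:

// Continuously export everything ingested into MySourceTable to an existing external table
.create-or-alter continuous-export MyTelemetryExport
over (MySourceTable)
to table MyExternalTable
with (intervalBetweenRuns = 1h)
<| MySourceTable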

Data limits – The default query limits in Kusto are the same as those found in Application Insights, but here they are soft limits. They can be overridden, and asynchronous operations can be used to circumvent them when necessary. In most cases, however, the data optimization strategies available in Kusto make these limits far less important.
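For example, the default result truncation can be lifted for an individual query with a client request property (the table name and time window below are placeholders):

// Lift the 500,000 row / 64 MB result truncation for this query only
set notruncation;
MyLargeTable
| where TimeGenerated > ago(365d)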

Scale – Kusto clusters can be as small as 1 node (for development – a single node cluster has no SLA), and as large as 1,000 nodes. Each node can be as small as 2 CPUs/14 GB RAM, and as large as 32 CPUs/128 GB RAM. There is no limit to the quantity of data that can be ingested.

Schema – This is where Kusto really shines. Data can be transformed natively at ingestion time using update policies. Custom Application Insights data can be extracted from the dynamic columns into more structured tables, which greatly improves query performance. In addition, materialized views can be created to provide pre-aggregated query targets and further enhance performance.
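As an illustration (the view name and aggregation are hypothetical; the source is the staging table created later in this article), a materialized view might be created like this:

// Pre-aggregate hourly page view counts to speed up dashboard queries
.create materialized-view PageViewsPerHour on table pages_Staging_PageViews
{
    pages_Staging_PageViews
    | summarize Views = count() by Name, bin(TimeGenerated, 1h)
}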

By streaming Application Insights data into Kusto, you can continue to take advantage of the rich data collection capabilities of Application Insights, without being constrained by its storage limitations. In this scenario, AI acts as your telemetry collector, and Kusto your storage engine. The remainder of this article outlines how to do this.

Setting it all up

In our scenario, we are collecting data from three Application Insights tables: pageViews, customMetrics, and customEvents. To capture this data, we will use Diagnostic Settings, which transforms the table names to AppPageViews, AppMetrics, and AppEvents respectively. The entire process is shown below for reference:

Azure Monitor collects data from Application Insights as it arrives through Diagnostic Settings. The data is then sent to an Azure Event Hub, as well as to an Azure Data Lake Gen2 (ADLG2) account for long term storage and recoverability. Azure Data Explorer (Kusto) ingests data directly from the Event Hub in near real time. Event Hub data is transformed and delivered to three staging tables through update policies and functions. In addition, external tables are connected to three containers in the storage account for diagnostic purposes or re-ingestion on demand.

Create an Event Hub and (optionally) a storage account

Data will be streamed continuously to an Event Hub and to an Azure Data Lake Gen2 (ADLG2) account.

The Application Insights instance, the ADLG2 account, and the Event Hub namespace must all exist within the same Azure region. This is a limitation of the Azure Monitor service. The Kusto cluster can exist anywhere.

When creating the storage account, be sure to select the option for “Enable hierarchical namespace” from the Advanced page. This is what distinguishes an ordinary storage account from an ADLG2 account.

Configure Application Insights diagnostic settings

Many Azure services can stream usage data through their “Diagnostic Settings” option. In the case of Application Insights, all the collected data can be streamed. It should be noted however that the table names do not match those within the Application Insights logs; they are the same as those found in the Log Analytics workspace that backs the AI instance. In the example below, we are collecting data from the AppEvents, AppMetrics, and AppPageViews tables (customEvents, customMetrics, and pageViews in AI).

In this case we are sending data to an Event Hub and to an ADLG2 storage account. Each table will store its data in a separate container, and it is not possible to change that container.

Create the Kusto ingestion table and set up ingestion

The data stream to the Event Hub contains records from three different tables with different schemas. To accommodate this, we will create a temporary holding table, set up policies to automatically distribute the data from this table into three tables with different schemas, and then add a retention policy to purge the holding table after distribution.

The holding table to receive Event Hub data will be named Pages_EventHub, and can be created from a Kusto query window using the following command:

.create table Pages_EventHub (records: dynamic)

This will create a table with one column named records which is of the dynamic data type. Event Hub data will land here.

Next, we create an ingestion mapping to match the incoming Event Hub JSON data to the holding table. This can be done from a query window using the following command:

.create table Pages_EventHub ingestion json mapping "RawRecordsMapping"
'['
' {"column": "records", "Properties":{"Path":"$.records"}}'
']'

When we define an ingestion, we will refer to this mapping by the name RawRecordsMapping. This mapping is a property of the holding table, and it will return the records path from the incoming JSON data and place it in the records column of the Pages_EventHub table.
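If you want to confirm that the mapping was created, it can be listed with the following command:

.show table Pages_EventHub ingestion json mappings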

Next, we define the data connection for the ingestion. To define a connection, navigate to your Kusto cluster, open the Databases node, and then open the database that will receive the data. Finally, select Data connections, then Add data connection, and select Event Hub.

Give the connection a name, select the Event Hub namespace and hub, the default consumer group, and no compression. Use the table name and mapping created above and select JSON as the data format. When finished save the data connection.

If data is flowing into the Event Hub, it should begin to appear in the ingestion table within a few minutes; a typical time lag is about 5 minutes.
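A quick way to confirm this is to peek at the holding table directly:

// Inspect a few of the raw Event Hub records that have been ingested
Pages_EventHub
| take 10

Once data is confirmed, it’s time to create the destination tables and update policies.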

Create destination tables and update policies

We want to take data from the Event Hub and “reconstitute” it in Kusto. To do that, we will closely copy the data structure from the Log Analytics workspace that is connected to our Application Insights instance, leaving out some unnecessary system data. In our case, we will create three tables using the following Kusto commands (one at a time) in the Query window:

.create table pages_Staging_PageViews (TenantId: string, TimeGenerated: datetime, Id: string, Name: string, Url: string, DurationMs: real, PerformanceBucket: string, Properties: dynamic, Measurements: dynamic, 
OperationName: string, OperationId: string, ParentId: string, SyntheticSource: string, SessionId: string, UserId: string, UserAuthenticatedId: string, UserAccountId: string, AppVersion: string, AppRoleName: string, AppRoleInstance: string, ClientType: string, ClientModel: string, ClientOS: string, ClientIP: string, ClientCity: string, ClientStateOrProvince: string, ClientCountryOrRegion: string, ClientBrowser: string, ResourceGUID: string) 

.create table pages_Staging_Events (TenantId: string, TimeGenerated: datetime, Name: string, Properties: dynamic, Measurements: dynamic, OperationName: string, OperationId: string, ParentId: string, SyntheticSource: string, SessionId: string, UserId: string, UserAuthenticatedId: string, UserAccountId: string, AppVersion: string, AppRoleName: string, AppRoleInstance: string, ClientType: string, ClientModel: string, ClientOS: string, ClientIP: string, ClientCity: string, ClientStateOrProvince: string, ClientCountryOrRegion: string, ClientBrowser: string, ResourceGUID: string) 

.create table pages_Staging_Metrics (TenantId: string, TimeGenerated: datetime, Name: string, ItemCount: int, Sum: real, Min: real, Max: real, Properties: dynamic, OperationName: string, OperationId: string, ParentId: string, SyntheticSource: string, SessionId: string, UserId: string, UserAuthenticatedId: string, UserAccountId: string, AppVersion: string, AppRoleName: string, AppRoleInstance: string, ClientType: string, ClientModel: string, ClientOS: string, ClientIP: string, ClientCity: string, ClientStateOrProvince: string, ClientCountryOrRegion: string, ClientBrowser: string, ResourceGUID: string)

Next, we construct queries that fit the schemas of these three tables and filter the results for the appropriate type. These queries will then be used to create Kusto functions for each of the three tables. The commands to create the three functions, which contain our queries, can be found below.

.create-or-alter function fn_Pages_PageViewsIngest() {
Pages_EventHub
| mv-expand records
| where records.Type == "AppPageViews"
| project 
TenantId = tostring(records.Properties.TenantId),
TimeGenerated = todatetime(records.['time']),
Id = tostring(records.Id),
Name = tostring(records.Name),
Url = tostring(records.Url),
DurationMs = toreal(records.DurationMs),
PerformanceBucket = tostring(records.PerformanceBucket),
Properties = todynamic(records.Properties),
Measurements = todynamic(records.Measurements),
OperationName = tostring(records.OperationName),
OperationId = tostring(records.OperationId),
ParentId = tostring(records.ParentId),
SyntheticSource = tostring(records.SyntheticSource),
SessionId = tostring(records.SessionId),
UserId = tostring(records.UserId),
UserAuthenticatedId = tostring(records.UserAuthenticatedId),
UserAccountId = tostring(records.UserAccountId),
AppVersion = tostring(records.AppVersion),
AppRoleName = tostring(records.AppRoleName),
AppRoleInstance = tostring(records.AppRoleInstance),
ClientType = tostring(records.ClientType),
ClientModel  = tostring(records.ClientModel), 
ClientOS  = tostring(records.ClientOS), 
ClientIP  = tostring(records.ClientIP),
ClientCity  = tostring(records.ClientCity), 
ClientStateOrProvince  = tostring(records.ClientStateOrProvince), 
ClientCountryOrRegion  = tostring(records.ClientCountryOrRegion), 
ClientBrowser  = tostring(records.ClientBrowser), 
ResourceGUID  = tostring(records.ResourceGUID)
}

.create-or-alter function fn_Pages_EventsIngest() {
Pages_EventHub
| mv-expand records
| where records.Type == "AppEvents"
| project 
TenantId = tostring(records.Properties.TenantId),
TimeGenerated = todatetime(records.['time']),
Name = tostring(records.Name),
Properties = todynamic(records.Properties),
Measurements = todynamic(records.Measurements),
OperationName = tostring(records.OperationName),
OperationId = tostring(records.OperationId),
ParentId = tostring(records.ParentId),
SyntheticSource = tostring(records.SyntheticSource),
SessionId = tostring(records.SessionId),
UserId = tostring(records.UserId),
UserAuthenticatedId = tostring(records.UserAuthenticatedId),
UserAccountId = tostring(records.UserAccountId),
AppVersion = tostring(records.AppVersion),
AppRoleName = tostring(records.AppRoleName),
AppRoleInstance = tostring(records.AppRoleInstance),
ClientType = tostring(records.ClientType),
ClientModel  = tostring(records.ClientModel), 
ClientOS  = tostring(records.ClientOS), 
ClientIP  = tostring(records.ClientIP),
ClientCity  = tostring(records.ClientCity), 
ClientStateOrProvince  = tostring(records.ClientStateOrProvince), 
ClientCountryOrRegion  = tostring(records.ClientCountryOrRegion), 
ClientBrowser  = tostring(records.ClientBrowser), 
ResourceGUID  = tostring(records.ResourceGUID)
}

.create-or-alter function fn_Pages_MetricsIngest() {
Pages_EventHub
| mv-expand records
| where records.Type == "AppMetrics"
| project 
TenantId = tostring(records.Properties.TenantId),
TimeGenerated = todatetime(records.['time']),
Name = tostring(records.Name),
ItemCount = toint(records.ItemCount),
Sum = toreal(records.sum),
Min = toreal(records.Min),
Max = toreal(records.Max),
Properties = todynamic(records.Properties),
OperationName = tostring(records.OperationName),
OperationId = tostring(records.OperationId),
ParentId = tostring(records.ParentId),
SyntheticSource = tostring(records.SyntheticSource),
SessionId = tostring(records.SessionId),
UserId = tostring(records.UserId),
UserAuthenticatedId = tostring(records.UserAuthenticatedId),
UserAccountId = tostring(records.UserAccountId),
AppVersion = tostring(records.AppVersion),
AppRoleName = tostring(records.AppRoleName),
AppRoleInstance = tostring(records.AppRoleInstance),
ClientType = tostring(records.ClientType),
ClientModel  = tostring(records.ClientModel), 
ClientOS  = tostring(records.ClientOS), 
ClientIP  = tostring(records.ClientIP),
ClientCity  = tostring(records.ClientCity), 
ClientStateOrProvince  = tostring(records.ClientStateOrProvince), 
ClientCountryOrRegion  = tostring(records.ClientCountryOrRegion), 
ClientBrowser  = tostring(records.ClientBrowser), 
ResourceGUID  = tostring(records.ResourceGUID)
}

With the three functions in place, we need to create an update policy that will use the results of a function to load a table whenever data is added to the holding table. For our pages_Staging_PageViews table, we run the following command to create the policy.

.alter table [@"pages_Staging_PageViews"] policy update @'[{"Source": "Pages_EventHub", "Query": "fn_Pages_PageViewsIngest()", "IsEnabled": "True", "IsTransactional": true}]'

This command adds an update policy to the pages_Staging_PageViews table. This update policy will be invoked whenever data is added to the Pages_EventHub table. It will execute the fn_Pages_PageViewsIngest function created above against this new data and load the result into the pages_Staging_PageViews table. The function itself filters out all data that did not originate from the original AppPageViews table and transforms the remainder to match the destination schema.

The commands for creating the policies on the other two tables are below:

.alter table [@"pages_Staging_Events"] policy update @'[{"Source": "Pages_EventHub", "Query": "fn_Pages_EventsIngest()", "IsEnabled": "True", "IsTransactional": true}]'

.alter table [@"pages_Staging_Metrics"] policy update @'[{"Source": "Pages_EventHub", "Query": "fn_Pages_MetricsIngest()", "IsEnabled": "True", "IsTransactional": true}]'
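Before relying on the policies, it can be helpful to confirm that they are attached and that the functions return the expected shape. For example (run one at a time):

// Confirm that the update policy is attached to a destination table
.show table pages_Staging_PageViews policy update

// Preview what the ingestion function produces from data currently in the holding table
fn_Pages_PageViewsIngest()
| take 10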

The last step is to add a retention policy to the Pages_EventHub table that will remove data automatically after it has been processed. This is an optional step and can be done at any point to conserve resources. A retention policy will remove ingested data after a defined time. Setting the period to 0 will delete the data shortly after all update policies have completed.

In this case the policy is added to the holding table by running the following command:

.alter-merge table Pages_EventHub policy retention softdelete = 0d recoverability = disabled

At this point, data should be flowing into the three destination tables shortly after it arrives through the event hub.
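A simple query such as the following (with a time window of your choosing) can confirm that the pipeline is running end to end:

// Count recent page view rows by hour to verify that data is arriving
pages_Staging_PageViews
| where TimeGenerated > ago(1d)
| summarize Rows = count() by bin(TimeGenerated, 1h)
| order by TimeGenerated asc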

Connect external tables to the ADLG2 data (optional)

Earlier, we selected both an event hub and a storage account to receive data from Application Insights. The reason for the storage container is to provide an authoritative source of persisted data. Data in Application Insights expires by default after 90 days and cannot be retained any longer than 2 years. Data in Kusto can be persisted for an unlimited period, but it too can be configured to expire after a period of time. Storing the data in a storage account ensures permanency, and provides a location to re-ingest from should any disaster befall the Kusto data.

Kusto can be connected to external data sources as external tables. These sources can be Azure storage accounts or SQL databases. While not strictly required, it is a good idea to create external tables connected to this data so that it can be queried and re-ingested with relative ease whenever necessary.

Connecting Kusto to ADLG2 storage is a two-step process. First you create a shared access signature, and then you create an external table in Kusto using that signature. A shared access signature can be created for the entire account, a container, or even a folder. Since we will be connecting to three different containers, we will create the signature at the account level. To do this, navigate to the storage account in Azure, and then select Shared access signature in the Security + networking section. Select Blob and File from Allowed services, and then Container and Object from Allowed resource types. Set an expiry date applicable to your situation. The external tables will stop working once the expiry date is exceeded.

When ready, click the Generate SAS and connection string button, and the screen will appear as follows:

Make note of the Blob service SAS URL – it will be needed in the next step. It’s also a good idea to record these settings, as it’s not possible to go back and retrieve them later.

Capturing the three tables above to ADLG2 creates the following three containers in the storage account:

  • insights-logs-appevents
  • insights-logs-appmetrics
  • insights-logs-apppageviews

When creating the external tables below, the Blob service SAS URL values need to be modified to include these containers by adding them before the token in the URL. Therefore:

https://mystorageaccount.blob.core.windows.net/?sv=2021-06-08&……. becomes

https://mystorageaccount.blob.core.windows.net/insights-logs-appevents?sv=2021-06-08&……. and so on.

To create the external tables in Kusto, navigate to a Kusto query window that is connected to the appropriate database. The following commands can be used to create the tables, substituting the sample URLs with the ones from above:

.create-or-alter external table Pages_AppEvents_Ext (['time']:datetime,resourceId:string,ResourceGUID:guid,Type:string,ClientBrowser:string,ClientCity:string,ClientCountryOrRegion:string,ClientIP:string,ClientModel:string,ClientOS:string,ClientStateOrProvince:string,ClientType:string,IKey:guid,_BilledSize:int,OperationName:string,OperationId:guid,ParentId:guid,SDKVersion:string,SessionId:string,UserAccountId:string,UserAuthenticatedId:string,UserId:string,Properties:dynamic,Name:string,ItemCount:int) 
kind=blob 
dataformat=json
( 
   h@'https://mystorageaccount.blob.core.windows.net/insights-logs-appevents?******' 
)

.create-or-alter external table Pages_AppMetrics_Ext (['time']:datetime,resourceId:string,ResourceGUID:guid,Type:string,ClientBrowser:string,ClientCity:string,ClientCountryOrRegion:string,ClientIP:string,ClientModel:string,ClientOS:string,ClientStateOrProvince:string,ClientType:string,IKey:guid,_BilledSize:int,OperationName:string,OperationId:guid,ParentId:guid,SDKVersion:string,SessionId:string,UserAccountId:string,UserAuthenticatedId:string,UserId:string,Properties:dynamic,Name:string,Sum:int,Min:int,Max:int,ItemCount:int) 
kind=blob 
dataformat=json 
( 
    h@'https://mystorageaccount.blob.core.windows.net/insights-logs-appmetrics?******'
)

.create-or-alter external table Pages_AppPageViews_Ext (['time']:datetime,resourceId:string,ResourceGUID:guid,Type:string,ClientBrowser:string,ClientCity:string,ClientCountryOrRegion:string,ClientIP:string,ClientModel:string,ClientOS:string,ClientStateOrProvince:string,ClientType:string,IKey:guid,_BilledSize:int,OperationName:string,OperationId:guid,ParentId:guid,SDKVersion:string,SessionId:string,UserAccountId:string,UserAuthenticatedId:string,UserId:string,Properties:dynamic,Measurements:dynamic,Id:guid,Name:string,Url:string,DurationMs:int,PerformanceBucket:string,ItemCount:int) 
kind=blob 
dataformat=json 
( 
    h@'https://mystorageaccount.blob.core.windows.net/insights-logs-apppageviews?******'
)

Once created, the external tables can be queried like any other table. They can be used for data validation or reingestion as appropriate.
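For example, a day of exported page view data could be validated with a query like the following (the dates are placeholders):

// Validate the externalized page view data for a single day
external_table("Pages_AppPageViews_Ext")
| where ['time'] between (datetime(2023-01-01) .. datetime(2023-01-02))
| summarize Rows = count() by bin(['time'], 1h)

Re-ingestion can be approached in the same way, by wrapping a query like this in a .set-or-append command and projecting the columns to match the destination schema.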

In Conclusion

Once the data is flowing, subsequent tables and update policies can be set up to further transform the data, and materialized views can be created to optimize query performance. Moving Application Insights data into Kusto gives you the best of both worlds: the telemetry collection capabilities of Application Insights, and the big data power of Kusto. This approach is not limited to Application Insights either – it can be used with any Azure service that supports Azure Monitor Diagnostic Settings.

Be aware however that this migration is a one-way street. Once the data is in Kusto, it can be retained for as long as you like, but it can’t be put back into the source.


Connect Power BI to Azure Monitor data using Direct Query with Azure Data Explorer

Man – that’s a boring title. But it’s accurate.

A few months ago, I posted an article outlining how to connect Power BI to Azure Application Insights and Azure Log Analytics (jointly referred to as Azure Monitor) with Direct Query. That article describes an approach that allows you to use a native Kusto connector to connect to the Azure Monitor instance as if it were an ADX cluster. This allows for Direct Query to be used, among other things. The option for connecting Power BI that is available through the Azure Monitor UI uses a web connector to query the respective APIs, and that connector doesn’t support Direct Query.

The problem with using this connector is that it’s a bit of a hack. At the time it was written, you needed to use the old Power BI driver for Kusto to make it work, and that approach isn’t simple. Over time, it stopped working altogether for Application Insights. The ADX connector has since been updated to support connection to Azure Log Analytics (but not Application Insights) and is therefore still valid.

There is however another way to achieve this by using your own ADX cluster. ADX clusters allow for “cross-cluster queries” that permit tables in a database in one cluster to be joined or unioned with tables in a completely different cluster. The same proxy addresses mentioned above can be used in one of these cross-cluster queries, and in this way we simply use the ADX cluster as an intermediary.

Everything that you need to know about this approach can be found in the support article “Query data in Azure Monitor using Azure Data Explorer”.

To create a Power BI report that queries Azure Monitor data using Direct Query, first create a new report, and connect to data using the “Azure Data Explorer (Kusto) connector”. Enter the address of the cluster, and the name of a database within that cluster. The database itself doesn’t matter; it simply provides a scope for the query. Finally, you need to specify the query, and this is where the cross-cluster query comes into the picture. The query takes the following form:

cluster('ProxyURL').database('ResourceName').TableName

The Proxy URLs differ between Log Analytics and Application Insights. The two take the following forms:

Log Analytics:

https://ade.loganalytics.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.operationalinsights/workspaces/<workspace-name>

Application Insights:

https://ade.applicationinsights.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.insights/components/<ai-app-name>

The cross-cluster query for the table named “pageViews” in an Application Insights instance named “WhitePagesLogs” in a Resource group named “MyResourceGroup” in the subscription “71a90792-474e-5e49-ab4e-da54baa26d5d” is therefore:

cluster('https://ade.applicationinsights.io/subscriptions/71a90792-474e-5e49-ab4e-da54baa26d5d/resourcegroups/MyResourceGroup/providers/microsoft.insights/components/WhitePagesLogs').database('WhitePagesLogs').pageViews

It is worth explicitly noting that the resource name appears twice in the query – once in the cluster address, and as the database name.

When ready, the Get data dialog box should appear as follows:

If you want to use Direct Query, don’t forget to open the Advanced Options section, and select it here.

At this point, the report can be built, and it will behave as if it were a normal ADX cluster. You can of course build more complex queries, etc., but you cannot create things like functions or materialized views, since you do not have administrative access to the engine behind Azure Monitor.

Compared to using the Power BI ADX connector directly, this approach has the advantage of being explicitly supported, and it also works with both Application Insights and Log Analytics. On the downside, there is a cost to running your own ADX cluster, although it is minimal. The cluster is simply acting as a gateway in this case, and therefore a bare minimum of resources will suffice.


Using Power BI Incremental Refresh with Azure Data Explorer

Azure Data Explorer (ADX) is a great platform for storing large amounts of transactional data. The Incremental Refresh feature (now available for Pro users!) in Power BI makes it much faster to keep data models based on that data current. Unfortunately, if you follow the standard guidance from Microsoft for configuring Incremental Refresh, you’ll quickly bump into a roadblock. Luckily, it’s not that difficult to get around.

Incremental Refresh works by setting up data partitions in the dataset in the service. These partitions are based on time slices. Once data has been loaded into the dataset, only the data in the most recent partition is refreshed.

To set this up in Power BI Desktop, you need to configure two parameters: RangeStart and RangeEnd. These two parameters must be set as Date/Time parameters. Once set, the parameters are used to filter the Date/Time columns in your tables accordingly, and, once published to the service, to define the partitions that the data is loaded into.

When Power Query connects to ADX, all Date/Time fields come in as the Date/Time/Timezone type. This is a bit of a problem. When you use the column filters to filter your dates, the two range parameters won’t show up because they are of a different type (Date/Time). Well, that’s not a big problem, right? Power Query lets us change the data column type simply by selecting the type picker on the column header.

Indeed, doing this does in fact allow you to use your range parameters in the column filters. Unfortunately, data type conversions don’t get folded back to the source ADX query. You can see this by right-clicking on a subsequent step in the Power Query editor. The “View Native Query” option is greyed out, which indicates that the query cannot be folded.

Query folding is critical to incremental refresh. Without it, the entirety of the data is brought locally so that it can be filtered, versus having the filter occur at the data source. This would completely defeat the purpose of implementing Incremental Refresh in the first place.

The good news is that you can in fact filter a Date/Time/Timezone column with a Date/Time parameter, but the Power Query user interface doesn’t know that. The solution is to simply remove the type conversion Power Query step AFTER performing the filter in the Power Query UI.

Alternatively, if you’re comfortable with the M language, you can simply insert something like the following line using the Advanced Editor in Power Query (where CreatedLocal is the name of the column being filtered).

#"Filtered Rows" = Table.SelectRows(Source, each [CreatedLocal] >= RangeStart and [CreatedLocal] < RangeEnd),

If the filtration step can be folded back into the source, Incremental Refresh should work properly. You can continue setting up Incremental Refresh using the DAX editor. You will likely see some warning messages indicating that folding can’t be detected, but these can safely be ignored.


Connect to Application Insights and Log Analytics with Direct Query in Power BI

Application Insights (AI) and Log Analytics (LA) from Microsoft Azure provide easy and inexpensive ways to instrument applications. Using just an instrumentation key, any application can send operational data to AI which can then provide a rich array of tools to monitor the operation of the application. In fact, the blog that you are reading uses an Application Insights plugin for WordPress that registers each view of a page into an instance of AI in my Azure tenant.

Application Insights data can be queried directly in the Azure portal to provide rich insights. In addition, the data can be exported to Excel for further analysis, or it can be queried using Power Query in either Excel or Power BI. The procedure for using Power Query can be found in this article. That approach uses the Web connector in Power Query, which can be refreshed automatically on a regular basis. The Web connector does not, however, support Direct Query, so the latency of the data in this scenario will be limited by the refresh schedule configured in Power BI. Any features that depend on Direct Query (Aggregations, Automatic Page Refresh) will also not work.

If you’ve worked with AI or LA, and dropped down to the Query editor, you’ve been exposed to KQL – the Kusto Query Language. This is the language used by Azure Data Explorer (ADX), known by its code name, “Kusto”. This is of course not a coincidence, as the Kusto engine powers both AI and LA.

Power BI contains a native connector for ADX, and you can configure an ADX cluster for yourself, populate it, and work with it in Power BI for both imported and Direct Query datasets. Given that ADX is what powers AI and LA, it should be possible to use this connector to query the data in AI and LA. It turns out that the introduction of a new feature known as the ADX proxy allows us to do just that.

The ADX proxy is designed to allow the ADX user interface to connect to instances of AI and LA and run queries from the same screens as native ADX clusters. The entire process is described in the document Query data in Azure Monitor using Azure Data Explorer. The document explains the process, but what we are particularly interested in is the syntax used to express an AI or LA instance as an ADX cluster. Multiple variations are described in the document, but the ones that we are most interested in are here:

For LA: https://ade.loganalytics.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.operationalinsights/workspaces/<workspace-name>

For AI: https://ade.applicationinsights.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.insights/components/<ai-app-name>

By substituting in your subscription ID, resource group name, and resource name, you can treat these resources as if they were ADX clusters, and query them in Power BI using Direct Query. As an example, a simple query on this blog can be formed using the ADX connector:
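The query takes roughly the following shape (the identifiers are placeholders, and the aggregation is just for illustration):

cluster('https://ade.applicationinsights.io/subscriptions/<subscription-id>/resourcegroups/<resource-group-name>/providers/microsoft.insights/components/<ai-app-name>').database('<ai-app-name>').pageViews
| summarize Views = count() by bin(timestamp, 1d)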

And the result will appear as:

The precise query is provided in the query section of the connector above.

Once the report is built, it can be deployed to the Power BI service and refreshed using AAD credentials.

It is important to note that this method does NOT require you to configure an ADX cluster of your own. We are simply utilizing the cluster provided to all instances of AI and LA. We therefore do not have any control over performance levels, as we would have in a full ADX cluster. However, if the performance is adequate (and the queries are designed appropriately), this can be a good approach to working with AI and LA data that has low latency (near real time) requirements.
