Skip to content

Tag: Excel

It’s time to stop using Power Pivot

Excel is an excellent tool for analyzing data. An analyst can easily connect to and import data, perform analyses, and achieve results quickly. Export to Excel is still one of the most used features of any Business Intelligence tool on the market. The demand for “self-service BI” resulted in a lot of imported data being stored in overly large Excel files. This posed several problems. IT administrators had to deal with storage requirements. Analysts were restricted by the amount of data they could work with, and the proliferation of these “spreadmarts” storing potentially sensitive data created a governance nightmare.

A little history

Power Pivot was created to provide a self-service BI tool that solved these problems. Initially released as an add-in for Excel 2010, it contained a new analytical engine that would soon be introduced to SQL Server Analysis Services as well. Its columnar compression meant that millions of rows of data could be analyzed in Excel and would not require massive amounts of space to store. Data in Power Pivot is read-only and refreshable – ensuring integrity. It allowed analysts to set up their own analytical data sets and analyze them using a familiar looking language (DAX), and visual reporting canvas (PowerView) all from within Excel.

The original version of Power BI brought PowerPivot to Office 365 through Excel before Power BI’s relaunch gave it its own consumption interface (the service) and design client (Power BI Desktop). Both the PowerPivot engine, and Power Query were incorporated into the service and Power BI Desktop, while the Silverlight based Power View was replaced with a more web friendly reporting canvas.

Excel support

Throughout all these changes, Excel has continued to be well supported in the Power BI service. Analyze in Excel allows an analyst to connect to a deployed Power BI dataset (built with Power BI Desktop) and analyze it using pivot tables, charts, etc. Recent “connect to dataset” features have made this even simpler. Organizational Data Types allow Excel data to be decorated with related data in Power BI.

Excel workbooks containing Power Pivot models have always been supported by the service. These models can even be refreshed on a regular basis. If the source data resides on premises, it can even be refreshed through the on-premises data gateway. This all because the data engine in Power BI is essentially Power Pivot.

It’s that word “essentially” that causes a problem.

Datasets that are created and stored within Excel workbooks are functional but can only be accessed by that workbook. Contrast this with a dataset created by Power BI Desktop, which can be accessed by other interactive (pbix) reports, paginated reports, and as mentioned above, by Excel itself. The XMLA endpoint also allows these reports to be accessed by a myriad of third part products. None of this is true for datasets created and stored in Excel.

So why would anyone continue to create models in Excel. The reason has been until now that although Excel can connect to Power BI datasets to perform analysis, those connected workbooks would not be updated when the source dataset changes. This meant that those analysts that really care about Excel needed to work with the Excel created models. This changed recently with an announcement at Microsoft Ignite Spring 2021. In the session Drive a data Culture with Power BI: Vision, Strategy and Roadmap it was announced that very soon, Excel files connected to Power BI datasets will be automatically updated. This removes the last technical reason to continue to use Power Pivot in Excel.

Tooling

Building a dataset with Power BI Desktop is fundamentally the same as building one with Excel. The two core languages and engines (M with Power Query, and DAX with Power Pivot) are equivalent between the two products. The only difference is that the engine versions found in Excel tend to lag those found in Power BI Desktop and the Power BI service itself. I’d argue that the interfaces for performing these transforms, and building the models are far superior in Power BI Desktop. not to mention the third-party add-in capability.

In this “new world” of Excel data analysis, Datasets will be created by using Power BI Desktop, deployed to the service, and then Excel will connect to them to provide deep analysis. These workbooks can then be published to the Power BI service alongside and other interactive or paginated reports for use by analysts. With this new capability, Excel truly resumes its place as a full-fledged first-class citizen in the Power BI space.

What to use when

With this change, the decision of what tool to use can be based completely on its suitability to task, and not on technical limitations. There are distinct types of reports, and different sorts of users. The choice of what to use when can now be based completely on these factors. The common element among them all is the dataset.

With respect to report usage, typical usage can be seen below.

ToolUsed byPurpose
Power BI ServiceReport consumersConsuming all types of reports: interactive, paginated and Excel
Excel OnlineReport consumersConsuming Excel reports from SharePoint, Teams, or the Power BI service
Power BI DesktopModel builders
Interactive report designers
Building Power BI dataset
Building interactive reports
Power BI Report BuilderPaginated report designersBuilding paginated reports
ExcelAnalystsBuilding Excel reports
Analyzing Power BI datasets

Making the move

Moving away from Power Pivot won’t require any new services or infrastructure, and existing reports and models don’t need to be converted. They will continue to work and be supported for the foreseeable future. Microsoft has neither said not indicated that Power Pivot in Excel is going anywhere. However, by building your new datasets in Power BI Desktop, you will be better positioned moving forward.

If you do want to migrate some or all your existing Excel based Power Pivot datasets, it’s a simple matter of importing the Excel file into Power BI Desktop. This is completely different than connecting to an Excel file as a data source. From the File menu in Power BI Desktop, select Import, then select Power Query, Power Pivot, Power View. You will then select the Excel file that contains your dataset.

Power BI will then import all your Power Query queries, your Power Pivot dataset, and if you have any it will convert PowerView reports to the Power BI report types. The new report can then replace your existing Excel file. Once deployed to the Power BI service, other Excel files can connect to it if so desired.

Building your datasets with Power BI Desktop allows you to take advantage of a rich set of services, across a broad range of products, including Excel. Building them in Excel locks you into an Excel only scenario. If you already use Power BI, then there’s really no reason to continue to build Power Pivot datasets in Excel.

6 Comments

Using Excel Files with Power BI Desktop and SharePoint

As discussed in a previous post, Working with Excel Files in the Power BI Service, Excel and Power BI have a rich, complex relationship. Power BI Desktop is the primary design tool for Power BI, and it has many feature overlaps with Excel as an analytic tool. Excel can be used both as an analytic tool and a data source, and the structure of the Excel file will dictate the way that Power BI Desktop can be used with it. If Excel is being used as an analytic tool (i.e. connected to data), the appropriate items in the file can be imported into Power BI Desktop. If it is being used as a data source (data in worksheets), Power BI Desktop will connect to it, and use its data to build a model. This post attempts to articulate the nuances of both scenarios.

Importing from Excel

Power BI Desktop has an unfortunate name in my opinion. It is a design tool and is not meant to replicate the capabilities of the Power BI service on the desktop, as the name might suggest. A better name for the product would I believe be Power BI Designer. Its purpose is to connect to and transform data (Power Query), build data models for Analysis (Power Pivot) and build reports (report designer).

Used as an analysis tool, Excel has all these capabilities as well. In fact, the first two (Power Query and Power Pivot) are identical to what is already in Power BI Desktop. Excel also has Power View for analytic reporting. Power View is very similar to the type of reporting in Power BI Desktop, but uses a different technology and has been deprecated for some time. As a result, Excel charts and pivot tables are the primary means of visualizing data in Excel.

So why would you need to use Power BI Desktop if you are using Excel? As explained in my previous post, the Power BI service can fully interact with Excel as an analysis tool, and allows you to interact with Excel right from the Power BI Service. If Excel is meeting all your analytics needs, then there may be no need to introduce Power BI Desktop at all. However, if you wish to take advantage of Power BI’s analytic reporting capabilities, and you have existing Excel assets, you may wish to convert them to the native Power BI format.  Whatever the reason, moving from Excel to Power BI is relatively straightforward with Power BI Desktop.

From the File menu in Power BI Desktop, select Import, and then Excel Workbook Contents.

Importing Excel Files contents

You are then prompted to select an Excel file. Once selected, you are then presented with a warning dialog.

Excel files contents warning dialog

The dialog does a very good job of explaining what will happen, specifically the fact that data from workbooks will not be brought into this new file. Any Power Pivot data models or Power Queries will be brought in. If the workbook contains legacy Power View sheets, they will be converted to native Power BI visuals. In addition, any legacy (non-Power Query) data connections used by the source file’s Power Pivot data model will be converted to Power Query and imported.

Imported Power View sheet

Legacy Power View Sheet converted to Power BI visuals

A complete list of workbook content and what is/isn’t converted is below:

Excel Content Import to PBI Desktop Support
Data in sheets Not imported
Data model (Power Pivot) Imported
Data connections Converted to Power Query and imported
Power Queries Imported
Power View Sheets Converted to PBI visuals and imported
Pivot charts/tables Not imported
Excel charts Not imported
Macros Not imported

Once imported, the new Power BI file (PBIX) lives on its own and contains no connection or any other type of relationship to the original source Excel file. If the source Excel file is changed, there is no way to update the PBIX file. Any imported data connections are between the PBIX file and the original data source. The new PBIX file can be published to the Power BI service like any other.

Connecting to Excel

Connecting to Excel as a data source is a very different thing than importing from it. In this scenario, the data in the worksheets and only the data in the worksheets is brought into the data model. The is very different behaviour than that of connecting to Excel files to the Power BI Service, where both the model and the worksheet data is brought in.

Using the Excel Connector

The easiest, and most obvious way to connect to Excel worksheet data is by using the Excel connector. From the ribbon in Power BI Desktop, select Get Data. The Excel connector is right at the top of the list.

Connecting to Excel files on the file system

Selecting it allows you select your source file, and then the workbooks within it, and then build out the data model.

This approach works well but carries with it an important limitation. The new queries are  connected to the file using a local file system. This means that to be refreshed, an on-premises data gateway is required. In order to eliminate the gateway requirement, you can connect to the file in SharePoint using the SharePoint folder connector.

Using the SharePoint Folder Connector

The SharePoint Folder connects to all the files stored in libraries of a SharePoint site. It allows you to report on file metadata, but it also allows you to drill into file contents.

From Power BI Desktop select Get Data but instead of selecting Excel, Search for SharePoint and select SharePoint folder.

Using the SharePoint folder connector

Once selected, enter the URL of the SharePoint site (NOT the URL of a library or folder) in the dialog box.

Next, you will be presented with a preview of all the files in your site. Unless you are only interested in file metadata, click on the Edit button to bring up the Power Query editor.

The initial view will contain all the files in the site, but we are interested in the content of just one of those files. Every file in this view will contain the hyperlinked value “Binary” in the Content column. Clicking that link for the file that you want to connect to will drill down into the contents.

Site contents using the SharePoint folder connector

From this level, you can build your Power Query, data model, and report as needed just as if you had used the Excel connector. The difference is that now when you publish your report to Power BI, it will know the file is stored in SharePoint and will connect directly to it. It will not require a gateway for refresh purposes. Once credentials are registered, the report will refresh itself directly from the workbook stored in SharePoint.

XLS vs XLSX

A note of caution. The above SharePoint folder approach only works for XLSX files. The Power BI Desktop and the Power BI service both support both Excel file formats (XLS and XLSX). However, refresh does not. If the source file format is XLS, and a refresh is attempted, you will receive the classic “microsoft.ace.oledb.12.0 provider” error in the Power BI service.

Excel files refresh error with XLS file types

The older Excel file format (XLS) requires an Access driver to refresh, which is not a part of the Power BI service. The newer XLSX file does not require this driver. As a result, if the source file is XLS, refreshing it requires going through an On-Premises Data gateway, and that gateway machine must also have the ACE components installed.

To recap, you can bring Excel assets into Power BI Desktop by using the import function, and you can load data from Excel files through Power Query. The two operations have very different results, and the can be combined if a source workbook contains both analyses and data.

5 Comments

Working with Excel files in the Power BI Service

Power BI has been able to work with Excel files since it was first introduced. Indeed, it was born from the analytic capabilities in Excel. Users can connect directly to Excel files by using the Power BI service and nothing but a browser. However, depending on the content of the Excel file, and the method of connecting, the resulting products can be very different. In this post I will attempt to clarify this behavior. A subsequent post will detail the options available when working with Excel files in Power BI Desktop.

File Structure

Excel is a multi-purpose tool. It contains all the building blocks of Power BI, and as such, it is an excellent Business Intelligence client. Excel files are also often used (much to my chagrin) as a data storage container, or as a data transport medium. Understanding how the file is structured, and what you want to do with it is key to making the right choice when combining it with Power BI.

Originally Excel files (workbooks) were collections of worksheets. Analysts could import data into those worksheets and then analyze them with the tools that Excel provided. Although Excel was never intended to be a database, it’s ease of use and familiarity led many people to begin using it that was, and “spreadmarts” (spreadsheet data marts) quickly became a problem. The problems arose because the instant data was extracted from a source it became stale, and the fact that it was being stored in worksheets meant that it could be edited (changing history) and became subject to the data size limitations of a worksheet.

To take advantage of Excel’s analytic capabilities without being subject to the issues involved in worksheet data storage, the data model was introduced, initially through PowerPivot. The data model is a “miniaturized” version of the SQL Server Analysis Services tabular engine that runs in Excel. This data model is read only, refreshable, and highly compressed which importantly means that its only data limitation is the amount of available memory available on the machine running it. Importantly, this engine is the same engine that is used by Power BI – the advantages of which we’ll explore shortly.

Excel of course still needs to be able to use worksheets and be Excel, so we can’t just remove the worksheet capability (which incidentally is effectively what Power BI Desktop is – Excel without worksheets). Therefore, today from a data perspective, Excel files can have data in the data model, worksheets or both. From the Power BI service perspective, the important thing is whether the file contains a data model, as it treats the two cases differently.

Getting Excel Data

From the Power BI service, you click the Get Data button, and then the Get button in the Files tile. You are then presented with one of two dialogs depending on whether you are using a personal workspace, or an app workspace.

Personal workspace

Importing files into a Power BI Personal Workspace

Connecting file-based data to a personal workspace

When importing into a personal workspace, there are 4 possible data sources.

A local file is one that is stored on a file system local to the machine being used. Selecting this option will allow you to work with the Excel file stored in that location, but if the file is being used as a data source (data is in the worksheets), then a Data Gateway will be required for any data refreshes. Power BI will also connect to a file stored in OneDrive, either Personal or Business (through office 365). Finally, the service can work with files stored in any accessible SharePoint site (not simply Team sites as the name would indicate).

App workspace

Importing files into a Power BI App Workspace

Connecting file-based data to an App workspace

When importing into an App workspace, there are 3 possible data sources. The Local File and SharePoint – Team Sites options are precisely the same as when importing into a personal workspace. The difference is the OneDrive – Workspace name option replaces the two other OneDrive options. Choosing this option allows you to work with files stored in the “Group OneDrive”. Since every App workspace is backed by an Office 365 or “Modern” group, it also has access to the SharePoint site for that group. The “Group OneDrive” is the Documents library within that SharePoint site. Therefore, choosing SharePoint – TeamSites and navigating to the Documents library will render the same results in a few more mouse clicks, but also give access to all other document libraries within that site.

Connect vs Import

Once you navigate to the Excel file that you want to work with, you select it, and click connect. You will then be presented with two options for the file, Import or Connect.

This choice dictates how the file is brought into the Power BI service. The structure of the file determines exactly what is brought in to the service in both cases.

Connect

Clicking the Connect button allows Power BI to connect to and work with the Excel file in place. The workbook is displayed as an Excel workbook in full fidelity in the Power BI interface using Excel Online. The file itself is shown in the Workbooks section in the Power BI interface, and it stands alone from other Power BI elements (except that regions of it can be pinned to a dashboard). Connecting to an Excel report will not create a Power BI Dataset, Report, or Dashboard. All operations, including refresh (see below) are controlled through the workbook.

At no point is the file moved, or “brought in” to the Power BI service. If the file is being stored in SharePoint, or OneDrive, anything done to the file in the Power BI service will be visible to anyone with access to the file itself, whether they are a Power BI user or not. This includes refresh, which will be discussed further below, but the important part to remember here is that if the data in the connected file is refreshed through the Power BI service, and it is being stored in SharePoint (or OneDrive), all users will be able to see updated data the next time that they open the file.

Connecting to an Excel file behaves the same way whether the file contains a data model or not, but the file must contain a data model in order to be refreshed by the Power BI service.

Connected Excel file within Power BI

Connected Excel file within Power BI

Import

Importing an Excel file behaves totally differently from connecting to it. When an Excel file is imported, it is treated as a data source to Power BI, and the assets within that file are brought into the Power BI service. Subsequent changes to the source file are not immediately reflected within the Power BI service, but are retrieved through the refresh process.

The way that the assets are brought into the service depends very much on the structure of the file, specifically whether it contains a data model or not. If the file does not contain a data model, then Power BI will use the data contained in the Excel worksheets to construct a new one. This is similar to what happens when a CSV file is imported into the service. If the file does contain a data model, then the worksheet data is imported, and that data model is brought into the service as-is. One important exception to this is if worksheet data uses the same query as an existing model, the worksheet data is ignored, and the data model is brought in as-is. This is important because Excel’s Power Pivot editor can be used to edit the model, creating calculated columns, calculated measures and relationships prior to import. The model that is automatically created when the file does not contain a model has no editing capabilities.

When an Excel file with a data model is imported, the data model (imported or created) is added to datasets, and a link to the dataset is added to the default dashboard for the workspace. If no default dashboard exists, one will be created. A report can then be authored in the service. If the workbook contains any PowerView reports, these will be converted to native Power BI reports and added to the service as well. Any embedded 3D maps are not brought in.

Imported Excel File showing calculated measures

Imported Excel File showing calculated measures

Refresh

Data refresh options, and behavior depend on both the Get Data choice (connect or import) and the structure of the Excel file.

Connected Workbooks

If the workbook is connected to the service, and it does not contain a data model, it cannot be refreshed. This is true even if the worksheets in the workbook contain data from Power Query queries. This is the only scenario that does not support refresh in any way.

If the workbook contains a data model refresh is supported. The interesting part is that refresh will be triggered not only for the data model itself, but for any worksheets that have Power Queries as a data source. Therefore, a workaround to the lack of refresh support for a worksheet with no data model is to add a blank data model.

For refresh to work, the data source must be available to the Power BI service. This means that the source must be available in the cloud or registered on an available On-Premises Data Gateway.

The important thing to note about connected workbooks is that the refresh options that are performed on them are permanent – refreshed data is stored with the workbook. This means that if the connected workbook is stored in SharePoint, or shared through OneDrive, updated data is available to all users with access regardless of whether they are Power BI users.

Imported Workbooks

Refresh options for imported workbooks are slightly more complicated. As mentioned above data is either imported from the worksheets, a data model imported into the service or both.

If data was imported from worksheets, then the Excel file is the data source from the standpoint of Power BI. If the file is stored in SharePoint or OneDrive, it will automatically be refreshed every hour by default. This means that changes to the underlying Excel file will be reflected back in the Power BI service within an hour. This feature can be disabled, but it is not possible to change to hourly schedule, nor precisely when it will occur.

Refresh options for workbooks in OneDrive/SharePoint

Refresh options for workbooks in OneDrive/SharePoint

If the file is stored on a file system, it can be scheduled more granularly, but you will need to connect to it through an On-Premises Data Gateway.

If the file contained a data model that was imported into the service, then the original source of data for that data model (the query) is what the Power BI service will refresh from (NOT the Excel file itself). In this case, the refresh options are the same as with most other Power BI data sources – Excel is taken out of the picture completely, and any changes to the source Excel file will not be reflected into the service. The exception to this is if the file had both a data model, and worksheet data that was imported.

In the case of an Excel with both a data model and worksheet data, both cases above will apply. The workbook is used as a data source for the table that was created by Power BI on import, and the original data model’s source is updated independently. This means that changes to the worksheet data are reflected in the Power BI service when refreshed, but any model changes to the original Excel file are not. Both the OneDrive and regular refresh schedules are used for imported files of this type.

Refresh options for a combined data source

Refresh options for a combined data source

The following table summarizes the refresh options available for file structure and connection type.

File Structure

Get Data option

Connect

Import

Worksheet data None Refresh from worksheet
Data model only Refresh from model source Refresh from model source
Data model plus worksheet data Refresh from model source and worksheet source Refresh from model source and worksheet

Summary

Both Excel and Power BI are powerful tools in their own rights, and the decision to use one does not preclude using the other and in fact there are many good reasons for doing so. Bringing refreshability to Excel files stored in SharePoint is just one of them. It is however important to understand how it all works in order to get the maximum impact.

5 Comments

Power BI Report Server Completes the Vision for On-Premises Reporting

Microsoft today made available the August 2017 preview of Power BI ReportServer 2017. This preview includes the long awaited support of embedded data models, as well as the ability to render Excel reports natively. This is a major step forward, because with this release, Microsoft has completed its vision for its on-premises reporting platform that it first articulated in October of 2015.

Excel content being rendered in Power BI Reporting Server

The big news at the time was that the platform was stated to be SQL Server Reporting Services (SSRS). Not SharePoint, not PerformancePoint, but SSRS. SSRS was a mature product that quite competently provided a platform for operational reports. What it needed was some modernization and the addition of some analytical and self-service reporting capabilities. Several of these capabilities were subsequently included with the release of SSRS 2016.

Gone would be the days of configuring complex SharePoint farms just to be able to work with analytical reports (ie Power View, Excel). New Features were being added to SSRS to make it a complete platform for both analytical and operational reports.

The Vision

The roadmap articulated four different report types, 3 of them analytical (by my definition) and one of them operational. These three types line up with reporting tools in the Microsoft BI stack:

Name Type Primary authoring tool ext
Paginated Operational SSRS Report Builder
SQL Server Data Tools
.RDL
Interactive Analytical Power BI Desktop .PBIX
Mobile Analytical Mobile Report Designer .RDLX
Analytical Analytical Excel .XLSX

Therefore, reading between the lines, in order to be a complete reporting platform, SSRS needed to be able to render all of these report types. Paginated reports were of course always native to SSRS, and the roadmap announced that Mobile reports would be included in SSRS 2016 through the integration of Datazen. The roadmap further committed to SSRS being able to render Power BI files in the future.

SSRS 2016

SSRS shipped with some significant modernization improvements, including a much awaited HTML5 rendering engine, and it included Mobile Reports. Mobile reports are delivered through the Power BI mobile application, and SSRS visuals can be pinned to Power BI dashboards.

Significant plumbing was done to move the platform forward in 2016, but it still only rendered 2 of the 4 report types.

In November 2016, it was further announced that the 2016 version of SSRS running in SharePoint Integrated mode would be the last. Moving forward, Reporting Services will only run in Native Mode. In the same announcement. In the same announcement, as I noted in another post, for the first time, the SSRS team committed to providing Excel report rendering capability as well.

Power BI Reporting Server

We first saw the on premises rendering of Power BI reports in the first community preview of SSRS V.Next in the fall of 2016. Those previews required that the reports be directly connected to SSAS tabular models, but they were ground-breaking just the same. A user could be totally disconnected from the web, and still render Power BI reports.

In May of 2017, Power BI Report Server (PBIRS) was announced. A less confusing name could have potentially been SSRS Premium, because that is in essence what it is. PBIRS is everything that SSRS is, plus the ability to render Power BI reports. SSRS will continue forward as a product without Power BI rendering capabilities. It is just a licensing distinction.

The release today of the August 2017 preview of PBIRS allows for embedded data models, and therefore a much wider breadth of capabilities. These models cannot be automatically refreshed yet, but they will upon release. This is, after all, just a preview. If you need automatic refresh of these data models in the meantime, there is an excellent third party solution to do this: PowerPivot Pro’s Power Update.

The inclusion of Excel report rendering capabilities means that PBIRS is a complete report rendering platform, more complete even that the Power BI cloud service.

Moving Forward

Now that the basic on-premises capability has been provided, SSRS/PBIRS needs to pay attention to paying back the debt that it incurred when SharePoint Integrated mode was deprecated. Chief among these features is the SSRS web part. The lack of a decent web part is a blocker for many organizations to move forward with this strategy. Some migration tools to move from Integrated to Native mode (like this one that migrates in the other direction) would be highly useful as well.

Now with on-premises covering all the bases, it’s easy to spot a glaring hole in the cloud Power BI offering. While it supports all three types of analytical reports, there is currently no way to render operational reports in the cloud. Until this capability is provided, it appears that on-premises will have the most complete solution.

2 Comments

The Difference Between Reporting and Analytics is 42

In his novel “The Hitchhiker’s Guide to the Galaxy”, Douglas Adams envisioned a giant supercomputer named “Deep Thought” that was built to solve the answer to the ultimate question of life, the universe and everything. For the 5 people out there that are unfamiliar with the story, I’ll relate the important bits here. Deep Thought was commissioned by a race of pan-dimensional beings and required seven and a half million years to complete its calculations. When it was finally complete, Deep Thought informed the descendants of the original creators that the answer was 42. The receivers were understandably disappointed with this response, and when they questioned Deep Thought further, the computer postulated that perhaps the problem was that they never really knew what the question was.

Undeterred, the race then commissioned a second computer (which happened to be the Earth) that would calculate the ultimate question. After a couple of 10 million year attempts, the ultimate question was determined to be “What do you get when you multiply six by nine”. Of course, Adams never claimed that the universe made sense.

To my mind, this is an excellent demonstration of the difference between reporting and analytics. The accurate answer (report) provided a result, but not meaning. Further analytics were necessary to determine context.

Like many information technology terms (Big Data, machine learning, CRM) Business Intelligence (BI) is one of those umbrella terms that many people use regularly without fully understanding its meaning. BI is comprised of many tools that help to glean information and insights from raw data. Thus, an ETL package that moves data from one location to another is just as much a BI tool as is a fancy looking infographic. Combine this lack of clarity with the overloading of the term “reporting, and we wind up with some real confusion in this space.

Reporting is the process of using data to highlight things or trends that have already happened. This can be contrasted with monitoring, which does the same for things that are happening now, and predictive analytics, which tries to predict what will happen in the future based on the same data. The difference between reporting and monitoring is only one of data latency, and as such, monitoring is often referred to as real time reporting, which further muddies the water. However, for the purposes of this article, I want to focus on historical reporting.

Reports are typically one of two types, either operational or analytical. Tools that are good at producing one type are typically not so good at producing the other. What’s the difference? Operational reports are designed to provide information that we know we need, and analytical reports are designed to help us discover things that we didn’t know, or to help answer unanticipated questions. Operational reports are typically designed to be printed. They are typically well paginated, pixel perfect, and provide a single view of the data within any given report. Analytical reports are just the opposite. They are designed with visuals as a starting point, but allow for the ability to pivot on or drill down into the data as appropriate to answer ad-hoc questions. Printing is typically a weakness for analytical reports, whereas drilldown is a weakness for operational reports.

Both report types have their place but they both have very different design point. The data that backs an operational report should ideally be relatively flat, as that best reflects the report layout and helps with performance. Conversely, cubes and data models exist simply because a flat data structure does not adequately support analytical reporting. With analytical reporting, a user may at any point decide to view quantitative data (a measure) through the lens of a different facet (dimension). This difference is so great, that we need a different type of engine to support it. OLAP cubes and tabular models are both examples of this.

Another difference is the data that is necessary to support both report types. Operational reports tend to concern themselves with various levels of subtotals per the predefined facets. In a case like that, the data mart that backs the report only needs to store those subtotals. The granularity, or resolution of the data stored in the data mart does not need to exceed that of report that references it. Analytical reporting is different. Since users will be expected to drill down on data, from on dimension to another, or to filter the data according to increasingly granular facets, it is critical to store all of the data in the data mart backing the data model. We don’t know the level of resolution the analyst will need; therefore, all detail is required.

As a simple example of this, consider the case where we want to analyze some server log data over a period of time. We can pre-aggregate the data in the data model such that it stores the total of the log entries of various entries on a daily basis. There would need to be a total based on each dimension, but the overall data storage would be less than for the raw data. Such data would allow an analyst to spot trends over several days, but the decrease in resolution means that it will be impossible to spot any usage trends within a given day. If daily trends will never be necessary, then this doesn’t matter, but the nature of analytical reports means that the designer can never be sure.

The more that the source data for the report is pre-aggregated, the less that report becomes analytical in nature, and the more it approaches operational. This is regardless of the tool used; you can build either report type with any tool, it’s just that it may not be optimal.

The issue here is one of semantics. Semantics however are important in knowing what you are getting if reports are being provided to you. Calling something “Analytics” does not make it so. If you spin up a content pack in Power BI, and find that the underlying data model provides just enough dimensions and measure to construct the provided report, and that you can’t deconstruct the data in any meaningful way, what you have is a report, not analytics, no matter what the platform. As with anything, there is a trade-off between complexity and power. Given the nuances of this topic, it’s important to look under the hood to know what you are getting.

The answer “42” is perfectly acceptable if you already knew that the question was “what is 6×9?”. But if you want to know why, that takes a little more digging. You’d also know that there might be a data problem…

3 Comments