In his novel “The Hitchhiker’s Guide to the Galaxy”, Douglas Adams envisioned a giant supercomputer named “Deep Thought” that was built to solve the answer to the ultimate question of life, the universe and everything. For the 5 people out there that are unfamiliar with the story, I’ll relate the important bits here. Deep Thought was commissioned by a race of pan-dimensional beings and required seven and a half million years to complete its calculations. When it was finally complete, Deep Thought informed the ancestors of the original creators that the answer was 42. The receivers were understandably disappointed with this response, and when they questioned Deep Thought further, the computer postulated that perhaps the problem was that they never really knew what the question was.
Undeterred, the race then commissioned a second computer (which happened to be the Earth) that would calculate the ultimate question. After a couple of 10 million year attempts, the ultimate question was determined to be “What do you get when you multiply six by nine”. Of course, Adams never claimed that the universe made sense.
To my mind, this is an excellent demonstration of the difference between reporting and analytics. The accurate answer (report) provided a result, but not meaning. Further analytics were necessary to determine context.
Like many information technology terms (Big Data, machine learning, CRM) Business Intelligence (BI) is one of those umbrella terms that many people use regularly without fully understanding its meaning. BI is comprised of many tools that help to glean information and insights from raw data. Thus, an ETL package that moves data from one location to another is just as much a BI tool as is a fancy looking infographic. Combine this lack of clarity with the overloading of the term “reporting, and we wind up with some real confusion in this space.
Reporting is the process of using data to highlight things or trends that have already happened. This can be contrasted with monitoring, which does the same for things that are happening now, and predictive analytics, which tries to predict what will happen in the future based on the same data. The difference between reporting and monitoring is only one of data latency, and as such, monitoring is often referred to as real time reporting, which further muddies the water. However, for the purposes of this article, I want to focus on historical reporting.
Reports are typically one of two types, either operational or analytical. Tools that are good at producing one type are typically not so good at producing the other. What’s the difference? Operational reports are designed to provide information that we know we need, and analytical reports are designed to help us discover things that we didn’t know, or to help answer unanticipated questions. Operational reports are typically designed to be printed. They are typically well paginated, pixel perfect, and provide a single view of the data within any given report. Analytical reports are just the opposite. They are designed with visuals as a starting point, but allow for the ability to pivot on or drill down into the data as appropriate to answer ad-hoc questions. Printing is typically a weakness for analytical reports, whereas drilldown is a weakness for operational reports.
Both report types have their place but they both have very different design point. The data that backs an operational report should ideally be relatively flat, as that best reflects the report layout and helps with performance. Conversely, cubes and data models exist simply because a flat data structure does not adequately support analytical reporting. With analytical reporting, a user may at any point decide to view quantitative data (a measure) through the lens of a different facet (dimension). This difference is so great, that we need a different type of engine to support it. OLAP cubes and tabular models are both examples of this.
Another difference is the data that is necessary to support both report types. Operational reports tend to concern themselves with various levels of subtotals per the predefined facets. In a case like that, the data mart that backs the report only needs to store those subtotals. The granularity, or resolution of the data stored in the data mart does not need to exceed that of report that references it. Analytical reporting is different. Since users will be expected to drill down on data, from on dimension to another, or to filter the data according to increasingly granular facets, it is critical to store all of the data in the data mart backing the data model. We don’t know the level of resolution the analyst will need; therefore, all detail is required.
As a simple example of this, consider the case where we want to analyze some server log data over a period of time. We can pre-aggregate the data in the data model such that it stores the total of the log entries of various entries on a daily basis. There would need to be a total based on each dimension, but the overall data storage would be less than for the raw data. Such data would allow an analyst to spot trends over several days, but the decrease in resolution means that it will be impossible to spot any usage trends within a given day. If daily trends will never be necessary, then this doesn’t matter, but the nature of analytical reports means that the designer can never be sure.
The more that the source data for the report is pre-aggregated, the less that report becomes analytical in nature, and the more it approaches operational. This is regardless of the tool used; you can build either report type with any tool, it’s just that it may not be optimal.
The issue here is one of semantics. Semantics however are important in knowing what you are getting if reports are being provided to you. Calling something “Analytics” does not make it so. If you spin up a content pack in Power BI, and find that the underlying data model provides just enough dimensions and measure to construct the provided report, and that you can’t deconstruct the data in any meaningful way, what you have is a report, not analytics, no matter what the platform. As with anything, there is a trade-off between complexity and power. Given the nuances of this topic, it’s important to look under the hood to know what you are getting.
The answer “42” is perfectly acceptable if you already knew that the question was “what is 6×9?”. But if you want to know why, that takes a little more digging. You’d also know that there might be a data problem…