The Difference Between Reporting and Analytics is 42

In his novel “The Hitchhiker’s Guide to the Galaxy”, Douglas Adams envisioned a giant supercomputer named “Deep Thought” that was built to compute the answer to the ultimate question of life, the universe and everything. For the five people out there who are unfamiliar with the story, I’ll relate the important bits here. Deep Thought was commissioned by a race of pan-dimensional beings and required seven and a half million years to complete its calculations. When it finally finished, Deep Thought informed the descendants of the original creators that the answer was 42. The receivers were understandably disappointed with this response, and when they questioned Deep Thought further, the computer postulated that perhaps the problem was that they never really knew what the question was.

Undeterred, the race then commissioned a second computer (which happened to be the Earth) that would calculate the ultimate question. After a couple of ten-million-year attempts, the ultimate question was determined to be “What do you get when you multiply six by nine?”. Of course, Adams never claimed that the universe made sense.

To my mind, this is an excellent demonstration of the difference between reporting and analytics. The accurate answer (report) provided a result, but not meaning. Further analytics were necessary to determine context.

Like many information technology terms (Big Data, machine learning, CRM), Business Intelligence (BI) is one of those umbrella terms that many people use regularly without fully understanding its meaning. BI comprises many tools that help to glean information and insights from raw data. Thus, an ETL package that moves data from one location to another is just as much a BI tool as a fancy looking infographic. Combine this lack of clarity with the overloading of the term “reporting”, and we wind up with some real confusion in this space.

Reporting is the process of using data to highlight events or trends that have already happened. This can be contrasted with monitoring, which does the same for things that are happening now, and predictive analytics, which tries to predict what will happen in the future based on the same data. The difference between reporting and monitoring is only one of data latency, and as such, monitoring is often referred to as real-time reporting, which further muddies the water. However, for the purposes of this article, I want to focus on historical reporting.

Reports are typically one of two types, either operational or analytical. Tools that are good at producing one type are typically not so good at producing the other. What’s the difference? Operational reports are designed to provide information that we know we need, and analytical reports are designed to help us discover things that we didn’t know, or to help answer unanticipated questions. Operational reports are typically designed to be printed. They are typically well paginated, pixel perfect, and provide a single view of the data within any given report. Analytical reports are just the opposite. They are designed with visuals as a starting point, but allow for the ability to pivot on or drill down into the data as appropriate to answer ad-hoc questions. Printing is typically a weakness for analytical reports, whereas drilldown is a weakness for operational reports.

Both report types have their place, but they have very different design points. The data that backs an operational report should ideally be relatively flat, as that best reflects the report layout and helps with performance. Conversely, cubes and data models exist simply because a flat data structure does not adequately support analytical reporting. With analytical reporting, a user may at any point decide to view quantitative data (a measure) through the lens of a different facet (a dimension). This difference is so great that we need a different type of engine to support it. OLAP cubes and tabular models are both examples of this.
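To make the measure/dimension idea concrete, here is a minimal sketch in plain Python. The rows, the dimension names (region, product) and the total_by helper are all made up for illustration; a real OLAP or tabular engine does far more than this, but the core idea of re-slicing one measure by whichever dimension the user picks is the same.

```python
from collections import defaultdict

# Hypothetical detail rows: two dimensions (region, product) and one measure (amount).
ROWS = [
    {"region": "East", "product": "Widget", "amount": 100},
    {"region": "East", "product": "Gadget", "amount": 50},
    {"region": "West", "product": "Widget", "amount": 70},
    {"region": "West", "product": "Gadget", "amount": 30},
]


def total_by(rows, dimension, measure="amount"):
    """Sum one measure, grouped by whichever dimension the analyst chooses."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


# The same measure viewed through two different dimensions.
by_region = total_by(ROWS, "region")
by_product = total_by(ROWS, "product")
print(by_region)
print(by_product)
```

Note that the pivot only works because the detail rows carry every dimension. A flat table that stored only the regional totals could not answer the product question.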

Another difference is the data that is necessary to support both report types. Operational reports tend to concern themselves with various levels of subtotals per the predefined facets. In a case like that, the data mart that backs the report only needs to store those subtotals. The granularity, or resolution, of the data stored in the data mart does not need to exceed that of the report that references it. Analytical reporting is different. Since users will be expected to drill down on data, from one dimension to another, or to filter the data according to increasingly granular facets, it is critical to store all of the data in the data mart backing the data model. We don’t know the level of resolution the analyst will need; therefore, all detail is required.

As a simple example of this, consider the case where we want to analyze some server log data over a period of time. We can pre-aggregate the data in the data model such that it stores the total number of log entries of each type on a daily basis. There would need to be a total for each dimension, but the overall data storage would be less than for the raw data. Such data would allow an analyst to spot trends over several days, but the decrease in resolution means that it will be impossible to spot any usage trends within a given day. If trends within a day will never be needed, then this doesn’t matter, but the nature of analytical reports means that the designer can never be sure.
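The server log case can be sketched in a few lines of Python. The log entries, dates and entry types below are invented purely for illustration, and the "daily store" is just a counter keyed on day and entry type, standing in for a pre-aggregated data mart.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw log entries: (timestamp, entry type).
RAW_LOG = [
    (datetime(2016, 5, 2, 9, 15), "view"),
    (datetime(2016, 5, 2, 9, 40), "view"),
    (datetime(2016, 5, 2, 14, 5), "edit"),
    (datetime(2016, 5, 3, 9, 30), "view"),
    (datetime(2016, 5, 3, 16, 45), "view"),
]

# Pre-aggregated store: one count per (day, entry type). Fewer rows than the raw log.
daily = Counter((ts.date(), kind) for ts, kind in RAW_LOG)

# The daily store is enough to follow a trend across days...
views_per_day = {day: n for (day, kind), n in sorted(daily.items()) if kind == "view"}

# ...but an hour-by-hour breakdown has to go back to the raw rows,
# because the hour was discarded when the daily totals were built.
views_per_hour = Counter(ts.hour for ts, kind in RAW_LOG if kind == "view")

print(views_per_day)
print(dict(views_per_hour))
```

In this made-up data the morning peak in views is obvious from the raw rows and invisible in the daily totals, which is exactly the resolution trade-off described above.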

The more the source data for a report is pre-aggregated, the less analytical the report becomes, and the more it approaches operational. This is true regardless of the tool used; you can build either report type with any tool, it’s just that the result may not be optimal.

The issue here is one of semantics. Semantics, however, are important in knowing what you are getting if reports are being provided to you. Calling something “Analytics” does not make it so. If you spin up a content pack in Power BI, and find that the underlying data model provides just enough dimensions and measures to construct the provided report, and that you can’t deconstruct the data in any meaningful way, what you have is a report, not analytics, no matter what the platform. As with anything, there is a trade-off between complexity and power. Given the nuances of this topic, it’s important to look under the hood to know what you are getting.

The answer “42” is perfectly acceptable if you already knew that the question was “what is 6×9?”. But if you want to know why, that takes a little more digging. You’d also know that there might be a data problem…

The State of Analytics in SharePoint and Office 365

After adoption of SharePoint or Office 365, one of the first things an organization will look for is some understanding as to how the product is being adopted, and what its impact on resource allocation is. Historically, options for reporting on SharePoint have been limited at best.

The Web Analytics Service application was introduced with SharePoint 2010, and relied on a series of connected Excel workbooks and a fairly Byzantine series of staging and reporting databases. It worked so well that it was removed from the product in SharePoint 2013. The Usage logs contain a rich set of information, and they are rolled up into the Usage database, but accessing the data or persisting it beyond a short time period required a fair bit of work.

There were also third party analytical solutions, but most of these came with a hefty price tag, and they focused on page views, embedding code on a page. This approach works well enough for web pages, but it doesn’t capture everything, for example document access through the .NET API. They’re therefore not always well suited to collaborative environments.

SharePoint in Office 365 was initially devoid of analytics, but some basic reports have been creeping in over recent months. With the new administration portal going live, these reports moved from the relative obscurity of the compliance center to the brand new report center, and were augmented by some additional reports.

With the release of SharePoint 2016, and the announcements made at the Future of SharePoint Event on May 4 2016, we can see the additional areas where analytics are being introduced into the core product. At this point, it’s a good idea to step back and have a look at the Analytics landscape as it pertains to SharePoint and Office 365.

At the moment, the analytics offerings can be grouped into 4 major categories: tenant scoped, site scoped, document scoped, and Delve Analytics. Let’s have a look at each one in turn.

Tenant scoped

The tenant scoped reports are the aforementioned reports that are now available in the new Office 365 Reporting Center.

[Image: New usage reports for SharePoint, OneDrive, Yammer and Skype]

There are a number of interesting reports in here that focus primarily on the tenant as a whole: how much OneDrive space users are using, Yammer message counts, Skype meetings, emails sent and received, etc. In addition, these reports can be interacted with to show four different time periods: 7, 30, 90, and 180 days. Year over year analysis is not available.

These reports will primarily interest administrators, and it therefore makes sense that they are only available in the administration center, where administrative permissions are required to access them.

Site Scoped

Site scoped analytics contain data that is of concern to site administrators. These users are more concerned with content usage than resource allocation. These analytics features were initially announced at the Future of SharePoint event on May 4 2016, and as of this writing, have not yet rolled out.

The initial rollout will focus on content consumption, visits to the site, and document views.

[Image: SharePoint home page with activity]

This is welcome data to beleaguered site administrators, and it will help to identify important content, and content that maybe could be pruned. While it will be initially rolling out to SharePoint Online, the good news is that on premises users will also be able to get this through the new Analytics service application.

In a similar model to the new hybrid search, the new Analytics service application called SharePoint Insights connects to Office 365 and delivers your on premises usage data to the service, essentially everything that is kept in the logging database. From there, the service can act on it to do interesting things. One of those interesting things will be to deliver content based activity reports like the ones seen above.

There are a few things to take note of about site scoped analytics. They are scoped to the site, not the site collection. They do not roll up into a master report, so each site must be visited in turn (they live in the “site contents” section) to see the results. As far as I’m aware, the data is only persisted for a short time (I have only seen 7 days), so time based analysis is not possible.

Document scoped

Document scoped analytics have been in the service for some time now, and the new document library exposes them. I call them analytics, but they really are just the activity stream for a document or a folder. They do offer insight, so we’ll stick with the term.

From a “new style” document library, you select the information icon on the right to open up the information pane. Part of that information is the activity stream of the document. In the example below I have selected a folder.

image

It’s a welcome addition, and it is what it is. There’s currently no way to aggregate the data or to pivot on anything other than the document/folder.

Delve Analytics

Delve Analytics is a new offering from the Office team that focuses on the user. It analyzes a person’s communications and schedule to provide insights into their work experience, with measures like time spent in meetings, time spent in email, work life balance, etc.

[Image: Take back your time with Delve Analytics]

Delve Analytics doesn’t really belong in a blog post about SharePoint because it doesn’t analyze any SharePoint or OneDrive data, so I’ll keep this section short. For the moment at least, it is restricted to Exchange email data as a source.

Delve Analytics requires an Office 365 E5 license or it can be purchased separately. Unlike the rest of the analytics options here, there is an extra cost associated with it.

Summary

The analytics options available in Office 365 and in SharePoint have improved drastically, but are still in their infancy. Each of the approaches is targeted at a different audience (IT Pro, site admins, content authors, individuals). This approach is both good and bad. Tailoring an approach to an audience means that the specific audience will be satisfied, but the different approaches can become somewhat disjointed. It depends on what is necessary.

Analytics at the moment are also limited to specific time slices, if time can be sliced at all, and to specific dimensions/measures. This is no problem if recent activity is the only thing of interest, but if more fine grained time slices or year-over-year analyses are needed, then the out of the box approaches will fall short.

Finally, most of the reports focus on activity; there is very little information provided about the SharePoint or Office 365 inventory.

The good news in all of this is not only that Microsoft has made analytics a priority, but that all of its features in this area use publicly available APIs. This means that there is plenty of room for third party vendors to step in to fill gaps and to provide complete analytics focused solutions. In that vein, I’m very proud to announce that my company, UnlimitedViz, will soon be releasing a product, tyGraph for Office 365, to do exactly that.