How To Perform Diagnostic Analytics

Eric Sandosham, Ph.D.
5 min read · Jun 9, 2024


Getting good at data-driven investigation.


Background

I’ve recently been engaged on a data analytics consulting project with a client to figure out how to grow their business. The client has an existing team of data analysts handling activities ranging from business reporting to campaign design and tracking to predictive model building. They were not lacking comprehensive data, and they were not lacking technical skills. But somehow, they were struggling to “connect the dots” to get to the heart of the challenges confronting the business.

This got me thinking about Diagnostic Analytics (DA). Most data analysts and data scientists are familiar with Exploratory Data Analysis (EDA), which is an analysis approach to identify general patterns in the data. There’s a lot of literature on EDA (it has its own Wikipedia page!), but not a lot has been written formally about DA (no page in Wikipedia). So I dedicate my 42nd weekly article to the “art and science” of Diagnostic Analytics.

(I write a weekly series of articles where I call out bad thinking and bad practices in data analytics / data science which you can find here.)

What is Diagnostic Analytics?

Let me first start by describing Exploratory Data Analysis and then contrasting it with Diagnostic Analytics. EDA is an investigative method applied to a given data set to summarise its key characteristics (e.g. number of records, statistical normality), uncover outliers and noise (e.g. missing data), and surface relatedness amongst the data elements. EDA is essential in the process of building data solutions such as reports and predictive models. Any data analyst / data scientist worth their salt will know how to do decent EDA. But most would not be similarly adept at, or even exposed to, doing good DA. Sadly, I’ve seen data analysts / data scientists employ EDA approaches in lieu of DA, and this is extremely problematic.

What is Diagnostic Analytics? I am horrified by the definitional and methodological nonsense that is written about it online! There is just so much bullshit around this subject matter. Simply put, DA is an investigative method applied to a given phenomenon of interest (e.g. customer churn or business decline) via data representation. DA is NOT exploratory analysis. You are not trying to find interesting trends or correlations in the data. Instead, you are trying to find evidence to support certain claims. Intent makes all the difference.

An underlying core competency for performing Diagnostic Analytics is Data Sensemaking. Data sensemaking is simply the ability to apply a frame or perspective to interpret meaningful information from the data, and to show how the various data elements inter-relate. This frame would also allow one to develop initial hypotheses on the opportunities and/or risks represented in the data. I’ve written about data sensemaking in earlier articles in this series.

The focus of EDA is the data. The focus of DA is the interpretation. So how should one go about doing DA? Here’s my take on it:

  1. Validate all your underlying and operating assumptions.
  2. State your intentions.
  3. Focus on inferences instead of proofs.

Validate Operating Assumptions

The first step in DA is to validate all the underlying and operating assumptions about the phenomenon of interest that you are investigating. List down those assumptions and ask yourself: “What would I expect to find in my data if those assumptions were true?”

Consider the example where you are trying to figure out why your business isn’t growing. You need to state all the assumptions around your business operating model — e.g. assumptions about who your target market is, how your target market comes to be acquainted with you, how your value proposition affects the way your customers interact with your products, and so on. You could have an arms-length worth of assumptions. If these assumptions were true, what would you expect to find in your data? What pieces of data would logically triangulate?
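One way to make this step concrete is to turn each operating assumption into an explicit, checkable expectation about the data. Below is a minimal sketch of that idea; the column names, threshold, and sample records are all hypothetical illustrations, not a prescribed implementation.

```python
# Sketch: express an operating assumption as a data check.
# All field names, the 0.6 threshold, and the records are made up.

def check_target_market_share(rows):
    """Assumption: the bulk of revenue comes from the defined target market.
    If true, we'd expect the target market's revenue share to be high."""
    target_rev = sum(r["revenue"] for r in rows if r["in_target_market"])
    total_rev = sum(r["revenue"] for r in rows)
    return target_rev / total_rev >= 0.6  # expected signal if assumption holds

customers = [
    {"in_target_market": True, "revenue": 120.0},
    {"in_target_market": True, "revenue": 80.0},
    {"in_target_market": False, "revenue": 40.0},
]

print(check_target_market_share(customers))  # 200/240 ≈ 0.83, so True
```

Listing your assumptions as functions like this forces you to state, up front, what the data would look like if each assumption were true, which is exactly the triangulation question above.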

I have found that I can often use the Odds Ratio technique to do a lot of validation work on operating assumptions. Odds ratios are used in the medical domain to express the strength of association between risk factors and outcomes. It’s a straightforward conditional probability calculation, but it’s not often used in the business world. For two binary variables X and Y, the odds ratio is (per Wikipedia):

OR = [P(Y=1 | X=1) / P(Y=0 | X=1)] ÷ [P(Y=1 | X=0) / P(Y=0 | X=0)]

Using this formula, you can quickly compute whether the business outcomes line up with your expectations, based on the operating assumptions. For example, let X indicate whether a customer belongs to your target market definition (1 for ‘yes’, 0 for ‘no’) and Y indicate product performance (1 for ‘positive’, 0 for ‘negative’); you can then quickly ascertain whether your target market gives you the better product performance outcome you expected.
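With counts from a 2×2 table of X against Y, the odds ratio reduces to a cross-product of cell counts. Here is a short sketch of that calculation; the counts are invented purely for illustration.

```python
# Odds ratio from a 2x2 contingency table.
# X = 1 if the customer is in the target market, Y = 1 if product
# performance is positive. The counts below are hypothetical.

def odds_ratio(n11, n10, n01, n00):
    """OR = (n11 / n10) / (n01 / n00) = (n11 * n00) / (n10 * n01)."""
    return (n11 * n00) / (n10 * n01)

# n11: target & positive   n10: target & negative
# n01: non-target & positive   n00: non-target & negative
or_value = odds_ratio(n11=300, n10=100, n01=120, n00=180)
print(or_value)  # 4.5
```

An odds ratio of 4.5 here would say the target market has roughly 4.5 times the odds of a positive product outcome, evidence consistent with the operating assumption; a value near 1.0 would suggest the assumption does not hold.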

You’ll be surprised how many of the problems and issues we encounter in business are the result of believing that the operating assumptions hold true when in fact they don’t.

Intentions Matter

You must approach DA with intent. Beyond the validation of operating assumptions, you must have a clear set of investigative questions to ask. It cannot be a “trial-and-error” exploration of the data in the hope of “discovering” an insight. Coming back to the example above, you could ask questions about the impact of competitive pressures or regulatory changes on your target market and product performance. To answer these questions, you would then need to “isolate” the appropriate data. And this segues into the final point: you will never have perfect evidence to answer those questions.

Inferences Only

Because you are working with historic data in DA — data that was not intentionally designed for a specific analysis — you will never be able to get irrefutable evidence to support your hypotheses. All historic data is biased because it has been shaped by your business decisions. For example, your historic data won’t have information about people you’ve chosen not to do business with, so it can’t be representative of the general market. There will be phenomena that are missing or under-represented in your data, and there will be phenomena that are over-represented. You can only get near-perfect representative data if it comes from a well-executed test-vs-control experiment. The best you can do in DA is to sample the data in a way that allows you to reasonably infer your evidence. Yes, you can always tweak the data sample in your favour, but this is where work integrity (and ability) matters and where you will build the foundation of your reputation.
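One practical way to keep findings at the level of inference rather than proof is to attach uncertainty to every effect size you report. As a sketch, the odds ratio from earlier can be reported with a 95% confidence interval using the standard log-odds approximation (standard error = sqrt of the summed reciprocal cell counts); the counts are again hypothetical.

```python
import math

# Report an odds ratio as an inference: point estimate plus a 95% CI
# via the log-odds approximation. Counts are hypothetical.

def odds_ratio_ci(n11, n10, n01, n00, z=1.96):
    or_value = (n11 * n00) / (n10 * n01)
    se = math.sqrt(1/n11 + 1/n10 + 1/n01 + 1/n00)  # SE of log(OR)
    lo = math.exp(math.log(or_value) - z * se)
    hi = math.exp(math.log(or_value) + z * se)
    return or_value, lo, hi

or_value, lo, hi = odds_ratio_ci(300, 100, 120, 180)
print(f"OR={or_value:.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

If the interval excludes 1.0, you have reasonable evidence of an association; either way, you are presenting a bounded inference, not a claim of proof — which is the honest posture DA demands.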

Conclusion

Diagnostic Analytics can come across as more art than science, particularly when viewed by a casual observer. This is largely because not enough has been systematically written about its practice. DA requires strong domain knowledge because it’s not about data techniques but about investigative thinking. In my experience, DA is what separates a good data analyst / data scientist from a great one.


Eric Sandosham, Ph.D.

Founder & Partner of Red & White Consulting Partners LLP. A passionate and seasoned veteran of business analytics. Former CAO of Citibank APAC.