What’s in a Name? (Part 3) — Data Analytics Education

Eric Sandosham, Ph.D.
Dec 23, 2023

Photo by Patrick Tomasso on Unsplash

Background

I write a weekly series of articles where I call out bad thinking and bad practices in data analytics / data science. I’ve now written 2 articles in this sub-series that question both the need for and the impact of the proliferation of roles and sub-practices in data analytics. For example, we’ve recently created new roles such as Decision Scientist, AI Engineer and Data Connector. In part 3 of this sub-series, I unpack the implications that this multitude of roles has for our data analytics education.

Before I proceed, allow me to quickly clarify the issues raised thus far. A number of my readers gave feedback on the importance of the emerging new roles in data analytics and how their teams are organised around them. However, I note that we often confuse roles with sub-practices. Roles are based on a defined set of activities and responsibilities, and are necessary as a means to ‘divide and conquer’ the ever-growing space of work. New roles do not have to translate to new sub-practices, but in the realm of data analytics they tend to. A data scientist will claim that they are fundamentally different from a regular data analyst. And an AI Engineer would do the same and separate themselves from ‘ordinary’ data scientists. The desire to draw boundary lines impacts data analytics education, and also data analytics leadership (the focus of my next article).

We have seen education providers customise and differentiate their learning curriculum in response to some of these new roles. For example, we are now seeing courses that differentiate between AI and Data Science, where the former covers machine learning, deep learning and natural language processing (NLP), and the latter covers insights extraction from machine learning. If I were an undergraduate, I would be confused about which path to pursue. If we start out with speciation too early, will we not be creating friction in terms of career development down the road? For example, there is a perception that it is difficult for a regular data analyst to switch to the ‘sub-practice’ of data science in the middle of their career because the sub-practices are ‘fundamentally’ different.

Gaps in Data Analytics Education

I’ve been asked a few times to teach undergraduate courses in data analytics. I generally turn them down because I disagree with the mainstream curriculum, and proposing changes is a bureaucratic nightmare. So I prefer to teach practice-oriented adult courses in data analytics. Over the years, I’ve been approached to develop adult learning curriculum on data sensemaking for data analytics / data science practitioners, and I have found this to be a curious oddity.

And so I submit to you that education institutions should be teaching only ONE foundational curriculum on data analytics. Because there is only ONE practice domain. Education institutions should not give in to the temptation to speciate their curriculum offering just because the market creates these new roles. Instead, they should be continuously subsuming these new roles into the practice by enhancing the curriculum to fill the gaps.

At present, there are already several gaps noted in data analytics education. The current curriculum in data analytics / data science consists of topics in probability + statistics + calculus, coding + programming, predictive modelling + machine learning, data management, data visualisation, and application use cases. The curriculum is heavy on techniques and their applications in various domains. But it doesn’t cover Information Theory, Data Sensemaking, Visual Cognition (how the brain sees information), Problem Framing (not the same as hypotheses creation), or Change Management for Data Analytics, to name a few.

Consider Data Sensemaking. Education institutions rarely cover the nature of information signals and how problem-framing can affect signal detection. Ultimately, all of data analytics / data science starts with signal detection; data is just the ‘vessel’ for it. Let’s consider the following example regarding signal detection and problem-framing. In credit underwriting (i.e. the loan approval process), occupation is an important variable. It will most definitely correlate (negatively) with the target outcome: whether an applicant is likely to become delinquent on repayment subsequently. Occupation is a ‘vessel’ carrying multiple signals: it carries information signals on income, on cashflow, and, for the purpose of credit underwriting, on economic resilience. How does the strength of these signals change under different contexts? How do we think about alternative (data) proxies for the same signal? This intersects with the problem-framing domain. And formal education doesn’t teach this.
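The occupation example can be made concrete with a small synthetic sketch. Everything in it is an assumption made for illustration: the occupation names, the resilience values, and the rule that delinquency depends only on economic resilience. It suggests how occupation can correlate with delinquency purely because it carries the resilience signal, and how the same occupation can carry a different signal strength when the context shifts (here, a hypothetical downturn).

```python
import random

# Hypothetical occupations mapped to an assumed 'economic resilience' signal (0 to 1).
RESILIENCE = {"civil_servant": 0.9, "salaried_engineer": 0.7, "gig_driver": 0.3}

def simulate(resilience_by_occupation, n_per_occupation=20000, seed=7):
    """Return the simulated delinquency rate per occupation.

    Assumption: delinquency depends only on resilience, never on the
    occupation label itself. Occupation is just the vessel.
    """
    rng = random.Random(seed)
    rates = {}
    for occupation, resilience in resilience_by_occupation.items():
        p_delinquent = 0.4 * (1.0 - resilience)  # assumed link, not an industry figure
        hits = sum(rng.random() < p_delinquent for _ in range(n_per_occupation))
        rates[occupation] = hits / n_per_occupation
    return rates

normal = simulate(RESILIENCE)
# Context shift: a hypothetical downturn erodes salaried engineers' resilience.
downturn = simulate({**RESILIENCE, "salaried_engineer": 0.35})

for occupation in RESILIENCE:
    print(f"{occupation:18s} normal={normal[occupation]:.3f} downturn={downturn[occupation]:.3f}")
```

In this toy setup the occupation label never enters the delinquency rule, yet delinquency rates still differ by occupation. Any alternative proxy that tracked resilience equally well would carry the same signal.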

Predictive modelling, including machine learning, is essentially signal amplification and noise reduction. It’s not about deriving ‘insights’ from the data; you should already have an expectation of what signals you are attempting to find and isolate / amplify. Signal amplification / isolation and noise reduction extend beyond modelling. Even ‘traditional’ data analytics requires them. Segmentation analysis, for example, is really signal isolation.
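One way to picture noise reduction is a toy comparison of two segments. The sketch below is illustrative only and assumes two hypothetical segments whose true signals differ by 0.2, with unit Gaussian noise on every record. Comparing single records tells you very little, while comparing segment averages lets the noise cancel out so the small signal becomes visible.

```python
import random
import statistics

rng = random.Random(11)

def observe(segment_signal, n):
    """Draw n noisy observations: an assumed segment-level signal plus unit Gaussian noise."""
    return [segment_signal + rng.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical segments whose true signals differ by only 0.2 (noise std is 1.0).
segment_a = observe(0.0, 10000)
segment_b = observe(0.2, 10000)

# One record per segment: the comparison is dominated by noise.
single_gap = segment_b[0] - segment_a[0]

# Segment means: noise averages out, so the gap approaches the true 0.2.
mean_gap = statistics.mean(segment_b) - statistics.mean(segment_a)

print(f"gap from single records: {single_gap:+.2f}")
print(f"gap from segment means:  {mean_gap:+.2f}")
```

The averaging only helps because the segments were chosen with an expected signal in mind. Grouping the records at random would average the signal away along with the noise.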

Conclusion

Education institutions must review the foundations of their data analytics curriculum. Each time the market proposes a new role, they should figure out the nature of the gap and work towards filling it in their curriculum. They must educate for the wider practice of data analytics and not speciate it for roles. Many of the professors who teach the current curriculum are non-practitioners, or they pretend to be practitioners just because they’ve got a consulting gig on the side. This is one of the reasons why so many data-scientists-to-be feel lost when they graduate from school and enter the workforce, where they are confronted with real-world challenges.

My next article will be the last in this sub-series. I will explore the implications of role and sub-practice speciation on the evolution of data analytics leadership.

Stay tuned!

Eric Sandosham, Ph.D.

Founder & Partner of Red & White Consulting Partners LLP. A passionate and seasoned veteran of business analytics. Former CAO of Citibank APAC.