The Problem with Data Governance

Eric Sandosham, Ph.D.
5 min read · Mar 10, 2024


Is it still a legitimate practice in the knowledge economy?

Photo by Tobias Fischer on Unsplash

Background

A friend recently reached out to me for my point of view on Data Governance: she had been asked to lead this initiative in her organisation and was totally unmotivated by it. I understand why. It’s a topic that’s as dry as sandpaper. Back when I was the regional CAO for Citibank, I was also the designated data governance officer. I hated the additional role. I thought it was a waste of my time. Critical data elements, data stewards, data profiling, etc.; it all seemed like smoke and mirrors to me. I thought the premise was ill-conceived, and 15 years on, I still think it is.

Over the years, Data Governance has lost much of its lustre, having been unable to show much business impact. But what exactly is Data Governance? Has it evolved beyond data quality and data access? How does it square up with the emergence of AI solutioning? And so I dedicate my 29th weekly article to the question of whether Data Governance as a practice still makes sense.

(I write a weekly series of articles where I call out bad thinking and bad practices in data analytics / data science which you can find here.)

Poor Definitions

Data Governance is described as the process of managing the availability, usability, integrity and security of the data in enterprise systems. That’s the modern definition. It wasn’t always the case. The scope has now been cast so wide that the term ‘Data Governance’ has become meaningless. It’s pure over-reach! Data Governance officers have been reinventing and expanding their charter to stay job-relevant rather than admit that the original construct didn’t have legs to stand on. In fact, the expanded charter of Data Governance looks suspiciously like the operating principles for Industrial Engineering: reliability, availability, maintainability, safety (RAMS for short). Hmmm.

The word ‘governance’ evokes policy setting and compliance management. And policies and compliance are required because there are risks associated with the ‘thing’ arising from choice-making. For example, there is governance in medicine (risk in bad treatment choices leading to loss of life), there is governance in financial products (risk in bad recommendations leading to loss of money), etc. But what are the risks associated with data through choice-making?

Now, the original focus of Data Governance was all about data quality, and that’s where its practitioners considered the risks to be. But I would argue that this thinking is misplaced: poor data quality is self-governing, because the outcomes it produces determine whether the data continues to be used. The larger issue is what defines quality in data. Many organisations focus on missing data (i.e. data fields not sufficiently populated) or data accuracy (i.e. inputs are valid). But even mature organisations don’t spend enough time thinking about data representation. And how can they, when Data Governance is overseen by those who don’t understand information theory? Data can be accurate (e.g. a valid address) but not representative (e.g. it’s not the customer’s address). Consider the example in retail banking, where income and occupation are important data points about the customer as they correlate strongly with financial behaviours. Few banks have a ‘data governance’ policy that deals with the expiration and continuous re-collection of income and occupation data. Most banks don’t even think about this problem. They believe that just because the data fields are populated in the system and are ‘accurate’, the data is complete and usable. This failure to recognise data shelf-life is seldom discussed by Data Governance officers.

If Data is an Asset …

Here’s a more practical way to think about Data Governance. In fact, I won’t call it Data Governance at all since the term is a misnomer. Everyone says that data is an asset. I agree. And you don’t ‘govern’ assets. Rather, you ‘manage’ assets. You manage assets in such a way that you enhance and extend their economic shelf-life. And so we should go back and expand on the principles of Data Management, and forget about Data Governance. The expanded scope should be grounded in the following objectives:

  1. Data instrumentation & curation → address supply side of asset.
  2. Data enhancement & enrichment → address enhancement side of asset.
  3. Data distribution & utilisation → address monetisation side of asset.

All this speaks to the larger conversation around building Data Capability and Data Strategy, which was the topic of my most read article to date.

Data Instrumentation & Curation requires strategic intent. It requires an awareness of existing data gaps, the value of data diversity, the ethics of data collection, and the underlying infrastructure to organise and index data, including setting up expiration dates and triggers for re-collection.
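To make the idea of expiration dates and re-collection triggers concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a reference to any real system: the `FieldRecord` structure, the `SHELF_LIFE` values and the function names are hypothetical, and actual shelf-lives would need to be set by the business.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical shelf-lives per data field (assumed values for illustration).
SHELF_LIFE = {
    "income": timedelta(days=365),
    "occupation": timedelta(days=730),
}


@dataclass
class FieldRecord:
    customer_id: str
    field: str
    value: str
    collected_on: date


def is_expired(record: FieldRecord, today: date) -> bool:
    """Return True if the value is older than the shelf-life of its field."""
    shelf_life = SHELF_LIFE.get(record.field)
    if shelf_life is None:
        return False  # no shelf-life defined: treated as non-expiring here
    return today - record.collected_on > shelf_life


def recollection_queue(records, today):
    """List the (customer_id, field) pairs that should be re-collected."""
    return [(r.customer_id, r.field) for r in records if is_expired(r, today)]


records = [
    FieldRecord("C1", "income", "85000", date(2022, 1, 15)),
    FieldRecord("C1", "occupation", "engineer", date(2023, 6, 1)),
    FieldRecord("C2", "income", "60000", date(2023, 11, 1)),
]
queue = recollection_queue(records, today=date(2024, 3, 10))
print(queue)
```

In this sketch, a populated and valid field can still land in the re-collection queue purely because of its age, which is the distinction between ‘accurate’ and still representative.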

Data Enhancement & Enrichment requires knowledge of information theory, signal identification, and signal-to-noise amplification. The ability to easily apply data transformation and data blending techniques to the underlying data asset, and to then recognise the outputs as additional assets is critical.

Data Distribution & Utilisation requires knowledge of who needs what kinds of data, and who can benefit from additional data (i.e. proactive pushing and recommendation of new data based on domain and past usage). The ability to measure not just use and re-use of data but also domain-of-use is essential to understanding the monetisation value of the data asset.
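One way to picture measuring use, re-use and domain-of-use is a simple roll-up of a data access log, sketched below in Python. The log format (dataset, user, domain) and the function name `usage_summary` are assumptions made for illustration; a real organisation would source this from its own access or query logs.

```python
from collections import defaultdict


def usage_summary(access_log):
    """Summarise an access log of (dataset, user, domain) tuples per dataset."""
    stats = defaultdict(lambda: {"accesses": 0, "users": set(), "domains": set()})
    for dataset, user, domain in access_log:
        entry = stats[dataset]
        entry["accesses"] += 1        # total use, including re-use
        entry["users"].add(user)      # who is using the data
        entry["domains"].add(domain)  # where in the business it is used
    return {
        name: {
            "accesses": s["accesses"],
            "distinct_users": len(s["users"]),
            "domains_of_use": sorted(s["domains"]),
        }
        for name, s in stats.items()
    }


# Hypothetical log entries for illustration only.
sample_log = [
    ("income", "alice", "credit_risk"),
    ("income", "bob", "marketing"),
    ("income", "alice", "credit_risk"),
    ("address", "carol", "operations"),
]
summary = usage_summary(sample_log)
print(summary["income"])
```

A data asset used across several domains is arguably a stronger candidate for investment than one used heavily by a single team, which is why the sketch tracks domains separately from raw access counts.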

What About AI?

These 3 points on data management scope expansion segue quite nicely into supporting AI strategy and even the elusive ‘AI Governance’. AI adoption (both traditional and generative) is inevitable. But AI isn’t a new discipline with regard to data; it’s just data analytics at scale. And so the asset management principles should logically apply. The key data themes in AI are (i) accurate and ethical sourcing of data for training purposes, (ii) intelligent feature engineering to reduce unnecessary data bloat, and (iii) closing the feedback loop on utility and usability for continuous solution fine-tuning. These are just a more granular exposition of the 3 scope areas.

Conclusion

Data Governance is dead, or should be left to die. It needs to mature into Data Management. This isn’t just about semantics; the ramifications are significant. Governing data makes no logical sense in today’s knowledge economy. You govern solutions; you govern choices. Data as an entity is not where the risks reside, and focusing on data quality and data access isn’t a real practice or discipline. To artificially expand Data Governance to cover availability and usability is disingenuous. Take a bow, Data Governance. It’s time to move on.


Eric Sandosham, Ph.D.

Founder & Partner of Red & White Consulting Partners LLP. A passionate and seasoned veteran of business analytics. Former CAO of Citibank APAC.