Pivoting to Data Products

Eric Sandosham, Ph.D.
Jan 5, 2025

Can the data warehouse team evolve to create data products?

Photo by fabio on Unsplash

Background

Banks have traditionally been at the forefront of creating data warehouses and data lakes (DWH for short). The core of their valuable data is structured, making it relatively easy to process, organise, and curate. As banks have increased their business and operating dependence on their DWH, there is talk of evolving their DWH capabilities towards a product engineering structure, where IT staff act as engineers who create and maintain viable data products.

So what exactly is a data product? And how should the DWH IT function in organisations (including banks) think about the roadmap towards product engineering? I dedicate my 72nd article to unpacking some ideas around this topic.

(I write a weekly series of articles calling out bad thinking and bad practices in data analytics / data science.)

What is a Data Product?

What exactly is a data product? Forbes defines data products as “live, refined, fully governed and ready-to-use data assets that are instantly discoverable, contextualised, trustworthy and reusable for many use cases. Put simply, data products allow organisations to reuse data across a variety of use cases to save costs and time.”

A post from Reddit reads, “A set of tables is not a data product. But a dataset in a Data Catalog is a data product if you have data engineers using the data catalog. If you have no data engineers, it’s not a data product.”

Another definition comes from lakeFS, a data engineering software company: "A data product is any tool or application that processes data and generates insights. These insights are aimed at helping businesses make better decisions for the future."

The Forbes definition conflates data assets with data products. It is broad, ambiguous, and not particularly helpful: a "catch-all" of buzzwords. Under the Forbes definition, a domain-centric data warehouse with a proper metadata layer would qualify as a data product. I don't agree with that. The Reddit definition is pure rubbish, because it defines a data product by who happens to use it. The lakeFS definition resonates with me. I think it is a sufficiently good definition because it is directional and usable. I would personally define a data product as a solution built on data assets, whose utility lies in information that improves decision-making.

As a former banker, I will illustrate these points using the example of the Credit Bureau (CB). For those unfamiliar with it, a CB is simply a 3rd party, government-approved data provider holding information about individuals with loans (including credit cards) disbursed by the CB's member banks. The information covers loan amount, repayment behaviour, etc. Member banks use it to decide whether to approve a current loan application, and even whether to withdraw a previously approved and disbursed loan. The CB offers a range of services. It allows member banks to query its database via API (depending on the country's central bank regulations, not all data points can be queried). It can produce reports on trends and market share. The more sophisticated CBs also provide a credit score that estimates the likelihood that an applicant will become delinquent on their loan if it is approved. Now, which of these are data products?

The API-enabled query-able database is a data product. Why? Firstly, through the magic of APIs, it is a solution and not just a database. Secondly, it provides information-centric utility for decision-making, namely credit underwriting.

It should be obvious that the credit score is a data product. Information signals have been distilled via the score, and it is consumed directly into the credit underwriting process, making it even more valuable than an API-enabled query-able database.

The trend and market share reports aren't necessarily data products. Why? A pre-canned static report is not a solution, because it is unclear what problem it solves. It is just information. However, if those trend and market share reports are interactive (via an interface) and customisable (including enrichment with 1st party or public domain data), then they have the beginnings of a data product.

Evolution of DWH Towards Data Products

Coming back to the question posed in the opening paragraph, we can unpack it in the following way. The presence of a DWH in an organisation suggests that there are data practitioners: data analysts and/or data scientists. These data practitioners are already creating data products or pseudo data products for their respective user functions, e.g. a predictive score for a marketing campaign, a recommendation engine, or an automated financial forecast. The DWH team shouldn't compete with these data analysts and data scientists. Rather, it needs to identify the gaps it can fill. For example, can the DWH team create data products that ease the toil of data analysts and data scientists in their own work of creating data products? Can it create data products for audiences that data analysts and data scientists do not sufficiently serve? Can it unify similar but independent data products created by data analysts and data scientists into a more useful generic data product?

In my 25th article (The Problem with Data Monetisation), I introduced several principles on what drives the monetisation value of data. I'm repurposing them here for data products. Here are the 3 principles that drive the utility of data products:

  1. Your data product should be used for decision input rather than decision outcome. As an example, a credit score is used for decision input while an interactive, customisable market share report is used to monitor decision outcomes.
  2. Your data product should significantly and uniquely reduce decisioning uncertainty, rather than simply providing an additional triangulation or reference point. For example, a data product used to predict a specific outcome, such as a credit score predicting delinquency, is more valuable than one that only adds another reference point.
  3. Your data product should be difficult to replicate. Specifically for DWH, it should be difficult for internal data practitioners to replicate. This can be because of unique DWH capabilities like pre-processing, intensive computation, access to or ownership of data pipelines, or even security.

Conclusion

It is only natural that as DWH teams mature, they seek to move up the value chain. And given the influence of Big Tech, creating data products seems logical and attractive. As an internal infrastructure capability, the DWH needs to first focus on where the opportunities lie to complement, supplement and amplify the existing work of the data practitioners. Organising itself to execute against these opportunities would then be the next logical step.

Written by Eric Sandosham, Ph.D.

Founder & Partner of Red & White Consulting Partners LLP. A passionate and seasoned veteran of business analytics. Former CAO of Citibank APAC.