The Problem with Using External Data

Eric Sandosham, Ph.D.
5 min read · Nov 26, 2023

Photo by Hunter Harritt on Unsplash

Background

I’ve been writing a weekly article on bad thinking and bad practices in data analytics / data science for the last 3 months. I’ve enjoyed the challenge of figuring out articles to write and maintaining the output consistency. My most popular article to date was one about the problems with how organisations approach their data strategy. While I covered a number of critical points, the one that really caught fire was about approaching data strategy based on intent:

  1. Data-for-Automation (DfA) — leveraging data & information to reduce process friction and increase the speed of decisioning.
  2. Data-for-Decision (DfD) — leveraging data & information to reduce uncertainty in decision-making, thereby improving the quality of its outcome.
  3. Data-for-Product (DfP) — leveraging data & information to improve the features and functionalities of your product offering, thereby creating more demand for your products.

(Obviously the same data element can appear in more than one category.)

Interestingly enough, I was recently invited to a data symposium in Kuala Lumpur organised by one of Malaysia’s largest banks; I was asked to join a panel discussion on ‘leveraging external data for competitive advantage’. While the questions only scratched the surface (par for the course at such conferences), they did prompt me to want to talk more about this sub-domain of the broader data strategy. And so I dedicate my 14th weekly article to pointing out the bad thinking around the use of external data.

Types of External Data

Let’s first consider the landscape of external data. All data can be classified as 1st party, 2nd party or 3rd party. 1st party data is data that is generated from your organisation’s business operating model, and is proprietary and unique to you. 1st party data is also known as internal data. 2nd party data is data that is generated from your business partners’ operating models and is shared with you through a legal agreement. In short, 2nd party data is someone else’s 1st party data. 3rd party data is generally obtained from data aggregators, but could also include data you collect yourself on non-customers using techniques such as screen scraping (legalities apply). Both 2nd party and 3rd party data are considered external data. (For the purpose of this article, we will not cover the direct purchase of prospect data for use in business development.)

Value of External Data

Now, we typically assess data on 2 broad dimensions: validity and reliability. Validity relates to whether the data element accurately represents the phenomenon of interest. For example, clicking and placing an item into an online shopping cart can be accurately interpreted as ‘interest to purchase’. Reliability, on the other hand, relates to whether the data element consistently represents the phenomenon of interest. For example, the same occupation code doesn’t always represent the same level of economic resilience, because the macro socio-economic landscape shifts with each generation.

As one moves from 1st party to 3rd party data, both validity and reliability degrade. And correspondingly, so does the economic value to your organisation.

Reducing Uncertainties

Relying solely on external data will NEVER give you a competitive advantage. You might get a near term boost in your bottom line but it will not be sustainable as your competitors can easily replicate, and overtake, your success. When thinking about external data, we must take an inward view first. We must be sufficiently familiar with both the information value contained within our 1st party (internal) data and its underlying uncertainties (validity and reliability) — e.g. we may have income information about our customers but it is outdated and hence represents significant uncertainty regarding its validity. External data is always complementary; use it to improve validity and reliability of your internal data.

A good way to start is to use the framework of Data-for-Automation (DfA), Data-for-Decision (DfD) and Data-for-Product (DfP) as highlighted above. Ask yourself …

  1. What uncertainties exist for your DfA, DfD and DfP activities in terms of accuracy and repeatability of outcomes? (Assuming you’ve maximised the use of internal data already.)
  2. Can these uncertainties be reduced through the use of 2nd party or 3rd party data, and in what way?
  3. How would you ensure the validity and reliability of the external data over time?

Consider the example where your customer demographics data is outdated. You could use 2nd party data or even social media data to either explicitly find the new updated data or to estimate it based on behaviour changes (you can triangulate the estimates from internal data proxies and external data proxies).
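One way to picture the triangulation mentioned above is to combine an internal proxy estimate and an external proxy estimate into a single figure. The sketch below uses inverse-variance weighting, which is my choice of technique for illustration and not something prescribed here. It assumes each proxy yields an income estimate with a rough standard error and that the proxies are independent. The class name, source names and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProxyEstimate:
    source: str       # hypothetical label for where the estimate came from
    value: float      # estimated annual income for one customer
    std_error: float  # assumed uncertainty of that estimate (must be > 0)


def triangulate(estimates: list[ProxyEstimate]) -> tuple[float, float]:
    """Combine proxy estimates using inverse-variance weights.

    Assumes the proxies are independent and unbiased. Returns the
    combined estimate and its standard error.
    """
    if not estimates:
        raise ValueError("at least one estimate is required")
    if any(e.std_error <= 0 for e in estimates):
        raise ValueError("std_error must be positive")
    # More certain proxies (smaller std_error) get larger weights.
    weights = [1.0 / (e.std_error ** 2) for e in estimates]
    total = sum(weights)
    combined = sum(w * e.value for w, e in zip(weights, estimates)) / total
    combined_se = (1.0 / total) ** 0.5
    return combined, combined_se


# Illustrative numbers only: an internal spend-based proxy and an
# external (2nd party) proxy for the same customer's income.
estimates = [
    ProxyEstimate("internal_spend_proxy", 60_000.0, 10_000.0),
    ProxyEstimate("partner_payroll_proxy", 72_000.0, 5_000.0),
]
combined, combined_se = triangulate(estimates)
print(f"{combined:.0f} +/- {combined_se:.0f}")  # prints "69600 +/- 4472"
```

The combined standard error is smaller than either input's, which is the sense in which the external proxy reduces the uncertainty in the outdated internal field. If the proxies are correlated, this simple weighting would overstate the gain.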

It may also be helpful to classify your external data, assuming you already have some. This classification may help you to understand the ‘granularity’ of your external data and what kinds of uncertainties it may serve to reduce:

  1. Data about your customers
  2. Data about your products
  3. Data about your organisation
  4. Data about your competitors
  5. Data about your markets
  6. Data about your socio-economic environment

The Right Data Partners

A question that arises often is how to create the right data partnerships, particularly when it comes to 2nd party data. 2nd party data typically takes the form of a data exchange, a kind of quid pro quo, if you will. Any exchange is underpinned by an asymmetric recognition of value: each party values the data it receives more than the data it gives up. You shouldn’t be trying to solve for equivalence of exchanged value, but rather to encourage and work with that asymmetry in mind.

Now, how should you go about evaluating if your data partner is appropriate? If your organisation has a dominant market share, then your data can be a fairly accurate representation of the market. The nature of your uncertainty isn’t necessarily about how to further grow your market share, but perhaps about deepening your existing customer relationships with more products and services. You want to understand new needs and pain-points that are not captured within your 1st party data. It makes sense then to get into a data partnership with a similarly market-dominant player in a complementary industry, one where there is a high percentage of overlapping customers.

However, if you are a non-dominant market player, then the nature of your uncertainty may be about better understanding your existing customer needs, which would then allow you to increase your market share. In such an instance, data partnering with another non-dominant market player may be the better approach, i.e. you and your data partner don’t have a high percentage of overlapping customers.
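The ‘percentage of overlapping customers’ in the two scenarios above can be estimated before any detailed data is exchanged. The sketch below assumes both parties can produce a set of customer identifiers derived in the same way; here that is a SHA-256 hash of a normalised email address, which is purely an illustrative choice and not a robust privacy mechanism on its own. The function names and sample addresses are hypothetical.

```python
import hashlib


def hash_id(email: str) -> str:
    """Normalise an email address and return its SHA-256 hex digest."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def overlap_share(ours: set[str], theirs: set[str]) -> float:
    """Share of our customers that also appear in the partner's base."""
    if not ours:
        return 0.0
    return len(ours & theirs) / len(ours)


# Illustrative identifiers only.
our_customers = {hash_id(e) for e in ["a@x.com", "b@x.com", "c@x.com", "d@x.com"]}
partner_customers = {hash_id(e) for e in ["c@x.com", "d@x.com", "e@x.com"]}

print(overlap_share(our_customers, partner_customers))  # prints 0.5
```

A high share points to the dominant-player scenario (deepening relationships with shared customers), while a low share points to the non-dominant scenario described above. Note that the measure is asymmetric: the partner computing the same share from its side would divide by the size of its own base.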

Conclusion

As the world becomes increasingly awash with data, having a good appreciation for the value and use of external data becomes more and more critical. But ultimately, external data is complementary. External data is a gap-filler. The better you understand your information gaps based on your familiarity with your internal data, the more effective you will be in leveraging external data.

Written by Eric Sandosham, Ph.D.

Founder & Partner of Red & White Consulting Partners LLP. A passionate and seasoned veteran of business analytics. Former CAO of Citibank APAC.
