The Problem with UX Analytics
Background
User Experience (UX) has been a major consideration in the digitalisation journey of many organisations. But despite its importance, not enough has been said about better ways to measure its effects. UX designers still rely heavily on ‘5-star’ or ‘thumbs up / thumbs down’ ratings and net promoter scores (NPS) to gauge whether users are delighted with their digital solutions. Survey response rates have been falling steadily for decades and now sit below 10% (i.e. fewer than 10% of users will take part in a survey). NPS as a methodology has been academically discredited, yet the world stubbornly clings to it while extracting very little actionable insight. Coupled with the rise of ‘traditional’ and generative AI being embedded into digital solutions, UX designers are beginning to feel pressure to justify their design suggestions.
I have been consulting with a major Asian bank on their data strategy for the last several years, and created a methodology on data instrumentation to strengthen their data requirements process for digital solutions. Of late, I’ve started to intersect with their Design Group on the topic of measuring UX. And so I dedicate my 15th weekly article to unpacking the challenge of UX analytics.
(I write a weekly article on bad thinking and bad practices in data analytics / data science, which you can find here.)
Stop the Hand Waving
As a data guy, I dislike latent variables. A latent variable is one that is intuitively understandable but cannot be observed directly; there is no single measurable data variable behind it. ‘Happiness’ is a classic example, and so is ‘customer satisfaction’. Now, a lot of UX objectives and outcomes are defined in terms of latent variables, e.g. ‘appeal’ and ‘usability’, and this creates a huge problem when we try to embrace a more data-driven approach to UX design.
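To make the problem concrete, here is a minimal Python sketch of a hypothetical composite ‘satisfaction’ score. Every proxy and weight in it is an arbitrary modelling choice of mine, which is exactly why a latent variable gives you nothing firm to act on:

```python
# Hypothetical proxies for the latent variable 'customer satisfaction'.
# None of these IS satisfaction; each is merely an observable stand-in.
user = {"repeat_visits": 12, "nps_response": 8, "support_tickets": 1}

# The weights are pure modelling choices: change them and the 'score'
# changes, which is what makes latent variables so hard to act on.
weights = {"repeat_visits": 0.5, "nps_response": 0.4, "support_tickets": -0.6}

satisfaction_score = sum(weights[k] * user[k] for k in weights)
print(round(satisfaction_score, 2))  # a number, but what does it mean?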
My Asian bank client defines UX along four pillars: ‘utility’, ‘usability’, ‘appeal’ and ‘trust’. The Design Group has the responsibility to move the needle and show progress on UX. They look at outcome metrics like NPS or behaviour metrics like repeat users. But the former is difficult to attribute to any of the four pillars, while the latter can be highly skewed because banking customers are a ‘captive audience’: high switching costs and inertia keep them coming back regardless of the experience.
Stop the Surveying
Most UX designers rely heavily on surveys, including the 5-star and thumbs up/down ratings inserted into their digital journeys. Response rates have obviously fallen off a cliff, making survey responses less and less valid, but the other major issue is the flawed thinking behind the ‘wow’ effect. Let me explain. When you design a digital application to work, and it does, should your users be ‘wowed’? Why would you expect a 5-star rating when the task completes successfully, exactly as expected? And how can the digital solution be better than expected? The experience outcome is naturally asymmetric: the solution can work or fail, but it cannot exceed. Of course there are shades of grey when it comes to task failure, such as outright non-completion versus taking too long to complete. But we can define all of this upfront and simply measure it directly, without the need for surveys.
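As a minimal sketch of what ‘define it upfront and measure it directly’ could look like, assuming a hypothetical event log that records each session’s completion flag and duration (the records and the slowness threshold are illustrative, not from any real system):

```python
from datetime import timedelta

# Hypothetical session records; in practice these come from your event
# instrumentation, not from any survey.
sessions = [
    {"id": "s1", "completed": True,  "duration": timedelta(minutes=2)},
    {"id": "s2", "completed": True,  "duration": timedelta(minutes=9)},
    {"id": "s3", "completed": False, "duration": timedelta(minutes=1)},
]

# Define the 'shades of grey' upfront: anything over this threshold
# counts as 'took too long to complete'.
SLOW_THRESHOLD = timedelta(minutes=5)

def classify(session):
    """Classify a session into one of three outcomes defined upfront."""
    if not session["completed"]:
        return "abandoned"
    if session["duration"] > SLOW_THRESHOLD:
        return "completed_slow"
    return "completed"

for s in sessions:
    print(s["id"], classify(s))
```

No user is interrupted, no response rate applies, and every session contributes a reading.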
Measure what Matters
So what, when and how should we measure UX? I submit to you the following argument:
- Your UX measurements must affect your design choices; you must have the ability to make tweaks to them based on your measurement readings.
- Your UX measurements must be context-resilient, or you must capture context along with your measurements.
- Your UX measurements must allow for unambiguous diagnosis to effect change.
Let’s unpack each one of them.
Point (1) is related to a key principle of data: that it is used for, or to improve, decision-making, i.e. data-for-decision. I wrote about this in my article on Data Strategy. Consider the example of removing log-in steps by providing pre-filled information through universal authentication, such as using SingPass, Singapore’s national digital ID application, to renew your car insurance policy digitally. This is obviously better UX. There is no point measuring ‘task completion rate’ (a popular UX metric) or 5-star ratings here. The design consideration is the population that cannot authenticate through SingPass: you need to design this alternative journey and instrument it with data so you can keep making tweaks to remove log-in friction.
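A minimal instrumentation sketch of that alternative journey, assuming hypothetical event names and a generic logging sink rather than any specific analytics product:

```python
import json
import time

def log_event(name, **context):
    """Hypothetical sink; in production this feeds your analytics pipeline."""
    print(json.dumps({"event": name, "ts": time.time(), **context}))

# Instrument each friction point of the fallback journey so that every
# design tweak (fewer fields, better error copy) can be checked against
# the readings rather than against a rating.
log_event("login_fallback_started", reason="no_singpass")
log_event("login_field_error", field="nric", attempt=2)
log_event("login_fallback_completed", steps_taken=5, elapsed_s=48.2)
```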
Point (2) is related to the information signal contained in data, which is always context-specific. I also wrote about this in my article on Information Theory. Consider the popular UX metric ‘task duration’, used to measure how long it takes to attach supporting documents for a loan application as part of a new digital loan process. A user on a laptop finds it easier to attach documents thanks to file availability, formatting options and the overall user interface, but has a high probability of multi-screening, which adds inactive time. A user on a smartphone finds it harder to attach documents but is much less likely to multi-screen. When you measure top-line ‘task duration’ without contextualising for the device, whether the user is on the move, or even the strength of the internet connection, the metric can be interpreted in so many ways that it becomes useless for problem detection. So it is important to capture the context along with the UX metric, or to convince yourself that the metric does not vary significantly across probable usage contexts.
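A small sketch of what capturing context alongside the metric could look like, with hypothetical observations; the point is that segmentation happens before interpretation:

```python
from statistics import median

# Hypothetical measurements: task duration captured WITH its context,
# never as a single top-line number.
observations = [
    {"duration_s": 310, "device": "laptop", "on_the_move": False},
    {"duration_s": 150, "device": "phone",  "on_the_move": True},
    {"duration_s": 420, "device": "laptop", "on_the_move": False},
    {"duration_s": 180, "device": "phone",  "on_the_move": False},
]

# Segment before interpreting: a high laptop median may just reflect
# multi-screening, while a high phone median may signal real friction.
for device in ("laptop", "phone"):
    durations = [o["duration_s"] for o in observations if o["device"] == device]
    print(device, "median task duration:", median(durations), "s")
```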
Point (3) is related to data relationships, as covered in my previous article on designing dashboards. I wrote about using the framework “Input -> Activity -> Output -> Outcome -> Impact” to ensure that you capture the full range of data points for a given phenomenon of interest. This applies equally well to the domain of UX. It is important to show the ‘lineage’ of how you arrived at your desired UX outcome metric (e.g. a 5-star rating). Without the lineage, you will have difficulty figuring out the right levers to pull to improve your UX outcomes. It is also important to consider 1st-party, 2nd-party and 3rd-party data across the lineage. Let’s revisit the digital loan application example from Point (2). To measure UX for this process, you could capture the following (a sketch of this lineage follows the list):
- Input: number of first-time users vs repeat users
- Activity: task duration for smartphone users vs laptop users across all sessions
- Output: number of successful application submissions on smartphones vs laptops
- Outcome: number of usability complaints from internal customer support and on social media
- Impact: ratio of loan applications from the analog vs the digital process
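One way to keep this lineage explicit is to treat it as a small data dictionary. The metric names below are hypothetical labels for the bullets above, not fields from any real system:

```python
# Hypothetical metric lineage for the digital loan application journey,
# following the Input -> Activity -> Output -> Outcome -> Impact framework.
ux_lineage = {
    "input":    ["first_time_users", "repeat_users"],
    "activity": ["task_duration_phone", "task_duration_laptop"],
    "output":   ["submissions_phone", "submissions_laptop"],
    "outcome":  ["support_complaints", "social_media_complaints"],  # 1st + 3rd party
    "impact":   ["analog_vs_digital_application_ratio"],
}

# With the lineage explicit, a dip at the outcome layer can be traced
# back through output and activity before anyone reaches for a survey.
for stage, metrics in ux_lineage.items():
    print(f"{stage:>8}: {', '.join(metrics)}")
```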
Conclusion
My Asian bank client recently shared a wonderful example of how even the best UX designers can miss the obvious. The bank had re-engineered its branch CRM platform. The UX designers added a user rating at the end of the usage journey, and if a user gave a negative rating, they were prompted to provide a short written reason. The re-engineered platform garnered a 90% positive rating, but the bank did not see a corresponding rise in sales productivity. A focus group revealed that users wanted to minimise task duration by not having to write the additional reason for a poor rating; they were busy enough with their work already. So they gave a good rating instead, even though they were not pleased with the updated solution. The UX designers in this instance forgot the well-known design principle that users will always pick the path of least resistance. But could they have discovered this with more thoughtful UX measurement?
I’m hoping to see more discussion on incorporating UX measurement as a foundational input into UX design frameworks. This will only become more critical as we embed AI into more of our solutions, making UX measurement even more complex.