Measuring Success When GenAI Products Have Ambiguous Use Cases
Products with unlimited and unpredictable use cases make it hard to measure success against standard metrics. This guide will help.
Kevin Newton is a Senior Manager of User Experience Measurement at LinkedIn. This article has been adapted from his Co-Lab Continued presentation, “Copilot, Coach, Constructs, Oh My!”
Most of us have a firm idea of how to measure user experience success through standard metrics. But when it comes to the evolution of GenAI products, all bets are off.
Users are applying GenAI in innovative and limitless ways that can’t easily be compared or measured one-to-one against previous products. So how do we measure success? Let’s explore.
Measuring UX 101
Let’s set a baseline with UX measurement 101 first, so that we’re speaking the same language—there’s a lot that goes into measuring UX.
The first thing we do with digital products and experiences is build a “thing” that people can do another “thing” with. We want them to be able to accomplish something with our product.
For LinkedIn, that’s things like applying for a job, checking your notifications, creating a document, adding an image to a Google slide, etc.
Then, we define what success is for our users. For example, if it's an application for a job, they have to be able to…
- Find a relevant job
- Assess their qualifications
- Put in the application
If they do all of that, they show up on a dashboard as part of a success percentage. That's how we do it today.
We are then concerned with the ease of use and the most delightful path to success. To measure the experience, you observe people trying to take the action and collect both the objective and subjective measures.

The classic, most common benchmarks of success include the following (there's a quick sketch of how you might compute them right after the list)…
- How many people could successfully take the action?
- How long did it take them?
- How many errors were there?
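To make that concrete, here's a minimal sketch of how those three benchmarks might be computed from observed sessions. It's written in Python against a made-up session record—the `Session` fields and the numbers are placeholders for illustration, not any particular analytics tool's schema:

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Session:
    """One observed attempt at the task (hypothetical log format)."""
    completed: bool   # did the participant finish the task?
    seconds: float    # time on task
    errors: int       # errors made along the way

def classic_benchmarks(sessions: list[Session]) -> dict:
    """The classic objective benchmarks: completion rate, time on task, errors."""
    return {
        "completion_rate": sum(s.completed for s in sessions) / len(sessions),
        "median_time_on_task_s": median(s.seconds for s in sessions),
        "avg_errors": sum(s.errors for s in sessions) / len(sessions),
    }

# Five observed job-application attempts (made-up numbers)
sessions = [
    Session(True, 95.0, 0),
    Session(True, 130.0, 1),
    Session(False, 240.0, 3),
    Session(True, 110.0, 0),
    Session(True, 150.0, 2),
]
print(classic_benchmarks(sessions))
# {'completion_rate': 0.8, 'median_time_on_task_s': 130.0, 'avg_errors': 1.2}
```

The exact fields don't matter; the point is that every one of these numbers describes the user's behavior against a task we defined for them.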
Then there are more subjective measures around delight. This is where the human comes in. We ask them…
- Was it easy or difficult?
- How confident did you feel?
- How satisfied were you?
- How well did it meet your requirements?
If any of those metrics are low, we blame the interface, and we blame the code—as we should.
We say, “We need to make it more intuitive, more usable, more delightful. Simpler.”
How GenAI changes typical measurement practices
Now let’s talk about the AI experience. GenAI breaks all the rules. It hides in the shadows. The AI experience involves training a model to respond to prompts using a probabilistic strategy, not a deterministic one.
It's no longer human → computer interaction.
It's human → AI → computer interaction.
It's very different. This makes the user's experience, or the user's interactions, nothing more than a gateway to an invisible interaction happening between the AI and the computer, one with seemingly unlimited potential and output.
Building things a user can do instead becomes building connections to a model. Those connections can be used to do many things. Some we want AI to do, and unfortunately some we don't.
Let's look through some examples of how people are using one of our products.
Microsoft's Copilot example use cases
Below are some examples of how people used Copilot in unexpected ways.
✔ More quality time with family
One person has an autistic child who really responds to having stories of their life told to them. She used to spend hours writing stories, but now she uses Copilot to write them. She says, “Now I can spend time with my child instead of preparing for time with my child.”
✔ Environmental sustainability
One user cares about sustainability and loves the environment, but she also loves to bake. What is she going to do with all those egg yolks after she makes the white frosting? She doesn't like throwing them away, so she uses Copilot to help her be more sustainable in her hobby.
✔ Gathering photos of family
This user takes pictures of all of his family's outings on the weekends and puts them in an album. He used Copilot to write code to take care of all of the steps he didn't like: it pulls in the photos from the camera, puts a filter on them, and puts them in an album.
✔ Better job recruiting
This user is my favorite because I work at LinkedIn. This is a tech recruiter in India.
Recruiters have to recruit for a lot of different roles they've never worked themselves, which entails a lot of conversations about what each role actually is. Copilot makes it easy for him to understand any type of job, so he can find more qualified candidates with less back and forth between him and the hiring manager.
✔ A creative partner
This person uses Copilot to help make marketing materials, so she can spend more time with her customers. She treats it as a brainstorming partner.
Measuring AIX 101
So instead of what people can do with the products you're putting out, the question now becomes what people will do with your products. And that's something we don't know. GenAI is freedom.
We're unlocking the gates and giving users access to this data in a constrained way, but basically, it's freedom they've never experienced. It's no longer a deterministic interface.
If we can't define it, it's hard to measure it. We have to be able to define success in some way.
What will people do with your product? The next step is also ambiguous. Unlike deterministic products, GenAI will only stop them if they're trying to go against its programmed ethics. (Unfortunately, sometimes not even then—yet). What success looks like is also varied and hard to define.

We saw it in the Copilot examples: it's different for every person. Maybe we should just ask the question subjectively, "To what extent were they successful?" If we aren't necessarily telling the user what success is or how to define it, as long as they feel like they're successful, that might be enough.
Let's bring back the objective and subjective measures. I would propose that the objective measures no longer come from the user interacting with the computer or the data; they actually come from the AI itself. We're no longer judging the success of the user, how long specific product tasks take them, or how many errors they make.

Instead, we're asking the questions below (a rough sketch of how you might track them follows the list)…
- How successful was the output?
- How long did it take for your AI feature to give them the output?
- How many errors did it make in the output?
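If you want a starting point, here's a rough sketch of what tracking those three signals could look like. The `Generation` record and the "accepted output" success signal are assumptions for illustration, not a prescription for your telemetry:

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Generation:
    """One AI response, as it might appear in product telemetry (hypothetical)."""
    accepted: bool    # did the user keep or use the output? (a proxy for output success)
    latency_s: float  # time from prompt to response
    errors: int       # flagged problems: hallucinations, refusals, broken output, etc.

def aix_benchmarks(generations: list[Generation]) -> dict:
    """Objective AIX measures: they describe the AI's output, not the user's clicks."""
    return {
        "output_success_rate": sum(g.accepted for g in generations) / len(generations),
        "median_latency_s": median(g.latency_s for g in generations),
        "errors_per_output": sum(g.errors for g in generations) / len(generations),
    }
```

Whatever success signal you pick (accepted output, thumbs-up, no regeneration needed) matters less than the shift itself: the unit of measurement is now the AI's output rather than the user's task performance.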
If you have a GenAI product out there today and you're not measuring these, you're behind. Talk to your leaders right now and make sure that they're monitoring this because it’s a crucial aspect of AI features in today's world.
Don’t be afraid to be a critic (and a visionary)
We need to employ humans to be harsh critics of our AI. We don't know what their intended outcome is, and we don't know how they're using it. So ask them, "Did it meet your intended outcome?"
If they're saying yes, it's doing what we want it to do. That's important because it could do anything. Did the AI provide them with something useful? It's maybe one of the first times that we have a ubiquitous tech that's going to change lives.
Here's a provocation: If the feature you put out in the world for GenAI isn't changing your users’ lives, you're probably using it incorrectly. See what's the next thing you can do, because it really has that opportunity.
Ask users…
- Does using this AI actually change your life for the better?
- If you couldn't use this anymore, would your life be worse?
“Here's a provocation: If the feature you put out in the world for GenAI isn't changing your users’ lives, you're probably using it incorrectly.”
Kevin Newton, Senior Manager of User Experience Measurement, LinkedIn
Finding the invisible road of AIX measurement
Experience has shifted to an invisible state where deceptively simple interfaces act as unconstrained gateways to access. To measure the experience of these freeing but invisible states, we need to treat the AI experience as the user.
We need to produce objective UX metrics to ensure our products perform no matter what people do with them, as long as it's ethical.
- How successful is the model?
- How fast is the model?
- How many errors does it make?
To understand whether these invisible processes are delivering on the promise of the AI experience, we need to employ our users as harsh critics.
- Did it behave the way they intended it to?
- Did it deliver on the utility our customers and members expected?
- Does the AI experience truly enhance and change our customers’ or members’ lives?
By recognizing the invisible and freeing nature of the AI experience, expanding our definition of who users are, and employing our customers or members as harsh critics of the AI experience, we can get closer to holistic measurement of the elusive and indefinable future of UX.