Words by Nikki Anderson-Stanier, Visuals by Allison Corr
Usability testing is a fundamental part of the product development process and a critical skill for anyone looking to evaluate an idea or prototype.
Although usability testing is one of the first research methods many people learn, it takes a lot of time and practice to nail the skill down. From planning to writing tasks to analyzing data, usability testing is a fantastic method to have in your toolkit. It can help your team move forward and get "unstuck" in certain situations.
On the flip side, usability testing can be overused or used in the wrong scenarios. It's incredibly important that, when planning, we ensure the methods we choose are the best for getting the information we need at the end of the study.
There are a few times I went about this incorrectly, rushing to conduct a usability test when it wasn't the correct method for the goals my team was trying to achieve. Because usability tests are so easy (or seem to be), we can get stuck saying, "Let's run a usability test," without being sure that's the best approach.
To take away that uncertainty that I once felt, let's go through the end-to-end planning process of a usability test to ensure when you pick it, it's the best method for what you need!
What is usability testing?
Usability is the ability for someone to:
- Access your product or service
- Complete the tasks they need
- Achieve whatever goal or outcome they expect
For example, with a clothing company, a user would go to the website with the expectation of being able to purchase clothing. But just because a user can buy clothing doesn't mean it's an easy or satisfactory experience.
So we can break usability down further into three areas:
- Effectiveness: Whether a user can accurately complete tasks and an overarching goal
- Efficiency: How much effort and time it takes for the user to accurately complete tasks and an overarching goal
- Satisfaction: How comfortable and satisfied a user is with completing the tasks and goal
Usability testing—whether you use metrics or run a qualitative usability test (more on that later)—looks at these three factors to determine whether or not a product or service is usable.
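To make the three cornerstones concrete, here is a minimal sketch (with hypothetical session data and field names) of how you might summarize effectiveness, efficiency, and satisfaction from a set of recorded test sessions:

```python
# Minimal sketch: summarizing the three cornerstones of usability from
# hypothetical session records for a single task. Field names are invented.
from statistics import mean

# Each record is one participant's attempt at the "checkout" task.
sessions = [
    {"completed": True,  "seconds": 95,  "satisfaction": 6},  # 1-7 scale
    {"completed": True,  "seconds": 140, "satisfaction": 5},
    {"completed": False, "seconds": 210, "satisfaction": 2},
    {"completed": True,  "seconds": 80,  "satisfaction": 7},
]

# Effectiveness: share of participants who completed the task.
effectiveness = mean(1 if s["completed"] else 0 for s in sessions)

# Efficiency: average time on task for successful attempts only.
efficiency = mean(s["seconds"] for s in sessions if s["completed"])

# Satisfaction: average self-reported rating.
satisfaction = mean(s["satisfaction"] for s in sessions)

print(f"Effectiveness: {effectiveness:.0%}")   # 75%
print(f"Efficiency: {efficiency:.0f}s avg")    # 105s
print(f"Satisfaction: {satisfaction:.1f}/7")   # 5.0/7
```

Real studies will compute these per task and per segment, but the shape of the summary stays the same.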
So, what about people who browse e-commerce sites out of boredom, with no intent to purchase?
This leads to the next question…
What are we testing? And how do we know?
As I mentioned above, there are many aspects you could test (even with a simple product!). But the point of a usability test is to ensure that users can complete their most common tasks to achieve their goals.
Again, one of users' main goals on a clothing website is purchasing clothing. You can have smaller goals within that larger goal, such as comparing clothing or buying a gift for someone.
Then, you can break each larger goal into the tasks people must do to achieve those goals.
- Searching for specific types of clothing with keywords
- Filtering through colors, brands, sizes
- Sorting by reviews, prices
- Opening multiple windows to compare different options
- Saving clothing to a favorites list
- Reading (and understanding) the size and fit of clothes
- Adding a piece of clothing to a basket
- Checking out and paying for the clothing
- Receiving a confirmation of purchase
- Receiving the clothing
These are all tasks associated with the larger goal of purchasing clothing. With usability testing, we ask people to do these critical tasks to assess whether or not they can achieve them and the larger goal efficiently, effectively, and satisfactorily. We also get their feedback on the experience we put them through.
If someone can do these tasks, they can achieve their expected outcome. However, if efficiency, effectiveness, or satisfaction suffer during this process, they may get frustrated and give up or go to a different website.
We've all encountered this—an infuriating user experience that made us rage-click, throw our phones (against a soft surface, of course), and give up on a product or service.
This is why usability is so vital. It can give us a clear understanding of how our product is aligning (or not aligning) with users' mental models of the experience.
Qualitative versus quantitative usability testing
There is an essential distinction between qualitative and quantitative usability testing. I put usability testing into one category for a while, rather than thinking through which was most applicable. I also sometimes tried to run a hybrid test, which made for an inefficient use of the method.
Qualitative usability testing is looking to get actual feedback from the user on the idea, prototype, or flow. For this, you show participants screens or images and ask them to provide qualitative feedback to understand how they perceive what's in front of them. This route means the session will be a discussion or conversation.
Conversely, quantitative usability testing strictly assesses the above three cornerstones of usability: effectiveness, efficiency, and satisfaction. Through this session, we evaluate how participants move through the experience we put in front of them through metrics. In this session, you aren't asking for qualitative feedback (as that would skew the metrics). Instead, you’re trying to mimic a real-life experience as much as possible.
Is usability testing the correct method for your study?
Knowing what a method is and when to apply it are two distinct things. As I mentioned above, I knew what usability testing was.
I was reasonably confident in my skills, so I overused the method for studies that would have been better off with a different approach. This happens quite often—we get comfortable with and stick with a method, even when it may not be ideal.
However, there are ways to determine if usability testing is the proper method for your study. I do this by reverse engineering the information we need by the end of the study. Start with research goals by talking to stakeholders about the questions they need answered by the study.
Creating research goals for usability testing
Before we dive into the main goals for usability tests, it's great to understand where usability testing shines and what it can help us understand:
- How users perceive a product
- How well a user can use a product for its intended function
- How well a product allows (or doesn't allow) a user to reach their goals
- How a user uses a product, separate from how we think a user should use a product
- How a product functions when placed in front of a human
- Where bugs and complicated user experiences lie within a product
On the flip side, usability testing will NOT tell us whether people actually want or need the product in the first place, which concept they prefer, or why they behave the way they do outside the tasks we set. Those questions call for other methods.
Reverse engineering goals
As mentioned above, I love to reverse engineer my research goals by basing them on my stakeholders' questions and the information they need to make a clear and straightforward user-centric decision moving forward.
I ask my stakeholders questions when they come to me with a particular research request. Some of these questions include:
- What type of information do you need at the end of the project?
- What decisions do you want to be able to make?
- What are the top three questions you need to be answered?
- What is your definition of success for this project?
Another way to do this that I've used is to have them fill out this mad lib:
I need [X information] to answer [Y question] to make [Z decision] by [timeline].
Once you have this information, it’s easy to start creating research goals that help you determine the best methodology. I do this by writing these in a research plan, which is a great way to keep everyone aligned during the research project.
I will split the goals up by qualitative and quantitative usability testing.
Qualitative usability testing goals
Taking the above questions, let's say that our stakeholders gave us the following answers about the information they need:
- To understand how participants react to and perceive the prototype/experience
- To understand how well our idea aligns with their mental models or expectations
- To get some feedback on how we could improve the prototype/experience
And about the decisions they are trying to make at the end:
- To make iterations on the experience so that it better aligns with users' mental models
- To improve the flow/experience based on feedback
- To feel more confident they are going in the right direction with the experience/flow
With this, it’s clear that they want qualitative feedback from participants to better understand the current experience and how they can improve it. We seek qualitative feedback whenever we want to know how people react, feel, and perceive.
Additionally, the team wants to improve the experience so that qualitative feedback would be hugely helpful.
For this, I would say the goals are:
- Discover participants' reactions to and perceptions of the current experience
- Learn about participants' current pain points, frustrations, and barriers to the prototype/experience
- Evaluate how participants work through the prototype/flow and their feedback
Quantitative usability testing goals
Instead, let's say our stakeholders answered that they wanted the following information from the study:
- To measure how effective and efficient the product is
- To evaluate how the product works when put in front of a human
- To benchmark this current experience against a future one
And they wanted to make the following decisions:
- To improve the efficiency and effectiveness of the product
- To make any final changes before shipping the product
With this, we are looking at concrete information regarding effectiveness and efficiency. The best way to get this type of information is through quantitative testing. With this information, qualitative feedback wouldn't give us the necessary results. Quantitative usability testing would enable us to measure the metrics and improve them by the end of the study.
A note on fidelity and order
For quite some time, I made quite a significant mistake with my quantitative usability tests. I tested low-fidelity prototypes with quantitative usability testing.
Low-fidelity prototypes often have only a "happy path." The point of making a low-fidelity prototype is not to design everything, including the "unhappy paths." Because of this, when it came to measuring the most common metrics, such as time on task or task success, I couldn't measure them accurately.
If there was only one path, how would I know whether someone might fail? How lost might they get? And how long would a task take with all the distractions a real interface typically has?
Low-fidelity prototypes aren't ideal for quantitative usability tests because they don't mimic real life in the way high-fidelity or live products are better suited to. There might not be possibilities for participants to fail or get lost.
With that in mind, I always recommend using qualitative usability testing with low fidelity so you can hone it into a high-fidelity or working prototype that you then test with a quantitative usability test.
Picking metrics (for quantitative usability tests)
Since quantitative usability tests require metrics to measure, it's best to pick those earlier on in the process. I base these metrics on the three cornerstones of usability we already mentioned:
- Effectiveness: Whether or not a user can accurately complete a task that allows them to achieve their goals. Can a user complete a task? Can a user complete a task without making errors?
- Efficiency: The amount of cognitive resources it takes for a user to complete tasks. How long does it take a user to complete a task? Do users have to expend much mental energy when completing a task?
- Satisfaction: The comfort and acceptability of a given website/app/product. Is the customer satisfied with the task?
Combining these metrics can help you highlight high-priority problem areas. For example, suppose participants respond confidently that they completed a task, yet most of them fail it. That discrepancy between perceived and actual success points to a real problem in the product. Let's break up the metrics by area of usability testing:
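As a sketch of that combining idea, here is hypothetical code that flags tasks where reported confidence is high but actual success is low (the thresholds and task names are invented for illustration):

```python
# Hypothetical sketch: flag tasks where participants' mean confidence rating
# is high but the observed task success rate is low — a discrepancy worth
# prioritizing. Thresholds are arbitrary and should be set with your team.
def flag_discrepancies(tasks, confidence_floor=5.0, success_ceiling=0.5):
    """Return task names with high mean confidence (1-7 scale) but a low
    task success rate (0.0-1.0)."""
    return [
        name
        for name, (mean_confidence, success_rate) in tasks.items()
        if mean_confidence >= confidence_floor and success_rate <= success_ceiling
    ]

# (task name) -> (mean confidence rating, task success rate)
results = {
    "search":   (6.2, 0.9),
    "filter":   (5.8, 0.4),   # confident but mostly failing
    "checkout": (3.1, 0.3),   # participants know they struggled
}

print(flag_discrepancies(results))  # ['filter']
```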
✔ Task Success
This simple metric tells you if a user can complete a given task (0=Fail, 1=Pass). You can get fancier with this by assigning more numbers that denote users' difficulty with the task, but you need to determine the levels with your team before the study.
✔ The number of errors
This metric gives you the number of errors a user committed while trying to complete a task. You can also gain insight into the common mistakes users make while attempting the task; if several users try to complete a task in an unintended way, a trend of similar errors may emerge.
✔ Single Ease Question (SEQ)
The SEQ is one question (on a seven-point scale) measuring the participant's perceived task ease. Ask the SEQ after each completed (or failed) task.
✔ Confidence
Confidence is a seven-point scale that asks users to rate how confident they are that they completed the task successfully.
✔ Time on Task
This metric measures how long it takes participants to complete (or fail) a task. It gives you a few different options to report on: average task completion time, average task failure time, or overall average task time (across both completed and failed tasks).
✔ Subjective Mental Effort Question (SMEQ)
The SMEQ allows users to rate how much mental effort a task took to complete.
✔ System Usability Scale (SUS)
The SUS has become an industry standard and measures the perceived usability of a user experience. Because of its popularity, you can reference published benchmarks (for example, the average SUS score is 68).
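The standard SUS scoring rule is simple enough to sketch: each of the ten items is rated 1-5, odd-numbered items contribute (rating − 1), even-numbered items contribute (5 − rating), and the sum is multiplied by 2.5 to give a 0-100 score. The sample responses below are hypothetical:

```python
# Standard SUS scoring: ten items rated 1-5; odd items score (r - 1),
# even items score (5 - r); the sum is scaled by 2.5 to a 0-100 range.
def sus_score(ratings):
    if len(ratings) != 10 or not all(1 <= r <= 5 for r in ratings):
        raise ValueError("SUS needs ten ratings on a 1-5 scale")
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # items 1,3,5,... sit at even indices
        for i, r in enumerate(ratings)
    )
    return total * 2.5

# One hypothetical participant's responses to items 1-10:
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 5, 1]))  # 85.0
```

A score above the published average of 68 is generally read as above-average perceived usability, but comparing against your own product's earlier scores is more meaningful.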
✔ Usability Metric for User Experience (UMUX or UMUX-Lite)
The UMUX/UMUX-Lite is a "newer" alternative to the SUS that also measures the experience's perceived usability. The advantage is that the UMUX is only four items and the UMUX-Lite only two, making them quicker for participants to complete.
When choosing metrics for a quantitative usability test, especially if it is moderated (more on that in a bit), make sure not to pick so many that it is difficult to manage. The first time I conducted a quantitative usability test, I got very excited. I decided to measure time on task, task success, confidence, the SUS, and the number of errors.
It was way too much for me to properly keep track of during the test, even though I had a notetaker helping me. Of course, if you're using an unmoderated tool, some metrics—typically task success and time on task—are already recorded, so you don't have to worry about tracking them.
But also keep in mind, when choosing metrics, the number of surveys or items you're asking your participant to respond to. For instance, in one study I ran, I asked after each task:
- The Single Ease Question (SEQ)
- The Subjective Mental Effort Question (SMEQ)
And then, on top of that, I asked the SUS after the end of the test, as well as some general questions on satisfaction. All of these measures were a lot to ask the participants, and I felt them fatiguing after a few tasks.
Now, I typically choose the following metrics for each task:
- Time on task
- Task success
And for the end of the test, I use:
- The SUS (or the shorter UMUX-Lite)
If we measure particularly complex tasks, I'll also ask about confidence because I want to know whether people believe they're completing tasks correctly. Additionally, if we're curious about potential unhappy paths, I add the number of errors and then report on the most common unhappy paths people take.
However, when it comes to quantitative usability testing, I always operate by keeping it as simple as possible. I also always talk with my stakeholders for their input on the decision.
Who should you talk to? And how many of them?
The next step when planning a usability test is considering which participants to talk to. Recruitment can make or break a study, so I recommend going into this step as thoughtfully as possible. Trust me, it's challenging to fill a 60-minute usability test with the wrong participants, plus it's awkward.
The best way to do this is through a screener survey. You use these short surveys to qualify participants and ensure you get the best fit for the information you need.
One big mistake I used to make was only asking for demographic information in my screener surveys. Since I came from an academic background, demographic information was a must in many projects. I didn't focus on behavior or habits in those projects.
Unfortunately, when I just asked for demographic information (ex: gender, age, location), I landed in a terrible situation: the participants fit the demographic data, but they couldn't give me the information I needed.
For example, I was investigating the experience of a new flow for people to better estimate their jeans size before buying. Unfortunately, I focused too much on demographics instead of habits and recruited people who didn't purchase jeans online or had no problem estimating their sizes.
Once this happened (a few times), I realized that it was essential to write good screener surveys for a few different reasons:
- Finding and talking to the most relevant participants with the characteristics, habits, and behaviors you need to understand better.
- Hitting the correct sample size by segmenting your participants into different buckets, so you can later create meaningful deliverables that lead to action.
- Ensuring the return on investment for your research will be as high as possible and avoiding wasted time and money on suboptimal participants.
- Avoiding burning out your participant list by asking all users to participate constantly.
For usability tests, I ask myself the following questions to help understand the screener questions I should ask:
- What are the particular behaviors I am looking for?
- Have they needed to use the product? And in what timeframe?
- What goals are essential to our users?
- What habits might they have?
So, if we were to redo the failed screener about better estimating your size when purchasing jeans online, I might need to hone in on the following criteria:
- People who have purchased jeans online in the past three months
- People who have struggled with jean sizes online in the past three months
- People who have returned jeans after buying the wrong size
By targeting these criteria, I would have been better able to test the flow with someone who had these painful experiences in the past, getting me much more valid and reliable data.
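The behavioral screener above can be thought of as a simple qualification rule. Here is a hypothetical sketch (all field names invented) of what that logic looks like when it targets habits rather than demographics:

```python
# Hypothetical sketch: screener qualification logic for the jeans-sizing
# study, based on behavior rather than demographics. Field names are invented.
def qualifies(answers):
    """Qualify a respondent who buys jeans online AND has felt the pain
    the study is investigating (sizing struggles or returns)."""
    return (
        answers.get("bought_jeans_online_last_3_months", False)
        and (
            answers.get("struggled_with_sizing", False)
            or answers.get("returned_wrong_size", False)
        )
    )

respondents = [
    {"bought_jeans_online_last_3_months": True,  "struggled_with_sizing": True},
    {"bought_jeans_online_last_3_months": False, "returned_wrong_size": True},
    {"bought_jeans_online_last_3_months": True,  "returned_wrong_size": True},
]

print([qualifies(r) for r in respondents])  # [True, False, True]
```

In practice a screener tool applies these rules for you, but writing them out first forces you to decide which behaviors actually matter.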
How many people?
One note about the sample size for usability testing: a common rule of thumb for evaluative research is testing with five people, based on the oft-cited finding that five users uncover around 85% of usability problems. When I review research plans, I often see five participants listed as the number for an evaluative study.
While this can be correct, it isn't a hard and fast rule. If you pick five completely random people off the street to test your product, you likely won't find 85% of the issues.
The fine print behind "five people per usability study" is that it means five users per segment.
So always make sure you're thinking about segmentation and aren't simply recruiting five unrelated participants.
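The five-users guideline comes from a problem-discovery model: if each user independently hits a given problem with probability p, then n users from the same segment surface a share of problems equal to 1 − (1 − p)^n. With the commonly cited p = 0.31, five users land at roughly 85%. A quick sketch:

```python
# Problem-discovery model behind the "five users" rule of thumb:
# share of problems found = 1 - (1 - p)^n, where p is the probability that
# one user encounters a given problem and n is users per segment.
def share_of_problems_found(p, n):
    return 1 - (1 - p) ** n

# With the commonly cited p = 0.31, diminishing returns set in quickly:
for n in (1, 3, 5, 10):
    print(n, round(share_of_problems_found(0.31, n), 2))
```

The key assumption is that p is roughly uniform across your users, which only holds within a segment; mixing segments drives the effective p down and the curve flattens.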
There are a few other areas to cover when scoping and planning your usability test:
✔ Unmoderated versus moderated usability test
I typically default to moderated usability tests because that's what I grew up on; however, unmoderated testing has some immense benefits.
Typically, I use moderated testing for qualitative usability tests because it allows me to discuss and dig deeper with the participant. For instance, if participants struggle with certain areas of the experience, I can understand why in a moderated test. It lets me have a conversation that gives the team more direction through additional, deeper feedback.
However, with a quantitative usability test, you aren't looking for this depth of feedback or to understand why people are struggling. Instead, you are looking specifically to measure your chosen metrics. With this, unmoderated testing can be hugely beneficial by giving quicker responses and a larger sample size.
✔ Session length
Regarding the length of usability tests, I have seen them run anywhere from 15 to 90 minutes, depending on the complexity of the tasks and the type of information you are trying to get.
For instance, when it comes to unmoderated quantitative usability tests, you could be looking at a session length of 15-20 minutes since you are strictly measuring the metrics and not going deep into qualitative feedback.
Going toward a moderated qualitative test might run closer to 45-60 minutes because you need time to ask follow-up questions and discuss these with the participant.
I always default to 60 minutes during a moderated qualitative usability test because it’s better to end early than to run late, and this timing gives me a chance to understand the "why's" behind people's feedback.
For unmoderated quantitative tests, it depends on the number of tasks (of which I recommend no more than 10, if possible), but I usually choose 20-30 minutes.
And finally, for complex and moderated quantitative tests, I default to 45-60 minutes to go through the necessary tasks and measurements.
To learn more about usability testing, check out...
Nikki Anderson-Stanier is the founder of User Research Academy and a qualitative researcher with 9 years in the field. She loves solving human problems and petting all the dogs.
To get even more UXR nuggets, check out her user research membership, follow her on LinkedIn, or subscribe to her Substack.