As a user researcher, one of your central roles in an organization is to help colleagues make better decisions, whether that means deciding what to build next, shaping the flow of an experience, or helping others understand the top pain points to solve.
When it comes to decision-making, users' input can help your colleagues take the most informed next steps. However, what happens when you get asked the infamous question, "Can we do usability testing to see which design users prefer?"
Of course, this question will come up. You help product managers figure out what is most important to tackle next, so why wouldn't you also help designers decide which design is best to move forward?
Well, there are a few caveats with this kind of question. As we know, user research is a great tool, but it doesn't answer or solve every problem, and it should be used carefully to help teams thoughtfully answer their questions with valuable data. And, at times, user research can't provide teams with valid or reliable data.
3 types of tests (and why they're not enough)
I am in no way saying you can't use user research to test different designs, but there are a few different scenarios to be aware of in this space:
- Visual testing. The most common way I see this question phrased is, "Can we test two different colors to find out which the participants like more?"
- Preference testing. Like visual testing, preference testing looks at multiple prototypes/flows and asks, "Which prototype do users like more?"
- Performance testing. With this type of question, designers are looking to understand which visual components across different prototypes/flows perform better. This question usually comes as "Which button will perform better?"
Don't get me wrong: these are all great questions to ask and should be asked during a project. However, qualitative user research might not be the best place to search for answers to these questions.
With qualitative user research, you only speak to a small sample of people. With that small sample, it's challenging to determine preference or performance properly. If the designer is looking for this type of answer as an outcome of a test, user research will be a letdown.
Unfortunately, you cannot talk to a small number of people and claim "this visual component will be better" or "people will like this design more." Those results will not be valid or reliable and may cause colleagues to make an uninformed decision.
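To make the small-sample problem concrete, here is a minimal sketch that puts a rough 95% confidence interval around a preference count. The numbers (4 of 5 participants versus 400 of 500) are hypothetical, and the Wilson score interval is just one common way to estimate this uncertainty, not a method prescribed by this article.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical results: the same 80% "prefer design A" rate at two sample sizes.
for picked_a, total in [(4, 5), (400, 500)]:
    low, high = wilson_interval(picked_a, total)
    print(f"{picked_a}/{total} prefer A: plausible range {low:.0%} to {high:.0%}")
```

With five participants, the plausible range is wide enough to include 50%, so the data cannot rule out a coin flip. With several hundred responses, the range narrows enough to support a claim about preference.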
Keep in mind, we call it usability testing for a reason. We are testing the usability, not the design of a prototype or live code. In this case, your primary goal should be looking at whether or not the participant can use the software, rather than whether they like the look of it.
So, what can happen next?
Other approaches to consider
If these are the types of answers your colleagues are seeking, not all is lost. There are other methods to help them better understand how different components/flows resonate with users and perform:
- A/B testing. Over a more extended period and with a larger audience, this type of testing will help you understand which, if any, component or flow performs better. A/B testing can give you much better confidence in answering the performance and preference question. Want to learn more? Check out my article all about A/B testing and how it can go hand-in-hand with user research.
- Unmoderated user testing. Unmoderated user testing can help you gather larger amounts of data on simple prototypes or flows. This method can also help you answer questions about preference and performance. With a larger audience, you can generalize your results with more confidence in their validity and reliability. Check out how to set up and optimize your unmoderated user tests.
A/B testing and unmoderated user testing can be great alternatives for helping teams answer their preference and performance questions. However, there is another option.
In addition to the above, comparative usability testing can be a good approach, depending on the type of information your colleagues need.
What is comparative usability testing?
Comparative usability testing is different from the other approaches because you compare the effectiveness and efficiency of different designs. These two concepts, effectiveness and efficiency, are the cornerstones of usability testing.
Comparative usability testing still won't give us information on preference or performance. Instead, it can tell us if users will be able to complete tasks and achieve their goals. You are trying to determine which prototype functions better with the participants, as this design will allow users to accomplish their tasks in the best way.
For this type of testing, the designer will create several different designs that allow participants to achieve the same task/goal. Low-fidelity prototypes are ideal because they invite feedback from participants, but it is also possible to test live code. Make sure the designs are different enough from each other so that participants can compare each solution. For example, if there are a few different ways you think users could accomplish a particular goal, have each prototype address one solution and see which works best for participants.
There are a few different ways to compare the designs during the comparative usability test. You can compare task success, the number of errors, time on task, satisfaction, usability, and general feedback.
For each task, I like to have success criteria. Suppose I am trying to understand how people determine whether to buy a product. In that case, I might have criteria such as "the amount of information on the product page is sufficient" or "the reviews are clear and accessible," or "people can find the 'add to cart' button."
When you track this type of information across the different designs, you can give better comparisons outside of subjective feedback such as "I like this" or "This looks good."
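As a rough illustration of tracking these measures across designs, the sketch below summarizes task success, errors, and time on task per design. The `Session` fields, the design names, and every number are hypothetical placeholders. With so few participants, the summary is a way to organize observations, not a statistical result.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class Session:
    design: str       # which prototype the participant used for the task
    completed: bool   # did they meet the task's success criteria?
    errors: int       # number of errors observed during the task
    seconds: float    # time on task


def summarize(sessions: list[Session]) -> dict[str, dict[str, float]]:
    """Group sessions by design and compute simple comparison measures."""
    by_design: dict[str, list[Session]] = defaultdict(list)
    for session in sessions:
        by_design[session.design].append(session)

    summary = {}
    for design, group in by_design.items():
        summary[design] = {
            "participants": len(group),
            "completed": sum(s.completed for s in group),
            "total_errors": sum(s.errors for s in group),
            "median_seconds": median([s.seconds for s in group]),
        }
    return summary


# Hypothetical observations from one task across two designs.
sessions = [
    Session("A", True, 0, 42.0),
    Session("A", True, 1, 55.0),
    Session("A", False, 3, 90.0),
    Session("B", True, 2, 61.0),
    Session("B", False, 4, 120.0),
    Session("B", False, 2, 95.0),
]

for design, stats in summarize(sessions).items():
    print(design, stats)
```

Keeping the measures side by side like this makes it easier to report which facets of each design worked, alongside the qualitative notes from each session.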
The expected outcome of comparative usability testing is to determine which design works best so far by how well the users can accomplish the tasks during the test. At the end of the test, you should tell the designer which design functioned best and which facets of each design worked or didn't work. With this information, they can iterate on the designs based on the feedback.
As a side note, always remember that the rule of 5-7 participants per usability test means 5-7 people per segment/persona. In other words, you should not pick 5-7 random people from your entire audience pool; instead, segment them in some way (e.g., by age, location, or gender).
What to keep in mind when running a comparative usability study
Since comparative usability studies have a particular and relatively narrow goal, it is essential to keep a few things in mind when conducting them:
- Even during comparative usability testing, it can be easy to slip into the "prefer" or "like" arena. Try to avoid using these terms while talking to participants. Delete them from your script and even write down a reminder not to use them!
- Mix up the order in which participants receive the different designs to help mitigate bias. For instance, if you are testing three prototypes (A, B, C), make sure not every participant sees them in that same order. Participant one sees A, B, C; participant two sees B, C, A; participant three sees C, A, B; and so on.
- Keep the number of designs you are testing to a minimum, usually no more than three. Imagine, if you were a participant, trying to compare five or even six different flows and prototypes. Testing too many designs can be quite overwhelming to participants and lead to poor quality insights, so keep it simple with fewer comparisons!
- Test different designs and don't make the difference too subtle. You don't want to end up using comparative testing for something like two variations of a button. Again, this is better suited for A/B testing. Ensure the designs are diverse enough solutions to a given problem users are encountering.
- Always keep in mind, it is about whether it works, not whether people like it.
- You might not end up with one design "winning." In this case, you can take the positives from the various designs, combine them into one or two, and retest in a follow-up test.
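The order rotation mentioned in the list above (A, B, C, then B, C, A, then C, A, B) can be generated with a few lines of code when you are scheduling sessions. This is a minimal sketch; the function name `presentation_order` is made up for illustration.

```python
def presentation_order(designs: list[str], participant_index: int) -> list[str]:
    """Rotate the design list so each participant starts with a different design.

    participant_index is zero-based: 0 is the first participant.
    """
    if not designs:
        raise ValueError("designs must not be empty")
    shift = participant_index % len(designs)
    return designs[shift:] + designs[:shift]


designs = ["A", "B", "C"]
for i in range(6):
    print(f"Participant {i + 1}: {', '.join(presentation_order(designs, i))}")
```

A simple rotation like this puts each design in each position equally often once the number of participants is a multiple of the number of designs. It does not cover every possible ordering (for example, A, C, B never appears), so treat it as a lightweight way to reduce order bias, not a full counterbalancing scheme.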
Overall, when asked to make comparisons between designs, always start by understanding what type of information your colleagues need. If they need to determine which design works better for users, comparative testing might be a great approach!