Prove the Value of Your UX Research Insights with a Benchmarking Study

Benchmarking studies let you know how (and if) your insights are improving user experience. Here's how to conduct one.

Words by Nikki Anderson-Stanier, Visuals by Allison Corr

As researchers, we produce insights. These insights are a lighthouse for our teams to steer towards. They give teams a guiding path to creating functions, features, or products that solve real human problems.

Sometimes these ideas are a massive change to the current product. To our stakeholders, it can feel like a scary gamble.

If you're in this position, you may encounter this seemingly simple question, "How do you know your insights and recommendations are improving user experience?"

The first time I heard that question, I was tongue-tied. “Well, if we are doing what the user is telling us then, of course, it is improving.” Let me tell you that didn’t cut it.

Then I learned about benchmark studies.

These studies allow you to test how a website or app progresses over time and where it falls compared to an earlier version, competitors, or industry benchmarks. Benchmarking will allow you to answer the question above confidently.

How do I conduct a benchmarking study?

✔ Set up a plan

You have to start with a conversation with your team to agree on your benchmarking study's goals and objectives. Ideally, you answer the following questions:

Can we conduct benchmarking studies regularly?

How often are new iterations or versions being released? How many competitors do you want to benchmark against, and how often? Do you have the budget to run these tests on an ongoing basis? Benchmarking tests cost money and effort. You have to recruit and incentivize participants. Whenever possible, I recommend doing them in person, but you can also conduct them remotely.

What are we trying to learn?

Do we want to compare a new version of our product with an older version? Comparing two versions shows how the insights are affecting users. You can also compare your product to competitors or an industry benchmark (note that this isn't always possible). Answering these questions will help you determine whether benchmarking is the right methodology for your goals.

What are we trying to measure? 

What parts of the app/website are we looking to measure—particular features or the overall experience? It is usually fruitful to test the entire product with benchmarking tests, rather than just smaller flows.

    ✔ Write an interview script

    Once you have your goals and objectives set out, you need to write the interview script. This script will be similar to how you write usability testing scripts. The questions focus on the most critical and essential tasks on your website/app. There needs to be an action and a final goal. For example:

    • For Amazon, “Find a product you’d like to buy.”
    • For Wunderlist, “Create a new to-do.”
    • For World of Warcraft, “Sign up for a free trial.”

    Notice, these tasks are incredibly straightforward. Don’t give any hints or indications on how to complete the task, as it will skew the data.

    I know it can be hard to watch participants struggle with your product, but that is part of the benchmarking and insights you can bring back.

    For example:

    • “Click the plus icon to create a new to-do.” = Bad wording
    • “Create a new to-do” = Good wording

    If you would like to include additional questions in the script, you can add follow-up questions that ask participants to rate the difficulty of the task.

    Once the script is complete and everyone has contributed their suggestions or ideas, keep it as consistent as possible from one study to the next. It is challenging to compare data if the interview script changes.

    ✔ Pick your participants

    As you are writing and finalizing your script, it is good to begin choosing and recruiting participants.

    Standard user research studies, such as qualitative interviews or usability testing, generally call for fewer participants. Benchmarking is different because we are working with hard numbers and quantitative data.

    It is ideal to set the total number of users to 25 or more. At 25+ users, you can more easily reach statistical significance and draw more valid conclusions from your data.
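To see why a larger sample helps, here is a rough Python sketch that puts an approximate 95% confidence interval around an observed completion rate. It assumes the adjusted Wald interval, one common choice for small samples and not something the study design itself prescribes. The function name and the sample numbers are hypothetical.

```python
import math


def adjusted_wald_interval(successes, n, z=1.96):
    """Approximate confidence interval for a completion rate (adjusted Wald).

    z=1.96 corresponds to roughly 95% confidence (an assumption of this sketch).
    """
    # Add z^2/2 successes and z^2 trials, then compute the usual Wald interval.
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    # Clamp to the valid range for a proportion.
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)


# Hypothetical example: the same 80% observed rate with 5 and with 25 participants.
small_low, small_high = adjusted_wald_interval(successes=4, n=5)
large_low, large_high = adjusted_wald_interval(successes=20, n=25)

print(f"n=5:  {small_low:.0%} to {small_high:.0%}")
print(f"n=25: {large_low:.0%} to {large_high:.0%}")
```

The interval for 25 participants is noticeably narrower than the one for 5, which is the practical reason to recruit more people for a benchmark than for a qualitative study.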

    Since you will be conducting studies regularly, you don’t have to worry about going to the same group of users over and over again.

    It would be beneficial to include some previous participants in new studies, but it is fine to supplement that with new participants.

    The only important note is to be consistent with the types of people you are testing with—did you test with specific users of your product who hold a particular role? Or did you do some guerrilla testing with students? Make sure you are testing with those users for the next round.

    How often should I be running benchmarking studies?

    To determine how often you should/can run the benchmarking studies, you have to consider:

    At what stage is your product?

    If you're early in the process and continuously releasing updates/improvements, you will need to run more benchmarking studies. If your product is more established, you could benchmark less often, for example quarterly.

      What is your budget?

      If you are testing with around 25 users each time, how many times can you realistically test with your budget?

      How will this relate to other product development?

      If you're releasing updates on a more random basis, you could develop ad-hoc benchmarking studies that correlate to releases. However, this may not be the most effective way to show data.

      You want to see progress over time and how your research insights are potentially improving the user experience. Determine with your team and executives the most impactful way to document these patterns and trends. Just make sure you can run more than one study, or the results won't be actionable!

      What metrics should I be using?

      There are many metrics to look at when conducting a benchmark study. As I mentioned, many benchmarking studies will consist of task-like questions, so it is imperative to quantify these tasks. Below are some useful and common ways to quantify tasks:

      Task Metrics

      ✔ Task Success

      This simple metric tells you if a user could complete a given task (0=Fail, 1=Pass). You can get fancier and assign more levels that denote how much difficulty users had with the task, but your team needs to agree on those levels before the study.

      ✔ Time on Task

      This metric measures how long it takes participants to complete or fail a given task. It gives you a few options to report on: average task completion time, average task failure time, or overall average task time (across both completed and failed tasks).
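As a minimal sketch of how task success and time on task could be tallied for one task, the Python below assumes each participant's result is recorded as a (passed, seconds) pair. The function name, field names, and data are hypothetical.

```python
from statistics import mean


def summarize_task(results):
    """Summarize one task from (passed, seconds) pairs, where passed is 0 or 1."""
    completed = [secs for passed, secs in results if passed == 1]
    failed = [secs for passed, secs in results if passed == 0]
    return {
        "success_rate": len(completed) / len(results),
        # mean() raises on an empty list, so report None when nobody passed or failed.
        "avg_completion_time": mean(completed) if completed else None,
        "avg_failure_time": mean(failed) if failed else None,
        "avg_overall_time": mean(secs for _, secs in results),
    }


# Hypothetical results for a task such as "Create a new to-do".
task_results = [(1, 42.0), (1, 55.5), (0, 120.0), (1, 38.5), (0, 95.0)]
summary = summarize_task(task_results)
print(summary)
```

Keeping completion and failure times separate avoids a misleading overall average when failed attempts run much longer than successful ones.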

      ✔ Number of Errors

      This metric gives you the number of errors a user committed while completing a task. It can also give you insight into the common mistakes users run into. If many users try to complete a task in a different way than the product expects, a common pattern of errors may appear.

      ✔ Single Ease Question (SEQ)

      The SEQ is one question (on a 7-point scale) that measures the participant’s perceived ease of a task. Ask this after every task is completed (or failed).

      ✔ Subjective Mental Effort Question (SMEQ)

      The SMEQ allows users to rate how mentally demanding a task was to complete.

      ✔ Single Usability Metric (SUM)

      This measurement allows you to take completion rates, ease, and time on task and combine them into a single metric that describes the usability and experience of a task.

      ✔ Confidence

      Confidence is a 7-point scale that asks users to rate how confident they were that they completed the task.

        Using a combination of these metrics can help you highlight high-priority problem areas. For example, if participants respond with high confidence, yet the majority fail, there is a large gap between what participants believe they accomplished and what actually happened.
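One way to surface that mismatch is to count participants who failed a task but still reported high confidence. The sketch below is only illustrative: the cutoff of 6 on the 7-point scale and the function name are assumptions, not an established standard.

```python
def confident_failure_rate(results, high_confidence=6):
    """Share of participants who failed the task yet rated their confidence high.

    results: list of (passed, confidence) pairs, with passed 0 or 1 and
    confidence on a 1-7 scale. The cutoff of 6 is an assumed threshold.
    """
    if not results:
        return 0.0
    confident_failures = sum(
        1 for passed, confidence in results
        if passed == 0 and confidence >= high_confidence
    )
    return confident_failures / len(results)


# Hypothetical data: two of five participants failed while feeling confident.
sample = [(0, 7), (0, 6), (1, 7), (0, 3), (1, 5)]
rate = confident_failure_rate(sample)
print(f"{rate:.0%} of participants failed while feeling confident")
```

A task with a high rate here is a good candidate for a closer look, since participants are leaving without realizing that something went wrong.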

        Questionnaire Metrics

        ✔ SUS

        The System Usability Scale (SUS) has become an industry standard for measuring the perceived usability of an experience. Because of its popularity, you can reference published statistics (for example, the average SUS score is 68).
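For reference, a SUS score comes from ten statements rated 1 to 5. The sketch below follows the standard scoring rule: odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is multiplied by 2.5 to give a 0 to 100 score. The function name and sample responses are hypothetical.

```python
def sus_score(responses):
    """Score one participant's SUS questionnaire (ten ratings from 1 to 5)."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 responses")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("each SUS rating must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item_number % 2 == 1 else (5 - rating)
    return total * 2.5


# Hypothetical participant: agrees with positive items, disagrees with negative ones.
score = sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2])
print(score)  # 75.0
```

Average the per-participant scores to get the study-level SUS that you compare against the published average of 68.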

        ✔ SUPR-Q

        This questionnaire is ideal for benchmarking a product’s user experience. It allows participants to rate the overall quality of a product’s user experience, based on four factors: usability, trust & credibility, appearance, and loyalty.

        ✔ NPS

        The Net Promoter Score is an index ranging from -100 to 100 that measures customers' willingness to recommend a company’s products or services to others. It gauges customers' overall satisfaction with a company’s product or service and their loyalty to the brand.
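The -100 to 100 range follows from how the score is calculated: the percentage of promoters (ratings of 9 or 10 on the 0 to 10 "how likely are you to recommend" question) minus the percentage of detractors (ratings of 0 to 6). A minimal sketch, with a hypothetical function name and made-up ratings:

```python
def net_promoter_score(ratings):
    """Compute NPS from 0-10 'likelihood to recommend' ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    # Passives (7 and 8) count toward the total but toward neither group.
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical ratings: 4 promoters, 3 passives, 3 detractors.
nps = net_promoter_score([10, 9, 8, 7, 6, 3, 10, 9, 5, 8])
print(nps)  # 10.0
```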

        ✔ Satisfaction

        You can ask participants to rate their level of satisfaction with your product's performance or even your brand in general. You can also ask about more specific parts of your product to get a more targeted measure of satisfaction.

          What do I compare my product to?

          After you have completed your benchmarking study, there are a few ways you can use the data. There are general benchmarks companies should aim for or should try to exceed. Below are examples from MeasuringU:

          • Single Ease Question average: ~5.1
          • Completion rate average: 78%
          • SUS average: 68
          • SUPR-Q average: 50% (the 50th percentile)
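As a rough illustration, the numeric benchmarks above can be stored once and compared against each study's results. The dictionary keys, function name, and study scores below are hypothetical, and SUPR-Q is left out because it is reported on a different scale.

```python
# General benchmarks quoted above from MeasuringU.
BENCHMARKS = {"seq": 5.1, "completion_rate": 0.78, "sus": 68.0}


def compare_to_benchmarks(scores, benchmarks=BENCHMARKS):
    """Return the gap between each study score and its benchmark.

    A positive gap means the product scored above the benchmark.
    """
    return {
        metric: round(scores[metric] - target, 2)
        for metric, target in benchmarks.items()
        if metric in scores
    }


# Hypothetical results from one benchmarking round.
study_scores = {"seq": 5.6, "completion_rate": 0.72, "sus": 68.0}
gaps = compare_to_benchmarks(study_scores)

for metric, gap in gaps.items():
    status = "at or above" if gap >= 0 else "below"
    print(f"{metric}: {gap:+} ({status} benchmark)")
```

Running the same comparison after every round gives you a simple record of whether each metric is moving toward or away from its target.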

          You can also find averages for your specific industry, either online or through your benchmarking analysis.

          Finally, you can compare yourself to a competitor and strive to meet or exceed the key metrics mentioned above. Some companies have their metrics online so that you can access their scores. If not, you can conduct your benchmarking study on competitors, as well as your product.

          Even if you have to start small, benchmarking can grant you a world of insights into a product and be a great tool in measuring your user research practice's success.

          Nikki Anderson-Stanier is the founder of User Research Academy and a qualitative researcher with 9 years in the field. She loves solving human problems and petting all the dogs. 

          To get even more UXR nuggets, check out her user research membership, follow her on LinkedIn, or subscribe to her Substack.
