
How UX Can (and Should) Humanize Machine Learning

Google’s Michelle Carney on thriving in the increasingly collaborative environment between data science and user research. 

Featuring Michelle Carney, Ben Wiedmaier

Google’s Michelle Carney talks through ways data science and UX can better collaborate.

Increasingly, AI and ML are shaping the way we build products and design experiences.

And increasingly, user-centric organizations will need to inject qualitative data into the creation of models, algorithms, and interactions.

In this webinar, Google’s Michelle Carney walks us through the “whys” and “hows” of making UX-driven machine learning more human.

Panelists will share...

  • How user research can (and should) guide the fast-growing machine learning landscape.
  • Tactics for better enabling collaboration between UXR and data science at your organization.
  • Guidelines for collecting early-stage user data for ML models, featuring an example dscout use case.

Transcript:

Ben:
All right. Hello everyone and welcome to People Nerds's September webinar. We are excited to welcome Google's Michelle Carney, who is going to be describing just some of the ways that user experience research can help humanize machine learning.

Ben:
I will turn things over to a person I've had the pleasure to work with on this deck, learning a lot, I must say, in just the few weeks I've been working with her: Michelle Carney. Michelle, thank you so much. Take it away.

Michelle:
Thank you so much Ben. Oh my gosh, this is so cool. This is like such a dream come true too because I'm a huge fan of dscout, so really excited to share my practice with you all. Thank you so much everyone for taking the time. I'm really excited to just get into it.

Michelle:
Okay. Hello, we're going to talk about Machine Learning and User Experience. You are now tuned into People Nerds: How UX Research Can (and Should) Humanize Machine Learning. If you are in the wrong webinar, that's okay. I won't look, you can leave. No. Okay, cool.

Michelle:
Really quickly, a little bit about my background. Hey, I'm Michelle. I'm a senior UX researcher on ML and AI on Google's AIUX team. I'm also the founder and organizer of the Machine Learning and User Experience Meetup, and I lecture down at the Stanford d.school. A little bit about my previous experience: I actually come from more of the data science and machine learning side. I was a computational neuroscientist turned UX researcher, and I've done some UI engineering too.

Michelle:
I'm really excited to share a little bit about my practice and what I've learned about combining these two fields in interesting ways. For today's agenda, I thought we could start off with, "Hey, what is machine learning and UX?" I'll share some of the cool stuff around the People + AI Guidebook from this brand new team that I just joined at Google, and then we can move into a typical UX research case study.

Michelle:
I've boiled down some of my best practices into a sample case study that I would typically use if I'm building out a machine learning system, so hopefully some of it's useful to you. I do want to couch this in that this is a brand new and emerging field. Don't worry if you are like, "Oh no, I'm doing something a little bit different." That's totally fine. I'm sure a lot of the folks tuning in are UX designers and UX researchers; you are an expert in your own field. I'm just going to share some of my experience and my knowledge in this, and if there are some parts that really resonate with you, awesome. If there are some parts where you're like, "That's less applicable to me," that's also okay and fine. I'm also going to put it out there: I'm super interested in how you do this approach. Yeah, so let's get started. Like Ben mentioned, if you could hold your questions to the end, we can circle back to them. Cool.

Michelle:
Yeah, what is machine learning and UX? If you've been to the Meetup, I always cover this too, but how I like to describe it is: how do we use data to drive and inform UX design decisions? But also, how do we design to help our end user understand what's going on with these models, where that data is coming from, and what type of data is being used, and make more transparent and controllable machine learning and AI experiences? I know I've simplified it up here as UX and data, but UX is, as I'm sure a lot of us know, more than just visual design. It's interaction design, it's UX research, it's voice user interface design, all that stuff.

Michelle:
And data here I'm simplifying as well. If there are any data scientists or machine learning experts, I'm using it as shorthand for artificial intelligence, data analysis, even stats, all of that too. How do we use these two disciplines to help design better experiences? These are some of my favorites; hope the gifs aren't too slow over the webinar. On the left we have Pinterest's visual discovery tool. What's happening is someone searched for something and they're browsing a bunch of images and they're like, "Oh, this pizza looks great. I actually want to refine my search. Instead of going back up to the top, I want more pizzas that look like this."

Michelle:
Then it updates the query with a bunch of other samples of pizza, which is a really powerful computer vision model, which is awesome, but it's built into a really nice user experience. Other really great examples of this too can be like Google's type-ahead or how it finds gifs on the Google keyboard or the Google Creative Labs, QuickDraw and AutoDraw is also a total favorite of mine. If you've never played around with it, totally can share that link too.

Michelle:
If you're interested in topics like this, good news: we have a Meetup on it. Like I mentioned, about two years ago I founded the Machine Learning and User Experience Meetup out here in San Francisco, and we've grown a ton. We're at a little over 1,800, approaching 2,000 members. We hold monthly events to try to share out these best practices and case studies, both from the design side, like how do you [inaudible] a product designer on an AI system, as well as from the machine learning side, like how do you use unsupervised learning methodology to make data-driven personas? That's an example talk.

Michelle:
Yeah, so we have roughly monthly events and this is what they typically look like. We've had the pleasure of partnering with some really, really great companies like Salesforce, IDEO, Google, and Amazon, as well as smaller startup incubator hubs, to find out what are some other novel approaches that folks are working on. Our next one is actually at Capgemini. We've partnered with [inaudible], so yeah, we've just had the pleasure of partnering with a lot of really great folks and learning about, "Hey, in this new and emerging field, how do you approach designing for AI systems? How do you approach using machine learning for UX, or UX to inform machine learning systems?" All that type of stuff.

Michelle:
Like I mentioned, if you're interested in this, totally come and check us out; I'll share some links afterwards too. But when I started this Meetup, there really wasn't a lot yet in the field. It's only recently that we started seeing things like IDEO, which is a design group, now having a data science practice, which is so exciting to me. It's a lot of machine learning prototyping, which is really cool, as well as actual job descriptions, like this one from Apple, which says we're looking for a UX/UI designer to bridge the gap between design and machine learning. That is so cool. That was not a thing when I started the Meetup two years ago. I'm super excited about that, and we're also seeing roles around UX research for AI and machine learning systems. While this is super exciting, I think it's also important to share best practices and just any guidelines or help along the way too, because it is a new and emerging field.

Michelle:
I also want to say that after doing the Meetup, I had the pleasure of teaching a course at the Stanford d.school on designing machine learning. This is something that is publicly available; all the materials are up online, so I'm just going to give this a shout out in case you're interested. We've created a ton of artifacts, like one on AI ethics and this one around what parts of machine learning can be designed. I am writing up a Medium article on this, so check out the d.school Medium account in the next couple of days, because that should be up soon.

Michelle:
But these are tools that are out there. If you want to, you can run a workshop to help align your team around, "Hey, when are we actually designing all this stuff?" Those are publicly available tools. Just wanted to give them a shout out as well.

Michelle:
But my day-to-day actually involves a lot of UX research for machine learning and AI systems. Like I mentioned, I just recently got a new job, yay. I am a senior UX researcher on Google AI's AIUX team, which is the larger team that has the People + AI Research team on it. You might be familiar with this: it's the People + AI Guidebook. This is something that I am a huge fan of, a huge reason why I joined this team. I just think it's fantastic. It's a compilation of over a hundred folks here at Google, designers, researchers, machine learning scientists, and more, coming together and doing an audit on, "Hey, what are examples of AI and machine learning systems that we've made in the past? What are some major categories?" And these six chapters bubbled up. They're on user needs and defining success, data collection and evaluation, mental models, explainability and trust, feedback and control, and errors and failing gracefully.

Michelle:
There's a lot in there. Like I mentioned, this is a compilation of dozens if not hundreds of folks coming together to share their knowledge and get something out there that can help others. Like I mentioned, this is an emerging field, so this is the current state, but it could also really change in the future. I'm just going to highlight three of my favorite chapters. I mean, they're all very important chapters, especially for UX research, but three that I use all the time when I do user research are around mental models, feedback and control, and errors and failing gracefully.

Michelle:
Going into that, the mental models chapter really discusses understanding your user's point of view. This is just a screenshot from the website. Key questions you should be asking are, "What aspects of the AI should we explain to our users? How should we introduce the AI to users initially and thereafter? How does it change over time? What are some of the pros and cons of introducing our AI as human-like?" Because you can imagine, if there's a mental model of, "Oh, this can behave like a human," and then it can't do human-like functions, it might be good to let the user know about some limitations of these types of AI systems. That chapter is fantastic on that.

Michelle:
The other one that I love is on feedback and control, which is really about learning from your users. This is super important. I know there's another one on data collection and evaluation too, but if you are not already aware, and I'm sure all of us are, machine learning cannot happen without data. When you release a new feature, already having feedback from previous users about how they might want to use it, or what they're trying to do, can help build out future models that aren't in the product yet.

Michelle:
In the feedback and control section there are really great questions too, about how the AI should request and respond to user feedback, and about the AI interpreting both implicit and explicit user feedback, which I think is fantastic, because sometimes there might be more implicit types of behaviors that we're seeing. Then when the user actually goes out of their way to give explicit feedback, that's also extremely valuable, and we should be able to take that and use it to improve the system as well.
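
To make the implicit versus explicit distinction concrete, here is a minimal sketch of how a product team might log both kinds of feedback for later model improvement. This is not from the guidebook; the event names, fields, and weights are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Literal

# Hypothetical feedback-event schema: "implicit" covers behaviors the user
# performs anyway (skips, dwell time), "explicit" covers deliberate signals
# (thumbs down, "not interested"). Both can feed future model iterations.
@dataclass
class FeedbackEvent:
    user_id: str
    item_id: str
    kind: Literal["implicit", "explicit"]
    signal: str      # e.g. "skipped", "dwell_30s", "thumbs_down"
    weight: float    # explicit feedback is usually weighted higher
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

log: list[FeedbackEvent] = []

def record_implicit(user_id: str, item_id: str, signal: str) -> None:
    log.append(FeedbackEvent(user_id, item_id, "implicit", signal, weight=0.2))

def record_explicit(user_id: str, item_id: str, signal: str) -> None:
    log.append(FeedbackEvent(user_id, item_id, "explicit", signal, weight=1.0))

# Example: a listener skips a song (implicit) and later marks it "not interested" (explicit).
record_implicit("u42", "track_jimi_01", "skipped")
record_explicit("u42", "track_jimi_01", "not_interested")
print(len(log), "feedback events captured")
```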

Michelle:
Then there's also this notion of control and customization. A lot of folks feel like AI is just something that happens to them. How do we make them feel like they're more in control, that they are working with the AI, that they can control it? Our end users understand, and I'm sure all of us understand too, that machine learning is something that can allow for immense customization and personalization and a ton of benefits. But it really is a bummer when it feels like, "Oh, this is just my algorithm changing. I have no control over this. I have no idea what happened to my favorite playlist. I listened to one song by Jimi Hendrix and now everything's skewed towards Jimi Hendrix," stuff like that. How do you allow for that type of control? There are a bunch of great things in that chapter.

Michelle:
The last one that I want to mention, similar to feedback, is on errors and failing gracefully, which is really about improving your AI system. It's really important to note that machine learning is probabilistic. There will be times when it gets it wrong, and that's fine; use that as an opportunity instead of thinking, "Oh no, this is a failure." How do you use that to get more feedback on that model, on that system, on how people are using it? Designing for those types of cases is also really, really important for improving the AI system. In this chapter they discuss things like, when do users consider low-confidence predictions to be an error? What does that look like to them? How will we reliably identify sources of error in a complex AI? Does the AI allow users to move forward after an AI failure?

Michelle:
All super, super important questions, especially when you're doing research and stuff too. Just wanting to call these out as some of my favorites from even before I joined this awesome team here at Google. I think that it's awesome. There are a ton of other great, great resources. I am just pulling out the very high level from those chapters. Definitely dig in, look through all this. It's fantastic. Huge fan, like I mentioned. But it's one thing to talk about like these chapters that are high level and it's another thing to actually then go through and try to practice this. I thought that we could try a case study where we actually try to internalize, "Hey, what does it mean to think about mental models, think about feedback and control, think about failing gracefully and everything too?"

Michelle:
For this I have extracted, at a very high level, some best practices that I have learned over my time doing UX research on machine learning models before they're built. I've made up a sample case study on personalized travel. We're going to pretend that we are working on a brand new AI/ML-powered system that is going to revolutionize personalized travel. It's all about recommending the perfect place for you, for your family, whatever. But we might not know exactly how to get started. Is this a brand new system? Is it going to live in a smart assistant? What does that actually look like? I'm going to share a little bit about my approach to this and how I would typically think about testing a system like this, especially if it's in a brand new domain. I know this is a silly example because there are a lot of other competitive offerings out there, but bear with me while you see the high-level things that I might try. Cool.

Michelle:
I tend to use a three-pronged approach, which starts with some generative research. Even beforehand we might have an idea of what we want the ML or AI system to do, but it's really about diverging before we converge: "Hey, what are all the possible worlds? What are things and product offerings that our customers might want that we aren't currently offering?" All that type of stuff. Then once we have some ideas, narrowing that down, creating a concept, and then maybe doing some moderated testing around that and really getting feedback on, does this make sense?

Michelle:
Then once we have the model actually in a good place, that's when I move forward with evaluating a prototype. That's when I might use unmoderated testing, like dscout. Like I mentioned, this is a new, emerging field. There might be some points in this where you're like, "Yes, I totally need the generative research stuff," or, "I totally need the evaluating-your-prototype part." That's great. If the rest of it doesn't resonate with you, that's also totally fine. This is just something that I found works really well when I'm working on new products and wanted to share with all of you too. Yeah.

Michelle:
Let's get into the generative research. In generative research, like I had mentioned, it's really important to think broadly: "Hey, what are some ways that people are typically solving this?" But also, if we're thinking about recommendations, how would people recommend these to each other? Is there anything in how human-to-human recommendations work, or anything like that too? It's super important, like I mentioned, for new domains as well, like smart home devices and voice assistants. One thing that I have tried is observing recommendations between people: bringing folks into the same room, getting them together, and actually having them recommend to each other. How would you recommend travel? Maybe giving them a list of their top three places or something like that beforehand. They could do their own research on the side and then they come in and ask each other questions.

Michelle:
If we're in the personalized travel domain, I've seen some really, really great stuff around folks talking about, "Who are you traveling with? When do you want to travel? Why do you want to travel? Is the modality of your travel really important?" That's fantastic, because if I was just doing a competitive audit of the current state of personalized travel, I might just converge on the same solution. But by watching and understanding how folks are prioritizing things and asking questions of each other, that helps me understand what different mental models they might have about recommendations on travel, rather than just a search-and-command-control system of, "Oh yeah, I want to go to Roswell, New Mexico, and this is exactly how you do it." It might be more like, "Oh, what are some types of experiences you want to have?" That kind of stuff. Fantastic. Can't recommend it enough.

Michelle:
Another method that I would super recommend you look into, which could be a whole talk in and of itself, is actually TripTech. It's a tool for generative research, really, really great on prioritization. It was pioneered by some folks here at Google, also on the broader AIUX team. It's really about early-stage machine learning concepts and prototypes: how can we understand users' mental models and how they would prioritize different things over each other? There's a fantastic CHI paper from this year on it too. Some of you might've seen it, but the huge win is that they actually did a lot of concepting to draft the UX requirements for Now Playing on Pixel devices.

Michelle:
If you leave your phone out, it can then identify like, "Hey, what music is in the background and stuff too." They came up with this through this TripTech experience. They've used it for a couple of other projects too, and so they've documented that process and made it accessible to anyone. I think there's also a video on it too, so I can't recommend it enough. That's a really exciting way to start with that generative research.

Michelle:
Then moving on from there, how I might analyze this is thinking, okay, we saw eight different conversations about how people recommend things to each other. I'm now going to affinity diagram: these are the types of words and modifiers that people are expecting to use, these are the types of actions or ways that they might want to filter it, and pull out major findings that way. Then you can use that to inform a concept that you build. A concept here could be an early-stage prototype. You want to use that feedback from the generative research to really inform this. If you're hearing from all these different participants, "Oh, it's super important to account for multiple travelers," or, "I would rather prioritize the date on which I'm going rather than the place," or maybe vice versa, I'm allowing for that flexibility. Use that to inform the concepts. I know travel is a space that's already very well thought out; it's not like this is some far-off future ML or AI system.

Michelle:
This I might do through a moderated in-person usability study, or a Wizard of Oz study if I'm doing something on voice. If you're not familiar with that type of methodology, no worries. It's a type of methodology where you might have a couple of key voice lines or interactions; it can even be done visually as well, but I've done it a lot on voice. I would probably have them pulled up as audio files on my computer and Bluetoothed into a smart home device or another smart speaker, have a moderator in the room with the participant, and have the moderator explain to the participant, "Hey, this is a brand new prototype. It might not work as you'd expect, but go ahead and give it a try."

Michelle:
It's actually me on the other end, sending forward the designs that we have for the voice user interface and testing out what works and what doesn't. Similar to the Wizard of Oz: don't pay attention to the person behind the screen. It is actually me, Bluetoothed in from the other room. It's a fantastic method, as is paper prototyping. Instead of full screens mocked up, in paper prototyping I might use more of a modular design: one base piece of paper, but then a couple of different things that I might sub in as the participant quote-unquote interacts with the concepts, if that makes sense. But having it moderated is really important, and it really is about testing out the concepts. You do not need your machine learning model yet. This is a really important thing.
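
As a rough illustration of that Wizard of Oz setup, here is a minimal sketch of the "wizard side": a console that maps shortcut keys to pre-recorded voice clips the researcher plays over the connected speaker. The clip names and the macOS playback command are assumptions, not part of any real tool described above.

```python
import subprocess

# Hypothetical mapping from wizard shortcut keys to pre-recorded voice clips.
# During the session, the wizard listens to the participant and fires the
# closest canned response; the participant only hears the smart speaker.
RESPONSES = {
    "1": ("greeting", "clips/greeting.mp3"),
    "2": ("recommend_paris", "clips/recommend_paris.mp3"),
    "3": ("ask_travel_dates", "clips/ask_travel_dates.mp3"),
    "4": ("fallback_sorry", "clips/fallback_sorry.mp3"),
}

def play(path: str) -> None:
    # Playback command is platform-specific; 'afplay' is the macOS example here.
    subprocess.run(["afplay", path], check=False)

if __name__ == "__main__":
    print("Wizard console: type a key and press Enter to send a response (q to quit).")
    for key, (name, _) in RESPONSES.items():
        print(f"  {key}: {name}")
    while True:
        choice = input("> ").strip()
        if choice == "q":
            break
        if choice in RESPONSES:
            name, path = RESPONSES[choice]
            print(f"playing {name}")
            play(path)
```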

Michelle:
Machine learning models take time to build, as I'm sure some of us who work on these systems know. When the machine learning is initially started it's actually not very good, and it takes time to train the models, to tweak them, to optimize hyperparameters, all that stuff. For evaluating the concepts, you don't need any of the machine learning yet. You can actually just test out the North Star of the experience. You as a UX researcher or UX designer are really designing and testing: what will the machine learning experience be? Remember that you are an expert in testing out these things, and machine learning is just a little bit different in the sense that it's not as easy to prototype. It might be a bit more probabilistic, but it still has inputs and outputs.

Michelle:
Inputs could be things that you know are going into the model, or maybe they go through an out-of-box experience or something like that, and the outputs are actually going to be what's in the interface, what the designer is actually selecting. It's about making sure, "Hey, are we designing the right experience for this ML system?" I'm also thinking through: we've done the generative research around what types of mental models people have when they are doing personalized travel (personalization has a lot of machine learning in it) and what types of things they think would inform that, but does this actually line up with the concepts?

Michelle:
When I'm talking about the concept, I think it's important to note the tangible artifact. One thing that I do is create a quote-unquote custom prototype for every participant. This might sound scary, because a custom prototype for every participant, yikes, that is a lot of custom prototypes. But it actually doesn't need to be super detailed. It could just be something where there are a couple of custom elements. Where do you get the custom information? I actually send a survey to all the potential participants, whoever I'm screening to come in. Within that survey I ask questions that can inform the prototype.

Michelle:
If for instance, Ben, if you were going to come in and I'm screening you, I might have a survey that says, "Great, tell me about the last time you traveled. What were some of the places you've traveled to? Okay. How often do you travel? Where are your top three places that you want to travel? Where's one or two places where you're really not interested in traveling?" Things like that too.

Michelle:
You might forget that you've answered with your top three favorite places and your bottom three places. That might be something I then use to inform what feels like a personalized system within the prototype. This can be mocked up using tools like Sketch and then InVision. It doesn't need to be the entire prototype that's updated; it could just be a couple of elements, like I mentioned.

Michelle:
I also want to stress that it's okay that the prototype isn't exactly perfect, just like the machine learning system. Machine learning is sometimes wrong; that's why there's a whole chapter on failing gracefully in the guidebook, right? Because machine learning is probabilistic. This is another reason why I ask for least favorite places, or places that would not be a good fit. That way we do get some sample answers of machine learning gone wrong, even very wrong, and how people would want to fix that. The dscout team has mocked up a wonderful sample travel app for me, Travel Time. Here we're highlighting it; you can imagine all of the screens have color or something too, but maybe just the first landing screen has those custom parts that have been informed by the survey.
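
A minimal sketch of how those screener answers could be merged into a per-participant prototype spec, including the deliberately "wrong" recommendation. The field names and the travel app itself are invented for this example.

```python
# Hypothetical screener responses collected before the session.
screener = {
    "participant": "Ben",
    "top_places": ["Los Angeles", "Paris", "New York City"],
    "avoid_places": ["Roswell"],
    "travels_with": "family",
}

def build_prototype_spec(answers: dict) -> dict:
    """Turn screener answers into the handful of custom elements on the landing screen."""
    # Only a few elements are personalized; the rest of the mock stays static.
    recommendations = list(answers["top_places"])
    # Deliberately slip in a least-favorite place so we can observe how the
    # participant reacts to, and tries to correct, a "wrong" recommendation.
    recommendations.append(answers["avoid_places"][0])
    return {
        "greeting": f"Welcome back, {answers['participant']}!",
        "recommended_destinations": recommendations,
        "traveler_context": answers["travels_with"],
    }

print(build_prototype_spec(screener))
```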

Michelle:
If Ben was coming in, maybe we would have, like Ben mentioned, L.A., Paris, and New York City as his top places to travel, but Roswell's not on that list. Have that immediately as you get to the landing page, telling your end user, "Oh, this is a prototype. It might not work as you'd expect. It's from a server on Mars. What do you think about this?" There are some key questions too that I try to use to guide my practice. The other thing that I want to mention before I move into those key questions is around the interactions. You also don't need to feel like you need to create a fully functioning prototype for all of them. You might just do one or two screens and have them reflect: "Okay, yeah, Ben, you wanted to go to Paris; let's pretend you clicked on Paris, or click on L.A. instead. What would you be thinking about?" That kind of stuff too. Asking questions about the types of outputs that the model might give, how you want them visualized, all that stuff too.

Michelle:
Yeah, moving into those key questions that I would ask about a concept: I would definitely ask, if this was a personalized travel AI system, is this what you expected? Why or why not? Are there other things that you thought the system was going to recommend that it didn't? Are you surprised that it's places rather than modalities of travel? That's another thing too, lining up with those mental models. Also asking questions around where this data comes from, though of course "Where does this data come from?" is a very loaded question.

Michelle:
Asking maybe more pseudo-anthropomorphic questions: how does the model know this? Where does the model find out that you want to go to L.A.? That kind of stuff too. Equally important is asking what happens when the model is wrong. We have Roswell here, and maybe the reaction is, "No, I'm really good. I'm not super into aliens. I'm fine." "Okay, great. Then what would you do to change that? How would you give the system feedback that you're really not interested in Roswell? It's like the fourth thing on the page. How would you want to change that? Why do you think the model is recommending that to you?" That's another really important one too.

Michelle:
Then another way you might approach this as well is, having them speculate about how this would change over time. Possibly doing two different prototypes and being like, "Okay, pretend that this is a week later and you come back to this or pretend that you traveled to LA and now you've come back. What would you expect to see?" Having them speculate about the concept, the interaction. Like I mentioned, this is not machine learning powered at the moment. This is very much like, "Okay, we know that this is going to be a machine learning personalized experience. But we were really testing, are we building the right type of machine learning experience?" Cool.

Michelle:
From there you've now done the generative research, you've evaluated the concepts, now your machine learning model is built, and you actually want to evaluate a semi-working prototype. This is when you've given the feedback back to the engineers and the machine learning scientists, like, "Okay, we really need to be thinking about giving feedback when the system is wrong," and maybe that's not even machine learning powered. Maybe that's just logic, like you hide things once people say they don't want them shown. Then you can work with them to actually get an MVP of the model working in a prototype that you can give end users access to.

Michelle:
This I would actually recommend doing unmoderated, and the reason is you can get a bunch of folks at scale. I like coupling a contextual inquiry with usability on the model prototype in the same test, because it's really, really rich data: "Hey, how are people currently solving this problem? Now if we give them access to the system, how does this help? Does this hurt? Okay, we already saw that the participant solves this problem this way, and this model doesn't account for that." That's valuable, and they can answer it on their own time.

Michelle:
Again, unmoderated is great because you can get feedback at scale, and you will get edge cases that you would not have gotten with just moderated in-person testing. People are going to be doing it maybe on the fly, like they're really booking a travel experience and then they try your app, and they really give you some great feedback around that.

Michelle:
When I do this type of testing, like I mentioned, I start with a control portion where we really ask, "Hey, show us maybe three entries around how you currently plan trips. Are you planning a trip right now? What have you done in the past? Do a screen capture of that too: what are you looking for when you plan a trip?" Then moving on, I give them access to the prototypes. That's my experimental condition: now try using this prototype to plan a trip. What do you notice? Is there anything that works really well? Maybe this could be a selfie recording, this could be a screen recording. Really understanding how they're trying to use this prototype.

Michelle:
Then I end with a reflection. "Hey, you've now seen this ML AI system try to help you do personalized travel. Do you have any feedback? Do you have any thoughts? Do you have any ideas? Anything else come to mind too that we might be missing that we didn't get from the generative or the evaluating the concept part?"
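
One way to picture the structure of that unmoderated study is as a simple plan: control entries first, then the experimental condition with the prototype, then a reflection. The prompts below are paraphrased examples, not the actual study questions.

```python
# Hypothetical diary-style study plan: control -> experimental -> reflection.
study_plan = [
    {
        "part": "control",
        "entries_requested": 3,
        "prompt": ("In a 60-second screen recording, show us how you currently "
                   "plan a trip and which services you rely on."),
    },
    {
        "part": "experimental",
        "entries_requested": 3,
        "prompt": ("Now try planning the same kind of trip with the prototype. "
                   "Record your screen or a selfie video and narrate what works "
                   "and what surprises you."),
    },
    {
        "part": "reflection",
        "entries_requested": 1,
        "prompt": ("Having used the prototype, what would you change? What should "
                   "it understand about you that it doesn't yet?"),
    },
]

for part in study_plan:
    print(f"{part['part']}: {part['entries_requested']} entries - {part['prompt'][:60]}...")
```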

Michelle:
The kind folks at dscout have mocked up how I would typically do this in dscout, which I think is awesome. You don't need to do it in dscout, but I just think that dscout is fantastic for doing this type of work. If you aren't familiar, if you're new to dscout, how I describe it is: it's like Snapchat meets Qualtrics. You're able to do surveys coupled with short video captures, and annotate, log, bookmark, and tag them, and collaborate with your other researchers or designers. It's honestly fantastic. I'm a huge fan. This would be a sample screen that the participant might see: "Hey, in a 60-second video, show us what resources you typically use to research travel destinations." You could also ask a couple of survey questions too, if you know the competitive field: "Oh, okay, I know that folks are using this service, this service, or this service. Let's have them tag that for us so we can see at scale what's happening."

Michelle:
There might be interesting interactions that you weren't expecting as well, things like, "Oh, I use this service and then I transition into this service." If you were just asking people, "How do you book travel?" they might only show you one, but it's really important to actually get both. That way, when you introduce your personalized ML travel system into their ecosystem, you understand how they might use it in practice. This would be a type of video. Let's see if it plays.

Video:
All right. I'm here on the KAYAK app and I am going to see about flying to Chicago around Halloween. Let's see if we get any deals. I typically use KAYAK because it really has a clean UI. It's very easy for me to see the flights, the costs and in particular the time. I can go through and I can, this looks like a pretty good flight. 300 bucks round trip is pretty cheap too. I will want to look at that a little more. I typically just go on time in the air and price. Those are the two things that I really look for.

Michelle:
Yeah, so that's fantastic. There's a ton of really rich information there. The participant said sometime around Halloween, so, "Oh, maybe holiday travel is something really important." dscout also lets you tag those types of things, which I think is awesome. You could mark, "Oh, this is possible holiday travel," or something like that. The participant also very clearly states, "Oh, time in the air and price are key things for me." Really pulling that out and logging it somewhere so you can double-check it later; you can get the numbers too, like, oh, 30 out of our 90 entries around this actually said price is the most important. Maybe on our UI we actually need to include the price, or an average price, or something like that.
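
Turning those tags into the "30 out of 90 entries mention price" style numbers is straightforward; here is a small sketch with invented tags and entries.

```python
from collections import Counter

# Hypothetical tagged entries exported from the diary study.
entries = [
    {"participant": "p01", "tags": ["price", "flight_time"]},
    {"participant": "p02", "tags": ["price", "holiday_travel"]},
    {"participant": "p03", "tags": ["loyalty_points"]},
    {"participant": "p04", "tags": ["price"]},
]

tag_counts = Counter(tag for entry in entries for tag in entry["tags"])
total = len(entries)

# Report how many entries mention each tag, as a share of all entries.
for tag, count in tag_counts.most_common():
    print(f"{tag}: {count}/{total} entries ({count / total:.0%})")
```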

Michelle:
I also would ask for probably at least three entries around this in the control condition, because I think it's really important to get a couple, not just a single typical case. Sometimes I even ask questions like, "Hey, how typical is this for you? Is this a normal thing that you would do? Are you going out of your way to show me this, or is this something that you always do?" I do see a bunch of chat things coming in; just as a meta note, if you could use the Q&A portion, that would be awesome, because Ben is actually monitoring that and using the upvote system to figure out which questions we use. Okay, so cool. We did the control condition, we got a bunch of entries around that. Fantastic. We logged it, we had a great time with our other collaborators going through and analyzing this.

Michelle:
Also, dscout allows you to highlight stuff, which is super useful. Now we could do the same thing and give them access to the actual prototypes. We can ask them, "Great, thank you so much for showing us how you would typically plan a trip. Now we're going to give you access to this prototype. Go try and go through that as well. What are some things that did or didn't work as you'd expect?" Similar to what you would ask in the evaluating-the-concepts stage. For this one, our participant did a selfie, which is awesome, and walked us through it. You can actually see the transcript down below: "Oh, it was very visual. Awesome. I kept scrolling, I couldn't find the search." That's great feedback.

Michelle:
You're getting both usability feedback and model quality feedback. Asking maybe some questions too, like, "What did you think about the recommendations? Were there any recommendations, any places, that surprised you?" And being clear to say "recommendations" but not, "Oh, tell me about what you think the model is doing." Maybe more things like, "Hey, how does Travel Time understand where you want to travel?" That stuff too. Asking a bunch of questions around that, coupled with maybe some survey questions, will take your qualitative data more into the quantitative realm, and you can then do analysis on both.

Michelle:
Then after that I might run them through, again, two to three entries around the experimental condition. It's also great because you get the same participants who just did the control, so they're in the entire process for this study. You really understand, "Hey, Ben always uses KAYAK. He's expecting that our Travel Time should behave in similar ways to KAYAK, or not, or offer this type of improvement." Then they've gone through and experimented with the ML system, so they can understand what's going on. Then have them actually reflect: "Hey, what were some moments that surprised you? What are some moments that you really enjoyed? What are some ways that this is different than previous personalized travel? What are some ways that you would like this to be more personalized?" So dscout is awesome, like I mentioned, because it allows for those open text fields as well as quantitative survey things.

Michelle:
I think that these word clouds are great for understanding, hey, what are some adjectives that you would use around this? I'm seeing something bubble up, and you can then do the counts by tagging them. Also, my other favorite thing, sorry, this turned into a huge promo for dscout, is that dscout allows for demographic information, so when you are screening participants you can make sure that you have folks from all different education backgrounds and locations represented in your participant pool, which might be different than if you were doing your moderated concept testing. Which, I know for me, I'm in the Bay Area, so that might be limited to only people in the Bay Area, right? Unless I'm doing it remotely, but that might be tricky if I'm doing a Wizard of Oz study, for instance.

Michelle:
Having them reflect on this is really important, because then you can understand, hey, what are other opportunities for the ML system that we haven't built out yet, all that type of stuff, bringing together the generative research with the concept evaluation that we previously did. Just to recap: if I were to make a dscout study to really evaluate the prototype of an ML model or AI system, I first do the control, where the participants actually show me how they might use competitive offerings, how they currently solve this type of problem. Then I give them access to the prototype, the experimental condition: show me how this prototype is performing. Are there any weird answers the model is giving that we can then fix before we launch it? Then reflect: are there opportunities that we are not tackling that you wished we would do, or maybe do better, that kind of thing too.

Michelle:
This is all part of the larger system of how I design UX research approaches for ML models and AI systems. Starting first with that generative research: understanding those mental models, how people might recommend things to each other, how they might approach this. Then evaluating the concepts, learning from the generative research what the key things are to have in a prototype or concept. Getting that feedback early on, so that when the machine learning scientists are building out that model we can make sure we are building the right type of experience. That way, when we go to evaluate the prototype, it really is more fine tweaks and fixes to those edge cases. Like I mentioned, because machine learning is probabilistic, there will be edge cases that you were not expecting. Yeah. Oh yeah, a big plug: dscout is great for the evaluating-your-prototype phase.

Michelle:
I know that was a lot of information, but if you're interested in learning more, I just want to give a huge shout out for the People + AI Guidebook again. It's something that has definitely guided my practice and, like I mentioned, a giant reason why I joined this team. It's fantastic. Totally look it up. It goes into detail about all of these different chapters: user needs and defining success, mental models, feedback and control, data collection and evaluation, explainability and trust, and errors and graceful failure.

Michelle:
Not only do they go into chapters where over a hundred Googlers have aggregated their knowledge into these best practices, case studies, all of this stuff, there are also worksheets. There are chapter worksheets linked at the top as well as at the bottom of every chapter, which you can actually do with your team. I just think that they are fantastic and very, very well thought out. But like I mentioned, this is a brand new and emerging field, and it is something we are all still learning, and feedback is very important. This is funny because there's a feedback and control chapter, but we actually have a feedback mechanism on the entire guidebook. If you have thoughts or ideas, or would like to see something else...

Michelle:
Or you're using the guidebook in really interesting ways: definitely leave us some feedback. The team would love to see that as well. It's also something, like I mentioned, that we're all learning through together. This might be my approach and best practice, but there might be things that really resonate with you and there might be things where you're like, "I don't feel like generative research is what we need. I really need to do that more dscout-style, evaluate-the-prototype approach," which is totally fine. There are also a ton of really great things in the guidebook that I'm just barely skimming the surface of.

Michelle:
Lastly, I know I want to leave some time for questions. If you are at all interested in meeting other cool folks who are doing this, we do have a Meetup where we meet in person. You can check it out at meetup.com/mluxsf. Right now the majority of our Meetups are in San Francisco, but we are low-key doing some stuff in New York next month, so keep an eye out for that. That page also has all of our past events. If you see a cool Meetup and you're like, "Oh, I wish that I was able to attend that" (our last one was actually with Feminist AI at Reddit, and our upcoming one is at Capgemini with Clink on chatbots and multimodal design), and you're like, "Oh, I really wish I could attend but I can't go in person," good news: we have a YouTube channel, so you can actually go to our YouTube channel and watch our past talks.

Michelle:
We record, and most of them are up there. Really great talks, like from Salesforce: we did one earlier this year on AI for [inaudible] good with salesforce.org and how they approach using AI for people problems, as well as one, also with Salesforce, on data-driven personas, using unsupervised learning techniques to inform UX research approaches around personas. Really, really great stuff from Tony from R2D3, A Visual Introduction to Machine Learning, too. He actually gave a talk for us on how to be a product designer for AI. A ton of really great resources there. Totally feel free to peruse. Lastly, we do have a Twitter handle, @mluxsf, and that's where we tweet out our upcoming events, any cool articles we're seeing, all that stuff too.

Michelle:
If you're interested in joining our LinkedIn group, I'll send that out to Ben and the team too; we do have a presence on LinkedIn, so totally check us out there. But these are probably some of the best ways to find us. Like I mentioned, we have two awesome events coming up just in the next month. Oh my gosh, I'm so excited. We have one next week in San Francisco on chatbots, conversation design, voice user interface design, and multimodal design with Clink, which is a startup from the incubator. That one is going to be in the Capgemini incubator space here in San Francisco. We're really excited because Clink's approach is all about how you do multimodal from the beginning, which is fantastic. They're going to share how they approach enterprise-level chatbots and multimodal design, what that looks like, some of their tips and tricks, and some best practices that we can all use, because I'm sure there are quite a few of us working in that domain.

Michelle:
The other one that I want to plug: we are doing our first-ever non-Bay Area MLUX. We're partnering with Spotify Design to do an event on design for ML with Di Dang and Christiana Che out in New York. That's going to be October 17th. I've sent out the link too; there is an apply link within the Spotify Design website, so be sure to check that out. If you're in New York and you want to say hi, we would love to see you, especially if you work in this space too. Yeah, I'm trying to think about what else. I think that's it. I just wanted to say a big thank you to the dscout team, especially Ben, Emmy, and Zoey. Really, thank you all so much for making those mocks and pulling those together, especially the dscout screens.

Michelle:
I wanted to thank my team, the Google AIUX team, which is also the umbrella team for the People + AI Research team. They are fantastic. They're working on a ton of cool stuff, and we'll be posting more about the cool stuff that I'm now up to on this team. Also, I may be the only one speaking right now from the MLUX community, but we actually have a ton of people on our steering committee who help lead these events and find venues, sponsors, and speakers in this domain. Really, thank you for attending. I'm really excited to have shared these best practices and how I approach this topic. If you're interested in saying hi, this is my email, or if you want to speak at a future MLUX event, totally email me. I would love to chat with you. Yeah, thanks so much.

Ben:
Michelle, thank you again. This was really, really cool. We have so many questions to get to. A few of them were about the MLUX Meetup and other locations; I think you did a good job answering those. I want to dive now into some of the questions about research. The top-rated question right now is from Jennifer, who asks, "My question would be how you come up with quick and dirty ML prototypes, thinking less about training it or testing an algorithm, really faking the functionality. Do you have any examples of quick and dirty ML prototypes that you get in the hands of users?"

Michelle:
Oh my gosh, I love that question. I think it's really important to do basic-level ML prototyping. Like I mentioned, my background is also in UI engineering and I come from the ML and data science space. A lot of the machine learning that we end up building is custom, based off of the already prebuilt models that are out there, for instance Word2vec, PoseNet, a bunch of these already existing models. I think about using tools like ml5.js, which is a JavaScript framework that has a bunch of pre-trained ML models; they're trained on giant corpora as well. You can totally check them out, and I can send you all a link too. I take that and put it into a Bootstrap template website to just get the concept of what the machine learning is doing in front of users, and I might pre-skin it to look like whatever the product looks like and say, "This model is not very good.

Michelle:
The server is all the way in Arizona; that's far away from California." What do you think it should be doing? Here, try and play around with it. You can see it does maybe 60% of what you'd expect, but how would you want to improve that system? Talking about UX research for machine learning prototyping too, I think Wizard of Oz testing is a great approach. Like I had mentioned, if you are working with voice user interfaces, that might mean Bluetoothing into a smart home speaker in the other room and sending over MP3s so it sounds like the Assistant, or whatever voice assistant you're using, is speaking back the same script you would expect from the ML system. Build it out like the ML prototype that will eventually be built and that you would later test.

Michelle:
Or paper prototyping, I think, is a great methodology. Instead of using the classic one piece of paper per screen, I actually just do one piece of paper for the entire application and then maybe change out elements, cut out modules, and say, "Okay, you just gave feedback that you don't like L.A.," and I might flip the card over and it's Roswell, New Mexico, and say, "This is what came up instead. How would you feel about that?" Yeah, paper prototyping, but keeping it modular, and even having the participant maybe co-create and design along with that, I think is a fun way to do that.
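
For the quick-and-dirty pre-trained-model route described above, here is a rough Python equivalent of the ml5.js idea: wrap an off-the-shelf model in a throwaway script so users can react to real, imperfect predictions. The model choice and label handling are assumptions, not what Michelle's team uses.

```python
# Throwaway prototype: classify a user-supplied photo with an off-the-shelf
# pre-trained model, no custom training required.
import sys

import torch
from PIL import Image
from torchvision import models, transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Recent torchvision versions accept weights="DEFAULT"; older ones use pretrained=True.
model = models.mobilenet_v2(weights="DEFAULT")
model.eval()

image = Image.open(sys.argv[1]).convert("RGB")
batch = preprocess(image).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    probabilities = torch.nn.functional.softmax(model(batch)[0], dim=0)

top_prob, top_class = probabilities.max(dim=0)
# Mapping the class index to a readable ImageNet label needs a label file,
# which is left out of this sketch.
print(f"predicted class index {top_class.item()} with confidence {top_prob.item():.0%}")
```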

Ben:
That's fantastic. Again, we'll be sure to get some of those resources out that Michelle mentioned. That's great, Michelle. It's great to hear how you are still very much piecing together the practice because the models are quite literally learning as you're working or you're sometimes playing catch up, sometimes innovating the actual model, but then once you press play it's... the basis of it, it's to learn. To that point we have a lot of questions that revolve around how you as a researcher focused on and trained in user research methods broadly bring stakeholders who are more quantitatively minded to understand and see the value of more qualitative methods.

Ben:
Along the way I'm asking, how do you build bridges between teams that might not be integrated like they are at Google, or other places where the data science or machine learning teams are very quantitatively savvy? Some of our attendees might be on those more qualitatively minded teams for whom advocating for these methods is a challenge. Do you have any tips or advice on how they can get a seat at the table, or kinds of, I don't know, outputs or deliverables that really work? I know for me as a researcher, I always like showing engineers and product designers videos, like those videos you shared. I always find that a quote with a video exemplifying what's going on, and then maybe some of the light quantitative data you shared, is really useful. But what do you think?

Michelle:
Yes, that is a great question. I think what I always tell machine learning scientists, quant people, BI engineers, whatever, is that the models we are making are not made in a vacuum. You are not just making them for no one, or just for cool ML science or whatever. These are going to end up being used by people, and your UX qualitative researchers, your UX designers, they are your gateway to your end users. Bringing them along to understand, "Hey, what is this model doing? What data are the inputs? What do we expect the outputs to be? What do we expect the experience to be?" only further improves not only how people are going to use this model, but potentially helps you get data for future models that you don't even have yet, through building in feedback mechanisms.

Michelle:
That's something, too, that's a pretty important step in the machine learning, or just data science, process: getting data for future models. And that's all about UX, about building trust with your end user: "Hey, this is why we're giving you this recommendation," like [reason] codes, as well as surfacing to them, "If you give us feedback, this is how we will then use it for future models," or things like that too. Being transparent about why the model is giving this type of output just makes better machine learning experiences. Really advocating for, "Hey, this model is going to be used by 10 people. You need me in the room to not only help design the interface and how it is going to behave, but also improve it for future models that might not exist yet." So...

Ben:
That's great. Jake's question is the next top-voted one, and I think you have implicitly answered it. He asks, when and how does the research you just described influence how the machine learning model is designed? Is it merely experience design, or can it influence how the model is designed? It sounds like it very well can; you need to make the case to those who are doing the designing that these data, while they might not be the X's and O's or the ones and zeros that make the model, should inform how the model responds to data. Did you have any other thoughts on how the research that you described here can and should influence how the model is learned or designed or trained or set upon user data?

Michelle:
Yeah. A lot of times in machine learning it's not like a model is one and done. It needs to be updated over time. What we built five years ago might not work now, so what does it look like to have an ongoing system? At least getting into that process of, "Hey, as a designer or a qualitative researcher, I am your partner in creating better experiences for people. If we can get it in this round, it is so much easier for future rounds; it only improves the model for future rounds as well." And being able to extrapolate that out too, to be like, "Hey, not only did we have a win with this product where we incorporated UX into the machine learning process, but here are key takeaways about how we could do it in the future too."

Michelle:
Speaking about the machine learning process, I fully think that it can happen in parallel. UX research can be done in parallel with building out the model, as long as the machine learning scientists and data scientists understand what we are testing: are we building the right experience for the machine learning? The core of the machine learning probably won't change that much, but building the machine learning model from the start to allow for feedback is a really important step. It's really hard to retrofit a system to accept feedback when it wasn't made to accept feedback. Knowing that that is coming down the pipeline, knowing that this is the type of input that we will be giving as UX researchers, I think is all really important.

Ben:
Oh, that's great. Thank you so much, Michelle. I love thinking about how user experience researchers, designers, and human-centered thinkers broadly can influence these machine learning models in how they're designed to accept future feedback. That is so vital when, again, you're working with something that might falsely be thought to be very static: once you press play, that's it, oh gosh, we've missed our chance. A great thanks to all of you who asked questions that touched on that. We are out of time this month. Michelle, thank you so much again for your time. To the attendees, thank you for spending a little bit of yours with us. The recording will be sent out. Thanks again and we'll see you next time.
