Hello, and welcome back. Today, we'll be talking about qualitative analysis. I know I've been referring to it for quite a few videos as we discussed things like interviews and observations and contextual inquiry, and today, we'll actually talk about how to do it. The method I'm going to be discussing here is inspired by something called grounded theory method. Now, we won't go into too many details about grounded theory. It could take years to learn, and I'll refer you to a book at the end, but I'll just try to be very pragmatic and describe the method step by step. So what you need to know is that this is an inductive, data-driven approach. What this means is that you don't start with a hypothesis, and you don't start with the things you think you're going to find, but rather you sift through the data to see what insight is available in the data itself. This process also relies on something called constant comparison of open codes: the idea that you identify units of meaning within your data, and then compare them to each other to identify meaningful clusters and patterns. Now both of these, the idea of a method being inductive and data-driven, and the idea of constant comparison, are key to this grounded theory method. And so if you do end up reading the grounded theory book or finding out more about it, you will hear words like data-driven and constant comparison used there as well. So this kind of serves as a foundation for learning more about similar methods in the future. So here's an overview of the qualitative analysis process: you're going to prepare the data, do an open or initial pass of coding on it, do some thematic clustering and final coding, and finally do the writing. And yes, writing is part of the qualitative analysis process. This method is really powerful because it can actually be applied to any type of data that can be made textual. So obviously, if you have interviews that you've recorded and then transcribed, that transcript is textual data.
But perhaps less obviously, the field notes that you make in observation or after contextual inquiry are also potentially part of a dataset. You can even add word descriptions or text descriptions to something like video, audio, or images. For example, children's drawings on particular topics can become sources of analysis by converting them to text. So as long as you can take some piece of data and make it into a paragraph, you can then do qualitative analysis on it. So the first step is, in fact, this preparation process: can you take the data that you currently have available and transform it to text? If it's an interview, this just means transcribing. If it's other types of evidence, it may mean creating a text description of it. Now, if you do this project in the real world, or if you're doing it for research, chances are you want to keep your participants anonymous. You don't want their names appearing in your findings. And so I find that this is a very good stage to actually do that anonymization. Typically I just remove all names and label each participant with a number: participant one, participant two, etc. And this is also a good stage to link any associated data for specific participants using that number. So let's say you had conducted an interview, then some observations, and then also given a questionnaire to the participant. You want to make sure that all of those are linked to the same participant number. And this is great to actually do explicitly, because I have had a study where we kind of haphazardly ended up assigning numbers and accidentally assigned the same participant number to two different participants. We then ended up spending roughly a week trying to disentangle the data, figuring out, for a given quote, whether it was indeed from participant two or from the other participant that we had also labeled participant two.
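As a minimal sketch of this linking step, here is one way it could look in code. The names, filenames, and data layout are all made up for illustration; the point is simply that every piece of data carries the same stable participant ID, and that you check no ID was assigned twice.

```python
# Sketch: assigning participant numbers and linking all data sources to them.
# All names and filenames below are hypothetical examples.

def anonymize(participants):
    """Map each real name to a stable participant ID like 'P1', 'P2', ..."""
    return {name: f"P{i}" for i, name in enumerate(participants, start=1)}

id_map = anonymize(["Alice", "Bob"])

# Every piece of data gets tagged with the same ID, so an interview,
# an observation, and a questionnaire from one person stay linked.
records = [
    {"participant": id_map["Alice"], "source": "interview", "file": "alice_interview.txt"},
    {"participant": id_map["Alice"], "source": "questionnaire", "file": "alice_q.csv"},
    {"participant": id_map["Bob"], "source": "interview", "file": "bob_interview.txt"},
]

# A quick sanity check that no ID was accidentally reused for two people:
assert len(set(id_map.values())) == len(id_map)
```

Doing this once, explicitly, up front is exactly the kind of care that avoids the panicked week of disentangling described above.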
So if you're explicit about this and careful about this early in the process, you can avoid going through the same panicked week of a lot of work just to make everything make sense again. The second part is called open coding, and the goal of this stage is really to immerse yourself in your qualitative data to understand what meaning is there. The general process is pretty straightforward. You take your textual data. You read through it. You highlight anything that may be significant. And you write a short phrase for each highlight to express the meaning that's in it. This short phrase is, in fact, your open code or your initial code. Now, generally you try to write a short phrase to capture each unit of meaning. It could be that the participant has actually spent a minute talking to you and there's really only one unit of meaning there. As people, we frequently add a lot of cushion words around what we say, so the participant may have repeated your question, then said, "let me think about it," and then said, "I think the answer is..." Only that last part, where they actually tell you the answer to the question, is the unit of meaning that you have to label. Conversely, sometimes a single sentence can hold a lot of meaning. Maybe they say one thing, but it actually applies to multiple different points. So maybe I ask them, "How do you currently stay in touch with your child?" And they answer something like, "Well, I really rely on the residential parent to help me stay connected. We also use phone and video whenever possible, except the kid sometimes thinks it's boring." There are actually three, maybe four units of meaning within that one answer. The first unit is that they rely on the residential parent at home. The second one is that they use video chat. The third one is that they use phone. And the last one is that this is not engaging to the child.
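One convenient way to record open codes, sketched here as Python data just to make the idea concrete, is to keep each code together with the participant ID and the highlighted quote it came from. The structure below is not part of any method, just one possible bookkeeping scheme; the quotes and code labels come from the example answer above.

```python
# Sketch: one highlighted answer can yield several open codes.
# The quotes and code labels are from the running example; the data
# structure itself is just one convenient way to record them.

open_codes = [
    {"participant": "P3",
     "quote": "I really rely on the residential parent to help me stay connected.",
     "code": "relies on residential parent"},
    {"participant": "P3",
     "quote": "We also use phone and video whenever possible",
     "code": "uses video chat"},
    {"participant": "P3",
     "quote": "We also use phone and video whenever possible",
     "code": "uses phone"},
    {"participant": "P3",
     "quote": "except the kid sometimes thinks it's boring",
     "code": "child not engaged with video chat or phone"},
]

# Keeping the participant ID on each code makes the later clustering
# and write-up traceable back to the raw data.
codes_for_p3 = [c["code"] for c in open_codes if c["participant"] == "P3"]
```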
So I could potentially have four different open codes just for that single sentence, and each of them is called a unit of meaning. As an example, I recently did a set of interviews with cross-cultural families, that is, families where one parent is from a different culture than the other parent, to identify opportunities for technology in that space. And from the 20 interviews that we conducted, we had more than 750 open codes. That's pretty typical: you usually get hundreds of codes just from a few interviews. So the next part is actually making meaning of all these codes that you have. This is the clustering process. What you really want to do is arrive at thematic clusters: larger units of meaning that take all these open codes and combine them in ways that let you see patterns. Now, there's definitely an art to this process. It can also be guided by theory, so if there are specific things that you already know you may be interested in, you might look for those in the clusters. But most of the time it's fairly data-driven, so you start out not knowing what kind of meaning you have in the data. Now, there are tools that aim to help you with qualitative analysis. Here we'll actually mostly be focusing on paper as a tool, because that's the tool most readily available, and honestly, that's the way I still do qualitative analysis. I think it's really effective, and I think it's a great way of collaborating. But there are other tools out there. Just so you're aware, there's a free tool called QDA which helps you assign codes and then cluster them. Another tool somebody told me about, called NVivo, is also all about that kind of qualitative clustering. And then Atlas.ti, I think, is a fairly comprehensive, fairly expensive tool that also specializes in doing video labeling as part of that dataset analysis.
But let's talk about paper, because like I said, this is the tool that is actually available to most people. So here's how you would do thematic clustering using paper. You have one open code written on each note, so maybe a sticky note, a Post-it note, whatever you have available, and you also jot down the participant ID on that note. So, taking something from my previous example: "child not engaged with video chat or phone" was a unit of meaning from the sentence a parent said, so we jot that down on a note as an open code, and add something like "participant three," the one who said it, at the bottom. I may then take that note and another note that I currently have in my large pile of open codes and compare them. Maybe I have another note that says something like "my kid is bored unless he plays video games." Maybe that's a quote that some other participant had. Well, I can compare the two and think: do they actually communicate a similar sense of meaning? So maybe these two are actually pretty similar, because they're both about the child not being engaged. But they're not exactly the same. One is just talking about the fact that phone or video chat is not engaging, and the other one is really focusing on the fact that video games are engaging. So I would put them in a somewhat similar location on, say, a wall or a tabletop, but maybe not in the exact same space, so they wouldn't be overlapping. Now if I then pick up another note, and maybe it says something about video games but nothing about engagement, something like "my kid and I like to play World of Warcraft together," that might go closer to the note about video games and further away from the note about engaging over phone or video chat. And so on and so forth: each note I pick up, I compare to all the notes that are already on the board and place it in relationship to them, in terms of the geography of the board.
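If you wanted to mirror the sticky-note wall digitally, a minimal sketch might look like the following. The cluster names and notes are the hypothetical ones from the running example; on paper, physical proximity plays the role that these named lists play here.

```python
# Sketch: the sticky-note wall captured as data once clusters take shape.
# Each cluster collects (participant ID, open code) notes that express
# a similar unit of meaning. All names below are illustrative.

clusters = {
    "effectiveness of video chat and phone": [
        ("P3", "child not engaged with video chat or phone"),
    ],
    "playing video games with children over distance": [
        ("P7", "kid is bored unless he plays video games"),
        ("P5", "parent and child play World of Warcraft together"),
    ],
}

# Because participant IDs travel with each note, you can always trace
# a cluster back to who contributed to it:
participants_per_cluster = {
    name: {pid for pid, _ in notes} for name, notes in clusters.items()
}
```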
Now, as you keep going through this process (as I said, it's quite typical to have something like 750 open codes from a set of interviews), you may find that you need to keep adjusting your categories. So for example, let's say that halfway through this process, I find that there are a lot of nuanced answers people give about video games, and that I can't just keep it all as a single category. Maybe lots of people talk about World of Warcraft, lots of people talk about Minecraft, lots of people talk about casual games, and a few people talk about games you play while also video chatting, kind of like playing checkers over the network. All of those games are actually a little bit different. So I may actually go back into the category that had all the games listed there, the one that was just about video games, and do a little bit more separating and moving things apart. I may also find that a cluster started out with something being central, but the entire body of notes has drifted in a certain direction, and I may need to reposition it in order to continue being able to add notes to it. So this is quite common: adjusting clusters, adding new clusters, and moving entire clusters are all part of this process. And when you finally run out of notes, the next step is to name the clusters as you see them. So, perhaps, I would still have a cluster that's focused on playing video games with children over distance. Maybe I'll have a cluster that's focused on the effectiveness of video chat and phone, and those might be separate clusters. Now, the next step, once you have gone through this process, is to actually develop your coding scheme. These clusters that you have named now serve as the coding scheme for you to analyze the rest of your interviews. The idea is that now you can do a pass through the data, reading again through all your textual data and coding units, whether a unit is a line in your field notes.
Or whether it's an entire participant, saying they fit into a category or not. Or whether it's a specific comment or a specific answer. So you code these units into the identified clusters. Now, optionally, and a few people like to do this, but it's still kind of a controversial thing in the field, you can do inter-rater reliability on a subset of these codes. If you want additional confidence that a particular statement has actually been assigned to the correct category, or I would say to a reasonable category that other people would also assign it to, you can ask somebody to help you conduct an inter-rater reliability analysis. The way you do this is you provide them with the code book that you used to assign your codes, and you provide them with some randomly selected subset of your units of analysis. It might be some set of statements that participants made or some set of interviews. You ask them to go through it and assign codes, and then you calculate Cohen's Kappa on that analysis to make sure agreement is satisfactory. You might think, okay, why wouldn't everybody do this? It seems like it would only improve the work, so why is it controversial? And the reason that not everybody does it is that, in fact, it can be a little bit misleading. It's adding quantified information to a process where quantification is not really the point. Qualitative analysis is about making meaning, and that meaning is being made from a fairly small number of participants: for interviews, typically fewer than 20; for contextual inquiry, maybe even fewer than that. And reporting things like inter-rater reliability numbers, or saying how many participants did what, can actually be interpreted in a misleading way.
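Cohen's Kappa itself is simple to compute: it compares the observed agreement between two coders against the agreement you would expect by chance from each coder's category frequencies. Here is a small sketch implemented from that standard formula; the two raters' code assignments are hypothetical, and in practice you could also use a library routine such as scikit-learn's `cohen_kappa_score`.

```python
# Sketch: Cohen's Kappa for two coders who applied the same codebook
# to the same randomly selected statements. The rater data is made up.
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    assert len(codes_a) == len(codes_b)
    n = len(codes_a)
    # Observed agreement: fraction of statements coded identically.
    p_o = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected chance agreement, from each coder's marginal frequencies.
    count_a, count_b = Counter(codes_a), Counter(codes_b)
    categories = set(codes_a) | set(codes_b)
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical coding of 8 statements into two of the example clusters:
rater1 = ["engagement", "games", "games", "engagement",
          "games", "engagement", "games", "engagement"]
rater2 = ["engagement", "games", "engagement", "engagement",
          "games", "engagement", "games", "games"]

kappa = cohens_kappa(rater1, rater2)
```

Here the coders agree on 6 of 8 statements (observed agreement 0.75) while chance agreement is 0.5, giving a Kappa of 0.5. What counts as "satisfactory" varies by field, but values above roughly 0.6 are commonly read as substantial agreement.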
People start thinking that your data is representative, and I have had this happen to me. One of my papers said something along the lines of "three of our participants said that they use video chat," and later I found this paper cited elsewhere with a percentage attached. It said something like "3 out of 18 participants, and therefore this percentage, actually use video chat." It suggested that we had a representative sample, where that was not the case. So lots of people who report qualitative data choose to focus on themes, because those are harder to misread as applying to everybody. It's more focused on this idea of telling the participants' story, and so it's more true to the method's nature. But in the end, especially with textual data that may be more controversial, or perhaps harder to put into categories, I think inter-rater reliability analysis can add something to the process. And now we get to the final step, which is writing. This is part of the qualitative research process because you're really putting together all these very nuanced stories, all these very nuanced ways of looking at the data. Writing really helps you get to a concrete statement: what are we getting out of this? What is actually the takeaway from all these interviews, or all these observations, or all these contextual inquiries that we've done? And there are a few guidelines I have for being a good writer of formative work. The main one is that you want to tell the story. Your data is going to have a story, and that doesn't just mean listing all the themes or listing all the clusters that you came up with, but rather conveying the larger takeaway. You want the story to always stay true to the data. Of course, you don't want to lie. You want to support every claim with quotes that participants have made, and you want to account for exceptions.
If nine of your participants said one thing, but then there were two that were vehemently opposed, or very different from the rest, actually talk about those exceptions and communicate the experience of these other users as well. I actually really like being specific by giving numbers of participants. Again, this can be controversial, because it can then be miscited as saying that a certain percentage of your participants said something. But to me it's more exact than saying "some" or "many" or "almost none" or whatever other hand-wavy words you could use. And the last key guideline is that you want to give next steps and implications for design. I say "implications for design" here loosely: this could actually be specific design ideas, or it could be something like personas, or it could be design requirements. Whatever it is that you think should be done with the data that you found through a formative process, I think it's important to articulate it, because at this point of the process you'll have more insight into the data than perhaps anybody else in the world. You've probably spent hours reading through the interviews, you've spent hours making open codes and clustering them into categories, and then actually coding all the interviews again with your final categories. That deserves some recognition. I think your recommendations will be taken seriously, so you do want to make sure to actually give them in the write-up of your formative process. So, I've mentioned these books before, but all of these include chapters on qualitative analysis. I think the one book I hadn't mentioned before is this Basics of Qualitative Research by Strauss and Corbin. If you want to learn more about grounded theory method, and really the full process rather than the abbreviated version, I think this is the book to read. That's the one in the middle there.
But the two other books to the side, Interviews and Analyzing Social Settings are excellent books that provide very practical descriptions of the analysis process. And so I also recommend looking at those. So to summarize, qualitative coding is complicated. But in the end, it's just a process that you can follow. So I hope that this gives you more insight into how you can make meaning out of your qualitative data, and good luck with your analysis. I'll see you in the next video.