Welcome back. In this lecture we're going to take a look at some of the interesting work on explaining recommendations to make them more accessible to the users who are receiving them. The objectives of this lecture are to understand the variety of ways that explanations can be presented in a recommender system and what we know about how users react to them; to understand how explanations can fit into a recommender system; and to recognize some of the pitfalls in explanation, in particular the difference between explaining and inadvertently persuading.

Let's dive right in by taking a look at the key reference in this space: a paper written by Herlocker and others, including me, back at the turn of the century. It's called Explaining Collaborative Filtering Recommendations, and I'm going to show you some of the figures from that paper. It's also linked for you on the course readings page. There are also several dozen more recent papers that look at specific aspects of explanation: how explanations influence trust, the value of transparency in a recommender system, giving people control over their profiles, computing efficient explanations. But almost all of these go back to the first experimental work, which is the work that we'll talk about today.

So what is an explanation? One way of thinking about it is as additional data that helps a user understand a specific recommendation. This is different from saying here's how the system works as a whole. It isn't saying, well, we compute nearest-neighbor correlations and select a neighborhood, but rather something about the specific recommendation: this recommendation was based on 83,000 pieces of data; or we have very high confidence in this for some reason; or maybe this is a recommendation that really can be grounded in three of your friends and what they said. There are lots of different ways of explaining a particular recommendation. Often, but not always, it's a glimpse inside the computation, something you don't see when you simply see four stars.

If I show you that an item has four stars, you might first think, on a five-star scale that's pretty good; most people probably thought it was pretty good. But would you think differently if I said, what you should really know is that this isn't four stars because most people think it's pretty good, this is four stars because a bunch of people really hate it and the rest really, really love it? Nobody gave it anything but five stars or one star, but you can't communicate that in four stars, and an explanation might say, hey, look, this has a weird distribution, it's highly bimodal. This is a love-it-or-hate-it kind of thing.

There are other weird distributions as well. In fact, there are these weird cases where the thing we predict is the least likely outcome you'll actually experience. Think about that four-star example and now let's make it three stars. What if I knew that 10,000 people had seen this movie? Let's come up with an example where this could theoretically be true: The Rocky Horror Picture Show. If you haven't seen it, look it up, maybe go see it. It would not be far-fetched to believe that half of the people who rated Rocky Horror rated it five stars: they love it, it's a phenomenal experience. And half of the people may have rated it one star: that's a horrible movie. In fact, both groups usually agree it's a horrible movie that can be a phenomenal experience. How does that work?
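To make that concrete, here's a minimal sketch with made-up numbers, assuming a simple mean-based prediction; the point is just that the average of a bimodal rating distribution can be a value almost nobody actually gives.

```python
from collections import Counter

# Hypothetical ratings for a love-it-or-hate-it movie: half fives, half ones.
ratings = [5] * 5000 + [1] * 5000

prediction = sum(ratings) / len(ratings)  # a simple mean-based prediction
histogram = Counter(ratings)              # how many people gave each rating value

print(f"predicted rating: {prediction:.1f}")                       # 3.0
print(f"people who actually rated it 3: {histogram[3]}")           # 0
print(f"rating distribution: {dict(sorted(histogram.items()))}")   # {1: 5000, 5: 5000}
# The predicted value, three stars, is the least likely rating anyone actually gives.
```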
Well, when you do the computation, if we can't tell which group you're in, you might get three stars, but it's almost certainly the case that you won't think it's three stars. You'll think five or you'll think one. This is a phenomenon that we see with a certain type of movie; the term that's often used is date films. And we only see it in that case. The reason we only see it there is that mostly people self-select what they go see, and they mostly go see things they like, so normally you'll get a nice normal distribution around the mean, and that mean is high: three and a half, four, four and a half stars. But if you have a movie that people are dragged out to go see with others, then suddenly there are movies that people saw but hated, and explanations can help us find that. Finally, explanations are sometimes tied to the ability to edit a profile to improve recommendations, and we'll talk a little bit more about that shortly.

So let's take a look at the Herlocker paper in detail. I'm going to cut over to the overhead camera, where you can see its figures. This table, Table 1 in the paper, is really the main result of the paper. It shows 21 different explanations in all, grouped into explanations that people liked, were neutral about, and disliked. What people tended to like were simple explanations. Right next to the table is one of the ones that was there; this was number four on the list: your neighbors' ratings for this movie. You can see by looking at this that 14 of my neighbors gave it four stars, 9 gave it five stars, 7 gave it three stars, and only 2 and 1 gave it two stars and one star respectively. My neighbors basically liked this movie. Actually, one of the few that did better than this, not statistically significantly, but ranked higher, is this histogram that makes it even simpler than that: the ones and twos together were three people, the threes were seven, and the fours and fives together were 23 people. That's a really simple explanation.

What didn't work? When we come back to the table, a whole bunch of things didn't work. One of them was the use of any statistical terms: the explanation that told you how many neighbors you had and what the average and standard deviation of their ratings were. Whoa. Standard deviation doesn't make me feel like going to see a movie. Neighbors and correlation: correlation sounds like a sequel to The Matrix or something, but it's not getting people excited. Actually, our favorite visualization was a dismal failure. Let me show you that visualization, because if you're like us, you'll think it was wonderful! I'm going to zoom in just a bit. What you're going to see here is all of the data that went into this particular prediction, from how strong or weak your neighbors were to what they rated the movie, all with gratuitous color and pretty bars. And people hated it. Hated it horribly. What became clear is that users want simplicity, and simple visualizations work well.
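Here's a minimal sketch of that kind of simplified binned explanation, using the neighbor counts from the example; the grouping and wording are my own illustration, not the exact interface from the paper.

```python
def histogram_explanation(neighbor_ratings):
    """Summarize neighbors' ratings into three coarse bins, as in the simple histogram."""
    low = sum(1 for r in neighbor_ratings if r <= 2)   # 1-2 stars: disliked it
    mid = sum(1 for r in neighbor_ratings if r == 3)   # 3 stars: lukewarm
    high = sum(1 for r in neighbor_ratings if r >= 4)  # 4-5 stars: liked it
    return (f"Of your {len(neighbor_ratings)} neighbors, {high} liked this movie "
            f"(4-5 stars), {mid} were lukewarm (3 stars), and {low} disliked it (1-2 stars).")

# Neighbor ratings matching the example above: one 1, two 2s, seven 3s, fourteen 4s, nine 5s.
neighbors = [1] + [2] * 2 + [3] * 7 + [4] * 14 + [5] * 9
print(histogram_explanation(neighbors))
```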
But we learned something else, too. We learned that some of our top explanations were interesting lies we made up to simulate real data, and they had less to do with explaining than with supporting our original recommendation. Number two on our list was past performance: a statement that said 95%, or I think it was 90%, of the time the system has been right for you in the past. I think of it as bullying, frankly, but it could also be thought of as an appeal to credibility. This is mom saying, last time I told you to take a sweater and you didn't, you were cold. So see the movie I told you to. Also up here, in positions 5 and 6, were similarity to other movies you rated and favorite actors or actresses. We didn't actually give them any data; we just said, this is similar to other movies you liked in the past. Okay, that sounds good. Or, this movie has your favorite actors or actresses. We didn't even bother to tell them which movies or which actresses. They seemed to like that too: things that helped them believe they were making the right decision.

We've actually done other displays. This one's really simple: we came up with a confidence score, a combination of how much data went in, how high the correlations were, and whether we had a wide variance versus a tight distribution. We've also done other experiments with confidence where we've annotated recommendations with dice to show that some movies are riskier choices than others, and people seem to do okay with those confidence scores.

Let's cut back to the slide. I want to pull together the key lessons we have here. Number one, simplicity: statistical terminology really failed, and people didn't want to be overwhelmed with data. Number two, simple visualizations, simple graphics: other people have done similar things with bullseyes and targets, anything that really simply makes it clear what's going on. And number three, users value supporting information: not simply an actual explanation of how we got to this recommendation, but other things around it that help them make a decision, like historical success, attribute-linked data, and product associations.

I should make clear that we made one big mistake in this piece of work, and we knew it by the time the work was done: we never actually measured the effectiveness of our explanations. We measured their persuasiveness. We asked people if they liked them, and then we picked a movie we had reason to believe they would enjoy and saw whether the explanation made them more, less, or equally likely to choose to see that movie. The problem there is that if what was really going on was that they shouldn't have enjoyed the movie, we couldn't measure that. And we couldn't distinguish between bullying people into seeing the movies, because we're always right and we can tell creative lies about how much you'll like it, and actually helping them along. Fortunately, others have followed up this work, looking at explanations that can be evaluated entirely on whether they help people make better choices, not simply whether they helped people make the choice you wanted them to make. The whole area of persuasive computing is another field in and of itself.

So, explanations show up in practice. We saw that with the example from Amazon.com in our tour earlier, where you could ask the question, why am I getting this recommendation, and get back the items that it was based on, including the ability to provide feedback to update your profile and perhaps change the recommendation. In general, "why" is a remarkably compelling anchor. You give the person who's using a system the chance to say, hey, why are you doing this? And more often than not, you'll see that they're curious and they ask. It also provides user control through the idea of progressive revelation: tell me more. Here's a movie, four stars. Tell me more. Well, we have 37 people like you, and 18 of them gave it five stars and 12 of them gave it four stars and so forth. Tell me more! Well, here's some information about the cast and the crew.
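Here's a minimal sketch of that tell-me-more idea, with hypothetical field names and data; each request reveals one more layer of explanation instead of dumping everything at once.

```python
# Hypothetical data for one recommendation; the fields and values are illustrative only.
recommendation = {
    "prediction": "Four stars.",
    "neighbor_summary": "37 people like you rated this: 18 gave it five stars, 12 gave it four.",
    "item_details": "Cast, crew, genre, and release year for the movie.",
}

# Explanation layers, ordered from least to most detail.
LAYERS = ["prediction", "neighbor_summary", "item_details"]

def tell_me_more(rec, times_asked):
    """Return only as many layers of explanation as the user has asked for."""
    return [rec[key] for key in LAYERS[: times_asked + 1]]

print(tell_me_more(recommendation, 0))  # just the prediction
print(tell_me_more(recommendation, 1))  # prediction plus the neighbor summary
print(tell_me_more(recommendation, 2))  # everything, because the user kept asking
```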
As you give people the ability to progressively reveal, you'll find they take control and get the information they want. Be careful, though: if you overwhelm the user with too much information when they don't expect it, that's an easy way to cause a user to shut down.

Now, explanations are not only for user-user collaborative filtering. You can use explanations with content-based filtering, and you can use them with the later algorithms, the item-item and dimensionality reduction algorithms that are coming up. We present them here because the original work was done in the context of user-user filtering, and the explanations make sense in terms of neighbors and neighborhoods, but the concepts of explanation generalize much more broadly. I should also point out that there's a wonderful, mostly open area here. There's been a lot of work on explaining specific items: why did you recommend this item? But there's been much less work on explaining lists or sets of items: why did you put this whole list together? Is this more diverse? Is this somehow covering an area as a spread rather than just a set of individual items? As we talk about diversity and serendipity in the next module, you may want to think about some of the explanations that could be useful to users of systems that try to enhance recommendation lists.

So as we pull this together, you've seen several types of explanation. You've seen more data on the prediction itself, other data about the item that's being recommended, and simplified presentations of the relevant data, such as histograms; when presentations were not simplified, you saw that they were not particularly effective. You've seen past performance data and supporting statistics that can be useful. And now you have at least some experience with the kinds of explanations and how they work with ordinary users.

Moving forward, the rest of this module is a set of interviews related to trust, reputation, the influence of bad ratings on a recommender system, and alternative ways to focus on trusted sources or limit the influence of unknown sources in your system. We have assignments related to user-user recommendation: a programming assignment and hand exercises. And then our next module moves on to evaluation, where we'll be looking at a bunch of new metrics that can help as we think not only about recommendation generally, but about this topic of explanation. We'll look forward to seeing you there.