[MUSIC] Alan, thank you for joining us today. I really appreciate it. Could you tell us a little bit about yourself, what you do now, and how you came to be associated with the world of privacy, data protection, and data security? >> I actually got into this slowly over the past eight years, I would say. I did my PhD as a sort of traditional computer scientist building systems, very technical sort of work. And then around 2012-2013, myself and a couple of colleagues got really interested in studying algorithms. This was at the time when personalized services were becoming more ubiquitous, and the real change at that time was that Google started personalizing web search results. Other services, like Facebook for example, are defined by personalization; it's based on who your friends are, right? My Facebook will look very different from your Facebook. >> Sure. >> Same thing with Twitter and so forth. But other sites like Amazon are highly personalized too: when you log in, the front page of Amazon is all recommendations based on your previous purchase history, what they think you'd like, what other people like you tend to buy, and so forth. And it just feels more natural now. Even services like Google Maps, for example: if you just bring up maps.google.com, it's going to show you all the places you've previously searched for, presumably because those are the most relevant for you, right? So I think people have accepted it now. >> Right. >> But back six or seven years ago, this was a change from the way services used to operate to the way they operate now. >> Right, but some of the examples that you gave are really visible examples. So when you log onto Amazon, there are literally tags that say "recommended for you," right? And I find it very useful, like, you know what? I do want this next book that I have been hearing about. But are there places online where it's not that visible, where you log on and you might be seeing a version of the web that you don't know is being highly tailored to you? >> So certainly advertising is one context where we see that, and a lot of our work, at least on Facebook, has focused on how Facebook and the advertisers decide which ads you see. And that is incredibly personalized, down literally to advertisers picking your email address and targeting you with ads. But there are other services where it's much less obvious. Some of our studies have looked at, for example, job search services. So indeed.com, monster.com, these sorts of services where they want to find relevant candidates for jobs. And the notion of relevance, and how much relevance is taken into account, varies; it's not really well described. And so I think a lot of times both the users as well as the companies who are using these services don't realize the extent to which other results, or other candidates in this context, are being hidden from them, because the algorithm doesn't believe they're a good match for the job that's being listed. >> Yeah, which is really problematic because you are not allowed, of course, to discriminate based on race in employment or housing opportunities. So I want to pull back a little, because I think that your work hits upon a really profound concept in our modern world, which is the idea that these algorithms reflect value choices. And, A, I wonder if you could talk for a second and maybe simply explain what an algorithm is, for people who aren't familiar with it.
And then talk about how these algorithms are created and the values they reflect. And then I've got a follow-up question about what we should do after that. >> So here's the way I like to explain it, which is: imagine you're a bank, and we're in, say, the days before computers, right? >> Right. >> You receive loan applications, and you have to decide, should I make this loan, right? And you don't want to lose money; you don't want to give it to somebody who's going to default on the loan. So you have some sort of flowchart, and that's based on how much money they make and how much they are asking for, or whatever, right? >> Right. >> So you have some sort of process. And what we realized back in, say, the 80s and 90s is that, well, okay, a lot of those steps in the flowchart could be automated, right? Now I could automate the processing of that loan application. >> So you could write a piece of code that says, if $10,000 income, reject; if $100,000 income, accept. >> Precisely. >> Right. >> I just automate the process of applying the flowchart to an application. >> Right, okay. >> But then what happens is, over the past 30, 40 years, the amount of data that we have on people has grown substantially. And so now that I have a lot more data on you, your application is not just one page, it's many pages. My process, my flowchart, can get much more complicated, right? I can say, okay, if you have less than this amount, but if this other thing is true, like if your parents have this much money, and whatever. So my process can get more complicated, meaning my code that implements this process will start getting much more complicated. And what people realized is that, okay, well, as my code gets really, really complicated, it's no longer clear that I'm following the best possible process. In other words, that I'm making the best loan decisions I could. >> Right. >> Because at the same time that my loan applications have grown in the amount of data they have, I also have a lot of historic data on what people have done in the past. Meaning, when I've made a loan in the past, did they default on it? >> Right. >> And maybe what I could do is leverage that old information that I have. And so machine learning, or what people call an algorithm, that's essentially what they are referring to. >> Sometimes they call it AI. >> AI- >> Machine learning, software. >> It's all kind of the same. >> Right, right. >> Right, it says, okay, we're not going to do it this way; we're not going to have this flowchart that some human wrote down. Instead, what we're going to do is have the computer learn the flowchart itself. In other words, it's going to be given all the historic data: here's a big pile of loan applications, here are the ones we made, and here are the ones that defaulted. And we're going to tell the computer, you figure out the best set of rules, the best flowchart, for deciding whether or not to make a loan. And you have to give it some objective, and in this context our objective would be: minimize the probability of default, right? >> Okay. >> So make the loans that are the best loans for us to make, where we make the most money, right? And if you have enough data, that's what machine learning does: it takes in all this old data, it's called training data, and it learns the best series of steps to go through to be able to decide, yes, we should make that loan, or no, we shouldn't, right? >> Right. >> And so that's machine learning, right?
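To make that contrast concrete, here is a minimal sketch of the two approaches described above: a hand-written loan flowchart versus a decision tree learned from historic outcomes. Everything in it, the field names, thresholds, and data, is invented for illustration; it is not any bank's actual process.

```python
# A minimal sketch: hand-written loan rules vs. rules learned from
# historic data. All fields, thresholds, and data are hypothetical.
from sklearn.tree import DecisionTreeClassifier

# The old way: a human writes the flowchart down as explicit rules.
def loan_decision_by_hand(income, amount_requested):
    """Approve only if income is high enough and the loan is small relative to it."""
    if income < 10_000:
        return "reject"
    if amount_requested > income * 0.5:
        return "reject"
    return "accept"

# The machine-learning way: let the computer learn the flowchart.
# Each row is a past application [income, amount_requested]; each
# label records whether that borrower repaid (1) or defaulted (0).
training_applications = [
    [120_000, 20_000],
    [15_000, 40_000],
    [60_000, 10_000],
    [25_000, 30_000],
]
repaid = [1, 0, 1, 0]  # historic outcomes: the "training data"

# The objective lives in the labels: the tree picks the splits that
# best separate repaid loans from defaults in the historic data.
model = DecisionTreeClassifier(max_depth=2)
model.fit(training_applications, repaid)

# A new application is scored by the learned flowchart, not one a
# human wrote down.
print(model.predict([[50_000, 12_000]]))  # e.g. [1] -> likely to repay
```

The learned tree plays exactly the same role as the hand-written rules; the difference is that its decision steps are chosen to fit the historic outcomes rather than written down by a person.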
And there are a lot of technical details in how you make this work, but that's ultimately what it is: the computer is learning the rules itself. >> Right. >> Okay, but think of other contexts; think of jobs, for example. There are special legal protections around anti-discrimination for jobs. You can't run a job ad saying only men can apply; that's actually illegal in the United States. And so if I'm running a job ad, say on a service like Facebook, I would not want to say only target men, right? Because people have gotten in trouble for doing that kind of thing. And so instead, I as the employer say, I want to target everybody equally, right? >> Right. >> What we found is that there's a second phase after the targeting, where Facebook says, okay, Woody Hertzog is browsing Facebook, an ad slot comes up. >> Right. >> And then Facebook has to pick, from among all the advertisers who've included you in their set, which ad should win, which one we should show. And if you ask Facebook, they're like, look, we want to show relevant ads, right? If we show you an ad that we're pretty sure is irrelevant, it's wasting your time, and it's also wasting the advertiser's money, because they're paying to show you an ad that is irrelevant, right? >> Right. >> So let's find the one that's the most relevant, show you that, and everybody is happy, right? >> Yeah. >> But the worry is, well, that estimate of relevance, where does that come from? Well, it comes from what people did with previous ads, which means essentially all of our societal biases get baked into it. Meaning, we have results showing that if I run an ad for jobs in, say, the lumber industry, which is a very male, very white industry, Facebook will steer that towards almost 90% male, 85% white users. Even if the advertiser says, no, I wish to target 50/50 male and female users, 50/50 white and Black users, right? So it's a case where the algorithm, again, Facebook is not trying to do this, in the sense that they're not actively saying, yes, we want to be racist or whatever. >> Right, right. >> But it's just a consequence of optimizing for relevance without controlling for these other potential negative effects that the algorithm can have. >> So this is really interesting. There's a second layer that results in what we would call a disparate impact on certain populations. Not because of any actual choices, but simply because of the way the machine learning works, right? It trains on, it looks at all the stuff that came in the past, right? >> Yeah. >> And that stuff is sort of messed up. So are you saying that there might be instances where, [COUGH] for example, if you had a job ad for a CEO, demographically, if you look back, CEOs are all- >> Yeah. >> Right, I mean, just overwhelmingly, right? Because the biases were baked in. That then women that are looking for jobs will not see a lot of these ads, or not see a lot of these job offers? >> Yeah, we have results. We ran an experiment where we ran our own ads for job listings in ten different verticals, including janitorial services, as well as lumber and supermarket cashiers, these sorts of things. >> Right. >> And we found very dramatic differences in how those ads were delivered, despite the fact that they were targeted identically. Meaning, as the advertiser, we told Facebook to target everybody, but the estimate of relevance ended up steering the ads highly male or female, or highly white or Black, depending on Facebook's estimates of which users would be interested in them.
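As a toy illustration of that second phase, and emphatically not Facebook's actual system, here is a sketch in which "relevance" is simply the historic click rate for each ad and demographic group. Both advertisers target everybody equally, yet delivery skews because the click history is skewed; all names and numbers are invented.

```python
# Toy model of relevance-based ad delivery. Not a real ad system:
# a hypothetical platform estimates "relevance" from past clicks and
# always shows the most relevant ad, even though every advertiser
# targeted everyone.
from collections import defaultdict

# Historic click log: (ad, user_demographic, clicked?)
history = [
    ("lumber_job", "male", 1), ("lumber_job", "male", 1),
    ("lumber_job", "male", 0), ("lumber_job", "female", 0),
    ("cashier_job", "female", 1), ("cashier_job", "female", 1),
    ("cashier_job", "male", 0), ("cashier_job", "female", 0),
]

# Estimate relevance as the historic click rate per (ad, demographic).
clicks = defaultdict(int)
shows = defaultdict(int)
for ad, demo, clicked in history:
    shows[(ad, demo)] += 1
    clicks[(ad, demo)] += clicked

def relevance(ad, demo):
    shown = shows[(ad, demo)]
    return clicks[(ad, demo)] / shown if shown else 0.0

# Both advertisers included everybody in their targeting; the platform
# still picks the single "most relevant" ad for each user, so the bias
# in the click history decides who sees which job.
def pick_ad(demo, candidate_ads=("lumber_job", "cashier_job")):
    return max(candidate_ads, key=lambda ad: relevance(ad, demo))

print(pick_ad("male"))    # lumber_job
print(pick_ad("female"))  # cashier_job
```

In this sketch, both job ads were targeted at everyone, but the male user only ever wins the lumber ad and the female user only the cashier ad, purely because past clicks were skewed; this is the disparate-impact mechanism described above.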
So to your example, yes, if there was a woman who was qualified to be CEO and was looking for a job, the relevance estimate would likely make it harder for her to see that ad, even though she's equally qualified. >> Well, unfortunately, I think we've run out of time, but I want to thank you so much- >> Thank you. >> For joining us today. We appreciate it, and we'll be on the lookout for more of your incredible research. [MUSIC]