0:00

Welcome to Lecture four of Networks, Friends, Money and Bytes.

And today's topic is Netflix. Let me first warn you, though: throughout this course of twenty plus one lectures, if you plot the length of each video lecture against these twenty lectures, they go up and down. The average might be 75 minutes across all the modules. Now, this Lecture four is actually going to be by far the longest lecture. You will see that we have quite a few modules, and each module is fairly long, because we have to introduce both the concepts of collaborative filtering, individualized recommendation by scaling out the system and the network, and the mathematical language of optimization theory.

Again, if you plot the level of mathematical difficulty involved, or the amount of symbolic manipulation involved, against these twenty lectures, they go up and down quite a bit. I'll say that this lecture is perhaps by far the most mathematical one. If you can pass this lecture, you probably can face any of the remaining mathematics throughout the rest of the course. So, to some degree we are climbing up a hill today, and this hill is both conceptually involved and mathematically demanding of some hard work, just as climbing a real hill would be. But if you can climb over this hill, then the rest of the course should be much easier, especially some of the lectures coming up.

But having said that, let's first review what we had in the past three lectures. We saw three beautiful equations, and they are very useful equations too.

The first one was in Lecture one: DPC, Distributed Power Control, used in 3G CDMA data networks. It says that for each transmitter of a logical pair i, the transmit power P at the next discrete time slot, t + 1, should be whatever it is right now, P sub i of t, times a gain parameter, and that parameter is the ratio between the target SIR, gamma, and the currently achieved SIR on this link. Now, the target gamma does not change over time as far as this algorithm is concerned; it is a constant. The achieved SIR, on the other hand, varies over time as different people change their transmit power levels. And we saw that this ratio is essentially an implicit feedback signal telling you whether you should increase or decrease your power.
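As a quick illustration, here is a minimal sketch of the DPC update in Python. It assumes a simple linear interference model; the channel gains, SIR targets, and noise level below are made-up example numbers, not values from the lecture.

```python
import numpy as np

G = np.array([[1.0, 0.1, 0.2],    # G[i][j]: channel gain from transmitter j to receiver i
              [0.1, 1.0, 0.1],
              [0.2, 0.1, 1.0]])
gamma = np.array([2.0, 2.5, 1.5])  # per-link target SIRs (constants over time)
noise = 0.1
P = np.ones(3)                     # initial transmit powers

for _ in range(100):
    interference = G @ P - np.diag(G) * P + noise   # sum_{j != i} G_ij * P_j + noise
    SIR = np.diag(G) * P / interference             # currently achieved SIR on each link
    P = P * gamma / SIR                             # DPC: P_i(t+1) = P_i(t) * gamma_i / SIR_i(t)

print(np.round(SIR, 3))  # essentially at the targets 2.0, 2.5, 1.5
```

Each link scales its power by the ratio of its target to its achieved SIR; for a feasible set of targets the powers converge, with no central coordination.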

And the second equation that is beautiful and useful, which we saw in Lecture two, can be summarized as follows. It is the payoff function for the i-th bidder in an auction: U sub i is a function of all the bidders' bids, the vector b, just as the SIR is a function of all the transmit powers. And it is the difference between your private, independent valuation, V, and the price you have to pay, which depends on all the bids. This price, which is not the power now but the price as a function of all the bids, has a shape determined by the designer of the auction. For different kinds of function P of b, you will induce different kinds of behavior by the bidders. They will bid different b_i star, the optimal bids according to some metric, for example, whatever maximizes the utility. Now, of course, this is assuming you actually win the allocation. If you do not get the item, then the utility is zero. This definition of the payoff function, which also highlights the fact that you can design the auction and thereby induce different bidding behavior, is the second equation.
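To make this concrete, here is a small sketch of the payoff U_i(b) = v_i minus p(b) if bidder i wins, and zero otherwise, using the second-price rule from Lecture two; the valuations and bids are made-up example numbers.

```python
# Payoff in a sealed-bid, single-item second-price auction: the winner pays
# the second-highest bid; all other bidders get zero payoff.
def second_price_payoffs(valuations, bids):
    """Return each bidder's payoff U_i = v_i - p(b) if i wins, else 0."""
    winner = max(range(len(bids)), key=lambda i: bids[i])
    price = sorted(bids, reverse=True)[1]   # p(b): the second-highest bid
    return [valuations[i] - price if i == winner else 0.0
            for i in range(len(bids))]

# With truthful bidding (b_i = v_i), bidder 0 wins and pays the second bid:
print(second_price_payoffs([10.0, 8.0, 5.0], [10.0, 8.0, 5.0]))  # [2.0, 0.0, 0.0]
```

Changing the price function p(b), for example to a first-price rule, would change what bids are optimal for the bidders, which is exactly the design point made above.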

The third one, which we touched upon last lecture, can be written as follows: simply, the vector pi star transpose equals pi star transpose times G. G is the Google matrix that we defined last time, and it is defined in such a way that the eigenvector corresponding to the largest eigenvalue, which is one, can be uniquely defined and efficiently computed. So, this defines the Google PageRank score, which further leads to the rank ordering of the web pages. Now, if you count how many times DPC is used in the mobile world, how many times auctions are used in many contexts including the online space, and how many times Google PageRank is used every time you search on Google, you can tell that these three are powerful equations. They are simple, but they are very, very widely used.

So now, we're going to continue this track of four lectures. We've talked about power control, a distributed protocol; we talked about the second-price auction; we talked about PageRank. And now, collaborative filtering for recommendation, as on Netflix. These four lectures also introduce the languages of optimization, then game, then graph, and now learning theories.
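Before moving on to Netflix, the third equation is worth seeing in code: the PageRank vector is the fixed point pi star transpose = pi star transpose times G, which can be computed by repeated multiplication (power iteration). The three-page hyperlink matrix and the damping factor 0.85 below are made-up illustration values.

```python
import numpy as np

H = np.array([[0.0, 0.5, 0.5],   # row-stochastic hyperlink matrix of 3 toy pages
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
n = H.shape[0]
G = 0.85 * H + 0.15 * np.ones((n, n)) / n   # Google matrix: links plus teleportation

pi = np.ones(n) / n               # start from the uniform distribution
for _ in range(200):
    pi = pi @ G                   # left-multiply: pi^T <- pi^T G

# pi now satisfies pi^T = pi^T G, with entries summing to one;
# sorting pages by pi gives the rank ordering.
print(np.round(pi, 4))
```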

Now, the backdrop is Netflix. Those of you who are in the US know Netflix very well. It started its DVD rental business a while back, actually in 1997. The story goes that the founder of Netflix was tired of paying late fees to the DVD rental stores. If you remember those days, there were a lot of them on street corners, and if you didn't return the DVD in time, or the VCD or the tape in time, you paid extra fees. He said, well, that's not very appealing to me. So instead, he said, you can keep the DVDs for as long as you want, and as long as you keep paying the monthly fee, that's actually a great thing for the company. You don't return, I don't send you new ones. So, you can keep paying the monthly fee without getting any new DVDs. This idea is going to show up again in a completely different context, sliding window control in TCP congestion control; that would be Lecture fourteen. But it's effectively the same concept: you don't return, I don't give you more.

By 2008, in the US, there were about nine million users of Netflix DVD rental. And around that year, Netflix started trying out something quite new: using the internet to deliver video, the internet as the medium instead of the postal service. So, the content is stored in some video servers and sent through the internet, and maybe wireless networks, to your internet-connected devices, which could be TVs or set-top boxes, but also game consoles, smartphones, tablets, PCs and so on. And by spring 2011, it had 23 million customers.

Â 8:22

And in the summer of 2011, it was counted that about one in every four bits, a quarter of the bits going through the internet that month, was Netflix traffic. That was huge. Now, of course, video files are very big, and therefore if you count by volume you're going to get big percentages. But still, one in four, that's a lot. And then, in September 2011, you may remember that Netflix tried to split into two different businesses, separating DVD rental from online streaming; they later put the two back together, but the billing still ended up separated.

Later, we will look at the technology and networks involved in multimedia streaming over the internet, in Lecture seventeen I think. We will look at Netflix, Amazon Prime, Google, HBO Go, IPTV and so on, and at the differences across these models. Today, in Lecture four, the focus is, however, on the social network dimension of the recommendation system. You and I, the customers of Netflix, also form a network. These 23 million, an even bigger number now, form a social network. They don't directly interact, but the way they watch and rate movies will be used and leveraged by Netflix to make better individualized recommendations. Think about it this way: you're on an open online platform, whatever that platform might be, and there are maybe 100,000 of you. By scaling out, scaling up the social network, we can actually scale down to individualized recommendations, because we get to see how other like-minded people behave, what they like and don't like. And we have much more data. So, even if you don't provide a huge amount of activity on the platform, we get to see others'. And this is the basic idea of

Â 10:39

Collaborative Filtering, CF. Now, there are many kinds of recommendation systems. For example, on Amazon, you may notice that each time you refresh the homepage after you log in, they recommend different kinds of products to you. That's largely based on what's called content-based filtering. Basically, if you purchased a certain kind of product before, like a certain kind of electronics, then they will recommend similar electronics to you in the future. If you buy a book written by Stephen Hawking, then they will try to recommend new books by Stephen Hawking. That is called content-based filtering: look at what you bought and what you liked, and then recommend accordingly.

YouTube, as we will see in a few lectures' time, makes recommendations largely driven by so-called co-visitation counts. Briefly speaking, it looks at how many users watch two videos back to back, and puts a score on the co-visitation count between these two videos, say A and B. This gives you a way to start quantitatively deciding which videos to recommend. We'll come back to this later.
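A co-visitation count can be tallied in just a few lines; the watch sessions below are made-up toy data, and the real YouTube pipeline, covered in a later lecture, is of course far more elaborate.

```python
# A toy sketch of co-visitation counting: how many users watched two
# videos back to back (the sessions are made-up data).
from collections import Counter

sessions = [["A", "B", "C"],   # each list: one user's viewing order
            ["A", "B"],
            ["B", "C", "A"]]

covisit = Counter()
for s in sessions:
    for x, y in zip(s, s[1:]):              # consecutive (back-to-back) pairs
        covisit[frozenset((x, y))] += 1     # order within the pair doesn't matter

print(covisit[frozenset(("A", "B"))])  # A and B were watched back to back twice
```

A high count between videos A and B suggests recommending B to viewers of A, and vice versa.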

Pandora, if you have used it, is mostly an expert-driven recommendation of mixes of music, but as a consumer you get to thumb up or thumb down, just like in some discussion forums where you get to vote up or down. Now, Netflix does not want to use expert-driven movie reviews, even though there's no shortage of professional movie reviewers out there. It instead uses collaborative filtering. The idea is that we look not only at what you liked, but also at what people similar to you liked and watched. The question is, how do we define similarity between people, or, flipped the other way around, between movies?

Before going further, let's first more rigorously define this recommendation system. A recommendation system is a very helpful feature, for stickiness of the consumers, for inventory control, and so on and so forth. Now, in the case of Netflix, you can think of it as, say, a black box, with inputs and outputs. What are the inputs? First of all, the user ID, your login; we're going to index users by u. Second, the movie ID; we're going to index each movie by i. And then, there will of course be a rating, which is really a number drawn from one, two, three, four, or five stars, and we call this r sub ui: r for rating, of user u for movie i. And there is the time this review was entered, t sub ui. Of course, there's also the review text, but we're not going to talk much about text reviews in today's lecture.

So, we've got u and i, and r_ui and t_ui. How many r_ui's and corresponding t_ui's are we talking about here? Actually, a lot. Think of millions of users and tens of thousands of movie titles. Of course, only a few percent of the users actually watch all the movies out there, or any substantial fraction of the movies out there. And among all the movies that you actually watched, you will likely rate only a few of them. Some people, like me, actually rate none of them; I just don't like rating movies. But a lot of people do rate, although not necessarily every single movie they watched. Despite that, we're talking about the order of billions of r_ui's for Netflix. Now, it's actually a much more challenging problem when you have very few ratings in your database, the cold-start problem. But Netflix doesn't have that problem now.

Alright, those are the inputs. What does the recommendation system do? What is the output? The output, primarily of course, is the predicted rating; let's call it r hat ui. Now, in the case of the Netflix Prize, they actually know the true r_ui; they just don't tell you, the competitors in the Prize competition. In that case, you actually know the ground truth, so you can compare your prediction r hat ui with the actual r_ui. But of course, in the real operation of Netflix, they don't know the ground truth; that's why they want to recommend movies for you to watch. So, you're going to just have to believe that if it works well on the test set, where you know the ground truth, then it's going to work well in the other parts of the system too.

Now, this r hat could be a fractional number, a real number, for example 4.2 stars. You cannot rate 4.2 stars, but the prediction might say 4.2. You can interpret that as saying there's maybe a 20 percent chance this user u will rate this movie i with five stars, and maybe an 80 percent chance he'll rate it four stars. Maybe that's an interpretation of the number 4.2. This r hat could also go above five or below one, but then we have to clip it: if it's above five, we call it five; if it's under one, we just call it one.
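That interpretation and the clipping step can be written out in a couple of lines; the numbers here are just illustrations.

```python
# Clip a fractional predicted rating r_hat into the valid 1-to-5 star range,
# and sanity-check the expectation reading of a 4.2-star prediction.
def clip(r_hat, lo=1.0, hi=5.0):
    """Return r_hat limited to the interval [lo, hi]."""
    return max(lo, min(hi, r_hat))

# 4.2 stars read as an expected rating: 20% chance of 5 stars, 80% of 4 stars
assert abs(0.2 * 5 + 0.8 * 4 - 4.2) < 1e-9

print(clip(5.6), clip(0.3))  # 5.0 1.0
```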

All right. How do we compare two recommendation systems? What is the metric of comparison that we're going to use to say, look, this recommendation system is better than that one? What do you mean by better? Well, there are three levels of understanding here, going from the most relevant but least tractable to the most tractable but perhaps least relevant among the three. The first level is, of course, that eventually what I, Netflix, care about is customer satisfaction. If I recommend that you watch some movie, do you like that recommendation in the end? That data is just very hard to gather. The second level is: all right, at least I want to know the prediction effectiveness. If I recommend some movie, do you even watch it or not, let alone like it or not? Again, that's hard to collect.

So, we're going to look at a proxy, just like in Google PageRank. There, eventually what really matters is whether the searcher actually finds the resulting search listing useful or not. But that ultimate test is very hard to quantify and collect data about, so they use PageRank to say, this is the importance of the nodes. Here, we are going to quantify the notion of prediction error by looking at r hat ui and r_ui. Again, how do you know r_ui? In real life, you don't. But when you compare different recommendation systems, you can hold out some known data as ground truth and then use that as a test data set.

Well, of course, I would look at the difference between the two. Maybe this prediction overshoots, maybe that one undershoots, so it can't just be the plain difference. It could be the absolute value. It could also be the L2 norm, the square of the difference. If I overshoot by one star or undershoot by one star, after squaring they are all positive error terms. So, I'm going to square them. And how many do I have? Let's say I've got C of those ui pairs.

Â 19:29

So, I'll have to sum up this squared difference across those C ui pairs. But since you squared things, you actually changed the scale; to bring the scale back down, people often put a square root. And that is the so-called Root Mean Squared Error, RMSE. This is effectively the so-called L2 norm of a vector, the vector being the error vector, the difference between r hat and r across all these ui pairs. You take each entry of this vector, square it, average them, and then take the square root. Of course, when you minimize an error, whether you minimize it with the square root or without it, the solution won't change. So, mean squared error or root mean squared error, either one can be used equivalently as the objective function that you want to minimize. All right.
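Written out, the RMSE over a test set of C rating pairs looks like this; the ratings below are made-up numbers.

```python
import math

# Ground-truth ratings r_ui and predictions r_hat_ui for C = 3 test (u, i) pairs
r_true = {(1, "A"): 5.0, (1, "B"): 3.0, (2, "A"): 4.0}
r_hat  = {(1, "A"): 4.5, (1, "B"): 3.5, (2, "A"): 4.0}

C = len(r_true)
mse = sum((r_hat[ui] - r_true[ui]) ** 2 for ui in r_true) / C   # mean squared error
rmse = math.sqrt(mse)                                           # back on the star scale

print(round(rmse, 4))  # sqrt((0.25 + 0.25 + 0.0) / 3) = 0.4082
```

Minimizing mse and minimizing rmse give the same solution, as noted above, so either can serve as the training objective.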

So now, we're looking at this RMSE, or MSE, as the quantitative metric describing how wrong your recommendations are across this set of test points, and we'll use that as a proxy for those eventual goals. So, our job is now: give me the inputs, u and i, and r_ui and t_ui, and I'm going to find a way to make recommendations and give you an output r hat. I will train this black box by looking at known r's and the resulting r hats. How do I train it? By minimizing the RMSE. Then I say, all right, good, I'm done training this black box. Now, throw at me some questions where I don't know the ground truth r. Maybe you, as the examiner, know the ground truth r. Well, good, then keep it to yourself, and I'm going to make a guess r hat. Then you check your ground truths and see how good I am.

That is the spirit of the Netflix Prize. It was announced in October 2006, and the challenge was: can you make a really good recommendation system? I'll give you data to train your recommender. Now, can you make it work really well, better than what I currently have, with "I" meaning Netflix?
