0:00
Welcome to Lecture four of Networks, Friends, Money and Bytes.
And today's topic is Netflix. Let me first warn you though, that
throughout the course of twenty-plus lectures, if you plot the length of the
video lecture, okay? Against these twenty lectures, it goes up
and down. The average might be 75 minutes across all
the modules, okay? So, they go up and down around that average if you
interpolate. Now, this Lecture four is actually going
to be by far, the longest lecture. You will see that we have quite a few
modules and each module is kind of long, because we have to introduce both the
concept of collaborative filtering, okay? Individualized recommendation by scaling
out the system, the network, as well as the mathematical language of
optimization theory. Again, if you plot the level of
mathematical difficulty involved, or the amount of symbolic operation involved
against these twenty lectures, it also goes up and down quite a bit,
actually. I'll say that this lecture is perhaps, by
far, the most mathematical one. If you can pass this lecture, you probably
can face any of the remaining mathematics throughout the rest of the course, okay?
So, to some degree we are climbing up a hill today, and this hill is both
conceptually involved and mathematically demanding, requiring some hard
work, just as climbing up a real hill would demand.
But, if you can climb over this hill, then the rest of the course should be
actually much easier, especially some of the lectures coming up.
But, having said that, let's first review what we had in the past three lectures.
We saw three beautiful equations, and they are very useful equations too.
The first one was in Lecture one, DPC, Distributed Power Control used in 3G CDMA
data networks. And it says that, for each transmitter of
logical pair i, the transmit power P at the next discrete time slot, t + 1,
should be whatever it is right now, P sub i of t, times a gain parameter, and that
parameter is the ratio between the target SIR, gamma, and the currently achieved SIR
on this link. Now, the target gamma does not change over time as far
as this algorithm is concerned; it is a constant.
And the achieved SIR varies over time as different people change their transmit power levels.
And we saw that this is essentially, an implicit feedback signal telling you
whether you should increase or decrease your power.
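To make the update concrete, here is a minimal sketch of DPC in Python; the two-link channel gains, noise levels, and target SIRs are made-up illustrative numbers, not anything from the lecture.

```python
import numpy as np

# Made-up two-link example: G[i][j] is the channel gain from
# transmitter j to receiver i; the diagonal is the direct link.
G = np.array([[1.0, 0.1],
              [0.2, 1.0]])
noise = np.array([0.1, 0.1])
gamma = np.array([2.0, 2.5])   # target SIRs: constants over time
p = np.array([1.0, 1.0])       # initial transmit powers

def sir(p):
    # SIR_i = G_ii * p_i / (sum over j != i of G_ij * p_j + noise_i)
    signal = np.diag(G) * p
    interference = G @ p - signal + noise
    return signal / interference

for t in range(100):
    # The DPC update: p_i(t+1) = p_i(t) * gamma_i / SIR_i(t)
    p = p * gamma / sir(p)

print(np.round(sir(p), 4))   # both links converge to their target SIRs
```

The ratio gamma over SIR is exactly the implicit feedback signal: when it is above one you raise your power, when it is below one you lower it.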
And the second equation that is beautiful and useful we saw in Lecture two can be
summarized as following, that it is the payoff function for the i-th bidder in an
auction, U sub i, is a function of all the bidders' bids, the vector b, just like the SIR is a
function of all the transmit powers.
And, it is the difference between your private independent evaluation, V, and the
price you have to pay, which depends on all the bids.
And this function P, which now stands for the price rather than the power, is a function of
all the bids. The shape of this function is determined
by the designer of the auction. And for different kinds of function P of b,
you will induce different kinds of behavior by the bidders.
They will bid differently: each bidder i picks the bid b sub i that is optimal according to some metric, for
example, whatever maximizes her payoff. Now, of course, this is assuming you
actually win the allocation. If you do not get the item, then the utility is
zero. This definition of payoff function, which
also highlights the fact that you can design the auction and then that will
induce different bidding behavior, is the second equation.
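As a sketch of this payoff function, here is U sub i with a second-price rule plugged in for P of b; the valuations and bids below are made-up numbers.

```python
# Payoff U_i(b) = v_i - P(b) if bidder i wins, and 0 otherwise.
# Here P(b) is the second-price rule: the winner pays the highest
# competing bid. Valuations and bids are made-up numbers.
def payoff(i, valuations, bids):
    best_other = max(b for j, b in enumerate(bids) if j != i)
    if bids[i] <= best_other:          # bidder i does not win the item
        return 0.0
    return valuations[i] - best_other  # private value minus price paid

v = [10.0, 8.0, 6.0]   # private valuations
b = [10.0, 8.0, 6.0]   # bids (truthful, which second-price induces)

print(payoff(0, v, b))  # winner's payoff: 10 - 8 = 2.0
print(payoff(1, v, b))  # loser's payoff: 0.0
```

Swapping in a different price rule for P of b, say first-price, would change what bids are optimal, which is exactly the design lever mentioned above.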
The third one, which we touch upon last lecture, can be written as the following.
Simply the vector pi star transpose equals pi star transpose times G.
Here, G is the Google matrix that we defined last time, and it is defined in such a way
that the eigenvector corresponding to the largest eigenvalue of
one can be uniquely defined and efficiently computed, okay?
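One way to compute this pi star, sketched below, is power iteration: start from any distribution and keep multiplying by G. The three-page link matrix and the 0.85 damping value are made-up illustrative choices.

```python
import numpy as np

# A made-up 3-page web: H is the row-stochastic link matrix, and the
# Google matrix G mixes it with a uniform jump (damping factor 0.85),
# which guarantees a unique dominant eigenvector with eigenvalue one.
H = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [1/3, 1/3, 1/3]])
n = H.shape[0]
G = 0.85 * H + 0.15 * np.ones((n, n)) / n

pi = np.ones(n) / n            # start from the uniform distribution
for _ in range(200):
    pi = pi @ G                # power iteration: pi^T <- pi^T G

print(np.round(pi, 3))         # PageRank scores; they sum to one
```

At the fixed point, pi transpose equals pi transpose times G, which is exactly the equation above.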
So, this defines the Google PageRank score, which further leads to the rank
ordering of the web pages. Now, if you count how many times DPC is
used in the mobile world, if you count how many times auctions are used
in many contexts including the online space, and if you count how many times the
Google PageRank is used every time you search on Google, you can tell these three
are powerful equations. They are simple but they are very, very
widely used. So, now, we're going to continue this
track of four lectures. We've talked about power control,
distributed protocol. We talked about the second-price auction, we
talked about PageRank. And now, collaborative filtering for
recommendation, like on Netflix. And these four lectures also introduce the
language of optimization, then game, then graph, and now learning theory.
Now, the backdrop is Netflix. To those of you who are in the US, you
know Netflix very well; it started its DVD rental business a while back, actually, in
1997. So, the story says that, the founder of
Netflix was tired of paying fees to the DVD rental stores.
If you remember those days, there were a lot of those on the street corners.
And if you don't return the DVD in time, or the VCD in time, or the tape in time,
then you pay extra fees. He says, well, that's not very appealing
to me. So instead, he says, you can keep the DVDs
for as long as you want, okay? As long as you keep paying the monthly
fee, that's actually a great thing for the company.
You don't return, I don't send you new ones, okay?
So, you can keep paying the monthly fee without getting any new DVD.
This idea is going to show up again in a completely different context of sliding
window control in TCP, congestion control. That would be in Lecture fourteen.
But it is effectively the same concept. You don't return, I don't give you more.
And by 2008, in the US, there were about nine million users of Netflix DVD rental.
And then, around that year, Netflix started trying out something quite new: using
the internet to deliver video. The internet as the medium, instead of the
postal service, to deliver video. So, the content is stored in some video
servers, and is sent through the internet, and maybe wireless networks, to your internet
connected devices, which could be TVs or set-top boxes, but also could be game
consoles, smart phones, tablets, PCs and so on.
And by spring 2011, it had 23 million customers.
8:22
And, in summer of 2011, it was counted that about one in every four, a quarter of
the bits going through the internet that month was Netflix traffic.
That was huge, okay? Now, of course, video files are very
big, and therefore, if you count by volume, you're going to get a big
percentage. But still, one in four, that's a lot.
And then, in September 2011, you may remember that Netflix tried to split into
two different businesses, separating DVD rental from online streaming,
and then they later put them back together, but the billing
remained separated. Later, we will look at the
networking technology involved in multimedia streaming over the internet, in Lecture, I
think, seventeen. We will look at Netflix, Amazon Prime,
Google, HBO Go, IPTV and so on, and look at the differences across the models.
Today, this Lecture four, the focus is, however, on the social network dimension
of recommendation system, okay? You and I, the customers on Netflix, also
form a network. These 23 million customers, an even bigger number now,
form a social network. They don't directly interact, but the way
that they watch and rate movies will be used and leveraged by Netflix to make
better individualized recommendations. Think about it this way: you're on an
online platform, okay? Whatever that platform might be.
And there are, maybe, 100,000 of you. By scaling out, scaling up the social
network, we actually can then scale down the individualized recommendation, because
we get to see how other like-minded people like you behave, what they like
and don't like. And we have much more data.
So, even if you don't provide a huge amount of activity on the platform, we get
to see others. And this is the basic idea of
10:39
Collaborative Filtering, CF. Now, there are many kinds of
recommendation systems. For example, on Amazon, you may notice
that each time you refresh the homepage after you log in, they will recommend
different kinds of products to you. That's largely based on what's called
content-based filtering. Basically, if you purchased a certain kind of
product before, like a certain kind of electronics, then they will recommend
similar electronics to you in the future. If you buy a book written by Stephen
Hawking, then they will try to recommend new books by Stephen Hawking.
That is called content-based filtering, okay?
Look at what you bought, what you'd liked, and then recommend accordingly.
For YouTube, as we will see in a few lectures' time, the way they make
recommendations is largely driven by so-called co-visitation counts.
Briefly speaking, it means that they will look at how many users watch two
videos back to back, okay? And if so, then we attach a score, the
co-visitation count, to these two videos, say A and B.
And then, this will give you a way to start quantitatively deciding which video
you should recommend. We'll come back to this later.
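Here is a minimal sketch of co-visitation counting; the watch sessions are made up, and "co-visitation" is simplified to two videos watched consecutively in one session.

```python
from collections import defaultdict

# Made-up watch sessions: each list is one user's videos in order.
sessions = [
    ["A", "B", "C"],
    ["A", "B"],
    ["C", "B"],
]

co_visits = defaultdict(int)
for session in sessions:
    for x, y in zip(session, session[1:]):   # consecutive (back-to-back) pairs
        pair = tuple(sorted((x, y)))         # count A->B and B->A together
        co_visits[pair] += 1

# To recommend after watching some video, rank the others by their
# co-visitation count with that video.
def recommend_after(video):
    scores = {(set(pair) - {video}).pop(): count
              for pair, count in co_visits.items() if video in pair}
    return sorted(scores, key=scores.get, reverse=True)

print(recommend_after("B"))   # videos most often watched alongside B
```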
In Pandora, if you have used it, the recommendation of music is mostly
expert-driven. But, as a consumer, you get to thumbs-up or
thumbs-down a song, just like in some discussion forums you get to vote up or down.
Now, Netflix does not want to use expert-driven movie review, even though
there's no shortage of movie reviewers, professional ones out there.
Instead, it uses collaborative filtering. The idea is that we not only look at
what you like, but also at what people similar to you liked and
watched. The question is, how do we define
similarity between people, or, flipped the other way around, between movies?
Before going further, let's first more rigorously define this recommendation
system. A recommendation system is a very helpful
feature, okay? For consumer stickiness, for
inventory control, and so on and so forth. Now, in the case of Netflix, you can think
of this as a, say, a black box. There are inputs, outputs.
What are the inputs? First of all, the user ID, okay?
That's your login. We're going to index each user by u.
Second, movie ID, we're going to index each movie by i.
And then, there will be of course, a rating which is really a number drawn from
one, two, three, four, five stars, and we call this r sub ui, okay?
R for rating, of user u for movie i. And the time that this review
was entered is t sub ui. Of course, there's also the review
text, okay? But we're not going to talk much about
text reviews in today's lecture. So, we've got u and i, and rui, and tui.
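So each input record is essentially a tuple of these four fields; a small sketch, with made-up user IDs, movie IDs, stars, and dates:

```python
from collections import namedtuple

# Each input record: user u, movie i, rating r_ui (1-5 stars), and
# the time t_ui when the rating was entered. Values are made up.
Rating = namedtuple("Rating", ["u", "i", "r", "t"])

ratings = [
    Rating(u=1, i=10, r=5, t="2006-03-01"),
    Rating(u=1, i=11, r=3, t="2006-03-02"),
    Rating(u=2, i=10, r=4, t="2006-03-05"),
]

# The ratings form a very sparse user-by-movie table: most (u, i)
# pairs are missing, and those are what the system must predict.
known = {(x.u, x.i): x.r for x in ratings}
print(known.get((2, 11)))   # None: user 2 never rated movie 11
```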
How many rui's and corresponding tui's are we talking about here?
Actually, a lot. Think of millions of users and tens of
thousands of movie titles. Of course, only a few percent of the users
actually watch any substantial number of the movies
out there. And, among all the movies that you
actually watched, you likely will only rate a few of them.
Some people like me, actually, rate none of them, I just don't like rating movies.
But, a lot of people rate. But, they don't necessarily rate every
single movie that they watched. Despite that, we're talking about the
order of billions of rui's for Netflix. Now, it's a much more challenging problem,
actually, when you just have very few ratings in your database, the cold-start
problem. But, Netflix doesn't have that problem now.
Alright, those are the inputs. What does the recommendation system do?
What is the output there? The output, primarily of course, is the
predicted rating; let's call it r hat ui, okay?
Now, in the case of the Netflix Prize, they actually knew the true rui.
They just didn't tell you, the competitor in the Prize competition.
In that case, you actually know the ground truth, and you can compare your
prediction r hat ui against the actual rui. But, of course, in the real operation of
Netflix, they don't know the ground truth; that's why they want to recommend that you
watch their movies, okay? So, you are just going to have to believe
that if it works well on the test set, where you know the ground truth, it is
going to work well on the rest of the system too, okay?
So, this r hat however, could be a fractional number, it could be a real
number, for example, 4.2 star. You cannot rate 4.2 star but the
prediction might say 4.2. So, you can interpret that as maybe a twenty
percent chance that this user u will rate this movie i with five stars, and
maybe an eighty percent chance he'll rate it as four stars.
Maybe that's an interpretation of the number 4.2. This r hat could also go above
five or below one, but then we have to clip it, so that if it is above five, it's
five, and if it's under one, we just call it one.
All right. How do we compare two recommendation
systems? What is the metric of [inaudible] that
we're going to use to do the comparison to say, look, this recommendation system is
better than the other one. What do you mean by better?
Well, there are three levels of understanding here, going from the most
relevant but least tractable to the most tractable but, perhaps, least relevant
among the three. The first level is that, of course,
eventually, what I, as Netflix, care about is customer satisfaction.
If I recommend that you watch some movie, do you like that recommendation in the end?
Okay. That actually is just very hard to gather
data. The second level is, all right, at least I
want to know what is the prediction effectiveness, okay?
If I recommend some movie, do you even watch it or not?
Let alone whether you like it or not? Again, that's hard to collect.
So, we're going to look at a proxy. Just like in Google PageRank.
Eventually, what really matters is, does the search inquirer actually find the
resulting search listing useful or not? But, that ultimate test is very hard to
quantify and collect data about. So, they use this proxy, PageRank, to say,
well, this is the importance of the nodes.
Here, we are going to just say, let's quantify the notion of prediction error
here by looking at r hat ui and rui, okay? Again, how do you know rui?
In real life, you don't. But, when you compare different
recommendation systems, you can hold out some known data as ground truth and then use
that as a test data set, okay? Well, of course, I would look at the
difference between the two. Maybe this overshoots, maybe this
undershoots. So, it can't just be the plain difference.
It could be the absolute value, okay? It could also be the L2 norm, the square
of the difference, okay? So, whether I overshoot by one star or
undershoot by one star, after squaring they all become positive numbers, meaning a positive
error term.
So, I'm going to square them. And, how many do I have?
Let's say, I've got a C of, of those ui pairs.
19:29
So, I'll have to sum up across those ui pairs, and the C of them, this squared
difference. But, since you squared the terms, you
actually changed the scale; to bring the scale back down, people often put a square
root. And that is the so-called Root Mean Squared
Error, RMSE. This is effectively the so-called L2 norm of a
vector, the vector being the error vector: the
difference between r hat and r across all these ui pairs.
You're going to look at each entry of this vector, square it, then
average it and then take the square root. Of course, when you minimize an error,
whether you minimize the square root of the error or without a square root the
solution won't change, okay? So, mean squared error or root mean squared
error, either one, can be used equivalently as the objective function that you want to
minimize. All right.
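The RMSE computation can be sketched in a few lines; the predicted and true ratings below are made-up numbers.

```python
import math

# Made-up test set: predicted ratings r_hat_ui and ground-truth
# ratings r_ui, keyed by (user u, movie i) pairs.
r_hat = {(1, 10): 4.2, (1, 11): 3.0, (2, 10): 2.5, (2, 12): 4.8}
r     = {(1, 10): 4.0, (1, 11): 3.0, (2, 10): 3.0, (2, 12): 5.0}

def rmse(r_hat, r):
    # RMSE = sqrt( (1/C) * sum over the C ui pairs of (r_hat - r)^2 )
    C = len(r)
    return math.sqrt(sum((r_hat[ui] - r[ui]) ** 2 for ui in r) / C)

print(round(rmse(r_hat, r), 4))   # the scaled L2 norm of the error vector
```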
So, now, we're looking at this RMSE or MSE as the quantitative metric describing how
wrong your recommendations are across this set of test points, and we'll use that as
a proxy for these eventual goals. So, our job is now to say: give me the
inputs, okay? The u and i, and rui and tui, and I'm going to
find a way to make recommendations and give you an output r hat, okay?
I will train this black box by looking at the known r's and the resulting
r hats. How do I train them?
By minimizing the RMSE. Then, I say, alright, good, I'm done with
training this black box. Now, throw at me some questions where I
don't know the ground truth r. Maybe you, as the examiner, know the
ground truth r. Well, good, then keep it to yourself and
I'm going to make a guess r hat. Then, you check your ground truths, see
how good I am. That is the spirit of the Netflix Prize.
It was announced in October 2006. And, the challenge is to say, can you make
a really good recommendation system? I'll give you data to train your
recommender. Now, can you make it work really well,
better than what I currently have, with "I" meaning Netflix?