So, by understanding the underlying statistics, we can perhaps do better than this simple averaging. The third set of questions concerns signal processing of various kinds. We can say that, maybe, we will look at the timing of these reviews. We'll look at how recent each one is.
Is there a temporal pattern, like what we briefly mentioned in the last lecture on Netflix? And we will later look at a specific signal processing method, Bayesian estimation, and maybe at voting.
Maybe we should ask reviewers, instead of just aggregating into a scalar like this. We should say: why don't each of you rank all the products? For example, all the LCD HDTVs out there. And then I'll aggregate the lists from Alice, from Bob, from Chris, and so on, and generate a ranking order that way.
Instead of condensing the vector of reviews per product into a scalar and then ranking by that scalar, I ask each individual to give me a complete ranking of the competing products, and then I aggregate these rank-ordered lists. This would be a very different approach.
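To make this concrete, here is a minimal sketch of one simple rank-aggregation rule, a Borda count. The lecture doesn't name a specific rule, so this choice, along with the reviewer names and product labels, is purely illustrative.

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Aggregate individual rank-ordered lists via Borda count.

    Each ranking is a list of products, best first. A product in
    position i of an n-item list earns n - 1 - i points; products
    are then sorted by total points.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, product in enumerate(ranking):
            scores[product] += n - 1 - position
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical complete rankings of three competing TVs.
alice = ["TV_A", "TV_B", "TV_C"]
bob   = ["TV_B", "TV_A", "TV_C"]
chris = ["TV_A", "TV_C", "TV_B"]

print(borda_aggregate([alice, bob, chris]))  # ['TV_A', 'TV_B', 'TV_C']
```

The point is that each person contributes an entire ordering, not a single number, and the aggregation operates on those orderings directly.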
It's impractical for Amazon, because very few people will be in the position, or in the mood, to do a complete ranking of all the competing products according to some scale. But in other scenarios, such as voting, like what we'll encounter in the next lecture, this will be useful.
So before proceeding further, let's take a look at a couple of quick examples. One example: on a certain day in 2011, we picked two HDTVs, one from Philips and one from Panasonic, okay? The one from Philips got four stars out of 121 reviews. This is a simple arithmetic mean.
Simple averaging. The Panasonic one got almost four and a half stars, but out of only 54 reviews. So between four stars from 121 reviews and four and a half stars from 54 reviews, which one would you buy? How much do you trust this four- or four-and-a-half-star average, given the number of reviews behind it, and how do we quantify that trust?
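One simple way to make this precise, a sketch rather than Amazon's actual method, is to attach a standard error to each average. The means and review counts below come from the example; the per-review standard deviation of 1.0 stars is an assumption, since the lecture gives only the averages and counts.

```python
import math

def mean_with_standard_error(mean, n, sample_std):
    """Return the mean and its standard error, std / sqrt(n)."""
    return mean, sample_std / math.sqrt(n)

# Means and review counts are from the 2011 snapshot above;
# the per-review standard deviation of 1.0 stars is an assumption.
philips   = mean_with_standard_error(4.0, 121, sample_std=1.0)
panasonic = mean_with_standard_error(4.5, 54, sample_std=1.0)

for name, (m, se) in [("Philips", philips), ("Panasonic", panasonic)]:
    # A rough 95% interval: mean +/- 2 standard errors.
    print(f"{name}: {m:.2f} stars, roughly in [{m - 2*se:.2f}, {m + 2*se:.2f}]")
```

Under this particular assumption the two intervals happen not to overlap, but a larger per-review spread would change that. The point is only that more reviews shrink the uncertainty around the average.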
More generally, we could generate a whole histogram of one-, two-, three-, four-, and five-star ratings and look at the bar chart. Would the spread of the stars also make a difference? The average might be the same, but a histogram with ratings spread across all the stars looks very different from one with no two-, three-, or four-star ratings and only one- and five-star ratings. On the one hand, this kind of histogram should alert you, because it reflects a very bipolar opinion on the product or service.
On the other hand, you may say, you know what? I'm among those who hold one of these polar opinions, and therefore this product suits me very well.
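To see why the spread matters even when the averages agree, here is a small sketch with two star histograms that share the same mean but differ in variance. The counts are made up for illustration.

```python
import math

def histogram_stats(counts):
    """Mean and standard deviation of a star histogram.

    counts[k] is the number of reviews giving k+1 stars.
    """
    n = sum(counts)
    mean = sum((star + 1) * c for star, c in enumerate(counts)) / n
    var = sum(c * ((star + 1) - mean) ** 2 for star, c in enumerate(counts)) / n
    return mean, math.sqrt(var)

# Hypothetical histograms: [1-star, 2-star, 3-star, 4-star, 5-star] counts.
concentrated = [0, 0, 10, 80, 10]   # most reviews near four stars
bipolar      = [25, 0, 0, 0, 75]    # only one- and five-star reviews

for name, h in [("concentrated", concentrated), ("bipolar", bipolar)]:
    mean, std = histogram_stats(h)
    print(f"{name}: mean = {mean:.2f}, std = {std:.2f}")
```

Both histograms average exactly 4.0 stars, yet the second has almost four times the standard deviation of the first; the scalar average alone cannot distinguish them.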
A second example: this is again from a particular month in 2011, looking at the reviews of an iPod Touch. One chart records the most helpful reviews. So we look at the reviews of reviews: the number of people who found each review useful and the percentage of people who found it useful. We keep only the reviews believed to be very useful and record their associated numerical ratings. On the other hand, we have a chart that records just the most recent reviews. You can see a clear difference between the two.
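The two charts amount to two different sort orders over the same raw review records. A sketch of that, where the record fields and all the numbers are hypothetical rather than the actual 2011 iPod Touch data:

```python
# Hypothetical review records: (rating, helpful votes, total votes, day posted).
reviews = [
    (5, 180, 200, 10),
    (4, 95, 100, 35),
    (2, 3, 10, 90),
    (1, 40, 120, 80),
    (3, 8, 9, 95),
]

# Chart 1: the most helpful reviews, ranked by the fraction of
# readers who found them useful.
by_helpfulness = sorted(reviews, key=lambda r: r[1] / r[2], reverse=True)

# Chart 2: the most recent reviews, ranked by posting day.
by_recency = sorted(reviews, key=lambda r: r[3], reverse=True)

def average_rating(rs):
    return sum(r[0] for r in rs) / len(rs)

# Compare the average rating of the top three under each ordering.
print("most helpful:", average_rating(by_helpfulness[:3]))
print("most recent:", average_rating(by_recency[:3]))
```

In this made-up data the most helpful reviews average 4.0 stars while the most recent ones average 2.0, the kind of divergence the two charts showed.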
Now maybe there was a generational upgrade of the product past a certain point in time, and we would like to detect when that happened by looking at, say, the average scores over time. More generally speaking, we may say
that a rating, or ranking based on rating, is not a static object.
We have to look at the entire time series. You could take a full course on time series analysis, for example. Now let's say there are three different behaviors in this cartoon, A, B, and C, where the time axis is on the scale of, say, a month. If the time scale were minutes, the fluctuation wouldn't really matter. But let's say this is on the scale of months, and this is the review score you receive, aggregated over each week, for example.
One pattern is clearly cyclic, and you may say, gee, this product is not receiving a stabilized review. Another seems to be stabilizing around a pretty good number, a little above four. And the third is hard to tell; it hasn't stabilized yet. Can I trust the average over a moving window of a certain duration, say the most recent three months, or not?
It's actually hard to tell.
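As a sketch of that moving-window question, here is how one might compute a trailing average over weekly scores. The weekly numbers are made up to mimic the stabilizing pattern in the cartoon.

```python
def moving_average(scores, window):
    """Trailing average over the most recent `window` scores."""
    averages = []
    for i in range(len(scores)):
        recent = scores[max(0, i - window + 1): i + 1]
        averages.append(sum(recent) / len(recent))
    return averages

# Hypothetical weekly average ratings over a few months,
# drifting up and then settling a little above four stars.
weekly = [3.2, 3.5, 3.1, 3.8, 4.0, 3.9, 4.2, 4.1, 4.2, 4.3, 4.1, 4.2]

# A 12-week window approximates "the most recent three months".
print(moving_average(weekly, window=12)[-1])  # long-window view
print(moving_average(weekly, window=4)[-1])   # short-window view
```

Whether the long and short windows agree is one crude indicator of whether the series has stabilized; here they differ, reflecting the early drift still inside the long window.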
So now we have raised many questions, starting from whether we can trust simple averaging, through all kinds of statistical, signal processing, and natural language processing questions about how to enhance our ability to understand this process of turning a vector into a scalar and then ranking based on that scalar. Unfortunately, we don't have as many answers. Most of the questions we just raised do not have very stable answers in the context of Amazon. That's why it is still an art to decide when you should trust the average rating, or when you can trust the ranking based on the average ratings provided by Amazon. We'll come back to this example very soon.
So what we're going to do now is look at two cases where we do have a pretty good answer. One is averaging across a crowd, and looking at the wisdom of crowds. The other is Bayesian estimation, adjusting the ranking based in part on the number of ratings.
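As a preview of that second case, here is a minimal sketch of one widely used adjustment: shrinking each product's average toward a global prior mean, weighted by its number of ratings. The formula and the constants are illustrative assumptions, not necessarily the estimator the later lecture derives.

```python
def adjusted_rating(mean, n, prior_mean, prior_weight):
    """Shrink a product's average toward a prior mean.

    With few ratings the adjusted score stays near the prior;
    with many ratings it approaches the product's own average.
    """
    return (n * mean + prior_weight * prior_mean) / (n + prior_weight)

# Assumed prior: a site-wide average of 3.5 stars, worth 25 "virtual" reviews.
PRIOR_MEAN, PRIOR_WEIGHT = 3.5, 25

# The two TVs from the earlier example: 4.0 stars from 121 reviews
# versus roughly 4.5 stars from 54 reviews.
print(adjusted_rating(4.0, 121, PRIOR_MEAN, PRIOR_WEIGHT))  # ~3.91
print(adjusted_rating(4.5, 54, PRIOR_MEAN, PRIOR_WEIGHT))   # ~4.18
```

Notice that the product with fewer reviews is pulled harder toward the prior, which is exactly the kind of adjustment based in part on the number of ratings.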