So in general we are looking at this tradeoff through a picture called the rate-distortion curve. I plot the rate R on the x-axis, that is, the playback rate or the encoding rate of the video, and the distortion D on the y-axis. Again, distortion can be measured by an L1 norm, an L2 norm, or whatever metric you like. And in general, the trade-off between the two is a convex-shaped curve.
And if you can invent a better compression, you are basically, geometrically speaking, pushing this convex curve toward the origin. In other words, for the same playback bit rate, I can give you a much lower distortion than before. Or, alternatively, for the same distortion that I require, instead of this much bit rate, I can get by with a much lower bit rate. So this represents an enhancement of the compression technology.
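As a rough illustration of that convex shape (a toy uniform quantizer, not any real codec), the sketch below encodes a random signal at different rates, in bits per sample, and measures the resulting distortion with both the L1 and L2 norms:

```python
import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(10_000)   # stand-in for a media signal

for bits in range(1, 9):               # rate R, in bits per sample
    levels = 2 ** bits
    lo, hi = signal.min(), signal.max()
    step = (hi - lo) / levels
    # Crude uniform quantizer: round each sample to its cell's midpoint.
    idx = np.clip(np.floor((signal - lo) / step), 0, levels - 1)
    quantized = lo + (idx + 0.5) * step
    err = signal - quantized
    d1 = np.mean(np.abs(err))          # L1-norm distortion
    d2 = np.sqrt(np.mean(err ** 2))    # L2-norm (RMS) distortion
    print(f"R = {bits} bits/sample: D_L1 = {d1:.4f}, D_L2 = {d2:.4f}")
```

Each extra bit of rate cuts the distortion roughly in half, so the curve drops steeply at low rates and flattens out at high rates: exactly the convex shape just described.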
Now, which bit rate should I pick then? Part of that depends on the distortion that's tolerable, including factors like the kind of screen you're using: is it a retina display or not? Part of that depends on the channel condition: if the channel is in a bad condition, for congestion reasons or air-interface interference reasons, then you may say, well, let's take a smaller bit rate. And part of that can also, in today's economic environment, depend on the usage quota: quota-aware video adaptation looks at how much quota you still have, or how much we project you will have toward the end of the billing cycle, and adjusts the bit rate accordingly. So in general it can be a combination of these factors, as in the sketch below.
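Here is a minimal sketch of such a combined decision; the rate ladder, the function name, and the viewing-time assumption are all illustrative, not any deployed adaptation algorithm:

```python
# Hypothetical quota-aware bit-rate picker (all names and numbers assumed).
AVAILABLE_KBPS = [400, 1000, 2500, 5000]   # encoder's rate ladder (assumed)

def pick_bitrate(channel_kbps: float,
                 quota_left_mb: float,
                 days_left: int,
                 hours_per_day: float = 1.0) -> int:
    """Return the highest ladder rate that fits both channel and quota."""
    # Budget: spread the remaining quota over the projected viewing time.
    seconds = days_left * hours_per_day * 3600
    budget_kbps = (quota_left_mb * 8 * 1000) / max(seconds, 1)
    cap = min(channel_kbps, budget_kbps)
    feasible = [r for r in AVAILABLE_KBPS if r <= cap]
    return max(feasible) if feasible else AVAILABLE_KBPS[0]

# A decent channel but a nearly spent monthly quota -> pick the lowest rate.
print(pick_bitrate(channel_kbps=3000, quota_left_mb=2000, days_left=10))
```

The design choice is simply to take the most conservative of the two constraints: the channel caps what you can receive right now, and the quota budget caps what you can afford over the rest of the billing cycle.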
And you may wonder: what can I compress? How can I actually compress by a factor of 100, or a factor of 1000? Are you sure that only 1%, or 0.1%, of the signal is actually needed?
Well, first of all, there are distortions, so don't forget about that. And second, you'd be surprised to see the amount of redundancy in signals. For example, in motion pictures, frame-to-frame similarities can be striking. Certainly for talk shows, but even for motion-rich movies, because human perception, the neurons in our brain, relies on the similarities between one frame and the next frame to register motion. That's how motion is registered. So it's precisely because of the redundancy. When you transmit, for example, you can say, "Gee, I will just transmit this picture. It's two guys fighting." And the next picture is still two guys fighting, except this guy's arm moves upward. So just focus on that part; the rest remains the same. That's one way to compress: take advantage of redundancy.
Another way is to take advantage of human visual limitations. Even though there are differences, we may not be able to process them in the brain. For example, there is a certain range of frequencies in the signal where people tend not to be able to detect differences very well. Hence techniques like transform coding: you put the signal into the right coordinate representation, say the frequency domain, then you look at the components of the signal and start ignoring the higher-order ones.
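A minimal sketch of that idea (a toy transform coder using the FFT, not an actual image or video codec): represent the signal in the frequency domain, keep only the low-frequency components, and throw the rest away.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0, 1, 1024, endpoint=False)
# A mostly smooth signal plus some hard-to-perceive high-frequency detail.
signal = np.sin(2 * np.pi * 3 * t) + 0.1 * rng.standard_normal(t.size)

coeffs = np.fft.rfft(signal)   # the "correct coordinates": frequency domain
keep = 32                      # keep only the 32 lowest-frequency components
coeffs[keep:] = 0              # ignore the higher-order ones
approx = np.fft.irfft(coeffs, n=signal.size)

ratio = signal.size / (2 * keep)   # real + imaginary part per kept coefficient
distortion = np.sqrt(np.mean((signal - approx) ** 2))
print(f"~{ratio:.0f}:1 compression, L2 distortion {distortion:.3f}")
```

Real codecs use the discrete cosine transform on small blocks rather than a full FFT, but the principle is the same: after the transform, most of the energy sits in a few components, and the rest can be dropped with little perceptible distortion.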
The third way is to exploit the statistical structure of the signal. Certain things just happen a lot more often than others. For example, in text or speech encoding, people use Huffman coding, where you give a shorter description length to more frequently occurring syllables or phrases. In this way, the average, that is, the expected, length you need to encode a paragraph or a textbook will be smaller. And indeed, in, say, Morse code, you will see that the more often used letters of the alphabet are given a shorter representation.
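Here is a minimal sketch of Huffman coding in Python (a toy over individual letters, not a production text or speech coder):

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Frequent symbols get shorter codewords, so expected length drops."""
    counts = Counter(text)
    # Heap entries: [weight, tiebreak, [symbol, codeword], ...]
    heap = [[w, i, [sym, ""]] for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    n = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)   # two least frequent subtrees
        hi = heapq.heappop(heap)
        for pair in lo[2:]:        # extend codewords: left branch gets "0",
            pair[1] = "0" + pair[1]
        for pair in hi[2:]:        # right branch gets "1"
            pair[1] = "1" + pair[1]
        n += 1
        heapq.heappush(heap, [lo[0] + hi[0], n, *lo[2:], *hi[2:]])
    return {sym: code for sym, code in heap[0][2:]}

text = "this is an example of huffman coding"
code = huffman_code(text)
bits = sum(len(code[ch]) for ch in text)
print(f"{bits} bits vs {8 * len(text)} bits of plain 8-bit ASCII")
```

On this sample, frequent letters like the space end up with short codewords and rare letters with long ones, so the total comes out well under 8 bits per character.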
So whether it is redundancy, human visual limitations, or statistical structure in the signal, there are many places where you can compress. And indeed, people have worked very hard over the past twenty years and more to compress motion pictures. MPEG is the key family of compression standards. For example, MPEG-1, back in 1992, was used for VCD, and we're talking about something like one megabit per second bit rate for encoding and playback.
Then there is MPEG-2, which is also called H.262, because the ITU, the International Telecommunication Union, a United Nations standardization body, names a bunch of standards H.something. Whatever the name, it was done in 1996, and the DVD, which, you know, lasted ten-plus years and was pretty much the dominant medium for storing TV shows and videos, uses it, now at about ten megabits per second, much better quality than VCD. It is actually hard to find VCDs these days.
And, of course, you must have heard of MP3; you must have listened to MP3 music. These are audio tracks recorded and compressed using that standard. It is actually not a standalone standard; it is Layer III of MPEG-2. "Layer" here has nothing to do with the protocol layers in the networking community. This is one module of the MPEG-2 motion picture standard, the one where you compress the audio track.
Now you may wonder, what about MPEG-3? There are MPEG-1, -2, and -4. Well, MP3 is not MPEG-3; it's part of MPEG-2. There actually is no MPEG-3: it was started, but then it was absorbed back into MPEG-2. And MP3, for example, can get a compression ratio of twelve to one for music. For people who listen in a crowded spot or on the subway, that's probably good enough. But if you listen in a quiet space, then you can tell the difference between MP3 music and, for example, DVD quality.
And then in 2000 came MPEG-4, and this is the current family of video compression standards that people use. For example, in 2004 came Part 10 of MPEG-4; people started to realize, well, let's name the different parts, instead of naming them MPEG-5, -6, -7, -8, and so on. Now, this Part 10 is also called H.264, and at this point it is pretty much the major video compression standard. It's got sixteen so-called profiles, which give you quite a bit of flexibility for different types of video. It is used, for example, for HDTV and Blu-ray. For HDTV, if you want real HD, it's more like twenty megabits per second; for Blu-ray, something like forty megabits per second.
And these standards can get a factor of 100 easily; they tend to be on the order of 100-to-1 to 200-to-1 compression ratios. And that's what made it possible to squeeze that many bits through the Internet.
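To see what a 100-to-1 or 200-to-1 ratio buys you, here is a back-of-the-envelope check (the frame size, frame rate, and color depth are common assumptions for 1080p HD, not figures from the lecture):

```python
# Raw bit rate of uncompressed 1080p video, under assumed parameters.
width, height = 1920, 1080
bits_per_pixel = 24    # raw RGB, 8 bits per color channel
fps = 30

raw_bps = width * height * bits_per_pixel * fps
print(f"raw:    {raw_bps / 1e6:>7.0f} Mbit/s")   # about 1493 Mbit/s
for factor in (100, 200):
    print(f"{factor}:1 -> {raw_bps / factor / 1e6:>7.1f} Mbit/s")
```

At 100:1 that is about 15 Mbit/s, and at 200:1 about 7.5 Mbit/s, which is right in the neighborhood of the DVD and HDTV figures above, and small enough to stream over the Internet.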
Now these are not the only ones; there are quite a few others. For example, H.261 was once upon a time quite a popular one for IP video. Apple has QuickTime, which is merging into MPEG-4 now; Windows has Windows Media Player; Adobe has Flash; and RealNetworks has RealPlayer.
So these are the main types of playback formats and standards, and as you can see, there are still proprietary ones, and that's what makes things a little bit difficult. For example, a lot of Apple devices don't like Flash, as Apple believes it consumes more energy than needed, and so on and so forth. Now, whichever side of the argument between Apple and Adobe is right, you can see that there are many different compression standards out there.
A lot of them utilize the redundancy from one frame to another, and this is what we call the group-of-pictures concept. So we're going to encode all these frames into blocks; each block is called a GOP, a group of pictures. And a group of pictures consists of three kinds of frames. Three types of frames, not three frames: three types of frames, called I, P, and B frames.
The I frame is the intra-coded frame, and each GOP always starts with an I frame. This is the frame whose encoding does not depend on the frames before and after it. For example, when you switch from two guys fighting to, say, one guy lying on the ground, you say, all right, I'm going to start an independent frame here.
And there will also be something called a P frame. P stands for predictive-coded, and this is the type of frame that depends on the previous I or P frame. So it would depend on the previous frame, as in the sketch below.
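To make the I-frame/P-frame idea concrete, here is a toy GOP encoder (my own illustrative sketch: real MPEG also does motion search and transform coding on the residuals, neither of which appears here). The first frame is stored whole; each later frame stores only the 8x8 blocks that changed relative to the previous frame:

```python
import numpy as np

B = 8  # block size

def encode_gop(frames):
    """Toy GOP: one whole I frame, then P frames of changed blocks only."""
    prev = None
    out = []
    for f in frames:
        if prev is None:
            out.append(("I", f.copy()))              # independent frame
        else:
            changed = {}
            for y in range(0, f.shape[0], B):
                for x in range(0, f.shape[1], B):
                    blk = f[y:y+B, x:x+B]
                    if not np.array_equal(blk, prev[y:y+B, x:x+B]):
                        changed[(y, x)] = blk.copy() # "the arm moved"
            out.append(("P", changed))               # depends on previous frame
        prev = f
    return out

# Two nearly identical frames: only one 8x8 block differs.
f0 = np.zeros((64, 64), dtype=np.uint8)
f1 = f0.copy()
f1[8:16, 8:16] = 255
gop = encode_gop([f0, f1])
print(gop[0][0], "stores", gop[0][1].size, "pixels;",
      gop[1][0], "stores", sum(b.size for b in gop[1][1].values()), "pixels")
```

On these two "fighting" frames, the I frame costs all 4096 pixels, while the P frame costs only the 64 pixels of the block where the arm moved: that is the frame-to-frame redundancy being exploited.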