Hello and welcome to week eight. This is the beginning of a three-week coverage of the exciting topic of image and video compression. Image and video compression represents one of the major topics in image and video processing. Compression is one of the enabling technologies behind the multimedia revolution we are experiencing. Without successful algorithms and standards, we would not be able to rent movies on a DVD, stream them on our computer, use FaceTime, share images and videos over the Internet, and use so many other cool applications involving images and videos. We divide compression techniques into loss-free, or lossless, techniques and lossy techniques. As the names imply, in the first case we have a reversible process. No information is lost, and the reconstructed image is identical to the original one. In the second case, information is lost, which results in deteriorated quality of the reconstructed data. However, this deterioration is controlled by the person performing the encoding, and an image or video of acceptable quality results, depending on the resources and the requirements imposed by the application. The first interesting question one might ask is, why are images and videos compressible? There are two primary reasons for that. One is that most images and videos have some structure in them, or they are correlated. In other words, they have redundancies, statistically speaking. We can uncover this redundancy and then remove it so that compression is achieved. This step is taken by both lossless and lossy compression schemes. The second reason pertains to lossy compression and has to do with the fact that there is information in images and videos that is not perceived by humans, and therefore it can be discarded. So this week, we address lossless compression approaches. Such approaches result in algorithms or standards for encoding images and videos, as well as other types of data.
Equally importantly, however, such source coding techniques, for example Huffman and arithmetic coding, are part of any lossy compression scheme, as we'll see in the next two weeks. We'll start in the first segment by introducing compression, discussing the need for it, and then listing the lossless compression techniques to be covered this week. As we'll see, such coding techniques have their roots in information-theoretic concepts. Therefore, we review some of the basic concepts of information theory needed for this class. The heart of the source encoding problem is the representation of the symbols generated by a source by binary code words. An important question arising is, what is the smallest possible rate, or the smallest average code word length, of a code for representing the source? The answer, due to Shannon in 1948, is that the smallest possible rate is equal to the entropy of the source, a quantity that measures the uncertainty of the source. We'll discuss two important and widely used techniques for generating codes with performance close to the entropy, the Huffman and the arithmetic codes. We'll also discuss their application to the FAX coding standards. We'll discuss as well the so-called dictionary techniques for encoding data with repeating patterns, such as text. Finally, we'll discuss the concept of prediction. The correlation in the data allows us to predict the value of a pixel based on past values, which then allows us to send to the decoder only the unpredictable part of the prediction system, which is the prediction error. The information theory concepts are rather mathematical, but we only need certain simple results, and more specifically their application to the lossless compression problem. They represent very interesting, intriguing, useful, and widely used results. So with that, let's start with this exciting material of week eight.
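Since the entropy just mentioned is the central quantity of this week, it may help to see it computed once. Below is a minimal Python sketch (not part of the lecture) that estimates the empirical entropy, in bits per symbol, of a sequence of symbols; the function name `entropy_bits` and the example source are my own illustrative choices:

```python
import math
from collections import Counter

def entropy_bits(symbols):
    """Empirical entropy H = -sum(p * log2(p)), in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# A source emitting 'a' half the time and 'b', 'c' a quarter each:
# H = 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits/symbol.
data = "aabc" * 100
print(entropy_bits(data))  # 1.5
```

No lossless code can, on average, spend fewer than 1.5 bits per symbol on this source, which is Shannon's bound referred to above.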
Let us start this lecture by defining what compression is. Signal compression, also called source compression or source coding, is the minimization of the number of bits used in representing a digital signal, such as the images and videos we are primarily interested in in this class, while either being able to exactly reconstruct the original data, which is referred to as lossless compression, or maintaining an acceptable quality of the reconstructed data, which is referred to as lossy compression. In this class, we are going to cover both lossless and lossy compression. So this week we'll talk about lossless compression, the following week about lossy compression of still images, and the third week about lossy compression of digital video. As will become clear relatively soon, lossless compression stands by itself, but it is also part of any lossy compression scheme. So, let's proceed to put together the pieces of this exciting topic of image and video compression. The natural question is, why do we need to compress signals? An answer can be found by looking at the raw data generated by various sources, or the raw data utilized to represent various signals. So in this table, there are some representative signals. For example, if we look at telephony applications, the speech signal there occupies this bandwidth. We sample at least at twice the highest frequency in the signal, so we sample at eight kilosamples per second. We use 12 bits per sample, and the raw rate is 96 kilobits per second. Moving down to wideband speech, doing similar calculations, we are now at 224 kilobits per second. Looking at music, wideband audio, we reach a rate of almost one and a half megabits per second. If we look at a medium-size still color image, 512 by 512 pixels, three channels, eight bits per channel, 24 bits per pixel, we are now at a size of 6.3 megabits. And if we look at videos, here is the CCIR 601 recommendation.
So this is the spatial resolution, 720 by 480 pixels, for the luminance component. We use eight bits per pixel. With the NTSC system, we have 30 frames per second. And then the two chrominance components are subsampled in one dimension according to this recommendation, so each of the chrominance components has 360 by 480 pixels. Again, eight bits per pixel, 30 frames per second, and we end up with a rate of 166 megabits per second. If we go to a larger spatial resolution, HDTV, then there is no subsampling, so there are three full channels. This is the spatial resolution, 24 bits per pixel, 60 frames per second, and we are now at the gigabit-per-second rate. So, compression primarily addresses two applications: the storage application, how much space it takes to store a specific image or video, as well as transmission applications, how long it takes to transmit a certain signal. So, let's look at a simple example here. If we were to store the CCIR 601 video that we just showed over here, and if we use a CD-ROM of 650 megabytes as the storage medium, then on one disc we could only store 31 seconds of video. If we use a DVD-5, with a capacity of close to 40 gigabits, then still we could only store four minutes of video. Therefore, if you were to store a whole movie, you can do the calculations based on how long the movie is; you would need many, many such CDs or DVDs. Similarly, if we look at the transmission or broadcasting application, then for HDTV terrestrial broadcasting, this is the transmission rate, almost 20 megabits per second. And therefore, we see that we need to compress the HDTV signal by a factor of about 70 in order to be able to fit it into the six-megahertz channel. So, signals in their native domain generate a tremendous amount of data, and without compression, we would not be able to store or transmit them efficiently. One natural and important question is, why are signals compressible? There are two reasons for that.
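The CCIR 601 rate and the storage times quoted above follow from simple arithmetic, which we can check directly. This is a small Python sketch of the calculation; I assume decimal megabytes/gigabytes for the disc capacities, which reproduces the lecture's figures:

```python
# Raw bit rate for CCIR 601 (NTSC) video as in the lecture's table.
LUMA = 720 * 480           # luminance samples per frame
CHROMA = 2 * (360 * 480)   # two chrominance components, subsampled in one dimension
BITS_PER_SAMPLE = 8
FPS = 30

rate_bps = (LUMA + CHROMA) * BITS_PER_SAMPLE * FPS
print(rate_bps / 1e6)      # 165.888, the "166 megabits per second" quoted

# Storage on a 650 MB CD-ROM and a 4.7 GB DVD-5 (decimal units assumed):
cd_seconds = 650e6 * 8 / rate_bps
dvd_seconds = 4.7e9 * 8 / rate_bps
print(round(cd_seconds))       # 31 seconds of video per CD
print(round(dvd_seconds / 60)) # about 4 minutes per DVD
```

The same kind of back-of-the-envelope calculation gives the factor of about 70 for squeezing a roughly 1.5 gigabit-per-second raw HDTV signal into a 20 megabit-per-second broadcast channel.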
The first one is that there exists statistical redundancy, or structure, in the data in the spatial, temporal, and spectral domains. If we are given, for example, an image with no structure in it, such as an image of white noise, then such an image is not compressible; none of the existing algorithms would be able to reduce the number of bits required to represent it. This redundancy manifests itself in the form of correlations in the data. One expression of this correlation is repeating patterns, such as in text data, which we can capture, for example, by building short dictionaries able to describe long strings of data. Alternatively, we can use these correlations to perform predictions. So in the spatial domain, if we look at any image, we can predict the intensity value at this location by utilizing, for example, the intensity values of these four neighbors. We can build, for example, a linear model to do this prediction. In compression, we are interested in the unpredictable part, or the prediction error, when we perform this prediction, because the decoder can carry out the prediction as well, assuming that the prediction model is known. Therefore, only the correction term is needed, which, when added to the prediction, gives rise to the reconstructed value. Similarly, when I look at two consecutive image frames, let's say frames t and t minus one, then I can predict the values at frame t using the motion information from t minus one to t. This is a first-order temporal prediction model that is used extensively in video compression. This statistical redundancy or structure in the data is exploited by both lossless and lossy compression schemes. Now, for lossy compression schemes, we also have the existence of perceptually irrelevant information. This is information we cannot see, or cannot hear when we deal with audio. Therefore, this information can be thrown away.
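The spatial prediction idea above can be sketched in a few lines of Python. The lecture's four-neighbor predictor and its weights are shown on the slide rather than stated here, so this illustration uses an even simpler linear predictor, the average of the left and upper neighbors; the point is only that on smooth image data the prediction errors are small even when the pixel values are not:

```python
def prediction_error(img):
    """img: 2-D list of ints. Predict each pixel as the average of its left
    and upper causal neighbors; return the grid of prediction errors."""
    h, w = len(img), len(img[0])
    err = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            left = img[i][j - 1] if j > 0 else 0
            up = img[i - 1][j] if i > 0 else 0
            n = (j > 0) + (i > 0)          # number of available neighbors
            pred = (left + up) // n if n else 0
            err[i][j] = img[i][j] - pred
    return err

# A smooth intensity ramp: large values, but tiny prediction errors.
img = [[10, 11, 12], [11, 12, 13], [12, 13, 14]]
print(prediction_error(img))  # [[10, 1, 1], [1, 1, 1], [1, 1, 1]]
```

Because the decoder can form the same predictions from already-reconstructed pixels, transmitting only these small errors, which a Huffman or arithmetic code then represents compactly, is exactly the predictive scheme described above.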
Of course, we need sophisticated models and techniques to efficiently capture and remove such information, because otherwise we run the risk of throwing away perceptually relevant information as well. Such sophisticated models do exist. They have been proposed in the literature, and they have also been incorporated into image and video compression standards, and they are very effective in capturing and removing such irrelevant information. One can generalize this in some sense and talk about the existence of information which is extraneous to the intended use of the image and video, in cases where a human is not the recipient of the compressed data. By all accounts, compression is one of the enabling technologies behind the multimedia revolution we are all experiencing. Without the existence of successful image and video coding algorithms and standards, we would not have been able to store a movie on a DVD, or stream a movie or a video clip from a live concert or from YouTube, or use video conferencing over Skype, or use FaceTime, and so many other cool applications that have changed the way we live and work today. So it is certainly a topic worth the three weeks of coverage we will provide in this class. There are a number of reasons contributing to the proliferation of compression applications, both lossy and lossless. One such element is the extraordinary theoretical results on compression which have been derived in the last 50 or so years. We will see some of these results here in the first week, like, for example, the by now 65-year-old source coding result attributed to Claude Shannon. A lot of advances have been made in being able to model and extract the perceptually irrelevant information in a signal, be it a video or an audio signal. There have been very successful compression standards, such as JPEG and the various MPEG and H.26x families of standards, to talk about.
It should be kept in mind that not all standards succeed, and that there are many application areas for which there are no standards. The processing speeds and general capabilities of DSP processors have increased tremendously, so that it is now possible to implement, for example, a video coder, which is a rather demanding task as we'll see, on a handheld device. Coupled with this are the technological advances, in general, in computers, networks, and telecommunications that make so many applications, like the few I mentioned on the previous slide, feasible. Let us proceed with lossless compression. As already mentioned, the objective in this case is the exact reconstruction of the original data from the compressed data. In other words, lossless compression is a reversible process; we can move back and forth between uncompressed and compressed data without loss of information. An ever-present question when we deal with signal compression is, how much can a specific signal be compressed? The compression ratio quantifies this. It is defined as the ratio of the size of the input signal over the size of the output signal. The greater this number, the more effective the compression scheme is. For lossless compression, this question is answered by the source coding theorem due to Shannon. As you'll see, it tells us that no matter how inventive we are, we cannot compress a signal below its entropy. This is a quantity we mentioned back in week six, when we were covering non-quadratic restoration techniques, but it is a quantity we will define more precisely this week. An important question, of course, remains: how precisely can one measure the entropy of a signal? As probably expected, the compression ratios in this case are low, certainly lower than with lossy compression. Lossless compression has a number of applications, such as the compression of text.
You don't want the report you wrote for school or work to read differently after it is decompressed. It also has applications with medical images, since there the specialist is not interested, in general, in the visual quality of the compressed image, but instead in the diagnostic quality of the image. So, with lossy compression, we run the risk that, due to the artifacts introduced by compression, an operation, for example, is ordered when one is not needed. Finally, as I have already mentioned a couple of times, and as we'll see starting next week, a lossless coder and decoder, called a codec, is part of any lossy compression scheme. Now, when we evaluate a compression scheme, certainly the coding efficiency is of the utmost importance. However, when we deal with actual implementations of codecs, other factors, such as the coding delay, that is, how long it takes to encode the signal, as well as the coder complexity, which can be measured in terms of memory requirements, power requirements, and operations per second, should also be taken into account. Lossless source coding techniques can be grouped into statistical methods, used when the source statistics are known. The primary representatives are Huffman codes and extended Huffman codes, which we will cover in this class, and you will actually see the application of Huffman coding to the encoding of FAX. The second group includes the so-called universal methods, used when the source statistics are not known but are instead estimated from the data. So I'll say a few words about arithmetic coding as well as dictionary techniques, and then we'll see the application of arithmetic coding when it comes to JBIG, one of the standards.
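As a small preview of the Huffman codes we will cover, here is a minimal Python sketch of Huffman code construction. This is an illustrative implementation of my own, not the lecture's; ties in the tree construction are broken arbitrarily, so the exact bit patterns may differ from a hand-built tree, but the code word lengths, and hence the average rate, are optimal for the given counts:

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a Huffman code from {symbol: count}. Returns {symbol: bitstring}."""
    # Heap entries: (weight, tiebreak, tree); a tree is a symbol or a pair.
    heap = [(w, i, s) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Repeatedly merge the two least probable subtrees.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, count, (t1, t2)))
        count += 1
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:
            code[tree] = prefix or "0"  # a single-symbol source still needs 1 bit
    walk(heap[0][2], "")
    return code

freqs = Counter("aaaabbc")     # p(a)=4/7, p(b)=2/7, p(c)=1/7
code = huffman_code(freqs)
print(code)                    # 'a' gets a 1-bit word; 'b' and 'c' get 2 bits
```

The resulting code is prefix-free, so a decoder can parse the bitstream unambiguously, and more probable symbols receive shorter code words, which is how the average length approaches the entropy bound discussed earlier.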