Hello, and welcome back. It's time now to present examples of
sparse modeling in image processing. Before we do that, we have to do one more
thing. Now, remember what we have.
We have a dictionary of K elements, K atoms, and every atom has N dimensions.
Now, what are the signals that we are going to process?
That's what we need to discuss for just one minute.
And the basic idea is very simple. We're going to be talking about images.
This is going to be our data, the noisy image.
The whole image. This is what we are looking for.
Now, we are not going to work on entire images,
we are going to work on patches. And this is the R_ij.
R_ij is just a simple binary matrix that extracts the patch around pixel (i,j).
So, if we have an image, for every pixel we construct a patch around it.
Or we could consider (i,j) the upper-left corner; any consistent coordinate convention works, and we extract patches.
For example, eight-by-eight patches, as we have been discussing.
Now, it's very important that we do not take disjoint blocks as we did in JPEG.
We actually take all the patches that are fully overlapping.
So, we have an image, and we take a patch, and we take the patch next to it, and the patch next to it.
So, for every pixel, say the upper-left corner, we construct the eight-by-eight patch, and R_ij is the binary matrix that extracts that patch.
And it's the patch that we want to sparse code with a learned dictionary.
So here is the sparsity.
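Just to fix ideas, here is a minimal NumPy sketch of extracting all fully overlapping eight-by-eight patches from an image; the function name and the patches-as-columns convention are only for this illustration.

```python
import numpy as np

def extract_patches(image, p=8):
    """All fully overlapping p-x-p patches of a 2-D image,
    returned as a (p*p, num_patches) array: each column is one
    vectorized patch, the convention used for dictionary learning."""
    H, W = image.shape
    cols = [image[i:i + p, j:j + p].ravel()      # (i, j) = upper-left corner
            for i in range(H - p + 1)
            for j in range(W - p + 1)]
    return np.stack(cols, axis=1)
```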
So, we're not working on the whole image at once, we're working on all the
overlapping patches of the image. And, what we have is as we discussed
before, the model is for the individual patches.
We're going to construct a dictionary that works, for example, on all the
overlapping patches of a given image at the same time.
And note here, we are doing the optimization over the signal, which is the whole image; we are looking for the sparse codes, which is the second part; and, as we're going to see in a second, we're also going to learn the dictionary.
We could learn the dictionary offline, or we could learn it and adapt it to the image, as we're going to explain.
Once again, this is nothing more than a binary matrix that extracts the patch.
So think of it like this: you have the image, you pick a patch, you do sparse coding; you pick the next patch, you do sparse coding.
Now, what about the dictionary?
The first option is to train on a large database.
So, you do that offline: you train on a large database.
The second option, and these are not contradictory as we're going to see in a
second, is to use the image itself. You travel all around the patches of the
image, that gives you a lot of training data already.
And the basic idea is that normally we combine both of them.
That's one possibility: you learn a dictionary offline with a bunch of off-the-shelf images, you use that as the initialization of the dictionary for K-SVD, and then you get the new image.
Even though it's noisy, you adapt the dictionary to the image.
So, you run a few iterations of this for the particular image: dictionary adaptation with K-SVD, sparse coding, dictionary adaptation, sparse coding.
That's one option. The other is to forget completely about this, start from any initialization, and train only on the image itself.
So, there are many options, and which one you pick depends on the application.
Now, if you are in a rush and don't have the computational time while denoising, you don't have to update the dictionary; you simply train it with a database.
Let's say that you want to do faces: you train it with a database of faces and you use that dictionary.
If you do have the computational time, it's always better to adapt the dictionary to the particular image.
And we have tons of data for that, because we can use all the patches of the image.
So, once again, this is our general formulation, but now we have D either because we are starting from scratch for the particular image or because we are adapting the pre-learned dictionary to the current image.
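To make this concrete, the energy being minimized can be written in the standard form used for K-SVD denoising (spelled out here as an illustration: y is the noisy image, x the image we reconstruct, R_ij the patch-extraction matrices, alpha_ij the per-patch sparse codes, and lambda and mu_ij weighting parameters):

$$
\min_{x,\,D,\,\{\alpha_{ij}\}}\;\frac{\lambda}{2}\,\lVert x-y\rVert_2^2\;+\;\sum_{ij}\mu_{ij}\,\lVert\alpha_{ij}\rVert_0\;+\;\sum_{ij}\frac{1}{2}\,\lVert D\alpha_{ij}-R_{ij}\,x\rVert_2^2
$$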
The first term ties us to our data. And we have to reconstruct the image, we have to compute the codes, and we have to compute the common dictionary: one dictionary for all the patches in the image.
We are not allowed to use a dictionary per patch; that would be ridiculous.
So, we have three things to optimize for, and the basic idea is very simple: you fix two, and you optimize for the third one.
For example, you assume that x is fixed, so you can ignore the fidelity term; you also fix D, and you do sparse coding.
That is the first part of K-SVD. Then, you fix the codes and you update the dictionary; that iteration is basically K-SVD.
So, you have the image, you go over the patches, and every patch becomes a column of the large X matrix that we saw in the previous video.
And then you do K-SVD for all of them, okay?
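Putting the two alternating steps together, here is a compact sketch of that patch-level loop. To keep it short and self-contained, the dictionary update below is the simpler MOD-style least-squares update rather than the atom-by-atom K-SVD update; the alternation between sparse coding (here, a basic OMP) and a dictionary update is the same idea.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal Matching Pursuit: code y with at most k atoms of D.
    Assumes the columns (atoms) of D are unit-norm."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))   # best-correlated atom
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha

def learn_dictionary(Y, D, n_iters=10, k=5):
    """Y: one (noisy) patch per column. Alternate sparse coding and
    dictionary updates, as in the lecture."""
    for _ in range(n_iters):
        A = np.column_stack([omp(D, y, k) for y in Y.T])   # sparse codes
        D = Y @ np.linalg.pinv(A)                          # MOD-style update
        D /= np.maximum(np.linalg.norm(D, axis=0), 1e-12)  # renormalize atoms
    return D, A
```

When adapting to a single noisy image, you would run only a few such iterations, as discussed above.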
So you are sparse coding every patch; when you're finished, you update the dictionary; you encode again, you update the dictionary.
When you are done with the dictionary, all that's left is to compute x.
So, you don't touch the codes anymore, and you don't touch the dictionary anymore; you have to solve the problem with D and the alphas already fixed.
Of course, the sparsity term is gone: x shows up only in the fidelity term and in the patch-reconstruction term, and it is very easy to show that the result is a weighted average of the reconstructed patches.
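Concretely, with D and all the codes fixed, the remaining problem in x is quadratic, and its minimizer is (same notation as above):

$$
\hat{x}\;=\;\Big(\lambda I+\sum_{ij}R_{ij}^{T}R_{ij}\Big)^{-1}\Big(\lambda y+\sum_{ij}R_{ij}^{T}D\,\hat{\alpha}_{ij}\Big)
$$

The matrix being inverted is diagonal, since each diagonal entry just counts how many patches cover that pixel, plus lambda; so this is exactly a per-pixel weighted average of the patch reconstructions and the noisy input.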
So what happened is that every pixel is touched by multiple patches.
So let me just draw that here, for example.
We have an image, and we have a pixel.
Now, there are multiple patches that touch that pixel.
For example, this one that has the pixel near its bottom-left corner.
There is also this one. And, let me just use a different color: there is, for example, a patch here.
How many are there?
If we have eight by eight, we have 64 patches that are touching that pixel,
unless the pixel is at the border of the image.
Let's ignore those for a second. For every one of those patches, we have a sparse code computed with the dictionary that we have learned from all the patches in the image.
So, every single one of these patches can be reconstructed as D times the optimal alpha that we have computed for that patch (i,j).
Every single one has been reconstructed. How do we reconstruct the pixel?
We average all those patch reconstructions. You can add weights to the average; that's what the theory tells you.
If the pixel is in the center of a patch, then that patch is more important than if the pixel is in the corner of the patch, but that's just a technicality.
You can simply reconstruct every patch with its sparse code and the learned dictionary, and average those 64 values if you're using eight-by-eight patches.
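Here is a matching sketch of that averaging step, assuming the reconstructed patches (D times each alpha) are stored as columns in the same order in which the patches were extracted; this version is the plain unweighted average, and border pixels automatically receive fewer contributions.

```python
import numpy as np

def average_patches(recon, H, W, p=8):
    """Reassemble an H-x-W image from overlapping patch reconstructions.
    recon: (p*p, num_patches), columns ordered row-major by the
    upper-left corner (i, j) of each patch."""
    accum = np.zeros((H, W))
    count = np.zeros((H, W))
    idx = 0
    for i in range(H - p + 1):
        for j in range(W - p + 1):
            accum[i:i + p, j:j + p] += recon[:, idx].reshape(p, p)
            count[i:i + p, j:j + p] += 1
            idx += 1
    return accum / count    # interior pixels average 64 values when p = 8
```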
So now, this is all you do: sparse coding for every patch, and K-SVD to learn the dictionary.
So, I think we are ready now to see examples.
Let's start with an image in gray values, no color.
This is the source image, and this is the noisy image; we have added Gaussian noise with this variance.
Now, we can initialize the dictionary with random patches from the noisy image.
We don't have the source image anymore; it is shown just for reference. The noisy image is all we have.
We can initialize with patches from here, we can initialize with a dictionary that we have learned offline, or we can initialize with something that we know is good: the Discrete Cosine Transform (DCT) dictionary.
That's the transform used in JPEG, so we know it's a reasonably good dictionary.
In this case, we have initialized with that.
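For reference, one common way to build such an overcomplete DCT dictionary for eight-by-eight patches is as a Kronecker product of an oversampled 1-D DCT; this particular construction, and the choice of eleven atoms per dimension, are just an illustrative assumption.

```python
import numpy as np

def overcomplete_dct(p=8, m=11):
    """Overcomplete 2-D DCT dictionary of size (p*p) x (m*m), e.g. 64 x 121."""
    D1 = np.zeros((p, m))
    for k in range(m):
        atom = np.cos(np.arange(p) * k * np.pi / m)   # oversampled 1-D DCT atom
        if k > 0:
            atom -= atom.mean()                       # keep only one DC atom
        D1[:, k] = atom / np.linalg.norm(atom)
    return np.kron(D1, D1)                            # separable 2-D atoms
```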
And basically, we update using all the possible patches here.
We learn the dictionary, then we encode every patch with the learned dictionary using sparse coding, and then we average all the patches that touch a given pixel.
And voila, this is the dictionary that we have learned; it is very interesting how it has changed from the DCT dictionary.
And here is the restored image, where we have reduced the noise by two things: projecting every patch onto the space defined by this dictionary using sparse coding, and then averaging the patches.
And here you have a result.
This was the original, this is what's available to us, and this is the denoised result.
It is very important that the adaptation of the dictionary is done with the noisy data, and still the results are really, really good.
And I want you, once again, to observe the type of dictionary that we have learned from the image.
For example, the different types of texture that we have here are learned in the dictionary; it was adapted to the image.
Now, how do we deal with color? There are numerous ways of dealing with color, but one of the simplest is, instead of using eight-by-eight patches, to use eight-by-eight-by-three patches.
We could have processed every color channel independently, as we have done, for example, for inpainting or even for compression.
But we can also take eight-by-eight-by-three patches: we concatenate all the colors and we do K-SVD sparse modeling in exactly the same fashion.
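The only mechanical change from the grayscale case is the vectorization: each eight-by-eight-by-three patch becomes one column of length 192. A minimal sketch, with an illustrative helper name:

```python
import numpy as np

def extract_color_patches(image, p=8):
    """Each column is a vectorized p-x-p-x-3 color patch (length 3*p*p)."""
    H, W, _ = image.shape
    cols = [image[i:i + p, j:j + p, :].ravel()
            for i in range(H - p + 1)
            for j in range(W - p + 1)]
    return np.stack(cols, axis=1)    # (p*p*3, num_patches), e.g. 192 rows
```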
Let's see some results. Again, we always show the original just for reference, because you never have it.
What you have is this noisy image, a noisy version of the original.
And then you run K-SVD on eight-by-eight-by-three or five-by-five-by-three patches, depending on the patch size that you decide to use, and you get a very nice image denoising result.
And here is the standard measurement, PSNR in dB, quantifying the quality of the result.
Here is another example.
We start again from the reference; this is only for reference, the original image.
We have a noisy version; look how bad it is. This is really bad: we have added a lot of noise, and yet we get a beautiful reconstruction from this sparse modeling technique.
Really, really nice reconstruction, with a lot of detail and a relatively high quantitative measurement, although what's most important is the qualitative quality of the reconstruction.
So, we went from here to here.
And once again, we learn the dictionary. In this case, we initialize with an off-the-shelf pre-learned dictionary, and then we adapt it to this noisy image for a few iterations of K-SVD.
Now, we can also do inpainting with this.
Inpainting is a particular example of noise, where the noise is so strong that it makes some pixels disappear.
And there are many ways of defining the challenges of inpainting.
One of them is to take the image and just drop 80% of the pixels.
This is related to compressed sensing, and we're going to talk about that at the end of this week in just one video.
We basically drop 80% of the pixels: we say, you're gone.
That's represented here with black values, zero. It's as if the noise is so strong that there's nothing left there.
And then you run sparse modeling and K-SVD starting with this image, where this is the missing part.
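The algorithmic change for inpainting is that the sparse coding of each patch must fit only the observed pixels. Here is a minimal sketch of a masked variant of OMP under that assumption (for simplicity, it ignores the fact that masking changes the atom norms):

```python
import numpy as np

def masked_omp(D, y, mask, k):
    """Code the observed entries of patch y (mask == True) with at
    most k atoms; D @ alpha then estimates the full patch,
    including the missing pixels."""
    Dm, ym = D[mask, :], y[mask]      # keep only the observed rows
    residual, support = ym.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(Dm.T @ residual)))
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(Dm[:, support], ym, rcond=None)
        residual = ym - Dm[:, support] @ coef
    alpha = np.zeros(D.shape[1])
    alpha[support] = coef
    return alpha
```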
We start from this image and here is the reconstruction.
Once again, a very, very interesting result.
We only had 20% of the pixels and we managed to do a nice reconstruction.
Again, we go from here to here. This is just for reference, this image.
Let's just see that in color. Again, 80% missing and here is the
reconstruction. Once again, I think a very, very nice
result. Again, for this example, we started with a dictionary that was learned from a database of natural images, then adapted it to this image for a few iterations.
And then we reconstruct every patch and, for every pixel, average all the patches that touch it.
Again, I think a very impressive result showing the power of this technique.
This is another example: instead of dropping pixels, we do the same thing we did when we were talking about inpainting in general.
We write on top of the image, and then we do the reconstruction.
Again, very good results, with nothing more than sparse modeling: learning a dictionary from this noisy image and using it to reconstruct the image.
Now, we can also do video inpainting. We can drop 80% of the pixels.
And now, again, we have a number of options.
We can do every frame by itself, or we can do patches in kind of three
dimensions, X, Y and T.
We discussed that when we were talking about video inpainting in the previous week.
Basically, we take patches spanning, say, three or five frames together, so that the patches also carry some temporal information; a small sketch of such spatiotemporal patches appears below.
After we have decided what type of patches to use, we run sparse modeling with absolutely the same technique that we have been describing.
And I'm going to run a few of them.
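For completeness, here is the same patch-extraction idea in three dimensions, with an illustrative helper that takes t consecutive frames per patch:

```python
import numpy as np

def extract_spatiotemporal_patches(video, p=8, t=3):
    """video: (T, H, W). Each column is a vectorized p-x-p-x-t patch
    spanning t consecutive frames, so codes capture temporal structure."""
    T, H, W = video.shape
    cols = [video[f:f + t, i:i + p, j:j + p].ravel()
            for f in range(T - t + 1)
            for i in range(H - p + 1)
            for j in range(W - p + 1)]
    return np.stack(cols, axis=1)    # (p*p*t, num_patches)
```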
The original is, as always, only for reference; it actually is not available.
We always go from here to here by doing the dictionary learning.
Let me run a few frames.
This is one frame, this is another frame. Look always at the quality of the result,
the reconstruction. Another one.
Another one. So once again, we basically walk through the different frames, going from this to this.
And you can see that the reconstruction is always very, very close to the original, which is not available to us.
Just one more frame here, and yet another frame.
Now, I want to finish with demosaicing.
This will be the last example. It's a particular case of image inpainting, as we have seen before.
Let me explain the problem. Most digital cameras do not collect full red, green, and blue.
They do a trick: at one pixel they collect red, at the next green, then red, then green, and so on.
In the next row, they collect green, blue, green, blue, and then they repeat the pattern.
So, at every pixel, they have only one color: red, or green, or blue.
And what you have to do is interpolate: in order to show the image, you have to create, at every pixel, a red, a green, and a blue value.
This is a kind of interpolation, or an inpainting of the colors.
We can treat it as nothing else than inpainting of the colors, where, instead of dropping the entire pixel, we drop two of the three colors at every pixel.
Sometimes we drop the red and the green, sometimes the green and the blue, sometimes the red and the blue.
So imagine that you have a cube which is N by N by three, where some values are available and some are not.
Now, we have seen that sparse modeling is extremely successful even when the entire pixel is gone, even when 80% of the complete pixels are gone.
So, let's run it on this. And this is called demosaicing.
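Before looking at the results, here is a tiny sketch of building this single-color-per-pixel mask (one common Bayer-like layout is assumed for illustration; real sensor layouts vary). The resulting mask can be fed to the same masked sparse coding used above for inpainting:

```python
import numpy as np

def bayer_mask(H, W):
    """Boolean (H, W, 3) mask: True where the sensor measured that color.
    Rows alternate R,G,R,G,... and G,B,G,B,..., as described above."""
    mask = np.zeros((H, W, 3), dtype=bool)
    mask[0::2, 0::2, 0] = True    # red:   even rows, even columns
    mask[0::2, 1::2, 1] = True    # green: even rows, odd columns
    mask[1::2, 0::2, 1] = True    # green: odd rows, even columns
    mask[1::2, 1::2, 2] = True    # blue:  odd rows, odd columns
    return mask                   # exactly one True per pixel
```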
It has to be done in essentially all consumer digital cameras.
Very expensive digital cameras actually collect the full red, the full green, and the full blue, but normal consumer cameras use this type of mosaic, and the production of the full-color image is called demosaicing.
And here you see examples of color images that have been reconstructed from patterns like this using sparse modeling.
It's a particular case of inpainting in the color domain.
It's even easier than that, because instead of dropping 80% of the entire pixels, you just drop some of the colors per pixel, and the results are really, really nice.
With some zooming, you can see that the results are very, very sharp.
And these are really some of the best results produced for demosaicing, and they're obtained with sparse modeling techniques.
So, these are just some of the examples of sparse modeling.
Sparse modeling is used a lot in image processing these days, also for image classification and other image processing challenges; here I have shown you denoising, inpainting, and demosaicing.
And I think this illustrates the importance of a theory that has a lot of mathematical components and also very, very cool applications.
We have a couple of additional things to learn this week about sparse modeling and
also about compressed sensing, and the connections between the two, and I am
going to do that in the forthcoming videos.
Thank you very much, see you later.