[MUSIC]
In this video, we'll see a couple of examples of how
Bayesian optimization can be applied to real world problems.
So the first one is hyperparameter tuning.
You usually train your neural networks and have to retrain them
many times, searching for the optimal number of layers, the layer sizes,
whether to use dropout, whether to use batch normalization, and
which nonlinearity to use, ReLU, SELU, and so on.
You also have training parameters like the learning rate and momentum,
or maybe you can try different optimizers, for example Adam or SGD.
So what you could do is use Bayesian optimization to find
the best values of all of those parameters for you automatically.
It usually finds better optima than when you tune those by hand, and
it also allows for an honest comparison with other methods when you do research.
For example, you came up with a brilliant method and
you spent a lot of time tuning its parameters, and
in your paper you want to compare your model with some other models.
It is really tempting not to spend much time tuning the parameters of
the other models.
However, what you could do is run automatic
hyperparameter tuning to find the best values of
those parameters for the models that you are comparing with, and
in this case the comparison would be more honest.
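Here is a minimal sketch of how such a tuning run might look, assuming the scikit-optimize library as one possible tool; the search space below and the train_and_validate helper are hypothetical stand-ins for your own model and validation code.

```python
# Minimal sketch of Bayesian hyperparameter tuning with scikit-optimize.
# The search space and train_and_validate() are hypothetical placeholders;
# a real objective would train the network and return its validation error.
import numpy as np
from skopt import gp_minimize
from skopt.space import Real, Integer, Categorical
from skopt.utils import use_named_args

space = [
    Real(1e-5, 1e-1, prior="log-uniform", name="learning_rate"),
    Integer(1, 5, name="num_layers"),
    Categorical([True, False], name="use_dropout"),
    Categorical(["relu", "selu"], name="nonlinearity"),
]

def train_and_validate(learning_rate, num_layers, use_dropout, nonlinearity):
    # Stand-in for actually training and evaluating a network;
    # here we just return a synthetic "validation error".
    penalty = (np.log10(learning_rate) + 3) ** 2 + (num_layers - 3) ** 2
    return penalty + (0.0 if use_dropout else 0.5)

@use_named_args(space)
def objective(**params):
    # gp_minimize minimizes, so the objective should be an error, not an accuracy.
    return train_and_validate(**params)

result = gp_minimize(objective, space, n_calls=50, random_state=0)
print(result.x, result.fun)  # best hyperparameters and best validation error
```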
So the problem here is that we have a mixture of discrete and
continuous variables.
For example, we have the learning rate, which is continuous, and we have
the parameter of whether to use dropout or not, which is a binary decision.
So how can we mix continuous and discrete variables in a Gaussian process?
Well, the simple trick is this.
You treat the discrete variables as continuous when fitting the Gaussian process.
So for example, when you use dropout, the value would be one, and
when you don't use it, the value would be zero.
And then, when you try to maximize the acquisition function, you optimize
it by brute forcing all possible values of the discrete variables.
So for example, you find the maximum of the acquisition
function without dropout, you find it with dropout, and
then you select the case that is better for you.
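To make this trick concrete, here is a minimal sketch assuming scikit-learn's GaussianProcessRegressor as the surrogate and expected improvement as one common choice of acquisition function; the observed points and bounds below are made up for illustration.

```python
# Minimal sketch of mixing discrete and continuous variables in a GP surrogate.
# The binary dropout flag is fit as an ordinary 0/1 coordinate; the acquisition
# function is then maximized by brute forcing the discrete values and
# optimizing the continuous variable separately for each of them.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

# Observed points: (learning_rate, dropout_flag) -> validation accuracy.
X = np.array([[0.001, 0], [0.01, 1], [0.1, 0], [0.03, 1]])
y = np.array([0.85, 0.90, 0.70, 0.88])

gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)

def expected_improvement(x, best_y):
    # Expected improvement of a single candidate point over the best value so far.
    mu, sigma = gp.predict(x.reshape(1, -1), return_std=True)
    mu, sigma = mu[0], max(sigma[0], 1e-9)
    z = (mu - best_y) / sigma
    return (mu - best_y) * norm.cdf(z) + sigma * norm.pdf(z)

best_y = y.max()
candidates = []
for dropout in (0, 1):  # brute force the discrete variable
    res = minimize(
        lambda lr: -expected_improvement(np.array([lr[0], dropout]), best_y),
        x0=[0.01], bounds=[(1e-4, 0.5)],
    )
    candidates.append((-res.fun, res.x[0], dropout))

ei, lr, dropout = max(candidates)
print(f"next point to try: learning_rate={lr:.4f}, dropout={bool(dropout)}")
```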
One special case is when all of the variables are discrete.
Those problems are called multi-armed bandits, and
they are widely used in information retrieval tasks.
For example, when you're building a search engine result page,
you have to select a lot of hyperparameters that are discrete, and
for this case, Bayesian optimization is really useful.
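As a rough sketch of the kind of algorithm meant here, the classic UCB1 rule is one simple bandit strategy: each arm is one discrete configuration (for example, one layout of the result page), and the pull function below is a hypothetical stand-in for showing that configuration and observing a reward such as a click.

```python
# Rough sketch of a multi-armed bandit with the classic UCB1 rule.
# pull() is a hypothetical environment: each arm has an unknown click probability.
import math
import random

def pull(arm):
    click_prob = [0.02, 0.05, 0.03][arm]
    return 1.0 if random.random() < click_prob else 0.0

n_arms = 3
counts = [0] * n_arms      # how many times each arm was shown
rewards = [0.0] * n_arms   # total reward collected by each arm

for t in range(1, 10001):
    if 0 in counts:
        arm = counts.index(0)  # try every arm at least once
    else:
        # UCB1: mean reward plus an exploration bonus for rarely tried arms.
        arm = max(
            range(n_arms),
            key=lambda a: rewards[a] / counts[a]
            + math.sqrt(2 * math.log(t) / counts[a]),
        )
    counts[arm] += 1
    rewards[arm] += pull(arm)

print("estimated click rates:", [r / c for r, c in zip(rewards, counts)])
```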
Another application is drug discovery.
We have some molecules that could possibly become drugs for some severe diseases.
In this case, we have a molecule, and we can represent it using a string.
This string is called SMILES, and
it can be constructed from the molecule very simply.
What you can do then is build an autoencoder that takes
the SMILES string as an input and tries to reproduce it as the output.
You can use the variational autoencoder that we talked about in week
five to make the latent space dense, that is, you can move along the space and,
for each point, you will be able to reconstruct some valid molecule.
And now here's the trick: you know that some molecules are useful for
curing some diseases and some are not.
So here I have a plot of the latent space, and in this latent space,
you want to find the position of the maximum,
that is, the molecule that will be best at curing the disease.
After you find the maximum in the latent space,
you simply plug it into the decoder and reconstruct the molecule, and
then you can do some trials, for example in vitro or in vivo.
And after this, you get the value at this new point.
You add it to the model, refit the Gaussian process, and
find the new maximum of the acquisition function.
And just by iterating this procedure, you can quickly find new drugs for different diseases.
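Putting the loop together, here is a high-level sketch under strong assumptions: the encode, decode, and run_experiment functions below are hypothetical stubs standing in for a trained molecular VAE and a lab trial, and a Gaussian process with expected improvement plays the role of the surrogate and acquisition function.

```python
# High-level sketch of Bayesian optimization in a VAE latent space.
# encode(), decode(), and run_experiment() are hypothetical stubs so the loop
# structure is runnable; a real system would use a trained VAE and a real assay.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

LATENT_DIM = 2

def encode(smiles):   # stub for the VAE encoder (SMILES -> latent code)
    rng = np.random.default_rng(abs(hash(smiles)) % (2**32))
    return rng.normal(size=LATENT_DIM)

def decode(z):        # stub for the VAE decoder (latent code -> SMILES)
    return f"molecule_at_{np.round(z, 2)}"

def run_experiment(z):
    # Stub for the trial: a real assay would test the decoded molecule;
    # here we just pretend usefulness peaks near the latent-space origin.
    return -float(np.sum(z ** 2))

known_smiles = ["CCO", "c1ccccc1", "CC(=O)O"]  # some starting molecules
Z = np.array([encode(s) for s in known_smiles])
y = np.array([run_experiment(z) for z in Z])

for _ in range(10):
    gp = GaussianProcessRegressor(normalize_y=True).fit(Z, y)
    best_y = y.max()

    def neg_ei(z):  # negative expected improvement in latent space
        mu, sigma = gp.predict(z.reshape(1, -1), return_std=True)
        mu, sigma = mu[0], max(sigma[0], 1e-9)
        u = (mu - best_y) / sigma
        return -((mu - best_y) * norm.cdf(u) + sigma * norm.pdf(u))

    res = minimize(neg_ei, x0=Z[y.argmax()], bounds=[(-3, 3)] * LATENT_DIM)
    candidate = decode(res.x)        # reconstruct a candidate molecule
    score = run_experiment(res.x)    # e.g. an in vitro or in vivo trial

    Z = np.vstack([Z, res.x])        # add the new observation and repeat
    y = np.append(y, score)

print("best molecule found:", decode(Z[y.argmax()]), "score:", y.max())
```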