0:03

In this video, we'll see what Gaussian processes are.

But before we go on, we should see what random processes are, since a Gaussian process is just a special case of a random process.

So, in a random process, you have a d-dimensional space R^d, and to each point x of the space you assign a random variable f(x).

Those variables can have some correlation.

Depending on that correlation, the samples can be, for example, smooth functions, or maybe non-smooth ones.

So, if you take f(x) at some point, in this case at x = 3, you'll have a one-dimensional distribution that looks like, for example, this red curve.

You can also sample f(x) at all points of the space R^d, and in this case you'll get just an ordinary function f.

The functions obtained after sampling all the random variables are called trajectories.

When the dimension of x equals 1, we can interpret x as time.

We're now ready to define the Gaussian process.

What we would like to say is that the joint distribution over all points of R^d is Gaussian.

However, we haven't defined the Gaussian for an infinite number of points; we only have the multivariate Gaussian for a finite number of points.

So what we can say is: for an arbitrary number of points n, if we take n points x1 through xn, their joint distribution will be normal.

And since we said this holds for an arbitrary number of points, we can select n to be arbitrarily large, so that, informally, the joint distribution over all points is approximately Gaussian.

When we take n points and consider their joint distribution, it is called a finite-dimensional distribution.

Actually, finite-dimensional distributions are useful in practice.

For example, we cannot sample a whole function, but we can sample it at, say, a thousand points and plot it by interpolating between them.

And actually, that is exactly what I used to draw this plot.
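As a sketch of this idea, here is how one might sample a trajectory on a finite grid of points. The zero mean and the squared-exponential kernel are assumptions chosen just for concreteness (this kernel is the RBF kernel discussed later in the video):

```python
import numpy as np

def rbf_kernel(x1, x2, sigma2=1.0, length_scale=1.0):
    """Squared-exponential covariance between two sets of 1-D points."""
    sq_dist = (x1[:, None] - x2[None, :]) ** 2
    return sigma2 * np.exp(-sq_dist / (2.0 * length_scale ** 2))

def sample_trajectory(xs, rng):
    """Draw one sample from the finite-dimensional distribution at points xs."""
    K = rbf_kernel(xs, xs) + 1e-6 * np.eye(len(xs))  # jitter for stability
    mean = np.zeros(len(xs))  # zero mean function, assumed for simplicity
    return rng.multivariate_normal(mean, K)

rng = np.random.default_rng(0)
xs = np.linspace(0.0, 5.0, 1000)  # a thousand points, as in the video
f = sample_trajectory(xs, rng)
# Plotting (xs, f) with straight-line interpolation between the sampled
# points would reproduce a figure like the one shown.
```

Interpolating between this many points makes the sample visually indistinguishable from a continuous trajectory.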

So, a Gaussian process is parametrized by its mean and covariance.

The mean is the function m(x) that takes the random variable f(x) at each point of the space and assigns its mean value to that point.

We also have the covariance function, which takes two points x1 and x2 and returns the covariance between the random variables f(x1) and f(x2).

It equals some function K that depends only on the positions of those two points; we'll call this function the kernel.

So, finally, if we have n points, their joint distribution will be normal, with the mean being the vector whose components are m(x1), m(x2), and so on up to m(xn), and with the covariance matrix having the values of K at the different pairs of points as its elements.

For example, the first element would be K(x1, x1), and so on.
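Concretely, the mean vector and covariance matrix for n points can be assembled like this. The particular mean function and kernel here are illustrative assumptions, not something fixed by the definition:

```python
import numpy as np

def m(x):
    """Illustrative mean function: constant zero."""
    return 0.0

def K(x1, x2):
    """Illustrative kernel: covariance as a function of the two positions."""
    return np.exp(-0.5 * (x1 - x2) ** 2)

xs = np.array([0.0, 1.0, 2.5])  # n = 3 points x1, x2, x3
mean_vec = np.array([m(x) for x in xs])  # (m(x1), ..., m(xn))
cov_mat = np.array([[K(a, b) for b in xs] for a in xs])
# The (i, j) element is K(xi, xj); e.g. cov_mat[0, 0] == K(x1, x1).
# The joint distribution of (f(x1), ..., f(xn)) is N(mean_vec, cov_mat).
```

Note that the matrix built this way is automatically symmetric, since K(xi, xj) = K(xj, xi) for a covariance.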

We'll also need the notion of a stationary process.

A process is called stationary if its finite-dimensional distributions depend only on the relative positions of the points.

For example, here I have four points drawn in red, and their joint distribution would be equal to the joint distribution of the blue points, since the blue points are obtained just by shifting the red points to the right.
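For a stationary kernel this shift-invariance is easy to check numerically: moving all points by the same amount leaves the covariance matrix, and hence the joint Gaussian distribution (assuming a constant mean), unchanged. A small sketch with an assumed squared-exponential kernel:

```python
import numpy as np

def k(x1, x2):
    """A stationary kernel: depends only on the difference x1 - x2."""
    return np.exp(-0.5 * (x1 - x2) ** 2)

def cov_matrix(xs):
    return np.array([[k(a, b) for b in xs] for a in xs])

red = np.array([0.0, 0.7, 1.5, 2.0])  # four "red" points
blue = red + 3.0                       # shifted to the right
# Same relative positions -> same covariance matrix
print(np.allclose(cov_matrix(red), cov_matrix(blue)))  # True
```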

So let's play a game.

I have some samples from a Gaussian process, and we should find out whether they are samples from a stationary process or not.

So, what do you think about this sample?

Actually, it is not stationary, since there is seasonality here.

If you take points at the beginning of the period, you can easily tell that you are at the beginning of that period, and if you shift them a bit to the right, you can tell that you are at the end of the period.

So the joint distribution would be different in different parts of the space, and therefore the process is not stationary.

What about this sample?

Well, again, we have a trend here.

By computing the mean of some points, say 10 of them, you would be able to predict in which part of the space you are.

So this means the process is not stationary. What about this one?

Well, this one seems stationary.

For a stationary process, we shouldn't have a trend.

This means the mean should be constant over the whole space, so m(x) as a function would be constant.

Also, the kernel should depend only on the difference between the two points, and this ensures that the joint distribution depends only on the relative positions of the points.

So, we'll write it down as K(x1 - x2).

Also, using this notation, it is really easy to compute the variance of the random variable f(x).

The variance is actually equal to the kernel at position zero, since K(0) equals the covariance of the random variable f(x) with itself, which is exactly its variance.

Below I have an example of a kernel.

The covariance at 0 is 1, so the variance of f(x) is 1, and as you move further from the point, the covariance becomes lower and lower.
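We can sanity-check the claim that Var f(x) = K(0) by sampling. This is a sketch with an assumed stationary kernel for which k(0) = 1:

```python
import numpy as np

def k(r):
    """Stationary kernel as a function of the difference r = x1 - x2."""
    return np.exp(-0.5 * r ** 2)

xs = np.linspace(0.0, 4.0, 20)
K = k(xs[:, None] - xs[None, :]) + 1e-8 * np.eye(len(xs))

rng = np.random.default_rng(1)
samples = rng.multivariate_normal(np.zeros(len(xs)), K, size=20000)

# The empirical variance at every point should be close to k(0) = 1.
emp_var = samples.var(axis=0)
print("max deviation from k(0):", np.abs(emp_var - 1.0).max())
```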

There are many different kernels that you can use for training a Gaussian process.

The most widely used one is called the radial basis function, or RBF for short.

It equals sigma squared times the exponent of minus the squared distance between the two points over 2l^2, that is, K(x1 - x2) = sigma^2 exp(-(x1 - x2)^2 / (2l^2)).

Here l is a parameter called the length scale, and sigma squared controls the variance at position zero, which is the variance of f(x).

There are also many other kernels, like the rational quadratic kernel or the Brownian kernel.

They all have somewhat different parameters and will give you different samples from the process.

So, let's look at the radial basis function a bit closer.

When the length scale parameter equals 0.1, the samples look like this.

If we increase the length scale, the samples look a bit smoother and change less rapidly.

And if we keep increasing the value of l, we'll get almost constant functions.
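The effect of the length scale can be seen in a quick experiment. This is a sketch; the "roughness" measure here, the mean squared difference between neighboring sampled values, is just an illustrative proxy for how rapidly the samples change:

```python
import numpy as np

def rbf(x1, x2, sigma2=1.0, l=1.0):
    """RBF kernel: sigma^2 * exp(-(x1 - x2)^2 / (2 l^2))."""
    return sigma2 * np.exp(-((x1 - x2) ** 2) / (2.0 * l ** 2))

def roughness(l, n_samples=50, seed=0):
    """Average squared difference between neighboring values of GP samples."""
    xs = np.linspace(0.0, 5.0, 200)
    K = rbf(xs[:, None], xs[None, :], l=l) + 1e-8 * np.eye(len(xs))
    rng = np.random.default_rng(seed)
    fs = rng.multivariate_normal(np.zeros(len(xs)), K, size=n_samples)
    return np.mean(np.diff(fs, axis=1) ** 2)

# Larger length scale -> smoother, more slowly varying samples.
for l in (0.1, 1.0, 10.0):
    print(l, roughness(l))
```

With l = 0.1 the samples wiggle rapidly; by l = 10 they are nearly constant over the plotted interval, matching the plots in the video.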
