Due to the availability of an analytical solution for the optimal action, we do not face a potential overestimation problem in our setting, because we no longer use the same data set to estimate two functions. The second calculation, that of the optimal action, is now analytical.
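To illustrate why an analytical maximization avoids the second estimation step, here is a minimal sketch. It assumes (as in this setting) that the Q-function is a concave quadratic in the action, so its maximizer is available in closed form; the function name and coefficient parametrization are illustrative, not from the lecture.

```python
def analytic_argmax_quadratic(c2, c1, c0):
    """Closed-form maximizer of Q(a) = c2*a**2 + c1*a + c0, assuming c2 < 0.

    Because the maximizing action comes from a formula rather than from
    a max over estimated values, no second estimation pass over the data
    is needed -- the step that causes overestimation bias in Q-learning.
    """
    if c2 >= 0:
        raise ValueError("Q must be concave in the action (c2 < 0).")
    a_star = -c1 / (2.0 * c2)           # vertex of the parabola
    q_star = c0 - c1 * c1 / (4.0 * c2)  # Q evaluated at a_star
    return a_star, q_star
```

For example, for Q(a) = -a**2 + 2a the maximizer is a* = 1 with Q* = 1, read off directly from the coefficients.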

And here I would like to note that the potential for overestimation is a classical problem of Q-learning, where it is sometimes addressed using various numerical fixes, such as, for example, double Q-learning.
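For reference, the double Q-learning fix mentioned above can be sketched as follows. It keeps two value tables and uses one to select the greedy action and the other to evaluate it; the tabular, dict-based representation here is an illustrative simplification, not code from the lecture.

```python
def double_q_update(q_a, q_b, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One double Q-learning update of table q_a, using q_b to evaluate.

    q_a, q_b: dicts mapping state -> list of action values.
    Selecting the argmax with one table but reading its value from the
    other decouples action selection from action evaluation, which
    removes the maximization bias of plain Q-learning.
    """
    # Greedy action according to the table being updated ...
    a_star = max(range(len(q_a[s_next])), key=lambda i: q_a[s_next][i])
    # ... but its value is taken from the *other* table.
    target = r + gamma * q_b[s_next][a_star]
    q_a[s][a] += alpha * (target - q_a[s][a])
    return q_a[s][a]
```

In a full agent one would flip a fair coin at each step to decide which table plays the role of `q_a` in the update.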

You can read more on such methods in the additional reading for this week.

But in our framework, such fixes are not needed, because analytical solutions are available. This produces a numerically stable solution to the model.

So, to summarize, we saw how to make one backward step in the Fitted Q Iteration method. Going in this way all the way back to the current time, t = 0, we get the optimal option hedge and price, but this time in a completely data-driven and model-independent way, due to the fact that Fitted Q Iteration is a model-free and off-policy algorithm.
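The backward step summarized above can be sketched as a least-squares regression at each time step. This is a minimal sketch under assumed inputs: it takes basis functions of the state at time t, one-step rewards with the optimal action already plugged in, and the optimal Q-values previously computed at time t+1; the function name and array shapes are illustrative.

```python
import numpy as np

def fitted_q_backward_step(features_t, rewards_t, q_next, gamma=1.0):
    """One backward step of a fitted Q-style recursion (sketch).

    features_t : (n_paths, n_basis) basis functions of the state at time t
    rewards_t  : (n_paths,) one-step rewards along the simulated paths
    q_next     : (n_paths,) optimal Q-values already found at time t+1
    Returns the regression coefficients and the fitted Q-values at time t.
    """
    # Bellman target for the regression at time t.
    target = rewards_t + gamma * q_next
    # Project the target onto the span of the basis functions.
    coef, *_ = np.linalg.lstsq(features_t, target, rcond=None)
    return coef, features_t @ coef
```

Iterating this step from maturity down to t = 0 yields the Q-function, and hence the hedge and price, directly from data.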