So, in a previous lesson,

we talked about how ideas from physics influenced some

of the modern financial models such as the Black-Scholes model.

We also spoke about problematic sides of

the Black-Scholes model, such as its neglect of risk in options

due to its insistence on continuous re-hedging.

Now, in a previous lesson,

and also in the previous course,

we made the point that this approach is problematic at

a conceptual level rather than just empirically or numerically.

Ignoring the risk of mis-hedging when pricing options

leads to the conclusion that options are redundant instruments.

If this were true,

neither market makers nor option speculators would exist in the marketplace.

Hence, options themselves would not exist either,

because there would be no one to trade them.

In 2003, Bates surveyed the empirical literature on option markets.

He concluded that the risk-neutral methods of option pricing that

extend the Black-Scholes model cannot fully capture, let alone

explain, the empirical properties of option prices.

He said the following:

"To blithely attribute divergences between

objective and risk-neutral probability measures to

the free "risk-premium" parameters within

an affine model is to abdicate one's responsibilities as a financial economist.

A renewed focus on

explicit financial intermediation of

the underlying risk by option pricing market makers is needed".

Now, if you're familiar with financial theory,

you would understand what he means by

the objective and the risk-neutral probability measures.

But if you're not, then in essence,

risk-neutral valuation amounts to the statement

that all prices can be computed as discounted expectations of future cash flows,

computed under a very special probability measure Q,

called a pricing measure or a risk-neutral measure.

In this measure, all securities have

the same expected returns which are equal to a risk-free interest rate.
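
As a minimal sketch of this statement, consider a hypothetical one-period binomial model. The model, the function names, and all numbers below are illustrative assumptions, not something taken from the lecture. Under the risk-neutral probability `q`, the discounted expected stock price equals today's price, so the stock earns exactly the risk-free rate, and an option price is just the discounted expectation of its payoff.

```python
import math

def risk_neutral_prob(u, d, r, dt):
    """Probability of the up move under the pricing measure Q
    in a one-period binomial model (requires d < exp(r*dt) < u)."""
    return (math.exp(r * dt) - d) / (u - d)

def discounted_expectation(payoff_up, payoff_down, q, r, dt):
    """Discounted expected payoff, with the expectation taken under Q."""
    return math.exp(-r * dt) * (q * payoff_up + (1.0 - q) * payoff_down)

# Illustrative (assumed) parameters
S0, K = 100.0, 100.0   # initial stock price and option strike
u, d = 1.2, 0.9        # up and down factors for the stock
r, dt = 0.05, 1.0      # risk-free rate and length of the period

q = risk_neutral_prob(u, d, r, dt)

# The stock itself: its discounted Q-expectation recovers today's price.
stock_value = discounted_expectation(S0 * u, S0 * d, q, r, dt)

# A call option: the same recipe applied to the option payoff.
call_price = discounted_expectation(
    max(S0 * u - K, 0.0), max(S0 * d - K, 0.0), q, r, dt
)

print(f"q = {q:.4f}, stock value = {stock_value:.4f}, call price = {call_price:.4f}")
```

Note that the real-world probability of the up move never enters this calculation, which is exactly the feature of risk-neutral pricing that is being questioned here.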

Dynamics under the physical measure

P and the pricing measure Q are assumed to be of the same form,

while parameters such as drift are

related via a free parameter called the market price of risk.
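
For concreteness, here is how this relation looks if we assume geometric Brownian motion for the stock, as in the Black-Scholes model. The symbols $\mu$ (drift), $\sigma$ (volatility), $r$ (risk-free rate) and $\lambda$ (market price of risk) are notation introduced here for illustration only.

```latex
\begin{align}
dS_t &= \mu S_t \, dt + \sigma S_t \, dW_t^{\mathbb{P}}
      && \text{(physical measure } \mathbb{P}\text{)} \\
dS_t &= r S_t \, dt + \sigma S_t \, dW_t^{\mathbb{Q}}
      && \text{(pricing measure } \mathbb{Q}\text{)} \\
\lambda &= \frac{\mu - r}{\sigma},
      \qquad dW_t^{\mathbb{Q}} = dW_t^{\mathbb{P}} + \lambda \, dt
\end{align}
```

Both equations have the same functional form, and only the drift changes, from $\mu$ to $r = \mu - \lambda \sigma$.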

I will talk a bit more about the pricing measure Q below.

But here, I would like to talk about how it reminds me of a famous quote by Richard Feynman.

He once said that the procedure of renormalization in quantum field theory

reminded him of brushing garbage under a carpet instead of taking it out.

So, let me explain what he meant by that.

In quantum field theory,

if you follow rules of

an original model construction and try to compute some physical quantities,

you would naively get an infinite number.

This happens because you have to integrate over

some virtual processes of

particle creation and annihilation over very small distances.

This produces divergences in the integrals that express these observables.

This on its own means that quantum field theory is only approximately correct

and should be replaced by string theory or something else at ultra-short distances.

But quantum field theory instead uses a procedure called renormalization

that achieves finiteness by adding so-called counterterms,

which are assumed to have the same functional form as the original terms in the model.

They are then added with coefficients that ensure cancellations of divergences.

Now, this sounds very similar to a transition between

the physical measure P and the pricing measure Q in financial theory.

In both cases, inconsistency of the model with the data is explained

away by insistence that the unseen things

should be structurally the same as the things that we see.

In the field theory,

the unseen is the physics at ultra small scales.

In finance, the unseen is the pricing measure,

as it doesn't exist in any well-defined sense

beyond very restrictive assumptions of classical finance,

which we will discuss in our next video.

Okay. Now, let's move back from physics to finance.

More specifically, to option trading and the risks of this business.

Now, there exist multiple sources of risk in options.

One of them is the play of demand forces in the options market itself.

In a very interesting paper from 2007 called "Demand-Based Option Pricing", Garleanu,

Pedersen, and Poteshman looked at it from the modeling perspective.

They looked at the problem of option pricing from the point of view

of an option market maker, that is, an option dealer.

Option prices then become functions of demand pressure in the market.

In other words, the prices should be such that dealers who maximize

their utility supply

precisely the quantities of options that the end users of options demand.

Now, these authors found that a marginal increase in the demand pressure in an option

increases its price by an amount proportional to the variance

of the unhedgeable part of the option.

This sounds very similar to the net effect of risk in

the model that we considered in the previous course.

This model, which I called the QLBS model,

short for Q-Learning for the Black-Scholes problem,

deals with discrete-time hedging of an option.

As we discussed in the previous course,

unwinding the continuous-time limit of

the original Black-Scholes model is

the simplest possible way to remove the assumption of a perfect hedge

made in the [inaudible] Black-Scholes model.

The net result of such unwinding is that

the option price acquires a risk premium that, to a first approximation, is proportional

to a sum of variances of the hedge portfolio across the hedging times.
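
Schematically, and as a sketch rather than the exact formula from the previous course, this statement can be written as follows. Here $\Pi_t$ denotes the hedge portfolio at time $t$, $r$ is the risk-free rate, $\lambda$ is a risk-aversion parameter, and the sum runs over the re-hedging times; all of this notation is an assumption introduced for illustration.

```latex
\begin{equation}
C_0 \;\approx\; \mathbb{E}_0\!\left[ \Pi_0 \right]
  \;+\; \lambda \sum_{t=0}^{T} e^{-rt} \,
  \mathbb{E}_0\!\left[ \operatorname{Var}_t\!\left[ \Pi_t \right] \right]
\end{equation}
```

The first term is the expected cost of hedging, and the second term is the risk premium. In this sketch the premium vanishes only if the hedge is perfect, so that every variance term is zero.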

This is a similar effect to one obtained by Garleanu and co-authors.

Now, the topic of this lesson is to talk about other origins of risk

in option pricing and trading,

as well as to talk about interconnections between the option and stock markets.

Let's start with a diagram

that we discussed many times in this specialization.

Namely, a diagram that shows the interaction of an agent with the environment.

Reinforcement learning solves this task of sequential decision-making,

which has to optimize some goal expressed via a cumulative reward function.

We call such tasks action tasks,

and they involve perception tasks

as an intermediate step,

because optimizing for the goal now involves planning and forecasting into the future.

So, the agent observes the state S_t of the environment,

and then performs an action A_t.

As a result, the agent gets a reward R_t, while

the system moves to a new state S_{t+1}.

Now, in the general setting of reinforcement learning, the probabilities

of transitions to new states S_{t+1} may depend on the action A_t taken by the agent.

This is called the feedback loop.

It means that actions of the agent may impact the future,

but of course they do not have to.

Models where actions of an agent do not impact the future can

simply be viewed as a special case of models where such impact is present.
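
The loop just described can be sketched with a toy, fully deterministic environment. Everything in this sketch is a hypothetical illustration: the transition rule, the reward, and the `impact` coefficient are assumptions chosen only to show that a model without feedback is the special case `impact = 0` of a model with feedback.

```python
def step(state, action, impact):
    """Toy environment transition.

    The next state is the current state plus a fixed drift plus a term
    proportional to the agent's action. `impact` is a hypothetical
    feedback coefficient: impact = 0 switches the feedback loop off.
    The reward is simply the change in the state (an arbitrary choice).
    """
    drift = 0.1
    next_state = state + drift + impact * action
    reward = next_state - state
    return next_state, reward

def run_episode(policy, impact, s0=100.0, n_steps=5):
    """Run the agent-environment loop: observe S_t, act A_t, get R_t, move to S_{t+1}."""
    state = s0
    path = [state]
    total_reward = 0.0
    for t in range(n_steps):
        action = policy(state, t)                      # agent chooses A_t from S_t
        state, reward = step(state, action, impact)    # environment returns S_{t+1}, R_t
        total_reward += reward
        path.append(state)
    return path, total_reward

def always_up(state, t):
    return 1.0

def always_down(state, t):
    return -1.0

# Without feedback, the state path does not depend on the policy.
path_up_no_fb, _ = run_episode(always_up, impact=0.0)
path_down_no_fb, _ = run_episode(always_down, impact=0.0)

# With feedback, different actions lead to different futures.
path_up_fb, _ = run_episode(always_up, impact=0.5)
path_down_fb, _ = run_episode(always_down, impact=0.5)

print(path_up_no_fb == path_down_no_fb)   # same path for both policies
print(path_up_fb[-1], path_down_fb[-1])   # different terminal states
```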

Now, we can view option trading

using the same framework.

In option trading, the role of an agent is played

by a trading robot or a human option trader.

The agent observes the market price S_t and makes a trade, or action, A_t.

If we consider an agent that has already sold or bought an option,

then by action A_t we mean trading in the underlying stock.

Clearly, the task of the agent in

this case involves planning and forecasting into the future.

There are also feedback effects in option markets and stock markets.

For example, delta-hedging activities of large option holders

can potentially, at least partially, move the market.

We will talk more about this later in this lesson.

In addition to feedback effects,

there are other market imperfections,

such as transaction costs,

holding costs and liquidity effects.

Now, the standard approach of

mathematical finance tremendously simplifies this whole task,

albeit at a substantial cost.

Here by the standard approach,

I mean the so-called risk-neutral pricing methods

that can be viewed as straightforward modifications of the Black-Scholes model

that consider volatility as non-constant or stochastic.

If you have to optimally hedge an existing option position,

then in this approach,

planning amounts to sticking to the delta-hedging strategy,

and the forecasting task amounts

to forecasting the stock volatility or implying it from market prices.
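
To make these two tasks concrete, here is a minimal sketch for a European call in the plain Black-Scholes model. The function names, the bisection solver, and all parameter values are assumptions made for illustration; the "market price" below is generated from the model itself as a stand-in for a real quote.

```python
import math

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_d1(S, K, T, r, sigma):
    return (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))

def bs_call(S, K, T, r, sigma):
    """Black-Scholes price of a European call."""
    d1 = bs_d1(S, K, T, r, sigma)
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)

def bs_delta(S, K, T, r, sigma):
    """Black-Scholes delta of a European call: the 'plan' is to hold this many shares."""
    return norm_cdf(bs_d1(S, K, T, r, sigma))

def implied_vol(price, S, K, T, r, lo=1e-4, hi=5.0, tol=1e-8):
    """Volatility implied by an option price, found by bisection.

    Relies on the call price being increasing in sigma.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) > price:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Stand-in for an observed market quote (assumed numbers)
S, K, T, r = 100.0, 100.0, 0.5, 0.02
market_price = bs_call(S, K, T, r, 0.25)

sigma_implied = implied_vol(market_price, S, K, T, r)   # the "forecasting" step
hedge_ratio = bs_delta(S, K, T, r, sigma_implied)       # the "planning" step

print(f"implied vol = {sigma_implied:.4f}, delta = {hedge_ratio:.4f}")
```

In this setting the whole decision problem collapses to one number, the volatility, which is fed into a fixed formula for the hedge.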

Feedback effects from trading in options or in the option underlying are neglected.

Therefore, there is no backward feedback loop in this setting.

Other market imperfections such as transaction costs

or limited stock liquidity are also neglected.

Option pricing and hedging in such models

amount to solving partial differential equations, or PDEs,

that describe option prices

and generalize the Black-Scholes PDE.

Now, in the previous course,

we discussed how this setting can be modified

to make it amenable to reinforcement learning.

It turns out that a key step

is to abandon the continuous-time formulation and go back to a discrete-time formulation.

It is very natural to consider time steps

that correspond to actual re-hedging frequencies for an option.

If we re-hedge daily,

we should use daily time steps and so on.

The fact that we keep time steps finite

makes perfect hedging impossible,

and this step is crucial.
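
A small simulation can illustrate why finite time steps rule out a perfect hedge. This is only a sketch under stated assumptions: the stock follows a log-normal process, the hedger uses the Black-Scholes delta, there are no transaction costs, and all parameter values and function names are hypothetical. It is not the QLBS model itself. The terminal hedging error of a short call is random, and its variance shrinks as we re-hedge more often but does not vanish for any finite number of steps.

```python
import math
import random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_d1(S, K, tau, r, sigma):
    return (math.log(S / K) + (r + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))

def bs_call(S, K, tau, r, sigma):
    d1 = bs_d1(S, K, tau, r, sigma)
    return S * norm_cdf(d1) - K * math.exp(-r * tau) * norm_cdf(d1 - sigma * math.sqrt(tau))

def bs_delta(S, K, tau, r, sigma):
    return norm_cdf(bs_d1(S, K, tau, r, sigma))

def hedging_error(n_steps, rng, S0=100.0, K=100.0, T=0.25, r=0.0, mu=0.05, sigma=0.2):
    """Terminal P&L of a short call that is delta-hedged n_steps times on one path."""
    dt = T / n_steps
    S = S0
    cash = bs_call(S0, K, T, r, sigma)   # premium received for selling the call
    shares = 0.0
    for i in range(n_steps):
        tau = T - i * dt                              # time to maturity, always > 0 here
        new_shares = bs_delta(S, K, tau, r, sigma)
        cash -= (new_shares - shares) * S             # rebalance the stock position
        shares = new_shares
        z = rng.gauss(0.0, 1.0)
        S *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        cash *= math.exp(r * dt)                      # cash accrues interest
    return cash + shares * S - max(S - K, 0.0)        # liquidate and pay the option payoff

def error_variance(n_steps, n_paths=2000, seed=0):
    """Sample variance of the hedging error across simulated paths."""
    rng = random.Random(seed)
    errors = [hedging_error(n_steps, rng) for _ in range(n_paths)]
    mean = sum(errors) / n_paths
    return sum((e - mean) ** 2 for e in errors) / (n_paths - 1)

# Three-month option, re-hedged roughly monthly, weekly and daily
var_monthly = error_variance(3)
var_weekly = error_variance(13)
var_daily = error_variance(63)

print(var_monthly, var_weekly, var_daily)
```

The residual variance is the unhedgeable risk that the continuous-time limit assumes away.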

After this step is made, all other improvements can be added incrementally.

Now, what would be such improvements?

Well, beyond using discrete time steps and relying on

Q-learning instead of a log-normal model, as in the Black-Scholes model,

or any other model for that matter,

we kept all other assumptions of the Black-Scholes model intact in this setting.

In particular, we neglected

transaction costs in our model specification in the previous course.

Under certain market conditions,

this is a reasonable assumption,

which is the same as in the original Black-Scholes model.