So, in a previous lesson, we talked about how ideas from physics influenced some modern financial models, such as the Black-Scholes model. We also spoke about problematic sides of the Black-Scholes model, such as its neglect of risk in options, which is due to its insistence on continuous re-hedging. Now, in a previous lesson, and also in the previous course, we made the point that this approach is problematic at a conceptual level rather than just empirically or numerically. Neglecting the risk of mis-hedging when pricing options leads to the conclusion that options are redundant instruments. If this were true, neither market makers nor option speculators would exist in the marketplace. Hence, options themselves would not exist either, because there would be no one to trade them.
In 2003, Bates surveyed the empirical literature on option markets. He concluded that the risk-neutral methods of option pricing that extend the Black-Scholes model cannot fully capture, let alone explain, the empirical properties of option prices. He said the following: "To blithely attribute divergences between objective and risk-neutral probability measures to the free "risk-premium" parameters within an affine model is to abdicate one's responsibilities as a financial economist. A renewed focus on explicit financial intermediation of the underlying risk by option pricing market makers is needed".
Now, if you're familiar with financial theory, you will understand what he means by the objective and the risk-neutral probability measures. But if you're not, then in essence, risk-neutral valuation amounts to a statement that all prices can be computed as discounted expectations of future cash flows, computed under a very special probability measure Q called a pricing measure or a risk-neutral measure. Under this measure, all securities have the same expected return, which is equal to the risk-free interest rate.
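As a rough illustration of what "discounted expectation under Q" means in practice, here is a minimal sketch that prices a European call by Monte Carlo. It assumes a log-normal stock whose drift under Q equals the risk-free rate; the function names and all numerical parameters are hypothetical choices made for this example, not something taken from the lecture.

```python
import math
import random


def black_scholes_call(s0, strike, rate, sigma, maturity):
    """Closed-form Black-Scholes price of a European call (used only as a check)."""
    d1 = (math.log(s0 / strike) + (rate + 0.5 * sigma**2) * maturity) / (
        sigma * math.sqrt(maturity)
    )
    d2 = d1 - sigma * math.sqrt(maturity)

    def norm_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def risk_neutral_mc(s0, strike, rate, sigma, maturity, n_paths=200_000, seed=0):
    """Return (discounted mean of S_T, discounted mean call payoff) under Q."""
    rng = random.Random(seed)
    discount = math.exp(-rate * maturity)
    # Assumption: under Q the stock is log-normal and grows at the risk-free rate.
    drift = (rate - 0.5 * sigma**2) * maturity
    vol = sigma * math.sqrt(maturity)
    sum_stock = 0.0
    sum_payoff = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        sum_stock += s_t
        sum_payoff += max(s_t - strike, 0.0)
    return discount * sum_stock / n_paths, discount * sum_payoff / n_paths


if __name__ == "__main__":
    stock_pv, call_mc = risk_neutral_mc(100.0, 100.0, 0.03, 0.2, 1.0)
    print("discounted E_Q[S_T]:", stock_pv)
    print("Monte Carlo call price:", call_mc)
    print("Black-Scholes call price:", black_scholes_call(100.0, 100.0, 0.03, 0.2, 1.0))
```

The discounted expectation of the stock itself comes back close to today's price, which is the statement that every security earns the risk-free rate under Q. The real-world drift of the stock never enters the calculation, and this is the feature of the approach that the lecture goes on to question.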
Dynamics under the physical measure P and the pricing measure Q are assumed to be of the same form, while parameters such as the drift are related via a free parameter called the market price of risk. I will talk a bit more about the pricing measure Q below. But here, I would like to talk about how it reminds me of a famous quote of Richard Feynman. He once said that the procedure of renormalization in quantum field theory reminded him of brushing garbage under a carpet instead of taking it out. So, let me explain what he meant by that.
In quantum field theory, if you follow the rules of the original model construction and try to compute some physical quantities, you would naively get an infinite number. This happens because you have to integrate over virtual processes of particle creation and annihilation at very small distances. This produces divergences in the integrals that express these observables. This on its own means that quantum field theory is only approximately correct and should be replaced by string theory or something else at ultra-short distances. But quantum field theory instead uses a procedure called renormalization that achieves finiteness by adding so-called counterterms, which are assumed to have the same functional form as the original terms in the model. They are then added with coefficients that ensure cancellation of the divergences.
Now, this sounds very similar to the transition between the physical measure P and the pricing measure Q in financial theory. In both cases, inconsistency of the model with the data is explained away by insisting that the unseen things should be structurally the same as the things that we see. In field theory, the unseen is the physics at ultra-small scales. In finance, the unseen is the pricing measure, as it doesn't exist in any well-defined sense beyond the very restrictive assumptions of classical finance, which we will discuss in our next video. Okay. Now, let's move back from physics to finance.
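On the finance side, the statement that dynamics under P and Q have the same form and differ only through the market price of risk can be written out for the simplest textbook case. The sketch below assumes geometric Brownian motion with constant drift \mu, volatility \sigma and risk-free rate r; these symbols, and the symbol \lambda for the market price of risk, are notation chosen for this illustration rather than notation fixed by the lecture.

```latex
\begin{align}
  dS_t &= \mu S_t\,dt + \sigma S_t\,dW_t^{\mathbb{P}}
       && \text{(physical measure } \mathbb{P}\text{)} \\
  dS_t &= r S_t\,dt + \sigma S_t\,dW_t^{\mathbb{Q}}
       && \text{(pricing measure } \mathbb{Q}\text{)} \\
  \lambda &= \frac{\mu - r}{\sigma},
  \qquad dW_t^{\mathbb{Q}} = dW_t^{\mathbb{P}} + \lambda\,dt
\end{align}
```

Substituting the last relation into the first equation turns the drift \mu into \mu - \sigma\lambda = r, which reproduces the second equation. The functional form is untouched, and all of the difference between the two measures is absorbed into the single free parameter \lambda.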
More specifically, let's turn to option trading and the risks of this business. Now, there exist multiple origins of risk in options. One of them is the play of demand forces in the options market itself. In a very interesting paper from 2007 called "Demand-Based Option Pricing", Garleanu, Pedersen and Poteshman looked at it from the modeling perspective. They looked at the problem of option pricing from the point of view of an option market maker, that is, an option dealer. Option prices then become functions of demand pressure in the market. In other words, the prices should be such that dealers who maximize their utility supply precisely the quantities of options that the end users of options demand. Now, these authors found that a marginal increase in the demand pressure in an option increases its price by an amount proportional to the variance of the unhedgeable part of the option.
This sounds very similar to the net effect of risk in the model that we considered in the previous course. This model, which I called the QLBS model as a short name for Q-Learning for the Black-Scholes problem, deals with discrete-time hedging for an option. As we discussed in the previous course, unwinding the continuous-time limit of the original Black-Scholes model is the simplest possible way to remove the assumption of a perfect hedge made in the [inaudible] Black-Scholes model. The net result of such unwinding is that the option price receives a risk premium that, to a first approximation, is proportional to a sum of variances of the hedge portfolio across the hedging times. This is similar to the effect obtained by Garleanu and co-authors.
Now, the topic of this lesson is to talk about other origins of risk in option pricing and trading, as well as about interconnections between the option and stock markets. Let's start with a diagram that we discussed many times in this specialization, namely a diagram that shows the interaction of an agent with the environment.
Reinforcement learning solves the task of sequential decision-making that has to optimize some goals expressed via a cumulative reward function. We call such tasks action tasks, and they involve a perception task as an intermediate step, because optimizing for the goal now involves planning and forecasting into the future. So, the agent observes the state S_t of the environment and then performs an action A_t. As a result, the agent gets a reward R_t while the system moves to a new state S_{t+1}. Now, in the general setting of reinforcement learning, the probabilities of transitions to new states S_{t+1} may depend on the action A_t taken by the agent. This is called the feedback loop. It means that actions of the agent may impact the future, but of course they do not have to. Models where actions of an agent do not impact the future can simply be viewed as a special case of models where such impact is present.
Now, we can view option trading using the same framework. In option trading, the role of an agent is played by a trading robot or a human option trader. The agent observes the market price S_t and makes a trade, or action, A_t. If we consider an agent that has already sold or bought an option, then by action A_t we mean trading in the underlying stock. Clearly, the task of the agent in this case involves planning and forecasting into the future. There are also feedback effects in option markets and stock markets. For example, delta-hedging activities of large option holders can potentially, and partially, move the market. We will talk more about this later in this lesson. In addition to feedback effects, there are other market imperfections, such as transaction costs, holding costs and liquidity effects.
Now, the standard approach of mathematical finance tremendously simplifies this whole task, albeit at a substantial cost.
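The agent-environment loop with and without feedback can be sketched in a few lines. This is a toy illustration only: the class `StockEnvironment`, the linear price-impact coefficient `impact`, and the one-step profit-and-loss reward are all hypothetical choices made for this example, not the model used in the course.

```python
import random


class StockEnvironment:
    """Toy market: the next price depends on noise and, optionally, on the agent's trade."""

    def __init__(self, price=100.0, volatility=0.01, impact=0.0, seed=0):
        self.price = price
        self.volatility = volatility
        self.impact = impact  # hypothetical linear price-impact coefficient
        self.rng = random.Random(seed)

    def step(self, action):
        """Apply the trade `action` (shares bought) and return (next_state, reward)."""
        old_price = self.price
        shock = self.volatility * self.rng.gauss(0.0, 1.0)
        # Feedback loop: with impact > 0 the transition to S_{t+1} depends on A_t.
        self.price = old_price * (1.0 + shock + self.impact * action)
        # Illustrative reward only: one-step profit and loss of the traded shares.
        reward = action * (self.price - old_price)
        return self.price, reward


def run_episode(policy, impact, n_steps=10, seed=0):
    """Run the observe, act, reward, new-state loop; return (price path, total reward)."""
    env = StockEnvironment(impact=impact, seed=seed)
    state = env.price
    path = [state]
    total_reward = 0.0
    for _ in range(n_steps):
        action = policy(state)            # agent observes S_t and chooses A_t
        state, reward = env.step(action)  # environment returns S_{t+1} and R_t
        total_reward += reward
        path.append(state)
    return path, total_reward


if __name__ == "__main__":
    always_buy = lambda price: 1.0
    always_sell = lambda price: -1.0
    for impact in (0.0, 0.001):
        buy_path, _ = run_episode(always_buy, impact)
        sell_path, _ = run_episode(always_sell, impact)
        print("impact", impact, "same price path:", buy_path == sell_path)
```

With `impact=0.0` the price path is the same whatever the agent does, which is the special case without feedback. With any positive `impact`, the two policies produce different paths from the same random shocks.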
Here, by the standard approach I mean the so-called risk-neutral pricing methods, which can be viewed as straightforward modifications of the Black-Scholes model that treat volatility as non-constant or stochastic. If you have to optimally hedge an existing option position, then in this approach planning amounts to sticking to the delta-hedging strategy, and the forecasting task amounts to forecasting the stock volatility or implying it from market prices. Feedback effects from trading in options or in the option underlying are neglected. Therefore, there is no feedback loop in this setting. Other market imperfections, such as transaction costs or limited stock liquidity, are also neglected. Option pricing and hedging in such models amount to solving partial differential equations, or PDEs, that describe option prices and generalize the Black-Scholes PDE.
Now, in the previous course, we discussed how this setting can be modified to make it amenable to reinforcement learning. It turns out that a key step is to abandon the continuous-time formulation and go back to a discrete-time formulation. It is very natural to consider time steps that correspond to actual re-hedging frequencies for an option. If we re-hedge daily, we should use daily time steps, and so on. The fact that we keep time steps finite makes perfect hedging impossible, and this step is crucial. After this step is made, all other improvements can be added incrementally.
Now, what would such improvements be? Well, beyond using discrete time steps and relying on Q-learning instead of the log-normal model of Black-Scholes, or any other model for that matter, we kept all other assumptions of the Black-Scholes model intact in this setting. In particular, we neglected transaction costs in our model specification in the previous course. Under certain market conditions, this is a reasonable assumption, which is the same as in the original Black-Scholes model.
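To see why finite time steps make a perfect hedge impossible, here is a minimal simulation sketch. It delta-hedges a short call at a fixed number of dates using the Black-Scholes delta and reports the standard deviation of the terminal hedging error. Note that it uses the plain Black-Scholes delta rather than the QLBS hedge from the previous course, and every parameter value is a hypothetical choice for this example.

```python
import math
import random
import statistics


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price_and_delta(s, strike, rate, sigma, tau):
    """Black-Scholes call price and delta for time to maturity tau > 0."""
    d1 = (math.log(s / strike) + (rate + 0.5 * sigma**2) * tau) / (sigma * math.sqrt(tau))
    d2 = d1 - sigma * math.sqrt(tau)
    price = s * norm_cdf(d1) - strike * math.exp(-rate * tau) * norm_cdf(d2)
    return price, norm_cdf(d1)


def hedging_error_std(n_steps, n_paths=2000, s0=100.0, strike=100.0,
                      rate=0.0, mu=0.05, sigma=0.2, maturity=1.0, seed=0):
    """Std. dev. of terminal P&L for a short call delta-hedged at n_steps dates."""
    rng = random.Random(seed)
    dt = maturity / n_steps
    errors = []
    for _ in range(n_paths):
        s = s0
        price, delta = bs_call_price_and_delta(s, strike, rate, sigma, maturity)
        cash = price - delta * s  # sell the option, buy delta shares
        for step in range(1, n_steps + 1):
            z = rng.gauss(0.0, 1.0)
            # The stock evolves under the real-world drift mu between hedging dates.
            s *= math.exp((mu - 0.5 * sigma**2) * dt + sigma * math.sqrt(dt) * z)
            cash *= math.exp(rate * dt)
            if step < n_steps:  # re-hedge at every date except maturity
                _, new_delta = bs_call_price_and_delta(
                    s, strike, rate, sigma, maturity - step * dt)
                cash -= (new_delta - delta) * s
                delta = new_delta
        # Terminal error: hedge portfolio value minus the option payoff owed.
        errors.append(cash + delta * s - max(s - strike, 0.0))
    return statistics.pstdev(errors)


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, "hedging dates -> error std:", hedging_error_std(n))
```

The error shrinks as the number of hedging dates grows, but it stays strictly positive for any finite number of steps. That residual risk is exactly what the continuous-time limit assumes away.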
From the point of view of modelling, this produces a big simplification. The reason is that, without transaction costs, our system has no memory of past states. If trading in the stock is costless, we do not have to know how many stocks we held at the previous time step in order to optimally hedge at the current time step. We can view each optimal hedge as equivalent to buying the full amount of stock needed for the optimal hedge anew every time. Because stock trading is assumed to be costless, holding N stocks with market price S_t is completely equivalent to holding the amount N times S_t of cash. Now, from the perspective of reinforcement learning, it means that we do not have to keep the stock holding X_t as a part of the state vector in this case. Therefore, in the simplified setting that we considered in the previous course, the state vector was made of only one number S_t, that is, the stock price. On the other hand, actions A_t are actual values of the delta hedge, that is, the amount of stock in the option replicating portfolio.
We also neglected feedback effects from trading [inaudible] underlying. Again, this is justified for a small option player, in the same way as in the Black-Scholes model, but it might be an inaccurate assumption for a big option player or in a weaker market. This brings us to the topic of this lesson, where we will talk about modelling market imperfections for option pricing and hedging. In our next video, we'll talk about the first item in this big topic, which is something called market liquidity.
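The argument for dropping the stock holding from the state vector can be checked with a small numerical sketch. The function below and its proportional `cost_rate` parameter are hypothetical and only meant to illustrate the point: two portfolios with the same total value but different stock holdings end up identical after re-hedging when trading is costless, and different once a transaction cost is switched on.

```python
def wealth_after_rehedge(cash, prev_holding, new_holding, price, cost_rate=0.0):
    """Portfolio value after moving the stock position from prev_holding to new_holding.

    cost_rate is a hypothetical proportional transaction cost per unit of traded value.
    """
    traded_value = abs(new_holding - prev_holding) * price
    wealth_before = cash + prev_holding * price
    return wealth_before - cost_rate * traded_value


if __name__ == "__main__":
    price = 50.0
    target = 0.6  # hedge position required at the current step
    for cost_rate in (0.0, 0.01):
        # Same total value of 100.0, held either entirely in cash or entirely in stock.
        from_cash = wealth_after_rehedge(100.0, 0.0, target, price, cost_rate)
        from_stock = wealth_after_rehedge(0.0, 2.0, target, price, cost_rate)
        print("cost_rate", cost_rate, "->", from_cash, from_stock)
```

With `cost_rate=0.0` the previous holding is irrelevant, so the price S_t alone is enough as a state. With a positive `cost_rate` the outcome depends on the previous holding, which is why models with transaction costs need X_t in the state vector.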