I often want to differentiate an inverse function. Say I've got a function f. The

derivative of f encodes how wiggling the input affects the output. The derivative

of the inverse function would encode how changes to the output affect the input.

Here's a theorem that I can use to handle this situation. Here is the inverse

function theorem. I'm going to suppose that f is some differentiable function, f

prime is continuous, the derivative is continuous. And the derivative, at some

point, a, is nonzero. In that case, I get the following fantastic conclusion. Then

the inverse function at y is defined for values of y near f of a. So, the function

f is invertible near a. The inverse function is differentiable for inputs near

f of a. And that derivative is continuous for inputs near f of a. And I've even

got a formula for the derivative. The derivative of the inverse function at y is

1 over the original derivative, the derivative of the original function,

evaluated at the inverse function of y. How can I justify a result like that? Why

should something like that be true? One way to think about this is geometrically.
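
For reference, the theorem stated above can be written in symbols. This is only a transcription of the spoken statement: writing f^{-1} for the inverse function and using "near" informally are notational assumptions, not something taken from the lecture slides.

```latex
\textbf{Inverse function theorem.} Suppose $f$ is differentiable, $f'$ is
continuous, and $f'(a) \neq 0$. Then:
\begin{itemize}
  \item $f^{-1}(y)$ is defined for $y$ near $f(a)$;
  \item $f^{-1}$ is differentiable near $f(a)$;
  \item $(f^{-1})'$ is continuous near $f(a)$;
  \item $\displaystyle (f^{-1})'(y) = \frac{1}{f'\bigl(f^{-1}(y)\bigr)}$.
\end{itemize}
```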

Here, I've drawn the graph of just some made-up function, y equals f of x. What

does the graph of the inverse function look like? Well, one way to think about this is

that the inverse function exchanges the roles of the x and y axes, which is the

same as just flipping it over, alright? What was the y-axis is now the x-axis, and what

was the x-axis is now the y-axis. And this graph here is y equals f inverse of x.

This is how you graph the inverse function. Alright.
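
The flipping described above can be sketched in a few lines of Python. The function f below is a hypothetical made-up increasing function chosen only for illustration (it is not the one drawn in the lecture), and the sample points are arbitrary. The idea is just that swapping the coordinates of each point on the graph of f gives a point on the graph of the inverse.

```python
def f(x):
    # Hypothetical made-up increasing function, chosen only for illustration.
    return x**3 + x

# Sample a few inputs and record points (x, f(x)) on the graph of f.
xs = [i / 4 for i in range(-8, 9)]
graph_of_f = [(x, f(x)) for x in xs]

# Exchanging the roles of the axes swaps the coordinates of every point,
# which gives points on the graph of the inverse function.
graph_of_inverse = [(y, x) for (x, y) in graph_of_f]

for point in graph_of_inverse[:3]:
    print(point)
```

Each swapped point (u, v) satisfies f(v) = u, which is exactly what it means for v to be f inverse of u.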

So, let's go back to the original function and if I put down a tangent line to the

curve at some point, let's say that tangent line has slope m. Well, what's the

slope of the tangent line to the inverse function? That would be the derivative of the inverse

function. Well, if I flip over the graph again to look at the graph of the inverse

function, I can put down a tangent line to the graph of the inverse function. And that has

slope 1 over m. If m was the original slope for the tangent line to the original

function, 1 over m is the new slope to the tangent line of the inverse function. Why

1 over m? Well, that makes sense because I got this graph by exchanging the roles of

the x- and y-axes, by flipping the paper over. And that exchanges rise for run,

and run for rise. So, the slope becomes the reciprocal of the old slope. This

slope business is reflected in the notation, dy dx. So, let's suppose that y

is f of x, so x is f inverse of y, supposing that this is an invertible

function. If y is f of x, then f prime of x could be written dy dx. And if x is f

inverse of y, then the derivative of the inverse function at y, well, that's asking

how does changing y change x? I could write that as dx over dy. Well, if you really take

this notation seriously, what it looks like it's saying, is that, dx dy, which is

the derivative of the inverse function, should be 1 over dy dx, right? The

derivative of the inverse function is 1 over the derivative of the original

function. But you have to think about where these derivatives are being

computed. So, maybe you believe that dx dy is 1 over dy dx, it makes sense that if

you exchange the roles of x and y, that takes the reciprocal of the slope of the

line. But where is this wiggling happening, right? dy dx is measuring how

wiggling x affects y. Wiggling around where? Well, let's suppose that I'm

wiggling around a. So, I'm really calculating dy dx when x, say, is at a.

This is the quantity that records how wiggling x near a will affect y. Well

then, where's y wiggling? Well, if x is wiggling around a, y is wiggling around f

of a. So, the derivative on this side is really being calculated at y equals f of

a. And it's really necessary to keep track of where this wiggling is happening in

order to get a valid formula. It's actually easier to think about what's

going on if we just phrase all of these in terms of the Chain rule. So, what do I

know about the inverse function? Well, here's f inverse.

f of f inverse of x is just x. Alright, what does the inverse function do? Whatever

you plug into the inverse function, it outputs whatever you need to plug into f

to get out the thing you plugged into the inverse function. Alright. So, this is

true. Now, if I differentiate both sides, assuming that f and f inverse are

differentiable, then by the Chain rule, what do I get? Well, the derivative of

this composition is the derivative of the outside at the inside times the derivative

of the inside. And that's equal to the derivative of the other side: the

derivative of x is just 1. Now, I'll divide both sides by f prime of f inverse of

x and I get that the derivative of the inverse function of x is 1 over f prime of

f inverse of x. Is that a proof? Absolutely not. The embarrassing truth is

that this argument assumes the differentiability of the inverse function.
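
Written out in symbols, the Chain rule calculation just described looks like the sketch below. The notation f^{-1} for the inverse function is an assumption of this write-up. The second line is valid only if f^{-1} is already known to be differentiable, and the last line also needs f'(f^{-1}(x)) to be nonzero so that the division makes sense.

```latex
\begin{align*}
f\bigl(f^{-1}(x)\bigr) &= x \\
f'\bigl(f^{-1}(x)\bigr)\cdot (f^{-1})'(x) &= 1 \\
(f^{-1})'(x) &= \frac{1}{f'\bigl(f^{-1}(x)\bigr)}
\end{align*}
```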

If this function, f inverse, is differentiable, then the Chain rule can be

applied to it. The Chain rule requires that the functions be differentiable. Now,

if the function is differentiable, then this Chain rule calculation tells me that

the derivative of the inverse function is this quantity. But that's all predicated on

knowing that the inverse function is differentiable. How do we know that? Well,

that's actually the content of this theorem, right? The content of the inverse

function theorem is not really the calculation of the derivative of the

inverse function. It's really just the fact that the inverse function is

differentiable at all. That is a huge deal, and it's not something that we can

just get from the Chain rule. Once we know that the inverse function is

differentiable, then the Chain rule gives us this calculation. But actually

verifying that the inverse function is differentiable is really quite deep;

that's why the inverse function theorem is such a big deal. The Chain rule requires

that the functions I'm applying the Chain rule to be differentiable. In contrast,

the inverse function theorem is asserting the differentiability of the inverse

function. It's really saying much more than just a computation of the derivative

if the derivative exists. It's actually telling me that the derivative exists. I'm

going to have to punt on saying much more about the proof of the inverse function

theorem. But nevertheless, we can now apply the inverse function theorem to some

concrete examples. For example, think about the function f of x equals x squared.

Well, what's the inverse function to this? Let's suppose the domain is just the

nonnegative real numbers. Then the function is invertible on that

domain, and we know the name of the inverse: it's the square root of x. What's

the derivative of the original function? Well, we know that it's 2x, and the

derivative is continuous, and the derivative is not 0 provided that x is

positive. This is all the stuff that we need to apply the inverse function

theorem. Then, we know that the derivative of the inverse function at x is 1 over the

original derivative at the inverse of x. Now, the inverse function is the square

root of x, so that's 1 over f prime of the square root of x, and what's f prime? f

prime is the function that doubles its input. So, that's 1 over 2 times the square root of

x. So, the derivative of the inverse function, the derivative of the square

root function, is 1 over 2 times the square root of x, provided x is bigger than 0, right?
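
As a rough sanity check, the formula above can be compared against a numerical estimate of the derivative of the square root. This is a minimal sketch: the helper names `inverse_derivative` and `central_difference`, the step size h, and the sample points are all assumptions made here for illustration.

```python
import math

def f_prime(x):
    # Derivative of the original function f(x) = x squared.
    return 2.0 * x

def inverse_derivative(y):
    # Inverse function theorem formula: 1 / f'(f^{-1}(y)),
    # where f^{-1} is the square root (valid for y > 0).
    return 1.0 / f_prime(math.sqrt(y))

def central_difference(g, y, h=1e-6):
    # Numerical estimate of g'(y) from a symmetric difference quotient.
    return (g(y + h) - g(y - h)) / (2.0 * h)

for y in (0.25, 4.0, 10000.0):
    formula = inverse_derivative(y)
    estimate = central_difference(math.sqrt, y)
    print(y, formula, estimate)
```

The two printed columns should agree closely at each sample point. That agreement illustrates the formula but does not prove it.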

Just like before, this is a calculation of the derivative of the square root

function. We can also see this numerically. So, the square root of 10,000

is 100, and you might ask, what do you have to take the square root of to get

about 100.1? Say, as a numeric example. Well, think now about the functions that

are involved here. There's the squaring function and the square root function. We

saw the derivative of the square root function is 1 over 2 times the square root of x, and the

derivative of x squared, we already know, is 2x. Where are we evaluating these

functions? Well, I'm evaluating the square root function at 10,000, right? This is at

x equals 10,000. And if I evaluate that derivative at 10,000, that's 1 over 2 times the

square root of 10,000, that's 1 over 200. Where am I evaluating the other function,

the x squared function? Well there, I'm really thinking of 100 as the input, so

I'll evaluate that derivative at 100, and 2x, when x is 100, is 200. And it's not

too surprising, right, that 1 over 200 and 200 are reciprocals of each other, because

I'm calculating derivatives of a function and the inverse function at the

appropriate places. Now, let's try to answer the original question. I'm trying

to figure out, what do I have to take the square root of to get about 100.1? Well,

the ratio here is about 200 between a change in the input and the change in the output. So, if I want the

output to be affected by 0.1, I should try to change the input by about 200 times as

much, and 200 times 0.1 is 20, so I should try to change the input by about 20 and

sure enough, if you take the square root of 10,020, that's awfully close to

100.1. I hope that you'll play around with these numbers. All the conceptual stuff

that we're doing, these theorems, I'm not telling you these theorems to make numbers

boring, right? I'm telling you all these theorems to heighten your appreciation of

the numerical examples.
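
If you want to play around with these numbers, here is a small sketch of the square root estimate from above. The variable names are made up for illustration. The calculation is the linear approximation described in the lecture, not an exact method.

```python
import math

base_input = 10_000.0
base_output = math.sqrt(base_input)   # the square root of 10,000
target_output = 100.1

# Derivative of x squared at the output 100, and of the square root at 10,000.
slope_of_square = 2.0 * base_output
slope_of_sqrt = 1.0 / slope_of_square

# To move the output by 0.1, change the input by about 200 times as much.
input_change = slope_of_square * (target_output - base_output)
estimated_input = base_input + input_change

print("estimated input:", estimated_input)
print("its square root:", math.sqrt(estimated_input))
print("exact input:", target_output ** 2)
```

The estimated input comes out at about 10,020, while the exact input is 100.1 squared, or 10,020.01, so the linear estimate is off by only about 0.01.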