Simply put, each agent starts with some belief about the strategy of the other agent, each agent best responds to that belief, each agent updates their beliefs based on what they observed in this iteration of the game, and the game repeats. As I said, this is a very general regime, and in fact it is the general regime of model-based learning: we maintain a model of the other agent, best respond to it, and update the model over time. Fictitious play is a special case where the model is simply a count of the other agent's actions so far, and you take those counts as their current mixed strategy. A little more formally, let's assume that w(a) is the number of times the opponent played action a in the past, with some initial values that are non-zero. You then ascribe to the opponent the mixed strategy that plays each action a with probability proportional to w(a), and you best respond to that mixed strategy. That's a very straightforward, simple procedure. There's something a little paradoxical going on here, because each agent (we'll restrict attention to two agents here) is always playing a pure strategy, and yet each is always modeling the other as playing a mixed strategy. Be that as it may, we should also note that you need to worry about edge cases such as tie breaking: what happens if two actions have been played an equal number of times in the past?

Here's an example of how it might work, in the context of matching pennies. Matching pennies, again: two players, each choosing heads or tails. If they both choose the same, the first player wins; if they choose differently, the second player wins. Let's assume that these are the initial frequencies the players have in mind. Player 1's beliefs about player 2 are that player 2 played heads with a frequency of 1.5 and tails with a frequency of 2, and these are player 2's beliefs about player 1. Now it's round 1; what should they do? Well, player 1 wants to best respond to his beliefs.

He believes that this is the distribution of player 2's play, and he wants to match. Playing tails is the best response to the mixed strategy he ascribes to player 2, so he's going to play tails. What about player 2? Player 2 has these beliefs, and he wants to mismatch. Since he believes that player 1 will play heads with greater probability than tails, he too is going to play tails.

That ends the first stage; let's move on to the next one. These are the updated beliefs of the players: player 1 observed player 2 playing tails, so he increases that count from 2 to 3, and player 2 likewise updates his beliefs about what player 1 will do. So what do they do now? Player 1 still wants to match player 2, and he now believes even more strongly that player 2 will play tails with greater probability, so he plays tails again. Player 2, on the other hand, now believes that player 1 will play tails with greater probability; player 2 wants to mismatch, so he will now play heads. If you continue this calculation, you can persuade yourself that the play will proceed in this fashion. This is how fictitious play unfolds.
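The dynamics just described can be sketched in a few lines of code. The initial counts for player 1's beliefs (1.5 for heads, 2 for tails) come from the example above; the counts for player 2's beliefs (2 for heads, 1.5 for tails) are an assumption chosen to be consistent with the round-by-round play described above.

```python
# Fictitious play in matching pennies, following the example above.
# Player 1 wants to match, player 2 wants to mismatch.

def best_response(beliefs, want_match):
    """Best respond to the opponent's empirical mixed strategy.
    beliefs: dict mapping action -> count of times the opponent played it."""
    likely = max(beliefs, key=beliefs.get)   # opponent's more likely action
    if want_match:
        return likely                        # match it
    return "T" if likely == "H" else "H"     # mismatch it

# Initial non-zero counts: player i's beliefs about the other player.
# (Player 2's initial counts are an assumption for illustration.)
beliefs1 = {"H": 1.5, "T": 2.0}   # player 1's counts of player 2's actions
beliefs2 = {"H": 2.0, "T": 1.5}   # player 2's counts of player 1's actions

history = []
for t in range(7):
    a1 = best_response(beliefs1, want_match=True)
    a2 = best_response(beliefs2, want_match=False)
    history.append((a1, a2))
    beliefs1[a2] += 1    # player 1 observed player 2's action
    beliefs2[a1] += 1    # player 2 observed player 1's action

print(history)
```

Running this reproduces the sequence described above: both players open with tails, then player 2 switches to heads while player 1 stays with tails for a while, and so on, with the runs of heads and tails ebbing and flowing.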

Now, notice something interesting: the strategies themselves don't converge. If you were to continue to play this out, you would see the T's and H's on both sides ebb and flow, but a certain balance emerges over time. In fact, in this game the long-term averages do converge: each agent ends up playing tails and heads with equal probability, 0.5. We call this the empirical frequency of play. Now notice that in matching pennies, 0.5 is also the unique Nash equilibrium.
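To see the empirical frequencies converge, one can run the same dynamics for many rounds and measure how often each action is actually played (again assuming the same illustrative initial counts):

```python
# Long-run fictitious play in matching pennies: even though each round's
# choice is pure, the empirical frequency of tails approaches 0.5.

def mismatch(a):
    return "T" if a == "H" else "H"

beliefs1 = {"H": 1.5, "T": 2.0}   # player 1's counts of player 2's actions
beliefs2 = {"H": 2.0, "T": 1.5}   # player 2's counts of player 1's actions
tails1 = 0                         # how often player 1 actually played tails

rounds = 100_000
for _ in range(rounds):
    a1 = max(beliefs1, key=beliefs1.get)             # match the likely action
    a2 = mismatch(max(beliefs2, key=beliefs2.get))   # mismatch it
    tails1 += (a1 == "T")
    beliefs1[a2] += 1
    beliefs2[a1] += 1

print(tails1 / rounds)   # close to 0.5, the Nash equilibrium frequency
```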

And the question is: is this an accident? The answer is no, and here's the theorem. The theorem says that if the empirical frequencies of the players' play converge in fictitious play, then they converge to a Nash equilibrium of the game. Now, they may not converge in general; that's why fictitious play is not an effective learning procedure in general. But there is a host of conditions under which, even if the play itself doesn't converge, the empirical frequency does. Here are some conditions that are sufficient: if the game is zero-sum; if it is solvable by iterated elimination of strictly dominated strategies; if it is what's called a potential game, which we won't define here; or if it is a 2-by-n game, where one player has only two strategies and the other may have more, with what are called generic payoffs, which we also won't define here. The main thing to take away is that there are some sufficient conditions that guarantee that the empirical frequency of play in fictitious play will converge even if the play itself will not.

Let us now switch to a very different form of learning, a whole class of learning rules known as no-regret learning, which differs in a fundamental way. First of all, the methods themselves will not be model-based: they will not explicitly model what the other agent is doing, but rather will adapt the strategy directly. That's one difference. But perhaps more fundamentally, in this case we won't start with a learning method; we'll start with a criterion for what we want the method to satisfy, namely the no-regret criterion. So what does this say?

We'll say that the regret of an agent at time t, for not having played some strategy s, is this difference: the difference between the payoff he would have gotten had he played strategy s and the payoff he actually got at that time.
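As a small illustration of this definition, here is a sketch of the cumulative regret computation for player 1 (the matcher) in matching pennies; the action sequences below are made up purely for illustration.

```python
# Cumulative regret for not having played pure strategy s throughout,
# summed over the rounds played so far.

def payoff1(a1, a2):
    """Player 1 wins (+1) on a match, loses (-1) on a mismatch."""
    return 1 if a1 == a2 else -1

def regret(my_actions, opp_actions, s):
    """(Payoff s would have earned) minus (payoff actually earned)."""
    actual = sum(payoff1(a1, a2) for a1, a2 in zip(my_actions, opp_actions))
    counterfactual = sum(payoff1(s, a2) for a2 in opp_actions)
    return counterfactual - actual

my_play  = ["H", "H", "T"]   # illustrative sequences, not from the lecture
opp_play = ["T", "H", "T"]
print(regret(my_play, opp_play, "H"))   # -2: always-H would have done worse
print(regret(my_play, opp_play, "T"))   #  0: always-T would have done equally well
```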

That's a natural enough notion. We will now define when a learning rule exhibits no regret: it is when, in the limit, the agent's play does not exhibit any regret. In other words, as you go to the