From Optimality to Equilibrium
Lecture 4

Recap | Pareto Optimality | Best Response and Nash Equilibrium | Mixed Strategies

Lecture Overview
  1. Recap
  2. Pareto Optimality
  3. Best Response and Nash Equilibrium
  4. Mixed Strategies

Non-Cooperative Game Theory
What is it?
  - the mathematical study of interaction between rational, self-interested agents
Why is it called non-cooperative?
  - while it's most interested in situations where agents' interests conflict, it's not restricted to these settings
  - the key is that the individual is the basic modeling unit, and that individuals pursue their own interests
  - cooperative/coalitional game theory has teams as the central unit, rather than agents

Defining Games
Finite, n-person game $\langle N, A, u \rangle$:
  - $N$ is a finite set of $n$ players, indexed by $i$
  - $A = A_1 \times \dots \times A_n$, where $A_i$ is the action set for player $i$
      - $a \in A$ is an action profile, and so $A$ is the space of action profiles
  - $u = \langle u_1, \dots, u_n \rangle$, a utility function for each player, where $u_i : A \mapsto \mathbb{R}$
Writing a 2-player game as a matrix:
  - row player is player 1, column player is player 2
  - rows are actions $a \in A_1$, columns are $a' \in A_2$
  - cells are outcomes, written as a tuple of utility values for each player

Prisoner's Dilemma
Prisoner's dilemma is any game

           C       D
    C    a, a    b, c
    D    c, b    d, d

with $c > a > d > b$.
Figure 3.3: Any $c > a > d > b$ define an instance of Prisoner's Dilemma.
To fully understand the role of the payoff numbers we would need to enter into a discussion of utility theory. Here, let us just mention that for most purposes, the analysis of any game is unchanged if the payoff numbers undergo any positive affine transformation.

Games of Pure Competition
Players have exactly opposed interests
  - There must be precisely two players (otherwise they can't have exactly opposed interests)
  - For all action profiles $a \in A$, $u_1(a) + u_2(a) = c$ for some constant $c$
      - Special case: zero sum

Zero-sum games (bearing in mind the earlier comment about positive affine transformations) are more properly called constant-sum games. They represent pure competition; one player's gain must come at the expense of the other player.

As in the case of common-payoff games, we can use an abbreviated matrix form to represent zero-sum games, in which we write only one payoff value in each cell. This value represents the payoff of player 1, and thus the negative of the payoff of player 2. Note, though, that whereas the full matrix representation is unambiguous, when we use the abbreviation we must explicitly state whether this matrix represents a common-payoff game or a zero-sum one.

A classical example of a zero-sum game is the game of matching pennies. In this game, each of the two players has a penny, and independently chooses to display either heads or tails. The two players then compare their pennies. If they are the same then player 1 pockets both, and otherwise player 2 pockets them. The payoff matrix is shown in Figure 3.5.

             Heads   Tails
    Heads      1      −1
    Tails     −1       1

Figure 3.5: Matching Pennies game.

                Rock   Paper   Scissors
    Rock          0     −1        1
    Paper         1      0       −1
    Scissors     −1      1        0

Figure 3.6: Rock, Paper, Scissors game.

Games of Cooperation
Players have exactly the same interests.
  - no conflict: all players want the same things
  - $\forall a \in A,\ \forall i, j,\quad u_i(a) = u_j(a)$

Common-payoff games are also called pure coordination games, since in such games the agents have no conflicting interests; their sole challenge is to coordinate on an action that is maximally beneficial to all.

Because of their special nature, we often represent common-payoff games with an abbreviated form of the matrix in which we list only one payoff in each of the cells. As an example, imagine two drivers driving towards each other in a country without traffic rules, who must independently decide whether to drive on the left or on the right. If the players choose the same side (left or right) they have some high utility, and otherwise they have a low utility. The game matrix is shown in Figure 3.4.

             Left   Right
    Left       1      0
    Right      0      1

Figure 3.4: Coordination game.

General Games: Battle of the Sexes
The most interesting games combine elements of cooperation and competition.

           B       F
    B    2, 1    0, 0
    F    0, 0    1, 2

Figure 3.7: Battle of the Sexes game.
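The definitions of constant-sum and common-payoff games can be checked mechanically. The sketch below is only an illustration: the dictionary representation (action profile mapped to a tuple of utilities) and the function names `is_constant_sum`, `is_common_payoff`, and `classify` are assumptions made for this example, not notation from the lecture. The abbreviated matrices are expanded to full form, so matching pennies lists player 2's payoff as the negative of player 1's.

```python
# Hypothetical representation: each action profile (one action per player)
# maps to a tuple holding one utility value per player.
MATCHING_PENNIES = {
    ("Heads", "Heads"): (1, -1), ("Heads", "Tails"): (-1, 1),
    ("Tails", "Heads"): (-1, 1), ("Tails", "Tails"): (1, -1),
}
COORDINATION = {
    ("Left", "Left"): (1, 1), ("Left", "Right"): (0, 0),
    ("Right", "Left"): (0, 0), ("Right", "Right"): (1, 1),
}
BATTLE_OF_THE_SEXES = {
    ("B", "B"): (2, 1), ("B", "F"): (0, 0),
    ("F", "B"): (0, 0), ("F", "F"): (1, 2),
}


def is_constant_sum(game):
    """Two players, and u_1(a) + u_2(a) equals the same constant c for every profile a."""
    two_players = all(len(payoffs) == 2 for payoffs in game.values())
    distinct_sums = {sum(payoffs) for payoffs in game.values()}
    return two_players and len(distinct_sums) == 1


def is_common_payoff(game):
    """For every profile a and all players i, j: u_i(a) == u_j(a)."""
    return all(len(set(payoffs)) == 1 for payoffs in game.values())


def classify(game):
    """Label a game as pure cooperation, pure competition, or neither."""
    if is_common_payoff(game):
        return "common-payoff"
    if is_constant_sum(game):
        return "constant-sum"
    return "general"
```

Under these assumptions, matching pennies is classified as constant-sum (with c = 0), the coordination game as common-payoff, and Battle of the Sexes as general, matching the three slide categories.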
Analyzing Games
We've defined some canonical games, and thought about how to play them. Now let's examine the games from the outside.
From the point of view of an outside observer, can some outcomes of a game be said to be better than others?
  - we have no way of saying that one agent's interests are more important than another's
  - intuition: imagine trying to find the revenue-maximizing outcome when you don't know what currency has been used to express each agent's payoff
Are there situations where we can still prefer one outcome to another?

Pareto Optimality
Idea: sometimes, one outcome $o$ is at least as good for every agent as another outcome $o'$, and there is some agent who strictly prefers $o$ to $o'$
  - in this case, it seems reasonable to say that $o$ is better than $o'$
  - we say that $o$ Pareto-dominates $o'$.
An outcome $o^*$ is Pareto-optimal if there is no other outcome that Pareto-dominates it.
  - can a game have more than one Pareto-optimal outcome?
  - does every game have at least one Pareto-optimal outcome?

Pareto Optimal Outcomes in Example Games

            C          D
    C    −1, −1     −4, 0
    D     0, −4     −3, −3

Figure 3.1: The TCP user's (aka the Prisoner's) Dilemma. Your options are the two rows, and your colleague's options are the columns. In each cell, the first number represents your payoff (or, minus your delay), and the second number represents your colleague's payoff.

             Left   Right
    Left       1      0
    Right      0      1

Figure 3.4: Coordination game.

           B       F
    B    2, 1    0, 0
    F    0, 0    1, 2

Figure 3.7: Battle of the Sexes game.

             Heads   Tails
    Heads      1      −1
    Tails     −1       1

Figure 3.5: Matching Pennies game.

(Figures © Shoham and Leyton-Brown, 2006.)
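The Pareto-dominance definition can be applied directly to the example games. The following is a minimal sketch, assuming a dictionary representation of each game (action profile mapped to a tuple of utilities); the names `pareto_dominates` and `pareto_optimal` are hypothetical helpers introduced here for illustration.

```python
# Hypothetical representation: action profile -> one utility per player.
TCP_DILEMMA = {
    ("C", "C"): (-1, -1), ("C", "D"): (-4, 0),
    ("D", "C"): (0, -4),  ("D", "D"): (-3, -3),
}
COORDINATION = {
    ("Left", "Left"): (1, 1), ("Left", "Right"): (0, 0),
    ("Right", "Left"): (0, 0), ("Right", "Right"): (1, 1),
}
BATTLE_OF_THE_SEXES = {
    ("B", "B"): (2, 1), ("B", "F"): (0, 0),
    ("F", "B"): (0, 0), ("F", "F"): (1, 2),
}
MATCHING_PENNIES = {
    ("Heads", "Heads"): (1, -1), ("Heads", "Tails"): (-1, 1),
    ("Tails", "Heads"): (-1, 1), ("Tails", "Tails"): (1, -1),
}


def pareto_dominates(u, v):
    """u is at least as good as v for every agent and strictly better for some agent."""
    at_least_as_good = all(x >= y for x, y in zip(u, v))
    strictly_better = any(x > y for x, y in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_optimal(game):
    """Profiles whose payoff vector is not Pareto-dominated by any other outcome."""
    return [
        profile
        for profile, payoffs in game.items()
        if not any(pareto_dominates(other, payoffs) for other in game.values())
    ]
```

With these payoffs, every outcome of the TCP user's dilemma except (D, D) is Pareto-optimal, the coordination game and Battle of the Sexes each have two Pareto-optimal outcomes, and every outcome of matching pennies is Pareto-optimal. This also answers the first question on the slide: a game can have more than one Pareto-optimal outcome.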
Best Response
If you knew what everyone else was going to do, it would be easy to pick your own action.
  - Let $a_{-i} = \langle a_1, \dots, a_{i-1}, a_{i+1}, \dots, a_n \rangle$.
      - now $a = (a_{-i}, a_i)$
  - Best response: $a_i^* \in BR(a_{-i})$ iff $\forall a_i \in A_i,\ u_i(a_i^*, a_{-i}) \ge u_i(a_i, a_{-i})$

Nash Equilibrium
Now let's return to the setting where no agent knows anything about what the others will do.
What can we say about which actions will occur?
  - Idea: look for stable action profiles.
  - $a = \langle a_1, \dots, a_n \rangle$ is a ("pure strategy") Nash equilibrium iff $\forall i,\ a_i \in BR(a_{-i})$.

Nash Equilibria of Example Games

            C          D
    C    −1, −1     −4, 0
    D     0, −4     −3, −3

Figure 3.1: The TCP user's (aka the Prisoner's) Dilemma.

             Left   Right
    Left       1      0
    Right      0      1

Figure 3.4: Coordination game.

           B       F
    B    2, 1    0, 0
    F    0, 0    1, 2

Figure 3.7: Battle of the Sexes game.

             Heads   Tails
    Heads      1      −1
    Tails     −1       1

Figure 3.5: Matching Pennies game.

(Figures © Shoham and Leyton-Brown, 2006.)

The paradox of Prisoner's dilemma: the Nash equilibrium is the only non-Pareto-optimal outcome!
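The best-response and pure-strategy Nash equilibrium definitions translate into a brute-force check over action profiles. This is a sketch under the assumption that a game is stored as a dictionary from action profiles to utility tuples; `best_responses` and `pure_nash_equilibria` are hypothetical names chosen for this example, and exhaustive enumeration is only practical for small games like the ones on the slide.

```python
# Hypothetical representation: action profile -> one utility per player.
TCP_DILEMMA = {
    ("C", "C"): (-1, -1), ("C", "D"): (-4, 0),
    ("D", "C"): (0, -4),  ("D", "D"): (-3, -3),
}
COORDINATION = {
    ("Left", "Left"): (1, 1), ("Left", "Right"): (0, 0),
    ("Right", "Left"): (0, 0), ("Right", "Right"): (1, 1),
}
BATTLE_OF_THE_SEXES = {
    ("B", "B"): (2, 1), ("B", "F"): (0, 0),
    ("F", "B"): (0, 0), ("F", "F"): (1, 2),
}
MATCHING_PENNIES = {
    ("Heads", "Heads"): (1, -1), ("Heads", "Tails"): (-1, 1),
    ("Tails", "Heads"): (-1, 1), ("Tails", "Tails"): (1, -1),
}


def best_responses(game, player, profile):
    """Actions a_i* with u_i(a_i*, a_-i) >= u_i(a_i, a_-i) for all a_i, holding a_-i fixed."""
    actions = {a[player] for a in game}

    def deviate(action):
        # Same profile, with only this player's action replaced.
        return profile[:player] + (action,) + profile[player + 1:]

    best = max(game[deviate(x)][player] for x in actions)
    return {x for x in actions if game[deviate(x)][player] == best}


def pure_nash_equilibria(game):
    """Profiles in which every player's action is a best response to the others'."""
    n_players = len(next(iter(game)))
    return [
        profile
        for profile in game
        if all(profile[i] in best_responses(game, i, profile) for i in range(n_players))
    ]
```

Run on the four example games, this finds (D, D) as the only equilibrium of the TCP user's dilemma, two equilibria each for the coordination game and Battle of the Sexes, and no pure-strategy equilibrium for matching pennies.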
Mixed Strategies
It would be a pretty bad idea to play any deterministic strategy in matching pennies.
Idea: confuse the opponent by playing randomly.
Define a strategy $s_i$ for agent $i$ as any probability distribution over the actions $A_i$.
  - pure strategy: only one action is played with positive probability
  - mixed strategy: more than one action is played with positive probability
      - these actions are called the support of the mixed strategy
Let the set of all strategies for $i$ be $S_i$.
Let the set of all strategy profiles be $S = S_1 \times \dots \times S_n$.

From the textbook (Section 3.2.2, Strategies in normal-form games): We have so far defined the actions available to each player in a game, but not yet his set of strategies, or his available choices. Certainly one kind of strategy is to select a single action and play it; we call such a strategy a pure strategy, and we will use the notation we have already developed for actions to represent it. There is, however, another, less obvious type of strategy; a player can choose to randomize over the set of available actions according to some probability distribution; such a strategy is called a mixed strategy. Although it may not be immediately obvious why a player should introduce randomness into his choice of action, in fact in a multi-agent setting the role of mixed strategies is critical. We will return to this when we discuss solution concepts for games in the next section.

Utility under Mixed Strategies
What is your payoff if all the players follow mixed strategy profile $s \in S$?
  - We can't just read this number from the game matrix anymore: we won't always end up in the same cell
Instead, use the idea of expected utility from decision theory:

    $$u_i(s) = \sum_{a \in A} u_i(a) \Pr(a \mid s)$$
    $$\Pr(a \mid s) = \prod_{j \in N} s_j(a_j)$$

Best Response and Nash Equilibrium
Our definitions of best response and Nash equilibrium generalize from actions to strategies.
  - Best response: $s_i^* \in BR(s_{-i})$ iff $\forall s_i \in S_i,\ u_i(s_i^*, s_{-i}) \ge u_i(s_i, s_{-i})$
  - Nash equilibrium: $s = \langle s_1, \dots, s_n \rangle$ is a Nash equilibrium iff $\forall i,\ s_i \in BR(s_{-i})$
Every finite game has a Nash equilibrium! [Nash, 1950]
  - e.g., matching pennies: both players play heads/tails 50%/50%

Computing Mixed Nash Equilibria: Battle of the Sexes

           B       F
    B    2, 1    0, 0
    F    0, 0    1, 2

Figure 3.7: Battle of the Sexes game.

It's hard in general to compute Nash equilibria, but it's easy when you can guess the support.
For BoS, let's look for an equilibrium where all actions are part of the support.
Let player 2 play B with $p$, F with $1 - p$.
If player 1 best-responds with a mixed strategy, player 2 must make him indifferent between F and B (why?)

    $$u_1(B) = u_1(F)$$
    $$2p + 0(1 - p) = 0p + 1(1 - p)$$
    $$p = \tfrac{1}{3}$$

Likewise, player 1 must randomize to make player 2 indifferent.
  - Why is player 1 willing to randomize?
Let player 1 play B with $q$, F with $1 - q$.

    $$u_2(B) = u_2(F)$$
    $$q + 0(1 - q) = 0q + 2(1 - q)$$
    $$q = \tfrac{2}{3}$$

Thus the mixed strategies $(\tfrac{2}{3}, \tfrac{1}{3})$, $(\tfrac{1}{3}, \tfrac{2}{3})$ are a Nash equilibrium.

Interpreting Mixed Strategy Equilibria
What does it mean to play a mixed strategy? Different interpretations:
  - Randomize to confuse your opponent
      - consider the matching pennies example
  - Players randomize when they are uncertain about the other's action
      - consider battle of the sexes
  - Mixed strategies are a concise description of what might happen in repeated play: count of pure strategies in the limit
  - Mixed strategies describe population dynamics: 2 agents chosen from a population, all having deterministic strategies.
MS is the probability of getting an agent who will play one PS or another. From Optimality to Equilibrium Lecture 4, Slide 22
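The indifference computation for Battle of the Sexes on Slide 21 can be checked numerically. The following is a minimal sketch, not part of the lecture: the helper names `transpose`, `indifference_prob` and `pure_payoffs` are hypothetical, the payoff tables are copied from the Battle of the Sexes matrix, and the solver assumes a 2x2 game in which a fully mixed equilibrium exists (so the denominator is non-zero).

```python
from fractions import Fraction

# Battle of the Sexes payoffs from the slide, actions ordered (B, F).
# U1[r][c] and U2[r][c]: player 1 plays row r, player 2 plays column c.
U1 = [[2, 0], [0, 1]]
U2 = [[1, 0], [0, 2]]


def transpose(m):
    """Re-index a 2x2 payoff table as [own action][opponent action]."""
    return [[m[j][k] for j in range(2)] for k in range(2)]


def indifference_prob(u):
    """Probability x on the opponent's first action that makes both of a
    player's own actions equally good, where u[k][j] is the payoff of own
    action k against opponent action j. Assumes a fully mixed solution
    exists (non-zero denominator)."""
    # Solve u[0][0]*x + u[0][1]*(1 - x) == u[1][0]*x + u[1][1]*(1 - x).
    return Fraction(u[1][1] - u[0][1], u[0][0] - u[0][1] - u[1][0] + u[1][1])


def pure_payoffs(u, opp):
    """Expected payoff of each own pure action against the opponent's mix."""
    return [sum(opp[j] * u[k][j] for j in range(2)) for k in range(2)]


p = indifference_prob(U1)             # player 2's probability of B
q = indifference_prob(transpose(U2))  # player 1's probability of B

s1 = (q, 1 - q)  # player 1's mixed strategy over (B, F)
s2 = (p, 1 - p)  # player 2's mixed strategy over (B, F)

print("p =", p, " q =", q)
print("player 1 payoffs for pure B, F:", pure_payoffs(U1, s2))
print("player 2 payoffs for pure B, F:", pure_payoffs(transpose(U2), s1))
```

Against the opponent's equilibrium mix, each player's two pure actions earn the same expected payoff, so every mixture of them is a best response. This is one way to see why player 1 is willing to randomize: nothing is lost by doing so.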