Notes - ADS HT24, Zero-sum games


Flashcards

Give the payoff matrix for rock, paper, scissors for two players Row and Column.

Then define what is meant by a mixed strategy for a general payoff matrix $A$, and give an expression for the payoff from the perspective of Row.


         rock  paper scissors
rock       0    -1      1
paper      1     0      -1
scissors   -1    1      0

A mixed strategy is where $R$ chooses a row $i$ with probability $p _ i$, and $C$ chooses column $j$ with probability $q _ j$ (and in each, the probabilities sum to one).

Then the expected payoff is

\[\sum_{ij} A_{ij} p_i q_j = p^\top A q\]

What does it mean for a pair of mixed strategies $(p, q)$ for players $R$ and $C$ to be a Nash equilibrium (where $p$ and $q$ are vectors of probabilities of choosing a particular $R$ow / $C$olumn in a payoff matrix)?


Intuitively $p$ and $q$ are the best response to the other, i.e.

  • $p^\top A q \ge r^\top A q$ for every mixed strategy $r$ of $R$
  • $p^\top A q \le p^\top A s$ for every mixed strategy $s$ of $C$.

Suppose we have two players, $R$ow and $C$olumn playing a zero-sum game with payoff matrix $A$. We have that

\[\mathbb E[\text{expected payoff for A}] = p^\top A q = -\mathbb E[\text{expected payoff for B}]\]

For mixed strategies $p$ and $q$. Mathematically, how can you write the best payoff $R$ can expect if it chooses first, and how can you write the best payoff $R$ can expect if it chooses second?

In this context, state Von Neumann’s theorem.


  • $R$ chooses first: payoff $\max _ p \min _ q p^\top A q$
  • $R$ chooses second (i.e. $C$ chooses first): payoff $\min _ q \max _ p p^\top A q$

Von Neumman’s theorem says that choosing first or second has no effect on the expected payoff, i.e.

\[\max_p \min_q p^\top A q = \min_q \max_p p^\top A q\]

Suppose we have two players, $R$ow and $C$olumn playing a zero-sum game with payoff matrix $A$. We have that

\[\mathbb E[\text{expected payoff for A}] = p^\top A q = -\mathbb E[\text{expected payoff for B}]\]

For mixed strategies $p$ and $q$. Mathematically, we can write the best payoff $R$ can expect if it chooses first, and the best payoff $R$ can expect if it chooses second as follows:

  • $R$ chooses first: payoff $\max _ p \min _ q p^\top A q$
  • $R$ chooses second (i.e. $C$ chooses first): payoff $\min _ q \max _ p p^\top A q$

In this context, Von Neumman’s theorem states that choosing first or second has no effect on the expected payoff, i.e.

\[\max_p \min_q p^\top A q = \min_q \max_p p^\top A q\]

Quickly prove this result (you can assume $A _ {ij}> 0$).


The strategy is to formulate each side as a linear programming problem, and then magically they are both duals of one another, which implies the result by strong duality.

For $\max _ p \min _ q p^\top A q$:

Note that

\[p^\top A q = \sum_j q_j \left(\sum_i p_i A_{ij}\right)\]

Hence

\[\min_q p^\top A q = \min_q \sum_j q_j \left(\sum_i p_i A_{ij}\right)\]

Since $q$ is a vector of probabilities, it’s always going to be best to pick $q _ j$ to be $1$ for the $j$ such that $\sum _ i p _ i A _ {ij}$ is minimised, and $0$ for the other entires in $q$. So actually

\[\min_q p^\top A q = \min_j \sum_i p_i A_{ij}\]

Then we can write the LP as follows:

\[\begin{aligned} \max \quad& \left(\min_j \sum_i p_i A_{ij}\right) \\\\ \text{s.t.} \quad&p_1 + \cdots + p_m \le 1 \\\\ \quad&p_1, \ldots, p_m \ge 0 \end{aligned}\]

(why do the probabilities sum to less than $1$, rather than just having the constraint for $=1$? This is to make it easier to notice duality later, and at the optimal solution it must be the case that $p _ 1 + \cdots + p _ m = 1$ anyway, since $A _ {ij} > 0$ for each).

To remove the $\min$ from the objective function, we introduce a variable $z$ such that $z \le \min _ j \sum _ i p _ i A _ {ij}$, and at the optimal solution we have $z = \min _ j \sum _ i p _ i A _ {ij}$. This can be done by setting $z \le \sum _ i p _ i A _ {ij}$ for each $j$, i.e.

\[\begin{aligned} \max \quad& z \\\\ \text{s.t.} \quad& z \le \sum_i p_i A_{ij} \quad \forall j\\\\ \quad&p_1 + \cdots + p_m \le 1 \\\\ \quad&z, p_1, \ldots, p_m \ge 0 \end{aligned}\]

For $\min _ q \max _ p p^\top A q$, we can similarly derive that this is equivalent to the LP

\[\begin{aligned} \min \quad& y \\\\ \text{s.t.} \quad& y \ge \sum_j A_{ij} q_j \quad \forall i\\\\ \quad&q_1 + \cdots + q_n \ge 1 \\\\ \quad&y, q_1, \ldots, q_n \ge 0 \end{aligned}\]

These two LPs form a primal-dual pair, so by strong duality we can conclude:

\[\max_p \min_q p^\top A q = \min_q \max_p p^\top A q\]

A proof they form a primal-dual pair:

Write the first LP as follows for clarity:

\[\begin{aligned} \max \quad& 1 z + 0 p_1 + 0 p_2 + \cdots + 0 p_n \\\\ \text{s.t.} \quad& z - \sum_i p_i A_{ij}\le 0 \quad \forall j\\\\ \quad&p_1 + \cdots + p_m \le 1 \\\\ \quad&z, p_1, \ldots, p_m \ge 0 \end{aligned}\]

Introduce variables $q _ 1, \cdots, q _ n, y \ge 0$. Then multiply each inequality (apart from the last one) the corresponding dual variable. Sum all of these up, call the LHS $L$ and the RHS $\text{Bound}$. Then

\[\begin{aligned} L &= \sum^n_{j=1} q_j (\mathbf{z} - \sum_i p_i A_{ij}) + y \sum^m_{i = 1} \mathbf{p_i} \\\\ \text{Bound} &= y \end{aligned}\]

The primal variables have been put in bold. We want to rearrange $L$ so that we see the coefficient of each primal variable:

\[L = \mathbf{z} \left( \sum^n_{j=1} q_j \right) + \sum^m_{i=1} \mathbf{p_i} \left( y - \sum^n_{j=1} q_j A_{ij} \right)\]

Then we want to match these up with the coefficients of the primal variables in the dual to get the constraints:

\[\begin{aligned} &\sum^n_{j = 1} q_j \ge 1 \\\\ &y - \sum^n_{j=1} q_j A_{ij} \ge 0 \end{aligned}\]

Putting this all together, and minimising $\text{Bound}$, we get:

\[\begin{aligned} \min \quad& y \\\\ \text{s.t.} \quad& y \ge \sum_j A_{ij} q_j \quad \forall i\\\\ \quad&q_1 + \cdots + q_n \ge 1 \\\\ \quad&y, q_1, \ldots, q_n \ge 0 \end{aligned}\]

Consider the following generalisation of finding a Nash equilibrium for a two player zero-sum game:

Alice participates in two zero-sum games against Bob and Charlie, and she is constrained to use the same mixed strategy in both games. Given two $n \times m$ matrices $B$ and $C$, Alice, Bob are Charlie play the following game. Alice chooses a probability distribution $\pmb p = (p _ 1, \cdots, p _ n)$ and then Bob and Charlie choose probability distributions $\pmb q = (q _ 1, \cdots, q _ m)$ and $\pmb r = (r _ 1, \cdots, r _ m)$ respectively. Alice samples a row (to be used in both games) according to $\pmb p$, and then Bob and Charlie sample columns according to $\pmb q$ and $\pmb r$. Then Alice’s expected payoff is:

\[A(\pmb p, \pmb q, \pmb r) = \sum^n _ {i = 1} \sum^m _ {j=1} p _ i B _ {ij} q _ j + \sum^n _ {i = 1} \sum^m _ {j = 1} p _ i C _ {ij} r _ j\]

Bob’s payoff is

\[B(\pmb p, \pmb q) = -\sum^n _ {i = 1} \sum^m _ {j = 1} p _ i B _ {ij} q _ j\]

and Charlie’s payoff is

\[C(\pmb p, \pmb r) = -\sum^n _ {i = 1}\sum^m _ {j = 1} p _ i C _ {ij} r _ j\]

A triple of strategies $(\pmb p, \pmb q, \pmb r)$ is a Nash equilbrium of this game if each agent playes his or her best response to other agents’ strategies:

\[A(\pmb p, \pmb q, \pmb r) \ge A(\pmb p', \pmb q, \pmb r)\]

for any probabilitiy distribution $\pmb p’$ over $\{1, \cdots, n\}$, and similarly $B(\pmb p, \pmb q) \ge B(\pmb p, \pmb q’)$ for any probability distribution $\pmb q’$ and $C(\pmb p, \pmb r) \ge C(\pmb p, \pmb r’)$ for any probability distribution $\pmb r’$.

Quickly prove that for any choice of $n \times m$ matrices $B$ and $C$, this game has a Nash equilibrum. It is useful to remember complementary slackness for LPs and their dual: if $\pmb x$ and $\pmb y$ are optimal solutions to the primal and dual, then:

  • For each $i = 1, \cdots, n$, if $A _ {(i)} \pmb x < b _ i$, then $y _ i = 0$
  • For each $j = 1, \cdots, m$, if $\pmb y^\top A^{(j)} > c _ j$, then $x _ j = 0$

\[\min_{j=1,\ldots,m} \sum_{i=1}^{n} p_i B_{ij} + \min_{j=1,\ldots,m} \sum_{i=1}^{n} p_i C_{ij}\]

This can be captured by the following linear program:

\[\begin{align*} &\text{maximise} \quad z + z' \\\\ &\text{subject to} \\\\ & \quad z \leq \sum_{i=1}^{n} p_i B_{ij} \quad \text{for } j = 1, \ldots, m \\\\ & \quad z' \leq \sum_{i=1}^{n} p_i C_{ij} \quad \text{for } j = 1, \ldots, m \\\\ & \quad \sum_{i=1}^{n} p_i = 1 \\\\ & \quad z, z', p_1, \ldots, p_n \geq 0 \end{align*}\]

The dual to this program can be written as:

\[\begin{align*} &\text{minimise} \quad y \\\\ &\text{subject to} \\\\ & \quad y - \sum_{j=1}^{m} (q_j B_{ij} + r_j C_{ij}) \geq 0 \quad \text{for } i = 1, \ldots, n \\\\ & \quad \sum_{j=1}^{m} q_j \leq 1 \\\\ & \quad \sum_{j=1}^{m} r_j \leq 1 \\\\ & \quad q_1, r_1, \ldots, q_m, r_m \geq 0 \end{align*}\]

Let $(\mathbf{p}, z, z’)$ be an optimal solution to the primal LP and let $(\mathbf{q}, \mathbf{r}, y)$ be an optimal solution to the dual LP, where $\mathbf{p} = (p _ 1, \ldots, p _ n)$, $\mathbf{q} = (q _ 1, \ldots, q _ m)$, $\mathbf{r} = (r _ 1, \ldots, r _ m)$. We claim that the triple $(\mathbf{p}, \mathbf{q}, \mathbf{r})$ forms a Nash equilibrium. To see this, note first that

\[z = \min_{j=1,\ldots,m} \sum_{i=1}^{n} p_i B_{ij}, \quad z' = \min_{j=1,\ldots,m} \sum_{i=1}^{n} p_i C_{ij}.\]

By complementary slackness $q _ j > 0$ implies

\[\sum_{i=1}^{n} p_i B_{ij} = z = \min_{j=1,\ldots,m} \sum_{i=1}^{n} p_i B_{ij},\]

(here we actually use the contrapositive of complementary slackness: $q _ j \ne 0$ implies that $\sum _ {i=1}^{n} p _ i B _ {ij} \not> z$, i.e. $\sum _ {i=1}^{n} p _ i B _ {ij} = z$).

This says that $\mathbf{q}$ only assigns positive probability to columns that minimize the payoff of Alice (and hence maximize the payoff of Bob). Hence, $\mathbf{q}$ provides a best response to $\mathbf{p}$. By the same argument, $\mathbf{r}$ provides a best response to $\mathbf{p}$. Similarly,

\[y = \max_{i=1,\ldots,n} \sum_{j=1}^{m} (q_j B_{ij} + r_j C_{ij}),\]

so by complementary slackness $p _ i > 0$ implies $\sum _ {j=1}^{m} (q _ j B _ {ij} + r _ j C _ {ij}) = y$, i.e., $\mathbf{p}$ only assigns positive probability to rows that maximize Alice’s payoff. Hence, $\mathbf{p}$ provides a best response to $(\mathbf{q}, \mathbf{r})$.

Suppose we have a payoff matrix $A$ and a $R$ow player and a $C$olumn player. What are the maximin and minimax strategies and why do they form a Nash equilbrium?


  • Maximin is a mixed strategy for $R$, where we pick $p = \text{argmax} _ p \min _ q p^\top Aq$
  • Minimax is a mixed strategy for $C$, where we pick $q = \text{argmin} _ q \max _ q p^\top A q$

The fact that it is a Nash equilibrium can be seen by considering the LP and using complementary slackness – $p _ i$ or $q _ j$ is only non-zero at the minimum or maximum for the other player, so only positive probability is assigned to best choice.




Related posts