ADS HT24, Zero-sum games


Flashcards

Give the payoff matrix for rock, paper, scissors for two players Row and Column.

Then define what is meant by a mixed strategy for a general payoff matrix $A$, and give an expression for the payoff from the perspective of Row.


         rock  paper scissors
rock       0    -1      1
paper      1     0      -1
scissors   -1    1      0

A mixed strategy is where $R$ chooses a row $i$ with probability $p _ i$, and $C$ chooses column $j$ with probability $q _ j$ (and in each, the probabilities sum to one).

Then the expected payoff is

\[\sum_{ij} A_{ij} p_i q_j = p^\top A q\]

What does it mean for a pair of mixed strategies $(p, q)$ for players $R$ and $C$ to be a Nash equilibrium (where $p$ and $q$ are vectors of probabilities of choosing a particular $R$ow / $C$olumn in a payoff matrix)?


Intuitively $p$ and $q$ are the best response to the other, i.e.

  • $p^\top A q \ge r^\top A q$ for every mixed strategy $r$ of $R$
  • $p^\top A q \le p^\top A s$ for every mixed strategy $s$ of $C$.

Suppose we have two players, $R$ow and $C$olumn playing a zero-sum game with payoff matrix $A$. We have that

\[\mathbb E[\text{expected payoff for A}] = p^\top A q = -\mathbb E[\text{expected payoff for B}]\]

For mixed strategies $p$ and $q$. Mathematically, how can you write the best payoff $R$ can expect if it chooses first, and how can you write the best payoff $R$ can expect if it chooses second?

In this context, state Von Neumann’s theorem.


  • $R$ chooses first: payoff $\max _ p \min _ q p^\top A q$
  • $R$ chooses second (i.e. $C$ chooses first): payoff $\min _ q \max _ p p^\top A q$

Von Neumman’s theorem says that choosing first or second has no effect on the expected payoff, i.e.

\[\max_p \min_q p^\top A q = \min_q \max_p p^\top A q\]

Suppose we have two players, $R$ow and $C$olumn playing a zero-sum game with payoff matrix $A$. We have that

\[\mathbb E[\text{expected payoff for A}] = p^\top A q = -\mathbb E[\text{expected payoff for B}]\]

For mixed strategies $p$ and $q$. Mathematically, we can write the best payoff $R$ can expect if it chooses first, and the best payoff $R$ can expect if it chooses second as follows:

  • $R$ chooses first: payoff $\max _ p \min _ q p^\top A q$
  • $R$ chooses second (i.e. $C$ chooses first): payoff $\min _ q \max _ p p^\top A q$

In this context, Von Neumman’s theorem states that choosing first or second has no effect on the expected payoff, i.e.

\[\max_p \min_q p^\top A q = \min_q \max_p p^\top A q\]

Quickly prove this result (you can assume $A _ {ij}> 0$).


The strategy is to formulate each side as a linear programming problem, and then magically they are both duals of one another, which implies the result by strong duality.

For $\max _ p \min _ q p^\top A q$:

Note that

\[p^\top A q = \sum_j q_j \left(\sum_i p_i A_{ij}\right)\]

Hence

\[\min_q p^\top A q = \min_q \sum_j q_j \left(\sum_i p_i A_{ij}\right)\]

Since $q$ is a vector of probabilities, it’s always going to be best to pick $q _ j$ to be $1$ for the $j$ such that $\sum _ i p _ i A _ {ij}$ is minimised, and $0$ for the other entires in $q$. So actually

\[\min_q p^\top A q = \min_j \sum_i p_i A_{ij}\]

Then we can write the LP as follows:

\[\begin{aligned} \max \quad& \left(\min_j \sum_i p_i A_{ij}\right) \\\\ \text{s.t.} \quad&p_1 + \cdots + p_m \le 1 \\\\ \quad&p_1, \ldots, p_m \ge 0 \end{aligned}\]

(why do the probabilities sum to less than $1$, rather than just having the constraint for $=1$? This is to make it easier to notice duality later, and at the optimal solution it must be the case that $p _ 1 + \cdots + p _ m = 1$ anyway, since $A _ {ij} > 0$ for each).

To remove the $\min$ from the objective function, we introduce a variable $z$ such that $z \le \min _ j \sum _ i p _ i A _ {ij}$, and at the optimal solution we have $z = \min _ j \sum _ i p _ i A _ {ij}$. This can be done by setting $z \le \sum _ i p _ i A _ {ij}$ for each $j$, i.e.

\[\begin{aligned} \max \quad& z \\\\ \text{s.t.} \quad& z \le \sum_i p_i A_{ij} \quad \forall j\\\\ \quad&p_1 + \cdots + p_m \le 1 \\\\ \quad&z, p_1, \ldots, p_m \ge 0 \end{aligned}\]

For $\min _ q \max _ p p^\top A q$, we can similarly derive that this is equivalent to the LP

\[\begin{aligned} \min \quad& y \\\\ \text{s.t.} \quad& y \ge \sum_j A_{ij} q_j \quad \forall i\\\\ \quad&q_1 + \cdots + q_n \ge 1 \\\\ \quad&y, q_1, \ldots, q_n \ge 0 \end{aligned}\]

These two LPs form a primal-dual pair, so by strong duality we can conclude:

\[\max_p \min_q p^\top A q = \min_q \max_p p^\top A q\]

A proof they form a primal-dual pair:

Write the first LP as follows for clarity:

\[\begin{aligned} \max \quad& 1 z + 0 p_1 + 0 p_2 + \cdots + 0 p_n \\\\ \text{s.t.} \quad& z - \sum_i p_i A_{ij}\le 0 \quad \forall j\\\\ \quad&p_1 + \cdots + p_m \le 1 \\\\ \quad&z, p_1, \ldots, p_m \ge 0 \end{aligned}\]

Introduce variables $q _ 1, \cdots, q _ n, y \ge 0$. Then multiply each inequality (apart from the last one) the corresponding dual variable. Sum all of these up, call the LHS $L$ and the RHS $\text{Bound}$. Then

\[\begin{aligned} L &= \sum^n_{j=1} q_j (\mathbf{z} - \sum_i p_i A_{ij}) + y \sum^m_{i = 1} \mathbf{p_i} \\\\ \text{Bound} &= y \end{aligned}\]

The primal variables have been put in bold. We want to rearrange $L$ so that we see the coefficient of each primal variable:

\[L = \mathbf{z} \left( \sum^n_{j=1} q_j \right) + \sum^m_{i=1} \mathbf{p_i} \left( y - \sum^n_{j=1} q_j A_{ij} \right)\]

Then we want to match these up with the coefficients of the primal variables in the dual to get the constraints:

\[\begin{aligned} &\sum^n_{j = 1} q_j \ge 1 \\\\ &y - \sum^n_{j=1} q_j A_{ij} \ge 0 \end{aligned}\]

Putting this all together, and minimising $\text{Bound}$, we get:

\[\begin{aligned} \min \quad& y \\\\ \text{s.t.} \quad& y \ge \sum_j A_{ij} q_j \quad \forall i\\\\ \quad&q_1 + \cdots + q_n \ge 1 \\\\ \quad&y, q_1, \ldots, q_n \ge 0 \end{aligned}\]

Suppose we have a payoff matrix $A$ and a $R$ow player and a $C$olumn player. What are the maximin and minimax strategies and why do they form a Nash equilbrium?


  • Maximin is a mixed strategy for $R$, where we pick $p = \text{argmax} _ p \min _ q p^\top Aq$
  • Minimax is a mixed strategy for $C$, where we pick $q = \text{argmin} _ q \max _ q p^\top A q$

The fact that it is a Nash equilibrium can be seen by considering the LP and using complementary slackness – $p _ i$ or $q _ j$ is only non-zero at the minimum or maximum for the other player, so only positive probability is assigned to best choice.




Related posts