ADS HT24, Zero-sum games
Flashcards
Give the payoff matrix for rock, paper, scissors for two players Row and Column.
Then define what is meant by a mixed strategy for a general payoff matrix $A$, and give an expression for the payoff from the perspective of Row.
rock paper scissors
rock 0 -1 1
paper 1 0 -1
scissors -1 1 0
A mixed strategy is where $R$ chooses a row $i$ with probability $p _ i$, and $C$ chooses column $j$ with probability $q _ j$ (and in each, the probabilities sum to one).
Then the expected payoff is
\[\sum_{ij} A_{ij} p_i q_j = p^\top A q\]What does it mean for a pair of mixed strategies $(p, q)$ for players $R$ and $C$ to be a Nash equilibrium (where $p$ and $q$ are vectors of probabilities of choosing a particular $R$ow / $C$olumn in a payoff matrix)?
Intuitively $p$ and $q$ are the best response to the other, i.e.
- $p^\top A q \ge r^\top A q$ for every mixed strategy $r$ of $R$
- $p^\top A q \le p^\top A s$ for every mixed strategy $s$ of $C$.
Suppose we have two players, $R$ow and $C$olumn playing a zero-sum game with payoff matrix $A$. We have that
\[\mathbb E[\text{expected payoff for A}] = p^\top A q = -\mathbb E[\text{expected payoff for B}]\]
For mixed strategies $p$ and $q$. Mathematically, how can you write the best payoff $R$ can expect if it chooses first, and how can you write the best payoff $R$ can expect if it chooses second?
In this context, state Von Neumann’s theorem.
- $R$ chooses first: payoff $\max _ p \min _ q p^\top A q$
- $R$ chooses second (i.e. $C$ chooses first): payoff $\min _ q \max _ p p^\top A q$
Von Neumman’s theorem says that choosing first or second has no effect on the expected payoff, i.e.
\[\max_p \min_q p^\top A q = \min_q \max_p p^\top A q\]Suppose we have two players, $R$ow and $C$olumn playing a zero-sum game with payoff matrix $A$. We have that
\[\mathbb E[\text{expected payoff for A}] = p^\top A q = -\mathbb E[\text{expected payoff for B}]\]
For mixed strategies $p$ and $q$. Mathematically, we can write the best payoff $R$ can expect if it chooses first, and the best payoff $R$ can expect if it chooses second as follows:
- $R$ chooses first: payoff $\max _ p \min _ q p^\top A q$
- $R$ chooses second (i.e. $C$ chooses first): payoff $\min _ q \max _ p p^\top A q$
In this context, Von Neumman’s theorem states that choosing first or second has no effect on the expected payoff, i.e.
\[\max_p \min_q p^\top A q = \min_q \max_p p^\top A q\]
Quickly prove this result (you can assume $A _ {ij}> 0$).
The strategy is to formulate each side as a linear programming problem, and then magically they are both duals of one another, which implies the result by strong duality.
For $\max _ p \min _ q p^\top A q$:
Note that
\[p^\top A q = \sum_j q_j \left(\sum_i p_i A_{ij}\right)\]Hence
\[\min_q p^\top A q = \min_q \sum_j q_j \left(\sum_i p_i A_{ij}\right)\]Since $q$ is a vector of probabilities, it’s always going to be best to pick $q _ j$ to be $1$ for the $j$ such that $\sum _ i p _ i A _ {ij}$ is minimised, and $0$ for the other entires in $q$. So actually
\[\min_q p^\top A q = \min_j \sum_i p_i A_{ij}\]Then we can write the LP as follows:
\[\begin{aligned} \max \quad& \left(\min_j \sum_i p_i A_{ij}\right) \\\\ \text{s.t.} \quad&p_1 + \cdots + p_m \le 1 \\\\ \quad&p_1, \ldots, p_m \ge 0 \end{aligned}\](why do the probabilities sum to less than $1$, rather than just having the constraint for $=1$? This is to make it easier to notice duality later, and at the optimal solution it must be the case that $p _ 1 + \cdots + p _ m = 1$ anyway, since $A _ {ij} > 0$ for each).
To remove the $\min$ from the objective function, we introduce a variable $z$ such that $z \le \min _ j \sum _ i p _ i A _ {ij}$, and at the optimal solution we have $z = \min _ j \sum _ i p _ i A _ {ij}$. This can be done by setting $z \le \sum _ i p _ i A _ {ij}$ for each $j$, i.e.
\[\begin{aligned} \max \quad& z \\\\ \text{s.t.} \quad& z \le \sum_i p_i A_{ij} \quad \forall j\\\\ \quad&p_1 + \cdots + p_m \le 1 \\\\ \quad&z, p_1, \ldots, p_m \ge 0 \end{aligned}\]For $\min _ q \max _ p p^\top A q$, we can similarly derive that this is equivalent to the LP
\[\begin{aligned} \min \quad& y \\\\ \text{s.t.} \quad& y \ge \sum_j A_{ij} q_j \quad \forall i\\\\ \quad&q_1 + \cdots + q_n \ge 1 \\\\ \quad&y, q_1, \ldots, q_n \ge 0 \end{aligned}\]These two LPs form a primal-dual pair, so by strong duality we can conclude:
\[\max_p \min_q p^\top A q = \min_q \max_p p^\top A q\]A proof they form a primal-dual pair:
Write the first LP as follows for clarity:
\[\begin{aligned} \max \quad& 1 z + 0 p_1 + 0 p_2 + \cdots + 0 p_n \\\\ \text{s.t.} \quad& z - \sum_i p_i A_{ij}\le 0 \quad \forall j\\\\ \quad&p_1 + \cdots + p_m \le 1 \\\\ \quad&z, p_1, \ldots, p_m \ge 0 \end{aligned}\]Introduce variables $q _ 1, \cdots, q _ n, y \ge 0$. Then multiply each inequality (apart from the last one) the corresponding dual variable. Sum all of these up, call the LHS $L$ and the RHS $\text{Bound}$. Then
\[\begin{aligned} L &= \sum^n_{j=1} q_j (\mathbf{z} - \sum_i p_i A_{ij}) + y \sum^m_{i = 1} \mathbf{p_i} \\\\ \text{Bound} &= y \end{aligned}\]The primal variables have been put in bold. We want to rearrange $L$ so that we see the coefficient of each primal variable:
\[L = \mathbf{z} \left( \sum^n_{j=1} q_j \right) + \sum^m_{i=1} \mathbf{p_i} \left( y - \sum^n_{j=1} q_j A_{ij} \right)\]Then we want to match these up with the coefficients of the primal variables in the dual to get the constraints:
\[\begin{aligned} &\sum^n_{j = 1} q_j \ge 1 \\\\ &y - \sum^n_{j=1} q_j A_{ij} \ge 0 \end{aligned}\]Putting this all together, and minimising $\text{Bound}$, we get:
\[\begin{aligned} \min \quad& y \\\\ \text{s.t.} \quad& y \ge \sum_j A_{ij} q_j \quad \forall i\\\\ \quad&q_1 + \cdots + q_n \ge 1 \\\\ \quad&y, q_1, \ldots, q_n \ge 0 \end{aligned}\]Suppose we have a payoff matrix $A$ and a $R$ow player and a $C$olumn player. What are the maximin and minimax strategies and why do they form a Nash equilbrium?
- Maximin is a mixed strategy for $R$, where we pick $p = \text{argmax} _ p \min _ q p^\top Aq$
- Minimax is a mixed strategy for $C$, where we pick $q = \text{argmin} _ q \max _ q p^\top A q$
The fact that it is a Nash equilibrium can be seen by considering the LP and using complementary slackness – $p _ i$ or $q _ j$ is only non-zero at the minimum or maximum for the other player, so only positive probability is assigned to best choice.