Notes - Artificial Intelligence MT24, Perfect information games
Flashcards
Basic definitions
@Define a ply.
One turn by one player.
@Define a zero-sum game.
The total reward is constant for any outcome of the game.
Suppose a game is:
- Two player
- Deterministic
- Perfect information
- Zero-sum
What does a solution look like?
A contingent strategy, specifying a move for every move the opponent makes.
Minimax
Describe the minimax @algorithm for two-player, deterministic, perfect information, zero-sum games, first abstractly and then as pseudocode, assuming $\mathbf{Max}$ goes first.
Compute a value $\text{Minimax}(s)$ for each node $s$, which represents $\mathbf{Max}$’s utility by playing optimally in the sub-game starting at $s$. Then
\[\text{Minimax}(s) = \begin{cases} \text{Utility}(s) &\text{if }s\text{ is a terminal node} \\\\ \max_{a \in \text{Actions}(s)} \text{Minimax}(\text{Result}(s, a)) &\text{if }s \text{ is a }\mathbf{Max} \text{ node} \\\\ \min_{a \in \text{Actions}(s)} \text{Minimax}(\text{Result}(s, a)) &\text{if }s \text{ is a }\mathbf{Min} \text{ node} \end{cases}\]
As pseudocode:
function MinimaxDecision(s) returns an action:
    return a in Actions(s) that maximises MinValue(Result(s, a))

function MaxValue(s) returns a utility value:
    if TerminalTest(s):
        return Utility(s)
    v = -infty
    for a in Actions(s):
        v = max(v, MinValue(Result(s, a)))
    return v

function MinValue(s) returns a utility value:
    if TerminalTest(s):
        return Utility(s)
    v = +infty
    for a in Actions(s):
        v = min(v, MaxValue(Result(s, a)))
    return v
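To make the pseudocode concrete, here is a minimal runnable sketch in Python. The nested-list tree representation (leaves are $\mathbf{Max}$'s utilities, players alternate from the root) is an illustrative assumption, not part of the notes.

```python
# Minimax on a toy game tree represented as nested lists: an internal node is
# a list of child subtrees, a leaf is a number giving Max's utility, and the
# players alternate with Max to move at the root (illustrative representation).

def minimax(node, maximising=True):
    if isinstance(node, (int, float)):  # terminal node: return its utility
        return node
    values = [minimax(child, not maximising) for child in node]
    return max(values) if maximising else min(values)

def minimax_decision(node):
    # Pick the action (child index) that maximises Min's best reply.
    values = [minimax(child, maximising=False) for child in node]
    return max(range(len(values)), key=lambda i: values[i])

# Depth-2 example: branch minima are 3, 2 and 2, so Max plays the first action.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
assert minimax(tree) == 3
assert minimax_decision(tree) == 0
```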
Analyse minimax as a game tree search strategy along the dimensions of:
- Complete?
- Time?
- Space?
- Optimal?
- Complete? Yes, if the tree is finite
- Time? $O(b^m)$
- Space? $O(bm)$
- Optimal? Yes, even against an optimal opponent
Calculate the minimax value of this game:


@example~
Minimax with alpha-beta pruning
Describe the basic idea behind $\alpha$-$\beta$ pruning and explicitly @define $\alpha$ and $\beta$ and their values at the start of the algorithm.
Prune away branches of the search tree that cannot influence the decision.
- $\alpha$: In the current search, the minimum score that the $\mathbf{Max}$ player is assured of. Initially $-\infty$.
- $\beta$: In the current search, the maximum score that the $\mathbf{Min}$ player is assured of. Initially $\infty$.
Describe the $\alpha$-$\beta$ pruning @algorithm.
Recall what $\alpha$ and $\beta$ mean:
- $\alpha$: In the current search, the minimum score that the $\mathbf{Max}$ player is assured of. Initially $-\infty$.
- $\beta$: In the current search, the maximum score that the $\mathbf{Min}$ player is assured of. Initially $\infty$.
The key steps to remember are:
- When minimising:
    - Return if $v \le \alpha$: “If this node is worth less than the minimum score the maximising player is assured of, then the maximising player will steer the game into their better part of the tree anyway, so we might as well forget about this node”.
    - Set $\beta = \min(\beta, v)$.
- When maximising:
    - Return if $v \ge \beta$: “If this node is worth more than the maximum score that they (the minimising player) are assured of, then we might as well forget this node; they will force the game down another branch instead”.
    - Set $\alpha = \max(\alpha, v)$.
function AlphaBetaSearch(s):
    # Pick the action which maximises the minimum assured payoff
    return a in Actions(s) that maximises MinValue(Result(s, a), -infty, infty)

function MinValue(s, alpha, beta) returns a utility value:
    if TerminalTest(s):
        # If we are at a leaf node, return the utility
        return Utility(s)
    # Initially our best minimum score is infinity
    v = infty
    for a in Actions(s):
        # If taking this action results in a better (lower) score, store that
        # as the best so far
        v = min(v, MaxValue(Result(s, a), alpha, beta))
        # alpha is the minimum score that the maximising player is assured of.
        # This condition checks whether it's worth continuing to explore this
        # node: if v is smaller than the minimum score the maximising player
        # is assured of, then the maximising player would instead force the
        # game into another part of the tree, so there's no point searching
        # ahead here
        if v <= alpha:
            return v
        # beta is the maximum score that the minimising player is assured of
        # (they want this as small as possible)
        beta = min(beta, v)
    return v

# The MaxValue function is very similar with a few changes:
# - Swap alpha and beta around
# - Swap min for max
# - Flip inequalities
function MaxValue(s, alpha, beta) returns a utility value:
    if TerminalTest(s):
        return Utility(s)
    v = -infty
    for a in Actions(s):
        v = max(v, MinValue(Result(s, a), alpha, beta))
        if v >= beta:
            return v
        alpha = max(alpha, v)
    return v
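The same routine as runnable Python, reusing the nested-list tree representation from the minimax sketch above (again an illustrative assumption, not the notes' own code):

```python
# Alpha-beta pruning on the nested-list tree representation used above.

def alphabeta(node, alpha=float("-inf"), beta=float("inf"), maximising=True):
    if isinstance(node, (int, float)):  # terminal node
        return node
    if maximising:
        v = float("-inf")
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, maximising=False))
            if v >= beta:   # Min is already assured of something better elsewhere
                return v
            alpha = max(alpha, v)
        return v
    else:
        v = float("inf")
        for child in node:
            v = min(v, alphabeta(child, alpha, beta, maximising=True))
            if v <= alpha:  # Max is already assured of something better elsewhere
                return v
            beta = min(beta, v)
        return v

# Same tree as before: pruning does not change the minimax value.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
assert alphabeta(tree) == 3
```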
The time complexity of minimax is $O(b^m)$ where $b$ is the max branching factor and $m$ is the number of plies. What is the best-case time complexity of $\alpha$-$\beta$ pruning?
With perfect move ordering, $O(b^{m/2})$, which effectively doubles the search depth that is feasible in the same time.
Calculate the minimax value and the values of $\alpha$ and $\beta$ at each node for the following game:
(oops, the $-3$ on the LHS should be a $+3$)


@example~
Full pseudocode: Recall what $\alpha$ and $\beta$ mean
- $\alpha$: In the current search, the minimum score that the $\mathbf{Max}$ player is assured of. Initially $-\infty$.
- $\beta$: In the current search, the maximum score that the $\mathbf{Min}$ player is assured of. Initially $\infty$.
The key steps to remember are:
- When minimising:
    - Return if $v \le \alpha$: “If this node is worth less than the minimum score the maximising player is assured of, then the maximising player will steer the game into their better part of the tree anyway, so we might as well forget about this node”.
    - Set $\beta = \min(\beta, v)$.
- When maximising:
    - Return if $v \ge \beta$: “If this node is worth more than the maximum score that they (the minimising player) are assured of, then we might as well forget this node; they will force the game down another branch instead”.
    - Set $\alpha = \max(\alpha, v)$.
function AlphaBetaSearch(s):
    # Pick the action which maximises the minimum assured payoff
    return a in Actions(s) that maximises MinValue(Result(s, a), -infty, infty)

function MinValue(s, alpha, beta) returns a utility value:
    if TerminalTest(s):
        # If we are at a leaf node, return the utility
        return Utility(s)
    # Initially our best minimum score is infinity
    v = infty
    for a in Actions(s):
        # If taking this action results in a better (lower) score, store that
        # as the best so far
        v = min(v, MaxValue(Result(s, a), alpha, beta))
        # alpha is the minimum score that the maximising player is assured of.
        # This condition checks whether it's worth continuing to explore this
        # node: if v is smaller than the minimum score the maximising player
        # is assured of, then the maximising player would instead force the
        # game into another part of the tree, so there's no point searching
        # ahead here
        if v <= alpha:
            return v
        # beta is the maximum score that the minimising player is assured of
        # (they want this as small as possible)
        beta = min(beta, v)
    return v

# The MaxValue function is very similar with a few changes:
# - Swap alpha and beta around
# - Swap min for max
# - Flip inequalities
function MaxValue(s, alpha, beta) returns a utility value:
    if TerminalTest(s):
        return Utility(s)
    v = -infty
    for a in Actions(s):
        v = max(v, MinValue(Result(s, a), alpha, beta))
        if v >= beta:
            return v
        alpha = max(alpha, v)
    return v
In minimax or $\alpha$-$\beta$ search, it might be infeasible to search the whole game tree. In this case, rather than using $\text{Utility}(s)$, a different function $\text{Eval}(s)$ will be used where $\text{Eval}$ approximates the utility at a node.
Under what conditions on $\text{Eval}$ will optimal decisions be preserved?
Any monotonic transformation of $\text{Utility}$ preserves optimality, i.e. for every pair of nodes $n$ and $n'$,
\[\text{Eval}(n) \le \text{Eval}(n') \iff \text{Utility}(n) \le \text{Utility}(n')\]
@Define a quiescent state.
A state in which $\text{Eval}(s)$ is unlikely to change significantly in the near future.
Expectiminimax
Describe the expectiminimax @algorithm for two-player, zero-sum games with chance nodes abstractly, assuming $\mathbf{Max}$ goes first.
Maximise expected value: behave like minimax at $\mathbf{Max}$ and $\mathbf{Min}$ nodes, and take the probability-weighted average of successor values at chance nodes.
\[\text{Expectiminimax}(s) = \begin{cases} \text{Utility}(s) &\text{if }s \text{ is a terminal node} \\\\ \max_{a \in \text{Actions}(s)} \text{Expectiminimax}(\text{Result}(s, a)) &\text{if }s \text{ is a }\mathbf{Max} \text{ node} \\\\ \min_{a \in \text{Actions}(s)} \text{Expectiminimax}(\text{Result}(s, a)) &\text{if }s \text{ is a }\mathbf{Min} \text{ node} \\\\ \sum_{r \text{ an outcome}} \mathbb P(r)\cdot\text{Expectiminimax}(\text{Result}(s, r)) &\text{if }s \text{ is a }\mathbf{Chance} \text{ node} \end{cases}\]
In minimax or $\alpha$-$\beta$ search, it might be infeasible to search the whole game tree. In this case, rather than using $\text{Utility}(s)$, a different function $\text{Eval}(s)$ will be used, where $\text{Eval}$ approximates the utility at a node.
In this context, any monotonic transformation of $\text{Utility}$ preserves optimality, i.e. for every pair of nodes $n$ and $n'$,
\[\text{Eval}(n) \le \text{Eval}(n') \iff \text{Utility}(n) \le \text{Utility}(n')\]
However, if using the expectiminimax algorithm, what condition must instead be met?
Optimality is only preserved by a positive linear transformation of $\text{Utility}$, i.e. $\text{Eval}(s) = a\,\text{Utility}(s) + b$ with $a > 0$.
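To make the expectiminimax recurrence above concrete, here is a minimal Python sketch. The tagged-tuple tree encoding and the toy game are illustrative assumptions, not from the lectures.

```python
# Expectiminimax on a tree of tagged tuples (an illustrative encoding):
# ("max", [children]), ("min", [children]),
# ("chance", [(probability, child), ...]); a leaf is a numeric utility.

def expectiminimax(node):
    if isinstance(node, (int, float)):  # terminal node
        return node
    kind, children = node
    if kind == "max":
        return max(expectiminimax(c) for c in children)
    if kind == "min":
        return min(expectiminimax(c) for c in children)
    if kind == "chance":  # expected value over chance outcomes
        return sum(p * expectiminimax(c) for p, c in children)
    raise ValueError("unknown node kind: " + kind)

# Max chooses between a sure 2 and a fair coin flip over two Min nodes:
# 0.5 * min(4, 6) + 0.5 * min(1, 3) = 2.5, so Max prefers the gamble.
game = ("max", [
    2,
    ("chance", [(0.5, ("min", [4, 6])), (0.5, ("min", [1, 3]))]),
])
assert expectiminimax(game) == 2.5
```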
@State the time complexity of expectiminimax in terms of:
- $b$, the branching factor for moves
- $n$, the number of possible outcomes at chance nodes
- $m$, the maximum number of moves
$O(b^m n^m)$.
Calculate the expectiminimax value of this game:


@example~
Monte-Carlo tree search
Describe the Monte-Carlo tree search @algorithm.
Simulate random moves to estimate node values; a runnable sketch follows the steps below.
- For each node, store $N _ \text{trials}$ and $N _ \text{wins}$.
- Repeat until time out (SESB):
    - Selection: Start from the initial node and pick successive nodes according to their scores, until reaching a node $n$ with unexpanded children.
    - Expansion: Expand an unexpanded child $c$ of $n$.
    - Simulation: Play a random game starting from $c$.
    - Backpropagation: Propagate the simulation result back up through $c$'s predecessors, updating their trial and win counts.
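A compact runnable sketch of these four steps in Python. The toy game (a Nim-like pile of stones) and all names in it are illustrative assumptions; node scores use the upper-confidence bound defined in the next card.

```python
import math
import random

# MCTS on a tiny Nim-like game (illustrative assumptions): players alternately
# take 1 or 2 stones and whoever takes the last stone wins.
# A state is (stones_left, player_to_move) with players 0 and 1.

GAMMA = 1.4  # exploration parameter

def legal_moves(state):
    stones, _ = state
    return [m for m in (1, 2) if m <= stones]

def play(state, move):
    stones, player = state
    return (stones - move, 1 - player)

def winner(state):
    stones, player = state
    # If no stones remain, the player who just moved took the last stone.
    return (1 - player) if stones == 0 else None

class Node:
    def __init__(self, state, parent=None, mover=None):
        self.state = state        # game state at this node
        self.parent = parent
        self.mover = mover        # player whose move led to this node
        self.children = []
        self.untried = legal_moves(state)
        self.trials = 0           # N_trials
        self.wins = 0.0           # N_wins, from self.mover's point of view

    def ucb(self):
        # Upper-confidence bound: exploitation term + exploration term.
        return (self.wins / self.trials
                + GAMMA * math.sqrt(math.log(self.parent.trials) / self.trials))

def mcts(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded and non-terminal.
        while not node.untried and node.children:
            node = max(node.children, key=Node.ucb)
        # Expansion: add one unexpanded child, if there is one.
        if node.untried:
            move = node.untried.pop()
            child = Node(play(node.state, move), parent=node, mover=node.state[1])
            node.children.append(child)
            node = child
        # Simulation: play random moves from here until the game ends.
        state = node.state
        while winner(state) is None:
            state = play(state, random.choice(legal_moves(state)))
        result = winner(state)
        # Backpropagation: update trial and win counts up to the root.
        while node is not None:
            node.trials += 1
            if node.mover == result:
                node.wins += 1
            node = node.parent
    # The most-visited child of the root is the chosen move.
    return max(root.children, key=lambda c: c.trials).state

# From 4 stones, taking 1 (leaving 3) is the winning move for player 0,
# so with enough iterations this should print (3, 1).
print(mcts((4, 0)))
```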
Suppose we are deciding which node to explore in Monte-Carlo tree search and:
- $N _ \text{trials}$ is the number of simulations for this node
- $N _ \text{wins}$ is the number of wins out of all the simulations
- $T$ is the number of simulations for the parent node
- $\gamma$ is the exploration parameter
In this context, @define the upper-confidence bound method for scoring each node.
Score each node via the formula
\[\frac{N_\text{wins} }{N_\text{trials} } + \gamma \sqrt{\frac{\ln T}{N_\text{trials} } }\]
Show how Monte-Carlo tree search would update this tree assuming that the node at the end of the path highlighted in black is selected, and it leads to a win.


@example~