Artificial Intelligence MT24, Partially observable environments
Flashcards
Basic definitions
@Define a belief state.
The set of physical states that the agent could possibly be in, given what it knows so far.
Belief state search for sensorless problems
Describe the process used to convert a sensorless physical search problem into an ordinary search problem, by specifying the
- States
- Initial state
- Actions
- Transition model (in the deterministic and non-deterministic case)
- Goal test
- Step cost
Perform the search over belief state space, which is fully observable.
Belief states: the $2^n$ subsets of $n$ physical states.
Initial state: a set of physical states (for a sensorless problem, typically the set of all physical states, since the agent has no information about where it starts).
Actions: This depends on what happens when an illegal action is attempted. If illegal actions have no effect on the environment, then:
\[\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)\]If illegal actions are not allowed, then:
\[\mathbf{Actions}(b) = \bigcap _ {s \in b} \mathbf{Actions} _ P(s)\]where $\mathbf{Actions} _ P(s)$ is the action function for the original search problem.
Transition model:
In the deterministic case, the result is one belief state given by the action applied to all the underlying physical states in $b$:
\[\mathbf{Result}(b, a) = \bigcup _ {s \in b} \\{\mathbf{Result} _ P(s, a)\\}\]In the non-deterministic case, the result is still one belief state:
\[\mathbf{Results}(b, a) = \bigcup _ {s \in b} \mathbf{Results} _ P(s, a)\](note that each $\mathbf{Results} _ P(s, a)$ is a set of physical states, so the union is still a single belief state, not a set of belief states).
Goal test:
\[\mathbf{GoalTest}(b) = \bigwedge _ {s \in b} \mathbf{GoalTest} _ P(s)\]Step cost: in general the cost of an action could depend on which physical state it is executed in, so assume for simplicity that each action costs the same in every state.
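As a rough Python sketch of this conversion (assuming a hypothetical physical problem object with `actions_p(s)`, `result_p(s, a)`, `results_p(s, a)` and `goal_test_p(s)` methods; the names are illustrative, not from the course):

```python
from functools import reduce

# A belief state is represented as a frozenset of physical states.

def actions(problem, b, illegal_actions_harmless=True):
    """Union of the physical actions if illegal actions leave the world
    unchanged; intersection if illegal actions must never be attempted."""
    per_state = [set(problem.actions_p(s)) for s in b]
    combine = set.union if illegal_actions_harmless else set.intersection
    return reduce(combine, per_state)

def result(problem, b, a):
    """Deterministic actions: apply a to every physical state in b."""
    return frozenset(problem.result_p(s, a) for s in b)

def results_nondeterministic(problem, b, a):
    """Non-deterministic actions: union of all possible outcomes.
    This is still a single belief state."""
    return frozenset(s2 for s in b for s2 in problem.results_p(s, a))

def goal_test(problem, b):
    """The agent must be certain: every state in b must satisfy the goal."""
    return all(problem.goal_test_p(s) for s in b)
```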
Belief state search for partially observable problems
Suppose search is being done in a partially observable environment with deterministic percepts and actions. Describe the components of the search space, by specifying the
- States
- Initial state
- Actions
- Transition model
- Goal test
- Step cost
(Similar to sensorless search, but we can refine the transition model by taking into account the possible percepts).
Belief states: the $2^n$ subsets of $n$ physical states.
Initial state: a set of physical states consistent with the agent's initial knowledge.
Actions: This depends on what happens when an illegal action is attempted. If illegal actions have no effect on the environment, then:
\[\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)\]If illegal actions are not allowed, then:
\[\mathbf{Actions}(b) = \bigcap _ {s \in b} \mathbf{Actions} _ P(s)\]where $\mathbf{Actions} _ P(s)$ is the action function for the original search problem.
Transition model:
Compactly:
\[\mathbf{Results}(b, a) = \\{\mathbf{Result}_P(s, a) \mid s \in b\\} / \sim\]where $s _ 1 \sim s _ 2$ iff $\mathbf{Percept}(s _ 1) = \mathbf{Percept}(s _ 2)$.
In more detail: first compute the predicted belief state as if this were sensorless search:
\[b' = \mathbf{Predict}(b, a) = \bigcup _ {s \in b} \\{\mathbf{Result} _ P(s, a)\\}\]Next, work out the possible percepts; since the search is offline rather than online, we have to consider every percept we might observe:
\[\mathbf{PossiblePercepts}(b') = \bigcup _ {s \in b'} \\{\mathbf{Percept}(s)\\}\]Then update on each possible observation, keeping only the states whose percept matches that observation:
\[\mathbf{Update}(b', o) = \\{s \in b' \mid o = \mathbf{Percept}(s)\\}\]Finally, combine these to give the overall transition model:
\[\mathbf{Results}(b, a) = \bigcup _ {o \in \mathbf{PossiblePercepts}(b')} \\{\mathbf{Update}(b', o)\\}\]Goal test:
\[\mathbf{GoalTest}(b) = \bigwedge _ {s \in b} \mathbf{GoalTest} _ P(s)\]Step cost: in general the cost of an action could depend on which physical state it is executed in, so assume for simplicity that each action costs the same in every state.
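A short sketch of the Predict / PossiblePercepts / Update decomposition, again assuming hypothetical `result_p(s, a)` and `percept(s)` methods on the underlying physical problem:

```python
def predict(problem, b, a):
    """b': the belief state after acting, ignoring percepts (as in sensorless search)."""
    return frozenset(problem.result_p(s, a) for s in b)

def possible_percepts(problem, b_pred):
    """Every percept the agent could receive in some state of b'."""
    return {problem.percept(s) for s in b_pred}

def update(problem, b_pred, o):
    """Keep only the states of b' whose percept matches the observation o."""
    return frozenset(s for s in b_pred if problem.percept(s) == o)

def results(problem, b, a):
    """Set of successor belief states, one per possible percept."""
    b_pred = predict(problem, b, a)
    return {update(problem, b_pred, o) for o in possible_percepts(problem, b_pred)}
```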
Draw the first level of the And-Or tree in the partially observable vacuum world, where percepts are of the form $[L/R, \text{Dirty}/\text{Clean}]$.


@example~
Localisation
How could you describe localisation in the context of belief state search?
Finding a path in belief state space from a set of physical states to a singleton set containing just one physical state.
Consider an agent $A$ in a $3 \times 4$ grid with sensors that tell it which of the 4 adjacent squares are empty, and actuators that allow it to move up, down, left or right.
Initially it has no information about where it is:

(this is a compact representation of the current belief state as a set of physical states).
Show how the agent might localise itself using a percept, an action, and then another percept.

@example~
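A toy localisation run along these lines; the grid layout, the agent's true position and the chosen action are all made up for illustration, but the Update / Predict steps are the ones from the previous section:

```python
# Hypothetical 3x4 grid: True = free square, False = blocked.
GRID = [[True,  True, True,  True],
        [False, True, False, True],
        [True,  True, True,  True]]
ROWS, COLS = len(GRID), len(GRID[0])
MOVES = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}

def free(r, c):
    return 0 <= r < ROWS and 0 <= c < COLS and GRID[r][c]

def percept(s):
    """Which of the four neighbouring squares are free? (Up, Down, Left, Right)"""
    r, c = s
    return tuple(free(r + dr, c + dc) for dr, dc in MOVES.values())

def result(s, a):
    """Move if the target square is free; an illegal move leaves the state unchanged."""
    r, c = s
    dr, dc = MOVES[a]
    return (r + dr, c + dc) if free(r + dr, c + dc) else s

# Initially the agent could be in any free square.
belief = frozenset((r, c) for r in range(ROWS) for c in range(COLS) if free(r, c))

true_state = (1, 1)                                         # unknown to the agent
o1 = percept(true_state)                                    # first percept
belief = frozenset(s for s in belief if percept(s) == o1)   # Update
print(belief)                                               # {(1, 1), (1, 3)} for this grid

belief = frozenset(result(s, "Up") for s in belief)         # Predict after moving Up
true_state = result(true_state, "Up")
o2 = percept(true_state)                                    # second percept
belief = frozenset(s for s in belief if percept(s) == o2)   # Update again
print(belief)                                               # {(0, 1)}: localised
```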
Comparisons
Describe the algorithms you might use to solve searching problems in:
- Nondeterministic environments
- Sensorless environments
- Partially observable environments
- Nondeterministic partially observable environments
For those where And-Or search would work, explain what the And and Or nodes correspond to.
- Nondeterministic environments, And-Or:
- And: The environment chooses which state we end up in.
- Or: The agent gets to choose which action to take.
- Sensorless environments, belief state search
- No contingency plan is needed (or indeed possible), since the agent receives no percepts and will never know which state it has actually ended up in.
- It would not be possible to just apply And-Or search with each physical state as one of the “And” nodes, since the resulting plan would depend on knowing which state you were actually in.
- Partially observable environments, And-Or:
- And: One branch per possible percept; each branch handles the belief state consistent with that percept.
- Or: The agent gets to choose which action to take.
- Nondeterministic partially observable environments, And-Or:
- And: One branch per possible percept, where the set of possible next states now also includes every non-deterministic outcome of the action.
- Or: The agent chooses which action to take.
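A compact sketch of And-Or search for the nondeterministic, fully observable case, following the standard recursive formulation; the `problem` object with `initial`, `actions(s)`, `results(s, a)` and `goal_test(s)` is hypothetical, and the belief-state variants are obtained by swapping in the belief-state versions of these functions:

```python
def and_or_search(problem):
    """Returns a conditional plan, or None if no plan exists."""
    return or_search(problem, problem.initial, path=[])

def or_search(problem, state, path):
    """OR node: the agent picks one action whose outcomes can all be handled."""
    if problem.goal_test(state):
        return []                                  # empty plan: already at a goal
    if state in path:
        return None                                # cycle: give up on this branch
    for a in problem.actions(state):
        plan = and_search(problem, problem.results(state, a), [state] + path)
        if plan is not None:
            return [a, plan]
    return None

def and_search(problem, states, path):
    """AND node: the plan must cover every state the environment might choose."""
    plan = {}
    for s in states:
        subplan = or_search(problem, s, path)
        if subplan is None:
            return None
        plan[s] = subplan                          # "if in state s, follow subplan"
    return plan
```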
Describe the states, actions and transition models for:
- A fully observable environment with deterministic actions
- A fully observable environment with non-deterministic actions
- A sensorless environment with deterministic actions
- A sensorless environment with non-deterministic actions
- A partially observable environment with deterministic actions and deterministic percepts
- A partially observable environment with deterministic actions but non-deterministic percepts
- A partially observable environment with non-deterministic actions but deterministic percepts
- A partially observable environment with non-deterministic actions and non-deterministic percepts
In each case, state the algorithm you would use to solve the corresponding search problem.
Fully observable environment with deterministic actions:
- “Base case”
- $\mathbf{States}$
- $\mathbf{Actions}(s)$: What actions are available in state $s$?
- $\mathbf{Result}(s, a)$: Given a state $s$ and an action $a$, what state are you in after applying $a$ to $s$?
- Solve with: Your favourite tree/graph search algorithm
Fully observable environment with non-deterministic actions:
- $\mathbf{States}$
- $\mathbf{Actions}(s)$: set of actions
- $\mathbf{Results}(s, a)$: given a state $s$ and an action $a$, what are all the possible states you might be in after applying $a$ to $s$?
- Solve with: And-Or search to generate a contingency plan
Sensorless environment with deterministic actions:
- $\mathbf{States} = \mathcal P(\mathbf{States} _ P)$ where $\cdot _ P$ denotes the underlying physical problem, “belief states”
- $\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)$ or $\bigcap _ {s \in b} \mathbf{Actions} _ P(s)$ depending on if invalid actions are allowed
- $\mathbf{Result}(b, a) = \{\mathbf{Result} _ P(s, a) \mid s \in b\}$, a single belief state
- Solve with: Your favourite tree/graph search algorithm over belief states
Sensorless environment with non-deterministic actions:
- $\mathbf{States} = \mathcal P(\mathbf{States} _ P)$ where $\cdot _ P$ denotes the underlying physical problem, “belief states”
- $\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)$ or $\bigcap _ {s \in b} \mathbf{Actions} _ P(s)$ depending on if invalid actions are allowed
- $\mathbf{Results}(b, a) = \bigcup _ {s \in b} \mathbf{Results} _ P(s, a)$; note this is still a single belief state (a set of physical states), not a set of belief states
- Solve with: Your favourite tree/graph search algorithm over belief states, note that And-Or search won’t be useful here
Partially observable environment with deterministic actions and deterministic percepts:
- $\mathbf{States} = \mathcal P(\mathbf{States} _ P)$ where $\cdot _ P$ denotes the underlying physical problem, “belief states”
- $\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)$ or $\bigcap _ {s \in b} \mathbf{Actions} _ P(s)$ depending on if invalid actions are allowed
- Have an additional function $\mathbf{Percept}(s)$ which denotes what we observe in $s$
- $\mathbf{Results}(b, a) = \{\mathbf{Result} _ P(s, a) \mid s \in b\} / \sim$ where:
- For $s _ 1, s _ 2 \in \{\mathbf{Result} _ P(s, a) \mid s \in b\}$, $s _ 1 \sim s _ 2$ iff $\mathbf{Percept}(s _ 1) = \mathbf{Percept}(s _ 2)$
- i.e. a set of belief states consisting of subsets of all the possible successor physical states, grouped so that states within a subset are indistinguishable by their percepts.
- Solve with: And-Or search
Partially observable environment with deterministic actions but non-deterministic percepts:
- $\mathbf{States} = \mathcal P(\mathbf{States} _ P)$ where $\cdot _ P$ denotes the underlying physical problem, “belief states”
- $\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)$ or $\bigcap _ {s \in b} \mathbf{Actions} _ P(s)$ depending on if invalid actions are allowed
- Have an additional function $\mathbf{Percepts}(s)$ which denotes the set of possible observations in $s$
- $\mathbf{Results}(b, a) = \{b'' \subseteq b' \mid b'' = \mathbf{Update}(b', o), o \in \mathbf{PossiblePercepts}(b') \}$, where:
- $b' = \bigcup _ {s \in b} \{\mathbf{Result} _ P(s, a)\}$
- $\mathbf{Update}(b', o) = \{s \in b' \mid o \in \mathbf{Percepts}(s)\}$, i.e. which physical states are consistent with observation $o$?
- $\mathbf{PossiblePercepts}(b') = \bigcup _ {s \in b'} \mathbf{Percepts}(s)$, i.e. what are all the possible observations we might receive after acting in $b$?
- In other words, $\mathbf{Results}(b, a)$ is a set of belief states where each belief state is the set of physical states consistent with some possible observation $o$.
- Solve with: And-Or search
Partially observable environment with non-deterministic actions but deterministic percepts:
- $\mathbf{States} = \mathcal P(\mathbf{States} _ P)$ where $\cdot _ P$ denotes the underlying physical problem, “belief states”
- $\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)$ or $\bigcap _ {s \in b} \mathbf{Actions} _ P(s)$ depending on if invalid actions are allowed
- Have an additional function $\mathbf{Percept}(s)$ which denotes what we observe in $s$
- $\mathbf{Results}(b, a) = \left(\bigcup _ {s \in b} \mathbf{Results} _ P(s, a)\right) / \sim$ where:
- For $s _ 1, s _ 2 \in \bigcup _ {s \in b} \mathbf{Results} _ P(s, a)$, $s _ 1 \sim s _ 2$ iff $\mathbf{Percept}(s _ 1) = \mathbf{Percept}(s _ 2)$
- i.e. a set of belief states consisting of subsets of all the possible successor physical states, grouped so that states within a subset are indistinguishable by their percepts.
- Solve with: And-Or search
Partially observable environment with non-deterministic actions and non-deterministic percepts:
- $\mathbf{States} = \mathcal P(\mathbf{States} _ P)$ where $\cdot _ P$ denotes the underlying physical problem, “belief states”
- $\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)$ or $\bigcap _ {s \in b} \mathbf{Actions} _ P(s)$ depending on if invalid actions are allowed
- Have an additional function $\mathbf{Percepts}(s)$ which denotes the set of possible observations in $s$
- $\mathbf{Results}(b, a) = \{b'' \subseteq b' \mid b'' = \mathbf{Update}(b', o), o \in \mathbf{PossiblePercepts}(b') \}$, where:
- $b' = \bigcup _ {s \in b} \mathbf{Results} _ P(s, a)$
- $\mathbf{Update}(b', o) = \{s \in b' \mid o \in \mathbf{Percepts}(s)\}$, i.e. which physical states are consistent with observation $o$?
- $\mathbf{PossiblePercepts}(b') = \bigcup _ {s \in b'} \mathbf{Percepts}(s)$, i.e. what are all the possible observations we might receive after acting in $b$?
- In other words, $\mathbf{Results}(b, a)$ is a set of belief states where each belief state is the set of physical states consistent with some possible observation $o$.
- Solve with: And-Or search
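For this most general case (non-deterministic actions and percepts), the transition model might be sketched as below, assuming hypothetical `results_p(s, a)` and set-valued `percepts(s)` methods; the deterministic cases above fall out by making these return singletons:

```python
def predict(problem, b, a):
    """b': every physical state reachable from some state of b via action a."""
    return frozenset(s2 for s in b for s2 in problem.results_p(s, a))

def results(problem, b, a):
    """Set of successor belief states, one per percept the agent might receive."""
    b_pred = predict(problem, b, a)
    observations = {o for s in b_pred for o in problem.percepts(s)}
    return {frozenset(s for s in b_pred if o in problem.percepts(s))
            for o in observations}
```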
Optimisations
A sensorless physical search problem can be converted into an ordinary search problem by performing a search over belief state space, where nodes are sets of states that the agent could be in. Suppose $b _ 1$ has just been searched, and no solution has been found from it.
What optimisation can be used in this case?
Mark all supersets $b _ 2 \supseteq b _ 1$ as visited, since if there is no solution from $b _ 1$, there can’t be a solution from $b _ 2$ (it’s a more constrained problem).
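One way this pruning could look in a belief-state graph search, with belief states stored as frozensets (`failed` and `already_hopeless` are illustrative names, not from the notes):

```python
failed = set()          # belief states from which search has already failed

def record_failure(b):
    failed.add(frozenset(b))

def already_hopeless(b):
    """A belief state b can be pruned if it is a superset of some belief state
    b1 that has already failed: b is a more constrained problem than b1."""
    return any(b1 <= b for b1 in failed)
```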
One problem with belief state search is that belief states are very large (exponential in the number of possible states). What are two approaches for addressing this?
- Use a compact representation of belief states, perhaps using propositional logic
- Use “incremental belief state search”: find a solution that works for one physical state in the initial belief state, then check whether it also works for the remaining states, searching for a different solution if it does not
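A sketch of the incremental idea for the sensorless, deterministic-actions case; `solve_single` is a hypothetical generator of candidate action sequences found by searching from a single physical state:

```python
def works_for_all(problem, initial_belief, plan):
    """Does the fixed action sequence reach a goal from every physical state?"""
    for s in initial_belief:
        for a in plan:
            s = problem.result_p(s, a)
        if not problem.goal_test_p(s):
            return False
    return True

def incremental_sensorless_search(problem, initial_belief, solve_single):
    """Find plans that work from one state first, then check the remaining states."""
    s0 = next(iter(initial_belief))
    for plan in solve_single(problem, s0):      # candidate action sequences
        if works_for_all(problem, initial_belief, plan):
            return plan
    return None
```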