Artificial Intelligence MT24, Partially observable environments



Flashcards

Basic definitions

@Define a belief state.


A set of possible physical states that the agent could feasibly be in.

Belief state search for sensorless problems

Describe the process used to convert a sensorless physical search problem into an ordinary search problem, by specifying the

  • States
  • Initial states
  • Actions
  • Transition model (in the deterministic and non-deterministic case)
  • Goal test
  • Step cost

Perform the search over belief state space, which is fully observable.

Belief states: the $2^n$ subsets of $n$ physical states.

Initial states: A subset of physical states.

Actions: Depends on what happens when an illegal action is attempted. If illegal actions do not affect environment, then:

\[\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)\]

If illegal actions are not allowed, then:

\[\mathbf{Actions}(b) = \bigcap _ {s \in b} \mathbf{Actions} _ P(s)\]

where $\text{Actions} _ P(s)$ is the action function for the original search problem.

Transition model:

In the deterministic case, the result is one belief state given by the action applied to all the underlying physical states in $b$:

\[\mathbf{Result}(b, a) = \bigcup _ {s \in b} \\{\mathbf{Result} _ P(s, a)\\}\]

In the non-deterministic case, the result is still one belief state :

\[\mathbf{Results}(b, a) = \bigcup _ {s \in b} \mathbf{Results} _ P(s, a)\]

(note that $\mathbf{Results}(b, a)$ is a set of belief states).

Goal test:

\[\mathbf{GoalTest}(b) = \bigwedge _ {s \in b} \mathbf{GoalTest} _ P(s)\]

Step cost: Action cost is dependent on the state, so assume that all step costs are the same.

Belief state search for partially observable problems

Suppose search is being done in a partially observable environment with deterministic percepts and actions. Describe the components of the search space, by specifying the

  • States
  • Initial states
  • Actions
  • Transition model
  • Goal test
  • Step cost

(Similar to sensorless search, but we can refine the transition model by taking into account the possible percepts).

Belief states: the $2^n$ subsets of $n$ physical states.

Initial states: A subset of physical states.

Actions: Depends on what happens when an illegal action is attempted. If illegal actions do not affect environment, then:

\[\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)\]

If illegal actions are not allowed, then:

\[\mathbf{Actions}(b) = \bigcap _ {s \in b} \mathbf{Actions} _ P(s)\]

where $\text{Actions} _ P(s)$ is the action function for the original search problem.

Transition model:

Compactly:

\[\mathbf{Results}(b, a) = \\{\mathbf{Result}_P(s, a) \mid s \in b\\} / \sim\]

where $s _ 1 \sim s _ 2$ iff $\mathbf{Percept}(s _ 1) = \mathbf{Percept}(s _ 2)$.

In more detail: first compute belief state as if this were sensorless search:

\[b' = \mathbf{Predict}(b, a) = \bigcup _ {s \in b} \\{\mathbf{Result} _ P(s, a)\\}\]

Predict possible percepts, since the search is not online we have to consider all things we could observe:

\[\mathbf{PossiblePercepts}(b') = \bigcup _ {s \in b'} \\{\mathbf{Percept}(s)\\}\]

Update given an observation, only keeping states where this observation is consistent with the possible percepts in that state.

\[\mathbf{Update}(b', o) = \\{s \in b' \mid o = \mathbf{Percept}(s)\\}\]

Finally, combining everything together to give the final result:

\[\mathbf{Results}(b, a) = \bigcup _ {o \in \mathbf{PossiblePercepts}(b')} \\{\mathbf{Update}(b', o)\\}\]

Goal test:

\[\mathbf{GoalTest}(b) = \bigwedge _ {s \in b} \mathbf{GoalTest} _ P(s)\]

Step cost: Action cost is dependent on the state, so assume that all step costs are the same.

Draw the first level of the And-Or tree in the partially observable vacuum world, where percepts are of the form $[L/R, \text{Dirty}/\text{Clean}]$.


@example~

Localisation

How could you describe localisation in the context of belief state search?


Finding a path in belief state space from a set of physical states to a singleton set containing just one physical state.

Consider an agent $A$ in a $3 \times 4$ grid with sensors that tell it if there are empty squares in all 4 directions, and actuators that allow it to move up, down, left or right.

Initially it has no information about where it is:

(this is a compact representation of the current belief state as a set of physical states).

Show how the agent might localise itself using a percept, an action, and then an another percept.


@example~

Comparisons

Describe the algorithms you might use to solve searching problems in:

  • Nondeterministic environments
  • Sensorless environments
  • Partially observable environments
  • Nondeterministic partially observable environments

For those where And-Or search would work, explain what the And and Or nodes correspond to.


  • Nondeterministic environments, And-Or:
    • And: The environment chooses which state we end up in.
    • Or: The agent gets to choose which action to take.
  • Sensorless environments, belief state search
    • No contingency plan is needed since you never will know what state you actually end up in.
    • It would not be possible to just apply And-Or search where each physical state is one of the “And” nodes, since then this would find a solution which depended on knowing the state you were in.
  • Partially observable environments, And-Or:
    • And: Deal with all possible outcomes consistent with a possible percept.
    • Or: The agent gets to choose which action to take.
  • Nondeterministic partially observable environments, And-Or:
    • And: Deal with all possible outcomes consistent with a possible percept in all possible next states
    • Or: The agent chooses which action to take.

Describe the states, actions and transition models for:

  • A fully observable environment with deterministic actions
  • A fully observable environment with non-deterministic actions
  • A sensorless environment with deterministic actions
  • A sensorless environment with non-deterministic actions
  • A partially observable environment with deterministic actions and deterministic percepts
  • A partially observable environment with deterministic actions but non-deterministic percepts
  • A partially observable environment with non-deterministic actions but deterministic percepts
  • A partially observable environment with non-deterministic actions and non-deterministic percepts

In each case, state the algorithm you would use to solve the corresponding search problem.


Fully observable environment with deterministic actions:

  • “Base case”
  • $\mathbf{States}$
  • $\mathbf{Actions}(s)$: What actions are available in state $s$?
  • $\mathbf{Result}(s, a)$: Given a state $s$ and an action $a$, what state are you in after applying $a$ to $s$?
  • Solve with: Your favourite tree/graph search algorithm

Fully observable environment with non-deterministic actions:

  • $\mathbf{States}$
  • $\mathbf{Actions}(s)$: set of actions
  • $\mathbf{Results}(s, a)$: given a state $s$ and an action $a$, what are all the possible states you might be in after applying $a$ to $s$?
  • Solve with: And-Or search to generate a contingency plan

Sensorless environment with deterministic actions:

  • $\mathbf{States} = \mathcal P(\mathbf{States} _ P)$ where $\cdot _ P$ denotes the underlying physical problem, “belief states”
  • $\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)$ or $\bigcap _ {s \in b} \mathbf{Actions} _ P(s)$ depending on if invalid actions are allowed
  • $\mathbf{Results}(b, a) = \{\mathbf{Result} _ P(s, a) \mid s \in b\}$
  • Solve with: Your favourite tree/graph search algorithm over belief states

Sensorless environment with non-deterministic actions:

  • $\mathbf{States} = \mathcal P(\mathbf{States} _ P)$ where $\cdot _ P$ denotes the underlying physical problem, “belief states”
  • $\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)$ or $\bigcap _ {s \in b} \mathbf{Actions} _ P(s)$ depending on if invalid actions are allowed
  • $\mathbf{Results}(b, a) = \bigcup \{\mathbf{Results} _ P(s, a) \mid s \in b\}$, note this is still a set of physical states, not a set of belief states
  • Solve with: Your favourite tree/graph search algorithm over belief states, note that And-Or search won’t be useful here

Partially observable environment with deterministic actions and deterministic percepts:

  • $\mathbf{States} = \mathcal P(\mathbf{States} _ P)$ where $\cdot _ P$ denotes the underlying physical problem, “belief states”
  • $\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)$ or $\bigcap _ {s \in b} \mathbf{Actions} _ P(s)$ depending on if invalid actions are allowed
  • Have an additional function $\mathbf{Percept}(s)$ which denotes what we observe in $s$
  • $\mathbf{Results}(b, a) = \{\mathbf{Result} _ P(s, a) \mid s \in b\} / \sim$ where:
    • For $s _ 1, s _ 2 \in \{\mathbf{Result} _ P(s, a) \mid s \in b\}$, $s _ 1 \sim s _ 2$ iff $\mathbf{Percept}(s _ 1) = \mathbf{Percept}(s _ 2)$
    • i.e. a set of belief states consisting of subsets of the all the possible successor physical states indistinguishable by percepts.
  • Solve with: And-Or search

Partially observable environment with deterministic actions but non-deterministic percepts:

  • $\mathbf{States} = \mathcal P(\mathbf{States} _ P)$ where $\cdot _ P$ denotes the underlying physical problem, “belief states”
  • $\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)$ or $\bigcap _ {s \in b} \mathbf{Actions} _ P(s)$ depending on if invalid actions are allowed
  • Have an additional function $\mathbf{Percepts}(s)$ which denotes the set of possible observations in $s$
  • $\mathbf{Results}(b, a) = \{b’’ \subseteq b’ \mid b’’ = \mathbf{Update}(b’, o), o \in \mathbf{PossiblePercepts}(b’) \}$, where:
    • $b’ = \bigcup _ {s \in b} \mathbf{Results} _ P(s)$
    • $\mathbf{Update}(b’, o) = \{s \in b’ \mid o \in \mathbf{Percepts}(s)\}$, i.e. which physical states are consistent with observation $o$?
    • $\mathbf{PossiblePercepts}(b’) = \bigcup _ {s \in b’} \mathbf{Percepts}(s)$, i.e. what are all the possible observations we might receive after $b$?
    • In other words, $\mathbf{Result}(b, a)$ is a set of belief states where each belief state is a set of physical states consistent with some possible observation $o$.
  • Solve with: And-Or search

Partially observable environment with non-deterministic actions but deterministic percepts:

  • $\mathbf{States} = \mathcal P(\mathbf{States} _ P)$ where $\cdot _ P$ denotes the underlying physical problem, “belief states”
  • $\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)$ or $\bigcap _ {s \in b} \mathbf{Actions} _ P(s)$ depending on if invalid actions are allowed
  • Have an additional function $\mathbf{Percept}(s)$ which denotes what we observe in $s$
  • $\mathbf{Results}(b, a) = \left(\bigcup \mathbf{Results} _ P(s, a) \mid s \in b\right) / \sim$ where:
    • For $s _ 1, s _ 2 \in \{\mathbf{Result} _ P(s, a) \mid s \in b\}$, $s _ 1 \sim s _ 2$ iff $\mathbf{Percept}(s _ 1) = \mathbf{Percept}(s _ 2)$
    • i.e. a set of belief states consisting of subsets of the all the possible successor physical states indistinguishable by percepts.
  • Solve with: And-Or search

Partially observable environment with non-deterministic actions and non-deterministic percepts:

  • $\mathbf{States} = \mathcal P(\mathbf{States} _ P)$ where $\cdot _ P$ denotes the underlying physical problem, “belief states”
  • $\mathbf{Actions}(b) = \bigcup _ {s \in b} \mathbf{Actions} _ P(s)$ or $\bigcap _ {s \in b} \mathbf{Actions} _ P(s)$ depending on if invalid actions are allowed
  • Have an additional function $\mathbf{Percepts}(s)$ which denotes the set of possible observations in $s$
  • $\mathbf{Results}(b, a) = \{b’’ \subseteq b’ \mid b’’ = \mathbf{Update}(b’, o), o \in \mathbf{PossiblePercepts}(b’) \}$, where:
    • $b’ = \bigcup _ {s \in b} \mathbf{Results} _ P(s)$
    • $\mathbf{Update}(b’, o) = \{s \in b’ \mid o \in \mathbf{Percepts}(s)\}$, i.e. which physical states are consistent with observation $o$?
    • $\mathbf{PossiblePercepts}(b’) = \bigcup _ {s \in b’} \mathbf{Percepts}(s)$, i.e. what are all the possible observations we might receive after $b$?
    • In other words, $\mathbf{Result}(b, a)$ is a set of belief states where each belief state is a set of physical states consistent with some possible observation $o$.
  • Solve with: And-Or search

Optimisations

A sensorless physical search problem can be converted into an ordinary search problem by performing a search over belief state space, where nodes are sets of states that the agent could be in. Suppose $b _ 1$ has just been searched, and no solution has been found from it.

What optimisation can be used in this case?


Mark all supersets $b _ 2 \supseteq b _ 1$ as visited, since if there is no solution from $b _ 1$, there can’t be a solution from $b _ 2$ (it’s a more constrained problem).

One problem with belief state search is that belief states are very large (exponential in the number of possible states). What are two approaches for addressing this?


  • Use compact representation of belief states, perhaps using propositional logic
  • Use “incremental belief state search” where you find a solution that works for one initial state, and then see if it extends to larger initial states



Related posts