# AIMA: Perception

> Source: https://ollybritton.com/notes/textbooks/ai-a-modern-approach/communicating-percieving-and-acting/perception/ · Updated: 2021-04-04 · Tags: aiama, notes

> In which we connect the computer to the raw, unwashed world.

### Notes
* Can create a sensor model $P(E | S)$, which gives the probability of the evidence $E$ (the sensor readings) given the current world state $S$.
* Can break the sensor model down into an object model, which describes the objects in the world, and a rendering model, which describes the physical, geometric and statistical processes that produce the stimulus from the world.
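
As a reminder of how a sensor model would be used to reason back from evidence to world state, Bayes' rule gives the following. The prior $P(S)$ is an assumption here (it is not mentioned in the notes above), and this is only the textbook identity, not a claim about how practical vision systems are built:

$$
P(S \mid E) = \frac{P(E \mid S)\, P(S)}{P(E)} \propto P(E \mid S)\, P(S)
$$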

* "Which aspects of the rich visual stimulus should be considered to help the agent make good action choices, and which aspects should be ignored?"
	* Feature extraction applies simple computations directly to sensor observations.
	* Recognition identifies and labels the objects in the world.
	* Reconstruction builds a geometric model of the world from an image or set of images. _(This is the approach I want to take in my EPQ, building a model of the world and then converting that model into sound)._

* It's difficult because imaging distorts geometry: consider how parallel lines appear to converge to a point in the distance.
* Early image processing operations are cheapish, low-level operations that come first in the pipeline.
* They are local in nature, and only consider nearby pixels without considering the image as a whole.

* Edge detection:
	* Edges occur when there are dramatic changes in intensity/brightness.
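	* One way to identify them is to look for large values of the derivative of intensity, $I'(x)$ along a scanline or the gradient $\nabla I(x, y)$ in 2D.
	* This almost works, but noise in the image also produces large derivatives.
	* Applying a Gaussian blur first removes much of the noise and lets you better identify edges.
	* Blurring is done with an operation called convolution, and there's a theorem that lets you optimise it: the derivative of a convolution equals the convolution with the derivative of the kernel, so smoothing and differentiating combine into one step. You can find edges in 2D by computing $(I * N_\sigma')(x, y)$, where $N_\sigma$ is the Gaussian, which gives you peaks where the edges are, and you can mark edges that are above some threshold.
	* You can then link the marked edge pixels into edge curves by joining neighbouring edge pixels with consistent orientations.
	* Convolution combines two functions: the result at each point is a weighted sum of one function over the neighbourhood of that point, with the weights given by the other function (the kernel).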
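
The edge detection steps above can be sketched in one dimension (a single scanline) with NumPy. This is a minimal sketch under stated assumptions, not the book's implementation: the function names, the kernel radius of roughly $3\sigma$, the border replication and the default threshold are all illustrative choices.

```python
import numpy as np


def gaussian_derivative_kernel(sigma):
    """Sample the derivative of a Gaussian, N'_sigma, at integer offsets."""
    radius = int(3 * sigma) + 1  # assumed cut-off; the tails beyond this are tiny
    x = np.arange(-radius, radius + 1, dtype=float)
    gauss = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return -x / sigma**2 * gauss


def detect_edges_1d(intensity, sigma=2.0, threshold=0.05):
    """Return indices where |(I * N'_sigma)(x)| has a local peak above threshold."""
    kernel = gaussian_derivative_kernel(sigma)
    radius = len(kernel) // 2
    # Replicate the border values so the ends of the signal are not seen as edges.
    padded = np.pad(np.asarray(intensity, dtype=float), radius, mode="edge")
    # Smoothing and differentiating happen together in this single convolution.
    response = np.convolve(padded, kernel, mode="valid")
    magnitude = np.abs(response)
    centre = magnitude[1:-1]
    is_peak = (centre >= magnitude[:-2]) & (centre > magnitude[2:]) & (centre > threshold)
    return np.flatnonzero(is_peak) + 1


# A noisy step from dark (0.2) to bright (0.8) between indices 49 and 50.
rng = np.random.default_rng(0)
scanline = np.where(np.arange(100) < 50, 0.2, 0.8) + rng.normal(0.0, 0.02, 100)
edges = detect_edges_1d(scanline)
print(edges)
```

Differentiating the raw scanline directly would respond to every noisy pixel; convolving with the Gaussian derivative leaves one clear peak near the step.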

* Texture
	* Texture makes sense for groups of pixels rather than individual pixels, unlike brightness.
	* Can compute the orientation of each edge pixel (using the edge orientation algorithm) and build a histogram of orientations over a patch. Bricks give two peaks (horizontal and vertical edges), whereas leopard spots give a more uniform distribution.
	* Texture can then be used to find edges between regions: the boundary curves lie where the orientation histograms change dramatically.
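
A rough sketch of the orientation histogram idea, assuming a greyscale image stored as a 2D NumPy array. The function name, the use of `np.gradient` for the orientation estimate, the bin count and the magnitude cut-off are assumptions made for illustration.

```python
import numpy as np


def orientation_histogram(image, bins=8, min_magnitude=0.1):
    """Normalised histogram of gradient orientations in [0, pi) over edge pixels."""
    gy, gx = np.gradient(np.asarray(image, dtype=float))  # derivatives along rows, columns
    magnitude = np.hypot(gx, gy)
    # Fold opposite directions together: a dark-to-light edge and a
    # light-to-dark edge at the same angle count as the same orientation.
    theta = np.mod(np.arctan2(gy, gx), np.pi)
    edge_mask = magnitude > min_magnitude  # only count pixels with a strong gradient
    hist, _ = np.histogram(theta[edge_mask], bins=bins, range=(0.0, np.pi))
    total = hist.sum()
    return hist / total if total > 0 else hist.astype(float)


# Vertical stripes: intensity changes only from left to right.
vertical_stripes = np.tile((np.arange(32) // 4) % 2, (32, 1)).astype(float)
horizontal_stripes = vertical_stripes.T

h_vertical = orientation_histogram(vertical_stripes)
h_horizontal = orientation_histogram(horizontal_stripes)
```

The two striped patches put all of their mass in different bins, so a large change between the histograms of neighbouring patches is a hint that a texture boundary lies between them.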

* Optical flow
	* Optical flow looks at how the pixels move between successive frames of a video.
	* It produces a vector field, with one motion vector at each pixel.
	* This lets you estimate things like distance, because distant objects show slower apparent motion than nearby ones.
	* A simple algorithm takes a block of pixels around each point in one frame and finds the block with the most similar intensities in the next frame, for example by minimising the sum of squared differences.
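
The matching idea can be sketched as a brute-force block search. This is a minimal sketch, assuming greyscale frames as 2D NumPy arrays and using the sum of squared differences (SSD) as the similarity measure; the function name, patch size and search range are illustrative assumptions.

```python
import numpy as np


def block_match(frame0, frame1, y, x, half_patch=3, max_shift=4):
    """Estimate the flow (dy, dx) at pixel (y, x) by minimising the SSD.

    Assumes the patch around (y, x) lies fully inside frame0.
    """
    height, width = frame0.shape
    patch = frame0[y - half_patch:y + half_patch + 1,
                   x - half_patch:x + half_patch + 1]
    best_ssd, best_shift = np.inf, (0, 0)
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            y1, x1 = y + dy, x + dx
            # Skip candidate windows that would fall outside the second frame.
            if (y1 - half_patch < 0 or x1 - half_patch < 0
                    or y1 + half_patch >= height or x1 + half_patch >= width):
                continue
            candidate = frame1[y1 - half_patch:y1 + half_patch + 1,
                               x1 - half_patch:x1 + half_patch + 1]
            ssd = np.sum((candidate - patch) ** 2)
            if ssd < best_ssd:
                best_ssd, best_shift = ssd, (dy, dx)
    return best_shift


rng = np.random.default_rng(1)
frame0 = rng.random((32, 32))
# Synthetic second frame: the whole scene moves 1 pixel down and 2 pixels right.
frame1 = np.roll(frame0, shift=(1, 2), axis=(0, 1))
flow = block_match(frame0, frame1, y=16, x=16)
```

Running the search at every pixel would give the full vector field; doing it at a single pixel keeps the example short.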

* Segmentation
	* Splitting the image up into regions.
	* Can create histograms for certain features like brightness and edge orientation and then train a machine learning algorithm to identify "boundary contours" which divide the image up.
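
One feature such a learner could be given is how different the brightness histograms are on either side of a pixel. The sketch below is a simplification under stated assumptions: it compares the left and right halves of a square window (only testing for a vertical boundary), uses a chi-squared histogram distance, assumes intensities in $[0, 1]$, and leaves out the training step entirely. The function name and parameters are hypothetical.

```python
import numpy as np


def half_window_contrast(image, y, x, radius=4, bins=8):
    """Chi-squared distance between brightness histograms left and right of (y, x).

    Returns a value in [0, 1]; assumes the window lies fully inside the image.
    """
    left = image[y - radius:y + radius + 1, x - radius:x]
    right = image[y - radius:y + radius + 1, x + 1:x + radius + 1]
    h_left, _ = np.histogram(left, bins=bins, range=(0.0, 1.0))
    h_right, _ = np.histogram(right, bins=bins, range=(0.0, 1.0))
    p = h_left / h_left.sum()
    q = h_right / h_right.sum()
    denom = p + q
    nonzero = denom > 0  # skip bins that are empty on both sides
    return 0.5 * np.sum((p[nonzero] - q[nonzero]) ** 2 / denom[nonzero])


# Two flat regions: dark on the left, bright on the right, boundary at column 16.
image = np.full((32, 32), 0.2)
image[:, 16:] = 0.8

on_boundary = half_window_contrast(image, y=16, x=16)
inside_region = half_window_contrast(image, y=16, x=8)
```

The contrast is high on the boundary and zero inside a uniform region. The same comparison could be repeated for other orientations and for other features, such as edge orientation histograms, before handing the values to a classifier.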

---
Olly Britton — https://ollybritton.com. Machine-readable index: https://ollybritton.com/llms.txt