AIMA - Perception
In which we connect the computer to the raw, unwashed world.
Notes
- Can create a sensor model $P(E \vert S)$, which gives the probability of observing the evidence $E$ from the world given the current world state $S$.
- Can break the sensor model down into an object model, which describes the objects in the world, and a rendering model, which describes the geometry of the world; a toy sketch of using a sensor model follows below.
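A minimal sketch of how a discrete sensor model like $P(E \vert S)$ could be used, assuming a toy world; the states, readings, and probabilities are invented for illustration, and `update_belief` is a hypothetical helper doing a Bayesian update $P(S \vert E) \propto P(E \vert S)\,P(S)$.

```python
# Minimal sketch: a discrete sensor model P(E|S) used to update a belief
# over world states after seeing evidence. All states, readings, and
# probabilities below are illustrative, not from the book.

# Prior belief over world states, P(S).
prior = {"door_open": 0.5, "door_closed": 0.5}

# Sensor model P(E|S): probability of each sensor reading given the state.
sensor_model = {
    "door_open":   {"sees_open": 0.8, "sees_closed": 0.2},
    "door_closed": {"sees_open": 0.1, "sees_closed": 0.9},
}

def update_belief(prior, sensor_model, evidence):
    """Bayesian update: P(S|E) is proportional to P(E|S) * P(S)."""
    unnormalised = {s: sensor_model[s][evidence] * p for s, p in prior.items()}
    total = sum(unnormalised.values())
    return {s: p / total for s, p in unnormalised.items()}

print(update_belief(prior, sensor_model, "sees_open"))
# {'door_open': 0.888..., 'door_closed': 0.111...}
```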
- “Which aspects of the rich visual stimulus should be considered to help the agent make good action choices, and which aspects should be ignored?”
- Feature extraction applies computations directly to sensor observations
- Recognition marks objects in the world
- Reconstruction builds a geometric model of the world from an image or set of images. (This is the approach I want to take in my EPQ, building a model of the world and then converting that model into sound).
- It’s difficult because imaging distorts geometry; for example, parallel lines in the world appear to converge towards a vanishing point in the image.
- Early image processing operations are cheap, low-level operations and come first in the pipeline of operations.
- They are local in nature, considering only nearby pixels rather than the image as a whole.
- Edge detection:
- Edges occur when there are dramatic changes in intensity/brightness.
- One way to identify them is to look for large values of the derivative of intensity; in 2D this means a large gradient magnitude $\lVert \nabla I(x, y) \rVert$.
- This almost works but there is a lot of noise in the image.
- Applying a Gaussian blur smooths out the noise and lets you identify the true edges more reliably.
- Both smoothing and differentiation can be written as convolutions, and a theorem lets you combine them: differentiating a convolution is the same as convolving with the differentiated kernel, so $(I * N_\sigma)' = I * N_\sigma'$. You can therefore find edges in one pass by computing $(I * N_\sigma')(x, y)$, which gives peaks where the edges are, and mark as edges the points whose response is above some threshold (a sketch follows after this list).
- The resulting edge pixels can then be linked together: adjacent edge points with consistent orientations are chained into edge curves.
- Convolution combines two functions by sliding one over the other and taking a weighted sum at each point: in 1D, $(f * g)(x) = \sum_u f(u)\,g(x - u)$.
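A minimal sketch of the derivative-of-Gaussian idea on a 1D scan line, using only numpy; `gaussian_derivative_kernel` is a hypothetical helper, the noisy step signal is invented test data, and a real 2D detector would add non-maximum suppression and edge linking.

```python
# Minimal sketch of derivative-of-Gaussian edge detection on a 1D scan
# line. One convolution does smoothing and differentiation together.
import numpy as np

def gaussian_derivative_kernel(sigma, radius=None):
    """Sampled derivative of a Gaussian, N_sigma'(x)."""
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    gaussian = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return -x / sigma**2 * gaussian  # d/dx of the Gaussian

# A noisy step edge: intensity jumps from 0 to 1 at position 50.
rng = np.random.default_rng(0)
intensity = np.concatenate([np.zeros(50), np.ones(50)]) + rng.normal(0, 0.1, 100)

# (I * N_sigma')(x) peaks where the intensity changes sharply.
response = np.convolve(intensity, gaussian_derivative_kernel(sigma=2.0), mode="same")

# Mark as edges the points whose response exceeds a threshold.
threshold = 0.5 * np.abs(response).max()
edges = np.nonzero(np.abs(response) > threshold)[0]
print("edge pixels near:", edges)  # clustered around index 50
```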
- Texture
- Texture makes sense for groups of pixels rather than individual pixels, unlike brightness.
- Can compute the orientation of each edge pixel (using the edge orientation computation above) and build a histogram of orientations: bricks will have two peaks, whereas leopard spots will be more uniformly distributed.
- These texture histograms can in turn be used to find edges: mark boundary curves where the histograms change dramatically from one side to the other (see the sketch below).
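A minimal sketch of an orientation histogram as a texture descriptor, using only numpy; the finite-difference gradients, the 8-bin resolution, and the stripe test image are assumptions for illustration rather than the book’s exact recipe.

```python
# Minimal sketch of an orientation histogram as a texture feature.
import numpy as np

def orientation_histogram(patch, bins=8):
    """Histogram of gradient orientations over an image patch,
    weighted by gradient magnitude."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)  # in [-pi, pi]
    hist, _ = np.histogram(orientation, bins=bins, range=(-np.pi, np.pi),
                           weights=magnitude)
    return hist / (hist.sum() + 1e-9)  # normalise so patches are comparable

# A "brick-like" patch with horizontal stripes should concentrate its
# gradient energy in two opposite orientation bins.
stripes = np.repeat(np.tile([0.0, 1.0], 4), 4)[:, None] * np.ones((1, 32))
print(orientation_histogram(stripes).round(2))
```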
- Optical flow
- Optical flow describes how pixels appear to move between successive frames of a video.
- It is a vector field, with a velocity vector at each pixel describing the apparent motion there.
- This lets you estimate things like distance, because optical flow shows slower apparent motion for farther-away objects than for closer ones.
- A simple algorithm tries to match up blocks of pixels with similar intensities in successive frames, e.g. by minimising the sum of squared differences between blocks; a sketch follows below.
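A minimal sketch of block-matching optical flow at a single pixel, using only numpy; the block and search radii and the toy moving-square frames are arbitrary choices for illustration.

```python
# Minimal sketch of block-matching optical flow for a single pixel. For a
# block around (y0, x0) in frame 1, search nearby offsets in frame 2 and
# pick the one with the smallest sum of squared differences (SSD).
import numpy as np

def flow_at(frame1, frame2, x0, y0, block=3, search=5):
    """Return the (dy, dx) displacement that best matches the block
    around (y0, x0) between two frames."""
    ref = frame1[y0 - block:y0 + block + 1, x0 - block:x0 + block + 1]
    best, best_ssd = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = frame2[y0 + dy - block:y0 + dy + block + 1,
                          x0 + dx - block:x0 + dx + block + 1]
            ssd = np.sum((ref - cand) ** 2)  # sum of squared differences
            if ssd < best_ssd:
                best, best_ssd = (dy, dx), ssd
    return best

# Toy test: a bright square shifted 2 pixels right between frames.
frame1 = np.zeros((40, 40)); frame1[18:24, 14:20] = 1.0
frame2 = np.zeros((40, 40)); frame2[18:24, 16:22] = 1.0
print(flow_at(frame1, frame2, x0=16, y0=20))  # expect (0, 2)
```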
- Segmentation
- Splitting the image up into regions.
- Can compute histograms of features like brightness and edge orientation, then train a machine learning classifier to identify “boundary contours” which divide the image up into regions; a sketch of comparing such histograms follows below.
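A minimal sketch of the kind of histogram comparison a boundary classifier could be fed, using only numpy; the chi-squared distance, the brightness-only feature, and the toy two-region image are assumptions, not the book’s exact method.

```python
# Minimal sketch of comparing feature histograms on either side of a
# candidate boundary. A classifier would consume scores like this one;
# chi-squared distance is one common choice and an assumption here.
import numpy as np

def brightness_histogram(patch, bins=16):
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    return hist / (hist.sum() + 1e-9)

def boundary_score(image, y, x, half=5):
    """Chi-squared distance between brightness histograms of the regions
    just left and just right of column x; a large value suggests a
    vertical boundary contour through (y, x)."""
    left = brightness_histogram(image[y - half:y + half + 1, x - half:x])
    right = brightness_histogram(image[y - half:y + half + 1, x + 1:x + half + 1])
    return 0.5 * np.sum((left - right) ** 2 / (left + right + 1e-9))

# Toy image: dark region on the left, bright on the right, split at x = 20.
image = np.hstack([np.full((40, 20), 0.2), np.full((40, 20), 0.8)])
print(boundary_score(image, y=20, x=20))  # high: boundary here
print(boundary_score(image, y=20, x=10))  # near zero: inside a region
```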