AIMA - Perception


In which we connect the computer to the raw, unwashed world.

Notes

  • Can create a sensor model $P(E \vert S)$, which gives the probability of receiving the evidence $E$ from the world given the current world state $S$.
  • Can break the sensor model down into an object model, which describes the objects in the world, and a rendering model, which describes how the world state gives rise to the image (the physics and geometry of image formation). (A toy example of using a sensor model follows.)
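As a toy illustration of how a sensor model can be used (the states, evidence values, and probabilities below are invented for illustration, not from the book), Bayes' rule turns $P(E \vert S)$ plus a prior over states into a belief about the state:

```python
# Toy discrete sensor model: invert P(E | S) into P(S | E) with Bayes' rule.
# States, evidence values, and probabilities are made up for illustration.

prior = {"door_open": 0.3, "door_closed": 0.7}            # P(S)
sensor_model = {                                           # P(E | S)
    ("bright", "door_open"): 0.8, ("dark", "door_open"): 0.2,
    ("bright", "door_closed"): 0.1, ("dark", "door_closed"): 0.9,
}

def posterior(evidence):
    """Return P(S | E) for each state, normalised over all states."""
    unnorm = {s: sensor_model[(evidence, s)] * p for s, p in prior.items()}
    total = sum(unnorm.values())
    return {s: v / total for s, v in unnorm.items()}

print(posterior("bright"))  # {'door_open': ~0.774, 'door_closed': ~0.226}
```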

  • “Which aspects of the rich visual stimulus should be considered to help the agent make good action choices, and which aspects should be ignored?”
    • Feature extraction applies simple computations directly to sensor observations
    • Recognition marks objects in the world
    • Reconstruction builds a geometric model of the world from an image or set of images. (This is the approach I want to take in my EPQ, building a model of the world and then converting that model into sound).
  • It’s difficult because imaging distorts geometry; consider how parallel lines appear to converge to a point (a vanishing point) when viewed in perspective.
  • Early image processing operations are cheap, low-level operations that come first in the pipeline.
  • They are local in nature, operating only on nearby pixels rather than on the image as a whole.

  • Edge detection:
    • Edges occur when there are dramatic changes in intensity/brightness.
    • One way to identify them is to look for large values of the derivative of intensity: $I'(x)$ in 1D, or the gradient magnitude $\|\nabla I(x, y)\|$ in 2D.
    • This almost works but there is a lot of noise in the image.
    • Applying a Gaussian blur removes much of the noise and lets you identify edges more reliably.
    • There’s an operation called convolution, and a theorem that lets you optimise the pipeline: differentiation commutes with convolution, so smoothing with a Gaussian $N_\sigma$ and then differentiating is the same as convolving once with the Gaussian’s derivative. Computing $(I * N'_\sigma)(x, y)$ gives peaks where the edges are, and you can mark edge pixels whose response is above some threshold (sketched in code below).
    • Edge pixels can then be linked into edge curves by joining up neighbouring edge points that have similar orientations.
    • Convolution combines two functions by sliding one across the other: $(f * g)(x) = \sum_u f(u)\, g(x - u)$, so the output at each point is a weighted sum of the input over a local region.
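A minimal sketch of this edge detector, in 1D for clarity; the test signal, $\sigma$, and threshold are arbitrary choices rather than values from the text:

```python
import numpy as np

def gaussian_derivative_kernel(sigma, radius=None):
    """Sampled derivative of a Gaussian, N'_sigma: smooths and differentiates in one pass."""
    radius = radius or int(3 * sigma)
    x = np.arange(-radius, radius + 1, dtype=float)
    gauss = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return -x / sigma**2 * gauss  # d/dx of the Gaussian

def detect_edges_1d(intensity, sigma=2.0, threshold=0.1):
    """Mark positions where |(I * N'_sigma)(x)| exceeds a threshold."""
    response = np.convolve(intensity, gaussian_derivative_kernel(sigma), mode="same")
    return np.flatnonzero(np.abs(response) > threshold)

# A noisy step edge at x = 50: the detector should fire near there.
rng = np.random.default_rng(0)
signal = np.concatenate([np.zeros(50), np.ones(50)]) + rng.normal(0, 0.05, 100)
print(detect_edges_1d(signal))  # indices clustered around 50
```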
  • Texture
    • Texture makes sense for groups of pixels rather than individual pixels, unlike brightness.
    • Can compute the orientation of each edge pixel (from the direction of the intensity gradient) and create a histogram of orientations. Bricks will have two peaks, whereas leopard spots will be more uniformly distributed.
    • Texture can then be used to find edges: look for boundary curves across which the orientation histograms change dramatically (see the sketch below).
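A rough sketch of the orientation-histogram descriptor; the gradient-based orientation estimate, bin count, and magnitude floor are my assumptions about the details:

```python
import numpy as np

def orientation_histogram(image, bins=8, magnitude_floor=0.1):
    """Histogram of edge orientations over a patch, as a texture descriptor.

    Orientation is taken from the intensity gradient; weak gradients are
    ignored so that flat regions don't pollute the histogram.
    """
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx) % np.pi      # fold into [0, pi): direction, not sign
    strong = magnitude > magnitude_floor
    hist, _ = np.histogram(orientation[strong], bins=bins, range=(0, np.pi))
    return hist / max(hist.sum(), 1)              # normalise so patches are comparable

# Vertical stripes -> one dominant orientation bin.
stripes = np.tile([0.0, 0.0, 1.0, 1.0], (16, 4))
print(orientation_histogram(stripes))
```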
  • Optical flow
    • Optical flow describes how pixels move between successive frames of a video.
    • It is a vector field: one apparent-motion vector at each pixel.
    • This lets you estimate things like distance, because farther-away objects show slower apparent motion than close-up ones.
    • A simple algorithm matches a block of pixels in one frame to the most similar block in the next frame, e.g. by minimising the sum of squared differences of intensities (sketched below).
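A minimal sketch of that block-matching idea, scoring candidate displacements by the sum of squared differences; the block size and search radius are arbitrary choices:

```python
import numpy as np

def flow_at(frame1, frame2, y, x, block=5, search=4):
    """Estimate the motion vector (dy, dx) of the block centred at (y, x).

    Compares the block in frame1 against nearby blocks in frame2 and
    returns the displacement with the smallest sum of squared differences.
    """
    r = block // 2
    patch = frame1[y - r:y + r + 1, x - r:x + r + 1].astype(float)
    best, best_ssd = (0, 0), np.inf
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            candidate = frame2[y + dy - r:y + dy + r + 1,
                               x + dx - r:x + dx + r + 1].astype(float)
            ssd = np.sum((patch - candidate) ** 2)
            if ssd < best_ssd:
                best, best_ssd = (dy, dx), ssd
    return best

# A bright square that shifts 2 pixels to the right between frames.
f1 = np.zeros((20, 20)); f1[8:12, 8:12] = 1.0
f2 = np.zeros((20, 20)); f2[8:12, 10:14] = 1.0
print(flow_at(f1, f2, 10, 10))  # (0, 2)
```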
  • Segmentation
    • Splitting the image up into regions.
    • Can compute histograms of features like brightness and edge orientation on either side of a candidate boundary point, then train a machine learning classifier to identify “boundary contours” which divide the image up (a rough sketch follows).
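A rough sketch of the feature side of that approach: the chi-squared distance between brightness histograms on the two sides of a candidate vertical boundary, which a classifier could take as input. The window size, binning, and distance measure are my assumptions:

```python
import numpy as np

def brightness_histogram(patch, bins=16):
    """Normalised brightness histogram of an image patch."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    return hist / max(hist.sum(), 1)

def boundary_strength(image, x, half_width=8, bins=16):
    """Feature for 'is there a vertical boundary at column x?'.

    Chi-squared distance between the brightness histograms of the strips
    just left and just right of column x; large values suggest a boundary.
    A classifier would be trained on such features plus labelled contours.
    """
    left = brightness_histogram(image[:, x - half_width:x], bins)
    right = brightness_histogram(image[:, x:x + half_width], bins)
    return 0.5 * np.sum((left - right) ** 2 / (left + right + 1e-9))

# A dark region meets a bright region at column 16: strength peaks there.
img = np.concatenate([np.full((32, 16), 0.2), np.full((32, 16), 0.8)], axis=1)
print(boundary_strength(img, 16), boundary_strength(img, 8))  # ~1.0 vs ~0.0
```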


