How to create machine vision applications?

Machine vision applications usually work with monochrome or color images in which the brightness of each pixel, or the intensity of each of its color components, is stored with 8-bit precision, i.e. 256 possible levels. One pixel of a monochrome image thus occupies one byte. In addition to the three 8-bit components of the red, green, and blue channels, the pixel value of a color image also includes the so-called alpha channel, which defines the transparency of the pixel. This information matters when images are merged (blended), and aligning pixel data to 4 bytes also suits current processors better than 24-bit (3-byte) granularity.
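
The 4-byte color pixel and the role of the alpha channel in blending can be illustrated with a minimal sketch. The channel order (red in the highest byte, alpha in the lowest) and the function names pack_rgba, unpack_rgba and blend_over are assumptions made for this example only; real image libraries use various layouts.

```python
def pack_rgba(r, g, b, a):
    """Pack four 8-bit components into one 32-bit integer.

    Assumed layout (illustrative only): R in the highest byte, A in the lowest.
    """
    for c in (r, g, b, a):
        if not 0 <= c <= 255:
            raise ValueError("each component must fit into 8 bits")
    return (r << 24) | (g << 16) | (b << 8) | a


def unpack_rgba(pixel):
    """Split a 32-bit pixel back into its (r, g, b, a) components."""
    return ((pixel >> 24) & 0xFF, (pixel >> 16) & 0xFF,
            (pixel >> 8) & 0xFF, pixel & 0xFF)


def blend_over(src, dst):
    """Blend a semi-transparent src pixel over an opaque dst pixel.

    The source alpha weights the source color; the result is opaque.
    """
    sr, sg, sb, sa = unpack_rgba(src)
    dr, dg, db, _ = unpack_rgba(dst)

    def mix(s, d):
        # Integer weighted average with rounding to the nearest level.
        return (s * sa + d * (255 - sa) + 127) // 255

    return pack_rgba(mix(sr, dr), mix(sg, dg), mix(sb, db), 255)
```

With alpha 0 the destination pixel is returned unchanged, with alpha 255 the source color replaces it, and values in between give a proportional mix.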

For demanding applications, pixel components are stored as 16-bit integers or even 32-bit floating-point numbers. With four 32-bit components, one pixel then occupies 128 bits. Nevertheless, formats with eight bits per color channel or per brightness value remain the most common.
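
The pixel sizes mentioned above follow from simple arithmetic, sketched here. The helper names and the 640 x 480 resolution in the usage note are hypothetical and serve only as an illustration.

```python
def bits_per_pixel(channels, bits_per_channel):
    """Size of one pixel in bits."""
    return channels * bits_per_channel


def image_size_bytes(width, height, channels, bits_per_channel):
    """Memory needed for an uncompressed image, without row padding."""
    return width * height * bits_per_pixel(channels, bits_per_channel) // 8
```

For example, bits_per_pixel(1, 8) gives 8 for a monochrome pixel, bits_per_pixel(4, 8) gives 32 for an 8-bit color pixel with alpha, and bits_per_pixel(4, 32) gives the 128 bits of a four-component floating-point pixel. An assumed 640 x 480 image in the last format would need image_size_bytes(640, 480, 4, 32) bytes.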

We can now introduce the concept of an image function f(x, y), which gives the pixel value as a function of two variables x and y that determine the position of the pixel. Since this function is defined only at discrete points and takes discrete values (a discrete domain and a discrete range), it is integrable (summable) over its entire domain, and both a direct and an inverse transform exist for it, such as the discrete Fourier transform.
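
The text does not name a particular transform. As one possible illustration, the following sketch applies the two-dimensional discrete Fourier transform and its inverse to a tiny image function and recovers the original values. The names dft2 and idft2 are chosen for this example, and the direct summation is far too slow for real images, where a fast Fourier transform would be used instead.

```python
import cmath


def dft2(f):
    """Direct 2D discrete Fourier transform of an image given as a list of rows."""
    h, w = len(f), len(f[0])
    return [[sum(f[y][x] * cmath.exp(-2j * cmath.pi * (u * x / w + v * y / h))
                 for y in range(h) for x in range(w))
             for u in range(w)]
            for v in range(h)]


def idft2(F):
    """Inverse 2D discrete Fourier transform; returns complex pixel values."""
    h, w = len(F), len(F[0])
    return [[sum(F[v][u] * cmath.exp(2j * cmath.pi * (u * x / w + v * y / h))
                 for v in range(h) for u in range(w)) / (w * h)
             for x in range(w)]
            for y in range(h)]


# A tiny 3 x 2 image function f(x, y), stored as rows indexed by y.
image = [[10, 20, 30],
         [40, 50, 60]]
spectrum = dft2(image)
restored = idft2(spectrum)  # equals `image` up to floating-point rounding
```

The zero-frequency coefficient spectrum[0][0] is the sum of all pixel values, and the real parts of restored match the original image up to rounding error.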

Digital cameras that deliver more than 8 bits of image dynamic range are quite rare. Usually, even the least significant bits of 8-bit camera data are affected by noise and data compression. A human observer already perceives an image with about fifty brightness levels as excellent, which corresponds to a resolution of about six bits per pixel.
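
The relation between the number of brightness levels and the number of bits can be checked with a short sketch. The function names are assumptions for this example; requantize simply discards the least significant bits of an 8-bit value.

```python
import math


def bits_for_levels(levels):
    """Number of bits needed to distinguish the given number of levels."""
    return math.log2(levels)


def requantize(value, bits):
    """Keep only the top `bits` bits of an 8-bit value, zeroing the rest."""
    shift = 8 - bits
    return (value >> shift) << shift
```

bits_for_levels(50) is roughly 5.6, so six bits (64 levels) are enough to cover about fifty levels. Applying requantize(v, 6) to every 8-bit value v leaves exactly 64 distinct brightness levels.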

Image representation and image data analysis

When creating machine vision applications, we must go far beyond image processing in the signal-theory sense, where the image is treated as a function of two variables, i.e. the position of the pixel in a two-dimensional coordinate system. Machine vision is more closely related to artificial intelligence: we use software to try to imitate some human abilities of image perception and analysis. The major obstacles are the large volume of data, its uncertainty, and the considerable complexity and low robustness of the algorithms used.

Most difficulties in machine vision, i.e. in attempts to “observe” the information hidden in image data through computer-implemented algorithms, have causes such as the following:

  • Loss of information during the perspective projection of a three-dimensional scene into a two-dimensional image. This transformation is one-way: from a single two-dimensional image alone, it is in general not possible to uniquely reconstruct the shapes of the three-dimensional objects in the scene.
  • The relationship between the shape of three-dimensional objects in a scene and the brightness of pixels in their two-dimensional projection is also very complex and ambiguous. Lighting models based on the BRDF (Bidirectional Reflectance Distribution Function) are only an approximation of reality.
  • A real camera image is always degraded to some degree by noise, electrical interference, color interpolation, nonlinearity and saturation, optical defects, and compression artifacts.
  • Images contain a large amount of data, so most algorithms must work with only a small part of the total image data, which reduces the quality of the “understanding” of the image.
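
The loss of information during perspective projection, the first cause listed above, can be demonstrated with a minimal sketch. It assumes an ideal pinhole camera looking along the z axis with a focal length of 1; the function name project and the sample points are hypothetical.

```python
def project(point, focal_length=1.0):
    """Project a 3D point (x, y, z) onto the image plane of an ideal pinhole camera."""
    x, y, z = point
    if z <= 0:
        raise ValueError("the point must lie in front of the camera (z > 0)")
    return (focal_length * x / z, focal_length * y / z)


# Two different scene points on the same ray through the camera center.
near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)

# Both land on the same image coordinates, so depth cannot be recovered
# from this single projection.
same_pixel = project(near) == project(far)
```

Every point on a ray through the camera center maps to the same image position, which is why the projection cannot be inverted without additional information such as a second view or known object dimensions.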

Machine vision tasks can be roughly divided into a sequence of four function blocks:

  • Preprocessing of all image data using filtering (i.e. a transformation of the image function).
  • Finding image segments with characteristic features of the searched objects, e.g. edges, colors, etc.
  • Finding objects.
  • Finding relationships between objects.
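
The four function blocks can be sketched on a small monochrome image stored as a list of rows. This is only one simple choice for each block, assumed for illustration: a 3 x 3 mean filter for preprocessing, a brightness threshold for segmentation, 4-connected component labeling for finding objects, and the distance between centroids as a relationship between objects. All function names are hypothetical.

```python
from collections import deque
from math import hypot


def box_filter(img):
    """Block 1: 3x3 mean filter; border pixels average the neighbors that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(vals) / len(vals)
    return out


def threshold(img, level):
    """Block 2: mark pixels whose brightness exceeds `level`."""
    return [[v > level for v in row] for row in img]


def find_objects(mask):
    """Block 3: group marked pixels into 4-connected objects (lists of (x, y))."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    objects = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(x, y)]), []
            while queue:  # breadth-first flood fill of one object
                cx, cy = queue.popleft()
                pixels.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            objects.append(pixels)
    return objects


def centroid(pixels):
    """Mean position of the pixels of one object."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def centroid_distance(a, b):
    """Block 4: a simple relationship between two objects, their centroid distance."""
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    return hypot(ax - bx, ay - by)
```

A typical call chain is find_objects(threshold(box_filter(image), level)), followed by centroid_distance on pairs of the returned objects. The threshold level has to be chosen with the smoothing in mind, because the mean filter lowers the brightness of small bright regions.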

Not every application needs to include steps from all of these function blocks. However, the author of a machine vision application must at least thoroughly understand the whole subject, and must always know what each image processing step does and what it is meant to achieve.