An image is
- an array of numbers
- made up of a matrix of pixels
- each pixel having an intensity between 0 and 255
A JPEG image is
- a 3D matrix
- with channels for red, green, and blue (RGB) pixel intensities.
For each pixel in the matrix, the intensity values across these three channels combine to create a full spectrum of colours, and varying the pixel values in each channel changes the overall visual effect.
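As a minimal sketch of the idea above, the snippet below builds a tiny RGB image as a NumPy array. The 2×2 pixel values are made up for illustration, and the (height, width, channels) layout is an assumption that matches how NumPy-based tools commonly store images.

```python
import numpy as np

# A hypothetical 2x2 RGB image: axes are (height, width, channel).
image = np.array(
    [[[255, 0, 0], [0, 255, 0]],
     [[0, 0, 255], [255, 255, 255]]],
    dtype=np.uint8,
)

print(image.shape)             # (2, 2, 3)

red_channel = image[:, :, 0]   # 2D matrix of red intensities
top_left = image[0, 0]         # the three channel values of one pixel

# Varying the values in some channels changes the colour of a pixel:
# removing green and blue from the white pixel leaves pure red.
tinted = image.copy()
tinted[1, 1, 1:] = 0
```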
Most widely used Image Processing & Manipulation Tools
- scikit-image
- Pillow (a fork of PIL, the Python Imaging Library)
The NumPy array format works well for sharing image data between these tools and is used extensively, like a common currency, in the image processing field.
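To transpose an image means to swap its rows and columns, which reflects the image across its main diagonal. This is equivalent to rotating the image by 90° and then flipping it.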
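A small NumPy sketch of transposing, flipping and mirroring, using a made-up 2×3 single-channel array. For an RGB array, the assumption would be to swap only the first two axes, for example with `np.transpose(image, (1, 0, 2))`.

```python
import numpy as np

image = np.array([[1, 2, 3],
                  [4, 5, 6]], dtype=np.uint8)   # 2 rows x 3 columns

transposed = image.T               # rows become columns: 3 rows x 2 columns
flipped = np.flipud(image)         # top and bottom rows swapped
mirrored = np.fliplr(image)        # left and right columns swapped

# Rotating 90 degrees counter-clockwise and then flipping vertically
# gives the same result as the transpose.
rotated_then_flipped = np.flipud(np.rot90(image))
```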
Resizing an image to a different aspect ratio distorts it, producing a squashed or stretched effect. To prevent distortion, scale the image proportionally so that its longest dimension matches the required size, then add padding along the shorter dimension.
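The following is a rough sketch of proportional resizing with padding for a single-channel image. The function name `resize_with_padding`, the square target and the nearest-neighbour resampling are assumptions chosen to keep the example dependency-free; a library resize would normally use better interpolation.

```python
import numpy as np

def resize_with_padding(image, target, fill=0):
    """Scale a 2D image so its longest side equals `target`, then pad to a square."""
    height, width = image.shape
    scale = target / max(height, width)
    new_h = max(1, round(height * scale))
    new_w = max(1, round(width * scale))

    # Nearest-neighbour resampling: choose a source row/column for each output pixel.
    rows = np.minimum((np.arange(new_h) / scale).astype(int), height - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), width - 1)
    resized = image[rows][:, cols]

    # Centre the resized image on a square canvas filled with the padding value.
    canvas = np.full((target, target), fill, dtype=image.dtype)
    top = (target - new_h) // 2
    left = (target - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas

wide = np.arange(8, dtype=np.uint8).reshape(2, 4)   # 2 rows x 4 columns
padded = resize_with_padding(wide, target=8)
```

Here the 2×4 image is scaled by the same factor in both directions (to 4×8), so its aspect ratio is preserved, and the remaining rows of the 8×8 canvas are padding.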
Images need a good level of contrast. The CDF (cumulative distribution function) gives, for each pixel value in the range 0-255, the proportion of pixels at or below that value.
A simple way to improve contrast is histogram stretching (contrast stretching): rescale the pixel values so that the lowest and highest values span the full range. However, this does not change the shape of the histogram, so the contrast may improve only a little.
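A minimal sketch of histogram stretching for an 8-bit grayscale array, assuming a linear rescale from the image's own minimum and maximum to 0-255. The function name `stretch_contrast` and the sample values are illustrative only.

```python
import numpy as np

def stretch_contrast(image):
    """Linearly rescale pixel values so the minimum maps to 0 and the maximum to 255."""
    pixels = image.astype(np.float64)
    low, high = pixels.min(), pixels.max()
    if high == low:                      # flat image: nothing to stretch
        return image.copy()
    stretched = (pixels - low) / (high - low) * 255.0
    return np.round(stretched).astype(np.uint8)

low_contrast = np.array([[100, 110],
                         [120, 150]], dtype=np.uint8)
stretched = stretch_contrast(low_contrast)
```

The relative spacing of the values is unchanged (the histogram keeps its shape); only the range they occupy is widened.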
A better technique is histogram equalization: normalize the pixel values on a scale of 0-255 so that the most frequent intensities are spread out evenly. This flattens the histogram and produces a roughly diagonal CDF, i.e. a relatively uniform distribution.
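The flattening technique above can be sketched by remapping each grey level through the image's normalised CDF. This is one common formulation, not necessarily the exact variant a given library uses; the function name `equalize_histogram` and the sample image are assumptions.

```python
import numpy as np

def equalize_histogram(image):
    """Remap 8-bit grey levels through the normalised CDF of the image."""
    hist = np.bincount(image.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]            # CDF value at the darkest level present
    total = image.size
    if total == cdf_min:                 # only one grey level: nothing to equalise
        return image.copy()
    lookup = np.round((cdf - cdf_min) / (total - cdf_min) * 255.0)
    lookup = np.clip(lookup, 0, 255).astype(np.uint8)
    return lookup[image]                 # apply the lookup table to every pixel

dark = np.array([[50, 50, 51],
                 [51, 52, 53]], dtype=np.uint8)
equalized = equalize_histogram(dark)
```

The four closely spaced grey levels of the sample image end up spread across the whole 0-255 range.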
Filters are used to change pixel values (visual effects) in an image, e.g. blurring, sharpening or embossing. A filter is applied using a matrix of values called a kernel, which is overlaid on the original image so that the pixel to change sits at its centre.
Convolution is the process of sliding the kernel over the original image and calculating, at each position, a new value as the weighted sum of the pixel values under the kernel. The result is a filtered version of the image with values in the range 0-255.
IF convolved value > 255 then set to 255. IF convolved value < 0 then set to 0.
At the edges of the original image the kernel extends past the image, so convolution cannot be performed there directly. Options to solve this:
- retain the original values of the edge pixels as they are, or
- add a border of neutral pixel values, or
- extend the original image, etc.
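A minimal sketch of convolution with clipping and two of the border options listed above. The function name `apply_kernel`, the `border` parameter and the choice of 0 as the "neutral" border value are assumptions; the sharpening kernel is a commonly used example, not one taken from the text.

```python
import numpy as np

def apply_kernel(image, kernel, border="edge"):
    """Convolve a 2D uint8 image with a square kernel and clip results to 0-255."""
    size = kernel.shape[0]
    k = size // 2
    pixels = image.astype(np.float64)
    if border == "constant":
        # Add a border of constant pixel values (0 is assumed here).
        padded = np.pad(pixels, k, mode="constant", constant_values=0)
    else:
        # Extend the image by repeating its outermost pixels.
        padded = np.pad(pixels, k, mode="edge")
    flipped = kernel[::-1, ::-1]         # convolution flips the kernel
    out = np.zeros(image.shape, dtype=np.float64)
    for row in range(image.shape[0]):
        for col in range(image.shape[1]):
            window = padded[row:row + size, col:col + size]
            out[row, col] = np.sum(window * flipped)
    return np.clip(np.round(out), 0, 255).astype(np.uint8)

image = np.full((3, 3), 100, dtype=np.uint8)
image[1, 1] = 200                        # one brighter pixel in the centre
sharpen = np.array([[0, -1, 0],
                    [-1, 5, -1],
                    [0, -1, 0]], dtype=np.float64)
sharpened = apply_kernel(image, sharpen)
```

In this example the centre value works out to 5 × 200 − 4 × 100 = 600, which is clipped to 255, while its direct neighbours come out at exactly 0.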
Edge detection uses filters to find sudden changes in pixel values, which indicate boundaries, shapes and objects.
- Convert the image to grayscale, so we deal with only 1 channel of pixels.
- Apply a specific filter such as the Sobel filter, which uses 3×3 kernels to find gradients.
- The process has 2 stages: 2 kernels find the horizontal and vertical gradients.
Then add the squares of the x and y gradient values (Gx, Gy) for each pixel and take the square root to determine the length of the gradient vector:
G = √(Gx² + Gy²)
Then calculate the inverse tangent of those values to determine the angle of the vector:
θ = arctan(Gy / Gx)
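The gradient steps above can be sketched in NumPy as follows. The kernels are the standard 3×3 Sobel kernels; the helper names `filter3x3` and `sobel`, the edge-replicating border and the use of `arctan2` (which handles Gx = 0) are assumptions. The kernels are applied without flipping, which affects only the sign of the gradients, not the magnitude.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)   # responds to horizontal change
SOBEL_Y = SOBEL_X.T                                   # responds to vertical change

def filter3x3(gray, kernel):
    """Slide a 3x3 kernel over a 2D image, replicating pixels at the border."""
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    out = np.zeros(gray.shape, dtype=np.float64)
    for dr in range(3):
        for dc in range(3):
            out += kernel[dr, dc] * padded[dr:dr + gray.shape[0], dc:dc + gray.shape[1]]
    return out

def sobel(gray):
    """Return gradient magnitude and angle (in radians) for a grayscale image."""
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    magnitude = np.sqrt(gx ** 2 + gy ** 2)
    angle = np.arctan2(gy, gx)
    return magnitude, angle

# Left half dark, right half bright: a single vertical edge.
gray = np.zeros((4, 6), dtype=np.uint8)
gray[:, 3:] = 255
magnitude, angle = sobel(gray)
```

The magnitude is zero in the flat regions and large in the two columns next to the dark-to-bright transition.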
Mathematical morphology uses mathematical operations to change (morph) the shapes in an image.
Dilation: start by creating a mask, also known as a structuring element.
Place this over the image just like a filter, then perform a logical OR operation across the cells of the mask that overlap the image. IF any of them matches an active cell in the image, then activate the target cell by setting it to 0 (black). Repeat for every pixel.
Hence the main effect of dilation is to enlarge the shapes in an image by extending pixels around their edges, and to fill in small holes.
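A minimal sketch of dilation, following the convention in the text that active pixels are black (0) on a white (255) background. The square structuring element, the white border and the function name `dilate` are assumptions.

```python
import numpy as np

def dilate(image, size=3):
    """Dilate black (0) shapes on a white (255) background with a size x size square mask."""
    k = size // 2
    padded = np.pad(image, k, mode="constant", constant_values=255)  # white border
    out = np.full(image.shape, 255, dtype=np.uint8)
    for dr in range(size):
        for dc in range(size):
            shifted = padded[dr:dr + image.shape[0], dc:dc + image.shape[1]]
            # Logical OR: any black cell under the mask activates the target cell.
            out[shifted == 0] = 0
    return out

image = np.full((5, 5), 255, dtype=np.uint8)
image[2, 2] = 0                          # a single black pixel
dilated = dilate(image)
```

The single black pixel grows into a 3×3 black block, which is the "extending pixels around edges" effect described above.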
Contrary to dilation, erosion removes pixels from the image. Similar to above, but compare the cells using an AND operation: set the target cell to 0 only when all cells match, else set it to 255. This results in erosion: it removes a layer of cells at the edges of shapes and removes fine details from the image.
The net effect of applying
- first dilation and then erosion = closing
- first erosion and then dilation = opening
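A sketch of erosion, opening and closing, again assuming black (0) shapes on a white (255) background and a square structuring element. With that convention, dilation is a neighbourhood minimum and erosion a neighbourhood maximum. The helper `_morph` and the border values are assumptions, chosen so that the image border never changes a result.

```python
import numpy as np

def _morph(image, size, reducer, pad_value):
    """Apply `reducer` (np.min or np.max) over every size x size neighbourhood."""
    k = size // 2
    padded = np.pad(image, k, mode="constant", constant_values=pad_value)
    windows = [padded[dr:dr + image.shape[0], dc:dc + image.shape[1]]
               for dr in range(size) for dc in range(size)]
    return reducer(np.stack(windows), axis=0)

def dilate(image, size=3):
    # OR: the pixel turns black if any neighbour is black (minimum of the window).
    return _morph(image, size, np.min, 255)

def erode(image, size=3):
    # AND: the pixel stays black only if all neighbours are black (maximum of the window).
    return _morph(image, size, np.max, 0)

def opening(image, size=3):
    return dilate(erode(image, size), size)   # erosion first, then dilation

def closing(image, size=3):
    return erode(dilate(image, size), size)   # dilation first, then erosion

# Opening: a black square plus an isolated black speck.
speckled = np.full((9, 9), 255, dtype=np.uint8)
speckled[2:7, 2:7] = 0
speckled[0, 8] = 0
opened = opening(speckled)

# Closing: a black square with a one-pixel white hole.
holed = np.full((9, 9), 255, dtype=np.uint8)
holed[2:7, 2:7] = 0
holed[4, 4] = 255
closed = closing(holed)
```

In these two examples, opening removes the speck while restoring the square to its original size, and closing fills the hole without growing the square.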
Thresholding means binarizing pixels. There are 2 types: global and adaptive.
For example, with a global threshold of 127:
IF pixel value > 127 then set to 255, ELSE set to 0.
Or vice versa to obtain inverse thresholding. Adaptive thresholding calculates the threshold separately for localized regions of the image.
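A minimal sketch of global, inverse and adaptive thresholding. The adaptive variant here compares each pixel with the mean of its local neighbourhood, which is one common choice and an assumption on my part; the function names and the `block` and `offset` parameters are illustrative.

```python
import numpy as np

def threshold(image, value=127, inverse=False):
    """Global threshold: pixels above `value` become 255, the rest 0 (swapped if inverse)."""
    above = image > value
    if inverse:
        above = ~above
    return np.where(above, 255, 0).astype(np.uint8)

def adaptive_threshold(image, block=3, offset=0):
    """Compare each pixel with the mean of its block x block neighbourhood."""
    k = block // 2
    padded = np.pad(image.astype(np.float64), k, mode="edge")
    local_sum = np.zeros(image.shape, dtype=np.float64)
    for dr in range(block):
        for dc in range(block):
            local_sum += padded[dr:dr + image.shape[0], dc:dc + image.shape[1]]
    local_mean = local_sum / (block * block)
    return np.where(image > local_mean - offset, 255, 0).astype(np.uint8)

gray = np.array([[10, 127, 128],
                 [200, 50, 255]], dtype=np.uint8)
binary = threshold(gray)
inverted = threshold(gray, inverse=True)

# A dim image with one slightly brighter pixel: the global threshold of 127
# misses it, while the local comparison picks it out.
dim = np.full((3, 3), 10, dtype=np.uint8)
dim[1, 1] = 20
local = adaptive_threshold(dim)
```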
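To calculate a threshold value automatically, use Otsu's binarization algorithm, which picks the threshold from the histogram of pixel values and can be applied to the whole image or to a specific region.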
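A sketch of Otsu's method for 8-bit images, written as the usual search for the grey level that maximises the between-class variance. The function name `otsu_threshold` and the sample values are assumptions; to use it on a specific region, one could pass a sub-array such as `image[r0:r1, c0:c1]`.

```python
import numpy as np

def otsu_threshold(image):
    """Return the grey level that maximises between-class variance (Otsu's method)."""
    hist = np.bincount(image.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    weight_low = np.cumsum(prob)             # share of pixels at or below each level
    weight_high = 1.0 - weight_low           # share of pixels above each level
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]

    # Between-class variance for every candidate threshold; empty classes score 0.
    denom = weight_low * weight_high
    numer = (total_mean * weight_low - cum_mean) ** 2
    variance = np.zeros(256)
    valid = denom > 0
    variance[valid] = numer[valid] / denom[valid]
    return int(np.argmax(variance))

# Two clusters of grey levels: dark (around 41) and bright (around 200).
gray = np.array([[40, 42, 200],
                 [41, 198, 202]], dtype=np.uint8)
t = otsu_threshold(gray)
binary = np.where(gray > t, 255, 0).astype(np.uint8)
```

For this sample the chosen threshold falls between the two clusters, so the dark and bright pixels end up on opposite sides.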
After thresholding, erosion and dilation help separate foreground and background.
But in cases where multiple objects overlap in the image, more steps are needed:
- Convert image to grayscale.
- Apply thresholding to binarize the image, then erosion followed by dilation, to obtain a monochrome image.
- Then use a distance transform to change the intensities of foreground pixels based on their distance from the background.
- Then threshold these to separate darker pixels from lighter ones.
Now we have 3 kinds of pixels –
- foreground pixels
- background pixels
- unknown pixels around edges.
Mark the two known regions (foreground and background pixels) with different integer values, and set the unknown pixels to
Apply watershed segmentation to fill in the marked foreground regions, leaving the boundary pixels between them.
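The marker preparation that precedes the watershed step can be sketched as below. This is a rough illustration only: the distance transform is a brute-force version suited to tiny images, the 0.6 cut-off and the label values (1 for background, 2 for foreground, 0 for unknown) are assumptions, and the watershed call itself is left to a library. A fuller pipeline would typically give each separate foreground core its own label so that overlapping objects can be told apart.

```python
import numpy as np

def distance_to_background(foreground):
    """Euclidean distance from every foreground pixel to its nearest background pixel."""
    fg = np.argwhere(foreground)             # (n, 2) row/column coordinates
    bg = np.argwhere(~foreground)            # (m, 2) row/column coordinates
    dist = np.zeros(foreground.shape, dtype=np.float64)
    if len(fg) == 0 or len(bg) == 0:
        return dist
    diffs = fg[:, None, :] - bg[None, :, :]  # all pairwise offsets, shape (n, m, 2)
    nearest = np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)
    dist[fg[:, 0], fg[:, 1]] = nearest
    return dist

def build_markers(foreground, core_fraction=0.6, unknown_label=0):
    """Label sure background as 1, sure foreground as 2, everything else as unknown."""
    dist = distance_to_background(foreground)
    sure_fg = dist > core_fraction * dist.max()   # pixels far from the background
    markers = np.full(foreground.shape, unknown_label, dtype=np.int32)
    markers[~foreground] = 1
    markers[sure_fg] = 2
    return markers

# Boolean mask where True marks foreground after thresholding and opening.
foreground = np.zeros((11, 11), dtype=bool)
foreground[2:9, 2:9] = True
markers = build_markers(foreground)
```

The unknown pixels form the band between the foreground core and the background, which is where a watershed algorithm would decide the final boundary.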
Limitations of classical algorithms
Classical image processing algorithms are straightforward and heavily used in machine learning. But a limitation for computer vision is that these algorithms can be heavily influenced by variations in pixel values due to colour, contrast and other image features.
Moreover, they operate on the entire set of pixel values, when only some prominent pixel values describe the features of the image that we’re trying to classify.
They are also influenced by background pixel values that should carry little or no weight.