[reprint handling] [opencv Python] Part IV image processing in opencv

Keywords: Python OpenCV

[opencv Python] image processing part IV in opencv (VI)
Part IV
Image processing in OpenCV
Opencv Python Chinese tutorial (handling) directory
Original text http://www.linuxidc.com/Linux/2015-08/121400.html It is now invalid. This series is used for archiving, self-learning and will never be used for commercial purposes.

23 image transformation

23.1 Fourier transform

In this section, we will learn:
• Fourier transform the image using OpenCV
• use FFT (fast Fourier transform) function in Numpy
• some uses of Fourier transform
• the functions we will learn are: cv2.dft(), cv2.idft(), etc
Fourier transform is often used to analyze the frequency characteristics of different filters. We can use 2D discrete Fourier transform (DFT) to analyze the frequency domain characteristics of images. A fast algorithm to realize DFT is called fast Fourier transform (FFT). Detailed knowledge of Fourier transform can be found in any book on image processing or signal processing. See the more resources section in this section.
For a sinusoidal signal: x(t) = Asin(2 π ft), its frequency is F. if we turn this signal to its frequency domain representation, we will see a peak in frequency f. If our signal is composed of discrete signals generated by sampling, we will get a similar spectrum, but it is continuous in front and discrete now. You can think of an image as a signal taken in two directions. Therefore, by performing Fourier transform in X direction and Y direction at the same time, we will get the frequency domain representation (spectrum diagram) of the image.
More intuitively, for a sinusoidal signal, if its amplitude changes very fast, we can say that it is a high-frequency signal. If it changes very slowly, we call it a low-frequency signal. You can apply this idea to the image, where the amplitude of the image changes very much? Boundary points or noise. Therefore, we say that the boundary and noise are the high-frequency components in the image (note that the high-frequency here means that they change very fast, rather than appear many times). If there is no such large amplitude change, we call it low-frequency component.
Now let's look at how to perform the Fourier transform.

23.1.1 Fourier transform in numpy

First, let's look at how to use Numpy for Fourier transform. The FFT package in Numpy can help us realize fast Fourier transform. The function NP. FFT. FFT 2 () can perform frequency conversion on the signal, and the output result is a complex array. The first parameter of this function is the input image, which is required to be in gray format. The second parameter is optional and determines the size of the output array. The size of the output array is the same as that of the input image. If the output result is larger than the input image, the input image needs to be supplemented with 0 before FFT. If the output result is smaller than the input image, the input image will be cut.

Now we get the result that the frequency 0 part (DC component) is in the upper left corner of the output image. If we want it (DC component) to be in the center of the output image, we also need to translate the result in two directions
N/2. The function np.fft.fftshift() can help us do this. (this makes it easier to analyze).
After the frequency transformation, we can construct the amplitude spectrum.

import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('messi5.jpg',0)
f = np.fft.fft2(img)
fshift = np.fft.fftshift(f)
# The formula for constructing the amplitude diagram here has not been learned
magnitude_spectrum = 20*np.log(np.abs(fshift))
plt.subplot(121),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(magnitude_spectrum, cmap = 'gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])

The results are as follows:

Magnitude Spectrum
We can see that the central part of the output result is whiter (brighter), which indicates that there are more low-frequency components.
Now that we can perform frequency domain transformation, we can perform some operations on the image in the frequency domain, such as high pass filtering and reconstructed image (inverse transformation of DFT). For example, we can use a 60x60 rectangular window to mask the image to remove the low-frequency components. Then use the function np.fft.ifftshift() to perform the inverse translation operation, so now the DC component returns to the upper left corner, and use the function np.ifft2() to perform the inverse FFT transformation. Similarly, we get a pile of complex numbers, and we can take absolute values for them:

rows, cols = img.shape
crow,ccol = rows/2 , cols/2
fshift[crow-30:crow+30, ccol-30:ccol+30] = 0
f_ishift = np.fft.ifftshift(fshift)
img_back = np.fft.ifft2(f_ishift)
img_back = np.abs(img_back)

plt.subplot(131),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(132),plt.imshow(img_back, cmap = 'gray')
plt.title('Image after HPF'), plt.xticks([]), plt.yticks([])
plt.title('Result in JET'), plt.xticks([]), plt.yticks([])


The results are as follows:

High Pass Filtering

The results above show that high pass filtering is actually a boundary detection operation. This is what we saw in the previous chapter on image gradients. At the same time, we also find that most of the data in the image are concentrated in the low-frequency region of the spectrum. Now that we know how to use Numpy for DFT and IDFT, let's take a look at how to use OpenCV for these operations.
If you observe carefully, especially the JET color image in the last chapter, you will see some unnatural things (such as the area I marked with red arrows). Look at the picture above. There are some striped structures, which is called ringing effect. This is because we use a rectangular window as a mask. This problem occurs when the mask is converted to a sinusoidal shape. Therefore, we generally do not apply rectangular window filtering. The best choice is Gaussian window.

23.1.2 Fourier transform in opencv

The corresponding functions in OpenCV are cv2.dft() and cv2.idft(). The result is the same as the previous output, but it is dual channel. The first channel is the real part of the result, and the second channel is the imaginary part of the result. The input image should first be converted to np.float32 format. Let's see how to operate.

import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('messi5.jpg',0)

dft = cv2.dft(np.float32(img),flags = cv2.DFT_COMPLEX_OUTPUT)
dft_shift = np.fft.fftshift(dft)

magnitude_spectrum = 20*np.log(cv2.magnitude(dft_shift[:,:,0],dft_shift[:,:,1]))

plt.subplot(121),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(magnitude_spectrum, cmap = 'gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])

Note: you can use the function cv2.cartToPolar(), which returns both amplitude and phase.
Now let's do the inverse DFT. In the previous part, we implemented an HPF (high pass filtering). Now let's do LPF (low pass filtering) to remove the high-frequency part. In fact, it is to blur the image. First, we need to build a mask. The place corresponding to the low-frequency region is set to 1 and the place corresponding to the high-frequency region is set to 0.

rows, cols = img.shape
crow,ccol = rows/2 , cols/2

# create a mask first, center square is 1, remaining all zeros
mask = np.zeros((rows,cols,2),np.uint8)
mask[crow-30:crow+30, ccol-30:ccol+30] = 1

# apply mask and inverse DFT
fshift = dft_shift*mask
f_ishift = np.fft.ifftshift(fshift)
img_back = cv2.idft(f_ishift)
img_back = cv2.magnitude(img_back[:,:,0],img_back[:,:,1])

plt.subplot(121),plt.imshow(img, cmap = 'gray')
plt.title('Input Image'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(img_back, cmap = 'gray')
plt.title('Magnitude Spectrum'), plt.xticks([]), plt.yticks([])

The results are as follows:

Magnitude Spectrum

Note: the functions cv2.dft() and cv2.idft() in OpenCV are faster than Numpy. However, the Numpy function is more user-friendly. For a description of performance, see the following sections.

23.1.3 performance optimization of DFT

DFT performs better when the size of the array is some value. DFT is most efficient when the size of the array is an exponent of 2. When the size of the array is a multiple of 2, 3 and 5, the efficiency will also be high. So if you want to improve the efficiency of the code, you can modify the size of the input image (fill 0). For OpenCV, you must manually fill in 0. But Numpy, you only need to specify the size of the FFT operation, and it will automatically complement 0.
How do we determine the optimal size? OpenCV provides a function: cv2.getOptimalDFTSize().
It can be used by both cv2.dft() and np.fft.fft2(). Let's use IPython's magic command% timeit to test it.

In [16]: img = cv2.imread('messi5.jpg',0)
In [17]: rows,cols = img.shape
In [18]: print rows,cols
342 548

In [19]: nrows = cv2.getOptimalDFTSize(rows)
In [20]: ncols = cv2.getOptimalDFTSize(cols)
In [21]: print nrows, ncols
360 576

You see, the size of the array has changed from (342548) to (360576). Now let's make up 0 for it and see if the performance has improved. You can create a large 0 array and copy our data, or use the function CV2. Copymakecoder().

nimg = np.zeros((nrows,ncols))
nimg[:rows,:cols] = img


right = ncols - cols
bottom = nrows - rows
#just to avoid line breakup in PDF file
bordertype = cv2.BORDER_CONSTANT
nimg = cv2.copyMakeBorder(img,0,bottom,0,right,bordertype, value = 0)

Now let's look at Numpy's performance:

In [22]: %timeit fft1 = np.fft.fft2(img)
10 loops, best of 3: 40.9 ms per loop
In [23]: %timeit fft2 = np.fft.fft2(img,[nrows,ncols])
100 loops, best of 3: 10.4 ms per loop

Four times faster. Let's look at the performance of OpenCV:

In [24]: %timeit dft1= cv2.dft(np.float32(img),flags=cv2.DFT_COMPLEX_OUTPUT)
100 loops, best of 3: 13.5 ms per loop
In [27]: %timeit dft2= cv2.dft(np.float32(nimg),flags=cv2.DFT_COMPLEX_OUTPUT)
100 loops, best of 3: 3.11 ms per loop

It is also increased by 4 times. At the same time, we will also find that the speed of OpenCV is 3 times that of Numpy.
You can also test the performance of inverse FFT.

23.1.4 why is Laplacian a high pass filter?

I encountered a similar problem in the forum. Why is Laplacian a high pass filter?
Why is Sobel HPF? wait. The answer to the first question is given in the form of Fourier transform. Let's Fourier transform different operators and analyze them:

import cv2
import numpy as np
from matplotlib import pyplot as plt
# simple averaging filter without scaling parameter
mean_filter = np.ones((3,3))
# creating a guassian filter
x = cv2.getGaussianKernel(5,10)
#x.T is matrix transpose
gaussian = x*x.T
# different edge detecting filters
# scharr in x-direction
scharr = np.array([[-3, 0, 3],
[-3, 0, 3]])
# sobel in x direction
sobel_x= np.array([[-1, 0, 1],
[-2, 0, 2],
[-1, 0, 1]])
# sobel in y direction
sobel_y= np.array([[-1,-2,-1],
[0, 0, 0],
[1, 2, 1]])
# laplacian
laplacian=np.array([[0, 1, 0],
[1,-4, 1],
[0, 1, 0]])
filters = [mean_filter, gaussian, laplacian, sobel_x, sobel_y, scharr]
filter_name = ['mean_filter', 'gaussian','laplacian', 'sobel_x', \
'sobel_y', 'scharr_x']
fft_filters = [np.fft.fft2(x) for x in filters]
fft_shift = [np.fft.fftshift(y) for y in fft_filters]
mag_spectrum = [np.log(np.abs(z)+1) for z in fft_shift]
for i in xrange(6):
plt.subplot(2,3,i+1),plt.imshow(mag_spectrum[i],cmap = 'gray')
plt.title(filter_name[i]), plt.xticks([]), plt.yticks([])


Frequency Spectrum of different Kernels
From the image, we can see that each operator is allowed to pass through those signals. From this information, I
We can know that those are HPF and that is LPF.
More resources

  1. An Intuitive Explanation of Fourier Theoryby Steven Lehar
  2. Fourier Transformat HIPR
  3. What does frequency domain denote in case of images?

24 template matching

In this section we will learn:
1. Use template matching to find targets in an image
2. Function: cv2.matchTemplate(), cv2.minMaxLoc()
Template matching is a method used to search and find the position of template image in a large picture. OpenCV provides us with a function: cv2.matchTemplate(). Like 2D convolution, it also uses the template image to slide on the input image (large image), and compares the template image with the sub region of its corresponding input image at each position. OpenCV provides several different comparison methods (see the documentation for details). The returned result is a grayscale image, and each pixel value represents the matching degree between this area and the template.
If the size of the input image is (WXH) and the size of the template is (WXH), the size of the output result is (W-w+1, H-h+1). When you get this picture, you can use the function cv2.minMaxLoc() to find the location of the minimum and maximum values. The first value is the point (position) in the upper left corner of the rectangle, and (w, h) is the width and height of the moban template rectangle. This rectangle is the template area found.
Note: if the comparison method you use is cv2.TM_SQDIFF, the position corresponding to the minimum value is the matching area.

24.1 template matching in opencv

We have an example here: we search Messi's face in Messi's photos. So we want to make the following template:

Template Image
We will try different comparison methods so that we can compare their effects.

Copy code

import cv2
import numpy as np
from matplotlib import pyplot as plt
img = cv2.imread('messi5.jpg',0)
img2 = img.copy()
template = cv2.imread('messi_face.jpg',0)
w, h = template.shape[::-1]
# All the 6 methods for comparison in a list
methods = ['cv2.TM_CCOEFF', 'cv2.TM_CCOEFF_NORMED', 'cv2.TM_CCORR',
for meth in methods:
img = img2.copy()
#The exec statement is used to execute Python statements stored in strings or files.
# For example, we can generate a string containing Python code at run time, and then execute these statements using exec statements.
#The eval statement evaluates a valid Python expression stored in a string
method = eval(meth)
# Apply template Matching
res = cv2.matchTemplate(img,template,method)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)
# Using different comparison methods, the results are interpreted differently
# If the method is TM_SQDIFF or TM_SQDIFF_NORMED, take minimum
if method in [cv2.TM_SQDIFF, cv2.TM_SQDIFF_NORMED]:
top_left = min_loc
top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(img,top_left, bottom_right, 255, 2)
plt.subplot(121),plt.imshow(res,cmap = 'gray')
plt.title('Matching Result'), plt.xticks([]), plt.yticks([])
plt.subplot(122),plt.imshow(img,cmap = 'gray')
plt.title('Detected Point'), plt.xticks([]), plt.yticks([])

Copy code
The results are as follows:
    Template Image
    Template Image
    Template Image
    Template Image
    Template Image
    Template Image
We see cv2.tm_ The effect of ccorr is not as good as we think.

24.2 template matching of multiple objects

In the previous part, we searched for Messi's face in the picture, and Messi only appeared once in the picture. What if your target object only appears many times in the image? The function cv.imMaxLoc() only gives the maximum and minimum values. At this point, we will use the threshold.
In the following example, we want to find the coins in a screenshot of the classic game Mario.

import cv2
import numpy as np
from matplotlib import pyplot as plt

img_rgb = cv2.imread('mario.png')
img_gray = cv2.cvtColor(img_rgb, cv2.COLOR_BGR2GRAY)
template = cv2.imread('mario_coin.png',0)
w, h = template.shape[::-1]

res = cv2.matchTemplate(img_gray,template,cv2.TM_CCOEFF_NORMED)
threshold = 0.8
loc = np.where( res >= threshold)
for pt in zip(*loc[::-1]):
    cv2.rectangle(img_rgb, pt, (pt[0] + w, pt[1] + h), (0,0,255), 2)


    Template Matching

25 Hough line transformation

• understand the concept of Hough transform
• learn how to detect straight lines in a picture
• learning function: cv2.HoughLines(), cv2.HoughLinesP()
Hough transform is very popular in the technology of detecting various shapes. If the shape you want to detect can be written in a mathematical expression, you can use Hough transform to detect it. If the shape to be detected has a little damage or distortion, it can also be used. Let's see how to use Hough transform to detect straight lines. A straight line can be expressed mathematically y = m x + c y = mx + c y=mx+c or ρ = x c o s θ + y s i n θ ρ = xcosθ + y sinθ ρ= xcos θ+ ysin θ express.
ρ ρ ρ Is the vertical distance from the origin to the line, θ θ θ Is the angle between the vertical line and the horizontal axis of the straight line in the clockwise direction (if you use different coordinate systems, the directions may also be different. I describe it according to the coordinate system used by OpenCV). As shown in the figure below:

So if a line passes below the origin, ρ ρ ρ The value of should be greater than 0 and the angle less than 180. However, if you pass above the origin, the angle is not greater than 180, but also less than 180 ρ ρ ρ The value of is less than 0. The vertical line angle is 0 degrees and the horizontal line angle is 90 degrees. Let's see how the Hough transform works. Every straight line can be used ( ρ , θ ) (ρ,θ) ( ρ,θ) express.
So first create a 2D array (accumulator) and initialize the accumulator. All values are 0. Lines represent ρ ρ ρ, Column representation θ θ θ. The size of this array determines the accuracy of the final result. If you want the angle to be accurate to 1 degree, you need 180 columns. For ρ ρ ρ, The maximum value is the diagonal distance of the image. Therefore, if the accuracy is to reach the level of one pixel, the number of rows should be equal to the diagonal distance of the image.
Imagine that we have a line with a size of 100x100 in the center of the image. Take the first point on the line, and we know the (x, y) value here. Bring X and Y into the upper equations, and then traverse θ θ θ Values of: 0, 1, 2, 3,..., 180. Calculate the corresponding values respectively ρ ρ ρ So we get a series( ρ,θ) If the value pair also has a corresponding position in the accumulator, add 1 to this position. So now (50, 90) = 1 in the accumulator. (a point may exist in multiple straight lines, so for each point on the straight line, multiple values in the accumulator may be added by 1 at the same time).
Now take the second point on the line. Repeat the above process. Update the value in the accumulator. The value of (50,90) in the accumulator is now 2. All you do every time is update the value in the accumulator. Perform the above operation for each point on the line. After each operation, the value in the accumulator will be increased by 1, but sometimes it will be increased by 1 in other places, and sometimes it will not. In this way, the value of (50,90) in the final accumulator must be the largest. If you search the maximum value in the accumulator and find its position (50,90), it means that there is a straight line in the image. The distance from the straight line to the origin is 50, and the included angle between its vertical line and the horizontal axis is 90 degrees. The following animation well demonstrates this process (image courtesy: amostorkey).

Hough Transform Demo
This is how the Hough line transform works. It's simple. Maybe you can handle it yourself with Numpy. The following figure shows an accumulator. The two brightest points represent the parameters of the two lines in the image. (Image courtesy: Wikipedia).

Hough Transform accumulator

25.1 Hough transform in opencv

The whole process described above is encapsulated into a function in OpenCV: cv2.HoughLines().
The return value is ( ρ , θ ) (ρ,θ) (ρ,θ). ρ ρ ρ In pixels, θ θ θ The unit of is radians. The first parameter of this function is a binarized image, so binarization or Canny edge detection should be carried out before Hough transform. The second and third values represent ρ ρ ρ and θ θ θ Accuracy. The fourth parameter is the threshold. Only when the accumulated value is higher than the threshold, it is considered as a straight line, or it can be regarded as the shortest length of the detected straight line (in pixels).

import cv2
import numpy as np

img = cv2.imread('dave.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray,50,150,apertureSize = 3)

lines = cv2.HoughLines(edges,1,np.pi/180,200)
for rho,theta in lines[0]:
    a = np.cos(theta)
    b = np.sin(theta)
    x0 = a*rho
    y0 = b*rho
    x1 = int(x0 + 1000*(-b))
    y1 = int(y0 + 1000*(a))
    x2 = int(x0 - 1000*(-b))
    y2 = int(y0 - 1000*(a))



The results are as follows:

Hough Transform Line Detection

25.2 Probabilistic Hough Transform

From the above process, we can find that only a straight line needs two parameters, which requires a lot of calculation. Probabilistic_Hough_Transform is an optimization of Hough transform. It will not calculate every point, but randomly select a point set from an image (can you also use the image pyramid?) for calculation, which is enough for line detection. But using this transformation, we must reduce the threshold (the total number of points is less, and the threshold must be small!).
The following figure is a comparison of the two methods. (Image Courtesy : Franck Bettinger’s homepage)

Hough Transform and Probabilistic Hough Transform
The Progressive Probabilistic Hough Transform proposed by Matas, J., Galambos, C. and Kittler, J.V. used in OpenCV. This function is cv2.HoughLinesP().
It has two parameters.
• minLineLength - the shortest length of the line. Any line shorter than this will be ignored.
• MaxLineGap - the maximum interval between two line segments. If it is less than this value, the two lines are regarded as one line.
More awesome, starting point and end point of a line are the return values of this function. In the previous example, we only get the parameters of the lines, and you have to find all the lines. And here everything is direct and simple.

import cv2
import numpy as np

img = cv2.imread('dave.jpg')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray,50,150,apertureSize = 3)
minLineLength = 100
maxLineGap = 10
lines = cv2.HoughLinesP(edges,1,np.pi/180,100,minLineLength,maxLineGap)
for x1,y1,x2,y2 in lines[0]:


The results are as follows:
    Probabilistic Hough Transform

26 Hough ring transformation

• learn to use Hough transform to find circles (rings) in the image.
• learning function: cv2.HoughCircles().
The mathematical expression of a circle is ( x − x c e n t e r ) 2 + ( y − y c e n t e r ) 2 = r 2 (x − x_{center} ) ^2 +(y − y_{center} ) ^2 = r^ 2 (x − xcenter) 2+(y − ycenter) 2=r2, where ( x c e n t e r , y c e n t e r ) ( x_{center} ,y_{center} ) (xcenter, ycenter) is the coordinate of the center of the circle, and r is the diameter of the circle. From this equation, we can see that a ring needs three parameters to determine. Therefore, the accumulator for circular Hough transform must be three-dimensional, so the efficiency will be very low. So OpenCV is used as a more ingenious method, Hoff gradient method, which can use the gradient information of the boundary.
The function we want to use is cv2.HoughCircles(). Its parameters are explained in detail in the document. Let's just look at the code here.

import cv2
import numpy as np
img = cv2.imread('opencv_logo.png',0)
img = cv2.medianBlur(img,5)
cimg = cv2.cvtColor(img,cv2.COLOR_GRAY2BGR)
circles = cv2.HoughCircles(img,cv2.HOUGH_GRADIENT,1,20,
circles = np.uint16(np.around(circles))
for i in circles[0,:]:
# draw the outer circle
# draw the center of the circle
cv2.imshow('detected circles',cimg)
#Python: cv2.HoughCircles(image, method, dp, minDist, circles, param1, param2, minRadius, maxRadius)
#image – 8-bit, single-channel, grayscale input image.
# The returned result is output vector of found circuits. Each vector is encoded as a
#3-element floating-point vector (x, y, radius) .
#circle_storage – In C function this is a memory storage that will contain
#the output sequence of found circles.
#method – Detection method to use. Currently, the only implemented method is
#CV_HOUGH_GRADIENT , which is basically 21HT , described in [Yuen90].
#dp – Inverse ratio of the accumulator resolution to the image resolution.
#For example, if dp=1 , the accumulator has the same resolution as the input image.
#If dp=2 , the accumulator has half as big width and height.
#minDist – Minimum distance between the centers of the detected circles.
#If the parameter is too small, multiple neighbor circles may be falsely
#detected in addition to a true one. If it is too large, some circles may be missed.
#param1 – First method-specific parameter. In case of CV_HOUGH_GRADIENT ,
#it is the higher threshold of the two passed to the Canny() edge detector
# (the lower one is twice smaller).
#param2 – Second method-specific parameter. In case of CV_HOUGH_GRADIENT ,
# it is the accumulator threshold for the circle centers at the detection stage.
#The smaller it is, the more false circles may be detected. Circles,
# corresponding to the larger accumulator values, will be returned first.
#minRadius – Minimum circle radius.
#maxRadius – Maximum circle radius.

The results are as follows:

Hough Circles

More resources

27 watershed algorithm image segmentation

In this section we will learn
• mask based image segmentation using watershed algorithm
• function: cv2.watershed()
Any gray image can be regarded as a topological plane, the area with high gray value can be regarded as a mountain peak, and the area with low gray value can be regarded as a valley. We pour different colors of water into each valley. As the water level rises, the water in different valleys will meet and converge. In order to prevent the water in different valleys from converging, we need to build a dam where the water converges. Keep pouring water and building dams until all the peaks are flooded. The dam we built is the segmentation of the image. This is the philosophy behind the watershed algorithm. You can deepen your understanding by visiting the website CMM web page on watershed.
However, this method usually results in over segmentation, which is caused by noise or other irregular factors in the image. In order to reduce this impact, OpenCV adopts a mask based watershed algorithm. In this algorithm, we need to set those valley points that will converge and those that will not. This is an interactive image segmentation. All we have to do is label our known objects differently. If an area must be a foreground or object, it is marked with a color (or gray value) label. If an area is definitely not an object but a background, it is marked with another color label. The remaining areas that are not sure whether they are foreground or background are marked with 0. This is our label. Then the watershed algorithm is implemented. With each irrigation, our label will be updated. When two labels of different colors meet, the dam will be built until all peaks are submerged. Finally, the value of the boundary object (dam) we get is - 1.

27.1 code

In the following example, we will segment the objects next to each other with distance transformation and watershed algorithm.
As shown in the figure below, these coins are next to each other. Even if you use threshold operations, they are still next to each other.


Let's start by finding an approximate estimate of the coin. We can use Otsu's binarization.

import numpy as np
import cv2
from matplotlib import pyplot as plt

img = cv2.imread('coins.png')
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray,0,255,cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)


Now we want to remove all white noise in the image. This requires the use of the open operation in morphology.
In order to remove small holes in objects, we need to use morphological closure operation. So we now know that the area near the center of the object must be the foreground, and the area away from the center of the object must be the background. The uncertain area is the boundary between coins.
So we're going to extract the area that must be a coin. Etching can remove edge pixels. The rest is definitely coins. This operation is effective when there is no contact between coins. But because the coins are in contact with each other, we have another better choice: distance transformation and appropriate threshold. Next, we need to find an area that must not be a coin. This is the need for expansion operation. Inflation extends the boundary of an object into the background. In this way, since the boundary areas are processed, we can know that those areas must be the foreground and those areas must be the background. As shown in the figure below.

Foreground and Background
The rest of the area is where we don't know how to distinguish. This is what the watershed algorithm needs to do.
These areas are usually the junction of foreground and background (or the junction of two foreground). We call it boundary. The boundary region is obtained by subtracting the region that must be the foreground from the region that must be the background.

# noise removal
kernel = np.ones((3,3),np.uint8)
opening = cv2.morphologyEx(thresh,cv2.MORPH_OPEN,kernel, iterations = 2)
# sure background area
sure_bg = cv2.dilate(opening,kernel,iterations=3)
# Finding sure foreground area
# The basic meaning of distance transformation is to calculate the distance from non-zero pixels to the nearest zero pixels in an image, that is, the shortest distance to zero pixels
# The most common distance transformation algorithm is realized by continuous etching operation. The stop condition of etching operation is that all foreground pixels are completely destroyed
# Corrosion. In this way, according to the order of corrosion, we can get each foreground pixel to the foreground center?? Pixel
# Distance. According to the distance value of each pixel, it is set to different gray values. In this way, the distance transformation of binary image is completed
#cv2.distanceTransform(src, distanceType, maskSize)
# The second parameter 0,1,2 represents CV respectively_ DIST_ L1, CV_ DIST_ L2 , CV_ DIST_ C
dist_transform = cv2.distanceTransform(opening,1,5)
ret, sure_fg = cv2.threshold(dist_transform,0.7*dist_transform.max(),255,0)
# Finding unknown region
sure_fg = np.uint8(sure_fg)
unknown = cv2.subtract(sure_bg,sure_fg)

As the result shows, in the thresholded image, we get the area that must be coins, and the coins are also divided. (in some cases, you may only need to segment the foreground without separating the adjacent objects. At this time, it is not necessary to use distance transformation, and corrosion is enough. Of course, corrosion can also be used to extract the area that must be the foreground.)

Distance Transform
Now I know that those are the background and those are coins. Then we can create a label (an array with the same size as the original image and data type of in32) and mark the area in it. Use different positive integer markers for the areas we have determined to classify (whether foreground or background), and use 0 markers for the areas we are not sure. We can use the function cv2.connectedComponents() to do this. It marks the background as 0, and other objects are marked with a positive integer starting from 1.
However, we know that if the background is marked as 0, the watershed algorithm will treat it as an unknown area. So we want to mark them with different integers. The uncertain area (unknown area defined by unknown in the output of function cv2.connectedComponents) is marked as 0.

# Marker labelling
ret, markers1 = cv2.connectedComponents(sure_fg)
# Add one to all labels so that sure background is not 0, but 1
markers = markers1+1
# Now, mark the region of unknown with zero
markers[unknown==255] = 0
 Result use JET Color map representation. The dark blue area is unknown. It must be the area of the coin marked with different colors. The rest of the area is the background marked with light blue.
Now the label is ready. To the last step: implement the watershed algorithm. The label image will be modified and the mark of the boundary area will become -1.
markers3 = cv2.watershed(img,markers)
img[markers3 == -1] = [255,0,0]

Copy code
The results are as follows. The boundary of some coins is well divided, and the boundary between some coins is not well divided.
    Marker Image

Now our marker is ready. It is time for final step, apply watershed. Then marker image will be modified. The boundary region will be marked with -1.

markers = cv2.watershed(img,markers)
img[markers == -1] = [255,0,0]

See the result below. For some coins, the region where they touch are segmented properly and for some, they are not.



  1. OpenCV's own example has an interactive watershed segmentation program: watershed.py.
    Play by yourself.

28 interactive foreground extraction using GrabCut algorithm

In this section, we will learn:
• the principle of GrabCut algorithm, which is used to extract the foreground of the image
• create an interactive program to complete foreground extraction

GrabCut algorithm is developed by Carsten of Microsoft Cambridge Research Institute_ Rother,Vladimir_Kolmogorov and Andrew_Blake put forward it in the article "GrabCut": interactive foreground extraction using iterated graph cuts ". This algorithm needs little human-computer interaction in the process of extracting foreground, and the result is very good.
How does it work from the user's point of view? At first, the user needs to frame the foreground area with a rectangle (the foreground area should be completely included in the rectangle). Then the algorithm performs iterative segmentation to achieve the best result. However, sometimes the segmentation result is not ideal, such as taking the foreground as the background, or taking the background as the foreground. In this case, you need to modify it. The user only needs to draw a pen (click the mouse) in the unsatisfactory part. A stroke is like telling the computer: "Hey, man, you've reversed here. Remember to change it in the next iteration!". Then, in the next iteration, you will get a better result.
As shown in the figure below. The player and the football are surrounded by a blue rectangle. There are several corrections I made. The white brush indicates that this is the foreground and the black brush indicates that this is the background. Finally, I got a good result.

GrabCut in Action
What happened in the whole process?
• the user enters a rectangle. All areas outside the rectangle must be the background (as we mentioned earlier, all objects should be contained in the rectangle). What's inside the rectangle is unknown. Similarly, any operation by the user to determine the foreground and background will not be changed by the program.
• the computer will make an initialization mark on our input image. It marks foreground and background pixels.
• use a Gaussian mixture model (GMM) to model the foreground and background.
• based on our input, GMM learns and creates a new pixel distribution. For pixels whose classification is unknown (either foreground or background), they can be classified according to their relationship with pixels with known classification (such as background) (just like clustering).
• this creates a graph based on the distribution of pixels. The nodes in the figure are pixels. In addition to pixel nodes, there are two nodes: Source_node and Sink_node. All foreground pixels are associated with Source_node is connected. All background pixels are the same as Sink_node is connected.
• connect pixels to source_ node/end_ The weight of nodes (edges) is determined by the probability that they belong to the same class (both foreground and background). The weight between two pixels is determined by the edge information or the similarity of two pixels. If the colors of two pixels are very different, the weight of the edges between them will be very small.
• use mincut algorithm to segment the graph obtained above. It divides the graph into sources according to the lowest cost equation_ Node and Sink_node. The cost equation is the sum of the weights of all the cut edges. After clipping, all are connected to the source_ The pixels of node are considered as foreground, and all are connected to sink_ The pixels of node are considered as the background.
• continue this process until the classification converges.
The following figure illustrates this process (image course: http://www.cs.ru.ac.za/research/g02m1682/ ):

Simplified Diagram of GrabCut Algorithm

28.1 demonstration

Now let's enter the grabcut algorithm in OpenCV. Opencv provides the following functions:

cv2.grabCut(). Let's first look at its parameters:
• img - input image
• mask mask image to determine which areas are background, foreground, possibly foreground / background, etc. Can be set to: cv2.GC_BGD,cv2.GC_FGD,cv2.GC_PR_BGD,cv2.GC_PR_FGD, or enter 0,1,2,3 directly.
• rect - rectangle containing foreground in the format (x,y,w,h)
• bdgModel, fgdModel - an array used internally in the algorithm. You only need to create two arrays with size (1,65) and data type np.float64.
• iterCount - number of iterations of the algorithm
• mode can be set to cv2.GC_INIT_WITH_RECT or cv2.GC_INIT_WITH_MASK,

It can also be used in combination. This is used to determine how we modify, rectangular mode or mask mode.
First, let's look at using rectangular mode. Load images, create mask images, and build bdgmodels and fgdmodels. Pass in the rectangle parameter. It's all so direct. Let the algorithm iterate 5 times. Since we are using rectangular mode, we modify the mode to cv2.GC_INIT_WITH_RECT. Run grabcut. The algorithm will modify the mask image. In the new mask image, all pixels are divided into four categories:
Background, foreground, background / foreground may be marked with 4 different labels (mentioned in the previous parameters).
Then we modify the mask image. All 0 pixels and 1 pixels are classified as 0 (e.g. background), and all 1 pixels and 3 pixels are classified as 1 (foreground). Our final mask image is ready. The segmented image is obtained by multiplying it with the input image.

import numpy as np
import cv2
from matplotlib import pyplot as plt
img = cv2.imread('messi5.jpg')
mask = np.zeros(img.shape[:2],np.uint8)
bgdModel = np.zeros((1,65),np.float64)
fgdModel = np.zeros((1,65),np.float64)
rect = (50,50,450,290)
# The return values of the function are updated mask, bgdmodel and fgdmodel
mask2 = np.where((mask==2)|(mask==0),0,1).astype('uint8')
img = img*mask2[:,:,np.newaxis]

The results are as follows:

Segmentation in rect mode
Oh, we lost Messi's hair! Let's help him find his hair. So we have to draw a stroke there (set the pixel to 1, it must be the foreground). At the same time, there are some grasslands we don't need. We need to remove them, and then we draw a stroke in this area (set the pixel to 0, it must be the background). The mask image can now be modified as mentioned earlier.
Actually, how did I do it? We use image editing software to open the input image, add a layer, and use brush tools to draw white where necessary (such as hair, shoes, balls, etc.); Use a black brush to paint where you don't need it (for example, logo, grass, etc.). Then fill the rest with gray and save it as a new mask image. Import the mask image in OpenCV and edit the original mask image according to the new mask image. The code is as follows:

# newmask is the mask image I manually labelled
newmask = cv2.imread('newmask.png',0)

# whereever it is marked white (sure foreground), change mask=1
# whereever it is marked black (sure background), change mask=0
mask[newmask == 0] = 0
mask[newmask == 255] = 1

mask, bgdModel, fgdModel = cv2.grabCut(img,mask,None,bgdModel,fgdModel,5,cv2.GC_INIT_WITH_MASK)

mask = np.where((mask==2)|(mask==0),0,1).astype('uint8')
img = img*mask[:,:,np.newaxis]

The results are as follows:

Segmentation in mask mode
this is it. You can also enter masked image mode without rectangle initialization. The pixels of the rectangular area are marked with 2 pixels and 3 pixels (possibly background / foreground). Then, as in our second example, the pixel that must be the foreground is marked as 1 pixel. The grabCut function is then used directly in mask image mode.


  1. OpenCV comes with an interactive tool grabcut.py that uses the grabcut algorithm. Play it yourself.
  2. You can create an interactive sampling program, which can draw rectangles with sliding bars (adjust the thickness of the brush), etc.

Posted by filn on Wed, 24 Nov 2021 01:10:31 -0800