OCR (Optical Character Recognition) is the process by which an electronic device examines the characters printed on paper and translates their shapes into computer text using character recognition methods. The paper document is first converted optically into a black-and-white dot-matrix image file; recognition software then converts the text in that image into a text format that word-processing software can edit further.
Generally speaking, OCR consists of two parts: segmentation and recognition. This article discusses segmentation.
The first step is usually to take the photo supplied by the user and extract the region to be recognized; in this case, that region is the document itself.
Specific steps:
(1) Get the document outline
(2) Obtain the coordinates of the four corners of the document
(3) Apply a perspective transformation
Import libraries
import numpy as np
import cv2
import matplotlib.pyplot as plt
import math
Get the document outline
image = cv2.imread('Original photos.jpg')  # Read the original photo
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # Convert to grayscale
gray = cv2.GaussianBlur(gray, (5, 5), 0)  # Gaussian blur to suppress noise
kernel = np.ones((3, 3), np.uint8)
dilation = cv2.dilate(gray, kernel)  # Dilate
edged = cv2.Canny(dilation, 30, 120)  # Edge detection
# findContours returns 3 values in OpenCV 3.x and 2 in OpenCV 4.x; [-2:] works with both
cnts, hierarchy = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]
cv2.drawContours(image, cnts, -1, (0, 0, 255), 3)  # Draw all contours in red
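Get the coordinates of the four corners of the document
# The corners are read from the first two contours returned for this photo
cnts0 = cnts[0]
cnts1 = cnts[1]
rect = np.zeros((4, 2), dtype="float32")
rect[0] = cnts1[np.argmin(np.sum(cnts1, axis=-1))]   # top left: smallest x + y
rect[2] = cnts0[np.argmax(np.sum(cnts0, axis=-1))]   # bottom right: largest x + y
rect[1] = cnts1[np.argmin(np.diff(cnts1, axis=-1))]  # top right: smallest y - x
rect[3] = cnts0[np.argmax(np.diff(cnts0, axis=-1))]  # bottom left: largest y - x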
The four corners are ordered top left, top right, bottom right, bottom left.
The top-left corner has the smallest coordinate sum (x + y), and the bottom-right corner has the largest.
The top-right corner has the smallest coordinate difference (y - x), and the bottom-left corner has the largest.
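The ordering rule can be checked on a small set of points. This is a sketch only: the function name order_corners and the sample coordinates are made up for illustration, and it assumes the four points come from a single quadrilateral that is not rotated so far that the sum and difference rules become ambiguous.

```python
import numpy as np

def order_corners(pts):
    """Order four (x, y) points as top left, top right, bottom right, bottom left."""
    pts = np.asarray(pts, dtype="float32").reshape(-1, 2)
    s = pts.sum(axis=1)               # x + y for each point
    d = np.diff(pts, axis=1).ravel()  # y - x for each point
    rect = np.zeros((4, 2), dtype="float32")
    rect[0] = pts[np.argmin(s)]  # top left: smallest sum
    rect[2] = pts[np.argmax(s)]  # bottom right: largest sum
    rect[1] = pts[np.argmin(d)]  # top right: smallest difference
    rect[3] = pts[np.argmax(d)]  # bottom left: largest difference
    return rect

# Corners supplied in arbitrary order
corners = order_corners([[220, 300], [10, 20], [5, 280], [200, 30]])
print(corners)
```

For these sample points the result is (10, 20), (200, 30), (220, 300), (5, 280).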
Calculate the size of the corrected image from the coordinates of the four corners
(tl, tr, br, bl) = rect
# Width: the longer of the top and bottom edges
width1 = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
width2 = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
width = max(int(width1), int(width2))
# Height: the longer of the right and left edges
height1 = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
height2 = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
height = max(int(height1), int(height2))
# Target corners in the same order: top left, top right, bottom right, bottom left
dst = np.array([
    [0, 0],
    [width - 1, 0],
    [width - 1, height - 1],
    [0, height - 1]], dtype="float32")
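The size calculation takes the longer of each pair of opposite edges, so the corrected image is never smaller than the document appears in the photo. A small worked example, with a hypothetical helper output_size and made-up corner coordinates, shows the arithmetic:

```python
import numpy as np

def output_size(rect):
    """Return (width, height) as the longer of each pair of opposite edges."""
    tl, tr, br, bl = np.asarray(rect, dtype="float64")
    width = max(int(np.hypot(*(tr - tl))), int(np.hypot(*(br - bl))))
    height = max(int(np.hypot(*(tr - br))), int(np.hypot(*(tl - bl))))
    return width, height

# Corners in the order top left, top right, bottom right, bottom left
quad = [[0, 0], [8, 6], [8, 16], [0, 12]]
width, height = output_size(quad)
print(width, height)
```

Here the top edge has length 10 and the bottom edge about 8.9, so the width is 10; the right edge has length 10 and the left edge 12, so the height is 12.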
Perspective transformation
M = cv2.getPerspectiveTransform(rect, dst)  # 3x3 matrix mapping rect onto dst
warped = cv2.warpPerspective(image, M, (width, height))  # Corrected image
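To make the transformation less of a black box, the sketch below solves for a 3x3 perspective matrix from four point pairs using only NumPy and then applies it to a point in homogeneous coordinates. It illustrates the underlying math and is not OpenCV's implementation; the function names solve_homography and apply_homography and the sample coordinates are assumptions made for this example.

```python
import numpy as np

def solve_homography(src, dst):
    """Solve for the 3x3 matrix H (with H[2, 2] = 1) that maps 4 src points to 4 dst points."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged into a linear equation
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1), rearranged the same way
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pt):
    """Map one (x, y) point through H, dividing by the homogeneous coordinate."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

# Source corners (top left, top right, bottom right, bottom left) and their targets
src = [[10, 20], [200, 30], [220, 300], [5, 280]]
dst = [[0, 0], [199, 0], [199, 279], [0, 279]]

H = solve_homography(src, dst)
for s in src:
    print(s, "->", np.round(apply_homography(H, s), 3))
```

Each source corner should land on its target corner; every other pixel of the document is mapped by the same matrix, which is what the warp step does over the whole image.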