OCR image correction

Keywords: Python

OCR (Optical Character Recognition) is the process by which a device examines the characters printed on paper and translates their shapes into computer text using a character recognition method. The paper document is first converted optically into a black-and-white dot-matrix image file; recognition software then converts the text in the image into a text format that word processing software can edit further.
Generally speaking, OCR is divided into two parts: segmentation and recognition. This article discusses segmentation.
The first step is usually to scan the photo supplied by the user and extract the region to be recognized. In this example, that region is the document in the photo.


Specific steps:
(1) get the document outline
(2) obtain the coordinates of the four corners of the document
(3) apply a perspective transformation

Import libraries

import numpy as np
import cv2
import matplotlib.pyplot as plt
import math

Get the document outline

image = cv2.imread('Original photos.jpg')                                     # Read the original photo
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)                                # Convert to grayscale
gray = cv2.GaussianBlur(gray, (5, 5), 0)                                      # Gaussian filtering
kernel = np.ones((3, 3), np.uint8)                                            # 3x3 structuring element
dilation = cv2.dilate(gray, kernel)                                           # Dilation
edged = cv2.Canny(dilation, 30, 120)                                          # Edge extraction
# findContours returns 3 values in OpenCV 3.x and 2 values in OpenCV 4.x; [-2:] works with both
cnts, hierarchy = cv2.findContours(edged, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]
cv2.drawContours(image, cnts, -1, (0, 0, 255), 3)                             # Draw all contours in red (modifies image in place)

Get the coordinates of the four corners of the document

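cnts0 = cnts[0]                                        # The first two contours found in this photo
cnts1 = cnts[1]

rect = np.zeros((4, 2), dtype="float32")

rect[0] = cnts1[np.argmin(np.sum(cnts1, axis=-1))]     # Top left: smallest x + y
rect[2] = cnts0[np.argmax(np.sum(cnts0, axis=-1))]     # Bottom right: largest x + y
rect[1] = cnts1[np.argmin(np.diff(cnts1, axis=-1))]    # Top right: smallest y - x
rect[3] = cnts0[np.argmax(np.diff(cnts0, axis=-1))]    # Bottom left: largest y - x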

Order of the four corners: top left, top right, bottom right, bottom left
The top-left corner has the smallest coordinate sum (x + y), and the bottom-right corner has the largest
The top-right corner has the smallest coordinate difference (y - x), and the bottom-left corner has the largest
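
The ordering rule above can be sketched as a small helper that works on any set of candidate points. The function name order_corners and the sample points are assumptions made for this illustration. The rule itself assumes the document is not rotated far from upright; otherwise two corners can end up with the same sum or difference ranking.

```python
import numpy as np

def order_corners(points):
    """Return corners as float32 rows: top left, top right, bottom right, bottom left."""
    # reshape lets the helper accept (N, 2) arrays as well as (N, 1, 2) OpenCV contours
    pts = np.asarray(points, dtype="float32").reshape(-1, 2)
    sums = pts.sum(axis=1)                  # x + y for every point
    diffs = np.diff(pts, axis=1).ravel()    # y - x for every point

    rect = np.zeros((4, 2), dtype="float32")
    rect[0] = pts[np.argmin(sums)]    # top left: smallest x + y
    rect[2] = pts[np.argmax(sums)]    # bottom right: largest x + y
    rect[1] = pts[np.argmin(diffs)]   # top right: smallest y - x
    rect[3] = pts[np.argmax(diffs)]   # bottom left: largest y - x
    return rect

# Hypothetical corner candidates in arbitrary order.
corners = [[320, 260], [60, 40], [30, 240], [340, 70]]
ordered = order_corners(corners)
print(ordered)
```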

Calculate the size of the corrected image according to the coordinates of four corners

(tl, tr, br, bl) = rect

# Width: the longer of the top and bottom edges
width1 = np.sqrt(((tr[0]-tl[0])**2)+((tr[1]-tl[1])**2))
width2 = np.sqrt(((br[0]-bl[0])**2)+((br[1]-bl[1])**2))
width = max(int(width1), int(width2))

# Height: the longer of the right and left edges
height1 = np.sqrt(((tr[0]-br[0])**2)+((tr[1]-br[1])**2))
height2 = np.sqrt(((tl[0]-bl[0])**2)+((tl[1]-bl[1])**2))
height = max(int(height1), int(height2))

# Target corners in the same order as rect: top left, top right, bottom right, bottom left
dst = np.array([
    [0, 0],
    [width - 1, 0],
    [width - 1, height - 1],
    [0, height - 1]], dtype = "float32")
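
As a worked example of the size calculation, the same edge lengths can be computed with math.hypot. The helper name output_size and the corner values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def output_size(rect):
    """Return (width, height) from corners ordered tl, tr, br, bl."""
    tl, tr, br, bl = rect
    # Width: longer of the top edge (tl-tr) and the bottom edge (bl-br)
    width = max(int(math.hypot(tr[0] - tl[0], tr[1] - tl[1])),
                int(math.hypot(br[0] - bl[0], br[1] - bl[1])))
    # Height: longer of the right edge (tr-br) and the left edge (tl-bl)
    height = max(int(math.hypot(tr[0] - br[0], tr[1] - br[1])),
                 int(math.hypot(tl[0] - bl[0], tl[1] - bl[1])))
    return width, height

# Hypothetical corners: top left, top right, bottom right, bottom left.
size = output_size([(60, 40), (340, 70), (320, 260), (30, 240)])
print(size)  # (290, 202)
```

Here the top edge is about 281.6 pixels long and the bottom edge about 290.7, so the width is 290; the right edge is about 191.0 and the left edge about 202.2, so the height is 202.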

Perspective transformation

M = cv2.getPerspectiveTransform(rect, dst)                 # 3x3 matrix that maps the four corners onto dst
warped = cv2.warpPerspective(image, M, (width, height))    # Corrected image
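
To make the role of M more concrete, the following sketch derives a 3x3 perspective matrix from four point pairs with plain NumPy and checks that each source corner lands on its target. It is an illustration of the underlying linear system, not OpenCV's implementation; the function names and the corner values are assumptions.

```python
import numpy as np

def homography_from_corners(src, dst):
    """Solve for the 3x3 matrix H (with H[2, 2] = 1) that maps each src point to its dst point."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0*x + h1*y + h2) / (h6*x + h7*y + 1), rearranged to be linear in h
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        # v = (h3*x + h4*y + h5) / (h6*x + h7*y + 1), rearranged the same way
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.extend([u, v])
    h = np.linalg.solve(np.array(rows, dtype=np.float64),
                        np.array(rhs, dtype=np.float64))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, point):
    """Map one (x, y) point through H using homogeneous coordinates."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return np.array([x / w, y / w])

# Hypothetical source corners (tl, tr, br, bl) and a 290 x 202 target rectangle.
src = [(60, 40), (340, 70), (320, 260), (30, 240)]
target = [(0, 0), (289, 0), (289, 201), (0, 201)]
H = homography_from_corners(src, target)

for s, t in zip(src, target):
    print(s, "->", np.round(apply_homography(H, s), 3), "expected", t)
```

Every other pixel of the document is mapped by the same matrix, which is what straightens the page in the warped image.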

Posted by nashsaint on Tue, 03 Dec 2019 12:09:47 -0800