OCR recognition based on Tesseract

Keywords: Programming osd OpenCV iOS brew

Catalog

  • Background

  • Tesseract introduction and environment setup

  • Font training

  • Tesseract for iOS

  • Summary

Background

Because of limited processing power on the client, most ID card recognition products on the market do the verification on the server: the client is only responsible for capturing images and sending them to the server for recognition. If the quality of the ID card image captured by the client cannot be guaranteed, the server will struggle to recognize it, which slows down the whole recognition flow and hurts the user experience. (For example, in my project, as soon as the camera opens the client starts capturing frames and uploading them to the server for recognition, regardless of whether the captured frame actually shows an ID card; the client then waits for the server's result, and if recognition fails it keeps capturing and uploading until it succeeds.)

To guarantee the quality of the ID card image captured by the client, the client should first train a character library containing only the characters needed for ID cards (Tesseract's stock libraries are large: 21.9 MB for English and 52.7 MB for Chinese, so using them directly would greatly increase the APP package size). The trained library is then integrated into the Tesseract framework to recognize and verify the gender, date of birth, ID number, validity date, portrait, and other regions; only images that pass verification are uploaded to the server for recognition.

This greatly reduces the interaction between client and server, and uploading only high-quality images increases the recognition success rate, shortens recognition time, and improves the user experience.

Tesseract introduction and environment setup

Brief introduction

Tesseract's OCR engine was first developed at HP Labs in 1985, and by 1995 it had become one of the three most accurate recognition engines in the OCR industry. However, HP soon decided to abandon the OCR business, and Tesseract gathered dust for years. HP eventually realized that rather than leaving Tesseract on the shelf, it would be better to contribute it to the open-source community and give it a new life. In 2005 Tesseract was handed over to the Information Science Research Institute at the University of Nevada, and Google was later entrusted with improving and optimizing it.

Tesseract has since been published as an open-source project on Google Code. Combined with the Leptonica image-processing library, it can read images in various formats and convert them into text in more than 60 languages. We can also keep training our own libraries to improve its image-to-text capability, and teams that need to go deeper can use it as the basis for an OCR engine tailored to their own needs.

Environment setup

Install

1. Install Homebrew

Open a terminal and enter the following command:


ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"

2. Install Tesseract

Open a terminal and pick the command that matches your needs:


// Install Tesseract together with the training tools

brew install --with-training-tools tesseract

// Install Tesseract with all language packs; the packs are large and slow to download, so prefer installing languages on demand

brew install --all-languages tesseract

// Install Tesseract with both the training tools and all languages

brew install --all-languages --with-training-tools tesseract

// Install Tesseract only, without the training tools

brew install tesseract
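To confirm the installation, print the version; it should report the 3.05.x series used throughout this article:


tesseract --version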

Download language library

According to your needs, select the required language libraries from the tesseract-ocr tessdata downloads. For example, the simplified Chinese library we chose is chi_sim.traineddata. Copy the downloaded file into the /usr/local/Cellar/tesseract/3.05.01 (tesseract version number)/share/tessdata directory.
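For instance, a minimal sketch; the URL assumes the 3.04-compatible files in the tesseract-ocr/tessdata GitHub repository, and the Cellar path depends on your installed version:


# Sketch: fetch the simplified Chinese library and drop it into tessdata.
# The 3.04.00 tag and the Cellar path are assumptions; adjust to your setup.
curl -L -o chi_sim.traineddata \
  https://github.com/tesseract-ocr/tessdata/raw/3.04.00/chi_sim.traineddata
cp chi_sim.traineddata /usr/local/Cellar/tesseract/3.05.01/share/tessdata/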

Use

Use the following command for image and text recognition:

// The eng language is used by default; imageName is the path of the image, and result is the output file base name (tesseract writes result.txt)

tesseract imageName result

// Specify simplified Chinese

tesseract -l chi_sim imageName result

// Specify multiple languages, joined with a + sign

tesseract -l chi_sim+eng imageName result

// List the locally installed language libraries

tesseract --list-langs

One parameter deserves special attention: psm.

// Show the possible values of the psm parameter

tesseract --help-psm

  0    Orientation and script detection (OSD) only.

  1    Automatic page segmentation with OSD.

  2    Automatic page segmentation, but no OSD, or OCR.

  3    Fully automatic page segmentation, but no OSD. (Default)

  4    Assume a single column of text of variable sizes.

  5    Assume a single uniform block of vertically aligned text.

  6    Assume a single uniform block of text.

  7    Treat the image as a single text line.

  8    Treat the image as a single word.

  9    Treat the image as a single word in a circle.

10    Treat the image as a single character.


Choosing the right psm value for each situation is very important; with an inappropriate value, recognition will simply fail.

For example, the following image should be treated as a single uniform block of text:

[image: num.png]

Use command:


tesseract num.png result -l chi_sim

//Print:

Tesseract Open Source OCR Engine v3.05.01 with Leptonica

Empty page!!

Empty page!!

Use command:

tesseract num.png result -l chi_sim -psm 6

Open the result.txt file; the digits are recognized successfully:

1234

1234

Font training

Install jTessBoxEditor

Download jTessBoxEditorFX-2.0-Beta.zip (the download page may require a proxy to reach from mainland China) and unzip it to get the jTessBoxEditorFX folder. Since jTessBoxEditor is written in Java, make sure the JRE (Java Runtime Environment) is installed before running it. There are plenty of JRE installation tutorials, so we won't cover that here.

Get sample file

Because the fonts on an ID card are fairly fixed, not many samples are needed for training. For example, the digits and the letter X of the ID number are in a bold sans-serif (Heiti) font, while fields such as gender, birthday, and validity date use FangZheng Heiti Simplified. So we only need to type the corresponding text in Word in those fonts and crop the text out with a screenshot tool to obtain the samples. Here are my sample images of ID card text:

Bold (Heiti) digits:

[image: Bold number.jpg]

Bold (Heiti) X:

[image]

FangZheng Heiti Simplified characters:

[image]

FangZheng Heiti Simplified digits:

[image]

[Note]: the sample image files must be in TIF/TIFF format.

Merge sample files

Enter the jTessBoxEditor directory and run the command java -Xms128m -Xmx1024m -jar jTessBoxEditorFX.jar in the terminal; the following interface appears:

[image]

Choose Tools -> Merge TIFF, select all the sample files, and save the merged file as font.tif. Then enter the directory containing the file and run:


tesseract font.tif font batch.nochop makebox

This generates a file named font.box.

Character correction

After generating font.box, you can use jTessBoxEditor to correct the characters. Choose Box Editor -> Open and open font.tif; the following interface appears:

[image]

Initially the digits and letters are recognized fairly accurately, but the recognized Chinese will mostly be garbled. Select a character and type the correct text into the Character column to correct it, then save. This yields an accurate character set. (PS: to recognize other fonts and characters you would need a large number of samples for training and correction; because ID card fonts are fixed and only a fixed set of characters needs to be recognized, the resulting library is only about 165 KB.)

Run the training commands

Generate character features

Run:


tesseract font.tif font nobatch box.train

This generates a .tr file containing the features of each character on the training page.

Compute the character set

Run:


unicharset_extractor font.box

This generates a unicharset data file that tells Tesseract the set of characters it can possibly output.

Font attributes

Run:


echo 'font 0 0 0 0 0' > font_properties

This redirects the font information into the font_properties text file (the fields are: fontname italic bold fixed serif fraktur); it is passed to mftraining with the -F filename option.

When mftraining runs, every .tr file name must have a corresponding entry in the font_properties file, otherwise mftraining aborts.

Clustering

Once the character features of all training pages have been extracted, they need to be clustered to create the prototype files. The character shape features are clustered using the shapeclustering, mftraining, and cntraining programs.


// shapeclustering builds the master shape table by shape clustering and writes it to the file shapetable

shapeclustering -F font_properties -U unicharset font.tr

// mftraining outputs two more data files: inttemp (shape prototypes) and pffmtable (the expected number of features for each character)

mftraining -F font_properties -U unicharset -O font.unicharset font.tr

// cntraining outputs the normproto data file

cntraining font.tr

Generate the font library

After clustering, rename the normproto, inttemp, pffmtable, and shapetable files to font.normproto, font.inttemp, font.pffmtable, and font.shapetable.

Run the following command to obtain the traineddata file:


combine_tessdata font.

The final directory should look like this:

[image]

The generated font.traineddata is the font library we need.
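To recap, here is the whole training pipeline from the commands above as a single shell script. It assumes font.tif was merged in jTessBoxEditor and font.box has already been corrected by hand:


#!/bin/sh
# Font training pipeline, collected from the steps above
tesseract font.tif font nobatch box.train                 # character features -> font.tr
unicharset_extractor font.box                             # character set -> unicharset
echo 'font 0 0 0 0 0' > font_properties                   # font attributes
shapeclustering -F font_properties -U unicharset font.tr  # -> shapetable
mftraining -F font_properties -U unicharset -O font.unicharset font.tr  # -> inttemp, pffmtable
cntraining font.tr                                        # -> normproto
for f in normproto inttemp pffmtable shapetable; do mv "$f" "font.$f"; done
combine_tessdata font.                                    # -> font.traineddata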

Tesseract for iOS

From the preceding introduction we know that the Tesseract framework recognizes the text in an image against the library we provide and outputs it as text, and we have now created our own library. But an image rarely contains just one text block: an ID card has several blocks such as the ID number, gender, nationality, date of birth, and name. How do we cut the image into blocks and feed each block to Tesseract for recognition?

So far I have found two ways to separate the blocks:

Image processing

Image processing here means using the OpenCV library for grayscale conversion, binarization, erosion, contour detection, and so on.

1. Grayscale conversion: computing a gray value for each pixel of the image with a given algorithm, so that the image carries only brightness and no color information.

[image]

2. Binarization: converting the grayscale image into an image containing only black and white, with no gray levels in between. In the binary image, 255 is white and 0 is black.

[image]

3. Erosion: eroding the image enlarges the black regions in the binary image, i.e., it connects adjacent black pixels. Through erosion, the digits of the ID number can be joined together into one rectangular region.

[image]

4. Contour detection: after erosion, adjacent points are connected into large regions. Contour detection can then find each large region, and from them the region containing the ID number can be located.

[image]

Code handling:


// Convert UIImage to Mat

cv::Mat resultImage;

UIImageToMat(image, resultImage);

// Convert to grayscale

cvtColor(resultImage, resultImage, cv::COLOR_BGR2GRAY);

// Using threshold binarization

cv::threshold(resultImage, resultImage, 100, 255, CV_THRESH_BINARY);

// Erosion (erosion enlarges the black regions)

cv::Mat erodeElement = getStructuringElement(cv::MORPH_RECT, cv::Size(26,26));

cv::erode(resultImage, resultImage, erodeElement);

// Contour detection

std::vector<std::vector<cv::Point>> contours; // Container to store all detected contours

cv::findContours(resultImage, contours, CV_RETR_TREE, CV_CHAIN_APPROX_SIMPLE, cvPoint(0, 0));


// Take out ID number area

std::vector<cv::Rect> rects;

cv::Rect numberRect = cv::Rect(0,0,0,0);

std::vector<std::vector<cv::Point>>::const_iterator itContours = contours.begin();

for ( ; itContours != contours.end(); ++itContours) {

    cv::Rect rect = cv::boundingRect(*itContours);

    rects.push_back(rect);

    // Heuristic: the ID number region is the widest contour whose width exceeds 5x its height

    if (rect.width > numberRect.width && rect.width > rect.height * 5) {

        numberRect = rect;

    }

}
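To close the loop, the located region can be handed back to Tesseract. A minimal sketch (not from the original post): it assumes binaryMat is a copy of the thresholded Mat saved before erosion, so the digits are still intact, and MatToUIImage comes from opencv2/imgcodecs/ios.h:


if (numberRect.width > 0)
{
    // Crop the ID number region from the pre-erosion binary image (binaryMat is an assumed copy)
    cv::Mat numberMat = binaryMat(numberRect).clone();
    // Convert back to UIImage for the TesseractOCRiOS recognition step
    UIImage *numberImage = MatToUIImage(numberMat);
    // ... pass numberImage to the recognition code shown later
}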

The advantage of this method is that it does not require the user to align the card with a fixed frame: as long as the ID card is in front of the camera, everything is handled automatically, and the code is fairly simple, which greatly improves the user experience. The disadvantage is that introducing the OpenCV library adds about 4 MB to the package size.

Coordinate calculation

The idea of coordinate calculation is to draw an ID card frame on the phone screen, have the user place the ID card inside it, and obtain the specified regions by calculating coordinates (see the effect in my Demo):

[image: front]

[image: back]

The advantage of this approach is that it does not require the OpenCV library, so the package size does not grow further. The disadvantage is obvious too: the user has to align the ID card with the designated frame, otherwise the region obtained from the coordinate calculation is wrong and Tesseract cannot recognize it, so the user experience is not as good.

In short, which implementation to choose depends on the specifics of your own project. Ours is a financial APP: introducing the TesseractOCRiOS framework already added 5.1 MB to the package, and adding OpenCV on top of that, about 9 MB in total just for an OCR optimization feature, is unacceptable. For an APP like Meitu XiuXiu that uses image processing everywhere, introducing OpenCV would make more sense. The rest of this article therefore focuses on the coordinate-calculation approach.

Implementation steps

Importing Tesseract's iOS Library

Here, the third-party library is introduced through CocoaPods:


pod 'TesseractOCRiOS'

Import font library

Put the traineddata file we created into a tessdata folder and drag the folder into the project. Pay special attention here: the TesseractOCRiOS library does not support passing a custom path for the language data; it looks for it in the tessdata folder under the main bundle path. Therefore, you must add the tessdata folder with the Create folder references option so that it ends up as a real folder in the main bundle.

[image: Create folder references option]
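Because a wrong folder type only fails later at recognition time, a quick sanity check at startup can help. A minimal sketch (font.traineddata is the library we trained above):


// Sketch: verify that tessdata/font.traineddata really landed in the main bundle
NSString *trainedDataPath = [[NSBundle mainBundle] pathForResource:@"font"
                                                            ofType:@"traineddata"
                                                       inDirectory:@"tessdata"];
NSAssert(trainedDataPath != nil, @"tessdata/font.traineddata is missing - was the folder added with Create folder references?");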

Create OcrDetectView

This view displays the camera preview in real time and provides an image-capture API for external callers; it is the source of the photo data.

Lay out the ID card frame

Landscape and portrait switching is supported here via gravity sensing. Pay special attention: when the ID card frame switches between landscape and portrait, the captured text regions must undergo the same coordinate conversion; this is described in detail below.

Capture images

Capture an image at a fixed interval (1 second in the Demo), as sketched below. Because the captured image covers the whole phone screen, it must be cropped according to the coordinates and the current orientation.
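A minimal sketch of the periodic capture, using hypothetical names: captureImage stands in for OcrDetectView's capture API, and captureTimer, ocrDetectView, and isLandscape are assumed properties:


- (void)startCapture
{
    // Fire once per second, matching the Demo's interval
    self.captureTimer = [NSTimer scheduledTimerWithTimeInterval:1.0
                                                         target:self
                                                       selector:@selector(onCaptureTick)
                                                       userInfo:nil
                                                        repeats:YES];
}

- (void)onCaptureTick
{
    UIImage *frame = [self.ocrDetectView captureImage];   // hypothetical capture API
    UIImage *card  = [self fetchIDCardImage:frame isLandscape:self.isLandscape];
    // ... crop the sub-regions (ID number, gender, ...) and hand them to Tesseract
}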

Coordinate calculation

Because we define the ID card frame that the user places the card into, its coordinates are known.

For example, to obtain ID card area:


// Screen scale ratio (measured on an iPhone 6s: the frame width was set to 347.0 on a 375.0-wide screen, so the ratio is 347.0 / 375.0)

CGFloat scale  = 347.0 / 375.0;

// Get ID area image

- (UIImage *)fetchIDCardImage:(UIImage *)image isLandscape:(BOOL)isLandscape

{

    CGSize  size = [UIScreen mainScreen].bounds.size;

    CGFloat screenWidth  = size.width;

    CGFloat screenHeight = size.height;

// Actual width and height of the ID card frame

    CGFloat width  = screenWidth  * scale;

    CGFloat height = screenHeight * scale;

// Zoom ratio of image to screen

    float px = image.size.width / screenWidth;

    float py = image.size.height / screenHeight;

    // Calculate x,y,w,h according to horizontal and vertical screens

    float x, y, w, h;

    if (isLandscape)

    {

    // In landscape the frame is the portrait frame rotated 90 degrees around its center,

    // so x in landscape is half the frame height plus the portrait top offset,

    // minus half the frame width

        x = height / 2.0 + idcardBoxTopOffset - width / 2.0;

        // Similarly, y in landscape is (screen width - frame height) / 2

        y = (screenWidth - height) / 2.0;

        w = width;

        h = height;

        image = [UIImage imageWithCGImage:image.CGImage

                                    scale:image.scale

                              orientation:UIImageOrientationUp];

    }

    else

    {

        x = (screenWidth - width) / 2.0;

        y = idcardBoxTopOffset;

        w = width;

        h = height;

    }

// ID card area

    CGRect cutFrame = CGRectMake(x*py, y*px, w*py, h*px);

// Obtain the corresponding image according to the incoming ID card area

    UIImage *croppedImage = [JKOcrDetectUtils croppedImage:image inRect:cutFrame];

    croppedImage = [JKOcrDetectUtils adjustImageOrientation:croppedImage];

    return croppedImage;

}

Get ID number area:


- (UIImage *)fetchIDCardNoImage:(UIImage *)image isLandscape:(BOOL)isLandscape

{

    CGSize  size = [UIScreen mainScreen].bounds.size;

    CGFloat screenWidth  = size.width;

    CGFloat screenHeight = size.height;

    CGFloat width  = screenWidth  * self.widthScale;

    CGFloat height = screenHeight * self.heightScale;

    float px = image.size.width / screenWidth;

    float py = image.size.height / screenHeight;

    float x, y, w, h;

    if (isLandscape)

    {

        x = height / 2.0 + idcardBoxTopOffset - width / 2.0 + idcardNoOffsetX;

        y = (screenWidth - height) / 2.0 + idcardNoOffsetY;

        w = idcardNoWidth;

        h = idcardNoHeight;

        image = [UIImage imageWithCGImage:image.CGImage

                                    scale:image.scale

                              orientation:UIImageOrientationUp];

    }

    else

    {

        x = (screenWidth - width) / 2.0 + idcardNoOffsetX;

        y = idcardBoxTopOffset + idcardNoOffsetY;

        w = idcardNoWidth;

        h = idcardNoHeight;

    }

    CGRect cutFrame = CGRectMake(x*py, y*px, w*py, h*px);

    UIImage *croppedImage = [JKOcrDetectUtils croppedImage:image inRect:cutFrame];

    croppedImage = [JKOcrDetectUtils adjustImageOrientation:croppedImage];

    return croppedImage;

}

Recognition

Through the coordinate-calculation step we obtain recognizable text-block image samples. Here I extract five blocks: gender, date of birth, ID number, validity date, and portrait.

The steps are the same as in the earlier Mac recognition: initialize the language library -> set parameters such as psm -> pass in the image to recognize -> get the recognized text -> verify the text.

ID number recognition code:


- (void)recognizeImageWithTesseract:(UIImage *)image mode:(DetectMode)mode completionBlock:(void(^)(BOOL isRecognized, NSString *recognizedText))completionBlock

{

    // Create a G8RecognitionOperation object; it performs OCR and initializes the language library asynchronously

    G8RecognitionOperation *operation = [[G8RecognitionOperation alloc] initWithLanguage:@"font"];

    // Setting the psm parameters

    operation.tesseract.pageSegmentationMode = G8PageSegmentationModeSingleBlock;

    // Set the maximum identification time

    operation.tesseract.maximumRecognitionTime = 1.0;

    // Set recognition picture

    operation.tesseract.image = image;

    __weak JKOcrService *wself = self;

    operation.recognitionCompleteBlock = ^(G8Tesseract *tesseract) {

        // Recognized text

        NSString *recognizedText = tesseract.recognizedText;

        __strong JKOcrService *sself = wself;

// Check text

        if ([JKOcrDetectUtils accurateVerifyIDCardNumber:recognizedText])

        {

            // Identify successful callback

            if (completionBlock) completionBlock(YES, recognizedText);

        }

        else

        {

// Identify failed callbacks

            if (completionBlock) completionBlock(NO, @"");

        }

    };

    // Add queue

    [self.operationQueue addOperation:operation];

}

ID number verification code:


+ (BOOL)accurateVerifyIDCardNumber:(NSString *)value

{

    value = [value stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];

    int length = 0;

    if (!value)

    {

        return NO;

    }

    else

    {

        length = (int)value.length;

        if (length !=15 && length !=18)

        {

            return NO;

        }

    }

    // Province code

    NSArray *areasArray = @[ @"11", @"12", @"13", @"14", @"15", @"21", @"22", @"23", @"31", @"32", @"33", @"34", @"35", @"36", @"37", @"41", @"42", @"43", @"44", @"45", @"46", @"50", @"51", @"52", @"53", @"54", @"61", @"62", @"63", @"64", @"65", @"71", @"81", @"82", @"91"];

    NSString *valueStart2 = [value substringToIndex:2];

    BOOL areaFlag = NO;

    for (NSString *areaCode in areasArray)

    {

        if ([areaCode isEqualToString:valueStart2])

        {

            areaFlag = YES;

            break;

        }

    }

    if (!areaFlag)

    {

        return NO;

    }

    NSRegularExpression *regularExpression;

    NSUInteger numberofMatch;

    int year = 0;

    switch (length)

    {

        case 15:

            year = [value substringWithRange:NSMakeRange(6,2)].intValue +1900;

            if ((year % 4 == 0 && year % 100 != 0) || year % 400 == 0) // leap year

            {

                regularExpression = [[NSRegularExpression alloc] initWithPattern:@"^[1-9][0-9]{5}[0-9]{2}((01|03|05|07|08|10|12)(0[1-9]|[1-2][0-9]|3[0-1])|(04|06|09|11)(0[1-9]|[1-2][0-9]|30)|02(0[1-9]|[1-2][0-9]))[0-9]{3}$"

                                                                        options:NSRegularExpressionCaseInsensitive

                                                                          error:nil];//Test validity of date of birth

            }

            else

            {

                regularExpression = [[NSRegularExpression alloc]initWithPattern:@"^[1-9][0-9]{5}[0-9]{2}((01|03|05|07|08|10|12)(0[1-9]|[1-2][0-9]|3[0-1])|(04|06|09|11)(0[1-9]|[1-2][0-9]|30)|02(0[1-9]|1[0-9]|2[0-8]))[0-9]{3}$"

                                                                        options:NSRegularExpressionCaseInsensitive

                                                                          error:nil];//Test validity of date of birth

            }

            numberofMatch = [regularExpression numberOfMatchesInString:value

                                                              options:NSMatchingReportProgress

                                                                range:NSMakeRange(0, value.length)];

            if (numberofMatch > 0)

            {

                return YES;

            }

            else

            {

                return NO;

            }

        case 18:

            year = [value substringWithRange:NSMakeRange(6,4)].intValue;

            if ((year % 4 == 0 && year % 100 != 0) || year % 400 == 0) // leap year

            {

                regularExpression = [[NSRegularExpression alloc] initWithPattern:@"^[1-9][0-9]{5}19[0-9]{2}((01|03|05|07|08|10|12)(0[1-9]|[1-2][0-9]|3[0-1])|(04|06|09|11)(0[1-9]|[1-2][0-9]|30)|02(0[1-9]|[1-2][0-9]))[0-9]{3}[0-9Xx]$"

                                                                        options:NSRegularExpressionCaseInsensitive

                                                                          error:nil];//Test validity of date of birth

            }

            else

            {

                regularExpression = [[NSRegularExpression alloc] initWithPattern:@"^[1-9][0-9]{5}19[0-9]{2}((01|03|05|07|08|10|12)(0[1-9]|[1-2][0-9]|3[0-1])|(04|06|09|11)(0[1-9]|[1-2][0-9]|30)|02(0[1-9]|1[0-9]|2[0-8]))[0-9]{3}[0-9Xx]$"

                                                                        options:NSRegularExpressionCaseInsensitive

                                                                          error:nil];//Test validity of date of birth

            }

            numberofMatch = [regularExpression numberOfMatchesInString:value

                                                              options:NSMatchingReportProgress

                                                                range:NSMakeRange(0, value.length)];

            if(numberofMatch >0)

            {

                // Weighted sum of the first 17 digits; pairing digit i with digit i+10 works because the GB 11643 weights (7 9 10 5 8 4 2 1 6 3 7 9 10 5 8 4 2) repeat with period 10
                int S = ([value substringWithRange:NSMakeRange(0,1)].intValue + [value substringWithRange:NSMakeRange(10,1)].intValue) * 7
                      + ([value substringWithRange:NSMakeRange(1,1)].intValue + [value substringWithRange:NSMakeRange(11,1)].intValue) * 9
                      + ([value substringWithRange:NSMakeRange(2,1)].intValue + [value substringWithRange:NSMakeRange(12,1)].intValue) * 10
                      + ([value substringWithRange:NSMakeRange(3,1)].intValue + [value substringWithRange:NSMakeRange(13,1)].intValue) * 5
                      + ([value substringWithRange:NSMakeRange(4,1)].intValue + [value substringWithRange:NSMakeRange(14,1)].intValue) * 8
                      + ([value substringWithRange:NSMakeRange(5,1)].intValue + [value substringWithRange:NSMakeRange(15,1)].intValue) * 4
                      + ([value substringWithRange:NSMakeRange(6,1)].intValue + [value substringWithRange:NSMakeRange(16,1)].intValue) * 2
                      + [value substringWithRange:NSMakeRange(7,1)].intValue * 1
                      + [value substringWithRange:NSMakeRange(8,1)].intValue * 6
                      + [value substringWithRange:NSMakeRange(9,1)].intValue * 3;

                int Y = S % 11;

                NSString *M = @"F";

                NSString *JYM = @"10X98765432";

                M = [JYM substringWithRange:NSMakeRange(Y,1)];// Look up the expected check digit

                if ([M isEqualToString:[value substringWithRange:NSMakeRange(17,1)]])

                {

                    return YES;// Check digit matches

                }

                else

                {

                    return NO;

                }

            }

            else

            {

                return NO;

            }

        default:

            return NO;

    }

}
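A quick usage check for the validator; the number below is a synthetic sample commonly used to illustrate the GB 11643 checksum, not a real person's ID:


// Expected YES: province code 11, valid birth date 1949-12-31, check digit X
BOOL valid = [JKOcrDetectUtils accurateVerifyIDCardNumber:@"11010519491231002X"];
NSLog(@"ID number valid: %@", valid ? @"YES" : @"NO");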

The recognition steps for gender, date of birth, and validity date are similar, so they are not listed one by one. We can then use a dispatch group to recognize each block separately, gather the results, and verify them all at once. If every check passes, this is a high-quality ID card image, and we can upload it to the server for recognition.


__block BOOL isIDCardRecognized = NO;

__block BOOL isGenderRecognized = NO;

__block BOOL isBirthdayRecognized = NO;

__block BOOL isFaceRecognized = NO;

__weak JKOcrService *wself = self;

[self recognizeImageWithTesseract:idcardNoImage

                            mode:DetectModeIDCard

                  completionBlock:^(BOOL isRecognized, NSString *recognizedText) {

                      __strong JKOcrService *sself = wself;

                      if (isRecognized == YES)

                      {

                          isIDCardRecognized = YES;

                          sself.idcardNo = recognizedText;

                      }

                  }];

[self recognizeImageWithTesseract:genderImage

                            mode:DetectModeGender

                  completionBlock:^(BOOL isRecognized, NSString *recognizedText) {

                      __strong JKOcrService *sself = wself;

                      if (isRecognized == YES)

                      {

                          isGenderRecognized = YES;

                          sself.sex = recognizedText;

                      }

                  }];

[self recognizeImageWithTesseract:birthdayImage

                            mode:DetectModeBirthday

                  completionBlock:^(BOOL isRecognized, NSString *recognizedText) {

                      __strong JKOcrService *sself = wself;

                      if (isRecognized == YES)

                      {

                          isBirthdayRecognized = YES;

                          sself.birthday = recognizedText;

                      }

                  }];

[self accurateVerifyFace:faceImage

        completionBlock:^(BOOL isRecognized) {

            isFaceRecognized = isRecognized;

        }];

dispatch_group_wait(self.group, dispatch_time(DISPATCH_TIME_NOW, 1 * NSEC_PER_SEC));

if (isGenderRecognized && isIDCardRecognized && isBirthdayRecognized && isFaceRecognized)

{

    [self.operationQueue cancelAllOperations];

    dispatch_async(dispatch_get_main_queue(), ^{

        if (successHandler)

        {

        // High quality ID card pictures

            UIImage *idcardImage = [self fetchIDCardImage:image isLandscape:isLandscape];

// TODO: recognition successful

        }

    });

}

else

{

    // TODO: recognition failed

}

By my measurements, each sample takes about 20 ms to recognize; in other words, as long as the samples are good, recognizing them one after another takes under 100 ms, comfortably within the 1 s capture interval, though doing it asynchronously also works.

Summary

Through the preceding introduction we have learned what Tesseract is, which business scenarios it suits, and how to generate a language library through sample training. We also covered how to use Tesseract on Mac OS X to recognize text in images, and introduced TesseractOCRiOS, the iOS library, so that it can serve the mobile side.

However, packaging the Tesseract library increases the APP size by 5.1 MB, and adding a stock language library on top would increase it significantly more. Training our own samples to generate a small library solves that problem, but paying this cost merely to optimize the ID card flow is still hard to accept.

In that case, it is worth asking how to get the most out of Tesseract, for example by also using it to optimize bank card recognition, and so on.



By Thomson__
Link: https://www.jianshu.com/p/f0f05ff2dc2e
Source: Jianshu
The copyright belongs to the author. For commercial reprint, please contact the author for authorization. For non-commercial reprint, please indicate the source.
