Quick Start: An Abstract Method for Row and Column Parsing of Order-Picture Recognition Results

Keywords: OpenCV Windows Linux Ubuntu

Summary: This article introduces an abstract process and scheme for parsing the rows and columns of recognition results from order-type pictures, with the aim of improving development efficiency.

1. Background

When order data arrives as paper documents or pictures, reading it by eye is inefficient, so machine recognition is needed to identify and parse the pictures. Paid image recognition services are already available on the market, including those from Ali and Baidu, and their recognition quality is good. For order-type pictures, however, recognizing the text is not enough: we also need to know which row and column each piece of text belongs to, so that individual records and their detailed fields can be separated.

This article introduces an abstract process and scheme for parsing the rows and columns of recognition results from order-type pictures, with the aim of improving development efficiency.

Note: This article provides ideas only, not complete source code. It also does not cover how AI-based picture recognition itself works; interested readers can look up related material online.

2. Resolution Process

OpenCV is an excellent image processing library, so it is the image processing tool of choice in this article.

  • To improve the recognition rate, we first correct the picture. Here we use a relatively simple combination of the Hough transform and a noise point removal algorithm.
  • After correction, the picture recognition service is called to get the result. The result typically contains a response code, an error description, a list of text blocks (each with its text and four-corner coordinates), and so on.
  • The abstract Tetris method is then applied to the recognition results to obtain row and column information.
  • Finally, each row of data is assembled and displayed based on the row and column information.

3. Implementation Details

3.1 OpenCV Installation Overview

This article gives only brief tips on installing OpenCV and does not go into detail; a separate post may cover it in the future.

1) Windows

2) Linux

  • Ubuntu is recommended, preferably a freshly installed system: OpenCV depends on many packages and requires recent versions of them, and resolving conflicts can be cumbersome.
  • Download the source
  • Install the dependent packages
  • Compile and install

We call OpenCV from Java. This requires the OpenCV Java development package, which contains the native library (opencv_javaxxx.dll on Windows, libopencv_javaxxx.so on Linux). The library must be loaded into the JVM when the program initializes. The code is as follows:

// Load the OpenCV native library once, before any other OpenCV call
System.load(PropertieUtil.getPropertie("Here is dll or so Full path"));
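
As a small illustration of the file naming described above, the following sketch picks the library file name for the current operating system. The class and method names are hypothetical, and "xxx" stands for the OpenCV version digits exactly as in the text; the directory that holds the file still has to come from your own configuration.

```java
import java.util.Locale;

final class OpenCvNativeLib {
    /**
     * Returns the OpenCV Java native library file name for the given OS.
     * osName is expected to be the value of the "os.name" system property.
     */
    static String fileName(String osName, String version) {
        // startsWith avoids mistaking "Darwin" for Windows via a "win" substring
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        return windows
                ? "opencv_java" + version + ".dll"
                : "libopencv_java" + version + ".so";
    }

    public static void main(String[] args) {
        // "xxx" is a placeholder for the version digits, not a real version
        System.out.println(fileName(System.getProperty("os.name"), "xxx"));
    }
}
```

The resulting name would be appended to the configured directory and the full path passed to System.load.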

3.2 Picture Correction

3.2.1 Correction Exploration

Picture correction took a lot of exploration. At first we came up with a fairly simple scheme:

  • Call the picture recognition service first to get the result.
  • Determine the tilt angle of each text block from its four-corner coordinates.
  • Calculate the average tilt angle using the noise removal algorithm.

In theory this scheme is feasible, but practice proved us wrong: the coordinates returned by the picture recognition service are not accurate, and the computed angle was wrong for most pictures.
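
For reference, the per-block angle in this first scheme can be sketched as follows. This is a minimal illustration, not the original code: the class name is hypothetical, and it assumes image coordinates (x grows to the right, y grows downward) with the angle measured along the top edge of the block, from the top-left to the top-right corner.

```java
final class BlockAngle {
    /**
     * Tilt of a text block in degrees, measured along its top edge
     * from the top-left corner (x0, y0) to the top-right corner (x1, y1).
     * A positive value means the right end of the edge is lower in the image.
     */
    static double tiltDegrees(double x0, double y0, double x1, double y1) {
        return Math.toDegrees(Math.atan2(y1 - y0, x1 - x0));
    }

    public static void main(String[] args) {
        // A block whose top edge drops 5 pixels over a width of 100 pixels
        System.out.println(tiltDegrees(0, 0, 100, 5));
    }
}
```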

We then found that the Hough transform might solve this problem, so I studied the Hough transform together with the noise removal algorithm, confirmed that the combination works, and abstracted it into a common method that corrects a picture after configuring just a few parameters.

Picture correction consists of two steps:

  • Step 1: Orientation correction. Determine whether the picture is rotated by 0, 90, 180 or 270 degrees. This cannot be decided by mathematical methods alone and requires machine learning.
  • Step 2: Angle fine-tuning. This generally assumes the picture is already upright, with a remaining tilt within roughly ±30 degrees.

Note that no single set of parameters can fine-tune every picture. Production data shows, however, that for a given type of picture one set of parameters corrects most pictures.

3.2.2 Hough Transform Overview

The Hough transform is a classical parameter-space transformation used to detect straight lines. The tilt angle of the image can be calculated from the slopes of a large number of detected lines. It is best to binarize the image and run edge detection before the Hough transform. The details of the algorithm are widely documented and are not covered here.
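
To make the idea concrete, here is a deliberately tiny pure-Java sketch of the voting step, independent of OpenCV. It is an illustration under simplifying assumptions (integer degrees for theta, integer rho, a hypothetical class name), not the implementation used in this project. Every edge point votes for all lines rho = x*cos(theta) + y*sin(theta) that pass through it, and the accumulator cell with the most votes gives the dominant line.

```java
final class MiniHough {
    /**
     * Votes each point into a (theta, rho) accumulator and returns the theta
     * (in whole degrees, 0..179) of the strongest line. theta is the angle of
     * the line's normal; maxRho must be at least the largest distance of any
     * point from the origin.
     */
    static int dominantTheta(int[][] points, int maxRho) {
        int[][] acc = new int[180][2 * maxRho + 1];
        for (int[] p : points) {
            for (int t = 0; t < 180; t++) {
                double rad = Math.toRadians(t);
                int rho = (int) Math.round(p[0] * Math.cos(rad) + p[1] * Math.sin(rad));
                acc[t][rho + maxRho]++; // shift so that negative rho fits the array
            }
        }
        int bestTheta = 0;
        int bestVotes = -1;
        for (int t = 0; t < 180; t++) {
            for (int r = 0; r < acc[t].length; r++) {
                if (acc[t][r] > bestVotes) {
                    bestVotes = acc[t][r];
                    bestTheta = t;
                }
            }
        }
        return bestTheta;
    }

    /** Tilt of the line itself relative to the x axis, derived from the normal angle. */
    static int lineTiltDegrees(int theta) {
        return theta - 90;
    }
}
```

A perfectly horizontal row of points yields a normal angle of 90 degrees, that is, a tilt of 0; a production implementation would normally rely on OpenCV's Hough functions rather than a hand-written accumulator.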

3.2.3 Noise Point Removal Algorithm

Basic formula:

Upper limit = mean + n * standard deviation

Lower limit = mean - n * standard deviation

The value of n is generally between 1 and 4. The larger n is, the wider the band between the two limits and the larger the share of values that pass the screening.

Finally, the values that fall within the limits are averaged.

The core code is as follows:

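// Assumed dependencies: Apache Commons Math 3 (StandardDeviation, Mean) and Guava (Lists)
import java.util.List;
import com.google.common.collect.Lists;
import org.apache.commons.math3.stat.descriptive.moment.Mean;
import org.apache.commons.math3.stat.descriptive.moment.StandardDeviation;

    /**
     * Screens values by standard deviation: keeps only the values within
     * mean ± CHOOSE_POWER * standard deviation (CHOOSE_POWER is the n above).
     * @param values candidate angles
     * @return the values that passed the screening
     */
    private static double[] calcBestCornList(double[] values) {
        // Calculate the standard deviation and the mean
        StandardDeviation standardDeviation = new StandardDeviation();
        double evaluate = standardDeviation.evaluate(values);
        Mean mean = new Mean();
        double meanValue = mean.evaluate(values);
        // Upper and lower limits of the accepted band
        double biggerValue = meanValue + CHOOSE_POWER * evaluate;
        double smallerValue = meanValue - CHOOSE_POWER * evaluate;
        List<Double> selected = Lists.newArrayList();
        for (double value : values) {
            if (value >= smallerValue && value <= biggerValue) {
                selected.add(value);
            }
        }
        // Copy the selected values back into a primitive array
        double[] selectedValue = new double[selected.size()];
        for (int i = 0; i < selected.size(); i++) {
            selectedValue[i] = selected.get(i);
        }
        logger.info("Proportion:{}%,Filtered angle array:{}", (selectedValue.length / (float)values.length) * 100F, selected);
        return selectedValue;
    }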
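
The same screening can be tried out without any third-party library. The sketch below is an illustration with hypothetical names; it uses the sample standard deviation (dividing by the count minus one), which is also the default of Commons Math's StandardDeviation, and passes n as a parameter instead of a constant.

```java
import java.util.Arrays;

final class AngleFilter {
    /** Keeps the values within mean ± n * sample standard deviation. */
    static double[] filter(double[] values, double n) {
        double mean = Arrays.stream(values).average().orElse(0.0);
        double sumSq = Arrays.stream(values).map(v -> (v - mean) * (v - mean)).sum();
        double std = values.length > 1 ? Math.sqrt(sumSq / (values.length - 1)) : 0.0;
        double lower = mean - n * std;
        double upper = mean + n * std;
        return Arrays.stream(values).filter(v -> v >= lower && v <= upper).toArray();
    }

    /** Average of the values that survive the screening. */
    static double filteredMean(double[] values, double n) {
        return Arrays.stream(filter(values, n)).average().orElse(0.0);
    }

    public static void main(String[] args) {
        // Four consistent angles and one outlier caused by a stray line
        double[] angles = {1.8, 2.0, 2.2, 2.0, 30.0};
        System.out.println(Arrays.toString(filter(angles, 1)));
        System.out.println(filteredMean(angles, 1));
    }
}
```

With n = 1 the outlier of 30 degrees falls outside the band and only the four angles near 2 degrees contribute to the average.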

3.2.4 Abstracting and Encapsulating the Hough Transform

Basic process:

  • Define relevant parameters
  • Read pictures
  • Convert to grayscale and binarize
  • Detect edges (contours) using OpenCV
  • Run the Hough transform repeatedly, as configured, until the number of detected lines meets the configured requirement
  • Traverse the detected lines, separate the horizontal lines from the vertical ones, and calculate the angle of each line according to the configuration
  • Calculate the average tilt angle using the noise removal algorithm, automatically de-duplicating the non-zero angle values
  • Rotate the picture using OpenCV

The core code is as follows:

import java.io.File;
import org.opencv.core.Mat;
import org.opencv.imgcodecs.Imgcodecs;
import org.opencv.imgproc.Imgproc;

    /**
     * Correct pictures by Hough transform
     * @param oldFile Original picture file
     * @param oldImg Original picture, returned unchanged when no rotation is needed
     * @param rotateParam Rotation parameters
     * @return the corrected picture
     */
    public static String rotateHoughLines(File oldFile, String oldImg, RotateParam rotateParam, String cid, String bankCode) throws Exception {

        // Read image into matrix
        Mat src = Imgcodecs.imread(oldFile.getAbsolutePath());
        if (src.empty()) {
            throw new Exception("no file " + oldFile.getAbsolutePath());
        }
        // Grayscale
        Mat gray = new Mat();
        Imgproc.cvtColor(src, gray, Imgproc.COLOR_BGR2GRAY);
        logger.info("Grayscale conversion complete");
        // Edge detection on the grayscale image (the original ran Canny on src,
        // which discarded the grayscale result)
        Mat mathImg = new Mat();
        Imgproc.Canny(gray, mathImg, rotateParam.getCvtThreshould1(), rotateParam.getCvtThreshould2());
        logger.info("Contour Completion");
        // Hough transform for angle, detailed code omitted
        double corn = houghLines(mathImg, rotateParam, cid);
        logger.info("Hough transform complete, angle:{}", corn);
        if (corn == 0) {
            // Nothing to correct
            return oldImg;
        }
        return rotateOpenv(oldFile, corn, cid, bankCode);
    }
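
The houghLines step is omitted above. One part of it, separating horizontal from vertical lines and turning each into a tilt angle, can be sketched as follows. This is an assumption about how such a step might look, not the original code: the class name is hypothetical, and lines are taken to be segments given by two end points, as a probabilistic Hough transform would return them.

```java
final class LineSkew {
    /**
     * Deviation of a segment from the nearest axis, in degrees.
     * Near-horizontal segments are measured against the x axis and
     * near-vertical ones against the y axis, so that both kinds of line
     * report the same tilt for the same page rotation.
     * The result lies in (-45, 45].
     */
    static double skewDegrees(double x1, double y1, double x2, double y2) {
        double angle = Math.toDegrees(Math.atan2(y2 - y1, x2 - x1)); // (-180, 180]
        double folded = angle % 90.0;                                 // (-90, 90)
        if (folded > 45.0) {
            folded -= 90.0;
        } else if (folded <= -45.0) {
            folded += 90.0;
        }
        return folded;
    }

    /** True when the segment is closer to horizontal than to vertical. */
    static boolean isHorizontal(double x1, double y1, double x2, double y2) {
        return Math.abs(x2 - x1) >= Math.abs(y2 - y1);
    }
}
```

The angles collected this way would then go through the noise removal step from section 3.2.3 before the picture is rotated.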

3.3 Common Picture Recognition Services

Ali and Baidu both provide picture recognition services. A team with enough resources can also build its own, but this is not recommended: the number of training samples required is huge and the time cost is too high.

3.4 Analysis of Recognition Results

3.4.1 Road to Exploration

This section is the focus of the article: everything above relies on fairly basic services and algorithms, while most of the development work lies here. At the start, the number of order picture types we had to support was large (more than 100), and the types differ greatly from one another. Several of us split the types between us, but each person used a different approach, and code written by one developer could not be understood by another. Given how abstract picture parsing is, that is understandable.

After a while we noticed a problem: each type took up to a week to complete, and the parsing success rate was extremely low. An abstract method for extracting row and column data was urgently needed.

Our research found two commonly used methods for extracting row and column data, the coordinate method and the table header method, but neither achieves a high parsing rate. After several weeks of thinking, I came up with a better method, which I named Tetris, and it finally solved the problem.

3.4.2 Tetris

Outline of the idea:

  • Get the recognition result data.
  • Sort all the data by coordinate first (the code below sorts by the top-left x-coordinate).
  • Traverse the sorted results, putting the first item into the first column's result set.
  • Starting from the second item, compare each item with the current column's result set.
  • Comparison rule: if the item lies to the right of some data in the current column's result set, it starts a new column; if it overlaps on the y axis with some data in the current column's result set, it also starts a new column.
  • If neither condition holds, the item belongs to the current column and is added to its result set.
  • Continue comparing in this way until the last item of the last column is reached.
  • Applying the same method along the other axis, with the x axis as the overlap criterion, yields the row result sets.

The ideas are as follows:

The summary code is as follows:

// Sort by the top left x-coordinate
OcrWordInfo[] sortL = NoTableParseResult.ParseUtil.bubbleSortX(ocrResponse.getPrism_wordsInfo(), false);
NoTableParseResult ntpr = new NoTableParseResult(param);
ntpr.setHeight(converImg.height());
ntpr.setWight(converImg.width());
for (int i = 0; i < sortL.length; i++) {
    // Current data to compare
    OcrWordInfo ocrWordInfo = sortL[i];
    // Processing current column data
    ntpr.getUtil().testCurColData(ocrWordInfo);
}
// Processing last column
ntpr.lastCol();

        /**
         * Determine whether the data starts the next column, and process it
         * @param ocrWordInfo the text block to place
         */
        public void testCurColData(OcrWordInfo ocrWordInfo) {

            // Traverse all data that already exists in the current column
            int size = this.test.getCol().size();
            if(size == 0) {
                // Empty column: the data becomes its first item
                this.test.addCol(ocrWordInfo);
                return;
            }
            for (int i = 0; i < size; i++) {
                OcrWordInfo temp = this.test.getCol().get(i);
                // Right edge of the existing item (top-right and bottom-right corners)
                int x1 = temp.getPos().get(1).getX();
                int x2 = temp.getPos().get(2).getX();
                // Left edge of the current data (top-left and bottom-left corners)
                int xx0 = ocrWordInfo.getPos().get(0).getX();
                int xx3 = ocrWordInfo.getPos().get(3).getX();

                int threholdx = this.test.param == null ? 0 : this.test.param.getCoverColXThrehold();
                if(xx0 >= (x1 - threholdx) && xx0 >= (x2 - threholdx) && xx3 >= (x1 - threholdx) && xx3 >= (x2 - threholdx)) {
                    // The current data is entirely to the right: start a new column
                    this.test.colAdd();
                    this.test.addCol(ocrWordInfo);
                    return;
                } else {
                    // Check whether the y ranges overlap (top and bottom of both items)
                    int y0 = temp.getPos().get(0).getY();
                    int y3 = temp.getPos().get(3).getY();
                    int yy0 = ocrWordInfo.getPos().get(0).getY();
                    int yy3 = ocrWordInfo.getPos().get(3).getY();
                    // Tolerance: a fraction of the existing item's height (default 25%)
                    int threhold = (int)Math.round((y3 - y0) * (this.test.param == null ? 0.25 : this.test.param.getCoverThrehold()));
                    if(!(yy3 <= (y0 + threhold) || yy0 >= (y3 - threhold))) {
                        // The current data overlaps an item of this column on the y axis: start a new column
                        this.test.colAdd();
                        this.test.addCol(ocrWordInfo);
                        return;
                    }
                }
            }
            // Neither to the right nor overlapping: the data belongs to the current column
            this.test.addCol(ocrWordInfo);
        }
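
For readers who want to experiment with the idea in isolation, here is a simplified, self-contained sketch of the column split. It is not the project code: the class names are hypothetical, each text block is reduced to an axis-aligned box instead of four corner points, and the threshold parameters are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TetrisColumns {
    /** Axis-aligned text box, a simplification of the four-point coordinates. */
    static final class Box {
        final String text;
        final int left, top, right, bottom;

        Box(String text, int left, int top, int right, int bottom) {
            this.text = text;
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }
    }

    /** Splits boxes into columns, scanning from left to right. */
    static List<List<Box>> splitColumns(List<Box> boxes) {
        List<Box> sorted = new ArrayList<>(boxes);
        sorted.sort(Comparator.comparingInt((Box b) -> b.left));
        List<List<Box>> columns = new ArrayList<>();
        List<Box> current = new ArrayList<>();
        for (Box b : sorted) {
            if (!current.isEmpty() && startsNewColumn(current, b)) {
                columns.add(current);
                current = new ArrayList<>();
            }
            current.add(b);
        }
        if (!current.isEmpty()) {
            columns.add(current); // the last column
        }
        return columns;
    }

    /** A box starts a new column if it is right of, or vertically overlaps, any box in the column. */
    static boolean startsNewColumn(List<Box> column, Box b) {
        for (Box c : column) {
            boolean toTheRight = b.left >= c.right;
            boolean overlapsY = b.top < c.bottom && b.bottom > c.top;
            if (toTheRight || overlapsY) {
                return true;
            }
        }
        return false;
    }
}
```

Two boxes that share a vertical range cannot sit in the same column, which is why a y overlap forces a column change, much like a Tetris piece that cannot drop into an occupied cell.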

3.4.3 Techniques for parsing row data

Summary of tips:

1) The Tetris method provides parameters for removing interference items: depending on the characteristics of the picture, interference data at the edges (for example on the left and right) can be removed, which reduces cases where data ends up in the wrong column.

2) There are roughly two ways to parse the data

  • This method is not universal: it has a high recognition rate on simple, standard pictures, but it cannot adapt to messy ones.
  • Join the data of each row with a space separator and use regular expressions to pick out the fields. Because key fields have characteristic formats (amount, debit/credit direction, date, and so on), this method works universally for ordinary order pictures, but its recognition rate is not high.

Which method to use depends on the characteristics of the picture.
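
The regular-expression approach can be sketched as follows. The patterns, the sample row and the class name are illustrative assumptions only; real order pictures will need patterns tuned to their own date and amount formats.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RowFieldExtractor {
    // Assumed formats: dates like 2019-11-11, amounts with two decimals
    // and optional thousands separators, such as 1,234.56 or 1234.56
    private static final Pattern DATE = Pattern.compile("\\b\\d{4}-\\d{2}-\\d{2}\\b");
    private static final Pattern AMOUNT =
            Pattern.compile("(?<![\\d.,])\\d{1,3}(?:,?\\d{3})*\\.\\d{2}(?!\\d)");

    /** Joins the cells of one row with a single space. */
    static String joinRow(List<String> cells) {
        return String.join(" ", cells);
    }

    /** Returns the first match of the pattern in the row, or null if there is none. */
    static String firstMatch(Pattern pattern, String row) {
        Matcher m = pattern.matcher(row);
        return m.find() ? m.group() : null;
    }

    static String date(String row) {
        return firstMatch(DATE, row);
    }

    static String amount(String row) {
        return firstMatch(AMOUNT, row);
    }
}
```

Because the fields are located by format rather than by position, the order of the cells within the row does not matter, which is what makes this approach tolerant of column mix-ups.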

3) The Tetris method provides some fine-tuning parameters for adapting to special scenarios, such as the row break threshold.

4) Some intermediate pictures should be saved along the way, such as the pictures from each stage of the correction process and the picture with the Tetris recognition results connected by lines. Logs alone are of little use when troubleshooting this kind of project, and these intermediate pictures make it much faster to find problems.

4. Summary

The solution described in this article cannot solve every order-type picture parsing problem, but it lets newcomers get started quickly and develop rapidly. If you have better ideas, you are welcome to share them.

Author: Liu Pengfei

Source: Yixin Technical College

Posted by renojim on Mon, 11 Nov 2019 19:54:26 -0800