2021SC@SDUSC
In my previous posts, I studied and introduced the models and algorithms used by the direction classifier in PaddleOCR. This post looks at what the direction classifier actually does and at how its implementation is organized in the code base.
After text detection, the regions of the picture that may contain text are marked with rectangular boxes. The text in a box may be upside down, so the direction classifier processes each box and turns text at the wrong angle back to the normal orientation. The corrected boxes are then passed on to text recognition. The direction classifier therefore sits between text detection and text recognition, and only boxes with the correct direction can be recognized properly.
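The three stages described above can be sketched as a small pipeline. This is only an illustration, not PaddleOCR's code: the names `crop` and `run_pipeline`, the `(top, left, bottom, right)` box layout, and the use of nested lists as toy images are all assumptions, and the three stages are passed in as plain callables.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]          # toy grayscale image: rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), hypothetical layout


def crop(image: Image, box: Box) -> Image:
    """Cut the axis-aligned rectangle `box` out of `image`."""
    top, left, bottom, right = box
    return [row[left:right] for row in image[top:bottom]]


def run_pipeline(
    image: Image,
    detect: Callable[[Image], Sequence[Box]],
    fix_direction: Callable[[Image], Image],
    recognize: Callable[[Image], str],
) -> List[str]:
    """Detection -> direction classification -> recognition, one box at a time."""
    texts = []
    for box in detect(image):
        patch = crop(image, box)
        patch = fix_direction(patch)  # turn upside-down crops upright first
        texts.append(recognize(patch))
    return texts
```

The point of the sketch is the ordering: the recognizer never sees a crop that has not passed through the direction step.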
Text angle classification is mainly used when the text in the picture is not at 0 degrees. In that case, the detected text lines need to be turned upright. In the PaddleOCR system, the text line images obtained from text detection go through an affine transformation before they are sent to the recognition model. After that transformation, the only thing left to decide is whether a line is at 0 or 180 degrees, so the built-in text angle classifier in PaddleOCR only supports these two classes. If you want to support more angles, you can modify the algorithm yourself.
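As a rough sketch of the 0/180 decision, the snippet below picks the more probable of the two classes and rotates the crop only when the classifier is confident that it is upside down. The function names, the nested-list image format, and the threshold value of 0.9 are assumptions made for illustration; they are not taken from the PaddleOCR source.

```python
from typing import List, Sequence, Tuple

LABELS = ("0", "180")  # the two classes named in the text
CLS_THRESH = 0.9       # assumed confidence threshold (illustrative value)


def decode(probs: Sequence[float]) -> Tuple[str, float]:
    """Pick the most probable label and return it with its score."""
    idx = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[idx], probs[idx]


def rotate_180(image: List[List[int]]) -> List[List[int]]:
    """Rotate by 180 degrees: reverse the row order and reverse each row."""
    return [row[::-1] for row in image[::-1]]


def correct_crop(image, probs, thresh=CLS_THRESH):
    """Rotate the crop only if it is classified as '180' with a score above thresh."""
    label, score = decode(probs)
    if label == "180" and score > thresh:
        return rotate_180(image)
    return image
```

Using a threshold means that a low-confidence "180" prediction leaves the crop unchanged, which avoids flipping text that was already upright.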
Examples of 0 and 180 degree data samples:
Program directory structure of direction classifier in PaddleOCR:
PaddleOCR
├── configs  // Configuration files; select the model structure and modify hyperparameters through the yml files
│   ├── cls  // Direction classifier configuration files
│   │   ├── cls_mv3.yml  // Training configuration, including backbone network, head, loss, optimizer and data
│   └── ...
│   ...
├── deploy  // Deployment related
│   ├── android_demo  // Android demo
│   │   ...
│   ├── cpp_infer  // C++ inference
│   │   ├── CMakeLists.txt  // CMake file
│   │   ├── docs  // Documentation
│   │   │   └── windows_vs2019_build.md
│   │   ├── include  // Header files
│   │   │   ├── clipper.h  // clipper library
│   │   │   ├── config.h  // Inference configuration
│   │   │   ├── ocr_cls.h  // Direction classifier
│   │   │   └── ...
│   │   ├── readme.md  // Documentation
│   │   ├── ...
│   │   ├── src  // Source files
│   │   │   ├── clipper.cpp
│   │   │   ├── config.cpp
│   │   │   ├── main.cpp
│   │   │   ├── ocr_cls.cpp
│   │   │   └── ...
│   │   └── tools  // Build and run scripts
│   │       ├── build.sh  // Build script
│   │       ├── config.txt  // Configuration file
│   │       └── run.sh  // Test launch script
│   ├── docker
│   │   └── hubserving
│   │       ├── cpu
│   │       │   └── Dockerfile
│   │       ├── gpu
│   │       │   └── Dockerfile
│   │       ├── README_cn.md
│   │       ├── README.md
│   │       └── sample_request.txt
│   ├── hubserving  // Hub Serving
│   │   ├── ocr_cls  // Direction classifier
│   │   │   ├── config.json  // Serving configuration
│   │   │   ├── __init__.py
│   │   │   ├── module.py  // Prediction model
│   │   │   └── params.py  // Prediction parameters
│   │   │   ...
│   │   └── ocr_system  // System prediction
│   │       ├── config.json
│   │       ├── __init__.py
│   │       ├── module.py
│   │       └── params.py
│   ├── imgs  // Prediction images
│   │   ├── cpp_infer_pred_12.png
│   │   └── demo.png
│   ├── ios_demo  // iOS demo
│   │   ...
│   ├── lite  // Lite deployment
│   │   ├── cls_process.cc  // Direction classifier data processing
│   │   ├── cls_process.h
│   │   ├── config.txt  // Detection configuration parameters
│   │   ├── crnn_process.cc  // CRNN data processing
│   │   ├── crnn_process.h
│   │   ├── db_post_process.cc  // DB data processing
│   │   ├── db_post_process.h
│   │   ├── Makefile  // Build file
│   │   ├── ocr_db_crnn.cc  // Chained prediction
│   │   ├── prepare.sh  // Data preparation
│   │   ├── readme.md  // Documentation
│   │   ...
│   └── slim
│       └── quantization  // Quantization related
│           ├── export_model.py  // Export model
│           ├── quant.py  // Quantization
│           └── README.md  // Documentation
├── doc  // Documentation and tutorials
│   ...
├── ppocr  // Network core code
│   ├── data  // Data processing
│   │   ├── imaug  // Image and label processing code
│   │   │   ├── text_image_aug  // TIA data augmentation for text recognition
│   │   │   │   ├── __init__.py
│   │   │   │   ├── augment.py  // tia_distort, tia_stretch and tia_perspective code
│   │   │   │   ├── warp_mls.py
│   │   │   ├── __init__.py
│   │   │   ├── east_process.py  // Data processing steps of the EAST algorithm
│   │   │   ├── make_border_map.py  // Generate border map
│   │   │   ├── make_shrink_map.py  // Generate shrink map
│   │   │   ├── operators.py  // Basic image operations such as reading and normalization
│   │   │   ├── randaugment.py  // Random data augmentation
│   │   │   ├── random_crop_data.py  // Random cropping
│   │   │   ├── rec_img_aug.py  // Data augmentation for text recognition
│   │   │   └── sast_process.py  // Data processing steps of the SAST algorithm
│   │   ├── __init__.py  // Code for constructing the dataloader
│   │   ├── lmdb_dataset.py  // Dataset that reads LMDB data
│   │   ├── simple_dataset.py  // Dataset that reads data stored in text format
│   ├── losses  // Loss functions
│   │   ├── __init__.py  // Code for constructing losses
│   │   ├── cls_loss.py  // Direction classifier loss
│   ├── metrics  // Evaluation metrics
│   │   ├── __init__.py  // Code for constructing metrics
│   │   ├── cls_metric.py  // Direction classifier metric
│   ├── modeling  // Network construction
│   │   ├── architectures  // Networks
│   │   │   ├── __init__.py  // Code for constructing the model
│   │   │   ├── base_model.py  // Network construction code
│   │   ├── backbones  // Backbone networks
│   │   │   ├── __init__.py  // Code for constructing backbones
│   │   │   ├── det_mobilenet_v3.py  // MobileNetV3 for detection
│   │   │   └── ...
│   │   ├── necks  // Neck modules
│   │   │   ├── ...
│   │   ├── heads  // Head modules
│   │   │   ├── __init__.py  // Code for constructing heads
│   │   │   ├── cls_head.py  // Direction classifier head
│   │   │   ├── ...
│   │   ├── transforms  // Image transformation
│   │   │   ├── __init__.py  // Code for constructing transforms
│   │   │   └── tps.py  // TPS transformation
│   ├── optimizer  // Optimizer
│   │   ├── __init__.py  // Code for constructing the optimizer
│   │   ├── learning_rate.py  // Learning rate decay
│   │   ├── optimizer.py  // Optimizer
│   │   └── regularizer.py  // Network regularization
│   ├── postprocess  // Post-processing
│   │   ├── cls_postprocess.py  // Direction classifier post-processing
│   │   ├── db_postprocess.py  // DB post-processing
│   │   └── ....
│   └── utils  // Utilities
│       ├── dict  // Minor-language dictionaries
│       │   ....
│       ├── ic15_dict.txt  // English and digit dictionary, case sensitive
│       ├── ppocr_keys_v1.txt  // Chinese dictionary for training the Chinese model
│       ├── logging.py  // Logger
│       ├── save_load.py  // Model saving and loading functions
│       ├── stats.py  // Statistics
│       └── utility.py  // Utility functions
├── tools
│   ├── eval.py  // Evaluation function
│   ├── export_model.py  // Export inference model
│   ├── infer  // Prediction based on the inference engine
│   │   ├── predict_cls.py
│   │   └── ...
│   ├── infer_cls.py  // Classification prediction based on the training engine
│   ├── ...
│   ├── program.py  // Overall process
│   ├── test_hubserving.py
│   └── train.py  // Start training
├── paddleocr.py
├── README_ch.md  // Chinese description document
├── README_en.md  // English description document
├── README.md  // Home page description document
├── requirements.txt  // Installation dependencies
├── setup.py  // Packaging script for the whl package
├── train.sh  // Training launch script
The important configuration and deployment parts are the configuration file (configs/cls/cls_mv3.yml), the C++ deployment of the direction classifier (deploy/cpp_infer/include/ocr_cls.h, deploy/cpp_infer/src/ocr_cls.cpp), Hub Serving for service deployment (deploy/hubserving/ocr_cls), lite deployment (deploy/lite/cls_process.cc and cls_process.h), and quantization for model compression (deploy/slim/quantization). The network core code includes random data augmentation (ppocr/data/imaug/randaugment.py), the loss function (ppocr/losses/cls_loss.py), the evaluation metric (ppocr/metrics/cls_metric.py), the network construction code (MobileNetV3 for detection: ppocr/modeling/backbones/det_mobilenet_v3.py, direction classifier head: ppocr/modeling/heads/cls_head.py), and post-processing (ppocr/postprocess/cls_postprocess.py). The tools include prediction based on the inference engine (tools/infer/predict_cls.py) and classification prediction based on the training engine (tools/infer_cls.py).
In future posts, I will study the core code of the direction classifier in detail.