Using pytesseract, a Python library for verification code (CAPTCHA) recognition

Keywords: Python github yum pip

Author's environment: CentOS 7, Python 3

pytesseract is only a Python wrapper around tesseract-ocr, the well-known open-source OCR engine, so tesseract-ocr has to be installed first.
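
To make the wrapper relationship concrete, here is a minimal sketch of what such a wrapper does in principle: it calls the tesseract command-line program and reads back the text file it writes. The helper names build_tesseract_command and ocr_via_cli are hypothetical and written for illustration only; they are not pytesseract's actual internals.

```python
import subprocess
import tempfile
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for a basic tesseract CLI call (hypothetical helper)."""
    # tesseract writes the recognized text to "<output_base>.txt"
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def ocr_via_cli(image_path, lang="eng"):
    """Run the tesseract binary on an image and return the recognized text."""
    with tempfile.TemporaryDirectory() as tmp:
        output_base = Path(tmp) / "result"
        command = build_tesseract_command(image_path, output_base, lang)
        # check=True raises CalledProcessError if tesseract exits with an error
        subprocess.run(command, check=True)
        return output_base.with_suffix(".txt").read_text(encoding="utf-8")
```

Without the tesseract binary on the PATH, ocr_via_cli fails, which is why the engine is installed before pytesseract in the steps that follow.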

 

Installing dependencies

yum install -y automake autoconf libtool gcc gcc-c++
yum install -y libpng-devel libjpeg-devel libtiff-devel giflib-devel

 

Installing the leptonica dependency

wget http://www.leptonica.com/source/leptonica-1.72.tar.gz  
tar -xzvf leptonica-1.72.tar.gz  
cd leptonica-1.72
./configure
make && make install

 

Install tesseract-ocr

wget https://github.com/tesseract-ocr/tesseract/archive/3.04.00.tar.gz
mv 3.04.00.tar.gz Tesseract3.04.00.tar.gz
tar -xvf Tesseract3.04.00.tar.gz  
cd tesseract-3.04.00/
./autogen.sh  # the GitHub source archive may not ship a configure script; generate it first
./configure
make && make install
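
After the build, it can be useful to confirm from Python that the tesseract binary is reachable and reports the expected version. The following is a minimal sketch; the function names are hypothetical, and the banner format ("tesseract 3.04.00") is an assumption based on typical tesseract output.

```python
import re
import shutil
import subprocess


def parse_tesseract_version(output):
    """Extract (major, minor, patch) from text such as 'tesseract 3.04.00'."""
    match = re.search(r"tesseract\s+v?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_tesseract_version():
    """Return the installed version tuple, or None if tesseract is not found."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        ["tesseract", "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Some releases print the version banner to stderr, so check both streams.
    return parse_tesseract_version(result.stdout + result.stderr)


if __name__ == "__main__":
    print(installed_tesseract_version())
```

If this prints None right after make install, the binary is probably not on the PATH of the current shell.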

 

Install language packs:

wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata  # English (default package)
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata  # Simplified Chinese
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_tra.traineddata  # Traditional Chinese

mv *.traineddata /usr/local/share/tessdata/  # Move the downloaded packages; they can also be moved manually into /usr/local/share/tessdata/
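
A quick way to check that the language files ended up in the right place is to look for them from Python. This is a small sketch; missing_language_packs is a hypothetical helper, and the default directory simply mirrors the /usr/local/share/tessdata/ path used above.

```python
from pathlib import Path


def missing_language_packs(languages, tessdata_dir="/usr/local/share/tessdata"):
    """Return the language codes whose .traineddata file is not in tessdata_dir."""
    directory = Path(tessdata_dir)
    return [
        lang
        for lang in languages
        if not (directory / (lang + ".traineddata")).is_file()
    ]


if __name__ == "__main__":
    # An empty list means all three packs are installed.
    print(missing_language_packs(["eng", "chi_sim", "chi_tra"]))
```

Once a pack is present, its code can be passed to pytesseract through the lang parameter, for example pytesseract.image_to_string(image, lang="chi_sim").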

 

Install pytesseract:

pip install Pillow 
pip install pytesseract

  

 

The installation is now complete. Basic usage is as follows:

import pytesseract
from PIL import Image

# Open the verification code image and run OCR on it
image = Image.open("port_img.jpg")
text = pytesseract.image_to_string(image)
print(text)
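
The string returned by image_to_string often contains spaces, newlines, or stray punctuation, so verification code text usually needs a cleanup step before it is submitted. Below is a minimal sketch, assuming the code consists only of ASCII letters and digits; normalize_captcha_text and its expected_length parameter are hypothetical names, not part of pytesseract.

```python
import re


def normalize_captcha_text(raw_text, expected_length=None):
    """Keep only ASCII letters and digits from OCR output.

    Returns None when expected_length is given and the cleaned text
    does not match it, which usually signals a misread.
    """
    cleaned = re.sub(r"[^A-Za-z0-9]", "", raw_text)
    if expected_length is not None and len(cleaned) != expected_length:
        return None
    return cleaned
```

For example, normalize_captcha_text(text, expected_length=4) either returns a four-character code or None, in which case the image can be fetched and recognized again.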

 

 

Reference material:

https://www.cnblogs.com/dajianshi/p/4932882.html
https://stackoverflow.com/questions/33659458/tesseract-image-issue

Posted by slawrence10 on Wed, 06 Mar 2019 17:51:24 -0800