Author's environment: CentOS 7, Python 3
pytesseract is only a Python wrapper around tesseract-ocr, so we first need to install tesseract-ocr itself (the well-known open source OCR engine).
Install dependencies
yum install -y automake autoconf libtool gcc gcc-c++
yum install -y libpng-devel libjpeg-devel libtiff-devel giflib-devel
Install the leptonica dependency library
wget http://www.leptonica.com/source/leptonica-1.72.tar.gz
tar -xzvf leptonica-1.72.tar.gz
cd leptonica-1.72
./configure
make && make install
Install tesseract-ocr
wget https://github.com/tesseract-ocr/tesseract/archive/3.04.00.tar.gz
mv 3.04.00.tar.gz Tesseract3.04.00.tar.gz
tar -xvf Tesseract3.04.00.tar.gz
cd tesseract-3.04.00/
./autogen.sh  # regenerate ./configure (the GitHub archive is a raw repository snapshot)
./configure
make && make install
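Before moving on, it can help to confirm that the freshly built binary is actually reachable. The following is a minimal sketch, assuming the build installed an executable named tesseract somewhere on PATH; the helper names find_tesseract and tesseract_version are made up for illustration and are not part of any library.

```python
import shutil
import subprocess


def find_tesseract(name="tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


def tesseract_version(path):
    """Run `<path> --version` and return the first line of its output."""
    result = subprocess.run(
        [path, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some releases print the version to stderr
        universal_newlines=True,
        check=False,
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract not found on PATH")
    else:
        print(path, "->", tesseract_version(path))
```

If the script reports that tesseract is not found, the install prefix (commonly /usr/local/bin for a source build, though that depends on the configure options) is probably missing from PATH.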
Install Language Pack:
wget https://github.com/tesseract-ocr/tessdata/raw/master/eng.traineddata  # English (default package)
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_sim.traineddata  # Simplified Chinese
wget https://github.com/tesseract-ocr/tessdata/raw/master/chi_tra.traineddata  # Traditional Chinese
mv *.traineddata /usr/local/share/tessdata/  # move the downloaded packages; they can also be moved manually into /usr/local/share/tessdata/
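A missing language file usually only shows up later as an error at recognition time, so a quick check right after copying can save some debugging. This is a small sketch under the assumption that the files live in /usr/local/share/tessdata as described above; missing_languages is a hypothetical helper name.

```python
from pathlib import Path


def missing_languages(tessdata_dir, langs):
    """Return the language codes whose <code>.traineddata file is absent."""
    base = Path(tessdata_dir)
    return [lang for lang in langs if not (base / (lang + ".traineddata")).is_file()]


if __name__ == "__main__":
    # Directory and language codes are the ones used in the steps above.
    missing = missing_languages(
        "/usr/local/share/tessdata", ["eng", "chi_sim", "chi_tra"]
    )
    print("missing:", missing if missing else "none")
```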
Install pytesseract:
pip install Pillow
pip install pytesseract
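One detail that is easy to trip over: the Pillow package is imported under the name PIL, not Pillow. The sketch below checks that both packages are importable from the current interpreter; is_importable is an illustrative helper, not part of either library.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Pillow installs the import package "PIL".
    for name in ("PIL", "pytesseract"):
        print(name, "OK" if is_importable(name) else "MISSING")
```

If one of them is reported as MISSING, pip may have installed into a different Python than the one running the script.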
The installation is now complete. Usage is as follows:
import pytesseract
from PIL import Image

image = Image.open("port_img.jpg")
text = pytesseract.image_to_string(image)
print(text)
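To make use of the Chinese language packs installed earlier, image_to_string accepts a lang argument, and tesseract lets several languages be combined with a plus sign. The following is a hedged sketch rather than a definitive recipe; join_languages and ocr_file are hypothetical helper names introduced here for illustration.

```python
def join_languages(langs):
    """Build the value for the lang argument, e.g. "chi_sim+eng"."""
    return "+".join(langs)


def ocr_file(path, langs=("eng",)):
    """Open an image file and return the text recognized in it."""
    # Imported here so join_languages can be used without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(path) as image:
        return pytesseract.image_to_string(image, lang=join_languages(langs))
```

For example, ocr_file("port_img.jpg", langs=("chi_sim", "eng")) would run recognition with both Simplified Chinese and English enabled, provided the corresponding traineddata files are in the tessdata directory.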
Reference material:
https://www.cnblogs.com/dajianshi/p/4932882.html
https://stackoverflow.com/questions/33659458/tesseract-image-issue