I. Introduction
The first two articles in this series,
Identity Card Recognition (1): Identity Card Number Location
Identity Card Recognition (2): Extraction of Target Area Images
produced an image of each information region. Only one step remains: recognizing the text in those images.
For text recognition, tesseract-ocr is used.
II. Use of tesseract-ocr
Previously, recognition was done by invoking tesseract externally, but that was too slow, so recognition is now performed directly from C# code.
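// Requires: using System.Drawing; using Tesseract;
private string GetText(Bitmap img)
{
    // Load the chi_sim language data from the tessdata directory.
    using (var engine = new TesseractEngine(@"C:\Files\hci\file\Tesseract-OCR\tessdata", "chi_sim", EngineMode.Default))
    // Convert the Bitmap to the Pix format that the engine processes.
    using (var pix = PixConverter.ToPix(img))
    using (var page = engine.Process(pix))
    {
        // Return the recognized text of the whole image.
        return page.GetText();
    }
}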
This function returns the recognized text as a string.
In the line
var engine = new TesseractEngine(@"C:\Files\hci\file\Tesseract-OCR\tessdata", "chi_sim", EngineMode.Default)
the TesseractEngine constructor takes three arguments: the path to the tessdata directory, the language to recognize, and the engine mode.
chi_sim selects the Simplified Chinese language data, and the engine mode is the default, EngineMode.Default.
III. Recognizing Identity Card Information with tesseract-ocr
To improve recognition speed and accuracy, each region image obtained in the first two steps is denoised and binarized.
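// Requires Emgu CV: using Emgu.CV; using Emgu.CV.CvEnum; using Emgu.CV.Structure;
public static Image<Gray, byte> BinImg(Image<Bgr, byte> img, int blockSize, int val)
{
    // Convert to grayscale, denoise with a 5x5 Gaussian blur, then binarize
    // with a Gaussian-weighted adaptive threshold.
    return img.Convert<Gray, byte>()
        .SmoothGaussian(5)
        .ThresholdAdaptive(new Gray(255),
            AdaptiveThresholdType.GaussianC,
            ThresholdType.Binary,
            blockSize,
            new Gray(val));
}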
Gaussian blur followed by adaptive-threshold binarization removes the background pattern of the ID card and keeps only the text.
In testing, blockSize = 15 and val = 0.00111 times the area of the ID number region gave good results.
Bitmap id = Util.BinImg(Util.Rote(img, idRect), 15, (int)(0.00111 * idArea)).Bitmap;
Bitmap address = Util.BinImg(Util.Rote(img, addressRect), 15, (int)(0.00111 * idArea)).Bitmap;
Bitmap date = Util.BinImg(Util.Rote(img, dateRect), 15, (int)(0.00111 * idArea)).Bitmap;
Bitmap sex = Util.BinImg(Util.Rote(img, sexRect), 15, (int)(0.00111 * idArea)).Bitmap;
Bitmap name = Util.BinImg(Util.Rote(img, nameRect), 15, (int)(0.00111 * idArea)).Bitmap;
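To make the roles of blockSize and val in BinImg more concrete, here is a minimal sketch of adaptive thresholding on a plain byte array. It is an illustration only and makes two simplifying assumptions: it uses an unweighted local mean rather than the Gaussian-weighted mean selected by AdaptiveThresholdType.GaussianC, and it clips the neighbourhood at the image border. The class name AdaptiveThresholdSketch and the method Binarize are hypothetical and are not part of Emgu CV or of this project.

```csharp
using System;

// Hypothetical illustration; the article itself uses Emgu CV's ThresholdAdaptive.
public static class AdaptiveThresholdSketch
{
    // Each pixel becomes 255 if it is brighter than (local mean - c), otherwise 0.
    // blockSize is the side length of the square neighbourhood and should be odd.
    public static byte[,] Binarize(byte[,] gray, int blockSize, double c)
    {
        int rows = gray.GetLength(0);
        int cols = gray.GetLength(1);
        int half = blockSize / 2;
        var result = new byte[rows, cols];
        for (int y = 0; y < rows; y++)
        {
            for (int x = 0; x < cols; x++)
            {
                long sum = 0;
                int count = 0;
                // Average the neighbourhood, clipping it at the image border.
                for (int dy = -half; dy <= half; dy++)
                {
                    for (int dx = -half; dx <= half; dx++)
                    {
                        int yy = y + dy;
                        int xx = x + dx;
                        if (yy >= 0 && yy < rows && xx >= 0 && xx < cols)
                        {
                            sum += gray[yy, xx];
                            count++;
                        }
                    }
                }
                double threshold = (double)sum / count - c;
                result[y, x] = gray[y, x] > threshold ? (byte)255 : (byte)0;
            }
        }
        return result;
    }
}
```

In this sketch a larger blockSize averages over a wider area, and a larger c lowers the threshold so that more near-background pixels become white. Dark text strokes stay well below the local mean and remain black.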
Finally, call GetText(Bitmap img) on each of these images to recognize the text it contains.
Because the ID number is recognized reliably, the gender and the date of birth can be determined by combining the recognized ID number with the recognition results from the corresponding images.
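As a rough sketch of how the ID number can supply these two fields: in an 18-character number, characters 7 to 14 encode the birth date as yyyyMMdd, and the 17th character is odd for male and even for female. The helper class IdNumberFields and its method names are hypothetical and are not taken from the original project. The number in the usage note is a sample value used only for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of the original project.
public static class IdNumberFields
{
    // Characters 7-14 (index 6, length 8) hold the birth date as yyyyMMdd.
    public static bool TryGetBirthDate(string idNumber, out DateTime birthDate)
    {
        birthDate = default(DateTime);
        if (idNumber == null || idNumber.Length != 18)
        {
            return false;
        }
        return DateTime.TryParseExact(idNumber.Substring(6, 8), "yyyyMMdd",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out birthDate);
    }

    // The 17th character (index 16) is odd for male and even for female.
    public static bool TryGetIsMale(string idNumber, out bool isMale)
    {
        isMale = false;
        if (idNumber == null || idNumber.Length != 18)
        {
            return false;
        }
        char ch = idNumber[16];
        if (ch < '0' || ch > '9')
        {
            return false;
        }
        isMale = (ch - '0') % 2 == 1;
        return true;
    }
}
```

For the sample number 11010519491231002X, TryGetBirthDate yields 1949-12-31 and TryGetIsMale reports false, since the 17th character is 2. These values can then be compared with, or used in place of, the OCR output for the gender and date regions.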
IV. Recognition results
Appendix 1: Verifying the Correctness of the ID Card Number
private bool CheckIDCard(string idNumber)
{
    // Only 18-character numbers are handled; Remove(17) below would throw on shorter input.
    if (idNumber == null || idNumber.Length != 18)
    {
        return false;
    }
    long n = 0;
    if (long.TryParse(idNumber.Remove(17), out n) == false
        || n < Math.Pow(10, 16)
        || long.TryParse(idNumber.Replace('x', '0').Replace('X', '0'), out n) == false)
    {
        return false; // Digit verification
    }
    // Valid two-digit province codes, separated by 'x'.
    string address = "11x22x35x44x53x12x23x36x45x54x13x31x37x46x61x14x32x41x50x62x15x33x42x51x63x21x34x43x52x64x65x71x81x82x91";
    if (address.IndexOf(idNumber.Remove(2)) == -1)
    {
        return false; // Province code verification
    }
    // Characters 7-14 are the birth date; reformat yyyyMMdd as yyyy-MM-dd.
    string birth = idNumber.Substring(6, 8).Insert(6, "-").Insert(4, "-");
    DateTime time = new DateTime();
    if (DateTime.TryParse(birth, out time) == false)
    {
        return false; // Birth date verification
    }
    // Check characters indexed by (weighted sum mod 11), and the weight of each of the first 17 digits.
    string[] arrVerifyCode = ("1,0,x,9,8,7,6,5,4,3,2").Split(',');
    string[] Wi = ("7,9,10,5,8,4,2,1,6,3,7,9,10,5,8,4,2").Split(',');
    char[] Ai = idNumber.Remove(17).ToCharArray();
    int sum = 0;
    for (int i = 0; i < 17; i++)
    {
        sum += int.Parse(Wi[i]) * int.Parse(Ai[i].ToString());
    }
    int y = -1;
    Math.DivRem(sum, 11, out y);
    if (arrVerifyCode[y] != idNumber.Substring(17, 1).ToLower())
    {
        return false; // Check code verification
    }
    return true;
}
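The check code step of CheckIDCard can also be looked at in isolation. The sketch below computes the expected 18th character from the first 17 digits using the same weights and code table as the function above. The class IdCheckCode and its Compute method are hypothetical names introduced for illustration.

```csharp
using System;

// Hypothetical helper that isolates the check-character step of CheckIDCard.
public static class IdCheckCode
{
    // Weight applied to each of the first 17 digits.
    private static readonly int[] Weights = { 7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2 };

    // Check character for each possible remainder 0..10 of the weighted sum modulo 11.
    private const string Codes = "10X98765432";

    // Returns the expected 18th character for the first 17 digits.
    public static char Compute(string first17)
    {
        if (first17 == null || first17.Length != 17)
        {
            throw new ArgumentException("Expected exactly 17 digits.", nameof(first17));
        }
        int sum = 0;
        for (int i = 0; i < 17; i++)
        {
            char ch = first17[i];
            if (ch < '0' || ch > '9')
            {
                throw new ArgumentException("Expected exactly 17 digits.", nameof(first17));
            }
            sum += Weights[i] * (ch - '0');
        }
        return Codes[sum % 11];
    }
}
```

As a worked example with the sample digits 11010519491231002, the weighted sum is 167, the remainder modulo 11 is 2, and the third entry of the code table is X, so the full number ends in X.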