Setting up and cracking rotating-picture verification codes with Python

Keywords: Python OpenCV Selenium

In this article we use Python and Selenium to crack rotating-picture verification codes, where a picture must be rotated to its correct position. NumPy and OpenCV (cv2) are used for image stitching, conversion, masking, and matching.

Rotate the picture to the correct position

The hardest part of a rotating verification code is calculating the rotation angle. Recognizing the picture's correct orientation automatically with artificial intelligence is not practical for us, so the only option is to go through all the possible pictures and save them. There are usually two ways to collect them:

  1. The HTML code contains links to all the pictures, or their Base64 picture data
  2. A program runs automatically, grabs pictures, compares them to tell new pictures from ones already seen, and the results are then adjusted manually

The first method

The first method is very simple. For example, downloading or viewing an invoice through the URL issued by 51fapiao triggers a verification code, which appears at random as either a sliding or a rotating verification code. Inspecting the HTML of the verification page shows that it contains the URL list of all the correct verification code pictures.

import re

import cv2
import numpy as np
import requests

# Use a regular expression to extract the URLs of all correct images
html = driver.page_source
url_list = re.findall(
    r"'(http[^']+?\d+?\.(?:jpg|png))'", html, re.S)
# The rotated picture is usually square; its width and height are w and h (usually w == h).
# h, w and mask come from a live captcha image (see the second method).
# To simplify matching, all images are stitched into one row (a column would also work).
n = len(url_list)
img_all = np.zeros((h, w * n), dtype=np.uint8)
for i, img_url in enumerate(url_list):
    # Download the image and decode it
    r = requests.get(img_url)
    img_tmp = cv2.imdecode(
        np.frombuffer(r.content, dtype=np.uint8), cv2.IMREAD_COLOR)
    # Resize to the captcha size
    img_tmp = cv2.resize(img_tmp, (w, h))
    # Mask the picture with a circle (this will be explained later)
    img_tmp = cv2.add(img_tmp, np.zeros(
        img_tmp.shape, dtype=np.uint8), mask=mask)
    # Convert to a grayscale image
    img_tmp = cv2.cvtColor(img_tmp, cv2.COLOR_BGR2GRAY)
    # show(img_tmp)
    # Paste into the i-th slot of the wide picture
    img_all[:, i * w:(i + 1) * w] = img_tmp
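
To see the two ideas in isolation (the URL regex and the row stitching), here is a minimal offline sketch. The HTML fragment, the example.com URLs, and the helper names extract_image_urls and stitch_row are made up for illustration; a real page will look different, and np.hstack is just one convenient way to build the row.

```python
import re

import numpy as np

# Hypothetical page fragment; the real 51fapiao markup is not shown in this article.
SAMPLE_HTML = """
<script>
var imgs = ['http://example.com/captcha/1.jpg', 'http://example.com/captcha/22.png',
            'http://example.com/static/logo.gif'];
</script>
"""


def extract_image_urls(html):
    """Return single-quoted http URLs whose file name ends in digits plus .jpg or .png."""
    return re.findall(r"'(http[^']+?\d+?\.(?:jpg|png))'", html, re.S)


def stitch_row(tiles):
    """Place equally sized grayscale tiles side by side in one wide image."""
    return np.hstack(tiles)


urls = extract_image_urls(SAMPLE_HTML)
# Stand-in tiles: tile i is a 4x4 image filled with the value i.
tiles = [np.full((4, 4), i, dtype=np.uint8) for i in range(len(urls))]
row = stitch_row(tiles)
print(urls)
print(row.shape)
```

The .gif URL is skipped because the pattern only accepts .jpg and .png files whose names end in digits.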

The second method (universal method)

Refresh the screen, find the element of the captcha image, and then process the image:

  • Mask the picture with a circle
  • Convert it to grayscale

Let's talk about masks
The verification code usually has a background, with the core image inside a circle. The mask blacks out the background around the circle so that only the image inside the circle is compared. We use OpenCV to apply the mask.

def get_chaptcha_image(driver):
    png_data = driver.find_element_by_tag_name(
    img = cv2.imdecode(np.frombuffer(png_data, dtype=np.uint8), cv2.IMREAD_COLOR)
    h, w = img.shape[:2]  # The first method takes the captcha height and width from here

    # Make a round mask (only the central circle passes through; everything else is blocked)
    mask = np.zeros((h, w), dtype=np.uint8)
    (centerX, centerY) = (mask.shape[1] // 2, mask.shape[0] // 2)
    white = (255, 255, 255)
    cv2.circle(mask, (centerX, centerY), w//2-1, white, -1)
    # Enable mask and convert image to grayscale
    img = cv2.add(img, np.zeros(
        img.shape, dtype=np.uint8), mask=mask)
    img = cv2.cvtColor(np.asarray(img), cv2.COLOR_BGR2GRAY)
    return img

With the function above we can capture one verification code image. To collect every distinct picture, repeat the following steps:

  • Keep img_all, a wide image that holds all the pictures found so far
  • Refresh the picture and capture a verification code picture
  • Rotate the captured picture from 0 to 360 degrees and search for it in img_all. If the similarity is above 0.95, the picture already exists, so refresh again (if no new picture turns up after 20 refreshes, assume all pictures have been found and stop the program)
  • Add pictures that do not exist yet to img_all
  • Manually work out the correct position of each picture, rotate it accordingly, and save it back into the wide picture (replacing the original picture in that slot)

The core code is as follows

def add_to_img_all(img_all, img):
    """Append img to the right-hand end of img_all and return the new img_all."""
    h_all, w_all = img_all.shape[:2]
    h, w = img.shape[:2]
    w_new = w_all + w
    img_all_new = np.zeros((h, w_new), dtype=np.uint8)
    # Keep the existing pictures, then paste the new one after them
    img_all_new[:, :w_all] = img_all
    img_all_new[:, w_all:w_all + w] = img
    return img_all_new


def is_existed(img_all, img, semilar=0.95):
    """Search img_all for img and stop when the similarity reaches semilar.

    semilar: similarity threshold
    Return: (maxValue, maxLoc, Angle)
        maxValue: similarity, from 0 (no resemblance) to 1 (identical)
        maxLoc: most similar location (x, y)
        Angle: rotation angle that gives the best match
    """

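img = get_chaptcha_image(driver)
img_all = img
h, w = img.shape[:2]
times = 0  # consecutive refreshes that produced no new picture
while True:
    # Refresh the captcha, then capture the new picture
    img = get_chaptcha_image(driver)
    maxValue, maxLoc, Angle = is_existed(img_all, img)
    if maxValue > 0.9:  # If the similarity is greater than 0.9, the picture already exists
        times += 1
        if times >= 20:
            # No new picture after 20 refreshes: assume all pictures have been found
            break
    else:
        # New picture: reset the counter and append it to the wide picture
        times = 0
        img_all = add_to_img_all(img_all, img)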

Posted by psychosquirrel on Mon, 22 Jun 2020 21:11:21 -0700