Decoding rotating-picture verification codes with Python
In this article we mainly use Python + Selenium to crack the rotating-picture verification code, with NumPy and OpenCV (cv2) handling image splicing, conversion, masking, and recognition.
Rotate the picture to the correct position
The most difficult part of a rotation CAPTCHA is calculating the rotation angle. We cannot automatically recognize the correct orientation of a picture without training a model, so the practical option is to traverse the possible pictures and save them in their correct positions. There are usually two ways to traverse them:
- The HTML contains links to all the pictures, or Base64 image data
- The program runs automatically, grabs pictures, compares them for similarity, and the results are then adjusted manually
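Both approaches ultimately compare rotated copies of the pictures, so a rotation helper is needed. Below is a minimal sketch assuming square OpenCV images; the name rotate_image is my own and is reused in the matching sketch near the end of this article.

```python
import cv2
import numpy as np

def rotate_image(img, angle):
    """Rotate a square image around its center by `angle` degrees."""
    h, w = img.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, angle, 1.0)  # 2x3 affine matrix, scale 1.0
    # Corners left uncovered by the rotation are filled with black; the
    # circular mask applied later hides them anyway.
    return cv2.warpAffine(img, M, (w, h))
```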
The first method
The first method is very simple. For example, the invoice URL issued by 51fapiao shows a verification code before the invoice can be downloaded or viewed. The code appears randomly as either a sliding or a rotating verification code, but inspecting the page HTML reveals that it contains the URL list of all the correctly oriented verification-code pictures.
```python
import re

import cv2
import numpy as np
import requests

# Use a regular expression to extract the URLs of all correct images
html = driver.page_source
url_list = re.findall(r"'(http[^']+?\d+?\.(?:jpg|png))'", html, re.S)

# The rotated picture is usually a square; assume its width and height are w, h (usually w == h).
# w, h and mask come from get_chaptcha_image() in the second method below.
# For easier matching, all images are spliced into a single row (or a single column).
n = len(url_list)
img_all = np.zeros((h, w * n), dtype=np.uint8)

n = 0
for img_url in url_list:
    try:
        # Download the image and decode it into an OpenCV array
        r = requests.get(img_url)
        img_tmp = cv2.imdecode(np.asarray(bytearray(r.content), dtype=np.uint8),
                               cv2.IMREAD_COLOR)
    except Exception:
        continue
    # Resize to the common size
    img_tmp = cv2.resize(img_tmp, (w, h))
    # Mask the picture with a circle (explained below)
    img_tmp = cv2.add(img_tmp, np.zeros(img_tmp.shape, dtype=np.uint8), mask=mask)
    # Convert to grayscale
    img_tmp = cv2.cvtColor(img_tmp, cv2.COLOR_BGR2GRAY)
    # show(img_tmp)
    # Paste it into the large horizontal image
    img_all[:, n * w:(n + 1) * w] = img_tmp[:, :]
    n += 1
```
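If the page embeds the pictures as Base64 data instead of URLs, only the download step changes. A short sketch, assuming a data: URI string held in a hypothetical variable img_b64:

```python
import base64

# Hypothetical: img_b64 is a string like "data:image/png;base64,iVBORw0..."
img_bytes = base64.b64decode(img_b64.split(',', 1)[1])
img_tmp = cv2.imdecode(np.frombuffer(img_bytes, dtype=np.uint8), cv2.IMREAD_COLOR)
```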
The second method (universal method)
Refresh the page, locate the element of the CAPTCHA image, and then process the image:
- Mask the picture with a circle
- Convert it to grayscale
Let's talk about masks
The verification code usually has a background, with the core image inside a circle. The mask blocks out the background around the circle so that only the image inside the circle is compared. We use OpenCV to build the mask:
```python
def get_chaptcha_image(driver):
    png_data = driver.find_element_by_tag_name('canvas').screenshot_as_png
    img = cv2.imdecode(np.frombuffer(png_data, dtype=np.uint8), cv2.IMREAD_COLOR)
    # The height and width used in the first method come from here
    h, w = img.shape[:2]
    # Build a circular mask (only the central circle passes through; everything else is blocked)
    mask = np.zeros((h, w), dtype=np.uint8)
    (centerX, centerY) = (mask.shape[1] // 2, mask.shape[0] // 2)
    white = (255, 255, 255)
    cv2.circle(mask, (centerX, centerY), w // 2 - 1, white, -1)
    # Apply the mask and convert the image to grayscale
    img = cv2.add(img, np.zeros(img.shape, dtype=np.uint8), mask=mask)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return img
```
With the above function we can capture a single verification-code image; to collect all of them, repeat the following steps:
- Create img_all to hold all collected pictures
- Refresh the page and capture a verification-code picture
- Rotate the captured picture from 0 to 360 degrees and search for it in img_all; if the similarity is above 0.95, the picture already exists, so keep refreshing (if no new picture appears after 20 refreshes, assume all pictures have been found and stop the program)
- Append pictures that are not yet present to img_all
- Manually rotate each picture to its correct position and save it back into the large picture (replacing the original)
The core code is as follows
```python
def add_to_img_all(img_all, img):
    '''Splice img onto the right edge of img_all and return the new img_all'''
    h_all, w_all = img_all.shape[:2]
    h, w = img.shape[:2]
    w_new = w_all + w
    img_all_new = np.zeros((h, w_new), dtype=np.uint8)
    img_all_new[:, :w_all] = img_all[:, :]
    img_all_new[:, w_all:w_all + w] = img[:, :]
    return img_all_new


def is_existed(img_all, img, semilar=0.95):
    '''Search for img in img_all, stopping once the similarity reaches semilar

    Parameters:
        img_all: the large spliced image
        img:     the verification-code image to look for
        semilar: similarity threshold

    Returns: (maxValue, maxLoc, Angle)
        maxValue: similarity (0: completely different, 1: identical)
        maxLoc:   location of the best match (x, y)
        Angle:    rotation angle that gives the best match
    '''


img = get_chaptcha_image(driver)
img_all = img
h, w = img.shape[:2]
times = 0
while True:
    driver.refresh()
    img = get_chaptcha_image(driver)
    maxValue, maxLoc, Angle = is_existed(img_all, img)
    if maxValue > 0.9:
        # The similarity is high enough, so the picture is considered to exist already
        times += 1
        if times >= 20:
            break
        else:
            continue
    else:
        times = 0
        img_all = add_to_img_all(img_all, img)
```
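The snippet above declares is_existed but leaves its body out. A minimal sketch of one possible implementation, assuming the rotate_image helper from earlier and OpenCV template matching (cv2.matchTemplate with TM_CCOEFF_NORMED); the 1-degree step is an arbitrary choice:

```python
def is_existed(img_all, img, semilar=0.95):
    '''Sketch: rotate img in 1-degree steps and template-match it against img_all.'''
    best = (0.0, (0, 0), 0)  # (maxValue, maxLoc, Angle)
    for angle in range(0, 360):
        rotated = rotate_image(img, angle)
        # Normalized cross-correlation: 1.0 means a perfect match
        res = cv2.matchTemplate(img_all, rotated, cv2.TM_CCOEFF_NORMED)
        _, maxValue, _, maxLoc = cv2.minMaxLoc(res)
        if maxValue > best[0]:
            best = (maxValue, maxLoc, angle)
        if maxValue >= semilar:
            break  # good enough, stop early
    return best
```

The returned angle is the rotation that maps the live picture onto its stored copy, which is the value the solver ultimately needs in order to turn the picture to the correct position.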