Cracking the Bilibili Slider Captcha with Python
|Perfection is impossible, but close is good enough: this article breaks the Bilibili slider captcha with Python while neatly evading human-machine (bot) detection.
Preparation
- Bilibili login page: https://passport.bilibili.com/login
- python3
- pip install selenium (WebDriver browser automation framework)
- pip install pillow (the PIL image processing library)
- chrome driver: http://chromedriver.storage.googleapis.com/index.html
- firefox driver: https://github.com/mozilla/geckodriver/releases
Bilibili's slider captcha looks like the screenshot above.
This kind of captcha can be cracked by using Selenium to drive the browser and drag the slider. There are two difficulties: first, determining how far to drag; second, evading human-machine detection (anti-crawler measures).
Determining the drag distance for the slider captcha
There are three approaches:
- Machine learning to locate the gap position
- Pixel-by-pixel comparison of the complete image against the gapped image to locate the gap
- Edge detection to locate the gap
Each has its pros and cons. Machine learning requires training a model, which is troublesome (though you could look for an online API to call instead). The other two approaches are covered below.
Comparing the complete image with the gapped image
|This approach is only described; this article does not implement it. For Bilibili it is the most accurate method (essentially 100%), but there is no guarantee it will survive future upgrades of Bilibili's slider verification.
Bilibili's slider captcha module contains three images, all drawn on canvas: the complete image, the image with the puzzle gap, and the puzzle piece itself. Similar to:
Complete image:
Gapped image:
Puzzle piece:
The HTML is similar to:
```html
<div class="geetest_canvas_img geetest_absolute" style="display: block;">
  <div class="geetest_slicebg geetest_absolute">
    <canvas class="geetest_canvas_bg geetest_absolute" height="160" width="260"></canvas>
    <canvas class="geetest_canvas_slice geetest_absolute" width="260" height="160"></canvas>
  </div>
  <canvas class="geetest_canvas_fullbg geetest_fade geetest_absolute" height="160" width="260" style="display: none;"></canvas>
</div>
```
We only need to locate the canvas elements with Selenium, execute JavaScript to read the canvas pixels, and walk the pixels of the complete image and the gapped image in parallel. The x coordinate where they first differ (a small per-pixel error must be tolerated) is the gap position.
In addition, the puzzle piece does not start at the canvas origin; this initial offset must be subtracted from the distance.
Finally, perform the drag with Selenium.
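The comparison step can be sketched as follows. This is a minimal illustration, not the article's implementation; the file names full.png and bg.png are assumptions, standing in for the two canvases exported locally (e.g. via toDataURL):

```python
# Sketch of the pixel-diff approach (illustrative, not the article's code).
# Assumes full.png and bg.png are the complete and gapped canvas images.
from PIL import Image

def find_gap_x(full_path, gapped_path, tolerance=60):
    """Return the x coordinate where the two images first differ."""
    full = Image.open(full_path).convert("RGB")
    gapped = Image.open(gapped_path).convert("RGB")
    full_px, gapped_px = full.load(), gapped.load()
    for x in range(full.size[0]):
        for y in range(full.size[1]):
            r1, g1, b1 = full_px[x, y]
            r2, g2, b2 = gapped_px[x, y]
            # Allow a small per-pixel error before calling it a difference
            if abs(r1 - r2) + abs(g1 - g2) + abs(b1 - b2) > tolerance:
                return x
    return None
```

The tolerance guards against anti-aliasing and compression noise; without it, a single off-by-one pixel would produce a false gap position.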
Edge detection to locate the gap
|The gap is basically a square, and the starting position of that square can be determined algorithmically.
Two approaches:
- The gap is square, with vertical edges that appear roughly gray-black in the gapped image. Walk the pixels looking for a column of near-gray, dark pixels.
- The gap in the gapped image is enclosed by a gray-black outline. Find a closed region roughly the size of the puzzle piece; its position is where the slider needs to be dragged.
The second approach is more complicated and is not implemented here.
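For the curious, the closed-region idea could be sketched roughly as below. This is an assumption-laden illustration, not the article's code: the darkness/grayness thresholds and the expected gap size of 30-60 px are guesses.

```python
# Sketch of the closed-region idea (assumption: the gap outline renders as
# dark, near-gray pixels; thresholds and expected gap size are guesses).
from collections import deque
from PIL import Image

def find_gap_region(image, min_side=30, max_side=60):
    """Group dark pixels into connected components; return the left x of a
    component whose bounding box is roughly puzzle-piece sized."""
    img = image.convert("RGB")
    w, h = img.size
    px = img.load()
    # Dark and near-gray: low brightness, small spread between channels
    dark = {(x, y) for x in range(w) for y in range(h)
            if max(px[x, y]) < 128 and max(px[x, y]) - min(px[x, y]) < 40}
    seen = set()
    for start in dark:
        if start in seen:
            continue
        # BFS over 4-connected dark pixels
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            x, y = queue.popleft()
            comp.append((x, y))
            for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nxt in dark and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        xs = [p[0] for p in comp]
        ys = [p[1] for p in comp]
        bw, bh = max(xs) - min(xs) + 1, max(ys) - min(ys) + 1
        if min_side <= bw <= max_side and min_side <= bh <= max_side:
            return min(xs)
    return None
```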
The first approach is implemented below. It can fail to detect an edge, or detect it incorrectly; when that happens, refresh to get a new captcha. It may also detect a different edge (the Bilibili puzzle piece is not a true rectangle and has curved sides), in which case the piece width needs to be subtracted.
```python
import copy

from PIL import Image


class VeriImageUtil():
    def __init__(self):
        self.defaultConfig = {
            "grayOffset": 20,
            "opaque": 1,
            "minVerticalLineCount": 30
        }
        self.config = copy.deepcopy(self.defaultConfig)

    def updateConfig(self, config):
        for k in self.config:
            if k in config.keys():
                self.config[k] = config[k]

    def getMaxOffset(self, *args):
        # Maximum deviation of the arguments from their average
        av = sum(args) / len(args)
        maxOffset = 0
        for a in args:
            offset = abs(av - a)
            if offset > maxOffset:
                maxOffset = offset
        return maxOffset

    def isGrayPx(self, r, g, b):
        # Grayscale pixel, within the allowed fluctuation offset
        return self.getMaxOffset(r, g, b) < self.config["grayOffset"]

    def isDarkStyle(self, r, g, b):
        # Dark pixel
        return r < 128 and g < 128 and b < 128

    def isOpaque(self, px):
        # Opaque pixel
        return px[3] >= 255 * self.config["opaque"]

    def getVerticalLineOffsetX(self, bgImage):
        # Scan columns left to right; return the x of the first column that
        # contains a long enough run of dark, gray, opaque pixels
        bgBytes = bgImage.load()
        x = 0
        while x < bgImage.size[0]:
            y = 0
            # Length of the current run of gray pixels in this column
            verticalLineCount = 0
            while y < bgImage.size[1]:
                px = bgBytes[x, y]
                r, g, b = px[0], px[1], px[2]
                if self.isDarkStyle(r, g, b) and self.isGrayPx(r, g, b) and self.isOpaque(px):
                    verticalLineCount += 1
                else:
                    verticalLineCount = 0
                    y += 1
                    continue
                if verticalLineCount >= self.config["minVerticalLineCount"]:
                    # Enough consecutive gray pixels: treat this column as the edge
                    return x
                y += 1
            x += 1
        return None


if __name__ == '__main__':
    bgImage = Image.open("./image/bg.png")
    veriImageUtil = VeriImageUtil()
    bgOffsetX = veriImageUtil.getVerticalLineOffsetX(bgImage)
    print("bgOffsetX: {}".format(bgOffsetX))
```
Dragging the slider with Selenium (a first attempt that fails)
First, get the captcha image out of the HTML: execute JavaScript to export the canvas pixels as base64, decode them in Python, then drag:
```python
import base64
import time
from io import BytesIO

from PIL import Image
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait


def checkVeriImage(driver):
    WebDriverWait(driver, 5).until(
        lambda driver: driver.find_element_by_css_selector('.geetest_canvas_bg.geetest_absolute'))
    time.sleep(1)
    im_info = driver.execute_script(
        'return document.getElementsByClassName("geetest_canvas_bg geetest_absolute")[0].toDataURL("image/png");')
    # Get the base64-encoded image data
    im_base64 = im_info.split(',')[1]
    # Decode to bytes
    im_bytes = base64.b64decode(im_base64)
    with open('./temp_bg.png', 'wb') as f:
        # Save the image locally for easy previewing
        f.write(im_bytes)
    image_data = BytesIO(im_bytes)
    bgImage = Image.open(image_data)
    # The puzzle piece starts 5-10 pixels from the left edge
    offsetX = VeriImageUtil().getVerticalLineOffsetX(bgImage)
    eleDrag = driver.find_element_by_css_selector(".geetest_slider_button")
    action_chains = webdriver.ActionChains(driver)
    action_chains.drag_and_drop_by_offset(eleDrag, offsetX - 10, 0).perform()
```
It looks fine, but during verification we run into "the puzzle was eaten by the monster, please try again" and fail: the site has detected a robot (crawler) operation.
Evading human-machine detection
|Bilibili's human-machine detection for the slider captcha is actually not very sophisticated: it mainly checks whether there is a pause during the drag. At first I was misled by articles online; simulating the drag with distance = initial velocity × time + 1/2 × acceleration × time² is completely unnecessary here.
Dragging the slider with webdriver.ActionChains(driver).drag_and_drop_by_offset(eleDrag, offsetX - 10, 0).perform() makes verification fail: for Bilibili, the action is simply too fast.
Some people plan to just add time.sleep(1). That does not work either; it still reports that the puzzle was eaten by the monster, because the sleep happens outside the gesture. The pause must occur inside the drag, between click-and-hold and release, which is what ActionChains.pause() provides.
A human completing the slider can be summarised as: quickly drag the piece to roughly the target position, correct the error, pause for a moment, then release.
Simple implementation
The code can be kept simple, without simulating the error-correction step. Ordinary websites do not check for it; at least Bilibili does not.
```python
def simpleSimulateDragX(self, source, targetOffsetX):
    """
    Simple human-like drag: move quickly along the X axis straight to the
    target position, pause for a while, then release.
    Bilibili distinguishes humans from machines by whether there is a pause,
    so this method is sufficient.
    :param source: element to drag
    :param targetOffsetX: drag distance along the x axis
    :return: None
    """
    # Reimplements drag_and_drop_by_offset(eleDrag, offsetX - 10, 0) with move steps
    action_chains = webdriver.ActionChains(self.driver)
    # Press and hold, ready to drag
    action_chains.click_and_hold(source)
    action_chains.pause(0.2)
    action_chains.move_by_offset(targetOffsetX, 0)
    action_chains.pause(0.6)
    action_chains.release()
    action_chains.perform()
```
Adding the error-correction step
Compared with the previous section there is one extra correction move, action_chains.move_by_offset(10, 0):
```python
def fixedSimulateDragX(self, source, targetOffsetX):
    # Reimplements drag_and_drop_by_offset(eleDrag, offsetX - 10, 0) with move steps
    action_chains = webdriver.ActionChains(self.driver)
    # Press and hold, ready to drag
    action_chains.click_and_hold(source)
    action_chains.pause(0.2)
    # Stop 10px short of the target, then correct
    action_chains.move_by_offset(targetOffsetX - 10, 0)
    action_chains.pause(0.6)
    action_chains.move_by_offset(10, 0)
    action_chains.pause(0.6)
    action_chains.release()
    action_chains.perform()
```
Ultimate implementation
|To look even more human, the pause times, the number of drag movements, and the distances can all be randomized. This is not strictly necessary for Bilibili, and it does make verification take longer.
Multiple drag movements could be written as a loop, but the code would be harder to follow, so the cases are handled directly: two or three movements at most complete the correction process.
```python
def __getRadomPauseScondes(self):
    """
    :return: random pause time between drag movements
    """
    return random.uniform(0.6, 0.9)

def simulateDragX(self, source, targetOffsetX):
    """
    Imitate a human drag: move quickly along the X axis (with some error),
    pause, then correct the error.
    Prevents robot-detection failures such as "the picture was eaten by the monster".
    :param source: html element to drag
    :param targetOffsetX: drag distance along the x axis
    :return: None
    """
    action_chains = webdriver.ActionChains(self.driver)
    # Press and hold, ready to drag
    action_chains.click_and_hold(source)
    # Number of drag movements, two or three
    dragCount = random.randint(2, 3)
    if dragCount == 2:
        # Total error
        sumOffsetx = random.randint(-15, 15)
        action_chains.move_by_offset(targetOffsetX + sumOffsetx, 0)
        # Pause for a while
        action_chains.pause(self.__getRadomPauseScondes())
        # Correct the error to avoid robot detection
        action_chains.move_by_offset(-sumOffsetx, 0)
    elif dragCount == 3:
        # Total error
        sumOffsetx = random.randint(-15, 15)
        action_chains.move_by_offset(targetOffsetX + sumOffsetx, 0)
        # Pause for a while
        action_chains.pause(self.__getRadomPauseScondes())
        # Sum of corrections applied so far
        fixedOffsetX = 0
        # First correction
        if sumOffsetx < 0:
            offsetx = random.randint(sumOffsetx, 0)
        else:
            offsetx = random.randint(0, sumOffsetx)
        fixedOffsetX = fixedOffsetX + offsetx
        action_chains.move_by_offset(-offsetx, 0)
        action_chains.pause(self.__getRadomPauseScondes())
        # Final correction
        action_chains.move_by_offset(-sumOffsetx + fixedOffsetX, 0)
        action_chains.pause(self.__getRadomPauseScondes())
    else:
        raise Exception("Is there something wrong with the system?!")
    # See ActionChains.drag_and_drop_by_offset()
    action_chains.release()
    action_chains.perform()
```
Final section (complete code)
|Sample code and result screenshots
The complete sample code for this article:
```python
# -*- coding: utf-8 -*-
# @Date: 2020/2/15 2:09
# @Author: Lu
# @Description: Bilibili slider captcha recognition. Bilibili has an anti-crawler
#   check: dragging too fast triggers "the monster ate the puzzle, please try again".
#   Bilibili currently renders three canvas images. Comparing the pixels of the
#   complete image against the gapped background gives the gap's x offset; minus
#   the piece's initial blank offset, that is the distance to slide.
#   This script instead uses edge detection: it looks for a gray vertical line in
#   the gapped background (the target position of the piece). It can fail
#   occasionally, but should generalize to more sites.
import base64
import copy
import random
import time
from io import BytesIO

from PIL import Image
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait


class VeriImageUtil():
    def __init__(self):
        self.defaultConfig = {
            "grayOffset": 20,
            "opaque": 1,
            "minVerticalLineCount": 30
        }
        self.config = copy.deepcopy(self.defaultConfig)

    def updateConfig(self, config):
        for k in self.config:
            if k in config.keys():
                self.config[k] = config[k]

    def getMaxOffset(self, *args):
        # Maximum deviation of the arguments from their average
        av = sum(args) / len(args)
        maxOffset = 0
        for a in args:
            offset = abs(av - a)
            if offset > maxOffset:
                maxOffset = offset
        return maxOffset

    def isGrayPx(self, r, g, b):
        # Grayscale pixel, within the allowed fluctuation offset
        return self.getMaxOffset(r, g, b) < self.config["grayOffset"]

    def isDarkStyle(self, r, g, b):
        # Dark pixel
        return r < 128 and g < 128 and b < 128

    def isOpaque(self, px):
        # Opaque pixel
        return px[3] >= 255 * self.config["opaque"]

    def getVerticalLineOffsetX(self, bgImage):
        # Scan columns left to right; return the x of the first column that
        # contains a long enough run of dark, gray, opaque pixels
        bgBytes = bgImage.load()
        x = 0
        while x < bgImage.size[0]:
            y = 0
            # Length of the current run of gray pixels in this column
            verticalLineCount = 0
            while y < bgImage.size[1]:
                px = bgBytes[x, y]
                r, g, b = px[0], px[1], px[2]
                if self.isDarkStyle(r, g, b) and self.isGrayPx(r, g, b) and self.isOpaque(px):
                    verticalLineCount += 1
                else:
                    verticalLineCount = 0
                    y += 1
                    continue
                if verticalLineCount >= self.config["minVerticalLineCount"]:
                    # Enough consecutive gray pixels: treat this column as the edge
                    return x
                y += 1
            x += 1
        return None


class DragUtil():
    def __init__(self, driver):
        self.driver = driver

    def __getRadomPauseScondes(self):
        """
        :return: random pause time between drag movements
        """
        return random.uniform(0.6, 0.9)

    def simulateDragX(self, source, targetOffsetX):
        """
        Imitate a human drag: move quickly along the X axis (with some error),
        pause, then correct the error.
        Prevents robot-detection failures such as "the picture was eaten by the monster".
        :param source: html element to drag
        :param targetOffsetX: drag distance along the x axis
        :return: None
        """
        action_chains = webdriver.ActionChains(self.driver)
        # Press and hold, ready to drag
        action_chains.click_and_hold(source)
        # Number of drag movements, two or three
        dragCount = random.randint(2, 3)
        if dragCount == 2:
            # Total error
            sumOffsetx = random.randint(-15, 15)
            action_chains.move_by_offset(targetOffsetX + sumOffsetx, 0)
            # Pause for a while
            action_chains.pause(self.__getRadomPauseScondes())
            # Correct the error to avoid robot detection
            action_chains.move_by_offset(-sumOffsetx, 0)
        elif dragCount == 3:
            # Total error
            sumOffsetx = random.randint(-15, 15)
            action_chains.move_by_offset(targetOffsetX + sumOffsetx, 0)
            # Pause for a while
            action_chains.pause(self.__getRadomPauseScondes())
            # Sum of corrections applied so far
            fixedOffsetX = 0
            # First correction
            if sumOffsetx < 0:
                offsetx = random.randint(sumOffsetx, 0)
            else:
                offsetx = random.randint(0, sumOffsetx)
            fixedOffsetX = fixedOffsetX + offsetx
            action_chains.move_by_offset(-offsetx, 0)
            action_chains.pause(self.__getRadomPauseScondes())
            # Final correction
            action_chains.move_by_offset(-sumOffsetx + fixedOffsetX, 0)
            action_chains.pause(self.__getRadomPauseScondes())
        else:
            raise Exception("Is there something wrong with the system?!")
        # See ActionChains.drag_and_drop_by_offset()
        action_chains.release()
        action_chains.perform()

    def simpleSimulateDragX(self, source, targetOffsetX):
        """
        Simple human-like drag: move quickly along the X axis straight to the
        target position, pause for a while, then release.
        Bilibili only checks whether there is a pause, so this is sufficient.
        :param source: element to drag
        :param targetOffsetX: drag distance along the x axis
        :return: None
        """
        action_chains = webdriver.ActionChains(self.driver)
        # Press and hold, ready to drag
        action_chains.click_and_hold(source)
        action_chains.pause(0.2)
        action_chains.move_by_offset(targetOffsetX, 0)
        action_chains.pause(0.6)
        action_chains.release()
        action_chains.perform()


def checkVeriImage(driver):
    WebDriverWait(driver, 5).until(
        lambda driver: driver.find_element_by_css_selector('.geetest_canvas_bg.geetest_absolute'))
    time.sleep(1)
    im_info = driver.execute_script(
        'return document.getElementsByClassName("geetest_canvas_bg geetest_absolute")[0].toDataURL("image/png");')
    # Get the base64-encoded image data
    im_base64 = im_info.split(',')[1]
    # Decode to bytes
    im_bytes = base64.b64decode(im_base64)
    with open('./temp_bg.png', 'wb') as f:
        # Save the image locally for easy previewing
        f.write(im_bytes)
    image_data = BytesIO(im_bytes)
    bgImage = Image.open(image_data)
    # The puzzle piece starts about 5 pixels from the left edge
    offsetX = VeriImageUtil().getVerticalLineOffsetX(bgImage)
    print("offsetX: {}".format(offsetX))
    if not type(offsetX) == int:
        # Could not locate the gap, reload the captcha
        driver.find_element_by_css_selector(".geetest_refresh_1").click()
        checkVeriImage(driver)
        return
    elif offsetX == 0:
        # Could not locate the gap, reload the captcha
        driver.find_element_by_css_selector(".geetest_refresh_1").click()
        checkVeriImage(driver)
        return
    else:
        dragVeriImage(driver, offsetX)


def dragVeriImage(driver, offsetX):
    # The detected edge may be the gap's right edge, so retry with
    # several adjusted offsets
    eleDrag = driver.find_element_by_css_selector(".geetest_slider_button")
    dragUtil = DragUtil(driver)
    dragUtil.simulateDragX(eleDrag, offsetX - 10)
    time.sleep(2.5)
    if isNeedCheckVeriImage(driver):
        checkVeriImage(driver)
        return
    dragUtil.simulateDragX(eleDrag, offsetX - 6)
    time.sleep(2.5)
    if isNeedCheckVeriImage(driver):
        checkVeriImage(driver)
        return
    # The puzzle piece is about 40 pixels wide
    dragUtil.simulateDragX(eleDrag, offsetX - 56)
    time.sleep(2.5)
    if isNeedCheckVeriImage(driver):
        checkVeriImage(driver)
        return
    dragUtil.simulateDragX(eleDrag, offsetX - 52)
    if isNeedCheckVeriImage(driver):
        checkVeriImage(driver)
        return


def isNeedCheckVeriImage(driver):
    if driver.find_element_by_css_selector(".geetest_panel_error").is_displayed():
        driver.find_element_by_css_selector(".geetest_panel_error_content").click()
        return True
    return False


def task():
    # Important: configure the browser so major sites cannot detect
    # that Selenium is in use
    # options = webdriver.ChromeOptions()
    # options.add_experimental_option('excludeSwitches', ['enable-automation'])
    options = webdriver.FirefoxOptions()
    driver = webdriver.Firefox(executable_path=r"../../../res/webdriver/geckodriver_x64_0.26.0.exe", options=options)
    driver.get('https://passport.bilibili.com/login')
    time.sleep(3)
    driver.find_element_by_css_selector("#login-username").send_keys("1234567")
    driver.find_element_by_css_selector("#login-passwd").send_keys("abcdefg")
    driver.find_element_by_css_selector(".btn.btn-login").click()
    time.sleep(2)
    checkVeriImage(driver)


# Check whether an element exists; returns True if found, otherwise False
def isElementExist(driver, css):
    try:
        driver.find_element_by_css_selector(css)
        return True
    except:
        return False


if __name__ == '__main__':
    task()
```