Cracking the BiliBili slider CAPTCHA with Python and evading bot detection

Keywords: Selenium Python Firefox pip

Cracking the BiliBili slider CAPTCHA with Python

|Use Python and Selenium to solve the BiliBili slider CAPTCHA while evading its bot (human-machine) detection.

Preparation

  • BiliBili login page: https://passport.bilibili.com/login
  • Python 3
  • pip install selenium (the WebDriver framework)
  • pip install pillow (the PIL fork, for image processing)
  • Chrome driver: http://chromedriver.storage.googleapis.com/index.html
  • Firefox driver (geckodriver): https://github.com/mozilla/geckodriver/releases (a quick setup check follows this list)
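
Before anything else, it is worth checking that the environment works. The following is a minimal sketch; it assumes geckodriver is on the PATH (otherwise pass executable_path=..., as in the complete script at the end of this article):

from selenium import webdriver

# Minimal environment check: launch Firefox and open the BiliBili login page.
# Assumes geckodriver is on the PATH; otherwise pass executable_path=... to webdriver.Firefox().
driver = webdriver.Firefox()
driver.get('https://passport.bilibili.com/login')
print(driver.title)
driver.quit()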

BiliBili's slider CAPTCHA is a Geetest-style drag-the-puzzle-piece widget (note the geetest_* class names in the HTML below).
This kind of CAPTCHA can be solved by driving a browser with Selenium and dragging the slider. There are two difficulties: determining how far to drag, and avoiding bot detection (the anti-crawler check).

Determining how far the slider needs to be dragged

There are three approaches:

  • Machine learning to locate the gap the slider has to reach
  • Comparing the pixels of the complete image with the missing-slider image to locate the gap
  • Edge detection to locate the gap

Each has its pros and cons. Machine learning requires training a model, which is a hassle (you could also check whether there is an online API to call). The other two approaches are described below.

Comparing the complete image with the missing-slider image

|This approach is only described here; this article does not implement it. For BiliBili it is the most accurate one (essentially 100%), but there is no guarantee it will keep working after future upgrades of the slider verification.

BiliBili's slider-verification widget contains three images: the complete image, the missing-slider image, and the slider image itself. All three are drawn on canvas elements, similar to the following:

Complete image:

Missing-slider image:

Slider image:

The HTML code is similar to:

<div class="geetest_canvas_img geetest_absolute" style="display: block;">
<div class="geetest_slicebg geetest_absolute">
	<canvas class="geetest_canvas_bg geetest_absolute" height="160" width="260"></canvas>
	<canvas class="geetest_canvas_slice geetest_absolute" width="260" height="160"></canvas>
</div>
<canvas class="geetest_canvas_fullbg geetest_fade geetest_absolute" height="160" width="260" style="display: none;"></canvas>
</div>

We only need to locate the canvas elements with Selenium, run JavaScript to read the canvas pixels, and then walk the pixels of the complete image and the missing-slider image in parallel. The x coordinate of the first column where they differ (allowing a small per-pixel error) is the gap position.
In addition, because the slider piece starts a few pixels away from the canvas origin, that initial offset has to be subtracted from the distance.
Finally, Selenium performs the drag.
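
This approach is not implemented in this article, but a minimal sketch of the idea might look like the following. It assumes the geetest_canvas_fullbg and geetest_canvas_bg canvases have already been exported to fullbg.png and bg.png via toDataURL() (the export is shown later for the background canvas); the file names and the threshold value are assumptions, not taken from the site:

from PIL import Image

def find_gap_x(full_path="./fullbg.png", bg_path="./bg.png", threshold=60):
    # Walk both images column by column; the first column containing a clearly
    # different pixel is the left edge of the missing-slider gap.
    full = Image.open(full_path).convert("RGB")
    bg = Image.open(bg_path).convert("RGB")
    width, height = full.size
    for x in range(width):
        for y in range(height):
            r1, g1, b1 = full.getpixel((x, y))
            r2, g2, b2 = bg.getpixel((x, y))
            # Allow a small per-channel error before treating the pixels as different
            if abs(r1 - r2) + abs(g1 - g2) + abs(b1 - b2) > threshold:
                return x
    return None

# The drag distance is roughly find_gap_x() minus the slider piece's initial
# left offset inside the canvas (a few pixels).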

Locating the gap with edge detection

|The gap is roughly a square, and the algorithm has to find where that square starts.


There are two variants:

  • The gap is square, so it has vertical edges, which show up as dark grayish pixels in the missing-slider image. Walk the pixels and look for a column of such dark gray pixels.
  • The gap in the missing-slider image is a closed dark-gray region. Find a closed region roughly the size of the slider; its position is where the slider has to be dragged.

The second variant is a bit more involved and is not implemented here; a rough sketch of the idea is given after the code below.
The first variant is implemented below. It can fail to detect anything, or detect the wrong thing; when that happens, refresh and try a new CAPTCHA. The detected edge may also be the right edge of the gap rather than the left (BiliBili's slider piece is not a plain rectangle and has a curved side), in which case the slider width has to be subtracted.

import copy

from PIL import Image


class VeriImageUtil():

    def __init__(self):
        self.defaultConfig = {
            "grayOffset": 20,
            "opaque": 1,
            "minVerticalLineCount": 30
        }
        self.config = copy.deepcopy(self.defaultConfig)

    def updateConfig(self, config):
        # temp = copy.deepcopy(config)
        for k in self.config:
            if k in config.keys():
                self.config[k] = config[k]

    def getMaxOffset(self, *args):
        # Maximum deviation of the arguments from their average
        av = sum(args) / len(args)

        maxOffset = 0
        for a in args:
            offset = abs(av - a)
            if offset > maxOffset:
                maxOffset = offset
        return maxOffset

    def isGrayPx(self, r, g, b):
        # Grayscale pixel check: all channels close to each other, within the allowed offset
        return self.getMaxOffset(r, g, b) < self.config["grayOffset"]

    def isDarkStyle(self, r, g, b):
        # Dark pixel: all channels below 128
        return r < 128 and g < 128 and b < 128

    def isOpaque(self, px):
        # Pixel is sufficiently opaque
        return px[3] >= 255 * self.config["opaque"]

    def getVerticalLineOffsetX(self, bgImage):
        # bgImage = Image.open("./image/bg.png")
        # bgImage.im.mode = 'RGBA'
        bgBytes = bgImage.load()

        x = 0
        while x < bgImage.size[0]:
            y = 0
            # Number of consecutive dark-gray pixels found in the current column
            verticalLineCount = 0
            while y < bgImage.size[1]:
                px = bgBytes[x, y]
                r = px[0]
                g = px[1]
                b = px[2]
                # alph = px[3]
                # print(px)
                if self.isDarkStyle(r, g, b) and self.isGrayPx(r, g, b) and self.isOpaque(px):
                    verticalLineCount += 1
                else:
                    verticalLineCount = 0
                    y += 1
                    continue

                if verticalLineCount >= self.config["minVerticalLineCount"]:
                    # Enough consecutive dark-gray pixels: treat this column as the vertical edge of the gap
                    # print(x, y)
                    return x

                y += 1

            x += 1
        return None


if __name__ == '__main__':
    bgImage = Image.open("./image/bg.png")
    veriImageUtil = VeriImageUtil()

    # veriImageUtil.updateConfig({
    #     "grayOffset": 20,
    #     "opaque": 0.6,
    #     "minVerticalLineCount": 10
    # })
    bgOffsetX = veriImageUtil.getVerticalLineOffsetX(bgImage)
    print("bgOffsetX:{} ".format(bgOffsetX))

Dragging the slider with Selenium (this naive version will fail)

First we need to get the CAPTCHA background image out of the page. Execute JavaScript to export the canvas as a base64-encoded PNG with toDataURL(), decode it in Python, compute the offset, and drag:

from selenium import webdriver
import time
import base64
from PIL import Image
from io import BytesIO
from selenium.webdriver.support.ui import WebDriverWait

def checkVeriImage(driver):    
    WebDriverWait(driver, 5).until(
        lambda driver: driver.find_element_by_css_selector('.geetest_canvas_bg.geetest_absolute'))
    time.sleep(1)
    im_info = driver.execute_script(
        'return document.getElementsByClassName("geetest_canvas_bg geetest_absolute")[0].toDataURL("image/png");')
    # Get base64 encoded picture information
    im_base64 = im_info.split(',')[1]
    # Convert to bytes
    im_bytes = base64.b64decode(im_base64)
    with open('./temp_bg.png', 'wb') as f:
        # Save the image locally for easy preview
        f.write(im_bytes)
        
    image_data = BytesIO(im_bytes)
    bgImage = Image.open(image_data)
    # The slider is 5-10 pixels away from the left
    offsetX = VeriImageUtil().getVerticalLineOffsetX(bgImage)
    eleDrag = driver.find_element_by_css_selector(".geetest_slider_button")
    action_chains = webdriver.ActionChains(driver)
    action_chains.drag_and_drop_by_offset(eleDrag, offsetX - 10, 0).perform()

This looks fine, but during verification we keep getting "the puzzle was eaten by the monster, please try again" and the check fails: a bot (crawler) operation has been detected.

Avoiding bot detection

|BiliBili's bot detection for the slider CAPTCHA is actually not very strict: it mainly checks whether there is a pause before the slider is released. At first I was misled by articles online that simulate the drag with physics, distance = v0 * t + (1/2) * a * t^2; for this case that approach is simply wrong.

Dragging the slider with webdriver.ActionChains(driver).drag_and_drop_by_offset(eleDrag, offsetX - 10, 0).perform() makes the verification fail: for BiliBili, the action is simply too fast.
Some people just add time.sleep(1). That does not work either; it still reports that the puzzle was eaten by the monster.

A human solving the slider roughly does this: quickly drag the piece to about the right position, correct the error, pause for a moment, and then release.

Simple implementation

The code can be kept simple; there is no need to simulate the error-correction step. Ordinary websites do not check for it, and BiliBili certainly does not.

    def simpleSimulateDragX(self, source, targetOffsetX):
        """
        Simple human-like drag: move quickly along the X axis straight to the target position,
        pause for a while, then release the mouse.
        BiliBili mainly checks whether there is a pause before release, so this is sufficient.
        :param source: 
        :param targetOffsetX: 
        :return: None
        """
        # Modeled on drag_and_drop_by_offset(eleDrag, offsetX - 10, 0), but implemented with individual move/pause steps
        action_chains = webdriver.ActionChains(self.driver)
        # Click, ready to drag
        action_chains.click_and_hold(source)
        action_chains.pause(0.2)
        action_chains.move_by_offset(targetOffsetX,0)
        action_chains.pause(0.6)
        action_chains.release()
        action_chains.perform()


Adding an error-correction step

Compared with the previous version, this one adds a correction step at the end: it first moves 10 pixels short of the target, pauses, and then covers the remaining distance with action_chains.move_by_offset(10, 0).

    def fixedSimulateDragX(self, source, targetOffsetX):
        # Modeled on drag_and_drop_by_offset(eleDrag, offsetX - 10, 0), but implemented with individual move/pause steps
        action_chains = webdriver.ActionChains(self.driver)
        # Click, ready to drag
        action_chains.click_and_hold(source)
        action_chains.pause(0.2)
        action_chains.move_by_offset(targetOffsetX-10,0)
        action_chains.pause(0.6)
        action_chains.move_by_offset(10,0)
        action_chains.pause(0.6)
        action_chains.release()
        action_chains.perform()

Ultimate implementation

|To look even more human, we can randomize the pause times, the number of drag moves, and the distances. This is not actually needed for BiliBili, and it makes each verification take a bit longer.

Multiple drag moves could be written as a loop, but that would make the code harder to read, so the cases are handled directly: at most two or three moves complete the correction process.

    def __getRadomPauseScondes(self):
        """
        :return: random pause time between drag moves, in seconds
        """
        return random.uniform(0.6, 0.9)

    def simulateDragX(self, source, targetOffsetX):
        """
        Human-like drag: quickly drag along the X axis (with a deliberate error), pause, then correct the error.
        This avoids bot detection and failures such as "the puzzle was eaten by the monster".
        :param source: the HTML element to drag
        :param targetOffsetX: target drag distance along the X axis
        :return: None
        """
        action_chains = webdriver.ActionChains(self.driver)
        # Click, ready to drag
        action_chains.click_and_hold(source)
        # Number of drag moves: two or three
        dragCount = random.randint(2, 3)
        if dragCount == 2:
            # Total error value
            sumOffsetx = random.randint(-15, 15)
            action_chains.move_by_offset(targetOffsetX + sumOffsetx, 0)
            # Pause for a while
            action_chains.pause(self.__getRadomPauseScondes())
            # Correct the deliberate error; without a pause and correction, bot detection fails the verification
            action_chains.move_by_offset(-sumOffsetx, 0)
        elif dragCount == 3:
            # Total error value
            sumOffsetx = random.randint(-15, 15)
            action_chains.move_by_offset(targetOffsetX + sumOffsetx, 0)
            # Pause for a while
            action_chains.pause(self.__getRadomPauseScondes())

            # Sum of corrected errors
            fixedOffsetX = 0
            # First correction error
            if sumOffsetx < 0:
                offsetx = random.randint(sumOffsetx, 0)
            else:
                offsetx = random.randint(0, sumOffsetx)

            fixedOffsetX = fixedOffsetX + offsetx
            action_chains.move_by_offset(-offsetx, 0)
            action_chains.pause(self.__getRadomPauseScondes())

            # Last correction error
            action_chains.move_by_offset(-sumOffsetx + fixedOffsetX, 0)
            action_chains.pause(self.__getRadomPauseScondes())

        else:
            raise Exception("Is there something wrong with the system?!")

        # Refer to action_chains.drag_and_drop_by_offset()
        action_chains.release()
        action_chains.perform()

Final chapter (complete code)

|Full sample code and screenshots.

The complete sample code for this article.

# -*- coding: utf-8 -*-
# @Date: 2020/2/15 2:09
# @Author: Lu
# @Description: BiliBili slider CAPTCHA solver. BiliBili has an anti-bot check: dragging too fast triggers
#   "the monster ate the puzzle, please try again".
# The widget currently draws three canvases. Comparing the pixels of the complete image with the
#   missing-slider background gives the x-axis offset of the gap; subtracting the slider's initial left
#   offset gives the distance to drag.
# This script uses edge detection instead: it looks for a dark-gray vertical line in the missing-slider
#   background, which marks the target position. It can occasionally fail, but it is more broadly applicable.


from selenium import webdriver
import time
import base64
from PIL import Image
from io import BytesIO
from selenium.webdriver.support.ui import WebDriverWait
import random
import copy


class VeriImageUtil():

    def __init__(self):
        self.defaultConfig = {
            "grayOffset": 20,
            "opaque": 1,
            "minVerticalLineCount": 30
        }
        self.config = copy.deepcopy(self.defaultConfig)

    def updateConfig(self, config):
        # temp = copy.deepcopy(config)
        for k in self.config:
            if k in config.keys():
                self.config[k] = config[k]

    def getMaxOffset(self, *args):
        # Maximum deviation of the arguments from their average
        av = sum(args) / len(args)

        maxOffset = 0
        for a in args:
            offset = abs(av - a)
            if offset > maxOffset:
                maxOffset = offset
        return maxOffset

    def isGrayPx(self, r, g, b):
        # Grayscale pixel check: all channels close to each other, within the allowed offset
        return self.getMaxOffset(r, g, b) < self.config["grayOffset"]

    def isDarkStyle(self, r, g, b):
        # Dark pixel: all channels below 128
        return r < 128 and g < 128 and b < 128

    def isOpaque(self, px):
        # Pixel is sufficiently opaque
        return px[3] >= 255 * self.config["opaque"]

    def getVerticalLineOffsetX(self, bgImage):
        # bgImage = Image.open("./image/bg.png")
        # bgImage.im.mode = 'RGBA'
        bgBytes = bgImage.load()

        x = 0
        while x < bgImage.size[0]:
            y = 0
            # Number of consecutive dark-gray pixels found in the current column
            verticalLineCount = 0
            while y < bgImage.size[1]:
                px = bgBytes[x, y]
                r = px[0]
                g = px[1]
                b = px[2]
                # alph = px[3]
                # print(px)
                if self.isDarkStyle(r, g, b) and self.isGrayPx(r, g, b) and self.isOpaque(px):
                    verticalLineCount += 1
                else:
                    verticalLineCount = 0
                    y += 1
                    continue

                if verticalLineCount >= self.config["minVerticalLineCount"]:
                    # Enough consecutive dark-gray pixels: treat this column as the vertical edge of the gap
                    # print(x, y)
                    return x

                y += 1

            x += 1
        return None


class DragUtil():
    def __init__(self, driver):
        self.driver = driver

    def __getRadomPauseScondes(self):
        """
        :return: random pause time between drag moves, in seconds
        """
        return random.uniform(0.6, 0.9)

    def simulateDragX(self, source, targetOffsetX):
        """
        Human-like drag: quickly drag along the X axis (with a deliberate error), pause, then correct the error.
        This avoids bot detection and failures such as "the puzzle was eaten by the monster".
        :param source: the HTML element to drag
        :param targetOffsetX: target drag distance along the X axis
        :return: None
        """
        action_chains = webdriver.ActionChains(self.driver)
        # Click, ready to drag
        action_chains.click_and_hold(source)
        # Number of drag moves: two or three
        dragCount = random.randint(2, 3)
        if dragCount == 2:
            # Total error value
            sumOffsetx = random.randint(-15, 15)
            action_chains.move_by_offset(targetOffsetX + sumOffsetx, 0)
            # Pause for a while
            action_chains.pause(self.__getRadomPauseScondes())
            # Correct the deliberate error; without a pause and correction, bot detection fails the verification
            action_chains.move_by_offset(-sumOffsetx, 0)
        elif dragCount == 3:
            # Total error value
            sumOffsetx = random.randint(-15, 15)
            action_chains.move_by_offset(targetOffsetX + sumOffsetx, 0)
            # Pause for a while
            action_chains.pause(self.__getRadomPauseScondes())

            # Sum of corrected errors
            fixedOffsetX = 0
            # First correction error
            if sumOffsetx < 0:
                offsetx = random.randint(sumOffsetx, 0)
            else:
                offsetx = random.randint(0, sumOffsetx)

            fixedOffsetX = fixedOffsetX + offsetx
            action_chains.move_by_offset(-offsetx, 0)
            action_chains.pause(self.__getRadomPauseScondes())

            # Last correction error
            action_chains.move_by_offset(-sumOffsetx + fixedOffsetX, 0)
            action_chains.pause(self.__getRadomPauseScondes())

        else:
            raise Exception("Is there something wrong with the system?!")

        # Refer to action_chains.drag_and_drop_by_offset()
        action_chains.release()
        action_chains.perform()

    def simpleSimulateDragX(self, source, targetOffsetX):
        """
        Simple human-like drag: move quickly along the X axis straight to the target position,
        pause for a while, then release the mouse.
        BiliBili mainly checks whether there is a pause before release, so this is sufficient.
        :param source: 
        :param targetOffsetX: 
        :return: None
        """

        action_chains = webdriver.ActionChains(self.driver)
        # Click, ready to drag
        action_chains.click_and_hold(source)
        action_chains.pause(0.2)
        action_chains.move_by_offset(targetOffsetX, 0)
        action_chains.pause(0.6)
        action_chains.release()
        action_chains.perform()

def checkVeriImage(driver):
    WebDriverWait(driver, 5).until(
        lambda driver: driver.find_element_by_css_selector('.geetest_canvas_bg.geetest_absolute'))
    time.sleep(1)
    im_info = driver.execute_script(
        'return document.getElementsByClassName("geetest_canvas_bg geetest_absolute")[0].toDataURL("image/png");')
    # Get base64 encoded picture information
    im_base64 = im_info.split(',')[1]
    # Convert to bytes
    im_bytes = base64.b64decode(im_base64)
    with open('./temp_bg.png', 'wb') as f:
        # Save picture to local
        f.write(im_bytes)

    image_data = BytesIO(im_bytes)
    bgImage = Image.open(image_data)
    # The slider is 5 pixels away from the left
    offsetX = VeriImageUtil().getVerticalLineOffsetX(bgImage)
    print("offsetX: {}".format(offsetX))
    if not type(offsetX) == int:
        # Unable to calculate, reload
        driver.find_element_by_css_selector(".geetest_refresh_1").click()
        checkVeriImage(driver)
        return
    elif offsetX == 0:
        # Unable to calculate, reload
        driver.find_element_by_css_selector(".geetest_refresh_1").click()
        checkVeriImage(driver)
        return
    else:
        dragVeriImage(driver, offsetX)


def dragVeriImage(driver, offsetX):
    # The detected edge may be the left edge or the right edge of the gap (slider width is about 40 px),
    # so try a few candidate drag distances
    eleDrag = driver.find_element_by_css_selector(".geetest_slider_button")
    dragUtil = DragUtil(driver)
    dragUtil.simulateDragX(eleDrag, offsetX - 10)
    time.sleep(2.5)

    if isNeedCheckVeriImage(driver):
        checkVeriImage(driver)
        return
    dragUtil.simulateDragX(eleDrag, offsetX - 6)

    time.sleep(2.5)
    if isNeedCheckVeriImage(driver):
        checkVeriImage(driver)
        return
    # Slider width about 40
    dragUtil.simulateDragX(eleDrag, offsetX - 56)

    time.sleep(2.5)
    if isNeedCheckVeriImage(driver):
        checkVeriImage(driver)
        return
    dragUtil.simulateDragX(eleDrag, offsetX - 52)

    if isNeedCheckVeriImage(driver):
        checkVeriImage(driver)
        return


def isNeedCheckVeriImage(driver):
    if driver.find_element_by_css_selector(".geetest_panel_error").is_displayed():
        driver.find_element_by_css_selector(".geetest_panel_error_content").click();
        return True
    return False


def task():
    # Important if using Chrome: exclude the 'enable-automation' switch so that sites are less likely to detect Selenium
    # options = webdriver.ChromeOptions()
    # options.add_experimental_option('excludeSwitches', ['enable-automation'])

    options = webdriver.FirefoxOptions()

    driver = webdriver.Firefox(executable_path=r"../../../res/webdriver/geckodriver_x64_0.26.0.exe",options=options)

    driver.get('https://passport.bilibili.com/login')
    time.sleep(3)

    driver.find_element_by_css_selector("#login-username").send_keys("1234567")
    driver.find_element_by_css_selector("#login-passwd").send_keys("abcdefg")
    driver.find_element_by_css_selector(".btn.btn-login").click()
    time.sleep(2)
    checkVeriImage(driver)

    pass


# Check whether an element matching the given CSS selector exists: returns True if found, otherwise False
def isElementExist(driver, css):
    try:
        driver.find_element_by_css_selector(css)
        return True
    except:
        return False


if __name__ == '__main__':
    task()


Posted by gunslinger008 on Sun, 16 Feb 2020 04:38:11 -0800