JS Reverse Engineering: Text Encryption and base64 Parsing

Keywords: Python crawler

This article uses the 58 Tongcheng recruitment site as an example to learn how to crawl text that is hidden by font encryption.

58 Tongcheng Recruitment


As you can see from the diagram, the affected text is replaced by question marks in the page source, so an ordinary crawl returns only those question marks.

Right-click the page and select View Source Code.

Inside <style> there is a complex block of code that contains the word base64, so we can guess that the font used for this encryption is embedded in the page as Base64 data.

It helps to learn how base64 works first. Strictly speaking it is an encoding rather than encryption: anyone can decode it without a key.
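As a quick illustration of what base64 does, here is a minimal sketch using Python's standard base64 module. The sample bytes are made up for the example; on the real page the payload would be the bytes of a font file.

```python
import base64

# Arbitrary sample bytes (an assumption for this demo, not data from the site).
raw = b"sample font bytes"

# Encoding turns arbitrary bytes into plain ASCII text that can sit inside CSS.
encoded = base64.b64encode(raw).decode("ascii")

# Decoding needs no key and gives back exactly the original bytes.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == raw)  # True
```

Because decoding is this simple, the base64 layer only hides the font file; the real obstacle is the shuffled character codes inside the font.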

Once we understand base64, we can decode the embedded font and map its glyphs back to the real characters.

import requests
import re
import base64
from fontTools.ttLib import TTFont


if __name__ == '__main__':
    url = "https://wh.58.com/searchjob/"
    headers = {
        # The "authority", "method", "path" and "scheme" entries copied from DevTools are
        # HTTP/2 pseudo-headers, not real request headers, so they are left out here.
        "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        # "br" is omitted: requests can only decode Brotli when the optional brotli package is installed.
        "accept-encoding": "gzip, deflate",
        "accept-language": "zh-CN,zh;q=0.9",
        "cache-control": "max-age=0",
        # Adjacent string literals are joined by Python, so the long cookie can span two lines.
        "cookie": 'f=n; commontopbar_new_city_info=158%7C%E6%AD%A6%E6%B1%89%7Cwh; f=n; commontopbar_new_city_info=158%7C%E6%AD%A6%E6%B1%89%7Cwh; myLat=""; myLon=""; id58=ETqdRGE6++PYpQ6p3yNSQw==; mcity=wh; f=n; commontopbar_new_city_info=158%7C%E6%AD%A6%E6%B1%89%7Cwh; city=wh; 58home=wh; commontopbar_ipcity=wh%7C%E6%AD%A6%E6%B1%89%7C0; 58tj_uuid=47caedb4-55a5-4a44-bb57-91bb7ce7b1c8; als=0; wmda_uuid=1cdfdf35506967429f31843b6f1bd427; wmda_new_uuid=1; xxzl_deviceid=Sr88BtIOyaCrETs%2Btdz%2FOETVatRQ8IXBAmHknmK2TR%2FArCWLa1XEpB11ixkOtJEH; sessionid=ec391b2a-d1fc-43f9-bd13-0d1772279374; param8716kop=1; wmda_visited_projects=%3B11187958619315%3B1731916484865%3B10104579731767; www58com="UserID=64731485788170&UserName=a7nfk1acl"; 58cooper="userid=64731485788170&username=a7nfk1acl"; 58uname=a7nfk1acl; passportAccount="atype=0&bstate=0"; fzq_h=cb094e9752a76e0476beca234279f9ad_1631255925348_b4fcd7a5eb1849ee921d695bc765d9ce_974345439; xxzl_smartid=e5fa2f7b05d87c127291a8b9c869d017; Hm_lvt_5bcc464efd3454091cf2095d3515ea05=1631255952; gr_user_id=c1d86aed-2c52-438e-ae31-2b78bafda3b6; fzq_js_zhaopin_list_pc=9186665da5bdeef3325b098477c9314f_1631256409752_7; Hm_lpvt_5bcc464efd3454091cf2095d3515ea05=1631256410; fzq_js_infodetailweb=a262eaeae94a0e65caa2c2f5a37bb7ff_1631257205157_6; Hm_lvt_b2c7b5733f1b8ddcfc238f97b417f4dd=1631257205; Hm_lpvt_b2c7b5733f1b8ddcfc238f97b417f4dd=1631257205; ppStore_fingerprint=C335BE454FB81AD76CE728603E7B7AB9401A07A52F0B3EBF%EF%BC%BF1631257206262; Hm_lvt_a3013634de7e7a5d307653e15a0584cf=1631259334; isSmartSortTipShowed=true; param8616=0; ljrzfc=1; wmda_session_id_1731916484865=1631263358082-ff45bcc2-a08f-52b5; utm_source=; new_uv=3; init_refer=https%253A%252F%252Fwh.58.com%252F; spm=; new_session=0; '
                  'PPU="UID=64731485788170&UN=a7nfk1acl&TT=69a7a0ac8dd2aadfd2c4fae1f8a8d5a4&PBODY=dFCDDZaugxF1PCdTPzBsEOwwvP98N8Eh0F3ukdBsvIvHs_rTw7-qXNvOcxnPe4ghLDGAhk7Y0tD6xi62ED9_zEe5elBOieXR677BUFJd6nrDbawquBCM9IqSxdGv7Z0RyNZ1R8jTu5COSveRJM290xuWxtlCNSih1bZnJHtSWjE&VER=1&CUID=a04Tf6vLsnilkXjPOsrH0Q"; JSESSIONID=D6B42D72127A83BC043DFEDA6EE25517; jl_list_left_banner=9; Hm_lpvt_a3013634de7e7a5d307653e15a0584cf=1631265120; xxzl_cid=20862149cfa1444ab8db2d95095022c9; xzuid=168585f4-7646-4762-b558-26c3057c9b32',
        "referer": "https://wh.58.com/searchjob/?param8616=0&PGTID=0d302409-0009-e9c1-b0be-53e53305b2b2&ClickID=5",
        "sec-ch-ua": '"Google Chrome";v="93", " Not;A Brand";v="99", "Chromium";v="93"',
        "sec-ch-ua-mobile": "?0",
        "sec-ch-ua-platform": '"Windows"',
        "sec-fetch-dest": "document",
        "sec-fetch-mode": "navigate",
        "sec-fetch-site": "same-origin",
        "sec-fetch-user": "?1",
        "upgrade-insecure-requests": "1",
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36",
        }

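    # Fetch the listing page; the obfuscated font is embedded in its <style> block.
    html = requests.get(url=url, headers=headers).text
    # Capture everything between "base64," and the next closing parenthesis.
    match = re.search(r"base64,(.*?)\)", html, flags=re.S)
    if match is None:
        raise SystemExit("No base64 font found in the page source")
    # Decode the Base64 text back into the raw font bytes.
    font_bytes = base64.b64decode(match.group(1))
    # Save the font so it can be opened in a font editor.
    with open("ztku2.woff", "wb") as f:
        f.write(font_bytes)

    # Dump the font tables (including the glyph outlines) to XML for inspection.
    fonts = TTFont("ztku2.woff")
    fonts.saveXML("ztku2.xml")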

Running the script produces two files, ztku2.woff and ztku2.xml. Next you need FontCreator, a font editor for which free downloads can be found online.

Once it is installed, open two woff files in FontCreator (run the program twice, saving under a different file name each time, to get two different files).



With the fonts open, click the first word of the first woff file. The last line of the tooltip that appears shows its code:

$E082

For the word in the second diagram, the last line of the tooltip is

$EC04

Write down these two codes, then open the two xml files.
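To find a code from FontCreator in the xml file or in the page source, it helps to convert it between spellings. The sketch below is only an illustration: the helper name code_point_forms is made up, and the "uniXXXX" glyph naming is an assumption about how the xml names its glyphs, so check it against your own file.

```python
def code_point_forms(fontcreator_code: str) -> dict:
    """Convert a FontCreator-style code such as '$E082' into related spellings."""
    value = int(fontcreator_code.lstrip("$"), 16)
    return {
        "char": chr(value),                 # the private-use character itself
        "html_entity": f"&#x{value:x};",    # how it can appear in HTML source
        "glyph_name": f"uni{value:04X}",    # ASSUMED glyph naming in the xml dump
    }


forms = code_point_forms("$E082")
print(forms["html_entity"])  # &#xe082;
print(forms["glyph_name"])   # uniE082
```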

In each xml file, find the <contour> entries for the glyph with that code, and think of each x, y pair as a coordinate.
Take the two points selected in the diagram and compute x1-x2 and y1-y2 in each file.
Comparing the two subtractions leads to a surprising discovery.
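The subtraction can be tried out in code. This is a minimal sketch that parses two small xml fragments with the standard library. The fragments follow the layout that saveXML writes (TTGlyph, contour, pt), but the glyph names and every coordinate are invented for the demo, and point_deltas is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented fragments: the same shape, shifted by (+5, -3) in the second font.
FONT_A = """
<ttFont><glyf>
  <TTGlyph name="uniE082" xMin="10" yMin="20" xMax="60" yMax="120">
    <contour>
      <pt x="10" y="20" on="1"/>
      <pt x="60" y="20" on="1"/>
      <pt x="60" y="120" on="1"/>
    </contour>
  </TTGlyph>
</glyf></ttFont>
"""

FONT_B = """
<ttFont><glyf>
  <TTGlyph name="uniEC04" xMin="15" yMin="17" xMax="65" yMax="117">
    <contour>
      <pt x="15" y="17" on="1"/>
      <pt x="65" y="17" on="1"/>
      <pt x="65" y="117" on="1"/>
    </contour>
  </TTGlyph>
</glyf></ttFont>
"""


def point_deltas(ttx_text, glyph_name):
    """Return the (dx, dy) steps between consecutive points of one glyph."""
    root = ET.fromstring(ttx_text.strip())
    for glyph in root.iter("TTGlyph"):
        if glyph.get("name") == glyph_name:
            pts = [(int(p.get("x")), int(p.get("y"))) for p in glyph.iter("pt")]
            return tuple((x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(pts, pts[1:]))
    raise KeyError(glyph_name)


deltas_a = point_deltas(FONT_A, "uniE082")
deltas_b = point_deltas(FONT_B, "uniEC04")
print(deltas_a == deltas_b)  # True
```

The absolute coordinates differ between the two fragments, but the steps between points match, which is the pattern the comparison is looking for.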

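The two results are identical: although the codes and absolute coordinates differ between the files, subtracting the first two coordinates gives the same values, and that is the rule we are looking for. We work out this rule for each character, store the rules in a dictionary, and let each one map to its real character.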
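A minimal sketch of such a dictionary follows. The coordinate lists, the glyph names, and the characters "A" and "B" are all placeholders; in practice the reference entries would come from characters you identified by eye in FontCreator. The names fingerprint, REFERENCE and decode_font are hypothetical.

```python
def fingerprint(points):
    """Steps between consecutive points; unchanged when a glyph is only shifted."""
    return tuple((x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Placeholder reference table: fingerprint -> real character.
REFERENCE = {
    fingerprint([(10, 20), (60, 20), (60, 120)]): "A",
    fingerprint([(0, 0), (30, 40), (30, 0)]): "B",
}


def decode_font(glyph_points):
    """Map each glyph name of a newly downloaded font to a known character."""
    return {name: REFERENCE.get(fingerprint(pts)) for name, pts in glyph_points.items()}


# Invented points for a "new" font whose codes and positions have changed.
new_font = {
    "uniEC04": [(15, 17), (65, 17), (65, 117)],
    "uniF001": [(7, 7), (37, 47), (37, 7)],
}

mapping = decode_font(new_font)
print(mapping)  # {'uniEC04': 'A', 'uniF001': 'B'}
```

Glyphs with no matching fingerprint map to None, which makes gaps in the reference table easy to spot.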

&#xe8b5;-&#xef32;year

Finally, use replace to swap each special code in the page source for the real character from our mapping, and the font encryption is cracked!
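The replacement step might look like the sketch below. It uses re.sub instead of repeated replace calls so that every entity is handled in one pass, which is a swapped-in variation rather than the exact method above. The digits in CODE_TO_CHAR are invented placeholders, not the real values behind these codes.

```python
import re

# Placeholder mapping: code point -> real character (values are NOT from the site).
CODE_TO_CHAR = {0xE8B5: "3", 0xEF32: "5"}

# Matches hexadecimal character references such as &#xe8b5;
ENTITY = re.compile(r"&#x([0-9a-fA-F]+);")


def decode_entities(html):
    """Replace known obfuscated entities; leave unknown ones untouched."""
    def swap(match):
        code = int(match.group(1), 16)
        return CODE_TO_CHAR.get(code, match.group(0))
    return ENTITY.sub(swap, html)


decoded = decode_entities("&#xe8b5;-&#xef32;year")
print(decoded)  # 3-5year
```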

Posted by Renegade85 on Sat, 11 Sep 2021 10:21:56 -0700