Youdao Translate website: http://fanyi.youdao.com/
I. Testing
First request the site in the usual way, sending the request headers and the Form Data copied from the browser. Every response comes back as errorCode 50.
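A minimal sketch of how the rejection described above could be detected in code. It assumes the response body is a JSON object with an `errorCode` field; the exact field name and the helper name `is_rejected` are assumptions for illustration, not taken from the site's documentation.

```python
import json


def is_rejected(response_text):
    """Return True if the body is a JSON object whose errorCode is 50."""
    try:
        body = json.loads(response_text)
    except ValueError:
        # Not JSON at all, so it is not the errorCode 50 rejection.
        return False
    return isinstance(body, dict) and body.get("errorCode") == 50
```

Calling this on `response.text` after each attempt makes it easy to tell a rejected request from one that actually carried a translation.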
II. Analysis
Open the browser's developer tools.
Enter a test string in the translation box on the left, and note which JS file initiated the request.
The file is fanyi.min.js. Note that a .min.js file is a minified version of a JS file.
Right-click the JS file and choose "Reveal in Sources panel", which jumps to the Sources panel.
Click the {} button to pretty-print the JS file and get readable source code.
Analysis of the Form Data of the POST request
Each time you type something into the translation box, an asynchronous request appears under the XHR tab. The response to that request contains the translation result.
Looking at the request, the following Form Data parameters change on every request. Sites often use JavaScript to sign or obfuscate transmitted data in this way:
- salt
- sign
- ts
- bv
Try to search for these parameters in the fanyi.min.js file just now
    define("newweb/common/service", ["./utils", "./md5", "./jquery-1.7"], function(e, t) {
        var n = e("./jquery-1.7");
        e("./utils");
        e("./md5");
        var r = function(e) {
            var t = n.md5(navigator.appVersion),
                r = "" + (new Date).getTime(),
                i = r + parseInt(10 * Math.random(), 10);
            return {
                ts: r,
                bv: t,
                salt: i,
                sign: n.md5("fanyideskweb" + e + i + "n%A-rKaT5fb[Gy?;N5@Tj")
            }
        };
After reading the code and testing, the role of each parameter is roughly clear:
ts: The current timestamp as a string. Note the difference between JS and Python timestamps: (new Date).getTime() returns milliseconds, while Python's time.time() returns seconds as a float, so the Python value must be multiplied by 1000 and truncated to an integer.
bv: The MD5 hash of navigator.appVersion. Entering navigator.appVersion in the console shows that it is almost the same as the User-Agent, except that it lacks the leading "Mozilla/".
salt: The ts string with one random digit in the range [0, 9] appended.
sign: The MD5 hash of "fanyideskweb" + e + salt + "n%A-rKaT5fb[Gy?;N5@Tj", where e is the text entered in the translation box.
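The relationship between the User-Agent and navigator.appVersion noted for bv can be sketched as follows. This assumes, as in the Chrome example in this article, that appVersion is simply the User-Agent with the leading "Mozilla/" removed; other browsers may differ. The helper names `app_version_from_user_agent` and `make_bv` are hypothetical.

```python
import hashlib


def app_version_from_user_agent(user_agent):
    """Strip the leading "Mozilla/" to approximate navigator.appVersion."""
    prefix = "Mozilla/"
    if user_agent.startswith(prefix):
        return user_agent[len(prefix):]
    # Assumption: without the prefix, leave the string unchanged.
    return user_agent


def make_bv(user_agent):
    """MD5 hex digest of the derived appVersion, mirroring n.md5(navigator.appVersion)."""
    app_version = app_version_from_user_agent(user_agent)
    return hashlib.md5(app_version.encode("utf-8")).hexdigest()
```

Deriving bv from the same User-Agent string that is sent in the request headers keeps the two values consistent, instead of maintaining two nearly identical hard-coded strings.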
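Now the principle is clear. Every time text is entered in the translation box, it is sent to the server together with these values, which the server uses to validate the request. MD5 is a hash rather than encryption, so once the inputs are known, the values are easy to reproduce and the request returns the content we want.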
III. Code Implementation
First, the important parameters above are implemented in Python.
My JavaScript knowledge is limited, so the Python code took many rounds of debugging.
    import hashlib
    import random
    import time

    # Current timestamp in milliseconds, as a string
    # JS: r = "" + (new Date).getTime()
    ts = r = str(int(time.time() * 1000))
    print(ts)

    # Timestamp string with one random digit (0-9) appended
    # JS: salt = i = r + parseInt(10 * Math.random(), 10)
    i = r + str(random.randint(0, 9))
    salt = int(i)
    print(salt)

    # JS: bv = t = n.md5(navigator.appVersion)
    # The User-Agent would be "Mozilla/5.0 (Windows NT 6.1; ...";
    # navigator.appVersion is the same string without the leading "Mozilla/"
    app_version = "5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"
    bv = hashlib.md5(app_version.encode(encoding="utf-8")).hexdigest()
    print(bv)

    # JS: sign = n.md5("fanyideskweb" + e + i + "n%A-rKaT5fb[Gy?;N5@Tj")
    # "translate" stands in for e, the text typed into the translation box
    sign = hashlib.md5(("fanyideskweb" + "translate" + i + "n%A-rKaT5fb[Gy?;N5@Tj").encode(encoding="utf-8")).hexdigest()
    print(sign)
Finally, the form data and headers are assembled, the request is sent with requests.post, and the response contains the translation result.
    import time
    import random
    import hashlib
    import requests


    class YouDao:
        def __init__(self, url):
            self.url = url

        def get_data(self, content):
            # Current timestamp in milliseconds, as a string
            # JS: r = "" + (new Date).getTime()
            ts = r = str(int(time.time() * 1000))
            print(ts)

            # Timestamp string with one random digit (0-9) appended
            # JS: salt = i = r + parseInt(10 * Math.random(), 10)
            i = r + str(random.randint(0, 9))
            salt = int(i)
            print(salt)

            # JS: bv = t = n.md5(navigator.appVersion)
            # navigator.appVersion is the User-Agent without the leading "Mozilla/"
            app_version = "5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36"
            bv = hashlib.md5(app_version.encode(encoding="utf-8")).hexdigest()
            print(bv)

            # JS: sign = n.md5("fanyideskweb" + e + i + "n%A-rKaT5fb[Gy?;N5@Tj")
            sign = hashlib.md5(("fanyideskweb" + content + i + "n%A-rKaT5fb[Gy?;N5@Tj").encode(encoding="utf-8")).hexdigest()
            print(sign)

            # Form Data, as captured from the browser request
            data = {
                'i': content,
                'from': "AUTO",
                'to': "AUTO",
                'smartresult': "dict",
                'client': "fanyideskweb",
                'salt': salt,
                'sign': sign,
                'ts': ts,
                'bv': bv,
                'doctype': "json",
                'version': "2.1",
                'keyfrom': "fanyi.web",
                'action': "FY_BY_REALTlME",
                'cache-control': "no-cache",
                'typoResult': 'false'
            }

            # Request headers; the last cookie carries the current timestamp
            headers = {
                "Cookie": ('OUTFOX_SEARCH_USER_ID_NCOO=152157970.25320017; '
                           'OUTFOX_SEARCH_USER_ID="-1201548300@10.169.0.82"; '
                           '_ga=GA1.2.2012357562.1553937054; '
                           '_ntes_nnid=6497242c22b4e2ef0d2f403b1c6b7bf2,1560146318693; '
                           'JSESSIONID=aaarX_cgU_By9_aKjPkXw; '
                           '___rl__test__cookies={}').format(ts),
                'Accept': "application/json, text/javascript, */*; q=0.01",
                'Origin': "http://fanyi.youdao.com",
                'User-Agent': "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36",
                'Referer': "http://fanyi.youdao.com/",
                'X-Requested-With': "XMLHttpRequest"
            }
            return data, headers

        def translate(self, content):
            data, headers = self.get_data(content)
            response = requests.post(self.url, data=data, headers=headers)
            print(response.text)


    if __name__ == "__main__":
        url = "http://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule"
        instance = YouDao(url)
        instance.translate("good")
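The script above prints the raw response text. A sketch of how the translated text might be pulled out of it is shown below. The response layout assumed here (an `errorCode` of 0 on success and a `translateResult` list of paragraphs, each a list of segments with a `tgt` field) is an assumption based on what the XHR response looks like in the browser and may not match the live service; `extract_translation` is a hypothetical helper.

```python
import json


def extract_translation(response_text):
    """Join the translated segments of a successful response into one string."""
    body = json.loads(response_text)
    if body.get("errorCode") != 0:
        raise ValueError("request rejected, errorCode=%r" % body.get("errorCode"))
    # Assumed layout: translateResult is a list of paragraphs,
    # each paragraph a list of {"src": ..., "tgt": ...} segments.
    paragraphs = ("".join(seg["tgt"] for seg in para)
                  for para in body["translateResult"])
    return "\n".join(paragraphs)
```

In `translate`, `print(response.text)` could then be replaced with `print(extract_translation(response.text))` once the request is no longer rejected.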
IV. Summary of Thoughts
- First find out how the website transmits its data and which fields it signs or obfuscates.
- Debug the JS to find the rules that generate those fields. This step is usually the most painful.
- Reimplement the JS logic in Python, then debug until you get the desired results.
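One way to shorten the debugging loop in the last step is to check the Python logic against a request captured from the browser before sending anything. The sketch below recomputes sign from the captured i and salt fields and compares it with the captured sign. The constants come from the fanyi.min.js excerpt above; the function names and the shape of the `form` dictionary (a dict of the captured Form Data fields) are assumptions for illustration.

```python
import hashlib

CLIENT = "fanyideskweb"
SECRET = "n%A-rKaT5fb[Gy?;N5@Tj"  # constant copied from the fanyi.min.js excerpt


def expected_sign(content, salt):
    """MD5 hex digest of client + content + salt + secret, as in the JS."""
    raw = CLIENT + content + str(salt) + SECRET
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def check_captured_form(form):
    """Return a list of mismatches between a captured form and the known rules."""
    problems = []
    ts, salt = str(form["ts"]), str(form["salt"])
    # salt should be the ts string followed by exactly one digit
    if not (salt.startswith(ts) and len(salt) == len(ts) + 1 and salt[-1].isdigit()):
        problems.append("salt is not ts plus one digit")
    if form["sign"] != expected_sign(form["i"], salt):
        problems.append("sign mismatch")
    return problems
```

An empty list means the Python reimplementation agrees with what the browser sent; a "sign mismatch" suggests the concatenation order, the secret string, or the text encoding differs from the JS.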
Disclaimer
This article is provided for learning and exchange only. Do not use it for improper purposes.
I have only just started learning web crawling, so corrections are welcome if anything here is wrong.