[Python 3 crawler] Anti-anti-crawling: the antitoken parameter

Keywords: Python Javascript encoding

1, Introduction

Among the anti-crawling measures websites use today, JavaScript encryption is very common. Typically, JavaScript is used to encrypt a single request parameter, such as a token or sign. The site in this example does exactly that: it uses JavaScript to encrypt a parameter called antitoken, and this post shows how to deal with it.

 

2, Site analysis

The site crawled in this example is: https://www.ly.com/hotel/beijing53/?spm0=10002.2001.1.0.1.4.17.

After the page has loaded, open the developer tools, switch to the XHR filter, and find the following request:

  

Notice that the request parameters include antitoken, which is an encrypted string. How do we obtain it?

 

3, Cracking steps

1. Locating the encryption method

In the developer tools, search globally for antitoken to find a JS file named list-newest.js. Switch to the Sources panel, open that file, and click "{}" in the lower left corner to format it for easier reading, as shown below:

  

Search for antitoken inside the JS file to find the method that generates it. The code is as follows:

e.getantitoken = function() {
    var t = $.cookie("wangba");
    t && void 0 !== t || (t = (new Date).getTime().toString(),
    $.cookie("wangba", t, {
        path: "/",
        domain: "ly.com"
    }));
    return (0,
    r["default"])(t)
};

The code first reads a cookie named wangba (literally "Internet bar"; why it has that name is anyone's guess). If wangba is empty, a new value is created, which is simply a 13-digit millisecond timestamp.

var t = $.cookie("wangba");
t && void 0 !== t || (t = (new Date).getTime().toString(),
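
The cookie logic above can be mirrored in Python. The following is a minimal sketch, not the site's own code: the function names are hypothetical, and a plain dict stands in for whatever cookie store the crawler actually uses.

```python
import time


def new_wangba_value():
    """Current Unix time in milliseconds as a string.

    Intended as the counterpart of (new Date).getTime().toString().
    """
    return str(int(time.time() * 1000))


def get_wangba(cookies):
    """Return the wangba cookie, creating it when missing or empty.

    `cookies` is a plain dict here (an assumption for illustration).
    """
    value = cookies.get("wangba")
    if not value:
        value = new_wangba_value()
        cookies["wangba"] = value
    return value


if __name__ == "__main__":
    jar = {}
    print(get_wangba(jar))  # a 13-digit string while the clock is in this era
```

Reusing the stored value matters because the JavaScript only generates a new timestamp when the cookie is absent.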

Set a breakpoint on the return line, refresh the page to start debugging, and step into the function called in the return statement, as shown in the following figure:

  

To understand how antitoken is generated, we need to know what each of the variables n, i, o, and s in this function refers to, so we keep debugging from the breakpoint.

First, n: the code shows that n = a(30). Stepping in at the breakpoint reveals the code behind n:

n = {
        rotl: function(t, e) {
            return t << e | t >>> 32 - e
        },
        rotr: function(t, e) {
            return t << 32 - e | t >>> e
        },
        endian: function(t) {
            if (t.constructor == Number)
                return 16711935 & n.rotl(t, 8) | 4278255360 & n.rotl(t, 24);
            for (var e = 0; e < t.length; e++)
                t[e] = n.endian(t[e]);
            return t
        },
        randomBytes: function(t) {
            for (var e = []; t > 0; t--)
                e.push(Math.floor(256 * Math.random()));
            return e
        },
        bytesToWords: function(t) {
            for (var e = [], a = 0, n = 0; a < t.length; a++,
            n += 8)
                e[n >>> 5] |= t[a] << 24 - n % 32;
            return e
        },
        wordsToBytes: function(t) {
            for (var e = [], a = 0; a < 32 * t.length; a += 8)
                e.push(t[a >>> 5] >>> 24 - a % 32 & 255);
            return e
        },
        bytesToHex: function(t) {
            for (var e = [], a = 0; a < t.length; a++)
                e.push((t[a] >>> 4).toString(16)),
                e.push((15 & t[a]).toString(16));
            return e.join("")
        },
        hexToBytes: function(t) {
            for (var e = [], a = 0; a < t.length; a += 2)
                e.push(parseInt(t.substr(a, 2), 16));
            return e
        },
        bytesToBase64: function(t) {
            for (var e = [], n = 0; n < t.length; n += 3)
                for (var i = t[n] << 16 | t[n + 1] << 8 | t[n + 2], r = 0; r < 4; r++)
                    8 * n + 6 * r <= 8 * t.length ? e.push(a.charAt(i >>> 6 * (3 - r) & 63)) : e.push("=");
            return e.join("")
        },
        base64ToBytes: function(t) {
            t = t.replace(/[^A-Z0-9+\/]/gi, "");
            for (var e = [], n = 0, i = 0; n < t.length; i = ++n % 4)
                0 != i && e.push((a.indexOf(t.charAt(n - 1)) & Math.pow(2, -2 * i + 8) - 1) << 2 * i | a.indexOf(t.charAt(n)) >>> 6 - 2 * i);
            return e
        }
    },
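
To port these helpers to Python, keep in mind that JavaScript bitwise operators work on 32-bit integers, while Python integers are unbounded. The sketch below covers the helpers the hashing routine relies on; the Python names are my own, and the words it produces are unsigned, whereas the JavaScript versions can yield negative numbers for the same bit pattern.

```python
MASK32 = 0xFFFFFFFF  # emulate JavaScript's 32-bit bitwise arithmetic


def rotl(word, bits):
    """Rotate a 32-bit word left by `bits` positions (n.rotl)."""
    word &= MASK32
    return ((word << bits) | (word >> (32 - bits))) & MASK32


def endian(word):
    """Reverse the byte order of a 32-bit word (n.endian for a number).

    16711935 is 0x00FF00FF and 4278255360 is 0xFF00FF00.
    """
    return (0x00FF00FF & rotl(word, 8)) | (0xFF00FF00 & rotl(word, 24))


def bytes_to_words(data):
    """Pack bytes into big-endian 32-bit words (n.bytesToWords)."""
    words = [0] * ((len(data) + 3) // 4)
    for index, byte in enumerate(data):
        words[index // 4] |= byte << (24 - 8 * (index % 4))
    return words


def words_to_bytes(words):
    """Unpack 32-bit words into bytes, most significant first (n.wordsToBytes)."""
    return [(word >> (24 - 8 * i)) & 0xFF for word in words for i in range(4)]


def bytes_to_hex(data):
    """Render bytes as a lowercase hex string (n.bytesToHex)."""
    return "".join(f"{byte:02x}" for byte in data)
```

For example, endian(0x12345678) gives 0x78563412, and bytes_to_hex([0, 255, 16]) gives "00ff10".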

Next, i: the code shows that i = a(12).utf8. Stepping in at the breakpoint reveals the code behind i:

{
    stringToBytes: function(t) {
        return a.bin.stringToBytes(unescape(encodeURIComponent(t)))
    },
    bytesToString: function(t) {
        return decodeURIComponent(escape(a.bin.bytesToString(t)))
    }
}
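
The unescape(encodeURIComponent(t)) idiom turns a string into its UTF-8 bytes, one byte per character. Assuming that reading, a Python equivalent needs only the built-in codec; the function names below are hypothetical.

```python
def utf8_string_to_bytes(text):
    """Encode text as UTF-8 and return the byte values as a list of ints."""
    return list(text.encode("utf-8"))


def utf8_bytes_to_string(data):
    """Decode a list of UTF-8 byte values back into text."""
    return bytes(data).decode("utf-8")


if __name__ == "__main__":
    # Digits are ASCII, so a timestamp string maps to one byte per character.
    print(utf8_string_to_bytes("1586505600000"))
```

Because the timestamp consists only of ASCII digits, the UTF-8 and binary encodings produce the same bytes for it.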

Next, s: the code shows that s = a(12).bin. Stepping in at the breakpoint reveals the code behind s:

{
    stringToBytes: function (t) {
        for (var e = [], a = 0; a < t.length; a++)
            e.push(255 & t.charCodeAt(a));
        return e
    },
    bytesToString: function (t) {
        for (var e = [], a = 0; a < t.length; a++)
            e.push(String.fromCharCode(t[a]));
        return e.join("")
    }
}
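
The binary variant keeps only the low byte of each character code. A rough Python counterpart is sketched below with hypothetical names. It assumes every character lies in the Basic Multilingual Plane, because charCodeAt works on UTF-16 code units while ord works on code points.

```python
def bin_string_to_bytes(text):
    """Keep the low 8 bits of each character code (255 & charCodeAt)."""
    return [ord(ch) & 0xFF for ch in text]


def bin_bytes_to_string(data):
    """Map each byte value back to the character with that code."""
    return "".join(chr(byte) for byte in data)
```

Unlike the UTF-8 variant, this conversion is lossy: a character such as "中" (code 0x4E2D) is reduced to the single byte 0x2D.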

Both objects come from a(12), so we can define a single a12 object and take the corresponding methods from it.

var a12 = {
    utf8: {
        stringToBytes: function (e) {
            return a12.bin.stringToBytes(unescape(encodeURIComponent(e)))
        },
        bytesToString: function (e) {
            return decodeURIComponent(escape(a12.bin.bytesToString(e)))
        }
    },
    bin: {
        stringToBytes: function (e) {
            for (var t = [], a = 0; a < e.length; a++)
                t.push(255 & e.charCodeAt(a));
            return t
        },
        bytesToString: function (e) {
            for (var t = [], a = 0; a < e.length; a++)
                t.push(String.fromCharCode(e[a]));
            return t.join("")
        }
    }
};

Finally, only the o parameter remains. Breakpoint debugging leads to the following code:

  

 

 

It can be seen that null is enough for o: the function only calls o when its input is not a string, and here the input is always the timestamp string. We now have every parameter the encryption method uses. The next step is to implement the encryption itself and produce the antitoken.

2. Implementing the encryption method

To implement the encryption method, note that two arguments are passed in during encryption: a 13-digit timestamp and a null value. Debugging confirms this, as the following screenshot shows:

  

Putting the preceding parameters and methods together gives the following JavaScript code:

// Define antitoken; e is the 13-digit timestamp string
function antitoken(e) {
    var a12 = {
        utf8: {
            stringToBytes: function (e) {
                return a12.bin.stringToBytes(unescape(encodeURIComponent(e)))
            },
            bytesToString: function (e) {
                return decodeURIComponent(escape(a12.bin.bytesToString(e)))
            }
        },
        bin: {
            stringToBytes: function (e) {
                for (var t = [], a = 0; a < e.length; a++)
                    t.push(255 & e.charCodeAt(a));
                return t
            },
            bytesToString: function (e) {
                for (var t = [], a = 0; a < e.length; a++)
                    t.push(String.fromCharCode(e[a]));
                return t.join("")
            }
        }
    };
    var t = null;
    var n, i, o, s, r;
    n = {
        rotl: function (e, t) {
            return e << t | e >>> 32 - t
        },
        rotr: function (e, t) {
            return e << 32 - t | e >>> t
        },
        endian: function (e) {
            if (e.constructor == Number)
                return 16711935 & n.rotl(e, 8) | 4278255360 & n.rotl(e, 24);
            for (var t = 0; t < e.length; t++)
                e[t] = n.endian(e[t]);
            return e
        },
        randomBytes: function (e) {
            for (var t = []; e > 0; e--)
                t.push(Math.floor(256 * Math.random()));
            return t
        },
        bytesToWords: function (e) {
            for (var t = [], a = 0, n = 0; a < e.length; a++,
                n += 8)
                t[n >>> 5] |= e[a] << 24 - n % 32;
            return t
        },
        wordsToBytes: function (e) {
            for (var t = [], a = 0; a < 32 * e.length; a += 8)
                t.push(e[a >>> 5] >>> 24 - a % 32 & 255);
            return t
        },
        bytesToHex: function (e) {
            for (var t = [], a = 0; a < e.length; a++)
                t.push((e[a] >>> 4).toString(16)),
                    t.push((15 & e[a]).toString(16));
            return t.join("")
        },
        hexToBytes: function (e) {
            for (var t = [], a = 0; a < e.length; a += 2)
                t.push(parseInt(e.substr(a, 2), 16));
            return t
        },
        bytesToBase64: function (e) {
            for (var t = [], n = 0; n < e.length; n += 3)
                for (var i = e[n] << 16 | e[n + 1] << 8 | e[n + 2], o = 0; o < 4; o++)
                    8 * n + 6 * o <= 8 * e.length ? t.push(a.charAt(i >>> 6 * (3 - o) & 63)) : t.push("=");
            return t.join("")
        },
        base64ToBytes: function (e) {
            e = e.replace(/[^A-Z0-9+\/]/gi, "");
            for (var t = [], n = 0, i = 0; n < e.length; i = ++n % 4)
                0 != i && t.push((a.indexOf(e.charAt(n - 1)) & Math.pow(2, -2 * i + 8) - 1) << 2 * i | a.indexOf(e.charAt(n)) >>> 6 - 2 * i);
            return t
        }
    },
        i = a12.utf8,
        o = null, // only called for non-string input, so null is enough here
        s = a12.bin,
        (r = function (e, t) {
                e.constructor == String ? e = t && "binary" === t.encoding ? s.stringToBytes(e) : i.stringToBytes(e) : o(e) ? e = Array.prototype.slice.call(e, 0) : Array.isArray(e) || (e = e.toString());
                for (var a = n.bytesToWords(e), l = 8 * e.length, c = 1732584193, d = -271733879, p = -1732584194, u = 271733878, m = 0; m < a.length; m++)
                    a[m] = 16711935 & (a[m] << 8 | a[m] >>> 24) | 4278255360 & (a[m] << 24 | a[m] >>> 8);
                a[l >>> 5] |= 128 << l % 32;
                a[14 + (l + 64 >>> 9 << 4)] = l;
                var f = r._ff
                    , h = r._gg
                    , v = r._hh
                    , g = r._ii;
                for (m = 0; m < a.length; m += 16) {
                    var y = c
                        , _ = d
                        , b = p
                        , $ = u;
                    d = g(d = g(d = g(d = g(d = v(d = v(d = v(d = v(d = h(d = h(d = h(d = h(d = f(d = f(d = f(d = f(d, p = f(p, u = f(u, c = f(c, d, p, u, a[m + 0], 7, -680876936), d, p, a[m + 1], 12, -389564586), c, d, a[m + 2], 17, 606105819), u, c, a[m + 3], 22, -1044525330), p = f(p, u = f(u, c = f(c, d, p, u, a[m + 4], 7, -176418897), d, p, a[m + 5], 12, 1200080426), c, d, a[m + 6], 17, -1473231341), u, c, a[m + 7], 22, -45705983), p = f(p, u = f(u, c = f(c, d, p, u, a[m + 8], 7, 1770035416), d, p, a[m + 9], 12, -1958414417), c, d, a[m + 10], 17, -42063), u, c, a[m + 11], 22, -1990404162), p = f(p, u = f(u, c = f(c, d, p, u, a[m + 12], 7, 1804603682), d, p, a[m + 13], 12, -40341101), c, d, a[m + 14], 17, -1502002290), u, c, a[m + 15], 22, 1236535329), p = h(p, u = h(u, c = h(c, d, p, u, a[m + 1], 5, -165796510), d, p, a[m + 6], 9, -1069501632), c, d, a[m + 11], 14, 643717713), u, c, a[m + 0], 20, -373897302), p = h(p, u = h(u, c = h(c, d, p, u, a[m + 5], 5, -701558691), d, p, a[m + 10], 9, 38016083), c, d, a[m + 15], 14, -660478335), u, c, a[m + 4], 20, -405537848), p = h(p, u = h(u, c = h(c, d, p, u, a[m + 9], 5, 568446438), d, p, a[m + 14], 9, -1019803690), c, d, a[m + 3], 14, -187363961), u, c, a[m + 8], 20, 1163531501), p = h(p, u = h(u, c = h(c, d, p, u, a[m + 13], 5, -1444681467), d, p, a[m + 2], 9, -51403784), c, d, a[m + 7], 14, 1735328473), u, c, a[m + 12], 20, -1926607734), p = v(p, u = v(u, c = v(c, d, p, u, a[m + 5], 4, -378558), d, p, a[m + 8], 11, -2022574463), c, d, a[m + 11], 16, 1839030562), u, c, a[m + 14], 23, -35309556), p = v(p, u = v(u, c = v(c, d, p, u, a[m + 1], 4, -1530992060), d, p, a[m + 4], 11, 1272893353), c, d, a[m + 7], 16, -155497632), u, c, a[m + 10], 23, -1094730640), p = v(p, u = v(u, c = v(c, d, p, u, a[m + 13], 4, 681279174), d, p, a[m + 0], 11, -358537222), c, d, a[m + 3], 16, -722521979), u, c, a[m + 6], 23, 76029189), p = v(p, u = v(u, c = v(c, d, p, u, a[m + 9], 4, -640364487), d, p, a[m + 12], 11, -421815835), c, d, a[m + 15], 16, 530742520), u, c, a[m + 2], 23, -995338651), p = g(p, u = g(u, c = g(c, d, p, u, a[m + 0], 6, -198630844), d, p, a[m + 7], 10, 1126891415), c, d, a[m + 14], 15, -1416354905), u, c, a[m + 5], 21, -57434055), p = g(p, u = g(u, c = g(c, d, p, u, a[m + 12], 6, 1700485571), d, p, a[m + 3], 10, -1894986606), c, d, a[m + 10], 15, -1051523), u, c, a[m + 1], 21, -2054922799), p = g(p, u = g(u, c = g(c, d, p, u, a[m + 8], 6, 1873313359), d, p, a[m + 15], 10, -30611744), c, d, a[m + 6], 15, -1560198380), u, c, a[m + 13], 21, 1309151649), p = g(p, u = g(u, c = g(c, d, p, u, a[m + 4], 6, -145523070), d, p, a[m + 11], 10, -1120210379), c, d, a[m + 2], 15, 718787259), u, c, a[m + 9], 21, -343485551),
                        c = c + y >>> 0;
                    d = d + _ >>> 0;
                    p = p + b >>> 0;
                    u = u + $ >>> 0;
                }
                return n.endian([c, d, p, u])
            }
        )._ff = function (e, t, a, n, i, o, s) {
            var r = e + (t & a | ~t & n) + (i >>> 0) + s;
            return (r << o | r >>> 32 - o) + t
        };

    r._gg = function (e, t, a, n, i, o, s) {
        var r = e + (t & n | a & ~n) + (i >>> 0) + s;
        return (r << o | r >>> 32 - o) + t
    };

    r._hh = function (e, t, a, n, i, o, s) {
        var r = e + (t ^ a ^ n) + (i >>> 0) + s;
        return (r << o | r >>> 32 - o) + t
    };

    r._ii = function (e, t, a, n, i, o, s) {
        var r = e + (a ^ (t | ~n)) + (i >>> 0) + s;
        return (r << o | r >>> 32 - o) + t
    };

    r._blocksize = 16;
    r._digestsize = 16;

    var a = n.wordsToBytes(r(e, t));
    return t && t.asBytes ? a : t && t.asString ? s.bytesToString(a) : n.bytesToHex(a);
}

This is the JavaScript implementation of the encryption method. The parameter e passed in is the 13-digit timestamp. From here the function can be called from either JS or Python. Let's verify it.
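
A note for the Python side: the starting values 1732584193, -271733879, -1732584194 and 271733878, together with the four round functions, match the standard MD5 algorithm, so the routine appears to compute an ordinary MD5 hex digest of the UTF-8 timestamp string. Assuming that holds for this site, the sketch below reproduces it with the standard library alone. The function name is my own, and the output should be compared with the JavaScript result before it is relied on.

```python
import hashlib
import time


def antitoken(timestamp):
    """MD5 hex digest of the timestamp string.

    Assumes the JavaScript routine is plain MD5 over UTF-8 bytes.
    """
    return hashlib.md5(str(timestamp).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    now_ms = int(time.time() * 1000)  # same format as the wangba cookie value
    print(now_ms, antitoken(now_ms))
```

If the two outputs agree for the same timestamp, the crawler can skip executing JavaScript altogether.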

First, the screenshot from the developer tools:

  

Then the result of running the code:

  

Posted by Gordicron on Fri, 10 Apr 2020 01:06:50 -0700