WeChat official account: Brother K Crawler. QQ discussion group: 808574309. We regularly share advanced crawler techniques and JS/Android reverse-engineering material.
Brief introduction
When analyzing the JavaScript code of some sites, you will find that simpler sites usually define their functions one after another, for example:
function a() { console.log("a") }
function b() { console.log("b") }
function c() { console.log("c") }
However, on slightly more complex sites you will usually encounter a code structure like the following:
!function (i) {
    function n(t) {
        return i[t].call(a, b, c, d)
    }
}([
    function (t, e) {},
    function (t, e, n) {},
    function (t, e, r) {},
    function (t, e, o) {}
]);
This pattern is very common in JavaScript and may look trivial to people who know the language well, but most crawler engineers write their code in Python or Java and may be confused by this syntax. Because it comes up constantly when extracting JS encryption code, crawler engineers need to understand it.
There does not seem to be an official name for this pattern. It amounts to modular programming, and because it resembles the output of the webpack bundler, most people simply call it webpack. The example above is hard to read, so here is a simplified version:
!function (allModule) {
    function useModule(whichModule) {
        allModule[whichModule].call(null, "hello world!");
    }
    useModule(0)
}([
    function module0(param) { console.log("module0: " + param) },
    function module1(param) { console.log("module1: " + param) },
    function module2(param) { console.log("module2: " + param) },
]);
Running the above code outputs module0: hello world!. The variable and function names should make the general idea clear: calling useModule(0) selects the first function in the array and passes "hello world!" to module0, which prints it.
Looking closely at the code above, it mainly relies on two pieces of syntax: !function () {}() and function.call(). Each is introduced below.
Function declaration and function expression
In ECMAScript (the standard that JavaScript implements), the two most common ways to create a function object are function declarations and function expressions. The ECMAScript specification states that a function declaration must always have an identifier, that is, a function name, while a function expression may omit it.
A function declaration gives the function a name and is hoisted into its scope before any code runs, so the function can be called either before or after the declaration:
test("Hello World!")
function test(arg) {
    console.log(arg)
}
A function expression creates a function (often anonymous) and assigns it to a variable. The variable only receives the function when execution reaches the expression, so the function can be called correctly only after that point; calling it earlier raises an error:
var test = function (arg) {
    console.log(arg)
}
test("Hello World!")
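The difference between the two forms can be observed directly. The following is a minimal sketch; the names declared and expressed are made up for illustration and do not come from any real site:

```javascript
// A function declaration is hoisted together with its body,
// so calling it before its position in the file works.
var early = declared("hi");

function declared(arg) {
    return "declared: " + arg;
}

// With a var-assigned function expression, only the variable is hoisted.
// Until the assignment runs, its value is undefined.
var typeBefore = typeof expressed;

var errorName = null;
try {
    expressed("hi"); // calling undefined throws
} catch (err) {
    errorName = err.name;
}

var expressed = function (arg) {
    return "expressed: " + arg;
};
var late = expressed("hi");

console.log(early);      // declared: hi
console.log(typeBefore); // undefined
console.log(errorName);  // TypeError
console.log(late);       // expressed: hi
```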
IIFE: immediately invoked function expressions
IIFE stands for Immediately Invoked Function Expression. It is also known as a self-executing function, an immediately executed function, or a self-executing anonymous function. IIFE is a syntax pattern: a function expression (named or anonymous) is executed as soon as it is created. Variables declared inside an IIFE cannot be accessed from outside it, so the pattern is mainly used to isolate scope and avoid polluting the enclosing namespace.
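As a rough illustration of the scope isolation described above, the sketch below keeps a variable private inside an IIFE and exposes only two functions. The names counter, count, increment and current are hypothetical:

```javascript
var counter = (function () {
    var count = 0; // local to the IIFE, invisible outside

    return {
        increment: function () {
            count += 1;
            return count;
        },
        current: function () {
            return count;
        }
    };
})();

counter.increment();
counter.increment();

console.log(counter.current()); // 2
console.log(typeof count);      // undefined, count never leaked out
```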
IIFE basic syntax
An IIFE can be written in several ways, mainly the following:
1. Put a unary operator before the anonymous function and () after it:
!function () { console.log("I AM IIFE") }();
-function () { console.log("I AM IIFE") }();
+function () { console.log("I AM IIFE") }();
~function () { console.log("I AM IIFE") }();
2. After the anonymous function, add (), and then enclose the whole with ():
(function () { console.log("I AM IIFE") }());
3. First enclose the anonymous functions with (), and then add ():
(function () { console.log("I AM IIFE") })();
4. Use the arrow function expression, first enclose the arrow function expression with (), and then add ():
(() => { console.log("I AM IIFE") })()
5. Put the void keyword before the anonymous function and () after it. void evaluates the expression that follows it and always yields undefined:
void function () { console.log("I AM IIFE") }();
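All of these forms run the function, but they differ in what the whole expression evaluates to: the parenthesised forms keep the return value, while the unary operators transform it. This rarely matters when the IIFE is used only for its side effects, but it helps when reading minified code. A small sketch:

```javascript
var viaParens = (function () { return 5; })();   // return value kept
var viaArrow  = (() => 5)();                     // return value kept
var viaNot    = !function () { return 5; }();    // logical NOT of 5
var viaMinus  = -function () { return 5; }();    // numeric negation
var viaPlus   = +function () { return "5"; }();  // string converted to number
var viaTilde  = ~function () { return 5; }();    // bitwise NOT of 5
var viaVoid   = void function () { return 5; }(); // always undefined

console.log(viaParens, viaArrow, viaNot, viaMinus, viaPlus, viaTilde, viaVoid);
// 5 5 false -5 5 -6 undefined
```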
Sometimes you will see a semicolon placed before or after an immediately invoked function, for example:
;(function () { console.log("I AM IIFE") }())
;!function () { console.log("I AM IIFE") }()
An immediately invoked function is usually used as a self-contained module, and on its own it causes no problems. However, it is advisable to add a semicolon before or after it to separate it from the preceding or following code. Otherwise, when files are concatenated or the preceding statement lacks a semicolon, the parentheses can be parsed as a call on the previous expression and cause unexpected errors.
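One way such an unexpected error can arise is sketched below. The names log, helper and fixedHelper are invented for the demonstration. Without a semicolon, the parenthesised IIFE on the next line is parsed as an argument list for the preceding function expression:

```javascript
var log = [];
var errorName = null;

try {
    // No semicolon after the function expression. The next line is parsed as
    // a call on it: helper's function runs with the IIFE as its argument,
    // returns 42, and then 42() is attempted.
    var helper = function () { log.push("helper ran"); return 42 }
    (function () { log.push("iife ran") })()
} catch (err) {
    errorName = err.name;
}

console.log(log);       // [ 'helper ran' ], the IIFE body never executed
console.log(errorName); // TypeError

// A leading semicolon separates the two statements, so the IIFE runs normally.
var fixedHelper = function () { return 42 }
;(function () { log.push("iife ran") })()

console.log(log);       // [ 'helper ran', 'iife ran' ]
```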
Passing parameters to an IIFE
Arguments are passed by placing them in the trailing ():
var text = "I AM IIFE";
(function (param) {
    console.log(param)
})(text);
// I AM IIFE

var dict = {name: "Bob", age: "20"};
(function (d) {
    console.log(d.name);
})(dict);
// Bob

var list = [1, 2, 3, 4, 5];
(function (arr) {
    var sum = 0;
    for (var i = 0; i < arr.length; i++) {
        sum += arr[i];
    }
    console.log(sum);
})(list);
// 15
Function.prototype.call() / apply() / bind()
Function.prototype.call(), Function.prototype.apply(), and Function.prototype.bind() are commonly used methods. All three serve the same basic purpose: changing what this refers to inside a function.
- The call() method executes the function immediately and accepts its arguments one by one, separated by commas;
- The apply() method executes the function immediately and accepts its arguments as a single array;
- The bind() method does not execute the function immediately. It returns a new function with this bound, which can be called later, and accepts the same parameters as call().
call()
The call() method accepts multiple parameters. The first parameter, thisArg, specifies what this refers to inside the function body. In non-strict mode, if thisArg is null or undefined, this is automatically replaced with the global object (the window object in a browser). In strict mode, this is exactly the value passed, so it stays null or undefined. From the second parameter onward, each argument is passed to the function in order. The basic syntax is as follows:
function.call(thisArg, arg1, arg2, ...)
Example:
function test(a, b, c) {
    console.log(a + b + c)
}
test.call(null, 1, 2, 3)  // 6

function test() {
    console.log(this.firstName + " " + this.lastName)
}
var data = {firstName: "John", lastName: "Doe"}
test.call(data)  // John Doe
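The strict-mode behaviour of thisArg can be checked with a short sketch. It assumes the file runs as a classic, non-module script (for example with node file.js), because ES modules are always strict; the function names are illustrative only:

```javascript
function sloppyFn() {
    return this;
}

function strictFn() {
    "use strict";
    return this;
}

// Non-strict function: a null thisArg is replaced with the global object.
var sloppyThis = sloppyFn.call(null);

// Strict function: this is exactly what was passed in.
var strictNull = strictFn.call(null);
var strictUndefined = strictFn.call(undefined);

console.log(sloppyThis === globalThis); // true in a non-module script
console.log(strictNull);                // null
console.log(strictUndefined);           // undefined
```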
apply()
The apply() method accepts two parameters. The first parameter, thisArg, works the same way as in call(). The second parameter is an indexed collection: starting with ECMAScript 5, it can be an array or an array-like object. apply() passes the elements of this collection to the called function as its arguments. The basic syntax is as follows:
function.apply(thisArg, [arg1, arg2, ...])
Example:
function test(a, b, c) {
    console.log(a + b + c)
}
test.apply(null, [1, 2, 3])  // 6

function test() {
    console.log(this.firstName + " " + this.lastName)
}
var data = {firstName: "John", lastName: "Doe"}
test.apply(data)  // John Doe
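Because apply() spreads a collection into individual arguments, it also works with array-like objects and with built-in functions that take a variable number of arguments. A brief sketch, where sum and arrayLike are hypothetical names:

```javascript
// Adds up however many arguments it receives.
function sum() {
    var total = 0;
    for (var i = 0; i < arguments.length; i++) {
        total += arguments[i];
    }
    return total;
}

var fromArray = sum.apply(null, [1, 2, 3]);

// An array-like object: indexed properties plus a length.
var arrayLike = {0: 4, 1: 5, length: 2};
var fromArrayLike = sum.apply(null, arrayLike);

// Math.max expects separate arguments, so apply() unpacks the array for it.
var largest = Math.max.apply(null, [3, 9, 4]);

console.log(fromArray);     // 6
console.log(fromArrayLike); // 9
console.log(largest);       // 9
```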
bind()
bind() accepts the same parameters as call(), but instead of calling the function it returns a new function. The basic syntax is as follows:
function.bind(thisArg, arg1, arg2, ...)
Example:
function test(a, b, c) {
    console.log(a + b + c)
}
test.bind(null, 1, 2, 3)()  // 6

function test() {
    console.log(this.firstName + " " + this.lastName)
}
var data = {firstName: "John", lastName: "Doe"}
test.bind(data)()  // John Doe
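Two properties of bind() are worth knowing when reading site code: arguments given to bind() are preset and later arguments are appended after them, and the bound this cannot be overridden by a later call(). The sketch below uses made-up names (greet, bob, greetBob):

```javascript
function greet(greeting, punctuation) {
    return greeting + ", " + this.name + punctuation;
}

var bob = {name: "Bob"};

// Presets this to bob and the first argument to "Hello".
var greetBob = greet.bind(bob, "Hello");

// The remaining argument is supplied at call time.
var first = greetBob("!");

// Trying to override this with call() has no effect on a bound function.
var second = greetBob.call({name: "Alice"}, "?");

console.log(first);  // Hello, Bob!
console.log(second); // Hello, Bob?
```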
Understanding webpack
With the above knowledge, we can now make sense of the modular pattern introduced earlier, the so-called webpack style:
!function (allModule) {
    function useModule(whichModule) {
        allModule[whichModule].call(null, "hello world!");
    }
    useModule(0)
}([
    function module0(param) { console.log("module0: " + param) },
    function module1(param) { console.log("module1: " + param) },
    function module2(param) { console.log("module2: " + param) },
]);
First, the code as a whole is an IIFE. The argument passed to it is an array of three functions, module0, module1 and module2, which can be regarded as three modules. The IIFE receives them through its allModule parameter. The IIFE also contains a function useModule(), which can be regarded as the module loader: it decides which module to use. In the example, useModule(0) calls the first module. Inside useModule(), the call() method sets this for the module (null here), passes the argument, and invokes the chosen module, which prints the output.
Rewrite webpack
The webpack-style modular code often encountered in crawler reverse engineering can easily be rewritten into plain functions. Take the following piece of encryption code as an example:
var CryptoJS = require("crypto-js");

!function (func) {
    function acvs() {
        var kk = func[1].call(null, 1e3);
        var data = {
            r: "I LOVE PYTHON",
            e: kk,
            i: "62bs819idl00oac2",
            k: "0123456789abcdef"
        }
        return func[0].call(data);
    }

    console.log("Encrypted text:" + acvs())

    function odsc(account) {
        var cr = false;
        var regExp = /(^\d{7,8}$)|(^0\d{10,12}$)/;
        if (regExp.test(account)) {
            cr = true;
        }
        return cr;
    }

    function mkle(account) {
        var cr = false;
        var regExp = /^([a-zA-Z0-9_\.\-\+])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/;
        if (regExp.test(account)) {
            cr = true;
        }
        return cr;
    }
}([
    function () {
        for (var n = "", t = 0; t < this.r.length; t++) {
            var o = this.e ^ this.r.charCodeAt(t);
            n += String.fromCharCode(o)
        }
        return encodeURIComponent(n)
    },
    function (x) {
        return Math.ceil(x * Math.random())
    },
    function (e) {
        var a = CryptoJS.MD5(this.k);
        var c = CryptoJS.enc.Utf8.parse(a);
        var d = CryptoJS.AES.encrypt(e, c, {
            iv: this.i
        });
        return d + ""
    },
    function (e) {
        var b = CryptoJS.MD5(this.k);
        var d = CryptoJS.enc.Utf8.parse(b);
        var a = CryptoJS.AES.decrypt(e, d, {
            iv: this.i
        }).toString(CryptoJS.enc.Utf8);
        return a
    }
]);
The key encryption entry point is acvs(). It calls only the first and second functions in the IIFE argument list (func[0] and func[1]); the remaining functions, as well as odsc() and mkle(), are distractions. The first function reads only this.r and this.e, so these two values can be passed in directly as parameters. The final rewrite is as follows:
function a(r, e) {
    for (var n = "", t = 0; t < r.length; t++) {
        var o = e ^ r.charCodeAt(t);
        n += String.fromCharCode(o)
    }
    return encodeURIComponent(n)
}

function b(x) {
    return Math.ceil(x * Math.random())
}

function acvs() {
    var kk = b(1e3);
    var r = "I LOVE PYTHON";
    return a(r, kk);
}

console.log("Encrypted text:" + acvs())
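Because the key comes from Math.random(), the output changes on every run, which makes it hard to tell whether a rewrite is faithful. One possible check, sketched below, is to fix the key temporarily and compare the module-style function (inputs read from this) with the rewritten one (inputs passed as parameters). The names moduleStyle, plainStyle and fixedKey are introduced only for this comparison:

```javascript
// Original shape: inputs are read from this, supplied through call().
function moduleStyle() {
    for (var n = "", t = 0; t < this.r.length; t++) {
        var o = this.e ^ this.r.charCodeAt(t);
        n += String.fromCharCode(o)
    }
    return encodeURIComponent(n)
}

// Rewritten shape: the same logic with explicit parameters.
function plainStyle(r, e) {
    for (var n = "", t = 0; t < r.length; t++) {
        var o = e ^ r.charCodeAt(t);
        n += String.fromCharCode(o)
    }
    return encodeURIComponent(n)
}

var fixedKey = 7; // assumption: a fixed key replaces Math.random() for testing
var text = "I LOVE PYTHON";

var viaCall = moduleStyle.call({r: text, e: fixedKey});
var direct = plainStyle(text, fixedKey);

console.log(viaCall === direct); // true
```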
Summary
After reading this article, you may think that webpack is nothing special. The examples here really are simple, but the code on real sites is often far more complex. This article only aims to give you a basic understanding of how the modular webpack pattern works. Brother K will walk you through more complex webpack cases in practice, so stay tuned!