Summary
At present, webpack has become one of the indispensable tools in front-end development. Its idea that "everything is a module", together with the continuous iteration of webpack versions (webpack 4), makes packaging for front-end engineering ever faster and more efficient.
I believe you are already skilled at using webpack. It takes a configuration object covering entries, output, plugins and so on, and then packages the whole project internally according to that configuration, starting from a single js entry file (of course, multi-entry packaging can also be configured). All files that this entry depends on are processed by specific loaders and plugins according to our needs, so that we can happily write ES6, scss, less and postcss while the packaging tool makes them run correctly in the browser. It saves time, effort and worry.
So what are the core principles behind packaging tools? Today I am going to simulate a small packaging tool to explore them. Some of the knowledge in this article is only touched on, not dug into deeply; if you are interested, you can look up the details yourself.
My skills are still shallow, so this is an entry-level look at the core principles of packaging tools, with simple functionality.
Project address
MiniPack: see it on github
Principle
As we get to know the JavaScript language more deeply, understanding some of its lower-level workings is helpful, and it goes a long way toward improving our own skills.
JavaScript is a weakly typed, interpreted language; that is, we do not need a compiler to produce an executable version ahead of time. Yet JavaScript does have a compilation process: in most cases it happens just microseconds before the code is executed, and execution follows as soon as compilation finishes. In other words, it compiles dynamically as the code runs. During compilation, lexical and syntactic analysis produce a syntax tree, which we call the AST (Abstract Syntax Tree): a tree structure corresponding to the grammar of the source code, in which each statement of the source maps to a node of the tree. It is this AST that forms the core of how packaging tools analyse code.
We are all familiar with babel. What makes it such a joy for front-end programmers is that it lets us write ES6, ES7, ES8... and turns it all into the ES5 that browsers can execute. At its core, it parses the ES6 syntax we write with the babylon lexical/syntactic parsing engine to obtain the AST, then performs a depth-first traversal of that syntax tree, modifying the tree's structure and data, and finally generates ES5 code from the modified AST. That is the core of how we use babel. Here is an example of a syntax tree.
File to be converted (index.js)
```js
// es6 index.js
import add from './add.js'
let sum = add(1, 2);
export default sum
```
Execution file (build.js in node environment)
```js
// node build.js
// Introduce fs and the babylon engine
const fs = require('fs')
const babylon = require('babylon')

// Path of the file to parse (the index.js above)
const filePath = './index.js'

// Read file content
const content = fs.readFileSync(filePath, 'utf-8')

// Generate the AST through babylon
const ast = babylon.parse(content, {
  sourceType: 'module'
})

console.log(ast)
```
Generated AST
```
ast = {
  ... ...
  comments: [],
  tokens: [
    Token { type: [KeywordTokenType], value: 'import', start: 0, end: 6, loc: [SourceLocation] },
    Token { type: [TokenType], value: 'add', start: 7, end: 10, loc: [SourceLocation] },
    Token { type: [TokenType], value: 'from', start: 11, end: 15, loc: [SourceLocation] },
    Token { type: [TokenType], value: './add.js', start: 16, end: 26, loc: [SourceLocation] },
    Token { type: [KeywordTokenType], value: 'let', start: 27, end: 30, loc: [SourceLocation] },
    Token { type: [TokenType], value: 'sum', start: 31, end: 34, loc: [SourceLocation] },
    ... ...
    Token { type: [KeywordTokenType], value: 'export', start: 48, end: 54, loc: [SourceLocation] },
    Token { type: [KeywordTokenType], value: 'default', start: 55, end: 62, loc: [SourceLocation] },
    Token { type: [TokenType], value: 'sum', start: 63, end: 66, loc: [SourceLocation] }
  ]
}
```
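Continuing the example, here is a minimal sketch of the other half of babel's pipeline: traversing the AST, modifying it, and generating code again. It assumes babel-generator is installed alongside babylon and babel-traverse (our project itself only needs the first two; generate is shown purely for illustration):

```js
const babylon = require('babylon')
const traverse = require('babel-traverse').default
// babel-generator turns an AST back into code (assumed installed, illustration only)
const generate = require('babel-generator').default

const ast = babylon.parse(`let sum = add(1, 2)`, { sourceType: 'module' })

// Depth-first traversal: rename every identifier named "sum" to "total"
traverse(ast, {
  Identifier (path) {
    if (path.node.name === 'sum') {
      path.node.name = 'total'
    }
  }
})

console.log(generate(ast).code) // let total = add(1, 2);
```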
The above is the AST syntax tree. When babylon analyses the source code, it reads characters one by one like a scanner, then performs the syntactic analysis. (For more on syntax trees and babylon, see https://www.jianshu.com/p/019... ) By traversing the tree's attributes and values and modifying them according to the appropriate rules, the code can be regenerated. When we analyse ordinary js files, the AST can be enormous, tens or even hundreds of thousands of lines, so excellent algorithms are needed to keep things fast and efficient. The project below uses babel-traverse to walk the AST; if you are interested in the algorithms involved, they are worth exploring. The knowledge points above are deliberately not covered in depth: as the title says, this is just an exploration of the principles of packaging tools, and the specifics are left to the interested reader. That is it for the principle part; now let's get into practice.
Project directory
```
├── README.md
├── package.json
├── src
│   ├── lib
│   │   ├── bundle.js    // Generate the packaged file
│   │   ├── getdep.js    // Obtain file dependencies from the AST
│   │   └── readcode.js  // Read file code, generate the AST, process it, and convert ES6 code
│   └── pack.js          // Expose the tool's entry method
└── yarn.lock
```
Mind mapping
The train of thought can be laid out more clearly with a mind map.
Concrete implementation
Combing through the process, we find that the key is to find the dependencies in each file. We use deps to collect dependencies, and then package layer by layer through those dependency relationships, step by step.
The process is walked through mainly with code plus commentary.
Read file code
First, we need the path of the entry file. We use node's fs module to read the code in the specified file, then use the babylon module mentioned above to analyse the code and obtain the AST syntax tree. Next, the babel-traverse library extracts from the AST the module information (paths) contained in import statements, i.e. the dependencies. We push the relative paths of all files the current module depends on into a deps array, so that we can traverse it later to resolve dependencies.
```js
const fs = require('fs')
// Analysis engine
const babylon = require('babylon')
// traverse: traversal of the syntax tree and other operations
const traverse = require('babel-traverse').default
// Syntax conversion provided by babel
const { transformFromAst } = require('babel-core')

// Read file code
const readCode = function (filePath) {
  if (!filePath) {
    throw new Error('No entry file path')
  }

  // Dependency collection for the current module
  const deps = []
  const content = fs.readFileSync(filePath, 'utf-8')
  const ast = babylon.parse(content, {
    sourceType: 'module'
  })

  // Analyse the AST and get the imported module information (paths)
  // The ImportDeclaration visitor is called back whenever an import is traversed
  traverse(ast, {
    ImportDeclaration: ({ node }) => {
      // Push the dependency into deps
      // If there are multiple dependencies, the array holds them all
      deps.push(node.source.value)
    }
  })

  // Convert ES6 to ES5
  const { code } = transformFromAst(ast, null, { presets: ['env'] })

  // Return an object containing the path, the dependencies,
  // the converted ES5 code, and the module's (custom) id
  return {
    filePath,
    deps,
    code,
    id: deps.length > 0 ? deps.length - 1 : 0
  }
}

module.exports = readCode
```
I believe the code above is easy to follow, and the comments are quite detailed, so I will not belabour it here. Note that babel-traverse is a library whose API is barely documented; beyond what is shown here, it has other uses as well.
One thing worth emphasising: the function's final return value is an object containing some important information about the current file (module). deps stores the paths of all dependencies found for the module. Later we will recursively traverse every module's dependencies, along with its code; this is what the upcoming dependency collection uses.
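For the index.js shown earlier, the returned object would look roughly like this (the exact code string depends on your babel-preset-env version, so it is abbreviated here):

```js
{
  filePath: './index.js',
  deps: ['./add.js'],
  code: '"use strict";\n\nObject.defineProperty(exports, "__esModule", { ... });\n...',
  id: 0 // deps.length is 1, so deps.length - 1 gives 0
}
```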
Dependency collection
By reading the file above, we obtain some important information about a single file (module): filePath (the file path), deps (all of the module's dependencies), code (the converted code), and id (the module's id).
By defining deps as an array, we store this information object for every file (module) among all the dependencies.
Next, starting from the single entry file, we follow the dependency relationships to collect the dependencies of the module's dependencies, and theirs in turn. We call the readCode method recursively in a loop, pushing the object readCode returns into the deps array each time, and finally obtain the important information of every module in the dependency chain.
```js
const readCode = require('./readcode.js')
const fs = require('fs')
const path = require('path')

const getDeps = function (entry) {
  // Important information object of the main entry file module,
  // returned by reading and analysing the file
  const entryFileObject = readCode(entry)

  // deps is an array holding the important information object of each module
  // deps is the core data we mentioned: the whole packaged file is built from it
  const deps = [entryFileObject ? entryFileObject : null]

  // Traverse deps (newly pushed items are picked up by the loop as well)
  // Get the filePath information and decide whether it is a css or a js file
  for (let obj of deps) {
    const dirname = path.dirname(obj.filePath)
    obj.deps.forEach(rPath => {
      const aPath = path.join(dirname, rPath)
      if (/\.css/.test(aPath)) {
        // If it is a css file, no recursive readCode analysis is performed
        // The code is written directly into a style tag via js
        const content = fs.readFileSync(aPath, 'utf-8')
        const code = `
          var style = document.createElement('style')
          style.innerText = ${JSON.stringify(content).replace(/\\r\\n/g, '')}
          document.head.appendChild(style)
        `
        deps.push({
          filePath: aPath,
          reletivePaht: rPath,
          deps: [], // a css module has no further dependencies of its own
          code,
          id: deps.length > 0 ? deps.length : 0
        })
      } else {
        // If it is a js file, keep calling readCode to analyse the code
        let obj = readCode(aPath)
        obj.reletivePaht = rPath
        obj.id = deps.length > 0 ? deps.length : 0
        deps.push(obj)
      }
    })
  }

  // Return deps
  return deps
}

module.exports = getDeps
```
There may be a doubt about the code above: the for...of loop iterates over deps while we keep pushing into it, so the newly pushed dependency objects are picked up and processed by later iterations as well, which is how nested dependencies get collected. As for what this deps array is ultimately for, read on and you will see.
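As an illustration of the shape of the collected data, assuming an entry ./src/index.js that only imports ./add.js, the final deps array would look roughly like this (code strings abbreviated):

```js
[
  // the entry module, pushed first
  { filePath: './src/index.js', deps: ['./add.js'], code: '...', id: 0 },
  // its dependency; aPath comes from path.join('./src', './add.js')
  { filePath: 'src/add.js', reletivePaht: './add.js', deps: [], code: '...', id: 1 }
]
```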
Output file
Now we can get every file together with its dependencies, its converted code and its id: yes, the deps array returned in the previous section. Some questions may be left over from that section; next we will go straight to the code and slowly unravel the doubts.
```js
const fs = require('fs')
// Code compression library
const uglify = require('uglify-js')

// Four parameters:
// 1. deps: the array of all dependencies returned in the previous section
// 2. The main entry file path
// 3. The output file path
// 4. Whether to compress the output file's code
// Apart from deps, the other three parameters are passed in
// through the config object of the project's main entry method
const bundle = function (deps, entry, outPath, isCompress) {
  let modules = ''
  deps.forEach(dep => {
    var id = dep.id
    // Here is the key point:
    // each module's "id" is used as a property name, and its value is a function
    // whose body is the converted "code" of the module currently traversed
    // This builds up a long string like:
    // 0: function(......){......},
    // 1: function(......){......},
    // ...
    modules = modules + `${id}: function (module, exports, require) {${dep.code}},`
  })

  // A self-executing function receives the spliced modules object and deps
  // The custom require lets us simulate commonjs-style modularity
  let result = `
    (function (modules, mType) {
      function require (id) {
        var module = { exports: {} }
        var module_id = require_moduleId(mType, id)
        modules[module_id](module, module.exports, require)
        return module.exports
      }
      require('${entry}')
    })({${modules}},${JSON.stringify(deps)});
    function require_moduleId (typelist, id) {
      var module_id
      typelist.forEach(function (item) {
        if (id === item.filePath || id === item.reletivePaht) {
          module_id = item.id
        }
      })
      return module_id
    }
  `

  // Compress if requested
  if (isCompress) {
    result = uglify.minify(result, {
      mangle: {
        toplevel: true
      }
    }).code
  }

  // Write the output file
  fs.writeFileSync(outPath + '/bundle.js', result)
  console.log('Packing completed: ' + outPath + '/bundle.js')
}

module.exports = bundle
```
A detailed walkthrough follows. Because we want to output a file, there are a lot of strings involved.
Interpretation 1: modules string
After traversing deps, the modules string finally looks like this:
```js
modules = `
  0: function (module, exports, require) { code of the corresponding module },
  1: function (module, exports, require) { code of the corresponding module },
  2: function (module, exports, require) { code of the corresponding module },
  3: function (module, exports, require) { code of the corresponding module },
  ... ...
`
```
If we add "{sum"}"at both ends of the string, is that not an object if it is executed as code? Yeah, so 0, 1, 2, 3... becomes an attribute, and the value of the attribute is a function, so you can call the function directly through the attribute. And the content of this function is the code of each module we need to package after babel conversion.
Interpretation 2: result string
```js
// The self-executing function receives the modules string above wrapped in {}
(function (modules, mType) {
  // Custom require function to simulate commonjs modularization
  function require (id) {
    // Define the module object and its exports property
    var module = { exports: {} }
    // Convert path to id via the helper function below
    var module_id = require_moduleId(mType, id)
    // Call the corresponding property function of the modules object
    modules[module_id](module, module.exports, require)
    return module.exports
  }
  require('${entry}')
})({${modules}},${JSON.stringify(deps)});

// Map paths to ids, so that the function stored under
// the id of the corresponding path can be called
function require_moduleId (typelist, id) {
  var module_id
  typelist.forEach(function (item) {
    if (id === item.filePath || id === item.reletivePaht) {
      module_id = item.id
    }
  })
  return module_id
}
```
As for why we need the require_moduleId function to map paths to ids, we have to start with how babel converts ES6 to ES5. Here is an example.
ES6 Code:
```js
import a from './a.js'
let b = a + a
export default b
```
ES5 Code:
```js
'use strict';

Object.defineProperty(exports, "__esModule", {
  value: true
});

var _a = require('./a.js');

var _a2 = _interopRequireDefault(_a);

function _interopRequireDefault(obj) { return obj && obj.__esModule ? obj : { default: obj }; }

var b = _a2.default + _a2.default;
exports.default = b;
```
1. The code above shows before and after the conversion; if you are interested, try it on the babel official website. Notice the line var _a = require('./a.js');: the argument babel generates for require is the file's path, while the module functions we need to call are keyed by id (0, 1, 2, 3...), so a conversion between the two is required.
2. There may also be some doubt about why function (module, exports, require) {...} is used as the module wrapper, a commonjs-style form. The reason is that babel compiles our converted code to the commonjs module specification.
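Putting the two points together, here is a stripped-down, hand-written version of what the generated bundle boils down to, assuming just two modules: an entry (id 0) that requires './a.js' (id 1). It runs as-is in node or a browser:

```js
(function (modules, mType) {
  function require (id) {
    var module = { exports: {} }
    var module_id = require_moduleId(mType, id)
    modules[module_id](module, module.exports, require)
    return module.exports
  }
  require('./index.js')
})({
  0: function (module, exports, require) {
    // babel-style output: require by path, resolved to id 1 internally
    var _a = require('./a.js')
    console.log('sum is', _a.default + _a.default) // prints: sum is 4
  },
  1: function (module, exports, require) {
    exports.default = 2
  }
}, [
  { filePath: './index.js', id: 0 },
  { filePath: 'a.js', reletivePaht: './a.js', id: 1 }
])

function require_moduleId (typelist, id) {
  var module_id
  typelist.forEach(function (item) {
    if (id === item.filePath || id === item.reletivePaht) {
      module_id = item.id
    }
  })
  return module_id
}
```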
Finally
The last step is to wrap everything up and expose an entry function. This step emulates webpack's API: a pack method takes a config configuration object. This lets us write scripts in package.json to be executed by npm/yarn.
```js
const getDeps = require('./lib/getdep')
const bundle = require('./lib/bundle')

const pack = function (config) {
  if (!config.entryPath || !config.outPath) {
    throw new Error('pack Tool: Configure entry and exit paths')
  }
  let entryPath = config.entryPath
  let outPath = config.outPath
  let isCompress = config.isCompression || false
  let deps = getDeps(entryPath)
  bundle(deps, entryPath, outPath, isCompress)
}

module.exports = pack
```
The config passed in has only three attributes: entryPath, outPath and isCompression.
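A hypothetical usage sketch (the file names and paths are placeholders for your own layout): create a build script that calls pack.

```js
// build.js (hypothetical)
const pack = require('./src/pack')

pack({
  entryPath: './demo/index.js', // main entry file
  outPath: './demo/dist',       // directory where bundle.js is written
  isCompression: true           // compress the output with uglify-js
})
```

Then add "build": "node build.js" under scripts in package.json and run it with npm run build or yarn build.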
Conclusion
This is a simple implementation, just to explore the principle; it is not complete or stable in functionality. I hope it is helpful to those who read it.
A packaging tool first performs lexical and syntactic analysis on our code files to generate an AST; by processing the AST it transforms the code into what we want, compatible with browsers. It collects the dependencies of each file to form a dependency chain, and through that dependency relationship it finally outputs the packaged file.
As this is my first attempt, some explanations may be imprecise or wrong; please bear with me. If you have any questions, feel free to discuss them in the comments. And don't forget your _____________