A hand-holding tutorial: this time I will learn to develop a Babel plug-in!

Keywords: Javascript

If you already have a basic knowledge of Babel, you can skip the "Pre knowledge" section and go directly to the "Plug in writing" section.

Pre knowledge

What is AST

To learn Babel, you first need to understand the AST.

What is AST?

Let's take a look at the Wikipedia explanation:

In computer science, Abstract Syntax Tree (AST), or Syntax tree for short, is an abstract representation of the syntax structure of source code. It represents the syntax structure of the programming language in the form of a tree, and each node on the tree represents a structure in the source code

The phrase "an abstract representation of the syntax structure of source code" deserves emphasis, because it is the key to understanding the AST. In plain language, it means describing our code as a tree data structure according to an agreed specification, so that JS engines and transpilers can understand it.

An analogy: just as a framework uses a virtual DOM to describe the real DOM structure and then operates on that description, an AST is a convenient way to describe code for the lower-level tools that process it.

Of course, the AST is not unique to JS. Code in any language can be converted into a corresponding AST, and there are many specifications for AST structure. Most JS tooling follows the ESTree specification. This is, of course, only a simplified picture.

What does AST look like

After understanding the basic concept of AST, what does ast look like?

The website astexplorer.net generates ASTs online. Pasting code into it is a good way to learn the structure.

babel process

Q: How many steps does it take to put an elephant into a refrigerator?

Open the refrigerator -> put the elephant in -> close the refrigerator

Babel works the same way. Babel compiles code by way of the AST: it first converts the code into an AST, then processes the AST, and finally converts the processed AST back into code.

That gives the following process:

Convert code to AST -> process AST -> convert AST to code

These stages have more professional names:

parse -> transform -> generate

parse

Turn the source code into an abstract syntax tree (AST) through parser

The main task of this stage is to convert code into an AST, which involves two steps: lexical analysis and syntax analysis. When the parse stage starts, the source is scanned and lexical analysis is performed. If we compare a piece of code to a sentence, lexical analysis splits the sentence into words. Just as the sentence "I am eating" can be split into "I", "am" and "eating", so can code. For example, const a = '1' is split into its most fine-grained words (tokens): const, a, = and '1'. That is what the lexical analysis step does.
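The splitting step can be tried without Babel. The snippet below is only a minimal sketch: `tokenize` is a made-up helper that recognizes string literals, identifiers, integers and the `=` and `;` punctuators, which is far less than a real lexer such as the one in @babel/parser handles.

```javascript
// Simplified tokenizer for illustration only; not how @babel/parser works internally.
function tokenize(source) {
  // Alternatives, in order: string literal, identifier or keyword, integer, single punctuator.
  const tokenPattern = /'[^']*'|"[^"]*"|[A-Za-z_$][\w$]*|\d+|[=;]/g;
  return source.match(tokenPattern) || [];
}

const tokens = tokenize("const a = '1'");
console.log(tokens); // [ 'const', 'a', '=', "'1'" ]
```

Whitespace never matches any alternative, so it is skipped, which mirrors the idea that tokens are the smallest meaningful units of the source.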

After the lexical analysis, the tokens obtained from the analysis are handed over to the syntax analysis. The task of the syntax analysis stage is to generate AST according to the tokens. It will traverse tokens and finally generate a tree according to a specific structure, which is AST.

Looking at the AST generated for the statement above, several pieces of information stand out. The outermost node is a VariableDeclaration, whose kind field is const. Its declarations field holds a VariableDeclarator node, which in turn contains the two things we wrote: the identifier a and the value 1.

Besides these, the AST also carries other information such as line numbers, which will not be described one by one here. In short, this is the AST we finally get.
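To make the structure concrete, here is a rough sketch of syntax analysis for this one statement shape. `parseDeclaration` is a hypothetical helper that only understands `<kind> <name> = <number>`; the node type names follow Babel's AST (for example NumericLiteral), and position information is left out.

```javascript
// Illustrative only: handles exactly `<kind> <name> = <number>`; real parsers cover the full grammar.
function parseDeclaration(tokens) {
  const [kind, name, equals, rawValue] = tokens;
  if (!['var', 'let', 'const'].includes(kind) || equals !== '=') {
    throw new SyntaxError('Expected a declaration such as: const a = 1');
  }
  return {
    type: 'VariableDeclaration',
    kind,
    declarations: [
      {
        type: 'VariableDeclarator',
        id: { type: 'Identifier', name },
        init: { type: 'NumericLiteral', value: Number(rawValue) },
      },
    ],
  };
}

const node = parseDeclaration(['const', 'a', '=', '1']);
console.log(JSON.stringify(node, null, 2));
```

The printed object has the same nesting described above: a VariableDeclaration with kind const, containing one VariableDeclarator with an id and an init.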

So how does Babel convert code into an AST? This stage uses @babel/parser, the parser provided by Babel and formerly known as Babylon. It was not written from scratch by the Babel team; it started as a fork of the acorn project.

It provides a parse method that converts code into an AST. In its most basic form, parser.parse('const a = 1') returns the AST for that statement.

More information can be found in the official @babel/parser documentation.

transform

After the parse stage we have the AST. Babel then uses @babel/traverse to walk it depth first. Plug-ins are triggered at this stage: they access each type of AST node through visitor functions. Taking the code above as an example, we can write a VariableDeclaration function to visit VariableDeclaration nodes. It is called every time a node of this type is encountered.

The method accepts two parameters:

path

path is the current access path. It contains the node, information about its parent, and many methods for operating on the node. These methods can be used to add, update, move and delete AST nodes.

state

state contains information about the current plug-in and the options passed to it. It can also be used to pass custom data between nodes.
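The visitor idea can be modeled in a few lines of plain JavaScript. This is a toy sketch, not @babel/traverse: the `traverse` function below and its minimal `path` object (only `node` and `parent`) are simplifications made up for illustration.

```javascript
// A toy depth-first traversal with visitor callbacks; a simplified model, not @babel/traverse itself.
function traverse(node, visitor, parent = null, state = {}) {
  if (!node || typeof node.type !== 'string') return state;
  const visit = visitor[node.type];
  // The real path object carries many more helpers; here it only exposes node and parent.
  if (visit) visit({ node, parent }, state);
  for (const value of Object.values(node)) {
    const children = Array.isArray(value) ? value : [value];
    for (const child of children) {
      if (child && typeof child === 'object') traverse(child, visitor, node, state);
    }
  }
  return state;
}

const ast = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'a' },
      init: { type: 'NumericLiteral', value: 1 },
    }],
  }],
};

const visited = [];
const finalState = traverse(ast, {
  VariableDeclaration(path, state) {
    visited.push(path.node.kind);
    state.declarationCount = (state.declarationCount || 0) + 1;
  },
  Identifier(path) {
    visited.push(path.node.name);
  },
});
console.log(visited); // [ 'const', 'a' ]
```

The parent declaration is visited before the identifier nested inside it, which is the depth-first order, and the shared `state` object shows how data can travel between visitor calls.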

generate

Generate: print the transformed AST as target code and generate a source map

This stage is relatively simple. After the AST has been processed in the transform stage, the task here is to convert it back into code. The AST is traversed depth first, code is generated from the information in each node, and the corresponding source map is produced.
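As a rough illustration of printing a tree back into text, the sketch below handles only the few node types used in this article. The `generate` function here is a made-up miniature; @babel/generator covers every node type, formatting details and source maps.

```javascript
// Toy code generator: turns a small subset of AST nodes back into source text.
function generate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(child => generate(child)).join('\n');
    case 'VariableDeclaration':
      return `${node.kind} ${node.declarations.map(child => generate(child)).join(', ')};`;
    case 'VariableDeclarator':
      return node.init ? `${generate(node.id)} = ${generate(node.init)}` : generate(node.id);
    case 'Identifier':
      return node.name;
    case 'NumericLiteral':
      return String(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

const declaration = {
  type: 'VariableDeclaration',
  kind: 'var',
  declarations: [{
    type: 'VariableDeclarator',
    id: { type: 'Identifier', name: 'a' },
    init: { type: 'NumericLiteral', value: 1 },
  }],
};
console.log(generate(declaration)); // var a = 1;
```

Each case recursively prints its children first and then assembles its own text, so the output is built from the leaves upward.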

Classic case attempt

As the saying goes, the best way to learn is hands-on. Let's try a simple classic case: converting the ES6 const from the example above into an ES5 var.

Step 1: convert to AST

Generate the AST using @babel/parser
This is simple and the same as the case above. The ast variable holds the converted AST.

const parser = require('@babel/parser');
const ast = parser.parse('const a = 1');

Step 2: process AST

Use @babel/traverse to process the AST

By analyzing the generated AST structure, we can see that const is controlled by the kind field of the VariableDeclaration node. So can we simply rewrite kind to the var we want? Let's try.

const parser = require('@babel/parser');
const traverse = require('@babel/traverse').default

const ast = parser.parse('const a = 1');
traverse(ast, {
    VariableDeclaration(path, state) {
      // Access the actual AST node through path.node
      path.node.kind = 'var'
    }
});

At this point we have changed kind to var based on a conjecture, but we still don't know whether it actually works, so we need to convert the AST back into code to see the result.

Step 3: generate code

Use @babel/generator to generate code from the AST

const parser = require('@babel/parser');
const traverse = require('@babel/traverse').default
const generate = require('@babel/generator').default

const ast = parser.parse('const a = 1');
traverse(ast, {
    VariableDeclaration(path, state) {
      path.node.kind = 'var'
    }
});

// Put the processed AST into generate
const transformedCode = generate(ast).code
console.log(transformedCode)

Let's look at the result. Running the script prints:

var a = 1;

It works, and it is exactly the effect we wanted.

How to develop plug-ins

Through the classic case above, we have learned how Babel is used. But how do we write an actual plug-in?

In fact, the basic idea of plug-in development is the same as above. As a plug-in, however, we only need to care about the transform stage.

A plug-in exports either an object or a function that returns an object. Either way, we do the same work as before inside the visitor of that object. The function form receives several parameters: api exposes a series of methods provided by Babel, options holds the parameters passed when the plug-in is used, and dirname is the directory path during processing.

Taking the case above as an example:

module.exports = {
    visitor: {
        VariableDeclaration(path, state) {
          path.node.kind = 'var'
        }
    }
}
// Or functional form
module.exports = (api, options, dirname) => {
    return {
        visitor: {
          VariableDeclaration(path, state) {
            path.node.kind = 'var'
          }
        }
    }
}
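To see why the function form is useful, the sketch below reduces both export forms to a plain plug-in object and then calls the visitor by hand. `resolvePlugin` and `fakePath` are hypothetical stand-ins for this article; Babel's own loading logic is more involved.

```javascript
// Hypothetical helper (not part of Babel): reduce both export forms to a plain plug-in object.
function resolvePlugin(pluginExport, api, options, dirname) {
  return typeof pluginExport === 'function'
    ? pluginExport(api, options, dirname)
    : pluginExport;
}

const objectForm = {
  visitor: {
    VariableDeclaration(path) { path.node.kind = 'var'; },
  },
};

// The function form can read the options passed by the user before building the visitor.
const functionForm = (api, options, dirname) => ({
  visitor: {
    VariableDeclaration(path) { path.node.kind = options.kind || 'var'; },
  },
});

// A plain object standing in for the path that Babel would pass to the visitor.
const fakePath = { node: { type: 'VariableDeclaration', kind: 'const' } };
resolvePlugin(functionForm, {}, { kind: 'let' }, '/project').visitor.VariableDeclaration(fakePath);
console.log(fakePath.node.kind); // let
```

The object form is enough when the plug-in needs no configuration; the function form is the one to reach for once options come into play, as they do in the plug-in developed below.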

Plug in writing

Building on the prior knowledge above, we will now develop a Babel plug-in step by step. First, we define the core requirements of the plug-in:

  • A function call can be inserted automatically.
  • Dependencies needed by the inserted function are imported automatically.
  • Comments specify which function to insert and where to insert it. If no position is specified, the call is inserted on the first line of the function body.

The basic effects are shown as follows:

Before processing

// @inject:log declares that a call to log should be inserted into the function below
// @inject:log
function fn() {
    console.log(1)
    // @inject:code marks the line where the call is inserted
    // @inject:code
    console.log(2)
}

After processing

// log is imported from package xxx, which is configured in the plug-in parameters
import log from 'xxx'
function fn() {
    console.log(1)
    log()
    console.log(2)
}

Thought arrangement

With the general requirements understood, there is no need to rush. Let's first think about how to start and which problems need to be handled along the way.

  1. Find the functions marked with @inject, then check whether they contain an @inject:code position mark.
  2. Import the appropriate packages for all inserted functions.
  3. When a mark is matched, insert the function call. We also need to handle functions in their various forms, such as object methods, IIFEs and arrow functions.

Design plug-in parameters

To keep the plug-in flexible, we need to design sensible parameter rules. The plug-in accepts an object as its parameter.

  • Each key is the name of a function to insert.

  • kind is the import form. There are three: named, default and namespace. This design follows babel-helper-module-imports.

    • named corresponds to import { a } from "b"
    • default corresponds to import a from "b"
    • namespace corresponds to import * as a from "b"
  • require is the name of the package the function depends on.

For example, to insert a log method that is imported from the log4js package as a named import, the parameters are as follows.

// babel.config.js
module.exports = {
  plugins: [
    // Fill in the js file address of our plugin
    ['./babel-plugin-myplugin.js', {
      log: {
        // The import method is named
        kind: 'named',
        require: 'log4js'
      }
    }]
  ]
}
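The relationship between kind and the resulting import statement can be spelled out as a small mapping. This is only an illustration: `describeImport` is a hypothetical helper, and it assumes the third kind is spelled namespace, in line with the addNamespace function of @babel/helper-module-imports.

```javascript
// Illustrative mapping from the plug-in's kind option to the import statement it stands for.
// describeImport is a hypothetical helper for this article, not a Babel API.
function describeImport(name, { kind, require: source }) {
  switch (kind) {
    case 'named':
      return `import { ${name} } from "${source}";`;
    case 'default':
      return `import ${name} from "${source}";`;
    case 'namespace': // assumed spelling for the `import * as` form
      return `import * as ${name} from "${source}";`;
    default:
      throw new Error(`Unknown import kind: ${kind}`);
  }
}

const pluginOptions = { log: { kind: 'named', require: 'log4js' } };
for (const [name, option] of Object.entries(pluginOptions)) {
  console.log(describeImport(name, option)); // import { log } from "log4js";
}
```

Rejecting unknown kinds early is a design choice of this sketch; it makes a typo in the configuration fail loudly instead of silently skipping the import.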

start

Now that we know what to do and have designed the parameter rules, we can start.

First, we paste the code to be processed into https://astexplorer.net/ to generate its AST, which makes the structure easier to sort out. Then we move on to the actual coding.

We start with function declarations. Let's analyze the AST structure and how to handle it, using this demo:

// @inject:log
function fn() {
    console.log('fn')
}

In the generated AST, two attributes are key:

  • leadingComments holds the comments in front of the node. It contains one element, the @inject:log written in our demo.
  • body is the content of the function body. The console.log('fn') written in the demo lives here, and this is where our code insertion will operate.

So leadingComments tells us whether a function needs an insertion, and operating on body implements the insertion itself.

First, we have to find the FunctionDeclaration layer, because only this layer has the leadingComments attribute, and then iterate over those comments to match the function to insert. Next, we insert the matched function call into the body, but we need to pay attention to which level of body accepts insertions. The body of a FunctionDeclaration is not an array but a BlockStatement, which represents the function body and has a body array of its own. So the place we actually operate on is the body of the BlockStatement.
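Before involving Babel, the two steps just described (match the comment, then insert into the inner body array) can be sketched on plain objects. The node shapes below are simplified stand-ins, and the `code` field is invented purely so the result is easy to read.

```javascript
// Plain-object sketch of the matching and insertion steps; not real Babel nodes or paths.
function getInjectTypes(node, options) {
  return (node.leadingComments || [])
    .map(comment => comment.value.match(/@inject:(\w+)/))
    .filter(match => match && Object.keys(options).includes(match[1]))
    .map(match => match[1]);
}

function insertCall(functionNode, callName) {
  // FunctionDeclaration.body is a BlockStatement; its own body is the statement array.
  const statements = functionNode.body.body;
  statements.unshift({ type: 'ExpressionStatement', code: `${callName}()` });
}

const fnNode = {
  type: 'FunctionDeclaration',
  leadingComments: [{ type: 'CommentLine', value: ' @inject:log' }],
  body: {
    type: 'BlockStatement',
    body: [{ type: 'ExpressionStatement', code: "console.log('fn')" }],
  },
};

const options = { log: { kind: 'named', require: 'log4js' } };
for (const name of getInjectTypes(fnNode, options)) insertCall(fnNode, name);
console.log(fnNode.body.body.map(statement => statement.code));
```

Only names that appear both in a comment and in the plug-in options are returned, so an @inject comment for an unconfigured function is ignored.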

The code is as follows:

module.exports = (api, options, dirname) => {

  return {
    visitor: {
      // Match function declaration node
      FunctionDeclaration(path, state) {
        // path.get('body') returns the NodePath wrapping path.node.body
        const pathBody = path.get('body')
        if(path.node.leadingComments) {
          // Filter out all comments containing @inject:xxx
          const leadingComments = path.node.leadingComments.filter(comment => /\@inject:(\w+)/.test(comment.value) )
          leadingComments.forEach(comment => {
            const injectTypeMatchRes = comment.value.match(/\@inject:(\w+)/)
            // Match successful
            if( injectTypeMatchRes ) {
              // The first capture group is the xxx in @inject:xxx. Let's take it out
              const injectType = injectTypeMatchRes[1]
              // Get the keys of the plug-in parameters to see if xxx has been declared there
              const sourceModuleList = Object.keys(options)
              if( sourceModuleList.includes(injectType) ) {
                // Search for @code:xxx comments inside the body
                // Because comments cannot be visited directly, check the leadingComments attribute of each AST node in the body
                // Note the doubled backslash: inside a template literal a lone \s would become a plain "s"
                const codeIndex = pathBody.node.body.findIndex(block => block.leadingComments && block.leadingComments.some(comment => new RegExp(`@code:\\s?${injectType}`).test(comment.value) ))
                // If not declared, the default insertion position is the first line
                if( codeIndex === -1 ) {
                  // Operate on the body of the BlockStatement
                  pathBody.node.body.unshift(api.template.statement(`${injectType}()`)());
                }else {
                  pathBody.node.body.splice(codeIndex, 0, api.template.statement(`${injectType}()`)());
                }
              }
            }
          })
        }
      }
    }
  }
}

Let's look at the result. log was inserted successfully. Because @code:log is not used, it is inserted on the first line by default.

Next we try the @code:log marker, changing the demo code as follows:

// @inject:log
function fn() {
    console.log('fn')
    // @code:log
}

Running the code again shows that the call is indeed inserted at @code:log.
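One detail is worth checking whenever a pattern like this is built with new RegExp and a template literal: an escape such as \s must have its backslash doubled, otherwise the template literal turns it into a plain "s" before the regular expression is created. The short comparison below uses only standard JavaScript.

```javascript
const injectType = 'log';

// In a template literal, a lone \s is cooked to a plain "s", so the whitespace class is lost.
const singleBackslash = new RegExp(`@code:\s?${injectType}`);
// Doubling the backslash keeps \s in the pattern source.
const doubleBackslash = new RegExp(`@code:\\s?${injectType}`);

console.log(singleBackslash.source); // @code:s?log
console.log(doubleBackslash.source); // @code:\s?log
console.log(singleBackslash.test(' @code: log')); // false
console.log(doubleBackslash.test(' @code: log')); // true
```

Both patterns accept `@code:log`, so the difference only shows up once a space is written after the colon.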

With the function declaration case handled, someone may ask: what about arrow functions without a function body, such as:

// @inject:log
() => true

Is that a problem? No problem!

If there is no function body, we simply give it one. How?

First, let's analyze the AST structure. The outermost layer is an ExpressionStatement, and the ArrowFunctionExpression sits inside it. This structure is quite different from the one generated for a function declaration. There is no need to be confused by the multi-layer structure, though. We only need to find the information that is useful to us, namely the layer that carries leadingComments. Here the leadingComments are on the ExpressionStatement, so that is the node to look for.

With the structure analyzed, how do we tell whether there is a function body? Recall that the body of the function declaration above was a BlockStatement, whereas the body of this arrow function is a BooleanLiteral. So checking the type of body tells us whether there is a function body. Babel provides the type-check method path.isBlockStatement() for exactly this.

module.exports = (api, options, dirname) => {

  return {
    visitor: {
      ExpressionStatement(path, state) {
        // Access to ArrowFunctionExpression
        const expression = path.get('expression')
        const pathBody = expression.get('body')
        if(path.node.leadingComments) {
          // Keep only the comments that contain @inject:xxx
          const leadingComments = path.node.leadingComments.filter(comment => /\@inject:(\w+)/.test(comment.value) )

          leadingComments.forEach(comment => {
            const injectTypeMatchRes = comment.value.match(/\@inject:(\w+)/)
            // Match successful
            if( injectTypeMatchRes ) {
              // The first capture group is the xxx in @inject:xxx. Let's take it out
              const injectType = injectTypeMatchRes[1]
              // Get the keys of the plug-in parameters to see if xxx has been declared there


              const sourceModuleList = Object.keys(options)
              if( sourceModuleList.includes(injectType) ) {
                // Determine whether there is a function body
                if (pathBody.isBlockStatement()) {
                  // Search for @code:xxx comments inside the body
                  // Because comments cannot be visited directly, check the leadingComments attribute of each AST node in the body
                  const codeIndex = pathBody.node.body.findIndex(block => block.leadingComments && block.leadingComments.some(comment => new RegExp(`@code:\\s?${injectType}`).test(comment.value) ))
                  // If not declared, the default insertion position is the first line
                  if( codeIndex === -1 ) {
                    pathBody.node.body.unshift(api.template.statement(`${injectType}()`)());
                  }else {
                    pathBody.node.body.splice(codeIndex, 0, api.template.statement(`${injectType}()`)());
                  }
                }else {
                  // Case without function body
                  // Use the @babel/template API (exposed as api.template) to generate an AST from a code snippet
                  const ast = api.template.statement(`{${injectType}();return BODY;}`)({BODY: pathBody.node});
                  // Replace the original body
                  pathBody.replaceWith(ast);
                }
              }
            }
          })
        }
      }
    }
  }
}

As you can see, apart from checking for a function body, generating a new function body with the inserted code, and replacing the original node with the new AST, the logic is essentially the same as for the function declaration.
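The wrapping step can also be pictured on plain objects, without template or replaceWith. In this sketch `ensureBlockBody` is a hypothetical helper and the nodes are hand-written approximations of Babel's node shapes.

```javascript
// Simplified stand-in for the arrow-function case, using plain objects instead of Babel paths.
function ensureBlockBody(arrowNode, callName) {
  const call = {
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: { type: 'Identifier', name: callName },
      arguments: [],
    },
  };
  if (arrowNode.body.type === 'BlockStatement') {
    // Already has a function body: insert the call on the first line.
    arrowNode.body.body.unshift(call);
  } else {
    // Expression body: wrap it as `{ call(); return <original expression>; }`.
    arrowNode.body = {
      type: 'BlockStatement',
      body: [call, { type: 'ReturnStatement', argument: arrowNode.body }],
    };
  }
  return arrowNode;
}

const arrow = {
  type: 'ArrowFunctionExpression',
  params: [],
  body: { type: 'BooleanLiteral', value: true },
};
ensureBlockBody(arrow, 'log');
console.log(arrow.body.body.map(statement => statement.type));
// [ 'ExpressionStatement', 'ReturnStatement' ]
```

Keeping the original expression as the argument of the return statement is what preserves the function's return value after the rewrite.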

The usage of the @babel/template API used to generate the AST can be found in the @babel/template documentation.

The handling of functions in other forms is basically the same. In summary:

Analyze the AST and find the node that carries leadingComments -> find the node whose body accepts insertions -> write the insertion logic

There are many more cases to handle in practice, such as object properties, IIFEs and function expressions. The approach is the same, so they are not repeated here. I will post the full plug-in code at the bottom of the article.

Automatic introduction

The first requirement is done. Now for the second: how do we automatically import the packages we use? With log4js from the case above, the processed code should automatically gain:

import { log } from 'log4js'

Here we need to handle the following two situations:

  1. log has already been imported
  2. The variable name log is already taken

For situation 1, we need to check whether log4js has already been imported and whether log is imported from it in named form. For situation 2, we need to give log a unique alias and make sure this alias is also used when the call is inserted later. This means the automatic import logic has to run at the very beginning of processing the file.
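The decision between reusing an existing import and inventing a unique alias can be modeled as follows. This is a toy model: `existingImports` and `usedNames` are made-up inputs rather than Babel data structures, and the `_log`, `_log2` naming is only meant to resemble what Babel's generateUid produces.

```javascript
// Toy model of the two situations above; inputs are plain arrays and sets, not Babel scopes.
function resolveIdentifier(key, option, existingImports, usedNames) {
  // Situation 1: the binding is already imported from the same package, so reuse its local name.
  const existing = existingImports.find(
    item => item.source === option.require && item.kind === option.kind && item.imported === key
  );
  if (existing) return existing.local;

  // Situation 2: pick a name that is not taken yet (_log, _log2, _log3, ...).
  let candidate = `_${key}`;
  for (let i = 2; usedNames.has(candidate); i++) candidate = `_${key}${i}`;
  usedNames.add(candidate);
  return candidate;
}

const option = { kind: 'named', require: 'log4js' };
const imports = [{ source: 'log4js', kind: 'named', imported: 'log', local: 'myLog' }];

console.log(resolveIdentifier('log', option, imports, new Set())); // myLog
console.log(resolveIdentifier('log', option, [], new Set(['log', '_log']))); // _log2
```

Whatever name comes back is the one that must be used in the inserted call, which is why the name has to be resolved before any function is processed.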

We have a general idea, but how do we run the automatic import logic ahead of everything else? When in doubt, look at the AST structure. The outermost layer of the AST is the File node. It has a comments attribute that contains all the comments in the current file, which would let us work out the functions to insert and import them in advance. Looking further down, there is a Program node inside. It is visited before any other type of node, so this is where we implement the automatic import logic.

Tip: Babel provides the path.traverse method, which can be used to synchronously visit and process the child nodes under the current node.

The code is as follows:

const importModule = require('@babel/helper-module-imports');

// ......
{
    visitor: {
      Program(path, state) {
        // Deep-copy the options onto state; the original options object must not be mutated
        state.options = JSON.parse(JSON.stringify(options))

        path.traverse({
          // First, access the original import node to check whether the log has been imported
          ImportDeclaration (curPath) {
            const requirePath = curPath.get('source').node.value;
            // Traversal options
            Object.keys(state.options).forEach(key => {
              const option = state.options[key]
              // Check whether it is the same package
              if( option.require === requirePath ) {
                const specifiers = curPath.get('specifiers')
                specifiers.forEach(specifier => {

                  // If it is a default import
                  if( option.kind === 'default' ) {
                    // An ImportDefaultSpecifier has only a local name (no imported field),
                    // so any existing default import from this package can be reused
                    if( specifier.isImportDefaultSpecifier() ) {
                      // Store it in identifierName for subsequent calls
                      option.identifierName = specifier.get('local').toString()
                    }
                  }

                  // In case of a named import
                  if( option.kind === 'named' ) {
                    if( specifier.isImportSpecifier() ) {
                      // Found an existing named import of the same name
                      if( specifier.node.imported.name === key ) {
                        option.identifierName = specifier.get('local').toString()
                      }
                    }
                  }
                })
              }
            })
          }
        });


        // Process packages that are not imported
        Object.keys(state.options).forEach(key => {
          const option = state.options[key]
          // A package is configured through require, but no existing import was found
          if( option.require && !option.identifierName )  {

            // default form
            if( option.kind === 'default' ) {
              // Add default import
              // Generate a unique variable name, roughly like _log2
              option.identifierName = importModule.addDefault(path, option.require, {
                nameHint: path.scope.generateUid(key)
              }).name;
            }

            // named form
            if( option.kind === 'named' ) {
              option.identifierName = importModule.addNamed(path, key, option.require, {
                nameHint: path.scope.generateUid(key)
              }).name
            }
          }

          // If require is not passed, it will be considered as a global method and will not be imported
          if( !option.require ) {
            option.identifierName = key
          }
        })
    }
  }
}

In the Program node, we first make a copy of the received plug-in options and attach it to state. As mentioned earlier, state can be used to pass data between AST nodes. Then we visit the ImportDeclaration nodes, that is, the import statements under Program, to see whether log4js has already been imported. If so, the local name is recorded in the identifierName field. After visiting the import statements, the identifierName field tells us whether the import already exists. If not, we use @babel/helper-module-imports to create the import, with a unique variable name generated by the generateUid method provided by Babel.

Accordingly, the earlier code needs a small adjustment. Instead of directly using the method name extracted from the @inject:xxx comment, we should use identifierName. The key part of the code is modified as follows:

if( sourceModuleList.includes(injectType) ) {
  // Determine whether there is a function body
  if (pathBody.isBlockStatement()) {
    // Search for @code:xxx comments inside the body
    // Because comments cannot be visited directly, check the leadingComments attribute of each AST node in the body
    const codeIndex = pathBody.node.body.findIndex(block => block.leadingComments && block.leadingComments.some(comment => new RegExp(`@code:\\s?${injectType}`).test(comment.value) ))
    // If not declared, the default insertion position is the first line
    if( codeIndex === -1 ) {
      // Use identifierName 
      pathBody.node.body.unshift(api.template.statement(`${state.options[injectType].identifierName}()`)());
    }else {
      // Use identifierName 
      pathBody.node.body.splice(codeIndex, 0, api.template.statement(`${state.options[injectType].identifierName}()`)());
    }
  }else {
    // Case without function body
    // Use the @babel/template API (exposed as api.template) to generate an AST from a code snippet

    // Use identifierName 
    const ast = api.template.statement(`{${state.options[injectType].identifierName}();return BODY;}`)({BODY: pathBody.node});
    // Replace the original body
    pathBody.replaceWith(ast);
  }
}

The final effect: function calls are inserted automatically, and the packages they depend on are imported automatically.

ending

This article is a summary of what I learned from the booklet "babel plug-in customs clearance secrets". At first I was like most people who want to write Babel plug-ins but don't know where to start, so this article mainly follows the line of thought I took while writing the plug-in. I hope it gives you some ideas as well.

The full version supports inserting custom code fragments. The full code has been uploaded to GitHub and published to npm. Stars and issues are welcome.

Starring is a favor, not an obligation, ha ha.


Posted by mattclements on Sun, 26 Sep 2021 17:46:15 -0700