Compiler principles: writing a JS interpreter from scratch

Keywords: github

A while ago I wrote a JS interpreter that does not rely on any third-party libraries.

So far it supports most of the basic features of JS, except objects. Here I share and review the main implementation and technical details.

github.com/zuluoaaa/ma...

Parsing a quicksort function

0 Initialization

The input is a string of JS source code.

1 Lexical analysis

Loop over the input and turn the characters, piece by piece, into meaningful data structures (called tokens here).

var a = 1

The line of code above is parsed into the following four tokens:

[{token:"var"},{token:"ident",val:"a"},{token:"assign"},{token:"number",val:1}]

For the input above, converting the string to a token array is easy: read the characters one by one, skip the spaces, and emit a token for each value.
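
As a rough illustration of this step, here is a minimal tokenizer sketch for inputs of this kind. The function name `tokenize`, the `KEYWORDS` set, and the error handling are assumptions made for the example, not the project's code; the identifier token is named `ident` here.

```javascript
// Minimal tokenizer sketch. Everything here is hypothetical and only
// handles keywords, identifiers, integers and "=".
const KEYWORDS = new Set(["var"]);

function tokenize(input) {
    const tokens = [];
    let i = 0;
    while (i < input.length) {
        const ch = input[i];
        if (/\s/.test(ch)) {                      // skip whitespace
            i++;
            continue;
        }
        if (ch === "=") {
            tokens.push({token: "assign"});
            i++;
            continue;
        }
        if (/[0-9]/.test(ch)) {                   // read a whole integer
            const start = i;
            while (i < input.length && /[0-9]/.test(input[i])) i++;
            tokens.push({token: "number", val: Number(input.slice(start, i))});
            continue;
        }
        if (/[A-Za-z_$]/.test(ch)) {              // keyword or identifier
            const start = i;
            while (i < input.length && /[A-Za-z0-9_$]/.test(input[i])) i++;
            const word = input.slice(start, i);
            tokens.push(KEYWORDS.has(word) ? {token: word} : {token: "ident", val: word});
            continue;
        }
        throw new Error(`unexpected character: ${ch}`);
    }
    return tokens;
}

console.log(tokenize("var a = 1"));
```

The tokenizer does no syntax checking at all: `tokenize("= = var")` also succeeds, which matches the division of labour described in this section.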

However, this is just the beginning. Other inputs need more care: sometimes you have to read ahead before you can tell which token you are looking at.

For example, to distinguish == from =, or > from >=.

The solution is also simple. Here is the implementation code:

        case "<":
            next = nextChar();
            if (next === "=") {
                token.type = tokenTypes.T_LE; // two characters: <=
            } else {
                token.type = tokenTypes.T_LT; // one character: <
                putBack(next);                // un-read the lookahead character
            }
            break;

PS: In this step we don't care whether the syntax or semantics are correct. We are only responsible for producing tokens: a keyword becomes the corresponding keyword token, a number becomes a number token, and a name becomes an identifier token. Correctness is checked by the later stages.
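
The read-ahead-and-put-back idea used for `<` and `<=` can be sketched in isolation. The names `nextChar` and `putBack` follow the snippet above; `createScanner`, `oneOrTwo`, `scanOperator`, and the token type strings are assumptions made for this example only.

```javascript
// Sketch of one-character lookahead with put-back.
function createScanner(input) {
    let pos = 0;
    let pending = null;                 // a character that was put back, if any

    function nextChar() {
        if (pending !== null) {
            const c = pending;
            pending = null;
            return c;
        }
        return pos < input.length ? input[pos++] : null;   // null means end of input
    }

    function putBack(c) {
        pending = c;
    }

    // If the next character is `second`, consume it and return the long type;
    // otherwise un-read it and return the short type.
    function oneOrTwo(second, longType, shortType) {
        const next = nextChar();
        if (next === second) return longType;
        putBack(next);
        return shortType;
    }

    function scanOperator() {
        let c = nextChar();
        while (c === " ") c = nextChar();                   // skip spaces
        switch (c) {
            case "<": return oneOrTwo("=", "T_LE", "T_LT");
            case ">": return oneOrTwo("=", "T_GE", "T_GT");
            case "=": return oneOrTwo("=", "T_EQ", "T_ASSIGN");
            case null: return "T_EOF";
            default: throw new Error(`unexpected character: ${c}`);
        }
    }

    return {scanOperator};
}

const scanner = createScanner("<= < == =");
console.log(scanner.scanOperator(), scanner.scanOperator(),
            scanner.scanOperator(), scanner.scanOperator());
```

A single slot for the put-back character is enough here because no operator in this sketch needs more than one character of lookahead.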

2 Syntax analysis

This step converts the tokens into an AST (abstract syntax tree), that is, it turns the flat token array into a connected tree that reflects the grammatical structure.

This step is the focus of the whole interpreter.

Let's start with an example

var a = 1 + 3 * 2;

The corresponding syntax tree:

        =
       / \
      +   a
     / \
    *   1
   / \
  3   2

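The variable a is placed on the right rather than the left because the interpreter later walks the AST depth-first, evaluating the left subtree before the right one. The value of the expression is therefore already known when the assignment target is reached, which makes evaluation simpler.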
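
To make the ordering concrete, here is a small sketch that builds this tree by hand and evaluates it left subtree first, handing the left result down to the right subtree. The node shape `{op, left, right, value}`, the `lvalue` node type, and the `variables` object are assumptions for illustration, not the project's data structures.

```javascript
// Hand-built tree for: var a = 1 + 3 * 2
const tree = {
    op: "assign",
    left: {
        op: "add",
        left: {op: "mul", left: {op: "int", value: 3}, right: {op: "int", value: 2}},
        right: {op: "int", value: 1},
    },
    right: {op: "lvalue", value: "a"},   // assignment target sits on the right
};

const variables = {};

// `incoming` is the result of the sibling left subtree, if there was one.
function evaluate(node, incoming) {
    const l = node.left ? evaluate(node.left) : undefined;
    const r = node.right ? evaluate(node.right, l) : undefined;
    switch (node.op) {
        case "int": return node.value;
        case "mul": return l * r;
        case "add": return l + r;
        case "lvalue":
            // The value was computed before we got here, so just store it.
            variables[node.value] = incoming;
            return incoming;
        case "assign": return r;
        default: throw new Error(`unknown op: ${node.op}`);
    }
}

console.log(evaluate(tree), variables);
```

With the target on the left instead, the walker would reach `a` before the value exists and would need an extra pass or special case to complete the assignment.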

The above is just a simple example. The most complex part of this step is handling the various keywords: if, else, function, while, ...

Complex expressions can also be daunting, with operators such as &&, >, <=, +, -, *, /, ?:, ...

For instance

let a = 8+6*3-2*5 > 12*3+(1+5) ? 1 : 2;

Anyone unfamiliar with interpreters and compilers (as I was) tends to get stuck when trying to parse an expression like this into an AST, not knowing where to start and doubting themselves.

There are too many details to cover here, so I will only describe the core architecture. If you are interested in the implementation details, have a look at the project code.

The core of the main parsing function:

function statement(){
    let tree = null,left = null;
    while (true){
        let {token}  = gData;
        //Different keywords, jump to the corresponding parsing function
        switch (token.type) {
            case tokenTypes.T_VAR:
                left = varDeclaration();
                break;
            case tokenTypes.T_IF:
                left = ifStatement();
                break;
            case tokenTypes.T_WHILE:
                left = whileStatement();
                break;
            case tokenTypes.T_FUN:
                left = funStatement();
                break;
            case tokenTypes.T_RETURN:
                left = returnStatement();
                break;
            case tokenTypes.T_EOF://EOF means the whole input has been consumed, so stop parsing
                return tree;
            default:
                 left = normalStatement();
        }
        //Each iteration parses roughly one statement. Here the statements are glued together so the whole input becomes a single syntax tree.
        if(left !== null){
            if(tree === null){
                tree = left;
            }else{
                tree = new ASTNode().initTwoNode(ASTNodeTypes.T_GLUE,tree,left,null);
            }
        }
    }
}

function normalStatement() {
    let tree =  parseExpression(0);//Perform expression parsing to get the grammar tree
    semicolon();//Check for the terminating semicolon
    return tree;
}

...
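
The gluing of statements in `statement()` can be looked at on its own. The sketch below folds a list of already-parsed statement trees into one left-leaning tree of glue nodes and then walks it back in order. The plain-object node shape and the names `glue` and `flatten` are assumptions for this example; the project uses `ASTNode` and `ASTNodeTypes.T_GLUE`.

```javascript
// Fold a list of statement trees into one tree joined by "glue" nodes.
function glue(statements) {
    let tree = null;
    for (const stmt of statements) {
        if (stmt === null) continue;                 // nothing was parsed this round
        tree = tree === null ? stmt : {op: "glue", left: tree, right: stmt};
    }
    return tree;
}

// Visiting left before right recovers the original statement order.
function flatten(node, out = []) {
    if (node === null) return out;
    if (node.op === "glue") {
        flatten(node.left, out);
        flatten(node.right, out);
    } else {
        out.push(node.label);
    }
    return out;
}

const program = glue([
    {op: "stmt", label: "s1"},
    null,
    {op: "stmt", label: "s2"},
    {op: "stmt", label: "s3"},
]);
console.log(JSON.stringify(program));
console.log(flatten(program));
```

Because each new statement becomes the right child of a new glue node, a left-then-right walk executes statements in source order without any separate statement list.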

The above is statement parsing. Next comes the core part, expression parsing (for example, parsing an arithmetic expression such as 1+3*(6+1)).

First, define two maps, one of prefix parsers and one of infix parsers, which dispatch to the matching parse function by token type. To support a new symbol, we only add an entry to a map, without changing the parsing functions themselves.


const prefixParserMap = {
    [tokenTypes.T_IDENT]:identifier,//variable
    [tokenTypes.T_INT]:int,
    [tokenTypes.T_STRING]:str,
    [tokenTypes.T_LPT]:group,//parentheses
    [tokenTypes.T_LMBR]:array,//square brackets (array literal)
    [tokenTypes.T_ADD]:prefix.bind(null,tokenTypes.T_ADD),
    [tokenTypes.T_SUB]:prefix.bind(null,tokenTypes.T_SUB),
};

const infixParserMap = {
    [tokenTypes.T_LPT]:{parser:funCall,precedence:precedenceList.call},
    [tokenTypes.T_QST]:{parser:condition,precedence:precedenceList.condition},//Ternary expression

    [tokenTypes.T_ASSIGN]:{parser:assign,precedence:precedenceList.assign},//=Assignment

    [tokenTypes.T_AND]:{parser:infix.bind(null,precedenceList.and),precedence:precedenceList.and},
    [tokenTypes.T_OR]:{parser:infix.bind(null,precedenceList.and),precedence:precedenceList.and},
    [tokenTypes.T_ADD]:{parser:infix.bind(null,precedenceList.sum),precedence:precedenceList.sum},
    [tokenTypes.T_SUB]:{parser:infix.bind(null,precedenceList.sum),precedence:precedenceList.sum},

    [tokenTypes.T_GT]:{parser:infix.bind(null,precedenceList.compare),precedence:precedenceList.compare},
    [tokenTypes.T_GE]:{parser:infix.bind(null,precedenceList.compare),precedence:precedenceList.compare},
    ...
};


The core of expression parsing uses Pratt parsing (a form of recursive descent parsing):

function parseExpression(precedenceValue) {
    let {token} = gData;

    //Gets the prefix parsing function corresponding to the current token
    let prefixParser = prefixParserMap[token.type];

    if(!prefixParser){
        errPrint(`unknown token : ${token.value}(${token.type})`)
    }

    let left = prefixParser();//Execute parse function
    scan();
    if(token.type === tokenTypes.T_SEMI
        || token.type === tokenTypes.T_RPT
        || token.type === tokenTypes.T_EOF
        || token.type === tokenTypes.T_COMMA
        || token.type === tokenTypes.T_COL
        || token.type === tokenTypes.T_RMBR
    ){
        return left;
    }
    let value = getPrecedence();//Get the precedence of the current operator
    while (value>precedenceValue){
// While the current operator binds tighter than the precedence passed in, keep parsing.
// For example, in 1+6*7 the * has higher precedence than +, so 6*7 is parsed first and the result is handed back to the + level.
        let type = token.type;
        if(token.type === tokenTypes.T_SEMI
            || token.type === tokenTypes.T_RPT
            || token.type === tokenTypes.T_EOF
            || token.type === tokenTypes.T_COMMA
            || token.type === tokenTypes.T_RMBR
        ){
            return left;
        }
        let infix = infixParserMap[type]; 
        scan();
        left = infix.parser(left,type);

        if(infixParserMap[token.type]){
            value = getPrecedence();
        }
    }

    return left;
}

For more on Pratt parsing, I especially recommend this introduction: journal.stuffwithstuff.com/2011/03/19/...
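
For readers who want to run the idea end to end, here is a self-contained Pratt parser sketch for numbers, `+ - * /`, `>`, `?:` and parentheses, together with a tiny evaluator. It is not the project's implementation: the precedence table, the regex tokenizer, the node shapes, and all function names are assumptions chosen to keep the example short. Only the prefix/infix tables and the precedence loop follow the structure described above.

```javascript
// Higher number = binds tighter. Tokens not listed have precedence 0.
const PRECEDENCE = {"?": 1, ">": 2, "+": 3, "-": 3, "*": 4, "/": 4};
const precedenceOf = (token) => PRECEDENCE[token] || 0;

function parse(source) {
    // Crude tokenizer: integers and single-character symbols; anything else is ignored.
    const tokens = source.match(/\d+|[-+*\/>?:()]/g) || [];
    let pos = 0;
    const peek = () => tokens[pos];
    const next = () => tokens[pos++];
    const expect = (t) => {
        if (next() !== t) throw new Error(`expected ${t}`);
    };

    const prefixParsers = {
        "(": () => { const inner = expression(0); expect(")"); return inner; },
        "-": () => ({op: "neg", left: expression(PRECEDENCE["*"])}),
    };

    // Parsing the right side at the operator's own precedence gives left associativity.
    const binary = (left, op) => ({op, left, right: expression(PRECEDENCE[op])});

    const infixParsers = {
        "+": binary, "-": binary, "*": binary, "/": binary, ">": binary,
        "?": (cond) => {
            const then = expression(0);
            expect(":");
            // One level lower so that a ? b : c ? d : e nests to the right.
            return {op: "cond", cond, then, otherwise: expression(PRECEDENCE["?"] - 1)};
        },
    };

    function expression(minPrecedence) {
        const token = next();
        if (token === undefined) throw new Error("unexpected end of input");
        let left;
        if (/^\d+$/.test(token)) left = {op: "num", value: Number(token)};
        else if (prefixParsers[token]) left = prefixParsers[token]();
        else throw new Error(`unknown token: ${token}`);

        // Keep consuming infix operators while they bind tighter than the caller's level.
        while (precedenceOf(peek()) > minPrecedence) {
            const op = next();
            left = infixParsers[op](left, op);
        }
        return left;
    }

    const tree = expression(0);
    if (pos < tokens.length) throw new Error(`unexpected token: ${peek()}`);
    return tree;
}

function evaluate(node) {
    switch (node.op) {
        case "num": return node.value;
        case "neg": return -evaluate(node.left);
        case "cond": return evaluate(node.cond) ? evaluate(node.then) : evaluate(node.otherwise);
        case "+": return evaluate(node.left) + evaluate(node.right);
        case "-": return evaluate(node.left) - evaluate(node.right);
        case "*": return evaluate(node.left) * evaluate(node.right);
        case "/": return evaluate(node.left) / evaluate(node.right);
        case ">": return evaluate(node.left) > evaluate(node.right);
        default: throw new Error(`unknown op: ${node.op}`);
    }
}

console.log(evaluate(parse("8+6*3-2*5 > 12*3+(1+5) ? 1 : 2")));   // prints 2
```

The left side of the comparison is 16 and the right side is 42, so the condition is false and the result is 2. Supporting another binary operator in this sketch only takes one entry in `PRECEDENCE`, one in `infixParsers`, and one case in `evaluate`.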

3 Interpreting the AST

The AST from the previous step is traversed depth-first: for each node, the left subtree is evaluated, then the right subtree, and finally the node's own operation is applied. Evaluating node by node in this way gives a simple interpreter.

function interpretAST(astNode,result=null,scope){
    ...

    let leftResult,rightResult;
    if(astNode.left){
        leftResult = interpretAST(astNode.left,null,scope);
    }
    if(astNode.right){
        rightResult = interpretAST(astNode.right,leftResult,scope);
    }

    ...

    switch (astNode.op) {
        case ASTNodeTypes.T_VAR:
            scope.add(astNode.value);
            return;
        case ASTNodeTypes.T_INT:
            return astNode.value;
        case ASTNodeTypes.T_STRING:
            return astNode.value;
        case ASTNodeTypes.T_ADD:
            if(rightResult === null || typeof rightResult === "undefined"){
                return leftResult;
            }
            return leftResult + rightResult;
        case ASTNodeTypes.T_SUB:
            if(rightResult === null || typeof rightResult === "undefined"){
                return -leftResult;
            }
            return leftResult - rightResult;
        case ASTNodeTypes.T_MUL:
            return leftResult * rightResult;
        case ASTNodeTypes.T_DIV:
            return leftResult / rightResult;
        case ASTNodeTypes.T_ASSIGN:
            return rightResult;
        case ASTNodeTypes.T_IDENT:
            return findVar(astNode.value,scope);
        case ASTNodeTypes.T_GE:
            return  leftResult >= rightResult;
        case ASTNodeTypes.T_GT:
            return  leftResult > rightResult;
        ...
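
The interpreter above calls `scope.add(...)` and `findVar(name, scope)` without showing them. One plausible way to back those calls is a chain of scopes, each with a link to its parent. The sketch below is an assumption about how such a structure could look, not the project's code; the `Scope` class, its `set` method, and the `parent` field are hypothetical.

```javascript
// Hypothetical scope chain: each scope holds its own variables and a parent link.
class Scope {
    constructor(parent = null) {
        this.parent = parent;
        this.vars = new Map();
    }

    // Declare a variable in this scope.
    add(name) {
        this.vars.set(name, undefined);
    }

    // Assign to the nearest enclosing scope that declares the name.
    set(name, value) {
        for (let s = this; s !== null; s = s.parent) {
            if (s.vars.has(name)) {
                s.vars.set(name, value);
                return;
            }
        }
        throw new Error(`${name} is not defined`);
    }
}

// Look a name up, walking outwards from the innermost scope.
function findVar(name, scope) {
    for (let s = scope; s !== null; s = s.parent) {
        if (s.vars.has(name)) return s.vars.get(name);
    }
    throw new Error(`${name} is not defined`);
}

const globalScope = new Scope();
globalScope.add("a");
globalScope.set("a", 7);

const functionScope = new Scope(globalScope);   // e.g. created for a function call
functionScope.add("b");
functionScope.set("b", 1);

console.log(findVar("a", functionScope), findVar("b", functionScope));
```

Creating a fresh child scope per function call keeps local variables from leaking out, while lookups still reach variables declared in enclosing scopes.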

Finally

That's all. If you are interested, the complete implementation is in my GitHub repository: github.com/zuluoaaa/ma...

My writing may not be very good; if you find any mistakes, please point them out.

Posted by johnSTK on Sat, 02 May 2020 01:52:36 -0700

Programmer Group