Eloquent JavaScript Notes 11: A Programming Language

Keywords: Attribute REST TypeScript


1. Egg language example

do(define(x, 10),
   if(>(x, 5),
      print("large"),
      print("small")))
This is equivalent to the following JavaScript code:

x = 10;
if(x>5)
  print("large");
else
  print("small");

1. Parser

The parser reads a piece of code and converts it into a data structure that accurately reflects the structure of the program.

Everything in Egg is an expression, and expressions fall into four categories:

1. variable

Any sequence of characters that contains no whitespace and no characters with a special meaning in the syntax (parentheses, commas, or double quotes).

2. number

A sequence of digits.

3. string

A sequence of characters wrapped in double quotes. Escape characters are not supported.

4. application

Function calls and constructs such as if, while, >, +, -, *, /, and so on.

For example, in the first code sample above, each opening parenthesis marks one application, with its arguments inside the parentheses. do, define, if, >, and print are all applications.

Operators are applications too, and they are written in a different order than in JavaScript: >(x, 5) corresponds to x > 5. The two print expressions are arguments of if and serve as its two branches.

The parser converts each expression into an object, and all of the expressions together form an object tree. For example:

>(x, 5)

is parsed as:

{
  type: "apply",
  operator: {type: "word", name: ">"},
  args: [
    {type: "word", name: "x"},
    {type: "value", value: 5}
  ]
}

An expression object has one of three types:

1. "value"

A string or a number, with a value property that holds the string or number. For example, the constant 5 above.

2. "word" 

A variable (identifier), with a name property that holds the identifier's name. For example, x and > above.

3. "apply"

An application, with an operator property that holds the expression being applied and an args property that holds an array of argument expressions. For example, >(x, 5) above. Note that > by itself is a "word"; it becomes an application only when combined with parentheses and arguments.
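To make the three node types concrete, here is a small sketch that walks an expression object and turns it back into Egg source text. The helper name toSource is hypothetical and is not part of the book's code; the tree is the one for >(x, 5) shown above, written out by hand.

```javascript
// Hypothetical helper (not from the book): rebuild Egg source text
// from an expression object by handling each of the three node types.
function toSource(expr) {
    switch (expr.type) {
        case "value":
            // Strings get their double quotes back; numbers print as-is.
            return typeof expr.value == "string"
                ? '"' + expr.value + '"'
                : String(expr.value);
        case "word":
            return expr.name;
        case "apply":
            // The operator is itself an expression, so recurse into it too.
            return toSource(expr.operator) +
                "(" + expr.args.map(toSource).join(", ") + ")";
        default:
            throw new TypeError("Unknown expression type: " + expr.type);
    }
}

// The tree for >(x, 5), written out by hand.
var tree = {
    type: "apply",
    operator: {type: "word", name: ">"},
    args: [
        {type: "word", name: "x"},
        {type: "value", value: 5}
    ]
};

console.log(toSource(tree));
// → >(x, 5)
```

The switch has the same shape as the one in the evaluator later in these notes: any code that processes the tree needs one case per node type.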

1. Expression Tree


The figure shows the expression tree of the first piece of code. Each dot in the tree is an expression, and every dot that has child nodes is of the "apply" type.

1. Parser implementation code

function skipSpace(string) {
    var first = string.search(/\S/);
    if (first == -1)
        return "";
    return string.slice(first);
}

function parseExpression(program) {
    program = skipSpace(program);
    var match, expr;
    if (match = /^"([^"]*)"/.exec(program))
        expr = {type: "value", value: match[1]};
    else if (match = /^\d+\b/.exec(program))
        expr = {type: "value", value: Number(match[0])};
    else if (match = /^[^\s(),"]+/.exec(program))
        expr = {type: "word", name: match[0]};
    else
        throw new SyntaxError("Unexpected syntax: " + program);

    return parseApply(expr, program.slice(match[0].length));
}

function parseApply(expr, program) {
    program = skipSpace(program);
    if (program[0] != "(")
        return {expr: expr, rest: program};

    program = skipSpace(program.slice(1));
    expr = {type: "apply", operator:expr, args:[]};
    while (program[0] != ")") {
        var arg = parseExpression(program);
        expr.args.push(arg.expr);
        program = skipSpace(arg.rest);
        if (program[0] == ",")
            program = skipSpace(program.slice(1));
        else if (program[0] != ")")
            throw new SyntaxError("Expected ',' or ')'");
    }

    return parseApply(expr, program.slice(1));
}

function parse(program) {
    var result = parseExpression(program);
    if (skipSpace(result.rest).length > 0)
        throw new SyntaxError("Unexpected text after program");
    return result.expr;
}

console.log(parse("+(a, 10)"));

This is the complete parser code. Its core is parseExpression(), a recursive function that takes a string and returns an object holding the parsed expression and the rest of the string. Subexpressions are parsed with the same function. Every dot in the expression tree is produced by a call to parseExpression().
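The first step of parseExpression() is deciding which of three patterns matches the start of the input. The sketch below isolates only that step. The helper firstToken and the shape of its return value are made up for illustration; the three regular expressions are the ones used in the parser above.

```javascript
// Hypothetical helper: report which of parseExpression()'s three
// patterns matches the start of a program, and what text is left over.
function firstToken(program) {
    var patterns = [
        ["string", /^"([^"]*)"/],
        ["number", /^\d+\b/],
        ["word", /^[^\s(),"]+/]
    ];
    for (var i = 0; i < patterns.length; i++) {
        var match = patterns[i][1].exec(program);
        if (match)
            return {kind: patterns[i][0], text: match[0],
                    rest: program.slice(match[0].length)};
    }
    // Nothing matched: the real parser throws a SyntaxError here.
    return null;
}

console.log(firstToken("10)").kind);      // → number
console.log(firstToken("x, 5)").rest);    // → , 5)
console.log(firstToken("+(a, 10)").text); // → +
console.log(firstToken("(oops"));         // → null
```

The order of the patterns matters: the word pattern would also accept a run of digits, so the number pattern has to be tried first.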

The code is still somewhat difficult. I read it twice and typed it out once more before I understood it thoroughly. Fortunately it is fairly short, so twenty minutes is enough.


1. Evaluator

Once the code has been parsed into a syntax tree, it can be executed.

function evaluate(expr, env) {
    switch (expr.type) {
        case "value":
            return expr.value;
        case "word":
            if (expr.name in env)
                return env[expr.name];
            else
                throw new ReferenceError("Undefined variable: " + expr.name);
        case "apply":
            if (expr.operator.type == "word" &&
                    expr.operator.name in specialForms) {
                return specialForms[expr.operator.name](expr.args, env);
            }

            var op = evaluate(expr.operator, env);
            if (typeof op != "function") {
                throw new TypeError("Applying a non-function.");
            }

            return op.apply(null, expr.args.map(function (arg) {
                return evaluate(arg, env);
            }));
    }
}
The first parameter of evaluate() is expr, the parsed syntax tree. It can also be a subtree, as long as it is a valid expression. The second parameter is env, which holds all defined variables (similar to global variables in JavaScript), Egg's predefined operators (+, -, *, /, and so on), and the constants true and false.

The body of evaluate() handles the three expression types. The "value" and "word" cases are simple. The "apply" case is harder to follow, so look at specialForms and topEnv first:

var specialForms = Object.create(null);

specialForms["if"] = function(args, env) {
    if (args.length != 3)
        throw new SyntaxError("Bad number of args to if");

    if (evaluate(args[0], env) !== false)
        return evaluate(args[1], env);
    else
        return evaluate(args[2], env);
};

specialForms["while"] = function(args, env) {
    if (args.length != 2)
        throw new SyntaxError("Bad number of args to while");

    while (evaluate(args[0], env) !== false)
        evaluate(args[1], env);

    // Since undefined does not exist in Egg, we return false,
    // for lack of a meaningful result.
    return false;
};

specialForms["do"] = function(args, env) {
    var value = false;
    args.forEach(function(arg) {
        value = evaluate(arg, env);
    });
    return value;
};

specialForms["define"] = function(args, env) {
    if (args.length != 2 || args[0].type != "word")
        throw new SyntaxError("Bad use of define");
    var value = evaluate(args[1], env);
    env[args[0].name] = value;
    return value;
};

var topEnv = Object.create(null);

topEnv["true"] = true;
topEnv["false"] = false;

["+", "-", "*", "/", "==", "<", ">"].forEach(function(op) {
    topEnv[op] = new Function("a, b", "return a " + op + " b;");
});

topEnv["print"] = function(value) {
    console.log(value);
    return value;
};

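As you can see, every operator that an application can refer to, whether in specialForms or in topEnv, is a function. So in evaluate(), both specialForms[expr.operator.name] on line 13 and evaluate(expr.operator, env) on line 16 yield functions, which are then called with the arguments.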
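The operator table is easier to trust once you see what new Function produces. The sketch below repeats the loop from topEnv on a separate object (the name ops is made up for this example) and calls a few of the generated functions the same way evaluate() does.

```javascript
// Build one two-argument function per operator, as topEnv does.
// "ops" is a stand-in name used only in this example.
var ops = Object.create(null);
["+", "-", "*", "/", "==", "<", ">"].forEach(function (op) {
    // For op = "+", the generated function body is "return a + b;".
    ops[op] = new Function("a, b", "return a " + op + " b;");
});

console.log(typeof ops["*"]);   // → function
console.log(ops["+"](2, 3));    // → 5
console.log(ops[">"](10, 5));   // → true

// evaluate() calls the operator with apply and an array of
// already-evaluated arguments:
console.log(ops["-"].apply(null, [10, 4]));  // → 6
```

Because an operator is an ordinary value stored under a name, >(x, 5) is evaluated by exactly the same code path as print("large").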

At this point the parser and the evaluator are complete, and Egg code can be executed.

A run() function makes this more convenient:

function run() {
    var env = Object.create(topEnv);
    var program = Array.prototype.slice.call(arguments, 0).join("\n");
    return evaluate(parse(program), env);
}

run("do(define(total, 0),",
    "   define(count, 1),",
    "   while(<(count, 11),",
    "         do(define(total, +(total, count)),",
    "            define(count, +(count, 1)))),",
    "   print(total))");
// → 55

1. Function

Our Egg language doesn't support functions yet, so let's add them with a new keyword: fun.

specialForms["fun"] = function (args, env) {
    if (!args.length)
        throw new SyntaxError("Functions need a body");

    function name(expr) {
        if (expr.type != "word")
            throw new SyntaxError("Arg names must be words");

        return expr.name;
    }

    var argNames = args.slice(0, args.length - 1).map(name);
    var body = args[args.length - 1];

    return function () {
        if (arguments.length != argNames.length)
            throw new TypeError("Wrong number of arguments");

        var localEnv = Object.create(env);
        for (var i=0; i< arguments.length; i++) {
            localEnv[argNames[i]] = arguments[i];
        }

        return evaluate(body, localEnv);
    }
};

I can only admire this; I would never have come up with such an implementation myself. It is worth memorizing.

1. Compilation

The parser and evaluator we just implemented together form an interpreter. A "compilation" step can be added between parsing and execution, converting the parsed syntax tree into machine code. This can make execution considerably more efficient. Egg could also be compiled into JavaScript code, which can likewise speed up execution. Translating one language into JavaScript is also what TypeScript's transpiler does.
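As a rough illustration of compiling Egg to JavaScript, the sketch below turns a syntax tree into JavaScript source text. It is not from the book and it is far from complete: it assumes only values, words, and applications, it writes the binary operators in infix form, and it assumes every Egg word is also a valid JavaScript identifier. Special forms such as if, do, and define would each need a translation of their own. The names compile and binaryOps are made up for this example.

```javascript
// Assumption: these are the only operators that get infix treatment.
var binaryOps = ["+", "-", "*", "/", "==", "<", ">"];

// Hypothetical compiler sketch: syntax tree in, JavaScript source out.
function compile(expr) {
    switch (expr.type) {
        case "value":
            // JSON.stringify quotes strings and leaves numbers alone.
            return JSON.stringify(expr.value);
        case "word":
            return expr.name;
        case "apply":
            var args = expr.args.map(compile);
            var op = expr.operator;
            if (op.type == "word" && args.length == 2 &&
                    binaryOps.indexOf(op.name) != -1)
                return "(" + args[0] + " " + op.name + " " + args[1] + ")";
            // Anything else becomes an ordinary function call.
            return compile(op) + "(" + args.join(", ") + ")";
        default:
            throw new TypeError("Unknown expression type: " + expr.type);
    }
}

// The tree for +(x, *(2, 3)), written out by hand.
var tree = {
    type: "apply",
    operator: {type: "word", name: "+"},
    args: [
        {type: "word", name: "x"},
        {type: "apply",
         operator: {type: "word", name: "*"},
         args: [{type: "value", value: 2}, {type: "value", value: 3}]}
    ]
};

var source = compile(tree);
console.log(source);        // → (x + (2 * 3))

// The generated text can be handed to the JavaScript engine once
// and then called many times without walking the tree again.
var compiled = new Function("x", "return " + source + ";");
console.log(compiled(4));   // → 10
```

The gain comes from doing the tree walk once, ahead of time, instead of on every evaluation.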

1. Exercise: Arrays

An array in Egg is written as array(1, 2, 5).

Implement array and two functions: length(array) and element(array, i).

Analysis:

array is followed by parentheses, so it is an "apply" expression. The parser does not need to be modified.

The evaluator handles "apply" expressions by calling a function. That function could be defined either in specialForms or in the environment; both would work. Special syntactic constructs (if, do, while, and so on) live in specialForms, so array belongs in topEnv. The print function is already in topEnv, so it makes sense to put length and element there too.

JavaScript already has Array, so all that is needed is to convert the function's arguments object (discussed in Chapter 4) into a JavaScript array.

Note that arguments is not itself an Array. It is array-like, though, and can be converted into a real array with Array.prototype.slice.call(arguments, 0). This is a common trick. JavaScript has many such tricks, and similar ones appear in later chapters.
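A short demonstration of the conversion. The function names collect and collectRest are made up for this example; the second one uses rest parameters, which are an ES2015 feature and not something the book's code relies on.

```javascript
function collect() {
    // arguments has a length and numeric indexes, but it is not an Array.
    console.log(Array.isArray(arguments));   // → false
    var list = Array.prototype.slice.call(arguments, 0);
    console.log(Array.isArray(list));        // → true
    return list;
}

console.log(collect(1, 2, 5).length);        // → 3

// In ES2015 and later, a rest parameter gives a real array directly.
function collectRest(...items) {
    return items;
}

console.log(collectRest(1, 2, 5).join("+")); // → 1+2+5
```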

topEnv["array"] = function () {
    return Array.prototype.slice.call(arguments, 0);
};

topEnv["length"] = function (array) {
    return array.length;
};

topEnv["element"] = function (array, i) {
    return array[i];
};
After adding array, run the following Egg code and check the output:
run("do(define(sum, fun(array,",
    "     do(define(i, 0),",
    "        define(sum, 0),",
    "        while(<(i, length(array)),",
    "          do(define(sum, +(sum, element(array, i))),",
    "             define(i, +(i, 1)))),",
    "        sum))),",
    "   print(sum(array(1, 2, 3))))");
// → 6

1. Exercise: Closure

run("do(define(f, fun(a, fun(b, +(a, b)))),",
    "   print(f(4)(5)))");
Given the definition of fun above, this code corresponds to the following JavaScript:

var f = function(a) {
    return function(b) {
        return a + b;
    }
};
console.log(f(4)(5));
A closure means that the inner function can access the variables of the outer function, such as a here. How does this work? Look at this part of fun's implementation:

        var localEnv = Object.create(env);
        for (var i=0; i< arguments.length; i++) {
            localEnv[argNames[i]] = arguments[i];
        }

        return evaluate(body, localEnv);

Note localEnv: through its prototype it can reach the env passed in from the enclosing scope, and it also holds the function's own arguments. No matter how deeply functions are nested, these environments stay linked together.

localEnv = Object.create(env) makes env the prototype of localEnv. Variables are stored as properties of environment objects, so defining a local variable shadows an outer variable of the same name instead of overwriting it. Deeply nested functions produce a multi-level prototype chain whose levels do not interfere with each other.
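The effect of Object.create(env) can be seen without any Egg code at all. In this sketch the object names globalEnv and localEnv stand in for an outer and an inner environment.

```javascript
var globalEnv = Object.create(null);
globalEnv.x = 1;
globalEnv.y = 2;

// A function call creates a new environment whose prototype is the outer one.
var localEnv = Object.create(globalEnv);
localEnv.x = 100;   // like define(x, 100) inside the function

console.log(localEnv.x);       // → 100 (own property shadows the outer x)
console.log(localEnv.y);       // → 2 (found through the prototype chain)
console.log(globalEnv.x);      // → 1 (the outer variable is untouched)

// The in operator also follows the prototype chain, which is why the
// "word" case of evaluate() can find variables from outer scopes.
console.log("y" in localEnv);  // → true
```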

Do closures in JavaScript work on the same principle?

1. Exercise: Comments

Modify the skipSpace() function so that Egg supports single-line comments that begin with #.

Original:

function skipSpace(string) {
  var first = string.search(/\S/);
  if (first == -1) return "";
  return string.slice(first);
}

Revised:

function skipSpace(string) {
  var skippable = string.match(/^(\s|#.*)*/);
  return string.slice(skippable[0].length);
}

Note that . in a regular expression does not match the newline character \n, so #.* stops at the end of the line.
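A few calls show how the revised pattern behaves. The function is repeated here so that the example runs on its own; the input strings are made up.

```javascript
function skipSpace(string) {
    // Repeatedly skip either one whitespace character or one comment
    // (a # followed by everything up to, but not including, the newline).
    var skippable = string.match(/^(\s|#.*)*/);
    return string.slice(skippable[0].length);
}

// Whitespace and several comment lines are skipped in one go.
console.log(skipSpace("  # first\n  # second\n  x"));  // → x

// Only the start of the string is skipped, so a comment that follows
// real code is left for the next call to skipSpace.
console.log(skipSpace("x # trailing"));                // → x # trailing

// The dot stops at the newline, so a comment never swallows the next line.
console.log(/#.*/.exec("# one\ntwo")[0]);              // → # one
```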

Try it:

console.log(parse("# hello\nx"));
// → {type: "word", name: "x"}

console.log(parse("a # one\n   # two\n()"));
// → {type: "apply",
//    operator: {type: "word", name: "a"},
//    args: []}


1. Exercise: Fixing Scope

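define always creates the variable in the current scope, so it cannot update a variable that lives in an outer scope. Add a set form that assigns a new value to an existing variable: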

specialForms["set"] = function(args, env) {
  if (args.length != 2 || args[0].type != "word")
    throw new SyntaxError("Bad use of set");
  var varName = args[0].name;
  var value = evaluate(args[1], env);

  for (var scope = env; scope; scope = Object.getPrototypeOf(scope)) {
    if (Object.prototype.hasOwnProperty.call(scope, varName)) {
      scope[varName] = value;
      return value;
    }
  }
  throw new ReferenceError("Setting undefined variable " + varName);
};

Usage:

run("do(define(x, 4),",
    "   define(setx, fun(val, set(x, val))),",
    "   setx(50),",
    "   print(x))");
// → 50
run("set(quux, true)");
// → Some kind of ReferenceError

This exercise revisits how variable scope is implemented.
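The difference between define and set comes down to which object in the prototype chain receives the property. The sketch below shows both behaviours on plain objects. The helper findOwner is hypothetical; it mirrors the loop inside set. Object.prototype.hasOwnProperty.call is needed because environments created with Object.create(null) do not inherit a hasOwnProperty method of their own.

```javascript
var outer = Object.create(null);
outer.x = 4;
var inner = Object.create(outer);

// Plain assignment creates an own property on inner (what define does),
// so the outer variable keeps its old value.
inner.x = 50;
console.log(outer.x);   // → 4

delete inner.x;

// Hypothetical helper mirroring the loop in set: walk up the prototype
// chain and return the scope that actually owns the variable.
function findOwner(scope, name) {
    for (; scope; scope = Object.getPrototypeOf(scope)) {
        if (Object.prototype.hasOwnProperty.call(scope, name))
            return scope;
    }
    return null;
}

// Assigning on the owning scope updates the outer variable (what set does).
findOwner(inner, "x").x = 50;
console.log(outer.x);   // → 50

// An undefined variable has no owner; set throws a ReferenceError here.
console.log(findOwner(inner, "quux"));   // → null
```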


Posted by stylezeca on Thu, 13 Dec 2018 07:06:06 -0800