Ruby 2.x Source Code Study: Parsing and Intermediate Code Generation for Method Definitions

Keywords: Ruby

Preface

This article analyzes how Ruby parses a top-level method definition. It assumes the reader has basic knowledge of compiler principles and is familiar with parser generators such as yacc and bison.

BNF syntax

parse.y contains the entire grammar of the Ruby language (the file has more than 10,000 lines). Below are the fragments related to function definitions.
We focus on the grammar rules themselves and ignore the yacc actions attached to them (the same applies below).

// parse.y

primary : k_def fname f_arglist bodystmt k_end
  • k_def, keyword def

  • fname, function name

  • f_arglist, list of function parameters

  • bodystmt, function internal statement block

  • k_end, keyword end
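
One way to see the parts of this rule from the Ruby side is to ask Ripper, the standard library's interface to Ruby's parser, for the parse tree of a small method. This is only an illustrative sketch: the helper name def_node_for and the sample method are made up here, and the exact shape of Ripper's nested arrays may differ between Ruby versions.

```ruby
require "ripper"

# Hypothetical helper: parse a snippet and return the node of its first statement.
def def_node_for(source)
  program = Ripper.sexp(source) # [:program, [statement, ...]]
  program[1][0]
end

node = def_node_for("def greet(name)\n  name\nend\n")

p node[0]    # node type produced for k_def ... k_end
p node[1]    # fname
p node[2]    # f_arglist
p node[3][0] # tag of the bodystmt node
```

The keywords def and end do not appear as separate children; they are implied by the node type.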

f_arglist

From the name, you can see that f_arglist represents a list of function parameters. Here is the f_arglist grammar definition.

// parse.y

f_arglist    : '(' f_args rparen
        |  f_args term
        ;

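A Ruby function definition may omit the parentheses around its parameter list. In that case the list is ended by a term (a newline or a semicolon) instead of a right parenthesis.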
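
The two alternatives of f_arglist correspond to two ways of writing the same method. The sketch below uses made-up method names (add_paren, add_bare) to show that both forms declare an identical parameter list.

```ruby
# Matches the first alternative: '(' f_args rparen
def add_paren(a, b)
  a + b
end

# Matches the second alternative: f_args term (the newline acts as the term)
def add_bare a, b
  a + b
end

puts add_paren(1, 2)
puts add_bare(1, 2)
# Both definitions end up with the same parameter list.
puts method(:add_paren).parameters == method(:add_bare).parameters
```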

f_args

Ruby supports a wide variety of unusual ways to declare function parameters. The grammar of f_args has to cover many combinations, so we start with the simplest:

// parse.y

f_args : f_arg opt_args_tail

f_arg : f_arg_item | f_arg ',' f_arg_item

f_arg_item : f_arg_asgn | tLPAREN  f_margs rparen

f_arg_asgn : f_norm_arg

f_norm_arg : f_bad_arg | tIDENTIFIER 

Parameters are separated by commas. Ignoring the other kinds of parameter, each parameter in this simplest case reduces to a tIDENTIFIER (an identifier).
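
The two alternatives of f_arg_item can be seen in ordinary Ruby code. In this sketch the method names sum_two and sum_pair_and are invented for illustration: the first uses plain identifiers only, the second uses a parenthesized group as its first parameter.

```ruby
# Plain identifiers: each parameter follows
# f_arg_item -> f_arg_asgn -> f_norm_arg -> tIDENTIFIER
def sum_two(x, y)
  x + y
end

# Parenthesized group: the first parameter matches tLPAREN f_margs rparen
# and destructures the array that is passed in.
def sum_pair_and((a, b), c)
  a + b + c
end

puts sum_two(1, 2)
puts sum_pair_and([1, 2], 3)
p method(:sum_two).parameters
```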

Scope

The parser context

Parsing is a complicated and tedious process. Ruby uses the parser_params structure to represent the parser's context: it holds the state variables used during parsing, including those of the lexer. The fields related to scope are listed below.

// parse.y or parse.c

struct parser_params {
    ...
    struct local_vars *lvtbl;
    ...
};

struct local_vars {
    struct vtable *args;
    struct vtable *vars;
    struct vtable *used;

    struct local_vars *prev;
    stack_type cmdargs;
};

struct vtable {
    ID *tbl;
    int pos;
    int capa;
    struct vtable *prev;
};

The local_vars structure stores the parameters (args) and local variables (vars) of one scope. Its prev pointer links to the local_vars of the enclosing scope, so the scopes form a stack.
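
To make the linked structure easier to picture, here is a small Ruby model of it. This is not how parse.y is written (that code is C); LocalVars and ScopeStack are hypothetical names, the vtables are replaced by plain arrays, and the pop operation is included only for symmetry.

```ruby
# Hypothetical model of struct local_vars: two tables plus a link to the enclosing scope.
LocalVars = Struct.new(:args, :vars, :prev)

class ScopeStack
  attr_reader :current

  def initialize
    @current = nil
  end

  # Create a new scope whose prev points at the old top, then make it current.
  def push
    @current = LocalVars.new([], [], @current)
  end

  # Drop the current scope and return to the enclosing one.
  def pop
    @current = @current.prev
  end

  # Count the scopes by walking the prev links.
  def depth
    count = 0
    scope = @current
    while scope
      count += 1
      scope = scope.prev
    end
    count
  end
end

stack = ScopeStack.new
stack.push
stack.current.args << :a
stack.push
puts stack.depth
stack.pop
p stack.current.args
```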

Scope chain (stack)

Now let's look at the yacc actions in the function definition rule:

// parse.y

k_def fname
    {
        local_push(0);
        $<id>$ = current_arg;
        current_arg = 0;
    }
    {
        $<num>$ = in_def;
        in_def = 1;
    }
f_arglist
bodystmt
k_end

local_push creates a new scope and pushes it onto the scope stack:

// parse.y or parse.c

static void local_push_gen(struct parser_params*,int);
#define local_push(top) local_push_gen(parser,(top))

#define lvtbl            (parser->lvtbl)

static void
local_push_gen(struct parser_params *parser, int inherit_dvars)
{
    struct local_vars *local;

    // Allocate the new scope
    local = ALLOC(struct local_vars);
    // Link the new scope to the scope chain
    local->prev = lvtbl;
    // Allocate the parameter and local variable tables
    local->args = vtable_alloc(0);
    local->vars = vtable_alloc(inherit_dvars ? DVARS_INHERIT : DVARS_TOPSCOPE);
    local->used = !(inherit_dvars &&
            (ifndef_ripper(compile_for_eval || e_option_supplied(parser))+0)) &&
    RTEST(ruby_verbose) ? vtable_alloc(0) : 0;
# if WARN_PAST_SCOPE
    local->past = 0;
# endif
    local->cmdargs = cmdarg_stack;
    CMDARG_SET(0);
    // Update the current scope (note: lvtbl is a macro for parser->lvtbl)
    lvtbl = local;
}
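
The effect of starting a fresh scope at def is visible from Ruby code. The sketch below, with made-up names (outer_value, def_view, block_view), contrasts a method body, which cannot see the locals around its definition, with a block, which can.

```ruby
outer_value = 10
puts outer_value

# A def starts from a fresh scope: outer_value is not in its tables,
# so the name does not resolve to a local variable here.
def def_view
  defined?(outer_value)
end

# A block keeps access to the locals of the scope that encloses it.
def block_view
  outer_value = 10
  [1].map { defined?(outer_value) }.first
end

p def_view
p block_view
```

Here def_view returns nil, while block_view returns "local-variable".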

Parameters

We already know that when a function is defined, the parser creates a new local_vars and adds it to the scope chain. How are the function's parameters added to that scope? Let's look at one of the grammar rules for parameters:

// parse.y

f_arg_asgn    : f_norm_arg
{
    ID id = get_id($1);
    arg_var(id);
    current_arg = id;
    $$ = $1;
}
;

The answer lies in the arg_var function:

// parse.y or parse.c

static void arg_var_gen(struct parser_params*, ID);
#define arg_var(id) arg_var_gen(parser, (id))

static void arg_var_gen(struct parser_params *parser, ID id)
{
    vtable_add(lvtbl->args, id);
}

static void vtable_add(struct vtable *tbl, ID id)
{
    if (!POINTER_P(tbl)) {
        rb_bug("vtable_add: vtable is not allocated (%p)", (void *)tbl);
    }
    if (VTBL_DEBUG) printf("vtable_add: %p, %"PRIsVALUE"\n", (void *)tbl, rb_id2str(id));

    // tbl is full: double its capacity
    if (tbl->pos == tbl->capa) {
        tbl->capa = tbl->capa * 2;
        REALLOC_N(tbl->tbl, ID, tbl->capa);
    }
    // Append id to tbl
    tbl->tbl[tbl->pos++] = id;
}
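
Because parameters are recorded in the scope while the definition is parsed, they already count as local variables on entry to the method. A minimal way to observe this from Ruby, using an invented method name locals_at_entry:

```ruby
# Nothing is assigned in the body, yet both parameters are already locals.
def locals_at_entry(first, second)
  binding.local_variables
end

p locals_at_entry(1, 2)
```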

Local variables

We have seen how function parameters are added to a scope. What about local variables? Is there an arg_var-like call for them? First consider when a local variable is created: a dynamic scripting language like Ruby has no variable declaration syntax of the kind C has, so a local variable is created automatically when it is first assigned. To check this conjecture, let's look at a grammar rule:

// parse.y

lhs : user_variable
{
    $$ = assignable($1, 0);
    /*%%%*/
    if (!$$) $$ = NEW_BEGIN(0);
}

The assignable function is fairly complex. The excerpt below keeps only the parts related to defining local variables:

// parse.y or parse.c

static NODE* assignable_gen(struct parser_params *parser, ID id, NODE *val) {
    switch (id_type(id)) {
        case ID_LOCAL:
            if (dyna_in_block()) {
                if (dvar_curr(id)) {
                    ...
                } else if (dvar_defined(id)) {
                    ...
                } else if (local_id(id)) {
                    ...
                } else {
                    dyna_var(id);
                }
            } else {
                if (!local_id(id)) {
                    local_var(id);
                }
            }
    }
}

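The handling depends on where the assignment appears. Inside a block (dyna_in_block), a name that is not yet known is registered with dyna_var. Otherwise, in an ordinary local scope, the name is registered with local_var unless local_id already finds it.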
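
Two consequences of registering variables at parse time can be checked in plain Ruby. The method names below are invented for this sketch. The first method assigns a variable in a branch that never runs; the second assigns a variable inside a block.

```ruby
def assigned_in_dead_branch
  if false
    value = 42 # never runs, but the parser has already seen the assignment
  end
  [defined?(value), value]
end

def assigned_in_block
  [1].each { inner = 1 } # inner belongs to the block's own scope
  defined?(inner)
end

p assigned_in_dead_branch
p assigned_in_block
```

The first call returns ["local-variable", nil]: the variable exists even though the assignment was never executed. The second returns nil, because a variable first assigned inside a block is not added to the enclosing method's scope.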

Generate AST

Generating YARV Virtual Machine Instructions

Posted by Piba on Sat, 06 Apr 2019 15:51:30 -0700