Preface
This article analyses how Ruby parses a top-level method definition. It assumes that the reader has a basic knowledge of compiler principles and knows how to use parser generators such as yacc and bison.
BNF syntax
parse.y contains the entire grammar of the Ruby language. Below are the fragments related to functions (the parse.y file has more than 10,000 lines).
We focus on the grammar rules for function definitions and omit the yacc semantic actions, both here and in the fragments that follow.
// parse.y
primary : k_def fname f_arglist bodystmt k_end
k_def, keyword def
fname, function name
f_arglist, list of function parameters
bodystmt, the statements that make up the function body
k_end, keyword end
f_arglist
From the name, you can see that f_arglist represents a list of function parameters. Here is the f_arglist grammar definition.
// parse.y
f_arglist : '(' f_args rparen
          | f_args term
          ;
The second alternative (f_args term) is why a Ruby function definition may omit the parentheses around its parameter list.
f_args
Ruby supports a wide variety of unusual ways to pass function parameters. The grammar for f_args covers many combinations of them; we start with the simplest:
// parse.y
f_args : f_arg opt_args_tail

f_arg : f_arg_item
      | f_arg ',' f_arg_item

f_arg_item : f_arg_asgn
           | tLPAREN f_margs rparen

f_arg_asgn : f_norm_arg

f_norm_arg : f_bad_arg
           | tIDENTIFIER
Function parameters are separated by commas. Setting aside parenthesized parameters such as (x), which go through the tLPAREN f_margs rparen alternative, each parameter is a tIDENTIFIER (identifier).
Scope
The parsing context
Parsing is a complicated and tedious process. Ruby uses the parser_params structure to represent the parsing context. It holds the state variables needed during parsing, including those for lexical analysis. The fields related to scope are listed below.
// parse.y or parse.c
struct parser_params {
    ...
    struct local_vars *lvtbl;
    ...
};

struct local_vars {
    struct vtable *args;
    struct vtable *vars;
    struct vtable *used;
    struct local_vars *prev;
    stack_type cmdargs;
};

struct vtable {
    ID *tbl;
    int pos;
    int capa;
    struct vtable *prev;
};
Each local_vars structure stores the parameters and local variables of one scope. Its prev pointer links it to the local_vars of the enclosing scope, so the scopes together form a stack.
Scope chain (stack)
Now let's look at the yacc semantic actions attached to the function definition rule:
// parse.y
k_def fname
    {
        local_push(0);
        $<id>$ = current_arg;
        current_arg = 0;
    }
    {
        $<num>$ = in_def;
        in_def = 1;
    }
  f_arglist
  bodystmt
  k_end
local_push creates a new scope and pushes it onto the scope stack:
// parse.y or parse.c
static void local_push_gen(struct parser_params*,int);
#define local_push(top) local_push_gen(parser,(top))
#define lvtbl (parser->lvtbl)

static void
local_push_gen(struct parser_params *parser, int inherit_dvars)
{
    struct local_vars *local;

    // Allocate memory for the new scope
    local = ALLOC(struct local_vars);
    // Link the new scope to the scope chain
    local->prev = lvtbl;
    // Allocate the parameter and variable tables
    local->args = vtable_alloc(0);
    local->vars = vtable_alloc(inherit_dvars ? DVARS_INHERIT : DVARS_TOPSCOPE);
    local->used = !(inherit_dvars &&
                    (ifndef_ripper(compile_for_eval || e_option_supplied(parser))+0)) &&
        RTEST(ruby_verbose) ? vtable_alloc(0) : 0;
# if WARN_PAST_SCOPE
    local->past = 0;
# endif
    local->cmdargs = cmdarg_stack;
    CMDARG_SET(0);
    // Make the new scope the current one. Note: lvtbl is a macro for parser->lvtbl.
    lvtbl = local;
}
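The linked-stack idea behind local_push can be shown on its own. The following is a minimal sketch, not Ruby's code: struct scope, scope_push, scope_pop and scope_demo are hypothetical names, the depth field exists only for illustration, and the pop operation is added here to show how the prev link restores the enclosing scope.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct local_vars. */
struct scope {
    int depth;            /* nesting level, for illustration only */
    struct scope *prev;   /* enclosing scope, or NULL at the top */
};

/* Push a new scope on top of *current (the role lvtbl plays above). */
static void scope_push(struct scope **current)
{
    struct scope *s = malloc(sizeof *s);
    assert(s != NULL);
    s->depth = (*current != NULL) ? (*current)->depth + 1 : 0;
    s->prev = *current;   /* link to the enclosing scope */
    *current = s;         /* the new scope becomes the current one */
}

/* Pop the current scope and make its enclosing scope current again. */
static void scope_pop(struct scope **current)
{
    struct scope *s = *current;
    assert(s != NULL);
    *current = s->prev;
    free(s);
}

/* Push two scopes, pop one, and report the depth of the current scope. */
static int scope_demo(void)
{
    struct scope *current = NULL;
    int depth;

    scope_push(&current);     /* top-level scope, depth 0 */
    scope_push(&current);     /* method body scope, depth 1 */
    scope_pop(&current);      /* leave the method body */
    depth = current->depth;   /* the top-level scope is current again */
    scope_pop(&current);
    return depth;
}
```

Because each scope only remembers its predecessor, entering and leaving a definition are both constant-time pointer updates.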
Parameters
We already know that when a function is defined, the parser creates a new local_vars and adds it to the scope chain. How are the function parameters added to that scope? Let's look at a grammar rule for function parameters:
// parse.y
f_arg_asgn : f_norm_arg
    {
        ID id = get_id($1);
        arg_var(id);
        current_arg = id;
        $$ = $1;
    }
    ;
The answer lies in the arg_var function:
// parse.y or parse.c
static void arg_var_gen(struct parser_params*, ID);
#define arg_var(id) arg_var_gen(parser, (id))

static void
arg_var_gen(struct parser_params *parser, ID id)
{
    vtable_add(lvtbl->args, id);
}

static void
vtable_add(struct vtable *tbl, ID id)
{
    if (!POINTER_P(tbl)) {
        rb_bug("vtable_add: vtable is not allocated (%p)", (void *)tbl);
    }
    if (VTBL_DEBUG) printf("vtable_add: %p, %"PRIsVALUE"\n", (void *)tbl, rb_id2str(id));

    // The table is full: double its capacity
    if (tbl->pos == tbl->capa) {
        tbl->capa = tbl->capa * 2;
        REALLOC_N(tbl->tbl, ID, tbl->capa);
    }
    // Append id to the table
    tbl->tbl[tbl->pos++] = id;
}
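The growth strategy in vtable_add (double the capacity when pos reaches capa, then append) can be tried out in isolation. The sketch below is a simplified stand-in that uses plain malloc and realloc instead of Ruby's allocation macros. All names in it (sym_id, struct id_table, id_table_init, id_table_add, id_table_include, id_table_free) are hypothetical, and it assumes a non-zero initial capacity.

```c
#include <assert.h>
#include <stdlib.h>

typedef unsigned long sym_id;   /* hypothetical stand-in for Ruby's ID */

/* Hypothetical, simplified stand-in for struct vtable. */
struct id_table {
    sym_id *tbl;   /* stored identifiers */
    int pos;       /* number of identifiers stored so far */
    int capa;      /* allocated capacity */
};

static void id_table_init(struct id_table *t, int capa)
{
    assert(capa > 0);   /* doubling a zero capacity would never grow */
    t->tbl = malloc(sizeof(sym_id) * (size_t)capa);
    assert(t->tbl != NULL);
    t->pos = 0;
    t->capa = capa;
}

/* Append id, doubling the capacity first when the table is full. */
static void id_table_add(struct id_table *t, sym_id id)
{
    if (t->pos == t->capa) {
        sym_id *grown = realloc(t->tbl, sizeof(sym_id) * (size_t)t->capa * 2);
        assert(grown != NULL);
        t->tbl = grown;
        t->capa *= 2;
    }
    t->tbl[t->pos++] = id;
}

/* Return 1 if id has been added to the table, 0 otherwise. */
static int id_table_include(const struct id_table *t, sym_id id)
{
    for (int i = 0; i < t->pos; i++) {
        if (t->tbl[i] == id) return 1;
    }
    return 0;
}

static void id_table_free(struct id_table *t)
{
    free(t->tbl);
    t->tbl = NULL;
    t->pos = 0;
    t->capa = 0;
}
```

Doubling keeps the amortized cost of each append constant, which suits a parser that cannot know in advance how many parameters or variables a scope will hold.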
Local variables
We have seen how function parameters are added to a scope. What about local variables? Is there a call similar to arg_var for them? First, consider when a local variable is normally created: a dynamic scripting language like Ruby has no variable declaration syntax of the kind C has, so a variable is created automatically when it is first assigned. To test this conjecture, let's look at a grammar rule:
// parse.y
lhs : user_variable
    {
        $$ = assignable($1, 0);
        /*%%%*/
        if (!$$) $$ = NEW_BEGIN(0);
    }
The assignable function is fairly complex. The following excerpt shows only the part that deals with defining local variables:
// parse.y or parse.c
static NODE*
assignable_gen(struct parser_params *parser, ID id, NODE *val)
{
    switch (id_type(id)) {
      case ID_LOCAL:
        if (dyna_in_block()) {
            if (dvar_curr(id)) {
                ...
            }
            else if (dvar_defined(id)) {
                ...
            }
            else if (local_id(id)) {
                ...
            }
            else {
                dyna_var(id);
            }
        }
        else {
            if (!local_id(id)) {
                local_var(id);
            }
        }
    }
}
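The function handles id differently depending on whether the parser is currently inside a block (dyna_in_block) or in an ordinary local scope. In the latter case, an id that is not yet known (local_id returns false) is registered with local_var, which confirms the conjecture that a local variable is created on its first assignment.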
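The non-block branch of assignable_gen amounts to "register the name only if it is not already known". Here is a minimal sketch of that rule alone, with hypothetical names (struct locals, is_local, assign) and a fixed-size table chosen for brevity. It does not model block scopes or any of Ruby's node construction.

```c
#include <assert.h>
#include <string.h>

#define MAX_LOCALS 16   /* arbitrary limit, for illustration only */

/* Hypothetical table of the local variable names known in one scope. */
struct locals {
    const char *names[MAX_LOCALS];
    int count;
};

/* Return 1 if name is already known in this scope (the role of local_id). */
static int is_local(const struct locals *l, const char *name)
{
    for (int i = 0; i < l->count; i++) {
        if (strcmp(l->names[i], name) == 0) return 1;
    }
    return 0;
}

/* Handle an assignment target: register the name the first time it is
 * assigned (the role of local_var) and do nothing on later assignments.
 * Returns 1 if a new variable was created, 0 if it already existed. */
static int assign(struct locals *l, const char *name)
{
    if (is_local(l, name)) return 0;
    assert(l->count < MAX_LOCALS);
    l->names[l->count++] = name;
    return 1;
}
```

With this rule, assigning x twice and y once leaves exactly two variables in the table.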