Syntax analysis in the PG database
1, Syntax analysis in gram.y
The grammar (syntax) parser is defined in the src\backend\parser\gram.y file. Like every Bison grammar file, it consists of four parts:
%{
Declarations
%}
Definitions
%%
Productions
%%
User subroutines
The earlier article on the lexical parser introduced the structure of .y files. This article explains how the grammar parser built from gram.y is used in PG.
1, Declarations
This part, enclosed in %{ and %}, is plain C code that Bison copies into the generated parser. It includes the header files that declare the functions used later and defines macros and aliases. It is not covered in detail here.
2, Definitions
The Definitions section plays a role in Bison similar to the one it plays in Flex: it holds Bison-specific declarations and options. The main directives used in gram.y are:
%pure-parser
Tells Bison to generate a reentrant (pure) parser. Unlike in a conventional parser, yylval is not a global union variable; instead the parser passes yylex() a pointer to the union (YYSTYPE *).
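Bison generates the pure parser itself; what changes for hand-written code is the lexer interface. The sketch below is a minimal, hypothetical illustration (toy_lex, ToyScanner, ToyYYSTYPE and the token codes are invented names, not PostgreSQL or Bison API): the caller owns the semantic value and passes its address, and all scanner state lives in a struct passed as an argument, so nothing is global.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical token codes, above 255 so they cannot clash with single characters. */
enum { TOY_EOF = 0, TOY_ICONST = 258, TOY_IDENT = 259 };

/* Toy semantic-value union, playing the role of YYSTYPE. */
typedef union ToyYYSTYPE
{
    int         ival;       /* value of an integer literal */
    const char *str;        /* identifier text, valid until the next identifier */
} ToyYYSTYPE;

/* All scanner state lives here instead of in global variables. */
typedef struct ToyScanner
{
    const char *pos;        /* next unread character */
    char        buf[64];    /* storage for the current identifier */
} ToyScanner;

void
toy_scanner_init(ToyScanner *scanner, const char *input)
{
    scanner->pos = input;
    scanner->buf[0] = '\0';
}

/*
 * Reentrant lexer: the value is written through lvalp (like the YYSTYPE *
 * a pure parser hands to yylex) and the state comes from the scanner argument.
 */
int
toy_lex(ToyYYSTYPE *lvalp, ToyScanner *scanner)
{
    const char *p = scanner->pos;

    while (isspace((unsigned char) *p))
        p++;
    if (*p == '\0')
    {
        scanner->pos = p;
        return TOY_EOF;
    }
    if (isdigit((unsigned char) *p))
    {
        char       *end;

        lvalp->ival = (int) strtol(p, &end, 10);
        scanner->pos = end;
        return TOY_ICONST;
    }
    if (isalpha((unsigned char) *p) || *p == '_')
    {
        size_t      n = 0;

        while (isalnum((unsigned char) *p) || *p == '_')
        {
            if (n < sizeof(scanner->buf) - 1)
                scanner->buf[n++] = *p;     /* longer names are truncated */
            p++;
        }
        scanner->buf[n] = '\0';
        lvalp->str = scanner->buf;
        scanner->pos = p;
        return TOY_IDENT;
    }
    /* Any other character is returned as a single-character token. */
    scanner->pos = p + 1;
    return (unsigned char) *p;
}
```

Because nothing is global, two ToyScanner instances can be advanced alternately over different strings without interfering, which is the property a reentrant parser needs from its lexer.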
%expect
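%expect N tells Bison that the grammar is expected to contain exactly N shift/reduce conflicts. If the actual number differs, Bison reports an error when it generates the parser. gram.y uses %expect 0, so any new conflict breaks the build.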
%name-prefix
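Sets the prefix of the generated symbol names; the default prefix is yy.
%name-prefix="base_yy" renames the default yyxxx symbols to base_yyxxx, for example yyparse(), yylex(), yyerror(), yylval, yychar and yydebug become base_yyparse(), base_yylex(), base_yyerror(), base_yylval, base_yychar and base_yydebug.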
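A rough way to picture the renaming (and, as far as I know, close to what Bison-generated C files do in their "substitute the variable and function names" block) is a set of preprocessor macros. The sketch below is hypothetical and only mimics the effect: code written against yyparse() and yylex() ends up defining base_yyparse and base_yylex.

```c
/*
 * Hypothetical illustration of %name-prefix="base_yy": the generic names
 * are macros for the prefixed ones, so the compiler and linker only ever
 * see base_yyparse and base_yylex.
 */
#define yyparse base_yyparse
#define yylex   base_yylex

/* Toy lexer: every character is a token; 0 marks the end of input. */
static int
yylex(const char **cursor)
{
    if (**cursor == '\0')
        return 0;
    return (unsigned char) *(*cursor)++;
}

/* Toy "parser": written as yyparse(), it only counts the tokens it reads. */
int
yyparse(const char *input)
{
    int         ntokens = 0;

    while (yylex(&input) != 0)
        ntokens++;
    return ntokens;
}
```

This is why PostgreSQL code calls base_yyparse() rather than yyparse(), as raw_parser() does later in this article.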
%locations
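Enables location tracking. Every token and nonterminal gets a location that actions can read as @n. In gram.y these locations are byte offsets into the query string, which is why the a_expr actions below pass values such as @2 to node constructors for use in error messages.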
%parse-param
%parse-param declares extra parameters that appear between the parentheses of yyparse(); any number of parameters can be declared.
For example, %parse-param {core_yyscan_t yyscanner} gives the generated function a parameter core_yyscan_t yyscanner, so it is called as base_yyparse(yyscanner).
%lex-param
%lex-param declares extra arguments that the parser passes between the parentheses of yylex(); any number of arguments can be declared.
For example, %lex-param {core_yyscan_t yyscanner} makes the parser pass yyscanner to every yylex() call, in addition to the value and location pointers.
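Combined with %pure-parser and %locations, these two declarations fix the shape of the generated functions. In PostgreSQL this gives base_yyparse(yyscanner) and a lexer called as base_yylex(&yylval, &yylloc, yyscanner) (my reading of src\backend\parser\parser.c; check it against your version). The sketch below hand-writes the same call shape for a toy grammar "n + n + ...". Every name in it is hypothetical, and the extra sum and errloc parameters stand for additional %parse-param declarations.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical stand-ins for YYSTYPE, YYLTYPE and core_yyscan_t. */
typedef union ToySTYPE { int ival; } ToySTYPE;
typedef int ToyLTYPE;                   /* byte offset of a token in the input */
typedef struct ToyScanState
{
    const char *start;                  /* beginning of the input string */
    const char *pos;                    /* next unread character */
} ToyScanState;
typedef ToyScanState *toy_yyscan_t;     /* opaque handle passed everywhere */

enum { TOY_EOF = 0, TOY_ICONST = 258 };

void
toy_scanner_init(ToyScanState *state, const char *str)
{
    state->start = str;
    state->pos = str;
}

/* Call shape of a pure lexer with locations and %lex-param {toy_yyscan_t yyscanner}. */
static int
toy_yylex(ToySTYPE *lvalp, ToyLTYPE *llocp, toy_yyscan_t yyscanner)
{
    const char *p = yyscanner->pos;

    while (isspace((unsigned char) *p))
        p++;
    *llocp = (int) (p - yyscanner->start);     /* location of this token */
    if (*p == '\0')
    {
        yyscanner->pos = p;
        return TOY_EOF;
    }
    if (isdigit((unsigned char) *p))
    {
        char       *end;

        lvalp->ival = (int) strtol(p, &end, 10);
        yyscanner->pos = end;
        return TOY_ICONST;
    }
    yyscanner->pos = p + 1;
    return (unsigned char) *p;
}

/*
 * Call shape of yyparse() with %parse-param {toy_yyscan_t yyscanner}: the
 * handle arrives here and is forwarded to every lexer call. Returns 0 on
 * success and 1 on a syntax error, like yyparse(); on error *errloc holds
 * the offset of the offending token.
 */
int
toy_yyparse(toy_yyscan_t yyscanner, int *sum, int *errloc)
{
    ToySTYPE    lval;
    ToyLTYPE    lloc;
    int         tok;

    *sum = 0;
    for (;;)
    {
        if (toy_yylex(&lval, &lloc, yyscanner) != TOY_ICONST)
        {
            *errloc = lloc;
            return 1;
        }
        *sum += lval.ival;
        tok = toy_yylex(&lval, &lloc, yyscanner);
        if (tok == TOY_EOF)
            return 0;
        if (tok != '+')
        {
            *errloc = lloc;
            return 1;
        }
    }
}
```

Threading the handle through parameters instead of globals is what lets several parses run independently, and the location that comes back with each token is what an error message can point at.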
%union
%union declares the data types of the semantic values in the parser.
In a Bison parser every symbol, tokens and nonterminals alike, has an associated value. By default that value is an int, which is far from enough in practice.
%union builds a C union declaration from which each symbol's value type is chosen. In gram.y it begins:
%union
{
    core_YYSTYPE        core_yystype;
    /* these fields must match core_YYSTYPE: */
    int                 ival;
    char               *str;
    const char         *keyword;
    ...
}
Here core_yystype is a member of type core_YYSTYPE, the union through which the core scanner returns token values:
/*
 * The scanner returns extra data about scanned tokens in this union type.
 * Note that this is a subset of the fields used in YYSTYPE of the bison
 * parsers built atop the scanner.
 */
typedef union core_YYSTYPE
{
    int         ival;       /* for integer literals */
    char       *str;        /* for identifiers and non-integer literals */
    const char *keyword;    /* canonical spelling of keywords */
} core_YYSTYPE;
Once the union is defined, Bison must be told which member holds each symbol's value. This is done by writing the member name in angle brackets (< >) in %token and %type declarations.
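The comment "these fields must match core_YYSTYPE" matters because the core scanner only knows core_YYSTYPE. As I understand PostgreSQL's parser.c, base_yylex() hands the core scanner a pointer to the core_yystype member of the grammar's YYSTYPE, and grammar actions then read ival, str or keyword directly. Every member of a C union starts at offset 0, so both views address the same bytes. A minimal sketch: core_YYSTYPE is copied from above, while ToyGrammarYYSTYPE and the two functions are hypothetical.

```c
#include <stddef.h>

/* Copied from the core_YYSTYPE definition above. */
typedef union core_YYSTYPE
{
    int         ival;       /* for integer literals */
    char       *str;        /* for identifiers and non-integer literals */
    const char *keyword;    /* canonical spelling of keywords */
} core_YYSTYPE;

/* Hypothetical, abbreviated stand-in for the union produced by gram.y's %union. */
typedef union ToyGrammarYYSTYPE
{
    core_YYSTYPE core_yystype;
    /* these fields must match core_YYSTYPE: */
    int         ival;
    char       *str;
    const char *keyword;
    /* grammar-only members follow, e.g. a node pointer */
    void       *node;
} ToyGrammarYYSTYPE;

/* Stand-in for the core scanner: it only knows about core_YYSTYPE. */
static void
toy_core_lex_integer(core_YYSTYPE *yylval, int value)
{
    yylval->ival = value;
}

/*
 * Stand-in for the grammar-level lexer plus an action: the core scanner
 * writes through a pointer to the embedded core_yystype member, and the
 * grammar then reads the same bytes through its own ival member.
 */
int
toy_grammar_sees_ival(int value)
{
    ToyGrammarYYSTYPE lval;

    toy_core_lex_integer(&lval.core_yystype, value);
    return lval.ival;
}
```

Since all members overlap at offset 0, what the comment really protects is the types: reading .str after the scanner wrote .str through core_yystype only works if both declare the same type.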
%type
Declares the value type of nonterminal symbols, for example:
%type <node> stmt schema_stmt AlterEventTrigStmt AlterCollationStmt ...
This declares that the nonterminals stmt, schema_stmt, AlterEventTrigStmt, AlterCollationStmt and so on carry their values in the node member of the %union (a Node *).
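Roughly, %type <node> tells Bison which union member to use whenever an action mentions $$ or $n for that symbol. The sketch below hand-simulates the value stack for two hypothetical rules: a_expr: ICONST { $$ = leaf($1); } with ICONST declared as %token <ival>, and a_expr: a_expr '+' a_expr { $$ = binop('+', $1, $3); } with a_expr declared as %type <node>. The exact code Bison emits varies between versions; the point is that $1 and $3 become .node reads from the stack and $$ becomes a .node write. It also shows the bottom-up flow: children are reduced first, then combined into their parent.

```c
#include <stdlib.h>

/* Hypothetical expression node, standing in for Node *. */
typedef struct ToyNode
{
    char            op;     /* '+' for an operator node, 0 for a literal */
    int             ival;   /* literal value when op == 0 */
    struct ToyNode *l;
    struct ToyNode *r;
} ToyNode;

/* Toy YYSTYPE: <ival> and <node> in %token/%type name these members. */
typedef union ToyYYSTYPE
{
    int         ival;
    ToyNode    *node;
} ToyYYSTYPE;

typedef struct ToyValueStack
{
    ToyYYSTYPE  vals[32];
    int         top;        /* number of values on the stack */
} ToyValueStack;

static ToyNode *
toy_new_node(char op, int ival, ToyNode *l, ToyNode *r)
{
    ToyNode    *n = malloc(sizeof(ToyNode));

    if (n == NULL)
        abort();
    n->op = op;
    n->ival = ival;
    n->l = l;
    n->r = r;
    return n;
}

/* Shift: push a token's semantic value. */
void
toy_shift(ToyValueStack *s, ToyYYSTYPE v)
{
    s->vals[s->top++] = v;
}

/* Reduce a_expr: ICONST  ($1 is read as .ival, $$ is written as .node). */
void
toy_reduce_iconst(ToyValueStack *s)
{
    ToyYYSTYPE *yyvsp = &s->vals[s->top - 1];  /* value of the last RHS symbol */
    ToyYYSTYPE  yyval;

    yyval.node = toy_new_node(0, yyvsp[0].ival, NULL, NULL);
    s->top -= 1;                               /* pop 1 RHS value */
    toy_shift(s, yyval);                       /* push $$ */
}

/* Reduce a_expr: a_expr '+' a_expr  ($1 and $3 are read as .node). */
void
toy_reduce_plus(ToyValueStack *s)
{
    ToyYYSTYPE *yyvsp = &s->vals[s->top - 1];
    ToyYYSTYPE  yyval;

    yyval.node = toy_new_node('+', 0, yyvsp[-2].node, yyvsp[0].node);
    s->top -= 3;                               /* pop 3 RHS values */
    toy_shift(s, yyval);
}

int
toy_eval(const ToyNode *n)
{
    return n->op == '+' ? toy_eval(n->l) + toy_eval(n->r) : n->ival;
}
```

Feeding "1 + 2" means: shift 1, reduce it to a_expr, shift '+', shift 2, reduce it, then reduce the three slots to one node; afterwards a single value, the root, is left on the stack.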
%nonassoc
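%nonassoc declares non-associative operators: if < is declared %nonassoc, an expression such as a < b < c, where the operator would have to associate with itself, is a syntax error.
%left
%left declares left-associative operators: a - b - c is grouped as (a - b) - c.
%right
%right declares right-associative operators: a op b op c is grouped as a op (b op c). The order of these declarations also sets precedence: operators declared on a later line bind more tightly than those on earlier lines, and operators on the same line share one precedence level.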
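Bison applies these declarations when it resolves shift/reduce conflicts while building its LALR tables. A hand-written precedence-climbing evaluator is a different technique, but it produces the same groupings, which makes the three behaviours easy to test. In this hypothetical sketch a higher level binds tighter (like a later declaration line), '+', '-' and '*' are left-associative, '<' is non-associative, and '^' is made right-associative purely to demonstrate %right; the operator table is an assumption, not PostgreSQL's.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { TOY_LEFT, TOY_RIGHT, TOY_NONASSOC } ToyAssoc;
typedef struct ToyOp { char op; int prec; ToyAssoc assoc; } ToyOp;

/* Hypothetical table, roughly: %nonassoc '<'  %left '+' '-'  %left '*'  %right '^' */
static const ToyOp toy_ops[] = {
    {'<', 1, TOY_NONASSOC},
    {'+', 2, TOY_LEFT}, {'-', 2, TOY_LEFT},
    {'*', 3, TOY_LEFT},
    {'^', 4, TOY_RIGHT},
};

typedef struct ToyParser { const char *p; bool error; } ToyParser;

static const ToyOp *
toy_lookup(char c)
{
    for (size_t i = 0; i < sizeof(toy_ops) / sizeof(toy_ops[0]); i++)
        if (toy_ops[i].op == c)
            return &toy_ops[i];
    return NULL;
}

static void
toy_skip_ws(ToyParser *ps)
{
    while (isspace((unsigned char) *ps->p))
        ps->p++;
}

static long
toy_primary(ToyParser *ps)
{
    long        v = 0;

    toy_skip_ws(ps);
    if (!isdigit((unsigned char) *ps->p))
    {
        ps->error = true;
        return 0;
    }
    while (isdigit((unsigned char) *ps->p))
        v = v * 10 + (*ps->p++ - '0');
    return v;
}

static long
toy_apply(char op, long a, long b)
{
    long        r = 1;

    switch (op)
    {
        case '+': return a + b;
        case '-': return a - b;
        case '*': return a * b;
        case '<': return a < b;
        default:                    /* '^': integer power */
            while (b-- > 0)
                r *= a;
            return r;
    }
}

/* Precedence climbing: consume operators whose level is at least min_prec. */
static long
toy_expr(ToyParser *ps, int min_prec)
{
    long        lhs = toy_primary(ps);

    for (;;)
    {
        toy_skip_ws(ps);
        const ToyOp *o = toy_lookup(*ps->p);

        if (ps->error || o == NULL || o->prec < min_prec)
            return lhs;
        ps->p++;
        /* %right lets the right operand contain the same level again;
         * %left and %nonassoc only allow tighter-binding operators there. */
        long        rhs = toy_expr(ps, o->assoc == TOY_RIGHT ? o->prec : o->prec + 1);

        lhs = toy_apply(o->op, lhs, rhs);
        toy_skip_ws(ps);
        const ToyOp *next = toy_lookup(*ps->p);

        if (o->assoc == TOY_NONASSOC && next != NULL && next->prec == o->prec)
            ps->error = true;       /* e.g. 1 < 2 < 3 */
    }
}

/* Returns true and stores the result on success, false on a syntax error. */
bool
toy_eval(const char *s, long *out)
{
    ToyParser   ps = {s, false};
    long        v = toy_expr(&ps, 1);

    toy_skip_ws(&ps);
    if (ps.error || *ps.p != '\0')
        return false;
    *out = v;
    return true;
}
```

With this table, "2 - 3 - 4" evaluates as (2 - 3) - 4, "2 ^ 3 ^ 2" as 2 ^ (3 ^ 2), and "1 < 2 < 3" is rejected, mirroring what %left, %right and %nonassoc ask Bison to do.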
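3, Productions
The productions are written top-down from the root, but the Bison parser works bottom-up: it reduces the innermost productions first, each action builds a value (usually a parse-tree node) from the values of its children ($1, $2, ...), and the results are finally collected at the root production.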
**stmtblock:** the root production of the whole parse
/*
 * The target production for the whole parse.
 */
stmtblock:  stmtmulti
            {
                pg_yyget_extra(yyscanner)->parsetree = $1;
            }
        ;
**stmt:** one alternative for each statement type
stmt:
            AlterEventTrigStmt
            | AlterCollationStmt
            | AlterDatabaseStmt
            | AlterDatabaseSetStmt
            | AlterDefaultPrivilegesStmt
            | AlterDomainStmt
            | AlterEnumStmt
            | AlterExtensionStmt
            | AlterExtensionContentsStmt
            | AlterFdwStmt
            | AlterForeignServerStmt
            | AlterForeignTableStmt
            | AlterFunctionStmt
            | AlterGroupStmt
            | AlterObjectDependsStmt
            | AlterObjectSchemaStmt
            | AlterOwnerStmt
            | AlterOperatorStmt
            | AlterPolicyStmt
            | AlterSeqStmt
            | AlterSystemStmt
            | AlterTableStmt
            | AlterTblSpcStmt
            | AlterCompositeTypeStmt
            | AlterPublicationStmt
            | AlterRoleSetStmt
            | AlterRoleStmt
            | AlterSubscriptionStmt
            | AlterTSConfigurationStmt
            | AlterTSDictionaryStmt
            | AlterUserMappingStmt
            | AnalyzeStmt
            | CallStmt
            | CheckPointStmt
            | ClosePortalStmt
            | ClusterStmt
            | CommentStmt
            | ConstraintsSetStmt
            | CopyStmt
            | CreateAmStmt
            | CreateAsStmt
            | CreateAssertionStmt
            | CreateCastStmt
            | CreateConversionStmt
            | CreateDomainStmt
            | CreateExtensionStmt
            | CreateFdwStmt
            | CreateForeignServerStmt
            | CreateForeignTableStmt
            | CreateFunctionStmt
            | CreateGroupStmt
            | CreateMatViewStmt
            | CreateOpClassStmt
            | CreateOpFamilyStmt
            | CreatePublicationStmt
            | AlterOpFamilyStmt
            | CreatePolicyStmt
            | CreatePLangStmt
            | CreateSchemaStmt
            | CreateSeqStmt
            | CreateStmt
            | CreateSubscriptionStmt
            | CreateStatsStmt
            | CreateTableSpaceStmt
            | CreateTransformStmt
            | CreateTrigStmt
            | CreateEventTrigStmt
            | CreateRoleStmt
            | CreateUserStmt
            | CreateUserMappingStmt
            | CreatedbStmt
            | DeallocateStmt
            | DeclareCursorStmt
            | DefineStmt
            | DeleteStmt
            | DiscardStmt
            | DoStmt
            | DropCastStmt
            | DropOpClassStmt
            | DropOpFamilyStmt
            | DropOwnedStmt
            | DropPLangStmt
            | DropStmt
            | DropSubscriptionStmt
            | DropTableSpaceStmt
            | DropTransformStmt
            | DropRoleStmt
            | DropUserMappingStmt
            | DropdbStmt
            | ExecuteStmt
            | ExplainStmt
            | FetchStmt
            | GrantStmt
            | GrantRoleStmt
            | ImportForeignSchemaStmt
            | IndexStmt
            | InsertStmt
            | ListenStmt
            | RefreshMatViewStmt
            | LoadStmt
            | LockStmt
            | NotifyStmt
            | PrepareStmt
            | ReassignOwnedStmt
            | ReindexStmt
            | RemoveAggrStmt
            | RemoveFuncStmt
            | RemoveOperStmt
            | RenameStmt
            | RevokeStmt
            | RevokeRoleStmt
            | RuleStmt
            | SecLabelStmt
            | SelectStmt
            | TransactionStmt
            | TruncateStmt
            | UnlistenStmt
            | UpdateStmt
            | VacuumStmt
            | VariableResetStmt
            | VariableSetStmt
            | VariableShowStmt
            | ViewStmt
            | /*EMPTY*/
                { $$ = NULL; }
        ;
**a_expr:** general (unrestricted) expressions
/*
 * General expressions
 * This is the heart of the expression syntax.
 *
 * We have two expression types: a_expr is the unrestricted kind, and
 * b_expr is a subset that must be used in some places to avoid shift/reduce
 * conflicts.  For example, we can't do BETWEEN as "BETWEEN a_expr AND a_expr"
 * because that use of AND conflicts with AND as a boolean operator.  So,
 * b_expr is used in BETWEEN and we remove boolean keywords from b_expr.
 *
 * Note that '(' a_expr ')' is a b_expr, so an unrestricted expression can
 * always be used by surrounding it with parens.
 *
 * c_expr is all the productions that are common to a_expr and b_expr;
 * it's factored out just to eliminate redundant coding.
 *
 * Be careful of productions involving more than one terminal token.
 * By default, bison will assign such productions the precedence of their
 * last terminal, but in nearly all cases you want it to be the precedence
 * of the first terminal instead; otherwise you will not get the behavior
 * you expect!  So we use %prec annotations freely to set precedences.
 */
a_expr:     c_expr                                  { $$ = $1; }
        | a_expr TYPECAST Typename
                { $$ = makeTypeCast($1, $3, @2); }
        | a_expr COLLATE any_name
                {
                    CollateClause *n = makeNode(CollateClause);
                    n->arg = $1;
                    n->collname = $3;
                    n->location = @2;
                    $$ = (Node *) n;
                }
        | a_expr AT TIME ZONE a_expr            %prec AT
                {
                    $$ = (Node *) makeFuncCall(SystemFuncName("timezone"),
                                               list_make2($5, $1),
                                               @2);
                }
        /*
         * These operators must be called out explicitly in order to make use
         * of bison's automatic operator-precedence handling.  All other
         * operator names are handled by the generic productions using "Op",
         * below; and all those operators will have the same precedence.
         *
         * If you add more explicitly-known operators, be sure to add them
         * also to b_expr and to the MathOp list below.
         */
        | '+' a_expr                    %prec UMINUS
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "+", NULL, $2, @1); }
        | '-' a_expr                    %prec UMINUS
                { $$ = doNegate($2, @1); }
        | a_expr '+' a_expr
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "+", $1, $3, @2); }
        | a_expr '-' a_expr
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "-", $1, $3, @2); }
        | a_expr '*' a_expr
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "*", $1, $3, @2); }
        | a_expr '/' a_expr
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "/", $1, $3, @2); }
        | a_expr '%' a_expr
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "%", $1, $3, @2); }
        | a_expr '^' a_expr
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "^", $1, $3, @2); }
        | a_expr '<' a_expr
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "<", $1, $3, @2); }
        | a_expr '>' a_expr
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, ">", $1, $3, @2); }
        | a_expr '=' a_expr
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "=", $1, $3, @2); }
        | a_expr LESS_EQUALS a_expr
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "<=", $1, $3, @2); }
        | a_expr GREATER_EQUALS a_expr
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, ">=", $1, $3, @2); }
        | a_expr NOT_EQUALS a_expr
                { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "<>", $1, $3, @2); }

        | a_expr qual_Op a_expr                 %prec Op
                { $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, $3, @2); }
        | qual_Op a_expr                        %prec Op
                { $$ = (Node *) makeA_Expr(AEXPR_OP, $1, NULL, $2, @1); }
        | a_expr qual_Op                        %prec POSTFIXOP
                { $$ = (Node *) makeA_Expr(AEXPR_OP, $2, $1, NULL, @2); }

        | a_expr AND a_expr
                { $$ = makeAndExpr($1, $3, @2); }
        | a_expr OR a_expr
                { $$ = makeOrExpr($1, $3, @2); }
        | NOT a_expr
                { $$ = makeNotExpr($2, @1); }
        | NOT_LA a_expr                         %prec NOT
                { $$ = makeNotExpr($2, @1); }

        | a_expr LIKE a_expr
                {
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_LIKE, "~~",
                                                   $1, $3, @2);
                }
        | a_expr LIKE a_expr ESCAPE a_expr                  %prec LIKE
                {
                    FuncCall *n = makeFuncCall(SystemFuncName("like_escape"),
                                               list_make2($3, $5),
                                               @2);
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_LIKE, "~~",
                                                   $1, (Node *) n, @2);
                }
        | a_expr NOT_LA LIKE a_expr                         %prec NOT_LA
                {
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_LIKE, "!~~",
                                                   $1, $4, @2);
                }
        | a_expr NOT_LA LIKE a_expr ESCAPE a_expr           %prec NOT_LA
                {
                    FuncCall *n = makeFuncCall(SystemFuncName("like_escape"),
                                               list_make2($4, $6),
                                               @2);
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_LIKE, "!~~",
                                                   $1, (Node *) n, @2);
                }
        | a_expr ILIKE a_expr
                {
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_ILIKE, "~~*",
                                                   $1, $3, @2);
                }
        | a_expr ILIKE a_expr ESCAPE a_expr                 %prec ILIKE
                {
                    FuncCall *n = makeFuncCall(SystemFuncName("like_escape"),
                                               list_make2($3, $5),
                                               @2);
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_ILIKE, "~~*",
                                                   $1, (Node *) n, @2);
                }
        | a_expr NOT_LA ILIKE a_expr                        %prec NOT_LA
                {
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_ILIKE, "!~~*",
                                                   $1, $4, @2);
                }
        | a_expr NOT_LA ILIKE a_expr ESCAPE a_expr          %prec NOT_LA
                {
                    FuncCall *n = makeFuncCall(SystemFuncName("like_escape"),
                                               list_make2($4, $6),
                                               @2);
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_ILIKE, "!~~*",
                                                   $1, (Node *) n, @2);
                }

        | a_expr SIMILAR TO a_expr                          %prec SIMILAR
                {
                    FuncCall *n = makeFuncCall(SystemFuncName("similar_escape"),
                                               list_make2($4, makeNullAConst(-1)),
                                               @2);
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_SIMILAR, "~",
                                                   $1, (Node *) n, @2);
                }
        | a_expr SIMILAR TO a_expr ESCAPE a_expr            %prec SIMILAR
                {
                    FuncCall *n = makeFuncCall(SystemFuncName("similar_escape"),
                                               list_make2($4, $6),
                                               @2);
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_SIMILAR, "~",
                                                   $1, (Node *) n, @2);
                }
        | a_expr NOT_LA SIMILAR TO a_expr                   %prec NOT_LA
                {
                    FuncCall *n = makeFuncCall(SystemFuncName("similar_escape"),
                                               list_make2($5, makeNullAConst(-1)),
                                               @2);
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_SIMILAR, "!~",
                                                   $1, (Node *) n, @2);
                }
        | a_expr NOT_LA SIMILAR TO a_expr ESCAPE a_expr     %prec NOT_LA
                {
                    FuncCall *n = makeFuncCall(SystemFuncName("similar_escape"),
                                               list_make2($5, $7),
                                               @2);
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_SIMILAR, "!~",
                                                   $1, (Node *) n, @2);
                }

        /* NullTest clause
         * Define SQL-style Null test clause.
         * Allow two forms described in the standard:
         *  a IS NULL
         *  a IS NOT NULL
         * Allow two SQL extensions
         *  a ISNULL
         *  a NOTNULL
         */
        | a_expr IS NULL_P                          %prec IS
                {
                    NullTest *n = makeNode(NullTest);
                    n->arg = (Expr *) $1;
                    n->nulltesttype = IS_NULL;
                    n->location = @2;
                    $$ = (Node *)n;
                }
        | a_expr ISNULL
                {
                    NullTest *n = makeNode(NullTest);
                    n->arg = (Expr *) $1;
                    n->nulltesttype = IS_NULL;
                    n->location = @2;
                    $$ = (Node *)n;
                }
        | a_expr IS NOT NULL_P                      %prec IS
                {
                    NullTest *n = makeNode(NullTest);
                    n->arg = (Expr *) $1;
                    n->nulltesttype = IS_NOT_NULL;
                    n->location = @2;
                    $$ = (Node *)n;
                }
        | a_expr NOTNULL
                {
                    NullTest *n = makeNode(NullTest);
                    n->arg = (Expr *) $1;
                    n->nulltesttype = IS_NOT_NULL;
                    n->location = @2;
                    $$ = (Node *)n;
                }
        | row OVERLAPS row
                {
                    if (list_length($1) != 2)
                        ereport(ERROR,
                                (errcode(ERRCODE_SYNTAX_ERROR),
                                 errmsg("wrong number of parameters on left side of OVERLAPS expression"),
                                 parser_errposition(@1)));
                    if (list_length($3) != 2)
                        ereport(ERROR,
                                (errcode(ERRCODE_SYNTAX_ERROR),
                                 errmsg("wrong number of parameters on right side of OVERLAPS expression"),
                                 parser_errposition(@3)));
                    $$ = (Node *) makeFuncCall(SystemFuncName("overlaps"),
                                               list_concat($1, $3),
                                               @2);
                }
        | a_expr IS TRUE_P                          %prec IS
                {
                    BooleanTest *b = makeNode(BooleanTest);
                    b->arg = (Expr *) $1;
                    b->booltesttype = IS_TRUE;
                    b->location = @2;
                    $$ = (Node *)b;
                }
        | a_expr IS NOT TRUE_P                      %prec IS
                {
                    BooleanTest *b = makeNode(BooleanTest);
                    b->arg = (Expr *) $1;
                    b->booltesttype = IS_NOT_TRUE;
                    b->location = @2;
                    $$ = (Node *)b;
                }
        | a_expr IS FALSE_P                         %prec IS
                {
                    BooleanTest *b = makeNode(BooleanTest);
                    b->arg = (Expr *) $1;
                    b->booltesttype = IS_FALSE;
                    b->location = @2;
                    $$ = (Node *)b;
                }
        | a_expr IS NOT FALSE_P                     %prec IS
                {
                    BooleanTest *b = makeNode(BooleanTest);
                    b->arg = (Expr *) $1;
                    b->booltesttype = IS_NOT_FALSE;
                    b->location = @2;
                    $$ = (Node *)b;
                }
        | a_expr IS UNKNOWN                         %prec IS
                {
                    BooleanTest *b = makeNode(BooleanTest);
                    b->arg = (Expr *) $1;
                    b->booltesttype = IS_UNKNOWN;
                    b->location = @2;
                    $$ = (Node *)b;
                }
        | a_expr IS NOT UNKNOWN                     %prec IS
                {
                    BooleanTest *b = makeNode(BooleanTest);
                    b->arg = (Expr *) $1;
                    b->booltesttype = IS_NOT_UNKNOWN;
                    b->location = @2;
                    $$ = (Node *)b;
                }
        | a_expr IS DISTINCT FROM a_expr            %prec IS
                {
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_DISTINCT, "=", $1, $5, @2);
                }
        | a_expr IS NOT DISTINCT FROM a_expr        %prec IS
                {
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_NOT_DISTINCT, "=", $1, $6, @2);
                }
        | a_expr IS OF '(' type_list ')'            %prec IS
                {
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_OF, "=", $1, (Node *) $5, @2);
                }
        | a_expr IS NOT OF '(' type_list ')'        %prec IS
                {
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_OF, "<>", $1, (Node *) $6, @2);
                }
        | a_expr BETWEEN opt_asymmetric b_expr AND a_expr           %prec BETWEEN
                {
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_BETWEEN,
                                                   "BETWEEN",
                                                   $1,
                                                   (Node *) list_make2($4, $6),
                                                   @2);
                }
        | a_expr NOT_LA BETWEEN opt_asymmetric b_expr AND a_expr    %prec NOT_LA
                {
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_NOT_BETWEEN,
                                                   "NOT BETWEEN",
                                                   $1,
                                                   (Node *) list_make2($5, $7),
                                                   @2);
                }
        | a_expr BETWEEN SYMMETRIC b_expr AND a_expr                %prec BETWEEN
                {
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_BETWEEN_SYM,
                                                   "BETWEEN SYMMETRIC",
                                                   $1,
                                                   (Node *) list_make2($4, $6),
                                                   @2);
                }
        | a_expr NOT_LA BETWEEN SYMMETRIC b_expr AND a_expr         %prec NOT_LA
                {
                    $$ = (Node *) makeSimpleA_Expr(AEXPR_NOT_BETWEEN_SYM,
                                                   "NOT BETWEEN SYMMETRIC",
                                                   $1,
                                                   (Node *) list_make2($5, $7),
                                                   @2);
                }
        | a_expr IN_P in_expr
                {
                    /* in_expr returns a SubLink or a list of a_exprs */
                    if (IsA($3, SubLink))
                    {
                        /* generate foo = ANY (subquery) */
                        SubLink *n = (SubLink *) $3;
                        n->subLinkType = ANY_SUBLINK;
                        n->subLinkId = 0;
                        n->testexpr = $1;
                        n->operName = NIL;      /* show it's IN not = ANY */
                        n->location = @2;
                        $$ = (Node *)n;
                    }
                    else
                    {
                        /* generate scalar IN expression */
                        $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "=", $1, $3, @2);
                    }
                }
        | a_expr NOT_LA IN_P in_expr                        %prec NOT_LA
                {
                    /* in_expr returns a SubLink or a list of a_exprs */
                    if (IsA($4, SubLink))
                    {
                        /* generate NOT (foo = ANY (subquery)) */
                        /* Make an = ANY node */
                        SubLink *n = (SubLink *) $4;
                        n->subLinkType = ANY_SUBLINK;
                        n->subLinkId = 0;
                        n->testexpr = $1;
                        n->operName = NIL;      /* show it's IN not = ANY */
                        n->location = @2;
                        /* Stick a NOT on top; must have same parse location */
                        $$ = makeNotExpr((Node *) n, @2);
                    }
                    else
                    {
                        /* generate scalar NOT IN expression */
                        $$ = (Node *) makeSimpleA_Expr(AEXPR_IN, "<>", $1, $4, @2);
                    }
                }
        | a_expr subquery_Op sub_type select_with_parens    %prec Op
                {
                    SubLink *n = makeNode(SubLink);
                    n->subLinkType = $3;
                    n->subLinkId = 0;
                    n->testexpr = $1;
                    n->operName = $2;
                    n->subselect = $4;
                    n->location = @2;
                    $$ = (Node *)n;
                }
        | a_expr subquery_Op sub_type '(' a_expr ')'        %prec Op
                {
                    if ($3 == ANY_SUBLINK)
                        $$ = (Node *) makeA_Expr(AEXPR_OP_ANY, $2, $1, $5, @2);
                    else
                        $$ = (Node *) makeA_Expr(AEXPR_OP_ALL, $2, $1, $5, @2);
                }
        | UNIQUE select_with_parens
                {
                    /* Not sure how to get rid of the parentheses
                     * but there are lots of shift/reduce errors without them.
                     *
                     * Should be able to implement this by plopping the entire
                     * select into a node, then transforming the target expressions
                     * from whatever they are into count(*), and testing the
                     * entire result equal to one.
                     * But, will probably implement a separate node in the executor.
                     */
                    ereport(ERROR,
                            (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                             errmsg("UNIQUE predicate is not yet implemented"),
                             parser_errposition(@1)));
                }
        | a_expr IS DOCUMENT_P                  %prec IS
                {
                    $$ = makeXmlExpr(IS_DOCUMENT, NULL, NIL,
                                     list_make1($1), @2);
                }
        | a_expr IS NOT DOCUMENT_P              %prec IS
                {
                    $$ = makeNotExpr(makeXmlExpr(IS_DOCUMENT, NULL, NIL,
                                                 list_make1($1), @2),
                                     @2);
                }
        | DEFAULT
                {
                    /*
                     * The SQL spec only allows DEFAULT in "contextually typed
                     * expressions", but for us, it's easier to allow it in
                     * any a_expr and then throw error during parse analysis
                     * if it's in an inappropriate context.  This way also
                     * lets us say something smarter than "syntax error".
                     */
                    SetToDefault *n = makeNode(SetToDefault);
                    /* parse analysis will fill in the rest */
                    n->location = @1;
                    $$ = (Node *)n;
                }
        ;
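The IN_P and NOT_LA IN_P actions decide at parse time which node to build: if in_expr produced a SubLink (a subquery), it is turned into an = ANY test, and NOT IN puts a NOT on top of it; otherwise a scalar AEXPR_IN expression is built with "=" or "<>". The following hypothetical sketch replays that branching with drastically simplified stand-in structs (ToyExprNode and the toy_ functions are invented; real PostgreSQL nodes carry many more fields and are built with makeNode(), IsA() and palloc()).

```c
#include <stdbool.h>
#include <stdlib.h>

/* Drastically simplified, hypothetical node kinds. */
typedef enum { TOY_COLUMNREF, TOY_LIST, TOY_SUBLINK, TOY_A_EXPR, TOY_NOT } ToyTag;
typedef enum { TOY_EXPR_SUBLINK, TOY_ANY_SUBLINK } ToySubLinkType;
typedef enum { TOY_AEXPR_OP, TOY_AEXPR_IN } ToyAExprKind;

typedef struct ToyExprNode
{
    ToyTag              tag;
    ToyAExprKind        kind;           /* TOY_A_EXPR only */
    const char         *opname;         /* TOY_A_EXPR only: "=" or "<>" */
    ToySubLinkType      subLinkType;    /* TOY_SUBLINK only */
    struct ToyExprNode *testexpr;       /* TOY_SUBLINK: left-hand side */
    struct ToyExprNode *lexpr;          /* TOY_A_EXPR left input; TOY_NOT argument */
    struct ToyExprNode *rexpr;          /* TOY_A_EXPR right input (the value list) */
    int                 location;
} ToyExprNode;

ToyExprNode *
toy_make_node(ToyTag tag)
{
    ToyExprNode *n = calloc(1, sizeof(ToyExprNode));

    if (n == NULL)
        abort();
    n->tag = tag;
    n->location = -1;                   /* unknown location */
    return n;
}

static ToyExprNode *
toy_make_simple_aexpr(ToyAExprKind kind, const char *name,
                      ToyExprNode *lexpr, ToyExprNode *rexpr, int location)
{
    ToyExprNode *n = toy_make_node(TOY_A_EXPR);

    n->kind = kind;
    n->opname = name;
    n->lexpr = lexpr;
    n->rexpr = rexpr;
    n->location = location;
    return n;
}

/*
 * Mirrors  a_expr IN_P in_expr  (negate = false) and
 * a_expr NOT_LA IN_P in_expr  (negate = true); location plays the role of @2.
 */
ToyExprNode *
toy_make_in(ToyExprNode *lhs, ToyExprNode *in_expr, bool negate, int location)
{
    if (in_expr->tag == TOY_SUBLINK)
    {
        /* foo IN (subquery) becomes foo = ANY (subquery) */
        in_expr->subLinkType = TOY_ANY_SUBLINK;
        in_expr->testexpr = lhs;
        in_expr->location = location;
        if (!negate)
            return in_expr;

        /* NOT IN: put a NOT on top, with the same parse location */
        ToyExprNode *not_node = toy_make_node(TOY_NOT);

        not_node->lexpr = in_expr;
        not_node->location = location;
        return not_node;
    }
    /* scalar list: AEXPR_IN with "=" for IN and "<>" for NOT IN */
    return toy_make_simple_aexpr(TOY_AEXPR_IN, negate ? "<>" : "=",
                                 lhs, in_expr, location);
}
```

Deciding between the two shapes in the grammar action keeps later stages simple: they see either a subquery test or a plain list comparison, never an undecided IN.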
2, Semantic analysis
Semantic analysis is done in src\backend\parser\analyze.c. The overall flow is driven by exec_simple_query() in src\backend\tcop\postgres.c, which obtains the raw parse tree (pg_parse_query), analyzes and rewrites it (pg_analyze_and_rewrite), creates execution plans (pg_plan_queries), and runs them through a portal (PortalStart, PortalRun).
This article focuses on the first step: how the raw parse tree is obtained.
1, pg_parse_query: obtaining the raw parse tree
/*
 * Do raw parsing (only).
 *
 * A list of parsetrees (RawStmt nodes) is returned, since there might be
 * multiple commands in the given string.
 *
 * NOTE: for interactive queries, it is important to keep this routine
 * separate from the analysis & rewrite stages.  Analysis and rewriting
 * cannot be done in an aborted transaction, since they require access to
 * database tables.  So, we rely on the raw parser to determine whether
 * we've seen a COMMIT or ABORT command; when we are in abort state, other
 * commands are not processed any further than the raw parse stage.
 */
/* The complete query string is passed in as query_string */
List *
pg_parse_query(const char *query_string)
{
    List       *raw_parsetree_list;

    TRACE_POSTGRESQL_QUERY_PARSE_START(query_string);

    if (log_parser_stats)
        ResetUsage();

    /* Run the raw parser (lexer + grammar) over the whole string */
    raw_parsetree_list = raw_parser(query_string);

    if (log_parser_stats)
        ShowUsage("PARSER STATISTICS");

#ifdef COPY_PARSE_PLAN_TREES
    /* Optional debugging check: pass raw parsetrees through copyObject() */
    {
        List       *new_list = copyObject(raw_parsetree_list);

        /* This checks both copyObject() and the equal() routines... */
        if (!equal(new_list, raw_parsetree_list))
            elog(WARNING, "copyObject() failed to produce an equal raw parse tree");
        else
            raw_parsetree_list = new_list;
    }
#endif

    /*
     * Currently, outfuncs/readfuncs support is missing for many raw parse
     * tree nodes, so we don't try to implement WRITE_READ_PARSE_PLAN_TREES
     * here.
     */

    TRACE_POSTGRESQL_QUERY_PARSE_DONE(query_string);

    return raw_parsetree_list;
}
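The COPY_PARSE_PLAN_TREES block is a self-check: it deep-copies the raw trees with copyObject(), compares copy and original with equal(), and continues with the copy only if the two match. The same idea on a hypothetical binary tree (ToyNode and the toy_ functions stand in for PostgreSQL's node types, copyObject() and equal(); they are not its API):

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse-tree node. */
typedef struct ToyNode
{
    char           *name;       /* operator or identifier text */
    struct ToyNode *left;
    struct ToyNode *right;
} ToyNode;

ToyNode *
toy_make(const char *name, ToyNode *left, ToyNode *right)
{
    ToyNode    *n = malloc(sizeof(ToyNode));

    if (n == NULL || (n->name = malloc(strlen(name) + 1)) == NULL)
        abort();
    strcpy(n->name, name);
    n->left = left;
    n->right = right;
    return n;
}

/* Deep copy, in the spirit of copyObject(). */
ToyNode *
toy_copy(const ToyNode *n)
{
    if (n == NULL)
        return NULL;
    return toy_make(n->name, toy_copy(n->left), toy_copy(n->right));
}

/* Structural comparison, in the spirit of equal(). */
bool
toy_equal(const ToyNode *a, const ToyNode *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a->name, b->name) == 0 &&
        toy_equal(a->left, b->left) &&
        toy_equal(a->right, b->right);
}

void
toy_free(ToyNode *n)
{
    if (n == NULL)
        return;
    toy_free(n->left);
    toy_free(n->right);
    free(n->name);
    free(n);
}

/*
 * The debugging pattern: copy, compare, and keep using the copy only if it
 * is equal. On a mismatch the real code logs a WARNING; here *copy_ok is
 * cleared and the original is returned.
 */
ToyNode *
toy_check_copy(ToyNode *tree, bool *copy_ok)
{
    ToyNode    *copy = toy_copy(tree);

    *copy_ok = toy_equal(copy, tree);
    if (!*copy_ok)
    {
        toy_free(copy);
        return tree;
    }
    return copy;
}
```

Continuing with the copy, rather than merely comparing, means any field the copy routine forgot would surface in later stages, which is what makes the check useful for catching incomplete copy or equal support.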
/*
 * raw_parser
 *      Given a query in string form, do lexical and grammatical analysis.
 *
 * Returns a list of raw (un-analyzed) parse trees.  The immediate elements
 * of the list are always RawStmt nodes.
 */
List *
raw_parser(const char *str)
{
    core_yyscan_t yyscanner;
    base_yy_extra_type yyextra;
    int         yyresult;

    /* initialize the flex scanner */
    yyscanner = scanner_init(str, &yyextra.core_yy_extra,
                             &ScanKeywords, ScanKeywordTokens);

    /* base_yylex() only needs this much initialization */
    yyextra.have_lookahead = false;

    /* initialize the bison parser */
    parser_init(&yyextra);

    /* Parse! */
    yyresult = base_yyparse(yyscanner);

    /* Clean up (release memory) */
    scanner_finish(yyscanner);

    if (yyresult)               /* error */
        return NIL;

    return yyextra.parsetree;
}
Finally, raw_parser() calls base_yyparse() and, if parsing succeeds, returns yyextra.parsetree. That field is assigned in gram.y by the action of the root production stmtblock, which completes the parse:
/*
 * The target production for the whole parse.
 */
stmtblock:  stmtmulti
            {
                pg_yyget_extra(yyscanner)->parsetree = $1;
            }
        ;
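How does an assignment inside a grammar action reach the local variable yyextra in raw_parser()? raw_parser() hands scanner_init() a pointer to yyextra.core_yy_extra, which is the first member of base_yy_extra_type, so the same address also points at the whole struct; pg_yyget_extra(yyscanner) reads that pointer back from the scanner (my reading of src\include\parser\gramparse.h, which relies on flex keeping the extra pointer first in its scanner struct). The sketch below is a hypothetical miniature of that hand-off: every Toy name is invented, and the "parser" only counts non-empty statements separated by ';'.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical miniatures of core_yy_extra_type, base_yy_extra_type and the scanner. */
typedef struct ToyCoreExtra { const char *scanbuf; } ToyCoreExtra;
typedef struct ToyList { int nstmts; } ToyList;   /* toy "parse tree" */

typedef struct ToyBaseExtra
{
    ToyCoreExtra core_yy_extra;     /* first member, so its address is the struct's */
    ToyList    *parsetree;          /* set by the root production's action */
} ToyBaseExtra;

typedef struct ToyScanner
{
    void       *extra;              /* first member: what the getter dereferences */
} ToyScanner;

/* Like scanner_init(): remember the input and a pointer to the extra data. */
static void
toy_scanner_init(ToyScanner *scanner, const char *str, ToyCoreExtra *core)
{
    core->scanbuf = str;
    scanner->extra = core;
}

/* Like pg_yyget_extra(): get back to the enclosing ToyBaseExtra. */
static ToyBaseExtra *
toy_get_extra(ToyScanner *scanner)
{
    return (ToyBaseExtra *) scanner->extra;
}

/* Toy parser: returns 0 on success and 1 on error, like base_yyparse(). */
static int
toy_yyparse(ToyScanner *scanner)
{
    const char *p = toy_get_extra(scanner)->core_yy_extra.scanbuf;
    int         n = 0;
    bool        in_stmt = false;

    for (; *p != '\0'; p++)
    {
        if (*p == ';')
        {
            n += in_stmt;
            in_stmt = false;
        }
        else if (isalnum((unsigned char) *p))
            in_stmt = true;
        else if (!isspace((unsigned char) *p))
            return 1;               /* "syntax error" in this toy */
    }
    n += in_stmt;

    /* Root action, like stmtblock: deliver the result through the extra data. */
    ToyList    *list = malloc(sizeof(ToyList));

    if (list == NULL)
        abort();
    list->nstmts = n;
    toy_get_extra(scanner)->parsetree = list;
    return 0;
}

/* Shaped like raw_parser(): returns the tree, or NULL (standing in for NIL) on error. */
ToyList *
toy_raw_parser(const char *str)
{
    ToyScanner  scanner;
    ToyBaseExtra yyextra;

    toy_scanner_init(&scanner, str, &yyextra.core_yy_extra);
    yyextra.parsetree = NULL;
    if (toy_yyparse(&scanner))
        return NULL;
    return yyextra.parsetree;
}
```

The caller of toy_raw_parser() owns the returned list and frees it; in PostgreSQL the parse tree is allocated in a memory context instead.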