2021SC@SDUSC
PostgreSQL code I am responsible for: query compilation and execution
Content of this blog: semantic analysis
In my last blog, I analyzed the lexical analysis and syntax analysis of query analysis. We can know that after lexical analysis and syntax analysis, we can get an analysis tree stored in SelectStmt structure. From the flow analysis diagram, we can know that we still need syntax analysis to get the query tree.
Semantic analysis: the semantic analysis phase will check whether there are components in the command that do not meet the semantic requirements and whether the command can be executed correctly.
In the semantic analysis phase, parse in analyze.c file is mainly responsible for semantic analysis_ Analyze function. parse_ The analyze function will generate a corresponding query tree according to the analysis tree, and then the query rewriting module will further modify the query tree and rewrite the query tree into a query tree linked list. In parse_ The analyze function is divided into seven cases according to the command type:
Important structures involved in semantic analysis:
Query: used to store the query tree and is the final output result of query analysis
ParseState: ParseState: used to record intermediate information for semantic analysis
struct ParseState { struct ParseState *parentParseState; //If it is currently a subquery, this field points to its outer query const char *p_sourcetext; //The original sql command is only used to report the location of parsing errors List *p_rtable; //Table involved in query -- range table List *p_joinexprs; //Join expression List *p_joinlist; //Connection item List *p_namespace; //Table name collection, used to check table name conflicts bool p_lateral_active; //Is there an associated reference List *p_ctenamespace;//A collection of public expression names List *p_future_ctes; //Not at p_ Common expressions in ctenamespace CommonTableExpr *p_parent_cte; /* this query's containing CTE */ Relation p_target_relation; //Target table RangeTblEntry *p_target_rangetblentry; /* target rel's RTE */ bool p_is_insert; //Is it an insert statement List *p_windowdefs; //Original definition form of window clause ParseExprKind p_expr_kind; //Type of parameter int p_next_resno; List *p_multiassign_exprs; List *p_locking_clause; //locking clause bool p_locked_from_parent; bool p_resolve_unknowns; //Unresolved types QueryEnvironment *p_queryEnv; /* curr env, incl refs to enclosing env */ bool p_hasAggs;//Is there an aggregate function bool p_hasWindowFuncs;//Is there a window function bool p_hasTargetSRFs;//Is there an SRF bool p_hasSubLinks;//Are there child links bool p_hasModifyingCTE;//Modify CTE Node *p_last_srf; /* * Optional hook functions for parser callbacks. These are null unless * set up by the caller of make_parsestate. */ PreParseColumnRefHook p_pre_columnref_hook; PostParseColumnRefHook p_post_columnref_hook; ParseParamRefHook p_paramref_hook; CoerceParamHook p_coerce_param_hook; void *p_ref_hook_state; /* common passthrough link for above */ };
typedef struct Query { NodeTag type;//Node type, T_Query CmdType commandType; //Command type QuerySource querySource; //Is it an original query or a query from a rule uint64 queryId; /* query identifier (can be set by plugins) */ bool canSetTag; //Used in query rewriting. If the query is converted from the original query, this field is false; If query is rewritten by query or newly added during query planning, this field is true Node *utilityStmt;//Define cursors or non optimizable query statements int resultRelation; //Result relationship bool hasAggs;//Is there an aggregate function in the target clause or having clause bool hasWindowFuncs; //Is there a window function in the target attribute bool hasTargetSRFs; /* has set-returning functions in tlist */ bool hasSubLinks; //Whether there are sub queries bool hasDistinctOn; //distinct clause bool hasRecursive; //Whether recursion is allowed in public expressions bool hasModifyingCTE; /* has INSERT/UPDATE/DELETE in WITH */ bool hasForUpdate; //Are there any updates bool hasRowSecurity; /* rewriter has applied some RLS policy */ List *cteList; //The with clause is used for public expressions List *rtable; //Range table FromExpr *jointree; //The connection tree describes the connections from and where List *targetList; //Target attribute OverridingKind override; /* OVERRIDING clause */ OnConflictExpr *onConflict; /* ON CONFLICT DO [NOTHING | UPDATE] */ List *returningList; //returning clause List *groupClause; //group clause List *groupingSets; //Group sets clause Node *havingQual; //having clause List *windowClause; //Window function clause List *distinctClause; //distinct clause List *sortClause; //order clause Node *limitOffset; //offset clause Node *limitCount; //limit clause List *rowMarks; //Row mark linked list, used in for update and for share clauses Node *setOperations; //Set operation (union, intersect, except) List *constraintDeps; List *withCheckOptions; /* * The following two fields identify the portion of the source text string * containing this query. They are typically only populated in top-level * Queries, not in sub-queries. When not set, they might both be zero, or * both be -1 meaning "unknown". */ int stmt_location; //Where the record begins int stmt_len; //Length of record string } Query;
parse_analyze function
Function parse_analyze first generates a ParseState structure to record the state of semantic analysis, and then completes the process of semantic analysis by calling the function transformStmt. The function transformStmt will call corresponding functions for row processing according to different query types.
The transformStmt function includes:
transformInsertStmt function
transformDeleteStmt function
transformUpdateStmt function
transformSelectStmt function
transformDeclareCursorStmt function
transformExplainStmt function
others
Query* parse_analyze( Node* parseTree, const char* sourceText, Oid* paramTypes, int numParams, bool isFirstNode, bool isCreateView) { // ParseState* pstate = make_parsestate(NULL); ... // transformation query = transformTopLevelStmt(pstate, parseTree, isFirstNode, isCreateView); ... pfree_ext(pstate->p_ref_hook_state); free_parsestate(pstate); ... // Return query tree return query; }
transformSelectStmt function
It can be seen from the above that there are many transformStmt functions. Different types correspond to different sql operation statements. Here I analyze the transformSelectStmt function, the most important select clause in sql query.
static Query * transformSelectStmt(ParseState *pstate, SelectStmt *stmt) //transformSelectStmt this function performs semantic analysis on the select query //SelectStmt *stmt is the analysis tree returned by lexical analysis //ParseState *pstate: used to record the state of semantic analysis { Query *qry = makeNode(Query);//makeNode linked list initialization //The query tree is finally organized in the data structure of query Node *qual;//postgresql converts all types of pointers to node type pointers to facilitate parameter passing ListCell *l; qry->commandType = CMD_SELECT;//The type corresponding to the query tree command. The command type is select //Handle with Clause if (stmt->withClause)//Judge whether there is a with Clause { qry->hasRecursive = stmt->withClause->recursive;//Gets whether the with clause in the expression supports recursion qry->cteList = transformWithClause(pstate, stmt->withClause); qry->hasModifyingCTE = pstate->p_hasModifyingCTE; } if (stmt->intoClause)//Judge whether there is an into. If there is an into, an error will be reported, because postgresql treats the into clause as a special case ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("SELECT ... INTO is not allowed here"),//Prompt: into is not allowed parser_errposition(pstate, exprLocation((Node *) stmt->intoClause)))); pstate->p_locking_clause = stmt->lockingClause; pstate->p_windowdefs = stmt->windowClause;//Store the window clause of the analysis tree in the form of linked list transformFromClause(pstate, stmt->fromClause);//Handle from clause qry->targetList = transformTargetList(pstate, stmt->targetList, EXPR_KIND_SELECT_TARGET); //targetList attribute for storing query results markTargetListOrigins(pstate, qry->targetList);//Indicate the source of query results //Handling the where clause in a select expression qual = transformWhereClause(pstate, stmt->whereClause, EXPR_KIND_WHERE, "WHERE"); //Handling the having clause in a select expression qry->havingQual = transformWhereClause(pstate, stmt->havingClause, EXPR_KIND_HAVING, "HAVING"); //Sort first. Because sorting results are required for both group processing and distinct clause processing, sort first and then filter groups //sort qry->sortClause = transformSortClause(pstate, stmt->sortClause, &qry->targetList, EXPR_KIND_ORDER_BY, false /* allow SQL92 rules */ ); //grouping qry->groupClause = transformGroupClause(pstate, stmt->groupClause, &qry->groupingSets, &qry->targetList, qry->sortClause, EXPR_KIND_GROUP_BY, false /* allow SQL92 rules */ ); //Judge whether there is distinct, that is, whether repeated results are allowed if (stmt->distinctClause == NIL)//If there is no distinct clause in the analysis tree { qry->distinctClause = NIL; qry->hasDistinctOn = false; } else if (linitial(stmt->distinctClause) == NULL)//The linitial function obtains the data of the first cell in the linked list { /* We had SELECT DISTINCT */ //Determine whether the data obtained in the distinct clause is empty qry->distinctClause = transformDistinctClause(pstate, &qry->targetList, qry->sortClause, false); qry->hasDistinctOn = false; } else { /* We had SELECT DISTINCT ON */ qry->distinctClause = transformDistinctOnClause(pstate, stmt->distinctClause, &qry->targetList, qry->sortClause); qry->hasDistinctOn = true; } /* transform LIMIT */ //Limit clause: limit the number of results to avoid full table scanning and improve query efficiency. //offset clause, select which line to start scanning from qry->limitOffset = transformLimitClause(pstate, stmt->limitOffset, EXPR_KIND_OFFSET, "OFFSET"); qry->limitCount = transformLimitClause(pstate, stmt->limitCount, EXPR_KIND_LIMIT, "LIMIT");
Limit can be used to force a SELECT statement to return a specified number of records
The high execution efficiency of Limit is stated under a specific condition: that is, the number of databases is large, but only part of the data needs to be queried.
The principle of high efficiency is to avoid full table scanning and improve query efficiency.
//Process window functions to calculate values for a set of rows qry->windowClause = transformWindowDefinitions(pstate, pstate->p_windowdefs, &qry->targetList);
Window function: window function, also known as analysis function, calculates values for a group of rows and returns a result for each row. This is different from aggregate functions; The aggregate function returns a result for a set of rows. The windowing function contains an OVER clause that defines the row window covering the row to be calculated. For each row, the system uses the selected row window as input to calculate the analysis function results and possibly aggregate them.
//Resolve unresolved output columns to type text if (pstate->p_resolve_unknowns)//Determine whether there are unresolved output columns resolveTargetListUnknowns(pstate, qry->targetList); qry->rtable = pstate->p_rtable;//Process select the range of the target table to query qry->jointree = makeFromExpr(pstate->p_joinlist, qual); //The connection tree handles the connection between from and where, and displays whether there are table connections in the from clause qry->hasSubLinks = pstate->p_hasSubLinks;//Check whether there are subqueries in from qry->hasWindowFuncs = pstate->p_hasWindowFuncs;//Check whether there is a window function in the objective function qry->hasTargetSRFs = pstate->p_hasTargetSRFs;//Check whether there are self-defined SRF functions qry->hasAggs = pstate->p_hasAggs;//Check whether there is an aggregate function in the target clause or the having clause foreach(l, stmt->lockingClause)//Check whether there are locking clauses: for update and for share clauses. If there are, lock them //Maintain consistency { transformLockingClause(pstate, qry, (LockingClause *) lfirst(l), false); } assign_query_collations(pstate, qry);//Mark all expressions with collation information in a given query if (pstate->p_hasAggs || qry->groupClause || qry->groupingSets || qry->havingQual) //Judge whether the aggregation function is used incorrectly or the grouping is wrong parseCheckAggregates(pstate, qry); return qry; }
Each sublist of GROUPING SETS can specify one or more columns or expressions, which will be interpreted in the same way as they appear directly in the GROUP BY clause.
SRF: set return fountain. PostgreSQL supports SRF functions, that is, functions that return multiple rows of data. We can also define them ourselves. We only need returns setof type.
Structure definition and interpretation involved in transformSelectStmt function
struct ListCell//A node that stores the analysis tree { union//Common body type. All members of the common body occupy the same memory. Modifying one member will affect all other members. { void *ptr_value; int int_value; Oid oid_value;//postgresql uses an unsigned integer to identify all objects and files } data; //The above three variables share the same memory ListCell *next; };
#define linitial(l) list_lfirst(head(l)) #define lfirst(lc) ((lc)->data.ptr_value) //The linitial function obtains the data of the first cell in the linked list
summary
Through the lexical syntax analysis of the last blog and the syntax analysis of this one, I basically understand the execution process of query analysis and the corresponding functions and implementation.
Thank you for your criticism!