05 SQL statement execution process analysis 2 - query logic optimization

Keywords: Database PostgreSQL SQL

sql execution statement flow analysis

The whole processing flow is in exec_simple_query function. The code structure is as follows:

/*
 * exec_simple_query
 *
 * Execute a "simple Query" protocol message.
 */
static void
exec_simple_query(const char *query_string)
{
	...
	//Original syntax tree acquisition
	/*
	 * Do basic parsing of the query or queries (this should be safe even if
	 * we are in aborted transaction state!)
	 */
	parsetree_list = pg_parse_query(query_string);

	...
	//Circular processing of sql statements
	/*
	 * Run through the raw parsetree(s) and process each one.
	 */
	foreach(parsetree_item, parsetree_list)
	{
		...
		
		//Analyze and rewrite the original syntax tree to generate a query syntax tree
		querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
												NULL, 0, NULL);
		//Optimize the query syntax tree to generate an execution plan
		plantree_list = pg_plan_queries(querytree_list,
										CURSOR_OPT_PARALLEL_OK, NULL);

		...
		
		//Execute statement
		/*
		 * Run the portal to completion, and then drop it (and the receiver).
		 */
		(void) PortalRun(portal,
						 FETCH_ALL,
						 true,	/* always top level */
						 true,
						 receiver,
						 receiver,
						 completionTag);

		...
	}
	...
}

Query logic optimization

It is analyzed that the rewritten query tree is not an optimized query tree. When the select sub query level is very deep, the base table of the lowest level is far from the tree root, which will increase the query time. In addition, the information of each node in the query tree is independent, which may cause redundant queries, so logical optimization is also needed. Therefore, query logic optimization is to optimize the query tree by traversing the query tree and ensuring that the semantics and final results of the syntax units in the query tree remain unchanged; Finally, a query tree without redundancy is obtained.

------Code in pg_plan_queries function

Corresponding to logical optimization is physical optimization, and their strategies are completely different:

Logic Optimization: similar to redundancy and lifting, decentralized optimization strategy

Physical optimization: cost based optimization strategy, which is described below

Basic steps of syntax optimization and related codes:

  • Tool class syntax (DDL, DML) is not processed
  • Non tool syntax, using pg_plan_query function, pg_plan_query calls the planner for processing
List *
pg_plan_queries(List *querytrees, int cursorOptions, ParamListInfo boundParams)
{
	List	   *stmt_list = NIL;
	ListCell   *query_list;

	foreach(query_list, querytrees)
	{
		Query	   *query = lfirst_node(Query, query_list);
		PlannedStmt *stmt;

		if (query->commandType == CMD_UTILITY)
		{
			/* Utility commands require no planning. */
			stmt = makeNode(PlannedStmt);
			stmt->commandType = CMD_UTILITY;
			stmt->canSetTag = query->canSetTag;
			stmt->utilityStmt = query->utilityStmt;
			stmt->stmt_location = query->stmt_location;
			stmt->stmt_len = query->stmt_len;
		}
		else
		{
			stmt = pg_plan_query(query, cursorOptions, boundParams);
		}

		stmt_list = lappend(stmt_list, stmt);
	}

	return stmt_list;
}


PlannedStmt *
pg_plan_query(Query *querytree, int cursorOptions, ParamListInfo boundParams)
{
	PlannedStmt *plan;

	...
	/* call the optimizer */
	plan = planner(querytree, cursorOptions, boundParams);

	...

	return plan;
}

Non tool syntax processing

The non tool class syntax is processed in the planner function. If the planner is set_ Hook calls the hook function, and standard is called by default_ The planner function handles.

standard_ The query tree is processed recursively in the planner function. When querying, the results are stored in the global results of planerglobal. Then call create_ The plan function creates an execution plan tree based on the PlannerInfo type. Finally, the basic information stored in planerglobal and planerinfo is transferred to planedstmt and returned.

PlannedStmt *
planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
	PlannedStmt *result;

	if (planner_hook)
		result = (*planner_hook) (parse, cursorOptions, boundParams);
	else
		result = standard_planner(parse, cursorOptions, boundParams);
	return result;
}

PlannedStmt *
standard_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
{
	...
	
	/* primary planning entry point (may recurse for subqueries) */
	root = subquery_planner(glob, parse, NULL,
							false, tuple_fraction);

	/* Select best Path and turn it into a Plan */
	final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
	best_path = get_cheapest_fractional_path(final_rel, tuple_fraction);

	top_plan = create_plan(root, best_path);

	...
	
	/* build the PlannedStmt result */
	result = makeNode(PlannedStmt);

	result->commandType = parse->commandType;
	...
	result->stmt_len = parse->stmt_len;

	result->jitFlags = PGJIT_NONE;
	...

	return result;
}

Plan tree generation and optimization

In subquery_ Generated in the planner function, and the query statements are classified according to the type. The plan optimization part involves sub link lifting, function processing, sub query lifting and other operations. Because the generation and optimization of the plan tree are processed by type and executed at the same time, they are introduced together here.

Process CTE (universal table expression) expressions

SS_process_ctes: handle CTE clauses (with statements) in query statements. CTE is a temporary result set; The results of subqueries can be used as a separate result set. Therefore, during function processing, traverse the ctelish list, and then call subquery for each sub result set_ The planner function performs processing, and the processing results are stored in the root - > glob - > subplans linked list.

void
SS_process_ctes(PlannerInfo *root)
{
	ListCell   *lc;

	Assert(root->cte_plan_ids == NIL);

	foreach(lc, root->parse->cteList)
	{
		...
		
		/*
		 * Generate Paths for the CTE query.  Always plan for full retrieval
		 * --- we don't have enough info to predict otherwise.
		 */
		subroot = subquery_planner(root->glob, subquery,
								   root,
								   cte->cterecursive, 0.0);

		...

		plan = create_plan(subroot, best_path);

		...
		
		/*
		 * Add the subplan and its PlannerInfo to the global lists.
		 */
		root->glob->subplans = lappend(root->glob->subplans, plan);
		root->glob->subroots = lappend(root->glob->subroots, subroot);
		
		...
	}
}

Child link up

pull_up_sublinks: convert ANY (sub select) and EXISTS in the command to JOIN. In this way, child links and parent queries can be merged for unified optimization.

ANY statement is converted to semi join statement. The conversion is only applicable to WHERE statement or JOIN/ON statement.

EXISTS or NOT EXISTS statements are converted to semi join or anti semi join.

Introduction to basic process

When the child link is raised, because the node information related to WHERE is stored in jointree, you will enter root - > parse - > jointree to pull_ up_ sublinks_ jointree_ The recurse function performs the lifting operation. pull_ up_ sublinks_ jointree_ In the recurse function, check the types stored in the jointree and process them according to the type classification:

  • RangeTblRef: return directly without optimization

  • FromExpr: fromlist contains two fields: base table information (fromlist) and where conditional expression (quals); The processing flow is as follows:

    • fromlist: recursive call pull_up_sublinks_jointree_recurse function
    • Equals expression: call pull_ up_ sublinks_ qual_ The recurse function is raised from the where conditional expression (quals)
  • Joinexpr: joinexpr contains two fields: left and right base table information (larg and rarg) and on constraints (quals); The processing flow is as follows:

    • Call pull_ up_ sublinks_ jointree_ The recurse function handles left and right nodes
    • The corresponding where conditional expression (quals) is raised according to the join type

Relevant codes are as follows:

static Node *
pull_up_sublinks_jointree_recurse(PlannerInfo *root, Node *jtnode,
								  Relids *relids)
{
	if (jtnode == NULL)
	{
		*relids = NULL;
	}
	else if (IsA(jtnode, RangeTblRef))
	{
		int			varno = ((RangeTblRef *) jtnode)->rtindex;

		*relids = bms_make_singleton(varno);
		/* jtnode is returned unmodified */
	}
	else if (IsA(jtnode, FromExpr))
	{
		FromExpr   *f = (FromExpr *) jtnode;
		List	   *newfromlist = NIL;
		Relids		frelids = NULL;
		FromExpr   *newf;
		Node	   *jtlink;
		ListCell   *l;

		/* First, recurse to process children and collect their relids */
		foreach(l, f->fromlist)
		{
			Node	   *newchild;
			Relids		childrelids;

			newchild = pull_up_sublinks_jointree_recurse(root,
														 lfirst(l),
														 &childrelids);
			newfromlist = lappend(newfromlist, newchild);
			frelids = bms_join(frelids, childrelids);
		}
		/* Build the replacement FromExpr; no quals yet */
		newf = makeFromExpr(newfromlist, NULL);
		/* Set up a link representing the rebuilt jointree */
		jtlink = (Node *) newf;
		/* Now process qual --- all children are available for use */
		newf->quals = pull_up_sublinks_qual_recurse(root, f->quals,
													&jtlink, frelids,
													NULL, NULL);

		/*
		 * Note that the result will be either newf, or a stack of JoinExprs
		 * with newf at the base.  We rely on subsequent optimization steps to
		 * flatten this and rearrange the joins as needed.
		 *
		 * Although we could include the pulled-up subqueries in the returned
		 * relids, there's no need since upper quals couldn't refer to their
		 * outputs anyway.
		 */
		*relids = frelids;
		jtnode = jtlink;
	}
	else if (IsA(jtnode, JoinExpr))
	{
		JoinExpr   *j;
		Relids		leftrelids;
		Relids		rightrelids;
		Node	   *jtlink;

		/*
		 * Make a modifiable copy of join node, but don't bother copying its
		 * subnodes (yet).
		 */
		j = (JoinExpr *) palloc(sizeof(JoinExpr));
		memcpy(j, jtnode, sizeof(JoinExpr));
		jtlink = (Node *) j;

		/* Recurse to process children and collect their relids */
		j->larg = pull_up_sublinks_jointree_recurse(root, j->larg,
													&leftrelids);
		j->rarg = pull_up_sublinks_jointree_recurse(root, j->rarg,
													&rightrelids);

		/*
		 * Now process qual, showing appropriate child relids as available,
		 * and attach any pulled-up jointree items at the right place. In the
		 * inner-join case we put new JoinExprs above the existing one (much
		 * as for a FromExpr-style join).  In outer-join cases the new
		 * JoinExprs must go into the nullable side of the outer join. The
		 * point of the available_rels machinations is to ensure that we only
		 * pull up quals for which that's okay.
		 *
		 * We don't expect to see any pre-existing JOIN_SEMI or JOIN_ANTI
		 * nodes here.
		 */
		switch (j->jointype)
		{
			case JOIN_INNER:
				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
														 &jtlink,
														 bms_union(leftrelids,
																   rightrelids),
														 NULL, NULL);
				break;
			case JOIN_LEFT:
				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
														 &j->rarg,
														 rightrelids,
														 NULL, NULL);
				break;
			case JOIN_FULL:
				/* can't do anything with full-join quals */
				break;
			case JOIN_RIGHT:
				j->quals = pull_up_sublinks_qual_recurse(root, j->quals,
														 &j->larg,
														 leftrelids,
														 NULL, NULL);
				break;
			default:
				elog(ERROR, "unrecognized join type: %d",
					 (int) j->jointype);
				break;
		}

		/*
		 * Although we could include the pulled-up subqueries in the returned
		 * relids, there's no need since upper quals couldn't refer to their
		 * outputs anyway.  But we *do* need to include the join's own rtindex
		 * because we haven't yet collapsed join alias variables, so upper
		 * levels would mistakenly think they couldn't use references to this
		 * join.
		 */
		*relids = bms_join(leftrelids, rightrelids);
		if (j->rtindex)
			*relids = bms_add_member(*relids, j->rtindex);
		jtnode = jtlink;
	}
	else
		elog(ERROR, "unrecognized node type: %d",
			 (int) nodeTag(jtnode));
	return jtnode;
}

From the above introduction, you can know the basic processing flow. The base table of child links will be mentioned in the top-level base table linked list.

		foreach(l, f->fromlist)
		{
			Node	   *newchild;
			Relids		childrelids;

			//Get the base list and linked list information here
			newchild = pull_up_sublinks_jointree_recurse(root,
														 lfirst(l),
														 &childrelids);
			newfromlist = lappend(newfromlist, newchild);
			frelids = bms_join(frelids, childrelids);
		}
		//Here, create a new node and assign values to the base list and linked list in the new node
		/* Build the replacement FromExpr; no quals yet */
		newf = makeFromExpr(newfromlist, NULL);
		/* Set up a link representing the rebuilt jointree */
		jtlink = (Node *) newf;
		
		//Here we assign a value to qulas
		/* Now process qual --- all children are available for use */
		newf->quals = pull_up_sublinks_qual_recurse(root, f->quals,
													&jtlink, frelids,
													NULL, NULL);

		/*
		 * Note that the result will be either newf, or a stack of JoinExprs
		 * with newf at the base.  We rely on subsequent optimization steps to
		 * flatten this and rearrange the joins as needed.
		 *
		 * Although we could include the pulled-up subqueries in the returned
		 * relids, there's no need since upper quals couldn't refer to their
		 * outputs anyway.
		 */
		*relids = frelids;
		
		//The new node is returned here
		jtnode = jtlink;

But how to complete the where conditional expression or on constraint in the end? Let's analyze pull again_ up_ sublinks_ qual_ The recurse function. In a e quals expression, there are three basic types to deal with

  • Sublink sublink statement:

    • The child link type is ANY_SUBLINK: call convert_ ANY_SUBLINK_ to_ The join function attempts to convert any type to join type. Create a new join after the conversion is successful_ Semi type JoinExpr node. The upper JoinExpr node passed in by the left child node of the node is filled, and the right child node is in convert_ ANY_SUBLINK_ to_ It is created as RangeTblRef type in the join function. Finally, pass in the information of the left and right child nodes to complete the quals upload.
    • The child link type is EXISTS_SUBLINK: call convert_EXISTS_sublink_to_join function, trying to convert exists type to join type. After the conversion is successful, a new JoinExpr node of unknown type is created. The upper JoinExpr node passed in by the left child node of the node is filled, and the right child node is in convert_ EXISTS_SUBLINK_ to_ In the join function, the node is created according to the child query of the incoming node. Finally, the information of the left and right sub nodes is passed in to complete the quals lifting.
  • NOT statement: refer to exists when it is a child link_ Sublink processing

  • AND, OR statements: traverse the BoolExpr node AND call pull recursively_ up_ sublinks_ qual_ Recurse processing

Introduction to specific conversion process

The actual conversion process is executed with the following limitations:

  • Child queries of child links cannot use var type variables of the parent node: form a loop
  • The var type variable of the parent query must be included in the comparison expression
  • The comparison expression cannot contain any virtual functions

var type variable: refers to the target column of the base table in query analysis and query optimization; Or it represents the output result of the sub query plan

convert_ ANY_ sublink_ to_ Introduction to join function:

  • contain_vars_of_level: parent node loop check to check whether the base table in the sub query is the base table of the parent node.
  • pull_varnos: compare expression checking
  • contain_volatile_functions: virtual function query
  • addRangeTableEntryForSubquery: the creation name is any_ The rangetbentry object of subquery is added to the base table (rtable linked list) of the parent query
  • generate_subquery_vars: create var variable according to rtable linked list to store sub link query results
  • convert_testexpr: call XXX_mutator function processing
  • Construct the JoinExpr node, and the larg of the node is filled by the caller
JoinExpr *
convert_ANY_sublink_to_join(PlannerInfo *root, SubLink *sublink,
							Relids available_rels)
{
	JoinExpr   *result;
	Query	   *parse = root->parse;
	Query	   *subselect = (Query *) sublink->subselect;
	Relids		upper_varnos;
	int			rtindex;
	RangeTblEntry *rte;
	RangeTblRef *rtr;
	List	   *subquery_vars;
	Node	   *quals;
	ParseState *pstate;

	Assert(sublink->subLinkType == ANY_SUBLINK);

	/*
	 * The sub-select must not refer to any Vars of the parent query. (Vars of
	 * higher levels should be okay, though.)
	 */
	if (contain_vars_of_level((Node *) subselect, 1))
		return NULL;

	/*
	 * The test expression must contain some Vars of the parent query, else
	 * it's not gonna be a join.  (Note that it won't have Vars referring to
	 * the subquery, rather Params.)
	 */
	upper_varnos = pull_varnos(sublink->testexpr);
	if (bms_is_empty(upper_varnos))
		return NULL;

	/*
	 * However, it can't refer to anything outside available_rels.
	 */
	if (!bms_is_subset(upper_varnos, available_rels))
		return NULL;

	/*
	 * The combining operators and left-hand expressions mustn't be volatile.
	 */
	if (contain_volatile_functions(sublink->testexpr))
		return NULL;

	/* Create a dummy ParseState for addRangeTableEntryForSubquery */
	pstate = make_parsestate(NULL);

	/*
	 * Okay, pull up the sub-select into upper range table.
	 *
	 * We rely here on the assumption that the outer query has no references
	 * to the inner (necessarily true, other than the Vars that we build
	 * below). Therefore this is a lot easier than what pull_up_subqueries has
	 * to go through.
	 */
	rte = addRangeTableEntryForSubquery(pstate,
										subselect,
										makeAlias("ANY_subquery", NIL),
										false,
										false);
	parse->rtable = lappend(parse->rtable, rte);
	rtindex = list_length(parse->rtable);

	/*
	 * Form a RangeTblRef for the pulled-up sub-select.
	 */
	rtr = makeNode(RangeTblRef);
	rtr->rtindex = rtindex;

	/*
	 * Build a list of Vars representing the subselect outputs.
	 */
	subquery_vars = generate_subquery_vars(root,
										   subselect->targetList,
										   rtindex);

	/*
	 * Build the new join's qual expression, replacing Params with these Vars.
	 */
	quals = convert_testexpr(root, sublink->testexpr, subquery_vars);

	/*
	 * And finally, build the JoinExpr node.
	 */
	result = makeNode(JoinExpr);
	result->jointype = JOIN_SEMI;
	result->isNatural = false;
	result->larg = NULL;		/* caller must fill this in */
	result->rarg = (Node *) rtr;
	result->usingClause = NIL;
	result->quals = quals;
	result->alias = NULL;
	result->rtindex = 0;		/* we don't need an RTE for it */

	return result;
}
Analysis of lifting principle

What is SEMI-JOIN: a table AA finds a matching record in another table BB, returns the records that meet the conditions in the first table AA, and the records in the BB table are not returned.

What is an IN statement: AA IN BB returns records IN AA that meet the BB condition.

From this we know:

  • The basic principles are the same, so you can convert the IN statement into SEMI-JOIN semi join statement.
  • Because the records on the right will not be displayed, the actual query statements are placed in the left child node of JoinExpr in the above processing for easy display.
  • Therefore, the "lifting" operation mentioned above is only the process of parsing the query statements in the sub links and converting them into node information in JoinExpr. This reduces the query action and saves time.

Subquery optimization

The original code is pull_up_subqueries; The original code comment is Check to see if any subqueries in the jointree can be merged into this query.

The name is a sub query, which is actually a pull_ up_ After the sublinks sublink lifting operation, the jointree is analyzed to try whether it can be optimized again. Because the child link lifting operation does not add the base table in the child query to the base table (rtable linked list) of the parent query. Therefore, you need to check whether the sub query can be merged into the parent query.

The specific operations are as follows: check whether the result set of alias still exists in the jointree tree. If so, replace it with the type of corresponding query statement (RangeTblRef, FromExpr, JoinExpr).

Finally in pull_ up_ subqueries_ The above process is implemented in the recurse function; pull_ up_ subqueries_ Introduction to recurse function: the function parses jointree, which contains three types:

  • RangeTblRef: Lifting

    • RTE_SUBQUERY is not a simple expression: since the level will change after being raised, the index number, level number, variable parameters, etc. need to be adjusted Please refer to the code for relevant adjustment, which will not be introduced here
    • RTE_SUBQUERY is a simple expression: a simple query tree, which directly raises the sub query tree
    • RTE_VALUES: raised as RTE value
  • FromExpr: traverse fromlist and call pull recursively_ up_ subqueries_ recurse

  • JoinExpr: call pull_ up_ subqueries_ The recurse function handles the left and right nodes and modifies the corresponding parameters according to the join type

static Node *
pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
						   JoinExpr *lowest_outer_join,
						   JoinExpr *lowest_nulling_outer_join,
						   AppendRelInfo *containing_appendrel)
{
	Assert(jtnode != NULL);
	if (IsA(jtnode, RangeTblRef))
	{
		int			varno = ((RangeTblRef *) jtnode)->rtindex;
		RangeTblEntry *rte = rt_fetch(varno, root->parse->rtable);

		/*
		 * Is this a subquery RTE, and if so, is the subquery simple enough to
		 * pull up?
		 *
		 * If we are looking at an append-relation member, we can't pull it up
		 * unless is_safe_append_member says so.
		 */
		if (rte->rtekind == RTE_SUBQUERY &&
			is_simple_subquery(rte->subquery, rte, lowest_outer_join) &&
			(containing_appendrel == NULL ||
			 is_safe_append_member(rte->subquery)))
			return pull_up_simple_subquery(root, jtnode, rte,
										   lowest_outer_join,
										   lowest_nulling_outer_join,
										   containing_appendrel);

		/*
		 * Alternatively, is it a simple UNION ALL subquery?  If so, flatten
		 * into an "append relation".
		 *
		 * It's safe to do this regardless of whether this query is itself an
		 * appendrel member.  (If you're thinking we should try to flatten the
		 * two levels of appendrel together, you're right; but we handle that
		 * in set_append_rel_pathlist, not here.)
		 */
		if (rte->rtekind == RTE_SUBQUERY &&
			is_simple_union_all(rte->subquery))
			return pull_up_simple_union_all(root, jtnode, rte);

		/*
		 * Or perhaps it's a simple VALUES RTE?
		 *
		 * We don't allow VALUES pullup below an outer join nor into an
		 * appendrel (such cases are impossible anyway at the moment).
		 */
		if (rte->rtekind == RTE_VALUES &&
			lowest_outer_join == NULL &&
			containing_appendrel == NULL &&
			is_simple_values(root, rte))
			return pull_up_simple_values(root, jtnode, rte);

		/* Otherwise, do nothing at this node. */
	}
	else if (IsA(jtnode, FromExpr))
	{
		FromExpr   *f = (FromExpr *) jtnode;
		ListCell   *l;

		Assert(containing_appendrel == NULL);
		/* Recursively transform all the child nodes */
		foreach(l, f->fromlist)
		{
			lfirst(l) = pull_up_subqueries_recurse(root, lfirst(l),
												   lowest_outer_join,
												   lowest_nulling_outer_join,
												   NULL);
		}
	}
	else if (IsA(jtnode, JoinExpr))
	{
		JoinExpr   *j = (JoinExpr *) jtnode;

		Assert(containing_appendrel == NULL);
		/* Recurse, being careful to tell myself when inside outer join */
		switch (j->jointype)
		{
			case JOIN_INNER:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 lowest_outer_join,
													 lowest_nulling_outer_join,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 lowest_outer_join,
													 lowest_nulling_outer_join,
													 NULL);
				break;
			case JOIN_LEFT:
			case JOIN_SEMI:
			case JOIN_ANTI:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 j,
													 lowest_nulling_outer_join,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 j,
													 j,
													 NULL);
				break;
			case JOIN_FULL:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 j,
													 j,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 j,
													 j,
													 NULL);
				break;
			case JOIN_RIGHT:
				j->larg = pull_up_subqueries_recurse(root, j->larg,
													 j,
													 j,
													 NULL);
				j->rarg = pull_up_subqueries_recurse(root, j->rarg,
													 j,
													 lowest_nulling_outer_join,
													 NULL);
				break;
			default:
				elog(ERROR, "unrecognized join type: %d",
					 (int) j->jointype);
				break;
		}
	}
	else
		elog(ERROR, "unrecognized node type: %d",
			 (int) nodeTag(jtnode));
	return jtnode;
}

UNION ALL statement processing

/*
	 * If this is a simple UNION ALL query, flatten it into an appendrel. We
	 * do this now because it requires applying pull_up_subqueries to the leaf
	 * queries of the UNION ALL, which weren't touched above because they
	 * weren't referenced by the jointree (they will be after we do this).
	 */
	 if (parse->setOperations)
		flatten_simple_union_all(root);

RowMark processing

/*
	 * Preprocess RowMark information.  We need to do this after subquery
	 * pullup, so that all base relations are present.
	 */
	preprocess_rowmarks(root);

Expression optimization processing

Target column processing, withCheckOptions processing, RETURN expression processing, HAVING statement processing, WINDOWS statement processing, LIMIT OFF statement processing

All call preprocess_expression function.

Processing flow

  • flatten_join_alias_vars: flatten variable aliases in links
  • eval_const_expressions: constant expression preprocessing
  • canonicalize_qual: regularize the conditional expression in quals
  • SS_process_sublinks: convert sublink to subquery plan
  • SS_replace_correlation_vars: handles the variables in the Param node
  • make_ands_implicit: convert quals or havingqual to implicit AND format
static Node *
preprocess_expression(PlannerInfo *root, Node *expr, int kind)
{
	/*
	 * Fall out quickly if expression is empty.  This occurs often enough to
	 * be worth checking.  Note that null->null is the correct conversion for
	 * implicit-AND result format, too.
	 */
	if (expr == NULL)
		return NULL;

	/*
	 * If the query has any join RTEs, replace join alias variables with
	 * base-relation variables.  We must do this first, since any expressions
	 * we may extract from the joinaliasvars lists have not been preprocessed.
	 * For example, if we did this after sublink processing, sublinks expanded
	 * out from join aliases would not get processed.  But we can skip this in
	 * non-lateral RTE functions, VALUES lists, and TABLESAMPLE clauses, since
	 * they can't contain any Vars of the current query level.
	 */
	if (root->hasJoinRTEs &&
		!(kind == EXPRKIND_RTFUNC ||
		  kind == EXPRKIND_VALUES ||
		  kind == EXPRKIND_TABLESAMPLE ||
		  kind == EXPRKIND_TABLEFUNC))
		expr = flatten_join_alias_vars(root->parse, expr);

	/*
	 * Simplify constant expressions.
	 *
	 * Note: an essential effect of this is to convert named-argument function
	 * calls to positional notation and insert the current actual values of
	 * any default arguments for functions.  To ensure that happens, we *must*
	 * process all expressions here.  Previous PG versions sometimes skipped
	 * const-simplification if it didn't seem worth the trouble, but we can't
	 * do that anymore.
	 *
	 * Note: this also flattens nested AND and OR expressions into N-argument
	 * form.  All processing of a qual expression after this point must be
	 * careful to maintain AND/OR flatness --- that is, do not generate a tree
	 * with AND directly under AND, nor OR directly under OR.
	 */
	expr = eval_const_expressions(root, expr);

	/*
	 * If it's a qual or havingQual, canonicalize it.
	 */
	if (kind == EXPRKIND_QUAL)
	{
		expr = (Node *) canonicalize_qual((Expr *) expr, false);

#ifdef OPTIMIZER_DEBUG
		printf("After canonicalize_qual()\n");
		pprint(expr);
#endif
	}

	/* Expand SubLinks to SubPlans */
	if (root->parse->hasSubLinks)
		expr = SS_process_sublinks(root, expr, (kind == EXPRKIND_QUAL));

	/*
	 * XXX do not insert anything here unless you have grokked the comments in
	 * SS_replace_correlation_vars ...
	 */

	/* Replace uplevel vars with Param nodes (this IS possible in VALUES) */
	if (root->query_level > 1)
		expr = SS_replace_correlation_vars(root, expr);

	/*
	 * If it's a qual or havingQual, convert it to implicit-AND format. (We
	 * don't want to do this before eval_const_expressions, since the latter
	 * would be unable to simplify a top-level AND correctly. Also,
	 * SS_process_sublinks expects explicit-AND format.)
	 */
	if (kind == EXPRKIND_QUAL)
		expr = (Node *) make_ands_implicit((Expr *) expr);

	return expr;
}
Classification processing of various types

XXX will eventually be called in multiple type conversion phases of the above process_ XXX_ Mutator function. XXX_ XXX_ The mutator function implements classification conversion according to various types.

Here we mainly introduce SS_ process_ The process_ invoked in the sublinks process sublinks_ Mutator function, because the nodes in the child link are of uncertain type, classification processing will also be implemented according to the type when calling the function. When the types are not satisfied, expression will be called_ tree_ Mutator function.

  • SubLink sublink type: call process again for SubLink - > testexpr_ sublinks_ The mutator function parses and obtains the testexpr node, and then calls make_ The subplan function parses the testexpr node and creates a child plan node.
  • AND and OR types: traverse the nodes in bool AND then call process_sublinks_mutator function to parse, store the parsing results in a new list, AND finally create an expr node to store the list AND the corresponding AND, OR type information AND return.

Relevant codes are as follows:

static Node *
process_sublinks_mutator(Node *node, process_sublinks_context *context)
{
	process_sublinks_context locContext;

	locContext.root = context->root;

	if (node == NULL)
		return NULL;
	if (IsA(node, SubLink))
	{
		. . . 
		/*
		 * Now build the SubPlan node and make the expr to return.
		 */
		return make_subplan(context->root,
							(Query *) sublink->subselect,
							sublink->subLinkType,
							sublink->subLinkId,
							testexpr,
							context->isTopQual);
	}

	/*
	 * Don't recurse into the arguments of an outer PHV or aggregate here. Any
	 * SubLinks in the arguments have to be dealt with at the outer query
	 * level; they'll be handled when build_subplan collects the PHV or Aggref
	 * into the arguments to be passed down to the current subplan.
	 */
	if (IsA(node, PlaceHolderVar))
	{
		if (((PlaceHolderVar *) node)->phlevelsup > 0)
			return node;
	}
	else if (IsA(node, Aggref))
	{
		if (((Aggref *) node)->agglevelsup > 0)
			return node;
	}

	/*
	 * We should never see a SubPlan expression in the input (since this is
	 * the very routine that creates 'em to begin with).  We shouldn't find
	 * ourselves invoked directly on a Query, either.
	 */
	Assert(!IsA(node, SubPlan));
	Assert(!IsA(node, AlternativeSubPlan));
	Assert(!IsA(node, Query));

	/*
	 * Because make_subplan() could return an AND or OR clause, we have to
	 * take steps to preserve AND/OR flatness of a qual.  We assume the input
	 * has been AND/OR flattened and so we need no recursion here.
	 *
	 * (Due to the coding here, we will not get called on the List subnodes of
	 * an AND; and the input is *not* yet in implicit-AND format.  So no check
	 * is needed for a bare List.)
	 *
	 * Anywhere within the top-level AND/OR clause structure, we can tell
	 * make_subplan() that NULL and FALSE are interchangeable.  So isTopQual
	 * propagates down in both cases.  (Note that this is unlike the meaning
	 * of "top level qual" used in most other places in Postgres.)
	 */
	if (is_andclause(node))
	{
		...
		return (Node *) make_andclause(newargs);
	}

	if (is_orclause(node))
	{
		...
		return (Node *) make_orclause(newargs);
	}

	/*
	 * If we recurse down through anything other than an AND or OR node, we
	 * are definitely not at top qual level anymore.
	 */
	locContext.isTopQual = false;

	return expression_tree_mutator(node,
								   process_sublinks_mutator,
								   (void *) &locContext);
}
Create sub query plan

make_ The subplan function flow is as follows:

  • tuple_fraction value setting: 0-1 indicates the proportion number of record queries. The proportion is based on EXISTS_SUBLINK,ALL_SUBLINK. (because ANY, EXISTS and ANY functions are inconsistent).
  • Call subquery_ The planner function performs sub link query tree optimization, which is consistent with the complete query tree optimization.
  • create_plan,build_subplan create plan.

After the sub plan is created, return to.

Expression optimization in conditional statements

Call preprocess_ qual_ The conditions function traverses the jointree node, finds the qual node according to the basic type of the node, and calls preprocess_ The expression function processes the qual node, including:

  • RangeTblRef: do nothing
  • FromExpr: traversal, recursive call preprocess_qual_conditions function, and then call preprocess_expression handles qual nodes in individual nodes
  • JoinExpr: call preprocess for the left and right child nodes_ qual_ Conditions function, and then call preprocess_expression handles the qual node in the node

Eliminate external connections

reduce_outer_joins

/*
 * reduce_outer_joins
 *		Attempt to reduce outer joins to plain inner joins.
 *
 * The idea here is that given a query like
 *		SELECT ... FROM a LEFT JOIN b ON (...) WHERE b.y = 42;
 * we can reduce the LEFT JOIN to a plain JOIN if the "=" operator in WHERE
 * is strict.  The strict operator will always return NULL, causing the outer
 * WHERE to fail, on any row where the LEFT JOIN filled in NULLs for b's
 * columns.  Therefore, there's no need for the join to produce null-extended
 * rows in the first place --- which makes it a plain join not an outer join.
 * (This scenario may not be very likely in a query written out by hand, but
 * it's reasonably likely when pushing quals down into complex views.)
 *
 * More generally, an outer join can be reduced in strength if there is a
 * strict qual above it in the qual tree that constrains a Var from the
 * nullable side of the join to be non-null.  (For FULL joins this applies
 * to each side separately.)
 *
 * Another transformation we apply here is to recognize cases like
 *		SELECT ... FROM a LEFT JOIN b ON (a.x = b.y) WHERE b.y IS NULL;
 * If the join clause is strict for b.y, then only null-extended rows could
 * pass the upper WHERE, and we can conclude that what the query is really
 * specifying is an anti-semijoin.  We change the join type from JOIN_LEFT
 * to JOIN_ANTI.  The IS NULL clause then becomes redundant, and must be
 * removed to prevent bogus selectivity calculations, but we leave it to
 * distribute_qual_to_rels to get rid of such clauses.
 *
 * Also, we get rid of JOIN_RIGHT cases by flipping them around to become
 * JOIN_LEFT.  This saves some code here and in some later planner routines,
 * but the main reason to do it is to not need to invent a JOIN_REVERSE_ANTI
 * join type.
 *
 * To ease recognition of strict qual clauses, we require this routine to be
 * run after expression preprocessing (i.e., qual canonicalization and JOIN
 * alias-var expansion).
 */
void
reduce_outer_joins(PlannerInfo *root)
{
	reduce_outer_joins_state *state;

	/*
	 * To avoid doing strictness checks on more quals than necessary, we want
	 * to stop descending the jointree as soon as there are no outer joins
	 * below our current point.  This consideration forces a two-pass process.
	 * The first pass gathers information about which base rels appear below
	 * each side of each join clause, and about whether there are outer
	 * join(s) below each side of each join clause. The second pass examines
	 * qual clauses and changes join types as it descends the tree.
	 */
	state = reduce_outer_joins_pass1((Node *) root->parse->jointree);

	/* planner.c shouldn't have called me if no outer joins */
	if (state == NULL || !state->contains_outer)
		elog(ERROR, "so where are the outer joins?");

	reduce_outer_joins_pass2((Node *) root->parse->jointree,
							 state, root, NULL, NIL, NIL);
}

Generate query plan

grouping_ The planner function first processes the LIMIT, ORDER BY and GROUP BY statements, and then determines whether it is a UNION/INTERSECT/EXCEPT statement or an ordinary statement according to the setOperations value:

  • Processing LIMIT statements

  • Process UNION/INTERSECT/EXCEPT statements and call plan_ set_ The operations function performs internal classification processing

    • union recursive processing generate_recursion_path: merge the left and right clauses after processing
    • Non recursive processing_ set_ Operations: follow the basic process
  • Handle ordinary statements according to the basic process

static void

grouping_planner(PlannerInfo *root, bool inheritance_update,

double tuple_fraction)

{

Query *parse = root->parse;

int64 offset_est = 0;

int64 count_est = 0;

double limit_tuples = -1.0;

bool have_postponed_srfs = false;

PathTarget *final_target;

List *final_targets;

List *final_targets_contain_srfs;

bool final_target_parallel_safe;

RelOptInfo *current_rel;

RelOptInfo *final_rel;

FinalPathExtraData extra;

ListCell *lc;

static void
grouping_planner(PlannerInfo *root, bool inheritance_update,
				 double tuple_fraction)
{
	...
	
	//Processing LIMIT statements
	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
	if (parse->limitCount || parse->limitOffset)
	{
		tuple_fraction = preprocess_limit(root, tuple_fraction,
										  &offset_est, &count_est);

		/*
		 * If we have a known LIMIT, and don't have an unknown OFFSET, we can
		 * estimate the effects of using a bounded sort.
		 */
		if (count_est > 0 && offset_est >= 0)
			limit_tuples = (double) count_est + (double) offset_est;
	}

	/* Make tuple_fraction accessible to lower-level routines */
	root->tuple_fraction = tuple_fraction;

	if (parse->setOperations)
	{
		...
		
		//Processing UNION/INTERSECT/EXCEPT statements
		/*
		 * Construct Paths for set operations.  The results will not need any
		 * work except perhaps a top-level sort and/or LIMIT.  Note that any
		 * special work for recursive unions is the responsibility of
		 * plan_set_operations.
		 */
		current_rel = plan_set_operations(root);

		...
	}
	else
	{
		//Common statement processing
		...
	}
	...
}
Common statement processing flow

Except for special statements, other statements execute the normal execution process:

  • preprocess_ Groupclaim handles grouping statements: rearrange the elements after GROUP BY, and adjust the order according to ORDER BY. It is convenient to quickly complete ORDER BY and GROUP BY operations using indexes in the future.
  • preprocess_targetlist statement: I don't understand how to handle it
  • get_agg_clause_costs collects the cost used by the aggregate function:
  • select_active_windows executes windows functions:
  • query_planner creates query access path: because this part is more important, let's talk about it separately
Create query access path

Note: before reading the code, you need to understand the principle of query engine, otherwise you don't know why.

query_ The planner function is divided into ordinary statements and statements with fromlist linked list length of 1 ("select expression" and "insert... Values()") – this type calls the function to process directly and return results. Let's focus on the processing flow of ordinary statements (ordinary queries have three elements: data source, output result and query condition, which are filled in sequence below):

  • setup_simple_rel_arrays collects base table information: set root - > append from the root - > parse - > rtable table_ rel_ Array table.

  • add_base_rels_to_query build a reoptinfo array (base table information) (set data source): create a reoptinfo array according to the jointree type, and store the reoptinfo data in root - > simple_ rel_ In array. Simply put, it is to create an array of the base table and then fill in the data sources in the base table.

    • build_simple_rel set the reoptinfo parameter to populate the output linked list targetlist and query condition equals
  • build_base_rel_tlists set the target column (set the output result): set the output result of the query statement, traverse the target list (the input parameter here is root - > processed_tlist), find out all Var type nodes and add them to the reltargetlist of releptinfo of the base table to which Var belongs (if they already exist, they will not be added repeatedly). In short, it is the output result of filling the base table; Bind the column name to the data source: (select a1, b1 from aa, BB where aa. a1 = BB. B1; ------- bind the relationship between a1 and aa)

    • pull_var_clause: call pull_var_clause_ Var variable used by Walker function query
    • add_vars_to_targetlist: add Var variable to root - > Simple_ rel_ In array
  • deconstruct_ Join tree set constraint conditions (set query conditions): call deconstruct_ The recurse function processes by type:

    • RangeTblRef: returns directly without processing
    • FromExpr: find all the base table Relids information in FromExpr, and then bind the constraints related to the base table to the base table reoptinfo. Call distribute_ qual_ to_ The rels function performs binding operations. The binding function involves complex processes, which will be described later.
    • JoinExpr: search according to the type of join, find all the Relids information of the base table in the left and right child nodes in JoinExpr, and then bind the constraints related to the base table to the base table reoptinfo. Finally, call make_. The outerjoininfo function performs connection sequence processing
  • reconsider_ outer_ join_ Claims handles external connections:

  • generate_ base_ implied_ Equality create constraints: add all constraints in the claim for optimization, and bind the constraints in the claim with the base table information reoptinfo.

  • (*qp_callback) (root, qp_extra)-----standard_qp_callback callback: process the sorted pathkey

  • create_lateral_join_info building lateraljoin information: Processing in sub query, omitted

Posted by scottlowe on Sat, 04 Dec 2021 17:11:50 -0800