05 SQL statement execution process analysis 3 query physical optimization

Keywords: Database PostgreSQL SQL

sql execution statement flow analysis

The whole processing flow is in exec_simple_query function. The code structure is as follows:

/*
 * exec_simple_query
 *
 * Execute a "simple Query" protocol message.
 */
static void
exec_simple_query(const char *query_string)
{
	...
	//Original syntax tree acquisition
	/*
	 * Do basic parsing of the query or queries (this should be safe even if
	 * we are in aborted transaction state!)
	 */
	parsetree_list = pg_parse_query(query_string);

	...
	//Circular processing of sql statements
	/*
	 * Run through the raw parsetree(s) and process each one.
	 */
	foreach(parsetree_item, parsetree_list)
	{
		...
		
		//Analyze and rewrite the original syntax tree to generate a query syntax tree
		querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
												NULL, 0, NULL);
		//Optimize the query syntax tree to generate an execution plan
		plantree_list = pg_plan_queries(querytree_list,
										CURSOR_OPT_PARALLEL_OK, NULL);

		...
		
		//Execute statement
		/*
		 * Run the portal to completion, and then drop it (and the receiver).
		 */
		(void) PortalRun(portal,
						 FETCH_ALL,
						 true,	/* always top level */
						 true,
						 receiver,
						 receiver,
						 completionTag);

		...
	}
	...
}

Query physical optimization

What is physical optimization: select the best path among the effective query paths

Physical optimization principle: reduce the physical access cost of tuples in the query access path; Because physical I/O is the main bottleneck of the current database architecture, the main purpose of physical optimization is to reduce physical I/O operations.

Physical optimization code implementation: logical optimization and physical optimization code are implemented in query_ Implemented in the planner function, first perform logic optimization (see 05 SQL statement execution process analysis 2 query logic optimization for details), and finally pass the result of logic optimization to make as a parameter_ one_ Rel function for physical optimization. Let's talk about make_one_rel function.

Physical optimization code steps:

- 1,Query system physical parameters
- 2,Search all possible query paths
- 3,Select the optimal path

make_ one_ The tree view of rel function is as follows:

The follow-up program is complex. In order to facilitate the description, the tree structure is used for parsing

make_one_rel
    |-- ...
    |-- set_base_rel_sizes //Set the physical parameters of each base table in the query statement
        |-- loop Loops: Structures root->simple_rel_array Get base table information from
        |-- set_rel_size //The basic structure is filled according to the base table type, and the size of the base table type is estimated
            |-- check_index_predicates //Partial index check
            |-- set_baserel_size_estimates //Physical parameter estimation: involving "selection rate" 
            
    |-- set_base_rel_pathlists //Set the feasible scanning scheme for each base table
        |-- loop Loops: Structures root->simple_rel_array Get base table information from
        |-- set_rel_pathlist //Building access paths to base table types
            |-- set_plain_rel_pathlist //Processing of common base table
                |-- create_plain_partial_paths //Sequential access
                |-- create_index_paths //Index access
                |-- create_tidscan_paths //TID access
            |-- ... //Sub queries, public expressions, etc. refer to the common base table
    |-- make_rel_from_joinlist //Create query access paths for all feasible solutions
        |-- draft.md
        |-- page.md
        |-- post.md
        |-- draft.md

set_base_rel_sizes sets the physical parameters of each base table in the query statement

check_ index_ Predictions partial index check

Obtaining addresses through indexes is a commonly used technology. When obtaining physical parameters, because it is not necessary to generate all indexes (because index queries also need to access I/O), partial indexes are introduced - indexes are established on some data in the source data table, and the range of index data is constrained by a set of constraints.

Partial index syntax uses where to specify constraints.

set_baserel_size_estimates determines the query time of constraints

This involves the query cost caused by data query constraints, so we need to know the number of data tuples that meet the constraints.

The number can be obtained in two ways:

  • Full table scanning ---- it will cost a huge I/O cost, so it is not the minimum cost
  • Tuple proportion satisfying constraint conditions ---- tuple proportion needs to be obtained through an algorithm. The "selection rate" is used here

Selection rate: describes the discrimination of a number in a table. (because the stored data is changing, the value is also changing)

[selection rate]: the possible discrimination of the index in the whole table (the discrimination also changes with the stored data variables, which is the query plan drift. Someone is studying this, which is not discussed here). Because full table scanning will access a lot of I/O, when creating an index, use where to add judgment statements and create some indexes.

Numerical type calculation formula:

[Num(No.(x - 1)) + (N - X.min)/(X.max-X.min)] / NumOfBucket

Num(No.(x - 1)): the number of barrels before the barrels satisfied under the constraint

N: Constraints

10. Max, X.min: meet the upper and lower boundaries of the barrel under constraints

NumOfBucket: total number of buckets

cpu_cost_function io_cost_function query execution cost

Finally, the optimal or suboptimal query plan will be calculated through the cost of CPU and I/O

set_base_rel_pathlists sets the feasible scanning scheme for each base table

Common base table is the basis of sub query and common expression (CTE), so here we focus on the access path construction method of common base table.

Three query paths for base tables: sequential scan, index scan, and TID scan

Therefore, there are also three access methods to the base table, set_ base_ rel_ Each method is built in the pathlists function to find the optimal access method.

Posted by Elhombrebala on Sat, 04 Dec 2021 15:37:27 -0800