SQL statement execution flow analysis
The whole processing flow lives in the exec_simple_query function. The code structure is as follows:
```c
/*
 * exec_simple_query
 *
 * Execute a "simple Query" protocol message.
 */
static void
exec_simple_query(const char *query_string)
{
    ...
    /* Obtain the raw parse tree */
    /*
     * Do basic parsing of the query or queries (this should be safe even if
     * we are in aborted transaction state!)
     */
    parsetree_list = pg_parse_query(query_string);
    ...
    /* Process each SQL statement in turn */
    /*
     * Run through the raw parsetree(s) and process each one.
     */
    foreach(parsetree_item, parsetree_list)
    {
        ...
        /* Analyze and rewrite the raw parse tree into a query tree */
        querytree_list = pg_analyze_and_rewrite(parsetree, query_string,
                                                NULL, 0, NULL);

        /* Optimize the query tree into a plan tree */
        plantree_list = pg_plan_queries(querytree_list,
                                        CURSOR_OPT_PARALLEL_OK, NULL);
        ...
        /* Execute the statement */
        /*
         * Run the portal to completion, and then drop it (and the receiver).
         */
        (void) PortalRun(portal,
                         FETCH_ALL,
                         true,    /* always top level */
                         true,
                         receiver,
                         receiver,
                         completionTag);
        ...
    }
    ...
}
```
Query physical optimization
What is physical optimization: selecting the best path from among the candidate query paths.
Physical optimization principle: reduce the physical cost of accessing tuples along the query access path. Because physical I/O is the main bottleneck of current database architectures, the main purpose of physical optimization is to reduce physical I/O operations.
Physical optimization code implementation: both logical and physical optimization are implemented in the query_planner function. Logical optimization runs first (see "05 SQL statement execution process analysis 2: query logic optimization" for details), and its result is then passed as a parameter to the make_one_rel function, which performs physical optimization. Let's look at the make_one_rel function.
Physical optimization proceeds in three steps:

1. Query the system's physical parameters
2. Search all possible query paths
3. Select the optimal path
The call tree of the make_one_rel function is shown below. The subsequent code is complex, so for ease of description it is presented as a tree:
```
make_one_rel
|-- ...
|-- set_base_rel_sizes          // set the physical parameters of each base table in the query
|   |-- (loop over root->simple_rel_array to get each base table)
|   |-- set_rel_size            // fill the base structure and estimate the size per table type
|       |-- check_index_predicates        // partial-index check
|       |-- set_baserel_size_estimates    // physical parameter estimation ("selectivity")
|-- set_base_rel_pathlists      // build the feasible scan paths for each base table
|   |-- (loop over root->simple_rel_array to get each base table)
|   |-- set_rel_pathlist        // build access paths per base table type
|       |-- set_plain_rel_pathlist        // handling of plain base tables
|           |-- create_plain_partial_paths    // sequential scan
|           |-- create_index_paths            // index scan
|           |-- create_tidscan_paths          // TID scan
|       |-- ...                 // subqueries, common table expressions, etc. are handled like plain base tables
|-- make_rel_from_joinlist      // build query access paths for all feasible join orders
```
set_base_rel_sizes sets the physical parameters of each base table in the query statement
check_index_predicates: partial-index check
Fetching tuple addresses through an index is a commonly used technique. When collecting physical parameters, it is not necessary to index every row (index lookups themselves cost I/O), so partial indexes were introduced: an index built on only a subset of the rows in a table, with the indexed range constrained by a set of conditions.
The partial-index syntax uses a WHERE clause to specify the constraint; for example, `CREATE INDEX orders_unshipped_idx ON orders (order_id) WHERE shipped = false;` indexes only the unshipped rows (table and column names here are illustrative).
set_baserel_size_estimates estimates the cost contributed by the query's constraints
Constraints add query cost, so we need to know how many data tuples satisfy them.
This number can be obtained in two ways:

- Full table scan: it incurs a huge I/O cost, so it is not the minimum-cost option
- The proportion of tuples satisfying the constraint: this proportion must be obtained by an algorithm; the "selectivity" is used here
Selectivity: describes how selective a value is within a table, i.e. the fraction of tuples that match. Because the stored data keeps changing, this value changes too; the estimated selectivity drifts with the data, which is the cause of "query plan drift" (an active research topic, not discussed here). Because a full table scan incurs heavy I/O, a WHERE clause can be added when creating an index to build a partial index instead.
The calculation formula for numeric types:

selectivity = [Num(No.(x - 1)) + (N - X.min) / (X.max - X.min)] / NumOfBucket

- Num(No.(x - 1)): the number of histogram buckets that lie entirely below the bucket containing the constant
- N: the constant value in the constraint
- X.max, X.min: the upper and lower bounds of the bucket containing N
- NumOfBucket: the total number of buckets
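The formula above can be sketched as a small function. This is an illustrative reimplementation, not PostgreSQL source: the function name and the representation of histogram bounds as a sorted list are assumptions, but the arithmetic follows the formula term by term.

```python
def histogram_selectivity(bounds, n):
    """Estimate the fraction of tuples with value < n from a histogram.

    bounds -- sorted bucket boundaries; len(bounds) - 1 buckets (NumOfBucket)
    n      -- the constant in the constraint (N in the formula)
    """
    num_buckets = len(bounds) - 1
    if n <= bounds[0]:
        return 0.0
    if n >= bounds[-1]:
        return 1.0
    # Find the bucket [bounds[i], bounds[i + 1]) that contains n;
    # i is then Num(No.(x - 1)), the count of buckets entirely below it.
    i = 0
    while bounds[i + 1] < n:
        i += 1
    # Linear interpolation inside the bucket: (N - X.min) / (X.max - X.min)
    frac = (n - bounds[i]) / (bounds[i + 1] - bounds[i])
    return (i + frac) / num_buckets
```

For example, with bounds `[0, 10, 20, 30, 40]` (four buckets) and the constraint `col < 25`, two full buckets plus half of the third gives (2 + 0.5) / 4 = 0.625.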
CPU cost and I/O cost: the query execution cost
Finally, the optimal (or near-optimal) query plan is chosen based on the combined CPU and I/O costs.
set_base_rel_pathlists builds the feasible scan paths for each base table
Plain base tables are the foundation for subqueries and common table expressions (CTEs), so here we focus on how access paths are built for plain base tables.
Three query paths for base tables: sequential scan, index scan, and TID scan
Accordingly, there are three access methods for a base table; each is built in the set_base_rel_pathlists function, and the optimal access method is then selected from among them.
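The final selection step can be sketched as keeping a list of candidate paths per base table and picking the cheapest one. The names and the `(name, total_cost)` tuple representation below are illustrative, not PostgreSQL's API; the real planner compares path costs inside add_path with more nuance (startup vs. total cost, sort order, parallelism).

```python
def cheapest_path(paths):
    """paths: list of (name, total_cost) tuples; return the cheapest path."""
    return min(paths, key=lambda p: p[1])

# Hypothetical candidate paths for one base table: the three scan methods
# built by set_base_rel_pathlists, each with an estimated total cost.
candidates = [("SeqScan", 200.0), ("IndexScan", 35.5), ("TidScan", 4.2)]
```

Here the TID scan wins because it fetches tuples directly by their physical identifiers, skipping both the full scan and the index lookup.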