PostgreSQL NULL value relative position and QUERY optimization - nulls first last, asc desc

Keywords: github PostgreSQL Database SQL

Label

PostgreSQL, NULLS FIRST, NULLS LAST, ASC, DESC, default behavior, sort

background

In the database, NULL value refers to the value of UNKNOWN, which does not store any value. When sorting, it is specified by grammar before or after the line with value.

for example

-- Express null In front of the line of value  
select * from tbl order by id nulls first;  
  
-- Express null Behind the Valuable Line  
select * from tbl order by id nulls last;

At the same time, for rows with values, you can specify the order or the inversion.

-- Express press ID Column row row  
select * from tbl order by id [asc];  
  
-- Express press ID Column reverse row  
select * from tbl order by id desc;

The default ordering rules are as follows:

desc nulls first : null large small    
  
asc nulls last : small large null

This is true when nulls [first|last] is combined with asc|desc.

The order of values is as follows:

1. DEFAULT: (Considering that NULL is larger than any value)

desc nulls first : Order: null large small    
  
asc nulls last   : Order: small large null

2. NON DEFAULT: (Considering that NULL is smaller than any value)

desc nulls last : Order: large small null       
  
asc nulls first : Order: null small large

Since the index is fixed, if the sort condition does not match the sort rule of the index when the sort condition is input, the benefit of using the index (sequential scanning) will be unavailable. Cause some unnecessary trouble.

Problems arising from inconsistency between index definition and scan definition

1. Build tables and input test data

create table cc(id int not null);  
  
insert into cc select generate_series(1,1000000);

2. Indexing (null is smaller than any value with non-default configuration)

create index idx_cc on cc (id asc nulls first);  
  
//or  
  
create index idx_cc on cc (id desc nulls last);

3. When queries are inconsistent with the order of index definition (referring to the relative position of NULL), SORT needs to be re-SORT even if index is used.

select * from table order by id desc nulls first limit 1;   
select * from table order by id [asc] nulls last limit 1;

Extra SORT was used

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cc order by id limit 1;  
                                                                 QUERY PLAN                                                                    
---------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=27969.43..27969.43 rows=1 width=4) (actual time=263.972..263.972 rows=1 loops=1)  
   Output: id  
   Buffers: shared hit=7160  
   ->  Sort  (cost=27969.43..30469.43 rows=1000000 width=4) (actual time=263.970..263.970 rows=1 loops=1)  
         Output: id  
         Sort Key: cc.id  
         Sort Method: top-N heapsort  Memory: 25kB  
         Buffers: shared hit=7160  
         ->  Bitmap Heap Scan on public.cc  (cost=8544.42..22969.42 rows=1000000 width=4) (actual time=29.927..148.733 rows=1000000 loops=1)  
               Output: id  
               Heap Blocks: exact=4425  
               Buffers: shared hit=7160  
               ->  Bitmap Index Scan on idx_cc  (cost=0.00..8294.42 rows=1000000 width=0) (actual time=29.380..29.380 rows=1000000 loops=1)  
                     Buffers: shared hit=2735  
 Planning time: 0.098 ms  
 Execution time: 264.009 ms  
(16 rows)

3. Query, when the index definition is consistent (referring to the relative location of NULL), the index is valid and no additional SORT is required.

select * from table order by id desc nulls last limit 1;   
select * from table order by id [asc] nulls first limit 1;

No additional SORT required.

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from cc order by id nulls first limit 1;  
                                                              QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.42..0.45 rows=1 width=4) (actual time=0.014..0.014 rows=1 loops=1)  
   Output: id  
   Buffers: shared hit=4  
   ->  Index Only Scan using idx_cc on public.cc  (cost=0.42..22719.62 rows=1000000 width=4) (actual time=0.013..0.013 rows=1 loops=1)  
         Output: id  
         Heap Fetches: 1  
         Buffers: shared hit=4  
 Planning time: 0.026 ms  
 Execution time: 0.022 ms  
(9 rows)

Summary

In PostgreSQL, sequential and inverted indexes are common. The difference is the relative position of null.

Therefore, when creating an index, it is important to align it with the business requirements and use a consistent NULL relative order (nulls first or nulls last with asc,desc) (that is, NULL next to large value or small value), whereas asc,desc, the value, is actually indifferent.

If the order of business requirements is inconsistent with the order of index (referring to the relative order of null), it will lead to the problem of index requiring full sweep and SORT.

Kernel improvement

1. When the constraints are not null, the relative position of care null should not be allowed, because there is no NULL value, the optimizer should be able to choose non-Sort mode scanning regardless of whether the relative position of NULL is consistent with the SQL of the business request.

2. Improve index scanning method to support circular scanning.

Posted by comicrage on Mon, 10 Dec 2018 18:06:05 -0800

Programmer Group