Reflections on PostgreSQL BRIN Index Kernel Code Optimization

Keywords: PostgreSQL Database

Label

PostgreSQL, BRIN block-level index, scanning method, data structure, pages_per_range algorithm

background

BRIN is a new block-level index interface for PostgreSQL 9.5, which stores the boundary values (maximum, minimum) and other statistical information of the indexed fields at the block level.

When a field needs to be retrieved, the whole BRIN index needs to be scanned (this is the point worth optimizing at the core level of the BRIN index in the future). Then skip the HEAP PAGE that does not meet the requirement and scan the HEAP PAGE that does not meet the requirement. To achieve the purpose of data filtering.

As a result of the principle, the better the correlation is, the better the filterability of BRIN index is.

BRIN also supports a variety of types, multi-column fields and so on.

1, multiple columns

2, single row

3. Spatial data types

If you still don't know about BRIN, you can read some of my case articles.

Selection and Optimization of PostGIS Spatial Index (GiST, BRIN, R-Tree) - Aliyun RDS PostgreSQL Best Practice

Automatically Selecting the Right Index Access Interface (btree,hash,gin,gist,sp-gist,brin,bitmap...)

PostgreSQL Parallel Writing to Heap Table, How to Guarantee Sequential Linear Storage-BRIN Index Optimization

PostgreSQL 10.0 preview Enhancement - BRIN Index Update smooth ness

PostgreSQL Aggregated Storage and BRIN Index - High Concurrency Behavior, Track Type High Throughput Data Query Scenario Interpretation

PostgreSQL Internet of Things Black Technology - Index of Several Hundreds of Times of Weight Loss (BRIN index)

<PostgreSQL 9.5 new feature - lets BRIN be used with R-Tree-like indexing strategies For "inclusion" opclasses>

<PostgreSQL 9.5 new feature - BRIN (block range index) index>

How many blocks do pages_per_range, the parameter supported by BRIN index, count the boundaries at a time?

This paper will take the following scenario as an example to explain the setting algorithm of pages_per_range parameter of BRIN index, the optimization of BRIN index column and the idea of optimization of BRIN index core.

"Trillion (100TB) E-commerce Advertising - PostgreSQL Single Machine How to Realize Millisecond Circle"

brin Scanning Principle

The scanning principle of BRIN index is very simple. It scans the metadata of BRIN, compares the metadata with the user input conditions, filters the HEAP PAGE that does not meet the requirements, and only scans the HEAP PAGE that needs to be scanned.

Relevance optimization of BRIN index columns

Because BRIN is a block-level index, if the boundaries of blocks are large, or the overlap between blocks is high, then the filter of BRIN index is poor.

So BRIN is only suitable for storing columns with good linear correlation with values.

pg_stats.correlation can observe the linear correlation of columns.

Of course, we can also modify its storage artificially, change its linear correlation (sorted storage is the simplest method), and even change the local linear correlation. If you want to know more about the underlying principles, please refer to the following article.

Decrypted Hand of God - Aliyun HDB for PostgreSQL database metascan features (storage level, block level, batch level filtering and data orchestration)

Points Optimized by Multiple Conditional Scanning

When our query condition is multiple query conditions, PostgreSQL combines the scans of multiple indexes into one, skipping unqualified ones. This is bitmapAnd, bitmapOr.

<PostgreSQL bitmapAnd, bitmapOr, bitmap index scan, bitmap heap scan>

However, because the scanning of PostgreSQL BRIN index needs to scan the whole BRIN index, so every condition needs to be scanned once. When BRIN itself is large, the condition will multiply in more than one time.

The scanning mode of BRIN index is the key point that PostgreSQL can optimize at the future kernel level, such as organizing the boundary of BRIN by tree, without sweeping it all at once (too violent). PostgreSQL hasn't been done at present. Perhaps no one has built a BRIN index of page s_per_range=1 on tens of billions of tables.

In fact, I'll show you this later. Everyone knows why I want to optimize the pages_per_range parameter.

BRIN Index Parameter pages_per_range Selection Recommendation Algorithms

pages_per_range is granular and defaults to 128 (which means counting boundaries every 128 data blocks), which determines two things.

1. The accuracy of BRIN index. pages_per_range=1, indicating that the boundary is accurate to a block of data. The smaller the pages_per_range, the higher the accuracy, the better the filtering (note that the better the filtering depends on the linear correlation of the column, otherwise it is blind).

2. The size of BRIN index itself. The smaller pages_per_range, the larger the BRIN index itself. The bigger the BRIN is, the higher the cost of scanning BRIN blocks with a single BRIN index is.

So how appropriate is pages_per_range set?

According to my experience, 311GB table, set to 512 is a good choice. The smaller the table, the smaller the pages_per_range setting can be.

For 311GB tables, if pages_per_range=1, the BRIN index itself is as large as 1.6GB. It's hard to sweep. When set to 512, it's only about a few MB. Sweep it quickly (although the filterability may be poor, BRIN is swept once for every condition).

DEMO

1,pages_per_range=1

postgres=# \d bi_user_tmall_vis1  
    Unlogged table "public.bi_user_tmall_vis1"  
 Column |  Type   | Collation | Nullable | Default   
--------+---------+-----------+----------+---------  
 uid    | bigint  |           |          |   
 bid    | bigint  |           |          |   
 cnt    | integer |           |          |   
Indexes:  
    "idx_bi_user_tmall_vis1" brin (bid, cnt) WITH (pages_per_range='1')  
  
  
 public | idx_bi_user_tmall_vis1 | index | postgres | bi_user_tmall_vis1 | 1644 MB    |   
  
  
postgres=# explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where bid=1 and cnt between 1 and 100;  
                                                                  QUERY PLAN                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.bi_user_tmall_vis1  (cost=264463.65..274155.70 rows=7351 width=20) (actual time=8213.046..8213.057 rows=4 loops=1)  
   Output: uid, bid, cnt  
   Recheck Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
   Rows Removed by Index Recheck: 153  
   Heap Blocks: lossy=1  
   Buffers: shared hit=269675  
   ->  Bitmap Index Scan on idx_bi_user_tmall_vis1  (cost=0.00..264461.81 rows=7379 width=0) (actual time=8213.023..8213.023 rows=10 loops=1)  
         Index Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
         Buffers: shared hit=269674  
 Planning time: 0.046 ms  
 Execution time: 8213.080 ms  
(11 rows)  

2,pages_per_range=128

postgres=# \d bi_user_tmall_vis1  
    Unlogged table "public.bi_user_tmall_vis1"  
 Column |  Type   | Collation | Nullable | Default   
--------+---------+-----------+----------+---------  
 uid    | bigint  |           |          |   
 bid    | bigint  |           |          |   
 cnt    | integer |           |          |   
Indexes:  
    "idx_bi_user_tmall_vis1" brin (bid, cnt) WITH (pages_per_range='128')  
    
 public | idx_bi1        | index | postgres | bi_user_tmall_vis1 | 13 MB      |   
  
  
postgres=# explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where (bid=1 and cnt between 1 and 100);  
                                                               QUERY PLAN                                                                 
----------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.bi_user_tmall_vis1  (cost=2071.47..28408.93 rows=7351 width=20) (actual time=61.110..62.974 rows=4 loops=1)  
   Output: uid, bid, cnt  
   Recheck Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
   Rows Removed by Index Recheck: 20092  
   Heap Blocks: lossy=128  
   Buffers: shared hit=2236  
   ->  Bitmap Index Scan on idx_bi1  (cost=0.00..2069.63 rows=20096 width=0) (actual time=61.100..61.100 rows=1280 loops=1)  
         Index Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
         Buffers: shared hit=2108  
 Planning time: 0.072 ms  
 Execution time: 62.994 ms  
(11 rows)  
  
  
postgres=# explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where (bid=1 and cnt between 1 and 100) or (bid=2000 and cnt <10000) or (bid=12000 and cnt <10000);  
                                                                                                                                 QUERY PLAN                                                                                                    
                                  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
--------------------------------  
 Bitmap Heap Scan on public.bi_user_tmall_vis1  (cost=6324.38..242299.15 rows=153721 width=20) (actual time=184.909..191.652 rows=138 loops=1)  
   Output: uid, bid, cnt  
   Recheck Cond: (((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100)) OR ((bi_user_tmall_vis1.bid = 2000) AND (bi_user_tmall_vis1.cnt < 10000)) OR ((bi_user_tmall_vis1.bid = 12000) AND (bi  
_user_tmall_vis1.cnt < 10000)))  
   Rows Removed by Index Recheck: 60150  
   Heap Blocks: lossy=384  
   Buffers: shared hit=6708  
   ->  BitmapOr  (cost=6324.38..6324.38 rows=180864 width=0) (actual time=184.896..184.896 rows=0 loops=1)  
         Buffers: shared hit=6324  
         ->  Bitmap Index Scan on idx_bi1  (cost=0.00..2069.63 rows=20096 width=0) (actual time=61.600..61.600 rows=1280 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
               Buffers: shared hit=2108  
         ->  Bitmap Index Scan on idx_bi1  (cost=0.00..2069.73 rows=80384 width=0) (actual time=61.522..61.522 rows=1280 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 2000) AND (bi_user_tmall_vis1.cnt < 10000))  
               Buffers: shared hit=2108  
         ->  Bitmap Index Scan on idx_bi1  (cost=0.00..2069.73 rows=80384 width=0) (actual time=61.773..61.773 rows=1280 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 12000) AND (bi_user_tmall_vis1.cnt < 10000))  
               Buffers: shared hit=2108  
 Planning time: 0.091 ms  
 Execution time: 191.684 ms  
(19 rows)  

3, pages_per_range=256

public | idx_bi         | index | postgres | bi_user_tmall_vis1 | 6624 kB    |   
  
  
postgres=# explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where bid=1 and cnt between 1 and 100;  
                                                               QUERY PLAN                                                                 
----------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.bi_user_tmall_vis1  (cost=1038.00..53587.92 rows=7351 width=20) (actual time=30.259..33.742 rows=4 loops=1)  
   Output: uid, bid, cnt  
   Recheck Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
   Rows Removed by Index Recheck: 40188  
   Heap Blocks: lossy=256  
   Buffers: shared hit=1310  
   ->  Bitmap Index Scan on idx_bi  (cost=0.00..1036.16 rows=40192 width=0) (actual time=30.251..30.251 rows=2560 loops=1)  
         Index Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
         Buffers: shared hit=1054  
 Planning time: 0.061 ms  
 Execution time: 33.759 ms  
(11 rows)  
  
postgres=# explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where (bid=1 and cnt between 1 and 100) or (bid=2000 and cnt <10000) or (bid=12000 and cnt <10000);  
                                                                                                                                 QUERY PLAN                                                                                                    
                                  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
--------------------------------  
 Bitmap Heap Scan on public.bi_user_tmall_vis1  (cost=3223.91..265138.74 rows=153721 width=20) (actual time=90.760..105.509 rows=138 loops=1)  
   Output: uid, bid, cnt  
   Recheck Cond: (((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100)) OR ((bi_user_tmall_vis1.bid = 2000) AND (bi_user_tmall_vis1.cnt < 10000)) OR ((bi_user_tmall_vis1.bid = 12000) AND (bi  
_user_tmall_vis1.cnt < 10000)))  
   Rows Removed by Index Recheck: 120438  
   Heap Blocks: lossy=768  
   Buffers: shared hit=3930  
   ->  BitmapOr  (cost=3223.91..3223.91 rows=200960 width=0) (actual time=90.746..90.746 rows=0 loops=1)  
         Buffers: shared hit=3162  
         ->  Bitmap Index Scan on idx_bi  (cost=0.00..1036.16 rows=40192 width=0) (actual time=30.838..30.838 rows=2560 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
               Buffers: shared hit=1054  
         ->  Bitmap Index Scan on idx_bi  (cost=0.00..1036.23 rows=80384 width=0) (actual time=29.966..29.966 rows=2560 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 2000) AND (bi_user_tmall_vis1.cnt < 10000))  
               Buffers: shared hit=1054  
         ->  Bitmap Index Scan on idx_bi  (cost=0.00..1036.23 rows=80384 width=0) (actual time=29.940..29.940 rows=2560 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 12000) AND (bi_user_tmall_vis1.cnt < 10000))  
               Buffers: shared hit=1054  
 Planning time: 0.131 ms  
 Execution time: 105.555 ms  
(19 rows)  

4,pages_per_range=512

 public | idx_bi                 | index | postgres | bi_user_tmall_vis1 | 3336 kB    |   
  
  
postgres=# explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where bid=1 and cnt between 1 and 100;  
                                                               QUERY PLAN                                                                 
----------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.bi_user_tmall_vis1  (cost=521.47..105255.40 rows=7351 width=20) (actual time=16.024..25.791 rows=4 loops=1)  
   Output: uid, bid, cnt  
   Recheck Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
   Rows Removed by Index Recheck: 80380  
   Heap Blocks: lossy=512  
   Buffers: shared hit=529 read=511  
   ->  Bitmap Index Scan on idx_bi  (cost=0.00..519.63 rows=80384 width=0) (actual time=16.010..16.010 rows=5120 loops=1)  
         Index Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
         Buffers: shared hit=528  
 Planning time: 0.238 ms  
 Execution time: 25.822 ms  
(11 rows)  
  
  
  
postgres=# explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where (bid=1 and cnt between 1 and 100) or (bid=2000 and cnt <10000) or (bid=12000 and cnt <10000);  
                                                                                                                                 QUERY PLAN                                                                                                    
                                  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
--------------------------------  
 Bitmap Heap Scan on public.bi_user_tmall_vis1  (cost=1674.17..315338.06 rows=153721 width=20) (actual time=47.115..78.014 rows=138 loops=1)  
   Output: uid, bid, cnt  
   Recheck Cond: (((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100)) OR ((bi_user_tmall_vis1.bid = 2000) AND (bi_user_tmall_vis1.cnt < 10000)) OR ((bi_user_tmall_vis1.bid = 12000) AND (bi  
_user_tmall_vis1.cnt < 10000)))  
   Rows Removed by Index Recheck: 241014  
   Heap Blocks: lossy=1536  
   Buffers: shared hit=2608 read=512  
   ->  BitmapOr  (cost=1674.17..1674.17 rows=241151 width=0) (actual time=47.099..47.099 rows=0 loops=1)  
         Buffers: shared hit=1584  
         ->  Bitmap Index Scan on idx_bi  (cost=0.00..519.63 rows=80384 width=0) (actual time=16.167..16.167 rows=5120 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
               Buffers: shared hit=528  
         ->  Bitmap Index Scan on idx_bi  (cost=0.00..519.63 rows=80384 width=0) (actual time=15.494..15.494 rows=5120 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 2000) AND (bi_user_tmall_vis1.cnt < 10000))  
               Buffers: shared hit=528  
         ->  Bitmap Index Scan on idx_bi  (cost=0.00..519.63 rows=80384 width=0) (actual time=15.437..15.437 rows=5120 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 12000) AND (bi_user_tmall_vis1.cnt < 10000))  
               Buffers: shared hit=528  
 Planning time: 0.145 ms  
 Execution time: 78.062 ms  
(19 rows)  

5,pages_per_range=sqrt(pg_class.relpages)=6384

 public | idx_bi         | index | postgres | bi_user_tmall_vis1 | 312 kB     |   
  
postgres=# create index idx_bi on bi_user_tmall_vis1 using brin (bid,cnt) WITH (pages_per_range='6384');  
CREATE INDEX  
  
postgres=# explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where (bid=1 and cnt between 1 and 100) or (bid=2000 and cnt <10000) or (bid=12000 and cnt <10000);  
                                                                                                                                 QUERY PLAN                                                                                                    
                                  
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
--------------------------------  
 Bitmap Heap Scan on public.bi_user_tmall_vis1  (cost=252.98..3620468.00 rows=153721 width=20) (actual time=4.027..138.993 rows=138 loops=1)  
   Output: uid, bid, cnt  
   Recheck Cond: (((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100)) OR ((bi_user_tmall_vis1.bid = 2000) AND (bi_user_tmall_vis1.cnt < 10000)) OR ((bi_user_tmall_vis1.bid = 12000) AND (bi  
_user_tmall_vis1.cnt < 10000)))  
   Rows Removed by Index Recheck: 1002150  
   Heap Blocks: lossy=6384  
   Buffers: shared hit=1662 read=4848  
   ->  BitmapOr  (cost=252.98..252.98 rows=3006577 width=0) (actual time=4.010..4.010 rows=0 loops=1)  
         Buffers: shared hit=126  
         ->  Bitmap Index Scan on idx_bi  (cost=0.00..45.90 rows=1002192 width=0) (actual time=1.373..1.373 rows=63840 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
               Buffers: shared hit=42  
         ->  Bitmap Index Scan on idx_bi  (cost=0.00..45.90 rows=1002192 width=0) (actual time=1.325..1.325 rows=63840 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 2000) AND (bi_user_tmall_vis1.cnt < 10000))  
               Buffers: shared hit=42  
         ->  Bitmap Index Scan on idx_bi  (cost=0.00..45.90 rows=1002192 width=0) (actual time=1.310..1.310 rows=63840 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 12000) AND (bi_user_tmall_vis1.cnt < 10000))  
               Buffers: shared hit=42  
 Planning time: 0.307 ms  
 Execution time: 139.046 ms  
(19 rows)  
  
postgres=# explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where (bid=1 and cnt between 1 and 100) or (bid=2000 and cnt <10000) ;  
                                                                                            QUERY PLAN                                                                                              
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.bi_user_tmall_vis1  (cost=132.06..2459840.30 rows=80537 width=20) (actual time=2.735..112.409 rows=65 loops=1)  
   Output: uid, bid, cnt  
   Recheck Cond: (((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100)) OR ((bi_user_tmall_vis1.bid = 2000) AND (bi_user_tmall_vis1.cnt < 10000)))  
   Rows Removed by Index Recheck: 1002223  
   Heap Blocks: lossy=6384  
   Buffers: shared hit=6468  
   ->  BitmapOr  (cost=132.06..132.06 rows=2004385 width=0) (actual time=2.720..2.720 rows=0 loops=1)  
         Buffers: shared hit=84  
         ->  Bitmap Index Scan on idx_bi  (cost=0.00..45.90 rows=1002192 width=0) (actual time=1.401..1.401 rows=63840 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
               Buffers: shared hit=42  
         ->  Bitmap Index Scan on idx_bi  (cost=0.00..45.90 rows=1002192 width=0) (actual time=1.318..1.318 rows=63840 loops=1)  
               Index Cond: ((bi_user_tmall_vis1.bid = 2000) AND (bi_user_tmall_vis1.cnt < 10000))  
               Buffers: shared hit=42  
 Planning time: 0.126 ms  
 Execution time: 112.449 ms  
(16 rows)  
  
postgres=# explain (analyze,timing,costs,buffers,verbose) select * from bi_user_tmall_vis1 where (bid=1 and cnt between 1 and 100);  
                                                              QUERY PLAN                                                                 
---------------------------------------------------------------------------------------------------------------------------------------  
 Bitmap Heap Scan on public.bi_user_tmall_vis1  (cost=47.73..1258330.06 rows=7351 width=20) (actual time=1.381..97.717 rows=4 loops=1)  
   Output: uid, bid, cnt  
   Recheck Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
   Rows Removed by Index Recheck: 1002284  
   Heap Blocks: lossy=6384  
   Buffers: shared hit=6426  
   ->  Bitmap Index Scan on idx_bi  (cost=0.00..45.90 rows=1002192 width=0) (actual time=1.368..1.368 rows=63840 loops=1)  
         Index Cond: ((bi_user_tmall_vis1.bid = 1) AND (bi_user_tmall_vis1.cnt >= 1) AND (bi_user_tmall_vis1.cnt <= 100))  
         Buffers: shared hit=42  
 Planning time: 0.109 ms  
 Execution time: 97.744 ms  
(11 rows)  

Contrast of different pages_per_range

Index accuracy Single Table Data Volume Single table size Index size 1 conditions 2 conditions 3 conditions
pages_per_range=1 6 billion 400 million 311GB 1.6GB 8.2 seconds - -
pages_per_range=128 6 billion 400 million 311GB 13MB 62 milliseconds - 191 milliseconds
pages_per_range=256 6 billion 400 million 311GB 6MB 33 milliseconds - 105 milliseconds
pages_per_range=512 6 billion 400 million 311GB 3MB 25 milliseconds - 78 milliseconds
pages_per_range=sqrt(pg_class.relpages)=6384 6 billion 400 million 311GB 300KB 97 milliseconds 112 milliseconds 139 milliseconds

Although the precision is high, the bigger the index itself is, the higher the cost of scanning the index itself is. 8.2 seconds is the case.

When the precision is set to 512, the single condition becomes 25 milliseconds, while the index size is only 3 MB.

It's not unexpected to be unhappy.

Reflections on BRIN Kernel Optimization

In order to reduce the scanning overhead of BRIN index itself, we can regenerate the boundary of BRIN index into a tree, scan through the tree, improve the speed, not the way of full scan.

In the future, we just need to consider the accuracy = 1. Because this filter is the best, and the scanning cost of BRIN index itself is very low. Thus, the efficiency of BRIN index is greatly improved under the condition of massive data.

Summary

This paper mainly talks about the principle of BRIN index, the principle of scanning, the principle of precision parameters, and how to select precision parameters. It also talks about how to reduce the cost of BRIN index itself by adjusting the method of optimizing BRIN index scanning.

Well, have a good time. You know how much PostgreSQL, she can give you how much, PG is a very playable enterprise-level open source database, refueling.

The characteristics of BRIN index can be used to support massive data filtering at trillion level or even larger, while the storage of index and the impact on writing are almost zero. Absolutely belongs to the characteristics of black technology level.

Reference resources

Selection and Optimization of PostGIS Spatial Index (GiST, BRIN, R-Tree) - Aliyun RDS PostgreSQL Best Practice

Automatically Selecting the Right Index Access Interface (btree,hash,gin,gist,sp-gist,brin,bitmap...)

PostgreSQL Parallel Writing to Heap Table, How to Guarantee Sequential Linear Storage-BRIN Index Optimization

PostgreSQL 10.0 preview Enhancement - BRIN Index Update smooth ness

PostgreSQL Aggregated Storage and BRIN Index - High Concurrency Behavior, Track Type High Throughput Data Query Scenario Interpretation

PostgreSQL Internet of Things Black Technology - Index of Several Hundreds of Times of Weight Loss (BRIN index)

<PostgreSQL 9.5 new feature - lets BRIN be used with R-Tree-like indexing strategies For "inclusion" opclasses>

<PostgreSQL 9.5 new feature - BRIN (block range index) index>

Posted by monkeypaw201 on Thu, 30 May 2019 12:21:31 -0700