PostgreSQL spatial aggregation performance - administrative area, electronic fence spatial aggregation - time, space thermal map

Keywords: Database PostgreSQL SQL Oracle

Label

PostgreSQL, spatial aggregation, spatial heat map, administrative region, electronic fence

background

For a certain time interval (or other conditions), how many objects (space point information) appear in some fences and administrative areas (polygon information), and render these polygons by color depth.

for example

Example

1. Area data - fence, administrative area (polygon information)

Generate 10000 area data

create table t1 (  
  gid int,   -- polygon ID(User defined fences, administrative areas)  
  face box   -- Spatial information, which can be used in practice PostGIS Of geometry type  
);  
  
insert into t1 select row_number() over(), box (point(x,y),point(x+1,y+1)) from generate_series(0,99) t1(x),generate_series(0,99) t2(y);  
  
create index idx_t1_face on t1 using gist(face);  

2. Point data, location information of spatial objects

Generate 10 million points of data

create table t2 (  
  id int,   -- object ID  
  pos point,   -- Location, when in use, can be used PostGIS Of geometry type  
  att text   -- Other properties, you can have more other properties  
);  
  
insert into t2 select id, point(random()*100, random()*100) from generate_series(1,10000000) t(id);  

3. Create a function, input the value of the point, and get the value of the face.

Support for parallel SQL

create or replace function get_gid(point) returns int as $$  
  select gid from t1 where face @> box($1,$1) limit 1;  
$$ language sql strict immutable parallel safe;  

4. The SQL aggregated by GID space is as follows

select gid, count(*) from t1 join t2 on (t1.face @> box(t2.pos,t2.pos)) group by gid;  
  
//or  
  
select get_gid(pos) as gid, count(*) from t2 group by 1;  

5. Parallel aggregation with PG

postgres=# show max_worker_processes ;  
 max_worker_processes   
----------------------  
 128  
(1 row)  
  
  
set max_parallel_workers=128;  
set max_parallel_workers_per_gather =28;  
set enable_sort=off;  
set parallel_tuple_cost =0;  
set parallel_setup_cost =0;  
set min_parallel_table_scan_size =0;  
set min_parallel_index_scan_size =0;  
set work_mem ='2GB';  
alter table t1 set (parallel_workers =28);  
alter table t2 set (parallel_workers =28);  
analyze t1;  
analyze t2;  

10 million points, 10000 surfaces, space polymerization performance

5.6 seconds

postgres=# explain analyze select get_gid(pos) as gid, count(*) from t2 group by 1;  
                                                               QUERY PLAN                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------  
 Finalize HashAggregate  (cost=86550.79..86594.18 rows=200 width=12) (actual time=5592.898..5594.204 rows=10000 loops=1)  
   Group Key: (get_gid(pos))  
   ->  Gather  (cost=86483.13..86526.52 rows=5600 width=12) (actual time=5485.528..5536.356 rows=290000 loops=1)  
         Workers Planned: 28  
         Workers Launched: 28  
         ->  Partial HashAggregate  (cost=86483.13..86526.52 rows=200 width=12) (actual time=5443.795..5445.860 rows=10000 loops=29)  
               Group Key: get_gid(pos)  
               ->  Parallel Seq Scan on t2  (cost=0.00..84806.71 rows=386720 width=4) (actual time=1.014..5311.532 rows=344828 loops=29)  
 Planning Time: 0.118 ms  
 Execution Time: 5595.278 ms  
(10 rows)  

1 million points, 10000 surfaces, space polymerization performance

690 milliseconds

truncate t2;  
insert into t2 select id, point(random()*100, random()*100) from generate_series(1,1000000) t(id);  
  
postgres=# explain analyze select get_gid(pos) as gid, count(*) from t2 group by 1;  
                                                             QUERY PLAN                                                                
-------------------------------------------------------------------------------------------------------------------------------------  
 Finalize HashAggregate  (cost=8061.46..8104.85 rows=200 width=12) (actual time=687.602..688.897 rows=10000 loops=1)  
   Group Key: (get_gid(pos))  
   ->  Gather  (cost=7993.80..8037.18 rows=5600 width=12) (actual time=582.986..632.877 rows=280419 loops=1)  
         Workers Planned: 28  
         Workers Launched: 28  
         ->  Partial HashAggregate  (cost=7993.80..8037.18 rows=200 width=12) (actual time=541.534..543.355 rows=9670 loops=29)  
               Group Key: get_gid(pos)  
               ->  Parallel Seq Scan on t2  (cost=0.00..7838.98 rows=35714 width=4) (actual time=1.010..527.932 rows=34483 loops=29)  
 Planning Time: 0.130 ms  
 Execution Time: 689.867 ms  
(10 rows)  

If prepolymerization - faster

For example, when data is written, the GID is calculated, written into the GID field of T2 table, and aggregated directly by GID. Speed will fly.

10 million points, 258 milliseconds.

postgres=# alter table t2 add column gid int;
ALTER TABLE
  
-- hypothesis GID Calculated in advance (e.g insert Set to get_gid(pos),It's just for testing. All of them are updated at one time.)
postgres=# update t2 set gid = get_gid(pos);
UPDATE 10000000
postgres=# vacuum full t2;
VACUUM



postgres=# explain analyze select gid,count(*) from t2 group by gid;
                                                              QUERY PLAN                                                              
--------------------------------------------------------------------------------------------------------------------------------------
 Finalize HashAggregate  (cost=3882.87..3884.76 rows=9983 width=12) (actual time=255.763..257.094 rows=10000 loops=1)
   Group Key: gid
   ->  Gather  (cost=2669.26..2671.14 rows=279524 width=12) (actual time=135.953..200.398 rows=290000 loops=1)
         Workers Planned: 28
         Workers Launched: 28
         ->  Partial HashAggregate  (cost=2669.26..2671.14 rows=9983 width=12) (actual time=98.026..99.585 rows=10000 loops=29)
               Group Key: gid
               ->  Parallel Seq Scan on t2  (cost=0.00..1121.05 rows=357143 width=4) (actual time=0.005..30.248 rows=344828 loops=29)
 Planning Time: 0.078 ms
 Execution Time: 258.268 ms
(10 rows)

Reference resources

PostGIS is slightly slower than the built-in geometry operator.

PostgreSQL Oracle compatibility - Custom parallel aggregate function

HTAP database PostgreSQL scenario and performance test 23 - (OLAP) parallel computing

https://www.postgresql.org/docs/11/functions-geometry.html

http://postgis.net/

Posted by ZephyrWest on Thu, 21 Nov 2019 14:12:51 -0800