Technology behind Unmanned Driving -_PostGIS Point Cloud Application

Keywords: PostgreSQL SQL Database Session

Summary: Labels PostgreSQL, PostGIS, box, grid, pointcloud, pgpointcloud, point aggregation, KNN, auto-driving, auto-distribution, scenes of science fiction movies with unmanned backgrounds are gradually changing from screen to reality with the development of technology.

Label

PostgreSQL, PostGIS, box, grid, pointcloud, pgpointcloud, point aggregation, KNN, automatic driving, automatic distribution, unmanned driving, robot distribution, Logistics

background

Scenes of science fiction movies are gradually changing from screen to reality with the development of technology.From military to civil, such as unmanned aerial vehicles that automobile manufacturers and technology companies are trying to drive.

Unmanned driving has a wide range of applications, such as the courier industry. When the time is ripe, the profession of courier may gradually disappear from society in the future (emancipating the courier's hands and creativity, allowing more people to participate in technological changes in their lives).

Whether it is an unmanned vehicle or an aircraft, navigation is indispensable, navigation is indispensable to location information, which is composed of countless points.The more points, the higher the precision, just like photos, the pixels and light-sensitive elements of the former mobile phone were very rough, and the photos taken were not as clear as those of the present mobile phone.

To achieve unmanned availability, the density of points is also required, for example, 1 square kilometer is 1 million points per square meter.One meter is obviously a rough data, but it's good that if you're just driving, you don't need to cover all the points, just the area covered by the road.

For example, how many points do 100 kilometers of a two-way, six-lane, 25-meter wide road need to be described?If you need 2.5 million points per square meter, if you divide 10 more grids per square meter, you need 25 million points, if you divide 100 grids per square meter, you need 250 million points.(PostgreSQL PostGIS allows you to do this more finely, using point latitude and longitude (float8,float8), rather than a grid representation.Some databases are represented by geohash with limited precision.Interesting children's shoes can be found in Gehash vs PostGIS> )

Once you know the approximate point size, how can it be combined with unmanned driving?

For ease of description, point lattices are referred to as point clouds.When starting points and paths for unmanned driving are set, points involving roads need to be loaded into the vehicle, and each point should have other attributes besides its location, such as baskethole covers, pits, cement road OR bitumen roads, traffic lights, hard shoulders, etc. These attributes can be stored using other fields.

Point clouds are used as location and attribute data in conjunction with auto-driving software.

1. What to use to store point clouds

PostGIS is a very popular GIS data management plug-in, which is widely used in astronomy, scientific research, military industry, and the Internet. RDS for PostgreSQL in Aliyun integrates PostGIS.You can choose RDS for PostgreSQL or install PostgreSQL+PostGIS yourself.

We can store the data in PostgreSQL, use geometry for the data type, use GiST for the index, and retrieve performance leverage.

2. Modeling

Each road consists of several points. When the vehicle starts, it locates and loads nearby points to return to the points that intersect the polygon of the road.To be unmanned, the resolution (density) of point clouds is very demanding. As mentioned earlier, a two-way, six-lane road with a width of 25 meters requires 250 million points for 100 kilometers per square meter.

If an unmanned vehicle runs at a speed of 1 kilometer per minute (equivalent to 60 kilometers per hour), then the point loading speed should reach at least 2.5 million points per minute.

Let's first use each record to represent a point (let's talk about optimization later).

1. Table structure (with RDS for PostgreSQL as an example)

create extension postgis;  -- Establish postgis Plug-in unit  
  
postgres=# create table cloudpoint_test(  
  id serial primary key,  -- Primary key  
  loc geometry,        -- Latitude and longitude(or point)  
  other text           -- Other Properties  
);  
CREATE TABLE

2. GIST Index

postgres=# create index idx on cloudpoint_test using gist(loc) with (buffering=on);  
CREATE INDEX

3. Data writing speed of point clouds

100 million test data, 50 concurrent, 100 million test data.(The boundary of the random point is 10,000 vertically, covering one point.)

vi ins.sql  
  
insert into cloudpoint_test (loc,other) values (st_makepoint(random()*10000, random()*10000) , 'test');

2.Pour in data performance indicators, about 166,000 records per second.

pgbench -M prepared -n -r -P 1 -f ./ins.sql -c 50 -j 50 -t 2000000  
  
transaction type: Custom query  
scaling factor: 1  
query mode: prepared  
number of clients: 50  
number of threads: 50  
number of transactions per client: 2000000  
number of transactions actually processed: 100000000/100000000  
latency average: 0.298 ms  
latency stddev: 0.854 ms  
tps = 166737.438650 (including connections establishing)  
tps = 166739.148413 (excluding connections establishing)  
statement latencies in milliseconds:  
        0.297896        insert into cloudpoint_test (loc,other) values (st_makepoint(random()*10000, random()*10000) , 'test');

4. Point cloud search design

For best performance, it is recommended to refer to the following article, Principle please be interested in the children's shoes open for close reading

Find near GIS performance optimization - PostGIS long lat geometry distance search tuning using gist knn function

1. The search point function is as follows

create or replace function ff(geometry, float8, int) returns setof record as $$                                                          
declare  
  v_rec record;  
  v_limit int := $3;  
begin  
  set local enable_seqscan=off;   -- Forced Index, Scan lines enough to exit.  
  for v_rec in   
    select *,  
    ST_Distance ( $1, loc ) as dist   
    from cloudpoint_test   
    order by loc <-> $1           -- Return from near to far in distance order  
  loop  
    if v_limit <=0 then           -- Determine if the number of records returned has reached LIMIT Number of records  
      raise notice 'Sufficient limit Set % Bar data, But distance % Points within may also have.', $3, $2;  
      return;  
    end if;  
    if v_rec.dist > $2 then       -- Determine if the distance is greater than the requested distance   
      raise notice 'distance % Points within have been output', $2;  
      return;  
    else  
      return next v_rec;  
    end if;  
    v_limit := v_limit -1;  
  end loop;  
end;  
$$ language plpgsql strict volatile;

2. Search SQL Usage

The search distance st_makepoint(1500,1500) is less than 100 and returns up to 10,000 records (limie 10000) by distance.

postgres=# select * from ff(st_makepoint(1500,1500), 100, 10000) as t (id int, loc geometry, other text, dist float8);  
NOTICE:  Sufficient limit 10,000 sets of data, But points within 100 may still exist.  
    id     |                    loc                     | other |       dist          
-----------+--------------------------------------------+-------+-------------------  
  54528779 | 01010000000000EFF6307297400000010D306E9740 | test  | 0.710901366481036  
  52422694 | 01010000000080EE51B171974000003DE6256D9740 | test  | 0.829108575682196  
  20123322 | 0101000000000074AD5C6F97400000C766CE739740 | test  |   0.9648380442046  
  58784192 | 010100000000803A65F4749740008012FDD8709740 | test  |  1.25666215808279  
......

5. Search speed of point clouds

1. Search for 200,000 points within 10,000 near st_makepoint(5000,5000) and return from near to far.

1 second.

explain (analyze,verbose,timing,costs,buffers) select * from ff(st_makepoint(5000,5000), 1000, 200000) as t(id int, loc geometry, other text, dist float8);  
  
NOTICE:  Sufficient limit 200,000 sets of data, But points within 1,000 may still exist.  
                                                       QUERY PLAN                                                         
------------------------------------------------------------------------------------------------------------------------  
 Function Scan on public.ff t  (cost=0.25..10.25 rows=1000 width=76) (actual time=917.748..945.838 rows=200000 loops=1)  
   Output: id, loc, other, dist  
   Function Call: ff('0101000000000000000088B340000000000088B340'::geometry, 1000::double precision, 200000)  
   Buffers: shared hit=201288, temp read=1418 written=1417  
 Planning time: 0.057 ms  
 Execution time: 959.534 ms  
(6 rows)

2. Search for 1 million points within 2000 near st_makepoint(5000,5000) and return from near to far.

10 seconds.

explain (analyze,verbose,timing,costs,buffers) select * from ff(st_makepoint(5000,5000), 2000, 1000000) as t(id int, loc geometry, other text, dist float8);  
  
NOTICE:  Sufficient limit 1000000 data sets, But points within 2,000 may still exist.  
                                                        QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------  
 Function Scan on public.ff t  (cost=0.25..10.25 rows=1000 width=76) (actual time=8867.007..9006.401 rows=1000000 loops=1)  
   Output: id, loc, other, dist  
   Function Call: ff('0101000000000000000088B340000000000088B340'::geometry, 2000::double precision, 1000000)  
   Buffers: shared hit=1006220, temp read=7082 written=7081  
 Planning time: 0.059 ms  
 Execution time: 9074.267 ms  
(6 rows)

3. Search for 5 million points within 3,000 million near st_makepoint(5000,5000) and return from near to far.

43 seconds.

explain (analyze,verbose,timing,costs,buffers) select * from ff(st_makepoint(5000,5000), 3000, 5000000) as t(id int, loc geometry, other text, dist float8);  
  
NOTICE:  Sufficient limit 5000000 data sets, But points within 3,000 may still exist.  
                                                         QUERY PLAN                                                            
-----------------------------------------------------------------------------------------------------------------------------  
 Function Scan on public.ff t  (cost=0.25..10.25 rows=1000 width=76) (actual time=42168.038..42861.462 rows=5000000 loops=1)  
   Output: id, loc, other, dist  
   Function Call: ff('0101000000000000000088B340000000000088B340'::geometry, 3000::double precision, 5000000)  
   Buffers: shared hit=5030448, temp read=35402 written=35401  
 Planning time: 0.060 ms  
 Execution time: 43201.879 ms  
(6 rows)

Now it looks like the performance is good, right?But there is no end to optimizing, so there is still room for optimizing. If you are interested, you can continue to look down.

6. Point Cloud Search Bottleneck Analysis

Bottlenecks fall into three categories

1. Function callback

Because of the large number of records that need to be returned, PostgreSQL also has a large number of callbacks to return records, which results in a large overhead.See:

Analysis Acceleration Engine Black Technology - LLVM, Column Storage, Multicore Parallel, Operator Reuse Big Marriage - Together to Open PostgreSQL's Treasure Box

Use perf to observe

perf record -ag -p $PID  
  
perf report -ag --stdio

2. Discrete Scan IO Amplification

Since point data may be random, there is no order at the heap storage level and no order when searching for points near a point. Discrete IO requests result in HEAP IO amplification.The principle is detailed in:

Statistical Principles and Solutions Behind Heap Scan IO Amplification Induced by Index Sequential Scan

3. GiST Index Discreteness Quality

When GiST indexes are built, there are also problems with the dispersion of index entries. Reference:

https://www.postgresql.org/docs/10.0/static/gist-implementation.html

Beginning in version 9.2,   
PostgreSQL supports a more efficient method to build GiST indexes based on buffering,   
which can dramatically reduce the number of random I/Os needed for non-ordered data sets.   
  
For well-ordered data sets the benefit is smaller or non-existent,   
because only a small number of pages receive new tuples at a time,   
and those pages fit in cache even if the index as whole does not.

How to optimize the above three questions?

7. Point cloud search optimization

1. get next tuple callback optimization

There are too many callbacks, there may be bottlenecks when returning or processing too many records, and there are three optimization methods.

1 Aggregation

Previously, we refined to 100 points per square meter (100 records), and if we aggregate these 100 records into one record per square meter (as a grid), the total number of records will drop 100 times.

100 million records fell to 1 million, and the number of records requested by users dropped 100 times.

The aggregated table structure is designed as follows

postgres=# create table cloudpoint_test_agg(  
  id serial primary key,        -- Primary key  
  loc_box geometry,          -- lattice(Closed polygon), Represents points within a range  
  loc_agg geometry[],        -- Latitude and longitude arrays(or point array)  
  other_agg text[]           -- Array of other properties  
);  
CREATE TABLE  
  
create index idx_cloudpoint_test_agg_1 on cloudpoint_test_agg using gist(loc_box) with (buffering=on);

How do I aggregate?

First, road polygons are gridded and converted to adjacent hexagonal grids using the following methods:

Art and Technology Value of Honeycomb - PostgreSQL PostGIS's hex-grid

The converted road information is as follows

create table cloudpoint_test_grid(  
  id serial primary key,    -- Hexagon ID  
  loc_box geometry     -- Single hexagon  
);

The JOIN of grid and point cloud data is aggregated into aggregated data.

insert into cloudpoint_test_agg (loc_box,loc_agg,other_agg)  
  select t1.loc_box, array_agg(t2.loc) as loc_agg, array_agg(t2.other) as other_agg from   
    cloudpoint_test_grid t1 join cloudpoint_test t2  
    on (ST_Contains(t1.loc_box, t2.loc)) group by t1.loc_box;

Example

This example uses standard squares for gridding, ((0,0), (10000,10000)) areas are gridded as follows

One grid per 100*100.

http://postgis.net/docs/manual-2.3/ST_MakeBox2D.html

do language plpgsql $$  
declare  
  x int;  
  y int;  
begin  
  for x in select generate_series(0,10000,100) loop  
    for y in select generate_series(0,10000,100) loop  
      insert into cloudpoint_test_grid(loc_box) values (ST_MakeBox2D( st_makepoint(x,y), st_makepoint(x+100, y+100) ));  
    end loop;  
  end loop;  
end;  
$$;

postgres=# select * from cloudpoint_test_grid;  
  id   |                                                                                          loc_box                                                                                             
-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
     1 | 010300000001000000050000000000000000000000000000000000000000000000000000000000000000005940000000000000594000000000000059400000000000005940000000000000000000000000000000000000000000000000  
     2 | 010300000001000000050000000000000000000000000000000000594000000000000000000000000000006940000000000000594000000000000069400000000000005940000000000000594000000000000000000000000000005940  
     3 | 010300000001000000050000000000000000000000000000000000694000000000000000000000000000C0724000000000000059400000000000C072400000000000005940000000000000694000000000000000000000000000006940  
     4 | 0103000000010000000500000000000000000000000000000000C07240000000000000000000000000000079400000000000005940000000000000794000000000000059400000000000C0724000000000000000000000000000C07240  
  
......

Generate aggregated data

insert into cloudpoint_test_agg (loc_box,loc_agg,other_agg)  
  select t1.loc_box, array_agg(t2.loc) as loc_agg, array_agg(t2.other) as other_agg from   
    cloudpoint_test_grid t1 join cloudpoint_test t2  
    on (ST_Contains(t1.loc_box, t2.loc)) group by t1.loc_box;   
  
INSERT 0 10000  
  
select count(*) from cloudpoint_test_agg;  
10000

The function to search for nearby points from aggregated point cloud data is as follows

create or replace function ff1(geometry, float8, int) returns setof record as $$                                                          
declare  
  v_rec record;  
  v_limit int := $3;  
begin  
  set local enable_seqscan=off;   -- Forced Index, Scan lines enough to exit.  
  for v_rec in   
    select *,  
    ST_Distance ( $1, loc_box ) as dist   
    from cloudpoint_test_agg   
    order by loc_box <-> $1           -- Return from near to far in distance order  
  loop  
    if v_limit <=0 then           -- Determine if the number of records returned has reached LIMIT Number of records  
      raise notice 'Sufficient limit Set % Bar data, But distance % Points within may also have.', $3, $2;  
      return;  
    end if;  
    if v_rec.dist > $2 then       -- Determine if the distance is greater than the requested distance   
      raise notice 'distance % Points within have been output', $2;  
      return;  
    else  
      return next v_rec;  
    end if;  
    v_limit := v_limit - array_length(v_rec.loc_agg, 1);  -- Deductions grid Internal point Number  
  end loop;  
end;  
$$ language plpgsql strict volatile;

Example of aggregated search SQL

The search distance st_makepoint(1500,1500) is less than 100 and returns up to 10,000 records (limie 10000) by distance.

postgres=# select * from ff1(st_makepoint(1500,1500), 100, 10000) as t (id int, loc_box geometry, loc_agg geometry[], other_agg text[], dist float8);

Aggregated Speed Test

1. Search for 200,000 points within 10,000 near st_makepoint(5000,5000) and return from near to far.

15ms.

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from ff1(st_makepoint(5000,5000), 1000, 200000) as t (id int, loc_box geometry, loc_agg geometry[], other_agg text[], dist float8);  
NOTICE:  Sufficient limit 200,000 sets of data, But points within 1,000 may still exist.  
                                                     QUERY PLAN                                                       
--------------------------------------------------------------------------------------------------------------------  
 Function Scan on public.ff1 t  (cost=0.25..10.25 rows=1000 width=108) (actual time=15.642..15.643 rows=20 loops=1)  
   Output: id, loc_box, loc_agg, other_agg, dist  
   Function Call: ff1('0101000000000000000088B340000000000088B340'::geometry, 1000::double precision, 200000)  
   Buffers: shared hit=465  
 Planning time: 0.082 ms  
 Execution time: 15.670 ms  
(6 rows)

2. Search for 1 million points within 2000 near st_makepoint(5000,5000) and return from near to far.

63ms.

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from ff1(st_makepoint(5000,5000), 2000, 1000000) as t (id int, loc_box geometry, loc_agg geometry[], other_agg text[], dist float8);  
NOTICE:  Sufficient limit 1000000 data sets, But points within 2,000 may still exist.  
                                                     QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------  
 Function Scan on public.ff1 t  (cost=0.25..10.25 rows=1000 width=108) (actual time=63.287..63.296 rows=100 loops=1)  
   Output: id, loc_box, loc_agg, other_agg, dist  
   Function Call: ff1('0101000000000000000088B340000000000088B340'::geometry, 2000::double precision, 1000000)  
   Buffers: shared hit=2143  
 Planning time: 0.057 ms  
 Execution time: 63.322 ms  
(6 rows)

3. Search for 5 million points within 3,000 million near st_makepoint(5000,5000) and return from near to far.

312 milliseconds.

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from ff1(st_makepoint(5000,5000), 3000, 5000000) as t (id int, loc_box geometry, loc_agg geometry[], other_agg text[], dist float8);  
NOTICE:  Sufficient limit 5000000 data sets, But points within 3,000 may still exist.  
                                                      QUERY PLAN                                                         
-----------------------------------------------------------------------------------------------------------------------  
 Function Scan on public.ff1 t  (cost=0.25..10.25 rows=1000 width=108) (actual time=312.315..312.394 rows=501 loops=1)  
   Output: id, loc_box, loc_agg, other_agg, dist  
   Function Call: ff1('0101000000000000000088B340000000000088B340'::geometry, 3000::double precision, 5000000)  
   Buffers: shared hit=10729  
 Planning time: 0.077 ms  
 Execution time: 312.463 ms  
(6 rows)

Performance comparison before and after aggregation

2 pgpointcloud

pgpointcloud is a plug-in for storing point cloud data (LIDAR) in PostgreSQL, which has functionality similar to PostGIS raster data, as well as richer extensions.

https://github.com/pgpointcloud/pointcloud

LIDAR sensors can quickly generate millions of points when scanning space, each containing not only XYZ coordinate values, but also other dimension information, such as time, RGB values, other attributes, and possibly a batch of points.

Therefore, there is no fixed type to store data reported by the LIDAR sensor. pgpointcloud uses the "schema document" type to describe data reported by the LIDAR sensor in a format consistent with the standard of the PDAL library.

Each point contains information from several dimensions, each of which may be of any data type, similar to a composite type.

LIDAR sensors quickly produce millions of points with large numbers of variables measured on each point.   
  
The challenge for a point cloud database extension is efficiently storing this data while allowing high fidelity access to the many variables stored.  
  
Much of the complexity in handling LIDAR comes from the need to deal with multiple variables per point.   
  
The variables captured by LIDAR sensors varies by sensor and capture process.   
  
Some data sets might contain only X/Y/Z values.   
  
Others will contain dozens of variables: X, Y, Z; intensity and return number; red, green, and blue values; return times; and many more.   
  
There is no consistency in how variables are stored:   
intensity might be stored in a 4-byte integer, or in a single byte; X/Y/Z might be doubles, or they might be scaled 4-byte integers.  
  
PostgreSQL Pointcloud deals with all this variability by using a "schema document" to describe the contents of any particular LIDAR point.   
  
Each point contains a number of dimensions, and each dimension can be of any data type,   
with scaling and/or offsets applied to move between the actual value and the value stored in the database.   
  
The schema document format used by PostgreSQL Pointcloud is the same one used by the PDAL library.

The pgpointcloud is a plug-in specifically designed to handle LIDAR data and is recommended.

3 LLVM(JIT)

The cost of getting next tuple may be magnified due to the number of records being processed. The solution is to reduce the cost of callbacks, either by using code gen or by using vector computing.

References are as follows:

PostgreSQL 10.0 preview Performance Enhancement - Launch of JIT Development Framework (Towards HTAP)

Analysis Acceleration Engine Black Technology - LLVM, Column Storage, Multicore Parallel, Operator Reuse Big Marriage - Together to Open PostgreSQL's Treasure Box

PostgreSQL Vectorization Execution Plug-in (Tile Implementation) 10x Speed Up OLAP

The performance improvements refer to the above documents, where vector computing has a 10-fold improvement and LLVM has at least a 2-fold improvement.

2. GiST index optimization

GiST index optimization to reduce fragmentation.Use GiST's buffering switch, which can be specified when indexing.

create index idx_cloudpoint_test_agg_1 on cloudpoint_test_agg using gist(loc_box) with (buffering=on);  
create index idx_cloudpoint_test_1 on cloudpoint_test using gist(loc) with (buffering=on);

3. Streaming Return

Due to the large number of records returned, in addition to batch one-time returns, there is also a method, streaming returns.

Streaming returns have two methods, cursor and asynchronous message.

1. Cursors implement streaming returns.

begin;  
  
declare cur1 cursor for select * from (select *, ST_Distance ( st_makepoint(5000,5000), loc ) as dist from cloudpoint_test order by st_makepoint(5000,5000) <-> loc ) t where dist < 1000;  
  
fetch 100 from cur1;  
  
fetch ...;  
  
-- The client receives enough data, or when the distance exceeds, does not receive any more, closes the cursor, and exits the transaction.  
  
close cur1;  
  
end;

Specific usage, refer to PostgreSQL driver corresponding to programming language, such as jdbc for postgresql, libpq, etc.

2. Streaming returns are achieved with asynchronous messages.

Session 1, listening channel

listen abcd;

Session 2, Initiate Request, Send Asynchronous Message to Listening Channel

create or replace function ff(geometry, float8, int, text) returns void as $$   
declare  
  v_rec record;  
  v_limit int := $3;  
begin  
  set local enable_seqscan=off;   -- Forced Index, Scan lines enough to exit.  
  for v_rec in   
    select *,  
    ST_Distance ( $1, loc ) as dist   
    from cloudpoint_test   
    order by loc <-> $1           -- Return from near to far in distance order  
  loop  
    if v_limit <=0 then           -- Determine if the number of records returned has reached LIMIT Number of records  
      raise notice 'Sufficient limit Set % Bar data, But distance % Points within may also have.', $3, $2;  
      return;  
    end if;  
    if v_rec.dist > $2 then       -- Determine if the distance is greater than the requested distance   
      raise notice 'distance % Points within have been output', $2;  
      return;  
    else  
      -- return next v_rec;  
      perform pg_notify ($4, v_rec::text);  
    end if;  
    v_limit := v_limit -1;  
  end loop;  
end;  
$$ language plpgsql strict volatile;

Session 2 initiates a search request

postgres=# select ff(st_makepoint(5000,5000), 1000, 10, 'abcd');  
NOTICE:  Sufficient limit 10 sets of data, But points within 1,000 may still exist.  
 ff   
----  
   
(1 row)

Session 1 will receive messages from channels asynchronously

Specific usage, refer to PostgreSQL driver corresponding to programming language, such as jdbc for postgresql, libpq, etc.