Label
PostgreSQL, HTAP, OLTP, OLAP, Scenario and Performance Testing
background
PostgreSQL is a database with a long history, which can be traced back to 1973. It was first awarded by the 2014 Computer Turing Prize winner and the originator of relational database. Michael_Stonebraker PostgreSQL has similar functions, performance, architecture and stability as Oracle.
The PostgreSQL community has many contributors, from all walks of life around the world. Over the years, PostgreSQL releases a large version every year, known for its enduring vitality and stability.
In October 2017, PostgreSQL released version 10, which carries many amazing features. The goal is to meet the requirements of mixed HTAP scenarios of OLAP and OLTP:
PostgreSQL 10 features, the most popular HTAP database for developers
1. Multi-core Parallel Enhancement
2. fdw aggregation push-down
3. Logical Subscription
4, zoning
5. Multi-copies at the financial level
6. json, jsonb full-text retrieval
7. There are also some features of plug-in form, such as vector computing, JIT, SQL graph computing, SQL stream computing, distributed parallel computing, time series processing, gene sequencing, chemical analysis, image analysis, etc.
PostgreSQL can be seen in various application scenarios:
PostgreSQL has developed very rapidly in recent years. From the database scoring trend of dbranking, a well-known database evaluation website, we can see the upward trend of PostgreSQL:
From the annual community meetings held in PostgreSQL China, we can see the same trend. More and more companies are participating, more and more companies are sharing, and more and more subjects are shared. They span traditional enterprises, Internet, medical, financial, state-owned enterprises, logistics, e-commerce, social networking, vehicle networking, sharing XX, cloud, game, public transport, aviation, railway, military, training. Training, consulting services, etc.
The next series of articles will introduce various application scenarios and corresponding performance indicators of PostgreSQL.
Environmental Science
Reference to Environmental Deployment Method:
Aliyun ECS: 56 core, 224G, 1.5TB*2 SSD cloud disk.
Operating System: CentOS 7.4 x64
Database version: PostgreSQL 10
PS: The CPU and IO performance of ECS is discounted by physical opportunity, and can be estimated by one-fold performance decline. Running physical mainframe can be estimated by multiplying the performance tested here by 2.
Scenario-String Search-Similarity Query (OLTP)
1, background
String search is a very common business requirement, which includes:
1. Prefix + Fuzzy Query. (b-tree index can be used)
2. Suffix + Fuzzy Query. (b-tree index can be used)
3. Fuzzy query before and after. (You can use pg_trgm and gin indexes)
https://www.postgresql.org/docs/10/static/pgtrgm.html
4. Full-text retrieval. (Full-text search types and gin or rum indexes can be used)
5. Regular query. (You can use pg_trgm and gin indexes)
6. Similar query. (You can use pg_trgm and gin indexes)
Generally speaking, the database does not have the ability to speed up after 3 years, but PostgreSQL is very powerful, it can perfectly support the acceleration of such queries. (Refers to queries and writes that do not conflict, and the index BUILD is real-time.)
Users do not need to synchronize data to search engines, and then query, and search engines can only do full-text retrieval, and you do not do regular, similar, ambiguous these requirements.
PostgreSQL can greatly simplify the user's architecture, cost of development, and ensure the absolute real-time data query.
2, design
There are 100 million texts, each of which is a random string of 128 Chinese characters. Similar queries are made according to randomly provided strings.
Similar queries solve problems that can not be solved by fuzzy queries and full-text retrieval, such as business requirements PostgreSQL, which can match p0stgresl. Because they have enough similarity.
https://www.postgresql.org/docs/10/static/pgtrgm.html
PostgreSQL Text Data Analysis Practice - Similarity Analysis
17 Text Similarity Algorithms and GIN Index-pg_similarity
Talking about Similarity Algorithms - Effective similarity search in PostgreSQL
Similar applications include image similarity search:
Application of PostgreSQL in Video and Picture Reduplication and Image Search Business
"Weak water 3,000, only take a ladle, when image search meets PostgreSQL (Haar Wavet)"
3. Preparing test sheets
create extension pg_trgm; create table t_likeall ( id int, info text ); create index idx_t_likeall_1 on t_likeall using gin (info gin_trgm_ops); -- Setting Similarity Threshold(0-1,floating-point) -- select set_limit(0.7); -- Query records that exceed similar thresholds -- SELECT info, similarity(info, 'Character string') AS sml -- FROM t_likeall -- WHERE info % 'Character string' -- Find records that exceed similar thresholds -- ORDER BY sml DESC;
4. Preparing test functions (optional)
-- Generating Random Chinese Character Strings create or replace function gen_hanzi(int) returns text as $$ declare res text; begin if $1 >=1 then select string_agg(chr(19968+(random()*20901)::int), '') into res from generate_series(1,$1); return res; end if; return null; end; $$ language plpgsql strict;
-- Using random strings like Query (for pressure measurement) create or replace function get_t_likeall_test(int, real) returns setof record as $$ declare str text; begin -- Select an input string select info into str from t_likeall_test where id=$1; -- Setting Similarity Threshold perform set_limit($2); -- Find records that exceed similar thresholds return query execute 'select *, '''||str||''' as str, similarity(info, '''||str||''') as sml from t_likeall where info % '''||str||''' order by sml DESC limit 1'; end; $$ language plpgsql strict;
5. Preparing test data
insert into t_likeall select id, gen_hanzi(128) from generate_series(1,100000000) t(id);
Generate 2 million random strings and add some noise interference to achieve the purpose of similar queries.
create table t_likeall_test (id serial primary key, info text); -- Interception of 120 Chinese characters starting at any position, Adding 6 jamming characters, 2 million insert into t_likeall_test (info) select substring(info, (random()*10)::int, 120)||gen_hanzi(6) from t_likeall limit 2000000;
6. Preparing test scripts
vi test.sql \set id random(1,2000000) select * from get_t_likeall_test(:id, 0.855) as t(id int , info text, str text, sml real);
7, test
Single similar query efficiency, response time is less than 20 milliseconds. (With bound variables and CACHE hits, the response time is lower. )
postgres=# select * from get_t_likeall_test(2, 0.855) as t(id int , info text, str text, sml real); -[ RECORD 1 ]---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- id | 2 info | Move and subtract silk, scallion, singing fragrance and donating secretly str | Sing Fangjia's Dictionary with Shallots sml | 0.855072 Time: 19.455 ms
Pressure measurement
CONNECTS=56 TIMES=300 export PGHOST=$PGDATA export PGPORT=1999 export PGUSER=postgres export PGPASSWORD=postgres export PGDATABASE=postgres pgbench -M prepared -n -r -f ./test.sql -P 5 -c $CONNECTS -j $CONNECTS -T $TIMES
8. Test results
transaction type: ./test.sql scaling factor: 1 query mode: prepared number of clients: 56 number of threads: 56 duration: 300 s number of transactions actually processed: 459463 latency average = 36.562 ms latency stddev = 8.063 ms tps = 1531.259508 (including connections establishing) tps = 1531.422742 (excluding connections establishing) script statistics: - statement latencies in milliseconds: 0.002 \set id random(1,2000000) 36.565 select * from get_t_likeall_test(:id, 0.855) as t(id int , info text, str text, sml real);
TPS: 1531
Average response time: 36.562 milliseconds
Reference resources
PostgreSQL, Greenplum Application Case Book "Tathagata Palm" - Catalogue
"Database Selection - Elephant Eighteen Touches - To Architects and Developers"
PostgreSQL uses pgbench to test sysbench related case s