Tags
PostgreSQL, 10.0, Radix tree, character encoding conversion
Background
PostgreSQL 10.0 uses a radix tree to speed up conversions between UTF-8 and other character encodings.
The encoding map files are now laid out as radix trees, and a lookup in them is much faster than the binary search used with the old maps.
Use radix tree for character encoding conversions.

author Heikki Linnakangas <heikki.linnakangas@iki.fi> Mon, 13 Mar 2017 18:46:39 +0000 (20:46 +0200)
committer Heikki Linnakangas <heikki.linnakangas@iki.fi> Mon, 13 Mar 2017 18:46:39 +0000 (20:46 +0200)

Replace the mapping tables used to convert between UTF-8 and other character encodings with new radix tree-based maps. Looking up an entry in a radix tree is much faster than a binary search in the old maps. As a bonus, the radix tree representation is also more compact, making the binaries slightly smaller.

The "combined" maps work the same as before, with binary search. They are much smaller than the main tables, so it doesn't matter so much. However, the "combined" maps are now stored in the same .map files as the main tables. This seems more clear, since they're always used together, and generated from the same source files.

Patch by Kyotaro Horiguchi, with lot of hacking by me at various stages. Reviewed by Michael Paquier and Daniel Gustafsson.

Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp
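To make the difference between the two lookup strategies concrete, here is a minimal Python sketch. It is not PostgreSQL's code: the real maps are generated C tables with their own layout, and every name below (PAIRS, lookup_binary_search, build_radix_tree, lookup_radix) is hypothetical. The sketch only assumes a tiny hand-picked UTF-8 to ISO-8859-1 mapping and uses nested dictionaries in place of the real tables.

```python
from bisect import bisect_left

# Toy UTF-8 -> ISO-8859-1 map (assumption: a tiny hand-picked subset,
# not PostgreSQL's real table).
PAIRS = {
    b"\xc2\xa3": 0xA3,  # U+00A3 POUND SIGN
    b"\xc3\xa9": 0xE9,  # U+00E9 LATIN SMALL LETTER E WITH ACUTE
    b"\xc3\xbc": 0xFC,  # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
}

# Old approach: one table sorted by source code, searched by bisection.
SORTED_TABLE = sorted((int.from_bytes(k, "big"), v) for k, v in PAIRS.items())
SORTED_KEYS = [k for k, _ in SORTED_TABLE]


def lookup_binary_search(utf8: bytes):
    """Return the mapped code or None; costs O(log n) comparisons."""
    key = int.from_bytes(utf8, "big")
    i = bisect_left(SORTED_KEYS, key)
    if i < len(SORTED_KEYS) and SORTED_KEYS[i] == key:
        return SORTED_TABLE[i][1]
    return None


def build_radix_tree(pairs):
    """Build a tree with one level per input byte; leaves hold the result."""
    root = {}
    for key, value in pairs.items():
        node = root
        for byte in key[:-1]:
            node = node.setdefault(byte, {})
        node[key[-1]] = value
    return root


RADIX_TREE = build_radix_tree(PAIRS)


def lookup_radix(utf8: bytes):
    """Return the mapped code or None; one step per input byte."""
    node = RADIX_TREE
    for byte in utf8:
        if not isinstance(node, dict):
            return None  # input is longer than any stored key
        node = node.get(byte)
        if node is None:
            return None  # no entry for this byte at this level
    # An inner node here means the input was only a prefix of a key.
    return node if isinstance(node, int) else None
```

Both functions return the same answers, but the radix lookup does a fixed number of steps bounded by the length of the input character (at most four bytes for UTF-8), while the binary search grows with the size of the table. That is the effect the commit message describes.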
For the discussion of this patch, see the mailing list thread; the URL is given at the end of this article.
The PostgreSQL community works very rigorously: a patch may be discussed on the mailing list for months or even years and revised repeatedly in response to reviewers' feedback, so by the time it is merged into master it is very mature. This is one reason PostgreSQL is well known for its stability.