CentOS 7 postgresql13 installs zhparser and configures Chinese full-text retrieval

Keywords: Database PostgreSQL

Requirements: use postgresql13 to test the Chinese full-text retrieval function

Install postgresql13 database

I use the docker container to install and test
1. Create a centos7 image container

docker run -di --name postgres13 --privileged=true -p 5432:5432 centos:7 /usr/sbin/init
docker exec -it postgres13 bash

2. Install yum source

yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm

3. Start installation

yum install -y gcc gcc-c++ epel-release llvm5.0 llvm5.0-devel clang libicu-devel perl-ExtUtils-Embed readline readline-devel zlib zlib-devel openssl openssl-devel pam-devel libxml2-devel libxslt-devel openldap-devel systemd-devel tcl-devel python-devel centos-release-scl-rh
yum makecache
yum -y install postgresql13-server.x86_64 postgresql13-devel.x86_64 

5. Initialize database

/usr/pgsql-13/bin/postgresql-13-setup initdb
systemctl enable postgresql-13
systemctl start postgresql-13

6. Configure account information

su -postgres
psql

Create user

create user test with password '123.com';

Create database

create database test_db owner test;

sign out

\q

7. Set remote configuration
Modify the configuration file postgresql.conf – allow remote

Location: / var/lib/pgsql/10/data/postgresql.conf
Modify: listen_addresses = '*'

Modify profile pg_hba.conf – access rules

Location: / var/lib/pgsql/10/data/pg_hba.conf
Modify: add host all 0.0.0.0/0 MD5 access rule under Ipv4 at the bottom

Restart service

systemctl restart postgresql-13.service

Full text retrieval

A data type tsvector is provided to store the preprocessed documents, and a type tsquery is provided to represent the processed documents query . There are many functions and operators available for these data type , the most important of which is the matching operator @@ . Full text search can Use indexes to speed up.

Full text search in PostgreSQL is based on the matching operator @ @, which returns true when a tsvector (document) matches a tsquery (query)

e.g
Search for the existence of cat and rat in a fat cat sat on a mat and ate a fat rat

SELECT 'a fat cat sat on a mat and ate a fat rat'::tsvector @@ 'cat & rat'::tsquery;
 ?column?
----------
 t

As suggested in the above example, a tsquery is NOT just an unprocessed text, at most a tsvector. A tsquery contains search terms. They must be normalized word bits, AND and, OR, NOT, AND FOLLOWED BY can be used Operators combine multiple terms . There are several functions to_tsquery,plainto_tsquery and phraseto_tsquery can be used to convert the text written by users into correct tsquery. They mainly use the method of regularizing the words appearing in the text. Similarly, to_tsvector is used to parse and normalize a document string. So in practice, a text search match may look more like:

SELECT to_tsvector('fat cats ate fat rats') @@ to_tsquery('fat & rat');
 ?column? 
----------
 t

In tsquery, the & (AND) operator specifies that both of its parameters must appear in the document to indicate a match. Similarly, the | (OR) operator specifies that at least one parameter must appear, AND! The (NOT) operator specifies that its parameters cannot be matched until they appear. For ex amp le, query fat &! Rat matches documents that contain fat but NOT rat.
e.g
Search fat cats ate fat rats for ate and no ca

SELECT to_tsvector('fat cats ate fat rats') @@ to_tsquery('ate & ! ca');
 ?column? 
----------
 t

Judge whether two words are adjacent

SELECT to_tsvector('fatal error not fatal') @@ to_tsquery('fatal <-> error');
 ?column? 
----------
 t
 
SELECT to_tsvector('fatal error not fatal') @@ to_tsquery('fatal <-> fatal');
 ?column? 
----------
 f

Query fields from tables

First create a test table

CREATE TABLE test(id int,info TEXT,crt_time TIMESTAMP);
INSERT INTO test SELECT generate_series(1,10000),md5(random()::TEXT),clock_timestamp();

You can do a full-text search without an index.
View the row of fcd801772aa08e53eb52fefc0504467d in the id column in the info column

SELECT id
FROM test
WHERE to_tsvector('english', info) @@ to_tsquery('english', 'fcd801772aa08e53eb52fefc0504467d');


The above query specifies that you want to use the english configuration to parse and normalize strings. We can also ignore the configuration parameters:

SELECT id
FROM test
WHERE to_tsvector(info) @@ to_tsquery('fcd801772aa08e53eb52fefc0504467d');

View the full-text retrieval configuration that is already installed in the database

\dF *


At present, full-text retrieval does not support Chinese, and additional plug-ins need to be installed

Extended file installation directory: / usr/pgsql-13/share/extension

Install Chinese search component

install zhparser

scws needs to be installed before installing zhparser
Install compiler tools

yum install gcc make wget git -y
wget http://www.xunsearch.com/scws/down/scws-1.2.3.tar.bz2
./configure && make && make install
git clone https://github.com/amutu/zhparser.git
export PATH=/usr/pgsql-13/bin:$PATH
make && make install

Login database

[root@229372496fad zhparser-master]# su - postgres
-bash-4.2$ psql
psql (13.4)
Type "help" for help.

postgres=# create extension zhparser;
CREATE EXTENSION
postgres=# \dFp
         List of text search parsers
   Schema   |   Name   |     Description     
------------+----------+---------------------
 pg_catalog | default  | default word parser
 public     | zhparser | 
(2 rows)

Problem summary

Question 1: make: pg_config: Command not found make: *** No targets. Stop.
solve:

export PATH=/usr/pgsql-13/bin:$PATH

Recompile and install again

make & make install

Posted by tlawless on Mon, 27 Sep 2021 21:32:17 -0700