Gausdb 200 uses GDS to import data from a remote server

Keywords: Big Data SSL PostgreSQL Database

Gausdb 200 supports importing data in TEXT, CSV and FIXED formats that exist on the remote server into the cluster. This article introduces the use of GDS (Gauss Data Service) tools to import data from remote servers into GaussDB 200. The environment is as follows:

1. Prepare source data

Here, from the PostgreSQL database, use the copy command to export a csv file, as follows:

rhnschema=> copy rhnpackagefile to '/tmp/rhnpackagefile.csv' with csv;

2. Install and start GDS

  • Create users and user groups
[root@hd04 bin]# groupadd gdsgrp
[root@hd04 bin]# useradd -g gdsgrp gds_user
  • Create related directory
    Create storage source data and GDS installation directory (customized).
    [root@hd04 ~]# MKDIR / opt / bin -- GDS installation directory
    [root@hd04 ~]# mkdir /opt/data -- source data storage directory
    [root@hd04 ~]# chown -R gds_user:gdsgrp /opt/bin/gds
    [root@hd04 ~]# chown -R gds_user:gdsgrp /opt/data
  • Copy source data to data directory
    [root@hd04 ~]# su - gds_user
    [gds_user@hd04 ~]$ scp root@hd01:/tmp/rhnpackagefile.csv /opt/data
  • Install GDS service
    GDS tools are located in the GassDB installation package and need to be extracted separately. Take the installation package of redhat as an example, as follows:
    [root@hd04 ~]# tar -xzf GaussDB_200_6.5.1_RHEL.tar.gz

    After decompression, find the GaussDB-Kernel-V300R002C00-REDHAT-64bit-Gds.tar.gz package, and extract it to the GDS installation directory.

    [gds_user@hd04 ~]$ cd /opt/bin
    [gds_user@hd04 bin]$ tar -xzf /tmp/GaussDB-Kernel-V300R002C00-REDHAT-64bit-Gds.tar.gz

    Then modify / etc/profile as root user and add the following contents:

    [root@hd04 ~]# vi /etc/profile
    export LD_LIBRARY_PATH="/opt/bin/lib:$LD_LIBRARY_PATH"

    This is mainly to configure the GDS library path, because it relies on the Cjson dynamic library.

  • Start GDS service
    [root@hd04 ~]# su - gds_user
    [gds_user@hd04 ~]$ /opt/bin/gds/gds -d /opt/data -p 192.168.120.25:5000 -H 192.168.110.1/24,192.168.120.1/24 -l /opt/bin/gds/gds_log.txt -D

    There are two starting modes of GDS: non ssl data transmission mode and ssl encryption data transmission mode. The non ssl data transfer mode is used above.
    For ssl mode, you need to copy the GDS certificate created by GaussDB to the GDS installation directory, as follows:

    [gds_user@hd04 ~]$ cd /opt/bin/
    [gds_user@hd04 bin]$ scp -r root@hd01:/opt/huawei/Bigdata/mppdb/core/share/sslcert/gds ./

    Start GDS service in ssl mode as follows:

    [gds_user@hd04 ~]$ /opt/bin/gds/gds -d /opt/data -p 192.168.120.25:5000 -H 192.168.110.1/24 -l /opt/bin/gds/gds_log.txt -D --enable-ssl --ssl-dir /opt/bin/gds

    Parameter description is shown in the figure below:

    3. Create tables and import data

    Here, you need to create the appearance and target table as follows:

    The appearance parameters are as follows:

    Create the target table as follows:

    Use the following command to import data:

    rhnschema=# INSERT INTO rhnpackagefile2 SELECT * FROM f_rhnpackagefile;
    INSERT 0 27942567


    During the import process, if there is a data format error, the system will record it in an error table (automatically created by the system, the default name format is table name err), for example: rhnpackagefile error.

    4. Optimize the query performance of imported data

    After the data import is completed, execute the ANALYZE statement to generate table statistics. The execution plan generator uses these statistics to generate the most efficient query execution plan.

If a large number of updates or deletions are made during the import process, run the VACUUM FULL command and then the ANALYZE command. A large number of update and delete operations will produce a large number of disk page fragments, which will gradually reduce the efficiency of the query. VACUUM FULL can recover disk page fragments and return them to the operating system.

rhnschema=# VACUUM FULL rhnpackagefile2;
VACUUM
rhnschema=# ANALYZE rhnpackagefile2;
ANALYZE

5. Stop GDS service

After the import is complete, you can stop GDS services if they are not needed. As a GDS user, query the process number of GDS service and kill it.

Posted by phppssh on Sun, 03 Nov 2019 23:13:09 -0800