Four deployment modes and basic operations of HBase

Keywords: Programming HBase Zookeeper ssh Hadoop

This article has two parts. The first part covers four ways of installing HBase: (1) stand-alone mode, (2) pseudo-cluster mode, (3) a distributed cluster that uses the zookeeper bundled with HBase, and (4) a distributed cluster that uses an independently installed zookeeper. The second part shows the basic operations of HBase through the HBase shell, such as creating tables, inserting records, querying records and deleting records.

HBase can be deployed in the following modes:

Deployment mode        Description
Stand-alone mode       Single machine, often used for local development
Pseudo-cluster mode    Uses the zookeeper that comes with HBase
Cluster mode           Uses the zookeeper that comes with HBase
Cluster mode           Uses a separately installed zookeeper

I. Installation of HBase

The installation of HBase in this article assumes that Hadoop is already installed, so we need to export environment variables such as JAVA_HOME and HADOOP_HOME (HADOOP_HOME is not needed for stand-alone mode, but is needed for pseudo-cluster mode and cluster mode) and configure SSH mutual trust, that is, passwordless SSH login between the nodes.
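
Before continuing, it can help to confirm that these variables are actually exported. The following is a minimal sketch for a POSIX shell; the helper name require_env is hypothetical and is not part of HBase or Hadoop.

```shell
# require_env NAME...: print each unset or empty variable and
# return 1 if any of them is missing.
# (Hypothetical helper for illustration; not shipped with HBase or Hadoop.)
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name"
            missing=1
        fi
    done
    return "$missing"
}

# Example call:
#   require_env JAVA_HOME HADOOP_HOME || echo "export the variables first"
```

For stand-alone mode it would be enough to check JAVA_HOME only.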

0 Common configuration

Export environment variables for HBase

export HBASE_HOME=/root/software/hbase-1.2.1
export PATH=$PATH:$HBASE_HOME/bin

View the hbase version: hbase version

1 Stand-alone mode

Configure hbase-env.sh

Add the following to hbase-env.sh

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HBASE_MANAGES_ZK=true 

Note: HBASE_MANAGES_ZK=true means that zookeeper is managed by hbase itself, and no separate zookeeper is needed. HBASE_MANAGES_ZK=false means that zookeeper is deployed independently.

Configure hbase-site.xml

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>file:///data/hbase</value>
    </property>
</configuration>

Note: hbase.rootdir specifies where HBase stores its data. If it is not set, it defaults to a location under /tmp/hbase-${user.name}, which means that the data may be lost every time the system restarts. In this configuration, HBase stores its data directly on the local file system.
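
As a small illustration of this point, the sketch below flags a root directory that lives under /tmp. The function name rootdir_is_volatile is an assumption made for this example, and the check only looks at the path prefix; it does not read hbase-site.xml.

```shell
# rootdir_is_volatile URI: succeed (exit status 0) when URI points
# to /tmp or below on the local file system.
# (Hypothetical helper; it only inspects the string it is given.)
rootdir_is_volatile() {
    case "$1" in
        file:///tmp|file:///tmp/*|/tmp|/tmp/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example call with the value configured above:
#   rootdir_is_volatile file:///data/hbase || echo "rootdir survives a reboot"
```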

Starting and stopping

$HBASE_HOME/bin/start-hbase.sh
$HBASE_HOME/bin/stop-hbase.sh

2 Pseudo-cluster mode

Configure hbase-env.sh

Add the following to hbase-env.sh

export JAVA_HOME=$JAVA_HOME
export HBASE_CLASSPATH=$HADOOP_HOME/etc/hadoop
export HBASE_MANAGES_ZK=true 
  • HBASE_MANAGES_ZK=true means that zookeeper is managed by hbase itself, and no separate deployment of zookeeper is required.
  • HBASE_CLASSPATH=$HADOOP_HOME/etc/hadoop adds the Hadoop configuration directory to the HBase classpath, because hdfs is used as the storage of HBase.

Configure hbase-site.xml

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:8020/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
</configuration>

Explanation:

  • hbase.rootdir specifies the storage location of HBase data; here it points to hdfs.
  • hbase.cluster.distributed=true runs HBase in distributed mode.

Starting and stopping

Start (hdfs first, then hbase)

$HADOOP_HOME/sbin/start-dfs.sh
$HBASE_HOME/bin/start-hbase.sh

Stop (hbase first, then hdfs)

$HBASE_HOME/bin/stop-hbase.sh
$HADOOP_HOME/sbin/stop-dfs.sh

3 Cluster mode (using zookeeper with hbase)

Configure hbase-env.sh

Add the following to hbase-env.sh

export JAVA_HOME=$JAVA_HOME
export HBASE_CLASSPATH=$HADOOP_HOME/etc/hadoop
export HBASE_MANAGES_ZK=true
  • HBASE_MANAGES_ZK=true, which means using the zookeeper that comes with hbase.

Configure hbase-site.xml

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:8020/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.master</name>
        <value>master:6000</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>slave1:2181,slave2:2181,slave3:2181</value>
    </property>
    <property>
        <name>zookeeper.znode.parent</name>
        <value>/hbase</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/data/zookeeper/data</value>
    </property>
</configuration>

Explanation:

  • hbase.rootdir specifies the storage location of HBase data.
  • hbase.cluster.distributed=true runs HBase in distributed mode.
  • hbase.master specifies the host name and port of the HMaster.
  • hbase.zookeeper.quorum lists the hosts that run zookeeper; the number of hosts should be odd.
  • hbase.zookeeper.property.dataDir specifies the zookeeper data storage directory. The default path is under /tmp, so if it is not configured, the data may be lost after a reboot.
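
The reason for an odd number of zookeeper hosts is that the ensemble only works while a majority of its servers is up, so four servers tolerate no more failures than three. The following sketch counts the hosts in a quorum string and checks that the count is odd; the function names quorum_size and quorum_is_odd are hypothetical helpers written for this article.

```shell
# quorum_size "host1:2181,host2:2181,...": print the number of hosts.
quorum_size() {
    echo "$1" | tr ',' '\n' | grep -c .
}

# quorum_is_odd QUORUM: succeed when the number of hosts is odd.
quorum_is_odd() {
    [ $(( $(quorum_size "$1") % 2 )) -eq 1 ]
}

# Example call with the value of hbase.zookeeper.quorum used above:
#   quorum_is_odd 'slave1:2181,slave2:2181,slave3:2181' && echo "ok"
```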

Configure regionservers

Add the slave nodes of HBase to the regionservers file, one host per line, similar to the slaves file in hadoop.

slave1
slave2
slave3

Starting and stopping

Start (hdfs first, then hbase)

$HADOOP_HOME/sbin/start-dfs.sh
$HBASE_HOME/bin/start-hbase.sh

Stop (hbase first, then hdfs)

$HBASE_HOME/bin/stop-hbase.sh
$HADOOP_HOME/sbin/stop-dfs.sh

After successful start-up of HBase:

  • The processes on the master node are: HMaster
  • The processes on the slave nodes are: HRegionServer, HQuorumPeer
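
The process list can be checked with jps, the JDK tool that prints one "<pid> <main class>" line per running JVM. The sketch below reads such a listing on standard input and reports the first expected process that is absent; the function name has_procs is an assumption made for this example.

```shell
# has_procs NAME...: read jps output on stdin and fail if any NAME is absent.
# (Hypothetical helper; it only parses text and does not call jps itself.)
has_procs() {
    listing=$(cat)
    for proc in "$@"; do
        # Match the second field (the main class name) exactly.
        if ! echo "$listing" | awk -v p="$proc" '$2 == p { found = 1 } END { exit !found }'; then
            echo "not running: $proc"
            return 1
        fi
    done
    return 0
}

# Example calls:
#   jps | has_procs HMaster                               # on the master node
#   ssh slave1 jps | has_procs HRegionServer HQuorumPeer  # for a slave node
```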

4 Cluster mode (zookeeper installed separately)

Install zookeeper

Downloading and unpacking zookeeper are not covered here; we go straight to configuring the zookeeper cluster.
First, export the zookeeper environment variables by adding the following to ~/.bash_profile

export ZOOKEEPER_HOME=/root/software/zookeeper-3.4.10
export PATH=$PATH:$ZOOKEEPER_HOME/bin

After unpacking zookeeper, enter the conf directory and copy zoo_sample.cfg to create zoo.cfg

cp zoo_sample.cfg zoo.cfg

Configure zoo.cfg. Add the following to zoo.cfg

clientPort=2181
dataDir=/data/zookeeper/zk_data
server.1=master:2888:3888
server.2=slave2:2888:3888
server.3=slave3:2888:3888

Explanation: In each server.N line, the first port is used for communication between the leader and the followers; the default is 2888. The second port is used for leader election, which takes place when the cluster starts or after the leader fails; the default is 3888.

Distribute the unpacked zookeeper directory to slave2 and slave3

scp -r /root/software/zookeeper-3.4.10 slave2:/root/software/zookeeper-3.4.10
scp -r /root/software/zookeeper-3.4.10 slave3:/root/software/zookeeper-3.4.10

Create the data directory

ssh root@master 'mkdir -p /data/zookeeper/zk_data'
ssh root@slave2 'mkdir -p /data/zookeeper/zk_data'
ssh root@slave3 'mkdir -p /data/zookeeper/zk_data'

Write to myid

ssh root@master  'echo 1 > /data/zookeeper/zk_data/myid'
ssh root@slave2  'echo 2 > /data/zookeeper/zk_data/myid'
ssh root@slave3  'echo 3 > /data/zookeeper/zk_data/myid'

Note: The zookeeper cluster deployed in this article consists of three nodes: master, slave2 and slave3. The value written to myid on each node must match the number after "server." on that node's line in zoo.cfg.
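
To avoid typing the ids by hand, the number can be looked up in zoo.cfg. The sketch below assumes short host names without dots, as used in this article, and the function name myid_for is hypothetical.

```shell
# myid_for HOST CFG: print N from the line "server.N=HOST:port:port" in CFG.
# (Hypothetical helper; assumes host names contain no dots.)
myid_for() {
    awk -F'[.=:]' -v host="$1" '$1 == "server" && $3 == host { print $2 }' "$2"
}

# Example call on a node, using the paths from this article:
#   myid_for "$(hostname)" "$ZOOKEEPER_HOME/conf/zoo.cfg" > /data/zookeeper/zk_data/myid
```

An empty result means the host is not listed in zoo.cfg, so no myid should be written for it.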

Start, stop, and view zookeeper status

ssh root@master   '/root/software/zookeeper-3.4.10/bin/zkServer.sh start'
ssh root@slave2   '/root/software/zookeeper-3.4.10/bin/zkServer.sh start'
ssh root@slave3   '/root/software/zookeeper-3.4.10/bin/zkServer.sh start'

ssh root@master   '/root/software/zookeeper-3.4.10/bin/zkServer.sh status'
ssh root@slave2   '/root/software/zookeeper-3.4.10/bin/zkServer.sh status'
ssh root@slave3  '/root/software/zookeeper-3.4.10/bin/zkServer.sh status'

ssh root@master   '/root/software/zookeeper-3.4.10/bin/zkServer.sh stop'
ssh root@slave2   '/root/software/zookeeper-3.4.10/bin/zkServer.sh stop'
ssh root@slave3   '/root/software/zookeeper-3.4.10/bin/zkServer.sh stop'
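
The nine commands above differ only in the host and the action, so they can be generated by a loop. This is a sketch under the assumptions of this article (hosts master, slave2 and slave3, zookeeper unpacked under /root/software); the names ZK_HOSTS, ZK_BIN and zk_cmds are hypothetical. The function only prints the commands so that they can be reviewed before running them.

```shell
# Hosts and script path as used in this article.
ZK_HOSTS="master slave2 slave3"
ZK_BIN=/root/software/zookeeper-3.4.10/bin/zkServer.sh

# zk_cmds ACTION: print one ssh command per node.
# ACTION is start, status or stop.
zk_cmds() {
    for host in $ZK_HOSTS; do
        echo "ssh root@$host '$ZK_BIN $1'"
    done
}

# Example: review the commands, then execute them by piping to sh:
#   zk_cmds start
#   zk_cmds start | sh
```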

Configure hbase-env.sh

export JAVA_HOME=$JAVA_HOME
export HBASE_CLASSPATH=$HADOOP_HOME/etc/hadoop
export HBASE_MANAGES_ZK=false 

HBASE_MANAGES_ZK=false means that HBase does not use its bundled zookeeper and relies on the independently deployed zookeeper instead.

Configure hbase-site.xml

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://master:8020/hbase</value>
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.master</name>
        <value>master:6000</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>master:2181,slave2:2181,slave3:2181</value>
    </property>
    <property>
        <name>zookeeper.znode.parent</name>
        <value>/hbase</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/data/zookeeper/data</value>
    </property>
</configuration>

Explanation:

  • hbase.rootdir specifies the storage location of HBase data.
  • hbase.cluster.distributed=true runs HBase in distributed mode.
  • hbase.master specifies the host name and port of the HMaster.
  • hbase.zookeeper.quorum lists the hosts that run zookeeper; the number of hosts should be odd.
  • hbase.zookeeper.property.dataDir specifies the zookeeper data storage directory. The default path is under /tmp, so if it is not configured, the data may be lost after a reboot. With an independently deployed zookeeper, the directory actually used is the dataDir set in zoo.cfg.

Configure regionservers

Add the slave nodes of HBase to the regionservers file, one host per line, similar to the slaves file in hadoop.

slave1
slave2
slave3

Starting and stopping

Start zookeeper first; refer to "Start, stop, and view zookeeper status" above.

Start (hdfs first, then hbase)

$HADOOP_HOME/sbin/start-dfs.sh
$HBASE_HOME/bin/start-hbase.sh

Stop (hbase first, then hdfs)

$HBASE_HOME/bin/stop-hbase.sh
$HADOOP_HOME/sbin/stop-dfs.sh

After successful start-up of HBase:

  • The processes on the master node are: HMaster, QuorumPeerMain
  • Processes on slave nodes are: HRegionServer, QuorumPeerMain

Note: The QuorumPeerMain process (that is, the stand-alone zookeeper process) now appears on both the master node and the slave nodes instead of the HQuorumPeer process, which means that HBase is using a separately deployed zookeeper.

II. Operation of HBase

The following operations are performed in the HBase shell. To enter the shell, run:

hbase shell

Create table

create 'student','Sname','Ssex','Sage','Sdept','course'
create 'teacher',{NAME=>'username',VERSIONS=>5}  # VERSIONS sets the number of versions to keep

View table details

describe 'student'

Display all tables

list

Insert data

put 'student','95001','Sname','LiYing'
put 'student','95001','Ssex','Male'
put 'student','95001','course:math','80'
put 'student','95001','course:english','90'

put 'student','95002','Sname','ZhangYiDa'
put 'student','95002','Ssex','Female'
put 'student','95002','course:math','90'
put 'student','95002','course:english','70'

Note: A put adds only one cell at a time, that is, one column of one row. Inserting data directly with shell commands is therefore inefficient, and in practical applications data is usually manipulated programmatically. Running the command put 'student','95001','Sname','LiYing' adds the name LiYing to the student table under the row key 95001, which is the student number.
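
One simple way to avoid typing every put by hand is to generate the statements from a text file. The sketch below turns lines of the form rowkey,column,value into put statements; the function name csv_to_puts and the file names in the example are hypothetical, and values are assumed to contain neither commas nor single quotes.

```shell
# csv_to_puts TABLE: read "rowkey,column,value" lines on stdin and
# print one HBase shell put statement per line.
# (Hypothetical helper; lines that do not have exactly 3 fields are skipped.)
csv_to_puts() {
    awk -F',' -v t="$1" -v q="'" \
        'NF == 3 { print "put " q t q "," q $1 q "," q $2 q "," q $3 q }'
}

# Example: build a command file and pass its path to the HBase shell:
#   csv_to_puts student < students.csv > puts.txt
#   hbase shell puts.txt
```

For large volumes of data a client program is still the better choice; this only removes the repetitive typing.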

Query data

There are two commands in HBase for viewing data:
The get command is used to view one row of a table, optionally limited to a column family or a single column.
The scan command is used to view all the data in a table.

get 'student','95001'
get 'student','95001','course'
get 'student','95001','course:math'

scan 'student'

Delete data

In HBase, the delete and deleteall commands are used to delete data. The difference between them is:
(1) delete removes a single cell; it is the reverse operation of put.
(2) deleteall removes a whole row of data.

delete 'student','95001','Ssex'
deleteall 'student','95001'

Modifying data

When data is added, HBase automatically attaches a timestamp to it. To modify data, simply put the new value; HBase generates a new version, which completes the "change" operation. The old version is still retained, and the system periodically cleans up outdated versions, keeping only the most recent ones. The number of versions to keep can be specified when the table is created. Here is an example:

hbase(main):034:0> get 'student','95001'
COLUMN                             CELL                                                                                               
 Sname:                            timestamp=1537497681798, value=LiYing                                                              
 Ssex:                             timestamp=1537497682400, value=Male                                                                
 course:english                    timestamp=1537497872225, value=90                                                                  
 course:math                       timestamp=1537497681859, value=80                                                                  
4 row(s) in 0.0310 seconds

hbase(main):035:0> put 'student','95001','course:english','100'
0 row(s) in 0.0130 seconds

hbase(main):036:0> get 'student','95001'
COLUMN                             CELL                                                                                               
 Sname:                            timestamp=1537497681798, value=LiYing                                                              
 Ssex:                             timestamp=1537497682400, value=Male                                                                
 course:english                    timestamp=1537498062541, value=100                                                                 
 course:math                       timestamp=1537497681859, value=80                                                                  
4 row(s) in 0.0130 seconds

Delete table

Deleting a table takes two steps: first disable the table, then drop it. Dropping a table that has not been disabled fails.

disable 'student'
drop 'student'

Query historical versions

create 'teacher',{NAME=>'username',VERSIONS=>5}

put 'teacher','91001','username','Mary'
put 'teacher','91001','username','Mary1'
put 'teacher','91001','username','Mary2'
put 'teacher','91001','username','Mary3'
put 'teacher','91001','username','Mary4'  
put 'teacher','91001','username','Mary5'

get 'teacher','91001',{COLUMN=>'username',VERSIONS=>5}

hbase(main):064:0> get 'teacher','91001',{COLUMN=>'username',VERSIONS=>5}
COLUMN                             CELL                                                                                               
 username:                         timestamp=1537498459746, value=Mary5                                                               
 username:                         timestamp=1537498455244, value=Mary4                                                               
 username:                         timestamp=1537498455193, value=Mary3                                                               
 username:                         timestamp=1537498455174, value=Mary2                                                               
 username:                         timestamp=1537498455149, value=Mary1                                                               
5 row(s) in 0.0110 seconds
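
Six values were written but only five come back, because the table keeps at most VERSIONS=>5 versions of a cell and the oldest one (Mary) is no longer returned. The effect can be imitated outside HBase with a small sketch; the function name keep_versions is hypothetical, and it only sorts text and does not talk to HBase.

```shell
# keep_versions N: read "timestamp value" lines on stdin and print the
# N newest ones, newest first, similar to get ... VERSIONS=>N.
# (Hypothetical helper for illustration only.)
keep_versions() {
    sort -k1,1nr | head -n "$1"
}

# Example with made-up timestamps 1..6 for the six puts above:
#   printf '1 Mary\n2 Mary1\n3 Mary2\n4 Mary3\n5 Mary4\n6 Mary5\n' | keep_versions 5
```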

Quit hbase

exit

Posted by simpli on Sat, 21 Sep 2019 07:50:49 -0700