Article catalog
- 1, Single machine environment construction
- 1.1 download
- 1.2 decompression
- 1.3 configure environment variables
- 1.4 modify configuration
- 1.5 start up
- 1.6 verification
- 2, Cluster environment construction
- 2.1 modify configuration
- 2.2 identification node
- 2.3 start cluster
- 2.4 cluster verification
- 3, Some operations of zookeeper
- 3.1 change zoo.cfg
- 3.2 zkServer.sh common operations
- 3.3 common Shell commands of zookeeper
- 1 start service and connection service
- 2 help command
- 3 view node list
- 4 new node
- 5 view node
- 6 view node status
- 7 update node
- 8 delete node
- 9 watchers
- 10 zookeeper four-letter commands
- 4, Zookeeper automation script
1, Single machine environment construction
1.1 download
Download the required version of ZooKeeper; I use version 3.4.14 here. Official download address:
# https://www.apache.org/dyn/closer.lua/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
# or download it directly with wget:
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.14/zookeeper-3.4.14.tar.gz
1.2 decompression
# tar -zxvf zookeeper-3.4.14.tar.gz -C /home/hadoop/
1.3 configure environment variables
# vim /etc/profile
Add environment variable:
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.14
export PATH=$ZOOKEEPER_HOME/bin:$PATH
Make the configured environment variables effective:
# source /etc/profile
1.4 modify configuration
Enter the conf/ directory under the installation directory, copy the sample configuration and modify the copy:
# cp zoo_sample.cfg zoo.cfg
Specify the data storage directory and the transaction log directory (the directories do not need to be created in advance; they are created automatically at startup). After modification, the complete configuration is as follows:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/log
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
Configuration parameter description:
- tickTime: the basic time unit, in milliseconds, used for all other timing calculations. For example, the session timeout is N * tickTime;
- initLimit: used in clusters. The maximum time allowed for a follower to connect to and synchronize with the leader at startup, expressed as a multiple of tickTime;
- syncLimit: used in clusters. The maximum time allowed between a request and its acknowledgement when the leader and a follower exchange messages (heartbeat mechanism), expressed as a multiple of tickTime;
- dataDir: where data snapshots are stored;
- dataLogDir: where transaction logs are stored;
- clientPort: the port that clients connect to, 2181 by default.
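To make the tick-based parameters concrete, the following sketch converts the values from the configuration above into milliseconds. The session timeout bounds assume that minSessionTimeout and maxSessionTimeout are left unset, in which case ZooKeeper defaults them to 2 and 20 times tickTime; the variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Values copied from the zoo.cfg shown above.
tick_time=2000   # milliseconds per tick
init_limit=10    # ticks
sync_limit=5     # ticks

# Tick-based limits converted to milliseconds.
init_timeout_ms=$(( init_limit * tick_time ))
sync_timeout_ms=$(( sync_limit * tick_time ))

# Default session timeout bounds when minSessionTimeout and
# maxSessionTimeout are not configured: 2 and 20 times tickTime.
min_session_ms=$(( 2 * tick_time ))
max_session_ms=$(( 20 * tick_time ))

echo "initLimit window: ${init_timeout_ms} ms"
echo "syncLimit window: ${sync_timeout_ms} ms"
echo "session timeout range: ${min_session_ms} to ${max_session_ms} ms"
```

With tickTime=2000, a follower therefore has 20 seconds to complete its initial synchronization and 10 seconds to acknowledge a request.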
1.5 start up
Since the environment variable has been configured, you can start it directly with the following command:
zkServer.sh start
1.6 verification
Use jps to verify that the process has started. The presence of QuorumPeerMain indicates a successful start.
[hadoop@hadoop-nn-01 conf]$ jps -m
37193 QuorumPeerMain /home/hadoop/zookeeper-3.4.14/bin/../conf/zoo.cfg
37210 Jps -m
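The same check can be scripted. The sketch below is one possible way to do it: the helper name zk_running is hypothetical, and it only inspects jps-style text on standard input, so it can be tried here on the sample output above without a running server.

```shell
#!/usr/bin/env bash
# Succeeds (exit status 0) if the jps-style text on stdin lists QuorumPeerMain.
# zk_running is an illustrative helper name, not part of ZooKeeper.
zk_running() {
  grep -qw 'QuorumPeerMain'
}

# Sample text copied from the jps output above.
sample='37193 QuorumPeerMain /home/hadoop/zookeeper-3.4.14/bin/../conf/zoo.cfg
37210 Jps -m'

if printf '%s\n' "$sample" | zk_running; then
  echo "ZooKeeper is running"
else
  echo "ZooKeeper is not running"
fi
```

On a real host, assuming jps is on the PATH, the call would be `jps -m | zk_running`.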
2, Cluster environment construction
To keep the cluster highly available, a ZooKeeper cluster should have an odd number of nodes and at least three of them, so this section demonstrates building a three-node cluster. Here I use three hosts, named hadoop-nn-01, hadoop-nn-02 and hadoop-dn-01.
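The reason for preferring an odd number follows from majority arithmetic: the cluster stays available only while more than half of its servers are up. A small sketch of that calculation (the function names are only illustrative):

```shell
#!/usr/bin/env bash
# Smallest majority (quorum) of an ensemble with $1 servers.
quorum_size() { echo $(( $1 / 2 + 1 )); }

# Number of servers that may fail while a majority is still available.
tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

for n in 3 4 5 6; do
  echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Three and four servers both tolerate a single failure, so the fourth server adds cost without adding fault tolerance.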
2.1 modify configuration
Extract a ZooKeeper installation package on one host and modify its configuration file zoo.cfg as follows. Then use the scp command to distribute the installation directory to the other two servers, and configure the environment variables on all three machines:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/log
clientPort=2181
# In server.1, the 1 is the ID of the server node. It must be unique within the
# cluster and must match the number written to the myid file under that server's dataDir.
# Format: server.<id>=<host name>:<inter-node communication port>:<election port>
server.1=hadoop-nn-01:2888:3888
server.2=hadoop-nn-02:2888:3888
server.3=hadoop-dn-01:2888:3888

# Distribute the installation package to hadoop-nn-02
scp -r /home/hadoop/zookeeper-3.4.14/ hadoop-nn-02:/home/hadoop/
# Distribute the installation package to hadoop-dn-01
scp -r /home/hadoop/zookeeper-3.4.14/ hadoop-dn-01:/home/hadoop/
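For larger clusters, the server.N lines of the configuration above can be generated from a host list instead of typed by hand. This is a minimal sketch under the assumption that every host uses ports 2888 and 3888 as in this article and that IDs are assigned in list order starting from 1; gen_server_lines is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print one "server.N=host:2888:3888" line per argument, numbering from 1.
# Ports 2888 (inter-node communication) and 3888 (election) follow this article.
gen_server_lines() {
  local id=1 host
  for host in "$@"; do
    printf 'server.%d=%s:2888:3888\n' "$id" "$host"
    id=$(( id + 1 ))
  done
}

gen_server_lines hadoop-nn-01 hadoop-nn-02 hadoop-dn-01
```

The output can be appended to zoo.cfg before the file is distributed.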
2.2 identification node
Create a myid file in the dataDir directory on each of the three hosts and write the corresponding node ID into it. The ZooKeeper cluster identifies its nodes through the myid file, communicates between nodes through the communication port and election port configured above, and elects the leader node.
Create the storage directory:
# Execute this command on all three hosts
mkdir -vp /data/zookeeper/data/
Create and write the node ID to the myid file:
# hadoop-nn-01 host
echo "1" > /data/zookeeper/data/myid
# hadoop-nn-02 host
echo "2" > /data/zookeeper/data/myid
# hadoop-dn-01 host
echo "3" > /data/zookeeper/data/myid
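Because the ID in myid has to agree with the server.N entry for the same host, it can also be derived from zoo.cfg rather than typed on each machine. The sketch below shows one way to do that; myid_for_host is a hypothetical helper, and using it on a real node assumes that the output of hostname is exactly the name used in zoo.cfg.

```shell
#!/usr/bin/env bash
# Print N from the "server.N=<host>:..." line whose host equals $2.
# Exits non-zero if the host is not listed in the config file $1.
myid_for_host() {
  awk -F= -v h="$2" '
    $1 ~ /^server\.[0-9]+$/ {
      split($2, parts, ":")
      if (parts[1] == h) { sub(/^server\./, "", $1); print $1; found = 1; exit }
    }
    END { exit !found }
  ' "$1"
}

# Demonstration against a temporary copy of the cluster configuration.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
dataDir=/data/zookeeper/data
server.1=hadoop-nn-01:2888:3888
server.2=hadoop-nn-02:2888:3888
server.3=hadoop-dn-01:2888:3888
EOF
myid_for_host "$cfg" hadoop-nn-02   # prints 2
```

On each node this could then be run as `myid_for_host /home/hadoop/zookeeper-3.4.14/conf/zoo.cfg "$(hostname)" > /data/zookeeper/data/myid`.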
2.3 start cluster
On three hosts, execute the following command to start the service:
/home/hadoop/zookeeper-3.4.14/bin/zkServer.sh start /home/hadoop/zookeeper-3.4.14/conf/zoo.cfg
2.4 cluster verification
After startup, use zkServer.sh status to view the status of each node in the cluster. Here hadoop-nn-02 is the leader node, and hadoop-nn-01 and hadoop-dn-01 are follower nodes.
[hadoop@hadoop-nn-01 bin]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: follower

[hadoop@hadoop-nn-02 bin]$ ./zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: leader

[hadoop@hadoop-dn-01 conf]$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: follower
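When checking several nodes, only the Mode line matters. A small filter such as the following can extract it; zk_mode is a hypothetical helper name, and the sample text is copied from the status output above so the sketch runs without a cluster.

```shell
#!/usr/bin/env bash
# Print the role reported on the "Mode:" line of zkServer.sh status output
# read from stdin (for example "leader" or "follower").
zk_mode() {
  sed -n 's/^Mode: *//p'
}

sample='ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.14/bin/../conf/zoo.cfg
Mode: leader'

echo "hadoop-nn-02: $(printf '%s\n' "$sample" | zk_mode)"

# On a real cluster (assumes passwordless ssh between the hosts):
#   for h in hadoop-nn-01 hadoop-nn-02 hadoop-dn-01; do
#     echo "$h: $(ssh "$h" /home/hadoop/zookeeper-3.4.14/bin/zkServer.sh status 2>&1 | zk_mode)"
#   done
```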
3, Some operations of zookeeper
3.1 change zoo.cfg
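After changing zoo.cfg on one host, copy it to the other nodes, then restart each server so that the change takes effect:
scp /home/hadoop/zookeeper-3.4.14/conf/zoo.cfg hadoop-nn-02:/home/hadoop/zookeeper-3.4.14/conf/
scp /home/hadoop/zookeeper-3.4.14/conf/zoo.cfg hadoop-dn-01:/home/hadoop/zookeeper-3.4.14/conf/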
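With more than a couple of nodes, the copy is easier to maintain as a loop. The following is only a sketch: push_zoo_cfg and the DRY_RUN switch are hypothetical, and by default the function just prints the scp commands it would run so that they can be reviewed first.

```shell
#!/usr/bin/env bash
ZK_CONF=/home/hadoop/zookeeper-3.4.14/conf/zoo.cfg

# Copy zoo.cfg to the same directory on every host given as an argument.
# With DRY_RUN=1 (the default here) the scp commands are printed, not executed.
push_zoo_cfg() {
  local host target_dir
  target_dir=$(dirname "$ZK_CONF")
  for host in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "scp $ZK_CONF $host:$target_dir/"
    else
      scp "$ZK_CONF" "$host:$target_dir/"
    fi
  done
}

push_zoo_cfg hadoop-nn-02 hadoop-dn-01
```

Running it as `DRY_RUN=0 push_zoo_cfg hadoop-nn-02 hadoop-dn-01` performs the actual copy.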
3.2 zkServer.sh common operations
#### 1. View zkServer.sh help information
[hadoop@hadoop-nn-01 conf]$ zkServer.sh help
ZooKeeper JMX enabled by default
Using config: /home/hadoop/zookeeper-3.4.14/bin/../conf/zoo.cfg
Usage: /home/hadoop/zookeeper-3.4.14/bin/zkServer.sh {start|start-foreground|stop|restart|status|upgrade|printcmd}
#### 2. Start / stop / restart the zk server
/home/hadoop/zookeeper-3.4.14/bin/zkServer.sh stop
/home/hadoop/zookeeper-3.4.14/bin/zkServer.sh start /home/hadoop/zookeeper-3.4.14/conf/zoo.cfg
/home/hadoop/zookeeper-3.4.14/bin/zkServer.sh restart /home/hadoop/zookeeper-3.4.14/conf/zoo.cfg
#### 3. View server status
zkServer.sh status
#### 4. View port
[hadoop@hadoop-nn-01 version-2]$ netstat -nltp | grep 2181
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 :::2181        :::*        LISTEN      37668/java
3.3 common Shell commands of zookeeper
1 start service and connection service
# If no server address is specified, zkCli.sh connects to localhost:2181 by default
zkCli.sh -server hadoop-nn-01:2181
2 help command
Use help to view all commands and formats.
3 view node list
There are two commands to view the node list: ls path and ls2 path. The latter is an enhancement of the former. You can not only view all nodes under the specified path, but also view the information of the current node.
4 new node
create [-s] [-e] path data acl
# -s creates a sequential node; -e creates an ephemeral (temporary) node
Create node and write data:
create /test 123456
[zk: hadoop-nn-01:2181(CONNECTED) 10] create -s /test/a "aaa"
Created /test/a0000000000
[zk: hadoop-nn-01:2181(CONNECTED) 11] create -s /test/b "bbb"
Created /test/b0000000001
[zk: hadoop-nn-01:2181(CONNECTED) 12] create -s /test/c "ccc"
Created /test/c0000000002
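The names returned for sequential nodes follow a fixed pattern: the requested path followed by a 10-digit, zero-padded counter that the parent node increments for every sequential child. The sketch below only reproduces that naming rule locally (seq_name is an illustrative helper, not a ZooKeeper command):

```shell
#!/usr/bin/env bash
# Build the name ZooKeeper assigns to a sequential node:
# requested path ($1) + parent's counter ($2) formatted as %010d.
seq_name() {
  printf '%s%010d\n' "$1" "$2"
}

seq_name /test/a 0   # /test/a0000000000
seq_name /test/b 1   # /test/b0000000001
seq_name /test/c 2   # /test/c0000000002
```

This explains why /test/b received the suffix 0000000001 even though it was the first node named "b": the counter belongs to the parent /test, not to the name prefix.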
Create an ephemeral (temporary) node, which is deleted automatically when the session that created it ends:
[zk: hadoop-nn-01:2181(CONNECTED) 17] create -e /test/tmp "tmp"
Created /test/tmp
5 view node
# format: get path [watch]
[zk: hadoop-nn-01:2181(CONNECTED) 18] get /test
123456    # node data
cZxid = 0x300000007
ctime = Thu Jun 18 16:09:14 CST 2020
mZxid = 0x300000007
mtime = Thu Jun 18 16:09:14 CST 2020
pZxid = 0x30000000b
cversion = 4
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 6
numChildren = 4
Each attribute of the node is shown in the following table. One of the important concepts is Zxid (ZooKeeper transaction ID). Every change of ZooKeeper node has a unique Zxid. If Zxid1 is smaller than Zxid2, the change of Zxid1 occurs before the change of Zxid2.
State property | Description |
---|---|
cZxid | Transaction ID when data node is created |
ctime | Time when the data node was created |
mZxid | Transaction ID when the data node was last updated |
mtime | Time when the data node was last updated |
pZxid | Transaction ID of the last change to this node's list of child nodes |
cversion | Number of changes to child nodes |
dataVersion | Number of changes to node data |
aclVersion | Number of ACL changes for node |
ephemeralOwner | If the node is an ephemeral node, the session ID of the session that created it; if the node is a persistent node, the value is 0 |
dataLength | Length of data content |
numChildren | Current number of child nodes of data node |
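A Zxid is a 64-bit number whose upper 32 bits hold the leader epoch and whose lower 32 bits hold a counter that increases with each transaction in that epoch, which is why the values in the output above all start with 0x3. A short sketch of splitting one apart (the function names are only illustrative):

```shell
#!/usr/bin/env bash
# Upper 32 bits of a zxid: the leader epoch.
zxid_epoch() { echo $(( $1 >> 32 )); }

# Lower 32 bits of a zxid: the transaction counter within that epoch.
zxid_counter() { echo $(( $1 & 0xFFFFFFFF )); }

for zxid in 0x300000007 0x30000000b; do
  echo "zxid=$zxid epoch=$(zxid_epoch "$zxid") counter=$(zxid_counter "$zxid")"
done
```

For the node above, cZxid 0x300000007 is transaction 7 of epoch 3, and pZxid 0x30000000b is transaction 11 of the same epoch.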
6 view node status
You can use the stat command to see the status of a node, which returns values similar to the get command, but does not return node data.
[zk: hadoop-nn-01:2181(CONNECTED) 19] stat /test
cZxid = 0x300000007
ctime = Thu Jun 18 16:09:14 CST 2020
mZxid = 0x300000007
mtime = Thu Jun 18 16:09:14 CST 2020
pZxid = 0x30000000b
cversion = 4
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 6
numChildren = 4
7 update node
The command to update a node is set, which can be modified directly, as follows:
[zk: hadoop-nn-01:2181(CONNECTED) 22] set /test 345
cZxid = 0x300000007
ctime = Thu Jun 18 16:09:14 CST 2020
mZxid = 0x30000000d
mtime = Thu Jun 18 16:23:57 CST 2020
pZxid = 0x30000000b
cversion = 4
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 3
numChildren = 4
It can also be changed based on the version number. At this time, it is similar to the optimistic locking mechanism. When the data version you passed in does not match the data version number of the current node, zookeeper will reject this modification:
[zk: hadoop-nn-01:2181(CONNECTED) 23] set /test 345 0
version No is not valid : /test    # invalid version number
[zk: hadoop-nn-01:2181(CONNECTED) 26] set /test 33345 1
cZxid = 0x300000007
ctime = Thu Jun 18 16:09:14 CST 2020
mZxid = 0x300000010
mtime = Thu Jun 18 16:26:01 CST 2020
pZxid = 0x30000000b
cversion = 4
dataVersion = 2
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 5
numChildren = 4
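The version check behaves like a compare-and-set. The sketch below is a purely local simulation of that rule, not ZooKeeper code: two shell variables stand in for one node's data and dataVersion, and the hypothetical versioned_set function replays the three set commands shown above.

```shell
#!/usr/bin/env bash
# Local stand-in for a single node: its data and its dataVersion.
node_data=123456
node_version=0

# versioned_set NEW_DATA [EXPECTED_VERSION]
# Without a version (or with -1) the write is unconditional; otherwise it is
# applied only if EXPECTED_VERSION equals the current dataVersion.
versioned_set() {
  local new_data=$1 expected=${2:--1}
  if [ "$expected" -ne -1 ] && [ "$expected" -ne "$node_version" ]; then
    echo "version No is not valid" >&2
    return 1
  fi
  node_data=$new_data
  node_version=$(( node_version + 1 ))
}

versioned_set 345              # unconditional: dataVersion 0 -> 1
versioned_set 345 0 || true    # rejected: the current dataVersion is 1
versioned_set 33345 1          # accepted: dataVersion 1 -> 2
echo "data=$node_data dataVersion=$node_version"
```

A client that read the node at dataVersion 0 and then tries to write with that version is rejected once someone else has updated the node, which is exactly the optimistic-locking behaviour described above.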
8 delete node
The syntax for deleting a node is as follows:
delete path [version]
Just like updating node data, you can also pass in version number. When the data version number you pass in does not match the data version number of the current node, zookeeper will not delete it.
[zk: hadoop-nn-01:2181(CONNECTED) 28] delete /test/a0000000000 1
version No is not valid : /test/a0000000000    # invalid version number
[zk: hadoop-nn-01:2181(CONNECTED) 29] delete /test/a0000000000 0
To delete a node and all its descendants, you can use recursive deletion with the command rmr path.
9 watchers
- get path [watch]
A watcher registered with get path [watch] notifies the client when the content of the node changes. Note that ZooKeeper watches are one-time triggers: once a watch has fired it is removed, and it must be registered again to receive further notifications.
[zk: hadoop-nn-01:2181(CONNECTED) 31] get /test watch
[zk: hadoop-nn-01:2181(CONNECTED) 32] set /test 45678
WATCHER::
WatchedEvent state:SyncConnected type:NodeDataChanged path:/test    # node value changed
- stat path [watch]
A listener registered with stat path [watch] can notify clients when the state of a node changes.
[zk: hadoop-nn-01:2181(CONNECTED) 33] stat /test watch
[zk: hadoop-nn-01:2181(CONNECTED) 34] set /test 112233
WATCHER::
WatchedEvent state:SyncConnected type:NodeDataChanged path:/test    # node value changed
- ls / ls2 path [watch]
A listener registered with ls path [watch] or ls2 path [watch] can listen to the addition and deletion of all the child nodes under the node.
[zk: hadoop-nn-01:2181(CONNECTED) 35] ls /test watch
[b0000000001, tmp, c0000000002]
[zk: hadoop-nn-01:2181(CONNECTED) 36] create /test/yarn "aaa"
WATCHER::
Created /test/yarn
WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/test
10 zookeeper four-letter commands
command | Function description |
---|---|
conf | Prints details of the service configuration. |
cons | Lists full connection / session details for all clients connected to this server, including the number of packets received / sent, session ID, operation latency, last operation performed, etc. |
dump | Lists outstanding sessions and ephemeral nodes. This only works on the leader. |
envi | Prints details of the service environment. |
ruok | Tests whether the server is running in a non-error state. If so, it responds with "imok"; otherwise it does not respond at all. |
stat | Lists brief details for the server and connected clients. |
wchs | Lists brief information on watches for the server. |
wchc | Lists detailed information on watches for the server, by session. |
wchp | Lists detailed information on watches for the server, by path. |
More four-letter commands can be found in the official documentation: https://zookeeper.apache.org/doc/current/zookeeperAdmin.html
Four-letter commands are sent to the server's client port with a tool such as nc or telnet, for example: echo conf | nc hadoop-nn-01 2181. They are not zkCli.sh commands; typing one at the zkCli.sh prompt only prints the client's usage text:
[zk: hadoop-nn-01:2181(CONNECTED) 0] conf
ZooKeeper -server host:port cmd args
        stat path [watch]
        set path data [version]
        ls path [watch]
        delquota [-n|-b] path
        ls2 path [watch]
        setAcl path acl
        setquota -n|-b val path
        history
        redo cmdno
        printwatches on|off
        delete path [version]
        sync path
        listquota path
        rmr path
        get path [watch]
        create [-s] [-e] path data acl
        addauth scheme auth
        quit
        getAcl path
        close
        connect host:port
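For repeated use, sending a four-letter command to the client port can be wrapped in a small function. This is a sketch only: zk_4lw is a hypothetical helper, it assumes nc is installed, and it refuses anything that is not exactly four letters before opening a connection.

```shell
#!/usr/bin/env bash
# zk_4lw HOST PORT CMD
# Send a four-letter command to a ZooKeeper client port using nc.
# Returns 2 without connecting if CMD is not exactly four letters.
zk_4lw() {
  local host=$1 port=$2 cmd=$3
  case "$cmd" in
    [a-z][a-z][a-z][a-z]) ;;
    *) echo "not a four-letter command: $cmd" >&2; return 2 ;;
  esac
  echo "$cmd" | nc "$host" "$port"
}

# Example calls (these need a running server, so they are left as comments):
#   zk_4lw hadoop-nn-01 2181 ruok    # a healthy server answers "imok"
#   zk_4lw hadoop-nn-01 2181 stat
```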
4, Zookeeper automation script
#!/bin/bash
################################################################################
# Function: monitor the process of zookeeper service
# Author  : gzg
# Date    : 2020-6-18
################################################################################
current_dir=$(cd "$(dirname "$0")" && pwd)
cd "${current_dir}" || exit 1
source ~/.bashrc
now_time=$(date +"%Y-%m-%d %H:%M:%S")
ERR_FILE=/home/hadoop/log/zookeeper.log

# Check for a process matching $1; if none is found, log it and start ZooKeeper.
function checkProcess() {
    process=$(ps -ef | grep "$1" | grep -Ev "grep|check_zookeeper_process.sh")
    if [[ "$process" != "" ]]; then
        echo "${now_time} $1 process exist."
        echo "$process"
        return 0
    fi
    echo "${now_time} java zk service restart" >> "${ERR_FILE}"
    /home/hadoop/zookeeper-3.4.14/bin/zkServer.sh start /home/hadoop/zookeeper-3.4.14/conf/zoo.cfg
}

function main() {
    checkProcess "org.apache.zookeeper.server.quorum.QuorumPeerMain"
}

main
Add the script to crontab so that it runs every minute and restarts ZooKeeper automatically whenever the process is missing:
* * * * * /bin/bash /home/hadoop/scripts/check_zookeeper_process.sh > /dev/null 2>&1
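To avoid adding the same entry twice when the setup is repeated, the crontab line can be merged into the existing crontab instead of being pasted by hand. The following sketch works on plain text so it can be tried safely; merge_cron_line is a hypothetical helper name.

```shell
#!/usr/bin/env bash
CRON_LINE='* * * * * /bin/bash /home/hadoop/scripts/check_zookeeper_process.sh > /dev/null 2>&1'

# Read an existing crontab on stdin and print it back, appending CRON_LINE
# only if an identical line is not already present.
merge_cron_line() {
  local current
  current=$(cat)
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  if ! printf '%s\n' "$current" | grep -qxF -- "$CRON_LINE"; then
    printf '%s\n' "$CRON_LINE"
  fi
}

# Demonstration on sample text: the entry is added once, then left alone.
printf '0 3 * * * /usr/bin/true\n' | merge_cron_line | merge_cron_line
```

To apply it to the current user's crontab, the pipeline would be `crontab -l 2>/dev/null | merge_cron_line | crontab -`.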