Reader's Letter | If you have too many Regions in your HBase cluster, take a look at this problem, which you may run into yourself

Keywords: Java Zookeeper HBase Apache

Preface: Reader's Letter is a question-and-answer column opened by this old HBase store. It aims to help more readers solve the HBase-related problems they frequently run into at work, or at least give them a push in the right direction. The store hopes this becomes a small platform where we help each other. If you have a problem, just leave the store a message; if you have a good solution, please don't be stingy with it. You are warmly welcome to work through solutions in the comment area and to state your views boldly. The problem you help someone else solve today may be the very one you run into tomorrow.

Letter: Liu*gang

Ape Question

While restarting the HBase cluster, all of the RegionServer nodes started successfully, but HMaster would not come up. The error log is as follows:

unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Packet len4745468 is out of range!
    at org.apache.zookeeper.ClientCnxnSocket.readLength(ClientCnxnSocket.java:112)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:79)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:366)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2020-04-02 22:31:08,673 ERROR [hadoop01:16000.activeMasterManager] zookeeper.RecoverableZooKeeper: ZooKeeper getChildren failed after 4 attempts
2020-04-02 22:31:08,674 FATAL [hadoop01:16000.activeMasterManager] master.HMaster: Failed to become active master
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/region-in-transition
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
    at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getChildren(RecoverableZooKeeper.java:295)
    at org.apache.hadoop.hbase.zookeeper.ZKUtil.listChildrenNoWatch(ZKUtil.java:513)
    at org.apache.hadoop.hbase.master.AssignmentManager.processDeadServersAndRegionsInTransition(AssignmentManager.java:519)
    at org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:494)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:748)
    at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:184)
    at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1729)
    at java.lang.Thread.run(Thread.java:748)

Ape analysis

  • HBase version: Apache 1.2.1
  • Cluster size: 120,000+ regions

Looking at the error log, the things that jump out are the ZooKeeper-related keywords [ZooKeeper.getChildren | Packet | out of range | ConnectionLoss for /hbase/region-in-transition].
We know that when the HBase Master restarts it has a lot of initialization work to do, much of it against ZK data nodes: registering, modifying, and reading metadata and node state, and so on. Put the keywords together and the picture emerges: while calling getChildren on /hbase/region-in-transition, the ZooKeeper client read a Packet that exceeded the allowed range, the connection was lost, and the Master "Failed to become active master".

So what exactly is a Packet? The ape asks the search engine, and the answer comes back:

In ZooKeeper, a Packet is the smallest unit of the communication protocol. Packets are used for network transmission between client and server; any object that needs to be transferred is wrapped into a Packet object.

In other words, there is a limit on the length of the packets the client reads from zk. The next step is to look online for a zk parameter that controls this limit, and the search comes up lucky: jute.maxbuffer. In the words of the official documentation:

(Java system property: jute.maxbuffer)

This option can only be set as a Java system property. There is no zookeeper prefix on it. It specifies the maximum size of the data that can be stored in a znode. The default is 0xfffff, or just under 1M. If this option is changed, the system property must be set on all servers and clients otherwise problems will arise. This is really a sanity check. ZooKeeper is designed to store data on the order of kilobytes in size.

One more point worth noting:

It is important to note that this parameter does not do the same thing on the server and on the client. When set on the client side, it limits the size of the data the client reads back from the server; when set on the server side, it limits the incomingBuffer, that is, the size of the requests the server will accept.

The relevant ZooKeeper client-side code is as follows:

// In org.apache.zookeeper.ClientCnxnSocket: the 4-byte length prefix is read into lenBuffer,
// then the packet body is read into incomingBuffer.
protected final ByteBuffer lenBuffer = ByteBuffer.allocateDirect(4);
protected ByteBuffer incomingBuffer = lenBuffer;

// Rejects any packet whose declared length exceeds the configured limit
// before allocating the receive buffer.
protected void readLength() throws IOException {
    int len = incomingBuffer.getInt();
    if (len < 0 || len >= ClientCnxn.packetLen) {
        throw new IOException("Packet len" + len + " is out of range!");
    }
    incomingBuffer = ByteBuffer.allocate(len);
}

// In org.apache.zookeeper.ClientCnxn: the limit defaults to 4 MB and is read
// from the jute.maxbuffer system property.
public static final int packetLen = Integer.getInteger("jute.maxbuffer", 4096 * 1024);

Why would the client be reading such a large packet? The reported length, 4745468 bytes (roughly 4.5 MB), is above the default limit of 4096 * 1024 = 4194304 bytes. Based on the /hbase/region-in-transition keyword above and the cluster size (120,000+ regions), we assume that with so many Regions the /hbase/region-in-transition node has simply grown too large, and HMaster fails because reading it exceeds the limit. There is also a related issue in the HBase Jira:
Cluster with too many regions cannot withstand some master failover scenarios
https://issues.apache.org/jira/browse/HBASE-4246
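
Before restarting anything, it can help to confirm the diagnosis by counting the children of the node and estimating how large the getChildren reply is. The sketch below is a rough standalone check, not part of HBase: the class name, the quorum address, and the 4-bytes-per-entry overhead are assumptions, and the check must itself be launched with a raised -Djute.maxbuffer (for example java -Djute.maxbuffer=1073741824 -cp <zookeeper-jar>:. RitSizeCheck hadoop01:2181) or it will hit the same limit.

import org.apache.zookeeper.ZooKeeper;
import java.util.List;

public class RitSizeCheck {
    public static void main(String[] args) throws Exception {
        // args[0] is the ZooKeeper quorum, e.g. "hadoop01:2181" (example address, adjust to your cluster)
        ZooKeeper zk = new ZooKeeper(args[0], 30000, event -> { });
        List<String> children = zk.getChildren("/hbase/region-in-transition", false);
        long bytes = 0;
        for (String child : children) {
            bytes += child.length() + 4;   // rough estimate: each name is serialized with a 4-byte length prefix
        }
        System.out.printf("%d children, roughly %d bytes in the getChildren reply%n",
                children.size(), bytes);
        zk.close();
    }
}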

Most of the time we are not the first ones to get our shoes wet. The problem you help someone else solve today may be exactly the one you run into tomorrow, with the answer already waiting. That is also the original intent behind this store's Q&A column, Reader's Letter: to spread and share knowledge better!

Ape Answer

Of course, /hbase/region-in-transition is not the only node with this problem; /hbase/unassigned and others can hit it too. The solutions are summarized as follows:

Option 1: Clean up historical garbage data in the zk nodes

The goal of this option is to bring the data size of the zk nodes back below the limit.
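
For instance, before deleting anything you can check how many children the node actually holds with zkCli.sh; the hadoop01:2181 address below is only an example. Note that the ls call will hit the same packet limit unless the client-side jute.maxbuffer has already been raised as described in Option 2.

$ $ZOOKEEPER_HOME/bin/zkCli.sh -server hadoop01:2181
  stat /hbase/region-in-transition     # numChildren in the output shows how many entries the node holds
  ls /hbase/region-in-transition       # lists the children that make the getChildren reply so large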

Option 2: Increase the parameter jute.maxbuffer

# Client side
$ vim $ZOOKEEPER_HOME/bin/zkCli.sh
  # Add the -Djute.maxbuffer=<buffer_size> parameter to the java command line
  "$JAVA" "-Dzookeeper.log.dir=${ZOO_LOG_DIR}" "-Dzookeeper.root.logger=${ZOO_LOG4J_PROP}" "-Djute.maxbuffer=1073741824" \
       -cp "$CLASSPATH" $CLIENT_JVMFLAGS $JVMFLAGS \
       org.apache.zookeeper.ZooKeeperMain "$@"

# Server side
$ vim $ZOOKEEPER_HOME/conf/zoo.cfg
  # Add a jute.maxbuffer=<buffer_size> entry
  jute.maxbuffer=1073741824
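
One caveat, based on the documentation quoted earlier rather than on this particular cluster: jute.maxbuffer is a Java system property, so on some ZooKeeper versions an entry in zoo.cfg may not be picked up by the server. A commonly used alternative is to pass the flag through the server JVM options, for example via conf/java.env, which zkEnv.sh sources when zkServer.sh starts the server:

# Alternative server-side setting (verify against your ZooKeeper version)
$ vim $ZOOKEEPER_HOME/conf/java.env
  SERVER_JVMFLAGS="-Djute.maxbuffer=1073741824"

Also note that in the incident above the failing ZooKeeper client is the HMaster process itself, so the same -Djute.maxbuffer flag would also need to reach the master's JVM, for example through HBASE_MASTER_OPTS in hbase-env.sh (an assumption about your deployment, adjust as appropriate).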

Increasing this parameter carries some risk, as the documentation quoted above notes, because zk is designed to store data on the order of kilobytes in size.

Option 3: Use a hierarchy (from the community comment section)

This scheme shards the /hbase/region-in-transition directory by a prefix of the region ID. For example, region 1234567890abcdef would live at /hbase/region-in-transition/1234/1234567890abcdef. Getting the complete list then requires traversing the sub-nodes.
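
A minimal sketch of the bucketing idea follows. It is illustrative only and not actual HBase code; the helper name, the four-character bucket, and the path layout are simply taken from the example above. The point is that each bucket keeps its own getChildren reply small.

public final class RitPaths {
    private static final String BASE = "/hbase/region-in-transition";

    // Derive the bucketed znode path for a region from the first four characters of its encoded name.
    static String ritPath(String encodedRegionName) {
        String bucket = encodedRegionName.substring(0, 4);
        return BASE + "/" + bucket + "/" + encodedRegionName;
    }

    public static void main(String[] args) {
        // Prints /hbase/region-in-transition/1234/1234567890abcdef
        System.out.println(ritPath("1234567890abcdef"));
    }
}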

Please indicate the source when reprinting! You are welcome to follow my WeChat public account [HBase Work Notes].
