Cloudera Manager HBase Thrift interface Go/Python client

Keywords: HBase Python github Hadoop

Background

A recent requirement was to write a query interface for data stored in HBase, on a Hadoop cluster built with CDH. I have always been a committed Python user (really, just lazy), but after gradually trying out Go this year I have found it very much to my taste. Given my company's painful operations and maintenance control process, Go's ability to statically compile a project into a single file minimizes the dependence on the ops team, so I have become more and more interested in Go.

After implementing a simple query interface with net/http and the gorilla/mux package, I assumed the HBase data access layer (DAL) could be wrapped and tested just as easily. But if the coding had gone that smoothly, there would have been no skill to show off. So, unexpectedly yet predictably, reaching HBase from Go turned out not to be that simple.

I assumed that for a database as mature as HBase, Go would have convenient official or third-party access libraries, but after searching I found only two choices: the Thrift interface provided by HBase, and the third-party library GoHbase, which its developer still marks as beta. When my initial Thrift debugging got nowhere, I tried GoHbase, which is simple to use and retrieved HBase data successfully. Since this is a production project, though, and in the spirit of seeing the struggle through to the end, I kept debugging and finally reached the goal with Thrift. The notes below record the specific process.

Software environment used:

  • go version go1.7.4 linux/amd64 & windows/amd64
  • Thrift version 0.10.0
  • HBase 1.2.0-cdh5.7.2

Steps

  1. Determine the HBase installation directory and startup command
  2. Generate the HBase SDK with Thrift
  3. Implement the client code

Specific process

Query HBase directory and run commands

HBase provides two sets of Thrift interfaces. First, we need to determine which set the HBase Thrift server was started with, by checking the server's startup command.

If the startup parameter is thrift, the server uses the first set of interfaces; if the parameter is thrift2, it uses the second set.
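The rule above can be sketched as a small Go check. This is only an illustration: the function name thriftInterface and the sample command lines are assumptions, and the check assumes that thrift or thrift2 appears as a standalone argument in the startup command.

```go
package main

import (
	"fmt"
	"strings"
)

// thriftInterface reports which HBase Thrift interface a startup command
// selects: "thrift2" for the second set, "thrift" for the first set, and ""
// when neither argument is present. Only whole arguments are compared.
func thriftInterface(cmdline string) string {
	found := ""
	for _, arg := range strings.Fields(cmdline) {
		switch arg {
		case "thrift2":
			return "thrift2"
		case "thrift":
			found = "thrift"
		}
	}
	return found
}

func main() {
	// Sample command lines for illustration only (assumed, not from my cluster).
	for _, c := range []string{"hbase thrift start", "hbase thrift2 start"} {
		fmt.Printf("%q -> %s\n", c, thriftInterface(c))
	}
}
```

The answer decides which of the two Thrift files to use in the next step.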

Go to the directory of HBase and find the Thrift file:

[www@dev-hdp007 thrift]$ ls -l /opt/cloudera/parcels/CDH/lib/hbase/include/thrift
total 44
-rw-r--r-- 1 root root 24870 Jul 23 2016 hbase1.thrift
-rw-r--r-- 1 root root 15126 Jul 23 2016 hbase2.thrift

Generating code with Thrift

Take the Thrift file from the previous step that matches your server's interface, copy it to a working directory, and run the following, where ${THRIFT} is the path to the copied file:

thrift -out . -r --gen go ${THRIFT}

The generated code directory includes an hbase-remote directory, which holds the generated client test code. Running it directly, however, produces a pile of errors:

..\hbase1.go:1662: cannot use temp (type Text) as type string in assignment
..\hbase1.go:11229: cannot use temp (type Text) as type string in assignment
..\hbase1.go:12252: cannot use temp (type Text) as type string in assignment
..\hbase1.go:12669: cannot use temp (type Text) as type string in assignment
..\hbase1.go:13121: cannot use temp (type Text) as type string in assignment
..\hbase1.go:13531: cannot use temp (type Text) as type string in assignment
..\hbase1.go:13925: cannot use temp (type Text) as type string in assignment
..\hbase1.go:14330: cannot use temp (type Text) as type string in assignment
..\hbase1.go:14759: cannot use temp (type Text) as type string in assignment
..\hbase1.go:15173: cannot use temp (type Text) as type string in assignment
..\hbase1.go:15173: too many errors
Error: Process exit code 2.

This may be caused by a Thrift version incompatibility. The generated code contains the following definition:

type Text []byte

Looking at one of the error locations:

var _key1 string
if v, err := iprot.ReadString(); err != nil {
    return thrift.PrependError("error reading field 0: ", err)
} else {
    temp := Text(v)
    _key1 = temp
}

Here temp, of type Text, is assigned to _key1, of type string, without a type conversion. I manually changed every error location as follows:

temp := Text(v)
_key1 = string(temp)
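To see why the conversion is needed, here is a minimal standalone sketch. Text mirrors the definition in the generated code; the helper keyFromWire is a hypothetical name that only imitates the generated pattern and is not part of the Thrift output.

```go
package main

import "fmt"

// Text mirrors the definition found in the generated code.
type Text []byte

// keyFromWire imitates the generated pattern: a string read from the
// protocol is wrapped in Text and then stored in a string variable.
func keyFromWire(v string) string {
	temp := Text(v)
	// Writing "return temp" here would fail to compile, because Text is a
	// byte slice type and Go never converts it to string implicitly.
	return string(temp) // explicit conversion, as in the manual fix
}

func main() {
	fmt.Println(keyFromWire("basicinfo:entry_date"))
}
```

The explicit string(temp) conversion is all the manual fix changes; the value itself stays the same.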

Change the host and port in the code to the actual server address and run it again:

[www@dev-hdp007 hbase-remote]$ go run hbase-remote.go

Usage of /tmp/go-build890271332/command-line-arguments/_obj/exe/hbase-remote
[-h host:port] [-u url] [-f[ramed]] function [arg1 [arg2…]]:
-P string Specify the protocol (binary, compact, simplejson, json) (default "binary")
-framed Use framed transport
-h string Specify host and port (default "10.59.74.135")
-http Use http
-p int Specify port (default 9090)
-u string Specify the url
…….

With that, the errors are resolved.
Now you can copy the generated hbase directory into $GOPATH/src.

Implementing the client

The simple example code is as follows:

package main

import (
    "fmt"
    "net"
    "os"
    "hbase1"
    "github.com/apache/thrift/lib/go/thrift"
)

func main() {
    host := "10.59.74.135"
    port := "9090"

    trans, err := thrift.NewTSocket(net.JoinHostPort(host, port))
    if err != nil {
        fmt.Println("Build socket failed: ", err)
        os.Exit(1)
    }

    defer trans.Close()
    var protocolFactory thrift.TProtocolFactory
    //protocolFactory = thrift.NewTSimpleJSONProtocolFactory()
    protocolFactory = thrift.NewTBinaryProtocolFactoryDefault()

    client := hbase1.NewHbaseClientFactory(trans, protocolFactory)
    if err := trans.Open(); err != nil {
        fmt.Println("Opening socket failed: ", err)
        os.Exit(1)
    }

    tableName := "agentBasicInfo" // table name
    rowKey := "1970010121012971" // row key
    family := "basicinfo:entry_date" // column, written as family:qualifier

    tables, err := client.GetTableNames()
    if err != nil {
        fmt.Println("Get tables failed: ", err)
        os.Exit(1)
    }
    for _, table := range tables {
        fmt.Println("table: ", string(table))
    }

    fmt.Println("-------------------")
    fmt.Printf("trying to get table: {%s}, rowkey: {%s}\n", tableName, rowKey)

    //attr := map[string]hbase1.Text {"basicinfo":[]byte("entry_date")}
    data, err := client.Get([]byte(tableName), []byte(rowKey), []byte(family), nil)
    if err != nil {
        fmt.Println("Get data failed: ", err)
        os.Exit(1)
    }
    for _, ele := range data {
        fmt.Println("value: ", ele.Timestamp, " ", string(ele.Value))
    }
}

The results are as follows:

[www@dev-hdp007 test_hbase]$ go run test_thrift.go
table: KYLIN_010EV7WZQ6
table: KYLIN_228LAP2P5A
table: KYLIN_3AYUR4WPJW
table: KYLIN_4DX8LTMC7A
table: KYLIN_4XR1LT20V4
table: KYLIN_959ZEKZBEM
table: KYLIN_9OHU8KSWI3
table: KYLIN_A6DW68YNOX
table: KYLIN_A6JKAAU8KS
table: KYLIN_BB5KKOWPCN
table: KYLIN_BUNDHMMD78
table: KYLIN_BZTUAMVLK6
table: KYLIN_CMQF0PAX8T
table: KYLIN_DK8AAXFNR7
table: KYLIN_DPFEWKDP5N
……
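Since the goal was a DAL that can be wrapped and tested, one possible shape is sketched below. It is an assumption-laden illustration rather than the project's actual code: Text and TCell are local stand-ins for the generated hbase1 types, and cellGetter, AgentDAL, Latest, fakeClient and the stored values are all hypothetical. Only the Get signature follows the client.Get call in the example above.

```go
package main

import (
	"errors"
	"fmt"
)

// Text and TCell are local stand-ins for the generated hbase1 types,
// reduced to the fields used in the example above.
type Text []byte

type TCell struct {
	Value     []byte
	Timestamp int64
}

// cellGetter is the only method this sketch needs from the Thrift client.
type cellGetter interface {
	Get(tableName Text, row Text, column Text, attributes map[string]Text) ([]*TCell, error)
}

var errNotFound = errors.New("no cell found")

// AgentDAL is a hypothetical data access layer bound to one table.
type AgentDAL struct {
	client cellGetter
	table  string
}

// Latest returns the value with the highest timestamp for rowKey and column.
func (d *AgentDAL) Latest(rowKey, column string) (string, error) {
	cells, err := d.client.Get(Text(d.table), Text(rowKey), Text(column), nil)
	if err != nil {
		return "", fmt.Errorf("get %s/%s: %v", rowKey, column, err)
	}
	if len(cells) == 0 {
		return "", errNotFound
	}
	newest := cells[0]
	for _, c := range cells[1:] {
		if c.Timestamp > newest.Timestamp {
			newest = c
		}
	}
	return string(newest.Value), nil
}

// fakeClient serves canned cells keyed by "table/row/column", so the DAL
// can be tested without a running Thrift server.
type fakeClient map[string][]*TCell

func (f fakeClient) Get(tableName, row, column Text, _ map[string]Text) ([]*TCell, error) {
	return f[string(tableName)+"/"+string(row)+"/"+string(column)], nil
}

func main() {
	fake := fakeClient{
		"agentBasicInfo/1970010121012971/basicinfo:entry_date": {
			{Value: []byte("made-up-new"), Timestamp: 2}, // invented test data
			{Value: []byte("made-up-old"), Timestamp: 1},
		},
	}
	dal := &AgentDAL{client: fake, table: "agentBasicInfo"}
	value, err := dal.Latest("1970010121012971", "basicinfo:entry_date")
	fmt.Println(value, err)
}
```

With the real client, the stand-in types would be replaced by hbase1.Text and hbase1.TCell, and the generated client could then be passed in wherever the fake is used here.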

Python client

In fact, the HBase release distribution already ships with plenty of client sample code.

[www@dev-hdp007 repos]$ ls hbase-1.2.0-cdh5.7.2/hbase-examples/src/main

cpp java perl php protobuf python ruby sh

Python client sample files:

[www@dev-hdp007 python]$ tree .

├── thrift1
│   ├── DemoClient.py
│   └── gen-py
│       └── hbase
│           ├── constants.py
│           ├── Hbase.py
│           ├── Hbase.pyc
│           ├── Hbase-remote
│           ├── __init__.py
│           ├── __init__.pyc
│           ├── ttypes.py
│           └── ttypes.pyc
└── thrift2
    ├── DemoClient.py
    └── gen-py
        └── hbase
            ├── constants.py
            ├── __init__.py
            ├── __init__.pyc
            ├── THBaseService.py
            ├── THBaseService.pyc
            ├── THBaseService-remote
            ├── ttypes.py
            └── ttypes.pyc

A client for either version of the Thrift interface can be implemented by referring to the corresponding DemoClient.py.


Posted by supergrame on Sun, 23 Dec 2018 03:24:06 -0800