Using etcd to implement service registration and discovery

Keywords: Go

The basic functions of service registration and discovery in a system are as follows:

  • Service registration: all nodes of the same service register under the same directory. When a node starts, it registers its information in that service's directory.
  • Health check: each service node sends heartbeats periodically. Its entry in the service directory is given a short TTL, and a healthy node refreshes that TTL at a regular interval, so the entry of a node that stops responding expires on its own.
  • Service discovery: given a service name, a client can look up the IP addresses and ports on which the service is reachable. For example, a gateway proxy can pick up new nodes of a service and drop unavailable ones promptly, and services can also become aware of each other.
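In etcd v3 the key space is flat, so a "directory" is simply a shared key prefix; the demo programs below store each node under `<service>/<lease ID>`. The following sketch shows that layout with plain string handling only. `serviceDir`, `nodeKey` and `inService` are hypothetical helper names, not part of the etcd API. It also shows why the trailing `/` matters when querying by prefix.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// serviceDir turns a service name into its "directory": a key prefix ending in "/".
func serviceDir(service string) string {
	return strings.TrimRight(service, "/") + "/"
}

// nodeKey builds the key one node registers under: the service directory plus a
// per-node suffix (the demo in this article uses the lease ID as that suffix).
func nodeKey(service string, leaseID int64) string {
	return serviceDir(service) + strconv.FormatInt(leaseID, 10)
}

// inService reports whether a key belongs to the service, i.e. the same test a
// prefix query on serviceDir(service) performs.
func inService(key, service string) bool {
	return strings.HasPrefix(key, serviceDir(service))
}

func main() {
	k := nodeKey("Hello", 7587)
	fmt.Println(k)                                          // Hello/7587
	fmt.Println(inService(k, "Hello"))                      // true
	fmt.Println(inService("HelloWorld/1", "Hello"))         // false: the trailing "/" prevents this match
	fmt.Println(strings.HasPrefix("HelloWorld/1", "Hello")) // true: a bare name prefix would match
}
```

Querying with the bare name `Hello` would also return the nodes of a service called `HelloWorld`, which is why the sketch always appends `/`.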

In a distributed system, managing the state of the nodes has always been a difficult problem. etcd is an open-source, actively maintained key-value store written in Go. It uses the Raft consensus algorithm to replicate its log, which guarantees strong consistency. etcd is well suited to service registration and discovery in a cluster: it provides key expiry via TTL, watching for data changes, multi-value storage, watching a whole directory, and atomic operations for distributed locks, which make it easy to track and manage the state of the nodes in a cluster.
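The health check above relies on TTL expiry: a key attached to a lease lives only as long as the lease keeps being renewed. The sketch below is a toy, in-memory model of that behavior with a manual clock, assumed here only for illustration; it is not how etcd implements leases. `leaseRegistry`, `Grant`, `Put`, `KeepAlive` and `Advance` are hypothetical names chosen to echo the etcd client calls used later in the article.

```go
package main

import "fmt"

// lease holds a TTL and the absolute time (in seconds) at which it expires.
type lease struct{ ttl, expiry int64 }

// leaseRegistry is a toy, single-threaded model of TTL leases (not etcd itself).
// Time only moves when Advance is called, so the behavior is deterministic.
type leaseRegistry struct {
	now    int64
	nextID int64
	leases map[int64]*lease
	owner  map[string]int64 // key -> ID of the lease it is attached to
	values map[string]string
}

func newLeaseRegistry() *leaseRegistry {
	return &leaseRegistry{
		leases: map[int64]*lease{},
		owner:  map[string]int64{},
		values: map[string]string{},
	}
}

// Grant creates a lease that expires ttl seconds from now.
func (r *leaseRegistry) Grant(ttl int64) int64 {
	r.nextID++
	r.leases[r.nextID] = &lease{ttl: ttl, expiry: r.now + ttl}
	return r.nextID
}

// Put stores a key attached to a lease; the key disappears when the lease expires.
func (r *leaseRegistry) Put(key, value string, id int64) {
	r.owner[key] = id
	r.values[key] = value
}

// KeepAlive resets the lease's expiry to now+TTL; it returns false if the lease is gone.
func (r *leaseRegistry) KeepAlive(id int64) bool {
	l, ok := r.leases[id]
	if !ok {
		return false
	}
	l.expiry = r.now + l.ttl
	return true
}

// Advance moves the clock forward and deletes expired leases together with their keys.
func (r *leaseRegistry) Advance(seconds int64) {
	r.now += seconds
	for id, l := range r.leases {
		if l.expiry > r.now {
			continue
		}
		delete(r.leases, id)
		for k, owner := range r.owner {
			if owner == id {
				delete(r.owner, k)
				delete(r.values, k)
			}
		}
	}
}

func main() {
	r := newLeaseRegistry()
	id := r.Grant(10)
	r.Put("Hello/1", "remote_host:1234", id)

	r.Advance(8)
	r.KeepAlive(id) // renewed in time: expiry moves to now+10
	r.Advance(8)
	fmt.Println(len(r.values)) // 1: the node is still registered

	r.Advance(10)                // a full TTL without renewal
	fmt.Println(len(r.values))   // 0: the key was removed with its lease
	fmt.Println(r.KeepAlive(id)) // false: the node must grant a new lease and register again
}
```

The last line corresponds to the `ErrLeaseNotFound` branch in the service code, where the node starts over by granting a fresh lease.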


We write two demo programs: a service, and a client that acts as a gateway proxy. After the service starts, it registers its node in etcd under the directory named after the service, and renews the registration (refreshes its TTL) periodically. The client queries etcd for the node information in the service directory and proxies requests to those nodes. While running, it watches the service directory for changes in real time and keeps its own list of service nodes up to date.

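package main

import (
    "context"
    "fmt"
    "log"
    "net"
    "net/rpc"
    "strings"
    "time"

    // Import paths assume the etcd v3.5 Go client (go.etcd.io/etcd/client/v3)
    "go.etcd.io/etcd/api/v3/v3rpc/rpctypes"
    clientv3 "go.etcd.io/etcd/client/v3"
)

// Register services to etcd: put "<ServiceTarget>/<lease ID>" = value under a
// 10-second lease, then keep renewing that lease once per second.
func RegisterServiceToETCD(ServiceTarget string, value string) {
    // All nodes of one service share the directory (key prefix) "<ServiceTarget>/"
    dir := strings.TrimRight(ServiceTarget, "/") + "/"

    client, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"localhost:2379"},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        panic(err)
    }
    defer client.Close()

    kv := clientv3.NewKV(client)
    lease := clientv3.NewLease(client)
    var curLeaseId clientv3.LeaseID = 0

    for {
        if curLeaseId == 0 {
            // Grant a lease with a 10-second TTL
            leaseResp, err := lease.Grant(context.TODO(), 10)
            if err != nil {
                panic(err)
            }

            // The key is unique per node: the service directory plus the lease ID
            key := dir + fmt.Sprintf("%d", leaseResp.ID)
            if _, err := kv.Put(context.TODO(), key, value, clientv3.WithLease(leaseResp.ID)); err != nil {
                panic(err)
            }
            curLeaseId = leaseResp.ID
        } else {
            // Renew the lease. If it has already expired, reset curLeaseId to 0 so the next
            // iteration grants a new lease and registers the node again.
            // Other (possibly transient) errors are ignored and retried on the next tick.
            if _, err := lease.KeepAliveOnce(context.TODO(), curLeaseId); err == rpctypes.ErrLeaseNotFound {
                curLeaseId = 0
                continue
            }
        }
        time.Sleep(time.Second)
    }
}

// HelloService is the RPC service exposed by this node
type HelloService struct{}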

func (p *HelloService) Hello(request string, reply *string) error {
    *reply = "hello:" + request
    return nil
}

var serviceTarget = "Hello"
var port = ":1234"
var host = "remote_host" // Pseudo code: replace with this node's externally reachable host

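func main() {
    if err := rpc.RegisterName("HelloService", new(HelloService)); err != nil {
        log.Fatal("Register error:", err)
    }

    listener, err := net.Listen("tcp", port)
    if err != nil {
        log.Fatal("ListenTCP error:", err)
    }

    // Register (and keep renewing) this node in etcd as soon as the listener is up,
    // rather than only after the first client has connected
    go RegisterServiceToETCD(serviceTarget, host+port)

    // Serve every incoming connection, not just the first one
    for {
        conn, err := listener.Accept()
        if err != nil {
            log.Fatal("Accept error:", err)
        }
        go rpc.ServeConn(conn)
    }
}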
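The RPC side of the demo can be checked without etcd or a real network. The sketch below serves `HelloService` over an in-memory `net.Pipe` using the standard `net/rpc` package and calls it the way the gateway would. `Client.Call` takes the method name (case-sensitive, `HelloService.Hello`), the argument, and a pointer for the reply. The `callHello` helper is a hypothetical name used only for this illustration.

```go
package main

import (
	"fmt"
	"net"
	"net/rpc"
)

type HelloService struct{}

func (p *HelloService) Hello(request string, reply *string) error {
	*reply = "hello:" + request
	return nil
}

// callHello serves HelloService on one end of an in-memory pipe and calls it from the other end.
func callHello(arg string) (string, error) {
	server := rpc.NewServer()
	if err := server.RegisterName("HelloService", new(HelloService)); err != nil {
		return "", err
	}
	serverConn, clientConn := net.Pipe()
	go server.ServeConn(serverConn)

	client := rpc.NewClient(clientConn)
	defer client.Close() // closing the client also ends ServeConn on the other side

	var reply string
	// Method names are case-sensitive: "HelloService.Hello", not "HelloService.hello".
	err := client.Call("HelloService.Hello", arg, &reply)
	return reply, err
}

func main() {
	reply, err := callHello("world")
	fmt.Println(reply, err) // hello:world <nil>
}
```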

The gateway first fetches the information of all nodes in the service directory from etcd and uses it to initialize the list of accessible service nodes it maintains. It then uses the Watch mechanism to listen for updates to that service's directory in etcd, adding nodes to or removing nodes from its list according to the PUT and DELETE events received on the watch channel.

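package main

import (
    "context"
    "log"
    "net/rpc"
    "sync"
    "time"

    // Import paths assume the etcd v3.5 Go client (go.etcd.io/etcd/client/v3)
    "go.etcd.io/etcd/api/v3/mvccpb"
    clientv3 "go.etcd.io/etcd/client/v3"
)

var serviceTarget = "Hello"

type remoteService struct {
    name  string
    nodes map[string]string // etcd key -> node address
    mutex sync.Mutex
}

// Get all the keys in the service directory and initialize them as the list of available nodes
func getService(etcdClient *clientv3.Client) *remoteService {
    service := &remoteService{
        name:  serviceTarget + "/", // the directory the service registers under
        nodes: make(map[string]string),
    }
    kv := clientv3.NewKV(etcdClient)
    rangeResp, err := kv.Get(context.TODO(), service.name, clientv3.WithPrefix())
    if err != nil {
        panic(err)
    }

    service.mutex.Lock()
    for _, item := range rangeResp.Kvs {
        service.nodes[string(item.Key)] = string(item.Value)
    }
    service.mutex.Unlock()

    // Watch from the revision right after this snapshot, so no change made
    // between the Get and the start of the Watch is missed
    go watchServiceUpdate(etcdClient, service, rangeResp.Header.Revision+1)
    return service
}

// Monitor events in the service directory
func watchServiceUpdate(etcdClient *clientv3.Client, service *remoteService, rev int64) {
    watcher := clientv3.NewWatcher(etcdClient)
    // Watch for updates under the service directory
    watchChan := watcher.Watch(context.TODO(), service.name, clientv3.WithPrefix(), clientv3.WithRev(rev))
    for watchResp := range watchChan {
        for _, event := range watchResp.Events {
            service.mutex.Lock()
            switch event.Type {
            case mvccpb.PUT: // PUT event: a key was added (or updated) in the directory
                service.nodes[string(event.Kv.Key)] = string(event.Kv.Value)
            case mvccpb.DELETE: // DELETE event: a key was deleted (this also happens when its lease expires)
                delete(service.nodes, string(event.Kv.Key))
            }
            service.mutex.Unlock()
        }
    }
}

func main() {
    client, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"remote_host:2379"},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer client.Close()

    service := getService(client) // Get available nodes for the service
    // ......
    // Every time there is a request, select a node of the service and send the request to it.
    // Here any node is taken from the map; a real gateway would balance the load.
    var addr string
    service.mutex.Lock()
    for _, a := range service.nodes {
        addr = a
        break
    }
    service.mutex.Unlock()
    if addr == "" {
        log.Fatal("no available node for service ", serviceTarget)
    }

    rpcClient, err := rpc.Dial("tcp", addr)
    if err != nil {
        log.Fatal(err)
    }
    defer rpcClient.Close()

    var reply string
    if err := rpcClient.Call("HelloService.Hello", "world", &reply); err != nil {
        log.Fatal(err)
    }
    log.Println(reply)
    // ......
}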
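The client above simply takes an arbitrary node for each request. A common alternative is round-robin over the current node list. The sketch below is one way to do that, assuming the node map is read on every pick so that nodes added or removed by the watcher are taken into account. `roundRobin` and `next` are hypothetical names; the caller should hold the lock that protects the map (or pass a copy) while calling `next`.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// roundRobin hands out node addresses in turn from a node set that may change over time.
type roundRobin struct {
	mu sync.Mutex
	i  int
}

// next picks the next address from the current nodes (etcd key -> address).
// Keys are sorted so the rotation order is deterministic despite Go's random map order.
// It returns false when the service has no available node.
func (r *roundRobin) next(nodes map[string]string) (string, bool) {
	if len(nodes) == 0 {
		return "", false
	}
	keys := make([]string, 0, len(nodes))
	for k := range nodes {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	r.mu.Lock()
	defer r.mu.Unlock()
	addr := nodes[keys[r.i%len(keys)]]
	r.i++
	return addr, true
}

func main() {
	nodes := map[string]string{"Hello/1": "10.0.0.1:1234", "Hello/2": "10.0.0.2:1234"}
	var rr roundRobin
	for n := 0; n < 3; n++ {
		addr, _ := rr.next(nodes)
		fmt.Println(addr)
	}
	// Prints 10.0.0.1:1234, 10.0.0.2:1234, 10.0.0.1:1234 on separate lines
}
```

Sorting on every pick costs O(n log n), which is negligible for the handful of nodes a typical service has.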

Besides clients or gateways discovering the services in the system, as described above, the services themselves often need to be aware of one another. Service-to-service discovery works the same way as the example above: each service acts as an etcd client and finds the other services through etcd.

Note: to keep the programs easy to follow, there is a lot of pseudocode; the point is to explain the ideas. Running this in practice takes considerably more coding work. Ideas from anyone with experience in this area are welcome.

Posted by hofmann777 on Thu, 07 Nov 2019 22:09:54 -0800