Elasticsearch cross cluster search

Keywords: Operation & Maintenance kafka Distribution ELK

1. Introduction

Elasticsearch introduced Cross Cluster Search (CCS) in version 5.3 to replace the deprecated tribe node. Like the tribe node, Cross Cluster Search is used to search data across clusters. It lets you run a single search request against one or more remote clusters. For example, you can use Cross Cluster Search to filter and analyze log data stored in clusters in different data centers.

Because a single query can cover the data of multiple clusters, this feature also gives us more freedom when designing our architecture.

In distributed systems or microservices, the system is divided into multiple modules, and each module (A, B, and C, for example) is owned by a different team. Usually each team only queries the logs of its own module, but when a problem involves several modules, the logs of all of them have to be searched. Cross cluster search solves exactly this problem. In terms of architecture, we store the logs of module A, module B, and module C in separate clusters.

In this way, mutual interference between clusters is avoided. A large volume of logs from module A only affects writes to cluster A, which is a divide-and-conquer approach.

2. Configure Cross Cluster Search

Suppose we have two ES clusters:

Node             Address    Port  Transport Port  Cluster
elasticsearch01  127.0.0.1  9200  9300            America
elasticsearch02  127.0.0.1  9201  9301            America
elasticsearch03  127.0.0.1  9202  9302            Europe
elasticsearch04  127.0.0.1  9203  9303            Europe

There are two ways to configure CCS:

1) Configure elasticsearch.yml

search:
    remote:
        america:
            seeds:
                - 127.0.0.1:9300
                - 127.0.0.1:9301
        europe:
            seeds:
                - 127.0.0.1:9302
                - 127.0.0.1:9303

Note: with this approach, the remote cluster must already be running when the configuration is applied. For example, when configuring the "america" cluster, the "europe" cluster must be running; otherwise the node cannot start successfully.

2) Configuring using the Cluster Settings API

curl -XPUT -H'Content-Type: application/json' localhost:9200/_cluster/settings -d '
{
    "persistent": {
        "search.remote": {
            "america": {
                "skip_unavailable": "true",
                "seeds": ["127.0.0.1:9300","127.0.0.1:9301"]
            },
            "europe": {
                "skip_unavailable": "true",
                "seeds": ["127.0.0.1:9302","127.0.0.1:9303"]
            }
        }
    }
}'

Using the API is recommended, because it makes it convenient to modify the seeds and other settings of a remote cluster.
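The request body used with the Cluster Settings API can also be generated from a table of clusters. The following is a minimal Python sketch, assuming the search.remote setting names used in this article. The helper names build_remote_settings and put_cluster_settings are hypothetical, and only the standard library is used.

```python
import json
import urllib.request


def build_remote_settings(clusters, skip_unavailable=True):
    """Build the persistent settings body that registers remote clusters.

    `clusters` maps a cluster alias to its list of transport seed addresses.
    """
    remote = {
        alias: {"skip_unavailable": skip_unavailable, "seeds": list(seeds)}
        for alias, seeds in clusters.items()
    }
    return {"persistent": {"search.remote": remote}}


def put_cluster_settings(body, base_url="http://localhost:9200"):
    """Send the body to the Cluster Settings API (needs a running node)."""
    request = urllib.request.Request(
        base_url + "/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Same aliases and seeds as in the curl example; nothing is sent here.
body = build_remote_settings({
    "america": ["127.0.0.1:9300", "127.0.0.1:9301"],
    "europe": ["127.0.0.1:9302", "127.0.0.1:9303"],
})
print(json.dumps(body, indent=4))
```

Calling put_cluster_settings(body) would submit the settings to a node listening on localhost:9200. Building the body separately makes it possible to inspect it before anything is changed on the cluster.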

3. Validate Cross Cluster Search

1) Use _remote/info to view the CCS connection status:

[root@localhost elasticsearch01]# curl -XGET -H 'Content-Type: application/json' localhost:9201/_remote/info?pretty
{
  "america" : {
    "seeds" : [
      "127.0.0.1:9300",
      "127.0.0.1:9301"
    ],
    "http_addresses" : [
      "127.0.0.1:9200",
      "127.0.0.1:9201"
    ],
    "connected" : true,
    "num_nodes_connected" : 2,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s"
  },
  "europe" : {
    "seeds" : [
      "127.0.0.1:9302",
      "127.0.0.1:9303"
    ],
    "http_addresses" : [
      "127.0.0.1:9202",
      "127.0.0.1:9203"
    ],
    "connected" : true,
    "num_nodes_connected" : 2,
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s"
  }
}
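A response of this shape can be checked automatically, for example in a monitoring script. The following is a minimal Python sketch under that assumption. The function name disconnected_clusters is hypothetical, and the sample is a trimmed, altered copy of the response in which "europe" is marked as disconnected for illustration.

```python
import json


def disconnected_clusters(remote_info):
    """Return the aliases of remote clusters that are not usable."""
    return sorted(
        alias
        for alias, info in remote_info.items()
        if not info.get("connected") or info.get("num_nodes_connected", 0) == 0
    )


# Trimmed sample in the shape of the _remote/info response; "europe" is
# altered here to show a failed connection.
sample = json.loads("""
{
  "america": {"connected": true, "num_nodes_connected": 2},
  "europe": {"connected": false, "num_nodes_connected": 0}
}
""")
print(disconnected_clusters(sample))  # prints ['europe']
```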

2) Use cross cluster search:

Query the data of two clusters at the same time:

curl -H 'Content-Type:application/json' -XGET 'http://localhost:9200/cluster_name:index,cluster_name:index/_search'

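Query the data of all clusters at the same time (the * wildcard matches every remote cluster, and the unqualified index name refers to the local cluster):

curl -H 'Content-Type:application/json' -XGET 'http://localhost:9200/*:test,test/_search'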

Example: querying a remote cluster
Cluster: america
Index: test

curl -H 'Content-Type:application/json' -XGET 'http://localhost:9200/america:test/_search'
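The cluster_name:index syntax is easy to get wrong when several clusters are involved, so it can help to build the expression programmatically. The following is a minimal Python sketch. The helper names ccs_index_expression and search_url are hypothetical, and the use of * as a cluster alias wildcard is an assumption about the Elasticsearch version in use.

```python
def ccs_index_expression(index, clusters=None, include_local=False):
    """Build the index part of a cross cluster search URL.

    `clusters` is a list of remote cluster aliases; None means all remote
    clusters, written with the `*` wildcard.
    """
    aliases = ["*"] if clusters is None else list(clusters)
    parts = [f"{alias}:{index}" for alias in aliases]
    if include_local:
        parts.append(index)  # an unqualified name targets the local cluster
    return ",".join(parts)


def search_url(expression, base_url="http://localhost:9200"):
    """Return the _search URL for an index expression."""
    return f"{base_url}/{expression}/_search"


print(search_url(ccs_index_expression("test", ["america", "europe"])))
# http://localhost:9200/america:test,europe:test/_search
print(search_url(ccs_index_expression("test", include_local=True)))
# http://localhost:9200/*:test,test/_search
```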

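Java API example:

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Requests;

// Query all remote clusters for indices whose names start with appIndex-
SearchRequest searchRequest = Requests.searchRequest("*:appIndex-*");
SearchResponse response = es.getClient().search(searchRequest).get();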

4. Disable Cross Cluster Search

Use the Cluster Settings API to set the remote cluster settings to null:

curl -XPUT -H'Content-Type: application/json' localhost:9201/_cluster/settings -d '
{
    "persistent": {
        "search.remote": {
            "america": {
                "skip_unavailable": null,
                "seeds": null
            },
            "europe": {
                "skip_unavailable": null,
                "seeds": null
            }
        }
    }
}'
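The same body can be generated for any list of cluster aliases. The following is a minimal Python sketch, assuming the search.remote setting names used in this article. The function name build_remote_removal is hypothetical.

```python
import json


def build_remote_removal(aliases):
    """Build a settings body that resets each remote cluster entry to null."""
    remote = {
        alias: {"skip_unavailable": None, "seeds": None}
        for alias in aliases
    }
    return {"persistent": {"search.remote": remote}}


body = build_remote_removal(["america", "europe"])
print(json.dumps(body, indent=4))  # None is serialized as JSON null
```

The printed JSON can be passed to curl with -d, in the same way as the request above.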

5. Configuration of CCS

search.remote.${cluster_alias}.skip_unavailable: whether to skip this remote cluster during a query when it cannot be reached. The default is false; setting it to true is recommended.

search.remote.connect: the default is true, meaning that any node can connect to the remote clusters and act as a cross cluster client. Cross cluster search requests must be sent to a cross cluster client.

search.remote.node.attr: a node attribute used to select the nodes of the remote cluster that may be connected to. For example, with search.remote.node.attr: gateway, only nodes configured with node.attr.gateway: true are connected to and used for CCS queries.

Posted by jhuedder on Fri, 26 Nov 2021 14:55:14 -0800