Elasticsearch asynchronous search

1. Elasticsearch asynchronous search definition

The async search API lets you execute a search request asynchronously, monitor its progress, and retrieve partial results as they become available.

The official introductory animation illustrates asynchronous search more vividly.

Traditional (synchronous) search vs. asynchronous search when the data volume is large:

  • A traditional search may time out and return no data, or the user has to wait a long time, which makes for a poor user experience.
  • An asynchronous search responds quickly without waiting for the whole query to finish, and partial results can be retrieved while it is still running.

2. Elasticsearch version that introduced asynchronous search

Elasticsearch V7.7.0.

3. Applicable scenarios for Elasticsearch asynchronous search

Asynchronous search lets users retrieve results as they become available, instead of receiving a response only after the query has fully completed.

4. Elasticsearch asynchronous search practice

4.1 Submitting an asynchronous search

The following steps assume that the index being searched holds a very large amount of data (a small index also works, but a large one shows the asynchronous behavior much better).

Otherwise, the search finishes before the submit request's default wait time of 1 second (wait_for_completion_timeout) elapses, and the complete result is returned directly, just as with a normal search.
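From application code, an asynchronous search is submitted with a plain HTTP POST to /<index>/_async_search. The minimal sketch below uses only the Python standard library and builds the request without sending it; the cluster address http://localhost:9200 and the helper name build_submit_request are hypothetical. It sets two real submit parameters explicitly: wait_for_completion_timeout (how long the submit call waits before returning, default 1s) and keep_on_completion (whether the result is stored even if the search finishes within that time, default false).

```python
import json
import urllib.parse
import urllib.request

# Hypothetical cluster address; replace with your own.
BASE_URL = "http://localhost:9200"


def build_submit_request(index, body, base_url=BASE_URL,
                         wait_for_completion_timeout="1s",
                         keep_on_completion=False, size=0):
    """Build (but do not send) a POST /<index>/_async_search request."""
    params = urllib.parse.urlencode({
        "size": size,
        # How long the submit call blocks before returning a (possibly partial) response.
        "wait_for_completion_timeout": wait_for_completion_timeout,
        # If true, the result is stored (and an id returned) even for fast searches.
        "keep_on_completion": str(keep_on_completion).lower(),
    })
    url = f"{base_url}/{urllib.parse.quote(index)}/_async_search?{params}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same aggregation as the test_data example in this section.
query = {
    "aggs": {
        "sale_date": {
            "date_histogram": {"field": "last_updated", "calendar_interval": "1d"}
        }
    }
}
req = build_submit_request("test_data", query, keep_on_completion=True)
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen sends it. With keep_on_completion=True, even a search that finishes within the wait time returns an id that can be used with the APIs in 4.2 to 4.4.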

First, try an ordinary index (the Kibana sample flight data):

POST kibana_sample_data_flights/_async_search?size=0
{
  "sort": [
    {
      "timestamp": {
        "order": "asc"
      }
    }
  ],
  "aggs": {
    "sale_date": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "1d"
      }
    }
  }
}

Result (truncated):

{
  "is_partial" : false,
  "is_running" : false,
  "start_time_in_millis" : 1628663114252,
  "expiration_time_in_millis" : 1629095114252,
  "response" : {
    "took" : 23,
    "timed_out" : false,
    "_shards" : {
      "total" : 1,
      "successful" : 1,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 10000,
        "relation" : "gte"
      },
      "max_score" : null,
      "hits" : [ ]
    },

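This search finished within the default wait time, so the complete response came back directly: is_running is false, and no id is returned (because keep_on_completion defaults to false, a search that completes within the wait time is not stored). To actually observe asynchronous retrieval you need far more data, and the following test data generator is recommended: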

https://github.com/oliver006/elasticsearch-test-data

Stuck without test data, or without enough of it? This little tool solves that.

Generate 1,000,000+ documents with a single command:

python es_test_data.py --es_url=http://172.21.0.14:19205 --count=1000000

The results are as follows:

Done - total docs uploaded: 1000000, took 71 seconds

You can adapt it to your own business scenarios; the Python code is easy to package into your own tools.

With the data in place, you can run the asynchronous search:
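If you would rather not depend on an external script, the core idea can be sketched with the standard library alone: build a _bulk request body in newline-delimited JSON, with one action line followed by one document line per document. This is not the tool's code. The name and age fields, the seed, and the 2021 date range are illustrative assumptions; last_updated and the index name test_data match the search below. ISO 8601 strings are used so that Elasticsearch's default dynamic date detection maps last_updated as a date, which date_histogram requires.

```python
import json
import random
from datetime import datetime, timedelta, timezone


def bulk_ndjson(count, index="test_data", seed=42):
    """Return a _bulk request body (NDJSON) containing `count` synthetic documents."""
    rng = random.Random(seed)  # seeded so the output is reproducible
    start = datetime(2021, 1, 1, tzinfo=timezone.utc)
    lines = []
    for _ in range(count):
        # Action line: index the next document into `index` with an auto-generated id.
        lines.append(json.dumps({"index": {"_index": index}}))
        ts = start + timedelta(seconds=rng.randrange(365 * 24 * 3600))
        doc = {
            "name": "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8)),
            "age": rng.randint(1, 100),
            # ISO 8601 string, so dynamic mapping detects the field as a date.
            "last_updated": ts.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_ndjson(3)
print(body)
```

To load the documents, POST each generated body to /_bulk with the header Content-Type: application/x-ndjson, splitting a million documents into batches (for example, a few thousand per request) rather than sending one huge body.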

POST test_data/_async_search?size=0
{
  "sort": [
    {
      "last_updated": {
        "order": "asc"
      }
    }
  ],
  "aggs": {
    "sale_date": {
      "date_histogram": {
        "field": "last_updated",
        "calendar_interval": "1d"
      }
    }
  }
}

The returned results are as follows:

{
  "id" : "FjUxQURkZFZyUVVlUUNydjVSZXhmWGcedFJCVnRVSVhSdVM0emN2YXZfTU9ZQToyNzE3MTcy",
  "is_partial" : true,
  "is_running" : true,
  "start_time_in_millis" : 1628662256012,
  "expiration_time_in_millis" : 1629094256012,
  "response" : {
    "took" : 1008,
    "timed_out" : false,
    "terminated_early" : false,
    "num_reduce_phases" : 0,
    "_shards" : {
      "total" : 1,
      "successful" : 0,
      "skipped" : 0,
      "failed" : 0
    },
    "hits" : {
      "total" : {
        "value" : 0,
        "relation" : "gte"
      },
      "max_score" : null,
      "hits" : [ ]
    }
  }
}

If you do not see a result like the one above (an id plus is_running: true), the data volume is not yet large enough; import more documents.

The core response fields are:

  • id -- the identifier of the asynchronous search; use it to monitor its progress, retrieve its results, and/or delete it.
  • is_partial -- once the query is no longer running, indicates whether the search failed or completed successfully on all shards. While the query is executing, is_partial is always true.
  • is_running -- whether the search is still executing or has completed.
  • total -- how many shards, overall, the search will be executed on.
  • successful -- how many shards have successfully completed the search.
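The rules for is_running and is_partial above can be turned into a small classifier. A minimal sketch; the function names search_state and shards_summary are hypothetical, and the two sample dictionaries copy the flags and shard counts from the two responses shown in 4.1.

```python
from typing import Any, Dict


def search_state(resp: Dict[str, Any]) -> str:
    """Classify an async search response using its is_running and is_partial flags."""
    if resp["is_running"]:
        # While the query is executing, is_partial is always true, so it says nothing yet.
        return "running"
    if resp["is_partial"]:
        # No longer running, but the search failed or did not succeed on all shards.
        return "partial"
    return "completed"


def shards_summary(resp: Dict[str, Any]) -> str:
    """Report successful/total shards from the embedded search response."""
    shards = resp["response"]["_shards"]
    return f"{shards['successful']}/{shards['total']} shards successful"


# Flags and shard counts copied from the two responses in 4.1.
first = {"is_partial": False, "is_running": False,
         "response": {"_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0}}}
second = {"is_partial": True, "is_running": True,
          "response": {"_shards": {"total": 1, "successful": 0, "skipped": 0, "failed": 0}}}
for resp in (first, second):
    print(search_state(resp), shards_summary(resp))
```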

4.2 Retrieving asynchronous search results

Pass the id returned by your own submit request:

GET /_async_search/FjFoeU8xMHJKUW9pd1dzN1g2Rm9wOGcedFJCVnRVSVhSdVM0emN2YXZfTU9ZQToyNjYyNjk5
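Instead of issuing this GET by hand, a client can poll until the search finishes. A minimal sketch with the Python standard library; the address, the helper names, and the poll interval are hypothetical choices. Each GET passes wait_for_completion_timeout=1s, so Elasticsearch holds the request for up to a second if the search is not done yet. The HTTP call is injected as fetch so the loop can be exercised without a cluster.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict

# Hypothetical cluster address; replace with your own.
BASE_URL = "http://localhost:9200"


def http_get_json(url: str) -> Dict[str, Any]:
    """Send a GET request and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_async_search(search_id: str,
                          base_url: str = BASE_URL,
                          fetch: Callable[[str], Dict[str, Any]] = http_get_json,
                          poll_interval_s: float = 1.0,
                          max_polls: int = 60) -> Dict[str, Any]:
    """Poll GET /_async_search/<id> until is_running is false, then return the body."""
    url = (f"{base_url}/_async_search/{urllib.parse.quote(search_id, safe='')}"
           "?wait_for_completion_timeout=1s")
    for _ in range(max_polls):
        body = fetch(url)
        if not body["is_running"]:
            return body
        # Partial results, if any, are already available in body["response"].
        time.sleep(poll_interval_s)
    raise TimeoutError(f"async search {search_id} still running after {max_polls} polls")
```

Capping the number of polls keeps a client from waiting forever on a search that never finishes; the stored result itself still expires on the server according to keep_alive.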

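4.3 Checking asynchronous search status

The status API returns only the status of the search (is_running, is_partial, shard counts), not its results:

GET /_async_search/status/FjUxQURkZFZyUVVlUUNydjVSZXhmWGcedFJCVnRVSVhSdVM0emN2YXZfTU9ZQToyNzE3MTcy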
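A status body can be reduced to a progress figure. A minimal sketch, assuming the status response carries _shards at the top level together with start_time_in_millis and expiration_time_in_millis; counting successful + skipped + failed as "finished" shards is this sketch's own assumption, and completion_status, which Elasticsearch adds only once the search has completed, is read defensively with .get(). The function names are hypothetical and the sample dictionary is illustrative, modeled on the running search in 4.1.

```python
from typing import Any, Dict, Optional


def shard_progress(status: Dict[str, Any]) -> float:
    """Fraction of shards that have finished (successfully, skipped, or failed)."""
    shards = status["_shards"]
    if shards["total"] == 0:
        return 1.0
    done = shards["successful"] + shards["skipped"] + shards["failed"]
    return done / shards["total"]


def retention_ms(status: Dict[str, Any]) -> int:
    """Milliseconds between the search start and the expiration of its stored result."""
    return status["expiration_time_in_millis"] - status["start_time_in_millis"]


def completion_status(status: Dict[str, Any]) -> Optional[int]:
    """HTTP status code of the finished search; None while it is still running."""
    return status.get("completion_status")


# Illustrative status body, modeled on the running search shown in 4.1.
status = {
    "is_running": True,
    "is_partial": True,
    "start_time_in_millis": 1628662256012,
    "expiration_time_in_millis": 1629094256012,
    "_shards": {"total": 1, "successful": 0, "skipped": 0, "failed": 0},
}
print(f"{shard_progress(status):.0%} of shards done, "
      f"result kept for {retention_ms(status) / 86_400_000:.0f} days, "
      f"completion_status={completion_status(status)}")
```

For both responses in 4.1 the gap between start_time_in_millis and expiration_time_in_millis is 432,000,000 ms, i.e. 5 days, which matches the default keep_alive of 5d.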

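4.4 Deleting / cancelling an asynchronous search

If the search is still running, deleting it cancels it; otherwise, its saved results are deleted: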

DELETE /_async_search/FjFoeU8xMHJKUW9pd1dzN1g2Rm9wOGcedFJCVnRVSVhSdVM0emN2YXZfTU9ZQToyNjYyNjk5
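The delete call can be built the same way from code. A minimal standard-library sketch; the address and the helper name build_delete_request are hypothetical, and the request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request

# Hypothetical cluster address; replace with your own.
BASE_URL = "http://localhost:9200"


def build_delete_request(search_id: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a DELETE /_async_search/<id> request (cancels the search if still running)."""
    # Quote the id so characters such as '=' are safe in the URL path.
    url = f"{base_url}/_async_search/{urllib.parse.quote(search_id, safe='')}"
    return urllib.request.Request(url, method="DELETE")


req = build_delete_request("abc123")
print(req.get_method(), req.full_url)
```

Sending it with urllib.request.urlopen(req) should return {"acknowledged": true}; an unknown or already expired id typically yields a 404, which urlopen raises as urllib.error.HTTPError.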

5. Official documentation

https://www.elastic.co/guide/en/elasticsearch/reference/current/async-search.html

6. Summary

For the certification exam, it is enough to understand asynchronous search: know where to find the official documentation and which APIs are available.

In real business projects, adopt it according to your business needs. It is best suited to businesses with large data volumes, where a traditional synchronous request gives a poor experience.

Posted by Stagnate on Thu, 11 Nov 2021 22:49:02 -0800