5. Search
Elasticsearch (ES) provides two ways to search:
- REST request URI: the search parameters are passed directly in the URI;
- Request body: the search parameters are wrapped in a JSON request body, which gives a more readable format.
A REST-style URI search looks like this:
curl -X GET "localhost:9200/bank/_search?q=*&sort=account_number:asc&pretty"
The search uses _search as its endpoint, followed by several parameters:
- q=*: matches all documents in the index;
- sort=account_number:asc: sorts the results in ascending order by the account_number field of each document;
- pretty: asks ES to return pretty-printed JSON (the response is hard to read without it).
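As a small illustration of how these parameters fit together, the following Python sketch assembles the same URI. The helper name build_uri_search is hypothetical, and the host and index are simply the ones used in the curl example; nothing is sent over the network.

```python
from urllib.parse import urlencode

# Hypothetical helper (not part of any ES client): builds a URI-search URL.
def build_uri_search(host, index, query="*", sort=None, pretty=True):
    params = {"q": query}
    if sort:
        params["sort"] = sort
    if pretty:
        params["pretty"] = "true"
    # Keep '*' and ':' unescaped so the URL reads like the curl example.
    return f"http://{host}/{index}/_search?" + urlencode(params, safe="*:")

url = build_uri_search("localhost:9200", "bank", sort="account_number:asc")
print(url)
```

The printed URL can be passed to curl -X GET as in the command above.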
The URI search above returns the following result:
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1000,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "0",
        "_score" : null,
        "_source" : {
          "account_number" : 0,
          "balance" : 16623,
          "firstname" : "Bradshaw",
          "lastname" : "Mckenzie",
          "age" : 29,
          "gender" : "F",
          "address" : "244 Columbus Place",
          "employer" : "Euron",
          "email" : "bradshawmckenzie@euron.com",
          "city" : "Hobucken",
          "state" : "CO"
        },
        "sort" : [ 0 ]
      },
      // Records with id 1 to 8 are omitted
      {
        "_index" : "bank",
        "_type" : "_doc",
        "_id" : "9",
        "_score" : null,
        "_source" : {
          "account_number" : 9,
          "balance" : 24776,
          "firstname" : "Opal",
          "lastname" : "Meadows",
          "age" : 39,
          "gender" : "M",
          "address" : "963 Neptune Avenue",
          "employer" : "Cedward",
          "email" : "opalmeadows@cedward.com",
          "city" : "Olney",
          "state" : "OH"
        },
        "sort" : [ 9 ]
      }
    ]
  }
}
The response contains the following fields:
- took: the time, in milliseconds, that ES spent executing the search; "took": 1 means the search above took 1 millisecond;
- timed_out: indicates whether the search timed out;
- _shards: total is the number of shards searched, and successful/failed are the numbers of shards on which the search succeeded/failed;
- hits: the search results;
- hits.total: the total number of documents that match the search criteria;
- hits.hits: the array of actual search results (by default only the first 10 documents);
- hits.sort: the sort key of each result (absent when sorting by score);
- hits._score and max_score: ignore these for now.
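To make the field list concrete, here is a minimal Python sketch that reads those fields out of a response. The response dict is a trimmed copy of the result shown earlier, and summarize is a hypothetical helper. The branch for a dict-shaped total is an assumption about newer ES versions, which report the total as an object rather than a plain number.

```python
# Trimmed copy of the response shown above (only two hits, fewer fields).
response = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 5, "successful": 5, "skipped": 0, "failed": 0},
    "hits": {
        "total": 1000,
        "max_score": None,
        "hits": [
            {"_id": "0", "_score": None,
             "_source": {"account_number": 0, "balance": 16623}, "sort": [0]},
            {"_id": "9", "_score": None,
             "_source": {"account_number": 9, "balance": 24776}, "sort": [9]},
        ],
    },
}

def summarize(resp):
    """Hypothetical helper: pull the commonly used fields out of a response."""
    hits = resp["hits"]
    total = hits["total"]
    # Assumption: newer versions return {"value": N, "relation": "..."} here.
    if isinstance(total, dict):
        total = total["value"]
    return {
        "took_ms": resp["took"],
        "timed_out": resp["timed_out"],
        "failed_shards": resp["_shards"]["failed"],
        "total": total,
        "ids": [h["_id"] for h in hits["hits"]],
        "sort_keys": [h.get("sort") for h in hits["hits"]],
    }

summary = summarize(response)
print(summary)
```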
Besides searching directly by URI, you can also wrap the search criteria in a JSON request body, as follows:
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
// The same request with curl
curl -X GET "localhost:9200/bank/_search" -H 'Content-Type: application/json' -d'
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ]
}
'
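The same request can be assembled from Python's standard library, which may help when the body is built dynamically. This is only a sketch: build_search_request is a hypothetical helper, and it constructs the request object without sending it, so it runs without a cluster.

```python
import json
from urllib.request import Request

# Hypothetical helper: wraps a search body in an HTTP request object.
def build_search_request(host, index, body):
    data = json.dumps(body).encode("utf-8")
    return Request(
        f"http://{host}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="GET",  # same verb as the curl example
    )

body = {"query": {"match_all": {}}, "sort": [{"account_number": "asc"}]}
req = build_search_request("localhost:9200", "bank", body)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Against a running cluster, passing req to urllib.request.urlopen would send it.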
In the search body, query holds the query definition, and match_all is the type of query to run: it matches every document in the specified index. To match on specific criteria, use match instead:
// Match only documents with account_number=20
GET /bank/_search
{
  "query": { "match": { "account_number": 20 } }
}
// Match documents whose address contains the word mill (case-insensitive)
GET /bank/_search
{
  "query": { "match": { "address": "mill" } }
}
// Match documents whose address contains the word mill or lane (case-insensitive)
GET /bank/_search
{
  "query": { "match": { "address": "mill lane" } }
}
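The "any of the words, case-insensitive" behaviour of match can be imitated locally. The sketch below is a rough stand-in, not how ES works internally: tokenize assumes lowercasing plus whitespace splitting, and the sample documents are made up for illustration.

```python
def tokenize(text):
    # Rough stand-in for text analysis: lowercase, split on whitespace.
    return str(text).lower().split()

def match(doc, field, query):
    # True if the field contains at least one of the query terms.
    terms = set(tokenize(doc.get(field, "")))
    return any(term in terms for term in tokenize(query))

# Made-up sample documents.
docs = [
    {"account_number": 20, "address": "282 Kings Place"},
    {"account_number": 21, "address": "990 Mill Road"},
    {"account_number": 22, "address": "198 Mill Lane"},
    {"account_number": 23, "address": "72 Dover Lane"},
]

by_number = [d["account_number"] for d in docs if match(d, "account_number", 20)]
mill = [d["account_number"] for d in docs if match(d, "address", "mill")]
mill_or_lane = [d["account_number"] for d in docs if match(d, "address", "mill lane")]
print(by_number, mill, mill_or_lane)
```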
match_phrase is a variant of match and is used as follows:
// Match documents whose address contains the phrase "mill lane", rather than either word as above
GET /bank/_search
{
  "query": { "match_phrase": { "address": "mill lane" } }
}
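The difference from match is that the words must appear next to each other and in order. A local sketch of that idea follows, under the same simplifying assumption about tokenization (lowercase, whitespace split) and with made-up addresses.

```python
def tokenize(text):
    # Rough stand-in for text analysis: lowercase, split on whitespace.
    return str(text).lower().split()

def match_phrase(doc, field, phrase):
    # True only if the phrase's terms occur consecutively, in order.
    tokens = tokenize(doc.get(field, ""))
    wanted = tokenize(phrase)
    n = len(wanted)
    return n > 0 and any(
        tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1)
    )

# Made-up sample addresses.
exact = match_phrase({"address": "198 Mill Lane"}, "address", "mill lane")
only_mill = match_phrase({"address": "990 Mill Road"}, "address", "mill lane")
reversed_words = match_phrase({"address": "12 Lane Mill Road"}, "address", "mill lane")
print(exact, only_mill, reversed_words)
```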
Besides the match queries above, query can also hold a bool query, which uses Boolean logic to compose smaller queries into larger ones, for example:
// Match documents whose address contains both mill and lane
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
Every clause listed under must in a bool query has to match (that is, all of them evaluate to true) for a document to be returned; otherwise the document is excluded. Conversely, must_not lists clauses that must not match: a document that matches any of them is excluded:
// Match documents whose address contains neither mill nor lane
GET /bank/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
Besides must, bool also supports should. When should is the only clause type in the bool query, a document matches as long as at least one of the should clauses matches:
// Match documents with mill or lane in address
GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
A bool query can combine these clauses to express multilevel logic. The following bool query combines several of them:
// Match records with age=40, state not containing "ID", and address containing "mill" or "lane"
GET /bank/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "age": 40 } }
      ],
      "must_not": [
        { "match": { "state": "ID" } }
      ],
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "lane" } }
      ]
    }
  }
}
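[Question] The should condition above does not seem to take effect. A likely explanation: when a bool query also contains a must (or filter) clause, its should clauses are optional by default and only raise the score of the documents they match; adding "minimum_should_match": 1 to the bool query makes at least one should clause required.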
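How must, must_not and should interact can be sketched with a small local evaluator. It is a simplified model rather than ES itself: clauses are (field, text) pairs, the match helper assumes lowercase and whitespace tokenization, filter clauses and scoring are left out, and the documents are made up. The default for minimum_should_match follows the documented rule that should becomes optional once a must clause is present.

```python
def tokenize(text):
    return str(text).lower().split()

def match(doc, field, query):
    terms = set(tokenize(doc.get(field, "")))
    return any(term in terms for term in tokenize(query))

def bool_query(doc, must=(), must_not=(), should=(), minimum_should_match=None):
    if minimum_should_match is None:
        # should is required only when there are no must clauses
        # (filter clauses, which have the same effect, are not modelled).
        minimum_should_match = 1 if should and not must else 0
    if not all(match(doc, f, q) for f, q in must):
        return False
    if any(match(doc, f, q) for f, q in must_not):
        return False
    matched = sum(match(doc, f, q) for f, q in should)
    return matched >= minimum_should_match

clauses = dict(
    must=[("age", 40)],
    must_not=[("state", "ID")],
    should=[("address", "mill"), ("address", "lane")],
)

# Made-up documents.
no_mill = {"age": 40, "state": "TX", "address": "100 Main Street"}
with_mill = {"age": 40, "state": "PA", "address": "72 Mill Road"}
in_idaho = {"age": 40, "state": "ID", "address": "5 Mill Lane"}

default_results = [bool_query(d, **clauses) for d in (no_mill, with_mill, in_idaho)]
strict_results = [
    bool_query(d, minimum_should_match=1, **clauses)
    for d in (no_mill, with_mill, in_idaho)
]
print(default_results, strict_results)
```

In this model the document without mill or lane still matches by default, and is excluded only once minimum_should_match is set to 1.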
Besides query, other parameters can be passed to control the search results, such as sort above. Another example is "size": 1, which limits the returned array to the first result only (size defaults to 10 when not specified):
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": [
    { "account_number": "asc" }
  ],
  "size": 1
}
The from parameter (default 0) specifies the zero-based offset of the first result to return, which naturally pairs with size for paging. The following returns 10 results starting at offset 10, that is, the 11th through 20th results:
GET /bank/_search
{
  "query": { "match_all": {} },
  "from": 10,
  "size": 10
}
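The arithmetic behind from and size is easy to get off by one, so here is a short sketch of it. Both page_params and apply_paging are hypothetical helpers; the latter imitates on a plain list what the two parameters select from the full result set.

```python
def page_params(page, size=10):
    # page is 1-based; "from" is a 0-based offset into the results.
    if page < 1 or size < 1:
        raise ValueError("page and size must be positive")
    return {"from": (page - 1) * size, "size": size}

def apply_paging(results, params):
    # Local imitation of what from/size select out of all results.
    start = params["from"]
    return results[start:start + params["size"]]

all_results = list(range(1000))  # stand-in for 1000 sorted hits
second_page = apply_paging(all_results, page_params(2))
print(page_params(2), second_page)
```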
The following sorts the results in descending order by the balance field and returns the first 10 results (the default size):
GET /bank/_search
{
  "query": { "match_all": {} },
  "sort": { "balance": { "order": "desc" } }
}
By default each result contains every field of the account document, but often only a few are needed, such as lastname and balance:
GET /bank/_search
{
  "query": { "match_all": {} },
  "_source": ["lastname", "balance"]
}
In each returned result, _source then contains only the lastname and balance fields:
{
  "_index": "bank",
  "_type": "_doc",
  "_id": "25",
  "_score": 1,
  "_source": {
    "balance": 40540,
    "lastname": "Ayala"
  }
}
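The effect of listing fields in _source can be pictured as a simple projection of each document. The sketch below applies that idea locally to one of the documents shown earlier; filter_source is a hypothetical helper that handles only plain top-level field names.

```python
def filter_source(source, fields):
    # Keep only the requested top-level fields.
    return {key: value for key, value in source.items() if key in fields}

# Document 0 from the search result shown earlier (some fields omitted).
account = {
    "account_number": 0,
    "balance": 16623,
    "firstname": "Bradshaw",
    "lastname": "Mckenzie",
    "age": 29,
    "state": "CO",
}

projected = filter_source(account, ["lastname", "balance"])
print(projected)
```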
It is important to understand that once the search results are returned, Elasticsearch is completely done with the request and does not maintain any server-side resources or open cursors into the results. This is in sharp contrast to many other platforms such as SQL, where you may initially get a partial subset of the query results up front and then have to keep going back to the server, using some kind of stateful server-side cursor, to fetch (or page through) the rest.
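One consequence is that any paging state lives on the client. The sketch below illustrates this with fake_search, a made-up local stand-in for a single search request: every call carries its own from and size, and nothing is remembered between calls.

```python
def fake_search(index_data, body):
    """Made-up stand-in for one stateless search request."""
    start = body.get("from", 0)
    size = body.get("size", 10)
    ordered = sorted(index_data, key=lambda d: d["account_number"])
    return {"hits": {"total": len(index_data), "hits": ordered[start:start + size]}}

def iter_all(index_data, size=10):
    offset = 0  # the only paging state, kept by the client
    while True:
        resp = fake_search(index_data, {"from": offset, "size": size})
        hits = resp["hits"]["hits"]
        if not hits:
            break
        yield from hits
        offset += size

data = [{"account_number": n} for n in range(25)]
seen = [doc["account_number"] for doc in iter_all(data)]
print(len(seen), seen[:3], seen[-3:])
```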