Common Elasticsearch Operations: Query and Aggregation

Keywords: Big Data less MySQL Java ElasticSearch

0 Description

Based on ES 5.4 and ES 5.6, this article lists the queries I use most often in my own work (where only the Java API is used). For the complete reference, see the official documentation:
https://www.elastic.co/guide/en/elasticsearch/reference/5.4/search.html

1 Query

This section starts with a quick introduction and then lists the queries I use most often in my working environment; the less commonly used ones are not covered here.

1.1 Quick Start

1.1.1 Query All

GET index/type/_search
{
    "query":{
        "match_all":{}
    }
}

or

GET index/type/_search

1.1.2 Paging (using term as an example)

GET index/type/_search
{
    "from":0,
    "size":100,
    "query":{
        "term":{
            "area":"GuangZhou"
        }
    }
}
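
The from/size pair maps directly onto page numbers. Below is a minimal Python sketch, assuming a hypothetical helper named page_query (it is not part of any Elasticsearch client) that only builds the request body as a plain dict:

```python
import json


def page_query(query, page, page_size=100):
    """Build a search body for a 1-based page number (hypothetical helper)."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # number of hits to return
        "query": query,
    }


body = page_query({"term": {"area": "GuangZhou"}}, page=3)
print(json.dumps(body, indent=2))
```

Keep in mind that Elasticsearch rejects requests where from + size exceeds index.max_result_window (10000 by default), so this style of paging is only suitable for the first pages of a result set.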

1.1.3 Returning only the specified fields (using term as an example)

GET index/type/_search
{
    "_source":["hobby", "name"],
    "query":{
        "term":{
            "area":"GuangZhou"
        }
    }
}

1.1.4 Sorting (using term as an example)

Sorting by several fields (user_id ascending, then salary descending):

GET index/type/_search
{
    "query":{
        "term":{
            "area":"GuangZhou"
        }
    },
    "sort":[
        {"user_id":{"order":"asc"}},
        {"salary":{"order":"desc"}}
    ]
}

1.2 Full Text Query

Full-text queries run against analyzed fields: the analyzer (or search analyzer) of each field is applied to the query string before the query is executed.

1.2.1 match query

{
  "query": {
    "match": {
      "content": {
        "query": "Lippi Hengda",
        "operator": "and"
      }
    }
  }
}

operator defaults to or: "Lippi Hengda" is analyzed into "Lippi" and "Hengda", and a document matches as long as content contains either term. With operator set to and, a document matches only if both terms appear.
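
The difference between the two operators can be illustrated with a toy model in Python. This is only a sketch: lowercasing and splitting on whitespace stands in for the real analyzer, and toy_match is a hypothetical function, not an Elasticsearch API.

```python
def toy_match(field_text, query_text, operator="or"):
    """Toy model of match: whitespace + lowercase stands in for the analyzer."""
    field_terms = set(field_text.lower().split())
    query_terms = query_text.lower().split()
    hits = [term in field_terms for term in query_terms]
    # "and" needs every query term, "or" needs at least one.
    return all(hits) if operator == "and" else any(hits)


docs = ["Lippi returns to Hengda", "Lippi coaches the national team"]
for doc in docs:
    print(doc,
          toy_match(doc, "Lippi Hengda", "or"),
          toy_match(doc, "Lippi Hengda", "and"))
```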

1.2.2 match_phrase query

A document matches only if it satisfies both of the following conditions:

  • (1) After analysis, all terms appear in the field.
  • (2) The terms appear in the field in the same order as in the query.

{
  "query": {
    "match_phrase": {
      "content": "Lipi Hengda"
    }
  }
}
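
The two conditions can be sketched as a toy model in Python. It assumes whitespace tokenization in place of the real analyzer and requires the terms to be adjacent, which corresponds to leaving the slop parameter at its default of 0; toy_match_phrase is a hypothetical name.

```python
def toy_match_phrase(field_text, phrase):
    """Toy model: True if the phrase terms occur consecutively and in order."""
    field_terms = field_text.lower().split()
    phrase_terms = phrase.lower().split()
    n = len(phrase_terms)
    # Slide a window of n terms over the field and compare it with the phrase.
    return any(field_terms[i:i + n] == phrase_terms
               for i in range(len(field_terms) - n + 1))


print(toy_match_phrase("Lippi Hengda contract", "Lippi Hengda"))
print(toy_match_phrase("Hengda Lippi contract", "Lippi Hengda"))
print(toy_match_phrase("Lippi leaves Hengda", "Lippi Hengda"))
```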

1.3 Term Query

Term-level queries match exactly against the terms stored in the inverted index. They are used for structured data such as numbers, dates and enumerated values.

1.3.1 term query

{
  "query": {
    "term": {
      "postdate": "2015-12-10 00:41:00"
    }
  }
}

1.3.2 terms query

An extended version of term: for a field such as postdate in the query above, several values can be given at once.

{
  "query": {
    "terms": {
      "postdate": [
        "2015-12-10 00:41:00",
        "2016-02-01 01:39:00"
      ]
    }
  }
}

The values in [] are combined with OR semantics: a document matches if the field equals any one of them. There is no AND variant, and for a single-valued field it would make no sense, because one exact value cannot equal two different terms at the same time.

1.3.3 range query

Matches documents whose numeric, date, or string field falls within a range. Each range clause targets exactly one field, not several.

Numeric:

{
  "query": {
    "range": {
      "reply": {
        "gte": 245,
        "lte": 250
      }
    }
  }
}

The supported operators are as follows:

gt: greater than, gte: greater than or equal to, lt: less than, lte: less than or equal to

Date:

{
  "query": {
    "range": {
      "postdate": {
        "gte": "2016-09-01 00:00:00",
        "lte": "2016-09-30 23:59:59",
        "format": "yyyy-MM-dd HH:mm:ss"
      }
    }
  }
}

format can be omitted as long as the values are written in a date format that the field's mapping accepts.

1.3.4 exists query

Returns documents that have at least one non-null value in the given field, that is, documents in which the field has a value (what counts as a value is shown in the examples below).

{
  "query": {
    "exists": {
      "field": "user"
    }
  }
}

The following explanation is based on the book "From Lucene to Elastic Search: Full Text Retrieval in Practice".

The following documents match the query above:

Document                   Explanation
{"user":"jane"}            The user field exists and is not empty
{"user":""}                The user field exists; an empty string is a non-null value
{"user":"-"}               The user field exists and is not empty
{"user":["jane"]}          The user field exists and is not empty
{"user":["jane",null]}     The user field exists and at least one value is not null

The following documents do not match:

Document                   Explanation
{"user":null}              The user field exists, but its value is null
{"user":[]}                The user field exists, but it holds no values
{"user":[null]}            The user field exists, but its only value is null
{"foo":"bar"}              There is no user field
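
The rule behind both lists can be written down as a small Python predicate. This is a toy model of the documented behaviour, not Elasticsearch code; toy_exists is a hypothetical name, and Python's None plays the role of JSON null.

```python
def toy_exists(doc, field):
    """Toy model of exists: the field holds at least one non-null value."""
    if field not in doc:
        return False
    value = doc[field]
    # Treat a single value as a one-element array.
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)


samples = [
    {"user": "jane"}, {"user": ""}, {"user": ["jane", None]},
    {"user": None}, {"user": []}, {"user": [None]}, {"foo": "bar"},
]
for doc in samples:
    print(doc, toy_exists(doc, "user"))
```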

1.3.5 ids query

Query for documents with the specified id.

{
  "query": {
    "ids": {
      "type": "news",
      "values": ["2101"]
    }
  }
}

type is optional, and multiple IDs can be given as an array:

{
  "query": {
    "ids": {
      "values": [
        "2101",
        "2301"
      ]
    }
  }
}

1.4 Compound Query

1.4.1 bool query

My work with ES is mostly aggregation, statistics and classification projects, which often need complex multi-condition queries, so bool query gets used a lot. Because the number of query conditions is not known in advance, the usual approach is to wrap everything in one large outer bool query. (In the project itself this is done through the Java API.)

bool query can combine any number of simple queries. The logical meaning of each clause is as follows:

Clause     Explanation
must       Documents must match the query conditions under must; equivalent to a logical AND.
should     Documents may or may not match the query conditions under should; equivalent to a logical OR.
must_not   The opposite of must: documents matching the query conditions under this clause are not returned.
filter     Like must, documents must match the query conditions under filter, but filter does not contribute to scoring; it only filters.

An example is as follows:

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "content": "Lippi"
        }
      },
      "must_not": {
        "match": {
          "content": "China Super League"
        }
      }
    }
  }
}

Note that within one bool, each of must, must_not, should and filter can appear only once as a key.

If you want more than one condition under must, for example to match both Lippi and Hengda while deliberately keeping the two keywords as separate clauses (a single must with match and operator set to and would achieve the same result), use an array under must that holds several match objects:

{
  "size": 1,
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "Lippi"
          }
        },
        {
          "match": {
            "content": "Hengda"
          }
        }
      ]
    }
  },
  "sort": [
    {
      "id": {
        "order": "desc"
      }
    }
  ]
}

The array under must can, of course, also hold further bool queries to express more complex conditions.

Apart from size, the query above is equivalent to:

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "content": {
            "query": "Lippi Hengda",
            "operator": "and"
          }
        }
      }
    }
  },
  "sort": [
    {
      "id": {
        "order": "desc"
      }
    }
  ]
}
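
When the number of conditions is not known in advance, the outer bool can be assembled programmatically. The Python sketch below only builds the JSON body; build_bool is a hypothetical helper, and the gender filter is an illustrative condition rather than part of the examples above.

```python
import json


def build_bool(must=(), should=(), must_not=(), filter_=()):
    """Assemble a bool query from any number of clauses (hypothetical helper)."""
    occurrences = {"must": must, "should": should,
                   "must_not": must_not, "filter": filter_}
    # Each occurrence type appears once as a key; its clauses go into an array.
    # Empty occurrence types are left out of the body.
    return {"bool": {key: list(clauses)
                     for key, clauses in occurrences.items() if clauses}}


conditions = [{"match": {"content": "Lippi"}},
              {"match": {"content": "Hengda"}}]
query = build_bool(
    must=conditions,
    must_not=[{"match": {"content": "China Super League"}}],
    # A nested bool inside filter: gender is 1 OR gender is 2.
    filter_=[build_bool(should=[{"term": {"gender": 1}},
                                {"term": {"gender": 2}}])],
)
print(json.dumps({"query": query}, indent=2))
```

Always emitting arrays keeps the builder uniform: a single condition and ten conditions produce the same shape.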

1.5 Nested Query

First create the following index:

PUT /my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user":{
          "type": "nested",
          "properties": {
            "first":{"type":"keyword"},
            "last":{"type":"keyword"}
          }
        },
        "group":{
          "type": "keyword"
        }
      }
    }
  }
}

Add data:

PUT my_index/my_type/1
{
  "group":"GuangZhou",
  "user":[
    {
      "first":"John",
      "last":"Smith"
    },
    {
      "first":"Alice",
      "last":"White"
    }
  ]
}

PUT my_index/my_type/2
{
  "group":"QingYuan",
  "user":[
    {
      "first":"Li",
      "last":"Wang"
    },
    {
      "first":"Yonghao",
      "last":"Ye"
    }
  ]
}

Queries:

A simple query:

{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "term": {
          "user.first": "John"
        }
      }
    }
  }
}

A more complex query:

{
  "query": {
    "bool": {
      "must": [
        {"nested": {
          "path": "user",
          "query": {
            "term": {
              "user.first": {
                "value": "Li"
              }
            }
          }
        }},
        {
          "nested": {
            "path": "user",
            "query": {
              "term": {
                "user.last": {
                  "value": "Wang"
                }
              }
            }
          }
        }
      ]
    }
  }
}
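
One point worth checking in the complex query: the two nested clauses are evaluated independently, so a document matches when some user has first "Li" and some user (possibly a different one) has last "Wang". To require both values on the same user object, both term conditions go inside a single nested clause. The Python sketch below is a toy model of that difference; doc3 and the helper names are hypothetical.

```python
def any_user(doc, **conditions):
    """True if a single nested user object satisfies all conditions."""
    return any(all(user.get(k) == v for k, v in conditions.items())
               for user in doc["user"])


def two_nested_clauses(doc):
    # Mirrors bool/must with two separate nested queries.
    return any_user(doc, first="Li") and any_user(doc, last="Wang")


def one_nested_clause(doc):
    # Mirrors one nested query whose inner bool/must holds both terms.
    return any_user(doc, first="Li", last="Wang")


doc2 = {"user": [{"first": "Li", "last": "Wang"},
                 {"first": "Yonghao", "last": "Ye"}]}
# Hypothetical document: the two values sit on different user objects.
doc3 = {"user": [{"first": "Li", "last": "Ye"},
                 {"first": "Yonghao", "last": "Wang"}]}

# Request body for the same-object variant.
same_object_query = {"query": {"nested": {
    "path": "user",
    "query": {"bool": {"must": [
        {"term": {"user.first": "Li"}},
        {"term": {"user.last": "Wang"}},
    ]}},
}}}

print(two_nested_clauses(doc3), one_nested_clause(doc3))
```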

1.6 Supplement: Array Query and Testing

Create an index:

PUT my_index2
{
  "mappings": {
    "my_type2":{
      "properties": {
        "message":{
          "type": "text"
        },
        "keywords":{
          "type": "keyword"
        }
      }
    }
  }
}

Add data:

PUT /my_index2/my_type2/1
{
  "message":"keywords test1",
  "keywords":["Beauty","Comic","Film"]
}

PUT /my_index2/my_type2/2
{
  "message":"keywords test2",
  "keywords":["Film","Beauty makeup","Advertisement"]
}

Search:

{
  "query": {
    "term": {
      "keywords": "Advertisement"
    }
  }
}

Note 1: keywords is mapped as keyword, so a term query matches it exactly. If it were mapped as text, a term query would not necessarily match: it succeeds only when the analyzer produces a term identical to the search value. With the default analyzer the value is broken into separate tokens (for Chinese text, one character each), so a term query for the whole value finds nothing. Pay particular attention to this.

Note 2: Bucket aggregation also works on array fields. Each element of the array is treated as a separate value and grouped on its own, rather than the array as a whole. You can verify this with the test data above. The field must not be of type text, though; otherwise the aggregation fails, because fielddata is disabled on text fields by default.

Note 3: Given the above, plain arrays are well suited to tag-like data, as in this example, with the field mapped as keyword rather than text so that searches match exactly.
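
The grouping behaviour described in Note 2 can be illustrated with a toy model in Python using the two test documents. This is only a sketch of how a terms aggregation counts documents per array element; toy_terms_agg is a hypothetical name.

```python
from collections import Counter

docs = [
    {"keywords": ["Beauty", "Comic", "Film"]},
    {"keywords": ["Film", "Beauty makeup", "Advertisement"]},
]


def toy_terms_agg(docs, field):
    """Toy model of a terms aggregation: count documents per distinct value."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        # Every element is its own bucket key; a document counts once per key.
        counts.update(set(values))
    return counts.most_common()


for key, doc_count in toy_terms_agg(docs, "keywords"):
    print(key, doc_count)
```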

2 Aggregation

2.1 Metrics Aggregation

Equivalent to the aggregate functions of MySQL.

max

{
  "size": 0,
  "aggs": {
    "max_id": {
      "max": {
        "field": "id"
      }
    }
  }
}

If size is not set to 0, the matching documents (10 hits by default) are returned along with the aggregation results.

min

{
  "size": 0,
  "aggs": {
    "min_id": {
      "min": {
        "field": "id"
      }
    }
  }
}

avg

{
  "size": 0,
  "aggs": {
    "avg_id": {
      "avg": {
        "field": "id"
      }
    }
  }
}

sum

{
  "size": 0,
  "aggs": {
    "sum_id": {
      "sum": {
        "field": "id"
      }
    }
  }
}

stats

{
  "size": 0,
  "aggs": {
    "stats_id": {
      "stats": {
        "field": "id"
      }
    }
  }
}
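
stats bundles the previous metrics into one response: count, min, max, avg and sum. The Python sketch below is a toy equivalent over a plain list of numbers; toy_stats is a hypothetical name, and the handling of an empty input is a choice made for this sketch only.

```python
def toy_stats(values):
    """Toy equivalent of the stats aggregation over a list of numbers."""
    values = list(values)
    if not values:
        # Choice made for this sketch: no values means no min, max or avg.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "avg": total / len(values),
        "sum": total,
    }


print(toy_stats([2, 4, 9]))
```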

2.2 Bucket Aggregation

Equivalent to GROUP BY in MySQL. Do not try to run a bucket aggregation on a text field in ES; it will fail.

Terms

Equivalent to a grouped query: documents are aggregated by the values of a field.

{
  "size": 0,
  "aggs": {
    "per_count": {
      "terms": {
        "size":100,
        "field": "vtype",
        "min_doc_count":1
      }
    }
  }
}

Metrics aggregations can also be nested inside a bucket aggregation, which is equivalent to running GROUP BY in MySQL and then computing max, min, avg, sum, stats and so on for each group:

{
  "size": 0,
  "aggs": {
    "per_count": {
      "terms": {
        "field": "vtype"
      },
      "aggs": {
        "stats_follower": {
          "stats": {
            "field": "realFollowerCount"
          }
        }
      }
    }
  }
}

Filter

Equivalent to filtering rows with a WHERE condition in MySQL and then computing max, min, avg, sum, stats and so on over the result.

{
  "size": 0,
  "aggs": {
    "gender_1_follower": {
      "filter": {
        "term": {
          "gender": 1
        }
      },
      "aggs": {
        "stats_follower": {
          "stats": {
            "field": "realFollowerCount"
          }
        }
      }
    }
  }
}

The aggregation above computes the stats of realFollowerCount for documents whose gender is 1.

Filters

Building on Filter, Filters takes several filter conditions and computes the metrics separately for the documents matched by each one.

{
  "size": 0,
  "aggs": {
    "gender_1_2_follower": {
      "filters": {
        "filters": [
          {
            "term": {
              "gender": 1
            }
          },
          {
            "term": {
              "gender": 2
            }
          }
        ]
      },
      "aggs": {
        "stats_follower": {
          "stats": {
            "field": "realFollowerCount"
          }
        }
      }
    }
  }
}

Range

{
  "size": 0,
  "aggs": {
    "follower_ranges": {
      "range": {
        "field": "realFollowerCount",
        "ranges": [
          {
            "to": 500
          },
          {
            "from": 500,
            "to": 1000
          },
          {
            "from": 1000,
            "to": 1500
          },
          {
            "from": 1500,
            "to": 2000
          },
          {
            "from": 2000
          }
        ]
      }
    }
  }
}

from is inclusive (greater than or equal to); to is exclusive (less than).
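
The boundary rule matters for values that sit exactly on a boundary such as 500. The Python sketch below is a toy model that assigns a value to the ranges from the example; toy_range_buckets is a hypothetical name.

```python
RANGES = [
    {"to": 500},
    {"from": 500, "to": 1000},
    {"from": 1000, "to": 1500},
    {"from": 1500, "to": 2000},
    {"from": 2000},
]


def toy_range_buckets(value, ranges=RANGES):
    """Return every range that contains value (from inclusive, to exclusive)."""
    matched = []
    for r in ranges:
        lower_ok = "from" not in r or value >= r["from"]
        upper_ok = "to" not in r or value < r["to"]
        if lower_ok and upper_ok:
            matched.append(r)
    return matched


for v in (499, 500, 1999, 2000):
    print(v, toy_range_buckets(v))
```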

Date Range

Similar to Range, except that the field is of date type and the range boundaries are dates.
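
As an illustration, a date_range request body can be generated from a list of boundary dates. The Python sketch below is an assumption-laden example: the helper date_range_agg and the aggregation name postdate_ranges are hypothetical, and the postdate field and its format are borrowed from the range query in section 1.3.3.

```python
import json


def date_range_agg(field, boundaries, fmt="yyyy-MM-dd HH:mm:ss"):
    """Build a date_range aggregation from sorted boundary dates (hypothetical helper)."""
    if not boundaries:
        raise ValueError("at least one boundary is required")
    ranges = [{"to": boundaries[0]}]                    # everything before the first boundary
    ranges += [{"from": lo, "to": hi}                   # consecutive closed-open intervals
               for lo, hi in zip(boundaries, boundaries[1:])]
    ranges.append({"from": boundaries[-1]})             # everything from the last boundary on
    return {"date_range": {"field": field, "format": fmt, "ranges": ranges}}


body = {
    "size": 0,
    "aggs": {
        "postdate_ranges": date_range_agg(
            "postdate", ["2016-09-01 00:00:00", "2016-10-01 00:00:00"]),
    },
}
print(json.dumps(body, indent=2))
```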

Posted by cheshil on Sat, 18 May 2019 03:00:15 -0700