ES Learning Notes 4 - Filtering and Aggregation in ES

6. Filtering

So far we have ignored the _score field of each hit and the max_score field in the search results. Both are relative measures of how well a document matches the search query: the higher the score, the better the match. Queries do not always need to produce scores, however, especially when they are only used to filter the document set. Elasticsearch detects these cases and automatically optimizes query execution so that it does not compute useless scores. A bool query accepts a filter clause for this purpose, and any query, such as a range query, can be placed inside it. For example:

// Filter for documents with 20000 <= balance <= 30000
GET /bank/_search
{
  "query": {
    "bool": {
      "must": { "match_all": {} },
      "filter": {
        "range": {
          "balance": {
            "gte": 20000,
            "lte": 30000
          }
        }
      }
    }
  }
}
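When the request is sent from application code rather than from the console, the same body can be built as a plain dictionary. The following is a minimal Python sketch: the helper name balance_filter_query is hypothetical, and the snippet only constructs and prints the JSON body without contacting a cluster.

```python
import json


def balance_filter_query(low, high):
    """Build a bool query whose filter clause keeps low <= balance <= high.

    Hypothetical helper for illustration; it mirrors the request above.
    """
    return {
        "query": {
            "bool": {
                # match_all selects every document ...
                "must": {"match_all": {}},
                # ... and the filter clause narrows the set without
                # changing how scores are computed.
                "filter": {
                    "range": {"balance": {"gte": low, "lte": high}}
                },
            }
        }
    }


body = balance_filter_query(20000, 30000)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the console after GET /bank/_search, or passed to whichever client library is in use.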

7. Aggregation operations

Aggregations provide the ability to group data and compute statistics over it (roughly comparable to GROUP BY and the aggregate functions in SQL). In Elasticsearch, a single search can return both hits and aggregation results, and the aggregation results are kept separate from the hits in the response. Queries and multiple aggregations can therefore be run in one request and all of their results obtained at once, which avoids extra network round trips while still using one concise API.

The following request groups the accounts by state:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      }
    }
  }
}

The aggs key above specifies the aggregations to run. Setting "size": 0 leaves the hits array in the response empty, which makes the aggregation results easier to read. By default a terms aggregation returns only the top 10 buckets, ordered by descending document count. The result is:

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1000,
    "max_score": 0,
    // The hits array is empty because of "size": 0.
    "hits": []
  },
  "aggregations": {
    "group_by_state": {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets": [
        {
          "key": "ID",
          "doc_count": 27
        },
        // 8 buckets omitted...
        {
          "key": "MO",
          "doc_count": 20
        }
      ]
    }
  }
}
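To read such a response from code, the buckets can be turned into a simple mapping from key to document count. The sketch below is plain Python over a dictionary that copies only the two buckets shown above (the eight omitted buckets are left out, not invented); the function name bucket_counts is hypothetical.

```python
# Trimmed copy of the response above; only the two visible buckets are kept.
response = {
    "hits": {"total": 1000, "max_score": 0, "hits": []},
    "aggregations": {
        "group_by_state": {
            "doc_count_error_upper_bound": 20,
            "sum_other_doc_count": 770,
            "buckets": [
                {"key": "ID", "doc_count": 27},
                {"key": "MO", "doc_count": 20},
            ],
        }
    },
}


def bucket_counts(resp, agg_name):
    """Map each bucket key of a terms aggregation to its doc_count."""
    buckets = resp["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


counts = bucket_counts(response, "group_by_state")
print(counts)

# sum_other_doc_count is the number of documents that fell into
# buckets not included in the response.
print(response["aggregations"]["group_by_state"]["sum_other_doc_count"])
```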

The aggregations key in the response holds the aggregation results (note that it is separate from hits). For example, 27 accounts are in the state "ID" (Idaho). The next request computes the average account balance for each state:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword"
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}

In the request above, the average_balance aggregation is nested inside the group_by_state aggregation, which is a very common pattern. In practice, aggregations can be nested arbitrarily to extract the information required. The aggregation results are as follows (the rest of the response is omitted):

  "aggregations": {
    "group_by_state": {
      "doc_count_error_upper_bound": 20,
      "sum_other_doc_count": 770,
      "buckets": [
        {
          "key": "ID",
          "doc_count": 27,
          "average_balance": {
            "value": 24368.777777777777
          }
        },
        // 8 buckets omitted...
        {
          "key": "MO",
          "doc_count": 20,
          "average_balance": {
            "value": 24151.8
          }
        }
      ]
    }
  }
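A nested metric appears inside each bucket under the name given in the request, with the number itself under "value". As a small illustration, the Python sketch below pulls the averages out of the two buckets shown above; the helper name average_by_key is hypothetical.

```python
# The two buckets shown above; the omitted ones are not reproduced.
group_by_state = {
    "buckets": [
        {"key": "ID", "doc_count": 27,
         "average_balance": {"value": 24368.777777777777}},
        {"key": "MO", "doc_count": 20,
         "average_balance": {"value": 24151.8}},
    ]
}


def average_by_key(agg, metric_name):
    """Map each bucket key to the value of a nested single-value metric."""
    return {b["key"]: b[metric_name]["value"] for b in agg["buckets"]}


averages = average_by_key(group_by_state, "average_balance")
for state, avg in averages.items():
    print(f"{state}: {avg:.2f}")
```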

To sort the states by average account balance in descending order, refer to the nested aggregation by name in the order option:

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_state": {
      "terms": {
        "field": "state.keyword",
        "order": {
          "average_balance": "desc"
        }
      },
      "aggs": {
        "average_balance": {
          "avg": {
            "field": "balance"
          }
        }
      }
    }
  }
}
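For intuition, the effect of the order option resembles sorting the buckets by the nested metric. The following Python sketch does this on the client side for the two buckets shown earlier; it is an illustration of the ordering only, and sort_by_metric is a hypothetical helper, not part of any Elasticsearch client.

```python
# Two buckets from the earlier result, deliberately listed out of order.
buckets = [
    {"key": "MO", "doc_count": 20,
     "average_balance": {"value": 24151.8}},
    {"key": "ID", "doc_count": 27,
     "average_balance": {"value": 24368.777777777777}},
]


def sort_by_metric(buckets, metric_name, descending=True):
    """Return the buckets sorted by the value of a nested metric."""
    return sorted(
        buckets,
        key=lambda b: b[metric_name]["value"],
        reverse=descending,
    )


ordered = [b["key"] for b in sort_by_metric(buckets, "average_balance")]
print(ordered)
```

The two approaches are not interchangeable: ordering inside the request decides which buckets make it into the returned top 10, whereas sorting on the client can only rearrange the buckets that were already returned.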

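The following request groups accounts by age bracket (20-29, 30-39, 40-49), then by gender, and finally computes the average account balance for each combination of bracket and gender. In a range aggregation, from is inclusive and to is exclusive, so "from": 20, "to": 30 covers ages 20 through 29: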

GET /bank/_search
{
  "size": 0,
  "aggs": {
    "group_by_age": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 20,
            "to": 30
          },
          {
            "from": 30,
            "to": 40
          },
          {
            "from": 40,
            "to": 50
          }
        ]
      },
      "aggs": {
        "group_by_gender": {
          "terms": {
            "field": "gender.keyword"
          },
          "aggs": {
            "average_balance": {
              "avg": {
                "field": "balance"
              }
            }
          }
        }
      }
    }
  }
}

The aggregation results returned are as follows:

  "aggregations": {
    "group_by_age": {
      "buckets": [
        {
          "key": "20.0-30.0",
          "from": 20,
          "to": 30,
          "doc_count": 451,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "M",
                "doc_count": 232,
                "average_balance": {
                  "value": 27374.05172413793
                }
              },
              {
                "key": "F",
                "doc_count": 219,
                "average_balance": {
                  "value": 25341.260273972603
                }
              }
            ]
          }
        },
        {
          "key": "30.0-40.0",
          "from": 30,
          "to": 40,
          "doc_count": 504,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "F",
                "doc_count": 253,
                "average_balance": {
                  "value": 25670.869565217392
                }
              },
              {
                "key": "M",
                "doc_count": 251,
                "average_balance": {
                  "value": 24288.239043824702
                }
              }
            ]
          }
        },
        {
          "key": "40.0-50.0",
          "from": 40,
          "to": 50,
          "doc_count": 45,
          "group_by_gender": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
              {
                "key": "M",
                "doc_count": 24,
                "average_balance": {
                  "value": 26474.958333333332
                }
              },
              {
                "key": "F",
                "doc_count": 21,
                "average_balance": {
                  "value": 27992.571428571428
                }
              }
            ]
          }
        }
      ]
    }
  }
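A two-level result like this is often easier to work with as flat rows. The Python sketch below copies the bucket keys, counts and averages from the response above and flattens them into (age bracket, gender, count, average) tuples; the function name flatten_age_gender is hypothetical. It also checks that the gender counts in each bracket add up to the bracket's own doc_count.

```python
def gender_bucket(key, count, avg):
    """Build one gender bucket in the shape returned above."""
    return {"key": key, "doc_count": count, "average_balance": {"value": avg}}


# Values copied from the response above.
group_by_age = {
    "buckets": [
        {"key": "20.0-30.0", "doc_count": 451, "group_by_gender": {"buckets": [
            gender_bucket("M", 232, 27374.05172413793),
            gender_bucket("F", 219, 25341.260273972603),
        ]}},
        {"key": "30.0-40.0", "doc_count": 504, "group_by_gender": {"buckets": [
            gender_bucket("F", 253, 25670.869565217392),
            gender_bucket("M", 251, 24288.239043824702),
        ]}},
        {"key": "40.0-50.0", "doc_count": 45, "group_by_gender": {"buckets": [
            gender_bucket("M", 24, 26474.958333333332),
            gender_bucket("F", 21, 27992.571428571428),
        ]}},
    ]
}


def flatten_age_gender(age_agg):
    """Flatten age -> gender buckets into (age, gender, count, average) rows."""
    rows = []
    for age in age_agg["buckets"]:
        for gender in age["group_by_gender"]["buckets"]:
            rows.append((age["key"], gender["key"], gender["doc_count"],
                         gender["average_balance"]["value"]))
    return rows


rows = flatten_age_gender(group_by_age)
for age_key, gender_key, count, avg in rows:
    print(f"{age_key} {gender_key} count={count} avg={avg:.2f}")

# Every account in a bracket falls into exactly one gender bucket here.
for age in group_by_age["buckets"]:
    inner = sum(g["doc_count"] for g in age["group_by_gender"]["buckets"])
    assert inner == age["doc_count"]
```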

Of course, beyond the aggregations shown above, Elasticsearch offers many more that are worth exploring.

Posted by elflacodepr on Fri, 10 May 2019 15:06:48 -0700