ElasticSearch aggregate filtering, similar to having in SQL

Keywords: SQL ElasticSearch Java

ElasticSearch aggregate filtering, similar to having in SQL

Specific note: All content of the article is based on ElasticSerch version 5.5.3

background

In real business scenarios, we encounter the need for aggregation filtering, which requires grouping, aggregation, and filtering through the results of aggregation. There are having or subqueries in relational databases. The key in ES uses bucket_selector to achieve this function

Actual Business Scenario

You need to find customers who place orders more than or equal to 2 orders and whose average order amount is greater than or equal to 100
Data is based on data from a previous blog: https://blog.csdn.net/tuposky/article/details/80988915

  • Corresponding SQL statements in relational databases
SELECT 
    userId,
    AVG(amount) avgAmount,
    count(*) orderCount
FROM type_order
GROUP by userId
HAVING avgAmount >= 100 and orderCount >=2
  • query of ES
GET index_test/type_order/_search
{
  "size": 0,
  "aggs": {
    "groupUserId": {
      "terms": {
        "field": "userId"
      },
      "aggs": {
        "avgAmount": {
          "avg": {
            "field": "amount"
          }
        },
        "having": {
          "bucket_selector": {
            "buckets_path": {
              "orderCount": "_count",
              "avgAmount": "avgAmount"
            },
            "script": {
              "source": "params.avgAmount >= 100 && params.orderCount >=2 "
            }
          }
        }
      }
    }
  }
}
  • Return results
{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 4,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "groupUserId": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": 1000,
          "doc_count": 2,
          "avgAmount": {
            "value": 275
          }
        }
      ]
    }
  }
}
  • Implementation in Java Api
String tremsAlias = "userIdGroup";

//Grouping first. If you don't specify size, the default is 10. Here you can pass an int with a maximum value of 2147483647 and fetch all the data at once
TermsAggregationBuilder termsAgg = AggregationBuilders.terms(tremsAlias).field("userId").size(Integer.MAX_VALUE).order(Terms.Order.term(true));
//Aggregate, count is its own
termsAgg.subAggregation(AggregationBuilders.avg("avgAmount").field("amount"));
//Declare BucketPath for subsequent bucket filtering
Map<String, String> bucketsPathsMap = new HashMap<>(8);
bucketsPathsMap.put("orderCount", "_count");
bucketsPathsMap.put("avgAmount", "avgAmount");
//Setup script
Script script = new Script("params.avgAmount >= 100 && params.orderCount >=2");

//Build a bucket selector
BucketSelectorPipelineAggregationBuilder bs =
        PipelineAggregatorBuilders.bucketSelector("having", bucketsPathsMap, script);

termsAgg.subAggregation(bs);

SearchRequestBuilder sb = client.prepareSearch("index_test").setTypes("type_order");
SearchResponse sr = sb.setSize(0).addAggregation(termsAgg).execute().actionGet();
System.out.println("query Query: ");
System.out.println(sb);

//Get aggregate filtered result data
LongTerms lt = sr.getAggregations().get(tremsAlias);
List<LongTerms.Bucket> buckets = lt.getBuckets();
for (int i = 0; i < buckets.size(); i++) {
    LongTerms.Bucket bucket = buckets.get(i);
    System.out.println("-------------------------");
    System.out.println(bucket.getKey());
    System.out.println("count = " + bucket.getDocCount());
    List<Aggregation> list = bucket.getAggregations().asList();
    for (Aggregation agg : list) {
        if (agg instanceof InternalAvg) {
            InternalAvg ia = bucket.getAggregations().get("avgAmount");
            System.out.println("avgAmount = " + ia.getValue());
        }
    }
    System.out.println("-------------------------");

}

Posted by admun on Sat, 22 Feb 2020 09:40:38 -0800