search (14) - elastic4s - statistical range: global, filter, post filter bucket

Keywords: Scala JSON

Aggregation generally works within the scope of query. The aggregation request without query is actually counted within the query range of match [all {}:

GET /cartxns/_search
{
  "aggs": {
    "all_colors": {
      "terms": {"field" : "color.keyword"}
    }
  }
 }
}

GET /cartxns/_search
{
  "query": {
    "match_all": {}
  }, 
  "aggs": {
    "all_colors": {
      "terms": {"field" : "color.keyword"}
    }
  }
 }
}

The results of the above two requests are the same:

  "aggregations" : {
    "all_colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "red",
          "doc_count" : 4
        },
        {
          "key" : "blue",
          "doc_count" : 2
        },
        {
          "key" : "green",
          "doc_count" : 2
        }
      ]
    }
  }

Although many times we want to make statistics in the query scope, we will also encounter the need to make statistics on the totals without any query conditions. For example, you need to know the average selling price of all car models while calculating the average selling price of a car model. Here, the average price of all car models is a kind of global bucket statistics:

GET /cartxns/_search
{
  "query" : {
    "match" : {"make.keyword": "ford"}
  }
  , "aggs": {
    "avg_ford": {
      "avg": {
        "field": "price"
      }
    },
    "avg_all" : {
      "global": {},
      "aggs": {
        "avg_price": {
          "avg": {"field": "price"}
        }
      }
    }
    
  }

}

Search results and aggregate results are as follows:

 "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.2809337,
    "hits" : [
      {
        "_index" : "cartxns",
        "_type" : "_doc",
        "_id" : "NGVXAnIBSDa1Wo5UqLc3",
        "_score" : 1.2809337,
        "_source" : {
          "price" : 30000,
          "color" : "green",
          "make" : "ford",
          "sold" : "2014-05-18"
        }
      },
      {
        "_index" : "cartxns",
        "_type" : "_doc",
        "_id" : "OWVYAnIBSDa1Wo5UTrf8",
        "_score" : 1.2809337,
        "_source" : {
          "price" : 25000,
          "color" : "blue",
          "make" : "ford",
          "sold" : "2014-02-12"
        }
      }
    ]
  },
  "aggregations" : {
    "avg_all" : {
      "doc_count" : 8,
      "avg_price" : {
        "value" : 26500.0
      }
    },
    "avg_ford" : {
      "value" : 27500.0
    }
  }

Expressed in elastic4s:

 val aggGlob = search("cartxns").query(
    matchQuery("make.keyword","ford")
  ).aggregations(
    avgAggregation("single_avg").field("price"),
    globalAggregation("all_avg").subaggs(
        avgAggregation("avg_price").field("price")
    )
  )
  println(aggGlob.show)

  val globResult = client.execute(aggGlob).await

  if (globResult.isSuccess) {
    val gavg = globResult.result.aggregations.global("all_avg").avg("avg_price")
    val savg = globResult.result.aggregations.avg("single_avg")
    println(s"${savg.value},${gavg.value}")
    globResult.result.hits.hits.foreach(h => println(s"${h.sourceAsMap}"))
  } else println(s"error: ${globResult.error.causedBy.getOrElse("unknown")}")

...

POST:/cartxns/_search?
StringEntity({"query":{"match":{"make.keyword":{"query":"ford"}}},"aggs":{"single_avg":{"avg":{"field":"price"}},"all_avg":{"global":{},"aggs":{"avg_price":{"avg":{"field":"price"}}}}}},Some(application/json))
27500.0,26500.0
Map(price -> 30000, color -> green, make -> ford, sold -> 2014-05-18)
Map(price -> 25000, color -> blue, make -> ford, sold -> 2014-02-12)

The function of filter bucket is to perform statistics after filtering in query results. For example: query all Hongda car payment transactions, but only count the monthly sales of Hongda:

GET /cartxns/_search
{
    "query": {
      "match": {
        "make.keyword": "honda"
      }
    },
    "aggs": {
      "sales_this_month": {
        "filter": {
          "range" : {"sold" : { "from" : "2014-10-01", "to" : "2014-11-01" }}
        },
        "aggs": {
          "month_total": {
            "sum": {"field": "price"}
          }
        }
      }
    }
}

First, the query results should not be affected. At the same time, the sales volume of the car payment in a certain month of the query result is also obtained:

 "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 0.9444616,
    "hits" : [
      {
        "_index" : "cartxns",
        "_type" : "_doc",
        "_id" : "MmVXAnIBSDa1Wo5UqLc3",
        "_score" : 0.9444616,
        "_source" : {
          "price" : 10000,
          "color" : "red",
          "make" : "honda",
          "sold" : "2014-10-28"
        }
      },
      {
        "_index" : "cartxns",
        "_type" : "_doc",
        "_id" : "M2VXAnIBSDa1Wo5UqLc3",
        "_score" : 0.9444616,
        "_source" : {
          "price" : 20000,
          "color" : "red",
          "make" : "honda",
          "sold" : "2014-11-05"
        }
      },
      {
        "_index" : "cartxns",
        "_type" : "_doc",
        "_id" : "N2VXAnIBSDa1Wo5UqLc3",
        "_score" : 0.9444616,
        "_source" : {
          "price" : 20000,
          "color" : "red",
          "make" : "honda",
          "sold" : "2014-11-05"
        }
      }
    ]
  },
  "aggregations" : {
    "sales_this_month" : {
      "doc_count" : 1,
      "month_total" : {
        "value" : 10000.0
      }
    }
  }

The demonstration of elastic4s is as follows:

  val aggfilter = search("cartxns").query(
    matchQuery("make.keyword","honda")
  ).aggregations(
    filterAgg("sales_the_month",rangeQuery("sold").gte("2014-10-01").lte("2014-11-01"))
    .subaggs(sumAggregation("monthly_sales").field("price"))
  )
  println(aggfilter.show)

  val filterResult = client.execute(aggfilter).await

  if (filterResult.isSuccess) {
    val ms = filterResult.result.aggregations.filter("sales_the_month")
              .sum("monthly_sales").value
    println(s"${ms}")
    filterResult.result.hits.hits.foreach(h => println(s"${h.sourceAsMap}"))
  } else println(s"error: ${filterResult.error.causedBy.getOrElse("unknown")}")

...

POST:/cartxns/_search?
StringEntity({"query":{"match":{"make.keyword":{"query":"honda"}}},"aggs":{"sales_the_month":{"filter":{"range":{"sold":{"gte":"2014-10-01","lte":"2014-11-01"}}},"aggs":{"monthly_sales":{"sum":{"field":"price"}}}}}},Some(application/json))
10000.0
Map(price -> 10000, color -> red, make -> honda, sold -> 2014-10-28)
Map(price -> 20000, color -> red, make -> honda, sold -> 2014-11-05)
Map(price -> 20000, color -> red, make -> honda, sold -> 2014-11-05)

The last one is post filter. Post filter is also the filtering of query results, but after the completion of the whole query, it filters the results. That is, if query also involves aggregation, aggregation is not affected by filtering:

GET /cartxns/_search
{
  "query": {
    "match": {
      "make.keyword": "ford"
    }
  },
  "post_filter": {
    "match" : {
      "color.keyword" : "blue"
    }
  }
  ,"aggs": {
    "colors": {
      "terms": {
        "field": "color.keyword",
        "size": 10
      }
    }
  }
}

The query and aggregation results are as follows:

  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.2809337,
    "hits" : [
      {
        "_index" : "cartxns",
        "_type" : "_doc",
        "_id" : "OWVYAnIBSDa1Wo5UTrf8",
        "_score" : 1.2809337,
        "_source" : {
          "price" : 25000,
          "color" : "blue",
          "make" : "ford",
          "sold" : "2014-02-12"
        }
      }
    ]
  },
  "aggregations" : {
    "colors" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "blue",
          "doc_count" : 1
        },
        {
          "key" : "green",
          "doc_count" : 1
        }
      ]
    }
  }
}

It can be seen that the query results show the results filtered by post filter, but aggregation and merging are not affected by filter.

elastic4s demo code:

 val aggPost = search("cartxns").query(
    matchQuery("make.keyword","ford")
  ).postFilter(matchQuery("color.keyword","blue"))
      .aggregations(
        termsAgg("colors","color.keyword")
      )

  println(aggPost.show)

  val postResult = client.execute(aggPost).await

  if (postResult.isSuccess) {
    postResult.result.hits.hits.foreach(h => println(s"${h.sourceAsMap}"))
    postResult.result.aggregations.terms("colors").buckets
      .foreach(b => println(s"${b.key},${b.docCount}"))
  } else println(s"error: ${postResult.error.causedBy.getOrElse("unknown")}")

...

POST:/cartxns/_search?
StringEntity({"query":{"match":{"make.keyword":{"query":"ford"}}},"post_filter":{"match":{"color.keyword":{"query":"blue"}}},"aggs":{"colors":{"terms":{"field":"color.keyword"}}}},Some(application/json))
Map(price -> 25000, color -> blue, make -> ford, sold -> 2014-02-12)
blue,1
green,1

Posted by something on Sat, 16 May 2020 08:26:40 -0700