Elastic Search Learning Notes More than 19 Field Sorting and String Sorting

Keywords: Big Data ElasticSearch

Elastic Search Learning Notes More than 19 Field Sorting and String Sorting

sort

In order to sort by correlation, it is necessary to represent the correlation as a numerical value. In Elastic search, the correlation score is expressed by a floating point number and returned in the search results by the _score parameter. The default ranking is _score descending.

New index mapping

PUT tweet
{
  "mappings": {
    "tweet": {
      "properties": {
          "about": {
            "type": "text"
          },
          "comments": {
            "type": "nested", 
            "properties": {
              "content": {
                "type": "text"
              },
              "date": {
                "type": "date"
              },
              "user_id": {
                "type": "long"
              }
            }
          },
          "date": {
            "type": "date"
          },
          "name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          },
          "topic": {
            "type": "text"
          },
          "tweet": {
            "type": "text"
          },
          "user_id": {
            "type": "long"
          }
      }
    }
  }
}

insert data

PUT tweet/tweet/1
{
    "tweet":    "What is Elasticsearch?",
    "date":     "2014-09-14",
    "name":     "Mary Jones",
    "about":["es","elasticsearch"],
    "topic":"elasticsearch",
    "user_id":  1,
    "comments": [
      {
        "content":  "very good",
        "date":  "2014-09-14",
        "user_id":  1
      },
      {
        "content":  "good question",
        "date":  "2014-09-15",
        "user_id":  2
      }
    ]
}
PUT tweet/tweet/2
{
    "tweet":    "How can we study Elasticsearch?",
    "date":     "2014-09-18",
    "name":     "Mary Jones",
    "about":["es","elasticsearch"],
    "topic":"elasticsearch",
    "user_id":  1,
    "comments": [
      {
        "content":  "very good",
        "date":  "2014-09-18",
        "user_id":  1
      },
      {
        "content":  "good question",
        "date":  "2014-10-15",
        "user_id":  2
      }
    ]
}
PUT tweet/tweet/3
{
    "tweet":    "How can we manage Elasticsearch?",
    "date":     "2014-10-18",
    "name":     "Tom Foxs",
    "about":["es","elasticsearch"],
    "topic":"elasticsearch",
    "user_id":  2,
    "comments": [
      {
        "content":  "do not know",
        "date":  "2014-09-14",
        "user_id":  3
      },
      {
        "content":  "good question",
        "date":  "2014-09-15",
        "user_id":  2
      }
    ]
}

Let's assume that we want to see the sort by time to query the data.

GET tweet/tweet/_search
{
  "sort": [
    {
      "date": {
        "order": "desc"
      }
    }
  ]
}

The results are as follows:

  • The document returns with sort, which contains the values we use for sorting.
  • score is not computed because it is not used for sorting.
  • The value of the date field is expressed as the number of milliseconds since epoch (January 1, 1970:00:00 UTC), which is returned by the value of the sort field.

Both the _score and max_score fields are null. Computing _score is expensive and is usually used only for sorting; here we do not sort by correlation, so it makes no sense to record _score. If you have to calculate _score anyway, you can set the track_score parameter to true.

Relevance Query

GET tweet/tweet/_search
{
  "query": {
    "match": {
      "name": "jones"
    }
  }
}

The results are as follows:

Multilevel ranking

Suppose we want to query with date and _score, and the matching results are sorted first by date and then by correlation:

GET tweet/tweet/_search
{
  "query": {
    "match": {
      "name": "jones"
    }
  },
    "sort": [
    {
      "date": {
        "order": "desc"
      }
    },
    {
      "_score":{
        "order": "desc"
      }
    }
  ]
}

The order of ranking conditions is very important. The results are first sorted by the first condition, and then sorted by the second condition only when the first sort value of the result set is exactly the same, and so on.

String sorting

Sorting by full-text analysis fields consumes a lot of memory.

If you want to analyze a string, such as fine old art, there are three items. We may want to sort the first item alphabetically, then the second item alphabetically, and so on, but Elastic search does not have this information in the sorting process.

You can use min and max sorting modes (default is min), but this will result in sorting in art or old, none of which is desirable.

To sort by a string field, this field should contain only one item: the entire not_analyzed string. But we still need the analyzed field so that we can query in full text.

A simple way is to index the same string in two ways, which will include two fields in the document: analysis for search and not_analysis for sort.

But saving the same string twice in the source field is a waste of space. What we really want to do is pass a single field but index it in two ways. All _core_field types (strings, numbers, Booleans, dates) receive a field parameter

This parameter allows you to transform a simple mapping, such as the one we created above for names:

"name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword",
                "ignore_above": 256
              }
            }
          }

Correspondence search

GET tweet/tweet/_search
{
  "query": {
    "match": {
      "name": "jones"
    }
  },
    "sort": [
    {
      "name.keyword": {
        "order": "asc"
      }
    },
    {
      "_score":{
        "order": "desc"
      }
    }
  ]
}

Posted by AngelicS on Sat, 26 Jan 2019 13:30:14 -0800