term query and match query (text and keyword) in Elasticsearch 5.0

Keywords: Elasticsearch

1. Basic information

Preface: term query and match query touch on several concepts, such as analysis (word segmentation), mapping, and the inverted index. I will use an example from the official documentation to explain my understanding of them.

  • In ES 5.x the string type is split into text and keyword. A text field is analyzed: the whole string is broken into lowercase terms according to the analyzer's rules. A keyword field is not analyzed, similar to not_analyzed in ES 2.3.

By default, string data indexed into Elasticsearch is mapped as text.

NOTE: The default analyzer is the standard analyzer. "Quick Brown Fox!" is broken into [quick, brown, fox], and these terms are written to the inverted index.
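To make the difference between the two field types concrete, here is a minimal Python sketch. The functions standard_analyze and keyword_analyze are hypothetical helpers written for this post, not Elasticsearch APIs, and the regular expression only approximates the real standard analyzer (which follows Unicode text segmentation rules).

```python
import re

def standard_analyze(text):
    """Rough stand-in for the standard analyzer: split into word tokens, then lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]

def keyword_analyze(text):
    """A keyword field keeps the whole string as a single, unchanged term."""
    return [text]

print(standard_analyze("Quick Brown Fox!"))  # ['quick', 'brown', 'fox']
print(keyword_analyze("Quick Brown Fox!"))   # ['Quick Brown Fox!']
```

On a real cluster, the _analyze API reports the terms an analyzer actually produces.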

  • term query looks for the exact term in the inverted index and knows nothing about analysis. It is suitable for keyword, numeric, and date fields.
  • match query is aware of analysis: it knows how the field was analyzed and applies the same analysis to the query string.

In summary:
- term query looks up the exact term in the inverted index
- match query analyzes the query string for the field first, then looks up the resulting terms
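The summary above can be sketched with a toy inverted index in Python. Everything here (analyze, build_index, term_query, match_query) is a hypothetical illustration, not the Elasticsearch implementation. The analyzer is a regex approximation of the standard analyzer, and match_query uses OR semantics, which is the default operator of the match query.

```python
import re
from collections import defaultdict

def analyze(text):
    # Approximation of the standard analyzer: word tokens, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]

def build_index(docs):
    # Map each term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index

def term_query(index, value):
    # No analysis: the value is looked up exactly as given.
    return sorted(index.get(value, set()))

def match_query(index, value):
    # The query string is analyzed first, then each resulting term is looked up (OR).
    hits = set()
    for term in analyze(value):
        hits |= index.get(term, set())
    return sorted(hits)

index = build_index({1: "Quick Brown Fox!"})
print(term_query(index, "Quick"))   # [] because only the lowercase term 'quick' is indexed
print(term_query(index, "quick"))   # [1]
print(match_query(index, "Quick"))  # [1] because the query string is lowercased first
```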

2. Testing (1)

  1. Prepare data:
POST /termtest/termtype/1
{
  "content":"Name"
}
POST /termtest/termtype/2
{
  "content":"name city"
}
  2. Check that the data was indexed:
GET /termtest/_search
{
  "query":
  {
    "match_all": {}
  }
}
  • Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "termtest",
        "_type": "termtype",
        "_id": "2",
        "_score": 1,
        "_source": {
          "content": "name city"
        }
      },
      {
        "_index": "termtest",
        "_type": "termtype",
        "_id": "1",
        "_score": 1,
        "_source": {
          "content": "Name"
        }
      }
    ]
  }
}

As shown, the data has been indexed. The content field is mapped as text here, so it is analyzed by default.

  3. Run the following queries:
POST /termtest/_search
{
  "query":{
    "term":{
      "content":"Name"
    }
  }
}
  • Result
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Analysis: because the field is analyzed by the standard analyzer by default, all uppercase letters are converted to lowercase before the terms are stored in the inverted index. term is an exact query, so it looks for the term Name with an uppercase N.
Only the lowercase term name exists in the index, so the result is empty.

POST /termtest/_search
{
  "query":{
    "match":{
      "content":"Name"
    }
  }
}
  • Result
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "termtest",
        "_type": "termtype",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "content": "Name"
        }
      },
      {
        "_index": "termtest",
        "_type": "termtype",
        "_id": "2",
        "_score": 0.25811607,
        "_source": {
          "content": "name city"
        }
      }
    ]
  }
}

Analysis: (1) By default the field is analyzed by the standard analyzer, so all uppercase letters are converted to lowercase before the terms are stored in the inverted index.
(2) The match query first analyzes the query string into the term "name" and then looks that term up in the inverted index, so both documents match.
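The two queries of Testing (1) can be replayed offline with a small Python sketch. This is only a simulation under stated assumptions: analyze is a regex approximation of the standard analyzer, and term_hits and match_hits are hypothetical helpers that return matching document ids without any scoring.

```python
import re

# The two documents indexed into termtest above (id -> content).
DOCS = {"1": "Name", "2": "name city"}

def analyze(text):
    # Approximation of the standard analyzer: word tokens, lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]

# Terms stored for each document after analysis.
INDEXED = {doc_id: analyze(text) for doc_id, text in DOCS.items()}

def term_hits(value):
    # term query: compare the raw value against the stored terms.
    return sorted(d for d, terms in INDEXED.items() if value in terms)

def match_hits(value):
    # match query: analyze the query string, then compare term by term.
    wanted = analyze(value)
    return sorted(d for d, terms in INDEXED.items() if any(t in terms for t in wanted))

print(INDEXED)             # {'1': ['name'], '2': ['name', 'city']}
print(term_hits("Name"))   # []
print(term_hits("name"))   # ['1', '2']
print(match_hits("Name"))  # ['1', '2']
```

The third call shows that a term query does find both documents once the value is written exactly as the stored term.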

3. Testing (2)

Here is an example from the official documentation.
1. Import the data

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_text": {
          "type":  "text" 
        },
        "exact_value": {
          "type":  "keyword" 
        }
      }
    }
  }
}

PUT my_index/my_type/1
{
  "full_text":   "Quick Foxes!", 
  "exact_value": "Quick Foxes!"  
}

First define the mapping, then index the document.

  • full_text: mapped as text, so it is analyzed.
  • exact_value: mapped as keyword, so it is not analyzed.
  • full_text: after passing through the standard analyzer, the terms [quick, foxes] are stored in the inverted index.
  • exact_value: only the single term [Quick Foxes!] is stored in the inverted index.
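As a rough illustration of the list above, the following Python sketch derives the stored terms for each field from its mapping type. MAPPING mirrors the my_index mapping, terms_for is a hypothetical helper, and the regex is only an approximation of the standard analyzer.

```python
import re

# Field types taken from the my_index mapping above.
MAPPING = {"full_text": "text", "exact_value": "keyword"}

def terms_for(field, value):
    # text fields are analyzed; keyword fields store the value as one term.
    if MAPPING[field] == "text":
        return [t.lower() for t in re.findall(r"\w+", value)]
    return [value]

doc = {"full_text": "Quick Foxes!", "exact_value": "Quick Foxes!"}
for field, value in doc.items():
    print(field, terms_for(field, value))
# full_text ['quick', 'foxes']
# exact_value ['Quick Foxes!']
```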

2. Run the following queries
GET my_index/my_type/_search
{
  "query": {
    "term": {
      "exact_value": "Quick Foxes!" 
    }
  }
}

Result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "full_text": "Quick Foxes!",
          "exact_value": "Quick Foxes!"
        }
      }
    ]
  }
}

The exact_value field contains the exact term Quick Foxes!, so the document is found.

GET my_index/my_type/_search
{
  "query": {
    "term": {
      "full_text": "Quick Foxes!" 
    }
  }
}

Result:

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

full_text is analyzed, so the inverted index contains only quick and foxes. There is no term Quick Foxes!, so nothing matches.

GET my_index/my_type/_search
{
  "query": {
    "term": {
      "full_text": "foxes" 
    }
  }
}

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.25811607,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.25811607,
        "_source": {
          "full_text": "Quick Foxes!",
          "exact_value": "Quick Foxes!"
        }
      }
    ]
  }
}

full_text is analyzed and the inverted index contains the terms quick and foxes, so the term query for foxes succeeds.

GET my_index/my_type/_search
{
  "query": {
    "match": {
      "full_text": "Quick Foxes!" 
    }
  }
}

Result:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.51623213,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 0.51623213,
        "_source": {
          "full_text": "Quick Foxes!",
          "exact_value": "Quick Foxes!"
        }
      }
    ]
  }
}

The match query analyzes its query string first: "Quick Foxes!" is broken into the terms quick and foxes. These terms are then looked up in the inverted index, where full_text is a text field that was also analyzed into quick and foxes.
So the document matches.
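The four queries of Testing (2) can be summarized in one Python sketch. It is a simulation, not the Elasticsearch implementation: analyze approximates the standard analyzer with a regex, INDEX holds the terms of the single document per field, and term_query and match_query are hypothetical helpers that only report whether the document would match.

```python
import re

MAPPING = {"full_text": "text", "exact_value": "keyword"}
DOC = {"full_text": "Quick Foxes!", "exact_value": "Quick Foxes!"}

def analyze(field, value):
    # text fields: word tokens, lowercased; keyword fields: the value as one term.
    if MAPPING[field] == "text":
        return [t.lower() for t in re.findall(r"\w+", value)]
    return [value]

# Terms stored per field for document 1.
INDEX = {field: set(analyze(field, value)) for field, value in DOC.items()}

def term_query(field, value):
    # Exact lookup, no analysis of the query value.
    return value in INDEX[field]

def match_query(field, value):
    # Analyze the query string with the field's analyzer, then look up each term.
    return any(term in INDEX[field] for term in analyze(field, value))

print(term_query("exact_value", "Quick Foxes!"))  # True
print(term_query("full_text", "Quick Foxes!"))    # False
print(term_query("full_text", "foxes"))           # True
print(match_query("full_text", "Quick Foxes!"))   # True
```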

Posted by Dragoon1 on Tue, 08 Jan 2019 11:51:09 -0800