The difference between [ES] term and match

Keywords: Java

term usage

First, the definition: term means an exact match, that is, a precise query. The search text is not analyzed (split into words) before searching.

To illustrate, first store some data:

PUT test/doc/7
{
    "title": "love China",
    "content": "people very love China",
    "tags": ["China", "love"]
}

PUT test/doc/8
{
    "title": "love HuBei",
    "content": "people very love HuBei",
    "tags": ["HuBei", "love"]
}

Use term to query:

GET test/doc/_search
{
  "query": {
    "term": {
      "title": "love"
    }
  }
}

Both of the documents above are returned:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": ["HuBei","love"]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 0.6931472,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": ["China","love"]
        }
      }
    ]
  }
}

Every document whose title contains love is returned, but I only want to match love China exactly. Let's see whether the following query finds it:

GET test/doc/_search
{
  "query": {
    "term": {
      "title": "love China"
    }
  }
}

No data is returned. term is an exact match and can only check a single word at a time. So how can we match multiple words with term? Use terms:

GET test/doc/_search
{
  "query": {
    "terms": {
      "title": ["love", "China"]
    }
  }
}

The query result is:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.6931472,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": ["HuBei","love"]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 0.6931472,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": ["China","love"]
        }
      }
    ]
  }
}

Both documents are found. Why? Because the values in terms are in an OR relationship: a document matches as long as it contains any one of the words. To require that both words are present, use bool's must, as follows:

GET test/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "title": "love"
          }
        },
        {
          "term": {
            "title": "china"
          }
        }
      ]
    }
  }
}

As you can see, china is lowercase here. If we search with the capitalized China instead, nothing is found. Why is that? When the title field is stored, it is segmented into words, and here the default analyzer (word segmentation processor) does the segmentation. Let's see how the text is segmented.
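The OR behaviour of terms and the AND behaviour of bool's must can be sketched in a few lines of Python. This is only a toy model, not how Elasticsearch is implemented: INDEX, term, terms and bool_must are hypothetical names, and the index content is assumed to be the lowercased words of the two titles.

```python
# Toy inverted index: indexed word -> ids of the documents whose title contains it.
# Assumption: the words are stored lowercased, as described for the title field.
INDEX = {
    "love": {"7", "8"},
    "china": {"7"},
    "hubei": {"8"},
}


def term(value):
    """Exact lookup of one word; the value is NOT analyzed or lowercased."""
    return INDEX.get(value, set())


def terms(values):
    """OR semantics: a document matches if it contains any of the values."""
    matched = set()
    for value in values:
        matched |= term(value)
    return matched


def bool_must(values):
    """AND semantics: a document matches only if it contains every value."""
    result = None
    for value in values:
        hits = term(value)
        result = hits if result is None else result & hits
    return result or set()


print(sorted(terms(["love", "China"])))      # ['7', '8']
print(sorted(bool_must(["love", "china"])))  # ['7']
print(sorted(bool_must(["love", "China"])))  # []
```

In this sketch terms still returns both documents even though China matches nothing, because love alone is enough, while must with the capitalized China returns nothing.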

Analyzer (word segmentation processor)

GET test/_analyze
{
  "text" : "love China"
}

The result is:

{
  "tokens": [
    {
      "token": "love",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "china",
      "start_offset": 5,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}

The two words produced by the analysis are love and china (lowercase). term can only match an indexed word exactly, without any change, so a term query for China fails. Word segmentation will be covered in a dedicated section later.
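To make the effect of segmentation on term concrete, here is a minimal Python sketch. It assumes the default analyzer can be approximated by splitting on non-alphanumeric characters and lowercasing (the real analyzer is more sophisticated), and analyze, build_index and term_query are hypothetical helper names.

```python
import re


def analyze(text):
    """Rough stand-in for the default analyzer: split into words and lowercase them."""
    tokens = []
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


def build_index(docs, field):
    """Map every analyzed word of the field to the ids of the documents containing it."""
    index = {}
    for doc_id, doc in docs.items():
        for tok in analyze(doc[field]):
            index.setdefault(tok["token"], set()).add(doc_id)
    return index


def term_query(index, value):
    """term does not analyze its input: it looks the value up exactly as given."""
    return index.get(value, set())


DOCS = {
    "7": {"title": "love China"},
    "8": {"title": "love HuBei"},
}
TITLE_INDEX = build_index(DOCS, "title")

print([t["token"] for t in analyze("love China")])    # ['love', 'china']
print(sorted(term_query(TITLE_INDEX, "china")))       # ['7']
print(sorted(term_query(TITLE_INDEX, "China")))       # []
print(sorted(term_query(TITLE_INDEX, "love China")))  # []
```

The last two lookups come back empty for the same reason: neither China nor love China exists as a stored word, only love, china and hubei do.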

match usage

First, run a match query for love China:

GET test/doc/_search
{
  "query": {
    "match": {
      "title": "love China"
    }
  }
}

The result is:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 1.3862944,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": [
            "China",
            "love"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "doc",
        "_id": "8",
        "_score": 0.6931472,
        "_source": {
          "title": "love HuBei",
          "content": "people very love HuBei",
          "tags": [
            "HuBei",
            "love"
          ]
        }
      }
    ]
  }
}

Both documents are found. Why? Because a match query first splits the search text into words and then matches them. For the two documents above, the indexed words of their title fields are: love, china, hubei. We search for love China, which is split into love and china, and these are combined with OR: a document matches as long as it contains any one of the words. What if you want love and China to match at the same time? Use match_phrase.
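The analyze-then-OR behaviour of match described above can be sketched roughly as follows. This is a simplified illustration, not Elasticsearch's implementation: analyze and match are hypothetical helpers, the analyzer is approximated by splitting on non-alphanumerics and lowercasing, and the "score" is just the number of matching words rather than the real relevance score. The operator argument mirrors the "operator": "and" option that the match query accepts.

```python
import re

# Titles of the two example documents, keyed by _id.
TITLES = {"7": "love China", "8": "love HuBei"}


def analyze(text):
    """Approximate the default analyzer: split into words and lowercase them."""
    return [word.lower() for word in re.findall(r"[A-Za-z0-9]+", text)]


def match(query_text, operator="or"):
    """Return (doc_id, matched_word_count) pairs, best match first."""
    query_words = set(analyze(query_text))
    if not query_words:
        return []
    results = []
    for doc_id, title in TITLES.items():
        overlap = len(query_words & set(analyze(title)))
        if operator == "and":
            is_hit = overlap == len(query_words)  # every query word must be present
        else:
            is_hit = overlap > 0                  # any query word is enough
        if is_hit:
            results.append((doc_id, overlap))
    # Simplified ranking: more matching words first, then by id.
    return sorted(results, key=lambda hit: (-hit[1], hit[0]))


print(match("love China"))         # [('7', 2), ('8', 1)]
print(match("love China", "and"))  # [('7', 2)]
```

As in the real result, document 7 ranks first because it matches both words, while document 8 matches only love.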

match_phrase usage

match_phrase is called phrase search. It requires that all the words of the search text appear in the document together, in the same order, and that their positions are adjacent to each other.

GET test/doc/_search
{
  "query": {
    "match_phrase": {
      "title": "love china"
    }
  }
}

The result is:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.3862944,
    "hits": [
      {
        "_index": "test",
        "_type": "doc",
        "_id": "7",
        "_score": 1.3862944,
        "_source": {
          "title": "love China",
          "content": "people very love China",
          "tags": [
            "China",
            "love"
          ]
        }
      }
    ]
  }
}
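The position requirement of match_phrase can be illustrated with a small Python sketch. It assumes the same simplified analyzer as before (split on non-alphanumerics, lowercase) and a slop of 0, meaning the words must be strictly adjacent; analyze and match_phrase are hypothetical helper names, not Elasticsearch code.

```python
import re

# Titles of the two example documents, keyed by _id.
TITLES = {"7": "love China", "8": "love HuBei"}


def analyze(text):
    """Approximate the default analyzer: split into words and lowercase them."""
    return [word.lower() for word in re.findall(r"[A-Za-z0-9]+", text)]


def match_phrase(query_text):
    """Return ids of documents whose title contains the analyzed words consecutively."""
    phrase = analyze(query_text)
    hits = []
    if not phrase:
        return hits
    for doc_id, title in sorted(TITLES.items()):
        words = analyze(title)
        # Slide a window over the title and compare it with the whole phrase.
        for start in range(len(words) - len(phrase) + 1):
            if words[start:start + len(phrase)] == phrase:
                hits.append(doc_id)
                break
    return hits


print(match_phrase("love china"))  # ['7']
print(match_phrase("love China"))  # ['7']
print(match_phrase("china love"))  # []
```

Unlike term, the search text is analyzed first, so love China and love china behave the same here; the reversed order china love finds nothing because the positions no longer line up.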

 

Posted by hubardz on Thu, 09 Apr 2020 05:55:05 -0700