Elasticsearch Nested data type: nested object mapping, query, and aggregation analysis

Handling relationships in Elasticsearch

● For relational databases, data is generally normalized; for Elasticsearch, denormalizing the data is often the better choice
● Benefits of denormalization: faster reads, no table joins needed, no row locks needed
● Elasticsearch is not good at handling relationships. There are generally four ways to model associations:
○ Object type
○ Nested object
○ Parent / Child relationship
○ Application-side join
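As a rough illustration of the denormalization trade-off (plain Python, not Elasticsearch code), the sketch below stores the same blog post both ways using in-memory dicts. All names (`users`, `blogs_normalized`, `blogs_denormalized`, `posts_by_*`) are hypothetical. The normalized form needs a lookup into a second "table" at read time, while the denormalized form answers the question from each document alone.

```python
# Hypothetical normalized model: users live in their own "table",
# and each blog only stores a reference (userid) to its author.
users = {1: {"userid": 1, "username": "Jack", "city": "Shanghai"}}
blogs_normalized = [{"content": "I like Elasticsearch", "userid": 1}]

# Hypothetical denormalized model: the user object is copied into each blog.
blogs_denormalized = [
    {
        "content": "I like Elasticsearch",
        "user": {"userid": 1, "username": "Jack", "city": "Shanghai"},
    }
]


def posts_by_normalized(username):
    """Needs a join: resolve each blog's userid in the users table."""
    return [
        b["content"]
        for b in blogs_normalized
        if users[b["userid"]]["username"] == username
    ]


def posts_by_denormalized(username):
    """No join: the username is already inside every document."""
    return [
        b["content"]
        for b in blogs_denormalized
        if b["user"]["username"] == username
    ]


print(posts_by_normalized("Jack"))    # ['I like Elasticsearch']
print(posts_by_denormalized("Jack"))  # ['I like Elasticsearch']
```

The price is paid on writes: if Jack moves to another city, every blog document that embeds his user object has to be updated.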

DELETE blog

Set the mapping of the blog index

PUT /blog
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      },
      "time": {
        "type": "date"
      },
      "user": {
        "properties": {
          "city": {
            "type": "text"
          },
          "userid": {
            "type": "long"
          },
          "username": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Index a blog post

PUT blog/_doc/1
{
  "content":"I like Elasticsearch",
  "time":"2019-01-01T00:00:00",
  "user":{
    "userid":1,
    "username":"Jack",
    "city":"Shanghai"
  }
}

Find the posts written by Jack that contain Elasticsearch

POST blog/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "Elasticsearch"
          }
        },
        {
          "match": {
            "user.username": "Jack"
          }
        }
      ]
    }
  }
}


DELETE my_movies

Mapping of the movie index

PUT my_movies
{
  "mappings": {
    "properties": {
      "actors": {
        "properties": {
          "first_name": {
            "type": "keyword"
          },
          "last_name": {
            "type": "keyword"
          }
        }
      },
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

Index a movie document

POST my_movies/_doc/1
{
  "title":"Speed",
  "actors":[
    {
      "first_name":"Keanu",
      "last_name":"Reeves"
    },
    {
      "first_name":"Dennis",
      "last_name":"Hopper"
    }
  ]
}

Run the same kind of bool query as before, this time looking for an actor whose first name is Keanu and whose last name is Hopper. We should not get any data back, because no such actor exists.

However, the document is still returned. When an array of objects is stored, the boundaries between the inner objects are not kept: the JSON is flattened into a simple structure of key-value pairs.

For example, the movie above is stored as:

"title": "Speed"
"actors.first_name": ["Keanu", "Dennis"]
"actors.last_name": ["Reeves", "Hopper"]
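The flattening can be imitated in a few lines of Python. This is a simplified model that only handles dicts, lists, and scalar leaves (real indexing also analyzes text, skips nulls, and so on), and `flatten` is a hypothetical helper, not an Elasticsearch API. It shows how values from different actors end up side by side in one multi-valued field.

```python
def flatten(doc):
    """Flatten a JSON-like document into dotted field names (simplified model).

    Every leaf value is appended to a list under its dotted path, so values
    coming from different objects of an array are merged into one field.
    """
    fields = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Array elements share the same path: object boundaries are lost here.
            for item in value:
                walk(item, path)
        else:
            fields.setdefault(path, []).append(value)

    walk(doc, "")
    return fields


movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

for field, values in flatten(movie).items():
    print(field, values)
# title ['Speed']
# actors.first_name ['Keanu', 'Dennis']
# actors.last_name ['Reeves', 'Hopper']
```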

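The nested data type, covered in the next section, solves this problem. First, here is the query that exposes it; it returns the Speed document even though none of its actors is called Keanu Hopper: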

POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "actors.first_name": "Keanu"
          }
        },
        {
          "match": {
            "actors.last_name": "Hopper"
          }
        }
      ]
    }
  }
}
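Against the flattened form, each match clause only asks whether its value appears anywhere in the field, so the two clauses can be satisfied by different actors. The sketch below models the bool must evaluation in Python, assuming exact keyword equality instead of real full-text matching; `flat_movie` and `bool_must` are hypothetical names.

```python
# The Speed document in its flattened form (hypothetical in-memory model).
flat_movie = {
    "title": ["Speed"],
    "actors.first_name": ["Keanu", "Dennis"],
    "actors.last_name": ["Reeves", "Hopper"],
}


def bool_must(flat_doc, clauses):
    """True if every (field, value) clause matches some value of that field.

    Each clause is checked on its own, so nothing ties the first_name match
    and the last_name match to the same actor.
    """
    return all(value in flat_doc.get(field, []) for field, value in clauses)


query = [("actors.first_name", "Keanu"), ("actors.last_name", "Hopper")]
print(bool_must(flat_movie, query))  # True: a false positive
```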

Nested Data Type

The nested data type allows each object in an array of objects to be indexed (stored) independently.

Internally, each nested object is stored as a separate hidden Lucene document alongside the root document, and the join between them is performed at query time.
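A rough Python picture of that storage layout, under a simplified model: the root document keeps only its non-nested fields, and each element of the nested array becomes its own hidden document that remembers which root it belongs to. The helper `split_nested` and the `_parent` key are hypothetical and exist only for illustration; Lucene itself links the documents by indexing the hidden ones in the same block as their root, not through an explicit field.

```python
def split_nested(doc_id, doc, nested_field):
    """Split a source document into a root doc plus one hidden doc per nested object.

    `_parent` is a hypothetical marker standing in for Lucene's block layout.
    """
    root = {k: v for k, v in doc.items() if k != nested_field}
    hidden = [
        {"_parent": doc_id, **{f"{nested_field}.{k}": v for k, v in obj.items()}}
        for obj in doc.get(nested_field, [])
    ]
    return root, hidden


movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

root, hidden = split_nested(1, movie, "actors")
print(root)  # {'title': 'Speed'}
for h in hidden:
    print(h)
# {'_parent': 1, 'actors.first_name': 'Keanu', 'actors.last_name': 'Reeves'}
# {'_parent': 1, 'actors.first_name': 'Dennis', 'actors.last_name': 'Hopper'}
```

In this model one movie with two actors becomes three stored documents, and each actor's first and last name stay together in the same hidden document.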

Recreate the index, setting the type of actors to nested and declaring the properties of the nested objects inside it:

DELETE my_movies
PUT my_movies
{
  "mappings": {
    "properties": {
      "actors": {
        "type": "nested",
        "properties": {
          "first_name": {
            "type": "keyword"
          },
          "last_name": {
            "type": "keyword"
          }
        }
      },
      "title": {
        "type": "text"
      }
    }
  }
}

POST my_movies/_doc/1
{
  "title":"Speed",
  "actors":[
    {
      "first_name":"Keanu",
      "last_name":"Reeves"
    },
    {
      "first_name":"Dennis",
      "last_name":"Hopper"
    }
  ]
}

Nested objects must be searched with a nested query, whose path parameter names the nested field (here actors); the clauses inside it are evaluated against one nested object at a time.

With the mismatched name Keanu Hopper, the first query below returns no hits; with the correct name Keanu Reeves, the second query returns the movie.

POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "Speed"
          }
        },
        {
          "nested": {
            "path": "actors",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "actors.first_name": "Keanu"
                    }
                  },
                  {
                    "match": {
                      "actors.last_name": "Hopper"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "Speed"
          }
        },
        {
          "nested": {
            "path": "actors",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "actors.first_name": "Keanu"
                    }
                  },
                  {
                    "match": {
                      "actors.last_name": "Reeves"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
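The two nested queries above can be modelled in Python as follows. This is a simplified sketch assuming exact keyword equality: all clauses of the nested query are run against one hidden actor document at a time, and the movie matches if at least one actor satisfies every clause (the outer bool additionally requires the title to match, which is left out here). `actors` and `nested_match` are hypothetical names.

```python
# Hidden nested documents for the Speed movie, one per actor (simplified model).
actors = [
    {"actors.first_name": "Keanu", "actors.last_name": "Reeves"},
    {"actors.first_name": "Dennis", "actors.last_name": "Hopper"},
]


def nested_match(nested_docs, clauses):
    """True if a single nested document satisfies all clauses at once."""
    return any(
        all(doc.get(field) == value for field, value in clauses)
        for doc in nested_docs
    )


wrong = [("actors.first_name", "Keanu"), ("actors.last_name", "Hopper")]
right = [("actors.first_name", "Keanu"), ("actors.last_name", "Reeves")]
print(nested_match(actors, wrong))  # False: no single actor is Keanu Hopper
print(nested_match(actors, right))  # True: Keanu Reeves is one actor
```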

Nested Aggregation

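Group by the actors' first names

A plain terms aggregation, written the same way as before, does not work: it returns no buckets, because the actor fields are not part of the root documents.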

POST my_movies/_search
{
  "size": 0,
  "aggs": {
    "actor_name": {
      "terms": {
        "field": "actors.first_name"
      }
    }
  }
}

To aggregate on nested objects, wrap the aggregation in a nested aggregation whose path names the nested field, and put the actual aggregation (here a terms aggregation on actors.first_name) inside it as a sub-aggregation.

In the response to the request below, the terms sub-aggregation under the nested aggregation returns a bucket for each actor's first name, while the top-level actor_name aggregation, which is not wrapped in nested, still returns no buckets.

POST my_movies/_search
{
  "size": 0,
  "aggs": {
    "actors": {
      "nested": {
        "path": "actors"
      },
      "aggs": {
        "actor_name": {
          "terms": {
            "field": "actors.first_name",
            "size": 10
          }
        }
      }
    },
    "actor_name": {
      "terms": {
        "field": "actors.first_name"
      }
    }
  }
}
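Why the top-level terms aggregation comes back empty can be sketched in Python under the same simplified storage model: the root document carries no actors.* fields, and each actor lives in its own hidden document. A terms aggregation counts values of the field in the documents it runs over; only the nested aggregation switches it to the hidden actor documents. `root_docs`, `nested_docs`, and `terms_agg` are hypothetical names.

```python
from collections import Counter

# Simplified storage model: actor fields exist only in the hidden nested docs.
root_docs = [{"title": "Speed"}]
nested_docs = [
    {"actors.first_name": "Keanu", "actors.last_name": "Reeves"},
    {"actors.first_name": "Dennis", "actors.last_name": "Hopper"},
]


def terms_agg(docs, field):
    """Count documents per value of a single-valued `field`, like a terms aggregation."""
    return Counter(doc[field] for doc in docs if field in doc)


print(terms_agg(root_docs, "actors.first_name"))    # Counter() -> no buckets
print(terms_agg(nested_docs, "actors.first_name"))  # Counter({'Keanu': 1, 'Dennis': 1})
```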

https://www.elastic.co/guide/en/elasticsearch/reference/7.1/query-dsl-nested-query.html

Posted by Brudus on Sun, 21 Jun 2020 00:56:55 -0700

Programmer Group