Handling relationships in Elasticsearch
#● Relational databases generally favor normalized data; in Elasticsearch, denormalized data is often the better choice
#● Benefits of denormalization: faster reads / no table joins / no row locks
#● Elasticsearch is not good at handling relationships. The following four approaches are generally used to model them
#○ Object type
#○ Nested object
#○ Parent / Child relationship
#○ Application-side joins
DELETE blog
Set the mapping of the blog index
PUT /blog
{
  "mappings": {
    "properties": {
      "content": { "type": "text" },
      "time": { "type": "date" },
      "user": {
        "properties": {
          "city": { "type": "text" },
          "userid": { "type": "long" },
          "username": { "type": "keyword" }
        }
      }
    }
  }
}
Index a blog document
PUT blog/_doc/1
{
  "content": "I like Elasticsearch",
  "time": "2019-01-01T00:00:00",
  "user": {
    "userid": 1,
    "username": "Jack",
    "city": "Shanghai"
  }
}
Find blog posts whose content contains "Elasticsearch" and whose author is Jack
POST blog/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "content": "Elasticsearch" } },
        { "match": { "user.username": "Jack" } }
      ]
    }
  }
}

DELETE my_movies
Mapping for the movie index
PUT my_movies
{
  "mappings": {
    "properties": {
      "actors": {
        "properties": {
          "first_name": { "type": "keyword" },
          "last_name": { "type": "keyword" }
        }
      },
      "title": {
        "type": "text",
        "fields": {
          "keyword": { "type": "keyword", "ignore_above": 256 }
        }
      }
    }
  }
}
Index a movie document
POST my_movies/_doc/1
{
  "title": "Speed",
  "actors": [
    { "first_name": "Keanu", "last_name": "Reeves" },
    { "first_name": "Dennis", "last_name": "Hopper" }
  ]
}
We run the same kind of query as before. Searching for an actor with first name Keanu and last name Hopper should return nothing, because no such actor exists.
However, the movie is still returned. When the document is stored, the boundaries between the inner objects are not preserved, and the JSON is flattened into key-value pairs.
Example:
"title": "Speed"
"actors.first_name": ["Keanu", "Dennis"]
"actors.last_name": ["Reeves", "Hopper"]
The nested data type solves this problem. First, the query that reproduces the false match:
POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "actors.first_name": "Keanu" } },
        { "match": { "actors.last_name": "Hopper" } }
      ]
    }
  }
}
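The false match can be reproduced outside Elasticsearch with a small Python model of the flattening. This is a simplified sketch, not Elasticsearch's actual indexing code: `flatten` and `matches_all` are hypothetical helper names, and a "match" is reduced to an exact value lookup.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Flatten a JSON-like dict into dotted keys, each holding a list of values."""
    flat = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Inner objects lose their boundaries: values are merged per key.
                for sub_key, sub_values in flatten(item, path + ".").items():
                    flat[sub_key].extend(sub_values)
            else:
                flat[path].append(item)
    return dict(flat)


def matches_all(flat_doc, conditions):
    """Hypothetical stand-in for a bool/must of exact matches on a flattened doc."""
    return all(value in flat_doc.get(field, []) for field, value in conditions.items())


movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

flat = flatten(movie)
false_match = matches_all(
    flat, {"actors.first_name": "Keanu", "actors.last_name": "Hopper"}
)
print(flat)
print(false_match)
```

Because "Keanu" and "Hopper" each appear somewhere in the merged value lists, both conditions hold even though they come from different actors.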
Nested Data Type
The nested data type allows each object in an array of objects to be indexed independently.
Internally, each nested object is stored as its own hidden Lucene document, separate from the root document, and the two are joined at query time.
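A rough Python model of this storage layout is shown below. It is only an illustration under simplifying assumptions: `split_nested` and `nested_match` are hypothetical names, and the real join between hidden and root Lucene documents is far more involved than a list scan.

```python
def split_nested(doc, path):
    """Return (child_docs, root_doc): one child per nested object, root without them."""
    children = [dict(obj) for obj in doc.get(path, [])]
    root = {key: value for key, value in doc.items() if key != path}
    return children, root


def nested_match(doc, path, conditions):
    """True only if a single nested object satisfies every condition."""
    children, _root = split_nested(doc, path)
    return any(
        all(child.get(field) == value for field, value in conditions.items())
        for child in children
    )


movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

children, root = split_nested(movie, "actors")
wrong_pair = nested_match(movie, "actors", {"first_name": "Keanu", "last_name": "Hopper"})
right_pair = nested_match(movie, "actors", {"first_name": "Keanu", "last_name": "Reeves"})
print(len(children), root, wrong_pair, right_pair)
```

Since every condition is checked against one object at a time, a first name from one actor can no longer combine with a last name from another.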
Recreate the index and map actors as type nested
The properties of the nested objects are declared inside the nested field
DELETE my_movies

PUT my_movies
{
  "mappings": {
    "properties": {
      "actors": {
        "type": "nested",
        "properties": {
          "first_name": { "type": "keyword" },
          "last_name": { "type": "keyword" }
        }
      },
      "title": { "type": "text" }
    }
  }
}

POST my_movies/_doc/1
{
  "title": "Speed",
  "actors": [
    { "first_name": "Keanu", "last_name": "Reeves" },
    { "first_name": "Dennis", "last_name": "Hopper" }
  ]
}
Fields inside a nested object must be searched with a nested query, and the query must specify the path of the nested field.
With a first name and last name that do not belong to the same actor (Keanu Hopper), no document is returned. With a correct pair (Keanu Reeves), the movie is found.
POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Speed" } },
        {
          "nested": {
            "path": "actors",
            "query": {
              "bool": {
                "must": [
                  { "match": { "actors.first_name": "Keanu" } },
                  { "match": { "actors.last_name": "Hopper" } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Speed" } },
        {
          "nested": {
            "path": "actors",
            "query": {
              "bool": {
                "must": [
                  { "match": { "actors.first_name": "Keanu" } },
                  { "match": { "actors.last_name": "Reeves" } }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
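When these searches are issued from application code, the request body can be assembled programmatically. The sketch below only builds and prints the JSON body; it does not talk to a cluster. The function name `build_nested_actor_query` and its parameters are assumptions made for illustration.

```python
import json


def build_nested_actor_query(title, first_name, last_name, path="actors"):
    """Build a search body: a title match plus a nested query on one actor object."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"title": title}},
                    {
                        "nested": {
                            # The path must name the field mapped as type nested.
                            "path": path,
                            "query": {
                                "bool": {
                                    "must": [
                                        {"match": {f"{path}.first_name": first_name}},
                                        {"match": {f"{path}.last_name": last_name}},
                                    ]
                                }
                            },
                        }
                    },
                ]
            }
        }
    }


body = build_nested_actor_query("Speed", "Keanu", "Reeves")
print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the second request above and could be sent as the body of `POST my_movies/_search`.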
Nested Aggregation
Group by the actors' first names
An ordinary terms aggregation, written the same way as for a non-nested field, does not work here and returns no buckets.
POST my_movies/_search
{
  "size": 0,
  "aggs": {
    "actor_name": {
      "terms": { "field": "actors.first_name" }
    }
  }
}
To aggregate on fields of nested objects, wrap the analysis in a nested aggregation, specify the path, and write the actual aggregation as a sub-aggregation inside it.
In the response, the sub-aggregation inside the nested aggregation returns buckets, while the top-level terms aggregation on the same field returns none.
POST my_movies/_search
{
  "size": 0,
  "aggs": {
    "actors": {
      "nested": { "path": "actors" },
      "aggs": {
        "actor_name": {
          "terms": { "field": "actors.first_name", "size": 10 }
        }
      }
    },
    "actor_name": {
      "terms": { "field": "actors.first_name" }
    }
  }
}
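The difference between the two aggregations can be illustrated with a toy Python model. It assumes, as a simplification, that the index holds one hidden document per actor plus one root document; the `_nested_path` marker and the `terms` helper are hypothetical and exist only in this sketch.

```python
from collections import Counter

# Toy view of the index: two hidden actor documents and one root document.
lucene_docs = [
    {"_nested_path": "actors", "actors.first_name": "Keanu", "actors.last_name": "Reeves"},
    {"_nested_path": "actors", "actors.first_name": "Dennis", "actors.last_name": "Hopper"},
    {"title": "Speed"},
]


def terms(docs, field):
    """Count the values of `field` over the given documents, like a terms aggregation."""
    return dict(Counter(doc[field] for doc in docs if field in doc))


# A top-level aggregation only sees root documents, which lack the actor fields.
root_docs = [doc for doc in lucene_docs if "_nested_path" not in doc]
# A nested aggregation switches the context to the hidden documents under the path.
actor_docs = [doc for doc in lucene_docs if doc.get("_nested_path") == "actors"]

top_level_buckets = terms(root_docs, "actors.first_name")
nested_buckets = terms(actor_docs, "actors.first_name")
print(top_level_buckets)
print(nested_buckets)
```

In this model the top-level count is empty because the root document carries no `actors.first_name` values, which mirrors the empty bucket list of the top-level `actor_name` aggregation.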