Elasticsearch Series - Index Management for Production Clusters

Keywords: Programming curl JSON ElasticSearch Java

Overview

Index management is one of the most frequent day-to-day tasks in Elasticsearch. This article looks at index operations from the perspective of operations staff.

Basic operations

Beyond CRUD, the everyday index operations an operations engineer needs include opening and closing an index, shrinking it, and resetting aliases.

Create Index

[esuser@elasticsearch02 ~]$ curl -XPUT 'http://elasticsearch02:9200/music?pretty' -H 'Content-Type: application/json' -d '
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3, 
            "number_of_replicas" : 2 
        }
    },
    "mappings" : {
        "type1" : {
            "properties" : {
                "name" : { "type" : "text" }
            }
        }
    }
}'

{
    "acknowledged": true,
    "shards_acknowledged": true
}

By default, the create-index request returns a response like the one above once the primary copy of each shard has started, or once the request times out.

acknowledged indicates whether the index was created in the cluster; shards_acknowledged indicates whether the required number of shard copies was started for each shard before the timeout.

Either flag may be false even though the index is created successfully. The flags only report whether the two steps completed before the request timed out. If the timeout fires first, the Elasticsearch server has still received the request and goes on executing it; it simply had not finished by the time the response was sent, so the response says false.
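
Since a false flag means "not confirmed yet" rather than "failed", a script that creates indexes may want to read both flags before deciding what to do next. The following is a minimal sketch, assuming a flat response like the one shown above; json_flag and index_created are hypothetical helper names, and the text matching is not a real JSON parser.

```shell
# json_flag RESPONSE KEY
# Prints the value ("true" or "false") of KEY in a flat JSON response such as
# the create-index response shown above. Hypothetical helper based on plain
# text matching, not a real JSON parser.
json_flag() {
    printf '%s\n' "$1" | tr -d ' \t\n' | sed -n "s/.*\"$2\":\([a-z]*\).*/\1/p"
}

# index_created RESPONSE
# Succeeds only when both flags are true. A non-zero status means "unknown",
# not "failed": the request may simply have timed out first.
index_created() {
    [ "$(json_flag "$1" acknowledged)" = "true" ] &&
        [ "$(json_flag "$1" shards_acknowledged)" = "true" ]
}

# Sample response for illustration only.
response='{ "acknowledged": true, "shards_acknowledged": false }'
if index_created "$response"; then
    echo "index created and shard copies started"
else
    echo "not confirmed before timeout; check the index with a GET request"
fi
```

When the check fails, querying the index (see Query Index Settings Information below) shows whether it exists after all.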

Delete Index

curl -XDELETE 'http://elasticsearch02:9200/music?pretty'

Query Index Settings Information

curl -XGET 'http://elasticsearch02:9200/music?pretty'

Close/Open Index

curl -XPOST 'http://elasticsearch02:9200/music/_close?pretty'
curl -XPOST 'http://elasticsearch02:9200/music/_open?pretty'

A closed index causes almost no overhead on the cluster apart from keeping its metadata, and read and write operations against it are rejected. A closed index can be reopened, at which point it goes through the normal shard recovery process.

If cluster data is backed up regularly and an index has to be restored from such a backup, an existing index with the same name must be closed before the restore is started, otherwise the restore fails.

Shrink Index

The number of primary shards of an index is fixed at creation time and cannot be changed afterwards. Sometimes, though, the estimate turns out to be too high in production: say number_of_shards was originally set to 8, but once the system went live the data volume was much smaller than expected. How can the number of primary shards of this index be reduced?

The shrink command does this, with one restriction: the target number of shards must be a factor of the original number of shards. For example, our index with eight primary shards can only be shrunk to an index with four, two, or one primary shard.
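
The factor rule can be checked before a shrink is attempted. This is a small sketch of that arithmetic; valid_shrink_targets is a hypothetical helper name and it does not talk to the cluster.

```shell
# valid_shrink_targets N
# Prints every shard count that an index with N primary shards can be shrunk
# to, largest first: each factor of N that is smaller than N.
valid_shrink_targets() {
    n=$1
    candidate=$((n - 1))
    result=""
    while [ "$candidate" -ge 1 ]; do
        if [ $((n % candidate)) -eq 0 ]; then
            result="$result $candidate"
        fi
        candidate=$((candidate - 1))
    done
    # Trim the leading space before printing.
    echo "${result# }"
}

valid_shrink_targets 8   # prints: 4 2 1
```

An index whose shard count is a prime number can therefore only be shrunk to a single primary shard.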

Workflow of the shrink command:
  1. Create a target index with the same definition as the source index, except that the number of primary shards is the newly specified one.
  2. Hard-link the segment files of the source index into the target index. If the file system does not support hard links, the segment files are copied into the data directory of the target index instead, which takes much longer.
  3. Recover the shards of the target index, as if it were a closed index that had just been reopened.

Case Demonstration
  1. Create an index named music8 with number_of_shards set to 8
curl -XPUT 'http://elasticsearch02:9200/music8?pretty' -H 'Content-Type: application/json' -d '
{
    "settings" : {
        "index" : {
            "number_of_shards" : 8, 
            "number_of_replicas" : 2 
        }
    },
    "mappings" : {
        "children" : {
            "properties" : {
                "name" : { "type" : "text" }
            }
        }
    }
}'
  2. Index some data into it
  3. Relocate a copy of every shard of the index to a single node, such as node-1, and block writes to the index
curl -XPUT 'http://elasticsearch02:9200/music8/_settings?pretty' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.routing.allocation.require._name": "node-1", 
    "index.blocks.write": true 
  }
}'

This step relocates the shard copies. You can follow its progress with:

curl -XGET 'http://elasticsearch02:9200/_cat/recovery?v'
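
The relocation should be finished before the shrink command is run. As a rough sketch, the check can be scripted by restricting the cat output to three columns with the h parameter and testing the stage column; relocation_finished is a hypothetical helper name, and the sample text stands in for real cluster output.

```shell
# relocation_finished INDEX
# Reads "_cat/recovery?h=index,shard,stage" output on stdin and succeeds only
# if INDEX has at least one row and every one of its rows is in stage "done".
relocation_finished() {
    awk -v idx="$1" '
        $1 == idx { seen = 1; if ($3 != "done") pending = 1 }
        END { if (seen && !pending) exit 0; exit 1 }
    '
}

# Sample output for illustration; on a live cluster it would come from
#   curl -s "http://elasticsearch02:9200/_cat/recovery?h=index,shard,stage"
sample='music8 0 done
music8 1 index
music8 2 done'

if printf '%s\n' "$sample" | relocation_finished music8; then
    echo "all shard copies recovered"
else
    echo "relocation still in progress"
fi
```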

  4. Execute the shrink command; the new index is named music9
curl -XPOST 'http://elasticsearch02:9200/music8/_shrink/music9?pretty' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.number_of_shards": 2, 
    "index.number_of_replicas": 1,
    "index.codec": "best_compression" 
  }
}'

When the command completes, music9 has the new shard count and contains all the data of music8.

  5. Point the alias at the new music9 index; the switch is transparent to clients.
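
The alias switch in the last step can be done in one _aliases request, so clients never see a moment without the alias. The sketch below only builds and prints the request body; the alias name music_prd is an assumption (the demonstration above does not name the alias), and alias_swap_body and swap_alias are hypothetical helper names.

```shell
# alias_swap_body ALIAS OLD_INDEX NEW_INDEX
# Prints an _aliases request body that removes ALIAS from OLD_INDEX and adds
# it to NEW_INDEX in a single request.
alias_swap_body() {
    cat <<EOF
{
    "actions" : [
        { "remove" : { "index" : "$2", "alias" : "$1" } },
        { "add" : { "index" : "$3", "alias" : "$1" } }
    ]
}
EOF
}

# swap_alias ALIAS OLD_INDEX NEW_INDEX
# Sends the body to the cluster used throughout this article. Not called here.
swap_alias() {
    curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' \
        -H 'Content-Type: application/json' \
        -d "$(alias_swap_body "$1" "$2" "$3")"
}

# Assumed alias name: music_prd.
alias_swap_body music_prd music8 music9
```

On a live cluster, swap_alias music_prd music8 music9 would send the printed body.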

Rollover Index

The most common pattern for log data is one new dated index per day while clients keep writing through the same alias. The rollover command repoints that alias to the new index.

Assume the log_write alias already exists. A sample command:

curl -XPOST 'http://elasticsearch02:9200/log_write/_rollover/log-20120122' -H 'Content-Type: application/json' -d '
{
  "conditions": {
    "max_age":   "1d"
  }
}'

Run this once a day from crontab, with the date part of the index name filled in by a shell script. A dated index is then created every day while clients always write through the log_write alias, which suits log systems well.
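
One way the daily job could look is sketched below. The function names and the script path in the crontab comment are hypothetical; the host, alias, and condition are the ones used in the sample command above. Only the index name is computed when the block runs; the curl call sits in a function that is not invoked here.

```shell
# rollover_index_name [YYYYMMDD]
# Prints the dated index name, e.g. log-20120122. Defaults to today's date.
rollover_index_name() {
    echo "log-${1:-$(date +%Y%m%d)}"
}

# rollover_log_write
# Rolls the log_write alias over to today's index. Not called here; a crontab
# entry such as the following (script path is hypothetical) would run it daily:
#   0 0 * * * /home/esuser/bin/rollover_log_write.sh
rollover_log_write() {
    curl -XPOST "http://elasticsearch02:9200/log_write/_rollover/$(rollover_index_name)" \
        -H 'Content-Type: application/json' -d '
{
  "conditions": {
    "max_age":   "1d"
  }
}'
}

rollover_index_name 20120122   # prints: log-20120122
```

Note that the rollover only takes place if the condition is met at the time of the call.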

Index mapping management

Mapping management is a basic index operation: the mapping can be defined when the index is created, and fields can be added after the index exists.

Some common examples follow.

View mapping information for the index

curl -XGET 'http://elasticsearch02:9200/music/_mapping/children?pretty'

View mapping information for a specific field of the index

curl -XGET 'http://elasticsearch02:9200/music/_mapping/children/field/content?pretty'

Create an index with mapping information

# Save space, omit most fields
curl -XPUT 'http://elasticsearch02:9200/music?pretty' -H 'Content-Type: application/json' -d ' 
{
  "mappings": {
    "children": {
      "properties": {
        "content": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}'

Add a field called name, of type text, to the index

curl -XPUT 'http://elasticsearch02:9200/music/_mapping/children?pretty' -H 'Content-Type: application/json' -d ' 
{
  "properties": {
    "name": {
      "type": "text"
    }
  }
}'

Index Alias

Clients should normally access an index through an alias rather than by its real name. The alias hides the real index behind it, which is what lets operations such as the rollover above or an index rebuild happen without touching the clients.

Here are the basic alias operations:

# Create index alias
curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "add" : { "index" : "music", "alias" : "music_prd" } }
    ]
}'
# Delete index alias
curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "remove" : { "index" : "music", "alias" : "music_prd" } }
    ]
}'
# Repoint an alias to another index: remove, then add, in one request
curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "remove" : { "index" : "music", "alias" : "music_prd" } },
        { "add" : { "index" : "music2", "alias" : "music_prd" } }
    ]
}'
# Bind one alias to multiple indexes
curl -XPOST 'http://elasticsearch02:9200/_aliases?pretty' -H 'Content-Type: application/json' -d '
{
    "actions" : [
        { "add" : { "indices" : ["music1", "music2"], "alias" : "music_prd" } }
    ]
}'

Index settings modification

View index settings:

curl -XGET 'http://elasticsearch02:9200/music/_settings?pretty'

Modify settings:

curl -XPUT 'http://elasticsearch02:9200/music/_settings?pretty' -H 'Content-Type: application/json' -d '
{
    "index" : {
        "number_of_replicas" : 1
    }
}'

The setting changed most often is the number of replicas; other settings rarely need to be modified.

Index template

Suppose we are designing the index structure for a log system. The log volume is large, and a new index may be created every day, with the date in the index name but the same alias for all of them. An index template is a good fit for this scenario.

Let's start by creating an index template:

curl -XPUT 'http://elasticsearch02:9200/_template/template_access_log?pretty' -H 'Content-Type: application/json' -d '
{
  "template": "access-log-*",
  "settings": {
    "number_of_shards": 2
  },
  "mappings": {
    "log": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "thread_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "YYYY-MM-dd HH:mm:ss"
        }
      }
    }
  },
  "aliases" : {
      "access-log" : {}
  }
}'

The template will be used if the index name matches "access-log-*", so we create an index:

curl -XPUT 'http://elasticsearch02:9200/access-log-01?pretty'

View the index:

curl -XGET 'http://elasticsearch02:9200/access-log-01?pretty'

You can see the following structure:

[esuser@elasticsearch02 bin]$ curl -XGET 'http://elasticsearch02:9200/access-log-01?pretty'
{
  "access-log-01" : {
    "aliases" : {
      "access-log" : { }
    },
    "mappings" : {
      "log" : {
        "_source" : {
          "enabled" : false
        },
        "properties" : {
          "created_at" : {
            "type" : "date",
            "format" : "YYYY-MM-dd HH:mm:ss"
          },
          "host_name" : {
            "type" : "keyword"
          },
          "thread_name" : {
            "type" : "keyword"
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "creation_date" : "1581373546223",
        "number_of_shards" : "2",
        "number_of_replicas" : "1",
        "uuid" : "N8AHh3wITg-Zh4T6umCS2Q",
        "version" : {
          "created" : "6030199"
        },
        "provided_name" : "access-log-01"
      }
    }
  }
}

This shows that the settings, mappings, and alias defined in the template were applied to the new index.

There are also commands to view and delete templates:
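
Which index names a pattern such as "access-log-*" covers can be tried out locally. The sketch below uses shell globbing, which is only an approximation of Elasticsearch's own wildcard matching; matches_template is a hypothetical helper name, and the index names other than access-log-01 are made up for illustration.

```shell
# matches_template PATTERN INDEX_NAME
# Succeeds when INDEX_NAME matches the wildcard PATTERN. Shell globbing is only
# an approximation of Elasticsearch's own matching.
matches_template() {
    case "$2" in
        $1) return 0 ;;
        *)  return 1 ;;
    esac
}

for name in access-log-01 access-log-2020.06.21 error-log-01; do
    if matches_template 'access-log-*' "$name"; then
        echo "$name: template_access_log applies"
    else
        echo "$name: no match"
    fi
done
```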

curl -XGET 'http://elasticsearch02:9200/_template/template_access_log?pretty'

curl -XDELETE 'http://elasticsearch02:9200/_template/template_access_log?pretty'

Index Common Queries

Index Operations Statistics Query

Elasticsearch keeps statistics on all CRUD operations performed on an index, and they are very informative. They can be queried with:

curl -XGET 'http://elasticsearch02:9200/music/_stats?pretty'

The output is very detailed, running to hundreds of lines: from document counts and bytes used on disk to low-level statistics for get, search, merge, translog, and so on.

Segment Information Query

The segment information of an index can be queried with:

curl -XGET 'http://elasticsearch02:9200/music/_segments?pretty'

There's also a lot of content, so let's take a sample of the key parts:

"segments" : {
  "_1" : {
    "generation" : 1,
    "num_docs" : 1,
    "deleted_docs" : 0,
    "size_in_bytes" : 7013,
    "memory_in_bytes" : 3823,
    "committed" : true,
    "search" : true,
    "version" : "7.3.1",
    "compound" : true,
    "attributes" : {
      "Lucene50StoredFieldsFormat.mode" : "BEST_SPEED"
    }
  }
}

This fragment describes a segment named _1. The fields are:

  • _1: name of the segment
  • generation: auto-incrementing ID of the segment
  • num_docs: number of non-deleted documents in the segment
  • deleted_docs: number of deleted documents in the segment
  • size_in_bytes: disk space occupied by the segment
  • memory_in_bytes: segments keep some data in memory; this is the amount of memory the segment occupies
  • committed: whether the segment has been synced to disk
  • search: whether the segment is searchable; false if it has been synced to disk but not yet refreshed
  • version: Lucene version number
  • compound: true means that Lucene has merged all files of this segment into a single file

Shard Storage Information

This command shows how the shards of an index are stored and on which nodes they are located:

curl -XGET 'http://elasticsearch02:9200/music/_shard_stores?status=green&pretty'

An excerpt, where 3 is the shard ID:

"3" : {
  "stores" : [
    {
      "A1s1uus7TpuDSiT4xFLOoQ" : {
        "name" : "node-2",
        "ephemeral_id" : "Q3uoxLeJRnWQrw3E2nOq-Q",
        "transport_address" : "192.168.17.137:9300",
        "attributes" : {
          "ml.machine_memory" : "3954196480",
          "rack" : "r1",
          "xpack.installed" : "true",
          "ml.max_open_jobs" : "20",
          "ml.enabled" : "true"
        }
      },
      "allocation_id" : "o-t-AwGZRrWTflYLP030jA",
      "allocation" : "primary"
    },
    {
      "RGw1IXzZR4CeZh9FUrGHDw" : {
        "name" : "node-1",
        "ephemeral_id" : "B1pv6c4TRuu1vQNvL40iPg",
        "transport_address" : "192.168.17.138:9300",
        "attributes" : {
          "ml.machine_memory" : "3954184192",
          "rack" : "r1",
          "ml.max_open_jobs" : "20",
          "xpack.installed" : "true",
          "ml.enabled" : "true"
        }
      },
      "allocation_id" : "SaXqL8igRUmLAoBBQyQNqw",
      "allocation" : "replica"
    }
  ]
},

A few more operations

  1. Clear the index cache

curl -XPOST 'http://elasticsearch02:9200/music/_cache/clear?pretty'

  2. Force flush

Forces data in the OS cache to be fsynced to disk and clears the translog

curl -XPOST 'http://elasticsearch02:9200/music/_flush?pretty'

  3. Refresh

Explicitly refreshes the index so that all operations performed so far become visible to search, without waiting for the automatic refresh

curl -XPOST 'http://elasticsearch02:9200/music/_refresh?pretty'

  4. Force merge

Forces segment files to be merged, reducing the number of segments

curl -XPOST 'http://elasticsearch02:9200/music/_forcemerge?pretty'

These four operations are normally performed automatically by Elasticsearch and need manual intervention only in special circumstances.
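
For the rare cases where manual intervention is needed, the four endpoints can be kept behind one small helper so that the right one is easy to pick. This is only a sketch: maintenance_url and the operation names passed to it are hypothetical, while the endpoints (_cache/clear, _flush, _refresh, _forcemerge) are the standard ones. It prints URLs and does not contact the cluster.

```shell
# maintenance_url INDEX OPERATION
# Prints the URL for one of the four operations above, or fails for an
# unknown operation name.
maintenance_url() {
    case "$2" in
        clear-cache) endpoint="_cache/clear" ;;
        flush)       endpoint="_flush" ;;
        refresh)     endpoint="_refresh" ;;
        force-merge) endpoint="_forcemerge" ;;
        *) echo "unknown operation: $2" >&2; return 1 ;;
    esac
    echo "http://elasticsearch02:9200/$1/$endpoint?pretty"
}

for op in clear-cache flush refresh force-merge; do
    maintenance_url music "$op"
done
```

Each printed URL can be passed to curl -XPOST as in the commands above.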

Summary

This article briefly covered the day-to-day index operations and management tasks from an operations perspective. Being fluent in them makes working with indexes considerably more efficient.


Posted by jossejf on Sun, 21 Jun 2020 17:38:00 -0700