MongoDB Replica Set Configuration and Data Migration in Practice

Keywords: MongoDB Ubuntu vim

Original Link: https://my.oschina.net/u/168138/blog/1838207


https://gitee.com/et/ops/blob/master/MongoDB Replica Set Configuration and Data Migration Actual.md

Environment: Ubuntu 16.04, MongoDB 3.6

Basic concepts

A MongoDB replica set is a MongoDB master-slave cluster with automatic failover and recovery. Because MongoDB's original master-slave replication does not support high availability, it has been deprecated since version 3.2 and replaced by replica sets. A replica set always has one active node (Primary) and one or more Secondary nodes, plus an optional Arbiter node, to implement failover for high availability.
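
A client does not need to know which member is currently the Primary: it connects with a replica set connection string listing the members, and the driver (or the mongo shell) follows the Primary automatically. A minimal sketch, assuming the member addresses and replica set name configured later in this article:

    $ mongo "mongodb://192.168.10.58:27017,192.168.10.59:27017/test?replicaSet=my-repl"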

Preparation

Configure replica set Primary and Secondary nodes

  • Create the data directory
$ mkdir -p /mnt/mongodb/replset
  • Start a mongod for the replica set 'my-repl' on port 27017, bound to all IP addresses (you can also bind a specific IP); a config-file equivalent is sketched after the command
$ mongod --dbpath /mnt/mongodb/replset --port 27017 --replSet "my-repl" --bind_ip_all
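The same options can also be kept in a YAML configuration file, which is easier to manage when mongod runs as a service. A minimal sketch, assuming a hypothetical /etc/mongod-replset.conf started with mongod -f:

    # /etc/mongod-replset.conf (hypothetical path, equivalent to the command line above)
    storage:
      dbPath: /mnt/mongodb/replset
    net:
      port: 27017
      bindIpAll: true          # same effect as --bind_ip_all (requires MongoDB 3.6+)
    replication:
      replSetName: my-repl

    $ mongod -f /etc/mongod-replset.conf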
  • Initialize replica set
    • Connect to the Primary node with the mongo client:
    $ mongo
    
    • Execute an initialization script to create a replica set:
    > rs.initiate({
     _id:"my-repl",
      members:[
        {_id:0, host:"192.168.10.58:27017"},
        {_id:1, host:"192.168.10.59:27017"}
      ]
    });
    
    Output results:
    {
    	"ok" : 1,
    	"operationTime" : Timestamp(1523237919, 1),
    	"$clusterTime" : {
    		"clusterTime" : Timestamp(1523237919, 1),
    		"signature" : {
    			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
    			"keyId" : NumberLong(0)
    		}
    	}
    }
    
    • View Configuration Results
    > rs.conf();
    {
    	"_id" : "my-repl",
    	"version" : 1,
    	"protocolVersion" : NumberLong(1),
    	"members" : [
    		{
    			"_id" : 0,
    			"host" : "192.168.10.58:27017",
    			"arbiterOnly" : false,
    			"buildIndexes" : true,
    			"hidden" : false,
    			"priority" : 1,
    			"tags" : {
    
    			},
    			"slaveDelay" : NumberLong(0),
    			"votes" : 1
    		},
    		{
    			"_id" : 1,
    			"host" : "192.168.10.59:27017",
    			"arbiterOnly" : false,
    			"buildIndexes" : true,
    			"hidden" : false,
    			"priority" : 1,
    			"tags" : {
    
    			},
    			"slaveDelay" : NumberLong(0),
    			"votes" : 1
    		}
    	],
    	"settings" : {
    		"chainingAllowed" : true,
    		"heartbeatIntervalMillis" : 2000,
    		"heartbeatTimeoutSecs" : 10,
    		"electionTimeoutMillis" : 10000,
    		"catchUpTimeoutMillis" : -1,
    		"catchUpTakeoverDelayMillis" : 30000,
    		"getLastErrorModes" : {
    
    		},
    		"getLastErrorDefaults" : {
    			"w" : 1,
    			"wtimeout" : 0
    		},
    		"replicaSetId" : ObjectId("5acac41fded47067da446ddd")
    	}
    }
    
    The configuration process is very simple. You can see that the replica set has two members, 0 and 1; once both servers are working, any data change on server 0 (Primary) is synchronized to server 1 (Secondary). A quick way to verify the synchronization is sketched below. However, at this point the replica set only provides data backup and is not yet highly available; to achieve that, you need to configure an Arbiter node.
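
    To check that writes on the Primary really reach the Secondary, you can insert a throwaway document on the Primary and read it back on the Secondary. A minimal sketch in the mongo shell (repl_test is just an illustrative collection name; rs.slaveOk() is the MongoDB 3.6 call, newer shells use rs.secondaryOk()):

    // on the Primary (192.168.10.58)
    my-repl:PRIMARY> db.repl_test.insert({msg: "hello replication"})

    // on the Secondary (192.168.10.59)
    my-repl:SECONDARY> rs.slaveOk()        // allow reads on this Secondary
    my-repl:SECONDARY> db.repl_test.find({msg: "hello replication"})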

Configure Arbiter

When the Primary node fails, the arbiter votes in the replica set election to decide which member becomes the new Primary. An arbiter node stores no data and will never become the Primary itself.

  • Arbiters are usually deployed on servers without much disk space, so to minimize the data created by default, modify the configuration:

    $ vim /etc/mongod.conf
    
    storage:
      journal:
        enabled: false
      mmapv1:
        smallFiles: true
    
  • Create the arbiter directory and start the service

    $ mkdir /mnt/mongodb/arbiter
    $ mongod --port 27017 --dbpath /mnt/mongodb/arbiter --replSet 'my-repl' --bind_ip_all
    
  • Add the arbiter to the replica set

    • Connect to Primary Server

      $ mongo --host 192.168.10.58
      
      my-repl:PRIMARY> rs.addArb("192.168.10.57:27017")
      {
      	"ok" : 1,
      	"operationTime" : Timestamp(1523326877, 1),
      	"$clusterTime" : {
      		"clusterTime" : Timestamp(1523326877, 1),
      		"signature" : {
      			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
      			"keyId" : NumberLong(0)
      		}
      	}
      }
      
  • View the effect of the replica set:

    my-repl:PRIMARY> rs.status();
    {
    	"set" : "my-repl",
    	"date" : ISODate("2018-04-10T02:21:44.826Z"),
    	"myState" : 1,
    	"term" : NumberLong(2),
    	"heartbeatIntervalMillis" : NumberLong(2000),
    	"optimes" : {
    		"lastCommittedOpTime" : {
    			"ts" : Timestamp(1523326895, 1),
    			"t" : NumberLong(2)
    		},
    		"readConcernMajorityOpTime" : {
    			"ts" : Timestamp(1523326895, 1),
    			"t" : NumberLong(2)
    		},
    		"appliedOpTime" : {
    			"ts" : Timestamp(1523326895, 1),
    			"t" : NumberLong(2)
    		},
    		"durableOpTime" : {
    			"ts" : Timestamp(1523326895, 1),
    			"t" : NumberLong(2)
    		}
    	},
    	"members" : [
    		{
    			"_id" : 0,
    			"name" : "192.168.10.58:27017",
    			"health" : 1,
    			"state" : 1,
    			"stateStr" : "PRIMARY",
    			"uptime" : 2891,
    			"optime" : {
    				"ts" : Timestamp(1523326895, 1),
    				"t" : NumberLong(2)
    			},
    			"optimeDate" : ISODate("2018-04-10T02:21:35Z"),
    			"electionTime" : Timestamp(1523324284, 1),
    			"electionDate" : ISODate("2018-04-10T01:38:04Z"),
    			"configVersion" : 2,
    			"self" : true
    		},
    		{
    			"_id" : 1,
    			"name" : "192.168.10.59:27017",
    			"health" : 1,
    			"state" : 2,
    			"stateStr" : "SECONDARY",
    			"uptime" : 2624,
    			"optime" : {
    				"ts" : Timestamp(1523326895, 1),
    				"t" : NumberLong(2)
    			},
    			"optimeDurable" : {
    				"ts" : Timestamp(1523326895, 1),
    				"t" : NumberLong(2)
    			},
    			"optimeDate" : ISODate("2018-04-10T02:21:35Z"),
    			"optimeDurableDate" : ISODate("2018-04-10T02:21:35Z"),
    			"lastHeartbeat" : ISODate("2018-04-10T02:21:43.080Z"),
    			"lastHeartbeatRecv" : ISODate("2018-04-10T02:21:43.083Z"),
    			"pingMs" : NumberLong(0),
    			"syncingTo" : "192.168.10.58:27017",
    			"configVersion" : 2
    		},
    		{
    			"_id" : 2,
    			"name" : "192.168.10.57:27017",
    			"health" : 1,
    			"state" : 7,
    			"stateStr" : "ARBITER",
    			"uptime" : 27,
    			"lastHeartbeat" : ISODate("2018-04-10T02:21:43.079Z"),
    			"lastHeartbeatRecv" : ISODate("2018-04-10T02:21:42.088Z"),
    			"pingMs" : NumberLong(0),
    			"configVersion" : 2
    		}
    	],
    	"ok" : 1,
    	"operationTime" : Timestamp(1523326895, 1),
    	"$clusterTime" : {
    		"clusterTime" : Timestamp(1523326895, 1),
    		"signature" : {
    			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
    			"keyId" : NumberLong(0)
    		}
    	}
    }
    

    You can see the status display: Server 0 is Primary, Server 1 is Secondary, Server 2 is Arbiter.
    Now that the highly available replica set is configured, the Arbiter monitors the Primary node; if server 0 goes down, the Arbiter initiates an election that promotes one of the Secondaries to Primary. Since there is only one Secondary node in the architecture we are testing, server 1 will become the Primary. When server 0 resumes working, it will run as a Secondary.
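
    You can exercise the failover without actually shutting down server 0 by asking the Primary to step down for a while; a minimal sketch, assuming the members above:

    // on the current Primary (192.168.10.58)
    my-repl:PRIMARY> rs.stepDown(60)    // give up the Primary role for at least 60 seconds

    // then, from any member, check which node won the election
    // (192.168.10.59 should now report PRIMARY and 192.168.10.58 SECONDARY)
    my-repl:SECONDARY> rs.status().members.forEach(function (m) { print(m.name + " -> " + m.stateStr); })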

  • Replica Priority
    A replica set promotes a Secondary to Primary when the Primary node fails, but in many cases the Secondary is only a standby node and we do not want it to act as the Primary for a long time. To control this, we change the members' priorities.

    • Connect to the Primary node and execute the following script:
    cfg = rs.conf()
    cfg.members[0].priority = 10
    cfg.members[1].priority = 5
    rs.reconfig(cfg)
    {
    	"ok" : 1,
    	"operationTime" : Timestamp(1523411797, 2),
    	"$clusterTime" : {
    		"clusterTime" : Timestamp(1523411797, 2),
    		"signature" : {
    			"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
    			"keyId" : NumberLong(0)
    		}
    	}
    }
    
    • If run on a non-Primary member, such as the Arbiter, the following error is reported:
    {
      "ok" : 0,
      "errmsg" : "replSetReconfig should only be run on PRIMARY, but my state is ARBITER; use the \"force\" argument to override",
      "code" : 10107,
      "codeName" : "NotMaster"
    }
    
    • With these priorities, server 0 takes precedence over server 1, and when server 0 recovers from a failure it becomes the Primary node again. A quick way to verify the new priorities is sketched below.
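
    To confirm the reconfiguration took effect, you can print each member's priority from rs.conf(); a minimal sketch (an arbiter always has priority 0):

    my-repl:PRIMARY> rs.conf().members.forEach(function (m) { print(m.host + " priority=" + m.priority); })
    // expect 192.168.10.58 -> 10, 192.168.10.59 -> 5, 192.168.10.57 (arbiter) -> 0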

Data migration

Before configuring the replica set, you may already have a standalone MongoDB instance that stores data, and that data needs to be migrated to the new replica set. (You could also configure the original standalone instance as the Primary of the replica set and then add new members to synchronize the data, but that is outside the scope of this article.)

  • Log in to the server where the original instance is located and export the data:

    $ mongoexport -h localhost --port 27017 -u xxx -p xxx -d MY_DB_NAME -c MY_COLLECTION_NAME -o MY_COLLECTION_NAME.dmp
    

    A collection of about 1.2 GB was exported in a test on Alibaba Cloud, with the following export performance:

    • Time: 3 minutes 11 seconds
    • Exported 1728423 records; records read per second = 1728423 / 191 ≈ 9050/sec
    • Exported 1264441038 bytes; bytes processed per second = 1264441038 / 191 ≈ 6.6 MB/sec
  • Copy the exported data file MY_COLLECTION_NAME.dmp to server 0, where the Primary node runs, by any means (such as scp)

  • Import the data into the replica set Primary node

    $ mongoimport -h my-repl/192.168.10.58:27017 -d MY_DB_NAME -c MY_COLLECTION_NAME --file MY_COLLECTION_NAME.dmp
    

    Import performance in this test:

    • Time: 82 seconds
    • Imported 1728423 records; records written per second = 1728423 / 82 ≈ 21078/sec
    • Imported 1264441038 bytes; bytes processed per second = 1264441038 / 82 ≈ 14.7 MB/sec

    Note: since the export and import ran on different servers, these numbers cannot be used to compare export and import performance directly; they are only a rough reference.

  • Summary:

    • After the data is imported on the Primary node, log in to the Secondary node and you can see that the 1.7+ million documents have been replicated, so the master-slave replication succeeded (a quick check is sketched after this list).
    • From the performance analysis results, MongoDB has good read and write performance, especially when importing and synchronizing data between replicas.
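
    A quick way to do the check mentioned above is to count the documents on the Secondary (MY_DB_NAME and MY_COLLECTION_NAME are the same placeholders as in the commands above):

    $ mongo --host 192.168.10.59
    my-repl:SECONDARY> rs.slaveOk()
    my-repl:SECONDARY> use MY_DB_NAME
    my-repl:SECONDARY> db.MY_COLLECTION_NAME.count()
    1728423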

Reprinted at: https://my.oschina.net/u/168138/blog/1838207

Posted by Goofan on Wed, 11 Sep 2019 13:33:27 -0700