memcached distributed cache

Keywords: PHP Database

1. Introduction to Distributed memcached

Although memcached is called a "distributed" cache server, the server side has no distribution functionality at all. The memcached hosts in a cluster do not communicate with each other or exchange data. The "distribution" is implemented entirely by the algorithm in the client program.

See the following sketch:

Based on the figure above, here is a brief walk-through of the set and get processes in distributed memcached.

set process:

1. The application calls set('key','value').

2. The client program applies a distribution algorithm to the key to work out which node should store it.

3. The client connects to the memcached server at that node and sends the set command.

get process:

1. The application calls get('key').

2. The client applies the same algorithm to the key to find the node that stores it.

3. The client connects to the memcached server at that node and sends the get command.
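The set and get flows above can be sketched in a few lines of PHP. This is a minimal illustration, not the article's code. The hypothetical ShardedCacheClient class keeps one plain PHP array per "server" in place of real memcached instances, so it runs without a network. The node-selection rule is passed in as a callable. Here that rule is the modulo rule discussed in the next section, and the sketch assumes 64-bit PHP, where crc32() never returns a negative number.

```php
<?php
// Client-side distribution sketch: each "server" is an in-memory PHP array standing in
// for a memcached instance (assumption: no network needed). ShardedCacheClient is a
// hypothetical class, not part of the article's code or the Memcached extension.
class ShardedCacheClient
{
    private $servers = array();  // one key => value store per "server"
    private $pick;               // callable(string $key, int $serverCount): int

    public function __construct(int $serverCount, callable $pick)
    {
        $this->servers = array_fill(0, $serverCount, array());
        $this->pick = $pick;
    }

    // Step 2: the client computes which node owns the key; the servers never coordinate.
    public function nodeFor(string $key): int
    {
        return ($this->pick)($key, count($this->servers));
    }

    // Step 3 of set: write only to the chosen node.
    public function set(string $key, $value): void
    {
        $this->servers[$this->nodeFor($key)][$key] = $value;
    }

    // Step 3 of get: read only from the chosen node; null means a cache miss.
    public function get(string $key)
    {
        $store = $this->servers[$this->nodeFor($key)];
        return array_key_exists($key, $store) ? $store[$key] : null;
    }

    // Number of keys held by one node, for inspection.
    public function countOn(int $node): int
    {
        return count($this->servers[$node]);
    }
}

// Modulo rule as the selector; assumes 64-bit PHP, where crc32() is never negative.
$client = new ShardedCacheClient(3, function (string $key, int $n): int {
    return crc32($key) % $n;
});
$client->set('key1', 'value11111');
echo $client->get('key1'), "\n";   // prints value11111
```

The stores never talk to each other. A key can therefore be found again only if set and get pick the same node, which is why the selection rule has to be deterministic.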

There are several ways to distribute data across memcached servers, and the most common is consistent hashing. To bring out its advantages by contrast, we will also cover modulo distribution. That lets us compare the strengths and weaknesses of the two.

The examples here are implemented in PHP, but what matters most are the ideas and methods, which carry over to any language.

 

2. Modulo Algorithm

What is modulo distribution? The key is converted to a 32-bit number, and that number is divided by the total number of memcached servers. The remainder is the index of the memcached server node. Using that index, we pick the memcached server and send it the command to execute.
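Here is a worked example of the modulo rule. The standard CRC-32 check value is crc32('123456789') = 0xCBF43926 = 3421780262. Because 3421780262 mod 3 = 2, with three servers that key would go to the server at index 2. The sketch below computes the same result. modPosition() is a hypothetical helper, and the sketch assumes 64-bit PHP so that the unsigned 32-bit hash fits in a PHP int.

```php
<?php
// Modulo distribution: server index = crc32(key) mod (number of servers).
// modPosition() is a hypothetical helper; assumes 64-bit PHP.
function modPosition(string $key, int $serverCount): int
{
    // sprintf('%u', ...) mirrors the article's code and yields the unsigned 32-bit value
    $hash = (int) sprintf('%u', crc32($key));
    return $hash % $serverCount;   // the remainder is the server index
}

printf("%u\n", crc32('123456789'));      // prints 3421780262
echo modPosition('123456789', 3), "\n";  // prints 2
```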

Graphic analysis:

The whole process is shown in the figure above.

1) PHP Code Implementation

GetModMemcache.class.php

<?php
# Distributed memcached (modulo distribution)
class GetModMemcache
{
    private $total = 0;          # total number of memcached servers
    private $servers = array();  # details (host, port) of each memcached server

    /**
     * @desc Constructor
     *
     * @param $serversArr array | memcached server details, indexed 0..n-1
     */
    public function __construct($serversArr)
    {
        $this->total = count($serversArr);
        $this->servers = $serversArr;
    }

    /**
     * @desc Calculate the storage location of $key (that is, which server holds it)
     *
     * @param $key string | key string
     *
     * @return int index of the server in the pool
     */
    protected function position($key)
    {
        # crc32() converts the string to a 32-bit number; '%u' keeps it unsigned
        return (int) sprintf('%u', crc32($key)) % $this->total;   # remainder = server index
    }

    /**
     * @desc Get a Memcached object connected to the chosen server
     *
     * @param $position int | server index returned by position()
     *
     * @return Memcached the instantiated Memcached object
     */
    protected function getMemcached($position)
    {
        $host = $this->servers[$position]['host'];   # host of the chosen server
        $port = $this->servers[$position]['port'];   # port of the chosen server
        $m = new Memcached();
        $m->addServer($host, (int) $port);
        return $m;
    }

    /**
     * @desc Store a key-value pair
     *
     * @param $key   string | key string
     * @param $value mixed  | any PHP value except a resource
     *
     * @return bool true on success
     */
    public function setKey($key, $value)
    {
        $num = $this->position($key);
        echo $num;                        # debugging: print the chosen server index
        $m = $this->getMemcached($num);   # Memcached object for that server
        return $m->set($key, $value);
    }

    /**
     * @desc Read the value stored under $key (false on a miss)
     */
    public function getKey($key)
    {
        $num = $this->position($key);
        $m = $this->getMemcached($num);
        return $m->get($key);
    }
}


$arr = array(
    array('host' => '192.168.95.11', 'port' => '11210'),
    array('host' => '192.168.95.11', 'port' => '11211'),
    array('host' => '192.168.95.11', 'port' => '11212'),
);
$mod = new GetModMemcache($arr);

/*
# Store data
$a = $mod->setKey('key3', 'key33333');
echo "<pre>";
print_r($a);
echo "</pre>"; die;
*/
/*
# Get data
$b = $mod->getKey('key1');
echo "<pre>";
print_r($b);
echo "</pre>"; die;
*/
?>

 

2) Testing

1. Insert three items in a row:

  #set('key1','value11111');  #node=1

  #set('key2','value22222');  #node=1

  #set('key3','value33333');  #node=0

2. Connect via telnet to 192.168.95.11 on ports 11210, 11211 and 11212:

Port 11210 (node 0) holds key3

Port 11211 (node 1) holds key1 and key2

Port 11212 (node 2) holds no data

3. Read the data back through the program

All three values can be retrieved.

3) Advantages and disadvantages

Advantages:

1. Simple, practical and easy to understand

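2. Data is distributed evenly across the servers

Disadvantages:

1. When a memcached server goes down, the scheme cannot adjust automatically. Keys that map to the dead server can no longer be cached, so every request for them falls through to the database. Removing the dead server from the list does not help either: it changes the divisor and remaps most of the other keys as well.

2. When you scale out by adding memcached servers, the divisor changes, so most keys map to a different server and most of the cached data can no longer be hit. In effect, the existing cache becomes useless.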
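Both disadvantages can be put into numbers. Assume hash values are spread evenly. Going from 3 to 4 servers, a key keeps its server only if h mod 3 equals h mod 4, which holds for 3 of every 12 consecutive values, so about 75% of keys move. Dropping one server (3 to 2) keeps a key in place only for 2 of every 6 values, so about two thirds move, even though only a third of the keys lived on the removed server. The sketch below measures this on 1000 made-up keys (key0 to key999). The helper names are hypothetical, 64-bit PHP is assumed, and comparing indices assumes that the server added or removed is the last one in the list.

```php
<?php
// Counts how many keys change server under modulo distribution when the number of
// servers changes. modIndex() and movedFraction() are hypothetical helpers; the keys
// are made-up test data; 64-bit PHP is assumed so crc32() is non-negative.
function modIndex(string $key, int $serverCount): int
{
    return crc32($key) % $serverCount;
}

// Fraction of $keys whose server index differs between $before and $after servers.
// Comparing indices assumes only the last server in the list is added or removed.
function movedFraction(array $keys, int $before, int $after): float
{
    $moved = 0;
    foreach ($keys as $key) {
        if (modIndex($key, $before) !== modIndex($key, $after)) {
            $moved++;
        }
    }
    return $moved / count($keys);
}

$keys = array();
for ($i = 0; $i < 1000; $i++) {
    $keys[] = 'key' . $i;
}

printf("scale out 3 -> 4: %.1f%% of keys change server\n", 100 * movedFraction($keys, 3, 4));
printf("drop one, 3 -> 2: %.1f%% of keys change server\n", 100 * movedFraction($keys, 3, 2));
```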

 

3. Consistent Hashing Algorithm

What is consistent hash distribution?

Imagine all 32-bit numbers (0 to 2^32 - 1) laid out clockwise in ascending order around a circle.

Next, give each storage node a name and convert the name to a 32-bit number with crc32. That number is the memcached server's position (its storage node) on the circle.

Then convert the key to a 32-bit number with crc32, find its position on the circle, and walk clockwise. The memcached server at the first storage node you reach is where the key is stored.
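The ring lookup can be sketched without memcached at all. This minimal version has no virtual nodes yet. Each node name is placed at crc32(name), and the ring is a PHP array sorted by position. A key goes to the first position at or after its own hash, and it wraps to the start of the ring if it hashes past the last node. buildRing() and ringLookup() are hypothetical helper names, and 64-bit PHP is assumed so that the 32-bit hashes are non-negative ints.

```php
<?php
// Consistent-hash ring without virtual nodes. buildRing() and ringLookup() are
// hypothetical helpers; 64-bit PHP is assumed so crc32() values are non-negative.
function buildRing(array $nodeNames): array
{
    $ring = array();
    foreach ($nodeNames as $name) {
        $ring[crc32($name)] = $name;   // node position on the 0 .. 2^32-1 circle
    }
    ksort($ring);                      // ascending positions = clockwise order
    return $ring;
}

// Returns the first node clockwise from $hash, wrapping around past the last node.
function ringLookup(array $ring, int $hash): string
{
    foreach ($ring as $position => $name) {
        if ($position >= $hash) {
            return $name;              // first node at or after the key's position
        }
    }
    return reset($ring);               // past the last node: wrap to the first
}

$ring = buildRing(array('node1', 'node2', 'node3'));
echo ringLookup($ring, crc32('aaa')), "\n";   // the node that stores 'aaa'
```

ringLookup() takes a raw hash instead of a key, which makes the wrap-around easy to check by hand. On a toy ring with nodes at 100, 200 and 300, a hash of 150 lands on the node at 200, and a hash of 301 wraps around to the node at 100.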

1) Graphical analysis

Suppose the node1 server goes down. By the clockwise-nearest rule, the data that was stored on node1 is now stored on node3, the next node clockwise. Keys on the other nodes are not affected.

Suppose we need to scale out by adding two more memcached servers. See the chart below for analysis.

As a result, only a small portion of the data is affected: the keys that fall between each new node and the node before it on the circle. That is acceptable relative to the data set as a whole.

The illustrations above make it clear that crc32 gives us no control over where the memcached storage nodes land, and the number of nodes is tiny compared with 2^32. If the nodes happen to land close together, one memcached server ends up holding the vast majority of the cached data.
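The failure case can be checked directly. Under consistent hashing, removing a node should re-home only the keys that were on that node, and each of them should move to the next node clockwise. Every other key stays where it was. The sketch compares a three-node ring with the same ring minus node1 over 1000 hypothetical keys. The helper names are illustrative, there are no virtual nodes, and 64-bit PHP is assumed.

```php
<?php
// Failure case on a consistent-hash ring (no virtual nodes): remove node1 and check
// which keys change owner. makeRing()/locate() are hypothetical helpers, the keys are
// made-up test data, and 64-bit PHP is assumed.
function makeRing(array $names): array
{
    $ring = array();
    foreach ($names as $name) {
        $ring[crc32($name)] = $name;   // node position on the circle
    }
    ksort($ring);                      // clockwise order
    return $ring;
}

function locate(array $ring, string $key): string
{
    $hash = crc32($key);
    foreach ($ring as $position => $name) {
        if ($position >= $hash) {
            return $name;              // first node clockwise from the key
        }
    }
    return reset($ring);               // wrap around
}

$before = makeRing(array('node1', 'node2', 'node3'));
$after  = makeRing(array('node2', 'node3'));   // node1 is down

$onNode1 = 0;      // keys that lived on node1 before the failure
$moved = 0;        // keys whose owner changed
$movedOthers = 0;  // moved keys that were NOT on node1 (expected to stay 0)
for ($i = 0; $i < 1000; $i++) {
    $key = 'key' . $i;
    $old = locate($before, $key);
    $new = locate($after, $key);
    if ($old === 'node1') {
        $onNode1++;
    }
    if ($old !== $new) {
        $moved++;
        if ($old !== 'node1') {
            $movedOthers++;
        }
    }
}
printf("%d keys were on node1, %d keys moved, %d moved from other nodes\n", $onNode1, $moved, $movedOthers);
```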
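For the scale-out case, the sketch below adds node4 and node5 to a three-node ring and counts how many of 1000 hypothetical keys change servers. It also shows the count for modulo distribution going from 3 to 5 servers. With evenly spread hashes, modulo keeps a key in place only when h mod 3 equals h mod 5 (3 of every 15 values), so roughly 80% of keys move. On the ring, a key can only move to one of the new nodes. How many keys move depends on how much of the circle crc32 happens to give the new nodes. The names are illustrative, and 64-bit PHP is assumed.

```php
<?php
// Scale-out from 3 to 5 servers: modulo distribution vs a plain consistent-hash ring
// (no virtual nodes). Helper names and keys are hypothetical; 64-bit PHP is assumed.
function ringFrom(array $names): array
{
    $ring = array();
    foreach ($names as $name) {
        $ring[crc32($name)] = $name;   // node position on the circle
    }
    ksort($ring);                      // clockwise order
    return $ring;
}

function ownerOf(array $ring, string $key): string
{
    $hash = crc32($key);
    foreach ($ring as $position => $name) {
        if ($position >= $hash) {
            return $name;
        }
    }
    return reset($ring);               // wrap around
}

$oldRing = ringFrom(array('node1', 'node2', 'node3'));
$newRing = ringFrom(array('node1', 'node2', 'node3', 'node4', 'node5'));
$newNodes = array('node4', 'node5');

$modMoved = 0;         // keys whose modulo index changes (3 -> 5 servers)
$ringMoved = 0;        // keys whose ring owner changes
$ringMovedToOld = 0;   // ring moves that land on an old node (expected to stay 0)
for ($i = 0; $i < 1000; $i++) {
    $key = 'key' . $i;
    if (crc32($key) % 3 !== crc32($key) % 5) {
        $modMoved++;
    }
    $oldOwner = ownerOf($oldRing, $key);
    $newOwner = ownerOf($newRing, $key);
    if ($oldOwner !== $newOwner) {
        $ringMoved++;
        if (!in_array($newOwner, $newNodes, true)) {
            $ringMovedToOld++;
        }
    }
}
printf("modulo: %d of 1000 keys move; ring: %d of 1000 keys move\n", $modMoved, $ringMoved);
```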

See the following chart for analysis:

Solution:

Map each real storage node to multiple virtual storage nodes: append a suffix to the real node's name and hash each result with crc32 (e.g. node1_1, node1_2, node1_3, ..., node1_n).

Look at the node distribution in the following figure:

Three real nodes become thirty storage nodes on the ring. This avoids the uneven distribution caused by storage nodes landing too close together, while the lookup mechanism itself stays the same.
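Virtual nodes change only how the ring is built; the lookup is the same clockwise search. In this sketch, each real node is hashed ten times with a suffix (node1_0 to node1_9, and so on), which matches the three-real-nodes-to-thirty-positions picture above. The article's class uses a '-' separator and 64 virtual nodes instead. virtualRing() and realNodeFor() are hypothetical helpers, and 64-bit PHP is assumed.

```php
<?php
// Virtual nodes: each real node gets $replicas positions on the ring, named
// "<node>_<i>", and every position maps back to its real node. virtualRing() and
// realNodeFor() are hypothetical helpers; keys are made-up; 64-bit PHP is assumed.
function virtualRing(array $realNodes, int $replicas): array
{
    $ring = array();
    foreach ($realNodes as $node) {
        for ($i = 0; $i < $replicas; $i++) {
            $ring[crc32($node . '_' . $i)] = $node;   // virtual position => real node
        }
    }
    ksort($ring);                                     // clockwise order
    return $ring;
}

function realNodeFor(array $ring, string $key): string
{
    $hash = crc32($key);
    foreach ($ring as $position => $node) {
        if ($position >= $hash) {
            return $node;                             // same clockwise rule as before
        }
    }
    return reset($ring);                              // wrap around
}

$ring = virtualRing(array('node1', 'node2', 'node3'), 10);   // 3 real nodes, 10 virtual each

$counts = array('node1' => 0, 'node2' => 0, 'node3' => 0);
for ($i = 0; $i < 3000; $i++) {
    $counts[realNodeFor($ring, 'key' . $i)]++;
}
echo count($ring), " positions on the ring\n";
print_r($counts);   // keys per real node
```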

2) PHP Code Implementation

ConsistentHashMemcache.class.php

<?php
# Distributed memcached using a consistent hashing algorithm (ring data structure)
class ConsistentHashMemcache
{
    private $virtualNode = 64;     # number of virtual nodes per real node
    private $realNode = array();   # real node names (keys of the server array)
    private $servers = array();    # memcached server information
    private $totalNode = array();  # the ring: virtual node hash => real node name

    /**
     * @desc Constructor
     *
     * @param $servers     array | memcached server information, keyed by node name
     * @param $virtualNode int   | number of virtual nodes per real node, default 64
     */
    public function __construct($servers, $virtualNode = 64)
    {
        $this->servers = $servers;
        $this->realNode = array_keys($servers);
        $this->virtualNode = $virtualNode;
        # build the ring once here instead of on every lookup
        $this->totalNode = $this->dealNode($this->realNode, $this->virtualNode);
    }

    /**
     * @return string the unsigned 32-bit hash of $str, as a decimal string
     */
    private function hash($str)
    {
        return sprintf('%u', crc32($str));   # convert the string to an unsigned 32-bit number
    }

    /**
     * @desc Build the ring of virtual nodes
     *
     * @param $realNode    array | real node names
     * @param $virtualNode int   | number of virtual nodes per real node
     *
     * @return array all virtual nodes (hash => real node), sorted by hash
     */
    private function dealNode($realNode, $virtualNode)
    {
        $totalNode = array();
        foreach ($realNode as $v) {
            for ($i = 0; $i < $virtualNode; $i++) {
                $hashNode = $this->hash($v . '-' . $i);
                $totalNode[$hashNode] = $v;
            }
        }
        ksort($totalNode);     # sort by hash, ascending (clockwise around the ring)
        return $totalNode;
    }

    /**
     * @desc Get the real node that stores $key
     *
     * @param $key string | key string
     *
     * @return string the real node name
     */
    private function getNode($key)
    {
        $totalNode = $this->totalNode;          # all virtual nodes
        /* # View all virtual nodes
        echo "<pre>";
        print_r($totalNode);
        echo "</pre>"; die;
        */
        $hashNode = $this->hash($key);          # hash of the key
        foreach ($totalNode as $k => $v) {      # walk the ring clockwise
            if ($k >= $hashNode) {              # first virtual node whose hash >= the key's hash
                return $v;                      # return its real node
            }
        }
        return reset($totalNode);               # key hashed past the last virtual node: wrap around to the first one
    }

    /**
     * @desc Return a Memcached object connected to the node that owns $key
     *
     * @param $key string | key string
     *
     * @return Memcached
     */
    private function getMemcached($key)
    {
        $node = $this->getNode($key);                   # real node
        echo $key . ' real node: ' . $node . '<br/>';  # for testing: show which node holds the key
        $host = $this->servers[$node]['host'];          # host of that server
        $port = $this->servers[$node]['port'];          # port of that server
        $m = new Memcached();                           # instantiate
        $m->addServer($host, (int) $port);              # add the memcached server
        return $m;                                      # return the Memcached object
    }

    /**
     * @desc Store a key-value pair
     */
    public function setKey($key, $value)
    {
        $m = $this->getMemcached($key);
        return $m->set($key, $value);
    }

    /**
     * @desc Get the value stored under $key
     */
    public function getKey($key)
    {
        $m = $this->getMemcached($key);
        return $m->get($key);
    }
}

?>
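getNode() scans the ring from the start on every lookup, so each lookup costs time linear in the number of virtual nodes. A common refinement, sketched here and not taken from the article, keeps the sorted positions in a list and binary-searches for the first position at or after the key's hash. HashRing is a hypothetical class. It builds the ring with the same node-'-'-index naming as dealNode(), so for the same inputs it should pick the same real node as getNode(), assuming 64-bit PHP.

```php
<?php
// Binary-search ring lookup. HashRing is a hypothetical class for illustration,
// not part of the article's code or any library; 64-bit PHP is assumed.
class HashRing
{
    private $positions = array();  // sorted ring positions
    private $owners = array();     // position => real node name

    public function __construct(array $realNodes, int $virtualNodes = 64)
    {
        foreach ($realNodes as $node) {
            for ($i = 0; $i < $virtualNodes; $i++) {
                $this->owners[crc32($node . '-' . $i)] = $node;  // same naming as dealNode()
            }
        }
        ksort($this->owners);
        $this->positions = array_keys($this->owners);
    }

    // First position >= crc32($key), found by binary search over [lo, hi).
    public function nodeFor(string $key): string
    {
        $hash = crc32($key);
        $lo = 0;
        $hi = count($this->positions);
        while ($lo < $hi) {
            $mid = intdiv($lo + $hi, 2);
            if ($this->positions[$mid] < $hash) {
                $lo = $mid + 1;
            } else {
                $hi = $mid;
            }
        }
        // $lo === count() means the key hashed past the last position: wrap to the first.
        $index = $lo === count($this->positions) ? 0 : $lo;
        return $this->owners[$this->positions[$index]];
    }
}

$ring = new HashRing(array('node1', 'node2', 'node3'));
echo $ring->nodeFor('aaa'), "\n";   // name of the node that owns 'aaa'
```

The returned node name can then be used to look up the host and port in the server array, as getMemcached() does.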

3) Testing

1. View all virtual nodes

There are 64 * 3 = 192 virtual nodes in total. (64 per node is still a relatively small setting; the usual range is 100 to 200.) To print them, uncomment the print_r block in getNode().

2. set test

<?php
include './ConsistentHashMemcache.class.php';
header("Content-Type: text/html; charset=utf-8");
$arr = array(
    'node1' => array('host' => '192.168.95.11', 'port' => '11210'),
    'node2' => array('host' => '192.168.95.11', 'port' => '11211'),
    'node3' => array('host' => '192.168.95.11', 'port' => '11212'),
);

$c = new ConsistentHashMemcache($arr);

# test set
$c->setKey('aaa', '11111');
$c->setKey('bbb', '22222');
$c->setKey('ccc', '33333');

 

Connect via telnet to 192.168.95.11 on ports 11210, 11211 and 11212:

On node1 (port 11210), get('aaa') and get('bbb') return their values.

On node3 (port 11212), get('ccc') returns its value.

3. get test

<?php
include './ConsistentHashMemcache.class.php';
header("Content-Type: text/html; charset=utf-8");
$arr = array(
    'node1' => array('host' => '192.168.95.11', 'port' => '11210'),
    'node2' => array('host' => '192.168.95.11', 'port' => '11211'),
    'node3' => array('host' => '192.168.95.11', 'port' => '11212'),
);

$c = new ConsistentHashMemcache($arr);
# test get
echo $c->getKey('aaa') . '<br/>';
echo $c->getKey('bbb') . '<br/>';
echo $c->getKey('ccc') . '<br/>';

4. Advantages and disadvantages

Compared with modulo distribution, consistent hashing takes more code, but the extra complexity is manageable. Its advantages, on the other hand, are significant. With virtual nodes, storage nodes whose positions we cannot control are spread around the ring as evenly as possible, so data is cached evenly across the hosts. In addition, adding or removing a server affects only a small part of the data cached before.

 

(These are my own opinions and summaries. If there are any shortcomings or mistakes, please point them out.)

Author: That leaf follows the wind

Statement: The above represents only my own views or conclusions from a certain period of work and study. When reprinting, please link to the original article in a prominent place on the page.

Posted by The_Anomaly on Mon, 08 Jul 2019 12:47:21 -0700