In large-scale web applications, caching is by now a standard part of deployment, and for applications that cache at scale, distributed caching systems came into being. The basic principle of a distributed cache is familiar to everyone: how do you distribute key-value pairs evenly across the cluster? The most conventional answer is hash modulo. If the number of machines in the cluster is N, then a request for key K is simply routed to the machine corresponding to hash(K) mod N. This scheme is indeed simple and practical. However, for rapidly growing web systems it has shortcomings. As pressure on the cache system grows, you have to add machine nodes to improve speed and the cluster's carrying capacity. But adding a machine node under the hash-modulo scheme means that, at that moment, a large share of the cached data can no longer be hit: the cache must be rebuilt, or even fully migrated, which instantly puts a heavy load on the DB and can even bring the DB servers down. So is there a scheme that avoids the drawbacks of hash modulo? See below.
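As a minimal sketch of the modulo scheme (the key names are hypothetical, and Python's built-in hash is stable only within one process), including how much data a resize invalidates:

    # Route key K to one of N machines by hash(K) mod N.
    def route(key: str, n_machines: int) -> int:
        return hash(key) % n_machines

    # Count how many keys land on a different machine when the
    # cluster grows from 5 to 6 nodes.
    keys = [f"key{i}" for i in range(100_000)]
    moved = sum(route(k, 5) != route(k, 6) for k in keys)
    print(f"{moved / len(keys):.0%} of keys move")  # roughly 5/6, ~83%

Growing from N to N+1 machines remaps all but about 1/(N+1) of the keys, which is exactly the mass cache-miss problem described above.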
Consistent Hashing:
Selecting the machine node for a piece of cached data no longer depends solely on the hash of the data's key; the machine nodes themselves are hashed as well.
(1) Hashing the machine nodes
First, compute a hash value for each machine node (how do you hash a node? Using its IP as the input is one option; there are of course others), then distribute the nodes onto a circle spanning 0 to 2^32, laid out clockwise. As shown below:
The cluster contains five machines, A, B, C, D, and E, which some hash algorithm has distributed onto the ring shown above.
(2) Access method
When a cache write request arrives with key K, compute its hash value Hash(K). Hash(K) corresponds to some point on the Figure-1 ring. If that point does not itself map to a specific machine node, search clockwise until the first machine node is found; that node is the destination. If the search passes 2^32 without finding a node, wrap around and hit the first machine node on the ring. For example, if Hash(K) falls between A and B, the request should hit node B (see above).
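Here is a rough sketch of that lookup in Python; the node names and the MD5-based ring hash are illustrative choices, not prescribed by the post:

    import hashlib
    from bisect import bisect, insort

    RING_SIZE = 1 << 32  # the ring spans 0 .. 2^32

    def ring_hash(key: str) -> int:
        # Any well-mixed hash will do; MD5 is just one common choice.
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

    class HashRing:
        def __init__(self, nodes):
            # Place each machine node on the ring by hashing its name/IP.
            self._points = sorted((ring_hash(n), n) for n in nodes)

        def add_node(self, node):
            insort(self._points, (ring_hash(node), node))

        def get_node(self, key: str) -> str:
            # Walk clockwise from Hash(K) to the first machine node;
            # past 2^32, wrap around to the first node on the ring.
            i = bisect(self._points, (ring_hash(key), ""))
            if i == len(self._points):
                i = 0
            return self._points[i][1]

    ring = HashRing(["A", "B", "C", "D", "E"])
    print(ring.get_node("user:42"))  # hits whichever node follows Hash(K)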
(3) Adding a node
Starting from the cluster in Figure-1, suppose we want to add a machine F. The process is as follows:
Add machine node F to the ring. The access policy does not change; lookups still follow (2). Cache misses remain unavoidable: the data that can no longer be hit is the data whose hash(K) fell between C and F before the node was added. So adding a node still causes some misses, but compared with the traditional hash-modulo approach, consistent hashing keeps the amount of missed data to a minimum, as the sketch below illustrates.
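Continuing the HashRing sketch from (2) (the key names are again hypothetical), we can check how little data an added node invalidates:

    ring = HashRing(["A", "B", "C", "D", "E"])
    keys = [f"key{i}" for i in range(100_000)]
    before = {k: ring.get_node(k) for k in keys}

    ring.add_node("F")  # the access policy itself does not change
    moved = sum(before[k] != ring.get_node(k) for k in keys)
    # Only keys whose hash falls on the arc F took over move --
    # on average about 1/6 of them, versus ~5/6 under hash modulo.
    print(f"{moved / len(keys):.0%} of keys move")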
Consistent hashing suppresses the redistribution of hashed keys as much as possible. To also get a good load-balancing effect when the number of servers is small, virtual nodes are introduced to ensure the servers are spread evenly around the circle, because with an ordinary hash function the servers' positions on the circle end up very unevenly distributed. The idea of virtual nodes is to place 100 to 200 points on the circle for each physical node (server). This suppresses the uneven distribution and minimizes the cache redistribution when the set of servers changes. User data is mapped to a virtual node, which means the data is actually stored on the physical server that the virtual node represents.
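A sketch of the virtual-node variant, assuming 150 replicas per server (within the 100-200 range the post suggests) and an illustrative "server#i" naming scheme for the virtual points:

    import hashlib
    from bisect import bisect

    RING_SIZE = 1 << 32

    def ring_hash(key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE

    class VirtualNodeRing:
        def __init__(self, servers, replicas=150):
            # Place `replicas` virtual nodes per physical server on the
            # circle; each virtual point maps back to its real server.
            self._points = sorted(
                (ring_hash(f"{s}#{i}"), s)
                for s in servers for i in range(replicas)
            )

        def get_server(self, key: str) -> str:
            i = bisect(self._points, (ring_hash(key), ""))
            if i == len(self._points):
                i = 0
            return self._points[i][1]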
Below is a chart showing how many virtual nodes each physical server needs.
The x-axis is the multiplier (scale) of virtual nodes needed per physical server; the y-axis is the actual number of physical servers. You can see that when there are very few physical servers, more virtual nodes are needed, and fewer otherwise. The chart shows that with 10 physical servers, each server needs roughly 100 to 200 virtual nodes to achieve real load balancing.
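The effect is easy to reproduce with a small simulation (hypothetical server and key names; exact numbers depend on the hash): as the virtual-node count per server rises, the busiest server's load approaches the ideal even share.

    import hashlib
    from bisect import bisect
    from collections import Counter

    def h(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16) % (1 << 32)

    def max_overload(num_servers: int, vnodes: int, num_keys: int = 100_000) -> float:
        ring = sorted((h(f"server{s}#{v}"), s)
                      for s in range(num_servers) for v in range(vnodes))
        points = [p for p, _ in ring]
        counts = Counter()
        for k in range(num_keys):
            i = bisect(points, h(f"key{k}")) % len(ring)
            counts[ring[i][1]] += 1
        # Ratio of the busiest server's load to a perfectly even share.
        return max(counts.values()) / (num_keys / num_servers)

    for vnodes in (1, 10, 100, 200):
        print(vnodes, round(max_overload(10, vnodes), 2))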
A question about consistent hashing: suppose a machine is added between A and B. Then, on average, half of the data that should have fallen on B becomes invalid. And the data between A and the new machine that is still sitting on B will never be used again; how does it get cleaned up? As more and more machines are added, won't the miss probability keep growing?
Although the commonly used hash modulo inevitably requires data migration, at least you can choose when to do it, say two in the morning, when traffic is certain to be small.
Suppose node B is inserted between C and A. The data that originally fell between C and B no longer looks to A but to B, so that portion of the data on A is indeed invalidated. But that is pure theory. In practice, after node B is added, the data between C and B (which originally hit A) is gradually re-saved onto B on each miss (rather than nothing being done when a miss occurs), while the invalidated portion on A is phased out by the LRU algorithm as new data comes in. So my feeling is that as machines are added, the miss probability will not fluctuate drastically.
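That gradual fill-on-miss behavior is the usual read-through pattern; a minimal sketch, assuming a hypothetical cache client with get/set methods and a load_from_db fallback:

    def read_through(key, ring, caches, load_from_db):
        # Route through the (possibly just-changed) ring; on a miss,
        # rebuild the entry on the node that now owns the key.
        node = ring.get_node(key)
        value = caches[node].get(key)
        if value is None:
            value = load_from_db(key)     # one extra DB read, once per key
            caches[node].set(key, value)
        # The stale copy on the old owner is never read again and is
        # eventually evicted by that node's LRU policy.
        return value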
In fact, consistent hashing exists precisely to solve the problem of hit rates dropping when storage nodes are added.
A practical example: Japan's mixi gradually grew its memcached cluster to more than 200 servers using this method, and did not run into the problem you describe.