Consistent hashing (Consistent Hashing)

2010-05-25  来源:本站原创  分类:Java  人气:210 

Reprinted from: http://hi.baidu.com/fdwm_lx/blog/item/fe46344e11517705b3de054c.html

In the large-scale web applications, the cache can be regarded as a standard development of today's deployed. Applications in large-scale cache, distributed caching system came into being. The basic principle of the distributed cache system, we all hear about it. key-value how evenly distributed to the cluster? Having said this, the most conventional way than the way hash modulo. Amount of such machines in the cluster can be used as N, then the key value K of the data request should be routed to a very simple hash (K) mod N corresponds to the machine. Indeed, this structure is simple, and it is practical. However, in some high-speed development of web systems, such a solution is still some shortcomings. As the system pressure to increase access to the cache system had to increase the machine nodes by way of improving the speed and data clusters corresponding carrying capacity. And machinery means in accordance with the hash modulus way to increase the machine nodes in this moment, a lot of life is not in the cache, the cache data need to re-establish, or even the whole cache is for data migration, will instantly bring a high DB the system load, set the cause DB server downtime. Then there is no solution modulo way hash brought criticism it? See below.

Consistent hashing (Consistent Hashing):

Select specific nodes in the machine need not rely solely on caching data key of the hash itself, but rather the machine node has also conducted a hash operation.

(1) hash machine node

First, find the machine node hash value (how the machine operator node hash? Ip parameters as hash bar.. Of course there are other ways a), and then distributed to the 0 to 2 ^ 32 of a circle ( clockwise distribution). As shown below:
Consistent hashing (Consistent Hashing)

Cluster in the machine: A, B, C, D, E five machines, through certain of its hash algorithm will be distributed to the ring as shown above.

(2) access method

If you have a write cache of the request, including Key value of K, calculator hash value Hash (K), Hash (K) corresponds to Figure - 1 ring a certain point, if the point is not mapped to correspond to a specific a machine node, then clockwise to find, until the first machine to find a map node, the node is to determine the destination node, if more than 2 ^ 32, still can not find the node, then hit the first machine node. Such as Hash (K) between the value of between A ~ B, then hit the machine node should be the node B (see above).

(3) to increase the processing nodes

As Figure - 1, based on the original cluster desire to increase a machine F, increase the process is as follows:

The Hash value of the computer node, the machine is mapped to a node in the ring, as shown below:
Consistent hashing (Consistent Hashing)

Increase the machine node F, the access policy does not change, still according to (2) in access, then life is not in the cache remained unavoidable, can not hit the data is hash (K) increase in the node before the fall C ~ F between the data. Despite the increase in the node still hit problems, but more traditional way hash modulus, consistency has not hit the hash data to a minimum.

Consistent Hashing maximum inhibition of the re-distribution of hash keys. Also to get better load balancing effect, the number of servers less often when the need to increase the virtual node to ensure that the server can be distributed evenly in the circle. Because the hash using the general method of mapping the server very uneven distribution of locations. Thinking of using virtual nodes for each physical node (server) in the circle on the distribution of 100 to 200 points. This uneven distribution can be inhibited, to minimize the server's cache when changes in the redistribution. User data mapping the virtual nodes, it means that user data is actually stored in the virtual node on behalf of the actual physical server.
Here is a chart describing the need to increase physical server for each virtual node.

Consistent hashing (Consistent Hashing)

x-axis is the need to expand physical server for each virtual node multiple (scale), y-axis are the actual number of physical servers, you can see, when the number of physical servers is very low, the need for greater virtual node, otherwise you need to fewer nodes can be seen from the chart, in the physical server has 10 units, almost every server needs to increase from 100 to 200 virtual nodes to achieve true load balancing.

-----------------

Consistency of hash, assuming point B should have the data fall in the A, B plus a machine between, on average, half of the data is invalid. And A to increase the data on the machine point B are no longer in use, how to clean up. As more and more machines, the probability will not hit more and more.
Although the most commonly used hash modulo the inevitable need to do data migration, but may choose to time, such as two in the morning. Visit this time will certainly be small.

-

If it is C, A was inserted between the node B, that between the original data no longer falls on CB to find A, B but look, this part of the data in A is indeed a failure. But you say this is pure theory. After adding the actual node B, CB between the data (hit A on the original) will be gradually saved to B, (rather than not do nothing when hit), while the data on A New Data to increase as the original that part of the failure data by LRU algorithm are phased out. So I feel with the machine increases, the probability of no hit will not fluctuate drastically.

In fact, the consistency of hash is used to solve the storage node to increase the problem due to hit lower.
Practical examples: Japan mixi is gradually increased to 200 or more memcached server clusters, made using this method, there is no problem you said.

相关文章
  • Consistent hashing (Consistent Hash) [reproduced, Memo] 2010-03-05

    Consistent hashing (Consistent Hash) About the agreement Consistent hashing algorithm proposed in 1997 by the Massachusetts Institute of Technology (see 0), the design goal is to solve the Internet's hot (Hot pot) problems, mind, and is very similar

  • Consistent hashing (Consistent Hashing) 2010-05-25

    Reprinted from: http://hi.baidu.com/fdwm_lx/blog/item/fe46344e11517705b3de054c.html In the large-scale web applications, the cache can be regarded as a standard development of today's deployed. Applications in large-scale cache, distributed caching s

  • Summary Consistency hash (Consistent Hashing) 2010-03-10

    In the large-scale web applications, the cache can be regarded as a standard development of today's deployed. Applications in large-scale cache, distributed caching system came into being. The basic principle of the distributed cache system, we all h

  • [Recommended] memcached comprehensive analysis of -4. Memcached distributed algorithm: Consistent Hashing 2010-05-29

    Keywords: memcached distributed, consistent hashing 2nd , 3rd Osaka, introduced by the former interior of memcached. This no longer describes the internal structure of memcached, start introducing memcached distributed. memcached distributed memcache

  • Web Caching with Consistent Hashing 2012-07-13

    Web Caching with Consistent Hashing by David Karger, Alex Sherman , Andy Berkheimer, Bill Bogstad, Rizwan Dhanidina Ken Iwamoto, Brian Kim, Luke Matkins, Yoav Yerushalmi. MIT Laboratory for Computer Science. Abstract: A key performance measure for th

  • Consistency hash algorithm (consistent hashing) 2010-09-27

    http://blog.csdn.net/sparkliang/archive/2010/02/02/5279393.aspx Consistency hash algorithm (consistent hashing) Zhang Liang consistent hashing algorithm as early as 1997 in papers Consistent hashing and random trees was proposed in the cache system c

  • Consistent hash algorithm (consistent hashing) 2010-09-27

    http://blog.csdn.net/sparkliang/archive/2010/02/02/5279393.aspx Consistent hash algorithm (consistent hashing) Zhang Liang consistent hashing algorithm as early as 1997 in papers Consistent hashing and random trees were proposed in the cache system i

  • (Transfer) agreement hash algorithms - consistent hashing 2011-02-25

    Recently prepared a cache to optimize things, looked under the consistency of hash algorithms, and online information are more complex and more to find a good summary of this article, it is reproduced to. Reprinted from: http://blog.csdn.net/sparklia

  • Reproduced - consistency of hash algorithms - consistent hashing 2011-09-12

    Consistent hash algorithm (consistent hashing) Zhang Liang consistent hashing algorithm as early as 1997 in his paper Consistent hashing and random trees were proposed, currently in the cache system more widely; 1 Basic Scene For example, you have N

  • memcache consistency hash algorithm (consistent hashing) 2011-09-26

    consistent hashing algorithm as early as 1997 in the paper Consistent hashing and random trees were proposed, currently in the cache system more widely; A basic scenario such as you have N cache server (behind the short cache), then how would an obje

  • 基于一致性hash算法(consistent hashing)的使用详解 2014-08-18

    本篇文章对一致性hash算法(consistent hashing)的使用进行了详细的分析介绍.需要的朋友参考下 1 基本场景 比如你有 N 个 cache 服务器(后面简称 cache ),那么如何将一个对象 object 映射到 N 个 cache 上呢,你很可能会采用类似下面的通用方法计算 object 的 hash 值,然后均匀的映射到到 N 个 cache : hash(object)%N 一切都运行正常,再考虑如下的两种情况: 1 一个 cache 服务器 m down 掉了(在实际

  • consistent gets减少,cost增加? 2012-03-09

    在一条SQL语句中,当使用索引时,cosistent gets 减少,而cost增加.理论上在稳定后的执行计划中,physical reads为零值的前提下, cost应当相应减少.下面来看看其原由. 1.原始的SQL语句 SQL> SELECT acc_num, amount, curr_cd 2 FROM voucher_tbl 3 WHERE value_date > '20110929' -->谓词value_date,存在索引的情况下通常会走索引 4 AND vou_type

  • 一致性 hash 算法 --consistent hashing 2012-03-19

    一致性 hash 算法( consistent hashing ) 张亮 consistent hashing 算法早在 1997 年就在论文 Consistent hashing and random trees 中被提出,目前在 cache 系统中应用越来越广泛: 1 基本场景 比如你有 N 个 cache 服务器(后面简称 cache ),那么如何将一个对象 object 映射到 N 个 cache 上呢,你很可能会采用类似下面的通用方法计算 object 的 hash 值,然后均匀的映射

  • 一致性哈希(Consistent Hashing) 2012-12-19

    在大型web应用中,缓存可算是当今的一个标准开发配置了.在大规模的缓存应用中,应运而生了分布式缓存系统.分布式缓存系统的基本原理,大家也有所耳闻.key-value如何均匀的分散到集群中?说到此,最常规的方式莫过于hash取模的方式.比如集群中可用机器适量为N,那么key值为K的的数据请求很简单的应该路由到hash(K) mod N对应的机器.的确,这种结构是简单的,也是实用的.但是在一些高速发展的web系统中,这样的解决方案仍有些缺陷.随着系统访问压力的增长,缓存系统不得不通过增加机器节点的方

  • 一致性hash算法 - consistent hashing 2015-04-21

    一致性 hash 算法( consistent hashing ) consistent hashing 算法早在 1997 年就在论文 Consistent hashing and random trees 中被提出,目前在cache 系统中应用越来越广泛: 1 基本场景 比如你有 N 个 cache 服务器(后面简称 cache ),那么如何将一个对象 object 映射到 N 个 cache 上呢,你很可能会采用类似下面的通用方法计算 object 的 hash 值,然后均匀的映射到到 N

  • Consistent hashing algorithm optimization ---- Paul is part of how to add new nodes, the accuracy is not affected 2010-04-22

    Background Early in 2009, we made a smart memcached client library, business library as long as this chain, you can communicate with the memcached server. And achieve a consistency of the distributed hash algorithm, the backend memcached servers unli

  • Consistent Hashing 2010-03-29

    import java.util.Collection; import java.util.SortedMap; import java.util.TreeMap; public class ConsistentHash<T> { private final HashFunction hashFunction; private final int numberOfReplicas; private final SortedMap<Integer, T> circle = new T

  • memcached complete analysis of (turn) 2010-03-12

    memcached complete analysis of -1. memcached based Copyright Notice: You can willfully, but reproduced the original author must be identified charlee, the original link http://tech.idv2.com/2008/07/10/memcached-001/ and this statement Day: 2008/7/2 A

  • memcached distributed mean? 2010-07-04

    memcached distributed mean? http://hi.baidu.com/cbxm/blog/item/2da5410e90813ee1ab6457f7.html Here repeatedly used the "distributed" the word, but do not explain in detail. Now briefly introduce the principle of individual client to achieve basic

  • What do you mean memcached distributed? 2010-07-04

    memcached distributed mean? http://hi.baidu.com/cbxm/blog/item/2da5410e90813ee1ab6457f7.html Here repeatedly used the "distributed" the word, but do not explain in detail. Now briefly introduce the principle of individual client to achieve basic