Distributed hash tables and consistent hashing

2011-10-26  Source: original to this site  Category: Internet  Views: 161


Distributed hash table (DHT)
Two key points: each node maintains only part of the routing information, and each node stores only part of the data. Together the nodes provide addressing and storage for the whole network.
DHT is just a concept: it proposes this kind of network model and points out that it suits distributed storage well. How to actually implement it is outside the scope of DHT itself.

Consistent hashing:
One way to implement a DHT. At its core it is still a hash algorithm. Recall how load balancing is usually done: hash the query string and take the result modulo the number of back-end nodes. This is the simplest and most commonly used algorithm, but adding or removing a node causes an obvious problem: almost no request lands on the same machine it used to. CARP (Cache Array Routing Protocol) is an optimization of this approach in which only 1/n of the data is affected.
Consistent hashing seems to have been first proposed for distributed caches, so that node churn has minimal impact. It is now also used in distributed storage and P2P systems.
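
To make the problem with the modulo scheme concrete, here is a minimal Python sketch. The helper names (stable_hash, modulo_node, moved_fraction) and the sample query strings are made up for illustration. It counts how many keys change node when a cluster grows from 4 to 5 nodes.

```python
import hashlib

def stable_hash(key: str) -> int:
    """Deterministic integer hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")

def modulo_node(key: str, node_count: int) -> int:
    """Pick a back-end node by taking the key's hash modulo the node count."""
    return stable_hash(key) % node_count

def moved_fraction(keys, old_count: int, new_count: int) -> float:
    """Fraction of keys whose node changes when the node count changes."""
    moved = sum(
        1 for k in keys
        if modulo_node(k, old_count) != modulo_node(k, new_count)
    )
    return moved / len(keys)

# Hypothetical query strings standing in for real requests.
keys = [f"/item?id={i}" for i in range(10_000)]
fraction = moved_fraction(keys, 4, 5)
print(f"keys remapped going from 4 to 5 nodes: {fraction:.0%}")
```

A key keeps its node only if its hash gives the same remainder modulo 4 and modulo 5, which holds for about one hash in five, so roughly four fifths of the keys move.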

Consistent hashing only puts forward four concepts and principles and says nothing about a specific implementation:

1. Balance: hash results are distributed as evenly as possible across the nodes, so that every node is fully utilized.

2. Monotonicity: as mentioned above, with hash-modulo a node change remaps almost the entire network, and with CARP it remaps 1/n of the mappings. The goal of consistent hashing is that a node change leaves existing mappings alone: keys may move to a newly added node, but never between nodes that were already there.

3. Spread: the same data ends up stored on different nodes, which is redundancy in the system. Consistent hashing tries to keep this redundancy low.

4. Load: spreading the load. It means much the same as balance, but here the focus is on stored data being evenly distributed, whereas balance is about access being evenly distributed.
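
The principles above do not prescribe an implementation, but a common one is a hash ring. The Python sketch below is one possible version, not a reference implementation. The HashRing class, the node names, and the use of 100 ring points per node (a frequent trick for improving balance) are all assumptions for illustration. It checks monotonicity: after a node is added, every key that moved went to the new node.

```python
import bisect
import hashlib

def ring_hash(value: str) -> int:
    """Deterministic position on the ring, taken from SHA-1."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")

class HashRing:
    """Hypothetical minimal ring: a key belongs to the first node point
    found clockwise from the key's hash."""

    def __init__(self, nodes, points_per_node: int = 100):
        self.points_per_node = points_per_node
        self._points = []  # sorted list of (ring position, node name)
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Several points per node smooth out the share each node receives.
        for i in range(self.points_per_node):
            bisect.insort(self._points, (ring_hash(f"{node}#{i}"), node))

    def lookup(self, key: str) -> str:
        # First point at or after the key's position, wrapping past the end.
        idx = bisect.bisect_left(self._points, (ring_hash(key), ""))
        return self._points[idx % len(self._points)][1]

keys = [f"key-{i}" for i in range(10_000)]
ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
before = {k: ring.lookup(k) for k in keys}

ring.add("node-e")
after = {k: ring.lookup(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
moved_fraction = len(moved) / len(keys)
only_to_new_node = all(after[k] == "node-e" for k in moved)
print(f"moved: {moved_fraction:.0%}, all moved keys went to node-e: {only_to_new_node}")
```

Compared with the modulo scheme, only the keys taken over by the new node move, which is roughly a 1/n share.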

Chord algorithm:
There are several consistent hashing algorithms. The key questions are how to define the data partitioning strategy and how to find nodes quickly.

Chord is one of the classic implementations. The DHT in Cassandra is basically a simplified version of Chord.

Every node in the network is assigned a unique id, for example the SHA-1 of the machine's MAC address. This id is the basis for network discovery.

Suppose there are N nodes in the network and the network is a ring. The distance between two nodes is measured clockwise around the ring. Each node stores a routing table (finger table): going clockwise from itself, it picks the nodes at distances 1, 2, 4, 8, 16, 32, ..., 2^i, log2N entries in all, and records their IP information.
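
The finger table can be sketched in a few lines of Python. This is a toy under stated assumptions: ids are 6 bits wide (real deployments derive much wider ids from SHA-1), the node ids are hand-picked, and the helper names successor and finger_table are illustrative. Entry i points at the first node whose clockwise distance from the owner is at least 2^i.

```python
import bisect

M = 6            # toy id width in bits (assumption for the example)
RING = 2 ** M    # the ring has 64 positions

def successor(sorted_ids, point):
    """First node id at or clockwise after `point` on the ring."""
    idx = bisect.bisect_left(sorted_ids, point % RING)
    return sorted_ids[idx % len(sorted_ids)]  # wrap around past the top id

def finger_table(sorted_ids, node_id):
    """Entry i is the first node at clockwise distance >= 2**i from node_id."""
    return [successor(sorted_ids, node_id + 2 ** i) for i in range(M)]

# Hand-picked node ids for illustration.
node_ids = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

print(finger_table(node_ids, 8))    # [14, 14, 14, 21, 32, 42]
print(finger_table(node_ids, 56))   # [1, 1, 1, 1, 8, 32]
```

Nearby entries often repeat because few nodes sit close to the owner, while the far entries jump across large parts of the ring. That is what makes the lookups described later fast.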

Storage: the data is split according to some rule, and each piece of data gets its own id (the query key) drawn from the same range as the node ids. Then look for a node: if a node's id equals the data id, the data is stored on that node; if not, it is stored on the node whose id is closest to the data id. To keep the data reliable, the next K nodes clockwise are also located and store redundant copies. K = 3 is generally considered necessary.
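
A rough Python sketch of this placement rule follows, with two assumptions that the text leaves open: "closest" is taken to mean the first node clockwise at or after the data id (Chord's successor rule), and K counts the total number of copies including the owner. The names to_ring_id and replica_nodes, the 6-bit id space, and the node ids are illustrative.

```python
import bisect
import hashlib

M = 6            # toy id width in bits (assumption for the example)
RING = 2 ** M

def to_ring_id(name: str) -> int:
    """Map a data key (or node address) into the shared id space via SHA-1."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % RING

def replica_nodes(sorted_node_ids, data_id, k=3):
    """Owner is the first node clockwise at or after data_id;
    the remaining copies go to the nodes that follow it clockwise."""
    start = bisect.bisect_left(sorted_node_ids, data_id % RING)
    count = min(k, len(sorted_node_ids))
    return [
        sorted_node_ids[(start + i) % len(sorted_node_ids)]
        for i in range(count)
    ]

# Hand-picked node ids for illustration.
node_ids = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])

print(replica_nodes(node_ids, 21))   # [21, 32, 38]  id matches a node exactly
print(replica_nodes(node_ids, 54))   # [56, 1, 8]    wraps around the ring
print(replica_nodes(node_ids, to_ring_id("photo-0001.jpg")))
```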

Queries: starting from its own routing table, a node finds the entry that is closest to the data id and still alive in the network, and hands the query to it. If that node's id happens to equal the data id, congratulations. If not, the lookup recurses to the next node. In general it takes several hops to reach the node that holds the data, and it can be proven that the number of hops is at most log2N.

The choice of routing-table entries shows its advantage during a query: it effectively implements a binary search. Seen from any node, the network is divided into log2N blocks, the largest of which holds N/2 nodes, and the routing table records the first node of each block. Each hop therefore rules out at least half of the remaining nodes, which guarantees that the target node is found within log2N hops.
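
The lookup can be simulated in one process. The Python sketch below is a simplified model, not a networked implementation: all nodes are assumed alive, ids are 6 bits wide, the node ids are hand-picked, and the function names are illustrative. At each step the query jumps to the farthest finger that still lies before the data id, and it stops once the data id falls between the current node and its immediate successor.

```python
import bisect

M = 6            # toy id width in bits (assumption for the example)
RING = 2 ** M

def successor(sorted_ids, point):
    """First node id at or clockwise after `point` on the ring."""
    idx = bisect.bisect_left(sorted_ids, point % RING)
    return sorted_ids[idx % len(sorted_ids)]

def build_fingers(sorted_ids):
    """Finger table for every node: entry i covers clockwise distance 2**i."""
    return {
        n: [successor(sorted_ids, n + 2 ** i) for i in range(M)]
        for n in sorted_ids
    }

def between(x, a, b):
    """True if x lies in the clockwise interval (a, b] of the ring."""
    if a == b:
        return True  # a single node owns the whole ring
    return 0 < (x - a) % RING <= (b - a) % RING

def lookup(fingers, start, data_id):
    """Return (owner node, number of hops) for data_id, starting at `start`."""
    node, hops = start, 0
    while True:
        if data_id == node:
            return node, hops
        succ = fingers[node][0]
        if between(data_id, node, succ):
            return succ, hops + 1  # final hop to the owner
        # Jump to the farthest finger that is still strictly before data_id.
        for finger in reversed(fingers[node]):
            if 0 < (finger - node) % RING < (data_id - node) % RING:
                node, hops = finger, hops + 1
                break

# Hand-picked node ids for illustration.
node_ids = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
fingers = build_fingers(node_ids)

print(lookup(fingers, 8, 54))   # (56, 3): the query travels 8 -> 42 -> 51 -> 56
```

Each jump at least halves the remaining clockwise distance to the node just before the data id, which is the binary-search behaviour described above.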

Adding a node i: it must know in advance some live node j in the network. It then talks to node j so that it and the other nodes can update their routing tables. It also has to copy the data it is now responsible for from its nearest node before it can serve that data.

Losing a node: the routing algorithm automatically skips the node, and the redundant copies of the data keep the service available.

KAD algorithm (Kademlia)
Personally I feel that Kad is an optimization built on Chord. Two main points:
1. Node ids are binary numbers (32/64/128 bits), and the distance between two nodes is the XOR of their ids.
2. Each node keeps richer routing information. The network is still divided into log2N regions as in Chord, but where Chord keeps log2N routing entries, Kad keeps log2N queues. Each queue has a configured length K, records nodes from the corresponding region of the network, and swaps nodes out according to when they were last active.
The first point makes the network easy to partition: each bit of a node id, 0 or 1, selects a branch, so the nodes form a binary tree.
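
Both points can be sketched briefly in Python. The names xor_distance, bucket_index and KBucket are illustrative, and the eviction rule is a deliberate simplification: the sketch drops the least recently seen node when a queue is full, whereas real Kademlia first pings that node and keeps it if it still answers.

```python
from collections import OrderedDict

def xor_distance(a: int, b: int) -> int:
    """Kademlia distance between two ids: their bitwise XOR."""
    return a ^ b

def bucket_index(own_id: int, other_id: int) -> int:
    """Queue that other_id belongs to: position of the highest differing bit,
    i.e. the depth at which the two ids part ways in the binary tree."""
    distance = xor_distance(own_id, other_id)
    if distance == 0:
        raise ValueError("a node does not keep itself in its routing table")
    return distance.bit_length() - 1

class KBucket:
    """One queue of at most k nodes, ordered from least to most recently seen."""

    def __init__(self, k: int):
        self.k = k
        self.nodes = OrderedDict()  # node id -> address

    def touch(self, node_id: int, address: str) -> None:
        if node_id in self.nodes:
            self.nodes.move_to_end(node_id)   # seen again: now the freshest
        elif len(self.nodes) >= self.k:
            self.nodes.popitem(last=False)    # simplification: drop the stalest
        self.nodes[node_id] = address

print(xor_distance(0b1011, 0b1000))   # 3
print(bucket_index(0b1011, 0b1000))   # 1

bucket = KBucket(k=2)
for node_id in (1, 2, 1, 3):
    bucket.touch(node_id, f"10.0.0.{node_id}")   # made-up addresses
print(list(bucket.nodes))             # [1, 3]
```

Because XOR is symmetric, two nodes always agree on how far apart they are.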
