Basic Principles of HBase

2013-09-25  Source: original to this site  Category: Hadoop notes

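HBase stores data in tables. As in a relational database, an HBase table consists of rows and columns. Unlike a relational database, HBase also has the concept of a column family: a table is made up of several column families, and each column family contains a number of columns. In addition, every cell in a table carries a timestamp, so a table can be thought of as a three-dimensional database. Besides the row and column dimensions there is a time dimension, and the different versions of each cell are retained.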
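One way to picture this three-dimensional model is as a nested map from row key to column to timestamp. The sketch below is a toy in-memory model written for illustration only; `ToyTable` and its methods are hypothetical names, not part of the HBase client API.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of the logical layout (not the HBase API)."""

    def __init__(self, families):
        self.families = set(families)
        # row key -> (column family, column) -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row, family, column, value, ts):
        # Columns can be added freely, but the column family must already exist.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row][(family, column)][ts] = value

    def get(self, row, family, column, ts=None):
        """Return the newest version at or before ts (newest overall if ts is None)."""
        versions = self.rows.get(row, {}).get((family, column), {})
        candidates = [t for t in versions if ts is None or t <= ts]
        return versions[max(candidates)] if candidates else None


table = ToyTable(["info"])
table.put("user1", "info", "email", "old@example.com", ts=1)
table.put("user1", "info", "email", "new@example.com", ts=2)
latest = table.get("user1", "info", "email")          # newest version
earlier = table.get("user1", "info", "email", ts=1)   # version as of ts=1
```

Writing the same cell twice does not overwrite the earlier value; it adds a second version along the time dimension.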

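As in a relational database, every row in HBase has a primary key, called the row key. All data retrieval in HBase goes through the row key, and there are three main ways to retrieve data:

1. Fetch a single row by its row key

2. Fetch multiple rows by a row key range [start row key, end row key]

3. Scan the full table, returning the entire table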
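The three access patterns can be sketched over a list of row keys kept in sorted order. This is a toy model, not the HBase client: `SortedRows`, `get`, and `scan` are hypothetical names chosen for the example, and the range is treated as closed on both ends to match the notation above.

```python
import bisect


class SortedRows:
    """Hypothetical sorted key-value store; illustrates the access patterns only."""

    def __init__(self, items):
        pairs = sorted(items.items())
        self.keys = [k for k, _ in pairs]
        self.values = [v for _, v in pairs]

    def get(self, row_key):
        """Pattern 1: look up a single row by binary search on the row key."""
        i = bisect.bisect_left(self.keys, row_key)
        if i < len(self.keys) and self.keys[i] == row_key:
            return self.values[i]
        return None

    def scan(self, start=None, stop=None):
        """Patterns 2 and 3: yield rows with start <= key <= stop.

        Leaving both bounds as None scans the whole table.
        """
        lo = 0 if start is None else bisect.bisect_left(self.keys, start)
        hi = len(self.keys) if stop is None else bisect.bisect_right(self.keys, stop)
        for i in range(lo, hi):
            yield self.keys[i], self.values[i]


rows = SortedRows({"row3": "c", "row1": "a", "row2": "b", "row4": "d"})
single = rows.get("row2")                    # one row
ranged = list(rows.scan("row2", "row3"))     # a key range
everything = list(rows.scan())               # full table scan
```

Because the keys are sorted, a range scan is a contiguous slice, which is why row key design matters so much for read performance. Note that in the real HBase client the stop row of a scan is exclusive by default, unlike the closed interval used here.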

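In HBase, all rows are sorted by row key. Physically, each table is partitioned by row into one or more HRegions, each of which holds a portion of the table, that is, a range of rows. HRegions are split by size: a table starts with a single HRegion; as data is inserted the HRegion grows, and once it reaches a threshold it is split into two new HRegions of roughly equal size. As the number of rows in a table grows, so does the number of HRegions. The HRegion is the smallest unit of distributed storage and load balancing in HBase. "Smallest unit" means that different HRegions can be placed on different HRegion servers, but a single HRegion is never spread across multiple servers.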
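The splitting behavior can be sketched as follows. This is a simplified model under stated assumptions: `ToyRegion` and `ToyRegionedTable` are hypothetical names, the threshold is a row count rather than the on-disk size HBase actually uses, and the split point is simply the middle key.

```python
import bisect


class ToyRegion:
    """Hypothetical region: a sorted run of row keys (stands in for an HRegion)."""

    def __init__(self, keys=None):
        self.keys = sorted(keys or [])

    def split(self):
        # Split at the middle key into two regions of roughly equal size.
        mid = len(self.keys) // 2
        return ToyRegion(self.keys[:mid]), ToyRegion(self.keys[mid:])


class ToyRegionedTable:
    def __init__(self, max_rows_per_region):
        if max_rows_per_region < 1:
            raise ValueError("max_rows_per_region must be at least 1")
        # Assumption: the threshold counts rows; HBase thresholds on size.
        self.max_rows = max_rows_per_region
        self.regions = [ToyRegion()]  # every table starts with one region

    def _find_region(self, row_key):
        # Regions are kept in key order; region i starts at the first key of region i.
        starts = [r.keys[0] for r in self.regions[1:]]
        return bisect.bisect_right(starts, row_key)

    def insert(self, row_key):
        i = self._find_region(row_key)
        region = self.regions[i]
        pos = bisect.bisect_left(region.keys, row_key)
        if pos < len(region.keys) and region.keys[pos] == row_key:
            return  # row already present
        region.keys.insert(pos, row_key)
        if len(region.keys) > self.max_rows:
            # Replace the oversized region with its two halves, preserving order.
            self.regions[i:i + 1] = region.split()


table = ToyRegionedTable(max_rows_per_region=4)
for n in range(10):
    table.insert(f"row{n:02d}")
region_sizes = [len(r.keys) for r in table.regions]
```

Each region remains a single unit that could be assigned to one server; only the list of regions grows as rows are added.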

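Although the HRegion is the smallest unit of distributed storage, it is not the smallest unit of storage. An HRegion consists of one or more Stores, and each Store holds one column family. Each Store in turn consists of one MemStore and zero or more StoreFiles. StoreFiles are kept on HDFS in the HFile format. Besides HFiles, the HRegion server also writes another file, the HLog (WAL log), which is a log file. WAL stands for write-ahead log; it is similar to the binlog in MySQL and is used for disaster recovery. The HLog records every change to the data. Each HRegion server maintains a single HLog rather than one per HRegion, so log entries from different regions (belonging to different tables) are mixed together. The purpose is that continually appending to a single file requires fewer disk seeks than writing to many files at once, which improves write performance on tables. The drawback is that if a region server goes offline, the log on that server must be split and distributed to other region servers before the regions it hosted can be recovered.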
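The trade-off of a shared log can be sketched in a few lines. This is a toy model, not HBase code: `ToyRegionServer`, `split_log`, and `replay` are hypothetical names, the log lives in memory rather than on HDFS, and each region's MemStore is reduced to a plain dictionary.

```python
from collections import defaultdict


class ToyRegionServer:
    """Hypothetical region server with one shared write-ahead log."""

    def __init__(self):
        self.wal = []                        # one append-only log for all regions
        self.memstores = defaultdict(dict)   # region name -> {row key: value}

    def put(self, region, row_key, value):
        # Append to the log first, then apply in memory, so the edit
        # can be recovered if the server fails before a flush.
        self.wal.append((region, row_key, value))
        self.memstores[region][row_key] = value


def split_log(wal):
    """Group a mixed log by region, keeping the order of edits within each region."""
    per_region = defaultdict(list)
    for region, row_key, value in wal:
        per_region[region].append((row_key, value))
    return dict(per_region)


def replay(edits):
    """Rebuild one region's in-memory state from its share of the log."""
    state = {}
    for row_key, value in edits:
        state[row_key] = value
    return state


server = ToyRegionServer()
server.put("t1-region-a", "row1", "x")
server.put("t2-region-b", "row9", "y")
server.put("t1-region-a", "row1", "z")

# After a simulated crash, another server recovers each region from its slice.
recovered = {name: replay(edits) for name, edits in split_log(server.wal).items()}
```

Writes stay cheap because every `put` is a single append, while recovery pays the cost of sorting the mixed entries back into per-region streams.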
