<HBase>

Overview

  • Use HBase when you need random, real-time read/write access to your Big Data.
  • HBase is an open-source, distributed, versioned, non-relational database modeled after Google's "Bigtable: A Distributed Storage System for Structured Data".

Features

  • Linear and modular scalability;
  • Strictly consistent reads and writes;
  • Automatic and configurable sharding of tables;
  • Automatic failover support between RegionServers;
  • Convenient base classes for backing Hadoop MapReduce jobs with Apache HBase tables;
  • Easy-to-use Java API for client access;
  • Block cache and Bloom filters for real-time queries (high-volume query optimization);
  • Thrift gateway and a RESTful Web service that supports XML, Protobuf, and binary data encoding options;
  • Extensible JRuby-based (JIRB) shell;
  • Support for exporting metrics via the Hadoop metrics subsystem to files or Ganglia, or via JMX.

Data Model

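  • In HBase, data is stored in tables, which have rows and columns.
  • HBase terminology:
    • Table: An HBase table consists of multiple rows.
    • Row: A row in HBase consists of a row key and one or more columns with values associated with them. Rows are stored sorted lexicographically by row key, so the design of the row key is very important.
    • Column: A column in HBase consists of a column family and a column qualifier, which are delimited by a : (colon) character.
    • Column family: Column families physically colocate a set of columns and their values, often for performance reasons. Each column family has a set of storage properties, such as whether its values should be cached in memory, how its data is compressed, how its row keys are encoded, and so on. Every row in a table has the same column families, though a given row might not store anything in a given column family.
    • Column qualifier: A column qualifier is added to a column family to provide the index for a given piece of data.
    • Cell: A cell is the combination of row, column family, and column qualifier, and contains a value and a timestamp, which represents the value's version.
    • Timestamp: A timestamp is written alongside each value and serves as the identifier for that version of the value. By default, the timestamp is the time on the RegionServer when the data was written, but you can also specify it yourself.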
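The terminology above can be pictured as a nested map: row key, then column (family:qualifier), then timestamp, then value. The following is a minimal sketch of that idea in plain Python. `ToyTable` and its methods are hypothetical names invented for illustration and are not the HBase client API; the row keys, column names, and values are made up as well.

```python
from collections import defaultdict


class ToyTable:
    """Toy in-memory model of the HBase data model (not the real client API)."""

    def __init__(self, families):
        # Column families are fixed when the table is defined.
        self.families = set(families)
        # row key -> "family:qualifier" -> {timestamp: value}
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        family, _, _qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers can be added freely; each write creates a new version.
        self.rows[row_key][column][timestamp] = value

    def get(self, row_key, column, timestamp=None):
        """Return the newest version, or the exact version if a timestamp is given."""
        versions = self.rows.get(row_key, {}).get(column, {})
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        return versions.get(timestamp)

    def scan(self):
        """Return row keys in sorted order, mirroring how rows are stored."""
        return sorted(self.rows)


table = ToyTable(["cf"])
table.put("row-b", "cf:greeting", "hi", timestamp=1)
table.put("row-a", "cf:greeting", "hello", timestamp=1)
table.put("row-a", "cf:greeting", "hello again", timestamp=2)

print(table.get("row-a", "cf:greeting"))     # newest version
print(table.get("row-a", "cf:greeting", 1))  # an older version by timestamp
print(table.scan())                          # row keys in sorted order
```

The sketch keeps every version in memory; a real table limits how many versions it retains per column family.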

Conceptual View

  • Example:
    • Table name: webtable
    • It contains two rows (row keys): com.cnn.www and com.example.www
    • It has three column families: contents, anchor, and people
    • The anchor family contains two columns: anchor:cnnsi.com and anchor:my.look.ca (the column anchor:cnnsi.com consists of the column family anchor and the qualifier cnnsi.com)
    • The row with row key com.cnn.www has 5 versions
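The row keys in this example look like host names with their parts reversed. Because rows are sorted by row key, that layout keeps pages from the same domain next to each other. The sketch below illustrates the effect; `reverse_domain` is a hypothetical helper, and the hosts money.cnn.com and mail.example.com are made up for the demonstration. Python compares strings by code point, which matches byte order for these ASCII keys.

```python
def reverse_domain(host):
    """Turn 'www.cnn.com' into 'com.cnn.www' (hypothetical helper)."""
    return ".".join(reversed(host.split(".")))


# The last two hosts are invented to show the grouping effect.
hosts = ["www.cnn.com", "www.example.com", "money.cnn.com", "mail.example.com"]

plain_order = sorted(hosts)
row_keys = sorted(reverse_domain(h) for h in hosts)

print(plain_order)  # hosts of one domain can end up far apart
print(row_keys)     # reversed keys keep each domain contiguous
```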

Physical View

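  • Although at the conceptual level a table can be viewed as a sparse set of rows, it is physically stored by column family. A new column qualifier (column_family:column_qualifier) can be added to an existing column family at any time.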
  • Table 5. ColumnFamily anchor
    Row Key          Time Stamp    Column Family anchor
    "com.cnn.www"    t9            anchor:cnnsi.com = "CNN"
    "com.cnn.www"    t8            anchor:my.look.ca = "CNN.com"

  • Table 6. ColumnFamily contents
    Row Key          Time Stamp    Column Family contents
    "com.cnn.www"    t6            contents:html = "<html>…"
    "com.cnn.www"    t5            contents:html = "<html>…"
    "com.cnn.www"    t3            contents:html = "<html>…"
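As a rough illustration of the physical view, the cells of the com.cnn.www row can be grouped by column family, which gives one store per family. This is a sketch only: `split_by_family` is a hypothetical function, the integers stand in for the timestamps t3 to t9, and the sort order used (row key, then column, then newest timestamp first) is an assumption about how cells are laid out within a family.

```python
from collections import defaultdict

# (row key, timestamp, column, value), deliberately in mixed order.
cells = [
    ("com.cnn.www", 5, "contents:html", "<html>…"),
    ("com.cnn.www", 8, "anchor:my.look.ca", "CNN.com"),
    ("com.cnn.www", 3, "contents:html", "<html>…"),
    ("com.cnn.www", 9, "anchor:cnnsi.com", "CNN"),
    ("com.cnn.www", 6, "contents:html", "<html>…"),
]


def split_by_family(cells):
    """Group cells into one list per column family (hypothetical helper)."""
    stores = defaultdict(list)
    for row_key, ts, column, value in cells:
        family = column.split(":", 1)[0]
        stores[family].append((row_key, ts, column, value))
    for family_cells in stores.values():
        # Assumed order: row key, then column, then newest timestamp first.
        family_cells.sort(key=lambda c: (c[0], c[2], -c[1]))
    return dict(stores)


stores = split_by_family(cells)
for family, family_cells in sorted(stores.items()):
    print(family)
    for row_key, ts, column, value in family_cells:
        print(f"  {row_key}  t{ts}  {column} = {value!r}")
```

Empty cells from the conceptual view never appear here, because only cells that were actually written are stored.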

Architecture

Overview

NoSQL?

  • "NoSQL" is a general term meaning that the database isn’t an RDBMS which supports SQL as its primary access language.
  • There are many types of NoSQL databases: BerkeleyDB is an example of a local NoSQL database, whereas HBase is very much a distributed database. Technically speaking, HBase is really more a "Data Store" than a "Data Base" because it lacks many of the features you find in an RDBMS, such as typed columns, secondary indexes, triggers, and advanced query languages.

When should I use HBase?

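  • HBase is not suitable for every scenario.
  • First, make sure you have enough data. Otherwise all of your data might end up on a single machine while the other nodes sit idle.
  • Second, make sure you can live without all the extra features that an RDBMS provides (e.g., typed columns, secondary indexes, transactions, advanced query languages, etc.).
  • Third, make sure you have enough hardware.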

What Is the Difference Between HBase and Hadoop/HDFS?

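  • HDFS is well suited to storing large files, but it does not provide fast lookups of individual records.
  • HBase, on the other hand, is built on top of HDFS and provides fast record lookups and updates for large tables.
  • This can be a little confusing. Internally, HBase puts your data in indexed "StoreFiles" that live on HDFS, and that is how it provides high-speed lookups. For a more detailed explanation, see the Data Model section of this post.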
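To make the idea of an indexed file more concrete, here is a toy model of a sorted, immutable key/value file with a small index over its blocks. It is a simplified sketch and does not reproduce HBase's actual StoreFile format: `ToyStoreFile`, `block_size`, and the sample keys are all invented for illustration.

```python
import bisect


class ToyStoreFile:
    """Immutable sorted key/value file with a sparse block index (toy model)."""

    def __init__(self, items, block_size=2):
        records = sorted(items, key=lambda kv: kv[0])  # keep records sorted by key
        self.blocks = [records[i:i + block_size]
                       for i in range(0, len(records), block_size)]
        # The index holds only the first key of every block.
        self.index = [block[0][0] for block in self.blocks]

    def get(self, key):
        # Binary-search the index for the one block that could hold the key.
        pos = bisect.bisect_right(self.index, key) - 1
        if pos < 0:
            return None
        for k, v in self.blocks[pos]:  # scan a single small block, not the file
            if k == key:
                return v
        return None


store_file = ToyStoreFile(
    [("row3", "c"), ("row1", "a"), ("row2", "b"), ("row5", "e"), ("row4", "d")]
)
print(store_file.index)        # first key of each block
print(store_file.get("row4"))  # found by reading one block
print(store_file.get("row9"))  # missing key
```

The point of the index is that a lookup touches one block instead of reading the whole file, which is what a plain file on HDFS would require.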

Catalog Tables

  • The catalog table `hbase:meta` exists as an HBase table. It is filtered out of the HBase shell's list command, but it is in fact a table just like any other.

hbase:meta

  • The hbase:meta table keeps a list of all regions in the system.
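One way to picture what a list of regions is used for: given a row key, find the region whose key range contains it. The sketch below assumes each region is described only by its start key and a server name. The `META` entries, the server names, and `locate_region` are hypothetical, and the real hbase:meta rows carry more information than this.

```python
import bisect

# Hypothetical region list sorted by start key; "" marks the start of the table.
META = [
    ("", "rs1.example.org"),
    ("com.cnn", "rs2.example.org"),
    ("com.example", "rs3.example.org"),
]


def locate_region(meta, row_key):
    """Return the (start_key, server) entry whose range contains row_key."""
    start_keys = [start for start, _ in meta]
    # The owning region is the last one whose start key is <= row_key.
    pos = bisect.bisect_right(start_keys, row_key) - 1
    return meta[pos]


for key in ["aaa", "com.cnn.www", "com.example.www"]:
    start, server = locate_region(META, key)
    print(f"{key!r} -> region starting at {start!r} on {server}")
```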

 

 

FYI

  • HBase Guide
