~ Tutorial 5 ~


HBase

  • NoSQL
  • Database for unstructured data
  • Column oriented
  • NoSQL queries natively; SQL through add-on layers such as Apache Phoenix
  • Scalable (built on HDFS)
  • Fault tolerance: Write-Ahead Log (WAL)
  • Java API clients
  • Block cache and Bloom filters
  • HMaster
  • ZooKeeper
  • Meta Table
    • Holds the location of the regions in the cluster.
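
The meta table bullet can be made concrete with a small sketch. The idea shown here is that each region covers a contiguous range of row keys, so finding the region for a row comes down to finding the last region whose start key is less than or equal to that row key. This is only an in-memory illustration in Python: the `META` list, the server names and the `locate_region` helper are all invented for this example and are not part of any HBase API.

```python
from bisect import bisect_right

# Hypothetical region layout: (start_key, region_server), sorted by start key.
# An empty start key marks the first region. Names are made up for illustration.
META = [
    ("", "server-a"),
    ("g", "server-b"),
    ("p", "server-c"),
]


def locate_region(row_key, meta=META):
    """Return (start_key, server) of the region whose key range holds row_key."""
    start_keys = [start for start, _ in meta]
    # The owning region is the last one whose start key is <= row_key.
    index = bisect_right(start_keys, row_key) - 1
    return meta[index]


print(locate_region("apple"))  # ('', 'server-a')
print(locate_region("g"))      # ('g', 'server-b')
print(locate_region("zebra"))  # ('p', 'server-c')
```

A real client also caches these locations and refreshes them when regions move, which this sketch leaves out.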

HBase vs RDBMS

HBase

  • No fixed schema
    • Only column families are defined up front.
  • For structured and semi-structured data
  • Allows missing data (sparse rows)
  • Allows wide tables (scales horizontally)

RDBMS

  • Fixed schema (rows and columns)
  • For structured data
  • Typically normalized data
  • Harder to scale horizontally
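
To illustrate the "no fixed schema" and "missing data" points, here is a minimal Python sketch of a table where only the column families are declared and each row stores just the cells it actually has. It is a toy model, not HBase code: the `SparseTable` class, the `info` family and the sample values are assumptions made up for this example.

```python
class SparseTable:
    """Toy model: column families are fixed up front, qualifiers are free-form."""

    def __init__(self, *families):
        self.families = set(families)
        self.rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {})[column] = value

    def get(self, row):
        # Return a copy so callers cannot modify the stored row by accident.
        return dict(self.rows.get(row, {}))


# Hypothetical data: the 'info' family and its qualifiers are invented here.
table = SparseTable("info")
table.put("u1", "info:name", "Ada")
table.put("u1", "info:email", "ada@example.org")
table.put("u2", "info:name", "Linus")  # u2 simply has no email cell

print(table.get("u1"))
print(table.get("u2"))
```

Row `u2` takes no space for the missing email, whereas a fixed-schema table would typically carry a NULL in that column.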

Useful commands

Type help 'command' when in doubt.

  • list
  • describe
  • status
  • create
  • put
  • get
  • delete
  • scan
  • alter
  • is_enabled
  • disable

Demo

[hadoop@ip-172-31-49-26 ~]$ hbase shell

hbase(main):008:0> help "create"
Creates a table. Pass a table name, and a set of column family
specifications (at least one), and, optionally, table configuration.
Column specification can be a simple string (name), or a dictionary
(dictionaries are described below in main help output), necessarily
including NAME attribute.
Examples:

Create a table with namespace=ns1 and table qualifier=t1
  hbase> create 'ns1:t1', {NAME => 'f1', VERSIONS => 5}

Create a table with namespace=default and table qualifier=t1
  hbase> create 't1', {NAME => 'f1'}, {NAME => 'f2'}, {NAME => 'f3'}
  hbase> # The above in shorthand would be the following:
  hbase> create 't1', 'f1', 'f2', 'f3'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 1, TTL => 2592000, BLOCKCACHE => true}
  hbase> create 't1', {NAME => 'f1', CONFIGURATION => {'hbase.hstore.blockingStoreFiles' => '10'}}
  hbase> create 't1', {NAME => 'f1', IS_MOB => true, MOB_THRESHOLD => 1000000, MOB_COMPACT_PARTITION_POLICY => 'weekly'}

Table configuration options can be put at the end.
Examples:

  hbase> create 'ns1:t1', 'f1', SPLITS => ['10', '20', '30', '40']
  hbase> create 't1', 'f1', SPLITS => ['10', '20', '30', '40']
  hbase> create 't1', 'f1', SPLITS_FILE => 'splits.txt', OWNER => 'johndoe'
  hbase> create 't1', {NAME => 'f1', VERSIONS => 5}, METADATA => { 'mykey' => 'myvalue' }
  hbase> # Optionally pre-split the table into NUMREGIONS, using
  hbase> # SPLITALGO ("HexStringSplit", "UniformSplit" or classname)
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}
  hbase> create 't1', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit', REGION_REPLICATION => 2, CONFIGURATION => {'hbase.hregion.scan.loadColumnFamiliesOnDemand' => 'true'}}
  hbase> create 't1', 'f1', {SPLIT_ENABLED => false, MERGE_ENABLED => false}
  hbase> create 't1', {NAME => 'f1', DFS_REPLICATION => 1}

You can also keep around a reference to the created table:

  hbase> t1 = create 't1', 'f1'

Which gives you a reference to the table named 't1', on which you can then
call methods.
hbase(main):009:0> create 'mytable', 'courses'
Created table mytable
Took 2.4082 seconds
=> Hbase::Table - mytable
hbase(main):011:0> help 'put'
Put a cell 'value' at specified table/row/column and optionally
timestamp coordinates.  To put a cell value into table 'ns1:t1' or 't1'
at row 'r1' under column 'c1' marked with the time 'ts1', do:

  hbase> put 'ns1:t1', 'r1', 'c1', 'value'
  hbase> put 't1', 'r1', 'c1', 'value'
  hbase> put 't1', 'r1', 'c1', 'value', ts1
  hbase> put 't1', 'r1', 'c1', 'value', {ATTRIBUTES=>{'mykey'=>'myvalue'}}
  hbase> put 't1', 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
  hbase> put 't1', 'r1', 'c1', 'value', ts1, {VISIBILITY=>'PRIVATE|SECRET'}

The same commands also can be run on a table reference. Suppose you had a reference
t to table 't1', the corresponding command would be:

  hbase> t.put 'r1', 'c1', 'value', ts1, {ATTRIBUTES=>{'mykey'=>'myvalue'}}
hbase(main):012:0> list
TABLE
mytable
1 row(s)
Took 0.0260 seconds
=> ["mytable"]
hbase(main):013:0> status
1 active master, 0 backup masters, 1 servers, 0 dead, 3.0000 average load
Took 0.1406 seconds
hbase(main):004:0> put 'mytable', 'r1', 'courses:econ', 'micro economics'
Took 0.0151 seconds
hbase(main):005:0> put 'mytable', 'r1', 'courses:econ', 'macro economics'
Took 0.0128 seconds
hbase(main):006:0> put 'mytable', 'r1', 'courses:physics', 'special relativity'
Took 0.0138 seconds
hbase(main):007:0> put 'mytable', 'r2', 'courses:physics', 'quantum mechanics'
Took 0.0098 seconds
hbase(main):008:0> scan 'mytable'
ROW                                                          COLUMN+CELL
 r1                                                          column=courses:econ, timestamp=1614672393424, value=macro economics
 r1                                                          column=courses:physics, timestamp=1614672411954, value=special relativity
 r2                                                          column=courses:physics, timestamp=1614672429850, value=quantum mechanics
2 row(s)
Took 0.1086 seconds
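
The scan above shows three cells but only 2 row(s), and 'micro economics' is gone. A plausible reading is that the second put to r1/courses:econ wrote a newer version of the same cell, and the courses family keeps a single version (VERSIONS => '1' in the describe output further down). The Python sketch below mimics that behaviour in memory. It is an assumption-laden toy: `VersionedCells` and its counter-based timestamps are invented here, and it drops older values eagerly, while HBase's real storage and cleanup are more involved.

```python
import itertools


class VersionedCells:
    """Toy cell store: keeps at most max_versions values per (row, column)."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions
        self.cells = {}  # (row, column) -> [(timestamp, value), ...], newest first
        self._clock = itertools.count(1)  # stand-in for millisecond timestamps

    def put(self, row, column, value):
        versions = self.cells.setdefault((row, column), [])
        versions.insert(0, (next(self._clock), value))
        del versions[self.max_versions:]  # drop versions beyond the limit

    def scan(self):
        """Return (row, column, newest value), sorted by row and then column."""
        return [(r, c, v[0][1]) for (r, c), v in sorted(self.cells.items())]

    def row_count(self):
        return len({row for row, _ in self.cells})


store = VersionedCells(max_versions=1)
store.put("r1", "courses:econ", "micro economics")
store.put("r1", "courses:econ", "macro economics")  # newer version, same cell
store.put("r1", "courses:physics", "special relativity")
store.put("r2", "courses:physics", "quantum mechanics")

for row, column, value in store.scan():
    print(row, column, value)
print(store.row_count(), "row(s)")  # 2 row(s)
```

Creating the store with `max_versions=2` would keep both econ values, which corresponds to creating the family with a higher VERSIONS setting.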

Disable table

hbase(main):009:0> is_enabled 'mytable'
true
Took 0.0695 seconds
=> true
hbase(main):010:0> disable 'mytable'
Took 0.8090 seconds
hbase(main):011:0> scan 'mytable'
ROW                                                          COLUMN+CELL
org.apache.hadoop.hbase.TableNotEnabledException: mytable is disabled.
    at org.apache.hadoop.hbase.client.ConnectionImplementation.relocateRegion(ConnectionImplementation.java:761)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:328)
    at org.apache.hadoop.hbase.client.ScannerCallable.prepare(ScannerCallable.java:139)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas$RetryingRPC.prepare(ScannerCallableWithReplicas.java:408)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
    at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

ERROR: Table mytable is disabled!

For usage try 'help "scan"'

Took 0.1300 seconds
hbase(main):012:0> describe 'newtable'

ERROR: Table newtable does not exist.

For usage try 'help "describe"'

Took 0.0104 seconds
hbase(main):013:0> describe 'mytable'
Table mytable is DISABLED
mytable
COLUMN FAMILIES DESCRIPTION
{NAME => 'courses', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0',
 REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}

1 row(s)

QUOTAS
0 row(s)
Took 0.1125 seconds

Alter table and Enable table

hbase(main):015:0> alter 'mytable', 'college'
Updating all regions with the new schema...
All regions updated.
Done.
Took 1.4347 seconds
hbase(main):016:0> enable 'mytable'
Took 1.2613 seconds
hbase(main):017:0> scan 'mytable'
ROW                                                          COLUMN+CELL
 r1                                                          column=courses:econ, timestamp=1614672393424, value=macro economics
 r1                                                          column=courses:physics, timestamp=1614672411954, value=special relativity
 r2                                                          column=courses:physics, timestamp=1614672429850, value=quantum mechanics
2 row(s)
Took 0.0204 seconds
hbase(main):018:0> get 'mytable', 'r1'
COLUMN                                                       CELL
 courses:econ                                                timestamp=1614672393424, value=macro economics
 courses:physics                                             timestamp=1614672411954, value=special relativity
1 row(s)
Took 0.0444 seconds
hbase(main):019:0> describe 'mytable'
Table mytable is ENABLED
mytable
COLUMN FAMILIES DESCRIPTION
{NAME => 'college', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0',
 REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}

{NAME => 'courses', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0',
 REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}

2 row(s)

QUOTAS
0 row(s)
Took 0.1481 seconds
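
The disable, alter and enable steps in the transcript can be summarised as a tiny state model. The Python sketch below is only an illustration of what the demo showed: scans fail while the table is disabled, `alter` adds a column family, and existing cells are still there after `enable`. The `ToyTable` class and `TableNotEnabledError` are hypothetical stand-ins invented here, and the sketch makes no claim about when HBase itself permits schema changes.

```python
class TableNotEnabledError(Exception):
    """Stand-in for the TableNotEnabledException seen in the transcript."""


class ToyTable:
    def __init__(self, name, *families):
        self.name = name
        self.families = sorted(families)
        self.enabled = True
        self.cells = {}  # (row, column) -> value

    def disable(self):
        self.enabled = False

    def enable(self):
        self.enabled = True

    def alter(self, family):
        # Add a new column family; existing families and cells are untouched.
        if family not in self.families:
            self.families = sorted(self.families + [family])

    def scan(self):
        if not self.enabled:
            raise TableNotEnabledError(f"{self.name} is disabled.")
        return sorted(self.cells.items())


table = ToyTable("mytable", "courses")
table.cells[("r2", "courses:physics")] = "quantum mechanics"

table.disable()
try:
    table.scan()
except TableNotEnabledError as err:
    print("ERROR:", err)

table.alter("college")
table.enable()
print(table.families)
print(table.scan())
```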

Reference: