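Continuing from the previous post: one major difference between HBase and a traditional database is that HBase can keep multiple versions of a value rather than only the latest one. Versions are distinguished by their timestamp attribute.
Let's look at this concretely. First we add a row visit100, then set the column personinfo:name several times, querying the latest value after each put: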
hbase(main):011:0> put 'patientvisit','visit100','personinfo:name','A'
0 row(s) in 0.0890 seconds
hbase(main):012:0> get 'patientvisit','visit100','personinfo:name'
COLUMN CELL
personinfo:name timestamp=1454342758182, value=A
1 row(s) in 0.0310 seconds
hbase(main):013:0> put 'patientvisit','visit100','personinfo:name','B'
0 row(s) in 0.0100 seconds
hbase(main):014:0> get 'patientvisit','visit100','personinfo:name'
COLUMN CELL
personinfo:name timestamp=1454342796166, value=B
1 row(s) in 0.0230 seconds
hbase(main):015:0> put 'patientvisit','visit100','personinfo:name','B'
0 row(s) in 0.0140 seconds
hbase(main):016:0> get 'patientvisit','visit100','personinfo:name'
COLUMN CELL
personinfo:name timestamp=1454342802829, value=B
1 row(s) in 0.0120 seconds
As we can see, every put on this column gets a new timestamp, even when the value itself does not change:
| timestamp | value |
| --- | --- |
| 1454342758182 | A |
| 1454342796166 | B |
| 1454342802829 | B |
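To make the idea concrete, here is a minimal sketch that models a single cell as a mapping from timestamp to value. It is only a toy for illustration and says nothing about how HBase stores data internally; the names `put` and `get_latest` are made up for this example, and the timestamps are the ones from the session above.

```python
# Toy model of one HBase cell: a mapping from timestamp to value.
# Illustration only; this is not how HBase stores data internally.
cell = {}

def put(cell, timestamp, value):
    """Record a value under its timestamp (the shell session above used server time in ms)."""
    cell[timestamp] = value

def get_latest(cell):
    """Return (timestamp, value) for the newest version, or None if the cell is empty."""
    if not cell:
        return None
    newest = max(cell)
    return newest, cell[newest]

# Replay the three puts from the shell session.
put(cell, 1454342758182, "A")
put(cell, 1454342796166, "B")
put(cell, 1454342802829, "B")

print(get_latest(cell))  # (1454342802829, 'B')
```

Writing "B" a second time still creates a new entry, because the key is the timestamp and not the value.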
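By default a query returns the most recent value, but we can also ask for a specific version by its timestamp. Note that if no cell has exactly that timestamp, no data is returned: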
hbase(main):004:0> get 'patientvisit','visit100',COLUMN=>'personinfo:name',TIMESTAMP=>1454342758182
COLUMN CELL
personinfo:name timestamp=1454342758182, value=A
1 row(s) in 0.0110 seconds
hbase(main):005:0> get 'patientvisit','visit100',COLUMN=>'personinfo:name',TIMESTAMP=>1454342758183
COLUMN CELL
0 row(s) in 0.0130 seconds
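We can also use TIMERANGE to fetch the latest value within a time range. The range is half-open, [start, end): the start timestamp is included and the end timestamp is excluded, which is why the two queries below return different values: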
hbase(main):002:0> get 'patientvisit','visit100',COLUMN=>'personinfo:name',TIMERANGE=>[1454342758182,1454342796166]
COLUMN CELL
personinfo:name timestamp=1454342758182, value=A
1 row(s) in 0.0200 seconds
hbase(main):003:0> get 'patientvisit','visit100',COLUMN=>'personinfo:name',TIMERANGE=>[1454342758182,1454342796167]
COLUMN CELL
personinfo:name timestamp=1454342796166, value=B
1 row(s) in 0.0240 seconds
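The behaviour of the TIMESTAMP and TIMERANGE queries above can be reproduced with a small toy model. This is a sketch under assumptions, not HBase code: `get_in_range` and `get_at` are hypothetical helpers, and treating an exact-timestamp lookup as the one-tick range [ts, ts + 1) is a modelling choice made here because it matches the results in the shell sessions.

```python
# Toy model: the versions of personinfo:name as {timestamp: value}.
versions = {
    1454342758182: "A",
    1454342796166: "B",
    1454342802829: "B",
}

def get_in_range(versions, min_ts, max_ts):
    """Newest (timestamp, value) with min_ts <= timestamp < max_ts, else None."""
    candidates = [ts for ts in versions if min_ts <= ts < max_ts]
    if not candidates:
        return None
    newest = max(candidates)
    return newest, versions[newest]

def get_at(versions, ts):
    """Exact-timestamp lookup, modelled as the one-tick range [ts, ts + 1)."""
    return get_in_range(versions, ts, ts + 1)

print(get_at(versions, 1454342758182))                       # (1454342758182, 'A')
print(get_at(versions, 1454342758183))                       # None
print(get_in_range(versions, 1454342758182, 1454342796166))  # (1454342758182, 'A')
print(get_in_range(versions, 1454342758182, 1454342796167))  # (1454342796166, 'B')
```

Moving the upper bound by a single millisecond changes the answer from A to B, because the upper bound itself is never part of the range.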
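Next, let's talk about how many versions are kept. To demonstrate this we first need to change the table schema: describe shows that every column family currently has VERSIONS => '1', so we raise it to 5 for personinfoex: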
hbase(main):013:0> describe 'patientvisit'
Table patientvisit is ENABLED
patientvisit
COLUMN FAMILIES DESCRIPTION
{NAME => 'personinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'personinfoex', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'visitinfo', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
{NAME => 'visitinfoex', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROW', REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => 'FOREVER', KEEP_DELETED_CELLS => 'FALSE', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}
4 row(s) in 0.0510 seconds
hbase(main):020:0> alter 'patientvisit',{NAME=>'personinfoex',VERSIONS=>5}
Updating all regions with the new schema...
0/1 regions updated.
1/1 regions updated.
Done.
0 row(s) in 3.2850 seconds
Then we assign a value to personinfoex:money six times in a row:
hbase(main):022:0> put 'patientvisit','visit100','personinfoex:money','100'
0 row(s) in 0.2510 seconds
hbase(main):023:0> put 'patientvisit','visit100','personinfoex:money','1000'
0 row(s) in 0.0140 seconds
hbase(main):024:0> put 'patientvisit','visit100','personinfoex:money','10000'
0 row(s) in 0.0090 seconds
hbase(main):025:0> put 'patientvisit','visit100','personinfoex:money','100000'
0 row(s) in 0.0110 seconds
hbase(main):026:0> put 'patientvisit','visit100','personinfoex:money','1000000'
0 row(s) in 0.0110 seconds
hbase(main):027:0> put 'patientvisit','visit100','personinfoex:money','10000000'
0 row(s) in 0.0160 seconds
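Now, when we call get, we can pass the VERSIONS parameter to return up to that many of the most recent versions, newest first: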
hbase(main):029:0> get 'patientvisit','visit100',COLUMN=>'personinfoex:money'
COLUMN CELL
personinfoex:money timestamp=1454466774726, value=10000000
1 row(s) in 0.0690 seconds
hbase(main):031:0> get 'patientvisit','visit100',COLUMN=>'personinfoex:money',VERSIONS=>1
COLUMN CELL
personinfoex:money timestamp=1454466774726, value=10000000
1 row(s) in 0.0160 seconds
hbase(main):032:0> get 'patientvisit','visit100',COLUMN=>'personinfoex:money',VERSIONS=>2
COLUMN CELL
personinfoex:money timestamp=1454466774726, value=10000000
personinfoex:money timestamp=1454466769620, value=1000000
2 row(s) in 0.0100 seconds
hbase(main):030:0> get 'patientvisit','visit100',COLUMN=>'personinfoex:money',VERSIONS=>3
COLUMN CELL
personinfoex:money timestamp=1454466774726, value=10000000
personinfoex:money timestamp=1454466769620, value=1000000
personinfoex:money timestamp=1454466766192, value=100000
3 row(s) in 0.0270 seconds
hbase(main):033:0> get 'patientvisit','visit100',COLUMN=>'personinfoex:money',VERSIONS=>4
COLUMN CELL
personinfoex:money timestamp=1454466774726, value=10000000
personinfoex:money timestamp=1454466769620, value=1000000
personinfoex:money timestamp=1454466766192, value=100000
personinfoex:money timestamp=1454466763838, value=10000
4 row(s) in 0.0410 seconds
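The interplay between the column family's VERSIONS setting and the VERSIONS argument of get can be sketched with another toy model. Everything here is an assumption made for illustration: `VersionedCell` is a hypothetical class, the timestamps 1 to 6 are made-up logical values (the session above does not show the timestamps of the first two puts), and the toy discards surplus versions immediately on put, whereas in HBase the physical cleanup is left to later housekeeping.

```python
class VersionedCell:
    """Toy cell that retains at most max_versions values, newest first."""

    def __init__(self, max_versions):
        self.max_versions = max_versions
        self._versions = []  # list of (timestamp, value), sorted newest first

    def put(self, timestamp, value):
        # A put with an existing timestamp replaces that version.
        self._versions = [tv for tv in self._versions if tv[0] != timestamp]
        self._versions.append((timestamp, value))
        self._versions.sort(key=lambda tv: tv[0], reverse=True)
        # Toy simplification: drop versions beyond the limit right away.
        del self._versions[self.max_versions:]

    def get(self, versions=1):
        """Return up to `versions` entries, newest first."""
        return self._versions[:versions]


money = VersionedCell(max_versions=5)  # mirrors VERSIONS => 5 on personinfoex
values = ["100", "1000", "10000", "100000", "1000000", "10000000"]
for ts, value in enumerate(values, start=1):  # made-up logical timestamps 1..6
    money.put(ts, value)

print(money.get())             # [(6, '10000000')]
print(money.get(versions=2))   # [(6, '10000000'), (5, '1000000')]
print(len(money.get(versions=10)))  # 5
```

Asking for more versions than the family retains simply returns what is left: in this model the oldest value, "100", is gone after the sixth put.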
So, to summarize once more: HBase is a three-level key-value database with timestamps and version management.