0. Prerequisite
Suppose two HBase pseudo-distributed clusters are both running, configured as follows:
relevant parameters in hbase-site.xml | source | destination |
---|---|---|
hbase.zookeeper.quorum | macos | ubuntu |
hbase.zookeeper.property.clientPort | 2181 | 2181 |
zookeeper.znode.parent | /hbase | /hbase |
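
The three parameters in the table combine into the cluster key that add_peer takes in step 2, in the form quorum:clientPort:znodeParent. A minimal sketch of that composition, using the destination column values (the variable names are illustrative only):

```shell
# Values from the destination column of the table above.
quorum="ubuntu"        # hbase.zookeeper.quorum
client_port="2181"     # hbase.zookeeper.property.clientPort
znode_parent="/hbase"  # zookeeper.znode.parent

# The cluster key is the three values joined by colons.
cluster_key="${quorum}:${client_port}:${znode_parent}"
echo "$cluster_key"    # prints ubuntu:2181:/hbase
```
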
1. Create table for replication
1) start the hbase shell on the source cluster and create a table
$ cd $HOME_HBASE
$ bin/hbase shell
> create 'peTable', {NAME => 'info0', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'IN_MEMORY_COMPACTION' => 'NONE'}}
2) create exactly the same table on the destination cluster
2. Add the destination cluster as a peer, in the source cluster's hbase shell
> add_peer 'ubt_pe', CLUSTER_KEY => "ubuntu:2181:/hbase", TABLE_CFS => { "peTable" => []}
3. Enable the table for replication, in the source cluster's hbase shell
> enable_table_replication 'peTable'
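
Before writing data, it can help to confirm that the peer is registered and that the table's replication scope changed. The sketch below pipes single commands into a non-interactive hbase shell (the -n option). The helper name hbase_shell and the HBASE_BIN override are hypothetical; the default path assumes the current directory is $HOME_HBASE, as in the other steps.

```shell
# Hypothetical helper: run one hbase shell command non-interactively.
# HBASE_BIN is an assumed override; the default matches the bin/hbase path used above.
hbase_shell() {
  printf '%s\n' "$1" | "${HBASE_BIN:-bin/hbase}" shell -n
}

# Only attempt the checks when the launcher is present.
if [ -x "${HBASE_BIN:-bin/hbase}" ]; then
  hbase_shell "list_peers"          # expected to list the peer 'ubt_pe'
  hbase_shell "describe 'peTable'"  # REPLICATION_SCOPE is expected to show '1' now
fi
```
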
4. Write data with the HBase PerformanceEvaluation tool
$ cd $HOME_HBASE
$ bin/hbase pe --table=peTable --nomapred --valueSize=100 randomWrite 1
2023-09-08 19:57:55,256 INFO [main] hbase.PerformanceEvaluation: RandomWriteTest test run options={"cmdName":"randomWrite","nomapred":true,"filterAll":false,"startRow":0,"size":0.0,"perClientRunRows":1048576,"numClientThreads":1,"totalRows":1048576,"measureAfter":0,"sampleRate":1.0,"traceRate":0.0,"tableName":"peTable","flushCommits":true,"writeToWAL":true,"autoFlush":false,"oneCon":false,"connCount":-1,"useTags":false,"noOfTags":1,"reportLatency":false,"multiGet":0,"multiPut":0,"randomSleep":0,"inMemoryCF":false,"presplitRegions":0,"replicas":1,"compression":"NONE","bloomType":"ROW","blockSize":65536,"blockEncoding":"NONE","valueRandom":false,"valueZipf":false,"valueSize":100,"period":104857,"cycles":1,"columns":1,"families":1,"caching":30,"latencyThreshold":0,"addColumns":true,"inMemoryCompaction":"NONE","asyncPrefetch":false,"cacheBlocks":true,"scanReadType":"DEFAULT","bufferSize":"2097152"}
...
2023-09-08 19:57:58,476 INFO [TestClient-0] hbase.PerformanceEvaluation: row [start=0, current=104857, last=1048576], latency [mean=19.87, min=0.00, max=328487.00, stdDev=1355.87, 95th=1.00, 99th=8.00]
2023-09-08 19:57:59,679 INFO [TestClient-0] hbase.PerformanceEvaluation: row [start=0, current=209714, last=1048576], latency [mean=15.34, min=0.00, max=328487.00, stdDev=1026.36, 95th=1.00, 99th=4.00]
...
2023-09-08 19:58:10,520 INFO [TestClient-0] hbase.PerformanceEvaluation: row [start=0, current=1048570, last=1048576], latency [mean=13.17, min=0.00, max=328487.00, stdDev=780.16, 95th=0.00, 99th=1.00]
2023-09-08 19:58:10,569 INFO [TestClient-0] hbase.PerformanceEvaluation: Test : RandomWriteTest, Thread : TestClient-0
2023-09-08 19:58:10,577 INFO [TestClient-0] hbase.PerformanceEvaluation: Latency (us) : mean=13.17, min=0.00, max=328487.00, stdDev=780.16, 50th=0.00, 75th=0.00, 95th=0.00, 99th=1.00, 99.9th=19.00, 99.99th=28853.39, 99.999th=58579.15
2023-09-08 19:58:10,577 INFO [TestClient-0] hbase.PerformanceEvaluation: Num measures (latency) : 1048575
2023-09-08 19:58:10,584 INFO [TestClient-0] hbase.PerformanceEvaluation: Mean = 13.17
Min = 0.00
Max = 328487.00
StdDev = 780.16
50th = 0.00
75th = 0.00
95th = 0.00
99th = 1.00
99.9th = 19.00
99.99th = 28853.39
99.999th = 58579.15
2023-09-08 19:58:10,584 INFO [TestClient-0] hbase.PerformanceEvaluation: No valueSize statistics available
2023-09-08 19:58:10,586 INFO [TestClient-0] hbase.PerformanceEvaluation: Finished class org.apache.hadoop.hbase.PerformanceEvaluation$RandomWriteTest in 14286ms at offset 0 for 1048576 rows (9.24 MB/s)
2023-09-08 19:58:10,586 INFO [TestClient-0] hbase.PerformanceEvaluation: Finished TestClient-0 in 14286ms over 1048576 rows
2023-09-08 19:58:10,586 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest] Summary of timings (ms): [14286]
2023-09-08 19:58:10,595 INFO [main] hbase.PerformanceEvaluation: [RandomWriteTest duration ] Min: 14286ms Max: 14286ms Avg: 14286ms
2023-09-08 19:58:10,595 INFO [main] hbase.PerformanceEvaluation: [ Avg latency (us)] 13
2023-09-08 19:58:10,596 INFO [main] hbase.PerformanceEvaluation: [ Avg TPS/QPS] 73399 row per second
2023-09-08 19:58:10,596 INFO [main] client.AsyncConnectionImpl: Connection has been closed by main.
Note: running PerformanceEvaluation without arguments prints its help:
$ bin/hbase pe
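
The reported Avg TPS/QPS can be cross-checked from the row count and the run time in the log above. A small sketch of that arithmetic (the variable names are illustrative):

```shell
# Figures copied from the PerformanceEvaluation log above.
rows=1048576
elapsed_ms=14286

# rows per second = rows / (elapsed_ms / 1000)
rows_per_sec=$(awk -v r="$rows" -v ms="$elapsed_ms" 'BEGIN { printf "%.0f", r * 1000 / ms }')
echo "$rows_per_sec"
```

The rounded result agrees with the 73399 rows per second in the summary line.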
5. Count rows on source and peer
1) in the source cluster's hbase shell
> count 'peTable'
Current count: 1000, row: 00000000000000000000001563
Current count: 2000, row: 00000000000000000000003160
...
Current count: 663000, row: 00000000000000000001048457
663073 row(s)
Took 12.9970 seconds
2) in the peer cluster's hbase shell
> count 'peTable'
Current count: 1000, row: 00000000000000000000001563
Current count: 2000, row: 00000000000000000000003160
...
Current count: 663000, row: 00000000000000000001048457
663073 row(s)
Took 7.1883 seconds
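
Both counts are 663073 rather than the 1048576 rows that PerformanceEvaluation reported writing. This is consistent with randomWrite picking row keys at random, so that some keys are written more than once; that behavior is an assumption here, not something the log states. Under that assumption, N writes into a key space of N keys leave about N * (1 - 1/e) distinct rows. A sketch of the estimate:

```shell
# Assumption: each write draws its row key uniformly at random, with
# replacement, from a key space of the same size as the number of writes.
writes=1048576

# Expected number of distinct keys is approximately writes * (1 - 1/e).
expected_distinct=$(awk -v n="$writes" 'BEGIN { printf "%.0f", n * (1 - exp(-1)) }')
echo "$expected_distinct"
```

The estimate lands within a few hundred rows of the observed 663073. For replication, what matters is that the source and peer counts match.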
6. Verify replication with the VerifyReplication class, from the source cluster's command line
$ cd $HOME_HBASE
$ bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication 'ubt_pe' 'peTable'
2023-09-08 20:14:37,199 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=VerifyReplication connecting to ZooKeeper ensemble=localhost:2181
...
2023-09-08 20:14:44,393 INFO [main] mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1694172104063_0001/
2023-09-08 20:14:44,394 INFO [main] mapreduce.Job: Running job: job_1694172104063_0001
2023-09-08 20:14:54,521 INFO [main] mapreduce.Job: Job job_1694172104063_0001 running in uber mode : false
2023-09-08 20:14:54,524 INFO [main] mapreduce.Job: map 0% reduce 0%
2023-09-08 20:20:18,907 INFO [main] mapreduce.Job: map 100% reduce 0%
2023-09-08 20:20:19,924 INFO [main] mapreduce.Job: Job job_1694172104063_0001 completed successfully
2023-09-08 20:20:20,040 INFO [main] mapreduce.Job: ...
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=321487
Total vcore-milliseconds taken by all map tasks=321487
Total megabyte-milliseconds taken by all map tasks=329202688
Map-Reduce Framework
Map input records=663073
Map output records=0
Input split bytes=105
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=707
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=114819072
HBaseCounters
BYTES_IN_REMOTE_RESULTS=103439388
BYTES_IN_RESULTS=103439388
MILLIS_BETWEEN_NEXTS=313921
NOT_SERVING_REGION_EXCEPTION=0
REGIONS_SCANNED=1
REMOTE_RPC_CALLS=60
REMOTE_RPC_RETRIES=0
ROWS_FILTERED=17
ROWS_SCANNED=663073
RPC_CALLS=60
RPC_RETRIES=0
org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication$Verifier$Counters
GOODROWS=663073
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
Note: the help of VerifyReplication can be shown with:
$ bin/hbase org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication --help
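
The counters at the end of the job output decide the result: GOODROWS should equal the number of rows scanned, and no BADROWS counter should be reported. A sketch that checks a saved copy of the job output follows. The function name check_verify_output and the idea of saving the output to a file are assumptions for illustration.

```shell
# Hypothetical helper: succeed only if GOODROWS equals the expected row count
# and the output reports no non-zero BADROWS counter.
check_verify_output() {
  log_file="$1"
  expected_rows="$2"
  # Take the value after '=' on the counter line; empty if the counter is absent.
  good=$(sed -n 's/^[[:space:]]*GOODROWS=\([0-9][0-9]*\).*$/\1/p' "$log_file" | head -n 1)
  bad=$(sed -n 's/^[[:space:]]*BADROWS=\([0-9][0-9]*\).*$/\1/p' "$log_file" | head -n 1)
  [ "${good:-0}" = "$expected_rows" ] && [ "${bad:-0}" = "0" ]
}

# Demo on two counter lines copied from the output above.
sample=$(mktemp)
printf '%s\n' 'ROWS_SCANNED=663073' 'GOODROWS=663073' > "$sample"
if check_verify_output "$sample" 663073; then
  echo "replication verified"
else
  echo "mismatch"
fi
rm -f "$sample"
```
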