As you know, HBase's data logs (aka the WAL) are rolled at a fixed interval so that data can be restored quickly when a region server occasionally goes down. Of course, both log rolling and memstore flushing block all writes (but not reads), so reducing how often the log rolls can noticeably improve cluster performance.
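For orientation, these are the knobs involved, with the defaults as I understand them for the 0.94-era version used here (double-check against your own release):

hbase.regionserver.logroll.period        = 3600000   (roll by time: 1 hour)
hbase.regionserver.logroll.multiplier    = 0.95      (roll by size: at 95% of the HLog block size)
hbase.regionserver.hlog.blocksize        = the underlying DFS block size by default
hbase.regionserver.wal.enablecompression = false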
1. Case
During the high-load period of the day, some ugly log messages show up, as below:
2015-09-07 09:10:35,380 WARN [IPC Server handler 1 on 60020] WritableRpcEngine.java:436 (responseTooSlow): {"processingtimems":22896,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@72e76446), rpc version=1, client version=29, methodsFingerPrint=-56040613","client":"192.168.0.117:55979","starttimems":1441588212481,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2015-09-07 09:11:45,033 DEBUG [regionserver60020.logRoller] LogRoller.java:86 Hlog roll period 3600000ms elapsed
2015-09-07 09:11:45,033 INFO [regionserver60020.logRoller] FSUtils.java:429 FileSystem doesn't support getDefaultReplication
2015-09-07 09:11:45,034 INFO [regionserver60020.logRoller] FSUtils.java:395 FileSystem doesn't support getDefaultBlockSize
2015-09-07 09:11:45,037 DEBUG [regionserver60020.logRoller] SequenceFileLogWriter.java:195 using new createWriter -- HADOOP-6840
2015-09-07 09:11:45,037 DEBUG [regionserver60020.logRoller] SequenceFileLogWriter.java:208 Path=hdfs://hd02:54310/hbase-94s/.logs/node-59,60020,1441098691966/node-59%2C60020%2C1441098691966.1441588305033, syncFs=true, hflush=false, compression=false
2015-09-07 09:11:45,053 INFO [regionserver60020.logRoller] HLog.java:654 Roll /hbase-94s/.logs/node-59,60020,1441098691966/node-59%2C60020%2C1441098691966.1441584704954, entries=12301, filesize=25620076. for /hbase-94s/.logs/node-59,60020,1441098691966/node-59%2C60020%2C1441098691966.1441588305033
2015-09-07 09:11:46,104 DEBUG [LRU Statistics #0] LruBlockCache.java:681 Stats: total=1.37 GB, free=2.32 GB, max=3.69 GB, blocks=10745, accesses=22326231, hits=19423914, hitRatio=87.00%, , cachingAccesses=19369432, cachingHits=19352868, cachingHitsRatio=99.91%, , evictions=0, evicted=5776, evictedPerRun=Infinity
2015-09-07 09:12:05,535 WARN [IPC Server handler 0 on 60020] WritableRpcEngine.java:436 (responseTooSlow): {"processingtimems":17071,"call":"multi(org.apache.hadoop.hbase.client.MultiAction@1ef7f77), rpc version=1, client version=29, methodsFingerPrint=-56040613","client":"192.168.0.142:44554","starttimems":1441588308460,"queuetimems":0,"class":"HRegionServer","responsesize":0,"method":"multi"}
2015-09-07 09:12:42,259 INFO [regionserver60020-SendThread(node-01:2181)] ClientCnxn.java:1085 Unable to read additional data from server sessionid 0x14dd790d3735f22, likely server has closed socket, closing socket connection and attempting reconnect
2. Optimizations
1) Compress the logs with the WAL compression option to reduce I/O
hbase.regionserver.wal.enablecompression=true
For the principle behind it, check out LRUDictionary.java; a toy sketch of the idea follows at the end of this item.
Before (no compression):
2015-09-17 03:32:52,361 INFO [regionserver60020.logRoller] HLog.java:654 Roll /hbase/.logs/node-app-13,60020,1441852998783/node-app-13%2C60020%2C1441852998783.1442429970301, entries=54994, filesize=127512793. for /hbase/.logs/node-app-13,60020,1441852998783/node-app-13%2C60020%2C1441852998783.1442431972113
After (compression enabled):
2015-09-22 08:22:04,008 INFO [regionserver60020.logRoller] HLog.java:654 Roll /hbase/.logs/node-app-13,60020,1442683680803/node-app-13%2C60020%2C1442683680803.1442877723854, entries=34246, filesize=45448938. for /hbase/.logs/node-app-13,60020,1442683680803/node-app-13%2C60020%2C1442683680803.1442881323960
So the compression ratio per WAL entry is roughly:
(45448938 / 34246) / (127512793 / 54994) = ~1327 / ~2319 ≈ 0.57 ~ 0.6
That is, about 40 percent of the raw space is saved, which is a considerable gain.
In addition, these logs generally never need to be decompressed; decompression only happens when a region server occasionally goes down and its logs are replayed.
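To illustrate the principle only (this is a toy sketch, not HBase's actual LRUDictionary; the class and method names are invented for the example), WAL compression keeps a small LRU dictionary of recently written byte sequences such as region names, table names and column families. The first time a sequence appears it is written in full and added to the dictionary; every repeat is written as a short index instead:

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of dictionary-based WAL compression (hypothetical names, not HBase code).
class ToyWalDictionary {
    private final int capacity;
    // access-ordered LinkedHashMap keeps least-recently-used entries first
    private final LinkedHashMap<String, Short> indexByEntry;
    private final Map<Short, String> entryByIndex = new HashMap<>();
    private short nextIndex = 0;

    ToyWalDictionary(int capacity) {
        this.capacity = capacity;
        this.indexByEntry = new LinkedHashMap<>(16, 0.75f, true);
    }

    // Writer side: returns the existing index for a repeated entry, or -1 if the
    // entry is new (the writer then emits the full bytes once, and both writer
    // and reader add the entry to their dictionaries).
    short findOrAdd(String entry) {
        Short idx = indexByEntry.get(entry);
        if (idx != null) {
            return idx;
        }
        if (indexByEntry.size() >= capacity) {
            // evict the least recently used entry to keep the dictionary bounded
            String eldest = indexByEntry.keySet().iterator().next();
            entryByIndex.remove(indexByEntry.remove(eldest));
        }
        short assigned = nextIndex++;
        indexByEntry.put(entry, assigned);
        entryByIndex.put(assigned, entry);
        return -1;
    }

    // Reader side: resolve a short index back to the original bytes during replay.
    String lookup(short index) {
        return entryByIndex.get(index);
    }
}

Because a busy WAL repeats the same region, table and family names over and over, replacing them with tiny indexes is roughly where the ~40% saving above comes from, and the dictionary only has to be rebuilt when a log is replayed, which matches the point above that decompression is rarely needed.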
2) Compress the in-memory data
TBD
3) Increase the HLog block size
to delay size-triggered log rolling
hbase.regionserver.hlog.blocksize=<should be larger than the DFS block size>
Here I set it to double the DFS block size: 268435456.
HBase will not use any more memory even though I set it larger than the default (the DFS block size); the hbase-site.xml form is shown below.
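For example, in hbase-site.xml (268435456 bytes = 256 MB, i.e. twice a 128 MB DFS block; adjust the value to double your own DFS block size):

<property>
  <name>hbase.regionserver.hlog.blocksize</name>
  <value>268435456</value>
</property>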
4) Increase the log roll interval
Whichever of this interval or the size threshold in 3) is reached first triggers the roll, so I set the interval to 10 hours: 36000000 ms, as shown in the snippet below.
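The time-based roll is controlled by hbase.regionserver.logroll.period; its 3600000 ms default is exactly the "Hlog roll period 3600000ms elapsed" message in the logs above. In hbase-site.xml:

<property>
  <name>hbase.regionserver.logroll.period</name>
  <value>36000000</value>
</property>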