I used Flume with the HBase Thrift server to insert roughly 200 million rows of server log data into HBase. When I then ran `hbase org.apache.hadoop.hbase.mapreduce.Export`, it threw a large number of ScannerTimeoutExceptions, so I hit Ctrl+C to cancel the export to HDFS.
The HDFS cluster has 3 datanodes, each with 2 TB of disk.
$ bin/hbase org.apache.hadoop.hbase.mapreduce.Export log.server1 /tom/log.server1
Error: org.apache.hadoop.hbase.client.ScannerTimeoutException: 61669ms passed since the last invocation, timeout is currently set to 60000
at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:434)
at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:205)
at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147)
at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:216)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 29, already closed?
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2224)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
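Besides raising the server-side lease (done further below), a common way to work around ScannerTimeoutException is to make each scanner RPC cheaper by lowering scanner caching, and to raise the client-side scanner timeout, both passed as -D overrides to the Export job. A sketch, assuming an HBase 1.x-era cluster (property names vary by version, so verify them against the docs for your release):

```shell
# Fetch fewer rows per scanner next() call so each batch completes well
# within the lease, and give the client a longer scanner timeout.
# (Property names are version-dependent; check your HBase reference guide.)
bin/hbase org.apache.hadoop.hbase.mapreduce.Export \
  -Dhbase.client.scanner.caching=100 \
  -Dhbase.client.scanner.timeout.period=120000 \
  log.server1 /tom/log.server1
```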
Going back to the HDFS web UI at http://hdfs_address:50070/dfshealth.html#tab-overview, I found DFS Used still growing rapidly, at roughly 1 GB per second.
Logging into the HDFS namenode, `top` showed the hdfs and yarn processes still consuming heavy CPU, and `iostat` showed very heavy disk writes.
hadoop fs -du -s -h /tom/log.server1
The export directory was already over 1.5 TB, while the web UI showed DFS Used above 3 TB and still climbing.
Running `desc 'table_name'` in the hbase shell showed COMPRESSION => 'NONE', i.e. no compression configured for the table.
REPLICATION_SCOPE was set to 0 (that attribute only controls cross-cluster replication, not HDFS replicas). Checking the HDFS configuration, dfs.replication was set to 3:
<property>
<name>dfs.replication.max</name>
<value>6</value>
<source>hdfs-site.xml</source>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
<source>hdfs-site.xml</source>
</property>
dfs.replication controls how many replicas are written when data lands on the HDFS filesystem. With 3 datanodes, 3 replicas is about right. If you only have 3 datanodes but set replication to 4, the fourth replica is never actually placed, because each datanode can hold at most one replica of a given block.
Since replication is set to 3 and the exported data measures about 1.5 TB, the total works out to 1.5 TB × 3 = 4.5 TB of DFS usage; in other words, DFS Used still had roughly another 1.5 TB of replica data left to write.
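The arithmetic above can be sketched as a quick sanity check (figures are the ones from this post):

```shell
#!/bin/sh
# Raw DFS usage = logical data size x dfs.replication.
# ~1.5 TB of exported data, replication factor 3 (figures from this post).
awk -v size_tb=1.5 -v repl=3 \
  'BEGIN { printf "%.1f TB expected on DFS\n", size_tb * repl }'
```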
Could the data be compressed as it is written, to cut the disk footprint? HBase supports GZIP, LZO and Snappy (LZO or Snappy are generally recommended), and the official docs publish benchmark results for the codecs:
Algorithm   % remaining   Encoding    Decoding
GZIP        13.4%         21 MB/s     118 MB/s
LZO         20.5%         135 MB/s    410 MB/s
Snappy      22.2%         172 MB/s    409 MB/s
From the table, GZIP actually compresses the most (13.4% remaining), but Snappy has by far the highest encoding throughput with decent decoding speed, which matters when compressing on write. Snappy was already installed here, so I went with Snappy.
First disable the log.server1 table:
disable 'log.server1'
alter 'log.server1', NAME => 'cf1', COMPRESSION => 'snappy'   # enable compression on the column family
enable 'log.server1'   # re-enabling alone does not compress existing data; force it to take effect now:
major_compact 'log.server1'   # rewrites every HFile; this runs for a long time and is heavy on cluster CPU and IO
A few hours later, disk usage, IO and CPU had all come back down: each datanode dropped from ~1.5 TB to ~160 GB, and total DFS Used fell to about 480 GB. The data had evidently all landed and been compressed.
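The measured numbers imply a ratio even better than the ~22% "remaining" from the Snappy benchmark above, which is plausible: log text is highly repetitive, and major compaction also drops duplicate and expired cells. A quick check, using the assumed figures from this post:

```shell
#!/bin/sh
# Achieved ratio: per-replica size went from ~1.5 TB (1536 GB) to ~160 GB
# after enabling Snappy and running major_compact (figures from this post).
awk -v before_gb=1536 -v after_gb=160 \
  'BEGIN { printf "%.1f%% remaining\n", after_gb / before_gb * 100 }'
```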
To avoid the scanner timeouts in the future, I edited hbase-site.xml and added the following parameters (note that newer HBase releases replace hbase.regionserver.lease.period with hbase.client.scanner.timeout.period):
<property>
<name>hbase.regionserver.lease.period</name>
<value>120000</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>90000</value>
<description>ZooKeeper session timeout.</description>
</property>
<property>
<name>hbase.regionserver.restart.on.zk.expire</name>
<value>true</value>
<description>An expired ZooKeeper session forces the regionserver to exit; enabling this makes the regionserver restart instead.</description>
</property>
Check the replication health of the Hadoop cluster:
hadoop fsck /
Minimally replicated blocks: 7580 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
You can see that Average block replication is still 3.
To change the replication factor of files already stored in HDFS, use setrep. For example, to set everything under /tom/ to 3 replicas and wait for the change to complete:
hadoop dfs -setrep -w 3 -R /tom/
Or, to set every file in HDFS to 2 replicas:
sudo -u hdfs hadoop fs -setrep -R 2 /
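The replication factor scales DFS usage directly. Using the ~160 GB per-replica footprint measured after compression (an assumed figure from this post), each setting implies:

```shell
#!/bin/sh
# DFS footprint per replication factor, using the ~160 GB per-replica
# size measured above after Snappy compression (figure from this post).
per_replica_gb=160
for r in 2 3; do
  echo "replication=$r -> $((per_replica_gb * r)) GB"
done
```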
If fsck still reports errors afterwards, it is often because some files' replicas are in an abnormal state; the namenode will re-replicate those over time, and Hadoop's balancer tool can be run to even out block distribution across the datanodes:
hadoop balancer
Check each node's disk usage again:
hadoop dfsadmin -report
Configured Capacity: 6073208496384 (5.52 TB)
Present Capacity: 5980433230156 (5.44 TB)
DFS Remaining: 5541538604318 (5.04 TB)
DFS Used: 524630220680 (488.60 GB)
DFS Used%: 8.85%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
Everything is back to normal.