Fixing a sharp increase in disk usage after HBase data is written to HDFS

  1. Scenario

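I used flume + hbase thrift to insert roughly 200 million rows of server log data into HBase. While running hbase org.apache.hadoop.hbase.mapreduce.Export, a large number of ScannerTimeoutException errors appeared,
so I pressed Ctrl+C to cancel the export to HDFS.
The HDFS cluster has 3 datanodes, each with 2 TB of disk space.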

$bin/hbase org.apache.hadoop.hbase.mapreduce.Export log.server1 /tom/log.server1

Error: org.apache.hadoop.hbase.client.ScannerTimeoutException: 61669ms passed since the last invocation, timeout is currently set to 60000
        at org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:434)
        at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:364)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:205)
        at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:147)
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.nextKeyValue(TableInputFormatBase.java:216)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
        at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
        at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: org.apache.hadoop.hbase.UnknownScannerException: org.apache.hadoop.hbase.UnknownScannerException: Name: 29, already closed?
        at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2224)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
        at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)

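Back in the HDFS web UI (http://hdfs_address:50070/dfshealth.html#tab-overview), DFS Used was still growing rapidly, at roughly 1 GB per second.
After logging in to the HDFS namenode, top showed hdfs and yarn still consuming a lot of CPU, and iostat showed very heavy disk writes.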

hadoop fs -du -s -h /tom/log.server1

The directory already occupied more than 1.5 TB, while the HDFS web UI showed DFS Used above 3 TB and still growing.

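Running desc 'table_name' in the hbase shell showed COMPRESSION => 'NONE', i.e. compression was not configured for the table.
REPLICATION_SCOPE was set to 0 (this only controls HBase's cluster-to-cluster replication and is unrelated to HDFS block replication). Checking the configuration showed dfs.replication set to 3: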

<property>
    <name>dfs.replication.max</name>
    <value>6</value>
    <source>hdfs-site.xml</source>
</property>
<property>
    <name>dfs.replication</name>
    <value>3</value>
    <source>hdfs-site.xml</source>
</property>

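dfs.replication is the number of replicas HDFS keeps of each block written to the filesystem. I have 3 datanodes, so a value of 3 is about right. If you only have 3 datanodes but set replication to 4, the fourth replica is never created, because a datanode stores at most one replica of a given block.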

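Since replication is 3 and the exported data is 1.5 TB, the raw footprint will be 1.5 TB * 3 = 4.5 TB. DFS Used was at 3 TB, so another 1.5 TB was still going to be written.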
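
The arithmetic above can be written down as a small sketch. The function names are made up for illustration, and the model is deliberately simple: it assumes every block ends up with one replica on each of min(replication, datanodes) nodes and ignores any other data on the cluster.

```python
def effective_replicas(requested: int, datanodes: int) -> int:
    """Replicas HDFS can actually place: at most one per datanode for a given block."""
    return min(requested, datanodes)


def raw_usage_tb(logical_tb: float, requested: int, datanodes: int) -> float:
    """Raw DFS usage once every block has all of its replicas."""
    return logical_tb * effective_replicas(requested, datanodes)


logical_tb = 1.5   # size of /tom/log.server1 as reported by `hadoop fs -du -s -h`
dfs_used_tb = 3.0  # "DFS Used" shown in the web UI at that moment

expected_tb = raw_usage_tb(logical_tb, requested=3, datanodes=3)
still_to_write_tb = expected_tb - dfs_used_tb

print(f"expected raw usage: {expected_tb} TB")
print(f"still to be written: {still_to_write_tb} TB")
```

With replication set to 4 on the same 3 nodes, effective_replicas(4, 3) still yields 3, which is the point made about the fourth replica.
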
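Can the data be compressed while it is being written, to reduce disk usage? There are official benchmark results for the compression codecs: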

        Compression: GZIP, LZO, Snappy (LZO or Snappy recommended)
            Algorithm   % remaining   Encoding   Decoding
            GZIP        13.4%         21 MB/s    118 MB/s
            LZO         20.5%         135 MB/s   410 MB/s
            Snappy      22.2%         172 MB/s   409 MB/s

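Snappy has the fastest encoding speed and a decoding speed on par with LZO, at the cost of a lower compression ratio than GZIP. Snappy is already installed on this cluster.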
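
To get a feel for the trade-off, the benchmark figures can be applied to the 1.5 TB of data in question. This is only a rough sketch: the numbers are the ones from the table, the encoding time assumes a single stream with no parallelism (a real cluster spreads the work across region servers), and actual ratios depend heavily on the data.

```python
# codec -> (fraction remaining, encode MB/s, decode MB/s), taken from the table
BENCHMARK = {
    "GZIP": (0.134, 21, 118),
    "LZO": (0.205, 135, 410),
    "Snappy": (0.222, 172, 409),
}


def estimate(logical_gb: float, codec: str):
    """Return (compressed size in GB, single-stream encode time in hours)."""
    remaining, encode_mb_s, _decode_mb_s = BENCHMARK[codec]
    compressed_gb = logical_gb * remaining
    encode_hours = logical_gb * 1024 / encode_mb_s / 3600
    return compressed_gb, encode_hours


logical_gb = 1.5 * 1024  # 1.5 TB expressed in GB

for codec in BENCHMARK:
    size_gb, hours = estimate(logical_gb, codec)
    print(f"{codec:7s} ~{size_gb:6.0f} GB, ~{hours:5.1f} h to encode (single stream)")
```

GZIP yields the smallest output but takes several times longer to encode, which is why LZO or Snappy is the usual recommendation for HBase.
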

Disable the log.server1 table and change its compression:

disable 'log.server1'
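alter 'log.server1', NAME => 'cf1', COMPRESSION => 'snappy'     # change the compression codec
enable 'log.server1'                                            # existing data is not compressed just by enabling the table; a major compaction makes it take effect right away
major_compact 'log.server1'                                     # this runs for a long time and puts heavy CPU and IO load on the whole cluster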

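A few hours later, disk usage, IO and CPU had all come down. Each datanode dropped from 1.5 TB to 160 GB, and total HDFS usage fell to 480 GB. It looks like all the data has been written out and compressed.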
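
As a quick sanity check on these numbers, the observed ratio can be computed directly. A minimal sketch, assuming binary units (1 TB = 1024 GB) and that each of the 3 datanodes holds one full copy of the data:

```python
before_gb_per_node = 1.5 * 1024  # 1.5 TB per datanode before the major compaction
after_gb_per_node = 160          # per datanode after Snappy + major_compact
datanodes = 3

remaining_fraction = after_gb_per_node / before_gb_per_node
total_after_gb = after_gb_per_node * datanodes

print(f"remaining: {remaining_fraction:.1%} of the original size")
print(f"cluster total: {total_after_gb} GB")
```

The cluster total matches the 480 GB reported above. The remaining fraction is well below the 22.2% from the Snappy benchmark, which is plausible for repetitive log lines, though part of the drop may also come from the compaction itself.
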

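Edit hbase-site.xml and add the following parameters, which raise the scanner lease and ZooKeeper session timeouts (in recent HBase versions hbase.regionserver.lease.period is deprecated in favor of hbase.client.scanner.timeout.period):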

<property>
    <name>hbase.regionserver.lease.period</name>
    <value>120000</value>
</property>
<property>
    <name>zookeeper.session.timeout</name>
    <value>90000</value>
    <description>ZooKeeper session timeout.</description>
</property>
<property>
    <name>hbase.regionserver.restart.on.zk.expire</name>
    <value>true</value>
    <description> Zookeeper session expired will force regionserver exit.  Enable this will make the regionserver restart.  </description>
</property>
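
To double-check which values a config file actually carries, the properties can be read with a few lines of Python. This is a sketch only: the inline XML stands in for a real hbase-site.xml, the usual <configuration> root element is assumed, and load_properties is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Stand-in for the contents of hbase-site.xml (assumed <configuration> root).
SAMPLE = """<configuration>
  <property>
    <name>hbase.regionserver.lease.period</name>
    <value>120000</value>
  </property>
  <property>
    <name>zookeeper.session.timeout</name>
    <value>90000</value>
  </property>
</configuration>"""


def load_properties(xml_text: str) -> dict:
    """Map each <property>'s <name> to its <value> (both as strings)."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


props = load_properties(SAMPLE)
lease_ms = int(props["hbase.regionserver.lease.period"])

# The failed scan reported 61669 ms since the last call with a 60000 ms timeout.
print("new lease covers the observed gap:", lease_ms > 61669)
```
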

Check the replication status of the Hadoop cluster:

hadoop fsck /

Minimally replicated blocks: 7580 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1

As shown, Average block replication is still 3.
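
If this check needs to be repeated, the fsck summary is easy to parse. The sketch below works on the summary lines pasted as a string (in practice they would come from the command's output); parse_summary is a hypothetical helper, not part of Hadoop.

```python
FSCK_SUMMARY = """\
Minimally replicated blocks: 7580 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 1
Average block replication: 3.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 3
Number of racks: 1
"""


def parse_summary(text: str) -> dict:
    """Split each 'key: value' line of the fsck summary into a dict of strings."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


summary = parse_summary(FSCK_SUMMARY)
avg_replication = float(summary["Average block replication"])
datanodes = int(summary["Number of data-nodes"])
# Values such as "0 (0.0 %)" carry a percentage; keep only the leading count.
under_replicated = int(summary["Under-replicated blocks"].split()[0])

print(f"average replication {avg_replication} across {datanodes} datanodes")
print(f"under-replicated blocks: {under_replicated}")
```
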

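dfs.replication only applies to newly written files, so the replication factor of the files already in HDFS has to be changed explicitly.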

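For example, hadoop fs -setrep -w 2 -R /tom/ sets the replication factor of every file under that directory to 2 (-w waits until the change has completed). Here it is applied to the whole filesystem: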

sudo -u hdfs hadoop fs -setrep -R 2 /

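If fsck reports errors afterwards, it is usually because the replicas of some files are in an abnormal state. The Hadoop balancer, which automatically rebalances blocks across the datanodes, can help here: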

hadoop balancer

Check the disk usage of each node again:

hadoop dfsadmin -report

Configured Capacity: 6073208496384 (5.52 TB)
Present Capacity: 5980433230156 (5.44 TB)
DFS Remaining: 5541538604318 (5.04 TB)
DFS Used: 524630220680 (488.60 GB)
DFS Used%: 8.85%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
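
When reading this report, note that the values in parentheses are binary units (1 GB = 1024^3 bytes). A small sketch that reproduces them from the byte counts; human_binary is a made-up helper name:

```python
def human_binary(num_bytes: int) -> str:
    """Format a byte count using 1024-based units, two decimals."""
    units = ["B", "KB", "MB", "GB", "TB", "PB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


report = {
    "Configured Capacity": 6073208496384,
    "Present Capacity": 5980433230156,
    "DFS Remaining": 5541538604318,
    "DFS Used": 524630220680,
}

for name, num_bytes in report.items():
    print(f"{name}: {num_bytes} ({human_binary(num_bytes)})")
```
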

Everything is back to normal.
