CDH Cluster Log Cleanup

I. Check disk usage

df -h
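
To spot nearly full filesystems without reading the whole table, the `df` output can be filtered by usage percentage. The following is a minimal sketch; the helper name `fs_over` and the 80% threshold in the usage line are illustrative assumptions, not part of the original procedure.

```shell
#!/bin/bash
# fs_over THRESHOLD: read `df -P` output on stdin and print the mount point
# and usage of every filesystem whose Use% is at or above THRESHOLD.
# (fs_over and the threshold are hypothetical, chosen for this sketch.)
fs_over() {
    local threshold="$1"
    awk -v t="$threshold" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)              # "83%" -> "83"
        if (pct + 0 >= t) print $6, $5 # mount point, then usage
    }'
}

# Usage: df -P | fs_over 80
```

`df -P` is used instead of `df -h` because the POSIX format keeps each filesystem on one line, which makes the columns safe to parse.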

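II. Check log usage

1.1 The logs of the CDH components usually live under /var/log (it is better to configure them under a /data prefix, as in /data/var/log below), so by default the mount point to watch is "/".

List the directories under /data/var/log that use the most space, sorted from largest to smallest: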

cd /data/var/log/
du -s ./* | sort -nr
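
When a directory holds many entries, it can be handy to see only the largest few. A small sketch of the same `du | sort` idea wrapped in a function; `top_dirs` is a hypothetical helper name, and `-k` is added only to make the unit (KiB) explicit.

```shell
#!/bin/bash
# top_dirs DIR [N]: print the N largest entries directly under DIR
# (size in KiB, then path), largest first. N defaults to 10.
# (top_dirs is a hypothetical helper name used only for this sketch.)
top_dirs() {
    local dir="$1" n="${2:-10}"
    du -sk -- "$dir"/* 2>/dev/null | sort -nr | head -n "$n"
}

# Usage: top_dirs /data/var/log 5
```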

1.2 The other source is the logs produced by the Cloudera Management Service, stored under /var/lib/...

cd /data/var/lib/cloudera-service-monitor
du -s ./* | sort -nr

III. Clean up logs

1.1 Clean up the log data of the CM and CDH components

rm -rf /data/var/log/cloudera-scm-eventserver/*.out.*
rm -rf /data/var/log/cloudera-scm-firehose/*.out.*
rm -rf /data/var/log/cloudera-scm-agent/*.log.*
rm -rf /data/var/log/cloudera-scm-agent/*.out.*
rm -rf /data/var/log/cloudera-scm-server/*.out.*
rm -rf /data/var/log/cloudera-scm-server/*.log.*
	   
rm -rf /data/var/log/hadoop-hdfs/*.out.*
rm -rf /data/var/log/hadoop-httpfs/*.out.*
rm -rf /data/var/log/hadoop-kms/*.out.*
rm -rf /data/var/log/hadoop-mapreduce/*.out.*
rm -rf /data/var/log/hadoop-yarn/*.out.*
rm -rf /data/var/log/hadoop-hdfs/*.audit.*
rm -rf /data/var/log/flume-ng/*.out.*
rm -rf /data/var/log/solr/*.out.*
rm -rf /data/var/log/solr/solr_gc.log.*
	   
rm -rf /data/var/log/zookeeper/*.log.*
rm -rf /data/var/log/impalad/*.log.*
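# Not logs: the YARN NodeManager localized-file cache and the Azkaban project
# directory. Clear these preferably while no jobs are running.
rm -rf /data/yarn/nm/usercache/*/filecache/*
rm -rf /data/azkaban/projects/*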
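
The commands above remove every rotated file regardless of its age. If recent rotated logs should be kept for troubleshooting, an age-based variant with `find -mtime` is one option. This is only a sketch under stated assumptions: `purge_rotated`, its arguments, and the 7-day retention in the usage lines are hypothetical, and it defaults to a dry run that only lists what would be deleted.

```shell
#!/bin/bash
# purge_rotated DIR DAYS [apply]
# Lists rotated files (*.log.*, *.out.*, *.audit.*) directly under DIR whose
# modification time is more than DAYS days old. Files are deleted only when
# the third argument is the literal word "apply"; otherwise it is a dry run.
# (purge_rotated and its arguments are hypothetical, chosen for this sketch.)
purge_rotated() {
    local dir="$1" days="$2" mode="${3:-dry-run}"
    local action=(-print)
    if [ "$mode" = "apply" ]; then
        action=(-print -delete)
    fi
    find "$dir" -maxdepth 1 -type f \
        \( -name '*.log.*' -o -name '*.out.*' -o -name '*.audit.*' \) \
        -mtime +"$days" "${action[@]}"
}

# Usage:
#   purge_rotated /data/var/log/hadoop-hdfs 7         # list only
#   purge_rotated /data/var/log/hadoop-hdfs 7 apply   # list and delete
```

The active log files (for example a name ending in plain `.log`) do not match these patterns, so they are left alone, just as with the `rm` commands.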

1.2 Clean up the monitoring service data

rm -rf /data/var/lib/cloudera-host-monitor/ts/*/partition*/* 
rm -rf /data/var/lib/cloudera-service-monitor/ts/*/partition*/*

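1.3 Clean up the HDFS trash

# Check the size of the trash contents
hadoop fs -du -h -s /user/*/.Trash/*
# Empty the trash in two steps: step 1 moves the other accounts' trash contents into root's trash (a plain -rm run as root sends deleted files to /user/root/.Trash); step 2 clears root's trash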
hadoop fs -rm -r /user/*/.Trash/*
hadoop fs -rm -r /user/root/.Trash/Current
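
The two steps are needed because `hadoop fs -rm` without `-skipTrash` moves the deleted files into the calling user's own trash. A one-step alternative is to pass `-skipTrash`, sketched below. The function name `empty_all_trash` and the `HADOOP_CMD` override are hypothetical conveniences for this example, and the caller is assumed to have permission to delete from other users' directories.

```shell
#!/bin/bash
# empty_all_trash: permanently delete the contents of every user's HDFS trash
# in one step. -skipTrash bypasses the trash, so nothing lands in the caller's
# own .Trash; -f suppresses the error when the glob matches nothing.
# The glob is quoted so that HDFS, not the local shell, expands it.
# HADOOP_CMD is a hypothetical override used only to make dry runs easy.
empty_all_trash() {
    local hadoop_cmd="${HADOOP_CMD:-hadoop}"
    "$hadoop_cmd" fs -rm -r -f -skipTrash '/user/*/.Trash/*'
}

# Dry run (prints the arguments instead of calling hadoop):
#   HADOOP_CMD=echo empty_all_trash
```

Files removed with `-skipTrash` cannot be restored from the trash afterwards, so checking the sizes with `hadoop fs -du` first remains worthwhile.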

IV. Automation script

vim cleanLog.sh
#!/bin/bash

rm -rf /data/var/lib/cloudera-host-monitor/ts/*/partition*/* 
rm -rf /data/var/lib/cloudera-service-monitor/ts/*/partition*/*

rm -rf /data/var/log/cloudera-scm-eventserver/*.out.*
rm -rf /data/var/log/cloudera-scm-firehose/*.out.*
rm -rf /data/var/log/cloudera-scm-agent/*.log.*
rm -rf /data/var/log/cloudera-scm-agent/*.out.*
rm -rf /data/var/log/cloudera-scm-server/*.out.*
rm -rf /data/var/log/cloudera-scm-server/*.log.*
	   
rm -rf /data/var/log/hadoop-hdfs/*.out.*
rm -rf /data/var/log/hadoop-httpfs/*.out.*
rm -rf /data/var/log/hadoop-kms/*.out.*
rm -rf /data/var/log/hadoop-mapreduce/*.out.*
rm -rf /data/var/log/hadoop-yarn/*.out.*
rm -rf /data/var/log/hadoop-hdfs/*.audit.*
rm -rf /data/var/log/flume-ng/*.out.*
rm -rf /data/var/log/solr/*.out.*
rm -rf /data/var/log/solr/solr_gc.log.*
	   
rm -rf /data/var/log/zookeeper/*.log.*
rm -rf /data/var/log/impalad/*.log.*
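# Not logs: YARN NodeManager localized-file cache and Azkaban project directory
rm -rf /data/yarn/nm/usercache/*/filecache/*
rm -rf /data/azkaban/projects/*

Then schedule the script with cron:

crontab -e
# Run every Monday at 1:00 a.m.
00 01 * * 1 sh /root/cleanLog.sh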
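
Because the cron job runs unattended, it can help to record which files each run removed. One possible shape for the script is a pattern list driven by a small loop, sketched here; `clean_globs` is a hypothetical helper, and the log file path in the usage line is an assumption rather than something the original setup defines.

```shell
#!/bin/bash
# clean_globs ROOT PATTERN...: delete the regular files matching each PATTERN
# (a glob relative to ROOT) and print one "removed <path>" line per file.
# (clean_globs is a hypothetical helper name used only for this sketch.)
clean_globs() {
    local root="$1"
    shift
    local pattern file
    for pattern in "$@"; do
        # $pattern is unquoted on purpose so the shell expands the glob here.
        for file in "$root"/$pattern; do
            [ -f "$file" ] || continue   # skips unmatched globs and directories
            rm -f -- "$file" && echo "removed $file"
        done
    done
}

# Usage inside the cleanup script (log path is an assumption):
#   clean_globs /data/var/log 'zookeeper/*.log.*' 'impalad/*.log.*' \
#       >> /root/cleanLog.history 2>&1
```

Keeping the patterns as arguments means a new component can be added in one place, and the output doubles as an audit trail of what the weekly job deleted.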
