Hive temporary directory cleanup

When a Hive job fails, its temporary directories are not cleaned up automatically, so they accumulate on HDFS and you need your own script to clean them up.

In practice, Hive leaves temporary data in two locations:
/tmp/hive/{user}/*
/warehouse/tablespace/*/hive/**/.hive-staging_hive*

These are controlled by the configuration properties hive.exec.scratchdir and hive.exec.stagingdir, respectively:
[Screenshot: Hive configuration properties hive.exec.scratchdir and hive.exec.stagingdir]
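
If you are not sure which paths your cluster actually uses, a quick check is to print the effective values from a Hive client (this check is my suggestion, not part of the original steps; the defaults noted in the comments are the stock Hive defaults):

# Print the effective values of both properties from the Hive CLI.
# hive.exec.scratchdir defaults to /tmp/hive; hive.exec.stagingdir
# defaults to .hive-staging, created under the query's output directory.
hive -e "set hive.exec.scratchdir; set hive.exec.stagingdir;"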
Note that staging directories can appear at more than one directory level, e.g. xxx.db/.hive-staging_xxx and xxx.db/${table}/.hive-staging_xxx. To keep things simple, the script below only cleans two levels; if you find staging directories nested deeper, just add another call with one more path level. You can check which levels exist on your cluster with the read-only listing below.
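
To see which levels actually contain staging directories before deleting anything, list both depths first (read-only; the warehouse root matches the HDP-style layout used above):

# List staging directories one and two levels below each database directory.
# Nothing is deleted here.
hdfs dfs -ls -d "/warehouse/tablespace/*/hive/*/.hive-staging_*"
hdfs dfs -ls -d "/warehouse/tablespace/*/hive/*/*/.hive-staging_*"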

Create the cleanup script clean_hive_tmpfile.sh:

#!/bin/bash
usage="Usage: clean_hive_tmpfile.sh [days]"
if [ -z "$1" ]; then
        echo "$usage"
        exit 1
fi

now=$(date +%s)
days=$1

# Delete HDFS directories matching the pattern in $1 whose modification
# date is more than $days days in the past.
cleanTmpFile() {
        echo "clean path: $1"
        # -ls -d lists the matching directories themselves; keep only
        # directory entries (lines starting with "d").
        su hdfs -c "hdfs dfs -ls -d $1" | grep "^d" | while read -r f; do
                # Field 6 of the ls output is the modification date.
                dir_date=$(echo "$f" | awk '{print $6}')
                difference=$(( ($now - $(date -d "$dir_date" +%s)) / (24 * 60 * 60) ))

                if [ "$difference" -gt "$days" ]; then
                        echo "$f"
                        # Field 8 is the full path.
                        name=$(echo "$f" | awk '{print $8}')
                        echo "delete: $name"
                        su hdfs -c "hadoop fs -rm -r -skipTrash $name"
                fi
        done
}

# Quote the patterns so this script's shell passes them through;
# the HDFS client expands the glob against HDFS.
cleanTmpFile "/tmp/hive/*/*"
cleanTmpFile "/warehouse/tablespace/*/hive/*/.hive-staging_*"
cleanTmpFile "/warehouse/tablespace/*/hive/*/*/.hive-staging_*"

Add a crontab entry that runs at 2:00 AM every day and cleans up directories older than 30 days:

0 2 * * * sh /data/script/clean_hive_tmpfile.sh 30 >> /tmp/clean_hive_tmpfile.log
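
Install the line with crontab -e as a user allowed to su to hdfs (typically root), then verify the entry and watch the log; append 2>&1 to the entry if you also want errors captured in the log:

# Confirm the entry is installed and follow the cleanup log.
crontab -l
tail -f /tmp/clean_hive_tmpfile.log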

References:
https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.start.cleanup.scratchdir
https://www.cnblogs.com/ucarinc/p/11831280.html
https://blog.csdn.net/zhoudetiankong/article/details/51800887
http://t.zoukankan.com/telegram-p-10748530.html
