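Contents
0. Introduction
1. Backup method
2. Data restore method
2.1 Adding the backup files to HDFS
2.2 Importing the data into the HBase cluster
2.3 Data import script
3. Reference code
4. Summary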
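HBase plays a central role in big-data processing, and some companies use it as the ingestion layer for raw data, which makes backing up HBase data essential. So how do we back it up? This article uses a tool that ships with HBase, invoked as hbase org.apache.hadoop.hbase.mapreduce.Export, which dumps a table to files on the HDFS of the same cluster. In practice we also often face data migration, that is, copying data from one cluster to another. There are many migration approaches; this article covers the case where the source and target clusters sit in different environments with different IP ranges and cannot communicate with each other. For this requirement we still rely on the Export and Import tools provided by HBase, and the article gives a complete solution for the scenario.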
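(1) Scenario: copying data between clusters on different networks that cannot reach each other, for example from a data center in Hangzhou to a data center in Beijing. This is a cold backup for migrating historical data; writes that arrive while the backup is running are not considered.
(2) Basic principle:
Like CopyTable, the Export tool reads data through an HBase scan, and like CopyTable it uses the TableInputFormat class as its InputFormat. The getSplits() method of that class shows that the number of map tasks in the MapReduce job equals the number of regions of the HBase table.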
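Because Export reads through a scan, a backup can be limited to a time window by passing a version count followed by start and end timestamps in milliseconds (the start is inclusive and the end exclusive, as with a scan time range). A minimal sketch of assembling such a command from epoch seconds follows. The helper name build_export_cmd is hypothetical, the /bak output directory follows the scripts in this article, and the command is only printed, not run:

```shell
# Print (do not run) an Export command for a time-bounded backup.
# $1: table name, $2: start time in epoch seconds, $3: end time in epoch seconds
build_export_cmd() {
    local table="$1" start_s="$2" end_s="$3"
    if [ "$start_s" -ge "$end_s" ]; then
        echo "start time must be earlier than end time" >&2
        return 1
    fi
    # Export expects milliseconds; "1" is the number of versions per cell to export
    echo "hbase org.apache.hadoop.hbase.mapreduce.Export ${table} /bak/${table} 1 $(( start_s * 1000 )) $(( end_s * 1000 ))"
}
```

For example, build_export_cmd student 1597190400 1597276800 prints the command for a one-day window; the printed line can be reviewed and then executed.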
(3) Script implementation
1) Script description:
2) Script implementation
#!/bin/bash
# Requires the zip command; install it first with: yum install -y zip
###############################################################################################################################
# No arguments: full backup of every table in HBase
# 1 argument (table name): full backup of that table
# 2 arguments (start time, end time): incremental backup of every table for that time range
# 3 arguments (start time, end time, table name): incremental backup of that table for that time range
##################################################################################################################################
HBASE_HOME=/usr/idp/current/hbase-client
#version=1;
start_time=
end_time=
backup_table=
# getFromHbase "f" <table>                 full export
# getFromHbase "i" <start> <end> <table>   incremental export (times in epoch seconds)
function getFromHbase()
{
    if [ "$#" = "4" ];then
        backup_table="$4"
    elif [ "$#" = "2" ];then
        backup_table="$2"
    else
        echo Invalid Args!
        return 1
    fi
    case "$1" in
    "i")
        # Run the HDFS command as the hdfs user
        sudo -u hdfs hadoop fs -rm -r -f /bak/${backup_table};
        # Export 1 version per cell; Export expects the time range in milliseconds
        hbase org.apache.hadoop.hbase.mapreduce.Export ${backup_table} /bak/${backup_table} 1 $(( ${2} * 1000 )) $(( ${3} * 1000 ))
    ;;
    "f")
        # Run the HDFS command as the hdfs user
        sudo -u hdfs hadoop fs -rm -r -f /bak/${backup_table};
        hbase org.apache.hadoop.hbase.mapreduce.Export ${backup_table} /bak/${backup_table}
    ;;
    *)
        echo Invalid Args!
        return 1
    ;;
    esac
    # Pull the exported files to the local disk and pack them
    rm -rf /home/centos/opt/${backup_table}
    hadoop fs -get /bak/${backup_table} /home/centos/opt/${backup_table}
    cd /home/centos/opt
    # Relative path, so the archive unpacks to <table>/ instead of home/centos/opt/<table>/
    zip -r ${backup_table}.zip ${backup_table}
}
case "$#" in
"0")
    # Full backup of every table whose name starts with phm_default_
    for backup_table in $(echo "list" | $HBASE_HOME/bin/hbase shell | grep '^phm_default_')
    do
        getFromHbase "f" ${backup_table}
    done
;;
"1")
    if [[ $1 =~ ^phm_default_ ]];then
        backup_table=$1
        getFromHbase "f" ${backup_table}
    else
        echo Invalid Args!
        echo 'Usage: '$(basename $0)' backup_table'
    fi
;;
"2")
    if [ "$1" != "" ];then
        start_time=$(date -d "$1" +%s) # start timestamp of the incremental backup (seconds)
    fi;
    if [ "$2" != "" ];then
        end_time=$(date -d "$2" +%s) # end timestamp of the incremental backup (seconds)
    fi;
    for backup_table in $(echo "list" | $HBASE_HOME/bin/hbase shell | grep '^phm_default_')
    do
        getFromHbase "i" ${start_time} ${end_time} ${backup_table}
    done
;;
"3")
    if [ "$1" != "" ];then
        start_time=$(date -d "$1" +%s) # start timestamp of the incremental backup (seconds)
    fi;
    if [ "$2" != "" ];then
        end_time=$(date -d "$2" +%s) # end timestamp of the incremental backup (seconds)
    fi;
    if [ "$3" != "" ];then
        backup_table=$3 # table to back up
    fi;
    getFromHbase "i" ${start_time} ${end_time} ${backup_table}
;;
*)
    echo Invalid Args!
    echo 'Usage: '$(basename $0)' ""|backup_table|start_time end_time|start_time end_time backup_table'
;;
esac
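The loops in the script take their table list from the output of echo "list" | hbase shell, which mixes table names with a TABLE header, a row count and, in many versions, a final array-style summary line. A minimal sketch of isolating that filtering step so it can be checked without a cluster is shown here. The function name filter_backup_tables and the sample input in the usage note are hypothetical, and the exact shell output varies by HBase version:

```shell
# Read "hbase shell" list output on stdin and print only the table names
# that start with the given prefix, one per line, without duplicates.
# $1: table-name prefix (defaults to the phm_default_ prefix used in this article)
filter_backup_tables() {
    local prefix="${1:-phm_default_}"
    # Keep only lines that consist of nothing but a table name with the prefix
    grep -E "^${prefix}[A-Za-z0-9_.-]*$" | sort -u
}
```

It would be used as echo "list" | $HBASE_HOME/bin/hbase shell | filter_backup_tables. Anchoring the pattern at both ends keeps summary lines that merely contain a table name out of the result.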
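3) Initialization script
#!/bin/bash
# Create the local backup target directory /home/centos/opt
rm -rf /home/centos/opt
mkdir -p /home/centos/opt
chmod -R 777 /home/centos/opt
# Create the /bak backup directory in HDFS
su - hdfs << EOF
hadoop fs -rm -r -f /bak;
hadoop fs -mkdir -p /bak
hadoop fs -chmod -R 777 /bak;
exit;
EOF
Run this initialization before executing the backup script. Note that it deletes whatever is already in /home/centos/opt and in /bak.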
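4) Copy the backup data saved on the local disk to external storage with whatever tool is convenient, so that it can be used for a later restore.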
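Since the archives travel between sites that cannot talk to each other, it can help to record checksums before the copy and verify them after it. The following is an optional sketch that is not part of the original procedure; the function names and the SHA256SUMS file name are hypothetical, and it assumes the sha256sum utility is installed:

```shell
# Write a checksum manifest for every .zip archive in a directory.
# $1: directory holding the archives (e.g. /home/centos/opt)
write_manifest() {
    ( cd "$1" && sha256sum -- *.zip > SHA256SUMS )
}

# Verify the archives in a directory against its manifest.
# Returns non-zero if any archive is missing or has changed.
verify_manifest() {
    ( cd "$1" && sha256sum -c SHA256SUMS )
}
```

Run write_manifest on the source machine after the backup, copy SHA256SUMS along with the archives, and run verify_manifest on the target machine before uploading anything to HDFS.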
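Copy the directory produced above to the new machine and upload the files to HDFS with put:
hadoop fs -put localFile hdfsFile
The example below uploads /home/centos/opt/student from the local file system to the /bak directory in the new cluster's HDFS.
(1) Copy the data to a local directory on the new cluster.
Create /home/centos/opt, then copy the data over to the new cluster with a remote-transfer tool.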
rm -rf /home/centos/opt
mkdir -p /home/centos/opt
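(2) Create the /bak directory in HDFS on the new cluster
To avoid permission problems and similar trouble, do it as the hdfs user:
su - hdfs << EOF
hadoop fs -rm -r -f /bak;
hadoop fs -mkdir -p /bak;
hadoop fs -chmod -R 777 /bak;
exit;
EOF
(3) Upload the local files to HDFS with put
hadoop fs -put /home/centos/opt/student /bak
(4) List the files uploaded to HDFS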
[root@bigdata-1 opt]# hadoop fs -ls /bak
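(1) First create a table with the same structure as the original one, for example the student table:
create 'student','infor'
(2) Import the data with HBase's Import tool:
hbase org.apache.hadoop.hbase.mapreduce.Import student /bak/student
Note: the job may fail with the following error: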
2020-08-12 14:42:08,803 INFO [main] client.AHSProxy: Connecting to Application History server at bigdata-1.jiaxun.com/10.9.4.109:10200
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user/root/.staging":hdfs:hdfs:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1939)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1922)
at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:71)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4150)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1109)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:645)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3075)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:3043)
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1181)
at org.apache.hadoop.hdfs.DistributedFileSystem$25.doCall(DistributedFileSystem.java:1177)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:1177)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:1169)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:160)
at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:111)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:144)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at org.apache.hadoop.hbase.mapreduce.Import.main(Import.java:547)
The quick-and-dirty fix is to grant 777 permissions on the directory named in the error message:
su - hdfs
hadoop fs -chmod -R 777 /user
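Opening /user to everyone works, but it is broader than the error requires: the exception only says that user root cannot create /user/root/.staging. A narrower alternative, offered as a sketch rather than as the method used in this article, is to give the submitting user its own HDFS home directory. The helper name suggest_home_dir_fix is hypothetical; it only prints two commands derived from the user= field of the error line, so they can be reviewed before being run:

```shell
# Given the "Permission denied" line of an AccessControlException,
# print commands that create an HDFS home directory for the affected user.
# $1: the error line; returns 1 if no user= field can be found in it
suggest_home_dir_fix() {
    local user
    user=$(printf '%s\n' "$1" | sed -n 's/.*Permission denied: user=\([^,]*\),.*/\1/p')
    [ -n "$user" ] || return 1
    echo "sudo -u hdfs hadoop fs -mkdir -p /user/${user}"
    echo "sudo -u hdfs hadoop fs -chown ${user} /user/${user}"
}
```

For the error shown above it prints the mkdir and chown commands for /user/root. Once that directory exists and belongs to root, the job can create its .staging directory without any change to the permissions of /user itself.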
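(3) After the import finishes, check in the HBase shell (for example with scan 'student') that the data was imported successfully.
(4) Points to note when importing data
2.3 Data import script
The import is straightforward: load the backup data directly into the target cluster.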
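#!/bin/bash
HBASE_HOME=/usr/idp/current/hbase-client
# Upload one table's backup directory to HDFS and import it into HBase.
# Uses the global variable backup_table; the table must already exist in HBase.
function ImportToHBase()
{
    # Run the HDFS commands as the hdfs user
    sudo -u hdfs hadoop fs -rm -r -f /bak/${backup_table};
    sudo -u hdfs hadoop fs -put /home/centos/opt/${backup_table} /bak
    hbase org.apache.hadoop.hbase.mapreduce.Import ${backup_table} /bak/${backup_table}
}
if [ "$1" != "" ];then
    # One argument: import only that table
    backup_table=$1
    ImportToHBase ${backup_table}
else
    # No argument: import every phm_default_ table that exists in the target HBase
    for backup_table in $(echo "list" | $HBASE_HOME/bin/hbase shell | grep '^phm_default_')
    do
        ImportToHBase ${backup_table}
    done
fi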
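##### Batch table creation in HBase #####
#!/bin/bash
# $1: table-name prefix, $2: column family name (defaults to phm)
#table_head='iot_phm_'
if [ "$1" != "" ];then
table_head=$1
fi;
CF='phm'
if [ "$2" != "" ];then
CF=$2
fi;
HBASE_HOME=/usr/idp/current/hbase-client
$HBASE_HOME/bin/hbase shell << EOF
# create "${table_head}…", {NAME => "$CF",COMPRESSION => 'SNAPPY'}   (the name of this first table is missing in the source)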
create "${table_head}shock", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}environment", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}conresis", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}transtime", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}input", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}lightningswitch", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}tdcs", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}relay", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}stabvol", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}dcswitchmachpower", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}micromonit", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}interlock", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}turnrepres", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}block", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}zpwrail", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}stationjoin", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}circuitbreak", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}inputtransftemp", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}power25hz", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}accontact", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}signal", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}xbbox", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}lightunit", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}sigmachshock", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}sigmachenv", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}centdetectlightunit", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}tocircuit", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}ticircuit", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}tcenvironment", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}tcshock", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}swtctlcir", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}sigctlcir", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}2y2wrelay", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
create "${table_head}alarm", {NAME => "$CF",COMPRESSION => 'SNAPPY'}
exit
EOF
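The create statements differ only in the table suffix, so they can also be generated from a list. A minimal sketch, assuming the same prefix, column family and SNAPPY compression as the script in this section; the function name gen_create_statements is hypothetical and the function only prints the statements:

```shell
# Print one HBase shell "create" statement per table suffix.
# $1: table-name prefix, $2: column family, remaining arguments: table suffixes
gen_create_statements() {
    local prefix="$1" cf="$2" suffix
    shift 2
    for suffix in "$@"; do
        printf "create '%s%s', {NAME => '%s', COMPRESSION => 'SNAPPY'}\n" "$prefix" "$suffix" "$cf"
    done
}
```

The output can be reviewed first and then piped into the shell, for example gen_create_statements "$table_head" "$CF" shock environment alarm | $HBASE_HOME/bin/hbase shell.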
# Initialization
#!/bin/bash
# Create the local backup target directory /home/centos/opt
rm -rf /home/centos/opt
mkdir -p /home/centos/opt
chmod -R 777 /home/centos/opt
# Create the /bak backup directory in HDFS
su - hdfs << EOF
hadoop fs -rm -r -f /bak;
hadoop fs -mkdir -p /bak
hadoop fs -chmod -R 777 /bak;
exit;
EOF
=======================================
# Export code
#!/bin/bash
# Requires the zip command; install it first with: yum install -y zip
###############################################################################################################################
# No arguments: full backup of every table in HBase
# 1 argument (table name): full backup of that table
# 2 arguments (start time, end time): incremental backup of every table for that time range
# 3 arguments (start time, end time, table name): incremental backup of that table for that time range
##################################################################################################################################
HBASE_HOME=/usr/idp/current/hbase-client
#version=1;
start_time=
end_time=
backup_table=
# getFromHbase "f" <table>                 full export
# getFromHbase "i" <start> <end> <table>   incremental export (times in epoch seconds)
function getFromHbase()
{
    if [ "$#" = "4" ];then
        backup_table="$4"
    elif [ "$#" = "2" ];then
        backup_table="$2"
    else
        echo Invalid Args!
        return 1
    fi
    case "$1" in
    "i")
        # Run the HDFS command as the hdfs user
        sudo -u hdfs hadoop fs -rm -r -f /bak/${backup_table};
        # Export 1 version per cell; Export expects the time range in milliseconds
        hbase org.apache.hadoop.hbase.mapreduce.Export ${backup_table} /bak/${backup_table} 1 $(( ${2} * 1000 )) $(( ${3} * 1000 ))
    ;;
    "f")
        # Run the HDFS command as the hdfs user
        sudo -u hdfs hadoop fs -rm -r -f /bak/${backup_table};
        hbase org.apache.hadoop.hbase.mapreduce.Export ${backup_table} /bak/${backup_table}
    ;;
    *)
        echo Invalid Args!
        return 1
    ;;
    esac
    # Pull the exported files to the local disk and pack them
    rm -rf /home/centos/opt/${backup_table}
    hadoop fs -get /bak/${backup_table} /home/centos/opt/${backup_table}
    cd /home/centos/opt
    # Relative path, so the archive unpacks to <table>/ instead of home/centos/opt/<table>/
    zip -r ${backup_table}.zip ${backup_table}
}
case "$#" in
"0")
    # Full backup of every table whose name starts with phm_default_
    for backup_table in $(echo "list" | $HBASE_HOME/bin/hbase shell | grep '^phm_default_')
    do
        getFromHbase "f" ${backup_table}
    done
;;
"1")
    if [[ $1 =~ ^phm_default_ ]];then
        backup_table=$1
        getFromHbase "f" ${backup_table}
    else
        echo Invalid Args!
        echo 'Usage: '$(basename $0)' backup_table'
    fi
;;
"2")
    if [ "$1" != "" ];then
        start_time=$(date -d "$1" +%s) # start timestamp of the incremental backup (seconds)
    fi;
    if [ "$2" != "" ];then
        end_time=$(date -d "$2" +%s) # end timestamp of the incremental backup (seconds)
    fi;
    for backup_table in $(echo "list" | $HBASE_HOME/bin/hbase shell | grep '^phm_default_')
    do
        getFromHbase "i" ${start_time} ${end_time} ${backup_table}
    done
;;
"3")
    if [ "$1" != "" ];then
        start_time=$(date -d "$1" +%s) # start timestamp of the incremental backup (seconds)
    fi;
    if [ "$2" != "" ];then
        end_time=$(date -d "$2" +%s) # end timestamp of the incremental backup (seconds)
    fi;
    if [ "$3" != "" ];then
        backup_table=$3 # table to back up
    fi;
    getFromHbase "i" ${start_time} ${end_time} ${backup_table}
;;
*)
    echo Invalid Args!
    echo 'Usage: '$(basename $0)' ""|backup_table|start_time end_time|start_time end_time backup_table'
;;
esac
=======================================================
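# Import into HBase
#!/bin/bash
HBASE_HOME=/usr/idp/current/hbase-client
# Upload one table's backup directory to HDFS and import it into HBase.
# Uses the global variable backup_table; the table must already exist in HBase.
function ImportToHBase()
{
    # Run the HDFS commands as the hdfs user
    sudo -u hdfs hadoop fs -rm -r -f /bak/${backup_table};
    sudo -u hdfs hadoop fs -put /home/centos/opt/${backup_table} /bak
    hbase org.apache.hadoop.hbase.mapreduce.Import ${backup_table} /bak/${backup_table}
}
if [ "$1" != "" ];then
    # One argument: import only that table
    backup_table=$1
    ImportToHBase ${backup_table}
else
    # No argument: import every phm_default_ table that exists in the target HBase
    for backup_table in $(echo "list" | $HBASE_HOME/bin/hbase shell | grep '^phm_default_')
    do
        ImportToHBase ${backup_table}
    done
fi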
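This article described how to back up and migrate HBase data. Backups are taken with Export, a tool that ships with HBase; combined with shell scripts it provides both full and incremental backups at the HBase layer. When the backed-up data has to be moved between clusters on different networks, the Import tool loads it into the target cluster. The article gave a complete backup and migration solution, which has worked well in practice.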