CDH5.0.2升级至CDH5.2.0

升级需求
1.为支持spark kerberos安全机制
2.为满足impala trunc函数
3.为解决impala import时同时query导致impala hang问题
升级步骤
参考http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/installation_upgrade.html
优先升级cloudera manager,再升级cdh
1.准备工作:
统一集群root密码,需要运维帮忙操作下
agent自动重启关闭
事先下载好parcals包

2.CM升级
登录cmserver安装的主机,执行命令:
cat /etc/cloudera-scm-server/db.properties
备份CM数据:
  pg_dump -U scm -p 7432   > scm_server_db_backup.bak
  检查/tmp下是否有文件生成,期间保证tmp下文件不要被删除。
停止CM server :
  sudo service cloudera-scm-server stop
停止CM server依赖的数据库:
  sudo service cloudera-scm-server-db stop
如果这台CM server上有agent在运行也停止:
  sudo service cloudera-scm-agent stop
修改yum的 cloudera-manager.repo文件:
  sudo vim /etc/yum.repos.d/cloudera-manager.repo
   [cloudera-manager]
       # Packages for Cloudera Manager, Version 5, on RedHat or CentOS 6 x86_64
       name=Cloudera Manager
       baseurl=http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5/
       gpgkey = http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera
       gpgcheck = 1
安装:
  sudo
      yum clean all
      sudo yum upgrade 'cloudera-*'
检查:
  rpm  -qa 'cloudera-manager-*'
启动CM server 数据库:
  sudo  service cloudera-scm-server-db start
启动CM server:
  sudo  service cloudera-scm-server start
登录http://172.20.0.83:7180/
  安装agent(步骤略)
升级如果升级jdk,会改变java_home路径,导致java相关服务不可用,需要重新配置java_home
升级CM后需要重启CDH。
3.CDH升级
停止集群所有服务
备份namenode元数据:
  进入namenode dir,执行:
   tar -cvf /root/nn_backup_data.tar ./*
下载parcels
分发包->激活包->关闭(非重启)
开启zk服务
进入HDFS服务->升级hdfs metadata
  namenode上启动元数据
  启动剩余HDFS角色
  namenode响应RPC
  HDFS退出安全模式
备份hive metastore数据库
  mysqldump -h172.20.0.67 -ucdhhive -p111111 cdhhive > /tmp/database-backup.sql
进入hive服务->更新hive metastore
     database scheme
更新oozie sharelib:oozie->install
     oozie share lib
  创建 oozie user
      sharelib
  创建 oozie user
      Dir
更新sqoop:进入sqoop服务->update
     sqoop
  更新sqoop2 server
更新spark(略,可先卸载原来版本,升级后直接安装新版本)
启动集群所有服务:zk->hdfs->spark->flume->hbase->hive->impala->oozie->sqoop2->hue
分发客户端文件:deploy client
     configuration
  deploy hdfs client configuration
  deploy spark client configuration
  deploy hbase client configuration
  deploy yarn client configuration
  deploy hive client configuration
删除老版本包:
  sudo  yum remove bigtop-utils bigtop-jsvc bigtop-tomcat hue-common sqoop2-client
启动agent:
  sudo service cloudera-scm-agent restart
HDFS
  metadata update
  hdfs server->instance->namenode=>action->Finalize
      Metadata Upgrade

升级过程遇主要问题:
com.cloudera.server.cmf.FeatureUnavailableException: The feature Navigator Audit Server is not available.
        at com.cloudera.server.cmf.components.LicensedFeatureManager.check(LicensedFeatureManager.java:49)
        at com.cloudera.server.cmf.components.OperationsManagerImpl.setConfig(OperationsManagerImpl.java:1312)
        at com.cloudera.server.cmf.components.OperationsManagerImpl.setConfigUnsafe(OperationsManagerImpl.java:1352)
        at com.cloudera.api.dao.impl.ManagerDaoBase.updateConfigs(ManagerDaoBase.java:264)
        at com.cloudera.api.dao.impl.RoleConfigGroupManagerDaoImpl.updateConfigsHelper(RoleConfigGroupManagerDaoImpl.java:214)
        at com.cloudera.api.dao.impl.RoleConfigGroupManagerDaoImpl.updateRoleConfigGroup(RoleConfigGroupManagerDaoImpl.java:97)
        at com.cloudera.api.dao.impl.RoleConfigGroupManagerDaoImpl.updateRoleConfigGroup(RoleConfigGroupManagerDaoImpl.java:79)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at com.cloudera.api.dao.impl.ManagerDaoBase.invoke(ManagerDaoBase.java:208)
        at com.sun.proxy.$Proxy82.updateRoleConfigGroup(Unknown Source)
        at com.cloudera.api.v3.impl.RoleConfigGroupsResourceImpl.updateRoleConfigGroup(RoleConfigGroupsResourceImpl.java:69)
        at com.cloudera.api.v3.impl.MgmtServiceResourceV3Impl$RoleConfigGroupsResourceWrapper.updateRoleConfigGroup(MgmtServiceResourceV3Impl.java:54)
        at com.cloudera.cmf.service.upgrade.RemoveBetaFromRCG.upgrade(RemoveBetaFromRCG.java:80)
        at com.cloudera.cmf.service.upgrade.AbstractApiAutoUpgradeHandler.upgrade(AbstractApiAutoUpgradeHandler.java:36)
        at com.cloudera.cmf.service.upgrade.AutoUpgradeHandlerRegistry.performAutoUpgradesForOneVersion(AutoUpgradeHandlerRegistry.java:233)
        at com.cloudera.cmf.service.upgrade.AutoUpgradeHandlerRegistry.performAutoUpgrades(AutoUpgradeHandlerRegistry.java:167)
        at com.cloudera.cmf.service.upgrade.AutoUpgradeHandlerRegistry.performAutoUpgrades(AutoUpgradeHandlerRegistry.java:138)
        at com.cloudera.server.cmf.Main.run(Main.java:587)
        at com.cloudera.server.cmf.Main.main(Main.java:198)
2014-11-26 03:17:42,891 INFO ParcelUpdateService:com.cloudera.parcel.components.ParcelDownloade
原先版本使用了60天试用企业版本,该期限已经过期,升级时Navigator服务启动不了,导致整个cloduera manager server启动失败
升级后问题
a.升级后flume原先提供的第三方jar丢失,需要将包重新放在/opt....下
b.sqoop导入mysql的驱动包找不到,需要将包重新放在/opt....下
c.hbase服务异常
Unhandled exception. Starting shutdown.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User hbase/[email protected] (auth:KERBEROS) is not authorized for protocol interface org.apache.hadoop.hdfs.protocol.ClientProtocol, expected client Kerberos principal is null
at org.apache.hadoop.ipc.Client.call(Client.java:1409)
at org.apache.hadoop.ipc.Client.call(Client.java:1362)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy15.setSafeMode(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy15.setSafeMode(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setSafeMode(ClientNamenodeProtocolTranslatorPB.java:594)
at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:2224)
at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:993)
at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:977)
at org.apache.hadoop.hbase.util.FSUtils.isInSafeMode(FSUtils.java:432)
at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:851)
at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:435)
at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:146)
at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:127)
at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:789)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:606)
at java.lang.Thread.run(Thread.java:744)
通过cm将safe配置文件里的hbase.rpc.engine org.apache.hadoop.hbase.ipc.SecureRpcEngine去掉后重启成功。
后来发现是cm server的问题,之前修改了一个hostname,cloudera manager server未重启,重启后,加入该配置重启hbase不会有问题。
d.service monitor,zookeeper也有警告,其他服务都有部分红色异常
Exception in scheduled runnable.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at com.cloudera.cmon.firehose.polling.CdhTask.checkClientConfigs(CdhTask.java:712)
at com.cloudera.cmon.firehose.polling.CdhTask.updateCacheIfNeeded(CdhTask.java:675)
at com.cloudera.cmon.firehose.polling.FirehoseServicesPoller.getDescriptorAndHandleChanges(FirehoseServicesPoller.java:615)
at com.cloudera.cmon.firehose.polling.FirehoseServicesPoller.run(FirehoseServicesPoller.java:179)
at com.cloudera.enterprise.PeriodicEnterpriseService$UnexceptionablePeriodicRunnable.run(PeriodicEnterpriseService.java:67)
at java.lang.Thread.run(Thread.java:745)
后来发现是cm server的问题,之前修改了一个hostname,cloudera manager server未重启,重启后,加入该配置重启hbase不会有问题。
e.mapreduce访问安全机制下的hbase失败
去除client hbase-site safe配置文件内容:hbase.rpc.protection privacy,旧版本中必须加此配置,而新版本文档中也提到需要加此配置,但经过测试加此配置后报如上异常。
14/11/27 12:38:26 INFO zookeeper.ClientCnxn: Socket connection established to ip-10-1-33-24.ec2.internal/10.1.33.24:2181, initiating session
14/11/27 12:38:26 INFO zookeeper.ClientCnxn: Session establishment complete on server ip-10-1-33-24.ec2.internal/10.1.33.24:2181, sessionid = 0x549ef6088f20309, negotiated timeout = 60000
14/11/27 12:38:41 WARN ipc.RpcClient: Couldn't setup connection for hbase/[email protected] to hbase/[email protected]
14/11/27 12:38:55 WARN ipc.RpcClient: Couldn't setup connection for hbase/[email protected] to hbase/[email protected]
14/11/27 12:39:15 WARN ipc.RpcClient: Couldn't setup connection for hbase/[email protected] to hbase/[email protected]
14/11/27 12:39:34 WARN ipc.RpcClient: Couldn't setup connection for hbase/[email protected] to hbase/[email protected]
14/11/27 12:39:55 WARN ipc.RpcClient: Couldn't setup connection for hbase/[email protected] to hbase/[email protected]
14/11/27 12:40:19 WARN ipc.RpcClient: Couldn't setup connection for hbase/[email protected] to hbase/[email protected]
14/11/27 12:40:36 WARN ipc.RpcClient: Couldn't setup connection for hbase/[email protected] to hbase/[email protected]


Caused by: java.io.IOException: Couldn't setup connection for hbase/[email protected] to hbase/[email protected]
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$1.run(RpcClient.java:821)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.handleSaslConnectionFailure(RpcClient.java:796)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:898)
at org.apache.hadoop.hbase.ipc.RpcClient.getConnection(RpcClient.java:1543)
at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1442)
at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1661)
at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1719)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.execService(ClientProtos.java:30014)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.execService(ProtobufUtil.java:1623)
at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:93)
at org.apache.hadoop.hbase.ipc.RegionCoprocessorRpcChannel$1.call(RegionCoprocessorRpcChannel.java:90)
at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:114)
... 31 more
Caused by: javax.security.sasl.SaslException: No common protection layer between client and server
at com.sun.security.sasl.gsskerb.GssKrb5Client.doFinalHandshake(GssKrb5Client.java:252)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:187)
at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:210)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupSaslConnection(RpcClient.java:770)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.access$600(RpcClient.java:357)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:891)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$2.run(RpcClient.java:888)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.setupIOstreams(RpcClient.java:888)
... 40 more
<property>
     <name>hbase.rpc.engine</name>
     <value>org.apache.hadoop.hbase.ipc.SecureRpcEngine</value>
</property>
mr中使用http://www.cloudera.com/content/cloudera/en/documentation/cdh5/v5-0-0/CDH5-Installation-Guide/cdh5ig_mapreduce_hbase.html  TableMapReduceUtil.addDependencyJars(job);方式加载。
并且使用user api加入例如:
hbase.master.kerberos.principal=hbase/[email protected]
hbase.keytab.path=/home/dev/1015q.keytab
f.升级后impala jdbc安全机制下不可用
java.sql.SQLException: Could not open connection to jdbc:hive2://ip-10-1-33-22.ec2.internal:21050/ym_system;principal=impala/[email protected]: GSS initiate failed
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:187)
at org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:164)
at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105)
at java.sql.DriverManager.getConnection(DriverManager.java:571)
at java.sql.DriverManager.getConnection(DriverManager.java:233)
at com.cloudera.example.ClouderaImpalaJdbcExample.main(ClouderaImpalaJdbcExample.java:37)
Caused by: org.apache.thrift.transport.TTransportException: GSS initiate failed
at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:221)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:297)
at org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
at org.apache.hive.jdbc.HiveConnection.openTransport(HiveConnection.java:185)
... 5 more
解决:
hadoop-auth-2.5.0-cdh5.2.0.jar
hive-shims-common-secure-0.13.1-cdh5.2.0.jar
两个包回退版本即可

你可能感兴趣的:(upgrade,Cloudera)