I recently upgraded CDH from 5.4.8 to 5.7.0, mainly to get newer versions of Spark and HBase.
For details, see What’s New In CDH 5.7.x. The package versions of the main components in CDH 5.7.0:
Component | Package Version |
---|---|
Apache Hadoop | hadoop-2.6.0+cdh5.7.0+1280 |
HBase | hbase-1.2.0+cdh5.7.0+129 |
Apache Hive | hive-1.1.0+cdh5.7.0+522 |
Hue | hue-3.9.0+cdh5.7.0+1759 |
Apache Impala | impala-2.5.0+cdh5.7.0+0 |
Apache Oozie | oozie-4.1.0+cdh5.7.0+267 |
Apache Sentry | sentry-1.5.1+cdh5.7.0+184 |
Apache Spark | spark-1.6.0+cdh5.7.0+180 |
Apache Sqoop | sqoop-1.4.6+cdh5.7.0+56 |
Apache Sqoop2 | sqoop2-1.99.5+cdh5.7.0+38 |
Zookeeper | zookeeper-3.4.5+cdh5.7.0+94 |
For the versions and download locations of other services, see CDH 5.7.x Packaging and Tarball Information.
Cloudera Manager stores its data in PostgreSQL; the connection details are in `/etc/cloudera-scm-server/db.properties`. Back up the database with:
``` bash
pg_dump -p 7432 -U scm > /data/backup/scm_server_db_backup.$(date +%Y%m%d)
```
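Rather than hard-coding the port and user, the command can be derived from `db.properties`. A minimal sketch, assuming the file uses the keys `com.cloudera.cmf.db.host` (as `host:port`), `com.cloudera.cmf.db.name` and `com.cloudera.cmf.db.user`; the helper names `read_prop` and `cm_db_backup_cmd` are hypothetical. It only prints the command so it can be reviewed before running.

```bash
# read_prop FILE KEY: print the value of the first line that is exactly KEY=VALUE
read_prop() {
    awk -F= -v k="$2" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$1"
}

# cm_db_backup_cmd PROPS_FILE BACKUP_DIR: print (not run) the pg_dump command.
# Key names are assumed from a typical Cloudera Manager 5 db.properties.
cm_db_backup_cmd() {
    hostport=$(read_prop "$1" com.cloudera.cmf.db.host)
    db_name=$(read_prop "$1" com.cloudera.cmf.db.name)
    db_user=$(read_prop "$1" com.cloudera.cmf.db.user)
    case $hostport in
        *:*) db_host=${hostport%%:*}; db_port=${hostport##*:} ;;
        *)   db_host=$hostport; db_port=7432 ;;  # assumed embedded-DB port
    esac
    echo "pg_dump -h $db_host -p $db_port -U $db_user $db_name > $2/scm_server_db_backup.$(date +%Y%m%d)"
}
```

For example, `cm_db_backup_cmd /etc/cloudera-scm-server/db.properties /data/backup | sh` runs the printed command; pg_dump then asks for the database password, which is also kept in `db.properties`.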
The CDH services themselves use MySQL; the databases involved are hive, hue, sentry, oozie (named `oozie_oozie_server` here) and sqoop:
``` bash
mysqldump -h vlnx107010 -uroot -p hive > /data/backup/hive-backup.$(date +%Y%m%d).sql
mysqldump -h vlnx107010 -uroot -p hue > /data/backup/hue-backup.$(date +%Y%m%d).sql
mysqldump -h vlnx107010 -uroot -p sentry > /data/backup/sentry-backup.$(date +%Y%m%d).sql
mysqldump -h vlnx107010 -uroot -p oozie_oozie_server > /data/backup/oozie-backup.$(date +%Y%m%d).sql
mysqldump -h vlnx107010 -uroot -p sqoop > /data/backup/sqoop-backup.$(date +%Y%m%d).sql
```
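The five dumps differ only in the database name, so a loop keeps them consistent. A sketch; `backup_all` and `run_str` are hypothetical helpers, `MYSQL_HOST` and `BACKUP_DIR` default to the values used above, and `DRY_RUN=1` prints the commands instead of running them.

```bash
MYSQL_HOST=${MYSQL_HOST:-vlnx107010}
BACKUP_DIR=${BACKUP_DIR:-/data/backup}

# Print the command when DRY_RUN=1, otherwise execute it (eval keeps the ">" redirect).
run_str() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$1"; else eval "$1"; fi
}

# Each entry is "<file prefix>:<database name>".
backup_all() {
    stamp=$(date +%Y%m%d)
    for pair in hive:hive hue:hue sentry:sentry oozie:oozie_oozie_server sqoop:sqoop; do
        run_str "mysqldump -h $MYSQL_HOST -uroot -p ${pair#*:} > $BACKUP_DIR/${pair%%:*}-backup.$stamp.sql"
    done
}
```

Because of the bare `-p`, each mysqldump prompts for the root password in turn.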
Cloudera Manager itself is upgraded with packages.

1. Stop the Cloudera Manager Server service:
``` bash
$ sudo service cloudera-scm-server stop
```
2. If you use the embedded PostgreSQL database, stop it as well:
``` bash
$ sudo service cloudera-scm-server-db stop
```
==Important:== If you are not running the embedded database service and you attempt to stop it, you receive a message indicating that the service cannot be found. If instead you get a message that the shutdown failed, the embedded database is still running, probably because services are connected to the Hive metastore. If the database shutdown fails due to connected services, issue the following commands:

RHEL-compatible 7 and higher:
``` bash
$ sudo service cloudera-scm-server-db next_stop_fast
$ sudo service cloudera-scm-server-db stop
```
All other Linux distributions:
``` bash
$ sudo service cloudera-scm-server-db fast_stop
```
3. Stop the Cloudera Manager Agent service:
``` bash
$ sudo service cloudera-scm-agent stop
```
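The stop sequence, including the fast-stop fallback from the note, can be wrapped in functions. A sketch; `stop_cm_db`, `stop_cloudera_manager` and `is_rhel7_or_later` are hypothetical names, `SERVICE` defaults to `sudo service` (overridable, e.g. for testing), `USE_EMBEDDED_DB=0` skips the database step, and the RHEL version is read from `/etc/redhat-release` (an assumption about that file's format).

```bash
SERVICE=${SERVICE:-sudo service}

# True on RHEL-compatible 7+, judged from a line like "... release 7.2 ...".
is_rhel7_or_later() {
    [ -r /etc/redhat-release ] || return 1
    major=$(sed -n 's/.*release \([0-9][0-9]*\).*/\1/p' /etc/redhat-release)
    [ -n "$major" ] && [ "$major" -ge 7 ]
}

stop_cm_db() {
    [ "${USE_EMBEDDED_DB:-1}" = 1 ] || return 0
    $SERVICE cloudera-scm-server-db stop && return 0
    echo "embedded DB did not stop; retrying with fast stop" >&2
    if is_rhel7_or_later; then
        $SERVICE cloudera-scm-server-db next_stop_fast &&
            $SERVICE cloudera-scm-server-db stop
    else
        $SERVICE cloudera-scm-server-db fast_stop
    fi
}

# On the server host: stop the server, then the embedded DB, then the local Agent.
stop_cloudera_manager() {
    $SERVICE cloudera-scm-server stop
    stop_cm_db
    $SERVICE cloudera-scm-agent stop
}
```

On hosts that only run an Agent, `sudo service cloudera-scm-agent stop` on its own is enough.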
If your network connection is fast, you can simply create a new `cloudera-manager.repo` in `/etc/yum.repos.d/` that points at the Cloudera archive:
```
[cloudera-manager]
# Packages for Cloudera Manager, Version 5, on RHEL or CentOS 6 x86_64
name = Cloudera Manager
baseurl = https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5/
gpgkey = https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera
gpgcheck = 1
```
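Since the repo file may be written more than once during the upgrade (archive first, a local mirror later), a small helper can generate either variant. A sketch; `write_cm_repo` is a hypothetical name, it needs root to write `/etc/yum.repos.d`, it keeps the previous file as `cloudera-manager.repo.bak`, and it only emits the `gpgkey` line when gpgcheck is enabled.

```bash
# write_cm_repo BASEURL [GPGCHECK=1] [REPO_DIR=/etc/yum.repos.d]
write_cm_repo() {
    repo_file=${3:-/etc/yum.repos.d}/cloudera-manager.repo
    gpgcheck=${2:-1}
    # keep the previous definition so it can be restored
    if [ -f "$repo_file" ]; then cp "$repo_file" "$repo_file.bak"; fi
    {
        echo "[cloudera-manager]"
        echo "name = Cloudera Manager"
        echo "baseurl = $1"
        if [ "$gpgcheck" = 1 ]; then
            echo "gpgkey = https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/RPM-GPG-KEY-cloudera"
        fi
        echo "gpgcheck = $gpgcheck"
    } > "$repo_file"
}
```

For example, `write_cm_repo https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5/` writes the archive variant, and `write_cm_repo ftp://<mirror-ip>/pub/cloudera-repo 0` writes a local-mirror variant with `gpgcheck = 0` (`<mirror-ip>` is a placeholder).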
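If access to the Cloudera repository is unreliable, set up a local repo instead.

Install vsftpd: it serves as the FTP server, with its configuration under `/etc/vsftpd/` and the FTP root at `/var/ftp/`.
``` bash
$ yum install vsftpd
$ service vsftpd start
```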
Download the RPMs and repodata: create `/var/ftp/pub/cloudera-repo` as the repo directory, then download the RPM packages you need and the `repodata` directory from https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5/. The result looks like this:
``` bash
$ tree /var/ftp/pub/cloudera-repo/
/var/ftp/pub/cloudera-repo/
├── repodata
│   ├── filelists.xml.gz
│   ├── filelists.xml.gz.asc
│   ├── other.xml.gz
│   ├── other.xml.gz.asc
│   ├── primary.xml.gz
│   ├── primary.xml.gz.asc
│   ├── repomd.xml
│   └── repomd.xml.asc
└── RPMS
    └── x86_64
        ├── cloudera-manager-agent-5.7.0-1.cm570.p0.76.el6.x86_64.rpm
        ├── cloudera-manager-daemons-5.7.0-1.cm570.p0.76.el6.x86_64.rpm
        ├── cloudera-manager-server-5.7.0-1.cm570.p0.76.el6.x86_64.rpm
        └── cloudera-manager-server-db-2-5.7.0-1.cm570.p0.76.el6.x86_64.rpm
```
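Before pointing yum at the mirror, it is worth checking that nothing is missing. A sketch; `check_local_repo` is hypothetical and only checks for `repodata/repomd.xml` plus one RPM per package name from the listing above, not that the repodata actually matches the RPMs.

```bash
# check_local_repo [REPO_DIR]: print what is missing, return non-zero if anything is
check_local_repo() {
    repo=${1:-/var/ftp/pub/cloudera-repo}
    status=0
    if [ ! -f "$repo/repodata/repomd.xml" ]; then
        echo "missing $repo/repodata/repomd.xml"
        status=1
    fi
    for pkg in cloudera-manager-agent cloudera-manager-daemons \
               cloudera-manager-server cloudera-manager-server-db-2; do
        # "<name>-<digit>..." so that cloudera-manager-server does not match server-db-2
        set -- "$repo/RPMS/x86_64/$pkg"-[0-9]*.rpm
        if [ ! -e "$1" ]; then
            echo "missing RPM for $pkg"
            status=1
        fi
    done
    return $status
}
```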
Create a local `cloudera-manager.repo` like this, where `${local-repo-ip}` is the address of the FTP server:
```
[cloudera-manager]
name = Cloudera Manager, Version 5.7.0
baseurl = ftp://${local-repo-ip}/pub/cloudera-repo
gpgcheck = 0
```
When the Agents are upgraded later, choose Custom Repository and enter `ftp://${local-repo-ip}/pub/cloudera-repo` as the repository URL.
Next, clear the yum cache and upgrade the Cloudera Manager packages:
``` bash
$ sudo yum clean all
$ sudo yum upgrade cloudera-manager-server cloudera-manager-daemons cloudera-manager-server-db-2 cloudera-manager-agent
```
Then start the embedded database (if you use it) and the Cloudera Manager Server:
``` bash
$ sudo service cloudera-scm-server-db start
$ sudo service cloudera-scm-server start
```
==Important:== All hosts in the cluster must have access to the Internet if you plan to use archive.cloudera.com as the source for installation files. If you do not have Internet access, create a custom repository.
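Upgrade the Agents with either of the following two methods.

Method 1, the upgrade wizard: after you log in to Cloudera Manager, an upgrade page appears automatically. Select Yes, I would like to upgrade the Cloudera Manager Agent packages now and follow the steps. When selecting the Cloudera Manager Agent Release, there are two options:

- Matched Release for this Cloudera Manager Server, which updates directly from https://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5/.
- Custom Repository, where you enter the URL of the local repo.

Either way, a `cloudera-manager.repo` is written to `/etc/yum.repos.d/` on every host; only its `baseurl` differs.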
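Method 2, manually on each host:

1. Update `cloudera-manager.repo` and clear the yum cache with `yum clean all`.
2. Stop the Agent: `service cloudera-scm-agent stop`.
3. Upgrade the packages: `yum upgrade cloudera-manager-server cloudera-manager-daemons cloudera-manager-server-db-2 cloudera-manager-agent`.
4. Start the Agent: `service cloudera-scm-agent start`.
5. On the Hosts page, click Inspect All Hosts. When the inspection finishes, the results give a fairly detailed view of each host.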
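With many hosts, method 2 can be driven over SSH. A sketch under several assumptions: key-based SSH and sudo rights on every host, the new `cloudera-manager.repo` already in place there, and `HOSTS` holding placeholder host names. Unlike the command above, it upgrades only the Agent-side packages (`cloudera-manager-daemons`, `cloudera-manager-agent`), assuming these hosts do not run the server. `-y` replaces yum's interactive prompt, and `ssh -t` allocates a TTY because sudo on RHEL 6 may be configured with `requiretty`. `upgrade_agent_on` and `upgrade_all_agents` are hypothetical names.

```bash
HOSTS=${HOSTS:-"host1 host2 host3"}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

upgrade_agent_on() {
    run ssh -t "$1" "sudo yum clean all && sudo service cloudera-scm-agent stop && sudo yum -y upgrade cloudera-manager-daemons cloudera-manager-agent && sudo service cloudera-scm-agent start"
}

# Upgrade hosts one by one, keep going on failure, report failures at the end.
upgrade_all_agents() {
    failed=""
    for h in $HOSTS; do
        upgrade_agent_on "$h" || failed="$failed $h"
    done
    if [ -n "$failed" ]; then
        echo "Agent upgrade failed on:$failed" >&2
        return 1
    fi
}
```

Running `DRY_RUN=1 HOSTS="..." upgrade_all_agents` first shows exactly what would be executed on each host; afterwards, Inspect All Hosts still applies.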
With Cloudera Manager upgraded, the more important part is upgrading the CDH services it manages; here this is done with parcels. First build a temporary parcel repository under `/data/cloudera-parcel-server`.

In its `cdh/` subdirectory, download the CDH parcel and manifest.json:
``` bash
$ wget http://archive.cloudera.com/cdh5/parcels/5/CDH-5.7.0-1.cdh5.7.0.p0.45-el6.parcel
$ wget http://archive.cloudera.com/cdh5/parcels/5/manifest.json
```
In `kafka/`, download the Kafka parcel and manifest.json:
``` bash
$ wget http://archive.cloudera.com/kafka/parcels/2/KAFKA-2.0.1-1.2.0.1.p0.5-el6.parcel
$ wget http://archive.cloudera.com/kafka/parcels/2/manifest.json
```
In `gplextras/`, download the GPL Extras parcel and manifest.json:
``` bash
$ wget http://archive.cloudera.com/gplextras5/parcels/5/GPLEXTRAS-5.6.1-1.cdh5.6.1.p0.5-el6.parcel
$ wget http://archive.cloudera.com/gplextras5/parcels/5/manifest.json
```
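The three downloads can be scripted so each parcel and its manifest.json land in their own subdirectory. A sketch; `fetch_parcel` and `fetch_all_parcels` are hypothetical, the URLs are the ones above, `wget -c` resumes an interrupted parcel download, and `DRY_RUN=1` only prints. Note that the `.../parcels/5/` style directories track the newest release of each line, so they may no longer carry these exact files, and the manifest.json you serve must describe the parcel you serve; a version-pinned directory is safer if your mirror has one.

```bash
PARCEL_ROOT=${PARCEL_ROOT:-/data/cloudera-parcel-server}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# fetch_parcel SUBDIR BASE_URL PARCEL_FILE
fetch_parcel() {
    run mkdir -p "$PARCEL_ROOT/$1" &&
    run wget -c -P "$PARCEL_ROOT/$1" "$2/$3" &&
    run wget -O "$PARCEL_ROOT/$1/manifest.json" "$2/manifest.json"
}

fetch_all_parcels() {
    fetch_parcel cdh http://archive.cloudera.com/cdh5/parcels/5 \
        CDH-5.7.0-1.cdh5.7.0.p0.45-el6.parcel &&
    fetch_parcel kafka http://archive.cloudera.com/kafka/parcels/2 \
        KAFKA-2.0.1-1.2.0.1.p0.5-el6.parcel &&
    fetch_parcel gplextras http://archive.cloudera.com/gplextras5/parcels/5 \
        GPLEXTRAS-5.6.1-1.cdh5.6.1.p0.5-el6.parcel
}
```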
The resulting layout:
``` bash
$ tree /data/cloudera-parcel-server/
/data/cloudera-parcel-server/
├── cdh
│   ├── CDH-5.7.0-1.cdh5.7.0.p0.45-el6.parcel
│   └── manifest.json
├── gplextras
│   ├── GPLEXTRAS-5.6.1-1.cdh5.6.1.p0.5-el6.parcel
│   └── manifest.json
└── kafka
    ├── KAFKA-2.0.1-1.2.0.1.p0.5-el6.parcel
    └── manifest.json

3 directories, 6 files
```
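A multi-gigabyte parcel is worth checking before it is served. A sketch, assuming a `<parcel>.sha1` file (hex digest on the first line) was downloaded next to each parcel; check that the repository you mirror actually publishes such files. `verify_parcel` and `verify_all_parcels` are hypothetical names.

```bash
# verify_parcel PARCEL: compare sha1sum output with PARCEL.sha1
verify_parcel() {
    expected=$(head -n 1 "$1.sha1" 2>/dev/null | cut -d' ' -f1)
    actual=$(sha1sum "$1" | cut -d' ' -f1)
    if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
        echo "OK   $1"
    else
        echo "FAIL $1 (expected '${expected:-<no .sha1 file>}', got $actual)"
        return 1
    fi
}

# verify_all_parcels [ROOT]: check every ROOT/*/*.parcel
verify_all_parcels() {
    status=0
    for p in "${1:-/data/cloudera-parcel-server}"/*/*.parcel; do
        [ -e "$p" ] || continue
        verify_parcel "$p" || status=1
    done
    return $status
}
```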
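Start an HTTP server from `/data/cloudera-parcel-server` (SimpleHTTPServer serves the current directory):
``` bash
$ cd /data/cloudera-parcel-server
$ python -m SimpleHTTPServer 8080
```
Open `http://{http-server-ip}:8080` in a browser to confirm the server is running. Then add the parcel URLs to Remote Parcel Repository URLs in the parcel settings:

- `http://{http-server-ip}:8080/cdh/`
- `http://{http-server-ip}:8080/kafka/`
- `http://{http-server-ip}:8080/gplextras/`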
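Each URL can also be checked from the Cloudera Manager host before it is added. A sketch; `check_parcel_repo` is hypothetical and treats a readable `manifest.json` as the sign that a parcel directory is served. SimpleHTTPServer is a Python 2 module; with Python 3 the equivalent is `python3 -m http.server 8080`.

```bash
# check_parcel_repo BASE_URL: BASE_URL is e.g. http://{http-server-ip}:8080
check_parcel_repo() {
    rc=0
    for sub in cdh kafka gplextras; do
        if curl -sf -o /dev/null "$1/$sub/manifest.json"; then
            echo "OK   $1/$sub/"
        else
            echo "FAIL $1/$sub/manifest.json not reachable"
            rc=1
        fi
    done
    return $rc
}
```

`check_parcel_repo http://{http-server-ip}:8080` prints one line per directory and returns non-zero if any manifest cannot be fetched.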
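The procedure follows Upgrading to CDH 5.7 Using Parcels:

1. In `Hosts > Parcels`, configure the temporary remote parcel repository from the previous step.
2. When prompted, choose Let me upgrade the cluster, and restart services manually so they stay available.
3. Run `Actions > Install Oozie ShareLib`.
4. Run `Actions > Upgrade Sqoop`.
5. Run `Actions > Install Spark JAR` and `Actions > Create Spark History Log Dir`.
6. In each service's Instances tab, restart the instances one at a time, which avoids service downtime.
7. Run Deploy Client Configuration.

Alternatively, Distribute and Activate the parcel manually on the `Hosts > Parcels` page, then carry out the remaining steps above, from Install Oozie ShareLib through Deploy Client Configuration.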
Kafka: on the `Hosts > Parcels` page, manually Distribute and Activate the Kafka parcel, then restart the Kafka Brokers one at a time. Once everything checks out, the old parcel can be removed.
GPL Extras: see Configuring Services to Use the GPL Extras Parcel.

1. On the `Hosts > Parcels` page, manually Distribute and Activate the GPL Extras parcel.
2. In the HDFS configuration, find Compression Codecs, add `com.hadoop.compression.lzo.LzoCodec` and `com.hadoop.compression.lzo.LzopCodec`, and restart HDFS.
3. On the Oozie Server host, run `ln -sf /opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/hadoop-lzo.jar /var/lib/oozie/hadoop-lzo.jar`, then restart Oozie.
4. In the Sqoop 2 configuration, find Sqoop Service Environment Advanced Configuration Snippet, add `HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/*` and `JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/GPLEXTRAS/lib/hadoop/lib/native`, then restart Sqoop 2.
5. Run Deploy Client Configuration.
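A quick end-to-end check that LZO works after these steps: compress a small file with lzop, put it into HDFS and read it back with `hadoop fs -text`, which chooses the decompression codec from `io.compression.codecs` by file extension, so printing the original line indicates LzopCodec is registered on the client. A sketch, assuming the `lzop` tool is installed on the host and that `WORK_DIR` (a hypothetical HDFS path) is writable by the current user; `lzo_smoke_test` is a hypothetical name.

```bash
WORK_DIR=${WORK_DIR:-/tmp/lzo-check}

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

lzo_smoke_test() {
    run sh -c 'echo hello-lzo > /tmp/lzo-check.txt && lzop -f /tmp/lzo-check.txt' &&
    run hadoop fs -mkdir -p "$WORK_DIR" &&
    run hadoop fs -put -f /tmp/lzo-check.txt.lzo "$WORK_DIR/" &&
    run hadoop fs -text "$WORK_DIR/lzo-check.txt.lzo"
}
```

If the codec is not picked up, the last command prints compressed bytes or fails instead of the original text.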
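After the upgrade, the client commands on the hosts still referenced the old version, so the CDH-related entries under `/etc/alternatives/` had to be updated to the new version manually. Take `spark-shell` as an example:

1. `which spark-shell` shows `/usr/bin/spark-shell`.
2. `ls -l /usr/bin/spark-shell` shows a symlink to `/etc/alternatives/spark-shell`.
3. `ls -l /etc/alternatives/spark-shell` shows a symlink to `/opt/cloudera/parcels/CDH-5.4.8-1.cdh5.4.8.p0.4/bin/spark-shell`, clearly still the pre-upgrade version.
4. `ln -sf /opt/cloudera/parcels/CDH/bin/spark-shell /etc/alternatives/spark-shell` points it at the current version.

A shell script can repoint all CDH-related links under `/etc/alternatives` to the current version in one pass.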
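A sketch of such a script, assuming every stale link targets a path of the form `/opt/cloudera/parcels/CDH-<version>/...`. It rewrites each one to the same path under the version-independent `/opt/cloudera/parcels/CDH` link, so later upgrades are picked up automatically. Like the manual `ln -sf` above, it changes the links directly and leaves the alternatives tool's own records untouched. `relink_cdh_alternatives` is a hypothetical name; run it as root, and use `DRY_RUN=1` to preview.

```bash
# relink_cdh_alternatives [ALT_DIR=/etc/alternatives] [PARCELS=/opt/cloudera/parcels]
relink_cdh_alternatives() {
    alt_dir=${1:-/etc/alternatives}
    parcels=${2:-/opt/cloudera/parcels}
    for link in "$alt_dir"/*; do
        [ -L "$link" ] || continue
        target=$(readlink "$link")
        case $target in
            "$parcels"/CDH-*/*)
                # drop "<parcels>/CDH-<version>/" and re-root under "<parcels>/CDH/"
                new=$parcels/CDH/${target#"$parcels"/CDH-*/}
                if [ "${DRY_RUN:-0}" = 1 ]; then
                    echo "$link: $target -> $new"
                else
                    ln -sfn "$new" "$link"
                fi
                ;;
        esac
    done
}
```

Links that already point at `/opt/cloudera/parcels/CDH/...`, or at other parcels such as KAFKA or GPLEXTRAS, do not match the pattern and are left alone.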