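This article is compiled from Path B of the official installation documentation.
There are several ways to set up CDH. In general, you first set up Cloudera Manager and then use it to deploy CDH. In a test environment you can let Cloudera Manager perform an automated installation, but that approach stores metadata and other data in an embedded PostgreSQL database and is not suitable for production. Production environments typically use MariaDB, MySQL, or another independently deployed database (with HA, of course), so we first set up a MySQL database to serve as the external database.
The following uses Tencent Cloud servers as an example.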
Hostname | CPU | Memory | Bandwidth | Private IP | System disk size | Data disk size |
---|---|---|---|---|---|---|
cdh1 | 4 | 16 | 10 | 10.0.11.15 | 50 | 300 |
cdh2 | 4 | 16 | 10 | 10.0.11.4 | 50 | 300 |
cdh3 | 4 | 16 | 10 | 10.0.11.3 | 50 | 300 |
Set the hostname in /etc/hostname (shown here for cdh3; use the matching name on each node):

    echo "cdh3" > /etc/hostname && hostname cdh3

Edit /etc/hosts to add the hostname-to-IP mappings:

    vim /etc/hosts

Copy the following entries into the hosts file (note that they can be lost after a server reboot, see 4.3.2):

    10.0.11.15 cdh1
    10.0.11.4 cdh2
    10.0.11.3 cdh3

Stop and disable the firewall:

    systemctl stop firewalld
    systemctl disable firewalld

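Run the following on each node in turn, copying the authorized_keys file to the next node each time.
Once every node has been processed, copy the authorized_keys file, which now contains the public keys of all nodes, back to cdh1 and sync it to all nodes.

    mkdir -p ~/.ssh && cd ~/.ssh/
    ssh-keygen -t rsa
    cat id_rsa.pub >> authorized_keys
    chmod 600 authorized_keys
    scp authorized_keys root@cdh2:~/.ssh/
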
Add the Cloudera Manager yum repository:

    # Cloudera yum repository
    curl -o /etc/yum.repos.d/cloudera-manager.repo https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera-manager.repo

2.5.1 Point the CentOS-Base repository at the Aliyun mirror (if needed)

    # Aliyun yum repository
    curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo

2.5.2 Update the system

    yum update

On Tencent Cloud, use ntpupdate.tencentyun.com.
On Alibaba Cloud, use ntp1.aliyun.com or time1.aliyun.com.

    yum install ntp
    ntpdate ntpupdate.tencentyun.com
    crontab -e
    # add the following line to the crontab to resync every two hours
    0 */2 * * * ntpdate ntpupdate.tencentyun.com

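2.7.1 Oracle JDK 1.7

    yum install oracle-j2sdk1.7 -y

2.7.2 Oracle JDK 1.8 (recommended)
The JDK version and installation path must be identical on all nodes.
Before installing, be sure to check which JDK versions are supported by Cloudera Manager and CDH (see the official documentation).
Taking JDK 1.8 as an example, the official documentation requires at least 1.8u31 and recommends 1.8u74 or later (All JDK 8 updates, from the minimum required version, are supported in Cloudera Manager/CDH 5.3 and higher unless specifically excluded. Updates above the minimum that are not listed are supported but not tested.)
Some JDK 8 updates have bugs or security issues; see the figure below.
[Figure: JDK 8 updates with known issues]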
2.7.3 Installation

    rpm -qa | grep java                # check whether an old JDK is installed
    rpm -e --nodeps <package-name>     # remove the old JDK (use the package name reported above)
    mkdir -p /usr/java
    tar zxvf /opt/jdk-8u172-linux-x64.tar.gz -C /usr/java   # extract into /usr/java (important)
    vim /etc/profile                   # edit the environment variables

Add the following to /etc/profile:

    JAVA_HOME=/usr/java/jdk1.8.0_172
    JRE_HOME=/usr/java/jdk1.8.0_172/jre
    PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
    export JAVA_HOME JRE_HOME PATH

Create a symlink for the java binary:

    ln -s /usr/java/jdk1.8.0_172/bin/java /usr/bin/java

Verify the installation:

    java -version

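The minimum-version rule from 2.7.2 can also be checked mechanically on each node. The following is only a minimal sketch, not part of the official procedure: the function names `jdk8_update_ok` and `current_jdk_version` are made up for illustration, and it assumes the usual `1.8.0_NNN` version string that `java -version` prints.

```shell
# Hypothetical helper: succeeds when a version string such as "1.8.0_172"
# is JDK 8 at or above a minimum update level (31 by default).
jdk8_update_ok() {
    local version="$1" min_update="${2:-31}"
    case "$version" in
        1.8.0_*) ;;            # only JDK 8 with an update number is accepted
        *) return 1 ;;
    esac
    local update="${version#1.8.0_}"
    update="${update%%[!0-9]*}"    # strip suffixes such as "-b11"
    [ -n "$update" ] && [ "$update" -ge "$min_update" ]
}

# Extract the quoted version string from `java -version` (printed on stderr).
current_jdk_version() {
    java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}
```

For example, `jdk8_update_ok "$(current_jdk_version)" 74 || echo "JDK older than 1.8u74"` would flag a node that falls below the recommended update level.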
Edit /etc/selinux/config and set SELINUX to disabled; the change takes effect after a reboot.

    vim /etc/selinux/config

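If you would rather not edit the file by hand on every node, the same change can be scripted. This is a sketch under the assumption that the file contains an uncommented `SELINUX=` line and that GNU sed is available; `disable_selinux_in` is a hypothetical helper name.

```shell
# Hypothetical helper: set SELINUX=disabled in an SELinux config file.
# The path defaults to /etc/selinux/config and can be overridden for testing.
disable_selinux_in() {
    local config="${1:-/etc/selinux/config}"
    # Only the uncommented SELINUX= line is rewritten; SELINUXTYPE= is untouched.
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$config"
}
```

Calling `disable_selinux_in` with no argument edits the real file; a reboot is still needed for the change to take effect.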
Cloudera Manager and CDH require a set of databases; the full list is given in 3.2.6.
Note: CentOS may ship with MariaDB, so watch the installation log to see whether it gets replaced.

    cd /opt/
    wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
    sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
    yum update
    yum install mysql-server
    service mysqld start

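Configuring MySQL involves the following steps:
3.2.1 Remove the old InnoDB log files
Delete /var/lib/mysql/ib_logfile0 and /var/lib/mysql/ib_logfile1. Stop MySQL first, and start it again with service mysqld start once the configuration file in 3.2.2 has been updated.

    service mysqld stop    # stop MySQL before touching the InnoDB log files
    rm -f /var/lib/mysql/ib_logfile0
    rm -f /var/lib/mysql/ib_logfile1
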
3.2.2 Edit the MySQL configuration file

    cp /etc/my.cnf /etc/my.cnf.bak
    vim /etc/my.cnf

Replace its contents with the following (officially recommended):

    [mysqld]
    transaction-isolation = READ-COMMITTED
    # Disabling symbolic-links is recommended to prevent assorted security risks;
    # to do so, uncomment this line:
    # symbolic-links = 0
    key_buffer_size = 32M
    max_allowed_packet = 32M
    thread_stack = 256K
    thread_cache_size = 64
    query_cache_limit = 8M
    query_cache_size = 64M
    query_cache_type = 1
    max_connections = 550
    #expire_logs_days = 10
    #max_binlog_size = 100M
    #log_bin should be on a disk with enough free space. Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your system
    #and chown the specified folder to the mysql user.
    log_bin=/var/lib/mysql/mysql_binary_log
    # For MySQL version 5.1.8 or later. For older versions, reference MySQL documentation for configuration help.
    binlog_format = mixed
    read_buffer_size = 2M
    read_rnd_buffer_size = 16M
    sort_buffer_size = 8M
    join_buffer_size = 8M
    # InnoDB settings
    innodb_file_per_table = 1
    innodb_flush_log_at_trx_commit = 2
    innodb_log_buffer_size = 64M
    innodb_buffer_pool_size = 4G
    innodb_thread_concurrency = 8
    innodb_flush_method = O_DIRECT
    innodb_log_file_size = 512M
    [mysqld_safe]
    log-error=/var/log/mysqld.log
    pid-file=/var/run/mysqld/mysqld.pid
    sql_mode=STRICT_ALL_TABLES

3.2.3 Install the MySQL JDBC driver
Download the driver from the Download Connector/J page and upload it to cdh1:/opt. The example below uses mysql-connector-java-5.1.41.tar.gz.

    cd /opt && tar zxvf mysql-connector-java-5.1.41.tar.gz
    mkdir -p /usr/share/java/
    cp mysql-connector-java-5.1.41/mysql-connector-java-5.1.41-bin.jar /usr/share/java/mysql-connector-java.jar

3.2.4 Initialize the root account with the MySQL script

    /usr/bin/mysql_secure_installation

Answer the prompts as follows:

    Set root password? [Y/n] y
    Remove anonymous users? [Y/n] y
    Disallow root login remotely? [Y/n] n
    Remove test database and access to it? [Y/n] y
    Reload privilege tables now? [Y/n] y

3.2.5 Make sure MySQL starts at boot

    /sbin/chkconfig mysqld on
    /sbin/chkconfig --list mysqld
    # expected output:
    # mysqld 0:off 1:off 2:on 3:on 4:on 5:on 6:off

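3.2.6 Create databases and users for Activity Monitor and the other CDH components
Note: the database list in the official documentation is incomplete (hue and scm are missing). The complete list is: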
Role | Database | User | Password |
---|---|---|---|
Activity Monitor | amon | amon | amon_password |
Reports Manager | rman | rman | rman_password |
Hive Metastore Server | hive | hive | hive_password |
Sentry Server | sentry | sentry | sentry_password |
Cloudera Navigator Audit Server | nav | nav | nav_password |
Cloudera Navigator Metadata Server | navms | navms | navms_password |
Cloudera Manager | scm | scm | scm_password |
Oozie | oozie | oozie | oozie_password |
Hue | hue | hue | hue_password |
Run the following statements in the MySQL client:

    CREATE DATABASE scm DEFAULT CHARACTER SET utf8;
    GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY 'scm_password';

    CREATE DATABASE amon DEFAULT CHARACTER SET utf8;
    GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY 'amon_password';

    CREATE DATABASE rman DEFAULT CHARACTER SET utf8;
    GRANT ALL ON rman.* TO 'rman'@'%' IDENTIFIED BY 'rman_password';

    CREATE DATABASE hive DEFAULT CHARACTER SET utf8;
    GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive_password';

    CREATE DATABASE sentry DEFAULT CHARACTER SET utf8;
    GRANT ALL ON sentry.* TO 'sentry'@'%' IDENTIFIED BY 'sentry_password';

    CREATE DATABASE nav DEFAULT CHARACTER SET utf8;
    GRANT ALL ON nav.* TO 'nav'@'%' IDENTIFIED BY 'nav_password';

    CREATE DATABASE navms DEFAULT CHARACTER SET utf8;
    GRANT ALL ON navms.* TO 'navms'@'%' IDENTIFIED BY 'navms_password';

    CREATE DATABASE oozie DEFAULT CHARACTER SET utf8;
    GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY 'oozie_password';

    CREATE DATABASE hue DEFAULT CHARACTER SET utf8;
    GRANT ALL ON hue.* TO 'hue'@'%' IDENTIFIED BY 'hue_password';

    FLUSH PRIVILEGES;

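Because every row of the table follows the same pattern (user equals database name, password is `<name>_password`), the statements can be generated rather than typed. The sketch below assumes exactly that naming convention; `gen_cdh_sql` is a hypothetical helper, and the placeholder passwords should of course be replaced in a real deployment.

```shell
# Hypothetical helper: print CREATE DATABASE / GRANT statements for each
# database name passed as an argument, followed by FLUSH PRIVILEGES.
# Assumes user = database name and password = <name>_password, as in the table.
gen_cdh_sql() {
    local db
    for db in "$@"; do
        printf "CREATE DATABASE %s DEFAULT CHARACTER SET utf8;\n" "$db"
        printf "GRANT ALL ON %s.* TO '%s'@'%%' IDENTIFIED BY '%s_password';\n" "$db" "$db" "$db"
    done
    printf 'FLUSH PRIVILEGES;\n'
}
```

A possible invocation is `gen_cdh_sql scm amon rman hive sentry nav navms oozie hue | mysql -u root -p`, which pipes the generated statements into the MySQL client.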
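Install the Cloudera Manager Server and Agent packages (master node):

    yum install cloudera-manager-daemons cloudera-manager-server cloudera-manager-agent

4.1.1 Run scm_prepare_database.sh on the master node (recommended)
To avoid database errors, it is recommended to run the database initialization script /usr/share/cmf/schema/scm_prepare_database.sh after installing Cloudera Manager Server (see the official documentation).

    /usr/share/cmf/schema/scm_prepare_database.sh mysql scm scm scm_password
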
4.1.2 Start Cloudera Manager Server

    service cloudera-scm-server start

4.1.3 Cloudera Manager Server log

    tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log

4.1.4 Start Cloudera Manager Agent

    service cloudera-scm-agent start

4.1.5 Cloudera Manager Agent log

    tail -f /var/log/cloudera-scm-agent/cloudera-scm-agent.log

Install the Cloudera Manager Agent packages on the remaining nodes:

    yum install cloudera-manager-daemons cloudera-manager-agent

4.2.1 Point the Cloudera Manager Agent at the server
In /etc/cloudera-scm-agent/config.ini, set server_host to the master's address (use the private IP or the hostname).

    vim /etc/cloudera-scm-agent/config.ini

Change it to:

    [General]
    # Hostname of the CM server.
    server_host=cdh1

    # Port that the CM server is listening on.
    server_port=7182

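With more than a couple of agents, editing config.ini by hand gets tedious. A minimal sketch of scripting the change follows; it assumes the file already contains a `server_host=` line (as the stock file shown above does) and GNU sed, and `set_agent_server_host` is a hypothetical helper name.

```shell
# Hypothetical helper: rewrite the server_host= line of an agent config file.
# Usage: set_agent_server_host <host> [config-path]
set_agent_server_host() {
    local host="$1" config="${2:-/etc/cloudera-scm-agent/config.ini}"
    # Replace the whole line; server_port and comments are left as they are.
    sed -i "s/^server_host=.*/server_host=${host}/" "$config"
}
```

Running `set_agent_server_host cdh1` on each agent node produces the same result as the manual edit.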
4.2.2 Start Cloudera Manager Agent

    service cloudera-scm-agent start

If the log shows no errors, continue to the next step.
4.3.1 Unable to retrieve non-local address
"ScmActive-0:com.cloudera.server.cmf.components.ScmActive: ScmActive: Unable to retrieve non-local non-loopback IP address. Seeing address: cdh1/127.0.0.1."
Fix: in /etc/hosts, comment out the line 127.0.0.1 cdh1 and keep only the mapping between the private IP and the hostname.
4.3.2 /etc/hosts is reset after a Tencent Cloud server reboots
Fix: add a shell script that writes the mappings into /etc/hosts at startup.
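One way such a startup script could look is sketched below. It is an illustration only: the function names are made up, the IPs are the ones from the table at the top of this article, and how you hook it into boot (for example by calling it from /etc/rc.local) is an assumption that depends on your image.

```shell
# Hypothetical helper: append "IP name" to a hosts file unless that exact
# mapping is already present, so repeated runs do not create duplicates.
ensure_host_entry() {
    local ip="$1" name="$2" hosts_file="${3:-/etc/hosts}"
    if ! awk -v ip="$ip" -v name="$name" \
        '$1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
         END { exit !found }' "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Restore the three cluster mappings used in this article.
restore_cluster_hosts() {
    local hosts_file="${1:-/etc/hosts}"
    ensure_host_entry 10.0.11.15 cdh1 "$hosts_file"
    ensure_host_entry 10.0.11.4  cdh2 "$hosts_file"
    ensure_host_entry 10.0.11.3  cdh3 "$hosts_file"
}
```

Calling `restore_cluster_hosts` at boot re-adds any mapping that the reset removed and leaves existing ones alone.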
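Open http://masterHost:7180 (masterHost being the public IP of the server that runs cloudera-manager-server) to reach the admin console and carry out the installation.
On the login page, enter the username and password: admin/admin
On the license terms page, click Agree.
On the "Welcome to Cloudera Manager" page, choose the Cloudera Express edition.
On the next page, the servers prepared above should be listed under "Currently Managed Hosts". Select all of them and continue.
Choose the CDH components; the default selection is fine.
The installation screen then appears.
Wait for it to finish (the speed depends on how fast the parcels are distributed over the internal network).
At this point you will find that the CDH download is very slow and would take on the order of tens of hours.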
This can be solved by uploading the parcel files manually.
Upload the following three files to cdh1 (the server running Cloudera Manager Server):

    scp CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel cdh1:/home/
    scp CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1 cdh1:/home/
    scp manifest.json cdh1:/home/

Log in to cdh1, copy the three files into /opt/cloudera/parcel-repo, and rename CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1 to CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha:

    cp /home/CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel /opt/cloudera/parcel-repo/
    # very important: the .sha1 file must be renamed to .sha
    cp /home/CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1 /opt/cloudera/parcel-repo/CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha
    cp /home/manifest.json /opt/cloudera/parcel-repo/

Set the file ownership:

    cd /opt/cloudera/parcel-repo
    chown cloudera-scm:cloudera-scm ./*

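Since the parcel is a large file uploaded by hand, it may be worth confirming that it arrived intact before restarting anything. The sketch below assumes that the first field of the .sha file is the hex SHA-1 digest of the parcel; `verify_parcel_sha` is a hypothetical helper, not a Cloudera tool.

```shell
# Hypothetical helper: succeed when the SHA-1 of a parcel matches the digest
# stored in its .sha file (defaults to "<parcel>.sha").
verify_parcel_sha() {
    local parcel="$1" sha_file="${2:-$1.sha}"
    local expected actual
    expected=$(awk 'NR == 1 {print $1}' "$sha_file")      # digest recorded in the .sha file
    actual=$(sha1sum "$parcel" | awk '{print $1}')        # digest of the uploaded parcel
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
```

For instance, `verify_parcel_sha /opt/cloudera/parcel-repo/CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel || echo "checksum mismatch"` reports a corrupted or truncated upload.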
Drop the scm database in MySQL and recreate it:

    DROP DATABASE scm;
    CREATE DATABASE scm DEFAULT CHARACTER SET utf8;
    GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY 'scm_password';

Run the database preparation script again:

    /usr/share/cmf/schema/scm_prepare_database.sh mysql scm scm scm_password

Delete cm_guid on every server that runs cloudera-scm-agent:

    rm -f /var/lib/cloudera-scm-agent/cm_guid

Restart cloudera-scm-server and cloudera-scm-agent:

    service cloudera-scm-server restart
    service cloudera-scm-agent restart

Open http://masterHost:7180 again and redo the installation.
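After installation, the wizard moves on to the host inspection ("Inspect hosts for correctness") step, as in the figure below:
[Figure: host inspection results]
5.3.1 Fix the swappiness warning
Run on all nodes:

    echo 10 > /proc/sys/vm/swappiness
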
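Writing to /proc only changes the running kernel, so the value reverts after a reboot. A sketch of persisting it through a sysctl configuration file follows; it assumes /etc/sysctl.conf is the file your system reads at boot, and `set_sysctl_value` is a hypothetical helper name.

```shell
# Hypothetical helper: set "key = value" in a sysctl config file, replacing an
# existing entry for the key or appending a new one.
set_sysctl_value() {
    local key="$1" value="$2" conf="${3:-/etc/sysctl.conf}"
    local pattern
    # Escape dots so that "vm.swappiness" is matched literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -qE "^[[:space:]]*${pattern}[[:space:]]*=" "$conf" 2>/dev/null; then
        sed -i -E "s|^[[:space:]]*${pattern}[[:space:]]*=.*|${key} = ${value}|" "$conf"
    else
        printf '%s = %s\n' "$key" "$value" >> "$conf"
    fi
}
```

Running `set_sysctl_value vm.swappiness 10 && sysctl -p` on each node stores the setting and reloads it immediately.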
5.3.2 Fix the transparent hugepage compaction warning
Run the following two commands on all nodes and also add them to /etc/rc.local:

    echo never > /sys/kernel/mm/transparent_hugepage/defrag
    echo never > /sys/kernel/mm/transparent_hugepage/enabled

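5.3.3 Fix the "mismatched versions across systems, which will cause failures" warning
TODO:
For now, select Core Hadoop; more components can be added later.
Usually nothing needs to be changed.
Fill in the database names, users, and passwords created in 3.2.6.
Keep the defaults.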
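7.1.1 HDFS reports "744 under-replicated blocks"
Cause: the configured replication factor does not match the number of DataNodes.
Explanation: dfs.replication defaults to 3, meaning each block is stored in three copies, but the cluster has only two DataNodes, so blocks are under-replicated.
Solution:
1. Set the target replication factor to 2.
1.1 Click Cluster -> HDFS -> Configuration.
1.2 Search for dfs.replication, set it to 2, and save the change.
2. On the master server, change the replication factor of the existing files:

    su hdfs
    hadoop fs -setrep -R 2 /
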