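The deployment plan in this article mainly follows the official Cloudera manual.
1 Cluster deployment approach
The mind map below shows the general deployment approach: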
http://naotu.baidu.com/file/f78248ad91fcc2019bfc16faf3987f31?token=d163a5b78d501c2d
2 Deployment plan, using CDH 5.12 as the example
The plan is an offline deployment using parcels. Official reference manual:
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_install_path_b.html
2.1 Determine the environment
The required environment is listed on the official site:
https://www.cloudera.com/downloads/manager/5-12-0.html
Component | Version | Download link |
---|---|---|
Operating system | Red Hat 7.2 | Link: http://pan.baidu.com/s/1pLTvDYb Password: 5vsc |
Database | mysql5.7 | https://www.mysql.com/downloads/ |
jdk | oracle: jdk-8u121-linux-x64 | http://www.oracle.com/technetwork/java/javase/downloads/java-archive-javase8-2177648.html |
Browser | chrome | http://sw.bos.baidu.com/sw-search-sp/software/252cc80b36e05/ChromeStandalone_60.0.3112.78_Setup.exe |
CDH | 5.12 | http://archive.cloudera.com/cdh5/parcels/5.12.0 |
CM | 5.12 | http://archive.cloudera.com/cm5/redhat/7/x86_64/cm/5.12.0/RPMS/x86_64/ |
spark2 | 2.2.0 | http://archive.cloudera.com/spark2/ |
kafka | 2.2.0-1.2.2.0.p0.68-el7 | http://archive.cloudera.com/kafka/parcels/2.2.0/ |
2.2 Package files to download for CDH and CM
- For Red Hat 7.2, download the following packages
Package | File names |
---|---|
CM packages | cloudera-manager-agent-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm, cloudera-manager-daemons-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm, cloudera-manager-server-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm |
CDH packages | CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel, CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.sha1, manifest.json |
spark2 | SPARK2_ON_YARN-2.2.0.cloudera1.jar, SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354-el7.parcel, SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354-el7.parcel.sha1, manifest.json |
kafka | KAFKA-2.2.0-1.2.2.0.p0.68-el7.parcel, KAFKA-2.2.0-1.2.2.0.p0.68-el7.parcel.sha1, manifest.json |
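Because each parcel is downloaded together with a .sha1 file, it can be worth checking the download before copying it around the cluster. The following is a minimal sketch, not part of the original procedure: `verify_parcel` is a hypothetical helper name, and it assumes the .sha1 file holds the hex digest as its first whitespace-separated field.

```bash
#!/usr/bin/env bash
# Compare a parcel's SHA-1 digest with the value stored next to it.
# verify_parcel is a hypothetical helper, not a Cloudera tool.
verify_parcel() {
    local parcel="$1"
    local expected actual
    # Assumption: the digest is the first field of "<parcel>.sha1".
    expected="$(awk '{print $1; exit}' "${parcel}.sha1")"
    actual="$(sha1sum "$parcel" | awk '{print $1}')"
    # Succeed only when a digest was found and both values match.
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}
# Example: verify_parcel CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel && echo OK
```

The function returns a non-zero status on a mismatch, so it can guard the later upload step in a script.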
2.3 Cluster planning
Official recommendations for role allocation: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_host_allocations.html
Hostname | IP address | Roles |
---|---|---|
jp-hadoop-00 | 192.168.1.240 | cms Master Hosts mysql |
jp-hadoop-01 | 192.168.1.241 | worker hosts (web UI) |
jp-hadoop-02 | 192.168.1.242 | worker hosts, active NN |
jp-hadoop-03 | 192.168.1.243 | worker hosts tools |
jp-hadoop-04 | 192.168.1.244 | worker hosts |
jp-hadoop-05 | 192.168.1.245 | worker hosts |
jp-hadoop-06 | 192.168.1.246 | worker hosts, standby NN, SNN |
jp-hadoop-07 | 192.168.1.247 | worker hosts |
jp-hadoop-08 | 192.168.1.248 | worker hosts |
jp-hadoop-09 | 192.168.1.249 | worker hosts |
jp-mysql-01 | 192.168.1.55 | mysql |
2.4 Install the operating system and configure it
2.4.1 Install Red Hat 7.2
- Choose the minimal installation
- Disk partitions
Partition | Size |
---|---|
boot | 500m |
/ | 100g+ |
/home | 50g+ |
/var | 50g+ |
/dfs | 100g+ |
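2.4.2 Change the hostname
Set the hostname on each host as follows:
hostname <hostname> # temporary change
hostnamectl set-hostname <hostname> # permanent change
# vi /etc/hostname # writing the hostname into this file also makes the change permanent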
localhost.localdomain
# vi /etc/sysconfig/network
NETWORKING=yes
NETWORKING_IPV6=no
HOSTNAME=localhost.localdomain
==After the change, reconnect the shell for the new hostname to take effect.==
2.4.3 Configure the IP address
vi /etc/sysconfig/network-scripts/ifcfg-eth*****
Add the following:
TYPE=Ethernet
BOOTPROTO=static # change to static
DEFROUTE=yes
PEERDNS=yes
PEERROUTES=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=no
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes
IPV6_FAILURE_FATAL=no
NAME=eth0
UUID=11a27cf6-7589-4220-8f93-8471d3c789fc
DEVICE=eth0
ONBOOT=yes # bring the interface up at boot
IPADDR=192.168.1.243
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=180.76.76.76
2.4.4 Configure hosts
Edit /etc/hosts and add the following:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.240 jp-hadoop-00
192.168.1.241 jp-hadoop-01
192.168.1.242 jp-hadoop-02
192.168.1.243 jp-hadoop-03
192.168.1.244 jp-hadoop-04
192.168.1.245 jp-hadoop-05
192.168.1.246 jp-hadoop-06
192.168.1.247 jp-hadoop-07
192.168.1.248 jp-hadoop-08
192.168.1.249 jp-hadoop-09
192.168.1.55 jp-mysql-01
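Since the cluster entries follow a regular pattern, they can also be generated rather than typed by hand on every host. This is a small sketch under the addressing scheme of the table in 2.3; `gen_hosts` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Print the /etc/hosts entries for the ten Hadoop hosts and the MySQL host.
# gen_hosts is a hypothetical helper; addresses follow the plan in 2.3.
gen_hosts() {
    local i
    for i in 0 1 2 3 4 5 6 7 8 9; do
        printf '192.168.1.24%d jp-hadoop-0%d\n' "$i" "$i"
    done
    printf '192.168.1.55 jp-mysql-01\n'
}
# Example (as root): gen_hosts >> /etc/hosts
```

Printing to standard output keeps the function free of side effects, so the result can be reviewed before it is appended to /etc/hosts.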
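2.4.5 Configure passwordless SSH login
- Configure ssh
vi /etc/ssh/sshd_config, find the following lines and remove the leading '#':
RSAAuthentication yes # enable RSA authentication
PubkeyAuthentication yes # enable public/private key authentication
AuthorizedKeysFile .ssh/authorized_keys # path of the public key file
- Restart the sshd service
service sshd restart
- Generate an SSH key pair on every host: run the command and press Enter three times
ssh-keygen -t rsa
- Append each machine's public key to /root/.ssh/authorized_keys on jp-hadoop-00
scp .ssh/id_rsa.pub [email protected]:/root/id_rsa.pub
# then, on jp-hadoop-00
cat id_rsa.pub >> /root/.ssh/authorized_keys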
2.4.6 Run the configuration script on every host
- Run the script
# disable SELinux and the firewall
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
systemctl stop firewalld
systemctl disable firewalld
# tune kernel parameters
echo 1 > /proc/sys/vm/overcommit_memory
# adjust ulimit and nproc
sed -i '49d' /etc/security/limits.conf
sed -i '/4/aroot - nofile 327680' /etc/security/limits.conf
rm -rf /etc/pam.d/common-session
echo 'session required pam_limits.so' > /etc/pam.d/common-session
# adjust swapping
echo 'vm.swappiness=0' >> /etc/sysctl.conf
# adjust the base environment (transparent huge pages)
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo 'echo never > /sys/kernel/mm/transparent_hugepage/defrag' >> /etc/rc.local
echo 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' >> /etc/rc.local
- Disable IPv6 on CentOS 7
1. Check whether your installation is currently set up for IPv6:
# cat /proc/sys/net/ipv6/conf/all/disable_ipv6
If the output is 0, IPv6 is enabled. If the output is 1, IPv6 is already disabled.
2. Disable IPv6 temporarily
vi /etc/sysctl.conf and add the following lines:
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
Then run:
# sysctl -p
3. Disable IPv6 permanently
# vim /etc/default/grub
On line 6, add GRUB_CMDLINE_LINUX="ipv6.disable=1"
# grub2-mkconfig -o /boot/grub2/grub.cfg
# reboot
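After running the script and the IPv6 steps, a quick check on each host can confirm that the settings actually took effect. The sketch below is an illustration rather than part of the original procedure: `thp_disabled` and `ipv6_disabled` are hypothetical helper names, and each accepts an optional file path so the logic can be tried against a sample file.

```bash
#!/usr/bin/env bash
# Succeeds when the THP control file marks "never" as the active value.
# The kernel shows the active choice in brackets, e.g. "always madvise [never]".
thp_disabled() {
    local f="${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
    grep -q '\[never\]' "$f"
}

# Succeeds when the file reports IPv6 as disabled (value 1).
ipv6_disabled() {
    local f="${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}"
    [ "$(cat "$f")" = "1" ]
}
# Example: thp_disabled && ipv6_disabled && echo "host settings look right"
```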
2.4.7 Configure the yum mirror
Upload the Red Hat 7.2 image to the /opt directory on jp-hadoop-00.
- Mount the image as a local Red Hat repository
mkdir -p /mnt/redhat7
mount -o loop -t iso9660 /opt/rhel-server-7.2-x86_64-dvd.iso /mnt/redhat7
- Add the local Red Hat yum configuration
vim /etc/yum.repos.d/redhat7.repo
[redhat7]
name=redhat7
baseurl=file:///mnt/redhat7
gpgcheck=0
- Install the httpd service
yum install httpd
systemctl start httpd
- Add the httpd link
ln -s /mnt/redhat7/ /var/www/html/redhat7
The setup works once http://jp-hadoop-00/redhat7 can be opened in a browser.
- Add the yum repository on all hosts
vim /etc/yum.repos.d/redhat_html.repo
[redhat7]
name=redhat7
baseurl=http://192.168.1.240/redhat7
gpgcheck=0
- Upload the files to the hosts
On jp-hadoop-00, create the directory /opt/cloudera/cm-5.12.0 and upload:
cloudera-manager-agent-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm
cloudera-manager-server-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm
cloudera-manager-daemons-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm
jdk-8u121-linux-x64.rpm
On jp-hadoop-00, create the directory /opt/cloudera/parcel-repo/ and upload:
CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel
CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.sha1 ### rename this file, dropping the trailing 1, to:
CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.sha
manifest.json
CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.torrent # set its permissions to 777
On all hosts, create the directory /opt/cloudera/cm-5.12.0 and upload:
cloudera-manager-agent-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm
cloudera-manager-daemons-5.12.0-1.cm5120.p0.120.el7.x86_64.rpm
jdk-8u121-linux-x64.rpm
On all hosts, create the directory /opt/cloudera/parcel-repo/ and upload:
CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel
CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.sha1 ### rename this file, dropping the trailing 1, to:
CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.sha
manifest.json
CDH-5.12.0-1.cdh5.12.0.p0.29-el7.parcel.torrent # set its permissions to 777
- Create the internal CDH yum repository on jp-hadoop-00
yum install createrepo -y
cd /opt/cloudera/cm-5.12.0
createrepo .
ln -s /opt/cloudera/cm-5.12.0 /var/www/html/cms
- Add the cms repository on all hosts
vim /etc/yum.repos.d/cms.repo
[cms]
name=cms
baseurl=http://192.168.1.240/cms
gpgcheck=0
enabled=0
- Rebuild the yum cache
yum clean all
yum makecache
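The .repo files above all share one shape, so on a larger cluster they could be produced by a small function instead of being edited with vim on every host. This is only a sketch; `make_repo` is a hypothetical helper name, and the third argument (the `enabled` flag) defaults to 1 as an assumption of this example.

```bash
#!/usr/bin/env bash
# Print a yum .repo stanza for the given repository id and base URL.
# make_repo is a hypothetical helper; arguments: id, baseurl, [enabled].
make_repo() {
    local id="$1" baseurl="$2" enabled="${3:-1}"
    cat <<EOF
[${id}]
name=${id}
baseurl=${baseurl}
gpgcheck=0
enabled=${enabled}
EOF
}
# Example: make_repo cms http://192.168.1.240/cms 0 > /etc/yum.repos.d/cms.repo
```

Note that with `enabled=0`, as in the cms.repo shown above, yum skips the repository unless it is enabled explicitly, for example with `--enablerepo=cms`.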
2.4.8 Configure the NTP service
- Install the ntp service on all hosts
yum install ntp
- Configure one internal primary NTP server; jp-hadoop-00 is chosen as the NTP server
```
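# Edit /etc/ntp.conf and change it as follows
# Hosts on local network are less restricted.
# Allow other machines on the internal network to sync time (the first value is the network address, the second the subnet mask)
restrict 192.168.1.0 mask 255.255.255.0 nomodify notrap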
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
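# Upstream NTP server IP address (any other standard server can be used instead)
server 202.120.2.101 prefer
# ntp.sjtu.edu.cn 202.120.2.101 (NTP server of the Shanghai Jiao Tong University network center)
# allow update time by the upper server
# Allow the upstream time server to adjust the local clock; fill in the standard time server's IP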
# restrict 100.2.8.68 nomodify notrap noquery
# Undisciplined Local Clock. This is a fake driver intended for backup
# and when no outside source of synchronized time is available.
# When the external time servers are unavailable, serve the local clock as the time source
server 127.127.1.0
fudge 127.127.1.0 stratum 10
```
- Restart the ntpd service
```
systemctl restart ntpd
```
- Configure the other NTP hosts
Edit /etc/ntp.conf in the same way:
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
# Use the local time server (this is the address of the NTP server)
server jp-hadoop-00
# allow update time by the upper server
#restrict jp-hadoop-00 nomodify notrap noquery
# Undisciplined Local Clock. This is a fake driver intended for backup
# and when no outside source of synchronized time is available.
server 127.127.1.0
fudge 127.127.1.0 stratum 10
- Manually sync the time
ntpdate -u jp-hadoop-00
- Restart the ntpd service
systemctl restart ntpd
- Check the ntpd status
ntpq -p
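In the `ntpq -p` listing, the peer that ntpd has currently selected for synchronization is the line prefixed with `*`. A minimal sketch for picking it out in a script follows; `selected_peer` is a hypothetical helper name that reads the listing from standard input.

```bash
#!/usr/bin/env bash
# Print the peer that ntpd has selected for synchronization, if any.
# Reads `ntpq -p` style output on stdin; the selected peer starts with '*'.
selected_peer() {
    awk '/^\*/ { print substr($1, 2); exit }'
}
# Example: ntpq -p | selected_peer
```

On a worker host, the expectation under this setup would be that the function prints jp-hadoop-00 once synchronization has settled; an empty result means no peer has been selected yet.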
2.5 Install JDK 1.8
- Run the following steps on all hosts
$ cd /opt/cloudera/cm-5.12.0
$ rpm -ivh jdk-8u121-linux-x64.rpm
$ vim /etc/profile # add the following lines
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export JAVA_HOME=/usr/java/jdk1.8.0_121
export JRE_HOME=$JAVA_HOME/jre
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin/:$PATH
$ source /etc/profile
$ java -version
2.6 Install the CM and CDH rpm packages
Run on all hosts:
```
cd /opt/cloudera/cm-5.12.0
yum localinstall cloudera-*.rpm
```
2.7 Install the external MySQL database
See the MySQL database installation guide.
2.8 MySQL database configuration
Official MySQL requirements: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_ig_mysql.html#cmig_topic_5_5_2
Setting MySQL as the default database: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_ig_installing_configuring_dbs.html#concept_i2r_m3m_hn
2.8.1 MySQL configuration file
Edit /etc/my.cnf so that it contains the following:
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.7/en/server-configuration-defaults.html
[mysqld]
#set by chenyangang
transaction-isolation = READ-COMMITTED
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
#symbolic-links=0 #set by chenyangang
#set by chenyangang
key_buffer_size = 32M
max_allowed_packet = 512M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 550
#log_bin should be on a disk with enough free space. Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your system
#and chown the specified folder to the mysql user.
#log_bin=/var/lib/mysql/mysql_binary_log
#set by chenyangang
# For MySQL version 5.1.8 or later. For older versions, reference MySQL documentation for configuration help.
binlog_format = mixed
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
#set by chenyangang
# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 2G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
#set by chenyangang
sql_mode=STRICT_ALL_TABLES
#set by chenyangang
wait_timeout=31536000
interactive_timeout=31536000
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
2.8.2 MySQL users and databases
ROLE | Database | User | Password |
---|---|---|---|
Activity Monitor | amon | amon | amon |
Reports Manager | rman | rman | rman |
Hive Metastore Server | metastore | hive | hive |
Sentry Server | sentry | sentry | sentry |
Cloudera Navigator Audit Server | nav | nav | nav12 |
Cloudera Navigator Metadata Server | navms | navms | navms |
create database amon DEFAULT CHARACTER SET utf8;
create database rman DEFAULT CHARACTER SET utf8;
create database metastore DEFAULT CHARACTER SET utf8;
create database sentry DEFAULT CHARACTER SET utf8;
create database nav DEFAULT CHARACTER SET utf8;
create database navms DEFAULT CHARACTER SET utf8;
create database hue DEFAULT CHARACTER SET utf8;
create database oozie DEFAULT CHARACTER SET utf8;
set global validate_password_policy=0; # allow passwords made up only of digits or only of letters
set global validate_password_length=2; # set the minimum password length
CREATE USER 'amon' IDENTIFIED BY 'amon';
CREATE USER 'rman' IDENTIFIED BY 'rman';
CREATE USER 'hive' IDENTIFIED BY 'hive';
CREATE USER 'sentry' IDENTIFIED BY 'sentry';
CREATE USER 'nav' IDENTIFIED BY 'nav12';
CREATE USER 'navms' IDENTIFIED BY 'navms';
CREATE USER 'hue' IDENTIFIED BY 'hue12';
CREATE USER 'oozie' IDENTIFIED BY 'oozie';
GRANT ALL PRIVILEGES ON *.* TO 'amon'@'%' IDENTIFIED BY 'amon' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'rman'@'%' IDENTIFIED BY 'rman' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' IDENTIFIED BY 'hive' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'sentry'@'%' IDENTIFIED BY 'sentry' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'nav'@'%' IDENTIFIED BY 'nav12' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'navms'@'%' IDENTIFIED BY 'navms' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'hue'@'%' IDENTIFIED BY 'hue12' WITH GRANT OPTION;
GRANT ALL PRIVILEGES ON *.* TO 'oozie'@'%' IDENTIFIED BY 'oozie' WITH GRANT OPTION;
flush privileges ;
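The statements above repeat the same pattern for every role and grant each user all privileges on `*.*`. As a narrower alternative (an assumption of this sketch, not what the statements above do), each user could be limited to its own database. The helper below only prints SQL; `db_sql` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Emit SQL that creates one database and one user limited to that database.
# db_sql is a hypothetical helper; arguments: database, user, password.
db_sql() {
    local db="$1" user="$2" pass="$3"
    printf "CREATE DATABASE %s DEFAULT CHARACTER SET utf8;\n" "$db"
    printf "CREATE USER '%s'@'%%' IDENTIFIED BY '%s';\n" "$user" "$pass"
    printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'%%';\n" "$db" "$user"
}
# Example: db_sql amon amon amon | mysql -u root -p
```

Because the function writes to standard output, the generated statements can be inspected first and piped into the mysql client afterwards.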
2.8.3 Place the mysql-connector-java.jar file in the /usr/share/java/ directory
Download link: http://pan.baidu.com/s/1hsb1qYC Password: d59v
2.8.4 Select MySQL as the default database
Official manual: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_ig_installing_configuring_dbs.html#concept_i2r_m3m_hn
If the file /etc/cloudera-scm-server/db.mgmt.properties exists, delete it.
Run the script on jp-hadoop-00.
Syntax:
/usr/share/cmf/schema/scm_prepare_database.sh database-type [options] database-name username password
Example:
/usr/share/cmf/schema/scm_prepare_database.sh mysql -h jp-mysql-01 -utemp -ptemp --scm-host jp-hadoop-00 scm scm scm12
2.9 Start installing the cluster
2.9.1 Start the cluster
- On all hosts, edit /etc/cloudera-scm-agent/config.ini and set server_host=jp-hadoop-00
Start on jp-hadoop-00:
/etc/init.d/cloudera-scm-server start
/etc/init.d/cloudera-scm-agent start
Start on the other hosts:
/etc/init.d/cloudera-scm-agent start
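The `server_host` change has to be made on every host before the agents start, so it lends itself to a one-line edit instead of opening the file by hand. A minimal sketch, assuming the file already contains a line starting with `server_host=` and that GNU sed is available; `set_server_host` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Point a Cloudera Manager agent config file at the given server host.
# set_server_host is a hypothetical helper; it relies on GNU sed's -i option
# and assumes an existing line that begins with "server_host=".
set_server_host() {
    local file="$1" host="$2"
    sed -i "s/^server_host=.*/server_host=${host}/" "$file"
}
# Example: set_server_host /etc/cloudera-scm-agent/config.ini jp-hadoop-00
```

Only the matching line is rewritten; other settings in the file are left as they are.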
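2.9.2 Log directories
The log directories are /var/log/cloudera-scm-*
3 Cluster integration installation
Spark2 integration install: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_installing.html
3.1 Detailed cluster role allocation plan
For the platform deployment plan, see: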
http://note.youdao.com/noteshare?id=48bee275a7200edd1d7e1db708c69ac2&sub=1C92B5E06BBB4741B50AC196D8DE2D41
3.2 Integrating Kafka into Cloudera
http://note.youdao.com/noteshare?id=c20162f3004508dfae7db437ae13db31&sub=BE58F970A2F24EF59C12A882E2D5252C
3.3 Integrating Spark2 into Cloudera
http://note.youdao.com/noteshare?id=33890dfa2ae11757e2ca4f55b705d8c3&sub=B1E205D7D6034A29BA57076EF8873BAA
3.4 Configure Hue for testing
http://note.youdao.com/noteshare?id=eeb1422b47558cd10380d7e3ee64f9c7&sub=DBABAE532FEC4AE2BAFC2C491F15853E
3.5 Install Phoenix
http://note.youdao.com/noteshare?id=e4e894e46bae493a5d92f982165ae2f6&sub=A89B6C9AAD4B4C659F08FBAE2630617D
3.6 StreamSets configuration
http://note.youdao.com/noteshare?id=2ab703d1eda166cf55ab452d8c3699ca&sub=3D96E381B4D649EC8FD0A1BF4450FCA4