Building a Big Data Platform with CentOS 7.5 and CDH 6.2

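1. Introduction to CDH

Two Hadoop distributions are in common use today: the Apache release and the Cloudera release.

  • Apache Hadoop: backed by a large community and updated frequently, but less stable and tedious to install and configure, so relatively few sites run it directly in production.
  • Cloudera Hadoop (CDH): Cloudera's distribution, built on top of Apache Hadoop. It improves component compatibility and the interfaces between components, simplifies installation and configuration, and provides a unified management UI.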

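2. Introduction to Cloudera Manager

Cloudera Manager (CM) is an end-to-end application for installing and managing CDH clusters from one place. Besides CM, CDH can also be installed with yum, tar, or rpm. Cloudera Manager consists of the following parts:

  • Server:
    The core of Cloudera Manager. It hosts the web server and the application logic, and is responsible for installing software, configuring, starting, and stopping services, and managing the cluster on which the services run.

  • Agent:
    Installed on every host. It starts and stops processes, deploys configuration, triggers installations, and monitors the host.

  • Database:
    Stores configuration and monitoring information. Typically several logical databases run on one or more database servers; for example, the Cloudera Manager Server and the monitoring daemons use different logical databases.

  • Cloudera Repository:
    The software distribution repository provided by Cloudera Manager.

  • Clients:
    The interfaces for interacting with the Server.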


3. Environment preparation

3.1. Node preparation (four nodes)

Hostname  IP              CM software
nn01      192.168.18.110  Cloudera Manager Server & Agent, MySQL
dn01      192.168.18.111  Cloudera Manager Agent
dn02      192.168.18.112  Cloudera Manager Agent
dn03      192.168.18.113  Cloudera Manager Agent

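3.2. Configure hostnames and hosts resolution (all nodes)

Edit /etc/hostname to set the hostname, then run the hostname command so the change takes effect immediately. Edit /etc/hosts and add the following entries.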

192.168.18.110 nn01.yunlu.cn nn01
192.168.18.111 dn01.yunlu.cn dn01
192.168.18.112 dn02.yunlu.cn dn02
192.168.18.113 dn03.yunlu.cn dn03
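Editing /etc/hosts by hand on four nodes is easy to get inconsistent, and appending blindly creates duplicates when the step is repeated. The following is a minimal sketch of an idempotent append, assuming bash; the helper name add_host_entry is hypothetical, and the demo writes to a scratch file rather than to /etc/hosts.

```shell
#!/usr/bin/env bash
# Append a hosts entry only if the exact line is not already present.
# add_host_entry is a hypothetical helper name used for illustration.
add_host_entry() {
    local file=$1 ip=$2 fqdn=$3 short=$4
    local line="$ip $fqdn $short"
    grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Demo against a scratch file; the addresses follow the node table above.
hosts_file=$(mktemp)
last_octet=110
for name in nn01 dn01 dn02 dn03; do
    add_host_entry "$hosts_file" "192.168.18.$last_octet" "$name.yunlu.cn" "$name"
    last_octet=$((last_octet + 1))
done
cat "$hosts_file"
```

On a real node the same function would be called as root with /etc/hosts as the first argument.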

3.3. Disable the firewall

# systemctl stop firewalld.service && systemctl disable firewalld.service

3.4. Disable SELinux

# sed -i 's#SELINUX=enforcing#SELINUX=disabled#g' /etc/selinux/config
# setenforce 0
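The sed command above only rewrites a line that reads SELINUX=enforcing, so a node that was already set to permissive is left unchanged. A small check of the configured mode can catch that. This is a sketch under assumptions: the helper name selinux_configured_mode is hypothetical, and the demo runs on a scratch file that imitates /etc/selinux/config.

```shell
#!/usr/bin/env bash
# Print the mode configured on the SELINUX= line of a config file.
# Defaults to the real path when no argument is given.
selinux_configured_mode() {
    awk -F= '$1 == "SELINUX" { print $2 }' "${1:-/etc/selinux/config}"
}

# Demo against a scratch file that imitates the stock layout.
cfg=$(mktemp)
printf '%s\n' '# This file controls the state of SELinux' \
    'SELINUX=permissive' 'SELINUXTYPE=targeted' > "$cfg"

# Rewrite whatever mode is set, not only "enforcing".
sed 's/^SELINUX=.*/SELINUX=disabled/' "$cfg" > "$cfg.new" && mv "$cfg.new" "$cfg"
selinux_configured_mode "$cfg"
```

The configured mode only applies after a reboot; setenforce 0 covers the running system until then.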

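3.5. Configure time synchronization

chrony can act as both a time server and a client, and it is generally easier to configure and manage than ntp. For this setup, however, we simply synchronize against a network time source from a cron job.

  • Add the cron job (it runs once a day at a randomly chosen minute and hour)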
# echo "$((RANDOM%60)) $((RANDOM%24)) * * * /usr/sbin/ntpdate time1.aliyun.com" >> /var/spool/cron/root
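Because the command above appends with >>, running it a second time leaves two ntpdate entries in root's crontab. The sketch below adds the entry only when none exists. It assumes bash (for $RANDOM); the helper name install_ntp_cron is hypothetical, and the demo targets a scratch file instead of /var/spool/cron/root.

```shell
#!/usr/bin/env bash
# Add the daily ntpdate job to a crontab file unless one is already there.
# install_ntp_cron is a hypothetical helper; the file is a parameter so the
# demo can run without touching root's real crontab.
install_ntp_cron() {
    local file=$1 minute=$2 hour=$3
    if grep -q 'ntpdate' "$file" 2>/dev/null; then
        return 0   # an ntpdate entry exists; leave it alone
    fi
    echo "$minute $hour * * * /usr/sbin/ntpdate time1.aliyun.com" >> "$file"
}

cron_file=$(mktemp)
install_ntp_cron "$cron_file" "$((RANDOM % 60))" "$((RANDOM % 24))"
install_ntp_cron "$cron_file" "$((RANDOM % 60))" "$((RANDOM % 24))"   # no-op
cat "$cron_file"
```

The random minute and hour spread the nodes out so they do not all query the time server at the same moment.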

3.6. Disable Transparent Huge Pages (required by CDH)

# echo never > /sys/kernel/mm/transparent_hugepage/defrag
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
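  • Append the same two commands to /etc/rc.local so they run at boot. On CentOS 7 the file must also be executable (chmod +x /etc/rc.d/rc.local), otherwise it is not run.
# tee -a /etc/rc.local <<-'EOF'
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
EOF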
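To confirm the setting took effect, read the two sysfs files back. The kernel lists every available mode and marks the active one with square brackets, for example "always madvise [never]". The sketch below extracts the bracketed value; the helper name thp_active_mode is hypothetical.

```shell
#!/usr/bin/env bash
# Print the active mode from a THP sysfs file, i.e. the word in [brackets].
thp_active_mode() {
    sed -n 's/.*\[\([a-z+_]*\)\].*/\1/p' "$1"
}

# Demo with a scratch file that imitates the sysfs format.
sample=$(mktemp)
echo 'always madvise [never]' > "$sample"
thp_active_mode "$sample"

# On a real node both files are readable without root.
for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/defrag; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_active_mode "$f")"
    fi
done
```

Both files should report never after the commands above, and again after a reboot if the rc.local entry works.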

3.7. Tune swap usage

# echo "vm.swappiness = 10" >> /etc/sysctl.conf
# sysctl -p
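Appending with >> adds a second vm.swappiness line each time the step is repeated. A sketch of a replace-or-append helper follows; the name set_sysctl_value is hypothetical, and the demo edits a scratch file rather than /etc/sysctl.conf.

```shell
#!/usr/bin/env bash
# Set "key = value" in a sysctl-style file, replacing any existing
# assignment of the key instead of appending a duplicate.
set_sysctl_value() {
    local file=$1 key=$2 value=$3 key_re tmp
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')   # escape dots for grep
    tmp=$(mktemp)
    grep -v "^[[:space:]]*$key_re[[:space:]]*=" "$file" > "$tmp"
    echo "$key = $value" >> "$tmp"
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Demo: an existing value of 60 is replaced, and a second call changes nothing.
conf=$(mktemp)
echo "vm.swappiness = 60" > "$conf"
set_sysctl_value "$conf" vm.swappiness 10
set_sysctl_value "$conf" vm.swappiness 10
cat "$conf"
```

On a real node, pass /etc/sysctl.conf as root and then run sysctl -p as shown above.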

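3.8. Configure passwordless SSH login

  • Run the following command on every node, pressing Enter at each prompt to accept the defaults:
# ssh-keygen -t rsa
  • Distribute the public key with ssh-copy-id. Run the following on every node:
# for host in nn01 dn01 dn02 dn03; do ssh-copy-id "$host"; done

That is four copies per node; for each one, answer yes at the prompt and enter the password of the target node.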


4. Install CM and CDH

4.1. Configure the Cloudera Manager repository (all nodes)

# wget https://archive.cloudera.com/cm6/6.2.0/redhat7/yum/cloudera-manager.repo -P /etc/yum.repos.d/
# rpm --import https://archive.cloudera.com/cm6/6.2.0/redhat7/yum/RPM-GPG-KEY-cloudera

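Online installation is slow, so it is better to download the required rpm packages first and either install them offline or serve them from a private repository. Three packages are involved:

  • Download location: https://archive.cloudera.com/cm6/6.2.0/redhat7/yum/RPMS/x86_64/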
cloudera-manager-agent-6.2.0-968826.el7.x86_64.rpm    
cloudera-manager-server-6.2.0-968826.el7.x86_64.rpm
cloudera-manager-daemons-6.2.0-968826.el7.x86_64.rpm
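The three file names differ only in the component name, so the download URLs can be built from the version and build number. A minimal sketch, using only the values listed above; the function name cm_rpm_urls and the target directory in the closing comment are illustrative.

```shell
#!/usr/bin/env bash
# Version and build number as they appear in the package names above.
CM_VERSION=6.2.0
CM_BUILD=968826
BASE_URL="https://archive.cloudera.com/cm6/$CM_VERSION/redhat7/yum/RPMS/x86_64"

# Print one download URL per Cloudera Manager package.
cm_rpm_urls() {
    local pkg
    for pkg in daemons agent server; do
        echo "$BASE_URL/cloudera-manager-$pkg-$CM_VERSION-$CM_BUILD.el7.x86_64.rpm"
    done
}

cm_rpm_urls
# To fetch them into a directory of your choice (example path):
#   cm_rpm_urls | xargs -n 1 wget -P /opt/cm-rpms
```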

4.2. Install the JDK (all nodes)

  • Download location: https://repo.huaweicloud.com/java/jdk/8u202-b08/
# rpm -ivh jdk-8u202-linux-x64.rpm

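4.3. Install CM Server and Agent

Offline installation is recommended: download the rpm packages to one server, copy them to the other nodes, and install locally, which is much faster.

  • nn01: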
# yum localinstall cloudera-manager-daemons-6.2.0-968826.el7.x86_64.rpm -y
# yum localinstall cloudera-manager-agent-6.2.0-968826.el7.x86_64.rpm -y
# yum localinstall cloudera-manager-server-6.2.0-968826.el7.x86_64.rpm -y
  • dn01 to dn03:
# yum localinstall cloudera-manager-daemons-6.2.0-968826.el7.x86_64.rpm -y
# yum localinstall cloudera-manager-agent-6.2.0-968826.el7.x86_64.rpm -y

4.4. Install the MySQL database (on nn01)

MySQL is installed here by following the official Cloudera guide:
https://www.cloudera.com/documentation/enterprise/6/6.0/topics/cm_ig_mysql.html#cmig_topic_5_5

# wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
# rpm -ivh mysql-community-release-el7-5.noarch.rpm
# yum install mysql-server -y
# systemctl start mysqld
  • Check the status
# systemctl status mysqld
  • Optional step. Following the officially recommended configuration, edit /etc/my.cnf so that it reads as follows:
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
symbolic-links = 0
 
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
 
max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M
 
#log_bin should be on a disk with enough free space.
#Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your
#system and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log
 
#In later versions of MySQL, if you enable the binary log and do not set
#a server_id, MySQL will not start. The server_id must be unique within
#the replicating group.
server_id=1
 
binlog_format = mixed
 
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
 
# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
 
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
 
sql_mode=STRICT_ALL_TABLES

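For the meaning of these settings, see the document linked at the beginning of this section.

  • Set the MySQL root password by following the prompts
# /usr/bin/mysql_secure_installation
  • Enable MySQL at boot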
# systemctl enable mysqld

4.5. Install the MySQL JDBC driver (all nodes)

Each node uses the driver to connect to the database.

# wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz
# tar xf mysql-connector-java-5.1.46.tar.gz
# mkdir -p /usr/share/java/
# cd mysql-connector-java-5.1.46
# cp mysql-connector-java-5.1.46-bin.jar /usr/share/java/mysql-connector-java.jar

4.6. Create databases for the Cloudera services (on nn01)

Service                             Database  User
Cloudera Manager Server             scm       scm
Activity Monitor                    amon      amon
Reports Manager                     rman      rman
Sentry Server                       sentry    sentry
Cloudera Navigator Audit Server     nav       nav
Cloudera Navigator Metadata Server  navms     navms
Hive Metastore Server               hive      hive
Hue                                 hue       hue
Oozie                               oozie     oozie
  • Write the following statements to a file named cdh.sql
CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY 'scm';
CREATE DATABASE amon DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY 'amon';
CREATE DATABASE rman DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON rman.* TO 'rman'@'%' IDENTIFIED BY 'rman';
CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON hue.* TO 'hue'@'%' IDENTIFIED BY 'hue';
CREATE DATABASE hive DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
CREATE DATABASE sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON sentry.* TO 'sentry'@'%' IDENTIFIED BY 'sentry';
CREATE DATABASE nav DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON nav.* TO 'nav'@'%' IDENTIFIED BY 'nav';
CREATE DATABASE navms DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON navms.* TO 'navms'@'%' IDENTIFIED BY 'navms';
CREATE DATABASE oozie DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY 'oozie';
  • Run the SQL file
# mysql -uroot -p < ./cdh.sql
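Every service in cdh.sql gets the same pair of statements, with the database name doubling as user name and password, so the file can also be generated from a list of names. A sketch, with gen_cdh_sql as a hypothetical helper name; for anything beyond a lab setup the passwords should differ from the database names.

```shell
#!/usr/bin/env bash
# Emit the CREATE DATABASE / GRANT pair for each name given as an argument.
# As in cdh.sql above, the user and password are the same as the database name.
gen_cdh_sql() {
    local db
    for db in "$@"; do
        echo "CREATE DATABASE $db DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;"
        echo "GRANT ALL ON $db.* TO '$db'@'%' IDENTIFIED BY '$db';"
    done
}

# Print the statements for the nine databases in the table above.
gen_cdh_sql scm amon rman hue hive sentry nav navms oozie
```

Redirecting the output to cdh.sql reproduces the file used in this step.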

4.7. Set up the Cloudera Manager database

# /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm

When prompted, enter the password of the scm database user.
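To check what the script recorded, the resulting settings can be read back. As far as I know they are written to /etc/cloudera-scm-server/db.properties with keys such as com.cloudera.cmf.db.type; treat both the path and the key names as assumptions and compare them with your own file. The sketch below reads one key from a properties file; prop_value is a hypothetical helper, and the demo uses a scratch file with the assumed layout.

```shell
#!/usr/bin/env bash
# Print the value of one key from a Java-style .properties file.
prop_value() {
    local file=$1 key=$2
    awk -F= -v k="$key" '$1 == k { print substr($0, length(k) + 2) }' "$file"
}

# Demo against a scratch file; the key names below are assumed, not verified.
props=$(mktemp)
cat > "$props" <<'EOF'
com.cloudera.cmf.db.type=mysql
com.cloudera.cmf.db.host=localhost
com.cloudera.cmf.db.name=scm
com.cloudera.cmf.db.user=scm
EOF

prop_value "$props" com.cloudera.cmf.db.type
prop_value "$props" com.cloudera.cmf.db.name
```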

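4.8. Install CDH (on nn01)

With CM installed, the next step is to use it to install CDH and build the platform. This requires the CDH parcel to be present on the CM server first. As before, installation goes faster if the packages are downloaded ahead of time or served from a private CDH repository.

  • Download the CDH parcel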
# cd /opt/cloudera/parcel-repo
# wget https://archive.cloudera.com/cdh6/6.2.0/parcels/CDH-6.2.0-1.cdh6.2.0.p0.967373-el7.parcel
# wget https://archive.cloudera.com/cdh6/6.2.0/parcels/manifest.json
  • Generate the sha file
# sha1sum CDH-6.2.0-1.cdh6.2.0.p0.967373-el7.parcel | awk '{ print $1 }' > CDH-6.2.0-1.cdh6.2.0.p0.967373-el7.parcel.sha
  • Change the owner and group
# chown -R cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo/*
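The .sha file is expected to hold the SHA-1 of the parcel next to it, so a quick comparison shows whether the pair is consistent, for example after copying the files to another machine. The sketch below uses a hypothetical helper named verify_parcel_sha and a dummy file standing in for the parcel. Since the .sha here is computed from the downloaded file itself, it cannot detect a corrupted download; comparing against the hash listed in manifest.json (assuming it carries one for this parcel) would be the stronger check.

```shell
#!/usr/bin/env bash
# Return 0 when the SHA-1 of the parcel equals the hash stored in <parcel>.sha.
verify_parcel_sha() {
    local parcel=$1 expected actual
    expected=$(tr -d '[:space:]' < "$parcel.sha")
    actual=$(sha1sum "$parcel" | awk '{ print $1 }')
    [ "$expected" = "$actual" ]
}

# Demo with a small dummy file standing in for the real parcel.
dir=$(mktemp -d)
echo "dummy parcel payload" > "$dir/demo.parcel"
sha1sum "$dir/demo.parcel" | awk '{ print $1 }' > "$dir/demo.parcel.sha"

if verify_parcel_sha "$dir/demo.parcel"; then
    echo "checksum OK"
else
    echo "checksum MISMATCH"
fi
```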

4.9. Start Cloudera Manager Server (on nn01)

# systemctl start cloudera-scm-server

If anything goes wrong during startup, check the log.

# tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log

The last lines of the log show the ports the server is listening on.

Started ServerConnector@da518cb{SSL,[ssl, http/1.1]}{0.0.0.0:7183}
Started ServerConnector@a77165b{HTTP/1.1,[http/1.1]}{0.0.0.0:7180}
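Instead of watching the log by eye, the ports can be pulled out of those "Started ServerConnector" lines. A sketch, with cm_listen_ports as a hypothetical helper; the demo feeds it the two lines shown above from a scratch file.

```shell
#!/usr/bin/env bash
# Print the port from every "Started ServerConnector ... {host:port}" line.
cm_listen_ports() {
    grep 'Started ServerConnector' "$1" |
        sed -n 's/.*{[^{}]*:\([0-9][0-9]*\)}.*/\1/p'
}

# Demo with the two log lines shown above.
log=$(mktemp)
cat > "$log" <<'EOF'
Started ServerConnector@da518cb{SSL,[ssl, http/1.1]}{0.0.0.0:7183}
Started ServerConnector@a77165b{HTTP/1.1,[http/1.1]}{0.0.0.0:7180}
EOF

cm_listen_ports "$log"
```

On nn01 the argument would be /var/log/cloudera-scm-server/cloudera-scm-server.log; an empty result means the server has not finished starting.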

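5. Initialize Cloudera Manager

After a short wait, open http://nn01:7180 in a browser. The default username and password are both admin.

  • Then continue through the following steps as needed.

5.1. CDH cluster installation

  • Follow the prompts; the defaults are usually fine.

5.2. Cluster setup

  • Database settings: enter the databases, users, and passwords created in section 4.6.
  • For everything else, follow the prompts; the defaults are usually fine.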

