1. CM 6.1 RPM packages
https://archive.cloudera.com/cm6/6.1.0/redhat7/yum/RPM-GPG-KEY-cloudera
https://archive.cloudera.com/cm6/6.1.0/redhat7/yum/RPMS/x86_64/cloudera-manager-agent-6.1.0-769885.el7.x86_64.rpm
https://archive.cloudera.com/cm6/6.1.0/redhat7/yum/RPMS/x86_64/cloudera-manager-daemons-6.1.0-769885.el7.x86_64.rpm
https://archive.cloudera.com/cm6/6.1.0/redhat7/yum/RPMS/x86_64/cloudera-manager-server-6.1.0-769885.el7.x86_64.rpm
https://archive.cloudera.com/cm6/6.1.0/redhat7/yum/RPMS/x86_64/cloudera-manager-server-db-2-6.1.0-769885.el7.x86_64.rpm
https://archive.cloudera.com/cm6/6.1.0/redhat7/yum/RPMS/x86_64/oracle-j2sdk1.8-1.8.0+update141-1.x86_64.rpm
Place these on the master under /var/www/html/cloudera-repos/cm6/6.1.0.
2. CDH 6.1 parcel packages
https://archive.cloudera.com/cdh6/6.1.0/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel
https://archive.cloudera.com/cdh6/6.1.0/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel.sha256
https://archive.cloudera.com/cdh6/6.1.0/parcels/manifest.json
Place these on the master under /opt/cloudera/parcel-repo/.
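The downloads in sections 1 and 2 can be scripted instead of fetched one by one. A sketch (run on a host with internet access; URLs and target directories are the ones listed above, and the wget lines are left commented so the list can be reviewed first):

```shell
# Build download lists for the CM RPMs and the CDH parcel files listed above.
CM_BASE=https://archive.cloudera.com/cm6/6.1.0/redhat7/yum
CDH_BASE=https://archive.cloudera.com/cdh6/6.1.0/parcels

printf '%s\n' \
  "$CM_BASE/RPM-GPG-KEY-cloudera" \
  "$CM_BASE/RPMS/x86_64/cloudera-manager-agent-6.1.0-769885.el7.x86_64.rpm" \
  "$CM_BASE/RPMS/x86_64/cloudera-manager-daemons-6.1.0-769885.el7.x86_64.rpm" \
  "$CM_BASE/RPMS/x86_64/cloudera-manager-server-6.1.0-769885.el7.x86_64.rpm" \
  "$CM_BASE/RPMS/x86_64/cloudera-manager-server-db-2-6.1.0-769885.el7.x86_64.rpm" \
  "$CM_BASE/RPMS/x86_64/oracle-j2sdk1.8-1.8.0+update141-1.x86_64.rpm" > cm_urls.txt

printf '%s\n' \
  "$CDH_BASE/CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel" \
  "$CDH_BASE/CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel.sha256" \
  "$CDH_BASE/manifest.json" > cdh_urls.txt

# Uncomment to download into the target directories on the master:
# wget -P /var/www/html/cloudera-repos/cm6/6.1.0 -i cm_urls.txt
# wget -P /opt/cloudera/parcel-repo -i cdh_urls.txt
wc -l cm_urls.txt cdh_urls.txt
```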
3. Documentation
https://www.cloudera.com/documentation/enterprise/latest/topics/cm_intro_primer.html
The master node needs at least 8 GB of RAM.
host         | mysql | cm         | cdh
-------------|-------|------------|--------
192.168.1.21 | *     |            | master
192.168.1.22 |       | cm service | second
192.168.1.23 |       |            | worker
192.168.1.24 |       |            | worker
192.168.1.25 |       |            | worker
Configure hostname and DNS-related settings.
1. Set each node's hostname (equivalent to editing /etc/hostname):
hostnamectl set-hostname cm01
hostnamectl set-hostname cm02
hostnamectl set-hostname cm03
hostnamectl set-hostname cm04
hostnamectl set-hostname cm05
2. Add every node to /etc/hosts on every node:
cat /etc/hosts
192.168.1.21 cm01
192.168.1.22 cm02
192.168.1.23 cm03
192.168.1.24 cm04
192.168.1.25 cm05
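Keeping the five entries identical across hosts by hand is error-prone; the block can be generated once and pushed everywhere. A sketch using the hostnames and IPs above (the append and scp lines are commented; run them from the master):

```shell
# Generate the shared /etc/hosts entries from a single host list.
hosts="192.168.1.21 cm01
192.168.1.22 cm02
192.168.1.23 cm03
192.168.1.24 cm04
192.168.1.25 cm05"
printf '%s\n' "$hosts" > hosts.cluster
# Append locally, then push to every node (uncomment on the master):
# cat hosts.cluster >> /etc/hosts
# for h in cm02 cm03 cm04 cm05; do scp hosts.cluster root@$h:/tmp/ && ssh root@$h 'cat /tmp/hosts.cluster >> /etc/hosts'; done
cat hosts.cluster
```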
3. If you use fully qualified domain names, also set the legacy /etc/sysconfig/network file:
HOSTNAME=domain-name
uname -a and hostname must report the same name.
Disable the firewall and SELinux on all hosts.
systemctl disable firewalld
vi /etc/sysconfig/selinux
SELINUX=disabled
Reboot for the SELinux change to take effect.
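The SELinux edit can be done non-interactively with sed instead of vi. Sketched here against a local sample copy of the file so the effect is visible; on a real host, point the sed command at /etc/sysconfig/selinux (a symlink to /etc/selinux/config):

```shell
# Stand-in for the relevant lines of /etc/sysconfig/selinux.
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > selinux.sample
# On a real host: sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/sysconfig/selinux
sed -i 's/^SELINUX=.*/SELINUX=disabled/' selinux.sample
grep '^SELINUX=' selinux.sample
```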
Configure the NTP service (not actually used here; VMs can instead sync time with the host via hypervisor settings).
1. Install NTP
yum -y install ntp
2. Configure NTP
On the master (Aliyun's time1.aliyun.com is an alternative):
vi /etc/ntp.conf
server ntp.sjtu.edu.cn prefer   # Shanghai Jiao Tong University NTP
On the slaves, sync from the master:
server cm01
3. Start the service
systemctl start ntpd
4. Check synchronization status
ntpstat
Disable transparent huge pages at boot by appending to /etc/rc.local:
vi /etc/rc.local
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
/etc/rc.local is a symlink to /etc/rc.d/rc.local; make it executable so it runs at boot.
chmod +x /etc/rc.d/rc.local
Lower swappiness and raise max_map_count:
vi /etc/sysctl.conf
vm.swappiness = 10
vm.max_map_count=262144
Apply without rebooting:
sysctl -p /etc/sysctl.conf
Set up passwordless SSH. On the master:
ssh-keygen -t rsa
chmod 711 .ssh
cat id_rsa.pub >> authorized_keys
chmod 644 authorized_keys
ssh-copy-id -i ~/.ssh/id_rsa.pub root@cm02
ssh-copy-id -i ~/.ssh/id_rsa.pub root@cm03
ssh-copy-id -i ~/.ssh/id_rsa.pub root@cm04
ssh-copy-id -i ~/.ssh/id_rsa.pub root@cm05
Repeat from every host so that all hosts can reach each other.
On all hosts, install Cloudera's JDK RPM (downloaded earlier).
Set JAVA_HOME system-wide:
vi /etc/profile
export JAVA_HOME="/usr/java/jdk1.8.0_141-cloudera"
export CLASSPATH=".:${JAVA_HOME}/lib:${CLASSPATH}"
export PATH="${JAVA_HOME}/bin:${PATH}"
. /etc/profile
MySQL packages: http://repo.mysql.com/yum
yum -y install net-tools perl
mariadb-libs conflicts with MySQL; remove it first: rpm -e --nodeps mariadb-libs
Install common, libs, client, and libs-compat, in that order.
Then install server.
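The install order matters because each RPM depends on the previous ones. A sketch that prints the commands in dependency order (package names assume the MySQL 5.7 community RPMs for el7; adjust to the versions you actually downloaded, and run the printed rpm commands as root on the DB host):

```shell
# Dependency order: common -> libs -> client -> libs-compat -> server.
for p in common libs client libs-compat server; do
  echo "rpm -ivh mysql-community-${p}-*.el7.x86_64.rpm"
done | tee install_order.txt
```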
Default locations
Client programs and scripts: /usr/bin
mysqld server: /usr/sbin
Data: /var/lib/mysql/
Error messages: /usr/share/mysql
Configuration file: /etc/my.cnf
1. Edit /etc/my.cnf with Cloudera's recommended settings:
[mysqld]
server_id=100
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
transaction-isolation = READ-COMMITTED
symbolic-links = 0
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 550
log_bin=/var/lib/mysql/mysql_binary_log
binlog_format = mixed
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
sql_mode=STRICT_ALL_TABLES
2. Check the libaio dependency (usually preinstalled):
rpm -qa | grep libaio
yum -y install libaio
1. Start the service
systemctl start mysqld
2. Look up the temporary root password
grep 'temporary password' /var/log/mysqld.log
3. Change the password
mysql -uroot -p
# relax the password policy
mysql> set global validate_password_policy=0;
mysql> set global validate_password_length=1;
# change the password
mysql> alter user root@localhost identified by 'root';
# grant remote access
mysql> grant all privileges on *.* to root@'%' identified by 'root';
mysql> flush privileges;
4. Recovering a lost root password
4.1. Add skip-grant-tables under [mysqld] in my.cnf.
4.2. systemctl start mysqld
4.3. mysql -uroot -p, press Enter at the empty password prompt, then use mysql.
4.4、update user set authentication_string = password('root'), password_expired='N', password_last_changed=now() where user='root';
4.5. Remove skip-grant-tables and restart mysqld.
4.6. Alternative method: use an init file (untested).
vi change_password.sql, containing:
alter user root@localhost identified by 'root';
mysqld --defaults-file=/etc/my.cnf --init-file=/root/change_password.sql &
mysql -p -S /var/lib/mysql/mysql.sock
On all hosts, place the MySQL JDBC driver at /usr/share/java/mysql-connector-java.jar (the file name must match exactly).
Service                            | Database | User
-----------------------------------|----------|-------
Cloudera Manager Server            | scm      | scm
Activity Monitor                   | amon     | amon
Reports Manager                    | rman     | rman
Hue                                | hue      | hue
Hive Metastore Server              | hive     | hive
Sentry Server                      | sentry   | sentry
Cloudera Navigator Audit Server    | nav      | nav
Cloudera Navigator Metadata Server | navms    | navms
Oozie                              | oozie    | oozie
--create databases
create database scm default character set utf8 default collate utf8_general_ci;
create database amon default character set utf8 default collate utf8_general_ci;
create database rman default character set utf8 default collate utf8_general_ci;
create database hue default character set utf8 default collate utf8_general_ci;
create database hive default character set utf8 default collate utf8_general_ci;
create database sentry default character set utf8 default collate utf8_general_ci;
create database nav default character set utf8 default collate utf8_general_ci;
create database navms default character set utf8 default collate utf8_general_ci;
create database oozie default character set utf8 default collate utf8_general_ci;
--create users
CREATE USER 'scm'@'%' IDENTIFIED BY 'root';
CREATE USER 'amon'@'%' IDENTIFIED BY 'root';
CREATE USER 'rman'@'%' IDENTIFIED BY 'root';
CREATE USER 'hue'@'%' IDENTIFIED BY 'root';
CREATE USER 'hive'@'%' IDENTIFIED BY 'root';
CREATE USER 'sentry'@'%' IDENTIFIED BY 'root';
CREATE USER 'nav'@'%' IDENTIFIED BY 'root';
CREATE USER 'navms'@'%' IDENTIFIED BY 'root';
CREATE USER 'oozie'@'%' IDENTIFIED BY 'root';
--grant privileges
GRANT ALL PRIVILEGES ON scm.* TO 'scm'@'%';
GRANT ALL PRIVILEGES ON amon.* TO 'amon'@'%';
GRANT ALL PRIVILEGES ON rman.* TO 'rman'@'%';
GRANT ALL PRIVILEGES ON hue.* TO 'hue'@'%';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';
GRANT ALL PRIVILEGES ON sentry.* TO 'sentry'@'%';
GRANT ALL PRIVILEGES ON nav.* TO 'nav'@'%';
GRANT ALL PRIVILEGES ON navms.* TO 'navms'@'%';
GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%';
--flush privileges
flush privileges;
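The nine near-identical database/user/grant statements can be generated rather than typed. A sketch that writes them to create_cdh_dbs.sql using the same service list and the placeholder password 'root' used above (change PASS for real deployments; the mysql invocation is commented):

```shell
# One database + user + grant per CDH service.
PASS=root   # placeholder password, same as above; change in production
: > create_cdh_dbs.sql
for db in scm amon rman hue hive sentry nav navms oozie; do
  cat >> create_cdh_dbs.sql <<EOF
create database $db default character set utf8 default collate utf8_general_ci;
CREATE USER '$db'@'%' IDENTIFIED BY '$PASS';
GRANT ALL PRIVILEGES ON $db.* TO '$db'@'%';
EOF
done
echo "flush privileges;" >> create_cdh_dbs.sql
# Feed it to MySQL (uncomment on the DB host):
# mysql -uroot -p < create_cdh_dbs.sql
wc -l create_cdh_dbs.sql
```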
yum -y install createrepo
Here a package means an RPM; yum resolves and installs dependencies automatically.
1. Create the local repository directory
mkdir -p /var/www/html/cloudera-repos/cm6/6.1.0
2. Edit /etc/httpd/conf/httpd.conf
Add .parcel to the gzip AddType line:
AddType application/x-gzip .gz .tgz .parcel
3. Start httpd
systemctl start httpd
4. Put the RPM packages into /var/www/html/cloudera-repos/cm6/6.1.0.
On all hosts, import RPM-GPG-KEY-cloudera (from any location):
rpm --import RPM-GPG-KEY-cloudera
5. Generate the repository metadata
cd /var/www/html/cloudera-repos/cm6/6.1.0
createrepo .
6. Edit /etc/yum.repos.d/cloudera-repo.repo
[cloudera-repo]
name = cloudera repo
baseurl=http://cm01/cloudera-repos/cm6/6.1.0
gpgcheck=0
enabled=1
Copy the file to the other hosts. cloudera-manager.repo is the default file name; a failed install can overwrite it, so use a different name.
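Generating the repo file once and pushing it out avoids copy-paste drift across hosts. A sketch (run from cm01; the install and scp lines are commented):

```shell
# Write the repo file with a heredoc; content matches the file above.
cat > cloudera-repo.repo <<'EOF'
[cloudera-repo]
name=cloudera repo
baseurl=http://cm01/cloudera-repos/cm6/6.1.0
gpgcheck=0
enabled=1
EOF
# Install locally and push to the other hosts (uncomment on cm01):
# cp cloudera-repo.repo /etc/yum.repos.d/
# for h in cm02 cm03 cm04 cm05; do scp cloudera-repo.repo root@$h:/etc/yum.repos.d/; done
cat cloudera-repo.repo
```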
7. Refresh the yum cache
yum makecache
yum search cloudera
A parcel is a package format managed by CM.
/opt/cloudera/parcel-repo holds the parcel files.
Copy CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel there.
Rename CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel.sha256 to CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel.sha.
Replace the content of the .sha file with that parcel's hash from manifest.json.
/opt/cloudera/parcels is where parcels are unpacked after activation.
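Extracting the right hash from manifest.json by eye is fiddly. A sketch that pulls it with python3 and writes the .sha file, shown against a minimal stand-in manifest with a fake hash (on the real master, delete the stand-in step and run it in /opt/cloudera/parcel-repo against the downloaded manifest.json):

```shell
PARCEL=CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel
# Minimal stand-in for the real manifest.json; the real file has many entries
# per OS, and the hash below is fake.
cat > manifest.json <<'EOF'
{"parcels": [{"parcelName": "CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel", "hash": "0123456789abcdef0123456789abcdef01234567"}]}
EOF
# Find the entry whose parcelName matches and write its hash to the .sha file.
python3 - "$PARCEL" <<'PY' > "$PARCEL.sha"
import json, sys
name = sys.argv[1]
with open("manifest.json") as f:
    manifest = json.load(f)
print(next(p["hash"] for p in manifest["parcels"] if p["parcelName"] == name))
PY
cat "$PARCEL.sha"
```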
Default external ports
Component                          | Service        | Port  | Description
-----------------------------------|----------------|-------|----------------------
CM Server                          | HTTP (Web UI)  | 7180  | web console
CM Server                          | HTTPS (Web UI) | 7183  | HTTPS web console
Cloudera Navigator Metadata Server | HTTP (Web UI)  | 7187  | CNMS listening port
Backup and Disaster Recovery       | HTTP (Web UI)  | 7180  | communication with CM
Backup and Disaster Recovery       | HTTPS (Web UI) | 7183  | HTTPS
HDFS NameNode                      |                | 8020  |
HDFS DataNode                      |                | 50010 |
Telemetry Publisher                | HTTP           | 10110 |
Telemetry Publisher                | HTTP (Debug)   | 10111 |
Default internal ports
Component                        | Service                    | Port       | Description
---------------------------------|----------------------------|------------|--------------------------
CM Server                        | Avro (RPC)                 | 7182       | agent-to-server heartbeat
Embedded PostgreSQL              |                            | 7432       |
Peer-to-peer parcel distribution |                            | 7190, 7191 |
Cloudera Manager Agent           | HTTP (Debug)               | 9000       |
Event Server                     | Custom protocol            | 7184       |
Event Server                     | Custom protocol            | 7185       |
Event Server                     | HTTP (Debug)               | 8084       |
Alert Publisher                  | Custom protocol            | 10101      |
Service Monitor                  | HTTP (Debug)               | 8086       |
Service Monitor                  | HTTPS (Debug)              |            |
Service Monitor                  | Custom protocol            | 9997       |
Service Monitor                  | Internal query API (Avro)  | 9996       |
Activity Monitor                 | HTTP (Debug)               | 8087       |
Activity Monitor                 | HTTPS (Debug)              |            |
Activity Monitor                 | Custom protocol            | 9999       |
Activity Monitor                 | Internal query API (Avro)  | 9998       |
Host Monitor                     | HTTP (Debug)               | 8091       |
Host Monitor                     | HTTPS (Debug)              | 9091       |
Host Monitor                     | Custom protocol            | 9995       |
Host Monitor                     | Internal query API (Avro)  | 9994       |
Reports Manager                  | Queries (Thrift)           | 5678       |
Reports Manager                  | HTTP (Debug)               | 8083       |
Cloudera Navigator Audit Server  | HTTP                       | 7186       |
Cloudera Navigator Audit Server  | HTTP (Debug)               | 8089       |
On all hosts, install the dependencies:
yum -y install bind-utils psmisc cyrus-sasl-plain cyrus-sasl-gssapi portmap httpd mod_ssl openssl-devel python-psycopg2 libpq.so.5 MySQL-python /lib/lsb/init-functions fuse net-tools perl
On the master:
yum -y install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server
Installation automatically creates the cloudera-scm user.
On the other hosts:
yum -y install cloudera-manager-daemons cloudera-manager-agent
Point each agent at the CM server host (the agent talks to port 7182):
vi /etc/cloudera-scm-agent/config.ini
server_host=cm01
Optional: on the CM host, auto-generate TLS certificates (the agent config then needs use_tls=1):
sudo JAVA_HOME=/usr/java/jdk1.8.0_141-cloudera /opt/cloudera/cm-agent/bin/certmanager --location /opt/cloudera/CMCA setup --configure-services
1. If MySQL is on the same host:
/opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm root
2. If MySQL is on another host:
/opt/cloudera/cm/schema/scm_prepare_database.sh mysql -h cm02 --scm-host cm01 scm root
You will be prompted for the scm password, which was set to root in the database-creation statements above.
1. On the master
systemctl start cloudera-scm-server
2. On all hosts
systemctl start cloudera-scm-agent
Check the listening ports with netstat -tnlp.
3. Put the CDH 6 parcel files into /opt/cloudera/parcel-repo/ (done above).
4. Log in to the Cloudera Manager Admin Console.
http://cm01:7180
The default username and password are both admin.
5. Optionally adjust server memory:
vi /etc/default/cloudera-scm-server, set -Xmx to at least 2 GB.
vi /opt/cloudera/cm/bin/cm-server, adjust maxheap.
1. In the installation wizard, select the local repository and the packages to install.
If HDFS formatting fails, simply run the format step again.
Uninstalling CM/CDH (if you need to start over):
systemctl stop cloudera-scm-server
systemctl stop cloudera-scm-agent
yum -y remove 'cloudera-manager-*'
yum clean all
umount cm_processes
umount /var/run/cloudera-scm-agent/process
rm -Rf /usr/share/cmf /var/lib/cloudera* /var/cache/yum/cloudera* /var/log/cloudera* /var/run/cloudera*
rm -rf /tmp/.scmpreparenode.lock
rm -Rf /var/lib/flume-ng /var/lib/hadoop* /var/lib/hue /var/lib/navigator /var/lib/oozie /var/lib/solr /var/lib/sqoop* /var/lib/zookeeper
rm -Rf datadrivepath/dfs datadrivepath/mapred datadrivepath/yarn
rm -rf /var/lib/hadoop-* /var/lib/impala /var/lib/solr /var/lib/zookeeper /var/lib/hue /var/lib/oozie /var/lib/pgsql /var/lib/sqoop2 /data/dfs/ /data/impala/ /data/yarn/ /dfs/ /impala/ /yarn/ /var/run/hadoop-*/ /var/run/hdfs-*/ /usr/bin/hadoop* /usr/bin/zookeeper* /usr/bin/hbase* /usr/bin/hive* /usr/bin/hdfs /usr/bin/mapred /usr/bin/yarn /usr/bin/sqoop* /usr/bin/oozie /etc/hadoop* /etc/zookeeper* /etc/hive* /etc/hue /etc/impala /etc/sqoop* /etc/oozie /etc/hbase* /etc/hcatalog
To remove MySQL as well:
systemctl stop mysqld.service
yum -y remove mysql
rm -rf /var/lib/mysql
rm -rf /var/log/mysqld.log
rm -rf /var/lib/mysql/mysql
rm -rf /usr/lib64/mysql
rm -rf /usr/share/mysql
rm -rf /opt/cloudera
rpm -qa | grep -i mysql
Then uninstall any MySQL packages still listed.
Cluster hosts fall into four types:
Master hosts run the primary processes, such as the NameNode and ResourceManager.
Utility hosts run the remaining non-primary processes, such as Cloudera Manager and the Hive Metastore.
Gateway hosts are client access points for launching jobs in the cluster; the number needed depends on workload type and size.
Worker hosts run the DataNodes and other distributed processes.
3 - 10 Worker Hosts without High Availability
Master Hosts:
Host 1: NameNode, ResourceManager, JobHistory Server, ZooKeeper, Kudu master, Spark History Server
Utility/Gateway Hosts (combined):
Host 1: Secondary NameNode, Cloudera Manager, Cloudera Manager Management Service, Hive Metastore, HiveServer2, Impala Catalog Server, Impala StateStore, Hue, Oozie, Flume, Gateway
Worker Hosts:
3-10 hosts: DataNode, NodeManager, Impalad, Kudu tablet server
3 - 20 Worker Hosts with High Availability
Master Hosts:
Host 1: NameNode, JournalNode, FailoverController, ResourceManager, ZooKeeper, JobHistory Server, Spark History Server, Kudu master
Host 2: NameNode, JournalNode, FailoverController, ResourceManager, ZooKeeper, Kudu master
Host 3: Kudu master (HA requires an odd number of masters)
Utility Hosts:
Host 1: Cloudera Manager, Cloudera Manager Management Service, Hive Metastore, Impala Catalog Server, Impala StateStore, Oozie, ZooKeeper, JournalNode
Gateway Hosts:
Hosts 1-n: Hue, HiveServer2, Flume, Gateway
Worker Hosts:
3-20 hosts: DataNode, NodeManager, Impalad, Kudu tablet server
20 - 80 Worker Hosts with High Availability
Master Hosts:
Host 1: NameNode, JournalNode, FailoverController, ResourceManager, ZooKeeper, Kudu master
Host 2: NameNode, JournalNode, FailoverController, ResourceManager, ZooKeeper, Kudu master
Host 3: ZooKeeper, JournalNode, JobHistory Server, Spark History Server, Kudu master
Utility Hosts:
Host 1: Cloudera Manager
Host 2: Cloudera Manager Management Service, Hive Metastore, Impala Catalog Server, Oozie
Gateway Hosts:
Hosts 1-n: Hue, HiveServer2, Flume, Gateway
Worker Hosts:
20-80 hosts: DataNode, NodeManager, Impalad, Kudu tablet server
80 - 200 Worker Hosts with High Availability
Master Hosts:
Host 1: NameNode, JournalNode, FailoverController, ResourceManager, ZooKeeper, Kudu master
Host 2: NameNode, JournalNode, FailoverController, ResourceManager, ZooKeeper, Kudu master
Host 3: ZooKeeper, JournalNode, JobHistory Server, Spark History Server, Kudu master
Utility Hosts:
Host 1: Cloudera Manager
Host 2: Hive Metastore, Impala Catalog Server, Impala StateStore, Oozie
Host 3: Activity Monitor
Host 4: Host Monitor
Host 5: Navigator Audit Server
Host 6: Navigator Metadata Server
Host 7: Reports Manager
Host 8: Service Monitor
Gateway Hosts:
Hosts 1-n: Hue, HiveServer2, Flume, Gateway
Worker Hosts:
80-200 hosts: DataNode, NodeManager, Impalad, Kudu tablet server (at most 100 tablet servers)
200 - 500 Worker Hosts with High Availability
Master Hosts:
Host 1: NameNode, JournalNode, FailoverController, ZooKeeper, Kudu master
Host 2: NameNode, JournalNode, FailoverController, ZooKeeper, Kudu master
Host 3: ResourceManager, ZooKeeper, JournalNode, Kudu master
Host 4: ResourceManager, ZooKeeper, JournalNode
Host 5: JobHistory Server, Spark History Server, ZooKeeper, JournalNode
(no more than 3 Kudu masters in total)
Utility Hosts:
Host 1: Cloudera Manager
Host 2: Hive Metastore, Impala Catalog Server, Impala StateStore, Oozie
Host 3: Activity Monitor
Host 4: Host Monitor
Host 5: Navigator Audit Server
Host 6: Navigator Metadata Server
Host 7: Reports Manager
Host 8: Service Monitor
Gateway Hosts:
Hosts 1-n: Hue, HiveServer2, Flume, Gateway
Worker Hosts:
200-500 hosts: DataNode, NodeManager, Impalad, Kudu tablet server (at most 100 tablet servers)
500 - 1000 Worker Hosts with High Availability
Master Hosts:
Host 1: NameNode, JournalNode, FailoverController, ZooKeeper, Kudu master
Host 2: NameNode, JournalNode, FailoverController, ZooKeeper, Kudu master
Host 3: ResourceManager, ZooKeeper, JournalNode, Kudu master
Host 4: ResourceManager, ZooKeeper, JournalNode
Host 5: JobHistory Server, Spark History Server, ZooKeeper, JournalNode
(no more than 3 Kudu masters in total)
Utility Hosts:
Host 1: Cloudera Manager
Host 2: Hive Metastore, Impala Catalog Server, Impala StateStore, Oozie
Host 3: Activity Monitor
Host 4: Host Monitor
Host 5: Navigator Audit Server
Host 6: Navigator Metadata Server
Host 7: Reports Manager
Host 8: Service Monitor
Gateway Hosts:
Hosts 1-n: Hue, HiveServer2, Flume, Gateway
Worker Hosts:
500-1000 hosts: DataNode, NodeManager, Impalad, Kudu tablet server (at most 100 tablet servers)
Reference worker specification: 8 CPU cores, 32 GB RAM, CentOS 7.
HDFS
Parameter                      | Value
-------------------------------|------------------
NameNode heap size             | at least 1 GB
dfs.datanode.max.locked.memory | at least 256 MB
DataNode heap size             | at least 512 MB
Failover Controller heap size  | at least 256 MB
JournalNode heap size          | at least 256 MB
Hive
Parameter                 | Value
--------------------------|------------------
Metastore Server heap size | at least 1.5 GB
HiveServer2 heap size      | at least 1 GB
Impala
Parameter                  | Value
---------------------------|------------------
Catalog Server heap size   | at least 256 MB
Impala Daemon memory limit | at least 1 GB
Kafka
Parameter        | Value
-----------------|---------------
Broker heap size | at least 1 GB
Kudu
Parameter                               | Value
----------------------------------------|---------------
Kudu Tablet Server Hard Memory Limit    | at least 3 GB
Kudu Tablet Server Block Cache Capacity | at least 2 GB
maintenance_manager_num_threads         | at least 4
Spark
Parameter                | Value
-------------------------|------------------
History Server heap size | at least 512 MB
YARN / MapReduce
Parameter                            | Value
-------------------------------------|------------------
JobHistory Server Java heap size     | at least 512 MB
NodeManager Java heap size           | at least 512 MB
Container memory                     | at least 9 GB
ResourceManager Java heap size       | at least 512 MB
Minimum container memory             | at least 1 GB
Container memory increment           | at least 512 MB
Maximum container memory             | at least 6 GB
Container virtual CPU cores          | at least 6
Minimum container virtual CPU cores  | at least 1
Container virtual CPU core increment | at least 1
Maximum container virtual CPU cores  | at least 1
Cloudera Management Service
Parameter                             | Value
--------------------------------------|------------------
Activity Monitor Java heap size       | at least 1 GB
Alert Publisher Java heap size        | at least 256 MB
Event Server Java heap size           | at least 1 GB
Host Monitor Java heap size           | at least 1 GB
Host Monitor maximum non-Java memory  | at least 1.5 GB
Service Monitor Java heap size        | at least 1 GB
Service Monitor maximum non-Java memory | at least 1.5 GB