Setting up a Hadoop environment with Cloudera Manager (CM)

 

1、Downloads

1、CM 6.1 installation packages

https://archive.cloudera.com/cm6/6.1.0/redhat7/yum/RPM-GPG-KEY-cloudera

https://archive.cloudera.com/cm6/6.1.0/redhat7/yum/RPMS/x86_64/cloudera-manager-agent-6.1.0-769885.el7.x86_64.rpm

https://archive.cloudera.com/cm6/6.1.0/redhat7/yum/RPMS/x86_64/cloudera-manager-daemons-6.1.0-769885.el7.x86_64.rpm

https://archive.cloudera.com/cm6/6.1.0/redhat7/yum/RPMS/x86_64/cloudera-manager-server-6.1.0-769885.el7.x86_64.rpm

https://archive.cloudera.com/cm6/6.1.0/redhat7/yum/RPMS/x86_64/cloudera-manager-server-db-2-6.1.0-769885.el7.x86_64.rpm

 

https://archive.cloudera.com/cm6/6.1.0/redhat7/yum/RPMS/x86_64/oracle-j2sdk1.8-1.8.0+update141-1.x86_64.rpm

Place these files on the master under /var/www/html/cloudera-repos/cm6/6.1.0

2、CDH 6.1 parcel packages

https://archive.cloudera.com/cdh6/6.1.0/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel

https://archive.cloudera.com/cdh6/6.1.0/parcels/CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel.sha256

https://archive.cloudera.com/cdh6/6.1.0/parcels/manifest.json

Place these files on the master under /opt/cloudera/parcel-repo/
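The CM package URLs above share one base path, so the list can be generated instead of typed. A minimal sketch; the function name cm_rpm_urls is made up for this example, and it only prints the URLs:

```shell
#!/usr/bin/env bash
# Print the CM 6.1.0 download URLs listed above, one per line.
# cm_rpm_urls is a hypothetical helper name, not part of any Cloudera tool.
cm_rpm_urls() {
  local base="https://archive.cloudera.com/cm6/6.1.0/redhat7/yum"
  local ver="6.1.0-769885.el7.x86_64"
  local pkg
  echo "${base}/RPM-GPG-KEY-cloudera"
  for pkg in agent daemons server server-db-2; do
    echo "${base}/RPMS/x86_64/cloudera-manager-${pkg}-${ver}.rpm"
  done
  echo "${base}/RPMS/x86_64/oracle-j2sdk1.8-1.8.0+update141-1.x86_64.rpm"
}
```

Assuming wget is installed, the output can be piped straight into it: cm_rpm_urls | wget -P /var/www/html/cloudera-repos/cm6/6.1.0 -i -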

3、Documentation

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_intro_primer.html

2、Environment

The master node needs at least 8 GB of memory.

host            mysql   cm           cdh
192.168.1.21    *       -            master
192.168.1.22    -       cm service   second
192.168.1.23    -       -            worker
192.168.1.24    -       -            worker
192.168.1.25    -       -            worker

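3、Preparation

3.1、Configure the hostname

Set up the host name information on every node.

1、Set the hostname of each node (equivalent to editing /etc/hostname). Run the matching command on its own node: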

hostnamectl set-hostname  cm01

hostnamectl set-hostname  cm02

hostnamectl set-hostname  cm03

hostnamectl set-hostname  cm04

hostnamectl set-hostname  cm05

2、/etc/hosts on every node

cat /etc/hosts

192.168.1.21   cm01

192.168.1.22   cm02

192.168.1.23   cm03

192.168.1.24   cm04

192.168.1.25   cm05

3、If a domain name is used, edit /etc/sysconfig/hostname

hostname=domain-name

uname -a and hostname must report the same name.
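The /etc/hosts entries from step 2 follow a regular pattern, so they can be generated. A minimal sketch, assuming consecutive addresses and the cmNN naming used above; gen_hosts is a made-up helper name:

```shell
#!/usr/bin/env bash
# Print "IP   hostname" lines for consecutive nodes named cm01, cm02, ...
# Arguments: network prefix, first host octet, number of nodes.
# gen_hosts is a hypothetical helper, not a system command.
gen_hosts() {
  local prefix="$1" first="$2" count="$3"
  local i=0
  while [ "$i" -lt "$count" ]; do
    printf '%s.%d   cm%02d\n' "$prefix" "$((first + i))" "$((i + 1))"
    i=$((i + 1))
  done
}
```

For this cluster, gen_hosts 192.168.1 21 5 >> /etc/hosts appends the five entries shown above.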

 

3.2、Disable the firewall

systemctl stop firewalld

systemctl disable firewalld

 

3.3、Configure SELinux

vi /etc/sysconfig/selinux

SELINUX=disabled

The change takes effect after a reboot.
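The same edit can be scripted instead of made in vi. A minimal sketch, assuming GNU sed; disable_selinux_config is a made-up helper name. It defaults to /etc/selinux/config because /etc/sysconfig/selinux is a symlink to that file, and sed -i on a symlink replaces the link with a regular file unless --follow-symlinks is given:

```shell
#!/usr/bin/env bash
# Set SELINUX=disabled in the given config file.
# Default target is /etc/selinux/config (the file the symlink points to).
# disable_selinux_config is a hypothetical helper name.
disable_selinux_config() {
  local file="${1:-/etc/selinux/config}"
  # Only the SELINUX= line is rewritten; SELINUXTYPE= is left alone.
  sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$file"
}
```

Until the reboot, setenforce 0 switches the running system to permissive mode.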

 

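3.4、Enable NTP

Configure the NTP service (not actually used in this setup); a virtual machine can instead sync its clock with the host through the VM options.

1、Install NTP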

yum -y install ntp

2、Configure NTP

On the master (the Aliyun server time1.aliyun.com is another option):

vi /etc/ntp.conf

server ntp.sjtu.edu.cn prefer   # Shanghai Jiao Tong University NTP server

On the other nodes:

server cm01   # sync from the master

3、Start ntpd

systemctl start ntpd

4、Check the synchronization status

ntpstat

 

3.5、Disable transparent huge page compaction

vi /etc/rc.local

echo never > /sys/kernel/mm/transparent_hugepage/defrag

echo never > /sys/kernel/mm/transparent_hugepage/enabled

 

/etc/rc.local is a symlink to /etc/rc.d/rc.local, which must be executable for the commands to run at boot.

chmod +x /etc/rc.d/rc.local
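After a reboot it is worth confirming that the setting stuck. The kernel marks the active value with square brackets, for example "always madvise [never]". A minimal sketch that extracts it; thp_active is a made-up helper name:

```shell
#!/usr/bin/env bash
# Read a THP status line on stdin and print the active value,
# i.e. the word inside the square brackets.
# thp_active is a hypothetical helper name.
thp_active() {
  sed -n 's/.*\[\(.*\)\].*/\1/p'
}
```

Usage: thp_active < /sys/kernel/mm/transparent_hugepage/enabled and thp_active < /sys/kernel/mm/transparent_hugepage/defrag should both print never.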

 

3.6、Tune swap usage

vi /etc/sysctl.conf

vm.swappiness = 10

vm.max_map_count=262144

 

Apply the settings:

sysctl -p /etc/sysctl.conf
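When the same settings go onto five hosts, an idempotent edit avoids duplicate lines in /etc/sysctl.conf. A minimal sketch, assuming GNU sed; set_sysctl is a made-up helper name, and the key is used as a regular expression, so the dots match loosely:

```shell
#!/usr/bin/env bash
# Set "key = value" in a sysctl-style file: rewrite the existing line
# for the key if there is one, otherwise append a new line.
# set_sysctl is a hypothetical helper name.
set_sysctl() {
  local key="$1" value="$2" file="${3:-/etc/sysctl.conf}"
  if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
    sed -i "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
  else
    echo "${key} = ${value}" >> "$file"
  fi
}
```

Usage: set_sysctl vm.swappiness 10 and set_sysctl vm.max_map_count 262144, followed by sysctl -p /etc/sysctl.conf.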

 

3.7、Passwordless SSH

ssh-keygen -t rsa

 

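chmod 700 ~/.ssh

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

chmod 600 ~/.ssh/authorized_keys

 

ssh-copy-id -i ~/.ssh/id_rsa.pub root@cm02

ssh-copy-id -i ~/.ssh/id_rsa.pub root@cm03

ssh-copy-id -i ~/.ssh/id_rsa.pub root@cm04

ssh-copy-id -i ~/.ssh/id_rsa.pub root@cm05

Repeat this on every host so that all hosts can reach one another.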
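Because the key has to be copied from every host to every other host, a small loop saves typing. A minimal sketch; copy_keys and the DRY_RUN variable are made up for this example, and the key path assumes the default ssh-keygen location:

```shell
#!/usr/bin/env bash
# Copy the local public key to each host given as an argument.
# With DRY_RUN=1 the commands are only printed, which helps to
# check the host list before anything is changed.
# copy_keys and DRY_RUN are hypothetical names for this sketch.
copy_keys() {
  local host
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub root@${host}"
    else
      ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "root@${host}"
    fi
  done
}
```

On cm01, for instance: copy_keys cm02 cm03 cm04 cm05. Each call prompts once for the remote root password.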

 

3.8、Configure the JDK

On all hosts, install the Cloudera JDK from the RPM downloaded earlier.

vi /etc/profile

export JAVA_HOME="/usr/java/jdk1.8.0_141-cloudera"

export CLASSPATH=".:${JAVA_HOME}/lib:${CLASSPATH}"

export PATH="${JAVA_HOME}/bin:${PATH}"

 

. /etc/profile

4、MySQL

 

4.1、rpm

The MySQL packages are available at http://repo.mysql.com/yum

yum -y install net-tools perl

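The MySQL packages conflict with mariadb-libs; remove it first with rpm -e --nodeps mariadb-libs.

Install the common, libs, client and compat packages.

Then install the server package.

Default locations:

Client programs and scripts: /usr/bin

mysqld server: /usr/sbin

Data: /var/lib/mysql/

Error messages: /usr/share/mysql

Configuration file: /etc/my.cnf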

 

4.2、Configuration

1、Edit /etc/my.cnf, using the officially recommended configuration

[mysqld]

server_id=100

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

transaction-isolation = READ-COMMITTED

symbolic-links = 0

key_buffer_size = 32M

max_allowed_packet = 32M

thread_stack = 256K

thread_cache_size = 64

query_cache_limit = 8M

query_cache_size = 64M

query_cache_type = 1

max_connections = 550

log_bin=/var/lib/mysql/mysql_binary_log

binlog_format = mixed

read_buffer_size = 2M

read_rnd_buffer_size = 16M

sort_buffer_size = 8M

join_buffer_size = 8M

# InnoDB settings

innodb_file_per_table = 1

innodb_flush_log_at_trx_commit = 2

innodb_log_buffer_size = 64M

innodb_buffer_pool_size = 4G

innodb_thread_concurrency = 8

innodb_flush_method = O_DIRECT

innodb_log_file_size = 512M

 

log-error=/var/log/mysqld.log

pid-file=/var/run/mysqld/mysqld.pid

 

[mysqld_safe]

log-error=/var/log/mysqld.log

pid-file=/var/run/mysqld/mysqld.pid

sql_mode=STRICT_ALL_TABLES

2、Check the libaio dependency (normally already installed)

rpm -qa | grep libaio

yum -y install libaio

 

4.3、Start and stop

1、Start

systemctl start mysqld

2、Look up the temporary root password

grep 'temporary password' /var/log/mysqld.log

3、Change the password

mysql -uroot -p

# relax the password policy

mysql> set global validate_password_policy=0;

mysql> set global validate_password_length=1;

# change the password

mysql> alter user root@localhost identified by 'root';

# allow remote access

mysql> grant all privileges on *.* to root@'%' identified by 'root';

mysql> flush privileges;

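4、Recovering a lost root password

4.1、Add skip-grant-tables=1 to the [mysqld] section of my.cnf.

4.2、systemctl restart mysqld

4.3、Run mysql -uroot -p, log in with an empty password, then run use mysql.

4.4、update user set authentication_string = password('root'), password_expired='N', password_last_changed=now() where user='root';

4.5、Remove skip-grant-tables again and restart mysqld.

4.6、Alternative (untested): put the statement in an init file.

vi /root/change_password.sql, with this content: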

alter user root@localhost identified by 'root';

 

mysqld --defaults-file=/etc/my.cnf --init-file=/root/change_password.sql &

 

mysql -p -S /var/lib/mysql/mysql.sock

 

 

4.4、JDBC driver

Needed on all hosts; the file name must be exactly as shown below.

Place the driver at /usr/share/java/mysql-connector-java.jar

 

4.5、Create the databases

Service                              Database   User
Cloudera Manager Server              scm        scm
Activity Monitor                     amon       amon
Reports Manager                      rman       rman
Hue                                  hue        hue
Hive Metastore Server                hive       hive
Sentry Server                        sentry     sentry
Cloudera Navigator Audit Server      nav        nav
Cloudera Navigator Metadata Server   navms      navms
Oozie                                oozie      oozie

 

create database scm default character set utf8 default collate utf8_general_ci;

create database amon default character set utf8 default collate utf8_general_ci;

create database rman default character set utf8 default collate utf8_general_ci;

create database hue default character set utf8 default collate utf8_general_ci;

create database hive default character set utf8 default collate utf8_general_ci;

create database sentry default character set utf8 default collate utf8_general_ci;

create database nav default character set utf8 default collate utf8_general_ci;

create database navms default character set utf8 default collate utf8_general_ci;

create database oozie default character set utf8 default collate utf8_general_ci;

-- create the users

CREATE USER 'scm'@'%' IDENTIFIED BY 'root';

CREATE USER 'amon'@'%' IDENTIFIED BY 'root';

CREATE USER 'rman'@'%' IDENTIFIED BY 'root';

CREATE USER 'hue'@'%' IDENTIFIED BY 'root';

CREATE USER 'hive'@'%' IDENTIFIED BY 'root';

CREATE USER 'sentry'@'%' IDENTIFIED BY 'root';

CREATE USER 'nav'@'%' IDENTIFIED BY 'root';

CREATE USER 'navms'@'%' IDENTIFIED BY 'root';

CREATE USER 'oozie'@'%' IDENTIFIED BY 'root';

-- grant privileges

GRANT ALL PRIVILEGES ON scm.* TO 'scm'@'%';

GRANT ALL PRIVILEGES ON amon.* TO 'amon'@'%';

GRANT ALL PRIVILEGES ON rman.* TO 'rman'@'%';

GRANT ALL PRIVILEGES ON hue.* TO 'hue'@'%';

GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%';

GRANT ALL PRIVILEGES ON sentry.* TO 'sentry'@'%';

GRANT ALL PRIVILEGES ON nav.* TO 'nav'@'%';

GRANT ALL PRIVILEGES ON navms.* TO 'navms'@'%';

GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%';

-- reload the privilege tables

flush privileges;
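The statements above repeat the same three lines for each of the nine databases, so they can be generated from the list of names. A minimal sketch; gen_cm_sql is a made-up helper name, and the password is passed as an argument rather than hard-coded:

```shell
#!/usr/bin/env bash
# Print CREATE DATABASE / CREATE USER / GRANT statements for every
# service database in the table above, then FLUSH PRIVILEGES.
# Argument: the password to give each database user.
# gen_cm_sql is a hypothetical helper name.
gen_cm_sql() {
  local password="$1" db
  for db in scm amon rman hue hive sentry nav navms oozie; do
    echo "CREATE DATABASE ${db} DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;"
    echo "CREATE USER '${db}'@'%' IDENTIFIED BY '${password}';"
    echo "GRANT ALL PRIVILEGES ON ${db}.* TO '${db}'@'%';"
  done
  echo "FLUSH PRIVILEGES;"
}
```

Usage: gen_cm_sql 'root' | mysql -uroot -p reproduces the setup in this section. A stronger password per service is the obvious change to make outside a test cluster.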

5、Local repositories

yum -y install createrepo

 

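5.1、Package repository

Packages here are RPMs; yum detects and installs their dependencies automatically.

1、Create the directory that will be served over HTTP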

mkdir -p /var/www/html/cloudera-repos/cm6/6.1.0

2、Edit /etc/httpd/conf/httpd.conf

In the mime_module section, add:

AddType application/x-gzip .gz .tgz .parcel

3、Start httpd

systemctl start httpd

4、Copy the RPMs to /var/www/html/cloudera-repos/cm6/6.1.0

On all hosts, copy RPM-GPG-KEY-cloudera to any location and import it:

rpm --import RPM-GPG-KEY-cloudera

5、Generate the repository metadata

cd /var/www/html/cloudera-repos/cm6/6.1.0

createrepo .

6、Create /etc/yum.repos.d/cloudera-repo.repo

[cloudera-repo]

name = cloudera repo

baseurl=http://cm01/cloudera-repos/cm6/6.1.0

gpgcheck=0

enabled=1

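Copy this file to the other hosts. cloudera-manager.repo is the default file name and gets overwritten when an installation fails, so a different name is used here.

7、Refresh the yum cache on all hosts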

yum makecache

yum search cloudera
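The repo file from step 6 is identical on every host apart from the server name, so it can be produced by a function and pushed over SSH. A minimal sketch; gen_repo_file is a made-up helper name that prints the file content to stdout:

```shell
#!/usr/bin/env bash
# Print the content of cloudera-repo.repo for the given repository host.
# gen_repo_file is a hypothetical helper name.
gen_repo_file() {
  local host="$1"
  cat <<EOF
[cloudera-repo]
name = cloudera repo
baseurl=http://${host}/cloudera-repos/cm6/6.1.0
gpgcheck=0
enabled=1
EOF
}
```

With passwordless SSH in place, gen_repo_file cm01 | ssh root@cm02 'cat > /etc/yum.repos.d/cloudera-repo.repo' writes the file on cm02; the same line works for the remaining hosts.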

 

5.2、Parcel repository

A parcel is a packaging format managed by CM.

/opt/cloudera/parcel-repo holds the parcel files.

Copy CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel into this directory.

Rename CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel.sha256 to CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel.sha

Replace the content of the .sha file with the hash listed for this parcel in manifest.json.
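A mismatch between the parcel and its .sha file usually only surfaces later in the CM wizard, so it can help to check it up front. A minimal sketch, under the assumption that the hash in manifest.json is a SHA-1 digest; verify_parcel_sha is a made-up helper name:

```shell
#!/usr/bin/env bash
# Return 0 when the SHA-1 digest of the parcel equals the first field
# of the <parcel>.sha file next to it, non-zero otherwise.
# Assumption: the .sha file holds a SHA-1 digest (as taken from manifest.json).
# verify_parcel_sha is a hypothetical helper name.
verify_parcel_sha() {
  local parcel="$1"
  local expected actual
  expected=$(awk '{print $1; exit}' "${parcel}.sha")
  actual=$(sha1sum "$parcel" | awk '{print $1}')
  [ "$expected" = "$actual" ]
}
```

Usage: verify_parcel_sha /opt/cloudera/parcel-repo/CDH-6.1.0-1.cdh6.1.0.p0.770702-el7.parcel && echo OK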

6、Installation

/opt/cloudera/parcels is the directory that parcels are installed into.

 

6.1、Ports

Default external ports

Component                            Service          Port    Description
CM Server                            HTTP (Web UI)    7180    web console
                                     HTTPS (Web UI)   7183    HTTPS
Cloudera Navigator Metadata Server   HTTP (Web UI)    7187    Navigator Metadata Server listener
Backup and Disaster Recovery         HTTP (Web UI)    7180    used to communicate with CM
                                     HTTPS (Web UI)   7183    HTTPS
                                     HDFS NameNode    8020
                                     HDFS DataNode    50010
Telemetry Publisher                  HTTP             10110
Telemetry Publisher                  HTTP (Debug)     10111

 

Default internal ports

Component                         Service                            Port         Description
CM Server                         Avro (RPC)                         7182         Agent-to-Server heartbeat
                                  Embedded PostgreSQL                7432
                                  Peer-to-peer parcel distribution   7190, 7191
Cloudera Manager Agent            HTTP (Debug)                       9000
Event Server                      Custom protocol                    7184
                                  Custom protocol                    7185
                                  HTTP (Debug)                       8084
Alert Publisher                   Custom protocol                    10101
Service Monitor                   HTTP (Debug)                       8086
                                  HTTPS (Debug)                      -
                                  Custom protocol                    9997
                                  Internal query API (Avro)          9996
Activity Monitor                  HTTP (Debug)                       8087
                                  HTTPS (Debug)                      -
                                  Custom protocol                    9999
                                  Internal query API (Avro)          9998
Host Monitor                      HTTP (Debug)                       8091
                                  HTTPS (Debug)                      9091
                                  Custom protocol                    9995
                                  Internal query API (Avro)          9994
Reports Manager                   Queries (Thrift)                   5678
                                  HTTP (Debug)                       8083
Cloudera Navigator Audit Server   HTTP                               7186
                                  HTTP (Debug)                       8089

 

 

6.2、Dependencies

On all hosts:

yum -y install bind-utils psmisc cyrus-sasl-plain cyrus-sasl-gssapi portmap httpd mod_ssl openssl-devel python-psycopg2 libpq.so.5 MySQL-python /lib/lsb/init-functions fuse net-tools perl

 

6.3、Master node

yum -y install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server

The installation creates the cloudera-scm user automatically.

 

6.4、Data nodes

yum -y install cloudera-manager-daemons cloudera-manager-agent

 

6.5、Edit the agent configuration

Point the agent at the CM Server host and port (7182).

vi /etc/cloudera-scm-agent/config.ini

server_host=cm01
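Since the agent configuration has to be changed on every host, the edit can be scripted. A minimal sketch, assuming GNU sed and that config.ini already contains a server_host= line; set_agent_server_host is a made-up helper name:

```shell
#!/usr/bin/env bash
# Rewrite the server_host= line of the agent config to the given host.
# Other lines, including server_port=, are left unchanged.
# set_agent_server_host is a hypothetical helper name.
set_agent_server_host() {
  local host="$1" file="${2:-/etc/cloudera-scm-agent/config.ini}"
  sed -i "s/^server_host=.*/server_host=${host}/" "$file"
}
```

Usage on each host: set_agent_server_host cm01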

 

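6.6、Enable Auto-TLS

Optional. Run this on the CM host to create the certificates automatically; the agent configuration file then needs use_tls=1.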

sudo JAVA_HOME=/usr/java/jdk1.8.0_141-cloudera /opt/cloudera/cm-agent/bin/certmanager --location /opt/cloudera/CMCA setup --configure-services

 

6.7、Database

1、MySQL on the same host

/opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm root

2、MySQL on another host

/opt/cloudera/cm/schema/scm_prepare_database.sh mysql -h cm02 --scm-host cm01 scm root

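The script prompts for the database password. Its arguments are the database type, the database name and the database user, so the commands above connect to the scm database as user root. Every account created in the database section, scm included, uses the password root.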

 

6.8、Start CM

1、On the master

systemctl start cloudera-scm-server

2、On all hosts

systemctl start cloudera-scm-agent

Check the listening ports with netstat -tnlp.

3、Place the CDH6 parcel files in /opt/cloudera/parcel-repo/.

4、Log in to the Cloudera Manager Admin Console.

http://cm01:7180

The default username and password are both admin.

5、Memory can be adjusted if needed

vi /etc/default/cloudera-scm-server and set Xmx to at least 2G.

vi /opt/cloudera/cm/bin/cm-server and adjust maxheap.
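Once the server and agents are running (steps 1 and 2 above), the netstat output can be checked for the expected ports automatically. A minimal sketch that reads netstat-style output on stdin; missing_ports is a made-up helper name:

```shell
#!/usr/bin/env bash
# Read `netstat -tnl` style output on stdin and print each of the
# given ports that has no LISTEN socket. Prints nothing if all are up.
# missing_ports is a hypothetical helper name.
missing_ports() {
  local listing port
  listing=$(cat)
  for port in "$@"; do
    echo "$listing" | grep -Eq "[:.]${port}[[:space:]].*LISTEN" || echo "$port"
  done
}
```

Usage on the master: netstat -tnl | missing_ports 7180 7182. The CM Server can take a few minutes to open 7180 after the service starts, so an initial miss is not necessarily an error.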

6.9、CDH

1、Select the local repository and the packages to install.

If formatting HDFS fails, run the step again.

7、Uninstall

systemctl stop cloudera-scm-server

systemctl stop cloudera-scm-agent

 

yum -y remove 'cloudera-manager-*'

 

yum clean all

 

umount cm_processes

umount /var/run/cloudera-scm-agent/process

 

rm -Rf /usr/share/cmf /var/lib/cloudera* /var/cache/yum/cloudera* /var/log/cloudera* /var/run/cloudera*

rm -rf /tmp/.scmpreparenode.lock

rm -Rf /var/lib/flume-ng /var/lib/hadoop* /var/lib/hue /var/lib/navigator /var/lib/oozie /var/lib/solr /var/lib/sqoop* /var/lib/zookeeper

rm -Rf datadrivepath/dfs datadrivepath/mapred datadrivepath/yarn

 

rm -rf /var/lib/hadoop-* /var/lib/impala /var/lib/solr /var/lib/zookeeper /var/lib/hue /var/lib/oozie  /var/lib/pgsql  /var/lib/sqoop2  /data/dfs/  /data/impala/ /data/yarn/  /dfs/ /impala/ /yarn/  /var/run/hadoop-*/ /var/run/hdfs-*/ /usr/bin/hadoop* /usr/bin/zookeeper* /usr/bin/hbase* /usr/bin/hive* /usr/bin/hdfs /usr/bin/mapred /usr/bin/yarn /usr/bin/sqoop* /usr/bin/oozie /etc/hadoop* /etc/zookeeper* /etc/hive* /etc/hue /etc/impala /etc/sqoop* /etc/oozie /etc/hbase* /etc/hcatalog

 

systemctl stop mysqld.service

yum -y remove mysql

 

rm -rf /var/lib/mysql

rm -rf /var/log/mysqld.log

rm -rf /var/lib/mysql/mysql

rm -rf /usr/lib64/mysql

rm -rf /usr/share/mysql

rm -rf /opt/cloudera

 

rpm -qa | grep -i mysql

Remove any MySQL-related packages that the query above still lists.

 

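8、Host and role assignment

Cluster hosts fall into four types:

Master hosts run the master processes, such as the NameNode and ResourceManager.

Utility hosts run other processes that are not master processes, such as CM and the Hive Metastore.

Gateway hosts are client access points for launching jobs in the cluster. The number of gateway hosts needed depends on the type and size of the workload.

Worker hosts run DataNodes and other distributed processes.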

3 - 10 Worker Hosts without High Availability

Master Hosts

Host 1: NameNode, ResourceManager, JobHistory Server, ZooKeeper, Kudu master, Spark History Server

Utility Hosts and Gateway Hosts (one shared host)

Host 1: Secondary NameNode, Cloudera Manager, Cloudera Manager Management Service, Hive Metastore, HiveServer2, Impala Catalog Server, Impala StateStore, Hue, Oozie, Flume, Gateway

Worker Hosts

3-10 Hosts: DataNode, NodeManager, Impalad, Kudu tablet server

3 - 20 Worker Hosts with High Availability

Master Hosts

Host 1: NameNode, JournalNode, FailoverController, ResourceManager, ZooKeeper, JobHistory Server, Spark History Server, Kudu master

Host 2: NameNode, JournalNode, FailoverController, ResourceManager, ZooKeeper, Kudu master

Host 3: Kudu master (HA requires an odd number of masters)

Utility Hosts

Host 1: Cloudera Manager, Cloudera Manager Management Service, Hive Metastore, Impala Catalog Server, Impala StateStore, Oozie, ZooKeeper, JournalNode

Gateway Hosts

Hosts 1-n: Hue, HiveServer2, Flume, Gateway

Worker Hosts

3-20 Hosts: DataNode, NodeManager, Impalad, Kudu tablet server

20 - 80 Worker Hosts with High Availability

Master Hosts

Host 1: NameNode, JournalNode, FailoverController, ResourceManager, ZooKeeper, Kudu master

Host 2: NameNode, JournalNode, FailoverController, ResourceManager, ZooKeeper, Kudu master

Host 3: ZooKeeper, JournalNode, JobHistory Server, Spark History Server, Kudu master

Utility Hosts

Host 1: Cloudera Manager

Host 2: Cloudera Manager Management Service, Hive Metastore, Impala Catalog Server, Oozie

Gateway Hosts

Hosts 1-n: Hue, HiveServer2, Flume, Gateway

Worker Hosts

20-80 Hosts: DataNode, NodeManager, Impalad, Kudu tablet server

80 - 200 Worker Hosts with High Availability

Master Hosts

Host 1: NameNode, JournalNode, FailoverController, ResourceManager, ZooKeeper, Kudu master

Host 2: NameNode, JournalNode, FailoverController, ResourceManager, ZooKeeper, Kudu master

Host 3: ZooKeeper, JournalNode, JobHistory Server, Spark History Server, Kudu master

Utility Hosts

Host 1: Cloudera Manager

Host 2: Hive Metastore, Impala Catalog Server, Impala StateStore, Oozie

Host 3: Activity Monitor

Host 4: Host Monitor

Host 5: Navigator Audit Server

Host 6: Navigator Metadata Server

Host 7: Reports Manager

Host 8: Service Monitor

Gateway Hosts

Hosts 1-n: Hue, HiveServer2, Flume, Gateway

Worker Hosts

80-200 Hosts: DataNode, NodeManager, Impalad, Kudu tablet server (at most 100 tablet servers)

200 - 500 Worker Hosts with High Availability

Master Hosts

Host 1: NameNode, JournalNode, FailoverController, ZooKeeper, Kudu master

Host 2: NameNode, JournalNode, FailoverController, ZooKeeper, Kudu master

Host 3: ResourceManager, ZooKeeper, JournalNode, Kudu master

Host 4: ResourceManager, ZooKeeper, JournalNode

Host 5: JobHistory Server, Spark History Server, ZooKeeper, JournalNode

No more than 3 Kudu masters in total.

Utility Hosts

Host 1: Cloudera Manager

Host 2: Hive Metastore, Impala Catalog Server, Impala StateStore, Oozie

Host 3: Activity Monitor

Host 4: Host Monitor

Host 5: Navigator Audit Server

Host 6: Navigator Metadata Server

Host 7: Reports Manager

Host 8: Service Monitor

Gateway Hosts

Hosts 1-n: Hue, HiveServer2, Flume, Gateway

Worker Hosts

200-500 Hosts: DataNode, NodeManager, Impalad, Kudu tablet server (at most 100 tablet servers)

500 - 1000 Worker Hosts with High Availability

Master Hosts

Host 1: NameNode, JournalNode, FailoverController, ZooKeeper, Kudu master

Host 2: NameNode, JournalNode, FailoverController, ZooKeeper, Kudu master

Host 3: ResourceManager, ZooKeeper, JournalNode, Kudu master

Host 4: ResourceManager, ZooKeeper, JournalNode

Host 5: JobHistory Server, Spark History Server, ZooKeeper, JournalNode

No more than 3 Kudu masters.

Utility Hosts

Host 1: Cloudera Manager

Host 2: Hive Metastore, Impala Catalog Server, Impala StateStore, Oozie

Host 3: Activity Monitor

Host 4: Host Monitor

Host 5: Navigator Audit Server

Host 6: Navigator Metadata Server

Host 7: Reports Manager

Host 8: Service Monitor

Gateway Hosts

Hosts 1-n: Hue, HiveServer2, Flume, Gateway

Worker Hosts

500-1000 Hosts: DataNode, NodeManager, Impalad, Kudu tablet server (at most 100 tablet servers)

 

9、Memory tuning

Reference values for a host with 8 CPU cores and 32 GB RAM running CentOS 7.

 

9.1、HDFS

Parameter                         Value
NameNode heap size                at least 1 GB
dfs.datanode.max.locked.memory    at least 256 MB
DataNode heap size                at least 512 MB
Failover Controller heap size     at least 256 MB
JournalNode heap size             at least 256 MB

 

 

9.2、Hive

Parameter                     Value
Metastore Server heap size    at least 1.5 GB
HiveServer2 heap size         at least 1 GB

 

 

9.3、Impala

Parameter                     Value
Catalog Server heap size      at least 256 MB
Impala Daemon memory limit    at least 1 GB

 

 

9.4、Kafka

Parameter    Value
Broker       at least 1 GB

 

 

9.5、Kudu

Parameter                                  Value
Kudu Tablet Server Hard Memory Limit       at least 3 GB
Kudu Tablet Server Block Cache Capacity    at least 2 GB
maintenance_manager_num_threads            at least 4

 

 

9.6、Spark

Parameter                   Value
History Server heap size    at least 512 MB

 

 

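9.7、YARN

Parameter                               Value
JobHistory Server Java heap size        at least 512 MB
NodeManager Java heap size              at least 512 MB
Container memory                        at least 9 GB
ResourceManager Java heap size          at least 512 MB
Minimum container memory                at least 1 GB
Container memory increment              at least 512 MB
Maximum container memory                at least 6 GB
Container virtual CPU cores             at least 6
Minimum container virtual CPU cores     at least 1
Container virtual CPU core increment    at least 1
Maximum container virtual CPU cores     at least 1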

 

 

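9.8、CM Service

Parameter                                  Value
Activity Monitor Java heap size            at least 1 GB
Alert Publisher Java heap size             at least 256 MB
EventServer Java heap size                 at least 1 GB
Host Monitor Java heap size                at least 1 GB
Host Monitor maximum non-Java memory       at least 1.5 GB
Service Monitor Java heap size             at least 1 GB
Service Monitor maximum non-Java memory    at least 1.5 GB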

 

 
