Installing Hadoop in Fully Distributed Mode

1 Preparation

1.1 Virtual Machine Plan

  • OS version: CentOS Linux release 7.6.1810
  • Three virtual machines installed in VMware
192.168.159.133 (linux-01.potato.com) NameNode DataNode ResourceManager NodeManager
192.168.159.128 (linux-02.potato.com) SecondaryNameNode DataNode NodeManager
192.168.159.131 (linux-03.potato.com) DataNode NodeManager

1.2 User

  • Hadoop should not be started as root; create a dedicated startup user. This guide uses dehuab as the startup user (a creation sketch follows).
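A minimal sketch of creating the startup user, to be run as root on each of the three machines; the wheel-group sudo grant is an assumption, adjust to your own policy:
useradd dehuab                   # create the startup user
passwd dehuab                    # set its password interactively
usermod -aG wheel dehuab         # optional: grant sudo via the wheel group on CentOS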

1.3 Passwordless SSH Login

  • Configure passwordless SSH login among the three virtual machines to avoid repeated password prompts during later cluster operations
  • Run the following on the NameNode (linux-01.potato.com), then verify as sketched below
ssh-keygen -t rsa (press Enter 3 times)
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected] (copy the key to yourself as well)
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]

1.4 JDK

  • Version 1.8.0_181
  • Extract to /usr/local/jdk (see the sketch below)
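A minimal extraction sketch; the tarball name jdk-8u181-linux-x64.tar.gz is an assumption based on the version above:
tar -zxvf jdk-8u181-linux-x64.tar.gz -C /usr/local/
mv /usr/local/jdk1.8.0_181 /usr/local/jdk    # rename so JAVA_HOME stays version-independent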

1.5 Hadoop

  • Version Hadoop 3.2.0
  • Extract to /usr/local/hadoop (see the sketch after this list)
  • Create the Hadoop data directory /usr/local/hadoop-data
  • Set the owner of the Hadoop directories to the startup user (dehuab)
sudo chown -R dehuab:dehuab /usr/local/hadoop
sudo chown -R dehuab:dehuab /usr/local/hadoop-data
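A sketch of the extraction and directory setup; the tarball name hadoop-3.2.0.tar.gz is an assumption:
tar -zxvf hadoop-3.2.0.tar.gz -C /usr/local/
mv /usr/local/hadoop-3.2.0 /usr/local/hadoop    # rename so HADOOP_HOME stays version-independent
mkdir -p /usr/local/hadoop-data                 # data directory referenced by hadoop.tmp.dir later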

1.6 Configure hosts

  • Append the following entries to /etc/hosts on all three machines (a resolution check follows the list)
192.168.159.133 linux-01.potato.com
192.168.159.128 linux-02.potato.com
192.168.159.131 linux-03.potato.com
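A quick resolution check, a sketch assuming ping is available on the hosts:
for h in linux-01.potato.com linux-02.potato.com linux-03.potato.com; do
    ping -c 1 $h    # each hostname should resolve to the IP configured above
done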

1.7 Firewall

  • Stop firewalld: systemctl stop firewalld.service
  • Disable firewalld (prevent it from starting at boot): systemctl disable firewalld.service
  • Check the firewall status (shows "not running" when stopped, "running" when active): firewall-cmd --state

1.8 Environment Variables

  • Append the following to /etc/profile
JAVA_HOME=/usr/local/jdk
export JAVA_HOME
PATH=$JAVA_HOME/bin:$PATH
export PATH

HADOOP_HOME=/usr/local/hadoop
export HADOOP_HOME
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH
  • Apply the environment variables with source /etc/profile, then verify as sketched below
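A quick check that both toolchains are now on the PATH:
source /etc/profile
java -version       # should report 1.8.0_181
hadoop version      # should report Hadoop 3.2.0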

2 Configuration Files

  • Edit the files on the NameNode machine; copy them to the other machines via scp (see 2.7)

2.1 hadoop-env.sh

  • In hadoop-env.sh, explicitly declare JAVA_HOME once more (a sketch of appending it follows)
export JAVA_HOME=/usr/local/jdk
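A one-line sketch of appending the declaration; assumes Hadoop's default configuration directory:
echo 'export JAVA_HOME=/usr/local/jdk' >> /usr/local/hadoop/etc/hadoop/hadoop-env.sh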

2.2 hdfs-site.xml

<configuration>
	<!-- number of block replicas -->
	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
	<!-- disable HDFS permission checking -->
	<property>
		<name>dfs.permissions</name>
		<value>false</value>
	</property>
	<property>
		<name>dfs.namenode.http-address</name>
		<value>linux-01.potato.com:50070</value>
	</property>
	<property>
		<name>dfs.namenode.secondary.http-address</name>
		<value>linux-02.potato.com:50090</value>
	</property>
</configuration>

2.3 core-site.xml

<configuration>
	<!-- default file system URI -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://linux-01.potato.com:9001</value>
	</property>
	<!-- base directory for Hadoop's working data -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/usr/local/hadoop-data</value>
	</property>
</configuration>

2.4 mapred-site.xml (create it if it is not already present)

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>
2.5 yarn-site.xml

<configuration>
	<!-- the host that runs the ResourceManager -->
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>linux-01.potato.com</value>
	</property>
	<!-- auxiliary service needed for the MapReduce shuffle -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
</configuration>
2.6 workers

  • This file lists the hostnames of all worker nodes; per the plan in 1.1 all three machines run a DataNode and NodeManager (a one-step write sketch follows)
  • Before Hadoop 3.0 the file was named slaves; from Hadoop 3.0 on it is named workers
linux-01.potato.com
linux-02.potato.com
linux-03.potato.com
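A sketch of writing the file in one step; assumes the default configuration directory:
cat > /usr/local/hadoop/etc/hadoop/workers <<EOF
linux-01.potato.com
linux-02.potato.com
linux-03.potato.com
EOF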

2.7 Copy with scp

scp -r /etc/hosts [email protected]:/etc/hosts
scp -r /etc/profile [email protected]:/etc/profile
scp -r /usr/local/hadoop/ [email protected]:/usr/local/
scp -r /usr/local/jdk/ [email protected]:/usr/local/

scp -r /etc/hosts [email protected]:/etc/hosts
scp -r /etc/profile [email protected]:/etc/profile
scp -r /usr/local/hadoop/ [email protected]:/usr/local/
scp -r /usr/local/jdk/ [email protected]:/usr/local/
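The same transfers expressed as a loop, a sketch; it assumes dehuab may write to /etc and /usr/local on the remote hosts (use the root account there if it may not):
for h in linux-02.potato.com linux-03.potato.com; do
    scp /etc/hosts /etc/profile dehuab@$h:/etc/
    scp -r /usr/local/hadoop/ /usr/local/jdk/ dehuab@$h:/usr/local/
done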

3 Format the HDFS NameNode

hdfs namenode -format
Success indicator (the path derives from hadoop.tmp.dir in core-site.xml): Storage directory /usr/local/hadoop-data/dfs/name has been successfully formatted.

4 Start the Services

  • Start everything with start-all.sh (once the environment variables are set, start-all.sh can be run from any directory)
  • Check the daemons with jps
Across the cluster, verify these 5 daemon types (per section 1.1, the SecondaryNameNode runs on linux-02.potato.com):
5022 NameNode
5314 SecondaryNameNode
5586 NodeManager
5476 ResourceManager
5126 DataNode
  • Web UIs
YARN: http://linux-01.potato.com:8088
HDFS: http://linux-01.potato.com:50070
  • Log files (a cluster verification sketch follows)
/usr/local/hadoop/logs/hadoop-dehuab-datanode-linux-01.potato.com.log (open it and press Shift+G in less or vim to jump to the latest startup entries)
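A quick end-to-end check of the cluster, a sketch; the example jar path matches the Hadoop 3.2.0 layout:
hdfs dfsadmin -report    # all 3 DataNodes should be listed as live
yarn node -list          # all 3 NodeManagers should be in RUNNING state
# optional smoke test; on Hadoop 3.x it may also require mapreduce.application.classpath in mapred-site.xml
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar pi 2 5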
