I. Environment Preparation
1. Three virtual machines (CentOS 7 + JDK 1.8)
1) 192.168.122.11 master (2 GB RAM + 32 GB disk)
2) 192.168.122.12 slave1 (1 GB RAM + 32 GB disk)
3) 192.168.122.13 slave2 (1 GB RAM + 32 GB disk)
2. The Hadoop 2.6 installation package
Download: https://pan.baidu.com/s/1bO6IB37b75Nb7hD2KO6ixA
3. Guides for installing and configuring CentOS and installing the JDK:
1) Virtual machine setup: https://blog.csdn.net/qq_34256296/article/details/81322243
2) JDK 1.8 installation: https://blog.csdn.net/qq_34256296/article/details/81321110
4. The Hadoop configuration files used below can be downloaded and copied straight into the etc/hadoop/ folder under the Hadoop installation directory:
Hadoop config files: https://github.com/Kara-Feite/hadoop-config
II. Environment Setup Before Installing Hadoop
1. Turn off the firewall (on all three machines)
systemctl stop firewalld.service
systemctl disable firewalld.service
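To double-check that the firewall is really off, systemd can report the unit state (it should print "inactive"):
systemctl is-active firewalld.service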
2. Set the hostname and configure hosts (repeat on each virtual machine)
1) vim /etc/hosts (append the following)
192.168.122.11 master
192.168.122.12 slave1
192.168.122.13 slave2
2) vim /etc/sysconfig/network
# Created by anaconda
NETWORKING=yes
HOSTNAME=master # use the machine's own name: master, slave1, or slave2
3) vim /etc/hostname
master # use the machine's own name: master, slave1, or slave2
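On CentOS 7 the hostname can also be set in one step with hostnamectl, and the hosts entries verified with a quick ping; run the first command on each machine with its own name:
hostnamectl set-hostname master # or slave1 / slave2
ping -c 1 slave1 # should resolve to 192.168.122.12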
3. Set up passwordless SSH (every machine must finish one step before any machine starts the next)
1) ssh-keygen -t rsa
2) cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
3) On master, pull in the keys of the other two machines (enter the root password when prompted):
ssh slave1 cat /root/.ssh/authorized_keys >> /root/.ssh/authorized_keys
ssh slave2 cat /root/.ssh/authorized_keys >> /root/.ssh/authorized_keys
4) Copy the combined authorized_keys from master back to the slaves; without this, master still cannot reach them without a password and start-all.sh will keep prompting:
scp /root/.ssh/authorized_keys root@slave1:/root/.ssh/
scp /root/.ssh/authorized_keys root@slave2:/root/.ssh/
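A quick way to verify the trust works (no password prompt should appear):
ssh slave1 hostname # prints "slave1"
ssh slave2 hostname # prints "slave2"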
III. Installing Hadoop (all big-data software in this guide lives under /usr/local/src)
1. Unpack the archive
cd /usr/local/src
tar -xzvf hadoop-2.6.0-x64.tar.gz
2. Configure the Hadoop environment variables
vim /etc/profile # append the following at the end
# set hadoop environment
export HADOOP_HOME=/usr/local/src/hadoop-2.6.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
Then reload the profile so the variables take effect in the current shell:
source /etc/profile
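A quick sanity check that the variables are in place (hadoop version works as soon as the archive is unpacked):
echo $HADOOP_HOME # should print /usr/local/src/hadoop-2.6.0
hadoop version # should report Hadoop 2.6.0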
3. Create the directories that the configuration below refers to
mkdir -p /usr/local/src/hadoop-2.6.0/{tmp,var,dfs/name,dfs/data}
4. Edit hadoop-env.sh (cd /usr/local/src/hadoop-2.6.0/etc/hadoop/)
vim hadoop-env.sh
Change: export JAVA_HOME=${JAVA_HOME}
To: export JAVA_HOME=/usr/local/src/jdk1.8.0_171 # set to your JDK directory
JAVA_HOME must be spelled out here because the daemons are launched over non-login SSH sessions, which do not source /etc/profile, so ${JAVA_HOME} would be empty.
5. Edit the slaves file
vim slaves # add the following
slave1
slave2
6. Edit core-site.xml
vim core-site.xml # contents as follows
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/src/hadoop-2.6.0/tmp</value>
    </property>
    <property>
        <!-- fs.default.name is the legacy name of fs.defaultFS; both work in Hadoop 2.x -->
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>
7. Edit hdfs-site.xml
vim hdfs-site.xml # contents as follows
<configuration>
    <property>
        <name>dfs.name.dir</name>
        <value>/usr/local/src/hadoop-2.6.0/dfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/local/src/hadoop-2.6.0/dfs/data</value>
    </property>
    <property>
        <!-- this cluster has only two DataNodes (slave1, slave2), so replication cannot usefully exceed 2 -->
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
        <description>need not permissions</description>
    </property>
</configuration>
8. Edit mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml # contents as follows
<configuration>
    <property>
        <!-- Hadoop 1 JobTracker setting; ignored when mapreduce.framework.name is yarn -->
        <name>mapred.job.tracker</name>
        <value>master:49001</value>
    </property>
    <property>
        <name>mapred.local.dir</name>
        <value>/usr/local/src/hadoop-2.6.0/var</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
9. Edit yarn-site.xml
vim yarn-site.xml # contents as follows
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>
    <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>${yarn.resourcemanager.hostname}:8032</value>
    </property>
    <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>${yarn.resourcemanager.hostname}:8030</value>
    </property>
    <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>${yarn.resourcemanager.hostname}:8088</value>
    </property>
    <property>
        <description>The https address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.https.address</name>
        <value>${yarn.resourcemanager.hostname}:8090</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>${yarn.resourcemanager.hostname}:8031</value>
    </property>
    <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value>${yarn.resourcemanager.hostname}:8033</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <description>Maximum memory a single container request may be allocated, in MB; the default is 8192.</description>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>12288</value>
    </property>
</configuration>
10. Copy the configured Hadoop to the other nodes
scp -rp /usr/local/src/hadoop-2.6.0/ root@slave1:/usr/local/src
scp -rp /usr/local/src/hadoop-2.6.0/ root@slave2:/usr/local/src
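The /etc/profile changes from step 2 also need to exist on slave1 and slave2 so the hadoop commands are on their PATH. Either repeat the edit there, or, assuming their /etc/profile has not diverged from master's, copy it over:
scp /etc/profile root@slave1:/etc/profile
scp /etc/profile root@slave2:/etc/profile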
11. Initialize the Hadoop cluster
cd /usr/local/src/hadoop-2.6.0/bin
./hadoop namenode -format
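A successful format should end with a log line like "Storage directory /usr/local/src/hadoop-2.6.0/dfs/name has been successfully formatted." Format only once: re-running it after the cluster has run generates a new clusterID that no longer matches the one the DataNodes recorded under dfs/data.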
12. Start the Hadoop cluster
cd /usr/local/src/hadoop-2.6.0/sbin
./start-all.sh
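With this layout, jps should show the following daemons (start-all.sh places the SecondaryNameNode on master by default):
jps # on master: NameNode, SecondaryNameNode, ResourceManager
jps # on slave1/slave2: DataNode, NodeManager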
13. Test whether Hadoop was installed successfully
vim hadoop_test.txt # write anything into it
# if the upload succeeds, the cluster is working
hadoop fs -put hadoop_test.txt /
# remove the local copy
rm hadoop_test.txt
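To confirm the file actually landed in HDFS and reads back intact:
hadoop fs -ls / # hadoop_test.txt should be listed
hadoop fs -cat /hadoop_test.txt # prints what you wrote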
14. Web UIs and default ports for Hadoop and related services
1) HDFS web UI: 50070
2) YARN web UI: 8088
3) JobHistoryServer web UI: 19888
4) ZooKeeper service port: 2181
5) MySQL service port: 3306
6) HiveServer2 service port: 10000
7) Kafka service port: 9092
8) Azkaban web UI: 8443
9) HBase web UI: 16010 (HBase 1.0+) or 60010 (older versions)
10) Spark web UI: 8080
11) Spark master URL port: 7077
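For this cluster the two pages used most are http://192.168.122.11:50070 (HDFS) and http://192.168.122.11:8088 (YARN); both are served by the master node configured above.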