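Hadoop Cluster Configuration
Configuring a Hadoop cluster on virtual machines
Procedure:
I. Configure one virtual machine first, then clone it to create the other two.
1. Install CentOS 7
2. Switch the yum repositories to a mirror (the Tsinghua mirror)
3. Update the installed packages
# yum update -y
Install the JDK development package, which provides the jps tool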
# yum install -y java-1.8.0-openjdk-devel
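4. All steps are run as the root user. Add the Hadoop environment variables to root's shell profile:
# vim ~/.bashrc
# Append the following to the end of the file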
#HADOOP START
export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP END
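A quick way to confirm that the block above was picked up is to check each variable after reloading the profile. The following is a minimal sketch, not part of the original notes; the function name missing_hadoop_vars is hypothetical, and the variable list simply mirrors the exports above.

```shell
# Print the name of every variable from the ~/.bashrc block that is unset or empty.
# No output means all of them are defined in the current shell.
missing_hadoop_vars() {
    for name in JAVA_HOME HADOOP_INSTALL HADOOP_MAPRED_HOME \
                HADOOP_COMMON_HOME HADOOP_HDFS_HOME YARN_HOME; do
        # Indirect lookup that also works in plain POSIX sh
        eval "value=\${$name:-}"
        [ -n "$value" ] || printf '%s\n' "$name"
    done
}
```

Typical use would be to run `source ~/.bashrc` and then `missing_hadoop_vars`; any name it prints still needs an export line.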
5. Use systemctl to enable the sshd service so that it starts automatically at boot.
# systemctl enable sshd
6. Boot into runlevel 3 by default (multi-user.target, no graphical desktop)
# systemctl set-default multi-user.target
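7. Shut down and clone the virtual machine twice, giving three virtual machines in total.
II. Configure Hadoop on one machine first, then use scp to copy it to the other two.
1. Note the IP addresses of the other two machines
2. Edit the hosts file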
# vim /etc/hosts
127.0.0.1 localhost
192.168.182.134 master
192.168.182.135 slaves1
192.168.182.136 slaves2
Note: master is the local machine
3. Check connectivity
# ping -w 3 slaves1
# ping -w 3 slaves2
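If a ping fails, it helps to first confirm which address a name maps to in the hosts file. Below is a small sketch under the assumption that the file uses the usual "IP name [alias...]" layout; lookup_host is a hypothetical helper, not a system command.

```shell
# Print the IP address that a hosts-format file assigns to a host name.
# $1 = host name, $2 = hosts file (defaults to /etc/hosts)
lookup_host() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                      # skip comment lines
        { for (i = 2; i <= NF; i++)              # field 1 is the IP, the rest are names
              if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}
```

For example, `lookup_host slaves1` should print the address entered for slaves1; empty output means the name is missing from the file.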
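4. Configure passwordless SSH login
(1) Generate a key pair (run on every virtual machine)
# ssh-keygen -t rsa -P ""
ssh-keygen is the key generation tool; -t sets the key type (rsa) and -P sets the passphrase (empty here, so the key can be used without a prompt)
(2) Add the trusted public keys (run on master; the two ssh commands still ask for a password at this stage)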
# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# ssh slaves1 cat /root/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# ssh slaves2 cat /root/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
(3) Copy authorized_keys to the other two machines
# scp ~/.ssh/authorized_keys slaves1:~/.ssh/
# scp ~/.ssh/authorized_keys slaves2:~/.ssh/
(4) Verify passwordless SSH login (type exit to return to master after each login):
# ssh slaves1
# ssh slaves2
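The interactive logins above can also be checked non-interactively. The sketch below relies on the OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password; the function name check_passwordless and the 5-second timeout are assumptions chosen for illustration.

```shell
# Report, for each host given, whether key-based login works without any prompt.
check_passwordless() {
    for host in "$@"; do
        # BatchMode=yes disables password prompts, so a missing key shows up as a failure
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}
```

Running `check_passwordless slaves1 slaves2` on master should print "ok" for both names before continuing.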
5. Download hadoop-2.6.0.tar.gz and extract it
# wget labfile.oss.aliyuncs.com/hadoop-2.6.0.tar.gz
# tar xf hadoop-2.6.0.tar.gz
# mv hadoop-2.6.0/ /usr/local/hadoop
6. Edit the Hadoop configuration files
(1) The Hadoop configuration directory is /usr/local/hadoop/etc/hadoop/
# cd /usr/local/hadoop/etc/hadoop/
(2)hadoop-env.sh
# vim hadoop-env.sh +25
# The java implementation to use.
export JAVA_HOME=/usr/lib/jvm/jre
(3) core-site.xml
# vim core-site.xml
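<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/datanode</value>
  </property>
</configuration>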
(4)hdfs-site.xml
# vim hdfs-site.xml
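<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:50090</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>master:50070</value>
    <description>
      The address and the base port where the dfs namenode web ui will listen on.
      If the port is 0 then the server will start on a free port.
    </description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/hadoop/datanode</value>
  </property>
</configuration>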
(5)mapred-site.xml
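Hadoop 2.6.0 ships only mapred-site.xml.template, so create the file from the template first if it does not exist yet:
# cp mapred-site.xml.template mapred-site.xml
# vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>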
(6)yarn-site.xml
# vim yarn-site.xml
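<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>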
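The site files edited in step 6 all share one layout: a configuration root element holding property elements, each with a name and a value. As an illustration only, the sketch below prints such a file from name=value arguments; make_site_xml is a hypothetical helper, and it does not escape XML special characters such as & or <, so it only suits simple values like the ones used here.

```shell
# Print a Hadoop *-site.xml document built from name=value arguments.
make_site_xml() {
    printf '%s\n' '<?xml version="1.0"?>' '<configuration>'
    for pair in "$@"; do
        name=${pair%%=*}     # text before the first '='
        value=${pair#*=}     # text after the first '='
        printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
            "$name" "$value"
    done
    printf '%s\n' '</configuration>'
}
```

For instance, `make_site_xml mapreduce.framework.name=yarn > mapred-site.xml` would produce the mapred-site.xml content used in these notes.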
7. Create the /hadoop directory
# mkdir /hadoop
8. Use scp to copy the Hadoop installation to the other two machines
# scp -r /usr/local/hadoop slaves1:/usr/local/
# scp -r /usr/local/hadoop slaves2:/usr/local/
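With more nodes, the copy is easier to repeat as a loop. This is a rough sketch rather than the original procedure: distribute_hadoop is a hypothetical name, and creating /hadoop on each node is an assumption that mirrors step 7 on the other machines.

```shell
# Copy the Hadoop installation to every host given and create the data directory there.
distribute_hadoop() {
    for host in "$@"; do
        scp -r /usr/local/hadoop "$host":/usr/local/ || return 1
        # Assumption: the other nodes need the same /hadoop directory as master (step 7)
        ssh "$host" mkdir -p /hadoop || return 1
    done
}
```

It would be called as `distribute_hadoop slaves1 slaves2` and stops at the first host where a copy or mkdir fails.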
III. Format HDFS, then start the cluster
1. Format the NameNode (run once, on master)
# hdfs namenode -format
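Formatting again later gives the NameNode a new cluster ID, which DataNodes holding data from the earlier format will refuse to join, so it can be worth checking whether a format has already happened. A formatted name directory contains a current/VERSION file; the sketch below tests for it. The function name namenode_formatted is hypothetical, and the default path is taken from dfs.namenode.name.dir in these notes.

```shell
# Succeed if the NameNode directory already holds a formatted file system.
# $1 = value of dfs.namenode.name.dir (defaults to /hadoop/namenode)
namenode_formatted() {
    [ -f "${1:-/hadoop/namenode}/current/VERSION" ]
}
```

One possible use is `namenode_formatted || hdfs namenode -format`, which formats only when no earlier format is found.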
1.5. Flush the firewall rules
# iptables -F
2. Start HDFS
# start-dfs.sh
Start YARN
# start-yarn.sh
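3. Use jps to check that the daemons are running
# jps
Check the output carefully, because a daemon occasionally fails to start. If one is missing, start it individually on the affected node:
# hadoop-daemon.sh start namenode
# hadoop-daemon.sh start datanode
# yarn-daemon.sh start resourcemanager
# yarn-daemon.sh start nodemanager
At this point the distributed Hadoop deployment is complete.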
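Reading jps output by eye is error-prone, so the check can be scripted. jps prints one "PID ClassName" line per Java process; the sketch below reads that output on standard input and prints every expected daemon that is absent. The name missing_daemons is hypothetical, and which daemons belong on which node is an assumption that depends on the etc/hadoop/slaves file, which these notes do not show.

```shell
# Print each expected daemon name that does not appear in jps output.
# $@ = expected daemon names; jps output is read from standard input
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # Column 2 of jps output is the main class name
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' ||
            printf '%s\n' "$daemon"
    done
}
```

On master this might be run as `jps | missing_daemons NameNode SecondaryNameNode ResourceManager`, and on the other nodes as `jps | missing_daemons DataNode NodeManager`; empty output means nothing is missing.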
IV. Testing
1. Create the directories
# hdfs dfs -mkdir /user
# hdfs dfs -mkdir /user/root
# hdfs dfs -mkdir /user/root/input
2. Copy a file to HDFS
# hdfs dfs -copyFromLocal /etc/protocols /user/root/input
# hdfs dfs -ls -R /user/root/input
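Note: the put command did not seem to work here. It failed on repeated attempts and looked like a fault, so copyFromLocal is used instead.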
3. Run the WordCount test
# cd /usr/local/hadoop
# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount input output
4. View the results
# hdfs dfs -cat /user/root/output/*
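The full WordCount output is long, so a short filter helps when inspecting it. WordCount writes one "word TAB count" line per word; the sketch below sorts such lines by count, highest first, and keeps the top rows. The name top_words and the default of 10 rows are assumptions for illustration.

```shell
# Keep the N most frequent words from "word<TAB>count" lines on standard input.
# $1 = number of rows to keep (defaults to 10)
top_words() {
    # Sort numerically on the count column, descending; break ties by word
    sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "${1:-10}"
}
```

For example, `hdfs dfs -cat /user/root/output/* | top_words 5` would list the five most frequent words in the input file.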