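This is a hands-on walkthrough of deploying a hadoop2.7.7 cluster on CentOS7; the steps follow in order.
Let's go through them one by one.
The walkthrough uses three CentOS7 machines with the following roles: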
IP address | hostname | Role |
---|---|---|
192.168.119.163 | node0 | NameNode, ResourceManager, HistoryServer |
192.168.119.164 | node1 | DataNode, NodeManager |
192.168.119.165 | node2 | DataNode, NodeManager, SecondaryNameNode |
192.168.119.163 node0
192.168.119.164 node1
192.168.119.165 node2
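The three hostname mappings above have to be present on every machine, and re-running a setup script should not duplicate them. Here is a minimal sketch of an idempotent append; `add_host_entry` is a hypothetical helper name of mine, and the demo writes to a scratch file rather than a real hosts file, which would need root.

```shell
# Append "IP hostname" to a hosts-style file only if the hostname is not already mapped.
# Usage: add_host_entry <file> <ip> <hostname>   (hypothetical helper, not a system tool)
add_host_entry() {
    file=$1; ip=$2; name=$3
    # Look for the hostname as a whole field (after the IP) on any non-comment line.
    if awk -v n="$name" '!/^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file" 2>/dev/null; then
        echo "skip: $name already present in $file"
    else
        printf '%s %s\n' "$ip" "$name" >> "$file"
        echo "added: $ip $name"
    fi
}

# Demonstration against a scratch file.
hosts_file=$(mktemp)
add_host_entry "$hosts_file" 192.168.119.163 node0
add_host_entry "$hosts_file" 192.168.119.164 node1
add_host_entry "$hosts_file" 192.168.119.165 node2
add_host_entry "$hosts_file" 192.168.119.163 node0   # second call is a no-op
```

Matching on the hostname field instead of the whole line means an existing mapping to a different IP is left alone and reported, not silently duplicated.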
systemctl stop firewalld.service && systemctl disable firewalld.service
groupadd hadoop && useradd -d /home/hadoop -g hadoop -m hadoop
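The three machines node0, node1, and node2 must be able to SSH into one another without a password; for the detailed steps, see 《Linux配置SSH免密码登录(非root账号)》 (Configuring passwordless SSH login on Linux for a non-root account).
From this point on, every operation on the three machines is performed with the hadoop account, not root.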
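For readers without the referenced article at hand, one common way to get passwordless login is `ssh-keygen` followed by `ssh-copy-id`. The sketch below only prints the commands and does not run them; `print_ssh_setup` is a hypothetical helper, and the referenced article may well use a different procedure.

```shell
# Print the commands that would set up key-based SSH from the current node to each peer.
# Nothing is executed here; review the output, then run it by hand as the hadoop user.
print_ssh_setup() {
    user=$1; shift
    # Generate an RSA key pair with an empty passphrase (assumption: no key exists yet).
    echo "ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa"
    for node in "$@"; do
        # Install the public key into the peer's authorized_keys.
        echo "ssh-copy-id ${user}@${node}"
    done
}

print_ssh_setup hadoop node0 node1 node2
```

The same commands would need to be run on each of the three machines so that every node can reach every other node, itself included.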
[hadoop@node0 ~]$ ls ~
hadoop-2.7.7.tar.gz jdk-8u191-linux-x64.tar.gz
tar -zxvf ~/jdk-8u191-linux-x64.tar.gz
export JAVA_HOME=/home/hadoop/jdk1.8.0_191
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
[hadoop@node0 ~]$ java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)
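Since the JDK has to be installed on all three machines, a quick check that `JAVA_HOME` really points at a JDK can save a confusing startup failure later. This is a small sketch; `check_java_home` is a hypothetical name, and the fallback path is simply the one used in this article.

```shell
# Succeed if the given directory looks like a usable JDK home (has an executable bin/java).
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "OK: $1"
    else
        echo "missing or not executable: $1/bin/java"
        return 1
    fi
}

# /home/hadoop/jdk1.8.0_191 is the path used in this article; adjust if yours differs.
check_java_home "${JAVA_HOME:-/home/hadoop/jdk1.8.0_191}" || true
```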
Create the directories that hadoop will use later:
mkdir -p ~/work/tmp/dfs/name && mkdir -p ~/work/tmp/dfs/data
tar -zxvf hadoop-2.7.7.tar.gz
export JAVA_HOME=/home/hadoop/jdk1.8.0_191
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node0:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/work/tmp</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data</value>
  </property>
</configuration>
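After hand-editing several XML files it is easy to leave a typo in a property name or value. The sketch below pulls a single value out of a Hadoop-style configuration file so it can be eyeballed or compared across nodes. `get_hadoop_property` is a hypothetical helper, and it assumes each `<name>` and `<value>` element sits on its own line; it is not a general XML parser.

```shell
# Print the <value> paired with a given <name> in a Hadoop *-site.xml file.
# Usage: get_hadoop_property <file> <property-name>   (hypothetical helper)
get_hadoop_property() {
    awk -v key="$2" '
        /<name>/ {
            # Remember the most recent property name.
            name = $0
            sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name)
        }
        /<value>/ && name == key {
            # First value following the requested name: print it and stop.
            value = $0
            sub(/.*<value>/, "", value); sub(/<\/value>.*/, "", value)
            print value
            exit
        }
    ' "$1"
}

# Demonstration against a scratch file holding two of the properties from this article.
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node0:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/work/tmp</value>
  </property>
</configuration>
EOF
get_hadoop_property "$conf" fs.defaultFS
get_hadoop_property "$conf" hadoop.tmp.dir
```

On a machine where hadoop is already unpacked, `hdfs getconf -confKey fs.defaultFS` reports the value hadoop itself resolves, which is the more authoritative check.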
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node2:50090</value>
  </property>
</configuration>
node1
node2
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node0</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>106800</value>
  </property>
</configuration>
mv mapred-site.xml.template mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node0:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node0:19888</value>
  </property>
</configuration>
scp -r ~/hadoop-2.7.7 hadoop@node1:~/
scp -r ~/hadoop-2.7.7 hadoop@node2:~/
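With only two worker nodes the two `scp` lines are fine, but with more nodes a loop is easier to maintain. A minimal sketch follows; `distribute_dir` and the `DRY_RUN` switch are my own hypothetical additions, and the example call only prints the commands.

```shell
# Copy a directory into the hadoop user's home on each listed node.
# With DRY_RUN=1 the scp commands are printed instead of executed.
distribute_dir() {
    src=$1; shift
    for node in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp -r $src hadoop@$node:~/"
        else
            # Stop at the first failed copy so a half-distributed cluster is noticed.
            scp -r "$src" "hadoop@$node:~/" || return 1
        fi
    done
}

DRY_RUN=1 distribute_dir ~/hadoop-2.7.7 node1 node2
```

Dropping `DRY_RUN=1` performs the real copy, which relies on the passwordless SSH configured earlier.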
Run the following command on node0 to format HDFS:
~/hadoop-2.7.7/bin/hdfs namenode -format
~/hadoop-2.7.7/sbin/start-dfs.sh
~/hadoop-2.7.7/sbin/start-yarn.sh
~/hadoop-2.7.7/sbin/yarn-daemon.sh start resourcemanager
~/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh start historyserver
[hadoop@node0 ~]$ jps
3253 JobHistoryServer
2647 NameNode
3449 Jps
2941 ResourceManager
[hadoop@node1 ~]$ jps
2176 DataNode
2292 NodeManager
2516 Jps
[hadoop@node2 ~]$ jps
1991 DataNode
2439 Jps
2090 SecondaryNameNode
2174 NodeManager
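Comparing the `jps` listings by eye against the role table works for three nodes, but it can also be scripted. The sketch below reads `jps` output on stdin and reports any expected daemon that is absent; `check_daemons` is a hypothetical helper, and the sample input is the node2 listing from this article.

```shell
# Read jps output on stdin and report which of the expected daemons are missing.
# Usage: jps | check_daemons NameNode ResourceManager JobHistoryServer
check_daemons() {
    # jps prints "<pid> <name>"; keep only the name column.
    running=$(awk '{ print $2 }')
    missing=0
    for daemon in "$@"; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode.
        if printf '%s\n' "$running" | grep -qx "$daemon"; then
            echo "running: $daemon"
        else
            echo "MISSING: $daemon"
            missing=1
        fi
    done
    return "$missing"
}

# Sample input copied from the node2 listing above.
printf '%s\n' '1991 DataNode' '2439 Jps' '2090 SecondaryNameNode' '2174 NodeManager' |
    check_daemons DataNode NodeManager SecondaryNameNode
```

The function returns a non-zero status when anything is missing, so it can gate later steps in a script.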
At this point, hadoop has started successfully.
Next, run the classic WordCount program to check that hadoop is working properly:
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
~/hadoop-2.7.7/bin/hdfs dfs -mkdir /input
~/hadoop-2.7.7/bin/hdfs dfs -put ~/test.txt /input
~/hadoop-2.7.7/bin/yarn \
jar ~/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar \
wordcount \
/input/test.txt \
/output
The console output is as follows:
[hadoop@node0 ~]$ ~/hadoop-2.7.7/bin/yarn \
> jar ~/hadoop-2.7.7/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar \
> wordcount \
> /input/test.txt \
> /output
19/02/08 14:34:28 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.119.164:8032
19/02/08 14:34:29 INFO input.FileInputFormat: Total input paths to process : 1
19/02/08 14:34:29 INFO mapreduce.JobSubmitter: number of splits:1
19/02/08 14:34:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1549606965916_0001
19/02/08 14:34:30 INFO impl.YarnClientImpl: Submitted application application_1549606965916_0001
19/02/08 14:34:30 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1549606965916_0001/
19/02/08 14:34:30 INFO mapreduce.Job: Running job: job_1549606965916_0001
19/02/08 14:34:36 INFO mapreduce.Job: Job job_1549606965916_0001 running in uber mode : false
19/02/08 14:34:36 INFO mapreduce.Job: map 0% reduce 0%
19/02/08 14:34:41 INFO mapreduce.Job: map 100% reduce 0%
19/02/08 14:34:46 INFO mapreduce.Job: map 100% reduce 100%
19/02/08 14:34:46 INFO mapreduce.Job: Job job_1549606965916_0001 completed successfully
19/02/08 14:34:46 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=94
FILE: Number of bytes written=245525
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=168
HDFS: Number of bytes written=60
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=2958
Total time spent by all reduces in occupied slots (ms)=1953
Total time spent by all map tasks (ms)=2958
Total time spent by all reduce tasks (ms)=1953
Total vcore-milliseconds taken by all map tasks=2958
Total vcore-milliseconds taken by all reduce tasks=1953
Total megabyte-milliseconds taken by all map tasks=3028992
Total megabyte-milliseconds taken by all reduce tasks=1999872
Map-Reduce Framework
Map input records=4
Map output records=11
Map output bytes=115
Map output materialized bytes=94
Input split bytes=97
Combine input records=11
Combine output records=7
Reduce input groups=7
Reduce shuffle bytes=94
Reduce input records=7
Reduce output records=7
Spilled Records=14
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=93
CPU time spent (ms)=1060
Physical memory (bytes) snapshot=430956544
Virtual memory (bytes) snapshot=4203192320
Total committed heap usage (bytes)=285212672
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=71
File Output Format Counters
Bytes Written=60
~/hadoop-2.7.7/bin/hdfs dfs -ls /output
The /output directory in HDFS contains two files:
[hadoop@node0 ~]$ ~/hadoop-2.7.7/bin/hdfs dfs -ls /output
Found 2 items
-rw-r--r-- 3 hadoop supergroup 0 2019-02-08 14:34 /output/_SUCCESS
-rw-r--r-- 3 hadoop supergroup 60 2019-02-08 14:34 /output/part-r-00000
[hadoop@node0 ~]$ ~/hadoop-2.7.7/bin/hdfs dfs -cat /output/part-r-00000
hadoop 3
hbase 1
hive 2
mapreduce 1
spark 2
sqoop 1
storm 1
The WordCount job succeeded, and the result matches expectations.
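To confirm what "matches expectations" means, the same counts can be computed locally with standard text tools and compared with the contents of part-r-00000. This is a rough sketch in which `local_wordcount` is a hypothetical helper; it splits on spaces and tabs only, which is enough for the four-line test file used here.

```shell
# Count whitespace-separated words in a file and print "word<TAB>count", sorted by word.
local_wordcount() {
    # One word per line, drop blanks, sort bytewise, count duplicates, then swap columns.
    tr -s ' \t' '\n\n' < "$1" | grep -v '^$' | LC_ALL=C sort | uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Same content as the test.txt uploaded to /input.
sample=$(mktemp)
cat > "$sample" <<'EOF'
hadoop mapreduce hive
hbase spark storm
sqoop hadoop hive
spark hadoop
EOF
local_wordcount "$sample"
```

The seven lines it prints should agree, word for word and count for count, with the `hdfs dfs -cat` output shown above.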
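7. The HDFS web page, which shows file information, is at http://192.168.119.163:50070
8. The YARN web page, which shows job information, is at http://192.168.119.163:8088
This completes the setup and verification of the hadoop2.7.7 cluster; I hope it serves as a useful reference when you build your own environment.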