[root@test opt]# su hadoop
[hadoop@test opt]$ cd /opt/software
[hadoop@test software]$ tar -zxvf hadoop-${version}.tar.gz -C /opt
[hadoop@test software]$ cd /opt
[hadoop@test opt]$ ln -s /opt/hadoop-${version} /opt/hadoop
[hadoop@test opt]$ cd /opt/hadoop
[hadoop@test hadoop]$ mkdir -p tmpdir
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ mkdir -p /opt/hadoop/pids
[hadoop@test hadoop]$ vim hadoop-env.sh
Add the following to hadoop-env.sh:
export JAVA_HOME=/opt/java
export HADOOP_PID_DIR=/opt/hadoop/pids
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim mapred-env.sh
Add the following to mapred-env.sh:
export JAVA_HOME=/opt/java
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim core-site.xml
Add the following properties to core-site.xml:

<!-- Temporary working directory for the NameNode -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/hadoop/tmpdir</value>
</property>
<!-- HDFS entry point: which host the NameNode runs on and which port it listens on -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://test:8020</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>131072</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>

If Ranger is not installed, comment out the following property in this file:

<property>
  <name>dfs.namenode.inode.attributes.provider.class</name>
  <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim hdfs-site.xml
Add the following properties to hdfs-site.xml:

<!-- Number of replicas; generally at most the number of DataNodes -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/opt/hadoop/data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/opt/hadoop/data/datanode</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.secondary.http.address</name>
  <value>test:50090</value>
</property>
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim mapred-site.xml
Add the following properties to mapred-site.xml:

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>test:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>test:19888</value>
</property>
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim yarn-site.xml
Add the following properties to yarn-site.xml:

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>test:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>test:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>test:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>test:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>test:8088</value>
</property>
[hadoop@test hadoop]$ vim /etc/profile
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
After saving, run source /etc/profile so the changes take effect:
[hadoop@test hadoop]$ source /etc/profile
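A quick way to confirm the PATH change took effect (a sketch; paths match the /opt/hadoop layout used above):

```shell
# Verify that /opt/hadoop/bin is now on PATH after sourcing /etc/profile.
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$PATH
case ":$PATH:" in
  *":/opt/hadoop/bin:"*) echo "PATH OK" ;;
  *) echo "PATH missing /opt/hadoop/bin" ;;
esac
# On a configured node, `hadoop version` should now print the release.
```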
[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop
[hadoop@test hadoop]$ vim slaves
Add the DataNode hostnames to the slaves file:
test2
test3
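For context (an illustrative dry run, not an install step): the start scripts read the slaves file line by line and ssh into each listed host to launch the worker daemons, which is why passwordless ssh from test to test2/test3 is needed. The daemon name below is illustrative:

```shell
# Dry-run sketch of how the start scripts consume the slaves file.
# /tmp/slaves.demo stands in for /opt/hadoop/etc/hadoop/slaves.
printf 'test2\ntest3\n' > /tmp/slaves.demo
while read -r host; do
  echo "would run: ssh $host hadoop-daemon.sh start datanode"
done < /tmp/slaves.demo
```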
Copy the installation to the other nodes, then create the symlink on each of them (it already exists on test):
[hadoop@test hadoop]$ scp -r /opt/hadoop-${version} hadoop@test2:/opt/
[hadoop@test hadoop]$ ssh hadoop@test2 "ln -s /opt/hadoop-${version} /opt/hadoop"
[hadoop@test hadoop]$ scp -r /opt/hadoop-${version} hadoop@test3:/opt/
[hadoop@test hadoop]$ ssh hadoop@test3 "ln -s /opt/hadoop-${version} /opt/hadoop"
# Format the NameNode. This is required only before the very first start!
[hadoop@test hadoop]$ hadoop namenode -format
# 启动
[hadoop@test hadoop]$ ${HADOOP_HOME}/sbin/start-all.sh
[hadoop@test hadoop]$ ${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh start historyserver
start-all.sh starts both modules, HDFS and YARN, by running start-dfs.sh and start-yarn.sh, so the two can also be started separately.
Note: if a DataNode fails to start, check whether tmpdir still holds stale data from a previous run; if so, delete that directory, and delete it on the other two machines as well.
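The cleanup described in the note above can be sketched as a dry run (hedged: hostnames and paths follow this guide's layout, and the commands are only echoed here — review them before running for real):

```shell
# Dry-run sketch of wiping stale HDFS state before re-formatting (review first!).
# Stop the cluster, then clear tmpdir and the data dirs on every node.
for host in test test2 test3; do
  echo "ssh $host 'rm -rf /opt/hadoop/tmpdir /opt/hadoop/data'"
done
echo "hadoop namenode -format   # re-format only after wiping ALL nodes"
```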
[hadoop@test ~]$ jps
24429 Jps
22898 ResourceManager
24383 JobHistoryServer
22722 SecondaryNameNode
22488 NameNode
[hadoop@test2 ~]$ jps
7650 DataNode
7788 NodeManager
8018 Jps
[hadoop@test3 ~]$ jps
28407 Jps
28038 DataNode
28178 NodeManager
If all three machines show the processes above, the Hadoop cluster is working correctly.
To reach the YARN web UI, open the following address in a browser: http://172.24.5.173:8088
Run a simple MapReduce job to verify the installation:
[hadoop@test mapreduce]$ cd /opt/hadoop/share/hadoop/mapreduce
[hadoop@test mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.3.jar pi 2 4
Number of Maps = 2
Samples per Map = 4
Wrote input for Map #0
Wrote input for Map #1
Starting Job
17/04/06 09:36:47 INFO client.RMProxy: Connecting to ResourceManager at test/172.24.5.173:8032
17/04/06 09:36:47 INFO input.FileInputFormat: Total input paths to process : 2
17/04/06 09:36:48 INFO mapreduce.JobSubmitter: number of splits:2
17/04/06 09:36:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491470782060_0001
17/04/06 09:36:48 INFO impl.YarnClientImpl: Submitted application application_1491470782060_0001
17/04/06 09:36:48 INFO mapreduce.Job: The url to track the job: http://test:8088/proxy/application_1491470782060_0001/
17/04/06 09:36:48 INFO mapreduce.Job: Running job: job_1491470782060_0001
17/04/06 09:36:56 INFO mapreduce.Job: Job job_1491470782060_0001 running in uber mode : false
17/04/06 09:36:56 INFO mapreduce.Job: map 0% reduce 0%
17/04/06 09:37:00 INFO mapreduce.Job: map 50% reduce 0%
17/04/06 09:37:02 INFO mapreduce.Job: map 100% reduce 0%
17/04/06 09:37:08 INFO mapreduce.Job: map 100% reduce 100%
17/04/06 09:37:08 INFO mapreduce.Job: Job job_1491470782060_0001 completed successfully
17/04/06 09:37:08 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=50
FILE: Number of bytes written=357588
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=554
HDFS: Number of bytes written=215
HDFS: Number of read operations=11
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=6118
Total time spent by all reduces in occupied slots (ms)=4004
Total time spent by all map tasks (ms)=6118
Total time spent by all reduce tasks (ms)=4004
Total vcore-milliseconds taken by all map tasks=6118
Total vcore-milliseconds taken by all reduce tasks=4004
Total megabyte-milliseconds taken by all map tasks=6264832
Total megabyte-milliseconds taken by all reduce tasks=4100096
Map-Reduce Framework
Map input records=2
Map output records=4
Map output bytes=36
Map output materialized bytes=56
Input split bytes=318
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=56
Reduce input records=4
Reduce output records=0
Spilled Records=8
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=213
CPU time spent (ms)=2340
Physical memory (bytes) snapshot=713646080
Virtual memory (bytes) snapshot=6332133376
Total committed heap usage (bytes)=546308096
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=236
File Output Format Counters
Bytes Written=97
Job Finished in 20.744 seconds
Estimated value of Pi is 3.50000000000000000000
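The coarse result (3.5 from only 2 maps × 4 samples) is expected: the example estimates pi from the fraction of sample points that land inside the unit circle (the Hadoop example uses a Halton quasi-random sequence; plain random sampling, as in this awk sketch, illustrates the same idea). With only 8 samples the estimate is very rough; with many samples it converges toward 3.14:

```shell
# Monte Carlo estimate of pi: fraction of random points inside the
# unit circle is ~ pi/4. More samples -> better estimate.
awk 'BEGIN {
  srand(42); n = 100000; inside = 0
  for (i = 0; i < n; i++) { x = rand(); y = rand(); if (x*x + y*y <= 1) inside++ }
  printf "pi ~ %.4f with %d samples\n", 4 * inside / n, n
}'
```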
Q: stop-all.sh cannot stop the Hadoop cluster?
A: By default Hadoop keeps its PID files in /tmp, which is cleaned out periodically; once the PID files are gone, the stop scripts cannot find the processes. Setting HADOOP_PID_DIR to a persistent directory (as done above) avoids this.
Q: The NameNode fails to start?
A: The NameNode address in core-site.xml (fs.defaultFS) must not contain underscores — underscores are not valid in hostnames.
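A quick check for this pitfall (a sketch; the grep pattern assumes the fs.defaultFS value sits on one line, as written above — the demo file stands in for the real core-site.xml):

```shell
# Flag underscores in the fs.defaultFS host (underscores are invalid in hostnames).
cat > /tmp/core-site.demo <<'EOF'
<value>hdfs://bad_host:8020</value>
EOF
if grep -o 'hdfs://[^<]*' /tmp/core-site.demo | grep -q '_'; then
  echo "WARNING: fs.defaultFS host contains an underscore"
fi
```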
NodeManager
1. Manages the compute resources of a single node.
2. Keeps in contact with the ResourceManager (the cluster-wide manager) and the ApplicationMaster (the per-application master process).
3. Manages the container lifecycle and monitors each container's resource usage (memory, CPU), tracks node health, and manages logs.