




    The three machines use the following hostnames: master, slave01, slave02
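
Every node has to resolve all three hostnames. A minimal sketch of the entries, assuming static addresses: the 192.0.2.x IPs are placeholders rather than the addresses of the original cluster, and the script writes to a scratch file instead of touching /etc/hosts directly.

```shell
# Sketch: host entries every node needs so that master, slave01 and slave02 resolve.
# The 192.0.2.x addresses are PLACEHOLDERS (assumed); use the machines' real IPs.
HOSTS_FILE="${HOSTS_FILE:-$(mktemp)}"   # scratch file, not /etc/hosts itself

cat > "$HOSTS_FILE" <<'EOF'
192.0.2.10  master
192.0.2.11  slave01
192.0.2.12  slave02
EOF

# List the hostnames that now have an entry.
awk '{print $2}' "$HOSTS_FILE"
```

Once the addresses are right, the same lines can be appended to /etc/hosts as root on each of the three machines.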


  1. Install three CentOS 7 servers; the kernel version is Linux master 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux






http://blog.csdn.net/firehadoop/article/details/68953541



[hadoop@master ~]$ vi .bashrc

export JAVA_HOME=/usr/java/jdk1.8.0_121
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HADOOP_USER_NAME=hadoop
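
Later steps call hadoop and jps without a full path, so the PATH presumably includes the Java and Hadoop bin directories. The sketch below repeats the three exports and adds a PATH line; that PATH line is an assumption, since it is not shown in the excerpt above.

```shell
# Sketch: the three exports from .bashrc plus an ASSUMED PATH line.
export JAVA_HOME=/usr/java/jdk1.8.0_121
export HADOOP_HOME=/home/hadoop/bigdata/hadoop
export HADOOP_USER_NAME=hadoop
# Assumption: put the JDK and the Hadoop bin/sbin directories on PATH.
export PATH="$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH"

# Show which PATH entries come from the Hadoop install.
echo "$PATH" | tr ':' '\n' | grep "^$HADOOP_HOME"
```

After editing .bashrc, run `source ~/.bashrc` or log in again so the variables take effect.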


[hadoop@master ~]$ scp .bashrc hadoop@slave01:/home/hadoop/
.bashrc                                       100%  418     0.4KB/s   00:00    
[hadoop@master ~]$ scp .bashrc hadoop@slave02:/home/hadoop/
.bashrc                                       100%  418     0.4KB/s   00:00    


[hadoop@master ~]$ tar -zxf hadoop-2.7.3.tar.gz

mkdir bigdata

mv hadoop-2.7.3 bigdata/

cd bigdata/

mv hadoop-2.7.3 hadoop




service iptables stop

/etc/init.d/iptables stop


[hadoop@master ~]$ systemctl stop firewalld.service
[hadoop@master ~]$ systemctl disable firewalld.service
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.


[hadoop@slave01 ~]$ systemctl stop firewalld.service
[hadoop@slave01 ~]$ systemctl disable firewalld.service
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.


[hadoop@slave02 ~]$ systemctl stop firewalld.service
[hadoop@slave02 ~]$ systemctl disable firewalld.service
Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service.
Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service.


vim /home/hadoop/bigdata/hadoop/etc/hadoop/core-site.xml




Parameter | Value | Notes
fs.defaultFS | NameNode URI | hdfs://host:port/
io.file.buffer.size | 131072 | Size of read/write buffer used in SequenceFiles.
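
A core-site.xml matching this table might look like the sketch below. The NameNode address master:9000 is taken from the error message later in this post. Writing to a scratch directory (CONF_DIR) is an assumption made here so that a real configuration is not overwritten by accident.

```shell
# Sketch: generate a minimal core-site.xml.
# CONF_DIR defaults to a scratch directory; point it at
# /home/hadoop/bigdata/hadoop/etc/hadoop once the content looks right.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
EOF

# Count the properties that were written.
grep -c '<property>' "$CONF_DIR/core-site.xml"
```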





vim /home/hadoop/bigdata/hadoop/etc/hadoop/hdfs-site.xml






Parameter | Value | Notes
dfs.namenode.name.dir | Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently. | If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
dfs.hosts / dfs.hosts.exclude | List of permitted/excluded DataNodes. | If necessary, use these files to control the list of allowable datanodes.
dfs.blocksize | 268435456 | HDFS blocksize of 256MB for large file-systems.
dfs.namenode.handler.count | 100 | More NameNode server threads to handle RPCs from large number of DataNodes.
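
A minimal hdfs-site.xml for this cluster could be sketched as follows. The two directories are the ones that appear in the logs and commands later in this post. dfs.datanode.data.dir is not listed in the table above and is included here as an assumption about how the DataNode path was configured; CONF_DIR is a scratch directory.

```shell
# Sketch: generate a minimal hdfs-site.xml with the storage paths used in this post.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"   # scratch directory (assumed), not the live config

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/bigdata/data/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/bigdata/data/hadoop/hdfs/datanode</value>
  </property>
</configuration>
EOF

# Show the configured paths.
grep '<value>' "$CONF_DIR/hdfs-site.xml"
```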


vim /home/hadoop/bigdata/hadoop/etc/hadoop/mapred-site.xml




Parameter | Value | Notes
mapreduce.framework.name | yarn | Execution framework set to Hadoop YARN.
mapreduce.map.memory.mb | 1536 | Larger resource limit for maps.
mapreduce.map.java.opts | -Xmx1024M | Larger heap-size for child jvms of maps.
mapreduce.reduce.memory.mb | 3072 | Larger resource limit for reduces.
mapreduce.reduce.java.opts | -Xmx2560M | Larger heap-size for child jvms of reduces.
mapreduce.task.io.sort.mb | 512 | Higher memory-limit while sorting data for efficiency.
mapreduce.task.io.sort.factor | 100 | More streams merged at once while sorting files.
mapreduce.reduce.shuffle.parallelcopies | 50 | Higher number of parallel copies run by reduces to fetch outputs from very large number of maps.
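
Hadoop 2.7.x ships mapred-site.xml.template rather than mapred-site.xml, so the file has to be created first. The sketch below writes only the one setting needed to run MapReduce on YARN; the memory values in the table are tuning options that may not suit small test machines. CONF_DIR is a scratch directory assumed for illustration.

```shell
# Sketch: generate a minimal mapred-site.xml that selects YARN as the framework.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"   # scratch directory (assumed), not the live config

cat > "$CONF_DIR/mapred-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

# Confirm the framework setting.
grep -A1 'mapreduce.framework.name' "$CONF_DIR/mapred-site.xml"
```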






  • Configurations for ResourceManager:
Parameter | Value | Notes
yarn.resourcemanager.address | ResourceManager host:port for clients to submit jobs. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.scheduler.address | ResourceManager host:port for ApplicationMasters to talk to Scheduler to obtain resources. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.resource-tracker.address | ResourceManager host:port for NodeManagers. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.admin.address | ResourceManager host:port for administrative commands. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.webapp.address | ResourceManager web-ui host:port. | host:port If set, overrides the hostname set in yarn.resourcemanager.hostname.
yarn.resourcemanager.hostname | ResourceManager host. | host Single hostname that can be set in place of setting all yarn.resourcemanager*address resources. Results in default ports for ResourceManager components.
yarn.resourcemanager.scheduler.class | ResourceManager Scheduler class. | CapacityScheduler (recommended), FairScheduler (also recommended), or FifoScheduler
yarn.scheduler.minimum-allocation-mb | Minimum limit of memory to allocate to each container request at the Resource Manager. | In MBs
yarn.scheduler.maximum-allocation-mb | Maximum limit of memory to allocate to each container request at the Resource Manager. | In MBs
yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path | List of permitted/excluded NodeManagers. | If necessary, use these files to control the list of allowable NodeManagers.
  • Configurations for NodeManager:
Parameter | Value | Notes
yarn.nodemanager.resource.memory-mb | Resource i.e. available physical memory, in MB, for given NodeManager | Defines total available resources on the NodeManager to be made available to running containers
yarn.nodemanager.vmem-pmem-ratio | Maximum ratio by which virtual memory usage of tasks may exceed physical memory | The virtual memory usage of each task may exceed its physical memory limit by this ratio. The total amount of virtual memory used by tasks on the NodeManager may exceed its physical memory usage by this ratio.
yarn.nodemanager.local-dirs | Comma-separated list of paths on the local filesystem where intermediate data is written. | Multiple paths help spread disk i/o.
yarn.nodemanager.log-dirs | Comma-separated list of paths on the local filesystem where logs are written. | Multiple paths help spread disk i/o.
yarn.nodemanager.log.retain-seconds | 10800 | Default time (in seconds) to retain log files on the NodeManager Only applicable if log-aggregation is disabled.
yarn.nodemanager.remote-app-log-dir | /logs | HDFS directory where the application logs are moved on application completion. Need to set appropriate permissions. Only applicable if log-aggregation is enabled.
yarn.nodemanager.remote-app-log-dir-suffix | logs | Suffix appended to the remote log dir. Logs will be aggregated to ${yarn.nodemanager.remote-app-log-dir}/${user}/${thisParam} Only applicable if log-aggregation is enabled.
yarn.nodemanager.aux-services | mapreduce_shuffle | Shuffle service that needs to be set for Map Reduce applications.
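
For a small cluster, two of these settings are usually enough to get YARN running. The sketch below assumes the ResourceManager runs on master, which is consistent with the jps output later in this post, and writes yarn-site.xml into a scratch directory (CONF_DIR) rather than the live configuration.

```shell
# Sketch: generate a minimal yarn-site.xml.
# Assumption: the ResourceManager host is "master".
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"   # scratch directory (assumed), not the live config

cat > "$CONF_DIR/yarn-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF

# List the property names that were written.
grep '<name>' "$CONF_DIR/yarn-site.xml"
```

Setting only yarn.resourcemanager.hostname leaves the individual ResourceManager addresses on their default ports, as the table notes.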


[hadoop@master hadoop]$ vim slaves 

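--The slaves file is used to run commands on many hosts at once. It is not used by any of the Java-based Hadoop configuration. To use this feature, ssh trust must be established for the account that runs Hadoop.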
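
The slaves file is a plain list of worker hostnames, one per line. A minimal sketch, using the two workers that appear in the startup output later in this post and a scratch directory (CONF_DIR) assumed for illustration:

```shell
# Sketch: write the slaves file, one worker hostname per line.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"   # scratch directory (assumed), not the live config
printf '%s\n' slave01 slave02 > "$CONF_DIR/slaves"
cat "$CONF_DIR/slaves"

# ssh trust from master to each worker, run once as the hadoop user (not run here):
#   ssh-keygen -t rsa
#   ssh-copy-id hadoop@slave01
#   ssh-copy-id hadoop@slave02
```

With the keys in place, `ssh slave01` from master should log in without asking for a password.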




[hadoop@master hadoop]$ scp -r /home/hadoop/bigdata/hadoop/ hadoop@slave01:/home/hadoop/bigdata


[hadoop@master hadoop]$ scp -r /home/hadoop/bigdata/hadoop/ hadoop@slave02:/home/hadoop/bigdata
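
Copying the configured Hadoop directory to each worker can be written as one loop, which avoids mistyping a hostname. This is a sketch: DRY_RUN is a variable invented here that defaults to printing the commands instead of running them, and the target directory /home/hadoop/bigdata is assumed to exist on each worker already.

```shell
# Sketch: copy the Hadoop directory to every worker.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

distribute_hadoop() {
    for host in slave01 slave02; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "scp -r /home/hadoop/bigdata/hadoop/ hadoop@$host:/home/hadoop/bigdata"
        else
            scp -r /home/hadoop/bigdata/hadoop/ "hadoop@$host:/home/hadoop/bigdata"
        fi
    done
}

distribute_hadoop
```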



[hadoop@master sbin]$  hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

17/04/03 05:44:41 INFO namenode.NameNode: STARTUP_MSG: 
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.3

17/04/03 05:44:41 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
17/04/03 05:44:41 INFO namenode.NameNode: createNameNode [-format]
Formatting using clusterid: CID-a36eb93b-a2f3-482e-b3c8-8507c2aeca07
17/04/03 05:44:42 INFO namenode.FSNamesystem: No KeyProvider found.
17/04/03 05:44:42 INFO namenode.FSNamesystem: fsLock is fair:true
17/04/03 05:44:42 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
17/04/03 05:44:42 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
17/04/03 05:44:42 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
17/04/03 05:44:42 INFO blockmanagement.BlockManager: The block deletion will start around 2017 Apr 03 05:44:42
17/04/03 05:44:42 INFO util.GSet: Computing capacity for map BlocksMap
17/04/03 05:44:42 INFO util.GSet: VM type       = 64-bit
17/04/03 05:44:42 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
17/04/03 05:44:42 INFO util.GSet: capacity      = 2^21 = 2097152 entries
17/04/03 05:44:42 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
17/04/03 05:44:42 INFO blockmanagement.BlockManager: defaultReplication         = 3
17/04/03 05:44:42 INFO blockmanagement.BlockManager: maxReplication             = 512
17/04/03 05:44:42 INFO blockmanagement.BlockManager: minReplication             = 1
17/04/03 05:44:42 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
17/04/03 05:44:42 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
17/04/03 05:44:42 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
17/04/03 05:44:42 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
17/04/03 05:44:42 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
17/04/03 05:44:42 INFO namenode.FSNamesystem: supergroup          = supergroup
17/04/03 05:44:42 INFO namenode.FSNamesystem: isPermissionEnabled = true
17/04/03 05:44:42 INFO namenode.FSNamesystem: HA Enabled: false
17/04/03 05:44:42 INFO namenode.FSNamesystem: Append Enabled: true
17/04/03 05:44:43 INFO util.GSet: Computing capacity for map INodeMap
17/04/03 05:44:43 INFO util.GSet: VM type       = 64-bit
17/04/03 05:44:43 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
17/04/03 05:44:43 INFO util.GSet: capacity      = 2^20 = 1048576 entries
17/04/03 05:44:43 INFO namenode.FSDirectory: ACLs enabled? false
17/04/03 05:44:43 INFO namenode.FSDirectory: XAttrs enabled? true
17/04/03 05:44:43 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
17/04/03 05:44:43 INFO namenode.NameNode: Caching file names occuring more than 10 times
17/04/03 05:44:43 INFO util.GSet: Computing capacity for map cachedBlocks
17/04/03 05:44:43 INFO util.GSet: VM type       = 64-bit
17/04/03 05:44:43 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
17/04/03 05:44:43 INFO util.GSet: capacity      = 2^18 = 262144 entries
17/04/03 05:44:43 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
17/04/03 05:44:43 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
17/04/03 05:44:43 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
17/04/03 05:44:43 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
17/04/03 05:44:43 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
17/04/03 05:44:43 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
17/04/03 05:44:43 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
17/04/03 05:44:43 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
17/04/03 05:44:43 INFO util.GSet: Computing capacity for map NameNodeRetryCache
17/04/03 05:44:43 INFO util.GSet: VM type       = 64-bit
17/04/03 05:44:43 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
17/04/03 05:44:43 INFO util.GSet: capacity      = 2^15 = 32768 entries



[hadoop@master hadoop]$ cd /home/hadoop/bigdata/hadoop/sbin
[hadoop@master sbin]$ sh start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-namenode-master.out
slave01: starting datanode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-datanode-slave01.out
slave02: starting datanode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-datanode-slave02.out
Starting secondary namenodes [master]
master: starting secondarynamenode, logging to /home/hadoop/bigdata/hadoop/logs/hadoop-hadoop-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-resourcemanager-master.out
slave01: starting nodemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-nodemanager-slave01.out
slave02: starting nodemanager, logging to /home/hadoop/bigdata/hadoop/logs/yarn-hadoop-nodemanager-slave02.out


[hadoop@master sbin]$ jps -m
22996 SecondaryNameNode
23158 ResourceManager
22778 NameNode
23420 Jps -m


[hadoop@slave01 current]$ jps -m
16377 NodeManager
16250 DataNode
16506 Jps -m


[hadoop@slave02 current]$ jps -m
59138 NodeManager
59011 DataNode
59275 Jps -m
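
Checking the jps output by eye works for three nodes; a small filter can do the same comparison. The function name missing_daemons is made up for this sketch. It reads jps output on standard input and prints every expected daemon that is not running.

```shell
# Sketch: report which expected daemons are absent from `jps` output.
# Usage: jps | missing_daemons NameNode SecondaryNameNode ResourceManager
missing_daemons() {
    running=$(awk '{print $2}')          # second column of jps output is the class name
    for d in "$@"; do
        echo "$running" | grep -qx "$d" || echo "$d"
    done
}

# Example with the master output shown above; DataNode is added to show a miss.
printf '%s\n' '22996 SecondaryNameNode' '23158 ResourceManager' '22778 NameNode' '23420 Jps -m' |
    missing_daemons NameNode SecondaryNameNode ResourceManager DataNode
```

On the workers, the expected list would be NodeManager and DataNode.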


[hadoop@slave01 ~]$ hadoop dfsadmin -report

Configured Capacity: 38002491392 (35.39 GB)
Present Capacity: 27343609856 (25.47 GB)
DFS Remaining: 27343589376 (25.47 GB)
DFS Used: 20480 (20 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

Live datanodes (2): // both datanode nodes are clearly alive

Name: (slave01)
Hostname: slave01
Decommission Status : Normal
Configured Capacity: 19001245696 (17.70 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 5321039872 (4.96 GB)
DFS Remaining: 13680193536 (12.74 GB)
DFS Used%: 0.00%
DFS Remaining%: 72.00%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Apr 03 18:06:47 PDT 2017

Name: (slave02)
Hostname: slave02
Decommission Status : Normal
Configured Capacity: 19001245696 (17.70 GB)
DFS Used: 8192 (8 KB)
Non DFS Used: 5337841664 (4.97 GB)
DFS Remaining: 13663395840 (12.73 GB)
DFS Used%: 0.00%
DFS Remaining%: 71.91%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Mon Apr 03 18:06:48 PDT 2017
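
The number of live DataNodes can also be pulled out of the report automatically, for example in a health check. The helper name live_datanodes is made up for this sketch; it only parses text in the format shown above.

```shell
# Sketch: extract the live DataNode count from a dfsadmin report on stdin.
# Usage on the cluster: hdfs dfsadmin -report | live_datanodes
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)).*/\1/p'
}

# Example with a fragment of the report above.
printf 'Missing blocks: 0\n\nLive datanodes (2):\n\nHostname: slave01\n' | live_datanodes
```

The usage line calls hdfs dfsadmin, the current form of the deprecated hadoop dfsadmin command used in the transcript.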







2017-04-02 14:58:10,052 WARN org.apache.hadoop.hdfs.server.common.Storage: Storage directory /home/hadoop/bigdata/data/hadoop/hdfs/namenode does not exist
2017-04-02 14:58:10,053 WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Encountered exception loading fsimage
org.apache.hadoop.hdfs.server.common.InconsistentFSStateException: Directory /home/hadoop/bigdata/data/hadoop/hdfs/namenode is in an inconsistent state: storage directory does not exist or is not accessible.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverStorageDirs(FSImage.java:327)
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:215)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:812)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:796)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1493)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1559)
2017-04-02 14:58:10,090 INFO org.mortbay.log: Stopped [email protected]:50070
2017-04-02 14:58:10,091 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NameNode metrics system...
2017-04-02 14:58:10,091 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system stopped.
2017-04-02 14:58:10,091 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2017-04-02 14:58:10,091 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
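
The exception above says the NameNode storage directory is missing or not accessible. A quick pre-flight check before starting the daemons could look like this sketch; check_storage_dir is a name invented here, and the check covers only existence and write permission for the current user.

```shell
# Sketch: verify that a storage directory exists and is writable by the current user.
check_storage_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    elif [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    echo "ok: $dir"
}

# Check the NameNode directory named in the log above.
check_storage_dir /home/hadoop/bigdata/data/hadoop/hdfs/namenode ||
    echo "fix the path or its permissions before starting the NameNode"
```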


cat /home/hadoop/bigdata/data/hadoop/hdfs/namenode/current/VERSION




vim /home/hadoop/bigdata/data/hadoop/hdfs/datanode/current/VERSION



vim /home/hadoop/bigdata/data/hadoop/hdfs/datanode/current/VERSION




The first method is to delete all of the DataNode data (that is, delete /home/hadoop/bigdata/data/hadoop/hdfs/namenode/current/VERSION on every datanode in the cluster), then run hadoop namenode -format and restart the cluster.
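
Before deleting anything, it may help to confirm that the VERSION files really disagree. The sketch below assumes the problem being diagnosed is a clusterID mismatch between the NameNode and a DataNode, which the post does not state outright. The helper names and the demo IDs are invented for illustration.

```shell
# Print the clusterID recorded in a VERSION file.
cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Succeed only if both files carry the same, non-empty clusterID.
same_cluster() {
    a=$(cluster_id "$1")
    b=$(cluster_id "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Demo with two made-up VERSION files (the IDs are invented for illustration).
demo=$(mktemp -d)
printf 'namespaceID=1\nclusterID=CID-example-1\n' > "$demo/nn_VERSION"
printf 'storageID=DS-1\nclusterID=CID-example-2\n' > "$demo/dn_VERSION"
if same_cluster "$demo/nn_VERSION" "$demo/dn_VERSION"; then
    echo "clusterIDs match"
else
    echo "clusterID mismatch"
fi
```

On a real cluster the two files live on different machines, so one of them has to be copied over (for example with scp) before the comparison.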




[hadoop@slave02 current]$ hadoop fs -ls /
ls: No Route to Host from  slave02/ to master:9000 failed on socket timeout exception: java.net.NoRouteToHostException: No route to host; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost


  • The hostname of the remote machine is wrong in the configuration files
  • The client's host table /etc/hosts has an invalid IPAddress for the target host.

  • The DNS server's host table has an invalid IPAddress for the target host.
  • The client's routing tables (In Linux, iptables) are wrong.
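
The second cause in this list can be checked quickly on the failing client by looking up what its hosts file maps the NameNode hostname to. hosts_lookup is a helper invented for this sketch; it reads any file in /etc/hosts format and ignores comments.

```shell
# Print the address(es) that a hosts-format file maps to a hostname.
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                                       # drop comments
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }   # match any alias column
    ' "$1"
}

# On the failing client, see what "master" resolves to.
if [ -r /etc/hosts ]; then
    hosts_lookup /etc/hosts master
fi

# Also worth checking by hand on CentOS 7 (not run here):
#   systemctl is-active firewalld     # "active" means the firewall is still running
```

No output from the lookup means the name is missing from the file; an unexpected address points at a stale entry.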






   4. Confirm the changes to the hadoop user's environment file .bashrc on every node in the cluster












  11. Go to /home/hadoop/bigdata/hadoop/sbin and start the Hadoop services with sh start-all.sh.
