Hadoop Single-Node Installation

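   I had looked into Hadoop a little before, but never in depth. It has been looking more and more interesting, so I decided to study it properly. I bought two Hadoop books a couple of days ago and skimmed them, and today I am setting up the environment.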

      Environment: CentOS 6.2, JDK 7u45, Hadoop 2.2.0

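      I will skip downloading and unpacking and go straight to the configuration (setting the JAVA_HOME and HADOOP_HOME environment variables is also skipped). Reference: http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/SingleCluster.html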

       1. Set JAVA_HOME in hadoop-env.sh
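
As a rough sketch of this step, the helper below rewrites the JAVA_HOME line in a hadoop-env.sh file. The function name set_java_home and both paths in the example call are assumptions for illustration and are not from the original setup.

```shell
# Hypothetical helper: set JAVA_HOME in a hadoop-env.sh file.
# $1 = path to hadoop-env.sh, $2 = JDK install directory.
# Assumes the JDK path contains no "|" or "&" characters (sed specials here).
set_java_home() {
    env_file="$1"
    jdk_dir="$2"
    if grep -q '^export JAVA_HOME=' "$env_file"; then
        # Replace the existing export line in place, keeping a .bak backup.
        sed -i.bak "s|^export JAVA_HOME=.*|export JAVA_HOME=${jdk_dir}|" "$env_file"
    else
        # No export line yet: append one.
        printf 'export JAVA_HOME=%s\n' "$jdk_dir" >> "$env_file"
    fi
}

# Example call (both paths are assumptions, adjust to your layout):
# set_java_home "$HADOOP_HOME/etc/hadoop/hadoop-env.sh" /usr/java/jdk1.7.0_45
```

Editing the file by hand with vi works just as well; the function only makes the change repeatable.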

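       2. Edit the core-site.xml configuration file

<configuration>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/data/hadoop/tmp</value>
	</property>
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://localhost:9000</value>
		<final>true</final>
	</property>
</configuration>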

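     3. Edit the hdfs-site.xml configuration file

<configuration>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>file:///data/hadoop/dfs/name</value>
		<final>true</final>
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>file:///data/hadoop/dfs/data</value>
		<final>true</final>
	</property>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
	<property>
		<name>dfs.permissions.enabled</name>
		<value>false</value>
	</property>
</configuration>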

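     4. Copy mapred-site.xml.template to mapred-site.xml, then edit mapred-site.xml

cp mapred-site.xml.template mapred-site.xml
vi mapred-site.xml

<configuration>
	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
</configuration>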

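   5. Edit the yarn-site.xml configuration file

<configuration>
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>localhost</value>
		<description>hostname of RM</description>
	</property>

	<property>
		<name>yarn.resourcemanager.resource-tracker.address</name>
		<value>localhost:5274</value>
		<description>host is the hostname of the resource manager and
		port is the port on which the NodeManagers contact the Resource Manager.
		</description>
	</property>

	<property>
		<name>yarn.resourcemanager.scheduler.address</name>
		<value>localhost:5273</value>
		<description>host is the hostname of the resourcemanager and port is the port
		on which the Applications in the cluster talk to the Resource Manager.
		</description>
	</property>

	<property>
		<name>yarn.resourcemanager.scheduler.class</name>
		<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
		<description>In case you do not want to use the default scheduler</description>
	</property>

	<property>
		<name>yarn.resourcemanager.address</name>
		<value>localhost:5271</value>
		<description>the host is the hostname of the ResourceManager and the port is the port on
		which the clients can talk to the Resource Manager.</description>
	</property>

	<property>
		<name>yarn.nodemanager.local-dirs</name>
		<value></value>
		<description>the local directories used by the nodemanager</description>
	</property>

	<property>
		<name>yarn.nodemanager.address</name>
		<value>localhost:5272</value>
		<description>the nodemanagers bind to this port</description>
	</property>

	<property>
		<name>yarn.nodemanager.resource.memory-mb</name>
		<value>10240</value>
		<description>the amount of memory on the NodeManager in MB</description>
	</property>

	<property>
		<name>yarn.nodemanager.remote-app-log-dir</name>
		<value>/app-logs</value>
		<description>directory on hdfs where the application logs are moved to</description>
	</property>

	<property>
		<name>yarn.nodemanager.log-dirs</name>
		<value></value>
		<description>the directories used by Nodemanagers as log directories</description>
	</property>

	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
		<description>shuffle service that needs to be set for Map Reduce to run</description>
	</property>
</configuration>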

That completes the single-node Hadoop configuration.

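1) Next, format the namenode and then start it

hadoop namenode -format
./hadoop-daemon.sh start namenode
Check the logs linked from http://localhost:50070/dfshealth.jsp (the files with namenode*.log in the name) to confirm the namenode came up. If they contain no errors, it started successfully.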

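2) Then start the HDFS datanode

./hadoop-daemon.sh start datanode
You can likewise look up the corresponding log file from the same start page (the one with datanode*.log in the name). If it contains no errors and the datanode is communicating with the namenode, it started successfully.

You can also run jps on the command line and check that the daemons are listed.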

3) Then start YARN

yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager

Check whether they started successfully in the same way as above.
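
The jps check can be scripted. The sketch below reads jps-style output and prints the name of every expected daemon that is not running. The function name missing_daemons is hypothetical, and the list of four daemons assumes exactly the processes started in the steps above.

```shell
# Hypothetical helper: print each expected daemon that is missing from
# jps-style output read on stdin (lines of the form "<pid> <ClassName>").
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode ResourceManager NodeManager; do
        # Compare the second field exactly, so "SecondaryNameNode"
        # is not mistaken for "NameNode".
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}

# Example call; empty output means all four daemons are listed:
# jps | missing_daemons
```

If a daemon is reported missing, its log file is the first place to look.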

Finally, go to the hadoop-2.2.0/share/hadoop/mapreduce directory and run a test job

hadoop jar hadoop-mapreduce-examples-2.2.0.jar randomwriter out

Check whether the job ran successfully.


Errors during installation

2016-09-29 19:08:11,085 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioned to standby state
2016-09-29 19:08:11,085 FATAL org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Error starting ResourceManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not instantiate Scheduler: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1103)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
at java.lang.Class.forName(Class.java:191)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createScheduler(ResourceManager.java:264)
... 6 more
2016-09-29 19:08:11,103 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down ResourceManager at localhost/127.0.0.1
***********************************************************/

On hadoop-2.5.2, I commented out the yarn.resourcemanager.scheduler.class property in yarn-site.xml. Falling back to the default scheduler resolved the error.
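
To find where the property is set before commenting it out, a plain text search is enough. This is a minimal sketch: the function name find_scheduler_override is hypothetical, and the search does not tell an active property from one already inside an XML comment, so inspect the reported lines by hand.

```shell
# Hypothetical helper: list the lines (with line numbers) of a yarn-site.xml
# file ($1) that mention the scheduler class property.
# Returns non-zero when the property name does not appear at all.
find_scheduler_override() {
    # -F treats the dots literally, -n prefixes each match with its line number.
    grep -nF 'yarn.resourcemanager.scheduler.class' "$1"
}

# Example call (the path is an assumption, adjust to your layout):
# find_scheduler_override "$HADOOP_HOME/etc/hadoop/yarn-site.xml"
```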

