MapReduce and YARN: deployment and getting started

Notes taken while watching the Heima (黑马) video course.


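Contents

1. Introductory concepts

2. Deployment

mapred-env.sh

mapred-site.xml

yarn-env.sh

yarn-site.xml

Distribute to the other two nodes

Start YARN

Open the web UI

3. Submitting the bundled MapReduce example programs to YARN

wordcount

Estimating pi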



1. Introductory concepts

(To be written tomorrow.)

2. Deployment

        On node1, as the hadoop user, make the following changes:

mapred-env.sh

export JAVA_HOME=/export/server/jdk
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA

mapred-site.xml


  
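<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description></description>
  </property>

  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
    <description></description>
  </property>

  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1:19888</value>
    <description></description>
  </property>

  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/data/mr-history/tmp</value>
    <description></description>
  </property>

  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/data/mr-history/done</value>
    <description></description>
  </property>

  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>

  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>

  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
  </property>
</configuration>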
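
A quick way to sanity-check a file like this before distributing it is to parse it and list the name/value pairs. The following is only a sketch: the helper name load_properties and the embedded SAMPLE string are illustrative assumptions, and in practice the text of the real mapred-site.xml would be read from disk instead.

```python
import xml.etree.ElementTree as ET

# Illustrative sample (an assumption): two of the properties from the notes above.
SAMPLE = """
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
  </property>
</configuration>
"""


def load_properties(xml_text):
    """Return a dict of name -> value from a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name:  # skip malformed entries that have no <name>
            props[name.strip()] = (value or "").strip()
    return props


for key, val in load_properties(SAMPLE).items():
    print(f"{key} = {val}")
```

If the file is not well-formed XML (for example a missing closing tag), ET.fromstring raises a ParseError, which is usually easier to read than a failed daemon startup.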

yarn-env.sh

export JAVA_HOME=/export/server/jdk
export HADOOP_HOME=/export/server/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
# export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
export HADOOP_LOG_DIR=$HADOOP_HOME/logs

yarn-site.xml





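<configuration>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://node1:19888/jobhistory/logs</value>
    <description></description>
  </property>

  <property>
    <name>yarn.web-proxy.address</name>
    <value>node1:8089</value>
    <description>proxy server hostname and port</description>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Configuration to enable or disable log aggregation</description>
  </property>

  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
    <description>Where to aggregate logs to.</description>
  </property>

  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
    <description></description>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    <description></description>
  </property>

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/nm-local</value>
    <description>Comma-separated list of paths on the local filesystem where intermediate data is written.</description>
  </property>

  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/nm-log</value>
    <description>Comma-separated list of paths on the local filesystem where logs are written.</description>
  </property>

  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
    <description>Default time (in seconds) to retain log files on the NodeManager. Only applicable if log-aggregation is disabled.</description>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Shuffle service that needs to be set for Map Reduce applications.</description>
  </property>
</configuration>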

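Distribute to the other two nodes

 Once node1 is configured, the easiest approach is to copy the files straight to node2 and node3.

# run from the configuration directory ($HADOOP_HOME/etc/hadoop)
scp * node2:`pwd`/
scp * node3:`pwd`/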
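
The scp * form copies every file in the directory. If only the four edited files should go out, the commands can be generated per node, roughly as sketched here. The helper name build_scp_commands is hypothetical, and the path /export/server/hadoop/etc/hadoop is assumed from the HADOOP_HOME value in yarn-env.sh. The script only prints the commands; it does not run them.

```python
import shlex

# The four files edited in this section.
CONF_FILES = ["mapred-env.sh", "mapred-site.xml", "yarn-env.sh", "yarn-site.xml"]


def build_scp_commands(files, nodes, conf_dir):
    """Build one scp argument list per target node (nothing is executed)."""
    return [["scp", *files, f"{node}:{conf_dir}/"] for node in nodes]


# Assumed configuration directory, derived from HADOOP_HOME=/export/server/hadoop.
commands = build_scp_commands(
    CONF_FILES, ["node2", "node3"], "/export/server/hadoop/etc/hadoop"
)
for cmd in commands:
    print(shlex.join(cmd))
```

The file names are relative, so the printed commands are meant to be run from the configuration directory on node1.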

Start YARN

# start the whole YARN cluster with one command:
$HADOOP_HOME/sbin/start-yarn.sh

# start the history server
$HADOOP_HOME/bin/mapred --daemon start historyserver

jps

[Screenshots: jps output after startup]
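
Checking jps output by eye gets tedious across three nodes, so a small script can report which daemons are missing. This is a sketch under assumptions: the expected daemon set for node1 is what this configuration should start (ResourceManager, NodeManager, the web proxy, the history server), and the sample text with its PIDs is made up for illustration.

```python
# Assumed set of YARN/MapReduce daemons on node1 for the configuration above.
EXPECTED_ON_NODE1 = {
    "ResourceManager",
    "NodeManager",
    "WebAppProxyServer",
    "JobHistoryServer",
}

# Made-up jps output for illustration only (PIDs are arbitrary).
SAMPLE_JPS = """\
1001 ResourceManager
1002 NodeManager
1003 WebAppProxyServer
1004 Jps
"""


def missing_daemons(jps_output, expected):
    """Return the expected daemon names that do not appear in jps output."""
    running = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:  # each jps line is "<pid> <class name>"
            running.add(parts[1])
    return sorted(set(expected) - running)


print(missing_daemons(SAMPLE_JPS, EXPECTED_ON_NODE1))
```

With the sample above, the history server is reported as missing, which is what happens if the mapred --daemon start historyserver step is skipped.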

Open the web UI

        Enter http://node1:8088 in a browser.

[Screenshot: YARN web UI]

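 3. Submitting the bundled MapReduce example programs to YARN

   YARN is a resource scheduling and management framework: it supplies the resources that many kinds of programs run on. Common ones are:

  1. MapReduce programs

  2. Spark programs

  3. Flink programs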

wordcount

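        The built-in example MapReduce programs all live in the file $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar

        They can be run with the hadoop jar command, which submits the MapReduce program to YARN.

# work in /export
vim words.txt

# enter the following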
itheima itcast itheima itcast
hadoop hdfs hadoop hdfs
hadoop mapreduce hadoop yarn
itheima hadoop itcast hadoop
itheima itcast hadoop yarn mapreduce


hadoop fs -mkdir -p /input/wordcount
hadoop fs -mkdir /output
hadoop fs -put words.txt /input/wordcount/

# submit the example MapReduce program WordCount to YARN

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordcount hdfs://node1:8020/input/wordcount/ hdfs://node1:8020/output/wc1

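# arguments
wordcount selects the word-count program (a Java class) inside the jar
Argument 1 is the input path (hdfs://node1:8020/input/wordcount/)
Argument 2 is the output path (hdfs://node1:8020/output/wc1); the output directory must not already exist

After submitting, the running application shows up in the YARN web UI (http://node1:8088/cluster/apps)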

[Screenshot: the running application in the YARN web UI]
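
The same application list is also exposed by the ResourceManager REST API under /ws/v1/cluster/apps, which is handy when no browser is available. Here is a minimal sketch, assuming the ResourceManager is reachable at http://node1:8088 as configured above. The helper names and the sample payload (including its application id) are illustrative, and fetch_apps is defined but not called, so the script runs without a cluster.

```python
import json
from urllib.request import urlopen


def apps_url(base_url):
    """Build the REST endpoint URL for the cluster application list."""
    return base_url.rstrip("/") + "/ws/v1/cluster/apps"


def summarize_apps(payload):
    """Extract (id, name, state) tuples from the JSON response body."""
    # "apps" is null in the response when no application has been submitted.
    apps = (payload.get("apps") or {}).get("app") or []
    return [(a.get("id"), a.get("name"), a.get("state")) for a in apps]


def fetch_apps(base_url="http://node1:8088"):
    """Query a live ResourceManager (needs network access to the cluster)."""
    with urlopen(apps_url(base_url), timeout=10) as resp:
        return summarize_apps(json.load(resp))


# Illustrative payload with a made-up application id.
SAMPLE = {
    "apps": {
        "app": [
            {
                "id": "application_0000000000000_0001",
                "name": "word count",
                "state": "RUNNING",
            }
        ]
    }
}
print(summarize_apps(SAMPLE))
```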

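        View the result:

[Screenshot: contents of the output directory]

Inside wc1:
_SUCCESS is a marker file showing that the job succeeded; the file itself is empty
part-r-00000 is the result file; results are stored in files whose names start with part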
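
To know what part-r-00000 should contain for the words.txt above, the map, shuffle and reduce steps can be imitated locally. This is a plain-Python sketch of the idea, not the code of the bundled WordCount class: the function names are illustrative, and splitting on whitespace is assumed to match how the example tokenizes lines.

```python
from collections import defaultdict

WORDS_TXT = """\
itheima itcast itheima itcast
hadoop hdfs hadoop hdfs
hadoop mapreduce hadoop yarn
itheima hadoop itcast hadoop
itheima itcast hadoop yarn mapreduce
"""


def map_phase(line):
    """Map: emit a (word, 1) pair for every word on the line."""
    return [(word, 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group the emitted values by key."""
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values of each key, returned in sorted key order."""
    return [(word, sum(groups[word])) for word in sorted(groups)]


pairs = [pair for line in WORDS_TXT.splitlines() for pair in map_phase(line)]
result = reduce_phase(shuffle(pairs))
for word, count in result:
    print(f"{word}\t{count}")
# Prints (tab-separated):
# hadoop     7
# hdfs       2
# itcast     4
# itheima    4
# mapreduce  2
# yarn       2
```

On the cluster, the file itself can be printed with hadoop fs -cat /output/wc1/part-r-00000 and compared against these counts.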


The detailed run logs can be viewed as well:

[Screenshot: run logs]

Estimating pi

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar pi 3 1000


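# the pi argument names the Java class to run, here the pi-estimation program in the jar
# the argument 3 sets the number of map tasks
# the argument 1000 is the number of samples per map task (more samples give a more accurate estimate of pi, but the job runs slower)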
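
The idea behind the two numbers can be tried locally: each "map task" checks how many of its sample points fall inside a circle inscribed in the unit square, and the "reduce" step turns the combined ratio into an estimate of pi. The sketch below uses a Halton low-discrepancy sequence for the points. That is an assumption about the general approach, not a copy of the bundled example, whose details may differ. All function names are illustrative.

```python
def halton(index, base):
    """Radical inverse of index in the given base (one Halton coordinate)."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += fraction * (index % base)
        index //= base
        fraction /= base
    return result


def map_task(offset, n_samples):
    """Count the points of one slice of the sequence that land in the circle."""
    inside = 0
    for k in range(offset, offset + n_samples):
        # Point in the unit square, shifted so the circle is centred at 0.
        x = halton(k + 1, 2) - 0.5
        y = halton(k + 1, 3) - 0.5
        if x * x + y * y <= 0.25:  # radius 0.5
            inside += 1
    return inside


def estimate_pi(n_maps, n_samples):
    """Sum the per-map counts and convert the area ratio to pi."""
    inside = sum(map_task(i * n_samples, n_samples) for i in range(n_maps))
    return 4.0 * inside / (n_maps * n_samples)


# Same arguments as the command above: 3 map tasks, 1000 samples each.
print(estimate_pi(3, 1000))
```

Raising either argument increases the total number of points, which is why larger values give a closer estimate at the cost of a longer run.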
