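Notes taken while watching the Heima (itheima) video course
Contents
1. Introductory concepts
2. Deployment
mapred-env.sh
mapred-site.xml
yarn-env.sh
yarn-site.xml
Distribute to the other two nodes
Start YARN
Open the web UI
3. Submit the bundled MapReduce example programs to run on YARN
wordcount
Estimating pi
Tomorrow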
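2. Deployment
As the hadoop user on node1, make the following changes:
mapred-env.sh
# JDK location
export JAVA_HOME=/export/server/jdk
# Heap size of the JobHistory server, in MB
export HADOOP_JOB_HISTORYSERVER_HEAPSIZE=1000
# Log at INFO level to the rolling file appender (RFA)
export HADOOP_MAPRED_ROOT_LOGGER=INFO,RFA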
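mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Run MapReduce jobs on YARN.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node1:10020</value>
    <description>JobHistory server IPC host:port.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node1:19888</value>
    <description>JobHistory server web UI host:port.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/data/mr-history/tmp</value>
    <description>Directory where history files are written by MapReduce jobs.</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/data/mr-history/done</value>
    <description>Directory where history files are managed by the JobHistory server.</description>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
    <description>Environment for the MapReduce ApplicationMaster.</description>
  </property>
  <property>
    <name>mapreduce.map.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
    <description>Environment for map tasks.</description>
  </property>
  <property>
    <name>mapreduce.reduce.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
    <description>Environment for reduce tasks.</description>
  </property>
</configuration>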
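yarn-env.sh
# JDK and Hadoop installation paths
export JAVA_HOME=/export/server/jdk
export HADOOP_HOME=/export/server/hadoop
# Directory holding the *-site.xml configuration files
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
# Optional, not enabled here
# export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
# export YARN_LOG_DIR=$HADOOP_HOME/logs/yarn
# Directory for daemon log files
export HADOOP_LOG_DIR=$HADOOP_HOME/logs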
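yarn-site.xml
<configuration>
  <property>
    <name>yarn.log.server.url</name>
    <value>http://node1:19888/jobhistory/logs</value>
    <description>Log server URL used to link to aggregated logs; here the JobHistory server on node1.</description>
  </property>
  <property>
    <name>yarn.web-proxy.address</name>
    <value>node1:8089</value>
    <description>Proxy server hostname and port.</description>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    <description>Configuration to enable or disable log aggregation.</description>
  </property>
  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
    <description>Where to aggregate logs to.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
    <description>Host running the ResourceManager.</description>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    <description>Use the Fair Scheduler.</description>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/data/nm-local</value>
    <description>Comma-separated list of paths on the local filesystem where intermediate data is written.</description>
  </property>
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/data/nm-log</value>
    <description>Comma-separated list of paths on the local filesystem where logs are written.</description>
  </property>
  <property>
    <name>yarn.nodemanager.log.retain-seconds</name>
    <value>10800</value>
    <description>Default time (in seconds) to retain log files on the NodeManager. Only applicable if log aggregation is disabled.</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Shuffle service that needs to be set for MapReduce applications.</description>
  </property>
</configuration>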
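To confirm that a node picked up the intended values (for example after copying the files to node2 and node3), a property can be read straight out of a *-site.xml file. The sketch below uses a hypothetical helper `conf_value` built on awk; it is not a real XML parser and assumes each `<name>` and `<value>` element sits on its own line, the usual layout of these files.

```shell
#!/usr/bin/env bash
# Print the <value> of a property from a Hadoop *-site.xml file.
# Rough sketch (hypothetical helper): assumes <name> and <value> are on
# separate lines; it does not parse XML properly.
conf_value() {
  local file=$1 key=$2
  awk -v key="$key" '
    # Mark that we are inside the property whose <name> matches key
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      v = $0
      sub(/.*<value>/, "", v)     # drop everything up to <value>
      sub(/<\/value>.*/, "", v)   # drop </value> and anything after it
      print v
      exit
    }
    # A closing </property> ends the search for this property
    /<\/property>/ { found = 0 }
  ' "$file"
}
# Usage:
# conf_value "$HADOOP_HOME/etc/hadoop/yarn-site.xml" yarn.resourcemanager.hostname
```

Running the usage line on each node and comparing the results is a quick way to spot a node that did not receive the updated file.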
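Distribute to the other two nodes
After configuring node1, copy the files straight to node2 and node3 to save effort. Run this on node1 from the configuration directory ($HADOOP_HOME/etc/hadoop), so that `pwd` expands to the same path on the target nodes:
scp * node2:`pwd`/
scp * node3:`pwd`/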
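The two scp lines generalize to a loop over any number of hosts. A minimal sketch, assuming passwordless SSH from node1 and that it is run inside $HADOOP_HOME/etc/hadoop; `sync_conf` is a hypothetical name, and `-r` is added so that any subdirectories are copied as well.

```shell
#!/usr/bin/env bash
# Copy every file in the current directory to the same path on each host.
# Hypothetical helper; assumes passwordless SSH to every host given.
sync_conf() {
  local dir host
  dir=$(pwd)
  for host in "$@"; do
    echo "copying $dir to $host"
    # -r also copies subdirectories; stop at the first failed copy
    scp -r ./* "$host:$dir/" || return 1
  done
}
# Usage (on node1):
# cd "$HADOOP_HOME/etc/hadoop" && sync_conf node2 node3
```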
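Start YARN
# Start the whole YARN cluster with one command:
$HADOOP_HOME/sbin/start-yarn.sh
# Start the JobHistory server
$HADOOP_HOME/bin/mapred --daemon start historyserver
# List the running Java processes to confirm the daemons are up
jps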
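Reading the jps listing by eye works, but the check can also be scripted. A sketch with a hypothetical `check_daemons` function; the expected daemon list for node1 (ResourceManager, NodeManager, WebAppProxyServer, JobHistoryServer) is an assumption based on the configuration above and on node1 also acting as a worker node.

```shell
#!/usr/bin/env bash
# Check that each named daemon appears in `jps` output read from stdin.
# Hypothetical helper; the expected names are passed as arguments.
check_daemons() {
  local listing missing=0 name
  listing=$(cat)   # jps prints "<pid> <MainClass>" on each line
  for name in "$@"; do
    # Match the class name exactly in the second column
    if ! awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }' <<< "$listing"; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}
# Usage on node1 (assumed role list):
# jps | check_daemons ResourceManager NodeManager WebAppProxyServer JobHistoryServer
```

On node2 and node3, which presumably run only a NodeManager for YARN, the argument list would shrink accordingly.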
Open the web UI
Open http://node1:8088 in a browser to reach the YARN (ResourceManager) web UI.
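The same check can be done from a terminal with curl. A sketch with a hypothetical `check_ui` helper; the URLs assume the ResourceManager UI on port 8088 and the JobHistory UI on port 19888 as set in mapred-site.xml, both served over plain HTTP.

```shell
#!/usr/bin/env bash
# Report whether each URL answers with a non-error HTTP status.
# Hypothetical helper; needs curl and network access to node1.
check_ui() {
  local url status=0
  for url in "$@"; do
    # -f makes curl fail on HTTP errors; --max-time avoids hanging
    if curl -s -f -o /dev/null --max-time 5 "$url"; then
      echo "up:   $url"
    else
      echo "down: $url"
      status=1
    fi
  done
  return "$status"
}
# Usage:
# check_ui http://node1:8088/cluster http://node1:19888/jobhistory
```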
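3. Submit the bundled MapReduce example programs to run on YARN
As a resource scheduling and management framework, YARN provides the resources on which many kinds of programs run. Common ones are:
MapReduce programs
Spark programs
Flink programs
The code of the built-in example MapReduce programs is in the file $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar.
They can be run with the hadoop jar command, which submits the MapReduce program to YARN.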
wordcount
# Work in /export
vim words.txt
# Fill in:
itheima itcast itheima itcast
hadoop hdfs hadoop hdfs
hadoop mapreduce hadoop yarn
itheima hadoop itcast hadoop
itheima itcast hadoop yarn mapreduce
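Instead of typing the text in vim, the same file can be written non-interactively with a quoted heredoc. A sketch; run it in /export as above.

```shell
#!/usr/bin/env bash
# Write words.txt in the current directory without an editor.
# The quoted 'EOF' delimiter prevents any shell expansion of the text.
cat > words.txt <<'EOF'
itheima itcast itheima itcast
hadoop hdfs hadoop hdfs
hadoop mapreduce hadoop yarn
itheima hadoop itcast hadoop
itheima itcast hadoop yarn mapreduce
EOF
wc -l words.txt   # the file should have 5 lines
```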
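# Create the input and output directories in HDFS, then upload the input file
hadoop fs -mkdir -p /input/wordcount
hadoop fs -mkdir /output
hadoop fs -put words.txt /input/wordcount/
# Submit the bundled WordCount example MapReduce program to YARN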
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar wordcount hdfs://node1:8020/input/wordcount/ hdfs://node1:8020/output/wc1
Parameters:
wordcount: run the word-count program (a Java class) in the jar
Argument 1: the input path (hdfs://node1:8020/input/wordcount/)
Argument 2: the output path (hdfs://node1:8020/output/wc1); this directory must not exist yet
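Because the job fails when the output directory already exists, rerunning it needs the old output removed first. A sketch with a hypothetical `rerun_wordcount` wrapper; `hadoop fs -rm -r -f` deletes recursively and stays quiet if the path is absent (with the HDFS trash enabled, the files are moved to the trash instead of being deleted outright). The fallback HADOOP_HOME value is the one from yarn-env.sh.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: clear the old output, then resubmit WordCount.
EXAMPLES_JAR="${HADOOP_HOME:-/export/server/hadoop}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar"
rerun_wordcount() {
  local input=$1 output=$2
  # -r removes the directory recursively; -f ignores a missing path
  hadoop fs -rm -r -f "$output" || return 1
  hadoop jar "$EXAMPLES_JAR" wordcount "$input" "$output"
}
# Usage:
# rerun_wordcount hdfs://node1:8020/input/wordcount/ hdfs://node1:8020/output/wc1
```

An alternative is to write each run to a fresh directory (wc2, wc3, ...) and keep the earlier results.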
After submission, the running program can be seen in the YARN web UI (http://node1:8088/cluster/apps).
View the result:
hadoop fs -cat /output/wc1/*
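To sanity-check what the job should produce, the same counts can be computed locally with standard Unix tools. A sketch, assuming whitespace-separated words (the WordCount example also tokenizes on whitespace); `local_wordcount` is a hypothetical name. Byte-order sorting (`LC_ALL=C`) is used to mirror how Hadoop sorts Text keys.

```shell
#!/usr/bin/env bash
# Count words read from stdin and print "word<TAB>count", sorted by word.
# Hypothetical helper for cross-checking the WordCount output.
local_wordcount() {
  tr -s '[:space:]' '\n' |   # one word per line, squeezing repeated blanks
    grep -v '^$' |           # drop an empty line left by leading whitespace
    LC_ALL=C sort |          # byte-order sort, like Hadoop's Text keys
    uniq -c |                # "  <count> <word>"
    awk '{ printf "%s\t%s\n", $2, $1 }'
}
# Usage: local_wordcount < words.txt
```

For the words.txt above this gives hadoop 7, hdfs 2, itcast 4, itheima 4, mapreduce 2 and yarn 2, one tab-separated pair per line, which is what the job's output file should contain.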
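Estimating pi
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar pi 3 1000
# pi: the Java class to run, here the pi-estimation program in the jar
# 3: the number of map tasks
# 1000: the number of samples per map task (more samples give a more accurate estimate of pi, but take longer)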
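The pi example estimates pi with a quasi-Monte Carlo method: it spreads points over the unit square using a Halton sequence and counts how many land inside the inscribed circle, whose area is pi/4 of the square. The sketch below reproduces only that idea on one machine with awk; `estimate_pi` is a hypothetical name, there is no map/reduce split, and the exact point sequence may differ from Hadoop's implementation.

```shell
#!/usr/bin/env bash
# Estimate pi from n points of a 2-D Halton sequence (bases 2 and 3).
# Hypothetical helper illustrating the idea, not Hadoop's code.
estimate_pi() {
  local samples=$1
  awk -v n="$samples" '
    # i-th element of the van der Corput sequence in the given base
    function halton(i, base,    f, r) {
      f = 1; r = 0
      while (i > 0) {
        f /= base
        r += f * (i % base)
        i = int(i / base)
      }
      return r
    }
    BEGIN {
      inside = 0
      for (k = 1; k <= n; k++) {
        # Shift the point so the circle of radius 0.5 is centered at 0
        x = halton(k, 2) - 0.5
        y = halton(k, 3) - 0.5
        if (x * x + y * y <= 0.25) inside++
      }
      # inside/n approximates pi/4
      printf "%.6f\n", 4 * inside / n
    }'
}
# Usage: estimate_pi 3000   # 3 maps x 1000 samples, as in the command above
```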