hadoop cluster config

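The default values of all configuration properties are documented under share/doc in the Hadoop installation directory.

Read-only default configuration: core-default.xml, hdfs-default.xml, yarn-default.xml and mapred-default.xml

Site-specific configuration, which overrides the defaults: etc/hadoop/core-site.xml, etc/hadoop/hdfs-site.xml, etc/hadoop/yarn-site.xml and etc/hadoop/mapred-site.xml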
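A minimal Python sketch of how the two layers combine, assuming a simplified model of Hadoop's Configuration class: resources are loaded in order, a later value overrides an earlier one, and a property marked `<final>true</final>` cannot be overridden by resources loaded after it. Variable expansion and other details are ignored, and HDFS_DEFAULT / HDFS_SITE are made-up excerpts, not the real shipped files.

```python
import xml.etree.ElementTree as ET

def parse_resource(xml_text):
    """Parse a Hadoop-style configuration file into {name: (value, is_final)}."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="")
        is_final = prop.findtext("final", default="").strip().lower() == "true"
        props[name] = (value, is_final)
    return props

def load_configuration(*resources):
    """Apply resources in order: later values win unless an earlier one was final."""
    merged, finals = {}, set()
    for xml_text in resources:
        for name, (value, is_final) in parse_resource(xml_text).items():
            if name in finals:
                continue  # a final property is not overridden by later resources
            merged[name] = value
            if is_final:
                finals.add(name)
    return merged

# Abbreviated, illustrative excerpts (hypothetical, not the real files).
HDFS_DEFAULT = """<configuration>
  <property><name>dfs.blocksize</name><value>134217728</value></property>
  <property><name>dfs.namenode.handler.count</name><value>10</value></property>
</configuration>"""

HDFS_SITE = """<configuration>
  <property><name>dfs.blocksize</name><value>268435456</value></property>
</configuration>"""

conf = load_configuration(HDFS_DEFAULT, HDFS_SITE)
print(conf["dfs.blocksize"])               # 268435456 (site value wins)
print(conf["dfs.namenode.handler.count"])  # 10 (default kept)
```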

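In addition, the Hadoop scripts read their environment from etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh. At the very least JAVA_HOME must be set there, so that it is defined correctly on every node. Each daemon can also be given its own JVM options through a dedicated *_OPTS variable: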

Daemon                          Environment variable
NameNode                        HADOOP_NAMENODE_OPTS
DataNode                        HADOOP_DATANODE_OPTS
Secondary NameNode              HADOOP_SECONDARYNAMENODE_OPTS
ResourceManager                 YARN_RESOURCEMANAGER_OPTS
NodeManager                     YARN_NODEMANAGER_OPTS
WebAppProxy                     YARN_PROXYSERVER_OPTS
MapReduce Job History Server    HADOOP_JOB_HISTORYSERVER_OPTS
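A small Python sketch that renders such settings as shell `export` lines, using the daemon-to-variable mapping from the table above. The JDK path and JVM flags in the example are hypothetical. Values are quoted with `shlex.quote`, which uses single quotes, so a reference such as `$HADOOP_NAMENODE_OPTS` inside a value would not be expanded. For simplicity everything goes into one string, although the YARN_* variables are conventionally set in yarn-env.sh.

```python
import shlex

# Daemon -> *_OPTS environment variable, as listed in the table above.
DAEMON_OPTS_VAR = {
    "NameNode": "HADOOP_NAMENODE_OPTS",
    "DataNode": "HADOOP_DATANODE_OPTS",
    "Secondary NameNode": "HADOOP_SECONDARYNAMENODE_OPTS",
    "ResourceManager": "YARN_RESOURCEMANAGER_OPTS",
    "NodeManager": "YARN_NODEMANAGER_OPTS",
    "WebAppProxy": "YARN_PROXYSERVER_OPTS",
    "MapReduce Job History Server": "HADOOP_JOB_HISTORYSERVER_OPTS",
}

def export_line(name, value):
    """Return one shell `export` line with the value safely quoted."""
    return f"export {name}={shlex.quote(value)}"

def render_env(java_home, daemon_opts):
    """Render JAVA_HOME (required) plus per-daemon OPTS; unknown daemons raise KeyError."""
    lines = [export_line("JAVA_HOME", java_home)]
    for daemon, opts in daemon_opts.items():
        lines.append(export_line(DAEMON_OPTS_VAR[daemon], opts))
    return "\n".join(lines)

print(render_env(
    "/usr/lib/jvm/java-8-openjdk",  # hypothetical JDK path
    {"NameNode": "-XX:+UseParallelGC", "DataNode": "-Xmx2g -XX:+UseParallelGC"},
))
```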

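hadoop-env.sh: other useful variables

HADOOP_PID_DIR    directory where the daemons' process ID files are stored
HADOOP_LOG_DIR    directory where the daemons' log files are stored (created automatically if missing)
HADOOP_HEAPSIZE / YARN_HEAPSIZE    maximum daemon heap size, in MB (default 1000)
YARN_RESOURCEMANAGER_HEAPSIZE, YARN_NODEMANAGER_HEAPSIZE, YARN_PROXYSERVER_HEAPSIZE, HADOOP_JOB_HISTORYSERVER_HEAPSIZE    per-daemon heap sizes for the ResourceManager, NodeManager, WebAppProxy and MapReduce Job History Server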

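core-site.xml: common properties

fs.defaultFS    NameNode URI, of the form hdfs://host:port/
io.file.buffer.size    size of the read/write buffer used in SequenceFiles; 131072 (128 KB) is a typical value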
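The site files all share the same `<configuration>/<property>/<name>/<value>` layout. A minimal Python sketch that writes core-site.xml content from a dictionary with the standard-library ElementTree; the host name and port are hypothetical placeholders, and the output is a single unindented line.

```python
import xml.etree.ElementTree as ET

def build_site_xml(props):
    """Build a Hadoop-style <configuration> document from a {name: value} dict."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")

core_site = build_site_xml({
    "fs.defaultFS": "hdfs://namenode.example.com:8020",  # hypothetical host and port
    "io.file.buffer.size": 131072,                       # 128 KB
})
print(core_site)
```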

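hdfs-site.xml: HDFS properties

dfs.namenode.name.dir   local path where the NameNode persistently stores the namespace and transaction logs. It can be a comma-separated list of directories; the name table is then replicated into every directory, for redundancy
dfs.hosts / dfs.hosts.exclude   files listing the permitted / excluded DataNodes
dfs.blocksize   HDFS block size. 268435456 (256 MB) suits large file systems; the default in hdfs-default.xml is 134217728 (128 MB) in Hadoop 2.x and later
dfs.namenode.handler.count  number of NameNode server threads that handle RPC requests. The default is 10; raise it (e.g. to 100) for clusters with many DataNodes

dfs.datanode.data.dir   local path where a DataNode stores its blocks. It can be a comma-separated list of directories, typically on different disks; blocks are then spread over all of them, which spreads disk I/O and can speed up reads and writes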
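To see what dfs.blocksize changes in practice, a short Python sketch of the block-count arithmetic: a file occupies ceil(size / blocksize) blocks, and the last block only uses as much disk as it actually holds. Fewer, larger blocks mean fewer block objects for the NameNode to track. The constant names are illustrative.

```python
BLOCK_SIZE_256_MB = 268435456  # value suggested above for large file systems
BLOCK_SIZE_128_MB = 134217728  # hdfs-default.xml default (Hadoop 2.x and later)

def blocks_needed(file_size_bytes, block_size=BLOCK_SIZE_256_MB):
    """Number of HDFS blocks a file occupies; the last block may be only partly filled."""
    if file_size_bytes < 0 or block_size <= 0:
        raise ValueError("file size must be non-negative and block size positive")
    return (file_size_bytes + block_size - 1) // block_size  # integer ceiling division

one_gib = 1024 ** 3
print(blocks_needed(one_gib))                     # 4
print(blocks_needed(one_gib, BLOCK_SIZE_128_MB))  # 8
```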

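yarn-site.xml: YARN properties

yarn.acl.enable    whether ACLs are enabled; default false
yarn.admin.acl    ACL of the cluster administrators; default * (anyone)
yarn.log-aggregation-enable    whether application logs are aggregated; default false
yarn.nodemanager.remote-app-log-dir    HDFS directory to which application logs are moved when an application finishes (default /tmp/logs)
yarn.nodemanager.remote-app-log-dir-suffix    suffix appended to the remote log directory; logs are aggregated under ${yarn.nodemanager.remote-app-log-dir}/${user}/${suffix}
yarn.log-aggregation.retain-seconds    how long aggregated logs are kept before being deleted; -1 disables deletion. A value that is too small floods the NameNode with requests
yarn.log-aggregation.retain-check-interval-seconds    time between checks for aggregated-log retention; 0 or a negative value means one-tenth of the retention time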
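A Python sketch of the two rules just described: where a user's aggregated logs end up, and how often retention is checked. It is a simplified model of the documented behaviour, not YARN's own code; "alice" is a hypothetical user, and /tmp/logs and logs are the default directory and suffix.

```python
def aggregated_log_dir(remote_dir, user, suffix="logs"):
    """${remote-app-log-dir}/${user}/${remote-app-log-dir-suffix}"""
    return f"{remote_dir.rstrip('/')}/{user}/{suffix}"

def effective_check_interval(retain_seconds, check_interval_seconds=-1):
    """Seconds between retention checks, or None when deletion is disabled."""
    if retain_seconds < 0:
        return None                   # -1 disables deletion of aggregated logs
    if check_interval_seconds <= 0:
        return retain_seconds / 10    # fall back to one-tenth of the retention time
    return check_interval_seconds

print(aggregated_log_dir("/tmp/logs", "alice"))  # /tmp/logs/alice/logs
print(effective_check_interval(7 * 24 * 3600))   # 60480.0 (7 days retention)
```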

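yarn.resourcemanager.hostname   host name of the ResourceManager; can be set instead of every yarn.resourcemanager.*.address property, which then use their default ports
yarn.resourcemanager.scheduler.class    scheduler implementation, given as a fully qualified class name: CapacityScheduler (recommended), FairScheduler (also recommended) or FifoScheduler, e.g. org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler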

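yarn.scheduler.minimum-allocation-mb    minimum memory, in MB, allocated to each container request at the ResourceManager (default 1024)
yarn.scheduler.maximum-allocation-mb    maximum memory, in MB, allocated to each container request (default 8192)
yarn.resourcemanager.nodes.include-path / yarn.resourcemanager.nodes.exclude-path   files listing the permitted / excluded NodeManagers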
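The minimum allocation also acts as a rounding step. A simplified Python model, assuming the behaviour of the CapacityScheduler's default memory-only calculation: a request is rounded up to a multiple of the minimum, and a request above the maximum is rejected. Other schedulers may round with a different increment, so treat this as a sketch, not the scheduler's actual code.

```python
def normalize_request_mb(request_mb, min_mb=1024, max_mb=8192):
    """Simplified model of how a container memory request is normalized."""
    if request_mb > max_mb:
        raise ValueError(f"request of {request_mb} MB exceeds the maximum of {max_mb} MB")
    request_mb = max(request_mb, min_mb)         # never below the minimum
    rounded = -(-request_mb // min_mb) * min_mb  # round up to a multiple of min_mb
    return min(rounded, max_mb)                  # never above the maximum

print(normalize_request_mb(1536))  # 2048
print(normalize_request_mb(3072))  # 3072
```

Under this model a 1536 MB request occupies 2048 MB of node memory, which matters when sizing map and reduce containers.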

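yarn.nodemanager.resource.memory-mb physical memory, in MB, that the NodeManager makes available to containers on this machine
yarn.nodemanager.vmem-pmem-ratio    factor by which a container's virtual memory usage may exceed its physical memory allocation (default 2.1)
yarn.nodemanager.local-dirs comma-separated local paths where intermediate data is written; several paths spread disk I/O
yarn.nodemanager.log-dirs   comma-separated local paths where container logs are written
yarn.nodemanager.log.retain-seconds how long, in seconds, logs are kept on the NodeManager when log aggregation is disabled; default 10800 (3 hours)
yarn.nodemanager.aux-services   mapreduce_shuffle, the shuffle service required by MapReduce applications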
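A Python sketch of the capacity arithmetic these NodeManager settings imply, assuming memory is the only resource considered and the container size passed in is what the scheduler actually granted. The virtual-memory limit applies when virtual-memory checking (yarn.nodemanager.vmem-check-enabled) is on.

```python
def containers_per_node(node_memory_mb, container_mb):
    """How many containers of a granted size fit in yarn.nodemanager.resource.memory-mb."""
    return node_memory_mb // container_mb

def vmem_limit_mb(container_mb, vmem_pmem_ratio=2.1):
    """Virtual-memory ceiling for a container: its physical allocation times the ratio."""
    return container_mb * vmem_pmem_ratio

print(containers_per_node(8192, 2048))  # 4
print(vmem_limit_mb(2048))              # roughly 4300.8 MB of virtual memory
```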

mapred-site.xml: MapReduce properties. Apart from mapreduce.framework.name, the values below are tuned examples, larger than the shipped defaults.

Property                                   Value      Notes
mapreduce.framework.name                   yarn       Execution framework set to Hadoop YARN.
mapreduce.map.memory.mb                    1536       Larger resource limit for maps.
mapreduce.map.java.opts                    -Xmx1024M  Larger heap size for the child JVMs of maps.
mapreduce.reduce.memory.mb                 3072       Larger resource limit for reduces.
mapreduce.reduce.java.opts                 -Xmx2560M  Larger heap size for the child JVMs of reduces.
mapreduce.task.io.sort.mb                  512        Higher memory limit while sorting data, for efficiency.
mapreduce.task.io.sort.factor              100        More streams merged at once while sorting files.
mapreduce.reduce.shuffle.parallelcopies    50         More parallel copies run by reduces to fetch outputs from a very large number of maps.
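Each *.java.opts heap has to fit inside the matching *.memory.mb container with room left for non-heap JVM memory (metaspace, thread stacks, native buffers); if the whole process grows past the container limit, the NodeManager may kill it. A Python sketch that parses -Xmx and reports that headroom. The regex handles only the common k/m/g suffixes (no suffix means bytes), and it takes the last -Xmx because HotSpot generally honours the last occurrence.

```python
import re

# Matches e.g. -Xmx1024M or -Xmx2g; a bare number means bytes.
_XMX = re.compile(r"-Xmx(\d+)([kKmMgG]?)")
_UNIT_TO_MB = {"": 1 / (1024 * 1024), "k": 1 / 1024, "m": 1, "g": 1024}

def xmx_mb(java_opts):
    """Return the -Xmx heap size in MB, or None if java_opts has no -Xmx flag."""
    matches = _XMX.findall(java_opts)
    if not matches:
        return None
    amount, unit = matches[-1]  # use the last -Xmx on the command line
    return int(amount) * _UNIT_TO_MB[unit.lower()]

def heap_headroom_mb(container_mb, java_opts):
    """Container memory (MB) left for non-heap JVM usage after the -Xmx heap."""
    heap = xmx_mb(java_opts)
    if heap is None:
        raise ValueError("java_opts contains no -Xmx setting")
    return container_mb - heap

# Values from the table above.
print(heap_headroom_mb(1536, "-Xmx1024M"))  # 512
print(heap_headroom_mb(3072, "-Xmx2560M"))  # 512
```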
