Storage directories for the fsimage and edit log (dfs.namenode.name.dir)
Determines where on the local filesystem the DFS name node should store the name table (fsimage). If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
DataNode local block storage directories (dfs.datanode.data.dir)
Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices. The directories should be tagged with corresponding storage types ([SSD]/[DISK]/[ARCHIVE]/[RAM_DISK]) for HDFS storage policies. The default storage type will be DISK if the directory does not have a storage type tagged explicitly. Directories that do not exist will be created if local filesystem permission allows.
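A minimal hdfs-site.xml sketch for these two directory settings; the local paths and the [SSD]/[DISK] tags below are illustrative, not required values:

<property>
  <name>dfs.namenode.name.dir</name>
  <!-- comma-delimited list: the fsimage/edits are replicated into every listed directory -->
  <value>file:///data1/hdfs/namenode,file:///data2/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <!-- untagged directories default to the DISK storage type -->
  <value>[SSD]file:///ssd1/hdfs/datanode,[DISK]file:///disk1/hdfs/datanode</value>
</property>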
Block replication factor (dfs.replication), default 3
More replicas give stronger fault tolerance, at the cost of more disk space.
Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.
Block size (dfs.blocksize), default 128 MB
Too small >> a single file is split into many blocks >> the NameNode must keep more metadata, which uses more memory and makes block lookup slower
Too large >> each block takes longer to transfer over the network, and retrying a failed transfer is more expensive
The default block size for new files, in bytes. You can use the following suffixes (case insensitive): k(kilo), m(mega), g(giga), t(tera), p(peta), e(exa) to specify the size (such as 128k, 512m, 1g, etc.), or provide the complete size in bytes (such as 134217728 for 128 MB).
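A sketch of both settings in hdfs-site.xml, using the defaults quoted above (3 replicas, 128 MB blocks) and the size-suffix form described for dfs.blocksize:

<property>
  <name>dfs.replication</name>
  <value>3</value>   <!-- individual files can still override this at create time -->
</property>
<property>
  <name>dfs.blocksize</name>
  <value>128m</value>   <!-- equivalent to 134217728 bytes -->
</property>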
Reserved disk space per DataNode volume (dfs.datanode.du.reserved), default 0
Reserved space in bytes per volume. Always leave this much space free for non-DFS use. Storage-type-specific reservation is also supported: the property can be suffixed with a storage type ([ssd]/[disk]/[archive]/[ram_disk]) for clusters with heterogeneous storage. For example, reserved space for RAM_DISK storage can be configured using the property 'dfs.datanode.du.reserved.ram_disk'. If a storage-type-specific reservation is not configured then dfs.datanode.du.reserved is used. Supports multiple size unit suffixes (case insensitive), as described in dfs.blocksize. Note: if you use tune2fs to set reserved-blocks-percentage, or other filesystem tools, you can still run into out-of-disk errors, because Hadoop does not check those external tool configurations.
Number of failed volumes a DataNode tolerates (dfs.datanode.failed.volumes.tolerated), default 0
The number of volumes that are allowed to fail before a datanode stops offering service. By default any volume failure will cause a datanode to shutdown. The value should be greater than or equal to -1; -1 means the DataNode keeps running as long as at least one valid volume remains.
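A hedged example of the two volume-related settings; the 10 GB reservation and the tolerance of one failed disk are illustrative choices, not recommendations:

<property>
  <name>dfs.datanode.du.reserved</name>
  <value>10g</value>   <!-- size suffixes are accepted, as described in dfs.blocksize -->
</property>
<property>
  <name>dfs.datanode.du.reserved.ram_disk</name>
  <value>1g</value>   <!-- storage-type-specific reservation; falls back to dfs.datanode.du.reserved if unset -->
</property>
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>   <!-- allow one volume to fail before the DataNode shuts down -->
</property>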
Whether to enforce HDFS permission checking (UGO) (dfs.permissions.enabled)
If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.
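For completeness, the corresponding hdfs-site.xml entry, shown here with checking enabled (which is also the default):

<property>
  <name>dfs.permissions.enabled</name>
  <value>true</value>
</property>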
DataNode block report interval (dfs.blockreport.intervalMsec), default 21600000 ms (6 hours)
Determines block reporting interval in milliseconds.
Block scanner I/O throttle (dfs.block.scanner.volume.bytes.per.second), default 1048576 bytes/s (1 MB/s)
If this is 0, the DataNode’s block scanner will be disabled. If this is positive, this is the number of bytes per second that the DataNode’s block scanner will try to scan from each volume.
Full block scan period (dfs.datanode.scan.period.hours), default 504 hours (3 weeks)
If this is positive, the DataNode will not scan any individual block more than once in the specified scan period. If this is negative, the block scanner is disabled. If this is set to zero, then the default value of 504 hours or 3 weeks is used. Prior versions of HDFS incorrectly documented that setting this key to zero will disable the block scanner.
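The reporting and scanning knobs above, written out with their quoted defaults; as noted, a rate of 0 or a negative scan period disables the block scanner:

<property>
  <name>dfs.blockreport.intervalMsec</name>
  <value>21600000</value>   <!-- 6 hours, in milliseconds -->
</property>
<property>
  <name>dfs.block.scanner.volume.bytes.per.second</name>
  <value>1048576</value>   <!-- throttle scanning to 1 MB/s per volume; 0 disables the scanner -->
</property>
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>504</value>   <!-- 3 weeks; a negative value disables the scanner -->
</property>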
DataNode directory scan interval, which reconciles on-disk block files with the in-memory block metadata (dfs.datanode.directoryscan.interval), default 21600 s (6 hours)
Interval in seconds for Datanode to scan data directories and reconcile the difference between blocks in memory and on the disk. Supports multiple time unit suffixes (case insensitive), as described in dfs.heartbeat.interval. If no time unit is specified then seconds is assumed.
DataNode directory scan threads (dfs.datanode.directoryscan.threads), default 1
How many threads the thread pool used to compile volume reports in parallel should have.
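The directory scanner settings with their quoted defaults; per the description above, the interval also accepts a time-unit suffix (so a value like 6h should be equivalent):

<property>
  <name>dfs.datanode.directoryscan.interval</name>
  <value>21600</value>   <!-- seconds (6 hours) when no unit suffix is given -->
</property>
<property>
  <name>dfs.datanode.directoryscan.threads</name>
  <value>1</value>
</property>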
Block re-replication work multiplier (dfs.namenode.replication.work.multiplier.per.iteration), default 2
Note: Advanced property. Change with caution. This determines the total number of block transfers to begin in parallel at a DN, for replication, when such a command list is being sent over a DN heartbeat by the NN. The actual number is obtained by multiplying this multiplier with the total number of live nodes in the cluster. The resulting number is the number of blocks to begin transfers for immediately, per DN heartbeat. This number can be any positive, non-zero integer.
Replication stream limit (dfs.namenode.replication.max-streams), default 2
Hard limit for the number of replication streams other than those with highest-priority.
Hard limit on all replication streams (dfs.namenode.replication.max-streams-hard-limit), default 4
Hard limit for all replication streams.
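How the three NameNode-side re-replication throttles might look together in hdfs-site.xml; the values are simply the quoted defaults:

<property>
  <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
  <value>2</value>   <!-- blocks scheduled per DN heartbeat = multiplier x live nodes in the cluster -->
</property>
<property>
  <name>dfs.namenode.replication.max-streams</name>
  <value>2</value>   <!-- limit for replication streams other than highest-priority ones -->
</property>
<property>
  <name>dfs.namenode.replication.max-streams-hard-limit</name>
  <value>4</value>   <!-- hard limit covering all replication streams -->
</property>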
DataNode data transfer threads (dfs.datanode.max.transfer.threads), default 4096
Specifies the maximum number of threads to use for transferring data in and out of the DN.
dfs.datanode.max.transfer.threads is the number of DataXceiver threads used for transferring blocks via the DTP (data transfer protocol). Block data is large and a transfer takes time: one thread serves one block read, and the thread can only be reused once the whole block has been transferred. If many clients request blocks at the same time, more threads are needed. Each write connection uses two threads, so this number should be larger for write-bound applications. [java - Threads in Hadoop - Stack Overflow]
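A sketch using the quoted default; per the note above, write-heavy workloads may need a larger value, but 4096 here is just the default, not a tuning recommendation:

<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>4096</value>   <!-- upper bound on DataXceiver threads for the data transfer protocol -->
</property>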
Minimum block replication (dfs.namenode.replication.min), default 1
Minimal block replication.
Safe mode threshold (dfs.namenode.safemode.threshold-pct), default 0.999f
Specifies the percentage of blocks that should satisfy the minimal replication requirement defined by dfs.namenode.replication.min. Values less than or equal to 0 mean not to wait for any particular percentage of blocks before exiting safemode. Values greater than 1 will make safe mode permanent.
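The two safe-mode-related settings side by side with their quoted defaults; as described above, a threshold <= 0 means do not wait, and a value > 1 makes safe mode permanent:

<property>
  <name>dfs.namenode.replication.min</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.safemode.threshold-pct</name>
  <value>0.999f</value>
</property>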
the logical name for this new nameservice
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>

unique identifiers for each NameNode in the nameservice
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2,nn3</value>
</property>

the fully-qualified RPC address for each NameNode to listen on
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>machine1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>machine2.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn3</name>
  <value>machine3.example.com:8020</value>
</property>

the fully-qualified HTTP address for each NameNode to listen on
<property>
  <name>dfs.namenode.http-address.mycluster.nn1</name>
  <value>machine1.example.com:9870</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn2</name>
  <value>machine2.example.com:9870</value>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.nn3</name>
  <value>machine3.example.com:9870</value>
</property>

the location of the shared storage directory
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>file:///mnt/filer1/dfs/ha-name-dir-shared</value>
</property>

the Java class that HDFS clients use to contact the Active NameNode
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

a list of scripts or Java classes which will be used to fence the Active NameNode during a failover
It is critical for correctness of the system that only one NameNode be in the Active state at any given time. Thus, during a failover, we first ensure that the Active NameNode is either in the Standby state, or the process has terminated, before transitioning another NameNode to the Active state. In order to do this, you must configure at least one fencing method. These are configured as a carriage-return-separated list, which will be attempted in order until one indicates that fencing has succeeded. There are two methods which ship with Hadoop: shell and sshfence. For information on implementing your own custom fencing method, see the org.apache.hadoop.ha.NodeFencer class.
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>shell(/bin/true)</value>
</property>

the default path prefix used by the Hadoop FS client when none is given
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
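As an alternative to the no-op shell(/bin/true) fence above, the sshfence method mentioned in the fencing note can be configured roughly as follows; the private-key path is illustrative:

<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/hdfs/.ssh/id_rsa</value>
</property>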