The core-default.xml file of Hadoop 1.2.1
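Before the file itself, a minimal sketch of how these defaults are consumed. Hadoop's Configuration class loads core-default.xml first and then overlays core-site.xml, so every property listed below can be overridden either in core-site.xml or programmatically. The namenode host/port and the buffer size used here are placeholder values for illustration, not values taken from this file.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CoreDefaultsDemo {
    public static void main(String[] args) throws Exception {
        // core-default.xml is loaded first; core-site.xml overrides it.
        Configuration conf = new Configuration();

        // Read a few of the properties documented below, with their shipped defaults.
        System.out.println("fs.default.name     = " + conf.get("fs.default.name", "file:///"));
        System.out.println("hadoop.tmp.dir      = " + conf.get("hadoop.tmp.dir"));
        System.out.println("io.file.buffer.size = " + conf.getInt("io.file.buffer.size", 4096));

        // Overriding in code has the same effect as editing core-site.xml.
        // "namenode-host:9000" is a placeholder, not a value from this file.
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        conf.setInt("io.file.buffer.size", 65536);

        // The scheme of fs.default.name selects the FileSystem implementation
        // (fs.hdfs.impl, fs.file.impl, ...) that gets instantiated.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Default FS class: " + fs.getClass().getName());
    }
}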
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Do not modify this file directly. Instead, copy entries that you -->
<!-- wish to modify from this file into core-site.xml and change them -->
<!-- there. If core-site.xml does not already exist, create it. -->

<!--
  Parameter that must be configured:
    1. fs.default.name
  Commonly used parameters:
    1. hadoop.tmp.dir
    2. hadoop.native.lib
    3. hadoop.logfile.size
    4. hadoop.logfile.count
    5. io.file.buffer.size
    6. io.compression.codecs
    7. fs.trash.interval
    8. local.cache.size
-->

<configuration>

<!--- global properties -->

<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>Temporary directory setting. Configure this explicitly whenever possible; otherwise everything ends up under the system default /tmp. When configuring it by hand on a multi-disk server, give each disk its own temporary directory so that MapReduce and HDFS get better disk I/O throughput. hadoop.tmp.dir is the base setting that the Hadoop file system depends on, and many other paths are derived from it. Its default location is /tmp/hadoop-${user.name}, but storage under /tmp is unsafe because those files may be removed whenever Linux reboots.</description>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>hadoop.native.lib</name>
  <value>true</value>
  <description>Use the native Hadoop libraries if they are present. In Hadoop the native libraries are used for file compression: with the zlib and gzip codecs, Hadoop by default loads the native library from the $HADOOP_HOME/lib/native/Linux-* directory.
  If loading succeeds, the log shows:
    DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library...
    INFO util.NativeCodeLoader - Loaded the native-hadoop library
  If loading fails, the log shows:
    INFO util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable</description>
  <description>Should native hadoop libraries, if present, be used.</description>
</property>

<property>
  <name>hadoop.http.filter.initializers</name>
  <value></value>
  <description>The value may be a comma-separated list of classes. Each class must extend org.apache.hadoop.http.FilterInitializer. The resulting filters are applied to all user-facing JSP and servlet pages.</description>
  <description>A comma separated list of class names. Each class in the list must extend org.apache.hadoop.http.FilterInitializer. The corresponding Filter will be initialized. Then, the Filter will be applied to all user facing jsp and servlet web pages. The ordering of the list defines the ordering of the filters.</description>
</property>

<property>
  <name>hadoop.security.group.mapping</name>
  <value>org.apache.hadoop.security.ShellBasedUnixGroupsMapping</value>
  <description>The class used to map a user to its groups.</description>
  <description>Class for user to group mapping (get groups for a given user)</description>
</property>

<property>
  <name>hadoop.security.authorization</name>
  <value>false</value>
  <description>Whether service-level authorization is enabled.</description>
  <description>Is service-level authorization enabled?</description>
</property>

<property>
  <name>hadoop.security.instrumentation.requires.admin</name>
  <value>false</value>
  <description> </description>
  <description>Indicates if administrator ACLs are required to access instrumentation servlets (JMX, METRICS, CONF, STACKS).</description>
</property>

<property>
  <name>hadoop.security.authentication</name>
  <value>simple</value>
  <description>Authentication mode; either simple (no authentication) or kerberos.</description>
  <description>Possible values are simple (no authentication), and kerberos</description>
</property>

<property>
  <name>hadoop.security.token.service.use_ip</name>
  <value>true</value>
  <description> </description>
  <description>Controls whether tokens always use IP addresses. DNS changes will not be detected if this option is enabled. Existing client connections that break will always reconnect to the IP of the original host. New clients will connect to the host's new IP but fail to locate a token.
  Disabling this option will allow existing and new clients to detect an IP change and continue to locate the new host's token.</description>
</property>

<property>
  <name>hadoop.security.use-weak-http-crypto</name>
  <value>false</value>
  <description> </description>
  <description>If enabled, use KSSL to authenticate HTTP connections to the NameNode. Due to a bug in JDK6, using KSSL requires one to configure Kerberos tickets to use encryption types that are known to be cryptographically weak. If disabled, SPNEGO will be used for HTTP authentication, which supports stronger encryption types.</description>
</property>

<!--
<property>
  <name>hadoop.security.service.user.name.key</name>
  <value></value>
  <description>Name of the kerberos principal of the user that owns a given service daemon</description>
</property>
-->

<!--- logging properties -->

<property>
  <name>hadoop.logfile.size</name>
  <value>10000000</value>
  <description>Maximum size of each log file.</description>
  <description>The max size of each log file</description>
</property>

<property>
  <name>hadoop.logfile.count</name>
  <value>10</value>
  <description>Maximum number of log files.</description>
  <description>The max number of log files</description>
</property>

<!-- i/o properties -->

<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
  <description>Buffer size used when working with SequenceFiles. The value should be a multiple of the hardware page size; it determines how much data is buffered in a single read or write operation. Hadoop uses this buffer capacity when reading and writing HDFS files and for map output as well. If the system allows it, 64 KB (65536 bytes) to 128 KB (131072 bytes) are common choices.</description>
  <description>The size of buffer for use in sequence files. The size of this buffer should probably be a multiple of hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations.</description>
</property>

<property>
  <name>io.bytes.per.checksum</name>
  <value>512</value>
  <description>Used for data-integrity checking. Must be smaller than io.file.buffer.size. Meaning: a CRC32 checksum is computed over every io.bytes.per.checksum bytes of a data block as a redundancy check.</description>
  <description>The number of bytes per checksum. Must not be larger than io.file.buffer.size.</description>
</property>

<property>
  <name>io.skip.checksum.errors</name>
  <value>false</value>
  <description>If true, checksum errors are skipped instead of raising an exception.</description>
  <description>If true, when a checksum error is encountered while reading a sequence file, entries are skipped, instead of throwing an exception.</description>
</property>

<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.SnappyCodec</value>
  <description>List of codec classes used for compression and decompression. This is where the compression/decompression methods are configured; commonly used algorithms include gzip, lzo and snappy. With the lzo algorithm, enabling compression cuts the data volume by roughly one third compared with no compression, but the reduction in running time is not as significant.</description>
  <description>A list of the compression codec classes that can be used for compression/decompression.</description>
</property>

<property>
  <name>io.serializations</name>
  <value>org.apache.hadoop.io.serializer.WritableSerialization</value>
  <description>List of serialization factory classes used to obtain serializers and deserializers.</description>
  <description>A list of serialization classes that can be used for obtaining serializers and deserializers.</description>
</property>

<!-- file system properties -->

<property>
  <name>fs.default.name</name>
  <value>file:///</value>
  <description>Name of the default file system. If left unset, the local file system is used. Usually set to hdfs://${namenode}:${port}.</description>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation.
  The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>

<property>
  <name>fs.trash.interval</name>
  <value>0</value>
  <description>Interval between trash checkpoints, in minutes. If 0, the feature is disabled. This is the switch that makes deleted HDFS files move to the trash instead of being removed immediately; the value is how long trash is kept before it is purged. After hadoop fs -rm, the file is moved into the .Trash directory, so an accidentally deleted file can be recovered from the corresponding .Trash directory. A common setting is 1440, i.e. one day.</description>
  <description>Number of minutes between trash checkpoints. If zero, the trash feature is disabled.</description>
</property>

<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.LocalFileSystem</value>
  <description>Local file system implementation class, used for URIs starting with "file://".</description>
  <description>The FileSystem for file: uris.</description>
</property>

<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  <description>HDFS file system implementation class, used for URIs starting with "hdfs://".</description>
  <description>The FileSystem for hdfs: uris.</description>
</property>

<property>
  <name>fs.s3.impl</name>
  <value>org.apache.hadoop.fs.s3.S3FileSystem</value>
  <description>Amazon S3 file system implementation class, used for URIs starting with "s3://".</description>
  <description>The FileSystem for s3: uris.</description>
</property>

<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  <description>Amazon native S3 file system implementation class, used for URIs starting with "s3n://".</description>
  <description>The FileSystem for s3n: (Native S3) uris.</description>
</property>

<property>
  <name>fs.kfs.impl</name>
  <value>org.apache.hadoop.fs.kfs.KosmosFileSystem</value>
  <description>Kosmos file system implementation class, used for URIs starting with "kfs://". KFS is a C++ implementation of the GFS design.</description>
  <description>The FileSystem for kfs: uris.</description>
</property>

<property>
  <name>fs.hftp.impl</name>
  <value>org.apache.hadoop.hdfs.HftpFileSystem</value>
  <description>Implementation of a file system accessed over the HTTP protocol.</description>
</property>

<property>
  <name>fs.hsftp.impl</name>
  <value>org.apache.hadoop.hdfs.HsftpFileSystem</value>
  <description>A subclass of HftpFileSystem; file system access over HTTPS.</description>
</property>

<property>
  <name>fs.webhdfs.impl</name>
  <value>org.apache.hadoop.hdfs.web.WebHdfsFileSystem</value>
  <description> </description>
</property>

<property>
  <name>fs.ftp.impl</name>
  <value>org.apache.hadoop.fs.ftp.FTPFileSystem</value>
  <description>File system implementation for the FTP protocol.</description>
  <description>The FileSystem for ftp: uris.</description>
</property>

<property>
  <name>fs.ramfs.impl</name>
  <value>org.apache.hadoop.fs.InMemoryFileSystem</value>
  <description>In-memory file system implementation.</description>
  <description>The FileSystem for ramfs: uris.</description>
</property>

<property>
  <name>fs.har.impl</name>
  <value>org.apache.hadoop.fs.HarFileSystem</value>
  <description>File system for the Hadoop archives (har) format.</description>
  <description>The filesystem for Hadoop archives.</description>
</property>

<property>
  <name>fs.har.impl.disable.cache</name>
  <value>true</value>
  <description>Do not cache 'har' file system instances.</description>
  <description>Don't cache 'har' filesystem instances.</description>
</property>

<property>
  <name>fs.checkpoint.dir</name>
  <value>${hadoop.tmp.dir}/dfs/namesecondary</value>
  <description>Directory where the secondary namenode stores its checkpoint image files. Comma-separated; if several directories are given, each of them holds a redundant copy of the data.</description>
  <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.
  </description>
</property>

<property>
  <name>fs.checkpoint.edits.dir</name>
  <value>${fs.checkpoint.dir}</value>
  <description>Directory where the secondary namenode stores its checkpoint edit logs. Comma-separated; if several directories are given, each of them holds a redundant copy of the data.</description>
  <description>Determines where on the local filesystem the DFS secondary name node should store the temporary edits to merge. If this is a comma-delimited list of directories then the edits are replicated in all of the directories for redundancy. Default value is same as fs.checkpoint.dir</description>
</property>

<property>
  <name>fs.checkpoint.period</name>
  <value>3600</value>
  <description>Checkpoint interval, in seconds.</description>
  <description>The number of seconds between two periodic checkpoints.</description>
</property>

<property>
  <name>fs.checkpoint.size</name>
  <value>67108864</value>
  <description>Size of the edit log, in bytes, that triggers a checkpoint.</description>
  <description>The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired.</description>
</property>

<property>
  <name>fs.s3.block.size</name>
  <value>67108864</value>
  <description>Block size used when writing files to the S3 file system.</description>
  <description>Block size to use when writing files to S3.</description>
</property>

<property>
  <name>fs.s3.buffer.dir</name>
  <value>${hadoop.tmp.dir}/s3</value>
  <description>Where on the local file system temporary files should be kept before they are sent to S3, or while files are being retrieved from S3.</description>
  <description>Determines where on the local filesystem the S3 filesystem should store files before sending them to S3 (or after retrieving them from S3).</description>
</property>

<property>
  <name>fs.s3.maxRetries</name>
  <value>4</value>
  <description>Maximum number of read or write attempts before a failure is signalled to the application.</description>
  <description>The maximum number of retries for reading or writing files to S3, before we signal failure to the application.</description>
</property>

<property>
  <name>fs.s3.sleepTimeSeconds</name>
  <value>10</value>
  <description>Sleep time between S3 retries, in seconds.</description>
  <description>The number of seconds to sleep between each S3 retry.</description>
</property>

<property>
  <name>local.cache.size</name>
  <value>10737418240</value>
  <description>Maximum size of the distributed cache to keep; the default is 10737418240 bytes, i.e. 10 GB. mapreduce.tasktracker.local.cache.numberdirectories (default 10000) is the upper limit on the number of cached files/directories, and mapreduce.tasktracker.distributedcache.checkperiod (default 60 s) is the cleanup check interval. The Hadoop DistributedCache distributes files to each node's local disk; they are not cleaned up immediately after use. Instead a dedicated thread purges them periodically (every mapreduce.tasktracker.distributedcache.checkperiod, default 60 s), based on the size limit (local.cache.size, default 10 GB) and the file/directory count limit (mapreduce.tasktracker.local.cache.numberdirectories, default 10000).</description>
  <description>The limit on the size of cache you want to keep, set by default to 10GB. This will act as a soft limit on the cache directory for out of band data.</description>
</property>

<property>
  <name>io.seqfile.compress.blocksize</name>
  <value>1000000</value>
  <description>Minimum block size in block-compressed SequenceFiles.</description>
  <description>The minimum block size for compression in block compressed SequenceFiles.</description>
</property>

<property>
  <name>io.seqfile.lazydecompress</name>
  <value>true</value>
  <description>Lazy decompression.</description>
  <description>Should values of block-compressed SequenceFiles be decompressed only when necessary.
  </description>
</property>

<property>
  <name>io.seqfile.sorter.recordlimit</name>
  <value>1000000</value>
  <description>Maximum number of records kept in memory.</description>
  <description>The limit on number of records to be kept in memory in a spill in SequenceFiles.Sorter</description>
</property>

<property>
  <name>io.mapfile.bloom.size</name>
  <value>1048576</value>
  <description> </description>
  <description>The size of BloomFilter-s used in BloomMapFile. Each time this many keys is appended the next BloomFilter will be created (inside a DynamicBloomFilter). Larger values minimize the number of filters, which slightly increases the performance, but may waste too much space if the total number of keys is usually much smaller than this number.</description>
</property>

<property>
  <name>io.mapfile.bloom.error.rate</name>
  <value>0.005</value>
  <description> </description>
  <description>The rate of false positives in BloomFilter-s used in BloomMapFile. As this value decreases, the size of BloomFilter-s increases exponentially. This value is the probability of encountering false positives (default is 0.5%).</description>
</property>

<property>
  <name>hadoop.util.hash.type</name>
  <value>murmur</value>
  <description>Default Hash implementation. It can currently take one of two values, murmur or jenkins, selecting MurmurHash or JenkinsHash respectively.</description>
  <description>The default implementation of Hash. Currently this can take one of the two values: 'murmur' to select MurmurHash and 'jenkins' to select JenkinsHash.</description>
</property>

<!-- ipc properties -->

<property>
  <name>ipc.client.idlethreshold</name>
  <value>4000</value>
  <description>The connection count beyond which connections are inspected for idleness.</description>
  <description>Defines the threshold number of connections after which connections will be inspected for idleness.</description>
</property>

<property>
  <name>ipc.client.kill.max</name>
  <value>10</value>
  <description>Maximum number of client connections that may be dropped in one go.</description>
  <description>Defines the maximum number of clients to disconnect in one go.</description>
</property>

<property>
  <name>ipc.client.connection.maxidletime</name>
  <value>10000</value>
  <description>Maximum client idle time; the default is 10000 milliseconds.</description>
  <description>The maximum time in msec after which a client will bring down the connection to the server.</description>
</property>

<property>
  <name>ipc.client.connect.max.retries</name>
  <value>10</value>
  <description>Maximum number of retries a client makes when establishing a connection to the server; the default is 10.</description>
  <description>Indicates the number of retries a client will make to establish a server connection.</description>
</property>

<property>
  <name>ipc.server.listen.queue.size</name>
  <value>128</value>
  <description>Size of the listen queue on which the server accepts client connection requests.</description>
  <description>Indicates the length of the listen queue for servers accepting client connections.</description>
</property>

<property>
  <name>ipc.server.tcpnodelay</name>
  <value>false</value>
  <description>Whether to turn off Nagle's algorithm (a TCP/IP congestion-control optimization) on the server. Setting this to true disables the algorithm, which can reduce latency at the cost of more, smaller packets.</description>
  <description>Turn on/off Nagle's algorithm for the TCP socket connection on the server. Setting to true disables the algorithm and may decrease latency with a cost of more/smaller packets.</description>
</property>

<property>
  <name>ipc.client.tcpnodelay</name>
  <value>false</value>
  <description> </description>
  <description>Turn on/off Nagle's algorithm for the TCP socket connection on the client. Setting to true disables the algorithm and may decrease latency with a cost of more/smaller packets.
  </description>
</property>

<!-- Web Interface Configuration -->

<property>
  <name>webinterface.private.actions</name>
  <value>false</value>
  <description>If set to true, the web interfaces of JT and NN may contain actions, such as kill job, delete file, etc., that should not be exposed to public. Enable this option if the interfaces are only reachable by those who have the right authorization.</description>
</property>

<!-- Proxy Configuration -->

<property>
  <name>hadoop.rpc.socket.factory.class.default</name>
  <value>org.apache.hadoop.net.StandardSocketFactory</value>
  <description>Default SocketFactory to use. This parameter is expected to be formatted as "package.FactoryClassName".</description>
</property>

<property>
  <name>hadoop.rpc.socket.factory.class.ClientProtocol</name>
  <value></value>
  <description>SocketFactory to use to connect to a DFS. If null or empty, use hadoop.rpc.socket.class.default. This socket factory is also used by DFSClient to create sockets to DataNodes.</description>
</property>

<property>
  <name>hadoop.socks.server</name>
  <value></value>
  <description>Address (host:port) of the SOCKS server to be used by the SocksSocketFactory.</description>
</property>

<!-- Topology Configuration -->

<property>
  <name>topology.node.switch.mapping.impl</name>
  <value>org.apache.hadoop.net.ScriptBasedMapping</value>
  <description>The default implementation of the DNSToSwitchMapping. It invokes a script specified in topology.script.file.name to resolve node names. If the value for topology.script.file.name is not set, the default value of DEFAULT_RACK is returned for all node names.</description>
</property>

<property>
  <name>net.topology.impl</name>
  <value>org.apache.hadoop.net.NetworkTopology</value>
  <description>The default implementation of NetworkTopology, which is the classic three-layer one.</description>
</property>

<property>
  <name>topology.script.file.name</name>
  <value></value>
  <description>The script name that should be invoked to resolve DNS names to NetworkTopology names. Example: the script would take host.foo.bar as an argument, and return /rack1 as the output.</description>
</property>

<property>
  <name>topology.script.number.args</name>
  <value>100</value>
  <description>The max number of args that the script configured with topology.script.file.name should be run with. Each arg is an IP address.</description>
</property>

<property>
  <name>hadoop.security.uid.cache.secs</name>
  <value>14400</value>
  <description>NativeIO maintains a cache from UID to UserName. This is the timeout for an entry in that cache.</description>
</property>

<!-- HTTP web-consoles Authentication -->

<property>
  <name>hadoop.http.authentication.type</name>
  <value>simple</value>
  <description>Defines authentication used for Oozie HTTP endpoint. Supported values are: simple | kerberos | #AUTHENTICATION_HANDLER_CLASSNAME#</description>
</property>

<property>
  <name>hadoop.http.authentication.token.validity</name>
  <value>36000</value>
  <description>Indicates how long (in seconds) an authentication token is valid before it has to be renewed.</description>
</property>

<property>
  <name>hadoop.http.authentication.signature.secret.file</name>
  <value>${user.home}/hadoop-http-auth-signature-secret</value>
  <description>The signature secret for signing the authentication tokens. If not set a random secret is generated at startup time. The same secret should be used for JT/NN/DN/TT configurations.
  </description>
</property>

<property>
  <name>hadoop.http.authentication.cookie.domain</name>
  <value></value>
  <description>The domain to use for the HTTP cookie that stores the authentication token. In order for authentication to work correctly across the web-consoles of all Hadoop nodes, the domain must be correctly set. IMPORTANT: when using IP addresses, browsers ignore cookies with domain settings. For this setting to work properly all nodes in the cluster must be configured to generate URLs with hostname.domain names on it.</description>
</property>

<property>
  <name>hadoop.http.authentication.simple.anonymous.allowed</name>
  <value>true</value>
  <description>Indicates if anonymous requests are allowed when using 'simple' authentication.</description>
</property>

<property>
  <name>hadoop.http.authentication.kerberos.principal</name>
  <value>HTTP/localhost@LOCALHOST</value>
  <description>Indicates the Kerberos principal to be used for HTTP endpoint. The principal MUST start with 'HTTP/' as per Kerberos HTTP SPNEGO specification.</description>
</property>

<property>
  <name>hadoop.http.authentication.kerberos.keytab</name>
  <value>${user.home}/hadoop.keytab</value>
  <description>Location of the keytab file with the credentials for the principal. Referring to the same keytab file Oozie uses for its Kerberos credentials for Hadoop.</description>
</property>

<property>
  <name>hadoop.relaxed.worker.version.check</name>
  <value>false</value>
  <description>By default datanodes refuse to connect to namenodes if their build revision (svn revision) does not match, and tasktrackers refuse to connect to jobtrackers if their build version (version, revision, user, and source checksum) does not match. This option changes the behavior of hadoop workers to only check for a version match (eg "1.0.2") but ignore the other build fields (revision, user, and source checksum).</description>
</property>

<property>
  <name>hadoop.skip.worker.version.check</name>
  <value>false</value>
  <description>By default datanodes refuse to connect to namenodes if their build revision (svn revision) does not match, and tasktrackers refuse to connect to jobtrackers if their build version (version, revision, user, and source checksum) does not match. This option changes the behavior of hadoop workers to skip doing a version check at all. This option supersedes the 'hadoop.relaxed.worker.version.check' option.</description>
</property>

<property>
  <name>hadoop.jetty.logs.serve.aliases</name>
  <value>true</value>
  <description>Enable/Disable aliases serving from jetty</description>
</property>

<property>
  <name>ipc.client.fallback-to-simple-auth-allowed</name>
  <value>false</value>
  <description>When a client is configured to attempt a secure connection, but attempts to connect to an insecure server, that server may instruct the client to switch to SASL SIMPLE (unsecure) authentication. This setting controls whether or not the client will accept this instruction from the server. When false (the default), the client will not allow the fallback to SIMPLE authentication, and will abort the connection.</description>
</property>

</configuration>
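The io.compression.codecs list above is consumed through CompressionCodecFactory, which chooses a codec from a file's extension. Below is a minimal sketch of decompressing a file with whatever codec matches its name; the input path comes from the command line and is a placeholder, and whether SnappyCodec actually works depends on the native libraries discussed under hadoop.native.lib.

import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;

public class DecompressDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // The factory is populated from the io.compression.codecs list in this file.
        CompressionCodecFactory factory = new CompressionCodecFactory(conf);

        Path input = new Path(args[0]);                    // e.g. a .gz or .bz2 file (placeholder)
        CompressionCodec codec = factory.getCodec(input);  // chosen by file extension
        if (codec == null) {
            System.err.println("No codec found for " + input);
            return;
        }

        // Write the decompressed data next to the input, minus the codec's extension.
        String outName = CompressionCodecFactory.removeSuffix(
                input.toString(), codec.getDefaultExtension());
        InputStream in = codec.createInputStream(fs.open(input));
        OutputStream out = fs.create(new Path(outName));
        try {
            // copyBytes uses io.file.buffer.size-sized buffers from the configuration.
            IOUtils.copyBytes(in, out, conf);
        } finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
        }
    }
}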