IP:192.168.1.106 

    HOSTNAME:master

For adding the user and group, see:

    http://472053211.blog.51cto.com/3692116/1577110

    Group name: hadoops

    User name: hadoop

For installing the JDK, see:

    http://472053211.blog.51cto.com/3692116/1577109

For setting up mutual trust (passwordless SSH), see:

    http://472053211.blog.51cto.com/3692116/1577111

Note: Hadoop is extracted into the /app directory.

        1. Extract the archive

                tar -zxvf hadoop-2.2.0.tar.gz 

        2. Rename the extracted directory

                mv  hadoop-2.2.0  hadoop 

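        3. First set the Hadoop runtime environment variables. Hadoop 1 and Hadoop 2 differ considerably here: in Hadoop 1 the configuration files sit in the conf directory directly under the Hadoop directory, while in Hadoop 2 they are in $HADOOP_HOME/etc/hadoop.

            1) cd /app/hadoop/etc/hadoop. The directory holds many files, but most of them are the same as in Hadoop 1.

            2) vi hadoop-env.sh and change export JAVA_HOME=${JAVA_HOME} to export JAVA_HOME=/app/jdk (my JDK is installed under /app and named jdk).

            3) Save the file with :wq!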
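The same edit can also be made without opening vi. Below is a minimal sketch, assuming the file still contains a line starting with `export JAVA_HOME=`; the helper name `set_java_home` is made up for illustration and is not part of Hadoop.

```shell
# Hypothetical helper (not shipped with Hadoop): point the JAVA_HOME line
# of a hadoop-env.sh file at a given JDK directory.
set_java_home() {
    env_file=$1   # e.g. /app/hadoop/etc/hadoop/hadoop-env.sh
    jdk_dir=$2    # e.g. /app/jdk
    # Keep a backup copy, then rewrite only the "export JAVA_HOME=..." line.
    cp "$env_file" "$env_file.bak" &&
    sed "s|^export JAVA_HOME=.*|export JAVA_HOME=$jdk_dir|" "$env_file.bak" > "$env_file"
}
```

With the paths used in this post, the call would be `set_java_home /app/hadoop/etc/hadoop/hadoop-env.sh /app/jdk`. The sketch assumes the JDK path contains no `|` or `&` characters, since those would be interpreted by sed.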

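        4. Configure core-site.xml. This is Hadoop's core configuration file. I suggest looking up core-default.xml in the source package; opening it shows a large number of settings.

        For now only two of them need to be set: the temporary directory used for the formatted data, and the HDFS access URI. Find hadoop.tmp.dir and fs.defaultFS in core-default.xml, copy them into core-site.xml, and change the content of their value elements.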

        

            

        <configuration>

            <property>
              <name>hadoop.tmp.dir</name>
              <value>/app/hadoop/tmpdata</value>
              <description>A base for other temporary directories.</description>
            </property>

            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://master:49000</value>
              <description>The name of the default file system.  A URI whose
              scheme and authority determine the FileSystem implementation.  The
              uri's scheme determines the config property (fs.SCHEME.impl) naming
              the FileSystem implementation class.  The uri's authority is used to
              determine the host, port, etc. for a filesystem.</description>
            </property>

        </configuration>

            

        

        Finally, save the file.
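After saving, it can be worth reading the values back to catch typos. The following is only a sketch: `get_site_property` is a made-up helper, and it assumes each property is laid out with `<name>` followed by `<value>`, as in this post. It is not a real XML parser.

```shell
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop *-site.xml file.
get_site_property() {
    site_file=$1   # e.g. /app/hadoop/etc/hadoop/core-site.xml
    prop_name=$2   # e.g. fs.defaultFS
    awk -v name="$prop_name" '
        # Note when the <name> element of the wanted property has been seen.
        index($0, "<name>" name "</name>") { found = 1 }
        # The first <value> at or after that point holds the setting.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$site_file"
}
```

For example, `get_site_property /app/hadoop/etc/hadoop/core-site.xml fs.defaultFS` should print hdfs://master:49000 for the configuration in this step. Where the hdfs command is on the PATH, `hdfs getconf -confKey fs.defaultFS` should report the effective value as well.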

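        5. Configure hdfs-site.xml. This is the configuration file for HDFS, one of Hadoop's two core components. Again, look in the unpacked HDFS source files, which include an hdfs-default.xml file.

        Several settings are needed here, mainly the HDFS data storage directories and the number of file replicas.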

        

        

        <configuration>

        <property>
          <name>dfs.namenode.name.dir</name>
          <value>file:///app/hadoop/dfs/name</value>
          <description>Determines where on the local filesystem the DFS name node
              should store the name table(fsimage).  If this is a comma-delimited list
              of directories then the name table is replicated in all of the
              directories, for redundancy.</description>
        </property>

        <property>
          <name>dfs.datanode.data.dir</name>
          <value>file:///app/hadoop/dfs/data</value>
          <description>Determines where on the local filesystem an DFS data node
          should store its blocks.  If this is a comma-delimited
          list of directories, then data will be stored in all named
          directories, typically on different devices.
          Directories that do not exist are ignored.
          </description>
        </property>

        <property>
          <name>dfs.permissions.enabled</name>
          <value>false</value>
          <description>
            If "true", enable permission checking in HDFS.
            If "false", permission checking is turned off,
            but all other behavior is unchanged.
            Switching from one parameter value to the other does not change the mode,
            owner or group of files or directories.
          </description>
        </property>

        <property>
          <name>dfs.replication</name>
          <value>1</value>
          <description>Default block replication.
          The actual number of replications can be specified when the file is created.
          The default is used if replication is not specified in create time.
          </description>
        </property>

        </configuration>

          

        

        Finally, save the file.
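The directories named in dfs.namenode.name.dir and dfs.datanode.data.dir can be created up front so that their ownership is easy to check later. Hadoop may well create them on its own when formatting and starting, so treat this as an optional sketch; `make_hdfs_dirs` is a made-up helper name.

```shell
# Hypothetical helper: create the local directories that hdfs-site.xml
# points at, relative to the Hadoop install directory.
make_hdfs_dirs() {
    hadoop_home=$1   # e.g. /app/hadoop
    # dfs/name matches dfs.namenode.name.dir, dfs/data matches dfs.datanode.data.dir.
    mkdir -p "$hadoop_home/dfs/name" "$hadoop_home/dfs/data"
}
```

With the layout in this post the call would be `make_hdfs_dirs /app/hadoop`.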

        6. Configure mapred-site.xml. This file differs greatly from Hadoop 1.

        

            

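        <configuration>

            <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
              <description>The runtime framework for executing MapReduce jobs.
              Can be one of local, classic or yarn.
              </description>
            </property>

        </configuration>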

              

            

        

        Finally, save the file.
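One thing to watch for: as far as I know, the Hadoop 2.2.0 tarball ships only mapred-site.xml.template in etc/hadoop, so mapred-site.xml may have to be created from it before editing. A minimal sketch under that assumption (`ensure_mapred_site` is a made-up helper name):

```shell
# Hypothetical helper: create mapred-site.xml from the shipped template,
# but only when the site file does not exist yet.
ensure_mapred_site() {
    conf_dir=$1   # e.g. /app/hadoop/etc/hadoop
    if [ ! -f "$conf_dir/mapred-site.xml" ] && [ -f "$conf_dir/mapred-site.xml.template" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    fi
}
```

Calling `ensure_mapred_site /app/hadoop/etc/hadoop` leaves an existing mapred-site.xml untouched, so it is safe to run more than once.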

        7. Configure yarn-site.xml

        

        

            

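        <configuration>

            <property>
                <description>The hostname of the RM.</description>
                <name>yarn.resourcemanager.hostname</name>
                <value>master</value>
            </property>

            <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>master:49100</value>
            </property>

            <property>
                <description>The address of the scheduler interface.</description>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>master:49200</value>
            </property>

            <property>
                <description>The class to use as the resource scheduler.</description>
                <name>yarn.resourcemanager.scheduler.class</name>
                <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
            </property>

            <property>
                <description>The address of the applications manager interface in the RM.</description>
                <name>yarn.resourcemanager.address</name>
                <value>master:49300</value>
            </property>

            <property>
                <description>List of directories to store localized files in. An
                  application's localized file directory will be found in:
                  ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
                  Individual containers' work directories, called container_${contid}, will
                  be subdirectories of this.
                </description>
                <name>yarn.nodemanager.local-dirs</name>
                <value></value>
            </property>

            <property>
                <description>The address of the container manager in the NM.</description>
                <name>yarn.nodemanager.address</name>
                <value>master:49400</value>
            </property>

            <property>
                <description>Amount of physical memory, in MB, that can be allocated
                for containers.</description>
                <name>yarn.nodemanager.resource.memory-mb</name>
                <value>10000</value>
            </property>

            <property>
                <description>Where to aggregate logs to.</description>
                <name>yarn.nodemanager.remote-app-log-dir</name>
                <value>/app/hadoop/logs</value>
            </property>

            <property>
                <description>
                  Where to store container logs. An application's localized log directory
                  will be found in ${yarn.nodemanager.log-dirs}/application_${appid}.
                  Individual containers' log directories will be below this, in directories
                  named container_{$contid}. Each container directory will contain the files
                  stderr, stdin, and syslog generated by that container.
                </description>
                <name>yarn.nodemanager.log-dirs</name>
                <value>/app/hadoop/logs/userlogs</value>
            </property>

            <property>
                <description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
            </property>

        </configuration>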

                

              

        

        Save the file.

        

        8. Edit the cluster file slaves: change localhost to master, then save the file.

        

        9. Change the owner of the HADOOP_HOME directory to the hadoop user

                chown -R hadoop:hadoops hadoop

        10. Format the NameNode

                hadoop namenode -format

        11. Start the cluster

                start-all.sh
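Before opening the browser, the JDK's jps tool gives a quick view of which daemons came up. The sketch below assumes a single-node setup like this one, where all five daemons run on master; `missing_daemons` is a made-up helper that reads jps output on stdin and prints whichever expected daemon is absent.

```shell
# Daemons expected in jps output after start-all.sh on a single node.
EXPECTED_DAEMONS="NameNode DataNode SecondaryNameNode ResourceManager NodeManager"

# Hypothetical helper: read jps output on stdin and print each expected
# daemon that does not appear in it (prints nothing when all are present).
missing_daemons() {
    jps_output=$(cat)
    for daemon in $EXPECTED_DAEMONS; do
        # jps prints "<pid> <MainClass>", so compare the second field exactly.
        printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }' ||
            printf '%s\n' "$daemon"
    done
}
```

Run it as `jps | missing_daemons` as the hadoop user. An empty result suggests every daemon started; any name it prints is a hint to look at that daemon's log under the Hadoop logs directory.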

        12. Open http://192.168.1.106:50070/dfshealth.jsp in a browser

        The Cluster Summary section there has two entries, Live Nodes and Dead Nodes.

        Live Nodes >= 1 means the DataNode started correctly.

        All applications can be viewed at http://192.168.1.106:8088