Hadoop Installation and Environment Setup (Apache version)

  This morning I helped a newcomer set up a Hadoop cluster remotely (1.x, or versions below 0.22), and it left quite an impression on me. So here I am writing down the simplest way to set up Apache Hadoop, as a help for newcomers; I will try to be as detailed as possible. Click to see the Avatorhadoop setup steps.

1. Environment preparation:

  1). Machine preparation: the target machines must be able to ping one another, so virtual machines running on different physical hosts should use "bridged" networking (if you use host-based/NAT networking, turn off the host machine's firewall first; for the specific network setup, search for VMware network configuration or KVM bridged networking; Xen lets you configure the LAN IP manually during installation; if you still get stuck, leave a comment). Turn off each machine's firewall: /etc/init.d/iptables stop; chkconfig iptables off. Change the hostnames; I recommend hadoopservern, where n is the number you assign to the machine, because hostnames containing special characters such as '_' or '.' will cause startup problems. Edit /etc/hosts on every machine and add the IP-to-hostname mappings, for example as sketched below.
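    A minimal sketch of the firewall, hostname, and /etc/hosts setup on each node; the 192.168.1.x addresses and the three-node layout are assumptions, substitute your own IPs and hostnames:

      /etc/init.d/iptables stop          # stop the firewall immediately
      chkconfig iptables off             # keep it off across reboots
      hostname hadoopserver1             # set the hostname (persist it in /etc/sysconfig/network as well)
      echo "192.168.1.101  hadoopserver1" >> /etc/hosts
      echo "192.168.1.102  hadoopserver2" >> /etc/hosts
      echo "192.168.1.103  hadoopserver3" >> /etc/hosts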

  2). Download a stable Hadoop release and unpack it, then configure the Java environment (Java is usually configured in ~/.bash_profile rather than system-wide, out of consideration for the machines' security);
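    As a sketch, the Java settings appended to ~/.bash_profile usually look like the following; the JDK path /usr/java/jdk1.6.0_45 is an assumption, adjust it to wherever you installed the JDK:

      export JAVA_HOME=/usr/java/jdk1.6.0_45   # assumed JDK location
      export PATH=$JAVA_HOME/bin:$PATH
      source ~/.bash_profile                   # reload the profile so the current shell picks it up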

  3). Passwordless SSH. There is a small trick here: on hadoopserver1

    ssh-keygen -t rsa -P ''; press Enter through every prompt

    ssh-copy-id user@host;

    Then copy id_rsa and id_rsa.pub under ~/.ssh/ to the other machines;

    ssh hadoopserver2; run scp -r ~/.ssh/authorized_keys hadoopserver1:~/.ssh/; with that, passwordless login is fully set up and the machines can ssh to one another. Practice and keep learning; most guides online never mention that ssh-copy-id can simplify Hadoop's passwordless setup. The steps are consolidated in the sketch below.
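    Putting the passwordless-SSH steps together, a minimal sketch run as the hadoop user on hadoopserver1; the three hostnames and the user name hadoop are assumptions:

      ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa        # generate a key pair with no passphrase
      for host in hadoopserver1 hadoopserver2 hadoopserver3; do
        ssh-copy-id hadoop@$host                      # append the public key to each node's authorized_keys
      done
      ssh hadoopserver2                               # should now log in without a password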

2. Steps:

  1). On hadoopserver1 (the namenode), edit the following files under conf/ in the unpacked Hadoop directory:

    core-site.xml:

      

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
          <name>fs.default.name</name>
          <value>hdfs://hadoopserver1:9000</value>
        </property>

        <property>
          <name>hadoop.tmp.dir</name>
          <value>/xxx/hadoop-version/tmp</value>
        </property>
</configuration>

 

    hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
          <name>dfs.permissions</name>
          <value>false</value>
        </property>

        <property>
          <name>dfs.replication</name>
          <value>3</value>
        </property>

        <property>
          <name>dfs.name.dir</name>
          <value>/xxx/hadoop-version/name</value>
        </property>

        <property>
          <name>dfs.data.dir</name>
          <value>/xxx/hadoop-version/data</value>
        </property>

        <property>
          <name>dfs.block.size</name>
          <value>670720</value>
        </property>

<!--
        <property>
          <name>dfs.secondary.http.address</name>
          <value>0.0.0.0:60090</value>
          <description>
            The secondary namenode http server address and port.
            If the port is 0 then the server will start on a free port.
          </description>
        </property>

        <property>
          <name>dfs.datanode.address</name>
          <value>0.0.0.0:60010</value>
          <description>
            The address where the datanode server will listen to.
            If the port is 0 then the server will start on a free port.
          </description>
        </property>

        <property>
          <name>dfs.datanode.http.address</name>
          <value>0.0.0.0:60075</value>
          <description>
            The datanode http server address and port.
            If the port is 0 then the server will start on a free port.
          </description>
        </property>

        <property>
          <name>dfs.datanode.ipc.address</name>
          <value>0.0.0.0:60020</value>
          <description>
            The datanode ipc server address and port.
            If the port is 0 then the server will start on a free port.
          </description>
        </property>

        <property>
          <name>dfs.http.address</name>
          <value>0.0.0.0:60070</value>
          <description>
            The address and the base port where the dfs namenode web ui will listen on.
            If the port is 0 then the server will start on a free port.
          </description>
        </property>
-->

        <property>
          <name>dfs.support.append</name>
          <value>true</value>
          <description>Does HDFS allow appends to files?
                       This is currently set to false because there are bugs in the
                       "append code" and it is not supported in any production cluster.
          </description>
        </property>
</configuration>

    mapred-site.xml

      

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
          <name>mapred.job.tracker</name>
          <value>hadoopserver1:9001</value>
        </property>

        <property>
          <name>mapred.tasktracker.map.tasks.maximum</name>
          <value>2</value>
        </property>

        <property>
          <name>mapred.tasktracker.reduce.tasks.maximum</name>
          <value>2</value>
        </property>

<!--
        <property>
          <name>mapred.job.tracker.http.address</name>
          <value>0.0.0.0:50030</value>
          <description>
            The job tracker http server address and port the server will listen on.
            If the port is 0 then the server will start on a free port.
          </description>
        </property>

        <property>
          <name>mapred.task.tracker.http.address</name>
          <value>0.0.0.0:60060</value>
          <description>
            The task tracker http server address and port.
            If the port is 0 then the server will start on a free port.
          </description>
        </property>
-->
</configuration>

    The masters file contains the hostname of the SecondaryNameNode; it tells Hadoop to start the SecondaryNameNode on that machine;

    The slaves file lists the DataNode nodes, one hostname per line, for example:
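    Assuming a three-node layout where hadoopserver2 runs the SecondaryNameNode and all three machines act as DataNodes (the layout itself is an assumption), the two files would look like:

      # conf/masters
      hadoopserver2

      # conf/slaves
      hadoopserver1
      hadoopserver2
      hadoopserver3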

  2). Edit hadoop-env.sh:

    Point JAVA_HOME at your Java installation directory;

    Add a startup option: export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true", which makes sure the daemons bind to IPv4 addresses;
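    The two edits in conf/hadoop-env.sh would then look roughly like this (the JDK path is the same assumed path as above):

      export JAVA_HOME=/usr/java/jdk1.6.0_45
      export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"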

  3). Distribute manually: scp -r <hadoop directory> hadoopserver2...n:/<same prefix directory>/, for example:
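    A small sketch of the manual distribution, assuming the Hadoop directory is /xxx/hadoop-version (the same placeholder used in the configs above) and the other nodes are hadoopserver2 and hadoopserver3:

      for host in hadoopserver2 hadoopserver3; do
        scp -r /xxx/hadoop-version $host:/xxx/        # copy the whole Hadoop directory to the same prefix path
      done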

  4). Start the cluster:

    bin/hadoop namenode -format

    bin/start-all.sh
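    After start-all.sh finishes you can verify the daemons with jps; roughly the following processes should appear (which daemon runs on which host depends on your masters/slaves layout, so this is only a sketch):

      jps
      # hadoopserver1: NameNode, JobTracker (plus DataNode/TaskTracker if it is also listed in slaves)
      # hadoopserver2: SecondaryNameNode, DataNode, TaskTracker
      # hadoopserver3: DataNode, TaskTracker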

  5). Open http://<hadoopserver1's IP>:50070 in a browser to check the cluster status.
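    The same status can also be checked from the command line:

      bin/hadoop dfsadmin -report      # lists live/dead datanodes and their capacity
      bin/hadoop fs -ls /              # quick sanity check that HDFS is answering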
