Running Hadoop in Single-Node Pseudo-Distributed Mode

1. Install Java.

2. Install Hadoop

      Download the package hadoop-0.20.2.tar.gz from http://hadoop.apache.org/common/releases.html and extract it into your home directory (~/):

      $wget http://apache.etoak.com//hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz

      $tar -zxvf hadoop-0.20.2.tar.gz

      $cd hadoop-0.20.2
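
      A quick sanity check after extracting can catch a truncated download or a wrong working directory early. The sketch below assumes the extracted tree contains a bin/hadoop launcher and a conf/ directory, which is what the later steps rely on; the function name check_hadoop_dir is made up for illustration and is not part of Hadoop.

```shell
# Hypothetical helper: verify that a directory looks like an extracted
# Hadoop tree (a bin/hadoop launcher plus a conf/ directory).
check_hadoop_dir() {
    dir=$1
    [ -f "$dir/bin/hadoop" ] || { echo "missing $dir/bin/hadoop" >&2; return 1; }
    [ -d "$dir/conf" ]       || { echo "missing $dir/conf" >&2; return 1; }
    echo "ok: $dir"
}

# Usage: check_hadoop_dir ~/hadoop-0.20.2
```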

3. Configure the environment

      $cd conf

      $vim hadoop-env.sh

       Add:

       export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk
       export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
       export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"
       export HADOOP_LOG_DIR="/home/maxi/hadoop-0.20.2/logs"

       export HADOOP_HOME=/home/maxi/hadoop-0.20.2
       export PATH=$PATH:/home/maxi/hadoop-0.20.2/bin

       :wq
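
       The daemons will not start unless JAVA_HOME points at a working Java installation, so it is worth checking the path before saving it into hadoop-env.sh. A minimal sketch, assuming the OpenJDK path used above (adjust it for your system); check_java_home is an illustrative name, not something Hadoop provides.

```shell
# Hypothetical helper: confirm that a candidate JAVA_HOME contains an
# executable bin/java before writing it into conf/hadoop-env.sh.
check_java_home() {
    if [ -x "$1/bin/java" ]; then
        echo "JAVA_HOME looks valid: $1"
    else
        echo "no executable java under $1/bin" >&2
        return 1
    fi
}

# Usage: check_java_home /usr/lib/jvm/java-1.6.0-openjdk
```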

       $vim core-site.xml

       <configuration>
       <property>
          <name>fs.default.name</name> 
          <value>hdfs://localhost:9000</value>
          <description>URI of HDFS, in the form filesystem://namenode-identifier:port</description>
      </property>
      <property>
          <name>hadoop.tmp.dir</name>
          <value>/home/maxi/hadoop-0.20.2/hadooptmp</value>
          <description>Local Hadoop temporary directory on the namenode</description>
      </property>                                                                                
      </configuration>

      :wq

      $vim mapred-site.xml

      Add:

        <configuration>                                                                 
        <property>                                                                  
           <name>mapred.job.tracker</name>                                         
           <value>localhost:9001</value>                                           
          <description>JobTracker identifier:port; this is not a URI</description>
      </property>                                                                 
      <property>                                                                  
          <name>mapred.local.dir</name>                                           
          <value>/home/maxi/hadoop-0.20.2/mapred/local</value>                    
          <description>Local directory used by the TaskTracker while running MapReduce programs</description>
      </property>                                                                 
      <property>                                                                  
          <name>mapred.system.dir</name>                                          
          <value>/home/maxi/hadoop-0.20.2/tmp/hadoop/mapred/system</value>        
          <description>A directory in HDFS that stores the shared files used while running MapReduce programs</description>
      </property>                                                                  

     </configuration>

      :wq

      $vim hdfs-site.xml

      Add:

      <configuration>
      <property> 
        <name>dfs.name.dir</name>
        <value>/home/maxi/hadoop-0.20.2/hdfs/name</value>
        <description>Where the namenode stores the HDFS namespace metadata</description>
       </property>
               
     <property> 
        <name>dfs.data.dir</name>
        <value>/home/maxi/hadoop-0.20.2/hdfs/data</value>
        <description>Physical storage location of data blocks on the datanode</description>
     </property>
               
    <property> 
        <name>dfs.replication</name>
        <value>1</value>
        <description>Number of replicas; defaults to 3 if not configured; should not exceed the number of datanode machines</description>
     </property>
     </configuration>

       :wq
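
       After editing the three *-site.xml files, it can help to read a value back and confirm the file says what you intended. The sketch below is a line-based reader, not a real XML parser: it assumes each name and value element sits on its own line, as in the files above. The function name get_prop is hypothetical.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop *-site.xml file. Assumes <name> and <value> are on separate
# lines; this is a line-based sketch, not an XML parser.
get_prop() {
    file=$1
    name=$2
    awk -v want="$name" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n); sub(/<\/name>.*/, "", n) }
        /<value>/ && n == want {
            v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
            print v; exit
        }
    ' "$file"
}

# Usage: get_prop conf/core-site.xml fs.default.name
```

       With the core-site.xml shown earlier, the usage line should print hdfs://localhost:9000.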

4. Configure the local network:

       Add to /etc/hosts:

      127.0.0.1 localhost
      127.0.0.1 master
      127.0.0.1 slave

      $sudo ufw disable   # turn off the firewall; otherwise there will be problems
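
      To confirm that the host names from this step really made it into /etc/hosts, a small check like the following may be useful. It is only a sketch: hosts_has is a made-up name, and the function simply looks for each name among the non-comment fields of a hosts-format file.

```shell
# Hypothetical helper: succeed only if every given host name appears as a
# name (not as the address in column 1) on some non-comment line of a
# hosts-format file.
hosts_has() {
    file=$1
    shift
    for name in "$@"; do
        awk -v want="$name" '
            /^[[:space:]]*#/ { next }
            {
                for (i = 2; i <= NF; i++) {
                    if ($i ~ /^#/) break        # rest of the line is a comment
                    if ($i == want) found = 1
                }
            }
            END { exit !found }
        ' "$file" || { echo "missing: $name" >&2; return 1; }
    done
    echo "all names present"
}

# Usage: hosts_has /etc/hosts localhost master slave
```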

5. Install the SSH service:

       $sudo apt-get install openssh-server

      Set up SSH so that logging in does not require typing a password:
       $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
        $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
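
        If key-based login still prompts for a password, overly permissive file modes are a common cause: with its default StrictModes setting, sshd ignores key files that other users could modify. The sketch below tightens the modes; fix_ssh_perms is a hypothetical name, and the directory defaults to ~/.ssh.

```shell
# Hypothetical helper: tighten permissions on an SSH directory so that
# sshd (with its default StrictModes setting) will accept the key file.
fix_ssh_perms() {
    dir=${1:-$HOME/.ssh}
    chmod 700 "$dir"
    chmod 600 "$dir/authorized_keys"
}

# Usage:
#   fix_ssh_perms
#   ssh -o BatchMode=yes localhost true && echo "passwordless login works"
```

        BatchMode=yes makes ssh fail instead of prompting, so the second usage line only succeeds when no password is needed. It will also fail if localhost's host key has not been accepted yet, so run a plain ssh localhost once beforehand.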

6. To run the scripts in Hadoop's bin directory directly, without typing their path each time, edit .bash_profile

        Add: PATH=$PATH:/home/maxi/hadoop-0.20.2/bin/
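
        Appending the same line on every re-run of a setup script makes PATH grow with duplicates. One way to keep the edit idempotent is sketched below; add_path_line is a made-up helper, and it writes an export line rather than a bare assignment.

```shell
# Hypothetical helper: append a PATH export line to a profile file only
# if that exact line is not already there.
add_path_line() {
    profile=$1
    dir=$2
    line="export PATH=\$PATH:$dir"
    grep -qxF "$line" "$profile" 2>/dev/null || echo "$line" >> "$profile"
}

# Usage: add_path_line ~/.bash_profile /home/maxi/hadoop-0.20.2/bin
```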

7. Configure the Java environment

        Add to .bash_profile and /etc/profile:

          export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk
          export JRE_HOME=/usr/lib/jvm/java-1.6.0-openjdk/jre
          export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
          export PATH=$PATH:/usr/lib/jvm/java-1.6.0-openjdk/bin:$JAVA_HOME/jre/bin

8. Run the system

       a) Format a new distributed file system

       $ cd hadoop-0.20.2
       $ hadoop namenode -format

       b) Start the Hadoop processes.

       $ start-all.sh
       The console output should show that the namenode, datanode, secondary namenode, jobtracker, and tasktracker have been started. Once startup has finished, ps -ef should show 5 new java processes.
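
       Counting java processes by eye is error-prone. If the JDK's jps tool is on the PATH, its output can be checked against the five expected daemon names. The sketch below assumes jps prints one "pid ClassName" pair per line and that the daemons appear as NameNode, DataNode, SecondaryNameNode, JobTracker, and TaskTracker; missing_daemons is an illustrative name.

```shell
# Hypothetical helper: read `jps` output on stdin and print which of the
# five daemons are absent. Compares the class-name column exactly, so
# "NameNode" is not satisfied by "SecondaryNameNode".
missing_daemons() {
    awk '
        { seen[$2] = 1 }
        END {
            n = split("NameNode DataNode SecondaryNameNode JobTracker TaskTracker", want, " ")
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) print want[i]
        }
    '
}

# Usage: jps | missing_daemons    # prints nothing when all five are running
```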
  
    

      c) Run the wordcount application
       $ cd hadoop-0.20.2
       $ mkdir test
       $ cd test
       # Create two text files in the test directory; the WordCount program will count how often each word occurs in them
       $ echo "hello world, bye , world." >file1.txt
       $ echo "hello hadoop, goodbye , hadoop" >file2.txt
       $ cd ..
       # Copy the local ./test directory into HDFS under the name input (a relative HDFS path is resolved against the user's HDFS home directory)
       $ bin/hadoop dfs -put ./test input
       # Run the WordCount example
       $ bin/hadoop jar hadoop-0.20.2-examples.jar wordcount input output
       # View the results:
       # Copy the files from HDFS to the local file system, then view them:
       $ bin/hadoop dfs -get output output
       $ cat output/*
       # Or view them directly in HDFS
       $ bin/hadoop dfs -cat output/*
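
       To judge whether the job output is plausible, the same count can be reproduced locally with standard tools. This is only a stand-in, not the Hadoop job itself: it assumes the bundled example splits lines on whitespace only, so punctuation stays attached to words. local_wordcount is a made-up name.

```shell
# Local stand-in for the WordCount example: split on whitespace only, so
# punctuation stays attached ("world," and "world." count separately).
# Prints "word<TAB>count" lines sorted in byte order.
local_wordcount() {
    cat "$@" | tr -s '[:space:]' '\n' | sed '/^$/d' | LC_ALL=C sort | uniq -c |
        awk '{ print $2 "\t" $1 }'
}

# Usage: local_wordcount test/file1.txt test/file2.txt
```

       For the two sample files above this yields eight distinct tokens, with hello and the lone comma each counted twice.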
      

        d) $ bin/stop-all.sh   # stop the Hadoop processes
        e) Stop the SSH server
           $ sudo /etc/init.d/ssh stop

      
