Setting Up a Hadoop Pseudo-Distributed Environment


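This post walks through setting up a Hadoop pseudo-distributed environment. All operations below are performed as the root user.

I. Software Environment

1. VM: VMware-workstation-v7.1.4

2. OS: ubuntu-11.04

3. JDK: jdk1.6.0_27

4. Hadoop: hadoop-0.20.2

5. ssh

II. Installing the JDK

1. Download the JDK installer, jdk-6u27-linux-i586.bin, and place it in the directory where the JDK will be installed (/usr/java in this walkthrough).

2. Run the self-extracting installer. If the file is not executable, run chmod +x on it first: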

root@ubuntu:/usr/java# ./jdk-6u27-linux-i586.bin

3. Configure the environment variables

Open /etc/profile with the following command:

root@ubuntu:/# vim /etc/profile

Append the following lines to the end of the file:

export JAVA_HOME=/usr/java/jdk1.6.0_27
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

Save and close the file, then run source so the changes take effect in the current shell:

root@ubuntu:~# source /etc/profile
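Editing /etc/profile by hand is fine for a one-off setup, but repeating the step would add the same export lines twice. As a minimal sketch, assuming you would rather script the step, the helper below (append_once is a hypothetical name, not part of any tool used here) appends a line only when that exact line is not already in the file.

```shell
#!/usr/bin/env bash
# append_once is a hypothetical helper for this walkthrough.
# It appends LINE to FILE only if FILE does not already contain
# that exact line, so re-running the setup adds no duplicates.
append_once() {
    local line=$1 file=$2
    # -x: match the whole line, -F: treat the text literally (no regex)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example calls (run as root), using the values from this post:
#   append_once 'export JAVA_HOME=/usr/java/jdk1.6.0_27' /etc/profile
#   append_once 'export PATH=$JAVA_HOME/bin:$PATH' /etc/profile
```

The single quotes in the example calls matter: they keep $JAVA_HOME and $PATH unexpanded, so the literal text lands in /etc/profile and is evaluated at login.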

4. Test the JDK

root@ubuntu:~# java -version
java version "1.6.0_27"
Java(TM) SE Runtime Environment (build 1.6.0_27-b07)
Java HotSpot(TM) Client VM (build 20.2-b06, mixed mode, sharing)

III. Installing and Configuring ssh

1. Check whether ssh is installed

root@ubuntu:~# dpkg --list | grep ssh

By default, usually only openssh-client is installed, not openssh-server. Install the server with:

root@ubuntu:~# apt-get install openssh-server

2. Check whether ssh is running

# Check whether an sshd process exists
root@ubuntu:~# ps -ef | grep ssh

If it is not running, start it with:

root@ubuntu:~# /etc/init.d/ssh start
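One catch with ps -ef | grep ssh is that the grep process itself shows up in the output, so the listing is never empty. The sketch below uses pgrep -x, which matches on the exact process name, and starts the service only when nothing is found. ensure_running is a hypothetical helper name, and the sketch assumes pgrep (from procps) is installed.

```shell
#!/usr/bin/env bash
# ensure_running is a hypothetical helper for this walkthrough.
# $1 is the process name to look for; the remaining arguments are
# the command that starts it when no such process is found.
ensure_running() {
    local name=$1
    shift
    if pgrep -x "$name" >/dev/null 2>&1; then
        echo "$name is already running"
    else
        echo "$name is not running, starting it"
        "$@"
    fi
}

# Example call (run as root), using the init script from this post:
#   ensure_running sshd /etc/init.d/ssh start
```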

3. Configure passwordless ssh

root@ubuntu:~# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
##  -P sets the passphrase; -P '' means an empty passphrase. Without -P, ssh-keygen prompts for the passphrase interactively; with -P '' it does not.
Generating public/private dsa key pair.
Your identification has been saved in .ssh/id_dsa.
Your public key has been saved in .ssh/id_dsa.pub.
Output omitted...
root@ubuntu:~# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
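If ssh localhost still asks for a password after this step, file permissions are a common cause: with sshd's StrictModes option (on by default), keys can be ignored when ~/.ssh or authorized_keys is writable by other users. The sketch below tightens both. fix_ssh_perms is a hypothetical helper name, and the step may be unnecessary when the files were created with a restrictive umask.

```shell
#!/usr/bin/env bash
# fix_ssh_perms is a hypothetical helper for this walkthrough.
# It restricts the ssh directory to its owner and makes
# authorized_keys readable and writable by the owner only.
fix_ssh_perms() {
    local dir=${1:-$HOME/.ssh}
    chmod 700 "$dir"
    chmod 600 "$dir/authorized_keys"
}

# Example call for the current user:
#   fix_ssh_perms
```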

4. Verify that you can ssh to localhost without a password

root@ubuntu:~# ssh localhost
Welcome to Ubuntu 11.04 (GNU/Linux 2.6.38-8-generic i686)
*Documentation:  https://help.ubuntu.com/
225 packages can be updated.
75 updates are security updates.
Last login: Tue Sep 27 03:00:30 2011 from ip6-localhost

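IV. Installing and Configuring Hadoop

1. Download the earlier stable release, hadoop-0.20.2.tar.gz, and copy it to the directory where you intend to install it.

2. Change to the installation directory and extract the archive.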
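The extraction step has no command shown, so here is a minimal sketch of it. It assumes the installation directory is /usr/hadoop, which is inferred from the shell prompts later in this post; unpack is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# unpack is a hypothetical helper for this walkthrough.
# It extracts a gzip-compressed tar archive ($1) into a
# destination directory ($2), creating the directory if needed.
unpack() {
    local archive=$1 dest=$2
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest"
}

# Example call (run as root); /usr/hadoop is an assumption:
#   unpack hadoop-0.20.2.tar.gz /usr/hadoop
```

After the call, the Hadoop files would sit under /usr/hadoop/hadoop-0.20.2, the directory the later commands are run from.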

3. Configuration

conf/hadoop-env.sh:

Uncomment the JAVA_HOME line and set it as follows:

export JAVA_HOME=/usr/java/jdk1.6.0_27

Adjust the other settings as needed.

conf/core-site.xml:

<configuration>
<property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-test1:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
</configuration>

conf/mapred-site.xml:

<configuration>
<property>
    <name>mapred.job.tracker</name>
    <value>hadoop-test1:9001</value>
</property>
</configuration>

conf/hdfs-site.xml:

<configuration>
<property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
    <description>Determines where on the local filesystem the DFS name node stores the name table (fsimage).</description>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/dfs/data</value>
    <description>Determines where on the local filesystem a DFS data node stores its blocks.</description>
</property>
<property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication, used when no replication factor is specified at file creation.</description>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
</property>
</configuration>
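Note that core-site.xml and mapred-site.xml above name the host hadoop-test1, while the shell prompts in this post show a machine called ubuntu and the web UIs are reached via localhost. For the daemons to find each other, hadoop-test1 has to resolve to this machine, typically through an /etc/hosts entry; using localhost in both files is the usual alternative. A minimal sketch of a check, assuming a Linux system with getent available (resolves is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# resolves is a hypothetical helper for this walkthrough.
# It succeeds when the given host name can be resolved through the
# system's normal lookup path (/etc/hosts, DNS, ...).
resolves() {
    getent hosts "$1" >/dev/null
}

# Example check before formatting HDFS:
#   resolves hadoop-test1 || echo "hadoop-test1 does not resolve; add it to /etc/hosts"
```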

4. Running Hadoop

Format HDFS:

root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/hadoop namenode -format

Start the Hadoop daemons:

root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/start-all.sh
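Before opening the web pages, it can help to confirm that every daemon actually came up. In a Hadoop 0.20.x pseudo-distributed setup, start-all.sh is expected to launch five of them: NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker. The sketch below reads the output of the JDK's jps tool and prints whichever of these is missing; missing_daemons is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Daemons that start-all.sh is expected to launch in Hadoop 0.20.x.
EXPECTED="NameNode DataNode SecondaryNameNode JobTracker TaskTracker"

# missing_daemons is a hypothetical helper for this walkthrough.
# It reads jps output ("<pid> <ClassName>" per line) on stdin and
# prints the name of every expected daemon that is not listed.
missing_daemons() {
    local running name
    running=$(awk '{print $2}')
    for name in $EXPECTED; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode
        printf '%s\n' "$running" | grep -qx "$name" || echo "$name"
    done
}

# Example call (requires jps from the JDK on PATH):
#   jps | missing_daemons
```

An empty result means all five are running. If a name is printed, the matching log file under the logs directory of the Hadoop installation is the first place to look.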

Check Hadoop's status in a browser:

NameNode - http://localhost:50070/

JobTracker - http://localhost:50030/

Copy local files into the input directory on HDFS:

root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/hadoop fs -put conf input

Run one of the examples shipped with Hadoop:

root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+'
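The grep example extracts every string in the input files that matches the regular expression 'dfs[a-z.]+' and counts how often each one occurs. As a rough cross-check, the same counting can be done on the local conf directory with standard tools. This is a sketch of an equivalent, not how the Hadoop job is implemented, and local_grep_count is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# local_grep_count is a hypothetical helper for this walkthrough.
# $1 is an extended regular expression; the remaining arguments are
# files. It prints each distinct match with its number of
# occurrences, most frequent first.
local_grep_count() {
    local pattern=$1
    shift
    # -o: print only the matched text, -h: omit file names, -E: extended regex
    grep -ohE -- "$pattern" "$@" | sort | uniq -c | sort -rn
}

# Example call from the Hadoop installation directory:
#   local_grep_count 'dfs[a-z.]+' conf/*
```

The counts should agree with the job's output as long as the local conf directory is the same one that was copied into HDFS.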

List the output files on DFS:

root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/hadoop fs -ls output

Copy the output from DFS to the local filesystem and view it there:

root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/hadoop fs -get output output

root@ubuntu:/usr/hadoop/hadoop-0.20.2# cat output/*

Or view the files directly on DFS:

root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/hadoop fs -cat output/*

Stop the Hadoop daemons:

root@ubuntu:/usr/hadoop/hadoop-0.20.2# bin/stop-all.sh

V. Other Resources

    Hadoop downloads:

    http://hadoop.apache.org/hdfs/releases.html

    For the detailed Hadoop documentation, see:

    http://hadoop.apache.org/common/docs/stable/

 
