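This post walks through a basic Hadoop setup. Hadoop releases move quickly, so some configuration details will differ between versions; consult the official documentation for the release you install. Installing Hadoop has a few prerequisites: Sun Java 6 (a later version also works; I have not tried OpenJDK), a dedicated hadoop system user, and SSH (here meaning an OpenSSH server, which Hadoop uses to run remote operations across nodes).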

1. Installing the Sun JDK on Linux. The steps are as follows:

Reference: http://www.devsniper.com/ubuntu-12-04-install-sun-jdk-6-7/

  • Download the Sun JDK 6 .bin installer (jdk-6u32-linux-x64.bin in this example)
  • Make sure the file is executable
chmod +x jdk-6u32-linux-x64.bin
  • Run the .bin file
./jdk-6u32-linux-x64.bin
  • Move the extracted directory to its target location (creating /usr/lib/jvm first if it does not exist)
sudo mkdir -p /usr/lib/jvm
sudo mv jdk1.6.0_32 /usr/lib/jvm/
  • Register the new Java with the system's alternatives mechanism
sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jvm/jdk1.6.0_32/bin/javac 1
sudo update-alternatives --install /usr/bin/java java /usr/lib/jvm/jdk1.6.0_32/bin/java 1
sudo update-alternatives --install /usr/bin/javaws javaws /usr/lib/jvm/jdk1.6.0_32/bin/javaws 1
  • If more than one Java version is installed, choose the system default
sudo update-alternatives --config javac
sudo update-alternatives --config java
sudo update-alternatives --config javaws
  • Verify the Java version
java -version
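
To double-check which JDK directory the java command actually resolves to after the update-alternatives steps, something like the following can help. It is a minimal sketch that assumes GNU readlink (with the -f option) is available, as it is on Ubuntu; the function name jdk_home_of is made up for this example.

```shell
# Hypothetical helper (not part of the JDK or Hadoop): print the JDK
# directory that a given java executable, possibly a symlink, belongs to.
jdk_home_of() {
    local java_bin
    # Follow the alternatives symlink chain to the real binary.
    java_bin=$(readlink -f "$1") || return 1
    # The real binary lives in <jdk>/bin/java, so strip two path components.
    dirname "$(dirname "$java_bin")"
}
```

Calling jdk_home_of "$(command -v java)" should print /usr/lib/jvm/jdk1.6.0_32 if the alternatives registered above are the ones selected.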

 

2. Adding a Hadoop system user

We need a dedicated hadoop user to run Hadoop.

 

$ sudo addgroup hadoop   # create the group
$ sudo adduser --ingroup hadoop hduser   # create the user inside that group

 

3. Configuring SSH

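The role of SSH was described above, so we go straight to configuring it. Note: to avoid typing a password on every remote login, the key is usually generated with an empty passphrase (the -P "" option below).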

 

user@ubuntu:~$ su - hduser
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[...snipp...]
hduser@ubuntu:~$

 

Next, SSH has to be allowed to use the newly generated key. Do the following:

 

hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
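
If key-based login still asks for a password, overly permissive file modes are a common cause: with sshd's default StrictModes setting, keys are ignored when the files involved are writable by other users. The sketch below tightens the modes. The function name fix_ssh_perms is invented for illustration, and the default directory is assumed to be $HOME/.ssh.

```shell
# Hypothetical helper: restrict an SSH directory and its authorized_keys
# file to the owning user. Defaults to $HOME/.ssh when no path is given.
fix_ssh_perms() {
    local ssh_dir=${1:-$HOME/.ssh}
    # Only the owner may read, write, or enter the directory.
    chmod 700 "$ssh_dir" || return 1
    # Only the owner may read or write the list of accepted public keys.
    if [ -f "$ssh_dir/authorized_keys" ]; then
        chmod 600 "$ssh_dir/authorized_keys"
    fi
}
```

Run it as hduser with no arguments (fix_ssh_perms) to apply it to that user's own .ssh directory.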

 

Finally, test that connecting to the local machine works:

 

hduser@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS
[...snipp...]
hduser@ubuntu:~$

 

If you see output like the above, the SSH setup succeeded.

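4. Installing Hadoop. Download a Hadoop release from the official Apache site. The sample log output later in this post was captured on version 0.20.2, while the commands below use the hadoop-1.0.3 tarball; substitute the file name of the release you actually downloaded.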

After downloading (with the tarball placed in /usr/local), run:

 

$ cd /usr/local
$ sudo tar xzf hadoop-1.0.3.tar.gz
$ sudo mv hadoop-1.0.3 hadoop
$ sudo chown -R hduser:hadoop hadoop

 

Update $HOME/.bashrc by appending the following to the end of the file:

 

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_32

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat "$1" | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin

 

Now configure Hadoop's own files.

Start with /usr/local/hadoop/conf/hadoop-env.sh.

Uncomment the JAVA_HOME line and change ${JAVA_HOME} to your JDK installation path:

 

# The java implementation to use.  Required.
# export JAVA_HOME=${JAVA_HOME}

to

# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/jdk1.6.0_32
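
The same edit can be scripted instead of made by hand. The following is a rough sketch, assuming GNU sed (for the -i and -E options); the function name set_java_home is hypothetical, and the JDK path you pass in should be wherever your JDK actually lives.

```shell
# Hypothetical helper: point the JAVA_HOME line of a hadoop-env.sh style
# file at the given JDK directory, uncommenting the line if necessary.
set_java_home() {
    local env_file=$1 jdk_dir=$2
    # '|' is used as the sed delimiter so slashes in the path need no
    # escaping; the path itself must not contain '|' or '&'.
    sed -i -E "s|^#? *export JAVA_HOME=.*|export JAVA_HOME=${jdk_dir}|" "$env_file"
}
```

For the layout used in this post, the call would be set_java_home /usr/local/hadoop/conf/hadoop-env.sh /usr/lib/jvm/jdk1.6.0_32, the directory the JDK was moved to in step 1.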

 

Next, edit conf/core-site.xml, adding the following properties inside its <configuration> element:

 



<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
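
The hadoop.tmp.dir value points at /app/hadoop/tmp. That directory has to be writable by the user running Hadoop, and since hduser normally cannot create directories directly under /, it is usually prepared up front. One possible way to do that is sketched below; the function name prepare_hadoop_tmp and the SUDO variable are made up for this example, and the 750 mode is a suggestion rather than a Hadoop requirement.

```shell
# Hypothetical helper: create DIR, optionally hand it to OWNER (user:group),
# and restrict access. If the SUDO variable is set (e.g. SUDO=sudo), every
# command is run through it; otherwise the commands run as the current user.
prepare_hadoop_tmp() {
    local dir=$1 owner=${2:-}
    ${SUDO:-} mkdir -p "$dir" || return 1
    if [ -n "$owner" ]; then
        ${SUDO:-} chown "$owner" "$dir" || return 1
    fi
    # Owner: full access; group: read and enter; others: nothing.
    ${SUDO:-} chmod 750 "$dir"
}
```

With the user and group created earlier, the call would be SUDO=sudo prepare_hadoop_tmp /app/hadoop/tmp hduser:hadoop.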

 

Then conf/mapred-site.xml:

 



<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>
  

 

And finally conf/hdfs-site.xml:

 



<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>
  

 

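Before starting Hadoop, the HDFS filesystem has to be formatted. Run the following command (only once, since formatting erases any data already in HDFS):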

 

hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format

 

 

10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = ubuntu/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/127.0.1.1
************************************************************/
hduser@ubuntu:/usr/local/hadoop$

 

Start the single-node cluster:

 

hduser@ubuntu:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-ubuntu.out
hduser@ubuntu:/usr/local/hadoop$
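
The five daemons named in the start-up output can also be checked with jps, the Java process lister that ships with the JDK. Below is a rough sketch that reads jps output on standard input and prints whichever of the expected daemons is missing. The function name missing_daemons is made up here, and the daemon list is taken from the start-all.sh output above.

```shell
# Hypothetical helper: read `jps` output ("<pid> <main class>" per line)
# on stdin and print the name of each expected daemon that is absent.
missing_daemons() {
    local jps_output daemon
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare the second column exactly, so that "NameNode" does not
        # match the "SecondaryNameNode" line.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            printf '%s\n' "$daemon"
        fi
    done
}
```

Running jps | missing_daemons as hduser prints nothing when all five daemons are up.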

 

You can use the following command to see which ports Hadoop is listening on:

 

hduser@ubuntu:~$ sudo netstat -plten | grep java
tcp   0  0 0.0.0.0:50070   0.0.0.0:*  LISTEN  1001  9236  2471/java
tcp   0  0 0.0.0.0:50010   0.0.0.0:*  LISTEN  1001  9998  2628/java
tcp   0  0 0.0.0.0:48159   0.0.0.0:*  LISTEN  1001  8496  2628/java
tcp   0  0 0.0.0.0:53121   0.0.0.0:*  LISTEN  1001  9228  2857/java
tcp   0  0 127.0.0.1:54310 0.0.0.0:*  LISTEN  1001  8143  2471/java
tcp   0  0 127.0.0.1:54311 0.0.0.0:*  LISTEN  1001  9230  2857/java
tcp   0  0 0.0.0.0:59305   0.0.0.0:*  LISTEN  1001  8141  2471/java
tcp   0  0 0.0.0.0:50060   0.0.0.0:*  LISTEN  1001  9857  3005/java
tcp   0  0 0.0.0.0:49900   0.0.0.0:*  LISTEN  1001  9037  2785/java
tcp   0  0 0.0.0.0:50030   0.0.0.0:*  LISTEN  1001  9773  2857/java
hduser@ubuntu:~$
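
In this listing, 54310 and 54311 are the ports set in core-site.xml and mapred-site.xml, while 50070 and 50030 are the default NameNode and JobTracker web interface ports in Hadoop 0.20/1.x. To check for one specific port rather than reading the table by eye, a small filter along these lines may be enough. It is only a sketch: the name listening_on is invented, and it assumes the column layout of netstat -plten shown above.

```shell
# Hypothetical helper: read `netstat -plten` style output on stdin and
# succeed (exit 0) if some socket is in LISTEN state on the given port.
listening_on() {
    local port=$1
    # Column 4 is the local address, column 6 the state. The port is the
    # part after the last ':' (this also copes with IPv6 addresses).
    awk -v p="$port" '
        $6 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) found = 1 }
        END { exit !found }'
}
```

For example, sudo netstat -plten | listening_on 50070 && echo "NameNode web UI is up".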

 

Stop the single-node cluster:

 

hduser@ubuntu:/usr/local/hadoop$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
hduser@ubuntu:/usr/local/hadoop$