First, the local environment:
test@test-VirtualBox:~/hadoop-1.2.1$ uname -a
Linux test-VirtualBox 3.11.0-20-generic #34~precise1-Ubuntu SMP Thu Apr 3 17:26:42 UTC 2014 i686 i686 i386 GNU/Linux
Since Hadoop is a Java project, it needs a JVM; for setting up the Java environment, please see my other post on that topic.
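Assuming the JDK from that post is installed (I use /home/test/jdk1.7.0_45 throughout; adjust the path to wherever your JDK actually lives), a quick sanity check is to ask it for its version:
test@test-VirtualBox:~$ /home/test/jdk1.7.0_45/bin/java -version
It should print a 1.7.x version string; if the command is not found, fix the JDK path before going any further.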
Now, assuming the Java environment is already in place, let's start setting up Hadoop.
I'm using hadoop-1.2.1 here; previously I used Hadoop 2.3. As you probably know, these are two quite different version lines, and I already wrote a post about the 2.3 setup, so have a look at that when you have time.
Let me point out one practical issue: the 1.x line is a bit easier to get started with than 2.x. The 2.x release doesn't ship some of the core packages we need for our tests, and although people have compiled a plugin for 2.x, it reportedly still has quite a few bugs; 1.x avoids all of that.
First, unpack the tarball and copy the directory into the user's home directory (personal preference).
test@test-VirtualBox:~/Documents/archive$ tar -xvf hadoop-1.2.1.tar.gz
test@test-VirtualBox:~/Documents/archive$ cp -r hadoop-1.2.1 /home/test/
test@test-VirtualBox:~$ chmod 755 -R hadoop-1.2.1/
Now the real configuration work begins.
The configuration files are under hadoop_home/conf:
test@test-VirtualBox:~/hadoop-1.2.1/conf$ ls -al
total 88
drwxrwxr-x  2 test test 4096 Jan 13 19:00 .
drwxrwxr-x 16 test test 4096 Jan 13 19:38 ..
-rwxrwxr-x  1 test test 7457 Jan 13 19:00 capacity-scheduler.xml
-rwxrwxr-x  1 test test 1095 Jan 13 19:00 configuration.xsl
-rwxrwxr-x  1 test test  374 Jan 13 20:40 core-site.xml
-rwxrwxr-x  1 test test  327 Jan 13 19:00 fair-scheduler.xml
-rwxrwxr-x  1 test test 2476 Jan 13 19:19 hadoop-env.sh
-rwxrwxr-x  1 test test 2052 Jan 13 19:00 hadoop-metrics2.properties
-rwxrwxr-x  1 test test 4644 Jan 13 19:00 hadoop-policy.xml
-rwxrwxr-x  1 test test  246 Jan 13 20:13 hdfs-site.xml
-rwxrwxr-x  1 test test 5018 Jan 13 19:00 log4j.properties
-rwxrwxr-x  1 test test 2033 Jan 13 19:00 mapred-queue-acls.xml
-rwxrwxr-x  1 test test  262 Jan 13 20:28 mapred-site.xml
-rwxrwxr-x  1 test test   10 Jan 13 19:00 masters
-rwxrwxr-x  1 test test   10 Jan 13 19:00 slaves
-rwxrwxr-x  1 test test 2042 Jan 13 19:00 ssl-client.xml.example
-rwxrwxr-x  1 test test 1994 Jan 13 19:00 ssl-server.xml.example
-rwxrwxr-x  1 test test  382 Jan 13 19:00 taskcontroller.cfg
-rwxrwxr-x  1 test test 3890 Jan 13 19:00 task-log4j.properties
The files we need to modify are the following.
hadoop-env.sh
Here we set JAVA_HOME.
test@test-VirtualBox:~/hadoop-1.2.1/conf$ cat hadoop-env.sh
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
export JAVA_HOME=/home/test/jdk1.7.0_45

# Extra Java CLASSPATH elements.  Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000

# Extra Java runtime options.  Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options.  Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored.  $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from.  Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by
#       the users that are going to run the hadoop daemons.  Otherwise there is
#       the potential for a symlink attack.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HADOOP_NICENESS=10
core-site.xml
test@test-VirtualBox:~/hadoop-1.2.1/conf$ cat core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/test/hadoop_tmp</value>
  </property>
</configuration>
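One note on hadoop.tmp.dir: I point it at /home/test/hadoop_tmp. Hadoop creates the directories it needs under that path when the namenode is formatted, but if you want to be sure the location exists and is writable by the current user, you can create it yourself first (just a precaution, not a required step):
test@test-VirtualBox:~$ mkdir -p /home/test/hadoop_tmp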
hdfs-site.xml
test@test-VirtualBox:~/hadoop-1.2.1/conf$ cat hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
mapred-site.xml
test@test-VirtualBox:~/hadoop-1.2.1/conf$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
There are also the local environment variables:
test@test-VirtualBox:~$ cat .profile
# ~/.profile: executed by the command interpreter for login shells.
# This file is not read by bash(1), if ~/.bash_profile or ~/.bash_login
# exists.
# see /usr/share/doc/bash/examples/startup-files for examples.
# the files are located in the bash-doc package.

# the default umask is set in /etc/profile; for setting the umask
# for ssh logins, install and configure the libpam-umask package.
#umask 022

# if running bash
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
        . "$HOME/.bashrc"
    fi
fi

# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi

#java home
export JAVA_HOME=/home/test/jdk1.7.0_45
export PATH=.:$JAVA_HOME/bin:$PATH

#hadoop home
export HADOOP_HOME=/home/test/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin
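.profile is only read by login shells, so the new variables won't show up in a terminal that is already open. Re-sourcing the file (or logging out and back in) and then asking for the Hadoop version is a quick way to confirm that PATH and HADOOP_HOME took effect; this is just a check I like to run, not a required step:
test@test-VirtualBox:~$ source ~/.profile
test@test-VirtualBox:~$ hadoop version
It should report 1.2.1 (along with the "$HADOOP_HOME is deprecated" warning that 1.x always prints).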
Starting Hadoop prompts for the current user's password (for the SSH connections it makes to localhost), and typing it every time gets annoying, so I want passwordless login.
test@test-VirtualBox:~$ ssh-keygen
test@test-VirtualBox:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub test@localhost
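To confirm the key really took, ssh to localhost once; it should log you in without asking for a password (the very first connection may still ask you to confirm the host key):
test@test-VirtualBox:~$ ssh localhost
test@test-VirtualBox:~$ exit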
Now that the configuration is done, we can format the namenode and start the Hadoop services.
The startup scripts are in hadoop_home/bin.
test@test-VirtualBox:~/hadoop-1.2.1/bin$ ./hadoop namenode -format
test@test-VirtualBox:~/hadoop-1.2.1/bin$ ./start-all.sh
Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to /home/test/hadoop-1.2.1/libexec/../logs/hadoop-test-namenode-test-VirtualBox.out
localhost: starting datanode, logging to /home/test/hadoop-1.2.1/libexec/../logs/hadoop-test-datanode-test-VirtualBox.out
localhost: starting secondarynamenode, logging to /home/test/hadoop-1.2.1/libexec/../logs/hadoop-test-secondarynamenode-test-VirtualBox.out
starting jobtracker, logging to /home/test/hadoop-1.2.1/libexec/../logs/hadoop-test-jobtracker-test-VirtualBox.out
localhost: starting tasktracker, logging to /home/test/hadoop-1.2.1/libexec/../logs/hadoop-test-tasktracker-test-VirtualBox.out
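Whether all five daemons actually came up can be checked with jps (it ships with the JDK). On a healthy pseudo-distributed setup it should list NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker, plus Jps itself; if anything is missing, the corresponding .log file under hadoop-1.2.1/logs is worth a look.
test@test-VirtualBox:~$ jps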
Test pages:
NameNode: http://localhost:50070/dfshealth.jsp
JobTracker: http://localhost:50030/jobtracker.jsp
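Besides the web pages, a simple way to exercise both HDFS and MapReduce is to run one of the bundled examples. The jar name hadoop-examples-1.2.1.jar matches the 1.2.1 tarball, but the input file and the /input and /output paths below are just ones I picked for illustration:
test@test-VirtualBox:~$ hadoop fs -mkdir /input
test@test-VirtualBox:~$ hadoop fs -put ~/hadoop-1.2.1/conf/core-site.xml /input
test@test-VirtualBox:~$ hadoop jar ~/hadoop-1.2.1/hadoop-examples-1.2.1.jar wordcount /input /output
test@test-VirtualBox:~$ hadoop fs -cat /output/part-r-00000
If the word counts print at the end, both HDFS and the JobTracker are working.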
And don't forget you can shut Hadoop down as well.
test@test-VirtualBox:~/hadoop-1.2.1/bin$ ./stop-all.sh