Hadoop 1.x Pseudo-Distributed (Single-Node) Setup


First, my machine environment:

test@test-VirtualBox:~/hadoop-1.2.1$ uname -a
Linux test-VirtualBox 3.11.0-20-generic #34~precise1-Ubuntu SMP Thu Apr 3 17:26:42 UTC 2014 i686 i686 i386 GNU/Linux

Since Hadoop is a Java project, it needs a JVM; for setting one up, please see my other post on configuring a Java environment.

Now, assuming the Java environment is in place, let's start configuring Hadoop.
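
Before going any further, it's worth a quick check that the JDK is actually visible from the shell; the reported version should match whatever JDK you installed (mine is jdk1.7.0_45):

test@test-VirtualBox:~$ java -version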


I'm using hadoop-1.2.1 here; before this I used Hadoop 2.3. These are, of course, two different major versions, and I wrote a separate post on my 2.3 setup that you can read if you're interested.

To be frank about the practical side: 1.x is a bit easier to get started with than 2.x. The 2.x release lacks some of the core packages we need for testing, and while people have compiled the 2.x plugin by now, it reportedly still has quite a few bugs; 1.x avoids these problems.


First, extract the archive and copy the directory into the user's home directory (personal preference):

test@test-VirtualBox:~/Documents/archive$ tar -xvf hadoop-1.2.1.tar.gz 

test@test-VirtualBox:~/Documents/archive$ cp -r hadoop-1.2.1 /home/test/

test@test-VirtualBox:~$ chmod 755 -R hadoop-1.2.1/


Now for the actual configuration work.

The configuration files live under $HADOOP_HOME/conf:

test@test-VirtualBox:~/hadoop-1.2.1/conf$ ls -al
total 88
drwxrwxr-x  2 test test 4096 Jan 13 19:00 .
drwxrwxr-x 16 test test 4096 Jan 13 19:38 ..
-rwxrwxr-x  1 test test 7457 Jan 13 19:00 capacity-scheduler.xml
-rwxrwxr-x  1 test test 1095 Jan 13 19:00 configuration.xsl
-rwxrwxr-x  1 test test  374 Jan 13 20:40 core-site.xml
-rwxrwxr-x  1 test test  327 Jan 13 19:00 fair-scheduler.xml
-rwxrwxr-x  1 test test 2476 Jan 13 19:19 hadoop-env.sh
-rwxrwxr-x  1 test test 2052 Jan 13 19:00 hadoop-metrics2.properties
-rwxrwxr-x  1 test test 4644 Jan 13 19:00 hadoop-policy.xml
-rwxrwxr-x  1 test test  246 Jan 13 20:13 hdfs-site.xml
-rwxrwxr-x  1 test test 5018 Jan 13 19:00 log4j.properties
-rwxrwxr-x  1 test test 2033 Jan 13 19:00 mapred-queue-acls.xml
-rwxrwxr-x  1 test test  262 Jan 13 20:28 mapred-site.xml
-rwxrwxr-x  1 test test   10 Jan 13 19:00 masters
-rwxrwxr-x  1 test test   10 Jan 13 19:00 slaves
-rwxrwxr-x  1 test test 2042 Jan 13 19:00 ssl-client.xml.example
-rwxrwxr-x  1 test test 1994 Jan 13 19:00 ssl-server.xml.example
-rwxrwxr-x  1 test test  382 Jan 13 19:00 taskcontroller.cfg
-rwxrwxr-x  1 test test 3890 Jan 13 19:00 task-log4j.properties


The files we need to modify are listed below.

hadoop-env.sh

Here we add the JAVA_HOME export:

test@test-VirtualBox:~/hadoop-1.2.1/conf$ cat hadoop-env.sh 
# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

export JAVA_HOME=/home/test/jdk1.7.0_45

# Extra Java CLASSPATH elements.  Optional.
# export HADOOP_CLASSPATH=

# The maximum amount of heap to use, in MB. Default is 1000.
# export HADOOP_HEAPSIZE=2000

# Extra Java runtime options.  Empty by default.
# export HADOOP_OPTS=-server

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Dcom.sun.management.jmxremote $HADOOP_DATANODE_OPTS"
export HADOOP_BALANCER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_BALANCER_OPTS"
export HADOOP_JOBTRACKER_OPTS="-Dcom.sun.management.jmxremote $HADOOP_JOBTRACKER_OPTS"
# export HADOOP_TASKTRACKER_OPTS=
# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
# export HADOOP_CLIENT_OPTS

# Extra ssh options.  Empty by default.
# export HADOOP_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HADOOP_CONF_DIR"

# Where log files are stored.  $HADOOP_HOME/logs by default.
# export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

# File naming remote slave hosts.  $HADOOP_HOME/conf/slaves by default.
# export HADOOP_SLAVES=${HADOOP_HOME}/conf/slaves

# host:path where hadoop code should be rsync'd from.  Unset by default.
# export HADOOP_MASTER=master:/home/$USER/src/hadoop

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HADOOP_SLAVE_SLEEP=0.1

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by 
#       the users that are going to run the hadoop daemons.  Otherwise there is
#       the potential for a symlink attack.
# export HADOOP_PID_DIR=/var/hadoop/pids

# A string representing this instance of hadoop. $USER by default.
# export HADOOP_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HADOOP_NICENESS=10



core-site.xml

test@test-VirtualBox:~/hadoop-1.2.1/conf$ cat core-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/test/hadoop_tmp/hadoop_${user.name}</value>
</property>
</configuration>
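
Hadoop normally creates hadoop.tmp.dir itself (expanding ${user.name} at runtime), but to be safe I create the base directory up front so it exists and is writable by the current user:

test@test-VirtualBox:~$ mkdir -p /home/test/hadoop_tmp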



hdfs-site.xml

test@test-VirtualBox:~/hadoop-1.2.1/conf$ cat hdfs-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
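
Setting dfs.replication to 1 is what lets single-node HDFS report healthy blocks. If you want to confirm the effective replication factor, you can run fsck once the cluster is up (i.e., after the start-all step below):

test@test-VirtualBox:~/hadoop-1.2.1/bin$ ./hadoop fsck /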


mapred-site.xml

test@test-VirtualBox:~/hadoop-1.2.1/conf$ cat mapred-site.xml 
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:9001</value>
</property>
</configuration>


We also need the local environment variables (I set them in ~/.profile):

test@test-VirtualBox:~$ cat .profile 
# ~/.profile: executed by the command interpreter for login shells.
# This file is not read by bash(1), if ~/.bash_profile or ~/.bash_login
# exists.
# see /usr/share/doc/bash/examples/startup-files for examples.
# the files are located in the bash-doc package.

# the default umask is set in /etc/profile; for setting the umask
# for ssh logins, install and configure the libpam-umask package.
#umask 022

# if running bash
if [ -n "$BASH_VERSION" ]; then
    # include .bashrc if it exists
    if [ -f "$HOME/.bashrc" ]; then
	. "$HOME/.bashrc"
    fi
fi

# set PATH so it includes user's private bin if it exists
if [ -d "$HOME/bin" ] ; then
    PATH="$HOME/bin:$PATH"
fi


#java home
export JAVA_HOME=/home/test/jdk1.7.0_45
export PATH=.:$JAVA_HOME/bin:$PATH

#hadoop home
export HADOOP_HOME=/home/test/hadoop-1.2.1
export PATH=$PATH:$HADOOP_HOME/bin
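
After editing .profile, reload it in the current shell and make sure the hadoop command resolves from the PATH (hadoop version prints the release number; in 1.2.1 it also prints the harmless "Warning: $HADOOP_HOME is deprecated." message):

test@test-VirtualBox:~$ source ~/.profile
test@test-VirtualBox:~$ hadoop version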


When Hadoop starts, its scripts ssh into localhost and prompt for the current user's password every time, which gets annoying; I'd rather have passwordless login:

test@test-VirtualBox:~$ ssh-keygen 

test@test-VirtualBox:~$ ssh-copy-id -i ~/.ssh/id_rsa.pub test@localhost
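
To verify the key was installed, ssh to localhost once; it should log you in without a password prompt (the very first connection may still ask you to accept the host key):

test@test-VirtualBox:~$ ssh localhost
test@test-VirtualBox:~$ exit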


Now that the configuration is done, we can format the namenode and start the Hadoop services.

The scripts live in $HADOOP_HOME/bin.

test@test-VirtualBox:~/hadoop-1.2.1/bin$ ./hadoop namenode -format

test@test-VirtualBox:~/hadoop-1.2.1/bin$ ./start-all.sh
Warning: $HADOOP_HOME is deprecated.

starting namenode, logging to /home/test/hadoop-1.2.1/libexec/../logs/hadoop-test-namenode-test-VirtualBox.out
localhost: starting datanode, logging to /home/test/hadoop-1.2.1/libexec/../logs/hadoop-test-datanode-test-VirtualBox.out
localhost: starting secondarynamenode, logging to /home/test/hadoop-1.2.1/libexec/../logs/hadoop-test-secondarynamenode-test-VirtualBox.out
starting jobtracker, logging to /home/test/hadoop-1.2.1/libexec/../logs/hadoop-test-jobtracker-test-VirtualBox.out
localhost: starting tasktracker, logging to /home/test/hadoop-1.2.1/libexec/../logs/hadoop-test-tasktracker-test-VirtualBox.out
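
A quick way to confirm that all five daemons actually came up is jps from the JDK; the list should include NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker (PIDs will differ on your machine):

test@test-VirtualBox:~$ jps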


Test pages:

NameNode: http://localhost:50070/dfshealth.jsp

JobTracker: http://localhost:50030/jobtracker.jsp
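
Beyond the web pages, a simple smoke test is to exercise HDFS and run the bundled wordcount example; the paths here are just examples, and hadoop-examples-1.2.1.jar sits in the root of the 1.2.1 tarball:

test@test-VirtualBox:~/hadoop-1.2.1/bin$ ./hadoop fs -mkdir /input
test@test-VirtualBox:~/hadoop-1.2.1/bin$ ./hadoop fs -put ../conf/core-site.xml /input
test@test-VirtualBox:~/hadoop-1.2.1/bin$ ./hadoop fs -ls /input
test@test-VirtualBox:~/hadoop-1.2.1/bin$ ./hadoop jar ../hadoop-examples-1.2.1.jar wordcount /input /output

Note that /output must not exist beforehand; when the job finishes, the counts land in /output (typically in part-r-00000), which you can view with ./hadoop fs -cat.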


And don't forget, you can stop Hadoop, too:

test@test-VirtualBox:~/hadoop-1.2.1/bin$ ./stop-all.sh


