I installed Hadoop the other day. Since I used a package someone else had already recompiled, the installation went very smoothly; the steps are written down here.
Hadoop itself is only a platform. The harder, more interesting part is using it well:
doing big-data processing with MapReduce on top of Hadoop. A long road ahead. Keep at it!
Hadoop 2.5.0 x64 Installation Log
=============================================================================
1. Deployment
1.1 Nodes
Three virtual machines were used for the experiment:

NameNode / SecondaryNameNode    DataNodes
----------------------------    -------------
192.168.2.9                     192.168.2.8
                                192.168.2.11
1.2 hostname
192.168.2.8 ts1
192.168.2.9 ts2
192.168.2.11 ts3
This involves editing /etc/hosts and /etc/sysconfig/network on every node.
It is straightforward, so the details are not repeated here; a short sketch follows.
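A minimal sketch of the hostname side, assuming RHEL/CentOS-style networking (the exact file content below is my assumption, not copied from the log); shown for ts1, with the matching name on ts2/ts3:
# /etc/sysconfig/network -- hostname applied at boot
NETWORKING=yes
HOSTNAME=ts1
# apply immediately without a reboot:
hostname ts1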
1.3 User:
# on every node; the group must exist before useradd -g can reference it
/usr/sbin/groupadd grid
/usr/sbin/useradd -g grid grid
1.4 profile
-------------------------
1.4.1 grid user's .bash_profile
$ cat .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
umask 022
stty erase ^h # make the backspace/delete key work
export HADOOP_HOME=/opt/hadoop-2.5.0
export PATH=$HADOOP_HOME/bin:$PATH
PATH=$PATH:$HOME/bin
export PATH
export HADOOP_PREFIX=/opt/hadoop-2.5.0
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
# Some of the variables above were added because certain directories could not be
# found during later installation steps, e.g. HADOOP_HDFS_HOME.
1.4.2 /etc/profile
-------------------------
The following was also appended to /etc/profile:
export JAVA_HOME=/usr/java/jdk1.7.0
export HADOOP_PREFIX=/opt/hadoop-2.5.0
export CLASSPATH=".:$JAVA_HOME/lib:$CLASSPATH"
export PATH="$JAVA_HOME/bin:$HADOOP_PREFIX/bin:$PATH"
1.5 SSH trust (passwordless login) setup
Only one thing needs attention here: keep the file permissions correct and nothing will go wrong.
[grid@ts1 ~]$ ll .ssh
total 28
-rw-r--r--. 1 grid grid 4944 Oct 15 18:07 authorized_keys
-rw-------. 1 grid grid  668 Oct 15 10:01 id_dsa
-rw-r--r--. 1 grid grid  598 Oct 15 10:01 id_dsa.pub
-rw-------. 1 grid grid 1679 Oct 15 10:01 id_rsa
-rw-r--r--. 1 grid grid  390 Oct 15 10:01 id_rsa.pub
-rw-r--r--. 1 grid grid 1192 Oct 15 18:08 known_hosts
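If the permissions ever differ from the above, they can be reset like this (standard sshd requirements, my addition):
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/id_rsa ~/.ssh/id_dsa
$ chmod 644 ~/.ssh/authorized_keys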
Setting up SSH:
1) On node ts1, generate the public/private key pairs as the grid user
# su - grid
$ mkdir ~/.ssh
$ ssh-keygen -t rsa
$ ssh-keygen -t dsa
2) Do the same on node ts2, first making sure the nodes can reach each other
# ping ts1
# ping ts3
# su - grid
$ mkdir ~/.ssh
$ ssh-keygen -t rsa
$ ssh-keygen -t dsa
3) As the grid user on ts1, run the following
$ touch ~/.ssh/authorized_keys
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
# append ts2's public keys (the >> redirection is evaluated locally, so the remote output lands in the local file)
$ ssh ts2 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh ts2 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ scp ~/.ssh/authorized_keys ts2:~/.ssh/authorized_keys
Do the same for node ts3 as for ts2; spelled out below.
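That is, the same pattern as the ts2 commands above:
$ ssh ts3 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ ssh ts3 cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ scp ~/.ssh/authorized_keys ts3:~/.ssh/authorized_keys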
4) Verify from node ts1
$ ssh ts1 date
$ ssh ts2 date
$ ssh ts3 date
5) Verify from node ts2
$ ssh ts1 date
$ ssh ts2 date
1.6 Download the JDK and Hadoop
jdk-7-linux-x64.tar.gz
wget http://download.oracle.com/otn-pub/java/jdk/7/jdk-7-linux-x64.tar.gz -O jdk-7-linux-x64.tar.gz
hadoop-2.5.0-linux64-aboutyun.tar.gz
Link: http://pan.baidu.com/s/1i3BpmIx   password: 4ldc
I simply found a file someone else had already compiled and installed from that.
Source: http://www.douban.com/note/393721422/
The JDK directory is:
/usr/java/jdk1.7.0
(copy it straight over, then fix the symlink:
rm -r latest
ln -s /usr/java/jdk1.7.0 latest )
Hadoop's installation directory is:
/opt/hadoop-2.5.0
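For completeness, a sketch of the unpacking step itself (the /tmp paths are my assumption; the log only records the target directories):
# as root, with both tarballs downloaded to /tmp
tar zxf /tmp/jdk-7-linux-x64.tar.gz -C /usr/java
tar zxf /tmp/hadoop-2.5.0-linux64-aboutyun.tar.gz -C /opt
chown -R grid:grid /opt/hadoop-2.5.0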
1.7 Edit the configuration files
Seven configuration files are involved:
/opt/hadoop-2.5.0/etc/hadoop/hadoop-env.sh
/opt/hadoop-2.5.0/etc/hadoop/yarn-env.sh
/opt/hadoop-2.5.0/etc/hadoop/slaves
/opt/hadoop-2.5.0/etc/hadoop/core-site.xml
/opt/hadoop-2.5.0/etc/hadoop/hdfs-site.xml
/opt/hadoop-2.5.0/etc/hadoop/mapred-site.xml
/opt/hadoop-2.5.0/etc/hadoop/yarn-site.xml
A few of these files do not exist by default; create them by copying the corresponding .template file, as shown below.
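For example, mapred-site.xml only ships as a template in 2.5.0:
$ cd /opt/hadoop-2.5.0/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml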
1.7.1 /opt/hadoop-2.5.0/etc/hadoop/hadoop-env.sh
# changed the JAVA_HOME setting
export JAVA_HOME=/usr/java/jdk1.7.0
# added the following
export HADOOP_PREFIX=/opt/hadoop-2.5.0
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export YARN_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
export YARN_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop
---------------------------------------------------
1.7.2 /opt/hadoop-2.5.0/etc/hadoop/yarn-env.sh
# changed the JAVA_HOME setting
export JAVA_HOME=/usr/java/jdk1.7.0
---------------------------------------------------
1.7.3 /opt/hadoop-2.5.0/etc/hadoop/slaves
# add the hostnames of the two slave nodes (1.x also had a masters file; 2.x dropped it)
ts1
ts3
---------------------------------------------------
1.7.4 /opt/hadoop-2.5.0/etc/hadoop/core-site.xml
# changed as follows:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ts2:9000</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop-2.5.0/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hduser.groups</name>
    <value>*</value>
  </property>
</configuration>
---------------------------------------------------
1.7.5 /opt/hadoop-2.5.0/etc/hadoop/mapred-site.xml
# MapReduce settings: master node and ports
# changed as follows
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>ts2:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>ts2:19888</value>
  </property>
</configuration>
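One note of my own (not in the original log): the JobHistory server configured above is not launched by start-dfs.sh/start-yarn.sh; it has its own daemon script in sbin:
$ mr-jobhistory-daemon.sh start historyserver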
---------------------------------------------------
1.7.6 /opt/hadoop-2.5.0/etc/hadoop/yarn-site.xml
# YARN settings: master node and ports
# changed as follows
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ts2:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ts2:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ts2:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>ts2:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>ts2:8088</value>
  </property>
</configuration>
1.8 Distribute the files
After the configuration is finished, copy /opt/hadoop-2.5.0 to the other two nodes:
scp -r /opt/hadoop-2.5.0 ts2:/opt
scp -r /opt/hadoop-2.5.0 ts3:/opt
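A quick check that the copies landed (my addition):
$ ssh ts2 ls /opt/hadoop-2.5.0/etc/hadoop/slaves
$ ssh ts3 ls /opt/hadoop-2.5.0/etc/hadoop/slaves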
1.9 Test startup
1.9.1 Initialize the master node (format the NameNode):
[grid@ts2 ~]$ hdfs namenode -format
Then run start-dfs.sh,
followed by start-yarn.sh.
(Or, quick and dirty, just run start-all.sh.)
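On this cluster the sequence on the master looks like this (sbin is already on PATH via the profile):
[grid@ts2 ~]$ start-dfs.sh
[grid@ts2 ~]$ start-yarn.sh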
After that, the jps command shows Hadoop's running processes:
[grid@ts2 ~]$ jps
39726 NameNode
39878 SecondaryNameNode
40735 ResourceManager
53234 Jps
[grid@ts2 ~]$
Check the cluster status with hadoop dfsadmin -report.
Web UI: master:50070 (here, http://ts2:50070).
If the page opens, the installation is complete and there is something to look at.
[grid@ts2 ~]$ hadoop dfsadmin -report
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
14/10/17 11:34:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Configured Capacity: 171214802944 (159.46 GB)
Present Capacity: 146903875584 (136.81 GB)
DFS Remaining: 146903826432 (136.81 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Live datanodes (2):
Name: 192.168.2.11:50010 (ts3)
Hostname: ts3
Decommission Status : Normal
Configured Capacity: 128940085248 (120.08 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 14019387392 (13.06 GB)
DFS Remaining: 114920673280 (107.03 GB)
DFS Used%: 0.00%
DFS Remaining%: 89.13%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Oct 17 11:35:00 CST 2014
Name: 192.168.2.8:50010 (ts1)
Hostname: ts1
Decommission Status : Normal
Configured Capacity: 42274717696 (39.37 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 10291539968 (9.58 GB)
DFS Remaining: 31983153152 (29.79 GB)
DFS Used%: 0.00%
DFS Remaining%: 75.66%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Fri Oct 17 11:34:59 CST 2014
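As a final smoke test, one of the bundled examples can be run (a standard check, not part of the original log; the jar path is where Hadoop 2.5.0 ships its examples under $HADOOP_PREFIX):
$ hadoop jar $HADOOP_PREFIX/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar pi 2 10
If it completes and the job appears in the YARN web UI at http://ts2:8088, MapReduce on YARN is working end to end.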