Hadoop, Part 2: Setting up a Single Node Cluster

I. Local (Standalone) Mode

Run the jar directly with the hadoop command; the official example is below:

root@ubuntu:~/hadoop/output# cd $HADOOP_HOME
root@ubuntu:~/hadoop# mkdir input
root@ubuntu:~/hadoop# cp etc/hadoop/*.xml input
root@ubuntu:~/hadoop# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
root@ubuntu:~/hadoop# cat output/*
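
In this mode nothing has to be started: the job runs as a single local Java process against the local filesystem. To see which example programs the jar ships with, running it without arguments should print the list of valid program names (a quick check, not part of the run above):

hadoop version
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar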

II. Pseudo-Distributed Mode

1. Set up passwordless SSH login

root@ubuntu:~# ssh localhost

2. If it prompts for a password, generate a key pair and import the public key

root@ubuntu:~# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
SHA256:9xUa1hH5XJJQYr3G5AU5rapYcYXNICK8hIgshfI/SWQ root@ubuntu
The key's randomart image is:
+---[DSA 1024]----+
|.+.. o. . . =O=B |
|=.. E o. . o.o%.+|
|o. o . .    o=oO.|
|  . . .   ...o*.o|
|   o .  S .o.o.  |
|    +    .....   |
|     .   o ..    |
|        . .      |
|                 |
+----[SHA256]-----+
root@ubuntu:~# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized.keys
root@ubuntu:~# chmod 0600 ~/.ssh/authorized.keys 
root@ubuntu:~# ssh localhost 
root@localhost's password: 
PS: Following the official docs (which use a DSA key), passwordless login still did not work; switching to RSA succeeded. At first this was puzzling, since the two key types should behave the same. Two likely causes in hindsight: the DSA attempt above appended the key to ~/.ssh/authorized.keys, while sshd only reads ~/.ssh/authorized_keys; and the OpenSSH shipped with Ubuntu 16.04 disables ssh-dss public keys by default anyway.
root@ubuntu:~# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:bc4qQR2NMHsGVL6NzNSJ5ycuUBiqXtS9jB1BQJGXxsM root@ubuntu
The key's randomart image is:
+---[RSA 2048]----+
|     oXX++       |
|     o.OE+..     |
|    o ++Xo+      |
|   o  [email protected]       |
|  . ..o S * .    |
| . .  .. = o     |
|  .    .. +      |
|      .  o       |
|       ..        |
+----[SHA256]-----+
root@ubuntu:~# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
root@ubuntu:~# chmod 0600 ~/.ssh/authorized_keys 
root@ubuntu:~# ssh localhost 
Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.0-21-generic x86_64)

 * Documentation:  https://help.ubuntu.com/
Last login: Mon Jun 27 16:57:53 2016 from 192.168.80.1
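
For reference, the working setup condenses to the following (a recap of the RSA transcript above; note that the file name must be authorized_keys):

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
ssh localhost    # should now log in without a password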

3. Format the filesystem

root@ubuntu:~# hdfs namenode -format

PS: According to the configuration reference at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml, the default value of hadoop.tmp.dir is /tmp/hadoop-${user.name}. Since /tmp is usually cleared on reboot, the filesystem data would be lost after a restart. To avoid this, edit etc/hadoop/core-site.xml and add the following (ideally before formatting, or re-run the format afterwards):

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-${user.name}</value>
  </property>
</configuration>
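
After hdfs namenode -format, the NameNode metadata should then land under the new location rather than /tmp. With the user root, ${user.name} resolves to root, so a quick check (not from the original transcript) would be:

ls /data/hadoop-root/dfs/name/current
# a freshly formatted NameNode directory should contain VERSION, seen_txid and an fsimage_* file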

4. Start the NameNode and DataNode daemons

root@ubuntu:~# start-dfs.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: Error: JAVA_HOME is not set and could not be found.
localhost: Error: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.

ERROR 1: JAVA_HOME is reported as not found, even though it was configured earlier.
Steps to resolve:
1) Search for the error message; it turns out to come from the libexec/hadoop-config.sh script.

root@ubuntu:~/hadoop# grep -R "JAVA_HOME is not set and could not be found" .
./libexec/hadoop-config.sh:    echo "Error: JAVA_HOME is not set and could not be found." 1>&2

2) The $JAVA_HOME variable that hadoop-config.sh relies on is exported by etc/hadoop/hadoop-env.sh; change ${JAVA_HOME} there to the absolute path of the JDK:

# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/root/jdk
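
If the JDK location is not obvious (here the JDK had been unpacked to /root/jdk), resolving the java binary on the PATH is one way to find it (a quick check, not part of the original transcript):

readlink -f "$(which java)"
# prints something like .../jdk/bin/java; JAVA_HOME is the directory above bin

With JAVA_HOME fixed, start-dfs.sh gets further, but the SecondaryNameNode now fails: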
root@ubuntu:~/hadoop# start-dfs.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: starting namenode, logging to /root/hadoop/logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /root/hadoop/logs/hadoop-root-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/hadoop/logs/hadoop-root-secondarynamenode-ubuntu.out
0.0.0.0: Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.
0.0.0.0:        at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:471)
0.0.0.0:        at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:461)
0.0.0.0:        at org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:454)
0.0.0.0:        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:229)
0.0.0.0:        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.(SecondaryNameNode.java:192)
0.0.0.0:        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:671)

ERROR 2: On the next start, the SecondaryNameNode fails with an invalid filesystem URI, because fs.defaultFS is still the default file:///.
Steps to resolve: set fs.defaultFS in core-site.xml:

root@ubuntu:~/hadoop# vi etc/hadoop/core-site.xml 
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-${user.name}</value>
  </property>

  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
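
A quick way to confirm the new value is actually being picked up (not part of the original run):

hdfs getconf -confKey fs.defaultFS
# should print hdfs://localhost:9000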

5. Successful start

root@ubuntu:~/hadoop# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /root/hadoop/logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /root/hadoop/logs/hadoop-root-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/hadoop/logs/hadoop-root-secondarynamenode-ubuntu.out

6. Browse the NameNode web interface: http://localhost:50070/
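
Besides the web UI, the daemons can also be checked from the shell (a sanity check, not part of the original transcript):

jps                     # should list NameNode, DataNode and SecondaryNameNode
hdfs dfsadmin -report   # shows the live DataNode and its capacity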

7. Create HDFS directories

root@ubuntu:~/hadoop# hdfs dfs -mkdir /user
root@ubuntu:~/hadoop# hdfs dfs -mkdir /user/root

8. Put files into HDFS

root@ubuntu:~/hadoop# hdfs dfs -put etc/hadoop input
root@ubuntu:~/hadoop# hdfs dfs -ls
Found 1 items
drwxr-xr-x   - root supergroup          0 2016-06-28 09:44 input
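
Relative HDFS paths such as input resolve against the user's HDFS home directory /user/<username>, which is why step 7 created /user/root. Listing that path explicitly should show the same directory (a check, not from the original run):

hdfs dfs -ls /user/root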

9. Run MapReduce with the bundled examples jar, grepping the input for strings that match a pattern

root@ubuntu:~/hadoop# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
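
Note that MapReduce refuses to write into an existing output directory; if the job is re-run, the HDFS output directory has to be removed first (not part of the original run):

hdfs dfs -rm -r output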

10. View the results

1) View them directly in HDFS

root@ubuntu:~/hadoop# hdfs dfs -cat output/*
6       dfs.audit.logger
4       dfs.class
3       dfs.server.namenode.
2       dfs.period
2       dfs.audit.log.maxfilesize
2       dfs.audit.log.maxbackupindex
1       dfsmetrics.log
1       dfsadmin
1       dfs.servers
1       dfs.file

2) Or fetch them from HDFS to a local directory first, then view them locally

root@ubuntu:~/hadoop# hdfs dfs -get output output
root@ubuntu:~/hadoop# cat output/*
6       dfs.audit.logger
4       dfs.class
3       dfs.server.namenode.
2       dfs.period
2       dfs.audit.log.maxfilesize
2       dfs.audit.log.maxbackupindex
1       dfsmetrics.log
1       dfsadmin
1       dfs.servers
1       dfs.file

Summary

Even when the documentation is followed to the letter, unexpected things can still happen, like the failure to start the NameNode and DataNode daemons in step 4 above. When that happens, use the error messages as a trail and check and fix things step by step. If you cannot solve it on your own, there is plenty of material online to consult and people to ask.
TO ME: KEEP GOING, JUST DO IT!!!
