You can run a jar directly with the hadoop command; the following is the official example:
root@ubuntu:~/hadoop/output# cd $HADOOP_HOME
root@ubuntu:~/hadoop# mkdir input
root@ubuntu:~/hadoop# cp etc/hadoop/*.xml input
root@ubuntu:~/hadoop# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
root@ubuntu:~/hadoop# cat output/*
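Note that the grep example refuses to run if the output directory already exists. A minimal re-run sketch, assuming you are still in $HADOOP_HOME:
# clear the previous result, then run the official example again
rm -rf output
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
cat output/*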
root@ubuntu:~# ssh localhost
root@ubuntu:~# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
SHA256:9xUa1hH5XJJQYr3G5AU5rapYcYXNICK8hIgshfI/SWQ root@ubuntu
The key's randomart image is:
+---[DSA 1024]----+
|.+.. o. . . =O=B |
|=.. E o. . o.o%.+|
|o. o . . o=oO.|
| . . . ...o*.o|
| o . S .o.o. |
| + ..... |
| . o .. |
| . . |
| |
+----[SHA256]-----+
root@ubuntu:~# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized.keys
root@ubuntu:~# chmod 0600 ~/.ssh/authorized.keys
root@ubuntu:~# ssh localhost
root@localhost's password:
It still prompts for a password: the key was appended to ~/.ssh/authorized.keys instead of ~/.ssh/authorized_keys (the file sshd actually reads), and Ubuntu 16.04's OpenSSH disables DSA user keys by default anyway, so generate an RSA key instead:
root@ubuntu:~# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:bc4qQR2NMHsGVL6NzNSJ5ycuUBiqXtS9jB1BQJGXxsM root@ubuntu
The key's randomart image is:
+---[RSA 2048]----+
| oXX++ |
| o.OE+.. |
| o ++Xo+ |
| o [email protected] |
| . ..o S * . |
| . . .. = o |
| . .. + |
| . o |
| .. |
+----[SHA256]-----+
root@ubuntu:~# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
root@ubuntu:~# chmod 0600 ~/.ssh/authorized_keys
root@ubuntu:~# ssh localhost
Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.0-21-generic x86_64)
* Documentation: https://help.ubuntu.com/
Last login: Mon Jun 27 16:57:53 2016 from 192.168.80.1
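A quick non-interactive check that key-based login really works (a small extra check, not part of the original transcript; BatchMode makes ssh fail instead of falling back to a password prompt):
# should print the message without asking for a password
ssh -o BatchMode=yes localhost 'echo passwordless ssh OK'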
root@ubuntu:~# hdfs namenode -format
PS: according to the configuration reference at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml, the default value of hadoop.tmp.dir is /tmp/hadoop-${user.name}; /tmp is cleared when the system reboots, which would wipe the file system data. To avoid this, edit etc/hadoop/core-site.xml and add the following:
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop-${user.name}</value>
    </property>
</configuration>
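A minimal sketch for preparing the new location, assuming /data sits on persistent storage and Hadoop runs as root; after changing hadoop.tmp.dir the NameNode has to be formatted again so its metadata lands under the new directory:
# create the parent directory, then re-format
mkdir -p /data
hdfs namenode -format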
root@ubuntu:~# start-dfs.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: Error: JAVA_HOME is not set and could not be found.
localhost: Error: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.
ERROR 1: JAVA_HOME is reported as not found, even though it was configured earlier.
Steps taken to resolve it:
1) Search for the error message; it turns out to come from the hadoop-config.sh script.
root@ubuntu:~/hadoop# grep -R "JAVA_HOME is not set and could not be found" .
./libexec/hadoop-config.sh: echo "Error: JAVA_HOME is not set and could not be found." 1>&2
2) The $JAVA_HOME variable used by hadoop-config.sh is exported from etc/hadoop/hadoop-env.sh; change ${JAVA_HOME} there to the absolute path.
# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/root/jdk
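A quick sanity check after editing hadoop-env.sh, run from $HADOOP_HOME (the /root/jdk path is the one configured above):
# the JDK should answer, and the new absolute path should appear in the file
/root/jdk/bin/java -version
grep -n 'JAVA_HOME' etc/hadoop/hadoop-env.sh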
root@ubuntu:~/hadoop# start-dfs.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: starting namenode, logging to /root/hadoop/logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /root/hadoop/logs/hadoop-root-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/hadoop/logs/hadoop-root-secondarynamenode-ubuntu.out
0.0.0.0: Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.
0.0.0.0: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:471)
0.0.0.0: at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:461)
0.0.0.0: at org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:454)
0.0.0.0: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:229)
0.0.0.0: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192)
0.0.0.0: at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:671)
ERROR 2: On the next start, the file system URI is reported as invalid.
Steps taken to resolve it:
root@ubuntu:~/hadoop# vi etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop-${user.name}</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
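Before restarting the daemons, you can confirm that the new setting is actually picked up; hdfs getconf prints the effective value of a configuration key (a small check, not part of the original transcript):
hdfs getconf -confKey fs.defaultFS
# expected output: hdfs://localhost:9000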
root@ubuntu:~/hadoop# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /root/hadoop/logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /root/hadoop/logs/hadoop-root-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/hadoop/logs/hadoop-root-secondarynamenode-ubuntu.out
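To verify that all three daemons came up, jps (shipped with the JDK; the path follows the JAVA_HOME used above) should list them:
/root/jdk/bin/jps
# expect NameNode, DataNode and SecondaryNameNode (plus Jps itself)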
root@ubuntu:~/hadoop# hdfs dfs -mkdir /user
root@ubuntu:~/hadoop# hdfs dfs -mkdir /user/root
root@ubuntu:~/hadoop# hdfs dfs -put etc/hadoop input
root@ubuntu:~/hadoop# hdfs dfs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2016-06-28 09:44 input
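Optionally confirm the upload before running the job; both commands below are standard hdfs dfs subcommands, and the relative path resolves under /user/root, the HDFS home directory created above:
# directory count, file count and total bytes under input
hdfs dfs -count input
hdfs dfs -ls input | head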
root@ubuntu:~/hadoop# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
1) The result can be viewed directly in HDFS:
root@ubuntu:~/hadoop# hdfs dfs -cat output/*
6 dfs.audit.logger
4 dfs.class
3 dfs.server.namenode.
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.file
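If you want to re-run the job, remove the previous HDFS output directory first, otherwise it aborts complaining that the output directory already exists:
hdfs dfs -rm -r output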
2) Or fetch it from HDFS to a local directory first, then view it there:
root@ubuntu:~/hadoop# hdfs dfs -get output output
root@ubuntu:~/hadoop# cat output/*
6 dfs.audit.logger
4 dfs.class
3 dfs.server.namenode.
2 dfs.period
2 dfs.audit.log.maxfilesize
2 dfs.audit.log.maxbackupindex
1 dfsmetrics.log
1 dfsadmin
1 dfs.servers
1 dfs.file
Even when you follow the documentation strictly, unexpected things still happen, like point 4 above where starting the NameNode and DataNode daemons failed. When you hit this kind of problem and there is an error message, follow the trail and check and fix things step by step; if you cannot solve it yourself, there is plenty of material online to consult and people to ask.
TO ME: KEEP GOING, JUST DO IT!!!