1. Create a hadoop account on the server.
Log in to the server as root, then create a new user named hadoop and grant it full (sudo) privileges.
# useradd -m hadoop -s /bin/bash    # create the new user hadoop
# passwd hadoop                     # set a password for the hadoop user (enter the new password twice)
# visudo                            # grant the hadoop user sudo privileges
After running visudo, scroll down to the line root ALL=(ALL) ALL, press i to enter insert mode, and add the following line below it:
hadoop ALL=(ALL) ALL    # grant the hadoop user full privileges
When you are done, press Esc, then type :wq and press Enter to save and exit.
The hadoop user is now created; run su hadoop and enter its password to switch to it.
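To double-check that the sudoers entry took effect, a quick sanity check (not part of the original steps) is:

$ su hadoop        # switch to the hadoop user
$ sudo whoami      # after entering the hadoop password, this should print: root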
2. Installing the JDK.
As the hadoop user, run rpm -qa | grep jdk to check whether a JDK is already installed. Aliyun images usually ship with version 1.7.0; if yours does not have one, install it manually:
sudo yum install java-1.8.0-openjdk* -y    # install OpenJDK 1.8.0 with yum
The package installs automatically; the default install directory is /usr/lib/jvm/java-1.8.0-openjdk. Next, configure the environment variable:
vim ~/.bashrc
Add the following line at the end:
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
Save and exit after adding it.
Then run:
source ~/.bashrc
to make the environment variable take effect.
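To confirm JAVA_HOME actually points at a working JDK (the path above is the yum default; on some machines the real directory name differs), a quick check is:

$ echo $JAVA_HOME                  # should print /usr/lib/jvm/java-1.8.0-openjdk
$ ls $JAVA_HOME/bin/java           # the java binary should exist here
$ readlink -f $(which java)        # shows the path the java command really resolves to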
Next, run java -version to check that everything worked. (I wasted an entire afternoon on this step because I had left out an r when typing the command...)
$ java -version
openjdk version "1.8.0_201"
OpenJDK Runtime Environment (build 1.8.0_201-b09)
OpenJDK 64-Bit Server VM (build 25.201-b09, mixed mode)
If the version information appears, it worked. If you are on version 1.7 and get an error, also try:
java -V
java --version
Try both; I read somewhere (I forget where) that version 1.7 sometimes fails to recognize the -version option, which had me blaming the JDK version for quite a while...
3. Configure passwordless SSH login on the server.
rpm -qa | grep ssh    # check whether openssh-clients and openssh-server are installed
openssh-clients-6.6.1p1-35.el7_3.x86_64
openssh-server-6.6.1p1-35.el7_3.x86_64
If both are present, nothing needs to be installed; if not, install them with yum:
sudo yum -y install openssh-clients
sudo yum -y install openssh-server
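If sshd is not already running after the install (on most cloud images it already is), you can start and enable it with systemd on CentOS 7; this is an extra sanity check, not from the original write-up:

$ sudo systemctl start sshd      # start the SSH daemon
$ sudo systemctl enable sshd     # start it automatically on boot
$ sudo systemctl status sshd     # confirm it is active (running)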
Test that SSH works:
ssh localhost
The first time you connect, you will see:
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 1b:3f:21:37:4d:66:93:77:66:02:55:f9:d2:1a:d0:d2.
Are you sure you want to continue connecting (yes/no)?
Type yes, then enter the hadoop user's login password when prompted; you are now logged in to the local machine over SSH.
Next, enter:
$ exit                                 # leave the ssh localhost session
$ cd ~/.ssh/                           # if this directory does not exist, run ssh localhost once first
$ ssh-keygen -t rsa                    # just press Enter at every prompt
$ cat id_rsa.pub >> authorized_keys    # authorize the new key
$ chmod 600 ./authorized_keys          # fix the file permissions
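After this, ssh localhost should log you in without asking for a password. If it still prompts, one common cause worth checking (an extra hint, not from the original steps) is the permissions on the ~/.ssh directory itself:

$ chmod 700 ~/.ssh        # sshd ignores the keys if this directory is too permissive
$ ssh localhost           # should now log in without a password
$ exit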
4. Installing and configuring standalone Hadoop.
On my instructor's recommendation I used Hadoop 2.6.5 (extraction code: 18cg). After downloading it and uploading it to the server:
$ sudo tar -zxf ~/<path>/hadoop-2.6.5.tar.gz -C /usr/local    # extract the archive
$ cd /usr/local/
$ sudo mv ./hadoop-2.6.5/ ./hadoop            # rename the directory to hadoop
$ sudo chown -R hadoop:hadoop ./hadoop        # give the hadoop user ownership
Replace <path> with wherever you uploaded the archive. Once extracted, Hadoop is ready to use.
$ cd /usr/local/hadoop    # enter the hadoop directory (we renamed it above)
[hadoop@aliyunhost hadoop]$ ./bin/hadoop version    # check that hadoop runs
On success this prints the Hadoop version information.
If instead you see
[hadoop@aliyunhost hadoop]$ ./bin/hadoop: line xxx: /usr/lib/jvm/java-1.7.0-openjdk/bin/java: No such file or directory
the environment variable is wrong; in my case the error appeared because I had installed 1.8.0 but never updated the variable afterwards.
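If you hit this, point JAVA_HOME back at the 1.8.0 install, either in ~/.bashrc as in step 2 or in Hadoop's own environment file (etc/hadoop/hadoop-env.sh is the standard place; the JDK path below is the yum default and may differ on your machine):

$ vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
# change the line
#   export JAVA_HOME=${JAVA_HOME}
# to
#   export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
$ source ~/.bashrc    # if you edited ~/.bashrc instead, reload it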
With that working, let's try out Hadoop's bundled examples:
[hadoop@aliyunhost hadoop]$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.5.jar
This lists all the bundled examples, including wordcount, terasort, join, grep, and so on:
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
grep: A map/reduce program that counts the matches of a regex in the input.
join: A job that effects a join over sorted, equally partitioned datasets
multifilewc: A job that counts words from several files.
pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
randomwriter: A map/reduce program that writes 10GB of random data per node.
secondarysort: An example defining a secondary sort to the reduce.
sort: A map/reduce program that sorts the data written by the random writer.
sudoku: A sudoku solver.
teragen: Generate data for the terasort
terasort: Run the terasort
teravalidate: Checking results of terasort
wordcount: A map/reduce program that counts the words in the input files.
wordmean: A map/reduce program that counts the average length of the words in the input files.
wordmedian: A map/reduce program that counts the median length of the words in the input files.
wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
You may see a "Not a valid JAR: xxxx" error here; if so, double-check that the Hadoop version you downloaded matches the jar name in the command, and note that the jar path passed to hadoop jar is a local filesystem path, not an HDFS path.
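A quick way to confirm the exact jar name on your install (just a sanity check, assuming the layout above) is:

$ ls ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar    # shows the exact jar name to pass to hadoop jar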
We will run the grep example: it takes every file in the input folder as input, filters for words matching the regular expression dfs[a-z.]+, counts the matches, and writes the result to the output folder:
[hadoop@aliyunhost hadoop]$ cd /usr/local/hadoop
[hadoop@aliyunhost hadoop]$ mkdir ./input
[hadoop@aliyunhost hadoop]$ cp ./etc/hadoop/*.xml ./input    # use the config files as input
[hadoop@aliyunhost hadoop]$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep ./input ./output 'dfs[a-z.]+'
[hadoop@aliyunhost hadoop]$ cat ./output/*    # view the result
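With the default 2.6.5 config files as input, this should print a single match (the count can differ if your config files are not the defaults):

1       dfsadmin

One thing to keep in mind if you rerun the job: Hadoop refuses to start when the output directory already exists, so delete it first:

$ rm -r ./output    # remove the old output before running the example again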