Big Data
Learning big data with Hadoop: according to the official site, Hadoop can be set up in three ways (local/standalone, pseudo-distributed, and fully distributed). This article walks through a single-node setup following the official guide.
The installation package can be downloaded from:
https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.4/hadoop-3.2.4.tar.gz
Prerequisite: JDK 8 must already be installed. This is not covered here, since there are plenty of tutorials for it.
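A quick way to confirm the JDK is in place (just a sanity check; the exact version string depends on your JDK build):
java -version
# expect a version string like "1.8.0_xxx"; if the command is not found, install JDK 8 first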
Copy the installation package to the directory where you want to install it. I created a new installation directory for this:
/root/tools
Run the command:
tar -zxvf hadoop-3.2.4.tar.gz
After extraction you get the following directory:
/root/tools/hadoop-3.2.4
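You can check the extracted layout with ls; for a standard Hadoop 3.x binary distribution the top level should look roughly like this (the /root/tools path is simply the one used in this article):
cd /root/tools/hadoop-3.2.4
ls
# bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share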
Next, configure the environment variables:
vi /etc/profile
Append the following at the end of the file:
export JAVA_HOME=/root/tools/jdk/jdk1.8.0_144
export HADOOP_HOME=/root/tools/hadoop-3.2.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export HADOOP_CLASSPATH=`hadoop classpath`
export HADOOP_CONF_DIR=/root/tools/hadoop-3.2.4/etc/hadoop
Here my Hadoop installation path is /root/tools/hadoop-3.2.4, and the JDK installation path is /root/tools/jdk/jdk1.8.0_144.
After editing, save the file and run the following command to make the changes take effect:
source /etc/profile
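To confirm the variables were picked up, you can print them (a minimal check; the paths shown are the ones used in this article):
echo $JAVA_HOME     # should print /root/tools/jdk/jdk1.8.0_144
echo $HADOOP_HOME   # should print /root/tools/hadoop-3.2.4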
Then verify the Hadoop installation:
hadoop version
Next, set up passwordless SSH to localhost, which the Hadoop startup scripts require:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
After that, run ssh localhost; if everything is set up correctly you should log in without being asked for a password:
ssh localhost
Now configure HDFS for pseudo-distributed operation. Edit etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
And edit etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
The first time you start HDFS, you need to format the filesystem:
bin/hdfs namenode -format
I got the following error:
[root@localhost hadoop-3.2.4]# bin/hdfs namenode -format
ERROR: JAVA_HOME is not set and could not be found.
Solution:
Edit etc/hadoop/hadoop-env.sh in the extracted directory and set the JAVA_HOME path explicitly:
export JAVA_HOME=/root/tools/jdk/jdk1.8.0_144
If you do not know the JDK path, you can look up JAVA_HOME in /etc/profile (vi /etc/profile).
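Another way to locate the JDK directory (an alternative, assuming java is already on the PATH):
readlink -f $(which java)
# prints the full path to the java binary; JAVA_HOME is the JDK directory that contains bin/
# (e.g. /root/tools/jdk/jdk1.8.0_144 in this article's setup)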
After fixing the error, run the format command again; this time it completes and the NameNode is formatted successfully.
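If you want to confirm that the format actually wrote metadata, you can look at the NameNode directory. With the default configuration (no hadoop.tmp.dir or dfs.namenode.name.dir overrides, which is an assumption here, and running as root) it ends up under /tmp:
ls /tmp/hadoop-root/dfs/name/current
# should contain fsimage* and VERSION files after a successful format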
Now start HDFS:
./sbin/start-dfs.sh
This produced the following error:
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation
Solution:
Add the following to etc/hadoop/hadoop-env.sh (needed because the daemons are being started as root):
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
Run ./sbin/start-dfs.sh again, then open the NameNode web UI at http://localhost:9870/. If the page loads, HDFS has started successfully.
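You can also check the running Java processes with jps; after start-dfs.sh the three HDFS daemons should be there (process IDs will differ):
jps
# expected, in some order:
# NameNode
# DataNode
# SecondaryNameNode
# Jps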
Create the HDFS directories required to run MapReduce jobs, then copy the input files into HDFS:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/root
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input
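To confirm the files landed in HDFS, list the input directory (a simple check):
bin/hdfs dfs -ls input
# should list the *.xml files copied from etc/hadoop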
Run the following commands to execute the example job, copy the output back from HDFS, and view it:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar grep input output 'dfs[a-z.]+'
bin/hdfs dfs -get output output
cat output/*
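Alternatively, you can view the output directly on HDFS without copying it to the local filesystem:
bin/hdfs dfs -cat output/*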
Running the computation on YARN still requires HDFS to be up, so the steps above must be done first.
Edit etc/hadoop/mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>
Edit etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>
Start YARN:
sbin/start-yarn.sh
Then open the ResourceManager page in a browser:
http://localhost:8088/
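A jps check at this point should additionally show the two YARN daemons:
jps
# in addition to the HDFS processes, expect:
# ResourceManager
# NodeManager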
Run the example job again, this time on YARN:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar grep input output 'dfs[a-z.]+'
This failed with the following error:
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot delete /user/root/grep-temp-1038537
Solution:
Add the following to hdfs-site.xml:
<property>
    <name>dfs.safemode.threshold.pct</name>
    <value>0f</value>
    <description>
        Specifies the percentage of blocks that should satisfy
        the minimal replication requirement defined by dfs.replication.min.
        Values less than or equal to 0 mean not to wait for any particular
        percentage of blocks before exiting safemode.
        Values greater than 1 will make safe mode permanent.
    </description>
</property>
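As an alternative to lowering the threshold (note that the setting above makes the NameNode skip waiting in safe mode on every startup), you can simply tell the NameNode to leave safe mode once:
bin/hdfs dfsadmin -safemode get    # check the current safe mode state
bin/hdfs dfsadmin -safemode leave  # force the NameNode out of safe mode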
Another error appeared:
org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://localhost:9000/user/roo
Solution: delete the output directory:
./bin/hdfs dfs -rm -r output
Then run the job again:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.4.jar grep input output 'dfs[a-z.]+'
View the output with the following command:
bin/hdfs dfs -cat output/*
Starting HDFS brings up the SecondaryNameNode, NameNode, and DataNode processes; starting YARN additionally brings up NodeManager and ResourceManager.
This article follows the examples on the official website, and the problems encountered along the way are recorded here. If it helped you, give it a like; corrections are also welcome.