I. Review of the previous lesson
II. First look at Hadoop
III. Homework for this lesson
Official Hadoop site: hadoop.apache.org (see also spark.apache.org, kafka.apache.org)
Broad sense: the ecosystem built around the Apache Hadoop software (Hive, Sqoop, Flume, Flink, HBase, ...)
Narrow sense: just the Apache Hadoop software itself
Apache Hadoop release lines:
1.x: essentially no longer used
2.x: the current mainstream in industry; corresponds to CDH 5.x
3.x: some companies are trying it out; the corresponding CDH release line is CDH 6.x
Download site for the CDH build of Hadoop; the components in this course mainly use the official CDH releases:
http://archive.cloudera.com/cdh5/cdh/5/
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2.tar.gz
This tells us the Hadoop version is 2.6.0; the CDH build of 2.6.0 that we use is comparable to Apache Hadoop 2.9.
hadoop-2.6.0-cdh5.16.2.tar.gz
Apache Hadoop 2.6.0 + subsequent patches ≈ Apache Hadoop 2.9
Every CDH Hadoop release is repackaged with these upgrades. For example, if a component has a bug, we upgrade from CDH 5.14 to CDH 5.16 and check changes.log to see what was fixed.
CDH5.14.0 hadoop-2.6.0
CDH5.16.2 hadoop-2.6.0
Apache Hadoop 2.9 and 3.x already exist. The Apache Foundation's Hadoop is open source, and much of its bug fixing is driven by code contributed by Cloudera engineers.
Benefit of the CDH build of Hadoop: version compatibility is taken care of. For example, when HBase is installed later, its build must also come from the same cdh5.16.2 branch as Hadoop.
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.16.2-changes.log
http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.16.2-changes.log
hdfs: storage
mapreduce: computation; jobs that mine value out of the data --> Hive SQL, Spark, Flink
yarn: resource (memory, vcore) scheduling + job scheduling
Big data means massive volumes of data: one machine cannot store it all, and one machine is a single point of computation. With, say, 1000 machines, HDFS handles the storage, MapReduce handles the distributed computation, and YARN schedules resources (CPU, memory) and jobs.
1. Create the user and unpack the software
1) Create the hadoop user:
- useradd hadoop
2) mkdir app data lib log software sourcecode tmp
[hadoop@hadoop ~]$ ll
total 28
drwxrwxr-x 3 hadoop hadoop 4096 Mar 20 16:21 app         extracted software; prefer making symlinks into it
drwxrwxr-x 2 hadoop hadoop 4096 Mar 8 17:49 data        data directory
drwxrwxr-x 2 hadoop hadoop 4096 Mar 20 16:23 lib         third-party jars
drwxrwxr-x 2 hadoop hadoop 4096 Mar 20 16:23 log         log directory
drwxrwxr-x 2 hadoop hadoop 4096 Mar 8 20:27 software    tarballs
drwxrwxr-x 2 hadoop hadoop 4096 Mar 20 16:23 sourcecode  source code for building
drwxrwxr-x 2 hadoop hadoop 4096 Mar 20 16:23 tmp         temporary files
//Linux already ships with a /tmp directory, so why create our own tmp? Because the system /tmp is cleaned on a schedule (files are removed after about 30 days).
3) Extract the tarball and create a symlink:
[hadoop@hadoop ~]$ tar -xzvf hadoop-2.6.0-cdh5.16.2.tar.gz -C /home/hadoop/app/
[hadoop@hadoop app]$ ln -s hadoop-2.6.0-cdh5.16.2 hadoop
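Why the symlink: environment variables and scripts can always point at ~/app/hadoop, so a future upgrade only needs the link repointed. A minimal sketch (the newer version number below is purely hypothetical):
ln -sfn hadoop-2.6.0-cdh5.xx.x hadoop   # hypothetical newer release; -f replaces the old link, -n keeps ln from descending into it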
Prerequisites for the installation: a Java environment and passwordless SSH.
2. Install the Java JDK:
1) mkdir /usr/java to create the directory
2) Upload the JDK with rz and extract it into this directory (a command sketch follows after this section's output), then configure the environment variables as shown below:
#env
export JAVA_HOME=/usr/java/jdk1.8.0_45
#export JAVA_HOME=/usr/java/jdk1.7.0_45
export PATH=$JAVA_HOME/bin:$PATH
"/etc/profile" 82L, 1900C written
3) Make the environment variables take effect:
[root@hadoop java]# source /etc/profile
[root@hadoop java]# which java
/usr/java/jdk1.8.0_45/bin/java
[root@hadoop java]# echo $JAVA_HOME
/usr/java/jdk1.8.0_45
[root@hadoop java]# java -version
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
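A hedged recap of the upload/extract commands for item 2) above (run as root; the exact JDK tarball name is an assumption, adjust it to the file actually uploaded):
cd /usr/java
rz                                                  # upload the JDK tarball from the local machine (rz is provided by lrzsz)
tar -xzvf jdk-8u45-linux-x64.tar.gz -C /usr/java/   # assumed tarball name for JDK 1.8.0_45
chown -R root:root /usr/java/jdk1.8.0_45            # ownership can be wrong after extraction; reset it to root:root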
3. Edit hadoop-env.sh
[hadoop@hadoop hadoop]$ vi hadoop-env.sh
# The java implementation to use.
export JAVA_HOME=/usr/java/jdk1.8.0_45
4. Get to know the Hadoop deployment modes: local (standalone), pseudo-distributed (all daemons on one machine, which is what this course uses), and fully distributed (a real cluster).
5. Modify core-site.xml
//vi /etc/hosts first, and configure the IP-to-hostname mapping:
106.54.226.205 hadoop
1) Edit etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop:9000</value>
</property>
</configuration>
2) Edit etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
6. Configure passwordless SSH for the hadoop user:
[root@hadoop ~]# pwd
/root
[root@hadoop ~]# rm -rf .ssh
[root@hadoop ~]# ssh-keygen
//press Enter three times (accept all defaults)
[root@hadoop ~]# cd .ssh
[root@hadoop .ssh]# ll
total 8
-rw------- 1 root root 1671 Mar 20 17:26 id_rsa
-rw-r--r-- 1 root root 393 Mar 20 17:26 id_rsa.pub
//append the public key to the trusted (authorized) keys file:
[root@hadoop .ssh]# cat id_rsa.pub >> authorized_keys
//change the permissions to 600
[root@hadoop .ssh]# chmod 600 authorized_keys
//test; the following works fine:
[hadoop@hadoop001 .ssh]$ ssh hadoop001 date
Tue Mar 24 15:19:49 CST 2020
A problem that came up:
[hadoop@hadoop001 ~]$ ssh hadoop001 date
ssh: connect to host hadoop001 port 22: Connection timed out
[hadoop@hadoop001 ~]$ ssh hadoop001 date
The authenticity of host 'hadoop001 (172.17.0.5)' can't be established.
RSA key fingerprint is 33:6f:23:f9:ff:10:39:2d:cd:42:72:66:c8:7d:5a:6a.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'hadoop001,172.17.0.5' (RSA) to the list of known hosts.
Tue Mar 24 15:25:19 CST 2020
Pitfall: when SSH is set up as root, the 600 permission change is not needed; when any other user is used, authorized_keys must be chmod'ed to 600.
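For reference, a minimal sketch of the same trust set up directly as the hadoop user (the daemons run under this user); the -P/-f options simply make key generation non-interactive and are an assumption, not taken from the original session:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa          # equivalent to ssh-keygen with three Enters
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh                                  # sshd rejects keys if the directory is group/world writable
chmod 600 ~/.ssh/authorized_keys                  # the 600 step that non-root users must not skip
ssh hadoop001 date                                # the first connection still asks for a one-time yes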
7. Configure the environment variables, source them, and check with which hadoop:
#env
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
source ~/.bashrc
[hadoop@hadoop001 app]$ source ~/.bashrc
[hadoop@hadoop001 app]$ which hadoop
~/app/hadoop/bin/hadoop
[hadoop@hadoop001 app]$ which hdfs
~/app/hadoop/bin/hdfs
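As an extra sanity check (not in the original notes), the version command confirms the PATH resolves to the CDH build:
hadoop version    # should report 2.6.0-cdh5.16.2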
8. Format the NameNode:
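The command behind this step is the standard NameNode format (run once as the hadoop user, before the first start):
hdfs namenode -format    # initializes the NameNode metadata; do not rerun on a cluster that already holds data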
9. Start it up (the first start will ask us to type yes):
[hadoop@hadoop001 sbin]$ which start-dfs.sh
~/app/hadoop/sbin/start-dfs.sh
[hadoop@hadoop001 sbin]$ start-dfs.sh
20/03/24 15:50:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/03/24 15:50:57 WARN hdfs.DFSUtil: Namenode for null remains unresolved for ID null. Check your hdfs-site.xml file to ensure namenodes are configured properly.
Starting namenodes on [hadoop]
hadoop: ssh: Could not resolve hostname hadoop: Name or service not known
localhost: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-datanode-hadoop001.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
RSA key fingerprint is 33:6f:23:f9:ff:10:39:2d:cd:42:72:66:c8:7d:5a:6a.
Are you sure you want to continue connecting (yes/no)? yes
http://blog.itpub.net/30089851/viewspace-2127102/   troubleshooting (fixing the owner user and group)
If SSH still asks for a password after the first-time confirmation, check the system logs.
http://blog.itpub.net/30089851/viewspace-1992210/   SSH trust across multiple machines; covered in the advanced class; a known pitfall
1) Note:
The namenode is started on hadoop001, the secondarynamenode on 0.0.0.0, and the datanode on localhost.
Why do we have to type yes the first time? Look at ~/.ssh/known_hosts.
[hadoop@hadoop001 hadoop]$ start-dfs.sh
20/03/24 15:53:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-namenode-hadoop001.out
localhost: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-datanode-hadoop001.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-secondarynamenode-hadoop001.out
20/03/24 15:53:54 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Check this trust file: the entry created when we typed yes the first time is kept in known_hosts;
[hadoop@hadoop001 .ssh]$ cat known_hosts
localhost ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA0RbK9Erb/qkvjdPvION6G35oFhaS8ZfA13dyILU069Hzd36i2tgUZa9D4IPbvZNk6DvDFW2zH8jW/j1RS99ZZ4+9yl/CrcrJRP9GjjBS2W1rQCUjolRdLf9PzAZs/AbFvpjxwMbd7vSb5AOeqQ0pTC4BFlvX6IJLGdZmhUYTqNCWj0e40l409o/Hidy0oEXByDaJWmRvRuI6jc5V1v9/FZNze96W/oJC6FLR7MrVgSJA2MmZvzvS3zbCgKU/umgD4ENy+JRBiifHwBWVTkIADHhVq7Ob14eFlawVnEY6tQkdgrwSc2LWBgFQXAYbYmeFLOrEJQKi3e/h4fMLaoeubQ==
hadoop001,172.17.0.5 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA0RbK9Erb/qkvjdPvION6G35oFhaS8ZfA13dyILU069Hzd36i2tgUZa9D4IPbvZNk6DvDFW2zH8jW/j1RS99ZZ4+9yl/CrcrJRP9GjjBS2W1rQCUjolRdLf9PzAZs/AbFvpjxwMbd7vSb5AOeqQ0pTC4BFlvX6IJLGdZmhUYTqNCWj0e40l409o/Hidy0oEXByDaJWmRvRuI6jc5V1v9/FZNze96W/oJC6FLR7MrVgSJA2MmZvzvS3zbCgKU/umgD4ENy+JRBiifHwBWVTkIADHhVq7Ob14eFlawVnEY6tQkdgrwSc2LWBgFQXAYbYmeFLOrEJQKi3e/h4fMLaoeubQ==
0.0.0.0 ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA0RbK9Erb/qkvjdPvION6G35oFhaS8ZfA13dyILU069Hzd36i2tgUZa9D4IPbvZNk6DvDFW2zH8jW/j1RS99ZZ4+9yl/CrcrJRP9GjjBS2W1rQCUjolRdLf9PzAZs/AbFvpjxwMbd7vSb5AOeqQ0pTC4BFlvX6IJLGdZmhUYTqNCWj0e40l409o/Hidy0oEXByDaJWmRvRuI6jc5V1v9/FZNze96W/oJC6FLR7MrVgSJA2MmZvzvS3zbCgKU/umgD4ENy+JRBiifHwBWVTkIADHhVq7Ob14eFlawVnEY6tQkdgrwSc2LWBgFQXAYbYmeFLOrEJQKi3e/h4fMLaoeubQ==
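Optional, and not part of the original steps: the interactive yes prompt can be avoided entirely by pre-collecting the host keys, which is handy when scripting multi-node setups:
ssh-keyscan hadoop001 localhost 0.0.0.0 >> ~/.ssh/known_hosts    # pre-populate known_hosts so ssh never has to ask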
One more pitfall:
If the host key has changed, the stale record for the old connection makes the command complain; just find that entry and delete it. Whenever you hit an SSH trust error at work, this file is the first place to look.
1) After the entry's content is modified, a restart attempt fails:
localhost: key_from_blob: can't read key type
localhost: key_read: key_from_blob EEEEB3NzaC1yc2EAAAABIwAAAQEA0RbK9Erb/qkvjdPvION6G35oFhaS8ZfA13dyILU069Hzd36i2tgUZa9D4IPbvZNk6DvDFW2zH8jW/j1RS99ZZ4+9yl/CrcrJRP9GjjBS2W1rQCUjolRdLf9PzAZs/AbFvpjxwMbd7vSb5AOeqQ0pTC4BFlvX6IJLGdZmhUYTqNCWj0e40l409o/Hidy0oEXByDaJWmRvRuI6jc5V1v9/FZNze96W/oJC6FLR7MrVgSJA2MmZvzvS3zbCgKU/umgD4ENy+JRBiifHwBWVTkIADHhVq7Ob14eFlawVnEY6tQkdgrwSc2LWBgFQXAYbYmeFLOrEJQKi3e/h4fMLaoeubQ==
localhost: failed
2) Fix: open ~/.ssh/known_hosts, find the corresponding localhost line, and delete it (dd in vi).
Restart with start-dfs.sh, type yes again to confirm the key, and a fresh entry is appended to known_hosts.
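An alternative to deleting the line by hand with dd (not what the notes used, but it does the same thing):
ssh-keygen -R localhost    # removes every key belonging to localhost from ~/.ssh/known_hosts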
Daemon | Config file
---|---
namenode | core-site.xml
datanode | slaves
secondarynamenode | hdfs-site.xml
Go into the config directory: cd $HADOOP_HOME/etc/hadoop
1) Edit slaves: delete the existing content and write just hadoop001
2) Edit hdfs-site.xml and insert the following two properties:
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop001:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>hadoop001:50091</value>
</property>
[hadoop@hadoop001 .ssh]$ jps
11488 NameNode
13634 Jps
12308 DataNode
11805 SecondaryNameNode
[hadoop@hadoop001 .ssh]$ netstat -nlp|grep 11805
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 0.0.0.0:50090 0.0.0.0:* LISTEN 11805/java
The result: all three Hadoop processes now start under the machine name hadoop001:
[hadoop@hadoop001 hadoop]$ start-dfs.sh
20/03/24 16:48:14 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-namenode-hadoop001.out
hadoop001: starting datanode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-datanode-hadoop001.out
Starting secondary namenodes [hadoop001]
hadoop001: starting secondarynamenode, logging to /home/hadoop/app/hadoop-2.6.0-cdh5.16.2/logs/hadoop-hadoop-secondarynamenode-hadoop001.out
20/03/24 16:48:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1. Create the user and its directories; switch to it with su - hadoop
2. Upload the tarball, extract it, and create a symlink
3. Environment requirements: Java 1.8 and SSH must be installed
4. Set JAVA_HOME explicitly (in hadoop-env.sh)
5. Configure core-site.xml and hdfs-site.xml
6. Set up the passwordless SSH trust relationship
7. Configure the Hadoop environment variables
8. Format the namenode
9. First start with start-dfs.sh; type yes to confirm the host keys
10. Know where on the official site the datanode and secondarynamenode parameters are documented so they can be changed
11. The namenode (name node) is the master: every read and write request goes through it first. The datanode (data node) is the worker: it stores and retrieves the data. The secondarynamenode (secondary name node) takes periodic backups of the namenode's metadata (a checkpoint made at time n+1 covers the state up to time n).
1. "Safemode is off": the safe-mode status shown on the NameNode web UI
2. Reading the disk-space figures: total cluster capacity is 49.21 GB, DFS Used is 12 KB, Non DFS Used is 13.39 GB, DFS Remaining is 33.33 GB. As the df output below shows, 34 GB is still free on the disk, and by default that free space is what DFS may use.
[hadoop@hadoop001 hadoop]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 50G 14G 34G 29% /
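The same figures shown on the web UI can also be pulled from the command line; a sketch (the numbers on a live system will differ from those quoted above):
hdfs dfsadmin -report    # prints Configured Capacity, DFS Used, Non DFS Used and DFS Remaining per DataNode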
2. The file-browser page of the HDFS web UI:
1) [hadoop@hadoop001 data]$ hdfs dfs -mkdir /ruozedata
2) [hadoop@hadoop001 data]$ hdfs dfs -put ruoze.log /ruozedata/
3) Download it to the local directory:
[hadoop@hadoop001 data]$ hdfs dfs -get /ruozedata/ruoze.log ./
20/03/24 17:37:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
[hadoop@hadoop001 data]$ ll
total 4
-rw-r--r-- 1 hadoop hadoop 10 Mar 24 17:37 ruoze.log
4) Delete that file from HDFS:
[hadoop@hadoop001 data]$ hdfs dfs -rm -r /ruozedata/ruoze.log
20/03/24 17:38:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Deleted /ruozedata/ruoze.log
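A few more everyday HDFS commands, as a starting point for the command summary homework below; the paths are only illustrative:
hdfs dfs -ls /                                               # list the HDFS root
hdfs dfs -cat /ruozedata/ruoze.log                           # print a file's contents
hdfs dfs -du -h /ruozedata                                   # space usage, human readable
hdfs dfs -cp /ruozedata/ruoze.log /ruozedata/ruoze.log.bak   # copy within HDFS
hdfs dfs -chmod 755 /ruozedata                               # change permissions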
1. Summarize the basic HDFS commands
2. Complete the pseudo-distributed deployment of Hadoop HDFS