
Big Data Technology: Hadoop HDFS

Reference: the Atguigu (尚硅谷) big data course; personal study notes.

Version: Apache Hadoop 2.7.2

hdfs-default.xml: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Cluster configuration: https://blog.csdn.net/qq_40794973/article/details/86681941#t7

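Chapter 1 HDFS Overview

1.1 HDFS Background and Definition

1.2 HDFS Advantages and Disadvantages

1.3 HDFS Architecture

1.4 HDFS Block Size (key interview topic)

Chapter 2 HDFS Shell Operations (key development topic)

1. Basic Syntax

2. Command List

3. Common Commands in Practice

Chapter 3 HDFS Client Operations (key development topic)

3.1 Preparing the HDFS Client Environment

3.2 HDFS API Operations

3.3 HDFS I/O Stream Operations

Chapter 4 HDFS Data Flow (key interview topic)

4.1 HDFS Write Flow

4.2 HDFS Read Flow

Chapter 5 NameNode and SecondaryNameNode (key interview and development topic)

5.1 How the NN and 2NN Work

5.2 Fsimage and Edits Explained

5.3 CheckPoint Interval Settings

5.4 NameNode Failure Handling

5.5 Cluster Safe Mode

5.6 NameNode Multi-Directory Configuration

Chapter 6 DataNode (key interview and development topic)

6.1 How the DataNode Works

6.2 Data Integrity

6.3 Dead-Node Timeout Settings

6.4 Commissioning New DataNodes

6.5 Decommissioning Old DataNodes

6.6 DataNode Multi-Directory Configuration

Chapter 7 New Features in HDFS 2.X

7.1 Copying Data Between Clusters

7.2 Archiving Small Files

7.3 Trash

7.4 Snapshot Management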

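Chapter 1 HDFS Overview

1.1 HDFS Background and Definition

1.2 HDFS Advantages and Disadvantages

1.3 HDFS Architecture

1.4 HDFS Block Size (key interview topic)

Chapter 2 HDFS Shell Operations (key development topic)

1. Basic Syntax

bin/hadoop fs <command>
    or
bin/hdfs dfs <command>

dfs is the HDFS-specific implementation of fs: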

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfs
Usage: hadoop fs [generic options]
   [-appendToFile <localsrc> ... <dst>]
   [-cat [-ignoreCrc] <src> ...]
   [-checksum <src> ...]
   [-chgrp [-R] GROUP PATH...]
   [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
   [-chown [-R] [OWNER][:[GROUP]] PATH...]
   [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
   [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
   [-count [-q] [-h] <path> ...]
   [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
   [-createSnapshot <snapshotDir> [<snapshotName>]]
   [-deleteSnapshot <snapshotDir> <snapshotName>]
   [-df [-h] [<path> ...]]
   [-du [-s] [-h] <path> ...]
   [-expunge]
   [-find <path> ... <expression> ...]
   [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
   [-getfacl [-R] <path>]
   [-getfattr [-R] {-n name | -d} [-e en] <path>]
   [-getmerge [-nl] <src> <localdst>]
   [-help [cmd ...]]
   [-ls [-d] [-h] [-R] [<path> ...]]
   [-mkdir [-p] <path> ...]
   [-moveFromLocal <localsrc> ... <dst>]
   [-moveToLocal <src> <localdst>]
   [-mv <src> ... <dst>]
   [-put [-f] [-p] [-l] <localsrc> ... <dst>]
   [-renameSnapshot <snapshotDir> <oldName> <newName>]
   [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
   [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
   [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
   [-setfattr {-n name [-v value] | -x name} <path>]
   [-setrep [-R] [-w] <rep> <path> ...]
   [-stat [format] <path> ...]
   [-tail [-f] <file>]
   [-test -[defsz] <path>]
   [-text [-ignoreCrc] <src> ...]
   [-touchz <path> ...]
   [-truncate [-w] <length> <path> ...]
   [-usage [cmd ...]]

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs
(prints the same usage listing as hdfs dfs above)

[atguigu@hadoop102 hadoop-2.7.2]$

Note: hadoop fs is the parent of dfs; the two behave almost identically.

2. Command List

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hadoop fs

[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] <path> ...]
[-cp [-f] [-p] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getfacl [-R] <path>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
[-setrep [-R] [-w] <rep> <path> ...]
[-stat [format] <path> ...]
[-tail [-f] <file>]
[-test -[defsz] <path>]
[-text [-ignoreCrc] <src> ...]
[-touchz <path> ...]
[-usage [cmd ...]]

3. Common Commands in Practice

(0) Start the Hadoop cluster (for the tests that follow)

[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh

[atguigu@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh

(1) -help: print the usage of a command

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -help rm

(2) -ls: list directory contents

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -ls /

(3) -mkdir: create a directory on HDFS

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -mkdir -p /sanguo/shuguo

(4) -moveFromLocal: cut a file from the local file system and paste it into HDFS

[atguigu@hadoop102 hadoop-2.7.2]$ touch kongming.txt

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -moveFromLocal ./kongming.txt /sanguo/shuguo

(5) -appendToFile: append a file to the end of an existing file

[atguigu@hadoop102 hadoop-2.7.2]$ touch liubei.txt

[atguigu@hadoop102 hadoop-2.7.2]$ vi liubei.txt

Enter:

san gu mao lu

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -appendToFile liubei.txt /sanguo/shuguo/kongming.txt

(6) -cat: show the contents of a file

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -cat /sanguo/shuguo/kongming.txt

(7) -chgrp, -chmod, -chown: same usage as in the Linux file system; change a file's group, permissions, or owner

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -chmod 666 /sanguo/shuguo/kongming.txt

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -chown atguigu:atguigu /sanguo/shuguo/kongming.txt

(8) -copyFromLocal: copy a file from the local file system to an HDFS path

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -copyFromLocal README.txt /

(9) -copyToLocal: copy from HDFS to the local file system

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -copyToLocal /sanguo/shuguo/kongming.txt ./

(10) -cp: copy from one HDFS path to another HDFS path

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -cp /sanguo/shuguo/kongming.txt /zhuge.txt

(11) -mv: move a file within HDFS

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -mv /zhuge.txt /sanguo/shuguo/

(12) -get: same as copyToLocal; download a file from HDFS to the local file system

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -get /sanguo/shuguo/kongming.txt ./

(13) -getmerge: merge several files into one download, for example when the HDFS directory /user/atguigu/test holds log.1, log.2, log.3, ...

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -getmerge /user/atguigu/test/* ./zaiyiqi.txt

(14) -put: same as copyFromLocal

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -put ./zaiyiqi.txt /user/atguigu/test/

(15) -tail: show the end of a file

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -tail /sanguo/shuguo/kongming.txt

(16) -rm: delete a file or directory

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -rm /user/atguigu/test/jinlian2.txt

Delete a directory:

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -rm -R /user/atguigu/test

Delete immediately, bypassing the trash (relevant when the trash is configured):

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -rm -r -skipTrash /user/atguigu/test

(17) -rmdir: delete an empty directory

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -mkdir /test

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -rmdir /test

(18) -du: show size information for a directory

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -du -s -h /user/atguigu/test
2.7 K  /user/atguigu/test

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -du  -h /user/atguigu/test
1.3 K  /user/atguigu/test/README.txt
15     /user/atguigu/test/jinlian.txt
1.4 K  /user/atguigu/test/zaiyiqi.txt

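(19) -setrep: set the replication factor of a file in HDFS

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -setrep 10 /sanguo/shuguo/kongming.txt

Note: the replication factor set here is only recorded in the NameNode's metadata. Whether that many replicas really exist depends on the number of DataNodes. With only 3 machines at the moment there can be at most 3 replicas; the count reaches 10 only once the cluster grows to 10 nodes.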
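The note above comes down to a simple bound: a DataNode holds at most one replica of a given block, so the number of replicas that can physically exist is capped by the number of live DataNodes. A minimal sketch of that arithmetic follows; `ReplicaEstimate` and `achievableReplicas` are hypothetical names used only for illustration and are not part of the Hadoop API.

```java
// Hypothetical helper for illustration; not part of Hadoop.
class ReplicaEstimate {

    /** Replicas that can exist right now: at most one per live DataNode. */
    static int achievableReplicas(int requestedReplication, int liveDataNodes) {
        if (requestedReplication < 0 || liveDataNodes < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        return Math.min(requestedReplication, liveDataNodes);
    }

    public static void main(String[] args) {
        System.out.println(achievableReplicas(10, 3));   // 3-node cluster from the notes
        System.out.println(achievableReplicas(10, 10));  // after growing to 10 nodes
    }
}
```

With the three-node cluster used in these notes the first call yields 3, while the requested value of 10 stays recorded in the metadata until enough nodes exist.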

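Chapter 3 HDFS Client Operations (key development topic)

3.1 Preparing the HDFS Client Environment

1. Copy the compiled Hadoop jar package that matches your operating system to a path without Chinese characters (for example: D:\Develop\hadoop-2.7.2).

2. Configure the HADOOP_HOME environment variable. HADOOP_HOME=<installation path>

3. Configure the Path environment variable: %HADOOP_HOME%\bin

4. Create a Maven project named HdfsClientDemo

Note: only the first two fields need to be filled in.

5. Add the dependency coordinates and the logging configuration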


		
			junit
			junit
			RELEASE
		
		
			org.apache.logging.log4j
			log4j-core
			2.8.2
		
		
			org.apache.hadoop
			hadoop-common
			2.7.2
		
		
			org.apache.hadoop
			hadoop-client
			2.7.2
		
		
			org.apache.hadoop
			hadoop-hdfs
			2.7.2
		
		
			jdk.tools
			jdk.tools
			1.8
			system
			${JAVA_HOME}/lib/tools.jar

Note: if Eclipse/IDEA prints no logs and the console shows only

log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

create a new file named "log4j.properties" in the project's src/main/resources directory with the following content

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

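6. Create the package: com.atguigu.hdfs

7. Create the HdfsClient class

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class HdfsClient {
	@Test
	public void testMkdirs() throws IOException, InterruptedException, URISyntaxException {
		// 1 Get the file system
		Configuration configuration = new Configuration();
		// Point the client at the cluster
		configuration.set("fs.defaultFS", "hdfs://hadoop102:9000");
		FileSystem fs = FileSystem.get(configuration);
		// Alternative that also passes the user name:
		//FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
		// 2 Create the directory
		fs.mkdirs(new Path("/1108/daxian/banzhang"));
		// 3 Release resources
		fs.close();
	}
}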

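8. Run the program

A user name must be configured at run time.

A client always acts on HDFS under some user identity. By default, the HDFS client API takes that identity from a JVM parameter: -DHADOOP_USER_NAME=atguigu, where atguigu is the user name.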
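As a rough picture of how the client might settle on its identity under simple (non-Kerberos) authentication: to my understanding Hadoop looks at the HADOOP_USER_NAME environment variable, then at the JVM system property of the same name, and finally falls back to the operating-system login. The sketch below models that lookup order in plain Java. The class `HdfsUserResolver` and its parameters are hypothetical, and the exact order is an assumption that should be checked against the Hadoop version in use.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical model of the lookup order; this is not Hadoop's own code.
class HdfsUserResolver {
    static final String KEY = "HADOOP_USER_NAME";

    static String resolve(Map<String, String> env, Properties jvmProps, String osUser) {
        String user = env.get(KEY);              // 1. environment variable (assumed to win)
        if (user == null) {
            user = jvmProps.getProperty(KEY);    // 2. -DHADOOP_USER_NAME=... on the JVM
        }
        return user != null ? user : osUser;     // 3. fall back to the OS login name
    }

    public static void main(String[] args) {
        String user = resolve(System.getenv(), System.getProperties(),
                System.getProperty("user.name"));
        System.out.println("effective HDFS user: " + user);
    }
}
```

Under this model, a Windows login such as yuanyu is what the cluster sees unless one of the two overrides is set, which matches the permission error shown in the next section.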

3.2 HDFS API Operations

/**
 * Running this test requires a launch configuration that passes
 *                                 -DHADOOP_USER_NAME=atguigu
 * otherwise it fails with
 * 	org.apache.hadoop.security.AccessControlException: Permission denied: user=yuanyu, access=WRITE, inode="/0528/dashen/banzhang":atguigu:supergroup:drwxr-xr-x
 * 	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
 * 	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
 * 	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
 * @throws IOException
 * @throws InterruptedException
 * @throws URISyntaxException
 */
@Test
public void test() throws IOException, InterruptedException, URISyntaxException {
	//System.setProperty("hadoop.home.dir", "D:/programming software/hadoop-2.7.2");
	Configuration conf = new Configuration();
	conf.set("fs.defaultFS", "hdfs://hadoop102:9000");
	// 1. Get the HDFS client object
	FileSystem fs = FileSystem.get(conf);
	// 2. Create the path on HDFS
	fs.mkdirs(new Path("/0529/dashen/banzhang"));
	// 3. Release resources
	fs.close();
	System.out.println("HDFSClient.test()");
}

// This version needs no extra launch configuration; just run it
@Test
public void test2() throws IOException, InterruptedException, URISyntaxException {
	//System.setProperty("hadoop.home.dir", "D:/programming software/hadoop-2.7.2"); // not needed once HADOOP_HOME is configured
	Configuration conf = new Configuration();
	// 1. Get the HDFS client object, naming the user that creates the files
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), conf, "atguigu");
	// 2. Create the path on HDFS
	fs.mkdirs(new Path("/0529/dashen/banzhang"));
	// 3. Release resources
	fs.close();
	System.out.println("HDFSClient.test2()");
}

3.2.1 Uploading a File to HDFS (testing parameter priority)

1. Write the source code

@Test
public void testCopyFromLocalFile() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	configuration.set("dfs.replication", "2");
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Upload the file
	fs.copyFromLocalFile(new Path("e:/banzhang.txt"), new Path("/banzhang.txt"));
	// 3 Release resources
	fs.close();
	System.out.println("over");
}

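2. Copy hdfs-site.xml into the resources directory

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
	<property>
		<name>dfs.replication</name>
		<value>1</value>
	</property>
</configuration>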

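3. Parameter priority

From highest to lowest: (1) values set in the client code > (2) user-defined configuration files on the ClassPath > (3) the server's default configuration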
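The ordering above can be pictured as three layers that are consulted from most specific to least specific. The following is a minimal sketch in plain Java, not the Hadoop `Configuration` class; `ConfigPriority`, `resolve`, and the server-side value of 3 are assumptions made up for the illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for layered configuration lookup; not Hadoop's Configuration.
class ConfigPriority {

    /** Returns the value from the highest-priority layer that defines the key. */
    static String resolve(String key,
                          Map<String, String> setInCode,
                          Map<String, String> classpathSiteXml,
                          Map<String, String> serverDefault) {
        if (setInCode.containsKey(key)) {
            return setInCode.get(key);          // (1) configuration.set(...) in client code
        }
        if (classpathSiteXml.containsKey(key)) {
            return classpathSiteXml.get(key);   // (2) hdfs-site.xml under resources
        }
        return serverDefault.get(key);          // (3) server-side default
    }

    public static void main(String[] args) {
        Map<String, String> code = new HashMap<>();
        Map<String, String> site = new HashMap<>();
        Map<String, String> server = new HashMap<>();
        server.put("dfs.replication", "3");     // assumed cluster default
        site.put("dfs.replication", "1");       // value from the copied hdfs-site.xml
        code.put("dfs.replication", "2");       // value set in testCopyFromLocalFile

        System.out.println(resolve("dfs.replication", code, site, server)); // prints 2
        code.clear();
        System.out.println(resolve("dfs.replication", code, site, server)); // prints 1
    }
}
```

Removing the `configuration.set` call from the test corresponds to clearing the first layer here, at which point the value from the classpath file takes over.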

3.2.2 Downloading a File from HDFS

@Test
public void testCopyToLocalFile() throws IOException, InterruptedException, URISyntaxException{

	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
		
	// 2 Download
	// boolean delSrc: whether to delete the source file
	// Path src: the HDFS path of the file to download
	// Path dst: the local path to download to
	// boolean useRawLocalFileSystem: whether to use RawLocalFileSystem locally (true skips writing the local .crc checksum file)
	fs.copyToLocalFile(false, new Path("/banzhang.txt"), new Path("e:/banhua.txt"), true);
		
	// 3 Release resources
	fs.close();
}

3.2.3 Deleting a Directory on HDFS

@Test
public void testDelete() throws IOException, InterruptedException, URISyntaxException{
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Delete recursively
	fs.delete(new Path("/0508/"), true);
	// 3 Release resources
	fs.close();
}

3.2.4 Renaming a File on HDFS

@Test
public void testRename() throws IOException, InterruptedException, URISyntaxException{
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu"); 
	// 2 Rename the file
	fs.rename(new Path("/banzhang.txt"), new Path("/banhua.txt"));
	// 3 Release resources
	fs.close();
}

3.2.5 Viewing File Details on HDFS

View file name, permissions, length, and block information

@Test
public void testListFiles() throws IOException, InterruptedException, URISyntaxException{
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu"); 
	// 2 Get file details (recursively, files only)
	RemoteIterator<LocatedFileStatus> listFiles = fs.listFiles(new Path("/"), true);
	while(listFiles.hasNext()){
		LocatedFileStatus status = listFiles.next();
		// Print the details
		// File name
		System.out.println("File name: "+status.getPath().getName());
		// Path
		System.out.println("Path: "+status.getPath());
		// Length
		System.out.println("Length: "+status.getLen());
		// Permissions
		System.out.println("Permissions: "+status.getPermission());
		// Group
		System.out.println("Group: "+status.getGroup());
		// Get the block locations
		BlockLocation[] blockLocations = status.getBlockLocations();
		for (BlockLocation blockLocation : blockLocations) {
			// Get the hosts that store this block
			String[] hosts = blockLocation.getHosts();
			for (String host : hosts) {
				System.out.println("Stored on host: "+host);
			}
		}
		System.out.println("---------- separator ----------");
	}
	// 3 Release resources
	fs.close();
}

3.2.6 Telling Files from Directories on HDFS

@Test
public void testListStatus() throws IOException, InterruptedException, URISyntaxException{
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Check whether each entry is a file or a directory
	FileStatus[] listStatus = fs.listStatus(new Path("/"));
	for (FileStatus fileStatus : listStatus) {
		if (fileStatus.isFile()) {
			// It is a file
			System.out.println("f:"+fileStatus.getPath().getName());
		} else {
			// It is a directory
			System.out.println("d:"+fileStatus.getPath().getName());
		}
	}
	// 3 Release resources
	fs.close();
}

3.3 HDFS I/O Stream Operations

The API operations above are all wrapped by the framework. How would we implement the same operations ourselves?

We can upload and download data with I/O streams.

3.3.1 Uploading a File to HDFS with create

1. Requirement: upload banhua.txt from the local E: drive to the HDFS root directory

2. Write the code

@Test
public void putFileToHDFS() throws IOException, InterruptedException, URISyntaxException {
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Create the input stream
	FileInputStream fis = new FileInputStream(new File("e:/banhua.txt"));
	// 3 Get the output stream
	FSDataOutputStream fos = fs.create(new Path("/banhua.txt"));
	// 4 Copy from one stream to the other
	IOUtils.copyBytes(fis, fos, configuration);
	// 5 Release resources
	IOUtils.closeStream(fos);
	IOUtils.closeStream(fis);
	fs.close();
}

3.3.2 Downloading a File from HDFS with open

1. Requirement: download banhua.txt from HDFS to the local E: drive

2. Write the code

// File download
@Test
public void getFileFromHDFS() throws IOException, InterruptedException, URISyntaxException{
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Get the input stream
	FSDataInputStream fis = fs.open(new Path("/banhua.txt"));
	// 3 Get the output stream
	FileOutputStream fos = new FileOutputStream(new File("e:/banhua.txt"));
	// 4 Copy from one stream to the other
	IOUtils.copyBytes(fis, fos, configuration);
	// 5 Release resources
	IOUtils.closeStream(fos);
	IOUtils.closeStream(fis);
	fs.close();
}

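3.3.3 Reading a File from a Given Position

1. Requirement: read a large file on HDFS block by block, for example /hadoop-2.7.2.tar.gz in the root directory

2. Write the code

(1) Download the first block

@Test
public void readFileSeek1() throws IOException, InterruptedException, URISyntaxException{
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Get the input stream
	FSDataInputStream fis = fs.open(new Path("/hadoop-2.7.2.tar.gz"));
	// 3 Create the output stream
	FileOutputStream fos = new FileOutputStream(new File("e:/hadoop-2.7.2.tar.gz.part1"));
		
	// 4 Copy exactly the first 128 MB; read() may return fewer bytes than
	//   requested, so only the bytes actually read are written
	byte[] buf = new byte[1024];
	long remaining = 1024L * 1024 * 128;
	while (remaining > 0) {
		int n = fis.read(buf, 0, (int) Math.min(buf.length, remaining));
		if (n < 0) {
			break; // end of file reached before 128 MB
		}
		fos.write(buf, 0, n);
		remaining -= n;
	}
	// 5 Release resources
	IOUtils.closeStream(fis);
	IOUtils.closeStream(fos);
	fs.close();
}

(2) Download the second block

@Test
public void readFileSeek2() throws IOException, InterruptedException, URISyntaxException{
	// 1 Get the file system
	Configuration configuration = new Configuration();
	FileSystem fs = FileSystem.get(new URI("hdfs://hadoop102:9000"), configuration, "atguigu");
	// 2 Open the input stream
	FSDataInputStream fis = fs.open(new Path("/hadoop-2.7.2.tar.gz"));
	// 3 Seek to the start of the second block (128 MB)
	fis.seek(1024*1024*128);
	// 4 Create the output stream
	FileOutputStream fos = new FileOutputStream(new File("e:/hadoop-2.7.2.tar.gz.part2"));
	// 5 Copy from one stream to the other
	IOUtils.copyBytes(fis, fos, configuration);
	// 6 Release resources
	IOUtils.closeStream(fis);
	IOUtils.closeStream(fos);
	fs.close();
}

(3) Merge the files

In a Windows command prompt, change to E:\ and run the following command to merge the data

type hadoop-2.7.2.tar.gz.part2 >> hadoop-2.7.2.tar.gz.part1

Compare the size of hadoop-2.7.2.tar.gz.part1 before and after the merge. Once the merge is done, rename hadoop-2.7.2.tar.gz.part1 to hadoop-2.7.2.tar.gz. Extracting it shows that the tar package is complete.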

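Chapter 4 HDFS Data Flow (key interview topic)

4.1 HDFS Write Flow

4.1.1 Anatomy of a File Write

HDFS write flow:

1) The client, through the DistributedFileSystem module, asks the NameNode to upload a file. The NameNode checks whether the target file already exists and whether the parent directory exists.
2) The NameNode replies whether the upload may proceed.
3) The client asks which DataNode servers the first Block should be uploaded to.
4) The NameNode returns three DataNodes: dn1, dn2, and dn3 (chosen by node distance and load).
5) The client, through the FSDataOutputStream module, asks dn1 to receive data. On receiving the request dn1 calls dn2, and dn2 calls dn3, which completes the communication pipeline.
6) dn1, dn2, and dn3 acknowledge back to the client stage by stage.
7) The client starts uploading the first Block to dn1 (data is first read from disk into a local memory cache), one Packet at a time. When dn1 receives a Packet it passes it on to dn2, and dn2 passes it to dn3. Every Packet that dn1 sends is placed in an ack queue to wait for acknowledgement.
8) When a Block has been transferred, the client asks the NameNode again for the servers for the second Block (repeating steps 3-7).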
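Steps 5 to 7 can be sketched as a toy simulation in plain Java: packets wait in a data queue, move to an ack queue once sent, travel down the pipeline dn1 -> dn2 -> dn3, and leave the ack queue when the acknowledgement comes back. This is a deliberately simplified, single-threaded model with made-up names (`WritePipelineSketch`, `send`); the real client streams packets asynchronously and handles failures, which the sketch ignores.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy model of the write pipeline; names and behaviour are illustrative only.
class WritePipelineSketch {

    /** Sends packetCount packets through the pipeline and returns an event log. */
    static List<String> send(List<String> pipeline, int packetCount) {
        Deque<Integer> dataQueue = new ArrayDeque<>();
        Deque<Integer> ackQueue = new ArrayDeque<>();
        List<String> log = new ArrayList<>();

        for (int p = 0; p < packetCount; p++) {
            dataQueue.add(p);                       // packets cut from the block
        }
        while (!dataQueue.isEmpty()) {
            int packet = dataQueue.poll();
            ackQueue.add(packet);                   // stays here until acknowledged
            for (String dn : pipeline) {            // dn1 forwards to dn2, dn2 to dn3
                log.add(dn + " stored packet " + packet);
            }
            int acked = ackQueue.poll();            // ack travels back to the client
            log.add("client got ack for packet " + acked);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String event : send(Arrays.asList("dn1", "dn2", "dn3"), 2)) {
            System.out.println(event);
        }
    }
}
```

Keeping sent packets in a separate ack queue is what would let a client resend them if a DataNode in the pipeline failed before acknowledging.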

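4.1.2 Network Topology: Computing Node Distance

While writing data to HDFS, the NameNode chooses the DataNode closest to the data being uploaded to receive it. How is this closest distance computed?

Node distance: the sum of the distances from the two nodes to their nearest common ancestor.

For example, take node n1 on rack r1 in data center d1. This node can be written as /d1/r1/n1. Using this notation, four distance cases are described, as shown in the figure below.

As an exercise, compute the distance between each pair of nodes, as shown in Figure 3-10.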
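The definition above translates directly into code: split both paths into components, skip the shared prefix (the common ancestors), and add up the steps that remain on each side. The sketch below is a plain-Java illustration; `NodeDistance` is a hypothetical class name, and the example paths other than /d1/r1/n1 are made up.

```java
// Illustrative implementation of the node-distance definition; not Hadoop's NetworkTopology.
class NodeDistance {

    /** Distance = steps from a to the nearest common ancestor + steps from b to it. */
    static int distance(String a, String b) {
        String[] pa = components(a);
        String[] pb = components(b);
        int common = 0;
        while (common < pa.length && common < pb.length && pa[common].equals(pb[common])) {
            common++;                                // shared ancestors, e.g. d1 then r1
        }
        return (pa.length - common) + (pb.length - common);
    }

    private static String[] components(String path) {
        if (!path.startsWith("/")) {
            throw new IllegalArgumentException("expected a path like /d1/r1/n1: " + path);
        }
        return path.substring(1).split("/");
    }

    public static void main(String[] args) {
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n1")); // same node
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // same rack
        System.out.println(distance("/d1/r1/n1", "/d1/r2/n3")); // same data center, other rack
        System.out.println(distance("/d1/r1/n1", "/d2/r3/n4")); // other data center
    }
}
```

The four calls cover the four cases: same node, same rack, different rack in the same data center, and different data center, giving 0, 2, 4, and 6 respectively.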

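4.1.3 Rack Awareness (Choosing Replica Storage Nodes)

1. Official documentation

Rack awareness description:

http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Data_Replication

For the common case, when the replication factor is three, HDFS’s placement policy is to put one replica on one node in the local rack, another on a different node in the local rack, and the last on a different node in a different rack.

2. Replica node selection in Hadoop 2.7.2

Explanation: the first consideration is the shortest I/O distance, so that replicas are made quickly. The second is safety: if the first rack fails, another rack still holds a replica. Speed comes first, safety second.

Older versions: I/O transfer cost more and was slower, but reliability was higher.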
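To make the quoted sentence concrete, here is a minimal sketch that picks three replica locations exactly as that sentence describes: the writer's own node, another node on the same rack, and a node on a different rack. It follows the wording quoted above, not necessarily what Hadoop's actual placement code does, and it picks the first candidate deterministically where a real policy would also weigh load and randomness. `ReplicaPlacementSketch`, the rack names, and the node names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Follows the quoted documentation sentence only; not Hadoop's placement policy class.
class ReplicaPlacementSketch {

    static List<String> choose(Map<String, List<String>> nodesByRack,
                               String writerRack, String writerNode) {
        List<String> chosen = new ArrayList<>();
        chosen.add(writerRack + "/" + writerNode);           // replica 1: local node

        for (String node : nodesByRack.get(writerRack)) {    // replica 2: same rack, other node
            if (!node.equals(writerNode)) {
                chosen.add(writerRack + "/" + node);
                break;
            }
        }
        for (Map.Entry<String, List<String>> rack : nodesByRack.entrySet()) {
            if (!rack.getKey().equals(writerRack) && !rack.getValue().isEmpty()) {
                chosen.add(rack.getKey() + "/" + rack.getValue().get(0)); // replica 3: other rack
                break;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> racks = new LinkedHashMap<>();
        racks.put("r1", Arrays.asList("n1", "n2", "n3"));
        racks.put("r2", Arrays.asList("n4", "n5"));
        System.out.println(choose(racks, "r1", "n1"));
    }
}
```

For a writer on r1/n1 this selects r1/n1, r1/n2, and r2/n4: two cheap copies on the local rack and one on another rack for safety.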

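4.2 HDFS Read Flow

The HDFS read flow is shown in Figure 3-13.

1) The client, through DistributedFileSystem, asks the NameNode to download a file. The NameNode looks up its metadata and finds the addresses of the DataNodes that hold the file's blocks.

2) The client picks one DataNode (nearest first, then at random) and requests the data.

3) The DataNode starts sending data to the client (it reads the data from disk as an input stream and verifies it one Packet at a time).

4) The client receives the data Packet by Packet, caches it locally first, and then writes it to the target file.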

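Chapter 5 NameNode and SecondaryNameNode (key interview and development topic)

5.1 How the NN and 2NN Work

Question: where does the NameNode store its metadata?

Suppose first that it were stored on the NameNode's disk. Because the metadata is accessed randomly and often, and client requests must be answered, this would be far too slow. The metadata therefore has to live in memory. But if it exists only in memory, a power failure loses it and the whole cluster stops working. Hence the FsImage (image file), a backup of the metadata on disk.

This raises a new problem: updating the FsImage every time the in-memory metadata changes would be too slow, but not updating it would cause consistency problems, and a NameNode power failure would lose data. Hence the Edits file (append-only, so very efficient). Whenever metadata is updated or added, the in-memory metadata is modified and the operation is appended to Edits. If the NameNode then loses power, the metadata can be rebuilt by merging FsImage and Edits. (In-memory contents = FsImage + Edits)

However, if operations are appended to Edits for a long time, the file grows too large, efficiency drops, and recovering the metadata after a power failure takes too long. FsImage and Edits therefore need to be merged periodically. If the NameNode did this itself, it would again be too slow, so a new node, the SecondaryNameNode, is introduced specifically to merge FsImage and Edits.

How the NN and 2NN work is shown in Figure 3-14.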

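1. Phase 1: NameNode startup

(1) When the NameNode starts for the first time after formatting, it creates the Fsimage and Edits files. If this is not the first start, it loads the edit log and the image file directly into memory.

(2) A client sends a request to add, delete, or modify metadata.

(3) The NameNode records the operation in the edit log and updates the rolling log.

(4) The NameNode applies the add, delete, or modify operation to the metadata in memory.

2. Phase 2: Secondary NameNode at work

(1) The Secondary NameNode asks the NameNode whether a CheckPoint is needed and gets the NameNode's answer back directly.

(2) The Secondary NameNode requests a CheckPoint.

(3) The NameNode rolls the Edits log that is currently being written.

(4) The edit logs from before the roll and the image file are copied to the Secondary NameNode.

(5) The Secondary NameNode loads the edit logs and the image file into memory and merges them.

(6) It produces a new image file, fsimage.chkpoint.

(7) It copies fsimage.chkpoint to the NameNode.

(8) The NameNode renames fsimage.chkpoint to fsimage.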

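How the NN and 2NN work, in detail:

Fsimage: the file produced by serializing the metadata held in the NameNode's memory.
Edits: a record of every operation by which clients update the metadata (the metadata can be computed from Edits).

When the NameNode starts, it first rolls Edits and creates an empty edits.inprogress, then loads Edits and Fsimage into memory, at which point the NameNode's memory holds the latest metadata. Clients then start sending requests to add, delete, or modify metadata. These operations are first recorded in edits.inprogress (queries are not recorded in Edits because they do not change the metadata), so if the NameNode crashes at this point, it can read the metadata information back from Edits after a restart. The NameNode then applies the add, delete, or modify operation to the metadata in memory.

Because Edits records more and more operations, the Edits files keep growing and the NameNode becomes slow to load them at startup. Edits and Fsimage therefore need to be merged (merging means loading Edits and Fsimage into memory and replaying the operations in Edits one by one to produce a new Fsimage). The SecondaryNameNode's job is to help the NameNode merge Edits and Fsimage.

The SecondaryNameNode first asks the NameNode whether a CheckPoint is needed (a CheckPoint is triggered when either of two conditions holds: the timer has expired, or Edits has filled up) and gets the NameNode's answer back directly. To perform the CheckPoint, the SecondaryNameNode first has the NameNode roll Edits and create an empty edits.inprogress. Rolling marks a boundary in Edits: all new operations from then on are written to edits.inprogress, while the other, not yet merged Edits and the Fsimage are copied to the SecondaryNameNode's local disk. The copied Edits and Fsimage are loaded into memory and merged into fsimage.chkpoint, which is then copied to the NameNode and renamed to Fsimage, replacing the old Fsimage. At startup the NameNode then only needs to load the Fsimage and the Edits that have not yet been merged, because the metadata changes in the merged Edits are already recorded in the Fsimage.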
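The merge described above (in-memory contents = FsImage + Edits) can be illustrated with a small replay loop. In this sketch the Fsimage is modelled as a set of paths tagged with the last transaction ID it contains, and each edit is a text line of the form "txid OP path". The class name `CheckpointSketch`, the two operations ADD and DELETE, and the line format are assumptions invented for the example; real Fsimage and Edits files are binary and record many more operation types.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Simplified model of a checkpoint merge; file formats and operations are invented.
class CheckpointSketch {

    /** Replays every edit newer than imageTxid on top of a copy of the image. */
    static Set<String> merge(Set<String> fsimage, long imageTxid, List<String> edits) {
        Set<String> namespace = new TreeSet<>(fsimage);      // leave the old image untouched
        for (String line : edits) {
            String[] parts = line.split(" ");                // "txid OP path"
            long txid = Long.parseLong(parts[0]);
            if (txid <= imageTxid) {
                continue;                                    // already contained in the image
            }
            if (parts[1].equals("ADD")) {
                namespace.add(parts[2]);
            } else if (parts[1].equals("DELETE")) {
                namespace.remove(parts[2]);
            } else {
                throw new IllegalArgumentException("unknown operation: " + parts[1]);
            }
        }
        return namespace;
    }

    public static void main(String[] args) {
        Set<String> image = new TreeSet<>(Arrays.asList("/a"));          // image as of txid 1
        List<String> edits = Arrays.asList("1 ADD /a", "2 ADD /b", "3 DELETE /a");
        System.out.println(merge(image, 1L, edits));                     // prints [/b]
    }
}
```

The result of `merge` plays the role of fsimage.chkpoint: once it replaces the old image, edits up to transaction 3 never need to be replayed again.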

5.2 Fsimage and Edits Explained

1. Concepts

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  classpath            prints the classpath
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
						Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      list/get/set block storage policies
  version              print the version

Most commands print help when invoked w/o parameters.
[atguigu@hadoop102 hadoop-2.7.2]$

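2. Viewing an Fsimage file with oiv

(1) The oiv and oev commands

[atguigu@hadoop102 current]$ hdfs
oiv            apply the offline fsimage viewer to an fsimage
oev            apply the offline edits viewer to an edits file

oiv lets you inspect image files; oev lets you inspect edit logs.

(2) Basic syntax

hdfs oiv -p <file type> -i <image file> -o <output path for the converted file>

(3) Worked example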

NameNode:

[atguigu@hadoop102 current]$ jps
2580 DataNode
3268 Jps
2917 NodeManager
2463 NameNode
[atguigu@hadoop102 hadoop-2.7.2]$ tree data/
data/
└── tmp
    ├── dfs
    │   ├── data
    │   │   ├── current
    │   │   │   ├── BP-132169342-192.168.19.102-1548944709783
    │   │   │   │   ├── current
    │   │   │   │   │   ├── dfsUsed
    │   │   │   │   │   ├── finalized
    │   │   │   │   │   │   └── subdir0
    │   │   │   │   │   │       └── subdir0
    │   │   │   │   │   │           ├── blk_1073741825
    │   │   │   │   │   │           ├── blk_1073741825_1001.meta
    │   │   │   │   │   │           ├── blk_1073741826
    │   │   │   │   │   │           ├── blk_1073741826_1002.meta
    │   │   │   │   │   │           ├── blk_1073741827
    │   │   │   │   │   │           ├── blk_1073741827_1003.meta
    │   │   │   │   │   │           ├── blk_1073741829
    │   │   │   │   │   │           ├── blk_1073741829_1005.meta
    │   │   │   │   │   │           ├── blk_1073741830
    │   │   │   │   │   │           ├── blk_1073741830_1006.meta
    │   │   │   │   │   │           ├── blk_1073741832
    │   │   │   │   │   │           ├── blk_1073741832_1008.meta
    │   │   │   │   │   │           ├── blk_1073741833
    │   │   │   │   │   │           ├── blk_1073741833_1009.meta
    │   │   │   │   │   │           ├── blk_1073741836
    │   │   │   │   │   │           └── blk_1073741836_1012.meta
    │   │   │   │   │   ├── rbw
    │   │   │   │   │   └── VERSION
    │   │   │   │   ├── scanner.cursor
    │   │   │   │   └── tmp
    │   │   │   └── VERSION
    │   │   └── in_use.lock
    │   └── name
    │       ├── current
    │       │   ├── edits_0000000000000000001-0000000000000000020
    │       │   ├── edits_0000000000000000021-0000000000000000021
    │       │   ├── edits_0000000000000000022-0000000000000000032
    │       │   ├── edits_0000000000000000033-0000000000000000069
    │       │   ├── edits_0000000000000000070-0000000000000000070
    │       │   ├── edits_0000000000000000071-0000000000000000086
    │       │   ├── edits_0000000000000000087-0000000000000000092
    │       │   ├── edits_0000000000000000093-0000000000000000095
    │       │   ├── edits_0000000000000000096-0000000000000000097
    │       │   ├── edits_inprogress_0000000000000000098
    │       │   ├── fsimage_0000000000000000095
    │       │   ├── fsimage_0000000000000000095.md5
    │       │   ├── fsimage_0000000000000000097
    │       │   ├── fsimage_0000000000000000097.md5
    │       │   ├── seen_txid
    │       │   └── VERSION
    │       └── in_use.lock
    └── nm-local-dir
        ├── filecache
        ├── nmPrivate
        └── usercache

17 directories, 38 files
[atguigu@hadoop102 current]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs/name/current

// The SecondaryNameNode has no edits_inprogress_0000000000000000098 file

----------------------------------------------------------------------------------------------------------------------------

SecondaryNameNode:

[atguigu@hadoop104 current]$ jps
2324 DataNode
3229 Jps
2415 SecondaryNameNode
2559 NodeManager

[atguigu@hadoop104 hadoop-2.7.2]$ cd data/tmp/dfs/namesecondary/current/
[atguigu@hadoop104 current]$ ll
总用量 44
-rw-rw-r-- 1 atguigu atguigu 1647 1月 31 23:22 edits_0000000000000000001-0000000000000000020
-rw-rw-r-- 1 atguigu atguigu 705 2月 1 06:33 edits_0000000000000000022-0000000000000000032
-rw-rw-r-- 1 atguigu atguigu 2862 2月 1 07:33 edits_0000000000000000033-0000000000000000069
-rw-rw-r-- 1 atguigu atguigu 1051 2月 2 06:26 edits_0000000000000000071-0000000000000000086
-rw-rw-r-- 1 atguigu atguigu 110 2月 3 00:35 edits_0000000000000000093-0000000000000000095
-rw-rw-r-- 1 atguigu atguigu 42 2月 3 01:36 edits_0000000000000000096-0000000000000000097
-rw-rw-r-- 1 atguigu atguigu 1151 2月 3 00:35 fsimage_0000000000000000095
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 00:35 fsimage_0000000000000000095.md5
-rw-rw-r-- 1 atguigu atguigu 1151 2月 3 01:36 fsimage_0000000000000000097
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 01:36 fsimage_0000000000000000097.md5
-rw-rw-r-- 1 atguigu atguigu 206 2月 3 01:36 VERSION

NameNode:

[atguigu@hadoop102 current]$ ll
总用量 4148
-rw-rw-r-- 1 atguigu atguigu 1647 1月 31 23:21 edits_0000000000000000001-0000000000000000020
-rw-rw-r-- 1 atguigu atguigu 1048576 1月 31 23:21 edits_0000000000000000021-0000000000000000021
-rw-rw-r-- 1 atguigu atguigu 705 2月 1 06:33 edits_0000000000000000022-0000000000000000032
-rw-rw-r-- 1 atguigu atguigu 2862 2月 1 07:33 edits_0000000000000000033-0000000000000000069
-rw-rw-r-- 1 atguigu atguigu 1048576 2月 1 07:33 edits_0000000000000000070-0000000000000000070
-rw-rw-r-- 1 atguigu atguigu 1051 2月 2 06:26 edits_0000000000000000071-0000000000000000086
-rw-rw-r-- 1 atguigu atguigu 1048576 2月 2 06:37 edits_0000000000000000087-0000000000000000092
-rw-rw-r-- 1 atguigu atguigu 110 2月 3 00:35 edits_0000000000000000093-0000000000000000095
-rw-rw-r-- 1 atguigu atguigu 42 2月 3 01:36 edits_0000000000000000096-0000000000000000097
-rw-rw-r-- 1 atguigu atguigu 42 2月 3 02:36 edits_0000000000000000098-0000000000000000099
-rw-rw-r-- 1 atguigu atguigu 1048576 2月 3 02:36 edits_inprogress_0000000000000000100
-rw-rw-r-- 1 atguigu atguigu 1151 2月 3 01:36 fsimage_0000000000000000097
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 01:36 fsimage_0000000000000000097.md5
-rw-rw-r-- 1 atguigu atguigu 1151 2月 3 02:36 fsimage_0000000000000000099
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 02:36 fsimage_0000000000000000099.md5
-rw-rw-r-- 1 atguigu atguigu 4 2月 3 02:36 seen_txid
-rw-rw-r-- 1 atguigu atguigu 206 2月 2 23:41 VERSION
[atguigu@hadoop102 current]$ cat seen_txid
100

-----------------------------------------------------------------------------------------------------------------------------------------

SecondaryNameNode:

[atguigu@hadoop104 current]$ ll
总用量 44
-rw-rw-r-- 1 atguigu atguigu 1647 1月 31 23:22 edits_0000000000000000001-0000000000000000020
-rw-rw-r-- 1 atguigu atguigu 705 2月 1 06:33 edits_0000000000000000022-0000000000000000032
-rw-rw-r-- 1 atguigu atguigu 2862 2月 1 07:33 edits_0000000000000000033-0000000000000000069
-rw-rw-r-- 1 atguigu atguigu 1051 2月 2 06:26 edits_0000000000000000071-0000000000000000086
-rw-rw-r-- 1 atguigu atguigu 110 2月 3 00:35 edits_0000000000000000093-0000000000000000095
-rw-rw-r-- 1 atguigu atguigu 42 2月 3 01:36 edits_0000000000000000096-0000000000000000097
-rw-rw-r-- 1 atguigu atguigu 1151 2月 3 00:35 fsimage_0000000000000000095
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 00:35 fsimage_0000000000000000095.md5
-rw-rw-r-- 1 atguigu atguigu 1151 2月 3 01:36 fsimage_0000000000000000097
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 01:36 fsimage_0000000000000000097.md5
-rw-rw-r-- 1 atguigu atguigu 206 2月 3 01:36 VERSION
[atguigu@hadoop104 current]$

[atguigu@hadoop102 current]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs/name/current

[atguigu@hadoop102 current]$ hdfs oiv -p XML -i fsimage_0000000000000000025 -o /opt/module/hadoop-2.7.2/fsimage.xml

[atguigu@hadoop102 current]$ cat /opt/module/hadoop-2.7.2/fsimage.xml

Copy the displayed XML into an XML file created in Eclipse and format it. Part of the result is shown below.

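<inode>
   <id>16386</id>
   <type>DIRECTORY</type>
   <name>user</name>
   <mtime>1512722284477</mtime>
   <permission>atguigu:supergroup:rwxr-xr-x</permission>
   <nsquota>-1</nsquota>
   <dsquota>-1</dsquota>
</inode>
<inode>
   <id>16387</id>
   <type>DIRECTORY</type>
   <name>atguigu</name>
   <mtime>1512790549080</mtime>
   <permission>atguigu:supergroup:rwxr-xr-x</permission>
   <nsquota>-1</nsquota>
   <dsquota>-1</dsquota>
</inode>
<inode>
   <id>16389</id>
   <type>FILE</type>
   <name>wc.input</name>
   <replication>3</replication>
   <mtime>1512722322219</mtime>
   <atime>1512722321610</atime>
   <perferredBlockSize>134217728</perferredBlockSize>
   <permission>atguigu:supergroup:rw-r--r--</permission>
   <blocks>
       <block>
           <id>1073741825</id>
           <genstamp>1001</genstamp>
           <numBytes>59</numBytes>
       </block>
   </blocks>
</inode>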

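Question: the Fsimage clearly does not record which DataNodes hold each block. Why?

Because block locations are not persisted: after the cluster starts, every DataNode is required to report its block information, and it reports again at regular intervals.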
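The answer above implies that the block-to-DataNode mapping lives only in memory and is rebuilt from block reports. A minimal sketch of such a map follows; `BlockMapSketch` and its methods are hypothetical names, and a full report is assumed to replace everything previously known about the reporting DataNode.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// In-memory block location map rebuilt from reports; an illustration, not NameNode code.
class BlockMapSketch {
    private final Map<Long, Set<String>> locations = new TreeMap<>();

    /** Applies a full block report: the DataNode lists every block it currently stores. */
    void applyBlockReport(String dataNode, Collection<Long> blockIds) {
        for (Set<String> holders : locations.values()) {
            holders.remove(dataNode);                        // forget what this node held before
        }
        for (Long blockId : blockIds) {
            locations.computeIfAbsent(blockId, id -> new TreeSet<>()).add(dataNode);
        }
        locations.values().removeIf(Set::isEmpty);           // drop blocks nobody holds
    }

    Set<String> holders(long blockId) {
        return locations.getOrDefault(blockId, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        BlockMapSketch map = new BlockMapSketch();
        map.applyBlockReport("hadoop102", Arrays.asList(1073741825L));
        map.applyBlockReport("hadoop103", Arrays.asList(1073741825L, 1073741826L));
        System.out.println(map.holders(1073741825L));
    }
}
```

Because the map is derived entirely from reports, nothing about block locations has to be written into the Fsimage, and a restarted NameNode simply waits for the DataNodes to report in.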

3. Viewing an Edits file with oev

(1) Basic syntax

hdfs oev -p <file type> -i <edit log file> -o <output path for the converted file>

(2) Worked example

[atguigu@hadoop102 current]$ hdfs oev -p XML -i edits_0000000000000000012-0000000000000000013 -o /opt/module/hadoop-2.7.2/edits.xml
[atguigu@hadoop102 current]$ cat /opt/module/hadoop-2.7.2/edits.xml

Copy the displayed XML into an XML file created in Eclipse and format it. The result is shown below.

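<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
    <EDITS_VERSION>-63</EDITS_VERSION>
    <RECORD>
        <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
        <DATA>
            <TXID>129</TXID>
        </DATA>
    </RECORD>
    <RECORD>
        <OPCODE>OP_ADD</OPCODE>
        <DATA>
            <TXID>130</TXID>
            <LENGTH>0</LENGTH>
            <INODEID>16407</INODEID>
            <PATH>/hello7.txt</PATH>
            <REPLICATION>2</REPLICATION>
            <MTIME>1512943607866</MTIME>
            <ATIME>1512943607866</ATIME>
            <BLOCKSIZE>134217728</BLOCKSIZE>
            <CLIENT_NAME>DFSClient_NONMAPREDUCE_-1544295051_1</CLIENT_NAME>
            <CLIENT_MACHINE>192.168.1.5</CLIENT_MACHINE>
            <OVERWRITE>true</OVERWRITE>
            <PERMISSION_STATUS>
                <USERNAME>atguigu</USERNAME>
                <GROUPNAME>supergroup</GROUPNAME>
                <MODE>420</MODE>
            </PERMISSION_STATUS>
            <RPC_CLIENTID>908eafd4-9aec-4288-96f1-e8011d181561</RPC_CLIENTID>
            <RPC_CALLID>0</RPC_CALLID>
        </DATA>
    </RECORD>
    <RECORD>
        <OPCODE>OP_ALLOCATE_BLOCK_ID</OPCODE>
        <DATA>
            <TXID>131</TXID>
            <BLOCK_ID>1073741839</BLOCK_ID>
        </DATA>
    </RECORD>
    <RECORD>
        <OPCODE>OP_SET_GENSTAMP_V2</OPCODE>
        <DATA>
            <TXID>132</TXID>
            <GENSTAMPV2>1016</GENSTAMPV2>
        </DATA>
    </RECORD>
    <RECORD>
        <OPCODE>OP_ADD_BLOCK</OPCODE>
        <DATA>
            <TXID>133</TXID>
            <PATH>/hello7.txt</PATH>
            <BLOCK>
                <BLOCK_ID>1073741839</BLOCK_ID>
                <NUM_BYTES>0</NUM_BYTES>
                <GENSTAMP>1016</GENSTAMP>
            </BLOCK>
            <RPC_CLIENTID></RPC_CLIENTID>
            <RPC_CALLID>-2</RPC_CALLID>
        </DATA>
    </RECORD>
    <RECORD>
        <OPCODE>OP_CLOSE</OPCODE>
        <DATA>
            <TXID>134</TXID>
            <LENGTH>0</LENGTH>
            <INODEID>0</INODEID>
            <PATH>/hello7.txt</PATH>
            <REPLICATION>2</REPLICATION>
            <MTIME>1512943608761</MTIME>
            <ATIME>1512943607866</ATIME>
            <BLOCKSIZE>134217728</BLOCKSIZE>
            <CLIENT_NAME></CLIENT_NAME>
            <CLIENT_MACHINE></CLIENT_MACHINE>
            <OVERWRITE>false</OVERWRITE>
            <BLOCK>
                <BLOCK_ID>1073741839</BLOCK_ID>
                <NUM_BYTES>25</NUM_BYTES>
                <GENSTAMP>1016</GENSTAMP>
            </BLOCK>
            <PERMISSION_STATUS>
                <USERNAME>atguigu</USERNAME>
                <GROUPNAME>supergroup</GROUPNAME>
                <MODE>420</MODE>
            </PERMISSION_STATUS>
        </DATA>
    </RECORD>
</EDITS>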
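The permission value 420 in the records above is a decimal number, which is easy to misread. A minimal sketch, assuming the value is simply the usual POSIX permission bits printed in base 10 (the helper name `decode_mode` is made up for illustration):

```python
import stat

def decode_mode(mode_decimal):
    """Return the octal form and the rwx string of a decimal permission value."""
    # stat.filemode() prefixes a file-type character; drop it, since the
    # edit log value carries permission bits only.
    return oct(mode_decimal), stat.filemode(mode_decimal)[1:]

print(decode_mode(420))  # file permissions in an OP_ADD record
print(decode_mode(493))  # directory permissions as seen in OP_MKDIR records
```

Read this way, 420 corresponds to 644 (rw-r--r--) and 493 to 755 (rwxr-xr-x), which matches the permission strings shown in the Fsimage output.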

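Question: how does the NameNode decide which Edits to merge the next time it starts?

seen_txid records the latest transaction ID, that is, which Edits file is currently the newest. Edits that have already been merged into an Fsimage are not loaded again.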
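The selection rule can be illustrated with the file names alone. This is a rough sketch, assuming the naming convention visible in the listings (fsimage_<txid>, edits_<start>-<end>, edits_inprogress_<start>); it illustrates the idea and is not the NameNode's actual code, and `edits_to_replay` is a hypothetical helper:

```python
import re

def edits_to_replay(filenames):
    """Pick the newest fsimage and the edit segments that start after it."""
    images, segments = [], []
    for name in filenames:
        m = re.fullmatch(r"fsimage_(\d+)", name)  # skips the .md5 files
        if m:
            images.append((int(m.group(1)), name))
            continue
        m = re.fullmatch(r"edits_(?:inprogress_)?(\d+)(?:-\d+)?", name)
        if m:
            segments.append((int(m.group(1)), name))
    last_txid, image = max(images)
    # Only segments whose first transaction is newer than the image are needed.
    replay = [name for start, name in sorted(segments) if start > last_txid]
    return image, replay

listing = [
    "edits_0000000000000000001-0000000000000000002",
    "edits_inprogress_0000000000000000003",
    "fsimage_0000000000000000000",
    "fsimage_0000000000000000000.md5",
    "fsimage_0000000000000000002",
    "fsimage_0000000000000000002.md5",
    "seen_txid",
    "VERSION",
]
print(edits_to_replay(listing))
```

For this listing only edits_inprogress_0000000000000000003 needs replaying, because transactions 1 and 2 are already contained in fsimage_0000000000000000002.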

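Demo: after formatting the cluster, create the directory /user/atguigu/input and upload the file xiaopan.txt into it.

1. Stop the NameNode and DataNode, and confirm with jps that they have stopped.

2. Delete data and logs with rm -rf data/ logs/ (this must be run on every node).

3. Format the NameNode

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs namenode -format

4. Right after formatting there is no DataNode data and no edit log yet, only an empty fs image file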

[atguigu@hadoop102 hadoop-2.7.2]$ tree data/
data/
└── tmp
    └── dfs
        └── name
            └── current
                ├── fsimage_0000000000000000000
                ├── fsimage_0000000000000000000.md5
                ├── seen_txid
                └── VERSION

4 directories, 4 files

5. Start HDFS

[atguigu@hadoop102 ~]$ cd /opt/module/hadoop-2.7.2/
[atguigu@hadoop102 hadoop-2.7.2]$ jps
5758 Jps
[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh

................(output omitted)................

[atguigu@hadoop102 hadoop-2.7.2]$ jps
6032 DataNode
5894 NameNode
6271 Jps

6. Inspect the result

[atguigu@hadoop102 hadoop-2.7.2]$ tree data/
data/
└── tmp
    └── dfs
        ├── data
        │   ├── current
        │   │   ├── BP-265186074-192.168.19.102-1549143814464
        │   │   │   ├── current
        │   │   │   │   ├── finalized
        │   │   │   │   ├── rbw
        │   │   │   │   └── VERSION
        │   │   │   ├── scanner.cursor
        │   │   │   └── tmp
        │   │   └── VERSION
        │   └── in_use.lock
        └── name
            ├── current
            │   ├── edits_0000000000000000001-0000000000000000002
            │   ├── edits_inprogress_0000000000000000003
            │   ├── fsimage_0000000000000000000
            │   ├── fsimage_0000000000000000000.md5
            │   ├── fsimage_0000000000000000002
            │   ├── fsimage_0000000000000000002.md5
            │   ├── seen_txid
            │   └── VERSION
            └── in_use.lock

11 directories, 13 files

7. Create the directory /user/atguigu/input and upload the file xiaopan.txt into it

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -mkdir -p /user/atguigu/input
[atguigu@hadoop102 hadoop-2.7.2]$ ls
bin etc input libexec logs output sbin wcinput
data include lib LICENSE.txt NOTICE.txt README.txt share xiaopan.txt
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -put xiaopan.txt /user/atguigu/input
[atguigu@hadoop102 hadoop-2.7.2]$

8. View in the web UI: http://hadoop102:50070/explorer.html#/user/atguigu/input

9. Inspect the contents of the image files

[atguigu@hadoop102 hadoop-2.7.2]$ cd data/tmp/dfs/name/current/
[atguigu@hadoop102 current]$ ll
总用量 1052
-rw-rw-r-- 1 atguigu atguigu 42 2月 3 05:52 edits_0000000000000000001-0000000000000000002
-rw-rw-r-- 1 atguigu atguigu 1048576 2月 3 05:58 edits_inprogress_0000000000000000003
-rw-rw-r-- 1 atguigu atguigu 354 2月 3 05:43 fsimage_0000000000000000000
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 05:43 fsimage_0000000000000000000.md5
-rw-rw-r-- 1 atguigu atguigu 354 2月 3 05:52 fsimage_0000000000000000002
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 05:52 fsimage_0000000000000000002.md5
-rw-rw-r-- 1 atguigu atguigu 2 2月 3 05:52 seen_txid
-rw-rw-r-- 1 atguigu atguigu 205 2月 3 05:43 VERSION


[atguigu@hadoop102 current]$ hdfs oiv -p XML -i fsimage_0000000000000000000 -o fs.xml
[atguigu@hadoop102 current]$ cat fs.xml

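<?xml version="1.0"?>
<fsimage><NameSection>
<genstampV1>1000</genstampV1><genstampV2>1000</genstampV2><genstampV1Limit>0</genstampV1Limit><lastAllocatedBlockId>1073741824</lastAllocatedBlockId><txid>0</txid></NameSection>
<INodeSection><lastInodeId>16385</lastInodeId><inode><id>16385</id><type>DIRECTORY</type><name></name><mtime>0</mtime><permission>atguigu:supergroup:rwxr-xr-x</permission><nsquota>9223372036854775807</nsquota><dsquota>-1</dsquota></inode>
</INodeSection>
<INodeReferenceSection></INodeReferenceSection><SnapshotSection><snapshotCounter>0</snapshotCounter></SnapshotSection>
<INodeDirectorySection></INodeDirectorySection>
<FileUnderConstructionSection></FileUnderConstructionSection>
<SnapshotDiffSection><diff><inodeid>16385</inodeid></diff></SnapshotDiffSection>
<SecretManagerSection><currentId>0</currentId><tokenSequenceNumber>0</tokenSequenceNumber></SecretManagerSection><CacheManagerSection><nextDirectiveId>1</nextDirectiveId></CacheManagerSection>
</fsimage>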

[atguigu@hadoop102 current]$ rm -rf fs.xml
[atguigu@hadoop102 current]$
[atguigu@hadoop102 current]$ hdfs oiv -p XML -i fsimage_0000000000000000002 -o fs.xml
[atguigu@hadoop102 current]$ ll
总用量 1056
-rw-rw-r-- 1 atguigu atguigu      42 2月   3 05:52 edits_0000000000000000001-0000000000000000002
-rw-rw-r-- 1 atguigu atguigu 1048576 2月   3 05:58 edits_inprogress_0000000000000000003
-rw-rw-r-- 1 atguigu atguigu     354 2月   3 05:43 fsimage_0000000000000000000
-rw-rw-r-- 1 atguigu atguigu      62 2月   3 05:43 fsimage_0000000000000000000.md5
-rw-rw-r-- 1 atguigu atguigu     354 2月   3 05:52 fsimage_0000000000000000002
-rw-rw-r-- 1 atguigu atguigu      62 2月   3 05:52 fsimage_0000000000000000002.md5
-rw-rw-r-- 1 atguigu atguigu     992 2月   3 06:06 fs.xml
-rw-rw-r-- 1 atguigu atguigu       2 2月   3 05:52 seen_txid
-rw-rw-r-- 1 atguigu atguigu     205 2月   3 05:43 VERSION
[atguigu@hadoop102 current]$ cat fs.xml

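<?xml version="1.0"?>
<fsimage><NameSection>
<genstampV1>1000</genstampV1><genstampV2>1000</genstampV2><genstampV1Limit>0</genstampV1Limit><lastAllocatedBlockId>1073741824</lastAllocatedBlockId><txid>2</txid></NameSection>
<INodeSection><lastInodeId>16385</lastInodeId><inode><id>16385</id><type>DIRECTORY</type><name></name><mtime>0</mtime><permission>atguigu:supergroup:rwxr-xr-x</permission><nsquota>9223372036854775807</nsquota><dsquota>-1</dsquota></inode>
</INodeSection>
<INodeReferenceSection></INodeReferenceSection><SnapshotSection><snapshotCounter>0</snapshotCounter></SnapshotSection>
<INodeDirectorySection></INodeDirectorySection>
<FileUnderConstructionSection></FileUnderConstructionSection>
<SnapshotDiffSection><diff><inodeid>16385</inodeid></diff></SnapshotDiffSection>
<SecretManagerSection><currentId>0</currentId><tokenSequenceNumber>0</tokenSequenceNumber></SecretManagerSection><CacheManagerSection><nextDirectiveId>1</nextDirectiveId></CacheManagerSection>
</fsimage>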

10. Copy the XML above, open it in Eclipse, and format it



	
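<?xml version="1.0"?>
<fsimage>
	<NameSection>
		<genstampV1>1000</genstampV1>
		<genstampV2>1000</genstampV2>
		<genstampV1Limit>0</genstampV1Limit>
		<lastAllocatedBlockId>1073741824</lastAllocatedBlockId>
		<txid>0</txid>
	</NameSection>
	<INodeSection>
		<lastInodeId>16385</lastInodeId>
		<inode>
			<id>16385</id>
			<type>DIRECTORY</type>
			<name></name>
			<mtime>0</mtime>
			<permission>atguigu:supergroup:rwxr-xr-x</permission>
			<nsquota>9223372036854775807</nsquota>
			<dsquota>-1</dsquota>
		</inode>
	</INodeSection>
	<INodeReferenceSection></INodeReferenceSection>
	<SnapshotSection>
		<snapshotCounter>0</snapshotCounter>
	</SnapshotSection>
	<INodeDirectorySection></INodeDirectorySection>
	<FileUnderConstructionSection></FileUnderConstructionSection>
	<SnapshotDiffSection>
		<diff>
			<inodeid>16385</inodeid>
		</diff>
	</SnapshotDiffSection>
	<SecretManagerSection>
		<currentId>0</currentId>
		<tokenSequenceNumber>0</tokenSequenceNumber>
	</SecretManagerSection>
	<CacheManagerSection>
		<nextDirectiveId>1</nextDirectiveId>
	</CacheManagerSection>
</fsimage>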



	
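<?xml version="1.0"?>
<fsimage>
	<NameSection>
		<genstampV1>1000</genstampV1>
		<genstampV2>1000</genstampV2>
		<genstampV1Limit>0</genstampV1Limit>
		<lastAllocatedBlockId>1073741824</lastAllocatedBlockId>
		<txid>2</txid>
	</NameSection>
	<INodeSection>
		<lastInodeId>16385</lastInodeId>
		<inode>
			<id>16385</id>
			<type>DIRECTORY</type>
			<name></name>
			<mtime>0</mtime>
			<permission>atguigu:supergroup:rwxr-xr-x</permission>
			<nsquota>9223372036854775807</nsquota>
			<dsquota>-1</dsquota>
		</inode>
	</INodeSection>
	<INodeReferenceSection></INodeReferenceSection>
	<SnapshotSection>
		<snapshotCounter>0</snapshotCounter>
	</SnapshotSection>
	<INodeDirectorySection></INodeDirectorySection>
	<FileUnderConstructionSection></FileUnderConstructionSection>
	<SnapshotDiffSection>
		<diff>
			<inodeid>16385</inodeid>
		</diff>
	</SnapshotDiffSection>
	<SecretManagerSection>
		<currentId>0</currentId>
		<tokenSequenceNumber>0</tokenSequenceNumber>
	</SecretManagerSection>
	<CacheManagerSection>
		<nextDirectiveId>1</nextDirectiveId>
	</CacheManagerSection>
</fsimage>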

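A directory was created and a file was uploaded, yet neither appears in these image files: the changes have been applied only in memory and have not yet been merged into an Fsimage.

11. Inspect the edit logs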

[atguigu@hadoop102 current]$ ll
总用量 1052
-rw-rw-r-- 1 atguigu atguigu 42 2月 3 05:52 edits_0000000000000000001-0000000000000000002
-rw-rw-r-- 1 atguigu atguigu 1048576 2月 3 05:58 edits_inprogress_0000000000000000003
-rw-rw-r-- 1 atguigu atguigu 354 2月 3 05:43 fsimage_0000000000000000000
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 05:43 fsimage_0000000000000000000.md5
-rw-rw-r-- 1 atguigu atguigu 354 2月 3 05:52 fsimage_0000000000000000002
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 05:52 fsimage_0000000000000000002.md5
-rw-rw-r-- 1 atguigu atguigu 2 2月 3 05:52 seen_txid
-rw-rw-r-- 1 atguigu atguigu 205 2月 3 05:43 VERSION
[atguigu@hadoop102 current]$ hdfs oev -p XML -i edits_0000000000000000001-0000000000000000002 -o ed.xml
[atguigu@hadoop102 current]$ cat ed.xml

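<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
  <EDITS_VERSION>-63</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
    <DATA>
      <TXID>1</TXID>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_END_LOG_SEGMENT</OPCODE>
    <DATA>
      <TXID>2</TXID>
    </DATA>
  </RECORD>
</EDITS>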

[atguigu@hadoop102 current]$ hdfs oev -p XML -i edits_inprogress_0000000000000000003 -o ed2.xml
[atguigu@hadoop102 current]$ cat ed2.xml

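<?xml version="1.0" encoding="UTF-8"?>
<EDITS>
  <EDITS_VERSION>-63</EDITS_VERSION>
  <RECORD>
    <OPCODE>OP_START_LOG_SEGMENT</OPCODE>
    <DATA>
      <TXID>3</TXID>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_MKDIR</OPCODE>
    <DATA>
      <TXID>4</TXID>
      <LENGTH>0</LENGTH>
      <INODEID>16386</INODEID>
      <PATH>/user</PATH>
      <TIMESTAMP>1549144684817</TIMESTAMP>
      <PERMISSION_STATUS>
        <USERNAME>atguigu</USERNAME>
        <GROUPNAME>supergroup</GROUPNAME>
        <MODE>493</MODE>
      </PERMISSION_STATUS>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_MKDIR</OPCODE>
    <DATA>
      <TXID>5</TXID>
      <LENGTH>0</LENGTH>
      <INODEID>16387</INODEID>
      <PATH>/user/atguigu</PATH>
      <TIMESTAMP>1549144684827</TIMESTAMP>
      <PERMISSION_STATUS>
        <USERNAME>atguigu</USERNAME>
        <GROUPNAME>supergroup</GROUPNAME>
        <MODE>493</MODE>
      </PERMISSION_STATUS>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_MKDIR</OPCODE>
    <DATA>
      <TXID>6</TXID>
      <LENGTH>0</LENGTH>
      <INODEID>16388</INODEID>
      <PATH>/user/atguigu/input</PATH>
      <TIMESTAMP>1549144684827</TIMESTAMP>
      <PERMISSION_STATUS>
        <USERNAME>atguigu</USERNAME>
        <GROUPNAME>supergroup</GROUPNAME>
        <MODE>493</MODE>
      </PERMISSION_STATUS>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_ADD</OPCODE>
    <DATA>
      <TXID>7</TXID>
      <LENGTH>0</LENGTH>
      <INODEID>16389</INODEID>
      <PATH>/user/atguigu/input/xiaopan.txt._COPYING_</PATH>
      <REPLICATION>3</REPLICATION>
      <MTIME>1549144736207</MTIME>
      <ATIME>1549144736207</ATIME>
      <BLOCKSIZE>134217728</BLOCKSIZE>
      <CLIENT_NAME>DFSClient_NONMAPREDUCE_872970592_1</CLIENT_NAME>
      <CLIENT_MACHINE>192.168.19.102</CLIENT_MACHINE>
      <OVERWRITE>true</OVERWRITE>
      <PERMISSION_STATUS>
        <USERNAME>atguigu</USERNAME>
        <GROUPNAME>supergroup</GROUPNAME>
        <MODE>420</MODE>
      </PERMISSION_STATUS>
      <RPC_CLIENTID>ce5a449a-ae85-43c4-aeaf-980dbec2d6a1</RPC_CLIENTID>
      <RPC_CALLID>3</RPC_CALLID>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_ALLOCATE_BLOCK_ID</OPCODE>
    <DATA>
      <TXID>8</TXID>
      <BLOCK_ID>1073741825</BLOCK_ID>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_SET_GENSTAMP_V2</OPCODE>
    <DATA>
      <TXID>9</TXID>
      <GENSTAMPV2>1001</GENSTAMPV2>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_ADD_BLOCK</OPCODE>
    <DATA>
      <TXID>10</TXID>
      <PATH>/user/atguigu/input/xiaopan.txt._COPYING_</PATH>
      <BLOCK>
        <BLOCK_ID>1073741825</BLOCK_ID>
        <NUM_BYTES>0</NUM_BYTES>
        <GENSTAMP>1001</GENSTAMP>
      </BLOCK>
      <RPC_CLIENTID></RPC_CLIENTID>
      <RPC_CALLID>-2</RPC_CALLID>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_CLOSE</OPCODE>
    <DATA>
      <TXID>11</TXID>
      <LENGTH>0</LENGTH>
      <INODEID>0</INODEID>
      <PATH>/user/atguigu/input/xiaopan.txt._COPYING_</PATH>
      <REPLICATION>3</REPLICATION>
      <MTIME>1549144738833</MTIME>
      <ATIME>1549144736207</ATIME>
      <BLOCKSIZE>134217728</BLOCKSIZE>
      <CLIENT_NAME></CLIENT_NAME>
      <CLIENT_MACHINE></CLIENT_MACHINE>
      <OVERWRITE>false</OVERWRITE>
      <BLOCK>
        <BLOCK_ID>1073741825</BLOCK_ID>
        <NUM_BYTES>15</NUM_BYTES>
        <GENSTAMP>1001</GENSTAMP>
      </BLOCK>
      <PERMISSION_STATUS>
        <USERNAME>atguigu</USERNAME>
        <GROUPNAME>supergroup</GROUPNAME>
        <MODE>420</MODE>
      </PERMISSION_STATUS>
    </DATA>
  </RECORD>
  <RECORD>
    <OPCODE>OP_RENAME_OLD</OPCODE>
    <DATA>
      <TXID>12</TXID>
      <LENGTH>0</LENGTH>
      <SRC>/user/atguigu/input/xiaopan.txt._COPYING_</SRC>
      <DST>/user/atguigu/input/xiaopan.txt</DST>
      <TIMESTAMP>1549144738842</TIMESTAMP>
      <RPC_CLIENTID>ce5a449a-ae85-43c4-aeaf-980dbec2d6a1</RPC_CLIENTID>
      <RPC_CALLID>9</RPC_CALLID>
    </DATA>
  </RECORD>
</EDITS>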

[atguigu@hadoop102 current]$




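5.3 CheckPoint Interval Settings

(1) By default, the SecondaryNameNode performs a checkpoint once every hour.

[hdfs-default.xml]

<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>
</property>

(2) The number of operations is checked once a minute (the default); when it reaches one million (the default), the SecondaryNameNode performs a checkpoint.

<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value>
  <description>Number of operations</description>
</property>

<property>
  <name>dfs.namenode.checkpoint.check.period</name>
  <value>60</value>
  <description>Check the number of operations once a minute</description>
</property>

Note: the only way to know whether one million operations have been reached is to poll the count; dfs.namenode.checkpoint.check.period sets how often that poll happens.

To customize these values, set them in hdfs-site.xml.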
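The two triggers combine as a simple "whichever comes first" rule. A minimal sketch of that decision, using the default values quoted above; `should_checkpoint` is an illustrative name, not a Hadoop API:

```python
def should_checkpoint(seconds_since_last, txns_since_last,
                      period_s=3600, txns_limit=1_000_000):
    """True when either the time limit or the transaction limit is reached.

    period_s   mirrors dfs.namenode.checkpoint.period
    txns_limit mirrors dfs.namenode.checkpoint.txns
    The function is assumed to be evaluated once per
    dfs.namenode.checkpoint.check.period (60 s by default).
    """
    return seconds_since_last >= period_s or txns_since_last >= txns_limit

print(should_checkpoint(600, 1_200_000))  # busy cluster: txn limit hit early
print(should_checkpoint(3600, 10))        # quiet cluster: hourly checkpoint
print(should_checkpoint(600, 10))         # neither limit reached yet
```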

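5.4 NameNode Failure Recovery

After a NameNode failure, the data can be recovered with either of the following two methods.

Method 1: copy the data from the SecondaryNameNode into the directory where the NameNode stores its data.

The NameNode and SecondaryNameNode directories are almost identical; the NameNode simply has an extra edits_inprogress_* file.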

[atguigu@hadoop102 hadoop-2.7.2]$ cd data/tmp/dfs/name/current/
[atguigu@hadoop102 current]$ ll
总用量 1060
-rw-rw-r-- 1 atguigu atguigu 42 2月 3 07:40 edits_0000000000000000001-0000000000000000002
-rw-rw-r-- 1 atguigu atguigu 881 2月 3 07:40 edits_0000000000000000003-0000000000000000013
-rw-rw-r-- 1 atguigu atguigu 42 2月 3 07:53 edits_0000000000000000014-0000000000000000015
-rw-rw-r-- 1 atguigu atguigu 1048576 2月 3 07:53 edits_inprogress_0000000000000000016
-rw-rw-r-- 1 atguigu atguigu 638 2月 3 07:40 fsimage_0000000000000000013
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 07:40 fsimage_0000000000000000013.md5
-rw-rw-r-- 1 atguigu atguigu 638 2月 3 07:53 fsimage_0000000000000000015
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 07:53 fsimage_0000000000000000015.md5
-rw-rw-r-- 1 atguigu atguigu 3 2月 3 07:53 seen_txid
-rw-rw-r-- 1 atguigu atguigu 205 2月 3 07:40 VERSION
-------------------------------------divider--------------------------------------
[atguigu@hadoop104 current]$ cd /opt/module/hadoop-2.7.2/
[atguigu@hadoop104 hadoop-2.7.2]$ cd data/tmp/dfs/namesecondary/current/
[atguigu@hadoop104 current]$ ll
总用量 32
-rw-rw-r-- 1 atguigu atguigu 42 2月 3 05:52 edits_0000000000000000001-0000000000000000002
-rw-rw-r-- 1 atguigu atguigu 881 2月 3 06:52 edits_0000000000000000003-0000000000000000013
-rw-rw-r-- 1 atguigu atguigu 42 2月 3 07:53 edits_0000000000000000014-0000000000000000015
-rw-rw-r-- 1 atguigu atguigu 638 2月 3 06:52 fsimage_0000000000000000013
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 06:52 fsimage_0000000000000000013.md5
-rw-rw-r-- 1 atguigu atguigu 638 2月 3 07:53 fsimage_0000000000000000015
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 07:53 fsimage_0000000000000000015.md5
-rw-rw-r-- 1 atguigu atguigu 205 2月 3 07:53 VERSION

1. kill -9 the NameNode process ID

[atguigu@hadoop102 current]$ jps
6032 DataNode
7106 Jps
5894 NameNode
6958 NodeManager
[atguigu@hadoop102 current]$ kill -9 5894
[atguigu@hadoop102 current]$ jps
6032 DataNode
5894 -- process information unavailable
7127 Jps
6958 NodeManager

2. Delete the data stored by the NameNode (/opt/module/hadoop-2.7.2/data/tmp/dfs/name)

[atguigu@hadoop102 hadoop-2.7.2]$ rm -rf /opt/module/hadoop-2.7.2/data/tmp/dfs/name/*

3. Copy the SecondaryNameNode data into the original NameNode data directory

[atguigu@hadoop104 hadoop-2.7.2]$ cd data/tmp/dfs/namesecondary/current/
[atguigu@hadoop104 current]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/current
[atguigu@hadoop104 current]$ ll
总用量 28
-rw-rw-r-- 1 atguigu atguigu 42 2月 3 05:52 edits_0000000000000000001-0000000000000000002
-rw-rw-r-- 1 atguigu atguigu 881 2月 3 06:52 edits_0000000000000000003-0000000000000000013
-rw-rw-r-- 1 atguigu atguigu 354 2月 3 05:52 fsimage_0000000000000000002
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 05:52 fsimage_0000000000000000002.md5
-rw-rw-r-- 1 atguigu atguigu 638 2月 3 06:52 fsimage_0000000000000000013
-rw-rw-r-- 1 atguigu atguigu 62 2月 3 06:52 fsimage_0000000000000000013.md5
-rw-rw-r-- 1 atguigu atguigu 205 2月 3 06:52 VERSION

[atguigu@hadoop102 dfs]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs

[atguigu@hadoop102 dfs]$ ll
总用量 8
drwx------ 3 atguigu atguigu 4096 2月 3 05:51 data
drwxrwxr-x 2 atguigu atguigu 4096 2月 3 07:37 name
[atguigu@hadoop102 dfs]$ scp -r atguigu@hadoop104:/opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/*  ./name/
scp: /opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/* : No such file or directory
[atguigu@hadoop102 dfs]$ scp -r atguigu@hadoop104:/opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary/* ./name/
edits_0000000000000000001-0000000000000000002 100% 42 0.0KB/s 00:00
fsimage_0000000000000000002.md5 100% 62 0.1KB/s 00:00
fsimage_0000000000000000013 100% 638 0.6KB/s 00:00
fsimage_0000000000000000002 100% 354 0.4KB/s 00:00
fsimage_0000000000000000013.md5 100% 62 0.1KB/s 00:00
edits_0000000000000000003-0000000000000000013 100% 881 0.9KB/s 00:00
VERSION 100% 205 0.2KB/s 00:00
in_use.lock 100% 14 0.0KB/s 00:00
[atguigu@hadoop102 dfs]$ ll
总用量 8
drwx------ 3 atguigu atguigu 4096 2月 3 05:51 data
drwxrwxr-x 3 atguigu atguigu 4096 2月 3 07:40 name
[atguigu@hadoop102 dfs]$ tree name/
name/
├── current
│   ├── edits_0000000000000000001-0000000000000000002
│   ├── edits_0000000000000000003-0000000000000000013
│   ├── fsimage_0000000000000000002
│   ├── fsimage_0000000000000000002.md5
│   ├── fsimage_0000000000000000013
│   ├── fsimage_0000000000000000013.md5
│   └── VERSION
└── in_use.lock

1 directory, 8 files

4. Restart the NameNode

[atguigu@hadoop102 dfs]$ cd /opt/module/hadoop-2.7.2/
[atguigu@hadoop102 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/module/hadoop-2.7.2/logs/hadoop-atguigu-namenode-hadoop102.out
[atguigu@hadoop102 hadoop-2.7.2]$ jps
6032 DataNode
7299 NameNode
7349 Jps
6958 NodeManager

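The web UI that was unreachable earlier is accessible again.

Method 2: start the NameNode daemon with the -importCheckpoint option, which copies the data from the SecondaryNameNode into the NameNode directory.

1. Add (or modify) the following in hdfs-site.xml (this shortens the checkpoint period)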

[atguigu@hadoop102 hadoop]$ pwd
/opt/module/hadoop-2.7.2/etc/hadoop
[atguigu@hadoop102 hadoop]$ vi hdfs-site.xml


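<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>120</value>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/opt/module/hadoop-2.7.2/data/tmp/dfs/name</value>
</property>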

Distribute the configuration:

[atguigu@hadoop102 hadoop-2.7.2]$ pwd
/opt/module/hadoop-2.7.2
[atguigu@hadoop102 hadoop-2.7.2]$ xsync etc/hadoop/

2. kill -9 the NameNode process

[atguigu@hadoop102 hadoop-2.7.2]$ jps
6032 DataNode
7299 NameNode
7445 Jps
6958 NodeManager
[atguigu@hadoop102 hadoop-2.7.2]$ kill -9 7299
[atguigu@hadoop102 hadoop-2.7.2]$ jps
6032 DataNode
7465 Jps
6958 NodeManager

3. Delete the data stored by the NameNode (/opt/module/hadoop-2.7.2/data/tmp/dfs/name)

[atguigu@hadoop102 hadoop-2.7.2]$ rm -rf /opt/module/hadoop-2.7.2/data/tmp/dfs/name/*
[atguigu@hadoop102 hadoop-2.7.2]$ ls data/tmp/dfs/name/
[atguigu@hadoop102 hadoop-2.7.2]$

4. If the SecondaryNameNode is not on the same host as the NameNode, copy the SecondaryNameNode's data directory to the same level as the NameNode's data directory, and delete the in_use.lock file

[atguigu@hadoop102 dfs]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs
[atguigu@hadoop102 dfs]$ ll
总用量 8
drwx------ 3 atguigu atguigu 4096 2月 3 05:51 data
drwxrwxr-x 2 atguigu atguigu 4096 2月 3 08:09 name

[atguigu@hadoop102 dfs]$ scp -r atguigu@hadoop104:/opt/module/hadoop-2.7.2/data/tmp/dfs/namesecondary ./

....................................(output omitted)....................................

[atguigu@hadoop102 dfs]$ ll
总用量 12
drwx------ 3 atguigu atguigu 4096 2月 3 05:51 data
drwxrwxr-x 2 atguigu atguigu 4096 2月 3 08:09 name
drwxrwxr-x 3 atguigu atguigu 4096 2月 3 08:18 namesecondary

[atguigu@hadoop102 dfs]$ cd namesecondary/
[atguigu@hadoop102 namesecondary]$ ll
总用量 8
drwxrwxr-x 2 atguigu atguigu 4096 2月 3 08:18 current
-rw-rw-r-- 1 atguigu atguigu 14 2月 3 08:18 in_use.lock
[atguigu@hadoop102 namesecondary]$ rm -rf in_use.lock
[atguigu@hadoop102 namesecondary]$ cd ..
[atguigu@hadoop102 dfs]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs
[atguigu@hadoop102 dfs]$ ls
data name namesecondary

5. Import the checkpoint data (wait a moment, then press Ctrl+C to stop it)

[atguigu@hadoop102 namesecondary]$ jps
6032 DataNode
7546 Jps
6958 NodeManager

[atguigu@hadoop102 namesecondary]$ cd /opt/module/hadoop-2.7.2/
[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs namenode -importCheckpoint

6. Start the NameNode

[atguigu@hadoop102 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start namenode

[atguigu@hadoop102 hadoop-2.7.2]$ jps
6032 DataNode
7681 Jps
6958 NodeManager

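5.5 Cluster Safe Mode

1. Overview

2. Basic syntax

While the cluster is in safe mode, important operations (writes) cannot be performed. Once cluster startup completes, safe mode is exited automatically.

(1) bin/hdfs dfsadmin -safemode get        (check the safe mode status)

(2) bin/hdfs dfsadmin -safemode enter      (enter safe mode)

(3) bin/hdfs dfsadmin -safemode leave      (leave safe mode)

(4) bin/hdfs dfsadmin -safemode wait       (block until safe mode is left)

3. Example

Simulate waiting for safe mode

(1) Check the current mode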

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfsadmin -safemode get
Safe mode is OFF

(2) First, enter safe mode

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs dfsadmin -safemode enter

(3) Create and run the following script

In /opt/module/hadoop-2.7.2, create a script named safemode.sh
[atguigu@hadoop102 hadoop-2.7.2]$ touch safemode.sh
[atguigu@hadoop102 hadoop-2.7.2]$ vim safemode.sh

#!/bin/bash
hdfs dfsadmin -safemode wait
# The next statement runs as soon as safe mode is left; while safe mode is on, the script stays blocked here
hdfs dfs -put /opt/module/hadoop-2.7.2/README.txt /

[atguigu@hadoop102 hadoop-2.7.2]$ chmod 777 safemode.sh
[atguigu@hadoop102 hadoop-2.7.2]$ ./safemode.sh

(4) Open another window and run

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs dfsadmin -safemode leave

(5) Observe

(a) Look at the first window again

Safe mode is OFF

(b) The uploaded data is now on the HDFS cluster.

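5.6 NameNode Multi-Directory Configuration

1. The NameNode's local storage can be configured as multiple directories, each holding identical content, which improves reliability

2. The configuration is as follows

(1) Add the following to hdfs-site.xml

<property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///${hadoop.tmp.dir}/dfs/name1,file:///${hadoop.tmp.dir}/dfs/name2</value>
</property>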

Distribute with the sync script:

[atguigu@hadoop102 hadoop-2.7.2]$ xsync etc/hadoop

(2) Stop the cluster and delete all data in data and logs.

[atguigu@hadoop102 hadoop-2.7.2]$ sbin/stop-dfs.sh
[atguigu@hadoop103 hadoop-2.7.2]$ sbin/stop-yarn.sh

[atguigu@hadoop102 hadoop-2.7.2]$ rm -rf data/ logs/
[atguigu@hadoop103 hadoop-2.7.2]$ rm -rf data/ logs/
[atguigu@hadoop104 hadoop-2.7.2]$ rm -rf data/ logs/

(3) Format the cluster and start it.

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs namenode -format
[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh

(4) Check the result

[atguigu@hadoop102 dfs]$ ll
总用量 12
drwx------. 3 atguigu atguigu 4096 12月 11 08:03 data
drwxrwxr-x. 3 atguigu atguigu 4096 12月 11 08:03 name1
drwxrwxr-x. 3 atguigu atguigu 4096 12月 11 08:03 name2

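Note: this only guarantees that two copies of the metadata exist; it does not mean that when one NameNode goes down another can be started immediately.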

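Chapter 6 DataNode (Key Topic for Interviews and Development)

6.1 How the DataNode Works

The DataNode working mechanism is shown in Figure 3-15.

1) A block is stored on the DataNode's disk as two files: one holds the data itself, the other holds metadata, including the block length, the checksum of the block data, and a timestamp.

2) After starting, a DataNode registers with the NameNode; once accepted, it reports all of its block information to the NameNode periodically (every 1 hour).

3) A heartbeat is sent every 3 seconds. The heartbeat response carries commands from the NameNode to that DataNode, such as copying a block to another machine or deleting a block. If no heartbeat is received from a DataNode for more than 10 minutes (10 minutes 30 seconds with the default settings), the node is considered unavailable.

4) Machines can safely join and leave the cluster while it is running.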

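6.2 Data Integrity

Question: suppose a disk stores the signal that controls a high-speed rail signal light, red (1) or green (0), but the disk is damaged and always reads green. Is that not dangerous? Likewise, if data on a DataNode is corrupted and nobody notices, that is dangerous too. How can this be solved?

The DataNode ensures data integrity as follows.

1) When a DataNode reads a Block, it computes the CheckSum.

2) If the computed CheckSum differs from the value recorded when the Block was created, the Block is corrupted.

3) The Client then reads the Block from another DataNode.

4) The DataNode also verifies the CheckSum periodically after the file is created, as shown in Figure 3-16.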
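The verification steps above can be sketched in a few lines. This is an illustration under stated assumptions, not HDFS code: it uses plain CRC-32 from Python's zlib as a stand-in for the checksum algorithm, assumes one checksum per 512-byte chunk, and the function names are made up.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # assumed chunk size for this sketch

def chunk_checksums(data, chunk=BYTES_PER_CHECKSUM):
    """Compute one CRC-32 per chunk, as would be done when the block is written."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]

def first_corrupt_chunk(data, stored, chunk=BYTES_PER_CHECKSUM):
    """Recompute checksums on read; return the index of the first mismatch."""
    for index, (actual, expected) in enumerate(zip(chunk_checksums(data, chunk), stored)):
        if actual != expected:
            return index
    return None

block = bytes(range(256)) * 5          # 1280 bytes, so 3 chunks
stored = chunk_checksums(block)        # kept alongside the block (the .meta file)

damaged = bytearray(block)
damaged[600] ^= 0xFF                   # flip one byte inside chunk 1
print(first_corrupt_chunk(block, stored))
print(first_corrupt_chunk(bytes(damaged), stored))
```

When a mismatch is found, the reader would give up on this replica and fetch the block from another DataNode, as step 3 describes.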

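6.3 DataNode Timeout Parameters

Note that in hdfs-site.xml, dfs.namenode.heartbeat.recheck-interval is given in milliseconds, while dfs.heartbeat.interval is given in seconds.

<property>
  <name>dfs.namenode.heartbeat.recheck-interval</name>
  <value>300000</value>
  <description>
    This time decides the interval to check for expired datanodes.
    With this value and dfs.heartbeat.interval, the interval of
    deciding the datanode is stale or not is also calculated.
    The unit of this configuration is millisecond.
  </description>
</property>

<property>
  <name>dfs.heartbeat.interval</name>
  <value>3</value>
  <description>Determines datanode heartbeat interval in seconds.</description>
</property>

<property>
    <name>dfs.namenode.heartbeat.recheck-interval</name>
    <value>300000</value>
</property>
<property>
    <name>dfs.heartbeat.interval</name>
    <value>3</value>
</property>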
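HDFS derives the time after which a silent DataNode is declared dead from these two values: timeout = 2 × dfs.namenode.heartbeat.recheck-interval + 10 × dfs.heartbeat.interval. A small sketch of the arithmetic, taking care of the unit mismatch (the function name is illustrative only):

```python
def datanode_timeout_seconds(recheck_interval_ms=300000, heartbeat_interval_s=3):
    """Timeout = 2 * recheck-interval + 10 * heartbeat interval, in seconds."""
    # recheck-interval is configured in milliseconds, heartbeat in seconds.
    return 2 * recheck_interval_ms / 1000 + 10 * heartbeat_interval_s

timeout = datanode_timeout_seconds()
print(timeout, "seconds =", timeout / 60, "minutes")
```

With the defaults this gives 630 seconds, that is 10 minutes 30 seconds, which is where the "about 10 minutes" figure for a lost DataNode comes from.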

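6.4 Commissioning a New DataNode

0. Requirement

As the company's business grows, the data volume keeps increasing and the capacity of the existing DataNodes can no longer meet the storage demand, so new DataNodes need to be added dynamically to the existing cluster.

1. Environment preparation

(1) Clone a hadoop105 host from the hadoop104 host

(2) Change the IP address and hostname

(3) Delete the files left over from the original HDFS file system (/opt/module/hadoop-2.7.2/data and logs)

(4) Source the profile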

[atguigu@hadoop105 hadoop-2.7.2]$ source /etc/profile

2. Steps to commission the new node

(1) Start the DataNode directly; it joins the cluster automatically

[atguigu@hadoop105 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start datanode

[atguigu@hadoop105 hadoop-2.7.2]$ sbin/yarn-daemon.sh start nodemanager

http://hadoop102:50070/dfshealth.html#tab-overview

(2) Upload a file from hadoop105

[atguigu@hadoop105 hadoop-2.7.2]$ hadoop fs -put /opt/module/hadoop-2.7.2/LICENSE.txt /

(3) If the data is unbalanced, rebalance the cluster with the following command

[atguigu@hadoop102 sbin]$ pwd
/opt/module/hadoop-2.7.2/sbin

[atguigu@hadoop102 sbin]$ ./start-balancer.sh
starting balancer, logging to /opt/module/hadoop-2.7.2/logs/hadoop-atguigu-balancer-hadoop102.out
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved

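6.5 Decommissioning Old DataNodes

6.5.1 Adding a Whitelist

Hosts on the whitelist are allowed to access the NameNode; hosts not on the whitelist are removed from the cluster.

The steps to configure the whitelist are as follows:

(1) On the NameNode, create a dfs.hosts file in /opt/module/hadoop-2.7.2/etc/hadoop

[atguigu@hadoop102 hadoop]$ pwd
/opt/module/hadoop-2.7.2/etc/hadoop
[atguigu@hadoop102 hadoop]$ touch dfs.hosts
[atguigu@hadoop102 hadoop]$ vi dfs.hosts

Add the following hostnames (do not add hadoop105). Blank lines and spaces are not allowed.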

hadoop102
hadoop103
hadoop104

(2) Add the dfs.hosts property to the NameNode's hdfs-site.xml

<property>
<name>dfs.hosts</name>
<value>/opt/module/hadoop-2.7.2/etc/hadoop/dfs.hosts</value>
</property>

(3) Distribute the configuration file

[atguigu@hadoop102 hadoop]$ xsync hdfs-site.xml

(4) Refresh the NameNode

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfsadmin -refreshNodes
Refresh nodes successful

(5) Refresh the ResourceManager

[atguigu@hadoop102 hadoop-2.7.2]$ yarn rmadmin -refreshNodes
17/06/24 14:17:11 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.1.103:8033

(6) Check in a web browser

(7) If the data is unbalanced, rebalance the cluster with the following command

[atguigu@hadoop102 sbin]$ ./start-balancer.sh
starting balancer, logging to /opt/module/hadoop-2.7.2/logs/hadoop-atguigu-balancer-hadoop102.out
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved

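After hadoop105 was removed, hadoop103 did not immediately take over the files that had been stored on hadoop105; only after the cluster is rebalanced does hadoop103 store them.

Note: a whitelist is suited to the initial cluster setup, to control which hosts are allowed access.

6.5.2 Decommissioning with a Blacklist

Hosts on the blacklist are forced out of the cluster.

First delete the whitelist configuration above so that hadoop105 rejoins the cluster

Refresh the NameNode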

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfsadmin -refreshNodes
Refresh nodes successful

Start hadoop105

[atguigu@hadoop105 hadoop-2.7.2]$ sbin/hadoop-daemon.sh start datanode

[atguigu@hadoop105 hadoop-2.7.2]$ sbin/yarn-daemon.sh start nodemanager

1. On the NameNode, create a dfs.hosts.exclude file in /opt/module/hadoop-2.7.2/etc/hadoop

[atguigu@hadoop102 hadoop]$ pwd
/opt/module/hadoop-2.7.2/etc/hadoop
[atguigu@hadoop102 hadoop]$ touch dfs.hosts.exclude
[atguigu@hadoop102 hadoop]$ vi dfs.hosts.exclude

Add the following hostname (the node to decommission)

hadoop105

2. Add the dfs.hosts.exclude property to the NameNode's hdfs-site.xml

<property>
<name>dfs.hosts.exclude</name>
      <value>/opt/module/hadoop-2.7.2/etc/hadoop/dfs.hosts.exclude</value>
</property>

Distribute:

[atguigu@hadoop102 hadoop]$ xsync hdfs-site.xml

3. Refresh the NameNode and the ResourceManager

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfsadmin -refreshNodes
Refresh nodes successful

[atguigu@hadoop102 hadoop-2.7.2]$ yarn rmadmin -refreshNodes
17/06/24 14:55:56 INFO client.RMProxy: Connecting to ResourceManager at hadoop103/192.168.1.103:8033

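4. Check the web browser: the node's state is decommission in progress, which means the DataNode is copying its blocks to other nodes, as shown in Figure 3-17

Figure 3-17 Decommission in progress

5. Wait until the node's state becomes decommissioned (all blocks have been copied), then stop the DataNode and its NodeManager. Note: if the replication factor is 3 and the number of nodes in service is less than or equal to 3, decommissioning cannot succeed; the replication factor must be lowered first. See Figure 3-18

Figure 3-18 Decommissioned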

[atguigu@hadoop105 hadoop-2.7.2]$ sbin/hadoop-daemon.sh stop datanode
stopping datanode

[atguigu@hadoop105 hadoop-2.7.2]$ sbin/yarn-daemon.sh stop nodemanager
stopping nodemanager

6. If the data is unbalanced, rebalance the cluster with the following command

[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-balancer.sh
starting balancer, logging to /opt/module/hadoop-2.7.2/logs/hadoop-atguigu-balancer-hadoop102.out
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved

Note: the same hostname must not appear in both the whitelist and the blacklist.
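Both cautions in this section (no host in both lists, and enough nodes left to hold every replica) can be checked before running -refreshNodes. The following is only a pre-flight sketch with a made-up helper name; it does not talk to the cluster and takes the host lists as plain Python lists:

```python
def check_decommission(live_nodes, exclude_hosts, include_hosts=(), replication=3):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    both = sorted(set(include_hosts) & set(exclude_hosts))
    if both:
        problems.append("listed in both files: " + ", ".join(both))
    remaining = set(live_nodes) - set(exclude_hosts)
    if len(remaining) < replication:
        problems.append(
            f"only {len(remaining)} node(s) would remain for replication={replication}"
        )
    return problems

nodes = ["hadoop102", "hadoop103", "hadoop104", "hadoop105"]
print(check_decommission(nodes, ["hadoop105"]))           # safe to proceed
print(check_decommission(nodes[:3], ["hadoop104"]))       # too few nodes left
print(check_decommission(nodes, ["hadoop105"], include_hosts=["hadoop105"]))
```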

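6.6 DataNode Multi-Directory Configuration

1. A DataNode can also be configured with multiple directories; each directory stores different data. That is, the directories are not replicas of each other

2. The configuration is as follows

hdfs-site.xml

<property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///${hadoop.tmp.dir}/dfs/data1,file:///${hadoop.tmp.dir}/dfs/data2</value>
</property>

Distribute:

[atguigu@hadoop102 hadoop]$ xsync hdfs-site.xml

Stop the cluster: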

[atguigu@hadoop102 hadoop-2.7.2]$ sbin/stop-dfs.sh
[atguigu@hadoop103 hadoop-2.7.2]$ sbin/stop-dfs.sh

[atguigu@hadoop102 hadoop-2.7.2]$ jps
4496 Jps
[atguigu@hadoop103 hadoop-2.7.2]$ jps
6735 Jps
[atguigu@hadoop104 ~]$ jps
3389 Jps

Delete data and logs:

[atguigu@hadoop102 hadoop-2.7.2]$ rm -rf data/ logs/
[atguigu@hadoop103 hadoop-2.7.2]$ rm -rf data/ logs/
[atguigu@hadoop104 hadoop-2.7.2]$ rm -rf data/ logs/

Format the NameNode

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hdfs namenode -format

Start the daemons:

[atguigu@hadoop102 hadoop-2.7.2]$ sbin/start-dfs.sh

[atguigu@hadoop103 hadoop-2.7.2]$ sbin/start-yarn.sh

https://blog.csdn.net/qq_40794973/article/details/86681941#t7

[atguigu@hadoop102 hadoop-2.7.2]$ cd /opt/module/hadoop-2.7.2/data/tmp/dfs/
[atguigu@hadoop102 dfs]$ ll
总用量 12
drwx------ 3 atguigu atguigu 4096 2月 21 19:07 data1
drwx------ 3 atguigu atguigu 4096 2月 21 19:07 data2
drwxrwxr-x 3 atguigu atguigu 4096 2月 21 19:07 name

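Test: upload a file

[atguigu@hadoop102 dfs]$ hadoop fs -put /opt/module/hadoop-2.7.2/xiaopan.txt /

Each directory stores different content; the data is merely spread over several paths and is not fully backed up in each one. Do not confuse this with the NameNode multi-directory configuration.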

[atguigu@hadoop102 dfs]$ pwd
/opt/module/hadoop-2.7.2/data/tmp/dfs
[atguigu@hadoop102 dfs]$ tree
.
├── data1
│   ├── current
│   │   ├── BP-1528532229-192.168.19.102-1550747197324
│   │   │   ├── current
│   │   │   │   ├── finalized
│   │   │   │   │   └── subdir0
│   │   │   │   │       └── subdir0
│   │   │   │   │           ├── blk_1073741825
│   │   │   │   │           └── blk_1073741825_1001.meta
│   │   │   │   ├── rbw
│   │   │   │   └── VERSION
│   │   │   ├── scanner.cursor
│   │   │   └── tmp
│   │   └── VERSION
│   └── in_use.lock
├── data2
│   ├── current
│   │   ├── BP-1528532229-192.168.19.102-1550747197324
│   │   │   ├── current
│   │   │   │   ├── finalized
│   │   │   │   ├── rbw
│   │   │   │   └── VERSION
│   │   │   ├── scanner.cursor
│   │   │   └── tmp
│   │   └── VERSION
│   └── in_use.lock
└── name
    ├── current
    │   ├── edits_0000000000000000001-0000000000000000002
    │   ├── edits_inprogress_0000000000000000003
    │   ├── fsimage_0000000000000000000
    │   ├── fsimage_0000000000000000000.md5
    │   ├── fsimage_0000000000000000002
    │   ├── fsimage_0000000000000000002.md5
    │   ├── seen_txid
    │   └── VERSION
    └── in_use.lock

18 directories, 19 files
// Note that the second directory contains no block files at all.
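One way to see why data2 is still empty: with a single uploaded block there is only one placement decision to make. The sketch below assumes a plain round-robin choice between the configured directories and ignores free-space checks; it is a simplified model for illustration, not the DataNode's implementation.

```python
from itertools import cycle

def place_blocks(block_ids, volumes):
    """Assign each new block to the next directory in turn (assumed policy)."""
    chooser = cycle(volumes)
    return {block: next(chooser) for block in block_ids}

volumes = ["data1", "data2"]
print(place_blocks(["blk_1073741825"], volumes))
print(place_blocks(["blk_1073741825", "blk_1073741826"], volumes))
```

Under this model a second block would land in data2, so each block lives in exactly one directory and the two directories never hold copies of each other.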

Chapter 7 New Features in HDFS 2.X

7.1 Copying Data Between Clusters

1. Using scp to copy files between two remote hosts
   scp -r hello.txt root@hadoop103:/user/atguigu/hello.txt       // push
   scp -r root@hadoop103:/user/atguigu/hello.txt hello.txt       // pull
   scp -r root@hadoop103:/user/atguigu/hello.txt root@hadoop104:/user/atguigu // copies between two remote hosts, relayed through the local host; useful when ssh is not configured between the two remote hosts.

2. Using the distcp command for recursive data copying between two Hadoop clusters

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hadoop distcp hdfs://hadoop102:9000/user/atguigu/hello.txt hdfs://hadoop103:9000/user/atguigu/hello.txt

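7.2 Archiving Small Files

3. Hands-on example

(1) The YARN daemons must be running

[atguigu@hadoop102 hadoop-2.7.2]$ start-yarn.sh

(2) Archive the files

Archive all files in the /user/atguigu/input directory into an archive named input.har, and store the archive under /user/atguigu/output.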

[atguigu@hadoop102 hadoop-2.7.2]$ ls
bin etc input libexec logs output safemode.sh share xiaopan.txt
data include lib LICENSE.txt NOTICE.txt README.txt sbin wcinput
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -mkdir -p /user/atguigu/input
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -put xiaopan.txt README.txt LICENSE.txt /user/atguigu/input
[atguigu@hadoop102 hadoop-2.7.2]$ hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]

........................(output omitted)..........................
archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
........................(output omitted)..........................

[atguigu@hadoop102 hadoop-2.7.2]$ bin/hadoop archive -archiveName input.har -p /user/atguigu/input /user/atguigu/output

Note: the output directory must not exist beforehand.

(3) View the archive

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -lsr /user/atguigu/output/input.har

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -lsr har:///user/atguigu/output/input.har

Note: har:// is a URI scheme.

Note: the three small files that existed before archiving are still there.

(4) Extract the archive

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -cp har:///user/atguigu/output/input.har/* /user/atguigu

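7.3 Trash

With the trash feature enabled, deleted files can be restored as long as they have not expired, which protects against accidental deletion and serves as a backup.

1. Trash parameters and how it works

2. Enabling the trash

Modify core-site.xml to set the trash retention time to 1 minute.

<property>
   <name>fs.trash.interval</name>
   <value>1</value>
</property>

Distribute:

[atguigu@hadoop102 hadoop]$ xsync core-site.xml

Restart the cluster:

Delete a file: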

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -rm /user/atguigu/input/xiaopan.txt
19/02/21 23:50:16 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1 minutes, Emptier interval = 0 minutes.
Moved: 'hdfs://hadoop102:9000/user/atguigu/input/xiaopan.txt' to trash at: hdfs://hadoop102:9000/user/atguigu/.Trash/Current

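3. Viewing the trash

Path of the trash in the cluster: /user/atguigu/.Trash/….

4. Changing the user name used to access the trash

The user name used when browsing the trash defaults to dr.who; change it to atguigu

[core-site.xml]

<property>
  <name>hadoop.http.staticuser.user</name>
  <value>atguigu</value>
</property>

5. Files deleted programmatically do not go through the trash; moveToTrash() must be called for them to land there

Trash trash = new Trash(conf);

trash.moveToTrash(path);

6. Restoring data from the trash

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -mv /user/atguigu/.Trash/Current/user/atguigu/input /user/atguigu/input

7. Emptying the trash (unlike on Windows, this packs the current contents into a directory named after a timestamp; once the time is up it is still deleted)

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -expunge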
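The restore command above relies on how a deleted path is mirrored under .Trash/Current. A small sketch of that mapping, inferred from the "Moved: ... to trash" message and the restore command in this section (the helper names are hypothetical and nothing here contacts HDFS):

```python
import posixpath

def trash_location(user, deleted_path):
    """Where a deleted absolute path ends up inside the user's trash."""
    return posixpath.join("/user", user, ".Trash", "Current", deleted_path.lstrip("/"))

def restore_command(user, deleted_path):
    """Build the hadoop fs -mv command that moves the file back."""
    return f"hadoop fs -mv {trash_location(user, deleted_path)} {deleted_path}"

print(trash_location("atguigu", "/user/atguigu/input/xiaopan.txt"))
print(restore_command("atguigu", "/user/atguigu/input/xiaopan.txt"))
```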

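7.4 Snapshot Management

Official documentation: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsSnapshots.html

Note: before snapshots can be disabled on a directory, the snapshots inside that directory must be deleted.

2. Hands-on example

(1) Enable/disable snapshots on a directory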

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfsadmin -allowSnapshot /user/atguigu/input

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfsadmin -disallowSnapshot /user/atguigu/input

(2) Create a snapshot of a directory

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfs -createSnapshot /user/atguigu/input
Created snapshot /user/atguigu/input/.snapshot/s20190222-002249.68

View it through the web UI at hdfs://hadoop102:50070/user/atguigu/input/.snapshot/s….. // the snapshot and the source files share the same data

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfs -lsr /user/atguigu/input/.snapshot/

hdfs dfs -createSnapshot <path> [<snapshotName>]

path: the path of the snapshottable directory.

snapshotName: the snapshot name. If it is omitted, a default name is generated from a timestamp in the format "'s'yyyyMMdd-HHmmss.SSS", for example "s20130412-151029.033".
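The default name pattern 's'yyyyMMdd-HHmmss.SSS can be reproduced to predict or parse snapshot names in scripts. A minimal sketch; the function is illustrative and simply formats a given timestamp:

```python
from datetime import datetime

def default_snapshot_name(ts):
    """Format a timestamp as 's' + yyyyMMdd-HHmmss.SSS (milliseconds)."""
    millis = ts.microsecond // 1000
    return "s" + ts.strftime("%Y%m%d-%H%M%S") + f".{millis:03d}"

print(default_snapshot_name(datetime(2013, 4, 12, 15, 10, 29, 33000)))
```

For the timestamp in the documentation's example this yields s20130412-151029.033.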

(3) Create a snapshot with a given name

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfs -createSnapshot /user/atguigu/input miao170508

(4) Rename a snapshot

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfs -renameSnapshot /user/atguigu/input/ miao170508 atguigu170508

(5) List all snapshottable directories of the current user

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs lsSnapshottableDir
drwxr-xr-x 0 atguigu supergroup 0 2019-02-22 00:22 1 65536 /user/atguigu/input

(6) Compare the differences between two snapshots

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs snapshotDiff /user/atguigu/input . .snapshot/s20190222-002249.686
Difference between current directory and snapshot s20190222-002249.686 under directory /user/atguigu/input:

[atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -put /opt/module/hadoop-2.7.2/xiaopan.txt /user/atguigu/input
[atguigu@hadoop102 hadoop-2.7.2]$ hdfs snapshotDiff /user/atguigu/input . .snapshot/s20190222-002249.686
Difference between current directory and snapshot s20190222-002249.686 under directory /user/atguigu/input:
M .
- ./xiaopan.txt

Note: "- ./xiaopan.txt" means that, compared with the current directory, the snapshot is missing xiaopan.txt.

(7) Restore a snapshot

[atguigu@hadoop102 hadoop-2.7.2]$ hdfs dfs -cp /user/atguigu/input/.snapshot/s20190222-002249.686 /user

Further reading: https://blog.csdn.net/Amber_amber/article/details/47021841


path	The path of the snapshottable directory.
snapshotName	The snapshot name. If it is not specified, a default name is generated from the timestamp in the format "'s'yyyyMMdd-HHmmss.SSS", for example "s20130412-151029.033".