21. MapReduce Reading and Writing SequenceFile, MapFile, ORCFile, and ParquetFile

Hadoop Series Index

1. A brief introduction to hadoop 3.1.4, deployment, and basic verification
2. HDFS operations - shell client
3. Using HDFS with Java (read/write, upload, download, traversal, file search, copying a whole directory, copying files only, listing files in a folder, deleting files and directories, getting file and folder attributes, etc.)
4. The HDFS Java helper class HDFSUtil and its JUnit tests (common HDFS operations and HA configuration)
5. The RESTful style of the HDFS API - WebHDFS
6. HDFS HttpFS - proxy service
7. Common big-data file storage formats and the compression codecs supported by hadoop
8. HDFS memory storage policy support and hot/warm/cold storage
9. Hadoop HA cluster deployment and verification in three ways
10. The HDFS small-file solution - Archive
11. Reading, writing, and merging Sequence Files in a hadoop environment
12. HDFS Trash: introduction and examples
13. HDFS Snapshot
14. HDFS transparent encryption with KMS
15. Introduction to MapReduce and wordcount
16. Basic MapReduce usage examples - custom serialization, sorting, partitioning, grouping, and topN
17. Introduction to MapReduce partitions (Partition)
18. MapReduce counters and examples of reading from / writing to a database with MapReduce
19. Join operations: map side join and reduce side join
20. Introduction to MapReduce workflows
21. MapReduce reading and writing SequenceFile, MapFile, ORCFile, and ParquetFile
22. MapReduce writing and reading files with Gzip, Snappy, and Lzo compression
23. Memory and CPU allocation, scheduling, and tuning for mapreduce on yarn in a hadoop cluster


Table of Contents

  • Hadoop Series Index
  • I. MapReduce reading and writing SequenceFile
    • 1. Writing a SequenceFile
    • 2. Reading a SequenceFile
    • 3. Merging small files with SequenceFile
  • II. MapFile
    • 1. Writing a MapFile
    • 2. Reading a MapFile
      • 1) Implementation notes
      • 2) Implementation
  • III. ORCFile
    • 1. Writing an ORCFile
      • 1) pom.xml
      • 2) Implementation
    • 2. Reading an ORCFile
    • 3. Writing an ORCFile (reading from a database)
    • 4. Reading an ORCFile (writing to a database)
  • IV. ParquetFile
    • 1. pom.xml
    • 2. Writing a ParquetFile
    • 3. Reading a ParquetFile


This article shows how to read and write files with MapReduce, covering SequenceFile, MapFile, ORCFile, and ParquetFile.
Prerequisite: a working hadoop environment. For the base pom.xml, refer to the other articles in this column.
The article has four parts, one each for reading and writing SequenceFile, MapFile, ORCFile, and ParquetFile with MapReduce.
The next article covers the use of compression codecs.

For the background to this article, see the linked post on HDFS file storage formats and compression codecs.

I. MapReduce reading and writing SequenceFile

1. Writing a SequenceFile

This example writes out the content of the txt files it reads.
SequenceFileOutputFormat is used to save the result as a SequenceFile.
Code example:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WriteSeqFile extends Configured implements Tool {
	static String in = "D:/workspace/bigdata-component/hadoop/test/in/seq";
	static String out = "D:/workspace/bigdata-component/hadoop/test/out/seq";

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		int status = ToolRunner.run(conf, new WriteSeqFile(), args);
		System.exit(status);
	}

	/**
	 * Pay attention to the input file type when choosing the mapper's key-in type.
	 * For a map-only job, the mapper's output key must be a concrete Writable type
	 * such as Text rather than NullWritable; LongWritable also worked in testing.
	 * For a mapper + reducer job, any mapper output key type seems to work.
	 * 
	 * @author alanchan
	 *
	 */
	static class WriteSeqFileMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
		protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			context.write(value, NullWritable.get());
		}
	}

	static class WriteSeqFileReducer extends Reducer<Text, NullWritable, Text, NullWritable> {
		protected void reduce(Text key, Iterable<NullWritable> values, Context context)
				throws IOException, InterruptedException {
			context.write(key, NullWritable.get());
		}
	}

	@Override
	public int run(String[] args) throws Exception {
		Job job = Job.getInstance(getConf(), this.getClass().getName());
		job.setJarByClass(this.getClass());

		job.setMapperClass(WriteSeqFileMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(NullWritable.class);

		job.setReducerClass(WriteSeqFileReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(NullWritable.class);

//		job.setNumReduceTasks(0);

		// configure the job's input path
		FileInputFormat.addInputPath(job, new Path(in));

		// set the job's output format to SequenceFileOutputFormat
		job.setOutputFormatClass(SequenceFileOutputFormat.class);
		// use SequenceFile block-level compression
		SequenceFileOutputFormat.setOutputCompressionType(job, SequenceFile.CompressionType.BLOCK);

		Path outputDir = new Path(out);
		outputDir.getFileSystem(this.getConf()).delete(outputDir, true);
		FileOutputFormat.setOutputPath(job, outputDir);

		return job.waitForCompletion(true) ? 0 : 1;
	}

}

2. Reading a SequenceFile

Read the SequenceFile produced by the previous example and turn it back into a plain text file.
SequenceFileInputFormat is used to read the SequenceFile.
Code example:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * @author alanchan
 * Reads a SequenceFile
 */
public class ReadSeqFile extends Configured implements Tool {
	static String in = "D:/workspace/bigdata-component/hadoop/test/out/seq";
	static String out = "D:/workspace/bigdata-component/hadoop/test/out/seqread";

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		int status = ToolRunner.run(conf, new ReadSeqFile(), args);
		System.exit(status);
	}

	@Override
	public int run(String[] args) throws Exception {
		Job job = Job.getInstance(getConf(), this.getClass().getName());
		job.setJarByClass(this.getClass());

		job.setMapperClass(ReadSeqFileMapper.class);
		job.setMapOutputKeyClass(NullWritable.class);
		job.setMapOutputValueClass(Text.class);

		job.setReducerClass(ReadSeqFileReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(NullWritable.class);

//		job.setNumReduceTasks(0);

		// set the job's input format to SequenceFileInputFormat
		job.setInputFormatClass(SequenceFileInputFormat.class);
		// configure the job's input path
		SequenceFileInputFormat.addInputPath(job, new Path(in));

		Path outputDir = new Path(out);
		outputDir.getFileSystem(this.getConf()).delete(outputDir, true);
		FileOutputFormat.setOutputPath(job, outputDir);

		return job.waitForCompletion(true) ? 0 : 1;
	}

	/**
	 * Note: the mapper's input key/value types must match the key/value types stored
	 * in the SequenceFile (here Text/NullWritable, as written by WriteSeqFile above),
	 * otherwise a ClassCastException is thrown at runtime.
	 * 
	 * @author alanchan
	 *
	 */
	static class ReadSeqFileMapper extends Mapper<Text, NullWritable, NullWritable, Text> {
		protected void map(Text key, NullWritable value, Context context) throws IOException, InterruptedException {
			context.write(NullWritable.get(), key);
		}
	}

	static class ReadSeqFileReducer extends Reducer<NullWritable, Text, Text, NullWritable> {
		protected void reduce(NullWritable key, Iterable<Text> values, Context context)
				throws IOException, InterruptedException {
			for (Text value : values) {
				context.write(value, NullWritable.get());
			}
		}
	}
}

3. Merging small files with SequenceFile

Write all the small files into one SequenceFile: the file name (path) is used as the key and the file content as the value, serialized into a single large SequenceFile.

import java.io.File;
import java.io.FileInputStream;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;

public class MergeSmallFilesToSequenceFile {
	private static Configuration configuration = new Configuration();
	static String srcPath = "D:/workspace/bigdata-component/hadoop/test/in/sf";
	static String destPath = "D:/workspace/bigdata-component/hadoop/test/out/sf";

	public static void main(String[] args) throws Exception {
		MergeSmallFilesToSequenceFile msf = new MergeSmallFilesToSequenceFile();
		// merge the small files
		List<String> fileList = msf.getFileListByPath(srcPath);

		msf.mergeFile(configuration, fileList, destPath);
		// read the merged file back
		msf.readMergedFile(configuration, destPath);
	}

	public List<String> getFileListByPath(String inputPath) throws Exception {
		List<String> smallFilePaths = new ArrayList<String>();
		File file = new File(inputPath);
		// if the given path is a directory, walk it and add every file inside to smallFilePaths
		// if the given path is a single file, add that file's path to smallFilePaths
		if (file.isDirectory()) {
			File[] files = FileUtil.listFiles(file);
			for (File sFile : files) {
				smallFilePaths.add(sFile.getPath());
			}
		} else {
			smallFilePaths.add(file.getPath());
		}
		return smallFilePaths;
	}

	// read every small file in smallFilePaths and append it to the merged SequenceFile
	public void mergeFile(Configuration configuration, List<String> smallFilePaths, String destPath) throws Exception {
		Writer.Option bigFile = Writer.file(new Path(destPath));
		Writer.Option keyClass = Writer.keyClass(Text.class);
		Writer.Option valueClass = Writer.valueClass(BytesWritable.class);
		// build the writer
		Writer writer = SequenceFile.createWriter(configuration, bigFile, keyClass, valueClass);
		// read the small files one by one and write each into the SequenceFile
		Text key = new Text();
		for (String path : smallFilePaths) {
			File file = new File(path);
			long fileSize = file.length(); // file size in bytes
			byte[] fileContent = new byte[(int) fileSize];
			FileInputStream inputStream = new FileInputStream(file);
			inputStream.read(fileContent, 0, (int) fileSize); // load the file's bytes into the fileContent array
			String md5Str = DigestUtils.md5Hex(fileContent);
			System.out.println("merging small file: " + path + ", md5: " + md5Str);
			key.set(path);
			// use the file path as the key and the file content as the value
			writer.append(key, new BytesWritable(fileContent));
		}
		writer.hflush();
		writer.close();
	}

	// read the small files back out of the merged file
	public void readMergedFile(Configuration configuration, String srcPath) throws Exception {
		Reader.Option file = Reader.file(new Path(srcPath));
		Reader reader = new Reader(configuration, file);
		Text key = new Text();
		BytesWritable value = new BytesWritable();
		while (reader.next(key, value)) {
			byte[] bytes = value.copyBytes();
			String md5 = DigestUtils.md5Hex(bytes);
			String content = new String(bytes, Charset.forName("GBK"));
			System.out.println("读取到文件:" + key + ",md5:" + md5 + ",content:" + content);
		}
	}

}

Run log output:

2022-09-22 19:16:55,192 WARN zlib.ZlibFactory: Failed to load/initialize native-zlib library
2022-09-22 19:16:55,193 INFO compress.CodecPool: Got brand-new compressor [.deflate]
merging small file: D:\workspace\bigdata-component\hadoop\test\in\sf\java.txt, md5: b086a9d7084ccea407df5b3215085bd4
merging small file: D:\workspace\bigdata-component\hadoop\test\in\sf\java1.txt, md5: b086a9d7084ccea407df5b3215085bd4
merging small file: D:\workspace\bigdata-component\hadoop\test\in\sf\testhadoopclient_java.txt, md5: b086a9d7084ccea407df5b3215085bd4
2022-09-22 19:16:55,209 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
read file: D:\workspace\bigdata-component\hadoop\test\in\sf\java.txt, md5: b086a9d7084ccea407df5b3215085bd4, content: testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
read file: D:\workspace\bigdata-component\hadoop\test\in\sf\java1.txt, md5: b086a9d7084ccea407df5b3215085bd4, content: testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
read file: D:\workspace\bigdata-component\hadoop\test\in\sf\testhadoopclient_java.txt, md5: b086a9d7084ccea407df5b3215085bd4, content: testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt
testhadoopclient_java.txt

II. MapFile

A MapFile can be thought of as a sorted SequenceFile. Looking at its on-disk layout, a MapFile consists of two parts: data and index. The data file stores the records, while the index serves as the file's data index, recording the key of each indexed record and that record's offset in the data file; a small point-lookup example that uses this index follows.
(Figure 1: MapFile structure - a data file and an index file)
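Because of the index file, a MapFile supports random lookups by key without scanning the whole data file. The following is a minimal standalone sketch of such a lookup with the MapFile.Reader API; the path (one part-m-xxxxx directory produced by the job in the next section, which is itself a MapFile holding the data and index files) and the LongWritable/Text key and value types are assumptions made for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;

public class MapFileLookup {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		// one MapFile directory (contains "data" and "index"); MapFileOutputFormat writes one per task (assumed path)
		Path dir = new Path("D:/workspace/bigdata-component/hadoop/test/out/mapfile/part-m-00000");
		try (MapFile.Reader reader = new MapFile.Reader(dir, conf)) {
			LongWritable key = new LongWritable(0); // byte-offset key written by the WriteMapFile job below
			Text value = new Text();
			// get() uses the index to seek close to the key, then scans forward in the data file
			if (reader.get(key, value) != null) {
				System.out.println(key + " -> " + value);
			}
		}
	}
}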

1. Writing a MapFile

Read a plain text file and produce a MapFile.

Code example:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WriteMapFile extends Configured implements Tool {
	static String in = "D:/workspace/bigdata-component/hadoop/test/in/seq";
	static String out = "D:/workspace/bigdata-component/hadoop/test/out/mapfile";

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		int status = ToolRunner.run(conf, new WriteMapFile(), args);
		System.exit(status);
	}

	@Override
	public int run(String[] args) throws Exception {
		Job job = Job.getInstance(getConf(), this.getClass().getName());
		job.setJarByClass(this.getClass());

		job.setMapperClass(WriteMapFileMapper.class);
		job.setMapOutputKeyClass(LongWritable.class);
		job.setMapOutputValueClass(Text.class);

		job.setNumReduceTasks(0);

		// configure the job's input path
		FileInputFormat.addInputPath(job, new Path(in));

		// set the job's output format to MapFileOutputFormat
		job.setOutputFormatClass(MapFileOutputFormat.class);

		Path outputDir = new Path(out);
		outputDir.getFileSystem(this.getConf()).delete(outputDir, true);
		FileOutputFormat.setOutputPath(job, outputDir);

		return job.waitForCompletion(true) ? 0 : 1;
	}

	static class WriteMapFileMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
		protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			context.write(key, value);
		}
	}

}

Run result: (figure omitted)

2. Reading a MapFile

Read a MapFile and produce a plain text file.

1) Implementation notes

MapReduce does not ship an input class dedicated to reading MapFiles, so in practice choose one of the following approaches:
Option one: write a custom InputFormat whose reader is obtained through MapFileOutputFormat's getReaders method (a standalone sketch of that lookup is shown below).
Option two: parse the MapFile's data with SequenceFileInputFormat.
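For reference, the heart of option one looks like the sketch below: MapFileOutputFormat.getReaders opens every MapFile under the job's output directory, and getEntry routes the key to the right reader with the same partitioner the writing job used, then performs an index-assisted lookup. A custom InputFormat/RecordReader would wrap exactly this kind of reader; the path, the HashPartitioner, and the LongWritable/Text types are assumptions based on the WriteMapFile job above.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.MapFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.output.MapFileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class MapFileLookupByReaders {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		// job output directory holding one MapFile (part-m-xxxxx) per map task (assumed path)
		Path dir = new Path("D:/workspace/bigdata-component/hadoop/test/out/mapfile");
		MapFile.Reader[] readers = MapFileOutputFormat.getReaders(dir, conf);

		LongWritable key = new LongWritable(0); // byte-offset key produced by WriteMapFile
		Text value = new Text();
		// getEntry picks the reader for the key's partition and looks the key up through its index
		Writable found = MapFileOutputFormat.getEntry(readers, new HashPartitioner<LongWritable, Text>(), key, value);
		if (found != null) {
			System.out.println(key + " -> " + value);
		}

		for (MapFile.Reader reader : readers) {
			reader.close();
		}
	}
}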

2) Implementation

Example using option two:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ReadMapFile extends Configured implements Tool {
	static String out = "D:/workspace/bigdata-component/hadoop/test/out/mapfileread";
	static String in = "D:/workspace/bigdata-component/hadoop/test/out/mapfile";

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		int status = ToolRunner.run(conf, new ReadMapFile(), args);
		System.exit(status);
	}

	@Override
	public int run(String[] args) throws Exception {
		Job job = Job.getInstance(getConf(), this.getClass().getName());
		job.setJarByClass(this.getClass());

		job.setMapperClass(ReadMapFileMapper.class);
		job.setMapOutputKeyClass(NullWritable.class);
		job.setMapOutputValueClass(Text.class);

		job.setNumReduceTasks(0);

		FileInputFormat.addInputPath(job, new Path(in));

        // set the job's input format to SequenceFileInputFormat (Hadoop does not provide a dedicated MapFile input format)
//		job.setInputFormatClass(MapFileInputFormat.class);
		job.setInputFormatClass(SequenceFileInputFormat.class);

		Path outputDir = new Path(out);
		outputDir.getFileSystem(this.getConf()).delete(outputDir, true);
		FileOutputFormat.setOutputPath(job, outputDir);

		return job.waitForCompletion(true) ? 0 : 1;
	}

	static class ReadMapFileMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
		protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			context.write(NullWritable.get(), value);
		}
	}
}

III. ORCFile

1. Writing an ORCFile

Read a plain text file and produce an ORC file.

1) pom.xml

The following ORC dependencies need to be added on top of the pom.xml mentioned above:


<dependency>
    <groupId>org.apache.orcgroupId>
    <artifactId>orc-shimsartifactId>
    <version>1.6.3version>
dependency>
<dependency>
    <groupId>org.apache.orcgroupId>
    <artifactId>orc-coreartifactId>
    <version>1.6.3version>
dependency>
<dependency>
    <groupId>org.apache.orcgroupId>
    <artifactId>orc-mapreduceartifactId>
    <version>1.6.3version>
dependency>

2) Implementation

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.orc.OrcConf;
import org.apache.orc.TypeDescription;
import org.apache.orc.mapred.OrcStruct;
import org.apache.orc.mapreduce.OrcOutputFormat;

/**
 * @author alanchan 
 * Reads plain text files and converts them into an ORC file
 */
public class WriteOrcFile extends Configured implements Tool {
	static String in = "D:/workspace/bigdata-component/hadoop/test/in/orc";
	static String out = "D:/workspace/bigdata-component/hadoop/test/out/orc";

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		int status = ToolRunner.run(conf, new WriteOrcFile(), args);
		System.exit(status);
	}

	@Override
	public int run(String[] args) throws Exception {
		// set the ORC schema
		OrcConf.MAPRED_OUTPUT_SCHEMA.setString(this.getConf(), SCHEMA);

		Job job = Job.getInstance(getConf(), this.getClass().getName());
		job.setJarByClass(this.getClass());

		job.setMapperClass(WriteOrcFileMapper.class);
		job.setMapOutputKeyClass(NullWritable.class);
		job.setMapOutputValueClass(OrcStruct.class);

		job.setNumReduceTasks(0);

		// configure the job's input path
		FileInputFormat.addInputPath(job, new Path(in));

		// set the job's output format to OrcOutputFormat
		job.setOutputFormatClass(OrcOutputFormat.class);

		Path outputDir = new Path(out);
		outputDir.getFileSystem(this.getConf()).delete(outputDir, true);
		FileOutputFormat.setOutputPath(job, outputDir);

		return job.waitForCompletion(true) ? 0 : 1;
	}

	// Column definitions for the output data.
	// Data format:
//	id                 ,type  ,orderid              ,bankcard,ctime              ,utime
//	2.0191130220014E+27,ALIPAY,191130-461197476510745,356886,,
//	2.01911302200141E+27,ALIPAY,191130-570038354832903,404118,2019/11/30 21:44,2019/12/16 14:24
//	2.01911302200143E+27,ALIPAY,191130-581296620431058,520083,2019/11/30 18:17,2019/12/4 20:26
//	2.0191201220014E+27,ALIPAY,191201-311567320052455,622688,2019/12/1 10:56,2019/12/16 11:54
	private static final String SCHEMA = "struct<id:string,type:string,orderid:string,bankcard:string,ctime:string,utime:string>";

	static class WriteOrcFileMapper extends Mapper<LongWritable, Text, NullWritable, OrcStruct> {
		// the type description parsed from the schema string
		private TypeDescription schema = TypeDescription.fromString(SCHEMA);
		// the output key
		private final NullWritable outputKey = NullWritable.get();
		// the output value is an OrcStruct built from the schema
		private final OrcStruct outputValue = (OrcStruct) OrcStruct.createValue(schema);

		protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			// split each input line into its fields
			String[] fields = value.toString().split(",", 6);
			// assign each field to the corresponding column of the output value
			outputValue.setFieldValue(0, new Text(fields[0]));
			outputValue.setFieldValue(1, new Text(fields[1]));
			outputValue.setFieldValue(2, new Text(fields[2]));
			outputValue.setFieldValue(3, new Text(fields[3]));
			outputValue.setFieldValue(4, new Text(fields[4]));
			outputValue.setFieldValue(5, new Text(fields[5]));

			context.write(outputKey, outputValue);
		}
	}

}

The run result is shown below. (figure omitted)

2. Reading an ORCFile

Read an ORC file and convert it back into a plain text file.
This example reads the file produced by the previous example.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.orc.mapred.OrcStruct;
import org.apache.orc.mapreduce.OrcInputFormat;

/**
 * @author alanchan 
 * Reads an ORC file and converts it back into a plain text file
 */
public class ReadOrcFile extends Configured implements Tool {
	static String out = "D:/workspace/bigdata-component/hadoop/test/out/orcread";
	static String in = "D:/workspace/bigdata-component/hadoop/test/out/orc";

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		int status = ToolRunner.run(conf, new ReadOrcFile(), args);
		System.exit(status);
	}

	@Override
	public int run(String[] args) throws Exception {
		Job job = Job.getInstance(getConf(), this.getClass().getName());
		job.setJarByClass(this.getClass());

		job.setMapperClass(ReadOrcFileMapper.class);
		job.setMapOutputKeyClass(NullWritable.class);
		job.setMapOutputValueClass(Text.class);

		job.setNumReduceTasks(0);

		FileInputFormat.addInputPath(job, new Path(in));
		// set the input format
		job.setInputFormatClass(OrcInputFormat.class);

		Path outputDir = new Path(out);
		outputDir.getFileSystem(this.getConf()).delete(outputDir, true);
		FileOutputFormat.setOutputPath(job, outputDir);

		return job.waitForCompletion(true) ? 0 : 1;
	}

	static class ReadOrcFileMapper extends Mapper<NullWritable, OrcStruct, NullWritable, Text> {
		Text outValue = new Text();

		protected void map(NullWritable key, OrcStruct value, Context context)
				throws IOException, InterruptedException {
//			outValue.set(value.toString());
//			value.getFieldValue(0).toString()
			// Alternatively, fetch each column from the OrcStruct by index or name and assemble the
			// output as needed; this example simply writes the whole struct as a string.
			// (A field-by-field variant is sketched after the run result below.)

			context.write(NullWritable.get(), new Text(value.toString()));
		}
	}

}

Run result: (figure omitted)
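If you need the individual columns rather than the struct's default string form, the mapper can pull each field out of the OrcStruct explicitly, as the commented-out line above hints. The following is only a sketch of such a mapper, assuming the six-column string schema the file was written with:

import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.orc.mapred.OrcStruct;

/** Alternative mapper: emits each ORC row as a comma-separated text line. */
public class ReadOrcFileAsCsvMapper extends Mapper<NullWritable, OrcStruct, NullWritable, Text> {
	private final Text outValue = new Text();

	@Override
	protected void map(NullWritable key, OrcStruct value, Context context) throws IOException, InterruptedException {
		StringBuilder line = new StringBuilder();
		for (int i = 0; i < value.getNumFields(); i++) {
			if (i > 0) {
				line.append(",");
			}
			Object field = value.getFieldValue(i); // the column value as a Writable, may be null
			line.append(field == null ? "" : field.toString());
		}
		outValue.set(line.toString());
		context.write(NullWritable.get(), outValue);
	}
}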

3. Writing an ORCFile (reading from a database)

Read rows from a database and convert them into an ORC file.
The MySQL JDBC driver dependency must be added to pom.xml (an example dependency and a sketch of the User bean used by the job follow after the size comparison below).

Source record count: 12,606,948 rows
Size of the storage files in the clickhouse system: 50.43 MB
Dumped row by row into a text file: 1.07 GB (uncompressed)
Dumped row by row into an ORC file: 105 MB (uncompressed)
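Two things the job below depends on are not shown in this article: the MySQL JDBC driver in pom.xml, and the User bean (imported as org.hadoop.mr.db.User) that DBInputFormat maps each row onto. An example driver dependency (the version is only an illustration, use one matching your MySQL server):

<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.49</version>
</dependency>

And a minimal sketch of what the User bean is assumed to look like; the field list and column order are inferred from the SELECT statement and the mapper code below, not taken from the original class:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

/** Sketch of the User bean used by DBInputFormat/DBOutputFormat in the jobs below. */
public class User implements Writable, DBWritable {
	private int id;
	private String userName;
	private String password;
	private String phone;
	private String email;
	private String createDay;

	// column order must match "select id, user_Name, pass_word, phone, email, create_day from dx_user"
	public void readFields(ResultSet rs) throws SQLException {
		id = rs.getInt(1);
		userName = rs.getString(2);
		password = rs.getString(3);
		phone = rs.getString(4);
		email = rs.getString(5);
		createDay = rs.getString(6);
	}

	// column order must match DBOutputFormat.setOutput(job, "dx_user_copy", "id", "user_name", ...)
	public void write(PreparedStatement ps) throws SQLException {
		ps.setInt(1, id);
		ps.setString(2, userName);
		ps.setString(3, password);
		ps.setString(4, phone);
		ps.setString(5, email);
		ps.setString(6, createDay);
	}

	public void write(DataOutput out) throws IOException {
		out.writeInt(id);
		out.writeUTF(userName);
		out.writeUTF(password);
		out.writeUTF(phone);
		out.writeUTF(email);
		out.writeUTF(createDay);
	}

	public void readFields(DataInput in) throws IOException {
		id = in.readInt();
		userName = in.readUTF();
		password = in.readUTF();
		phone = in.readUTF();
		email = in.readUTF();
		createDay = in.readUTF();
	}

	public int getId() { return id; }
	public void setId(int id) { this.id = id; }
	public String getUserName() { return userName; }
	public void setUserName(String userName) { this.userName = userName; }
	public String getPassword() { return password; }
	public void setPassword(String password) { this.password = password; }
	public String getPhone() { return phone; }
	public void setPhone(String phone) { this.phone = phone; }
	public String getEmail() { return email; }
	public void setEmail(String email) { this.email = email; }
	public String getCreateDay() { return createDay; }
	public void setCreateDay(String createDay) { this.createDay = createDay; }
}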

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.orc.OrcConf;
import org.apache.orc.TypeDescription;
import org.apache.orc.mapred.OrcStruct;
import org.apache.orc.mapreduce.OrcOutputFormat;
import org.hadoop.mr.db.User;

/**
 * @author alanchan 
 * Reads data from mysql and writes it out as an ORC file
 *
 */

public class ReadFromMysqlToOrcFile extends Configured implements Tool {
	private static final String SCHEMA = "struct<id:int,userName:string,password:string,phone:string,email:string,createDay:string>";
	static String out = "D:/workspace/bigdata-component/hadoop/test/out/mysql";

	static class ReadFromMysqlMapper extends Mapper<LongWritable, User, NullWritable, OrcStruct> {
		private TypeDescription schema = TypeDescription.fromString(SCHEMA);
		private final NullWritable outKey = NullWritable.get();
		private final OrcStruct outValue = (OrcStruct) OrcStruct.createValue(schema);

		protected void map(LongWritable key, User value, Context context) throws IOException, InterruptedException {
			Counter counter = context.getCounter("mysql_records_counters", "User Records");
			counter.increment(1);

			// assign each column to the corresponding field of the output value
			outValue.setFieldValue(0, new IntWritable(value.getId()));
			outValue.setFieldValue(1, new Text(value.getUserName()));
			outValue.setFieldValue(2, new Text(value.getPassword()));
			outValue.setFieldValue(3, new Text(value.getPhone()));
			outValue.setFieldValue(4, new Text(value.getEmail()));
			outValue.setFieldValue(5, new Text(value.getCreateDay()));

			context.write(outKey, outValue);
		}
	}

	@Override
	public int run(String[] args) throws Exception {
		OrcConf.MAPRED_OUTPUT_SCHEMA.setString(this.getConf(), SCHEMA);
		Configuration conf = getConf();

		DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://192.168.10.44:3306/test", "root","root");

		Job job = Job.getInstance(conf, this.getClass().getSimpleName());
		job.setJarByClass(this.getClass());

		job.setInputFormatClass(DBInputFormat.class);
		DBInputFormat.setInput(job, User.class,
		"select id, user_Name,pass_word,phone,email,create_day from dx_user",
		// 12,606,948 rows
		"select count(*) from dx_user ");

		Path outputDir = new Path(out);
		outputDir.getFileSystem(this.getConf()).delete(outputDir, true);
		FileOutputFormat.setOutputPath(job, outputDir);

		job.setMapperClass(ReadFromMysqlMapper.class);
		job.setMapOutputKeyClass(NullWritable.class);
		job.setMapOutputValueClass(OrcStruct.class);
		job.setOutputFormatClass(OrcOutputFormat.class);

		job.setNumReduceTasks(0);

		return job.waitForCompletion(true) ? 0 : 1;
	}

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		int status = ToolRunner.run(conf, new ReadFromMysqlToOrcFile(), args);
		System.exit(status);
	}
}

4. Reading an ORCFile (writing to a database)

Read an ORC file and write its rows into a mysql database.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.orc.mapred.OrcStruct;
import org.apache.orc.mapreduce.OrcInputFormat;
import org.springframework.util.StopWatch;

public class WriteFromOrcFileToMysql extends Configured implements Tool {
	static String in = "D:/workspace/bigdata-component/hadoop/test/out/mysql";

	@Override
	public int run(String[] args) throws Exception {
		Configuration conf = getConf();
		DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver", "jdbc:mysql://192.168.10.44:3306/test", "root","root");
		Job job = Job.getInstance(conf, this.getClass().getSimpleName());
		job.setJarByClass(this.getClass());

		job.setMapperClass(WriteFromOrcFileToMysqlMapper.class);
		job.setMapOutputKeyClass(User.class);
		job.setMapOutputValueClass(NullWritable.class);

		FileInputFormat.addInputPath(job, new Path(in));
		job.setInputFormatClass(OrcInputFormat.class);
		job.setOutputFormatClass(DBOutputFormat.class);
		// id, user_Name,pass_word,phone,email,create_day
		DBOutputFormat.setOutput(job, "dx_user_copy", "id", "user_name", "pass_word", "phone", "email", "create_day");

//		job.setReducerClass(WriteFromOrcFileToMysqlReducer.class);
//		job.setOutputKeyClass(NullWritable.class);
//		job.setOutputValueClass(Text.class);

		job.setNumReduceTasks(0);

		return job.waitForCompletion(true) ? 0 : 1;
	}

	public static void main(String[] args) throws Exception {
		StopWatch clock = new StopWatch();
		clock.start(WriteFromOrcFileToMysql.class.getSimpleName());

		Configuration conf = new Configuration();

		int status = ToolRunner.run(conf, new WriteFromOrcFileToMysql(), args);

		clock.stop();
		System.out.println(clock.prettyPrint());

		System.exit(status);
	}

	static class WriteFromOrcFileToMysqlMapper extends Mapper<NullWritable, OrcStruct, User, NullWritable> {
		User outValue = new User();
		protected void map(NullWritable key, OrcStruct value, Context context)
				throws IOException, InterruptedException {
			// SCHEMA = "struct<id:int,userName:string,password:string,phone:string,email:string,createDay:string>"
			outValue.setId(Integer.parseInt(value.getFieldValue("id").toString()));
			outValue.setUserName(value.getFieldValue("userName").toString());
			outValue.setPassword(value.getFieldValue("password").toString());
			outValue.setPhone(value.getFieldValue("phone").toString());
			outValue.setEmail(value.getFieldValue("email").toString());
			outValue.setCreateDay(value.getFieldValue("createDay").toString());
			context.write(outValue,NullWritable.get());
		}

	}
}

IV. ParquetFile

1. pom.xml

Reading and writing Parquet requires the following additional maven dependencies (see the note after the snippet about the ${parquet.version} property):


		<dependency>
			<groupId>org.apache.parquetgroupId>
			<artifactId>parquet-hadoopartifactId>
			<version>${parquet.version}version>
		dependency>
		<dependency>
			<groupId>org.apache.parquetgroupId>
			<artifactId>parquet-columnartifactId>
			<version>${parquet.version}version>
		dependency>
		<dependency>
			<groupId>org.apache.parquetgroupId>
			<artifactId>parquet-commonartifactId>
			<version>${parquet.version}version>
		dependency>
		<dependency>
			<groupId>org.apache.parquetgroupId>
			<artifactId>parquet-encodingartifactId>
			<version>${parquet.version}version>
		dependency>
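Note that these dependencies reference a ${parquet.version} Maven property that is not defined anywhere in this article; it is assumed to be declared in the pom's <properties> block, for example (the version number is only an illustration, pick a parquet-mr release compatible with your Hadoop version):

<properties>
	<parquet.version>1.10.1</parquet.version>
</properties>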

2. Writing a ParquetFile

Read a text file and write it out as a Parquet file.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetOutputFormat;
import org.apache.parquet.hadoop.example.GroupWriteSupport;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.OriginalType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;
import org.springframework.util.StopWatch;

/**
 * @author alanchan
 *
 */
public class WriteParquetFile extends Configured implements Tool {
	static String in = "D:/workspace/bigdata-component/hadoop/test/in/parquet";
	static String out = "D:/workspace/bigdata-component/hadoop/test/out/parquet";

	public static void main(String[] args) throws Exception {
		StopWatch clock = new StopWatch();
		clock.start(WriteParquetFile.class.getSimpleName());

		Configuration conf = new Configuration();
		int status = ToolRunner.run(conf, new WriteParquetFile(), args);

		// stop the stopwatch and print the timing before exiting (code after System.exit would never run)
		clock.stop();
		System.out.println(clock.prettyPrint());
		System.exit(status);
	}

	@Override
	public int run(String[] args) throws Exception {
		Configuration conf = getConf();
		// the input data for this demo has two columns: city and ip
		// input line format: https://www.win.com/233434,8283140
		//                    https://www.win.com/242288,8283139
		MessageType schema = Types.buildMessage().required(PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named("city").required(PrimitiveTypeName.BINARY).as(OriginalType.UTF8)
				.named("ip").named("pair");

		System.out.println("[schema]==" + schema.toString());

		GroupWriteSupport.setSchema(schema, conf);

		Job job = Job.getInstance(conf, this.getClass().getName());
		job.setJarByClass(this.getClass());

		job.setMapperClass(WriteParquetFileMapper.class);
		job.setInputFormatClass(TextInputFormat.class);
		job.setMapOutputKeyClass(NullWritable.class);
		// the map output value is a parquet Group
		job.setMapOutputValueClass(Group.class);
		
		FileInputFormat.setInputPaths(job, in);

		// parquet output
		job.setOutputFormatClass(ParquetOutputFormat.class);
		ParquetOutputFormat.setWriteSupportClass(job, GroupWriteSupport.class);

		Path outputDir = new Path(out);
		outputDir.getFileSystem(this.getConf()).delete(outputDir, true);
		FileOutputFormat.setOutputPath(job, new Path(out));
//        ParquetOutputFormat.setOutputPath(job, new Path(out));
		ParquetOutputFormat.setCompression(job, CompressionCodecName.SNAPPY);
		job.setNumReduceTasks(0);

		return job.waitForCompletion(true) ? 0 : 1;
	}

	public static class WriteParquetFileMapper extends Mapper<LongWritable, Text, NullWritable, Group> {
		SimpleGroupFactory factory = null;

		protected void setup(Context context) throws IOException, InterruptedException {
			factory = new SimpleGroupFactory(GroupWriteSupport.getSchema(context.getConfiguration()));
		};

		public void map(LongWritable _key, Text ivalue, Context context) throws IOException, InterruptedException {
			Group pair = factory.newGroup();
			// split each input line on the comma
			String[] strs = ivalue.toString().split(",");
			pair.append("city", strs[0]);
			pair.append("ip", strs[1]);
			context.write(null, pair);
		}
	}
}

3. Reading a ParquetFile

Read the ParquetFile produced by the previous example and write it back out as a text file.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetInputFormat;
import org.apache.parquet.hadoop.example.GroupReadSupport;
import org.apache.parquet.hadoop.example.GroupWriteSupport;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.OriginalType;
import org.apache.parquet.schema.Types;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.springframework.util.StopWatch;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Mapper.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReadParquetFile extends Configured implements Tool {
	static String in = "D:/workspace/bigdata-component/hadoop/test/out/parquet";
	static String out = "D:/workspace/bigdata-component/hadoop/test/out/parquet_read";

	public static void main(String[] args) throws Exception {
		StopWatch clock = new StopWatch();
		clock.start(ReadParquetFile.class.getSimpleName());

		Configuration conf = new Configuration();
		int status = ToolRunner.run(conf, new ReadParquetFile(), args);

		// stop the stopwatch and print the timing before exiting (code after System.exit would never run)
		clock.stop();
		System.out.println(clock.prettyPrint());
		System.exit(status);
	}

	@Override
	public int run(String[] args) throws Exception {
		Configuration conf = new Configuration(this.getConf());
		// the input data for this demo has two columns: city and ip
		MessageType schema = Types.buildMessage().required(PrimitiveTypeName.BINARY).as(OriginalType.UTF8).named("city").required(PrimitiveTypeName.BINARY).as(OriginalType.UTF8)
				.named("ip").named("pair");

		System.out.println("[schema]==" + schema.toString());

		GroupWriteSupport.setSchema(schema, conf);
		Job job = Job.getInstance(conf, this.getClass().getName());
		job.setJarByClass(this.getClass());

		// parquet input
		job.setMapperClass(ReadParquetFileMapper.class);
		job.setNumReduceTasks(0);
		job.setInputFormatClass(ParquetInputFormat.class);
		ParquetInputFormat.setReadSupportClass(job, GroupReadSupport.class);
		FileInputFormat.setInputPaths(job, in);

		job.setOutputKeyClass(NullWritable.class);
		job.setOutputValueClass(Text.class);
		Path outputDir = new Path(out);
		outputDir.getFileSystem(this.getConf()).delete(outputDir, true);
		FileOutputFormat.setOutputPath(job, new Path(out));

		job.setNumReduceTasks(0);

		return job.waitForCompletion(true) ? 0 : 1;
	}

	public static class ReadParquetFileMapper extends Mapper<NullWritable, Group, NullWritable, Text> {
		protected void map(NullWritable key, Group value, Context context) throws IOException, InterruptedException {
			
//			String city = value.getString(0, 0);
//			String ip = value.getString(1, 0);
//			context.write(NullWritable.get(), new Text(city + "," + ip));
			
			String city = value.getString("city", 0);
			String ip = value.getString("ip", 0);
			
			// output format: https://www.win.com/237516,8284068
			context.write(NullWritable.get(), new Text(value.getString(0, 0) + "," + value.getString(1, 0)));
			
			// output format: https://www.win.com/237516,8284068
			context.write(NullWritable.get(), new Text(city + "," + ip));
			
			// output format:
			// city: https://www.win.com/237516
			// ip: 8284068
			context.write(NullWritable.get(), new Text(value.toString()));
			
			context.write(NullWritable.get(), new Text("\n"));

		}
	}
}

This completes the MapReduce read and write operations for SequenceFile, MapFile, ORCFile, and ParquetFile; the next article covers the use of compression codecs.
