Techniques used: file reading and writing, and traversing and processing the files under a directory
package org.apache.mahout.classifier;
public final class BayesFileFormatter
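The class provides two processing modes:
process every file under a directory and write the results into a single document, or write each processed file to its own document.
Single document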
public static void collapse(String label, Analyzer analyzer, File inputDir,
    Charset charset, File outputFile) throws IOException {
  Writer writer = Files.newWriter(outputFile, charset);
  try {
    inputDir.listFiles(new FileProcessor(label, analyzer, charset, writer));
    // listFiles() is called here as a way to recursively visit files,
    // actually
  } finally {
    IOUtils.quietClose(writer);
  }
}
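The listFiles() call in collapse() is used only for its side effect, which can be shown in isolation. The following is a minimal sketch using JDK classes only, not Mahout code; the class name VisitingFilter and the method getVisited() are hypothetical names chosen for illustration. A FileFilter whose accept() always returns false is still called once for every directory entry, so it can act as a visitor while the array returned by listFiles() stays empty.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: a FileFilter used as a recursive visitor.
public class VisitingFilter implements FileFilter {

  // Names of the regular files seen so far.
  private final List<String> visited = new ArrayList<>();

  @Override
  public boolean accept(File file) {
    if (file.isFile()) {
      // "Processing" step: here we only record the file name.
      visited.add(file.getName());
    } else {
      // Recurse into subdirectories by reusing this same filter.
      file.listFiles(this);
    }
    // Nothing is ever accepted; the work happens as a side effect.
    return false;
  }

  public List<String> getVisited() {
    return visited;
  }

  public static void main(String[] args) throws IOException {
    // Build a small tree: root/a.txt and root/sub/b.txt.
    File root = Files.createTempDirectory("visit-demo").toFile();
    File sub = new File(root, "sub");
    sub.mkdir();
    new File(root, "a.txt").createNewFile();
    new File(sub, "b.txt").createNewFile();

    VisitingFilter filter = new VisitingFilter();
    File[] accepted = root.listFiles(filter);

    System.out.println("accepted: " + accepted.length);           // accepted: 0
    System.out.println("visited: " + filter.getVisited().size()); // visited: 2
  }
}
```

The order in which listFiles() reports entries is not guaranteed, so the sketch prints counts rather than names.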
Multiple documents
public static void format(String label, Analyzer analyzer, File input,
    Charset charset, File outDir) throws IOException {
  if (input.isDirectory()) {
    input.listFiles(new FileProcessor(label, analyzer, charset, outDir));
  } else {
    Writer writer = Files.newWriter(new File(outDir, input.getName()), charset);
    try {
      writeFile(label, analyzer, input, charset, writer);
    } finally {
      IOUtils.quietClose(writer);
    }
  }
}
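Both modes involve traversing files.
The core of the implementation is to reuse the existing listFiles method with a custom FileFilter. FileFilter is normally meant for filtering files, but here the processing step is placed inside it; the inner class FileProcessor implements the FileFilter interface.
Whether the processed output goes into a single file or into separate files depends on the writer's target; controlling the writer produces the two behaviors.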
@Override
public boolean accept(File file) {
  if (file.isFile()) {
    Writer theWriter = null;
    try {
      // No shared writer: open one output file per input file.
      if (writer == null) {
        theWriter = Files.newWriter(new File(outputDir, file.getName()), charset);
      } else {
        theWriter = writer;
      }
      writeFile(label, analyzer, file, charset, theWriter);
      if (writer != null) {
        // just write a new line
        theWriter.write('\n');
      }
    } catch (IOException e) {
      // TODO: report failed files instead of throwing exception
      throw new IllegalStateException(e);
    } finally {
      // Close only a writer opened here; the shared writer is closed by the caller.
      if (writer == null) {
        IOUtils.quietClose(theWriter);
      }
    }
  } else {
    // Directory: recurse with this same filter.
    file.listFiles(this);
  }
  // Always false: listFiles() is used only to visit files, not to collect them.
  return false;
}
}
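To see how the writer's target selects between the two modes, the pattern can be reduced to a self-contained sketch that uses only JDK classes. This is an illustration under stated assumptions, not the Mahout implementation: the names WriterModeSketch, Processor and writeEntry are hypothetical, there is no Analyzer, and writeEntry is a stand-in for writeFile() that simply writes the label, a tab, and the whitespace-normalized file content.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch of the "shared writer vs. per-file writer" pattern.
public class WriterModeSketch {

  static final class Processor implements FileFilter {
    private final String label;
    private final Charset charset;
    private final Writer sharedWriter; // non-null: everything goes to one document
    private final File outputDir;      // non-null: one output file per input file

    Processor(String label, Charset charset, Writer sharedWriter) {
      this.label = label;
      this.charset = charset;
      this.sharedWriter = sharedWriter;
      this.outputDir = null;
    }

    Processor(String label, Charset charset, File outputDir) {
      this.label = label;
      this.charset = charset;
      this.sharedWriter = null;
      this.outputDir = outputDir;
    }

    @Override
    public boolean accept(File file) {
      if (!file.isFile()) {
        file.listFiles(this); // directory: recurse with the same filter
        return false;
      }
      try {
        if (sharedWriter != null) {
          writeEntry(file, sharedWriter);
          sharedWriter.write('\n'); // one line per input file
        } else {
          File target = new File(outputDir, file.getName());
          try (Writer w = Files.newBufferedWriter(target.toPath(), charset)) {
            writeEntry(file, w);
          }
        }
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return false; // listFiles() is used only to visit files
    }

    // Stand-in for writeFile(): label, tab, whitespace-normalized content.
    private void writeEntry(File file, Writer out) throws IOException {
      String text = new String(Files.readAllBytes(file.toPath()), charset);
      out.write(label + '\t' + text.trim().replaceAll("\\s+", " "));
    }
  }

  // Single-document mode: one shared writer, closed by the caller.
  public static void collapse(String label, File inputDir, Charset charset,
      File outputFile) throws IOException {
    try (Writer writer = Files.newBufferedWriter(outputFile.toPath(), charset)) {
      inputDir.listFiles(new Processor(label, charset, writer));
    }
  }

  // Multi-document mode: the processor opens and closes a writer per file.
  public static void format(String label, File inputDir, Charset charset, File outDir) {
    inputDir.listFiles(new Processor(label, charset, outDir));
  }

  public static void main(String[] args) throws IOException {
    Charset utf8 = StandardCharsets.UTF_8;
    File in = Files.createTempDirectory("in").toFile();
    Files.write(new File(in, "a.txt").toPath(), "hello   world\n".getBytes(utf8));
    Files.write(new File(in, "b.txt").toPath(), "foo bar".getBytes(utf8));

    File single = File.createTempFile("collapsed", ".txt");
    collapse("spam", in, utf8, single);
    System.out.println(Files.readAllLines(single.toPath(), utf8).size()); // 2

    File outDir = Files.createTempDirectory("out").toFile();
    format("spam", in, utf8, outDir);
    System.out.println(new File(outDir, "a.txt").isFile()); // true
  }
}
```

Because the output name is built from file.getName() alone, both this sketch and the accept() method in the excerpt flatten the directory tree in multi-document mode: two input files with the same name in different subdirectories end up writing to the same output file.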