Reading the code: BayesFileFormatter

Techniques used: file reading and writing, and walking all files under a directory.

package org.apache.mahout.classifier;
public final class BayesFileFormatter

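The class offers two processing modes:
process every file under a directory and write all results into a single output file, or write each processed file to its own output file.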

Single output file
  public static void collapse(String label, Analyzer analyzer, File inputDir,
                              Charset charset, File outputFile) throws IOException {
    Writer writer = Files.newWriter(outputFile, charset);
    try {
      inputDir.listFiles(new FileProcessor(label, analyzer, charset, writer));
      // listFiles() is called here as a way to recursively visit files,
      // actually
    } finally {
      IOUtils.quietClose(writer);
    }
  }



One output file per input file
  public static void format(String label, Analyzer analyzer, File input,
                            Charset charset, File outDir) throws IOException {
    if (input.isDirectory()) {
      input.listFiles(new FileProcessor(label, analyzer, charset, outDir));
    } else {
      Writer writer = Files.newWriter(new File(outDir, input.getName()), charset);
      try {
        writeFile(label, analyzer, input, charset, writer);
      } finally {
        IOUtils.quietClose(writer);
      }
    }
  }



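Both modes need to walk the files.
The core trick is to reuse the existing listFiles method with a repurposed FileFilter. A FileFilter normally only decides which files to keep; here the processing itself happens inside it. The inner class FileProcessor implements the FileFilter interface, and because accept always returns false, the array that listFiles returns is empty and is simply ignored.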
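
The filter-as-visitor idea can be tried in isolation. The following is a minimal sketch using only the JDK; VisitingFilter and its visited() method are hypothetical names made up for this illustration and are not part of Mahout. It records file names where FileProcessor would analyze and write.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class, not part of Mahout: a FileFilter used as a visitor.
final class VisitingFilter implements FileFilter {
  private final List<String> visited = new ArrayList<>();

  @Override
  public boolean accept(File file) {
    if (file.isFile()) {
      visited.add(file.getName()); // the "processing" step
    } else {
      file.listFiles(this);        // recurse into the subdirectory
    }
    return false; // nothing is kept; the array listFiles() returns is ignored
  }

  List<String> visited() {
    return visited;
  }

  public static void main(String[] args) throws IOException {
    // Build a small temporary tree: root/a.txt and root/sub/b.txt
    File root = Files.createTempDirectory("visit-demo").toFile();
    File sub = new File(root, "sub");
    sub.mkdir();
    Files.write(new File(root, "a.txt").toPath(), "a".getBytes());
    Files.write(new File(sub, "b.txt").toPath(), "b".getBytes());

    VisitingFilter visitor = new VisitingFilter();
    root.listFiles(visitor);
    System.out.println(visitor.visited()); // visit order is not guaranteed
  }
}
```

The side effect in accept is what does the work. The trade-off is that a reader who expects a pure filter may be surprised, which is probably why the original code carries a comment explaining the listFiles call.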

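Whether the results go into a single file or into separate files depends on where the writer points, so controlling the writer selects the behavior. When the writer field is null, accept opens a new writer for each input file under outputDir and closes it afterwards; otherwise it reuses the shared writer, appends a newline after each file, and leaves closing to the caller.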
    @Override
    public boolean accept(File file) {
      if (file.isFile()) {
        Writer theWriter = null;
        try {
          if (writer == null) {
            theWriter = Files.newWriter(new File(outputDir, file.getName()), charset);
          } else {
            theWriter = writer;
          }
          writeFile(label, analyzer, file, charset, theWriter);
          if (writer != null) {
            // just write a new line
            theWriter.write('\n');
          }
        } catch (IOException e) {
          // TODO: report failed files instead of throwing exception
          throw new IllegalStateException(e);
        } finally {
          if (writer == null) {
            IOUtils.quietClose(theWriter);
          }
        }
      } else {
        file.listFiles(this);
      }
      return false;
    }
  }
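
To see the two writer modes side by side without Mahout or Lucene on the classpath, here is a rough sketch under stated assumptions: CopyingProcessor is a hypothetical stand-in for FileProcessor that copies file text verbatim instead of running an Analyzer, and it reads and writes UTF-8 only. It is not the Mahout implementation.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical stand-in for FileProcessor: copies text instead of analyzing it.
final class CopyingProcessor implements FileFilter {
  private final Writer sharedWriter; // non-null: single-output mode
  private final File outputDir;      // non-null: one output per input

  CopyingProcessor(Writer sharedWriter) {
    this.sharedWriter = sharedWriter;
    this.outputDir = null;
  }

  CopyingProcessor(File outputDir) {
    this.sharedWriter = null;
    this.outputDir = outputDir;
  }

  @Override
  public boolean accept(File file) {
    if (!file.isFile()) {
      file.listFiles(this); // recurse into subdirectories
      return false;
    }
    try {
      String text = new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
      if (sharedWriter != null) {
        sharedWriter.write(text);
        sharedWriter.write('\n'); // one record per line in the shared output
      } else {
        // Per-file mode: this call owns the writer, so it also closes it.
        File target = new File(outputDir, file.getName());
        try (Writer own = Files.newBufferedWriter(target.toPath(), StandardCharsets.UTF_8)) {
          own.write(text);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return false;
  }

  public static void main(String[] args) throws IOException {
    File in = Files.createTempDirectory("in").toFile();
    File out = Files.createTempDirectory("out").toFile();
    Files.write(new File(in, "x.txt").toPath(), "hello".getBytes(StandardCharsets.UTF_8));

    StringWriter single = new StringWriter();
    in.listFiles(new CopyingProcessor(single)); // everything into one writer
    in.listFiles(new CopyingProcessor(out));    // one file per input in out/
    System.out.println(single.toString().trim());
    System.out.println(new File(out, "x.txt").exists());
  }
}
```

The ownership rule matches the original: whoever opens a writer closes it. The shared writer is opened and closed by the caller (collapse does this in its finally block), while a per-file writer lives and dies inside a single accept call.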

