HDFS: Reading, Writing, Listing a Directory to Get Full File Paths, and Appending

1. Reading data from HDFS

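  // Needs: org.apache.hadoop.conf.Configuration, org.apache.hadoop.fs.FileSystem,
  // org.apache.hadoop.fs.Path, java.io.BufferedReader, java.io.InputStreamReader,
  // java.nio.charset.StandardCharsets
  Configuration conf = getConf();
  Path path = new Path(pathstr);
  FileSystem fs = FileSystem.get(conf);
  // try-with-resources closes the reader and the underlying HDFS stream even if
  // reading fails; the original finally block threw a NullPointerException when
  // the reader had never been created.
  // Assumption: the file is UTF-8 text; change the charset if it is not.
  try (BufferedReader br = new BufferedReader(
      new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
    String line;
    while ((line = br.readLine()) != null) {
      System.out.println(line);
    }
  }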
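
Since the stream returned by fs.open is a java.io.InputStream, the line-reading loop above can be factored into a helper that works on any stream. The following is a minimal sketch using only the standard library; the names LineReader and readLines are hypothetical, UTF-8 is an assumption, and an in-memory stream stands in for fs.open(path).

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class LineReader {
    // Reads every line from the stream and closes it, even if reading fails.
    // Assumption: the content is UTF-8 text.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // In real use this would be fs.open(path); here it is an in-memory stand-in.
        InputStream in = new ByteArrayInputStream(
                "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(in));
    }
}
```

With a real file system, the call would probably look like readLines(fs.open(path)).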


2. Writing to HDFS

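  Configuration conf = getConf();
  Path path = new Path(mid_sort);
  FileSystem fs = FileSystem.get(conf);
  // create() overwrites an existing file by default.
  // The original passed an undefined resultpath here; path is the variable built above.
  try (FSDataOutputStream out = fs.create(path)) {
    // Assumption: the output should be UTF-8; needs java.nio.charset.StandardCharsets
    out.write(sb.toString().getBytes(StandardCharsets.UTF_8));
  }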
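
String.getBytes() without an argument uses the JVM's default charset, so the bytes written can differ from machine to machine. The sketch below pins the encoding when writing to any java.io.OutputStream (the HDFS output stream is one). The names Utf8Writer and writeUtf8 are hypothetical, and choosing UTF-8 is an assumption about the desired file encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class Utf8Writer {
    // Encodes the text as UTF-8, writes it, and returns the number of bytes written.
    // The caller remains responsible for closing the stream.
    static int writeUtf8(OutputStream out, String text) throws IOException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes);
        return bytes.length;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory buffer stands in for the stream returned by fs.create(...).
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        int written = writeUtf8(buffer, "h\u00e9llo");
        System.out.println(written + " bytes written");
    }
}
```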


3. Listing a directory to get full file paths


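/**
  * Returns the full paths of all files directly under a directory (subdirectories
  * are not traversed) whose names contain the given pattern.
  * @param fs the file system to query
  * @param folderPath the directory to list
  * @param pattern text the file name must contain; it is compared with
  *                String.contains, not evaluated as a regular expression.
  *                null accepts every file
  * @return the paths of the matching files; empty if the directory does not exist
  * @throws IOException if the directory cannot be listed
  */
 public static List<Path> getFilesUnderFolder(FileSystem fs, Path folderPath, String pattern) throws IOException {
  List<Path> paths = new ArrayList<>();
  if (fs.exists(folderPath)) {
   for (FileStatus status : fs.listStatus(folderPath)) {
    // keep files only; isDirectory() replaces the deprecated isDir()
    if (!status.isDirectory()) {
     Path oneFilePath = status.getPath();
     if (pattern == null || oneFilePath.getName().contains(pattern)) {
      paths.add(oneFilePath);
     }
    }
   }
  }
  return paths;
 }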
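
The method above matches file names with String.contains. If regular-expression matching is wanted instead, the check could be swapped for java.util.regex. The following is a sketch under that assumption, operating on plain name strings so it runs without Hadoop; NameFilter and filterByRegex are hypothetical names, and requiring the whole name to match is a design choice, not something the original specifies.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class NameFilter {
    // Keeps the names that match the regex in full; a null regex keeps every name.
    static List<String> filterByRegex(List<String> names, String regex) {
        if (regex == null) {
            return new ArrayList<>(names);
        }
        Pattern compiled = Pattern.compile(regex); // compile once, reuse per name
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (compiled.matcher(name).matches()) {
                kept.add(name);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("part-00000", "part-00001", "_SUCCESS");
        System.out.println(filterByRegex(names, "part-\\d+")); // prints [part-00000, part-00001]
    }
}
```

Inside getFilesUnderFolder, the same test would apply to oneFilePath.getName(). Use find() instead of matches() if a partial match should count.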

4. Appending data (append)

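  /** Appends the local file appendFile to the HDFS file hdfsFile; returns true on success. */
  public static boolean appendRTData(String hdfsFile, String appendFile) {
    boolean flag = false;

    Configuration conf = new Configuration();
    try {
      FileSystem fs = FileSystem.get(URI.create(hdfsFile), conf);
      // try-with-resources closes both streams even if append() or the copy fails
      try (InputStream in = new BufferedInputStream(new FileInputStream(appendFile));
           OutputStream out = fs.append(new Path(hdfsFile))) {
        // org.apache.hadoop.io.IOUtils; false = do not close here, the try block does it
        IOUtils.copyBytes(in, out, 4096, false);
      }
      flag = true; // the original never set this, so it always returned false
    } catch (IOException e) {
      e.printStackTrace();
    }

    return flag;
  }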
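
The IOUtils.copyBytes call above moves the local file into the HDFS output stream through a 4096-byte buffer. The loop below is a standard-library sketch of that kind of buffered copy. It is not Hadoop's actual implementation, and the names StreamCopy and copy are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {
    // Copies everything from in to out through a fixed-size buffer and returns
    // the number of bytes copied. Neither stream is closed; the caller owns them.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // -1 signals end of stream
            out.write(buffer, 0, n);          // write only the bytes actually read
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink, 4096);
        System.out.println(copied + " bytes copied");
    }
}
```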


***********************************************************************************************************************************************


Exceptions

1. Exception in thread "main" java.lang.IllegalArgumentException: java.net.UnknownHostException: ns6

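The cause is that the HDFS configuration was not loaded, so the client treats ns6 as a host name and fails to resolve it. Add the following code:

conf.addResource(new Path("/xxxx/hdfs-site.xml")); // path to the configuration file
If an environment variable is set, the same code works on different machines: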

conf.addResource(new Path(System.getenv("HADOOP_CONF") + "/hdfs-site.xml"));
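
System.getenv returns null when the variable is not set, in which case the concatenation above yields the path "null/hdfs-site.xml". A guard along the following lines fails early instead. This is a sketch; ConfLocator and resolve are hypothetical names, and HADOOP_CONF is simply the variable name used above.

```java
class ConfLocator {
    // Joins a configuration directory and a file name with a single "/".
    // Throws if the directory is missing, instead of silently building "null/...".
    static String resolve(String confDir, String fileName) {
        if (confDir == null || confDir.isEmpty()) {
            throw new IllegalStateException("configuration directory is not set");
        }
        String dir = confDir.endsWith("/")
                ? confDir.substring(0, confDir.length() - 1)
                : confDir;
        return dir + "/" + fileName;
    }

    public static void main(String[] args) {
        try {
            System.out.println(resolve(System.getenv("HADOOP_CONF"), "hdfs-site.xml"));
        } catch (IllegalStateException e) {
            System.out.println("HADOOP_CONF is not set");
        }
    }
}
```

The result would then be passed on as conf.addResource(new Path(resolved)).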


