Hadoop Study Notes: Reading from HDFS


Downloading a file through the Java API

    // Download a file. FileSystem is an abstract class; for an hdfs:// URI the
    // instance returned here is actually a DistributedFileSystem.
    FileSystem fs = FileSystem.get(new URI("hdfs://itcast01:9000"), new Configuration());
    // Returns the FileSystem for this URI's scheme and authority.

    // Get an input stream for the file with open()
    InputStream in = fs.open(new Path("/jdk1.7"));

    OutputStream out = new FileOutputStream("E://jdk1.7");

    // Copy the byte stream with a 4096-byte buffer; true closes the streams once the copy finishes
    IOUtils.copyBytes(in, out, 4096, true);
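To make the last call less of a black box, here is a minimal JDK-only sketch of what a buffered copy with a "close when done" flag does. The class `StreamCopySketch` and its `copyBytes` method are hypothetical stand-ins written for illustration; they are not Hadoop's implementation, and unlike `IOUtils.copyBytes` this version returns the number of bytes copied.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration only; not the Hadoop IOUtils source.
final class StreamCopySketch {

    // Copies everything from in to out through a buffer of buffSize bytes.
    // If close is true, both streams are closed even when the copy fails.
    static long copyBytes(InputStream in, OutputStream out, int buffSize, boolean close)
            throws IOException {
        long total = 0;
        try {
            byte[] buf = new byte[buffSize];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
            out.flush();
        } finally {
            if (close) {
                try {
                    out.close();
                } finally {
                    in.close();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello hdfs".getBytes("UTF-8");
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copyBytes(new ByteArrayInputStream(data), sink, 4096, true);
        System.out.println(copied + " bytes copied");
    }
}
```

Because the flag is `true`, the caller does not need to close `in` or `out` afterwards, which is why the download snippet above has no explicit `close()` calls.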

Uploading a file

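    // A user must be specified: by default only root can write to HDFS here (this can be changed)
    FileSystem fs = FileSystem.get(new URI("hdfs://itcast01:9000"), new Configuration(), "root");

    // Open the file on the local file system as an input stream
    InputStream in = new FileInputStream("E://alive.mp4");

    // Create a file on HDFS and get its output stream
    OutputStream out = fs.create(new Path("/alive"));

    // Copy the input stream to the output stream
    IOUtils.copyBytes(in, out, 4096, true);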
	

Other operations

    // Delete a file
    boolean flag1 = fs.delete(new Path("/alive"), false); // false: non-recursive; pass true to delete recursively

    // Create a directory
    boolean flag2 = fs.mkdirs(new Path("/home"));

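What I really want to discuss here are the two convenience methods for uploading and downloading.

    // Upload a file
    fs.copyFromLocalFile(new Path("E://alive.mp4"), new Path("/al"));

    // Download a file
    fs.copyToLocalFile(new Path("/jdk1.7"), new Path("f://jkd"));

The two calls look symmetric, so why does the download throw a NullPointerException?
After reading the source, I found that copyFromLocalFile delegates to the following overloads by default: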
 /**
   * The src file is on the local disk.  Add it to FS at
   * the given dst name and the source is kept intact afterwards
   * @param src path
   * @param dst path
   */
  public void copyFromLocalFile(Path src, Path dst)
    throws IOException {
    copyFromLocalFile(false, src, dst);
  }
  
  
  /**
   * The src file is on the local disk.  Add it to FS at
   * the given dst name.
   * delSrc indicates if the source should be removed
   * @param delSrc whether to delete the src
   * @param src path
   * @param dst path
   */
  public void copyFromLocalFile(boolean delSrc, Path src, Path dst)
    throws IOException {
    copyFromLocalFile(delSrc, true, src, dst);
  }
  
  
   /**
   * The src file is on the local disk.  Add it to FS at
   * the given dst name.
   * delSrc indicates if the source should be removed
   * @param delSrc whether to delete the src
   * @param overwrite whether to overwrite an existing file
   * @param src path
   * @param dst path
   */
  public void copyFromLocalFile(boolean delSrc, boolean overwrite, 
                                Path src, Path dst)
    throws IOException {
    Configuration conf = getConf();
    FileUtil.copy(getLocal(conf), src, this, dst, delSrc, overwrite, conf);
  }
whereas copyToLocalFile delegates to these:
/**
   * The src file is under FS, and the dst is on the local disk.
   * Copy it from FS control to the local dst name.
   * @param src path
   * @param dst path
   */
  public void copyToLocalFile(Path src, Path dst) throws IOException {
    copyToLocalFile(false, src, dst);
  }
  
  /**
   * The src file is under FS, and the dst is on the local disk.
   * Copy it from FS control to the local dst name.
   * delSrc indicates if the src will be removed or not.
   * @param delSrc whether to delete the src
   * @param src path
   * @param dst path
   */   
  public void copyToLocalFile(boolean delSrc, Path src, Path dst)
    throws IOException {
    copyToLocalFile(delSrc, src, dst, false);
  }
  /**
   * The src file is under FS, and the dst is on the local disk. Copy it from FS
   * control to the local dst name. delSrc indicates if the src will be removed
   * or not. useRawLocalFileSystem indicates whether to use RawLocalFileSystem
   * as local file system or not. RawLocalFileSystem is non crc file system.So,
   * It will not create any crc files at local.
   * 
   * @param delSrc    whether to delete the src
   * @param src       path
   * @param dst       path
   * @param useRawLocalFileSystem
   *          whether to use RawLocalFileSystem as local file system or not.

   */
  public void copyToLocalFile(boolean delSrc, Path src, Path dst,
      boolean useRawLocalFileSystem) throws IOException {
    Configuration conf = getConf();
    FileSystem local = null;
    if (useRawLocalFileSystem) {
      local = getLocal(conf).getRawFileSystem();
    } else {
      local = getLocal(conf);
    }
    FileUtil.copy(this, src, local, dst, delSrc, conf);
  }
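In both families, delSrc says whether the source file should be deleted after the copy.
The only difference is the extra boolean: for upload it is overwrite, which asks whether an existing file should be overwritten, and the shorter overloads pass true.
For download it is useRawLocalFileSystem, which chooses whether RawLocalFileSystem is used as the local file system, and the shorter overloads pass false.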
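The delegation chains quoted above can be mimicked in a small JDK-only sketch that just records which flag values reach the fully specified overload. The class `OverloadDefaultsSketch`, the `lastCall` field, and the use of `String` instead of `Path` are all hypothetical simplifications for illustration; no file is copied.

```java
// Hypothetical illustration of the overload chains; this is not Hadoop code.
final class OverloadDefaultsSketch {

    // Remembers the flags that the fully specified overload received.
    static String lastCall = "";

    // Upload chain: (src, dst) -> delSrc = false -> overwrite = true
    static void copyFromLocalFile(String src, String dst) {
        copyFromLocalFile(false, src, dst);
    }

    static void copyFromLocalFile(boolean delSrc, String src, String dst) {
        copyFromLocalFile(delSrc, true, src, dst);
    }

    static void copyFromLocalFile(boolean delSrc, boolean overwrite, String src, String dst) {
        lastCall = "upload delSrc=" + delSrc + " overwrite=" + overwrite;
    }

    // Download chain: (src, dst) -> delSrc = false -> useRawLocalFileSystem = false
    static void copyToLocalFile(String src, String dst) {
        copyToLocalFile(false, src, dst);
    }

    static void copyToLocalFile(boolean delSrc, String src, String dst) {
        copyToLocalFile(delSrc, src, dst, false);
    }

    static void copyToLocalFile(boolean delSrc, String src, String dst,
                                boolean useRawLocalFileSystem) {
        lastCall = "download delSrc=" + delSrc
                + " useRawLocalFileSystem=" + useRawLocalFileSystem;
    }

    public static void main(String[] args) {
        copyFromLocalFile("E://alive.mp4", "/al");
        System.out.println(lastCall);
        copyToLocalFile("/jdk1.7", "f://jkd");
        System.out.println(lastCall);
    }
}
```

The point of the sketch is only that the two-argument calls are not symmetric: the upload ends up with overwrite set to true, while the download ends up with useRawLocalFileSystem set to false.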
If the download is written as follows, it runs without the exception:
fs.copyToLocalFile(false,new Path("/jdk1.7"), new Path("f://jkd"),true);
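So the problem lies in useRawLocalFileSystem.
The if/else branch in the download source above shows that when useRawLocalFileSystem is true, getLocal(conf).getRawFileSystem() is called; otherwise getLocal(conf) is used directly.
The API documentation says getRawFileSystem() returns the raw local file system.
This blog post, http://blog.csdn.net/shirdrn/article/details/4574402,
describes the RawLocalFileSystem class as a local file system and explains it in some detail.
My guess is that a download might be able to target several machines, not necessarily the local one,
so when downloading to the local machine, useRawLocalFileSystem has to be set to true
to get a local file system back; with the default of false, a NullPointerException is thrown.
This is only a guess: per the Javadoc quoted above, both branches return a local file system, and the documented difference is just that RawLocalFileSystem creates no crc files locally.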
