Hadoop Distributed File System (HDFS) Java Interface (HDFS Java API): Detailed Guide

Better a lifetime of honest plainness than cleverness misused;
better a whole day of reading than friends made carelessly.

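Related links

HDFS background

  • Hadoop Distributed File System (HDFS) quick start
  • Hadoop Distributed File System (HDFS) knowledge summary (very detailed)

Connecting to a Hadoop cluster

  • Connecting Eclipse to a Hadoop cluster
  • Connecting IntelliJ IDEA to a Hadoop cluster

WordCount program example

Writing a WordCount program with the Java API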

HDFS Java API

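Code download

Download MyHadoop.java (extraction code: z458)

Details

Note: the operations below can only be carried out after Eclipse or IntelliJ IDEA has successfully connected to the Hadoop cluster.

  • The test class is named MyHadoop. It has a field fs of type FileSystem and a field conf of type Configuration.
  • An HDFSUtil() method must be defined to initialize conf and fs.
  • System.setProperty("HADOOP_USER_NAME", "root"); must be added to the main function to avoid the error org.apache.hadoop.security.AccessControlException: Permission denied: user=...
  • The method snippets in the following sections also need imports for Path, RemoteIterator, LocatedFileStatus, ContentSummary and FSDataInputStream (all in org.apache.hadoop.fs), org.apache.hadoop.fs.permission.FsPermission, org.apache.hadoop.io.IOUtils, java.util.List and java.util.ArrayList, and the java.io classes they use.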
package neu.software;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

import java.io.IOException;

public class MyHadoop {
    private FileSystem fs;
    private Configuration conf;

    public static void main(String[] args) throws IOException {
        // Act as the HDFS user "root" to avoid AccessControlException
        System.setProperty("HADOOP_USER_NAME", "root");
        MyHadoop myHadoop = new MyHadoop();
        myHadoop.HDFSUtil();
    }

    // Loads the Hadoop configuration found on the classpath and opens the file system
    public void HDFSUtil() throws IOException {
        conf = new Configuration();
        fs = FileSystem.get(conf);
    }
}
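
Why does setting HADOOP_USER_NAME help? As far as I understand it, with simple (non-Kerberos) authentication the Hadoop client looks for that name first in the environment variables, then in the Java system properties, and only falls back to the operating system login name when neither is set. The sketch below models that lookup order in plain Java. It is not Hadoop's own code, and the class name EffectiveUserSketch and the method resolve are made up for illustration.

```java
import java.util.Map;
import java.util.Properties;

public class EffectiveUserSketch {
    static final String KEY = "HADOOP_USER_NAME";

    // Simplified model (an assumption, not Hadoop source):
    // environment variable first, then system property, then the OS user.
    public static String resolve(Map<String, String> env, Properties props, String osUser) {
        String user = env.get(KEY);
        if (user == null) {
            user = props.getProperty(KEY);
        }
        if (user == null) {
            user = osUser;
        }
        return user;
    }

    public static void main(String[] args) {
        // Same call as in MyHadoop.main
        System.setProperty(KEY, "root");
        String user = resolve(System.getenv(), System.getProperties(),
                System.getProperty("user.name"));
        System.out.println("HDFS requests would be made as: " + user);
    }
}
```

If this model is right, an environment variable with the same name would take precedence over the system property, which is worth checking when the permission error persists.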

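1. Create the directory /data/test in HDFS

Method definition

public boolean mkdir(String path) throws IOException {
    Path srcPath = new Path(path);
    // mkdirs also creates missing parent directories, like "mkdir -p"
    return fs.mkdirs(srcPath);
}

Method test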

public static void main(String args[]) throws IOException {
	System.setProperty("HADOOP_USER_NAME","root");
	MyHadoop myHadoop = new MyHadoop();
	myHadoop.HDFSUtil();
	myHadoop.mkdir("/data/test");
}

Result verification (XShell command window)

[root@master ~]# hadoop fs -ls /data
Found 1 items
drwxr-xr-x   - root supergroup          0 2019-10-15 11:14 /data/test

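2. Upload the local folder mytest to the HDFS directory /data/test through the Java API

Method definition

public void put(String src, String dst, boolean delSrc, boolean overwrite) throws IOException {
    Path srcPath = new Path(src);
    Path dstPath = new Path(dst);
    // Copy from the local file system to HDFS.
    // delSrc: true deletes the local source afterwards; overwrite: true replaces an existing destination.
    fs.copyFromLocalFile(delSrc, overwrite, srcPath, dstPath);
}

Method test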

public static void main(String args[]) throws IOException {
	System.setProperty("HADOOP_USER_NAME","root");
	MyHadoop myHadoop = new MyHadoop();
	myHadoop.HDFSUtil();
	//myHadoop.mkdir("/data/test"); // already created in step 1, no need to run it again
	myHadoop.put("C:\\Users\\Lenovo\\Desktop\\localfile\\mytest", "/data/test/", false, true);
}

Note: the mytest folder contains the following 4 files

  • data1.txt

Hello World

  • data2.txt

Hello Hadoop

  • data3.txt

Hello Java

  • data4.txt

Hello HDFS

Result verification (XShell command window)

[root@master ~]# hadoop fs -ls /data/test/mytest
Found 4 items
-rw-r--r--   3 root supergroup         13 2019-10-15 11:30 /data/test/mytest/data1.txt
-rw-r--r--   3 root supergroup         14 2019-10-15 11:30 /data/test/mytest/data2.txt
-rw-r--r--   3 root supergroup         12 2019-10-15 11:30 /data/test/mytest/data3.txt
-rw-r--r--   3 root supergroup         12 2019-10-15 11:30 /data/test/mytest/data4.txt
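
The sizes in this listing (13, 14, 12 and 12 bytes) are each two bytes larger than the visible text of the files. One plausible explanation, and it is only an assumption because the post does not say so, is that the files were created on Windows and each line ends with CR LF. The sketch below checks that arithmetic in plain Java; the class name FileSizeCheck is made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class FileSizeCheck {
    // Byte length of one line of text followed by a Windows line ending (CR LF).
    public static int sizeWithCrlf(String line) {
        return (line + "\r\n").getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] lines = {"Hello World", "Hello Hadoop", "Hello Java", "Hello HDFS"};
        int total = 0;
        for (String line : lines) {
            int size = sizeWithCrlf(line);
            total += size;
            System.out.println(line + " -> " + size + " bytes");
        }
        System.out.println("total = " + total + " bytes");
    }
}
```

Under that assumption the four sizes come out as 13, 14, 12 and 12, for a total of 51 bytes, which agrees with the content size reported in step 4.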

3. List the files under the /data/test/mytest directory

Method definition

public List<String> ls (String filePath, String ext) throws IOException {
        List<String> listDir = new ArrayList<String>();
        Path path = new Path(filePath);
        RemoteIterator<LocatedFileStatus> it = fs.listFiles(path, true);
        while(it.hasNext()) {
            String name = it.next().getPath().toString();
            if(name.endsWith(ext)) {
                listDir.add(name);
            }
        }
        return listDir;
    }

Method test

public static void main(String args[]) throws IOException {
	System.setProperty("HADOOP_USER_NAME","root");
	MyHadoop myHadoop = new MyHadoop();
	myHadoop.HDFSUtil();
	//myHadoop.mkdir("/data/test"); // already created in step 1, no need to run it again
	//myHadoop.put("C:\\Users\\Lenovo\\Desktop\\localfile\\mytest", "/data/test/", false, true);
	List<String> list = myHadoop.ls("/data/test/mytest","");
	for(String file: list){
		System.out.println(file);
	}
}

Result verification (IDE standard output window)


hdfs://master:9000/data/test/mytest/data1.txt
hdfs://master:9000/data/test/mytest/data2.txt
hdfs://master:9000/data/test/mytest/data3.txt
hdfs://master:9000/data/test/mytest/data4.txt
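
Two details of this output are worth a small illustration. The ls method filters with String.endsWith, so an empty extension matches every file, and each entry is a full URI that includes the scheme and the NameNode address. The sketch below reproduces both points with the standard library only; the class HdfsPathStrip and its two helper methods are hypothetical and not part of MyHadoop.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HdfsPathStrip {
    // Keeps the names that end with ext; "" keeps everything, as in ls(path, "").
    public static List<String> filterByExt(List<String> names, String ext) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (name.endsWith(ext)) {
                kept.add(name);
            }
        }
        return kept;
    }

    // Drops the scheme and authority, leaving only the path inside HDFS.
    public static String pathOnly(String hdfsUri) {
        return URI.create(hdfsUri).getPath();
    }

    public static void main(String[] args) {
        List<String> listed = Arrays.asList(
                "hdfs://master:9000/data/test/mytest/data1.txt",
                "hdfs://master:9000/data/test/mytest/data2.txt");
        for (String uri : filterByExt(listed, "")) {
            System.out.println(pathOnly(uri));
        }
    }
}
```

Passing ".txt" instead of "" would keep only names with that suffix, which is how the ext parameter of ls is meant to be used.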

4. Count the files and the space used under /data/test/mytest

Method definition

public String count(String filePath) throws IOException {
	Path path = new Path(filePath);
	ContentSummary contentSummary = fs.getContentSummary(path);
	return contentSummary.toString();
}

Method test

public static void main(String args[]) throws IOException {
	System.setProperty("HADOOP_USER_NAME","root");
	MyHadoop myHadoop = new MyHadoop();
	myHadoop.HDFSUtil();
	//myHadoop.mkdir("/data/test"); // already created in step 1, no need to run it again
	//myHadoop.put("C:\\Users\\Lenovo\\Desktop\\localfile\\mytest", "/data/test/", false, true);
	//List list = myHadoop.ls("/data/test/mytest","");
	//for(String file: list){
	//System.out.println(file);
	//}
	System.out.println(myHadoop.count("/data/test/mytest"));
    }

Result verification (IDE standard output window)


DEBUG - Call: getContentSummary took 163ms
none inf none inf 1 4 51
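
The last line is the raw text from ContentSummary.toString(). To my knowledge, on Hadoop 2.x its columns are: name quota, remaining name quota, space quota, remaining space quota, directory count, file count and content size in bytes, with "none" and "inf" shown when no quota is set. The sketch below splits such a line under that assumption; the class ContentSummaryLine is made up for illustration. In real code the getters getDirectoryCount(), getFileCount() and getLength() on ContentSummary are a better choice than parsing text.

```java
public class ContentSummaryLine {
    // Returns {directoryCount, fileCount, contentSizeBytes} from a summary line.
    // Assumes the seven-column layout described above.
    public static long[] parseCounts(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length < 7) {
            throw new IllegalArgumentException("expected 7 columns but got " + cols.length);
        }
        return new long[] {
            Long.parseLong(cols[4]),
            Long.parseLong(cols[5]),
            Long.parseLong(cols[6])
        };
    }

    public static void main(String[] args) {
        long[] counts = parseCounts("none inf none inf 1 4 51");
        System.out.println("directories = " + counts[0]);
        System.out.println("files       = " + counts[1]);
        System.out.println("bytes       = " + counts[2]);
    }
}
```

Read this way, the output says 1 directory, 4 files and 51 bytes, which matches the four uploaded files.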

5. Recursively change the owner of the files under /data/test/mytest to admin

Method definition

public void chown(String filePath, String username, String groupname) throws IOException {
	Path path = new Path(filePath);
	RemoteIterator<LocatedFileStatus> it = fs.listFiles(path, true);
	while(it.hasNext()) {
		fs.setOwner(it.next().getPath(), username, groupname);
	}
}

Method test

public static void main(String args[]) throws IOException {
	System.setProperty("HADOOP_USER_NAME","root");
	MyHadoop myHadoop = new MyHadoop();
	myHadoop.HDFSUtil();
	//myHadoop.mkdir("/data/test"); // already created in step 1, no need to run it again
	//myHadoop.put("C:\\Users\\Lenovo\\Desktop\\localfile\\mytest", "/data/test/", false, true);
	//List list = myHadoop.ls("/data/test/mytest","");
	//for(String file: list){
	//System.out.println(file);
	//}
	//System.out.println(myHadoop.count("/data/test/mytest"));
	myHadoop.chown("/data/test/mytest/", "admin", "supergroup");
}

Result verification (XShell command window)

[root@master ~]# hadoop fs -ls /data/test/mytest
Found 4 items
-rw-r--r--   3 admin supergroup         13 2019-10-15 11:30 /data/test/mytest/data1.txt
-rw-r--r--   3 admin supergroup         14 2019-10-15 11:30 /data/test/mytest/data2.txt
-rw-r--r--   3 admin supergroup         12 2019-10-15 11:30 /data/test/mytest/data3.txt
-rw-r--r--   3 admin supergroup         12 2019-10-15 11:30 /data/test/mytest/data4.txt

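6. Recursively change the permissions of the files under /data/test/mytest so that the owner can read, write and execute, while the group and other users can only read

Method definition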

public void chmod(Path src, String mode) throws IOException {
	FsPermission fp = new FsPermission(mode);
	RemoteIterator<LocatedFileStatus> it = fs.listFiles(src, true);
	while(it.hasNext()) {
		fs.setPermission(it.next().getPath(), fp);
	}
}

Method test

public static void main(String args[]) throws IOException {
	System.setProperty("HADOOP_USER_NAME","root");
	MyHadoop myHadoop = new MyHadoop();
	myHadoop.HDFSUtil();
	//myHadoop.mkdir("/data/test"); // already created in step 1, no need to run it again
	//myHadoop.put("C:\\Users\\Lenovo\\Desktop\\localfile\\mytest", "/data/test/", false, true);
	//List list = myHadoop.ls("/data/test/mytest","");
	//for(String file: list){
		//System.out.println(file);
	//}
	//System.out.println(myHadoop.count("/data/test/mytest"));
	//myHadoop.chown("/data/test/mytest/", "admin", "supergroup");
	Path path = new Path("/data/test/mytest/");
	myHadoop.chmod(path, "744");
}

Result verification (XShell command window)

[root@master ~]# hadoop fs -ls /data/test/mytest
Found 4 items
-rwxr--r--   3 admin supergroup         13 2019-10-15 11:30 /data/test/mytest/data1.txt
-rwxr--r--   3 admin supergroup         14 2019-10-15 11:30 /data/test/mytest/data2.txt
-rwxr--r--   3 admin supergroup         12 2019-10-15 11:30 /data/test/mytest/data3.txt
-rwxr--r--   3 admin supergroup         12 2019-10-15 11:30 /data/test/mytest/data4.txt
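
The mode string "744" and the rwxr--r-- shown in the listing are two spellings of the same permission bits: each octal digit holds the read (4), write (2) and execute (1) bits for the owner, the group and others in turn. The following plain-Java sketch performs that conversion; the class ModeToSymbolic is hypothetical and is not part of the Hadoop API.

```java
public class ModeToSymbolic {
    // Converts a three-digit octal mode such as "744" to its rwx form.
    public static String toSymbolic(String octal) {
        if (!octal.matches("[0-7]{3}")) {
            throw new IllegalArgumentException("expected three octal digits: " + octal);
        }
        StringBuilder sb = new StringBuilder();
        for (char c : octal.toCharArray()) {
            int bits = c - '0';
            sb.append((bits & 4) != 0 ? 'r' : '-');
            sb.append((bits & 2) != 0 ? 'w' : '-');
            sb.append((bits & 1) != 0 ? 'x' : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("744 -> " + toSymbolic("744")); // prints "744 -> rwxr--r--"
        System.out.println("644 -> " + toSymbolic("644")); // prints "644 -> rw-r--r--"
    }
}
```

The second line of output is the permission the files had before chmod was called, as seen in the listings of steps 2 and 5.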

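7. Create an empty file empty.txt in the /data/test/mytest directory

Method definition

public void touchz(String filePath, String fileName) throws IOException {
	Path path = new Path(filePath, fileName);
	// create() returns an open output stream; close it right away so the empty file is finalized
	fs.create(path).close();
}

Method test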

public static void main(String args[]) throws IOException {
	System.setProperty("HADOOP_USER_NAME","root");
	MyHadoop myHadoop = new MyHadoop();
	myHadoop.HDFSUtil();
	//myHadoop.mkdir("/data/test"); // already created in step 1, no need to run it again
	//myHadoop.put("C:\\Users\\Lenovo\\Desktop\\localfile\\mytest", "/data/test/", false, true);
	//List list = myHadoop.ls("/data/test/mytest","");
	//for(String file: list){
		//System.out.println(file);
	//}
	//System.out.println(myHadoop.count("/data/test/mytest"));
	//myHadoop.chown("/data/test/mytest/", "admin", "supergroup");
	//Path path = new Path("/data/test/mytest/");
	//myHadoop.chmod(path, "744");
	myHadoop.touchz("/data/test/mytest/", "empty.txt");
}

Result verification (XShell command window)

[root@master ~]# hadoop fs -ls /data/test/mytest
Found 5 items
-rwxr--r--   3 admin supergroup         13 2019-10-15 11:30 /data/test/mytest/data1.txt
-rwxr--r--   3 admin supergroup         14 2019-10-15 11:30 /data/test/mytest/data2.txt
-rwxr--r--   3 admin supergroup         12 2019-10-15 11:30 /data/test/mytest/data3.txt
-rwxr--r--   3 admin supergroup         12 2019-10-15 11:30 /data/test/mytest/data4.txt
-rw-r--r--   3 root  supergroup          0 2019-10-15 12:05 /data/test/mytest/empty.txt

8. Append the contents of another file to /data/test/mytest/empty.txt

Method definition

public boolean appendToFile(InputStream in, String filePath) throws IOException {
	Path path = new Path(filePath);
	// Create the target file first if it does not exist yet
	if (!check(filePath)) {
		fs.createNewFile(path);
	}
	OutputStream out = fs.append(path);
	// The final 'true' makes copyBytes close both streams when it finishes
	IOUtils.copyBytes(in, out, 4096, true);
	// fs is deliberately left open so that later calls on this object still work
	return true;
}
private boolean check(String filePath) throws IOException {
	Path path = new Path(filePath);
	boolean isExists = fs.exists(path);
	return isExists;
}

Method test

public static void main(String args[]) throws IOException {
	System.setProperty("HADOOP_USER_NAME","root");
	MyHadoop myHadoop = new MyHadoop();
	myHadoop.HDFSUtil();
	//myHadoop.mkdir("/data/test"); // already created in step 1, no need to run it again
	//myHadoop.put("C:\\Users\\Lenovo\\Desktop\\localfile\\mytest", "/data/test/", false, true);
	//List list = myHadoop.ls("/data/test/mytest","");
	//for(String file: list){
		//System.out.println(file);
	//}
	//System.out.println(myHadoop.count("/data/test/mytest"));
	//myHadoop.chown("/data/test/mytest/", "admin", "supergroup");
	//Path path = new Path("/data/test/mytest/");
	//myHadoop.chmod(path, "744");
	//myHadoop.touchz("/data/test/mytest/", "empty.txt");
	File file =new File("C:\\Users\\Lenovo\\Desktop\\localfile\\mytest\\data1.txt");
	FileInputStream fileInputStream = new FileInputStream(file);
	myHadoop.appendToFile(fileInputStream, "/data/test/mytest/empty.txt");
	fileInputStream.close();
}

Result verification (XShell command window)

[root@master ~]# hadoop fs -cat /data/test/mytest/empty.txt
Hello World
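
The append itself is done by IOUtils.copyBytes(in, out, bufferSize, close), which reads the input in chunks of at most bufferSize bytes and writes each chunk to the output. The sketch below is a plain-Java stand-in for that loop, written to show what the buffer size and the close flag mean. It is not Hadoop's implementation: the class CopyBytesSketch is made up, and unlike the real method it returns the number of bytes copied.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class CopyBytesSketch {
    // Copies everything from in to out through a buffer of bufSize bytes.
    // When close is true, both streams are closed at the end.
    public static long copy(InputStream in, OutputStream out, int bufSize, boolean close)
            throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                total += n;
            }
        } finally {
            if (close) {
                in.close();
                out.close();
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "Hello World\r\n".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // A deliberately tiny buffer, so the loop runs several times
        long copied = copy(new ByteArrayInputStream(data), out, 4, true);
        System.out.println("copied " + copied + " bytes");
    }
}
```

A small buffer only means more loop iterations, not a different result. When the close flag is true, the caller does not need to close the streams again.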

9. View the contents of /data/test/mytest/empty.txt

Method definition

public void cat(String filePath) throws IOException {
	Path path = new Path(filePath);
	// fs.open throws FileNotFoundException if the file does not exist
	try (FSDataInputStream in = fs.open(path)) {
		// 'false' keeps System.out open after the copy
		IOUtils.copyBytes(in, System.out, 4096, false);
	}
}

Method test

public static void main(String args[]) throws IOException {
	System.setProperty("HADOOP_USER_NAME","root");
	MyHadoop myHadoop = new MyHadoop();
	myHadoop.HDFSUtil();
	//myHadoop.mkdir("/data/test"); // already created in step 1, no need to run it again
	//myHadoop.put("C:\\Users\\Lenovo\\Desktop\\localfile\\mytest", "/data/test/", false, true);
	//List list = myHadoop.ls("/data/test/mytest","");
	//for(String file: list){
		//System.out.println(file);
	//}
	//System.out.println(myHadoop.count("/data/test/mytest"));
	//myHadoop.chown("/data/test/mytest/", "admin", "supergroup");
	//Path path = new Path("/data/test/mytest/");
	//myHadoop.chmod(path, "744");
	//myHadoop.touchz("/data/test/mytest/", "empty.txt");
	//File file =new File("C:\\Users\\Lenovo\\Desktop\\localfile\\mytest\\data1.txt");
	//FileInputStream fileInputStream = new FileInputStream(file);
	//myHadoop.appendToFile(fileInputStream, "/data/test/mytest/empty.txt");
	//fileInputStream.close();
	myHadoop.cat("/data/test/mytest/empty.txt");
}

Result verification (IDE standard output window)

DEBUG - SASL client skipping handshake in unsecured configuration for addr = /172.16.29.95, datanodeId = DatanodeInfoWithStorage[172.16.29.95:50010,DS-c67f1790-f7ea-4a0c-b564-f91a70d347e4,DISK]
Hello World

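If you have any questions, leave a comment below or send me a private message and I will reply as soon as I can.
Experts and beginners alike are welcome to offer pointers and exchange ideas!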