Accessing Hadoop HDFS remotely with a Java client

0. Prerequisite

A hadoop-2.8.0 server is already running on a remote Ubuntu host:

$ netstat -lnpt | grep -i TCP | grep `jps | grep -w NameNode | awk '{print $1}'` | grep "LISTEN"
tcp        0      0 192.168.55.250:8020     0.0.0.0:*               LISTEN      319922/java         
tcp        0      0 192.168.55.250:50070    0.0.0.0:*               LISTEN      319922/java 

Note:
8020 is the port of the Hadoop default file system URI; the URI used by the client should be "hdfs://192.168.55.250:8020"
50070 is the port the DFS NameNode web UI listens on

1. Install hadoop-2.8.0 client

1) download the hadoop-client binaries from https://jar-download.com/artifacts/org.apache.hadoop/hadoop-client?p=4
2) extract the archive
$ cd ~/learn/java/java8/
$ tar xvf hadoop-client-2.8.0.tar -C lib/
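
With the libraries in place, a quick connectivity probe can confirm that the fs.defaultFS URI from the note above is reachable. This is only a sketch and not part of the original walkthrough; HdfsPing is just an illustrative name.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HdfsPing {
    public static void main(String[] args) throws java.io.IOException {
        Configuration conf = new Configuration();
        // point the client at the NameNode RPC port noted above
        conf.set("fs.defaultFS", "hdfs://192.168.55.250:8020");
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Connected to " + fs.getUri());
        fs.close();
    }
}

It compiles and runs with the same classpath used for Hdfs.java in the next section.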

2. Write and test the Java Hdfs client

1) write the Java code
$ cat Hdfs.java

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.*;

public class Hdfs {
    FileSystem fileSystem;

    public Hdfs(String host, int port) throws IOException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", String.format("hdfs://%s:%d", host, port));
        fileSystem = FileSystem.get(conf);
    }

    public void close() throws IOException {
        fileSystem.close();
    }

    public void createFile(String filePath, String text) throws IOException {
        java.nio.file.Path path = java.nio.file.Paths.get(filePath);
        Path dir = new Path(path.getParent().toString());
        fileSystem.mkdirs(dir);    // create parent directories if they do not exist
        OutputStream os = fileSystem.create(new Path(dir, path.getFileName().toString()));
        BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os));
        writer.write(text);
        writer.close();            // also closes the underlying HDFS output stream
    }

    public String readFile(String filePath) throws IOException {
        // read at most 256 bytes, which is enough for this small test file
        try (InputStream in = fileSystem.open(new Path(filePath))) {
            byte[] buffer = new byte[256];
            int bytesRead = in.read(buffer);
            return bytesRead > 0 ? new String(buffer, 0, bytesRead) : "";
        }
    }

    public boolean delFile(String filePath) throws IOException {
        return fileSystem.delete(new Path(filePath), false);    // false: do not delete recursively
    }

    public static void main(String[] args) throws IOException {
        Hdfs fs = new Hdfs("192.168.55.250", 8020);
        fs.createFile("/tmp/output/hello.txt", "Hello Hadoop");
        System.out.println(fs.readFile("/tmp/output/hello.txt"));
        // myfile.txt was never created, so delete() returns false
        System.out.println(fs.delFile("/tmp/output/myfile.txt"));
        fs.close();
    }
}
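
A handy extension, not part of the original class, is listing a directory with the same FileSystem API; a sketch of a method that could be added inside Hdfs (listDir is just an illustrative name):

    // hypothetical addition: print the entries under an HDFS directory
    public void listDir(String dirPath) throws IOException {
        for (org.apache.hadoop.fs.FileStatus status : fileSystem.listStatus(new Path(dirPath))) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
    }

Calling it on /tmp/output after createFile would print the path and size of hello.txt.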

2) compile the class

$ javac -cp "lib/hadoop-client-2.8.0/*" Hdfs.java

3) run the test

$ java -cp "lib/hadoop-client-2.8.0/*:." Hdfs
...
Exception in thread "main" org.apache.hadoop.security.AccessControlException: Permission denied: user=sun_xo, access=WRITE, inode="/tmp":sunxo:supergroup:drwxr-xr-x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:310)
...

4) The error occurs because the user on the client (sun_xo) differs from the HDFS directory owner on the server (sunxo); the easiest fix is to change the directory permissions on the server side

$ hdfs dfs -chmod 777 /tmp
$ hdfs dfs -ls /
drwxrwxrwx   - sunxo supergroup          0 2023-05-12 09:11 /tmp
drwxr-xr-x   - sunxo supergroup          0 2022-06-14 10:44 /user
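
An alternative that avoids opening up /tmp (not used in this walkthrough) is to connect as the server-side user. FileSystem.get has an overload that also takes a user name; the sketch below adds a hypothetical constructor variant to the Hdfs class and assumes the cluster runs with simple authentication (no Kerberos):

    // hypothetical constructor variant: act as the given remote user (simple authentication only)
    public Hdfs(String host, int port, String user) throws IOException, InterruptedException {
        Configuration conf = new Configuration();
        fileSystem = FileSystem.get(
                java.net.URI.create(String.format("hdfs://%s:%d", host, port)), conf, user);
    }

With new Hdfs("192.168.55.250", 8020, "sunxo") the write would go through without relaxing the permissions on /tmp.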

5) rerun the test; it now works:

$ java -cp "lib/hadoop-client-2.8.0/*:." Hdfs
Hello Hadoop
false

6) check the result from the server side

$ hdfs dfs -ls /tmp/output
-rw-r--r--   3 sun_xo supergroup         12 2023-05-12 09:26 /tmp/output/hello.txt
$ hdfs dfs -cat /tmp/output/hello.txt
Hello Hadoop

The result can also be checked in the NameNode web UI at http://ubuntu:50070/explorer.html#/tmp/output
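
The same check can also be done from the client side, for example with FileSystem#getFileStatus; a sketch of another method that could be added to the Hdfs class (stat is just an illustrative name):

    // hypothetical addition: print permission, owner, size and path of an HDFS file
    public void stat(String filePath) throws IOException {
        org.apache.hadoop.fs.FileStatus status = fileSystem.getFileStatus(new Path(filePath));
        System.out.println(status.getPermission() + " " + status.getOwner()
                + " " + status.getLen() + " " + status.getPath());
    }

For /tmp/output/hello.txt this would report the same owner, permission, and 12-byte length shown by hdfs dfs -ls above.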
