Common HDFS Java APIs: Code (Part 1)

I. Preparing the Hadoop environment (the Hadoop version I installed is hadoop-1.2.1)


First, configure the HADOOP_CLASSPATH environment variable in hadoop-env.sh (my Hadoop is installed at /home/grid/hadoop-1.2.1):


# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.  Required.
export JAVA_HOME=/usr/jdk1.6.0_45

# Extra Java CLASSPATH elements.  Optional.
export HADOOP_CLASSPATH=/home/grid/hadoop-1.2.1/myclass


I created the myclass directory under the Hadoop installation directory.

 

Confirm that the HADOOP_CLASSPATH environment variable has taken effect:

[grid@h1 conf]$ source hadoop-env.sh
[grid@h1 conf]$ echo $HADOOP_CLASSPATH
/home/grid/hadoop-1.2.1/myclass


II. Writing and compiling the URLCat source code

1. What the code does

It calls Hadoop's Java API to read a file from HDFS. The Hadoop Java API documentation is available at http://hadoop.apache.org/docs/stable/api/index.html

2. The code

[grid@h1 myclass]$ cat URLCat.java

import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
import org.apache.hadoop.io.IOUtils;
import java.io.InputStream;
import java.net.URL;

public class URLCat {
    static {
        // Teach java.net.URL to understand the hdfs:// scheme (once per JVM).
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = null;                               // input stream to read from
        try {
            in = new URL(args[0]).openStream();              // open the URL given on the command line
            IOUtils.copyBytes(in, System.out, 4096, false);  // copy the stream contents to stdout
        }
        finally {
            IOUtils.closeStream(in);
        }
    }
}
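As an aside, the URL approach above only works if nothing else in the JVM has already claimed the URL stream handler factory. A common alternative is to read the same file through the FileSystem API directly. The sketch below is not from the original article; it assumes the same hdfs://h1:9000 URI used later in this post, and like URLCat it must be compiled against hadoop-core-1.2.1.jar:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import java.io.InputStream;
import java.net.URI;

// Sketch: same functionality as URLCat via the FileSystem API, which avoids
// the once-per-JVM URL.setURLStreamHandlerFactory() restriction.
public class FileSystemCat {
    public static void main(String[] args) throws Exception {
        String uri = args[0];  // e.g. hdfs://h1:9000/user/grid/in/test3.txt
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);  // pick the FS for the scheme
        InputStream in = null;
        try {
            in = fs.open(new Path(uri));                     // open the file on HDFS
            IOUtils.copyBytes(in, System.out, 4096, false);  // copy contents to stdout
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
```

Run it the same way: `hadoop FileSystemCat hdfs://h1:9000/user/grid/in/test3.txt`.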


3. Compiling and running the code

[grid@h1 myclass]$ javac URLCat.java
URLCat.java:1: package org.apache.hadoop.fs does not exist
import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
                           ^
URLCat.java:2: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.IOUtils;
                           ^
URLCat.java:9: cannot find symbol
symbol  : class FsUrlStreamHandlerFactory
location: class URLCat
        URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
                                           ^
URLCat.java:16: cannot find symbol
symbol  : variable IOUtils
location: class URLCat
                IOUtils.copyBytes(in, System.out, 4096, false);
                ^
URLCat.java:19: cannot find symbol
symbol  : variable IOUtils
location: class URLCat
        IOUtils.closeStream(in);
        ^
5 errors


The compilation fails because the Hadoop classes are not on the classpath, so we need to compile with the classpath specified:


[grid@h1 myclass]$ javac -cp /home/grid/hadoop-1.2.1/hadoop-core-1.2.1.jar URLCat.java
[grid@h1 myclass]$ ls
URLCat.class  URLCat.java


Compilation succeeded, producing the URLCat.class file in the $HADOOP_CLASSPATH directory.


Run the code:

[grid@h1 myclass]$ hadoop fs -ls in
Found 4 items
-rw-r--r--   2 grid supergroup        101 2013-08-24 12:50 /user/grid/in/VERSION
-rw-r--r--   2 grid supergroup          7 2013-08-23 21:30 /user/grid/in/test3.txt
-rw-r--r--   2 grid supergroup         12 2013-08-23 21:30 /user/grid/in/text1.txt
-rw-r--r--   2 grid supergroup         13 2013-08-23 21:30 /user/grid/in/text2.txt

[grid@h1 myclass]$ hadoop fs -cat in/test3.txt
hadoop

[grid@h1 myclass]$ hadoop URLCat hdfs://h1:9000/user/grid/in/test3.txt
hadoop


4. Reading the APIs used in the code


1. Core Hadoop classes


org.apache.hadoop.fs
Class FsUrlStreamHandlerFactory

java.lang.Object
  org.apache.hadoop.fs.FsUrlStreamHandlerFactory

All Implemented Interfaces:
    URLStreamHandlerFactory

public class FsUrlStreamHandlerFactory
extends Object
implements URLStreamHandlerFactory

Factory for URL stream handlers. There is only one handler whose job is to create UrlConnections. A FsUrlConnection relies on FileSystem to choose the appropriate FS implementation. Before returning our handler, we make sure that FileSystem knows an implementation for the requested scheme/protocol.


FsUrlStreamHandlerFactory is a factory for URL stream handlers. To use it, pass an instance to java.net.URL's setURLStreamHandlerFactory() method, which installs it as the JVM-wide URLStreamHandlerFactory. This operation can be performed only once per JVM.
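The once-per-JVM restriction is why URLCat installs the factory in a static initializer. As a rough illustration (FactoryOnce and installFactory are hypothetical names, not part of Hadoop), an application that cannot guarantee it is the only caller can guard the call itself:

```java
import java.net.URL;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;

public class FactoryOnce {
    private static boolean factorySet = false;

    // Install the JVM-wide factory at most once; a second direct call to
    // URL.setURLStreamHandlerFactory() would throw java.lang.Error.
    static synchronized void installFactory(URLStreamHandlerFactory f) {
        if (!factorySet) {
            URL.setURLStreamHandlerFactory(f);
            factorySet = true;
        }
    }

    public static void main(String[] args) {
        // A factory that returns null defers to the JDK's built-in handlers.
        URLStreamHandlerFactory noop = new URLStreamHandlerFactory() {
            public URLStreamHandler createURLStreamHandler(String protocol) {
                return null;
            }
        };
        installFactory(noop);
        installFactory(noop);  // silently ignored instead of throwing
        System.out.println("factory installed once");
    }
}
```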



org.apache.hadoop.io
Class IOUtils

java.lang.Object
  org.apache.hadoop.io.IOUtils

public class IOUtils
extends Object

A utility class for I/O related functionality.


IOUtils is a utility class of static I/O helpers: copyBytes(in, out, buffSize, close) copies one stream to another, and closeStream(stream) closes a stream while swallowing any IOException, which makes it safe to call in a finally block.
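To make copyBytes concrete, here is a rough plain-Java sketch of what a call like IOUtils.copyBytes(in, System.out, 4096, false) does (CopyBytesSketch is a hypothetical name; Hadoop's real implementation differs in details such as error handling):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class CopyBytesSketch {
    // Copy in to out through a buffSize-byte buffer; when close is false,
    // both streams are left open for the caller to manage (as in URLCat).
    static void copyBytes(InputStream in, OutputStream out, int buffSize, boolean close)
            throws IOException {
        byte[] buf = new byte[buffSize];
        try {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            if (close) {
                in.close();
                out.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hadoop\n".getBytes());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyBytes(in, out, 4096, false);
        System.out.print(out.toString());  // prints: hadoop
    }
}
```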
