WebHDFS vs HttpFS Gateway

Based on Hadoop 2.7.1.

 

I. Introduction

 

1. Official introduction to WebHDFS:

 

Introduction

 

The HTTP REST API supports the complete FileSystem/FileContext interface for HDFS.
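Every operation is expressed as an HTTP request of the form below; a minimal sketch, with host, path and user name as placeholders:

curl -i "http://<NAMENODE_HOST>:50070/webhdfs/v1/<PATH>?op=GETFILESTATUS&user.name=<USER>"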

 

2. Official introduction to the HttpFS Gateway:

 

HttpFS is a server that provides a REST HTTP gateway supporting all HDFS File System operations (read and write). It is interoperable with the webhdfs REST HTTP API.

 

HttpFS can be used to transfer data between clusters running different versions of Hadoop (overcoming RPC versioning issues), for example using Hadoop DistCP.
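For instance, a cross-version copy could read the source cluster over HTTP instead of RPC. A rough sketch, with host names, ports and paths invented for illustration:

hadoop distcp webhdfs://source-httpfs:14000/user/data hdfs://dest-namenode:8020/user/data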

 

HttpFS can be used to access data in HDFS on a cluster behind a firewall (the HttpFS server acts as a gateway and is the only system that is allowed to cross the firewall into the cluster).

 

HttpFS can be used to access data in HDFS using HTTP utilities (such as curl and wget) and HTTP libraries from languages other than Java, such as Perl.
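For example, a file could be downloaded from an HttpFS gateway with nothing more than wget (host name, path and user below are placeholders):

wget -O data.txt "http://httpfs-host:14000/webhdfs/v1/user/foo/data.txt?op=OPEN&user.name=foo"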

 

The webhdfs client FileSystem implementation can be used to access HttpFS using the Hadoop filesystem command line tool (hadoop fs) as well as from Java applications using the Hadoop FileSystem Java API.
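For example, a machine with a Hadoop client installed can point the ordinary fs commands at the gateway through a webhdfs:// URI (host name is a placeholder):

hadoop fs -ls webhdfs://httpfs-host:14000/user
hadoop fs -cat webhdfs://httpfs-host:14000/user/foo/data.txt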

 

HttpFS has built-in security supporting Hadoop pseudo authentication and HTTP SPNEGO Kerberos and other pluggable authentication mechanisms. It also provides Hadoop proxy user support.
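With proxy-user support enabled, a privileged user can act on behalf of another user through the doas query parameter. A hedged sketch, with host and user names invented for illustration:

curl "http://httpfs-host:14000/webhdfs/v1/user/alice?op=LISTSTATUS&user.name=root&doas=alice"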

 

II. Why use them:

 

Both expose REST-based APIs, so a host outside the cluster can access HDFS without installing Hadoop or a Java runtime, and the client can be written in any language.

 

III. Differences between the two:

 

1. WebHDFS is built into HDFS and enabled by default, whereas HttpFS is a separate HDFS service that has to be configured and started manually before it can be used.

2. The key word in HttpFS is "Gateway". With WebHDFS the client talks to every node in the cluster: requests go to the NameNode first and are then redirected to the appropriate DataNode. With HttpFS the client talks to a single node of the cluster (that node effectively being configured as the HttpFS gateway).

3. WebHDFS was developed by Hortonworks and later donated to Apache; HttpFS was developed by Cloudera and likewise donated to Apache.

 

IV. Usage steps:

 

1. Using WebHDFS:

 

(1) WebHDFS is built into HDFS, so nothing extra has to be installed or started. It is controlled by a switch in hdfs-site.xml, which is enabled by default:

 

<property>

    <name>dfs.webhdfs.enabled</name>

    <value>true</value>

</property>

 

(2) Perform file operations against port 50070 of the NameNode:

 

curl "http://ctrl:50070/webhdfs/v1/?op=liststatus&user.name=root"

 

2. Using the HttpFS Gateway:

 

(1) Configure httpfs-site.xml as needed.

(2) Configure core-site.xml by adding the following properties. The "root" in the two property names stands for the OS user that runs the HttpFS server; replace it with the actual user name. After changing these settings, make the NameNode reload them (see the command after the snippet).

 

<property>  

    <name>hadoop.proxyuser.root.hosts</name>  

    <value>*</value>  

</property>  

<property>  

    <name>hadoop.proxyuser.root.groups</name>

    <value>*</value>  

</property>
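If the NameNode is already running, the new proxy-user settings can be picked up without a full restart (a restart also works). A minimal sketch, run from the Hadoop home directory:

bin/hdfs dfsadmin -refreshSuperUserGroupsConfiguration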

 

(3) Start (or stop) the service:

 

sbin/httpfs.sh start

sbin/httpfs.sh stop

 

Once started, it listens on port 14000 by default:

 

[hadoop@master hadoop]# netstat -antp | grep 14000

tcp        0      0 :::14000   :::*       LISTEN      7415/java

[hadoop@master hadoop]#

 

(4) Use it:

 

#curl -i -L "http://HttpFS_host:14000/webhdfs/v1/foo/bar?op=OPEN"  
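Since HttpFS implements the same webhdfs REST dialect, other operations look just like the WebHDFS calls above; only the host and port change, and the gateway itself streams the file data instead of redirecting the client to a DataNode. For example:

curl -i "http://HttpFS_host:14000/webhdfs/v1/user?op=LISTSTATUS&user.name=root"
curl -i "http://HttpFS_host:14000/webhdfs/v1/foo/bar?op=GETFILESTATUS&user.name=root"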

 

