Monospaced | Used for commands, HTTP requests and responses, and code blocks. |
<Monospaced> | User-entered values. |
[Monospaced] | Optional values. When the value is not specified, the default value is used. |
Italics | Important phrases and words. |
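The HTTP REST API supports the complete FileSystem/FileContext interface for HDFS. The HTTP operations and the corresponding FileSystem/FileContext methods are shown in the next section. The HTTP Query Parameter Dictionary section describes the default values and the valid values in detail.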
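The FileSystem scheme of WebHDFS is "webhdfs://". A WebHDFS FileSystem URI has the following format:
- webhdfs://<HOST>:<HTTP_PORT>/<PATH>
The WebHDFS URI above corresponds to the following HDFS URI: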
- hdfs://<HOST>:<RPC_PORT>/<PATH>
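In the REST API, the prefix "/webhdfs/v1" is inserted in front of the path and a query is appended at the end. The corresponding HTTP URL therefore has the following format: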
- http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=...
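The mapping from a WebHDFS URI to its HTTP URL can be sketched in a few lines of Python. This is only an illustration: the function name `webhdfs_to_http`, the host name, and the port 50070 are assumptions made for the example, not values taken from this document.

```python
from urllib.parse import urlencode, urlsplit


def webhdfs_to_http(uri: str, op: str, **params: str) -> str:
    """Translate a webhdfs:// URI into the equivalent REST URL (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.scheme != "webhdfs":
        raise ValueError(f"not a webhdfs URI: {uri}")
    # The op parameter comes first, followed by any extra query parameters.
    query = urlencode({"op": op, **params})
    # "/webhdfs/v1" is inserted in front of the file system path.
    return f"http://{parts.netloc}/webhdfs/v1{parts.path}?{query}"


# Example host and port are placeholders.
print(webhdfs_to_http("webhdfs://namenode.example.com:50070/user/alice/data.txt", "OPEN"))
```

The host and port are copied unchanged because the WebHDFS URI already carries the HTTP port, not the RPC port.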
Below are the HDFS configuration properties for WebHDFS:
Property Name | Description |
dfs.webhdfs.enabled | Enable/disable WebHDFS in Namenodes and Datanodes |
dfs.web.authentication.kerberos.principal | The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. The HTTP Kerberos principal MUST start with 'HTTP/' per Kerberos HTTP SPNEGO specification. |
dfs.web.authentication.kerberos.keytab | The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. |
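When security is off, the authenticated user is the username specified in the user.name query parameter. If the user.name parameter is not set, the server may either set the authenticated user to a default web user, if there is any, or return an error response.
When security is on, authentication is performed by either Hadoop delegation token or Kerberos SPNEGO. If a token is set in the delegation query parameter, the authenticated user is the user encoded in the token. If the delegation query parameter is not set, the user is authenticated by Kerberos SPNEGO.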
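Below are examples using the curl command tool:
1. Authentication when security is off:
2. Authentication using Kerberos SPNEGO when security is on:
3. Authentication using Hadoop delegation token when security is on:
See also: HTTP Authentication.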
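The choice between the user.name and delegation query parameters can be sketched as follows. The helper name `auth_query` and its arguments are hypothetical. Kerberos SPNEGO is negotiated in HTTP headers rather than in the query string, so it appears here only as the case where neither parameter is added.

```python
from typing import Optional
from urllib.parse import urlencode


def auth_query(op: str, user: Optional[str] = None, token: Optional[str] = None) -> str:
    """Build the query string for a request (hypothetical helper)."""
    params = {}
    if token is not None:
        # Security on, delegation token: the user is the one encoded in the token.
        params["delegation"] = token
    elif user is not None:
        # Security off: the authenticated user is named explicitly.
        params["user.name"] = user
    # With neither parameter set, the server falls back to a default web user
    # (security off) or to Kerberos SPNEGO (security on).
    params["op"] = op
    return urlencode(params)


print(auth_query("GETFILESTATUS", user="alice"))
print(auth_query("GETFILESTATUS", token="EXAMPLETOKEN"))
```

The user name and token value above are placeholders.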
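When the proxy user feature is enabled, a proxy user P may submit a request on behalf of another user U. The username of U must be specified in the doas query parameter unless a delegation token is presented in authentication. In that case, the information of both users P and U must be encoded in the delegation token.
1. A proxy request when security is off:
2. A proxy request using Kerberos SPNEGO authentication when security is on:
3. A proxy request using Hadoop delegation token authentication when security is on: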
Step 1: Submit an HTTP PUT request without automatically following redirects and without sending the file data.
- curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE
- [&overwrite=<true|false>][&blocksize=<LONG>][&replication=<SHORT>]
- [&permission=<OCTAL>][&buffersize=<INT>]"
The request is redirected to the datanode where the file data is to be written:
- HTTP/1.1 307 TEMPORARY_REDIRECT
- Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...
- Content-Length: 0
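Step 2: Submit another HTTP PUT request, with the file data to be written, to the URL returned in the Location header above.
- curl -i -X PUT -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE..."
The client receives a 201 Created response with zero content length and the WebHDFS URI of the file in the Location header: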
- HTTP/1.1 201 Created
- Location: webhdfs://<HOST>:<PORT>/<PATH>
- Content-Length: 0
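Note: the reason for the two-step create/append is to prevent clients from sending data before the redirect. In HTTP/1.1 this issue is addressed by the "Expect: 100-continue" header. Unfortunately, some software libraries have bugs (e.g. Jetty 6 HTTP server and Java 6 HTTP client) and do not implement "Expect: 100-continue" correctly. The two-step create/append is a temporary workaround for these library bugs.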
See also: overwrite, blocksize, replication, permission, buffersize, FileSystem.create
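The two steps above can be sketched in Python with the transport left abstract. The `send` callable is a hypothetical hook, not part of WebHDFS: it is assumed to issue one HTTP request without following redirects and to return the status code and response headers. It could be backed by `http.client` or any other HTTP library.

```python
from typing import Callable, Dict, Tuple

# (method, url, body) -> (status, headers); hypothetical transport hook
Send = Callable[[str, str, bytes], Tuple[int, Dict[str, str]]]


def create_file(send: Send, namenode_url: str, data: bytes) -> str:
    """Two-step CREATE: ask the namenode first, then send data to the datanode."""
    # Step 1: PUT with no body; the redirect must not be followed automatically.
    status, headers = send("PUT", namenode_url, b"")
    if status != 307:
        raise RuntimeError(f"expected 307 redirect, got {status}")
    datanode_url = headers["Location"]

    # Step 2: PUT the file data to the URL from the Location header.
    status, headers = send("PUT", datanode_url, data)
    if status != 201:
        raise RuntimeError(f"expected 201 Created, got {status}")
    # The Location header of the final response is the WebHDFS URI of the file.
    return headers.get("Location", "")
```

Keeping the data out of the first request is the point of the design: nothing is uploaded until the datanode address is known.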
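Step 1: Submit an HTTP POST request without automatically following redirects and without sending the file data:
The request is redirected to the datanode where the file data is to be appended:
Step 2: Submit another HTTP POST request, with the file data to be appended, to the URL in the Location header above:
The client receives a response with zero content length:
See the note in the previous section for why this operation requires two steps.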
See also: buffersize, FileSystem.append
Submit an HTTP POST request:
The client receives a response with zero content length:
See also: sources, FileSystem.concat
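Submit an HTTP GET request that automatically follows redirects:
The request is redirected to a datanode where the file data can be read:
The client follows the redirect to the datanode and receives the file data: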
See also: offset, length, buffersize, FileSystem.open
Submit an HTTP PUT request:
The client receives a response with a boolean JSON object:
See also: permission, FileSystem.mkdirs
Submit an HTTP PUT request:
The client receives a response with zero content length:
- HTTP/1.1 200 OK
- Content-Length: 0
See also: destination, createParent, FileSystem.createSymlink
Submit an HTTP PUT request:
The client receives a response with a boolean JSON object:
See also: destination, FileSystem.rename
Submit an HTTP DELETE request:
The client receives a response with a boolean JSON object:
See also: recursive, FileSystem.delete
Submit an HTTP GET request:
The client receives a response with a FileStatus JSON object:
See also: FileSystem.getFileStatus
Submit an HTTP GET request:
The client receives a response with a FileStatuses JSON object:
See also: FileSystem.listStatus
Submit an HTTP GET request:
The client receives a response with a ContentSummary JSON object:
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- {
- "ContentSummary":
- {
- "directoryCount": 2,
- "fileCount" : 1,
- "length" : 24930,
- "quota" : -1,
- "spaceConsumed" : 24930,
- "spaceQuota" : -1
- }
- }
See also: FileSystem.getContentSummary
Submit an HTTP GET request:
- curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILECHECKSUM"
The request is redirected to a datanode:
- HTTP/1.1 307 TEMPORARY_REDIRECT
- Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=GETFILECHECKSUM...
- Content-Length: 0
The client follows the redirect to the datanode and receives a FileChecksum JSON object:
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- {
- "FileChecksum":
- {
- "algorithm": "MD5-of-1MD5-of-512CRC32",
- "bytes" : "eadb10de24aa315748930df6e185c0d ...",
- "length" : 28
- }
- }
See also: FileSystem.getFileChecksum
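A client may want to turn the hexadecimal "bytes" field back into raw bytes and compare it with "length", which counts decoded bytes rather than characters of the string. The sketch below assumes a complete response body. The sample value is invented for illustration, since the checksum shown above is truncated.

```python
import json


def checksum_bytes(body: str) -> bytes:
    """Decode a FileChecksum response and verify its declared length."""
    checksum = json.loads(body)["FileChecksum"]
    raw = bytes.fromhex(checksum["bytes"])
    # "length" is the number of decoded bytes, not the length of the hex string.
    if len(raw) != checksum["length"]:
        raise ValueError("checksum length mismatch")
    return raw


# Hypothetical response with a made-up 4-byte checksum.
sample = '{"FileChecksum": {"algorithm": "demo", "bytes": "deadbeef", "length": 4}}'
print(checksum_bytes(sample).hex())
```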
Submit an HTTP GET request:
- curl -i "http://<HOST>:<PORT>/webhdfs/v1/?op=GETHOMEDIRECTORY"
The client receives a response with a Path JSON object:
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- {"Path": "/user/szetszwo"}
See also: FileSystem.getHomeDirectory
Submit an HTTP PUT request:
- curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETPERMISSION
- [&permission=<OCTAL>]"
The client receives a response with zero content length:
- HTTP/1.1 200 OK
- Content-Length: 0
See also: permission, FileSystem.setPermission
Submit an HTTP PUT request:
- curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETOWNER
- [&owner=<USER>][&group=<GROUP>]"
The client receives a response with zero content length:
- HTTP/1.1 200 OK
- Content-Length: 0
See also: owner, group, FileSystem.setOwner
Submit an HTTP PUT request:
- curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETREPLICATION
- [&replication=<SHORT>]"
The client receives a response with a boolean JSON object:
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- {"boolean": true}
See also: replication, FileSystem.setReplication
Submit an HTTP PUT request:
- curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETTIMES
- [&modificationtime=<TIME>][&accesstime=<TIME>]"
The client receives a response with zero content length:
- HTTP/1.1 200 OK
- Content-Length: 0
See also: modificationtime, accesstime, FileSystem.setTimes
Submit an HTTP PUT request:
- curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=MODIFYACLENTRIES
- &aclspec=<ACLSPEC>"
The client receives a response with zero content length:
- HTTP/1.1 200 OK
- Content-Length: 0
See also: FileSystem.modifyAclEntries
Submit an HTTP PUT request:
- curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=REMOVEACLENTRIES
- &aclspec=<ACLSPEC>"
The client receives a response with zero content length:
- HTTP/1.1 200 OK
- Content-Length: 0
See also: FileSystem.removeAclEntries
Submit an HTTP PUT request:
- curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=REMOVEDEFAULTACL"
The client receives a response with zero content length:
- HTTP/1.1 200 OK
- Content-Length: 0
See also: FileSystem.removeDefaultAcl
Submit an HTTP PUT request:
- curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=REMOVEACL"
The client receives a response with zero content length:
- HTTP/1.1 200 OK
- Content-Length: 0
See also: FileSystem.removeAcl
Submit an HTTP PUT request:
- curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETACL
- &aclspec=<ACLSPEC>"
The client receives a response with zero content length:
- HTTP/1.1 200 OK
- Content-Length: 0
See also: FileSystem.setAcl
Submit an HTTP GET request:
- curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETACLSTATUS"
The client receives a response with an AclStatus JSON object:
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- {
- "AclStatus": {
- "entries": [
- "user:carla:rw-",
- "group::r-x"
- ],
- "group": "supergroup",
- "owner": "hadoop",
- "stickyBit": false
- }
- }
See also: FileSystem.getAclStatus
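The entries in an AclStatus response are plain strings, so a client usually splits them before use. The sketch below assumes every entry has the three colon-separated fields seen in the sample above (type, name, permission). The `AclEntry` type and `parse_acl_status` function are names invented for this example.

```python
import json
from typing import List, NamedTuple


class AclEntry(NamedTuple):
    kind: str        # e.g. "user" or "group"
    name: str        # empty string when the entry names nobody
    permission: str  # e.g. "rw-"


def parse_acl_status(body: str) -> List[AclEntry]:
    """Split each ACL entry string of an AclStatus response into its fields."""
    status = json.loads(body)["AclStatus"]
    entries = []
    for text in status["entries"]:
        # Assumes exactly three fields per entry, as in the sample response.
        kind, name, permission = text.split(":")
        entries.append(AclEntry(kind, name, permission))
    return entries


sample = json.dumps({"AclStatus": {
    "entries": ["user:carla:rw-", "group::r-x"],
    "group": "supergroup", "owner": "hadoop", "stickyBit": False}})
for entry in parse_acl_status(sample):
    print(entry)
```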
Submit an HTTP GET request:
- curl -i "http://<HOST>:<PORT>/webhdfs/v1/?op=GETDELEGATIONTOKEN&renewer=<USER>"
The client receives a response with a Token JSON object:
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- {
- "Token":
- {
- "urlString": "JQAIaG9y..."
- }
- }
See also: renewer, FileSystem.getDelegationToken
Submit an HTTP GET request:
- curl -i "http://<HOST>:<PORT>/webhdfs/v1/?op=GETDELEGATIONTOKENS&renewer=<USER>"
The client receives a response with a Tokens JSON object:
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- {
- "Tokens":
- {
- "Token":
- [
- {
- "urlString":"KAAKSm9i ..."
- }
- ]
- }
- }
See also: renewer, FileSystem.getDelegationTokens
Submit an HTTP PUT request:
- curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/?op=RENEWDELEGATIONTOKEN&token=<TOKEN>"
The client receives a response with a long JSON object:
- HTTP/1.1 200 OK
- Content-Type: application/json
- Transfer-Encoding: chunked
- {"long": 1320962673997} //the new expiration time
See also: token, FileSystem.renewDelegationToken
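To make the returned expiration time readable, a client can convert the long value to a timestamp. The sketch below assumes the value is in milliseconds since the Unix epoch, the usual Java convention. This section does not state the unit, so treat that as an assumption.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def expiration_time(body: str) -> datetime:
    """Convert a long JSON response to a UTC datetime (assumes epoch milliseconds)."""
    millis = json.loads(body)["long"]
    # Integer arithmetic with timedelta avoids floating-point rounding.
    return EPOCH + timedelta(milliseconds=millis)


print(expiration_time('{"long": 1320962673997}').isoformat())
```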
Submit an HTTP PUT request:
- curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/?op=CANCELDELEGATIONTOKEN&token=<TOKEN>"
The client receives a response with zero content length:
- HTTP/1.1 200 OK
- Content-Length: 0
See also: token, FileSystem.cancelDelegationToken
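When an operation fails, the server may throw an exception. The JSON format of an error response is defined in the RemoteException JSON Schema. The table below shows the mapping from exceptions to HTTP response codes.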
Exceptions | HTTP Response Codes |
IllegalArgumentException | 400 Bad Request |
UnsupportedOperationException | 400 Bad Request |
SecurityException | 401 Unauthorized |
IOException | 403 Forbidden |
FileNotFoundException | 404 Not Found |
RuntimeException | 500 Internal Server Error |
Below are examples of error responses.
- HTTP/1.1 400 Bad Request
- Content-Type: application/json
- Transfer-Encoding: chunked
- {
- "RemoteException":
- {
- "exception" : "IllegalArgumentException",
- "javaClassName": "java.lang.IllegalArgumentException",
- "message" : "Invalid value for webhdfs parameter \"permission\": ..."
- }
- }
- HTTP/1.1 401 Unauthorized
- Content-Type: application/json
- Transfer-Encoding: chunked
- {
- "RemoteException":
- {
- "exception" : "SecurityException",
- "javaClassName": "java.lang.SecurityException",
- "message" : "Failed to obtain user group information: ..."
- }
- }
- HTTP/1.1 403 Forbidden
- Content-Type: application/json
- Transfer-Encoding: chunked
- {
- "RemoteException":
- {
- "exception" : "AccessControlException",
- "javaClassName": "org.apache.hadoop.security.AccessControlException",
- "message" : "Permission denied: ..."
- }
- }
- HTTP/1.1 404 Not Found
- Content-Type: application/json
- Transfer-Encoding: chunked
- {
- "RemoteException":
- {
- "exception" : "FileNotFoundException",
- "javaClassName": "java.io.FileNotFoundException",
- "message" : "File does not exist: /foo/a.patch"
- }
- }
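On the client side, error responses like the ones above can be turned into a native exception. The following is a minimal sketch. `RemoteError` and `check_response` are hypothetical names, and the status code and body are assumed to have been read already by whatever HTTP library is in use.

```python
import json


class RemoteError(Exception):
    """Client-side view of a RemoteException error response (hypothetical type)."""

    def __init__(self, status: int, exception: str, java_class: str, message: str):
        super().__init__(f"{status} {exception}: {message}")
        self.status = status
        self.exception = exception
        self.java_class = java_class
        self.message = message


def check_response(status: int, body: str) -> None:
    """Raise RemoteError when the response carries a RemoteException."""
    if status < 400:
        return
    detail = json.loads(body)["RemoteException"]
    raise RemoteError(status, detail["exception"],
                      detail["javaClassName"], detail["message"])


sample = json.dumps({"RemoteException": {
    "exception": "FileNotFoundException",
    "javaClassName": "java.io.FileNotFoundException",
    "message": "File does not exist: /foo/a.patch"}})
try:
    check_response(404, sample)
except RemoteError as err:
    print(err)
```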
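All operations except OPEN return either a zero-length response or a JSON response. For OPEN, the response is a byte stream. The JSON schemas are shown below.
Note that the default value of additionalProperties is an empty schema, which allows any value for additional properties. Therefore, all WebHDFS JSON responses allow any additional property. However, if additional properties are included in a response, they are treated as optional properties in order to maintain compatibility.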
- {
- "name" : "AclStatus",
- "properties":
- {
- "AclStatus":
- {
- "type" : "object",
- "properties":
- {
- "entries":
- {
- "type": "array",
- "items":
- {
- "description": "ACL entry.",
- "type": "string"
- }
- },
- "group":
- {
- "description": "The group owner.",
- "type" : "string",
- "required" : true
- },
- "owner":
- {
- "description": "The user who is the owner.",
- "type" : "string",
- "required" : true
- },
- "stickyBit":
- {
- "description": "True if the sticky bit is on.",
- "type" : "boolean",
- "required" : true
- }
- }
- }
- }
- }
- {
- "name" : "boolean",
- "properties":
- {
- "boolean":
- {
- "description": "A boolean value",
- "type" : "boolean",
- "required" : true
- }
- }
- }
See also: MKDIRS, RENAME, DELETE, SETREPLICATION
- {
- "name" : "ContentSummary",
- "properties":
- {
- "ContentSummary":
- {
- "type" : "object",
- "properties":
- {
- "directoryCount":
- {
- "description": "The number of directories.",
- "type" : "integer",
- "required" : true
- },
- "fileCount":
- {
- "description": "The number of files.",
- "type" : "integer",
- "required" : true
- },
- "length":
- {
- "description": "The number of bytes used by the content.",
- "type" : "integer",
- "required" : true
- },
- "quota":
- {
- "description": "The namespace quota of this directory.",
- "type" : "integer",
- "required" : true
- },
- "spaceConsumed":
- {
- "description": "The disk space consumed by the content.",
- "type" : "integer",
- "required" : true
- },
- "spaceQuota":
- {
- "description": "The disk space quota.",
- "type" : "integer",
- "required" : true
- }
- }
- }
- }
- }
- {
- "name" : "FileChecksum",
- "properties":
- {
- "FileChecksum":
- {
- "type" : "object",
- "properties":
- {
- "algorithm":
- {
- "description": "The name of the checksum algorithm.",
- "type" : "string",
- "required" : true
- },
- "bytes":
- {
- "description": "The byte sequence of the checksum in hexadecimal.",
- "type" : "string",
- "required" : true
- },
- "length":
- {
- "description": "The length of the bytes (not the length of the string).",
- "type" : "integer",
- "required" : true
- }
- }
- }
- }
- }
See also: GETFILECHECKSUM
- {
- "name" : "FileStatus",
- "properties":
- {
- "FileStatus": fileStatusProperties //See FileStatus Properties
- }
- }
See also: FileStatus Properties, GETFILESTATUS, FileStatus
JavaScript syntax is used to define fileStatusProperties so that it can be referenced in both the FileStatus and FileStatuses JSON schemas.
- var fileStatusProperties =
- {
- "type" : "object",
- "properties":
- {
- "accessTime":
- {
- "description": "The access time.",
- "type" : "integer",
- "required" : true
- },
- "blockSize":
- {
- "description": "The block size of a file.",
- "type" : "integer",
- "required" : true
- },
- "group":
- {
- "description": "The group owner.",
- "type" : "string",
- "required" : true
- },
- "length":
- {
- "description": "The number of bytes in a file.",
- "type" : "integer",
- "required" : true
- },
- "modificationTime":
- {
- "description": "The modification time.",
- "type" : "integer",
- "required" : true
- },
- "owner":
- {
- "description": "The user who is the owner.",
- "type" : "string",
- "required" : true
- },
- "pathSuffix":
- {
- "description": "The path suffix.",
- "type" : "string",
- "required" : true
- },
- "permission":
- {
- "description": "The permission represented as an octal string.",
- "type" : "string",
- "required" : true
- },
- "replication":
- {
- "description": "The replication factor of a file.",
- "type" : "integer",
- "required" : true
- },
- "symlink": //an optional property
- {
- "description": "The link target of a symlink.",
- "type" : "string"
- },
- "type":
- {
- "description": "The type of the path object.",
- "enum" : ["FILE", "DIRECTORY", "SYMLINK"],
- "required" : true
- }
- }
- };
A FileStatuses JSON object represents an array of FileStatus JSON objects.
- {
- "name" : "FileStatuses",
- "properties":
- {
- "FileStatuses":
- {
- "type" : "object",
- "properties":
- {
- "FileStatus":
- {
- "description": "An array of FileStatus",
- "type" : "array",
- "items" : fileStatusProperties //See FileStatus Properties
- }
- }
- }
- }
- }
See also: FileStatus Properties, LISTSTATUS, FileStatus
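A client reading a FileStatuses response can check for the properties that the schema marks as required while leaving any additional properties untouched, which matches the compatibility note on additionalProperties. The sketch below is illustrative only. `list_entries` and the sample values are invented for the example.

```python
import json
from typing import Dict, List

# Properties marked "required" in fileStatusProperties; "symlink" is optional.
REQUIRED = ("accessTime", "blockSize", "group", "length", "modificationTime",
            "owner", "pathSuffix", "permission", "replication", "type")


def list_entries(body: str) -> List[Dict]:
    """Return the FileStatus objects of a FileStatuses response after a basic check."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    for status in statuses:
        missing = [key for key in REQUIRED if key not in status]
        if missing:
            raise ValueError(f"FileStatus is missing {missing}")
        # Unknown extra properties are allowed by the schema and are kept as-is.
    return statuses


# Hypothetical response with one file and one extra, unknown property.
sample = json.dumps({"FileStatuses": {"FileStatus": [{
    "accessTime": 0, "blockSize": 0, "group": "supergroup", "length": 24930,
    "modificationTime": 0, "owner": "hadoop", "pathSuffix": "a.patch",
    "permission": "644", "replication": 1, "type": "FILE",
    "someFutureField": "ignored"}]}})
for entry in list_entries(sample):
    print(entry["pathSuffix"], entry["type"], entry["length"])
```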