OIV User Guide (Offline Image Viewer Guide)


Official documentation:
http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html
http://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#WebHDFS_REST_API


Overview


The Offline Image Viewer (OIV) is a tool that dumps the contents of HDFS fsimage files to a human-readable format and provides a read-only WebHDFS API, allowing offline analysis and examination of a Hadoop cluster's namespace. The tool can process very large image files relatively quickly. It handles the layout formats included with Hadoop 2.4 and later; it cannot read fsimage files written by Hadoop 2.3 or earlier, so for those older layouts use the Offline Image Viewer of Hadoop 2.3 or the oiv_legacy command. If the tool cannot process an image file, it exits cleanly. As the name suggests, the Offline Image Viewer does not require a running Hadoop cluster; it operates entirely offline.


The Offline Image Viewer provides several output processors, selected with the -p option. This guide covers three of them: Web, XML, and FileDistribution.


Web is the default output processor. It launches an HTTP server that exposes a read-only WebHDFS API, so users can explore the namespace interactively through the HTTP REST API. The listen address is set with the -addr option (default: localhost:5978).


Example commands:
[hdfs@kiwi02 current]$ hdfs oiv -i fsimage_0000000000000154113 -o fsimage.web
17/02/05 23:38:25 INFO offlineImageViewer.FSImageHandler: Loading 14 strings
17/02/05 23:38:25 INFO offlineImageViewer.FSImageHandler: Loading 352 inodes.
17/02/05 23:38:25 INFO offlineImageViewer.FSImageHandler: Loading inode references
17/02/05 23:38:25 INFO offlineImageViewer.FSImageHandler: Loaded 0 inode references
17/02/05 23:38:25 INFO offlineImageViewer.FSImageHandler: Loading inode directory section
17/02/05 23:38:25 INFO offlineImageViewer.FSImageHandler: Loaded 155 directories
17/02/05 23:38:25 INFO offlineImageViewer.WebImageViewer: WebImageViewer started. Listening on /127.0.0.1:5978. Press Ctrl+C to stop the viewer.


[hdfs@kiwi02 ~]$ hdfs dfs -ls webhdfs://127.0.0.1:5978/
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/cmss/bch/bc1.3.3/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/cmss/bch/bc1.3.3/tez/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Found 9 items
drwxr-xr-x   - hdfs   hdfs            0 2017-01-22 13:53 webhdfs://127.0.0.1:5978/amshbase
drwxrwxrwx   - yarn   hadoop          0 2017-01-22 13:52 webhdfs://127.0.0.1:5978/app-logs
drwxr-xr-x   - hdfs   hdfs            0 2017-01-22 13:53 webhdfs://127.0.0.1:5978/apps
drwxr-xr-x   - hdfs   hdfs            0 2017-01-22 13:45 webhdfs://127.0.0.1:5978/iothrottle
drwxr-xr-x   - mapred hdfs            0 2017-01-22 13:51 webhdfs://127.0.0.1:5978/mapred
drwxrwxrwx   - mapred hadoop          0 2017-01-22 13:51 webhdfs://127.0.0.1:5978/mr-history
drwxrwxrwx   - slider hdfs            0 2017-01-22 13:53 webhdfs://127.0.0.1:5978/slider
drwxrwxrwx   - hdfs   hdfs            0 2017-01-24 14:38 webhdfs://127.0.0.1:5978/tmp
drwxr-xr-x   - hdfs   hdfs            0 2017-01-22 13:53 webhdfs://127.0.0.1:5978/user
Use the -R option to list every file recursively:
[hdfs@kiwi02 ~]$ hdfs dfs -ls -R webhdfs://127.0.0.1:5978/
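The listing above is easy to post-process once it has been captured as text. The following is a minimal Python sketch, not part of Hadoop: the names LsEntry and parse_ls_line are made up for illustration, and it assumes the eight-column layout shown in the sample output (permissions, replication, owner, group, size, date, time, path). Log lines such as the SLF4J warnings and the "Found N items" header are skipped.

```python
from typing import NamedTuple, Optional


class LsEntry(NamedTuple):
    permissions: str
    replication: str   # "-" for directories
    owner: str
    group: str
    size: int
    modified: str      # "YYYY-MM-DD HH:MM", local time as printed
    path: str


def parse_ls_line(line: str) -> Optional[LsEntry]:
    """Parse one entry line of `hdfs dfs -ls`; return None for any other line."""
    # Entry lines have 8 whitespace-separated fields. The path comes last and
    # may contain spaces, so split at most 7 times.
    fields = line.strip().split(None, 7)
    if len(fields) != 8 or fields[0][0] not in "-d":
        return None
    perms, repl, owner, group, size, date, time, path = fields
    if not size.isdigit():
        return None
    return LsEntry(perms, repl, owner, group, int(size), f"{date} {time}", path)


sample = """SLF4J: Class path contains multiple SLF4J bindings.
Found 9 items
drwxr-xr-x   - hdfs   hdfs            0 2017-01-22 13:53 webhdfs://127.0.0.1:5978/amshbase
drwxrwxrwx   - yarn   hadoop          0 2017-01-22 13:52 webhdfs://127.0.0.1:5978/app-logs"""

entries = [e for e in map(parse_ls_line, sample.splitlines()) if e]
for e in entries:
    print(e.owner, e.group, e.path)
```

Returning None for non-entry lines keeps the caller simple: the whole console capture can be fed in without stripping the log noise first.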


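The HTTP REST API also returns file metadata in JSON format.
Three operations are supported:
1. LISTSTATUS
Syntax: curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"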
[hdfs@kiwi02 ~]$ curl -i http://127.0.0.1:5978/webhdfs/v1/?op=liststatus
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 1959


{"FileStatuses":{"FileStatus":[
{"fileId":16494,"accessTime":0,"replication":0,"owner":"hdfs","length":0,"permission":"755","blockSize":0,"modificationTime":1485064419909,"type":"DIRECTORY","group":"hdfs","childrenNum":1,"pathSuffix":"amshbase"},{"fileId":16393,"accessTime":0,"replication":0,"owner":"yarn","length":0,"permission":"777","blockSize":0,"modificationTime":1485064352428,"type":"DIRECTORY","group":"hadoop","childrenNum":1,"pathSuffix":"app-logs"},{"fileId":16399,"accessTime":0,"replication":0,"owner":"hdfs","length":0,"permission":"755","blockSize":0,"modificationTime":1485064416870,"type":"DIRECTORY","group":"hdfs","childrenNum":4,"pathSuffix":"apps"},{"fileId":16389,"accessTime":0,"replication":0,"owner":"hdfs","length":0,"permission":"755","blockSize":0,"modificationTime":1485063954102,"type":"DIRECTORY","group":"hdfs","childrenNum":1,"pathSuffix":"iothrottle"},{"fileId":16394,"accessTime":0,"replication":0,"owner":"mapred","length":0,"permission":"755","blockSize":0,"modificationTime":1485064277476,"type":"DIRECTORY","group":"hdfs","childrenNum":1,"pathSuffix":"mapred"},{"fileId":16396,"accessTime":0,"replication":0,"owner":"mapred","length":0,"permission":"777","blockSize":0,"modificationTime":1485064281157,"type":"DIRECTORY","group":"hadoop","childrenNum":2,"pathSuffix":"mr-history"},{"fileId":16476,"accessTime":0,"replication":0,"owner":"slider","length":0,"permission":"777","blockSize":0,"modificationTime":1485064393522,"type":"DIRECTORY","group":"hdfs","childrenNum":2,"pathSuffix":"slider"},{"fileId":16386,"accessTime":0,"replication":0,"owner":"hdfs","length":0,"permission":"777","blockSize":0,"modificationTime":1485239925878,"type":"DIRECTORY","group":"hdfs","childrenNum":8,"pathSuffix":"tmp"},{"fileId":16387,"accessTime":0,"replication":0,"owner":"hdfs","length":0,"permission":"755","blockSize":0,"modificationTime":1485064419679,"type":"DIRECTORY","group":"hdfs","childrenNum":8,"pathSuffix":"user"}
]}}


2. GETFILESTATUS
Syntax: curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS"
[hdfs@kiwi02 ~]$ curl -i http://127.0.0.1:5978/webhdfs/v1/user/ams/metrics/hbase.version?op=GETFILESTATUS
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 237


{"FileStatus":
{"fileId":16502,"accessTime":1485064427061,"replication":3,"owner":"ams","length":7,"permission":"644","blockSize":134217728,"modificationTime":1485064427444,"type":"FILE","group":"hdfs","childrenNum":0,"pathSuffix":""}
}


3. GETACLSTATUS
Syntax: curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETACLSTATUS"
[hdfs@kiwi02 ~]$ curl -i http://127.0.0.1:5978/webhdfs/v1/user/ams/metrics/hbase.version?op=GETACLSTATUS
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 79


{"AclStatus":{"entries":[],"group": "hdfs","owner": "ams","stickyBit": false}}
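The three requests above share one URL pattern, and the FileStatus fields need a little decoding before they are readable. The sketch below is only an illustration under stated assumptions: webhdfs_url, rwx and summarize are hypothetical helper names, the host and port are the viewer defaults used in this guide, and the JSON body is the GETFILESTATUS response shown earlier. It assumes modificationTime is milliseconds since the Unix epoch and permission is an octal string, as in the sample responses.

```python
import json
from datetime import datetime, timezone
from urllib.parse import quote

# The operations demonstrated in this guide (an assumption of this sketch,
# not an exhaustive list taken from the Hadoop source).
SUPPORTED_OPS = {"LISTSTATUS", "GETFILESTATUS", "GETACLSTATUS"}


def webhdfs_url(host: str, port: int, path: str, op: str) -> str:
    """Build the URL for one of the read-only operations shown above."""
    op = op.upper()
    if op not in SUPPORTED_OPS:
        raise ValueError(f"operation not covered in this guide: {op}")
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?op={op}"


def rwx(octal: str) -> str:
    """Turn an octal permission string such as "644" into "rw-r--r--"."""
    return "".join(
        "".join(flag if int(digit) & bit else "-"
                for flag, bit in (("r", 4), ("w", 2), ("x", 1)))
        for digit in octal[-3:]
    )


def summarize(status: dict) -> str:
    """One-line, human-readable summary of a FileStatus object."""
    mtime = datetime.fromtimestamp(status["modificationTime"] / 1000,
                                   tz=timezone.utc)
    return (f'{status["type"]} {rwx(status["permission"])} '
            f'{status["owner"]}:{status["group"]} {status["length"]} bytes '
            f'modified {mtime:%Y-%m-%d %H:%M:%S} UTC')


# Response body copied from the GETFILESTATUS example in this guide.
body = ('{"FileStatus":{"fileId":16502,"accessTime":1485064427061,"replication":3,'
        '"owner":"ams","length":7,"permission":"644","blockSize":134217728,'
        '"modificationTime":1485064427444,"type":"FILE","group":"hdfs",'
        '"childrenNum":0,"pathSuffix":""}}')

print(webhdfs_url("127.0.0.1", 5978, "/user/ams/metrics/hbase.version", "GETFILESTATUS"))
print(summarize(json.loads(body)["FileStatus"]))
```

The URL can be passed to curl or to urllib.request.urlopen while the viewer is running. The timestamp is printed in UTC, which is why it differs from the local times in the hdfs dfs -ls listing.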




XML creates an XML document of the fsimage that includes all of the information within it, such as inode IDs, similar to the lsr processor. The output of this processor is amenable to automated processing and analysis with XML tools. Because XML syntax is verbose, this processor also generates the largest amount of output.


hdfs oiv -p XML -i fsimage_0000000000000154113 -o fsimage.xml



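<?xml version="1.0"?>
<fsimage>
<NameSection>
  <genstampV1>1000</genstampV1>
  <genstampV2>1004</genstampV2>
  <genstampV1Limit>0</genstampV1Limit>
  <lastAllocatedBlockId>1073741828</lastAllocatedBlockId>
  <txid>115</txid>
</NameSection>
<INodeSection>
  <lastInodeId>16418</lastInodeId>
  <inode>
    <id>16385</id>
    <type>DIRECTORY</type>
    <name></name>
    <mtime>1412832662162</mtime>
    <permission>hadoop:supergroup:rwxr-xr-x</permission>
    <nsquota>9223372036854775807</nsquota>
    <dsquota>-1</dsquota>
  </inode>
  <inode>
    <id>16386</id>
    <type>DIRECTORY</type>
    <name>user</name>
    <mtime>1413795010372</mtime>
    <permission>hadoop:supergroup:rwxr-xr-x</permission>
    <nsquota>-1</nsquota>
    <dsquota>-1</dsquota>
  </inode>
...remaining output omitted...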
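Because the XML dump can be very large, it is usually better to stream it than to load it whole. Here is a minimal Python sketch using the standard library's iterparse. The element names (inode, type, and so on) are assumed to follow the sample in the official documentation, SAMPLE is a small stand-in for a real fsimage.xml, and count_inode_types is a name invented for this example.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Small stand-in for fsimage.xml; element names are assumed to match the
# sample in the official Offline Image Viewer documentation.
SAMPLE = """<?xml version="1.0"?>
<fsimage>
<INodeSection>
<lastInodeId>16418</lastInodeId>
<inode><id>16385</id><type>DIRECTORY</type><name></name>
<mtime>1412832662162</mtime><permission>hadoop:supergroup:rwxr-xr-x</permission>
<nsquota>9223372036854775807</nsquota><dsquota>-1</dsquota></inode>
<inode><id>16386</id><type>DIRECTORY</type><name>user</name>
<mtime>1413795010372</mtime><permission>hadoop:supergroup:rwxr-xr-x</permission>
<nsquota>-1</nsquota><dsquota>-1</dsquota></inode>
</INodeSection>
</fsimage>
"""


def count_inode_types(source) -> Counter:
    """Stream through the XML and count <inode> elements by their <type>."""
    counts = Counter()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "inode":
            counts[elem.findtext("type", default="UNKNOWN")] += 1
            elem.clear()  # drop the inode's children to keep memory use low
    return counts


counts = count_inode_types(io.BytesIO(SAMPLE.encode("utf-8")))
print(dict(counts))
```

For a real dump, pass the file name instead of the in-memory buffer, for example count_inode_types("fsimage.xml").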





FileDistribution is the tool for analyzing file sizes in the namespace image. To run it, define a range of integers [0, maxSize] by specifying maxSize and a step. The range is divided into segments of size step: [0, s[1], ..., s[n-1], maxSize], and the processor counts how many files in the system fall into each segment [s[i-1], s[i]). Files larger than maxSize always fall into the very last segment. The output file is a tab-separated table with two columns, Size and NumFiles, where Size identifies the segment and NumFiles is the number of files in the image whose size falls in that segment. The official documentation describes Size as the start of the segment; in the sample output below, however, the largest file (maxFileSize = 217793184) is counted under Size 218103808, which suggests that this version labels each row with the segment's upper bound. The processor parameters are given with the -maxSize and -step options. If they are not specified they default to 0, in which case the processor uses its built-in defaults (128 GB and 2 MB respectively, according to the official documentation).
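The bucketing rule can be sketched in a few lines of Python. This is an illustration, not the Hadoop implementation: segment_label is a hypothetical name, the default values are the ones documented for -maxSize and -step, and the sketch assumes each row is labelled with the upper bound of its segment, which is inferred from the sample output in this guide (the 217793184-byte file appears under Size 218103808).

```python
import math

DEFAULT_MAX_SIZE = 128 * 1024**3   # 128 GB, the documented default for -maxSize
DEFAULT_STEP = 2 * 1024**2         # 2 MB, the documented default for -step


def segment_label(file_size: int, max_size: int = DEFAULT_MAX_SIZE,
                  step: int = DEFAULT_STEP) -> int:
    """Return the Size value of the row a file of this size is counted under.

    Assumption: rows are labelled with the upper bound of their segment, and
    files larger than max_size are counted in the last segment.
    """
    last = max_size // step                      # index of the last segment
    index = last if file_size > max_size else math.ceil(file_size / step)
    return min(index, last) * step


print(segment_label(0))            # empty files
print(segment_label(217793184))    # largest file in the sample output
```

Under this assumption, empty files land in the row labelled 0 and any file of 1 byte to 2 MB lands in the row labelled 2097152.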

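hdfs oiv -i fsimage_0000000000000154113 -o fsimage.fd -p FileDistribution maxSize 1000 step 5
Note: as typed above, maxSize and step lack their leading hyphens (the options are -maxSize 1000 -step 5), so they seem to have been ignored: the output that follows is bucketed in 2097152-byte (2 MB) steps.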

Processed 0 inodes.
Size    NumFiles
0       24
2097152 82
4194304 2
6291456 1
10485760        2
14680064        1
31457280        1
48234496        1
52428800        1
54525952        1
92274688        1
102760448       1
113246208       1
134217728       1
136314880       1
218103808       5
totalFiles = 126
totalDirectories = 226
totalBlocks = 108
totalSpace = 5686475517
maxFileSize = 217793184
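A quick consistency check on this output is to confirm that the NumFiles column adds up to totalFiles. The Python sketch below parses the text shown above; parse_file_distribution is a name invented for this example, and the layout it expects (two numeric columns followed by "key = value" summary lines) is taken from this sample rather than from a format specification.

```python
def parse_file_distribution(text: str):
    """Split FileDistribution output into (histogram, totals)."""
    histogram, totals = {}, {}
    for line in text.splitlines():
        if "=" in line:                        # summary line, e.g. "totalFiles = 126"
            key, value = (part.strip() for part in line.split("=", 1))
            totals[key] = int(value)
            continue
        fields = line.split()
        if len(fields) == 2 and all(f.isdigit() for f in fields):
            histogram[int(fields[0])] = int(fields[1])   # Size -> NumFiles
    return histogram, totals


# Output copied from the example above.
sample = """Processed 0 inodes.
Size    NumFiles
0       24
2097152 82
4194304 2
6291456 1
10485760        2
14680064        1
31457280        1
48234496        1
52428800        1
54525952        1
92274688        1
102760448       1
113246208       1
134217728       1
136314880       1
218103808       5
totalFiles = 126
totalDirectories = 226
totalBlocks = 108
totalSpace = 5686475517
maxFileSize = 217793184
"""

histogram, totals = parse_file_distribution(sample)
print(sum(histogram.values()), totals["totalFiles"])
```

Both printed numbers should agree; a mismatch would point to a truncated or hand-edited output file.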




