《Hadoop The Definitive Guide》ch03 The Hadoop Distributed Filesystem

1. HDFS

1.1 block

1.2 namenode and datanode


2. Command-line examples

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop fsck / -files -blocks
FSCK started by nomad2 from /35.252.129.105 for path / at Sun Jul 01 22:54:16 CST 2012
/ <dir>
/tmp <dir>
/tmp/hadoop-nomad2 <dir>
/tmp/hadoop-nomad2/mapred <dir>
/tmp/hadoop-nomad2/mapred/system <dir>
/tmp/hadoop-nomad2/mapred/system/jobtracker.info 4 bytes, 1 block(s):  OK
0. blk_-2032187654688024062_1002 len=4 repl=1

/user <dir>
/user/nomad2 <dir>
/user/nomad2/bin <dir>
/user/nomad2/bin/max_temperature 229196 bytes, 1 block(s):  OK
0. blk_-2537695414273301264_1003 len=229196 repl=1

/user/nomad2/sample.txt 529 bytes, 1 block(s):  OK
0. blk_4831755025848443879_1004 len=529 repl=1

Status: HEALTHY
 Total size:    229729 B
 Total dirs:    8
 Total files:   3
 Total blocks (validated):      3 (avg. block size 76576 B)
 Minimally replicated blocks:   3 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    1
 Average block replication:     1.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          1
 Number of racks:               1
FSCK ended at Sun Jul 01 22:54:16 CST 2012 in 18 milliseconds


The filesystem under path '/' is HEALTHY
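
The summary figures can be cross-checked against the per-file lines of the fsck report. Below is a minimal sketch, assuming the reported average block size is the total size divided by the number of validated blocks, rounded down. That assumption matches this output but is not stated anywhere in these notes, and the class and method names are made up for illustration.

```java
final class FsckSummaryCheck {
    // Sum the per-file sizes printed by fsck.
    static long totalSize(long[] fileSizes) {
        long sum = 0;
        for (long size : fileSizes) {
            sum += size;
        }
        return sum;
    }

    // Assumption: average block size = total size / block count, using integer division.
    static long averageBlockSize(long totalSize, long totalBlocks) {
        return totalSize / totalBlocks;
    }

    public static void main(String[] args) {
        // Sizes of jobtracker.info, max_temperature and sample.txt from the report.
        long[] sizes = {4L, 229196L, 529L};
        long total = totalSize(sizes);
        System.out.println(total);                      // prints 229729
        System.out.println(averageBlockSize(total, 3)); // prints 76576
    }
}
```

Both values agree with the "Total size" and "avg. block size" lines in the report.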

copy to HDFS

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop fs -copyFromLocal input/docs/quangle.txt hdfs://localhost/user/tom/quangle.txt

mkdir

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop fs -mkdir books   

ls

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop fs -ls .          
Found 3 items
drwxr-xr-x   - nomad2 supergroup          0 2012-07-01 22:47 /user/nomad2/bin
drwxr-xr-x   - nomad2 supergroup          0 2012-07-01 23:00 /user/nomad2/books
-rw-r--r--   1 nomad2 supergroup        529 2012-07-01 22:48 /user/nomad2/sample.txt

Filesystem API demo

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop URLCat hdfs://localhost/user/tom/quangle.txt
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop FileSystemCat hdfs://localhost/user/tom/quangle.txt
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop FileSystemDoubleCat hdfs://localhost/user/tom/quangle.txt
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.
On the top of the Crumpetty Tree
The Quangle Wangle sat,
But his face you could not see,
On account of his Beaver Hat.

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop FileCopyWithProgress input/docs/1400-8.txt hdfs://localhost/user/nomad2/1400-8.txt
...............[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> 

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop ListStatus /user/nomad2
hdfs://localhost/user/nomad2/1400-8.txt
hdfs://localhost/user/nomad2/bin
hdfs://localhost/user/nomad2/books
hdfs://localhost/user/nomad2/sample.txt

Treat the network as a tree: the distance between two nodes is the sum of their distances to their closest common ancestor.
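
The tree-distance rule can be sketched in a few lines of Java. This is an illustration only, not Hadoop's own implementation. It assumes each node is written as a path of the form /datacenter/rack/node (for example /d1/r1/n1), a notation that does not appear in these notes, and the class and method names are hypothetical.

```java
import java.util.Arrays;

final class NetworkDistance {
    // Split a location such as "/d1/r1/n1" into its components, dropping empty parts.
    static String[] components(String path) {
        return Arrays.stream(path.split("/"))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    // Distance = hops from a up to the closest common ancestor
    //          + hops from b up to that same ancestor.
    static int distance(String a, String b) {
        String[] pa = components(a);
        String[] pb = components(b);
        int common = 0;
        while (common < pa.length && common < pb.length
                && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    public static void main(String[] args) {
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n1")); // prints 0 (same node)
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // prints 2 (same rack)
        System.out.println(distance("/d1/r1/n1", "/d1/r2/n3")); // prints 4 (same data center)
        System.out.println(distance("/d1/r1/n1", "/d2/r3/n4")); // prints 6 (different data centers)
    }
}
```
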
MiniDFSCluster is in the hadoop test jar.


Archive files (HAR)

[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop archive -archiveName files.har /user/nomad2 /user 
12/07/02 00:03:51 INFO mapred.JobClient: Running job: job_201207012246_0002
12/07/02 00:03:52 INFO mapred.JobClient:  map 0% reduce 0%
12/07/02 00:04:05 INFO mapred.JobClient:  map 100% reduce 0%
12/07/02 00:04:17 INFO mapred.JobClient:  map 100% reduce 100%
12/07/02 00:04:22 INFO mapred.JobClient: Job complete: job_201207012246_0002
12/07/02 00:04:22 INFO mapred.JobClient: Counters: 25
12/07/02 00:04:22 INFO mapred.JobClient:   Job Counters 
12/07/02 00:04:22 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/02 00:04:22 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=11796
12/07/02 00:04:22 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
12/07/02 00:04:22 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
12/07/02 00:04:22 INFO mapred.JobClient:     Launched map tasks=1
12/07/02 00:04:22 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=10003
12/07/02 00:04:22 INFO mapred.JobClient:   File Input Format Counters 
12/07/02 00:04:22 INFO mapred.JobClient:     Bytes Read=685
12/07/02 00:04:22 INFO mapred.JobClient:   File Output Format Counters 
12/07/02 00:04:22 INFO mapred.JobClient:     Bytes Written=0
12/07/02 00:04:22 INFO mapred.JobClient:   FileSystemCounters
12/07/02 00:04:22 INFO mapred.JobClient:     FILE_BYTES_READ=412
12/07/02 00:04:22 INFO mapred.JobClient:     HDFS_BYTES_READ=1264310
12/07/02 00:04:22 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44351
12/07/02 00:04:22 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1263857
12/07/02 00:04:22 INFO mapred.JobClient:   Map-Reduce Framework
12/07/02 00:04:22 INFO mapred.JobClient:     Map output materialized bytes=412
12/07/02 00:04:22 INFO mapred.JobClient:     Map input records=8
12/07/02 00:04:22 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/07/02 00:04:22 INFO mapred.JobClient:     Spilled Records=16
12/07/02 00:04:22 INFO mapred.JobClient:     Map output bytes=390
12/07/02 00:04:22 INFO mapred.JobClient:     Map input bytes=599
12/07/02 00:04:22 INFO mapred.JobClient:     Combine input records=0
12/07/02 00:04:22 INFO mapred.JobClient:     SPLIT_RAW_BYTES=129
12/07/02 00:04:22 INFO mapred.JobClient:     Reduce input records=8
12/07/02 00:04:22 INFO mapred.JobClient:     Reduce input groups=8
12/07/02 00:04:22 INFO mapred.JobClient:     Combine output records=0
12/07/02 00:04:22 INFO mapred.JobClient:     Reduce output records=0
12/07/02 00:04:22 INFO mapred.JobClient:     Map output records=8

>> hadoop FileSystemCat /user/files.har/_index                      
/ dir none 0 0 user 
/user dir none 0 0 nomad2 
/user/nomad2/sample.txt file part-0 1262947 529 
/user/nomad2/books dir none 0 0 
/user/nomad2/1400-8.txt file part-0 0 1033751 
/user/nomad2 dir none 0 0 1400-8.txt bin books sample.txt 
/user/nomad2/bin/max_temperature file part-0 1033751 229196 
/user/nomad2/bin dir none 0 0 max_temperature 
[ate: /local/nomad2/hadoop/tomwhite-hadoop-book-32dae01 ]
>> hadoop FileSystemCat /user/files.har/_masterindex
1 
0 1816954122 0 358 
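
The _index listing suggests how the archive lays files out inside part-0. Below is a minimal sketch that parses the "file" lines and checks that the entries sit back to back. The field meanings (part file name, start offset, length) are inferred from the listing above, not taken from a format specification. The parser assumes paths contain no spaces, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class HarIndexSketch {
    // One "file" line of _index: <path> file <part> <offset> <length> (inferred layout).
    static final class FileEntry {
        final String path;
        final String part;
        final long offset;
        final long length;

        FileEntry(String path, String part, long offset, long length) {
            this.path = path;
            this.part = part;
            this.offset = offset;
            this.length = length;
        }
    }

    // Keep only "file" lines, skip "dir" lines, and sort by offset within the part file.
    static List<FileEntry> parseFiles(List<String> lines) {
        List<FileEntry> files = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length >= 5 && f[1].equals("file")) {
                files.add(new FileEntry(f[0], f[2], Long.parseLong(f[3]), Long.parseLong(f[4])));
            }
        }
        files.sort(Comparator.comparingLong((FileEntry e) -> e.offset));
        return files;
    }

    // True if each entry starts exactly where the previous one ended, beginning at 0.
    static boolean isContiguous(List<FileEntry> sorted) {
        long next = 0;
        for (FileEntry e : sorted) {
            if (e.offset != next) {
                return false;
            }
            next = e.offset + e.length;
        }
        return true;
    }

    static long totalLength(List<FileEntry> files) {
        long sum = 0;
        for (FileEntry e : files) {
            sum += e.length;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<FileEntry> files = parseFiles(Arrays.asList(
                "/user/nomad2/sample.txt file part-0 1262947 529 ",
                "/user/nomad2/books dir none 0 0 ",
                "/user/nomad2/1400-8.txt file part-0 0 1033751 ",
                "/user/nomad2/bin/max_temperature file part-0 1033751 229196 "));
        System.out.println(isContiguous(files)); // prints true
        System.out.println(totalLength(files));  // prints 1263476
    }
}
```

The lengths 529 and 229196 match the sizes fsck reported for sample.txt and max_temperature, which supports reading the last two fields as offset and length.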
