(4-2) Block (data block)

The block is the most basic unit of storage in HDFS.


When an HDFS Client uploads data to HDFS, it first buffers the data locally. Once the buffered data reaches one block in size, the client asks the NameNode to allocate a Block. The NameNode replies with the addresses of the DataNodes that will hold the block, and the client then talks to those DataNodes directly, writing the data into a block file on the DataNode.
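
You can watch a file being split into blocks with hdfs fsck. A minimal sketch (big.log and the /input path are hypothetical names for illustration):

[root@i-love-you hadoop]# bin/hdfs dfs -put big.log /input/big.log
[root@i-love-you hadoop]# bin/hdfs fsck /input/big.log -files -blocks

fsck lists every block of the file with its block ID and length; a file larger than dfs.blocksize shows up as multiple blocks.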



Setting the block size in hdfs-site.xml:

<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>  <!-- 128 MB -->
</property>

The local directory where a DataNode keeps its block files is set by dfs.datanode.data.dir in the same file.
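
To check the value a running cluster actually uses, hdfs getconf can query a configuration key; given the configuration above it prints the configured size:

[root@i-love-you hadoop]# bin/hdfs getconf -confKey dfs.blocksize
134217728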



Block metadata (.meta)
On the DataNode's local file system, each block is stored as a block file plus a .meta file containing the block's checksums, under a path such as:
/usr/local/mydata/dfs/data/current/BP-1476006134-192.168.1.10-1427374210743/current/finalized/subdir0/subdir0

For example:
-rw-r--r--. 1 root root  33574 3月  29 18:07 blk_1073741851
-rw-r--r--. 1 root root    271 3月  29 18:07 blk_1073741851_1027.meta
-rw-r--r--. 1 root root 103997 3月  29 18:07 blk_1073741852
-rw-r--r--. 1 root root    823 3月  29 18:07 blk_1073741852_1028.meta


If you delete the files from HDFS, the block files in this directory are removed as well.


HDFS storage necessarily relies on the operating system's file management.

In other words, HDFS is a file-management layer built on top of the local file systems of the machines in the cluster.


Data copied from the Linux file system into HDFS is stored uncompressed, which you can verify by looking at the block files and their sizes.
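
A quick check (sample.txt is a hypothetical file name): the byte count reported by the local ls -l matches the size shown by hdfs dfs -ls, and it also matches the size of the corresponding block file on the DataNode.

[root@i-love-you hadoop]# ls -l sample.txt
[root@i-love-you hadoop]# bin/hdfs dfs -put sample.txt /input/
[root@i-love-you hadoop]# bin/hdfs dfs -ls /input/sample.txt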




Replica management on the DataNodes
The replication factor is set by dfs.replication in hdfs-site.xml (the default is 3):

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
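
The replication factor of a file that already exists can be changed with hdfs dfs -setrep; a sketch (the path reuses /dir/a.txt from the archiving example below):

[root@i-love-you hadoop]# bin/hdfs dfs -setrep -w 2 /dir/a.txt

The -w flag makes the command wait until replication actually reaches the requested level.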




Data storage: Replication Pipelining
Assume dfs.replication=3.
When the HDFS Client uploads data, it requests a block from the NameNode, and the NameNode returns the addresses of three DataNodes. The client writes the data into a block on the first DataNode; the first DataNode streams the data on to the second, and the second streams it to the third, forming a replication pipeline.
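
Where the replicas of each block ended up can be inspected with fsck (the path is assumed, as above):

[root@i-love-you hadoop]# bin/hdfs fsck /input/big.log -files -blocks -locations

With -locations, fsck prints the DataNode addresses holding each replica, so with dfs.replication=3 every block should list three locations.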




HDFS file archiving (HAR):
Archiving merges small HDFS files into a single archive. Suppose the directory /dir holds two small files:
/dir
/dir/a.txt
/dir/b.txt


[root@i-love-you hadoop]# bin/hdfs dfs -ls /dir
Found 2 items
-rw-r--r--   1 root supergroup         13 2015-03-30 20:49 /dir/a.txt
-rw-r--r--   1 root supergroup         18 2015-03-30 20:49 /dir/b.txt


Create the archive (this runs as a MapReduce job; the -p flag, which names the parent path of the files being archived, is required):




[root@i-love-you hadoop]# bin/hadoop archive -archiveName c.har -p /dir /dest
15/03/30 21:02:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/03/30 21:02:45 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/03/30 21:02:45 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/03/30 21:02:47 INFO mapreduce.JobSubmitter: number of splits:1
15/03/30 21:02:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1427717039268_0001
15/03/30 21:02:52 INFO impl.YarnClientImpl: Submitted application application_1427717039268_0001
15/03/30 21:02:54 INFO mapreduce.Job: The url to track the job: http://i-love-you:8088/proxy/application_1427717039268_0001/
15/03/30 21:02:54 INFO mapreduce.Job: Running job: job_1427717039268_0001
15/03/30 21:03:36 INFO mapreduce.Job: Job job_1427717039268_0001 running in uber mode : false
15/03/30 21:03:36 INFO mapreduce.Job:  map 0% reduce 0%
15/03/30 21:04:25 INFO mapreduce.Job:  map 100% reduce 0%
15/03/30 21:04:53 INFO mapreduce.Job:  map 100% reduce 100%
15/03/30 21:04:55 INFO mapreduce.Job: Job job_1427717039268_0001 completed successfully
15/03/30 21:04:57 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=206
                FILE: Number of bytes written=214211
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=418
                HDFS: Number of bytes written=236
                HDFS: Number of read operations=17
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=7
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=48204
                Total time spent by all reduces in occupied slots (ms)=20795
                Total time spent by all map tasks (ms)=48204
                Total time spent by all reduce tasks (ms)=20795
                Total vcore-seconds taken by all map tasks=48204
                Total vcore-seconds taken by all reduce tasks=20795
                Total megabyte-seconds taken by all map tasks=49360896
                Total megabyte-seconds taken by all reduce tasks=21294080
        Map-Reduce Framework
                Map input records=3
                Map output records=3
                Map output bytes=194
                Map output materialized bytes=206
                Input split bytes=116
                Combine input records=0
                Combine output records=0
                Reduce input groups=3
                Reduce shuffle bytes=206
                Reduce input records=3
                Reduce output records=0
                Spilled Records=6
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=467
                CPU time spent (ms)=3810
                Physical memory (bytes) snapshot=293769216
                Virtual memory (bytes) snapshot=1690853376
                Total committed heap usage (bytes)=136450048
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=271
        File Output Format Counters
                Bytes Written=0





List the archive's contents:
[root@i-love-you hadoop]# bin/hadoop fs -ls -R har:///dest/c.har
-rw-r--r--   1 root supergroup         13 2015-03-30 20:49 har:///dest/c.har/a.txt
-rw-r--r--   1 root supergroup         18 2015-03-30 20:49 har:///dest/c.har/b.txt
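
Files inside the archive can also be read directly through the har:// scheme, for example:

[root@i-love-you hadoop]# bin/hdfs dfs -cat har:///dest/c.har/a.txt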




The archive itself shows up as a directory under /dest:

[root@i-love-you hadoop]# bin/hdfs dfs -ls /dest
Found 1 items
drwxr-xr-x   - root supergroup          0 2015-03-30 21:04 /dest/c.har
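
There is no dedicated un-archive command; to extract the files, copy them out of the har:// file system with an ordinary copy, or with distcp for large archives (the destination /extracted is an assumed path):

[root@i-love-you hadoop]# bin/hdfs dfs -cp har:///dest/c.har /extracted
[root@i-love-you hadoop]# bin/hadoop distcp har:///dest/c.har /extracted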



