Using DistCp with HDFS

Usage:

hadoop distcp OPTIONS [source_path...] <target_path>
Option	Description
-append	Reuse existing data in the target files and append new data to them where possible
-atomic	Commit all changes or none
-bandwidth	Specify the bandwidth per map, in MB/second
-blocksperchunk	If set to a positive value, files with more blocks than this value are split into chunks that are transferred in parallel and reassembled on the target. By default the value is 0 and files are transferred whole, without splitting. This switch applies only when the source file system implements the getBlockLocations method and the target file system implements the concat method
-delete	Delete files in the target that are missing from the source
-diff	Use a snapshot diff report to identify the differences between source and target
-f	List of files to be copied
-filelimit	(Deprecated) Limit the number of files copied to <= n
-filters	Exclude files from the copy list
-i	Ignore failures during the copy
-log	Path where the DistCp execution log is saved
-m	Maximum number of maps
-mapredSslConf	Configuration for the SSL config file, to be used with hftps://; must be on the classpath
-numListstatusThreads	Number of threads used to build the file listing (at most 40)
-overwrite	Overwrite target files unconditionally, even if they already exist
-p	Preserve status (rbugpcaxt): replication, block size, user, group, permissions, checksum type, ACLs, XATTRs, timestamps
-rdiff	Use the snapshot diff report of the target to identify changes made on the target
-sizelimit	(Deprecated) Limit the total size copied to <= n bytes
-skipcrccheck	Whether to skip CRC checks between source and target paths
-strategy	Copy strategy to use. By default, work is divided based on file size
-tmp	Intermediate work path for the atomic commit
-update	Update the target, copying only missing files or directories
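The -m and -strategy options interact: with the default uniformsize strategy, DistCp groups files into splits so that each map copies roughly the same number of bytes, using at most -m maps. A minimal sketch of that grouping idea (a simplification for illustration, not Hadoop's actual UniformSizeInputFormat code; the file sizes below are hypothetical):

```python
def uniform_size_splits(file_sizes, max_maps):
    """Group files into at most max_maps splits of roughly equal total bytes.

    Simplified sketch of DistCp's default 'uniformsize' strategy: files are
    assigned in listing order, and a split is closed once it reaches the
    per-map byte target.
    """
    total = sum(file_sizes)
    target = max(1, total // max_maps)  # ideal bytes per map
    splits, current, current_bytes = [], [], 0
    for size in file_sizes:
        current.append(size)
        current_bytes += size
        # Close this split once it holds enough bytes, unless it is the last
        # split allowed (which must absorb everything remaining).
        if current_bytes >= target and len(splits) < max_maps - 1:
            splits.append(current)
            current, current_bytes = [], 0
    if current:
        splits.append(current)
    return splits

# Hypothetical file sizes in MB; ask for at most 3 maps.
print(uniform_size_splits([100, 100, 300, 50, 250, 200], max_maps=3))
# → [[100, 100, 300], [50, 250, 200]]
```

Note that fewer splits than max_maps can result (here 2 rather than 3), which matches DistCp's behavior: -m is an upper bound, not a guarantee.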
Example: pull /tmp/hbase/test from a remote NameNode into the local cluster:

[root@cdh01:~]# hadoop distcp hdfs://192.168.1.11:8020/tmp/hbase/test/ /tmp/hbase/test/
19/05/29 10:01:49 INFO tools.OptionsParser: parseChunkSize: blocksperchunk false
19/05/29 10:01:50 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[hdfs://192.168.1.11:8020/tmp/hbase/test], targetPath=/tmp/hbase/test, targetPathExists=false, filtersFile='null', blocksPerChunk=0}
19/05/29 10:01:50 INFO client.RMProxy: Connecting to ResourceManager at cdh01/192.168.1.101:8032
19/05/29 10:01:51 INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 20; dirCnt = 1
19/05/29 10:01:51 INFO tools.SimpleCopyListing: Build file listing completed.
19/05/29 10:01:51 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
19/05/29 10:01:51 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
19/05/29 10:01:51 INFO tools.DistCp: Number of paths in the copy list: 20
19/05/29 10:01:51 INFO client.RMProxy: Connecting to ResourceManager at cdh01/192.168.1.101:8032
19/05/29 10:01:51 INFO mapreduce.JobSubmitter: number of splits:11
19/05/29 10:01:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1559091338325_0002
19/05/29 10:01:52 INFO impl.YarnClientImpl: Submitted application application_1559091338325_0002
19/05/29 10:01:52 INFO mapreduce.Job: The url to track the job: http://cdh01:8088/proxy/application_1559091338325_0002/
19/05/29 10:01:52 INFO tools.DistCp: DistCp job-id: job_1559091338325_0002
19/05/29 10:01:52 INFO mapreduce.Job: Running job: job_1559091338325_0002
19/05/29 10:01:58 INFO mapreduce.Job: Job job_1559091338325_0002 running in uber mode : false
19/05/29 10:01:58 INFO mapreduce.Job:  map 0% reduce 0%
19/05/29 10:02:03 INFO mapreduce.Job:  map 9% reduce 0%
19/05/29 10:02:04 INFO mapreduce.Job:  map 18% reduce 0%
19/05/29 10:02:07 INFO mapreduce.Job:  map 27% reduce 0%
19/05/29 10:02:08 INFO mapreduce.Job:  map 36% reduce 0%
19/05/29 10:02:09 INFO mapreduce.Job:  map 45% reduce 0%
19/05/29 10:02:14 INFO mapreduce.Job:  map 55% reduce 0%
19/05/29 10:02:16 INFO mapreduce.Job:  map 64% reduce 0%
19/05/29 10:02:18 INFO mapreduce.Job:  map 82% reduce 0%
19/05/29 10:02:20 INFO mapreduce.Job:  map 91% reduce 0%
19/05/29 10:02:23 INFO mapreduce.Job:  map 100% reduce 0%
19/05/29 10:04:33 INFO mapreduce.Job: Job job_1559091338325_0002 completed successfully
19/05/29 10:04:34 INFO mapreduce.Job: Counters: 33
	File System Counters
		FILE: Number of bytes read=0
		FILE: Number of bytes written=1638715
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=16936022957
		HDFS: Number of bytes written=16936015418
		HDFS: Number of read operations=234
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=61
	Job Counters 
		Launched map tasks=11
		Other local map tasks=11
		Total time spent by all maps in occupied slots (ms)=679786
		Total time spent by all reduces in occupied slots (ms)=0
		Total time spent by all map tasks (ms)=679786
		Total vcore-milliseconds taken by all map tasks=679786
		Total megabyte-milliseconds taken by all map tasks=1044151296
	Map-Reduce Framework
		Map input records=20
		Map output records=0
		Input split bytes=1265
		Spilled Records=0
		Failed Shuffles=0
		Merged Map outputs=0
		GC time elapsed (ms)=1804
		CPU time spent (ms)=133510
		Physical memory (bytes) snapshot=3378626560
		Virtual memory (bytes) snapshot=33501384704
		Total committed heap usage (bytes)=2952790016
	File Input Format Counters 
		Bytes Read=6274
	File Output Format Counters 
		Bytes Written=0
	DistCp Counters
		Bytes Copied=16936015418
		Bytes Expected=16936015418
		Files Copied=20
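The DistCp Counters section at the end is the quickest sanity check on a run: Bytes Copied should equal Bytes Expected, and Files Copied should match the copy-list count. A small sketch that pulls those counters out of a captured log text, so a wrapper script can fail fast on an incomplete copy (the regex reflects the counter layout shown above; this is an assumption about log formatting, not a Hadoop API):

```python
import re

# Excerpt of the counter block printed at the end of the run above.
LOG = """\
	DistCp Counters
		Bytes Copied=16936015418
		Bytes Expected=16936015418
		Files Copied=20
"""

def parse_distcp_counters(text):
    """Extract the DistCp 'name=value' counter lines into a dict of ints."""
    counters = {}
    for name, value in re.findall(
        r"(Bytes Copied|Bytes Expected|Files Copied)=(\d+)", text
    ):
        counters[name] = int(value)
    return counters

c = parse_distcp_counters(LOG)
assert c["Bytes Copied"] == c["Bytes Expected"], "incomplete copy"
print(c)
# → {'Bytes Copied': 16936015418, 'Bytes Expected': 16936015418, 'Files Copied': 20}
```

In practice you would feed this the captured stdout of the distcp command (for example via subprocess or a saved -log file) rather than a hard-coded string.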
