Hadoop 2.7: using distcp to migrate and copy data between clusters


This post walks through migrating data between Hadoop clusters. It is a requirement you may well run into at work, so it is quite practical.

Hadoop ships with the distcp tool for exactly this; see the official documentation for the full usage details:
https://hadoop.apache.org/docs/r2.7.7/hadoop-distcp/DistCp.html#Command_Line_Options

Suppose the directory /hbase/data/default/ on Hadoop cluster 1 contains LBLTEST, and we want to copy it over to Hadoop cluster 2.

Command: ./hadoop distcp hdfs://<cluster-1-namenode>:9000/hbase/data/default/LBLTEST hdfs://<cluster-2-namenode>:9000/hbase/data/default/LBLTEST
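When re-running a migration, distcp's -update flag (copy only files that differ on the target) and -m flag (cap the number of map tasks) are often useful; both are described in the documentation linked above. A minimal sketch that assembles such a command — the addresses are the same example clusters as in this post, and the command is echoed rather than executed so the sketch is safe to run anywhere:

```shell
#!/bin/sh
# Example NameNode addresses from this post -- substitute your own clusters.
SRC="hdfs://172.16.48.195:9000/hbase/data/default/LBLTEST"
DST="hdfs://172.16.50.28:9000/hbase/data/default/LBLTEST"

# -update : copy only files whose size/checksum differ on the target
# -m 10   : limit the copy job to 10 map tasks
CMD="hadoop distcp -update -m 10 $SRC $DST"

# Print the command instead of running it; on a real cluster you would
# execute it directly from $HADOOP_HOME/bin.
echo "$CMD"
```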

$ ./hadoop distcp hdfs://172.16.48.195:9000/hbase/data/default/LBLTEST hdfs://172.16.50.28:9000/hbase/data/default/LBLTEST
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/data/program/hadoop-2.7.2/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/data/program/alluxio-1.7.0-hadoop-2.7/client/alluxio-1.7.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
19/07/31 13:11:33 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[hdfs://172.16.48.195:9000/hbase/data/default/LBLTEST], targetPath=hdfs://172.16.50.28:9000/hbase/data/default/LBLTEST, targetPathExists=false, preserveRawXattrs=false}
19/07/31 13:11:33 INFO client.RMProxy: Connecting to ResourceManager at /172.16.48.195:8032
19/07/31 13:11:34 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
19/07/31 13:11:34 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
19/07/31 13:11:34 INFO client.RMProxy: Connecting to ResourceManager at /172.16.48.195:8032
19/07/31 13:11:35 INFO mapreduce.JobSubmitter: number of splits:4
19/07/31 13:11:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1545880014947_4110
19/07/31 13:11:35 INFO impl.YarnClientImpl: Submitted application application_1545880014947_4110
19/07/31 13:11:35 INFO mapreduce.Job: The url to track the job: http://172.16.48.195:20888/proxy/application_1545880014947_4110/
19/07/31 13:11:35 INFO tools.DistCp: DistCp job-id: job_1545880014947_4110
19/07/31 13:11:35 INFO mapreduce.Job: Running job: job_1545880014947_4110
19/07/31 13:11:45 INFO mapreduce.Job: Job job_1545880014947_4110 running in uber mode : false
19/07/31 13:11:45 INFO mapreduce.Job: map 0% reduce 0%
19/07/31 13:11:51 INFO mapreduce.Job: map 50% reduce 0%
19/07/31 13:11:52 INFO mapreduce.Job: map 75% reduce 0%
19/07/31 13:12:14 INFO mapreduce.Job: map 100% reduce 0%
19/07/31 13:12:15 INFO mapreduce.Job: Job job_1545880014947_4110 completed successfully
19/07/31 13:12:15 INFO mapreduce.Job: Counters: 33
 File System Counters
  FILE: Number of bytes read=0
  FILE: Number of bytes written=506418
  FILE: Number of read operations=0
  FILE: Number of large read operations=0
  FILE: Number of write operations=0
  HDFS: Number of bytes read=10368
  HDFS: Number of bytes written=6478
  HDFS: Number of read operations=87
  HDFS: Number of large read operations=0
  HDFS: Number of write operations=24
 Job Counters 
  Launched map tasks=4
  Other local map tasks=4
  Total time spent by all maps in occupied slots (ms)=39699
  Total time spent by all reduces in occupied slots (ms)=0
  Total time spent by all map tasks (ms)=39699
  Total vcore-milliseconds taken by all map tasks=39699
  Total megabyte-milliseconds taken by all map tasks=40651776
 Map-Reduce Framework
  Map input records=11
  Map output records=0
  Input split bytes=544
  Spilled Records=0
  Failed Shuffles=0
  Merged Map outputs=0
  GC time elapsed (ms)=239
  CPU time spent (ms)=3510
  Physical memory (bytes) snapshot=1122856960
  Virtual memory (bytes) snapshot=12267859968
  Total committed heap usage (bytes)=2974810112
 File Input Format Counters 
  Bytes Read=3346
 File Output Format Counters 
  Bytes Written=0
 org.apache.hadoop.tools.mapred.CopyMapper$Counter
  BYTESCOPIED=6478
  BYTESEXPECTED=6478
  COPY=11

The command above launches a MapReduce job to perform the copy.
Once the job completes, the data is visible on cluster 2:
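Besides listing the target directory, a quick sanity check is to compare `hdfs dfs -count` on both sides. The sketch below shows only the comparison step, parsing two hard-coded sample output lines (the 11 files / 6478 bytes match the job counters above); on a live cluster you would feed it the real `hdfs dfs -count` output instead:

```shell
#!/bin/sh
# On a live cluster you would run, for example:
#   hdfs dfs -count hdfs://172.16.48.195:9000/hbase/data/default/LBLTEST
#   hdfs dfs -count hdfs://172.16.50.28:9000/hbase/data/default/LBLTEST
# Each prints: DIR_COUNT  FILE_COUNT  CONTENT_SIZE  PATH
# Sample lines standing in for that output (sizes from the job counters):
SRC_LINE="           4           11               6478 /hbase/data/default/LBLTEST"
DST_LINE="           4           11               6478 /hbase/data/default/LBLTEST"

# Extract the CONTENT_SIZE column from each line and compare.
src_bytes=$(echo "$SRC_LINE" | awk '{print $3}')
dst_bytes=$(echo "$DST_LINE" | awk '{print $3}')

if [ "$src_bytes" = "$dst_bytes" ]; then
  RESULT="match"
else
  RESULT="mismatch"
fi
echo "source=$src_bytes target=$dst_bytes -> $RESULT"
```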
