The difference between -put and -copyFromLocal in Hadoop

See the Stack Overflow link below.

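In short, -put is more permissive: it can copy files from either the local filesystem or HDFS into HDFS, whereas -copyFromLocal is stricter and only copies local files into HDFS.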


???

PS: "put would prefer the HDFS scheme instead of the local file system". In other words, if the same path exists both locally and on HDFS, -put tends to take the HDFS source first.

But I tested it:

hadoop fs -put hdfs:///tmp/hive-XXX/test.txt /user/XXX/test.txt.hdfs

hadoop fs -put /tmp/hive-XXX/test.txt /user/XXX/test.txt.local       

hadoop fs -cat /user/XXX/test.txt.*    


local path:/tmp/hive-XXX
local path:/tmp/hive-XXX


So... in my test both copies carried the local file's content, which does not match that claim.


Link: http://stackoverflow.com/questions/7811284/difference-between-hadoop-fs-put-and-hadoop-fs-copyfromlocal

——————————————————————————————————————————————

Difference between hadoop fs -put and hadoop fs -copyFromLocal


-put and -copyFromLocal are documented as identical, while most examples use the verbose variant -copyFromLocal. Why?

Same thing for -get and -copyToLocal


2 Answers

Accepted answer (27 votes):
  • copyFromLocal is similar to the put command, except that the source is restricted to a local file reference.

So, basically you can do with put, all that you do with copyFromLocal, but not vice-versa.

Similarly,

  • copyToLocal is similar to the get command, except that the destination is restricted to a local file reference.

Hence, you can use get instead of copyToLocal, but not the other way round.

Reference: Hadoop's documentation.

Second answer (20 votes):

Let's take an example: if your HDFS contains the path /tmp/dir/abc.txt and your local disk also contains this path, then the HDFS API won't know which one you mean unless you specify a scheme like file:// or hdfs://. It may pick the path you did not want to copy.

Therefore you have -copyFromLocal, which prevents you from accidentally copying the wrong file by limiting the parameter you give to the local filesystem.

Put is for more advanced users who know which scheme to put in front.

It is always a bit confusing to new Hadoop users which filesystem they are currently in and where their files actually are.
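The advice about putting a scheme in front of the path can be sketched in shell. This is only an illustration and not part of the original answer: `qualify` is a hypothetical helper name, and the paths are the ones used in the example above. The script only builds explicit URIs; the `hadoop fs` invocations are left as comments because how a schemeless source is resolved depends on your Hadoop version and configuration.

```shell
#!/bin/sh
# Hypothetical helper (not a Hadoop command): give a path an explicit
# scheme so that it cannot be resolved against the wrong filesystem.
# Usage: qualify <scheme> <absolute-path-or-uri>
qualify() {
  scheme=$1
  path=$2
  case "$path" in
    *://*) printf '%s\n' "$path" ;;                # already has a scheme: keep it
    /*)    printf '%s://%s\n' "$scheme" "$path" ;; # absolute path: prepend the scheme
    *)     echo "qualify: need an absolute path: $path" >&2; return 1 ;;
  esac
}

src_local=$(qualify file /tmp/dir/abc.txt)   # file:///tmp/dir/abc.txt
src_hdfs=$(qualify hdfs /tmp/dir/abc.txt)    # hdfs:///tmp/dir/abc.txt

echo "$src_local"
echo "$src_hdfs"

# With the source spelled out, the intent is unambiguous (not run here):
#   hadoop fs -put "$src_local" /user/hadoop/abc.txt
#   hadoop fs -copyFromLocal /tmp/dir/abc.txt /user/hadoop/abc.txt
```

Writing the scheme explicitly removes the guesswork about which filesystem a path such as /tmp/dir/abc.txt refers to, whichever of the two commands you end up using.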

Comments:
What do you mean by "the hdfs API won't know which one you mean"? For '-put' the source is always the first argument. Or you mean that some users may confuse '-put' with '-get' ? –   snappy  Oct 18 '11 at 17:52
 
No, neither way. We are speaking about two different file systems here. HDFS and local file system (say ext4). By using bin/hadoop fs -put /tmp/somepath /user/hadoop/somepath the command actually does not know whether /tmp/somepath exists in both filesystems, or just in local filesystem. Same thing with the destination path. –   Thomas Jungblut  Oct 18 '11 at 17:58
So the first parameter is not always a local fs path, so to speak. You can put from one HDFS to another if you'd like. -copyFromLocal will ensure that it just picks from the local disk and uploads to HDFS. –  Thomas Jungblut  Oct 18 '11 at 17:58
 
Why does it need to know? Your command example (and the -copyFromLocal variant) always copies /tmp/somepath/* from local to /user/hadoop/somepath/* on HDFS, and creates /user/hadoop/somepath directories if they are not yet created. Right? –   snappy  Oct 18 '11 at 18:08
 
No, put would prefer the HDFS scheme instead of the local file system. copyFromLocal would not do this and pick it from local file system. –   Thomas Jungblut  Oct 19 '11 at 8:06
