Difference between hadoop fs -put and hadoop fs -copyFromLocal

Question

-put and -copyFromLocal are documented as identical, while most examples use the verbose variant -copyFromLocal. Why?

The same question applies to -get and -copyToLocal.

score 27 · Accepted Answer · Apr 18 at 2:24

copyFromLocal is similar to put command, except that the source is restricted to a local file reference.

So basically, you can do with put everything that you can do with copyFromLocal, but not vice versa.

Similarly,

copyToLocal is similar to get command, except that the destination is restricted to a local file reference.

Hence, you can use get instead of copyToLocal, but not the other way round.
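For reference, here is a minimal sketch of the two pairs described above, side by side. The paths are hypothetical and chosen only for illustration, and the script only prints the commands (a dry run) so it can be tried without a cluster.

```shell
#!/bin/sh
# Hypothetical paths, chosen only for illustration.
local_file="/tmp/somepath/data.txt"
hdfs_dir="/user/hadoop/somepath"
local_out="/tmp/out"

# Upload: local source, HDFS destination.
# Per the documentation quoted above, -copyFromLocal restricts the source
# to a local file reference.
upload_put="hadoop fs -put $local_file $hdfs_dir"
upload_copy="hadoop fs -copyFromLocal $local_file $hdfs_dir"

# Download: HDFS source, local destination.
# -copyToLocal restricts the destination to a local file reference.
download_get="hadoop fs -get $hdfs_dir/data.txt $local_out"
download_copy="hadoop fs -copyToLocal $hdfs_dir/data.txt $local_out"

# Dry run: print the commands rather than executing them.
printf '%s\n' "$upload_put" "$upload_copy" "$download_get" "$download_copy"
```

To actually run one of them, paste the printed line into a shell on a machine where the hadoop client is configured.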

Reference: Hadoop's documentation.

Thomas Jungblut 13.2k 3 39 65 · Answer 2 · Oct 18 '11 at 17:42

score 20

Let's take an example. Suppose your HDFS contains the path /tmp/dir/abc.txt and your local disk contains the same path. The HDFS API then won't know which one you mean unless you specify a scheme such as file:// or hdfs://, and it may pick the path you did not want to copy.
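To make the ambiguity concrete, here is a small sketch of how a bare path could be resolved against a default filesystem. The qualify helper and the address hdfs://namenode:8020 are hypothetical, invented for this illustration; this is a simplified model, not Hadoop's actual resolution code. In a real installation the default filesystem comes from the configuration (the fs.defaultFS property, called fs.default.name in older releases).

```shell
#!/bin/sh
# qualify PATH DEFAULT_FS
# Hypothetical helper, not part of Hadoop: models how a path without a
# scheme might be completed with a default filesystem prefix.
qualify() {
  case "$1" in
    *://*) printf '%s\n' "$1" ;;         # already has a scheme: keep as is
    *)     printf '%s%s\n' "$2" "$1" ;;  # bare path: prepend the default
  esac
}

path="/tmp/dir/abc.txt"

# The same bare path names two different files, depending on the default.
as_local=$(qualify "$path" "file://")
as_hdfs=$(qualify "$path" "hdfs://namenode:8020")

# A path with an explicit scheme is unaffected by the default.
explicit=$(qualify "file:///tmp/dir/abc.txt" "hdfs://namenode:8020")

printf '%s\n' "$as_local" "$as_hdfs" "$explicit"
```

The first and third results are both file:///tmp/dir/abc.txt, while the second is hdfs://namenode:8020/tmp/dir/abc.txt. The helper only recognises the scheme:// form, so a URI written as file:/tmp/... would be treated as a bare path in this simplified model.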

Therefore you have -copyFromLocal, which prevents you from accidentally copying the wrong file by limiting the source parameter to the local filesystem.

Put is for more advanced users who know which scheme to put in front.

It is always a bit confusing to new Hadoop users which filesystem they are currently in and where their files actually are.
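As a rough sketch of what "putting a scheme in front" looks like, the commands below list the same path on each filesystem and then upload with both ends spelled out. The NameNode address is hypothetical, and this assumes your Hadoop version accepts a file:// URI as the source for -put. The script only prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical NameNode address; substitute your cluster's.
namenode="hdfs://namenode:8020"
path="/tmp/dir/abc.txt"

# List the same path on each filesystem to see where the file actually is.
ls_local="hadoop fs -ls file://$path"
ls_hdfs="hadoop fs -ls $namenode$path"

# Upload with source and destination schemes both written out, so neither
# end depends on the configured default filesystem.
put_explicit="hadoop fs -put file://$path $namenode$path"

# Dry run: print the commands rather than executing them.
printf '%s\n' "$ls_local" "$ls_hdfs" "$put_explicit"
```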



What do you mean by "the hdfs API won't know which one you mean"? For '-put' the source is always the first argument. Or do you mean that some users may confuse '-put' with '-get'? – snappy Oct 18 '11 at 17:52

No, neither. We are speaking about two different file systems here: HDFS and the local file system (say ext4). When you run bin/hadoop fs -put /tmp/somepath /user/hadoop/somepath, the command does not actually know whether /tmp/somepath exists in both filesystems or just in the local filesystem. The same goes for the destination path. – Thomas Jungblut Oct 18 '11 at 17:58


So the first parameter is not always a local fs path, so to speak. You can put from one HDFS to another if you'd like. -copyFromLocal will ensure that it only picks from the local disk and uploads to HDFS. – Thomas Jungblut Oct 18 '11 at 17:58

Why does it need to know? Your command example (and the -copyFromLocal variant) always copies /tmp/somepath/* from local to /user/hadoop/somepath/* on HDFS, and creates /user/hadoop/somepath directories if they are not yet created. Right? – snappy Oct 18 '11 at 18:08

No, put would prefer the HDFS scheme over the local file system. copyFromLocal would not do this; it would pick the file from the local file system. – Thomas Jungblut Oct 19 '11 at 8:06

