I just spent some time putting together some basic Java code to read data from HDFS. Pretty basic stuff, no MapReduce involved, just boilerplate like the code in this popular tutorial on the topic.
No matter what, I kept hitting my head on this error:
Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:9000/user/hadoop/DOUG_SVD/out.txt, expected: file:///
If you check out the tutorial above, what’s supposed to happen is that an instance of Hadoop’s Configuration should encounter a fs.default.name property in one of the config files it’s given. The Configuration should realize that this property has a value of hdfs://localhost:9000. When you use the Configuration to create a Hadoop FileSystem instance, it should happily read this property from the Configuration and process paths from HDFS. That’s a long way of saying these three lines of Java code:
// pick up config files off the classpath
Configuration conf = new Configuration();
// explicitly add other config files
conf.addResource("/home/hadoop/conf/core-site.xml");
// create a FileSystem object needed to load file resources
FileSystem fs = FileSystem.get(conf);
// load files and stuff below!
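As a rough picture of why the message says expected: file:///, here is a plain-Java sketch with no Hadoop classes. It assumes that a file system compares each path's scheme and authority against its own URI, and that the URI falls back to file:/// when no loaded config file sets fs.default.name (which is what the exception above suggests). WrongFsSketch, defaultFs and checkPath are made-up names for illustration, not Hadoop's implementation.

```java
import java.net.URI;
import java.util.Collections;
import java.util.Map;

public class WrongFsSketch {
    // Assumed fallback when no loaded config file sets the property.
    static final String FALLBACK_FS = "file:///";

    /** Picks the default file system URI from the properties that were actually loaded. */
    public static URI defaultFs(Map<String, String> props) {
        String value = props.get("fs.default.name");
        return URI.create(value != null ? value : FALLBACK_FS);
    }

    /** Rejects paths whose scheme or authority differs from the file system's URI. */
    public static void checkPath(URI fsUri, URI path) {
        String scheme = path.getScheme();
        if (scheme == null) {
            return; // no scheme: treated as belonging to this file system
        }
        if (scheme.equalsIgnoreCase(fsUri.getScheme())) {
            String authority = path.getAuthority();
            if (authority == null || authority.equalsIgnoreCase(fsUri.getAuthority())) {
                return;
            }
        }
        throw new IllegalArgumentException("Wrong FS: " + path + ", expected: " + fsUri);
    }

    public static void main(String[] args) {
        URI file = URI.create("hdfs://localhost:9000/user/hadoop/DOUG_SVD/out.txt");

        // Config file was picked up: the HDFS path is accepted.
        Map<String, String> loaded =
                Collections.singletonMap("fs.default.name", "hdfs://localhost:9000");
        checkPath(defaultFs(loaded), file);
        System.out.println("accepted with " + defaultFs(loaded));

        // Config file was silently missed: the same path is now rejected.
        try {
            checkPath(defaultFs(Collections.<String, String>emptyMap()), file);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is that the exception says nothing about the path being wrong. It only says the file system was built from a configuration that never saw the HDFS setting.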
Well… My Hadoop config file (core-site.xml) appears to be set up correctly. It appears to be on my CLASSPATH. I’m even trying to explicitly add the resource. Basically I’ve followed all the troubleshooting tips you’re supposed to follow when you encounter this exception. But I’m STILL getting this exception. Head, meet wall. This has to be something stupid.
Well, before I reveal my dumb mistake in the above code, it turns out there are some helpful functions for debugging these kinds of problems.
As Configuration is just a bunch of key/value pairs from a set of resources, it’s useful to know what resources it thinks it loaded and what properties it thinks it loaded from those files. Configuration’s getRaw shows the value it holds for a property (for example, conf.getRaw("fs.default.name")), and Configuration’s toString shows the resources loaded.
You can similarly check out FileSystem’s helpful toString method. It nicely lays out where it thinks it’s pointing (native vs HDFS vs S3 etc).
So if you are similarly looking for a stupid mistake like I was, pepper your code with printouts of these bits of info. They will at least point you in a new direction to search for your dumb mistake.
Turns out I missed the crucial step of passing a Path object, not a String, to addResource. They appear to do slightly different things. Adding a String adds a resource relative to the classpath. Adding a Path adds a resource at an absolute location and does not consider the classpath. So to explicitly load the correct config file, the code above gets turned into (drumroll please):
// pick up config files off the classpath
Configuration conf = new Configuration();
// explicitly add other config files
// PASS A PATH NOT A STRING!
conf.addResource(new Path("/home/hadoop/conf/core-site.xml"));
FileSystem fs = FileSystem.get(conf);
// load files and stuff below!
Then tada! Everything magically works! Hopefully these tips can save you some time the next time you hit these kinds of problems.
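For the curious, the String-versus-Path difference can be mimicked in plain Java without Hadoop. This is only a sketch under one assumption: that a String resource is looked up through a class loader, while a Path resource is checked directly on the file system. ResourceLookupSketch and its methods are hypothetical names, and the temp directory stands in for /home/hadoop/conf.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ResourceLookupSketch {
    /** String-style lookup: the name is resolved against the class loader's classpath. */
    public static boolean foundOnClasspath(ClassLoader loader, String name) {
        return loader.getResource(name) != null;
    }

    /** Path-style lookup: the location is checked directly on the local file system. */
    public static boolean foundOnDisk(String location) {
        return Files.isRegularFile(Paths.get(location));
    }

    /** Returns {relative name on classpath, absolute path on classpath, absolute path on disk}. */
    public static boolean[] runDemo() {
        try {
            // Hypothetical config directory holding a stand-in core-site.xml.
            Path confDir = Files.createTempDirectory("conf");
            Path siteFile = confDir.resolve("core-site.xml");
            Files.write(siteFile, "<configuration/>".getBytes(StandardCharsets.UTF_8));
            String absolute = siteFile.toAbsolutePath().toString();

            // A class loader whose only classpath entry is the config directory.
            URL[] classpath = {confDir.toUri().toURL()};
            try (URLClassLoader loader = new URLClassLoader(classpath, null)) {
                return new boolean[] {
                    foundOnClasspath(loader, "core-site.xml"),
                    foundOnClasspath(loader, absolute),
                    foundOnDisk(absolute)
                };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] found = runDemo();
        System.out.println("classpath lookup, relative name: " + found[0]);
        System.out.println("classpath lookup, absolute path: " + found[1]);
        System.out.println("file system lookup, absolute path: " + found[2]);
    }
}
```

If the assumption holds, the middle case is the trap from this post: the file is on a directory that is on the classpath, yet handing the class loader its absolute path as a name finds nothing, and nothing complains.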