Wednesday, May 19, 2010

HDFS Relative path error when running map/reduce jobs

Recently, while running a map/reduce job, we encountered this error:


java.io.IOException: Can not get the relative path:
base = hdfs://hadoop-master:8020/5/_temporary/_attempt_201005150057_0006_r_000000_0
child = hdfs://hadoop-master/5/_temporary/_attempt_201005150057_0006_r_000000_0/part-r-00000
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getFinalPath(FileOutputCommitter.java:200)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:146)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:165)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:118)
at org.apache.hadoop.mapred.Task.commit(Task.java:779)
at org.apache.hadoop.mapred.Task.done(Task.java:691)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:414)
at org.apache.hadoop.mapred.Child.main(Child.java:170)


The task kept failing across many retry attempts, and the JobTracker and TaskTracker logs kept filling up with this same error. The stack trace shows the exception is thrown from FileOutputCommitter.getFinalPath().
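The underlying issue is how java.net.URI.relativize() behaves: when the authorities of the two URIs differ, it refuses to relativize and simply returns the child URI unchanged, which is exactly the condition the committer reports. Below is a minimal, self-contained sketch (plain JDK code, not the actual Hadoop source) using the base and child paths from the error above:

import java.net.URI;

// Minimal demo of why the commit fails: URI.relativize() returns the child
// URI unchanged when the two authorities differ ("hadoop-master:8020" vs.
// "hadoop-master"), which is the condition FileOutputCommitter reports as
// "Can not get the relative path".
public class RelativePathDemo {
    public static void main(String[] args) {
        URI base  = URI.create("hdfs://hadoop-master:8020/5/_temporary/_attempt_201005150057_0006_r_000000_0");
        URI child = URI.create("hdfs://hadoop-master/5/_temporary/_attempt_201005150057_0006_r_000000_0/part-r-00000");

        URI relative = base.relativize(child);

        // The authority strings do not match, so relativize() gives up and
        // returns the child URI as-is -- this prints "true".
        System.out.println(relative.equals(child));
    }
}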

I researched this for quite a while and finally found this excerpt in the Pro Hadoop book: "An example of an HDFS URI is hdfs://NamenodeHost[:8020]/. The file system protocol is hdfs, the host to contact for services is NamenodeHost, and the port to connect to is 8020, which is the default port for HDFS. If the default 8020 port is used, the URI may be simplified as hdfs://NamenodeHost/. This value may be altered by individual jobs. You can choose an arbitrary port for the hdfs NameNode."

Our "fs.default.name" parameter in the core-site.xml file was set to "hdfs://hadoop-master.ic:8020". Based on the excerpt above, I removed the default port from the parameter and restarted the NameNode. With the explicit port gone, the base and child URIs the committer compares end up with the same authority string, so the relativization succeeds. Ran the job again and, voilà, it works now.
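For reference, the change amounts to something like the following in core-site.xml (a sketch using the hostname from our setup; adjust for your own NameNode):

<property>
  <name>fs.default.name</name>
  <!-- was: hdfs://hadoop-master.ic:8020 -->
  <value>hdfs://hadoop-master.ic</value>
</property>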
