Thursday, July 15, 2010

Hadoop HDFS Error: java.io.IOException: Could not complete write to file

We have recently been seeing this error in our Hadoop NameNode logs quite a bit.


2010-06-14 05:15:30,428 WARN org.apache.hadoop.hdfs.StateChange: DIR*
NameSystem.completeFile: failed to complete {filename} because dir.getFileBlocks() is null and pendingFile is null
java.io.IOException: Could not complete write to file {filename} by DFSClient_-44010819
java.io.IOException: Could not complete write to file {filename} by DFSClient_-44010819
at org.apache.hadoop.hdfs.server.namenode.NameNode.complete(NameNode.java:497)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:512)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:966)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:962)
at java.security.AccessController.doPrivileged(Native Method)


On further research, we found that a new map/reduce job that merges daily files had recently been deployed. The errors pretty much started showing up after this job went live; it is scheduled via cron to run every day. The job takes a bunch of small files as input and merges them into a single compressed file. In the process, it writes a temp file with the merged content, verifies it, compresses it into the final file, and then deletes the temp file. The errors were being thrown when the temp file was deleted: at times the delete (of the metadata on the NameNode) happened before the file had been replicated to all nodes, and that is what triggered the error. The HDFS API currently does not provide a synchronous way to wait until a file is fully replicated.
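For illustration, here is a minimal sketch of that cleanup step (the class, method, and path names are hypothetical, not the actual job code); the immediate programmatic delete of the temp file is the call that coincided with the errors:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical sketch of the merge job's cleanup step; names and paths are illustrative.
public class MergeCleanupSketch {
    public static void cleanup(Configuration conf, Path tempMergedFile) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        // Immediate programmatic delete of the temp file. At times this ran before the
        // file had been replicated to all nodes, which is when the NameNode logged the
        // "Could not complete write to file" error shown above.
        fs.delete(tempMergedFile, false);
    }
}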

To resolve this, we started using the HDFS trash functionality. HDFS best practices suggest: "Enable HDFS trash, and avoid programmatic deletes - prefer the trash facility." If trash is enabled (and, it should be noted, by default it is not), files that are deleted using the Hadoop filesystem shell are moved into a special hidden trash directory rather than being deleted immediately. The trash directory is emptied periodically by the system, and the deletion period is configurable. Any files that are mistakenly deleted can be recovered manually by moving them out of the trash directory. The trash facility is a user-level feature; it is not used when files are deleted programmatically. As a best practice, consider moving data that needs to be deleted into the trash directory (to be removed by the system after the configured period) rather than doing an immediate delete.
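As a minimal sketch of the configuration involved (the value shown is illustrative, not our actual setting): trash is controlled by the fs.trash.interval property, which gives the retention period in minutes and defaults to 0 (disabled). In practice it is set in core-site.xml; setting it on a client Configuration, as below, is only to show the property.

import org.apache.hadoop.conf.Configuration;

public class TrashConfigSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Trash is disabled by default (fs.trash.interval = 0). A non-zero value, in
        // minutes, enables it; 1440 minutes (one day) is purely an illustrative choice.
        // In practice this property belongs in core-site.xml rather than in code.
        conf.setLong("fs.trash.interval", 1440);
        System.out.println("fs.trash.interval = " + conf.get("fs.trash.interval"));
    }
}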

We changed the code to move files to the trash instead of deleting them, and that resolved the errors in the logs. Below is a common utility method we use across our code base when moving files to the trash.


import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

// Wrapper class and logger added so the snippet compiles as-is; the class name is illustrative.
public class HdfsUtils {

    private static final Log logger = LogFactory.getLog(HdfsUtils.class);

    // Moves the given path into the HDFS trash instead of deleting it outright.
    public static void moveToTrash(Configuration conf, Path path) throws IOException {
        Trash trash = new Trash(conf);
        boolean isMoved = trash.moveToTrash(path);
        if (!isMoved) {
            logger.error("Trash is not enabled or file is already in the trash.");
        }
    }
}
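
A hypothetical call site (again, the names are illustrative) then looks like this; the merge job hands its temp file to the helper above instead of deleting it:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Hypothetical call site for the utility above; the class name and path argument are illustrative.
public class MergeJobCleanupExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path tempMergedFile = new Path(args[0]);
        // Instead of fs.delete(tempMergedFile, false), move the temp file to the trash.
        HdfsUtils.moveToTrash(conf, tempMergedFile);
    }
}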
