http://hadoop.apache.org/common/docs/current/hdfs-default.html
The post drew many comments. The complete thread is at the link below:
http://lucene.472066.n3.nabble.com/what-will-happen-if-a-backup-name-node-folder-becomes-unaccessible-td1253293.html#a1253293
In this post I summarize the tests I ran to confirm that the namenode in the Cloudera distribution tolerates the loss of one of its dfs.name.dir directories. The behavior is that the namenode ignores any directories that are inaccessible and only bails out when none of the specified directories can be accessed. The series of tests below is largely self-explanatory.
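The rule described above can be modeled in a few lines of shell. This is only an illustration of the rule, not Hadoop's implementation: usable_name_dirs is a hypothetical helper that takes a comma-separated list in the style of dfs.name.dir, prints the directories that are currently usable, and fails only when none of them is.

```shell
# Illustrative model only; this is not how Hadoop itself is written.
# usable_name_dirs (hypothetical name) takes one comma-separated list of
# directories, prints each one that exists and is readable and writable,
# and returns non-zero only if no directory qualifies.
usable_name_dirs() (
    # The function body runs in a subshell, so these settings do not leak.
    IFS=,
    set -f                      # no glob expansion while splitting the list
    usable=0
    for dir in $1; do
        if [ -d "$dir" ] && [ -r "$dir" ] && [ -w "$dir" ]; then
            printf '%s\n' "$dir"
            usable=$((usable + 1))
        fi
    done
    [ "$usable" -gt 0 ]
)
```

Called with the three directories used in this test, it would keep succeeding after edit_log_dir2 is removed and fail only once all three are gone.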
hadoop@training-vm:~$ hadoop version
Hadoop 0.20.1+152
Subversion -r c15291d10caa19c2355f437936c7678d537adf94
Compiled by root on Mon Nov 2 05:15:37 UTC 2009
hadoop@training-vm:~$ jps
8923 Jps
8548 JobTracker
8467 SecondaryNameNode
8250 NameNode
8357 DataNode
8642 TaskTracker
hadoop@training-vm:~$ /usr/lib/hadoop/bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
hadoop@training-vm:~$ mkdir edit_log_dir1
hadoop@training-vm:~$ mkdir edit_log_dir2
hadoop@training-vm:~$ ls
edit_log_dir1 edit_log_dir2
hadoop@training-vm:~$ ls -ltr /var/lib/hadoop-0.20/cache/hadoop/dfs/name
total 8
drwxr-xr-x 2 hadoop hadoop 4096 2009-10-15 16:17 image
drwxr-xr-x 2 hadoop hadoop 4096 2010-08-24 15:56 current
hadoop@training-vm:~$ cp -r /var/lib/hadoop-0.20/cache/hadoop/dfs/name edit_log_dir1
hadoop@training-vm:~$ cp -r /var/lib/hadoop-0.20/cache/hadoop/dfs/name edit_log_dir2
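Because edit_log_dir1 and edit_log_dir2 already exist, the two cp -r commands above place the copy in a name/ subdirectory rather than at the top level. That subdirectory shows up in the listings further down, and the namenode log reports both new directories as not formatted and formats them itself. If the intent is to pre-seed a new directory with the existing contents, one option is to copy the contents rather than the directory. A minimal sketch, where copy_dir_contents is a hypothetical helper name:

```shell
# copy_dir_contents (hypothetical name) copies everything inside SRC into
# DEST, creating DEST if needed, so that DEST/current and DEST/image end up
# at the top level instead of under DEST/name.
copy_dir_contents() {
    src=$1
    dest=$2
    mkdir -p "$dest" && cp -R "$src"/. "$dest"/
}

# Example call with the paths from this post (namenode stopped first):
#   copy_dir_contents /var/lib/hadoop-0.20/cache/hadoop/dfs/name /home/hadoop/edit_log_dir1
```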
------ hdfs-site.xml added new dirs
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<!-- specify this so that running 'hadoop namenode -format' formats the right dir -->
<name>dfs.name.dir</name>
<value>/var/lib/hadoop-0.20/cache/hadoop/dfs/name,/home/hadoop/edit_log_dir1,/home/hadoop/edit_log_dir2</value>
</property>
<property>
<name>fs.checkpoint.period</name>
<value>600</value>
</property>
<property>
<name>dfs.namenode.plugins</name>
<value>org.apache.hadoop.thriftfs.NamenodePlugin</value>
</property>
<property>
<name>dfs.datanode.plugins</name>
<value>org.apache.hadoop.thriftfs.DatanodePlugin</value>
</property>
<property>
<name>dfs.thrift.address</name>
<value>0.0.0.0:9090</value>
</property>
</configuration>
---- start all daemons
hadoop@training-vm:~$ /usr/lib/hadoop/bin/start-all.sh
starting namenode, logging to
/usr/lib/hadoop/bin/../logs/hadoop-hadoop-namenode-training-vm.out
localhost: starting datanode, logging to
/usr/lib/hadoop/bin/../logs/hadoop-hadoop-datanode-training-vm.out
localhost: starting secondarynamenode, logging to
/usr/lib/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-training-vm.out
starting jobtracker, logging to
/usr/lib/hadoop/bin/../logs/hadoop-hadoop-jobtracker-training-vm.out
localhost: starting tasktracker, logging to
/usr/lib/hadoop/bin/../logs/hadoop-hadoop-tasktracker-training-vm.out
-------- namenode log confirms all dirs taken
2010-08-24 16:20:48,718 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = training-vm/127.0.0.1
STARTUP_MSG: args = []
STARTUP_MSG: version = 0.20.1+152
STARTUP_MSG: build = -r c15291d10caa19c2355f437936c7678d537adf94;
compiled by 'root' on Mon Nov 2 05:15:37 UTC 2009
************************************************************/
2010-08-24 16:20:48,815 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:
Initializing RPC Metrics with hostName=NameNode, port=8022
2010-08-24 16:20:48,819 INFO org.apache.hadoop.hdfs.server.namenode.NameNode:
Namenode up at: localhost/127.0.0.1:8022
2010-08-24 16:20:48,821 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=NameNode, sessionId=null
2010-08-24 16:20:48,822 INFO
org.apache.hadoop.hdfs.server.namenode.metrics.NameNodeMetrics:
Initializing NameNodeMeterics using context object:org.apache.hadoop.metrics.spi.NoEmitMetricsContext
2010-08-24 16:20:48,894 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
fsOwner=hadoop,hadoop
2010-08-24 16:20:48,894 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
supergroup=supergroup
2010-08-24 16:20:48,894 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=false
2010-08-24 16:20:48,903 INFO org.apache.hadoop.hdfs.server.namenode.metrics.FSNamesystemMetrics:
Initializing
FSNamesystemMetrics using context object:org.apache.hadoop.metrics.spi.NoEmitMetricsContext
2010-08-24 16:20:48,905 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
2010-08-24 16:20:48,937 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/home/hadoop/edit_log_dir1 is not formatted.
2010-08-24 16:20:48,937 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2010-08-24 16:20:48,937 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory
/home/hadoop/edit_log_dir2 is not formatted.
2010-08-24 16:20:48,937 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...
2010-08-24 16:20:48,938 INFO org.apache.hadoop.hdfs.server.common.Storage: Number of files = 41
2010-08-24 16:20:48,947 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 0
2010-08-24 16:20:48,947 INFO org.apache.hadoop.hdfs.server.common.Storage:
Image file of size 4357 loaded in 0 seconds.
---- directories confirm in use
hadoop@training-vm:~$ ls -ltr edit_log_dir1
total 12
drwxr-xr-x 4 hadoop hadoop 4096 2010-08-24 16:01 name
-rw-r--r-- 1 hadoop hadoop 0 2010-08-24 16:20 in_use.lock
drwxr-xr-x 2 hadoop hadoop 4096 2010-08-24 16:20 image
drwxr-xr-x 2 hadoop hadoop 4096 2010-08-24 16:20 current
hadoop@training-vm:~$ ls -ltr edit_log_dir2
total 12
drwxr-xr-x 4 hadoop hadoop 4096 2010-08-24 16:01 name
-rw-r--r-- 1 hadoop hadoop 0 2010-08-24 16:20 in_use.lock
drwxr-xr-x 2 hadoop hadoop 4096 2010-08-24 16:20 image
drwxr-xr-x 2 hadoop hadoop 4096 2010-08-24 16:20 current
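The same check can be scripted instead of eyeballing ls output. A small sketch, assuming that the presence of in_use.lock (the file name shown in the listings above) is a good enough sign that the namenode has taken the directory; report_lock_files is a hypothetical helper name:

```shell
# report_lock_files (hypothetical name) prints one status line per
# directory given as an argument, based only on whether in_use.lock exists.
report_lock_files() {
    for dir in "$@"; do
        if [ -f "$dir/in_use.lock" ]; then
            printf '%s: in use\n' "$dir"
        else
            printf '%s: no lock file\n' "$dir"
        fi
    done
}

# Example call with the directories from this post:
#   report_lock_files /home/hadoop/edit_log_dir1 /home/hadoop/edit_log_dir2
```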
----- secondary name node checkpoint worked fine
...
2010-08-24 16:27:10,756 INFO org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Posted URL
localhost:50070putimage=1&port=50090&machine=127.0.0.1&token=-18:1431678956:1255648991179:1282692430000:1282692049090
2010-08-24 16:27:11,008 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
Checkpoint done. New Image Size: 4461
....
--- directory put works fine
hadoop@training-vm:~$ hadoop fs -ls /user/training
Found 3 items
drwxr-xr-x - training supergroup 0 2010-06-30 13:18 /user/training/grep_output
drwxr-xr-x - training supergroup 0 2010-06-30 13:14 /user/training/input
drwxr-xr-x - training supergroup 0 2010-06-30 15:30 /user/training/output
hadoop@training-vm:~$ hadoop fs -put /etc/hadoop/conf.with-desktop/hdfs-site.xml /user/training
hadoop@training-vm:~$ hadoop fs -ls /user/training
Found 4 items
drwxr-xr-x - training supergroup 0 2010-06-30 13:18 /user/training/grep_output
-rw-r--r-- 1 hadoop supergroup 987 2010-08-24 16:25 /user/training/hdfs-site.xml
drwxr-xr-x - training supergroup 0 2010-06-30 13:14 /user/training/input
drwxr-xr-x - training supergroup 0 2010-06-30 15:30 /user/training/output
------ delete one of the directories
hadoop@training-vm:~$ rm -rf edit_log_dir2
hadoop@training-vm:~$ ls -ltr
total 4
drwxr-xr-x 5 hadoop hadoop 4096 2010-08-24 16:20 edit_log_dir1
-- namenode logs
No errors/warns in logs
-------- namenode still running
hadoop@training-vm:~$ jps
12426 NameNode
12647 SecondaryNameNode
12730 JobTracker
14090 Jps
12535 DataNode
12826 TaskTracker
---- puts and ls work fine
hadoop@training-vm:~$ hadoop fs -ls /user/training
Found 4 items
drwxr-xr-x - training supergroup 0 2010-06-30 13:18 /user/training/grep_output
-rw-r--r-- 1 hadoop supergroup 987 2010-08-24 16:25 /user/training/hdfs-site.xml
drwxr-xr-x - training supergroup 0 2010-06-30 13:14 /user/training/input
drwxr-xr-x - training supergroup 0 2010-06-30 15:30 /user/training/output
hadoop@training-vm:~$ hadoop fs -put /etc/hadoop/conf.with-desktop/core-site.xml /user/training
hadoop@training-vm:~$ hadoop fs -put /etc/hadoop/conf.with-desktop/mapred-site.xml /user/training
hadoop@training-vm:~$ hadoop fs -ls /user/training
Found 6 items
-rw-r--r-- 1 hadoop supergroup 338 2010-08-24 16:28 /user/training/core-site.xml
drwxr-xr-x - training supergroup 0 2010-06-30 13:18 /user/training/grep_output
-rw-r--r-- 1 hadoop supergroup 987 2010-08-24 16:25 /user/training/hdfs-site.xml
drwxr-xr-x - training supergroup 0 2010-06-30 13:14 /user/training/input
-rw-r--r-- 1 hadoop supergroup 454 2010-08-24 16:29 /user/training/mapred-site.xml
drwxr-xr-x - training supergroup 0 2010-06-30 15:30 /user/training/output
------- secondary namenode checkpoint is successful
2010-08-24 16:37:11,455 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
Checkpoint done. New Image Size: 4671
....
2010-08-24 16:47:11,884 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
Checkpoint done. New Image Size: 4671
...
2010-08-24 16:57:12,264 WARN org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
Checkpoint done. New Image Size: 4671
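To pull the checkpoint results out of a log without scrolling through it, something like the following could be used. It is a sketch that assumes each checkpoint message sits on a single line in the actual log file, as "Checkpoint done. New Image Size: N". checkpoint_sizes is a hypothetical helper name, and the log file path is passed in because it is not shown in this post.

```shell
# checkpoint_sizes (hypothetical name) prints the image size from every
# "Checkpoint done" line in the given log file, one size per line.
checkpoint_sizes() {
    sed -n 's/.*Checkpoint done\. New Image Size: \([0-9][0-9]*\).*/\1/p' "$1"
}

# Example call; replace the argument with the secondary namenode log file:
#   checkpoint_sizes /path/to/secondarynamenode.log
```

One size per checkpoint makes it easy to see that checkpoints keep completing every fs.checkpoint.period seconds (600 in the configuration above) after the directory was removed.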
------- after 30 mins
hadoop@training-vm:~$ jps
12426 NameNode
12647 SecondaryNameNode
12730 JobTracker
16256 Jps
12535 DataNode
12826 TaskTracker