We have a small cluster, so the steps below work for us. Larger clusters will need a more automated approach.
Here are the step-by-step instructions. Before starting, make sure no MapReduce jobs or other jobs writing data to HDFS are running.
Step 1: Create a screen session on every Hadoop node, so that a dropped SSH connection does not interrupt the upgrade.
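Starting the sessions can be scripted from one admin machine. This is a minimal sketch, assuming passwordless SSH to every node and GNU screen installed on each; `start_upgrade_screens` is a hypothetical helper name. It starts one detached, named session per host, and setting `SSH=echo` prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: start a detached screen session named "$1" on each host given after it.
# Assumes passwordless SSH and GNU screen on every node.
# SSH can be overridden (e.g. SSH=echo) for a dry run that only prints the commands.
start_upgrade_screens() {
  local session=$1
  shift
  local host
  for host in "$@"; do
    # -d -m: create the session detached (no terminal needed); -S: name it for reattaching
    ${SSH:-ssh} "$host" screen -dmS "$session"
  done
}
```

For example, `start_upgrade_screens hadoop-upgrade hadoop-cluster-1 hadoop-cluster-3`, then `ssh -t hadoop-cluster-3 screen -r hadoop-upgrade` to attach on a given node. The session name is arbitrary.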
Step 2: Prepare to recover if the upgrade fails
On the master node where the NameNode is currently running, take a fresh checkpoint of the namespace. It will serve as the last image to recover from if things go wrong. A manual checkpoint can only be taken while the NameNode is in safe mode, so the full sequence of commands is:
- check current mode
[sudhirv@hadoop-cluster-3 bin]$ hadoop dfsadmin -safemode get
Safe mode is OFF
- enter safe mode
[sudhirv@hadoop-cluster-3 bin]$ hadoop dfsadmin -safemode enter
Safe mode is ON
- check safe mode is ON
[sudhirv@hadoop-cluster-3 bin]$ hadoop dfsadmin -safemode get
Safe mode is ON
- run manual checkpoint
[sudhirv@hadoop-cluster-1 ~]$ hadoop dfsadmin -saveNamespace
- ensure a checkpoint was taken by looking at the image timestamp
[sudhirv@hadoop-cluster-1 ~]$ ls -ltr /var/lib/hadoop-0.20/cache/hadoop/dfs/name/image
total 4
-rw-rw-r-- 1 hadoop hadoop 157 May 26 16:49 fsimage
- leave safe mode
[sudhirv@hadoop-cluster-1 ~]$ hadoop dfsadmin -safemode leave
Safe mode is OFF
- confirm safe mode is off
[sudhirv@hadoop-cluster-1 ~]$ hadoop dfsadmin -safemode get
Safe mode is OFF
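The same sequence can be wrapped in a function that stops if safe mode was not actually entered. This is a sketch, not a tested tool: `checkpoint_namenode`, `safemode_state` and `file_age_seconds` are hypothetical names, and it assumes GNU `stat` and, by default, the image path from the listing above. In the stock 0.20 name-directory layout the saved image is, as far as we know, `current/fsimage`, with `image/fsimage` being a small placeholder, so pass whichever path you want checked as the first argument.

```shell
#!/usr/bin/env bash
# Print ON or OFF from "Safe mode is ON/OFF" text read on stdin.
safemode_state() {
  awk '/^Safe mode is/ {print $4}'
}

# Print how many seconds ago the given file was last modified (GNU stat).
file_age_seconds() {
  echo $(( $(date +%s) - $(stat -c %Y "$1") ))
}

# Hypothetical wrapper around the dfsadmin commands shown above.
# Default image path is taken from the transcript; adjust for your dfs.name.dir.
checkpoint_namenode() {
  local image=${1:-/var/lib/hadoop-0.20/cache/hadoop/dfs/name/image/fsimage}
  hadoop dfsadmin -safemode enter
  if [ "$(hadoop dfsadmin -safemode get | safemode_state)" != "ON" ]; then
    echo "NameNode did not enter safe mode; aborting" >&2
    return 1
  fi
  # On failure, return while still in safe mode so the operator can investigate.
  hadoop dfsadmin -saveNamespace || { echo "saveNamespace failed" >&2; return 1; }
  echo "image last modified $(file_age_seconds "$image") seconds ago"
  hadoop dfsadmin -safemode leave
  # Succeed only if safe mode is reported OFF again.
  [ "$(hadoop dfsadmin -safemode get | safemode_state)" = "OFF" ]
}
```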
Step 3: Record the current Hadoop version and the configuration alternative in use. The Hadoop version will change after the update; the configuration alternative should stay the same.
sudo su -
[root@hadoop-cluster-3 ~]# hadoop version
Hadoop 0.20.1+169.56
Subversion -r 8e662cb065be1c4bc61c55e6bff161e09c1d36f3
Compiled by root on Tue Feb 9 13:40:08 EST 2010
[root@hadoop-cluster-3 ~]# alternatives --display hadoop-0.20-conf
hadoop-0.20-conf - status is auto.
link currently points to /etc/hadoop-0.20/conf.cluster
/etc/hadoop-0.20/conf.empty - priority 10
/etc/hadoop-0.20/conf.pseudo - priority 30
/etc/hadoop-0.20/conf.cluster - priority 50
Current `best' version is /etc/hadoop-0.20/conf.cluster.
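To make the before/after comparison mechanical rather than visual, the same output can be saved to a file. A sketch: `record_hadoop_state`, `hadoop_version_string` and the default file path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save "hadoop version" and the conf alternative to a file.
# The default output path is an arbitrary choice.
record_hadoop_state() {
  local out=${1:-/root/hadoop-pre-upgrade.txt}
  {
    hadoop version
    alternatives --display hadoop-0.20-conf
  } > "$out"
}

# Extract the version string (e.g. 0.20.1+169.56) from "hadoop version" output on stdin.
hadoop_version_string() {
  awk '/^Hadoop / {print $2; exit}'
}
```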
Step 4: Shut down the Hadoop services on the node and check that they have stopped. Start with the nodes that host the NameNode and JobTracker, because a version mismatch between the NameNode/JobTracker and the DataNodes/TaskTrackers will not work in most cases.
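A sketch of the shutdown on one node, assuming CDH-style init scripts named `hadoop-0.20-<daemon>` under `/etc/init.d` (verify the names on your install). It stops MapReduce daemons before HDFS ones and skips daemons not installed on the node; the script directory is a parameter so the function can be tried against stub scripts first. `stop_hadoop_services` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Assumed daemon names; MapReduce first, then HDFS.
HADOOP_DAEMONS_STOP_ORDER="tasktracker jobtracker datanode secondarynamenode namenode"

# Hypothetical helper: stop every installed hadoop-0.20-<daemon> init script in order.
# $1: directory holding the init scripts (default /etc/init.d, an assumption).
stop_hadoop_services() {
  local dir=${1:-/etc/init.d} daemon script
  # Unquoted on purpose: split the list into individual daemon names.
  for daemon in $HADOOP_DAEMONS_STOP_ORDER; do
    script="$dir/hadoop-0.20-$daemon"
    # Only daemons installed on this node have a script; skip the others.
    [ -x "$script" ] && "$script" stop
  done
  return 0
}
```

Afterwards, `ps -ef | grep '[o]rg.apache.hadoop'` should print nothing if all daemons on the node are down.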
Step 5: Check that the new Hadoop update shows up in yum's list of available updates. If it does not, configure the repository correctly before continuing.
yum list updates
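The check can be made scriptable by filtering yum's output. A sketch: `hadoop_update_listed` is a hypothetical helper that succeeds when a line for the main `hadoop-0.20` package appears, assuming yum's usual `name.arch  version  repo` layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if "yum list updates" output on stdin
# contains the main hadoop-0.20 package (not sub-packages like hadoop-0.20-native).
hadoop_update_listed() {
  grep -Eq '^hadoop-0\.20\.[^[:space:]]+[[:space:]]'
}
```

For example: `yum list updates hadoop-0.20 | hadoop_update_listed && echo update available`.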
Once the update is listed, run it:
sudo su -
[root@hadoop-cluster-3 ~]# yum update hadoop-0.20
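One way to confirm the update actually moved the package is to record the RPM version before and after running yum. A sketch, assuming the RPM is named `hadoop-0.20` as in the command above and that GNU `sort -V` is available; `version_newer` and `update_hadoop_package` are hypothetical names, and `-y` makes yum answer yes without prompting.

```shell
#!/usr/bin/env bash
# Succeed if version $2 sorts strictly after version $1 (GNU sort -V).
version_newer() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$2" ]
}

# Hypothetical wrapper: update hadoop-0.20 and report the version change.
update_hadoop_package() {
  local before after
  before=$(rpm -q --qf '%{VERSION}-%{RELEASE}' hadoop-0.20) || return 1
  yum -y update hadoop-0.20 || return 1
  after=$(rpm -q --qf '%{VERSION}-%{RELEASE}' hadoop-0.20) || return 1
  echo "hadoop-0.20: $before -> $after"
  # Fail if the installed version did not move forward.
  version_newer "$before" "$after"
}
```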
Step 6: Check the version again to confirm the update was applied and that the configuration alternative was not changed.
[root@hadoop-cluster-3 ~]# hadoop version
Hadoop 0.20.1+169.88
Subversion -r ded54b29979a93e0f3ed773175cafd16b72511ba
Compiled by root on Thu May 20 22:22:39 EDT 2010
[root@hadoop-cluster-3 ~]# alternatives --display hadoop-0.20-conf
hadoop-0.20-conf - status is auto.
link currently points to /etc/hadoop-0.20/conf.cluster
/etc/hadoop-0.20/conf.empty - priority 10
/etc/hadoop-0.20/conf.pseudo - priority 30
/etc/hadoop-0.20/conf.cluster - priority 50
Current `best' version is /etc/hadoop-0.20/conf.cluster.
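If the Step 3 output was saved to a file, this check can be automated by capturing the same output now and comparing the two. A sketch: `compare_hadoop_state` and the file paths are hypothetical, and it needs bash for process substitution. It passes only when the `Hadoop x.y.z` line changed and the alternatives display did not.

```shell
#!/usr/bin/env bash
# Hypothetical helper: $1 = pre-upgrade capture, $2 = post-upgrade capture,
# each holding "hadoop version" output followed by "alternatives --display hadoop-0.20-conf".
compare_hadoop_state() {
  local pre=$1 post=$2
  if [ "$(grep -m1 '^Hadoop ' "$pre")" = "$(grep -m1 '^Hadoop ' "$post")" ]; then
    echo "Hadoop version unchanged: was the update applied?" >&2
    return 1
  fi
  # Compare only the alternatives section (from its first line to end of file).
  if ! diff <(sed -n '/^hadoop-0.20-conf/,$p' "$pre") \
            <(sed -n '/^hadoop-0.20-conf/,$p' "$post") >/dev/null; then
    echo "configuration alternative changed" >&2
    return 1
  fi
  echo "version changed, configuration alternative unchanged"
}
```

For example: `{ hadoop version; alternatives --display hadoop-0.20-conf; } > /root/hadoop-post-upgrade.txt`, then `compare_hadoop_state /root/hadoop-pre-upgrade.txt /root/hadoop-post-upgrade.txt`.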
Step 7: Start the Hadoop services on the node and tail the logs to make sure no errors are being reported.
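A sketch of the start-up and log check, assuming the same CDH-style `hadoop-0.20-<daemon>` init scripts under `/etc/init.d` and logs under `/var/log/hadoop-0.20` (check both on your install). Daemons start HDFS first, then MapReduce, and `count_log_errors` counts log4j ERROR/FATAL lines in one log file. Both function names are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed daemon names; HDFS first, then MapReduce.
HADOOP_DAEMONS_START_ORDER="namenode secondarynamenode datanode jobtracker tasktracker"

# Hypothetical helper: start every installed hadoop-0.20-<daemon> init script in order.
# $1: directory holding the init scripts (default /etc/init.d, an assumption).
start_hadoop_services() {
  local dir=${1:-/etc/init.d} daemon script
  for daemon in $HADOOP_DAEMONS_START_ORDER; do
    script="$dir/hadoop-0.20-$daemon"
    # Skip daemons that are not installed on this node.
    [ -x "$script" ] && "$script" start
  done
  return 0
}

# Print the number of lines logged at ERROR or FATAL level in the given file.
count_log_errors() {
  grep -cE ' (ERROR|FATAL) ' "$1"
}
```

For example, run `start_hadoop_services`, then `tail -f /var/log/hadoop-0.20/*.log` inside the screen session, and `count_log_errors` on each daemon's `.log` file once it has settled.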