First, follow the instructions here to make sure the required native libraries (zlib, etc.) are installed on your OS:
http://hadoop.apache.org/common/docs/r0.20.2/native_libraries.html
To configure the hadoop native libraries, follow the steps below. Since all our cluster nodes had the same configuration (architecture- and OS-wise), I built the native libraries on one node and copied them onto the others. If your nodes differ, you will have to repeat the steps below on each hadoop node.
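Note that compiling the native code also needs the usual build toolchain plus the zlib headers. The package names below are an assumption for a RHEL/CentOS-style box; adjust for your distro.
$ sudo yum install gcc gcc-c++ make autoconf automake libtool zlib-devel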
Step 1: Install and configure Apache Ant on the hadoop cluster node
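If Ant is not already set up, a minimal manual install looks something like this (the version and install path are just examples):
$ cd /opt
$ sudo wget http://archive.apache.org/dist/ant/binaries/apache-ant-1.7.1-bin.tar.gz
$ sudo tar xzf apache-ant-1.7.1-bin.tar.gz
$ export ANT_HOME=/opt/apache-ant-1.7.1
$ export PATH=$ANT_HOME/bin:$PATH
$ ant -version    # sanity check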
Step 2: Download the source of your corresponding hadoop version.
- find your hadoop version
$ hadoop version
Hadoop 0.20.1+169.88
Subversion -r ded54b29979a93e0f3ed773175cafd16b72511ba
Compiled by root on Thu May 20 22:22:39 EDT 2010
- download the corresponding SRPM for the above version.
We use the Cloudera distribution, so I got mine from the link below:
http://archive.cloudera.com/redhat/cdh/testing/SRPMS/
- Install the SRPM to get the source. This dumped the source into my
/usr/src/redhat/SOURCES/hadoop-0.20.1+169.88
folder. If it was dumped as a tar.gz file, go ahead and untar it.
$ rpm -i hadoop-0.20-0.20.1+169.88-1.src.rpm
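If the SRPM left a tarball rather than an expanded tree, unpack it in place (the tarball name below is hypothetical; use whatever the SRPM actually dropped):
$ cd /usr/src/redhat/SOURCES/
$ tar xzf hadoop-0.20.1+169.88.tar.gz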
Step 3: Link the 'src' folder of the source tree into the hadoop installation folder
$ cd /usr/lib/hadoop-0.20    # this is basically HADOOP_HOME
$ sudo ln -s /usr/src/redhat/SOURCES/hadoop-0.20.1+169.88/hadoop-0.20.1+169.88/src src
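A quick check that the link points where you expect:
$ ls -ld /usr/lib/hadoop-0.20/src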
Step 4: Run the ant task
$ ant -Dcompile.native=true compile-native
If the above task fails with a 'jni.h' not found error like the one below, you need to fix the JAVA_HOME path the build sees:
[exec] checking jni.h usability... /usr/lib/hadoop-0.20/src/native/configure: line
19091: test: !=: unary operator expected
[exec] no
[exec] checking jni.h presence... no
[exec] checking for jni.h... no
[exec] configure: error: Native java headers not found. Is $JAVA_HOME set correctly?
BUILD FAILED
/usr/lib/hadoop-0.20/build.xml:466:
The following error occurred while executing this line:
/usr/lib/hadoop-0.20/build.xml:487: exec returned: 1
To resolve the error, open the build.xml file and add a JAVA_HOME env entry in the compile-core-native target. After the edit, the block should look like this:
<exec dir="${build.native}" executable="sh" failonerror="true">
  <env key="OS_NAME" value="${os.name}"/>
  <env key="JAVA_HOME" value="/usr/java"/>
  <env key="OS_ARCH" value="${os.arch}"/>
  <env key="JVM_DATA_MODEL" value="${sun.arch.data.model}"/>
  <env key="HADOOP_NATIVE_SRCDIR" value="${native.src.dir}"/>
  <arg line="${native.src.dir}/configure"/>
</exec>
The above change resolves the issue.
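As an alternative to editing build.xml, exporting JAVA_HOME in the shell before running ant may also do the trick, since ant's exec task inherits the parent environment by default (the path below is an example; point it at a JDK that actually contains include/jni.h):
$ export JAVA_HOME=/usr/java
$ cd /usr/lib/hadoop-0.20
$ ant -Dcompile.native=true compile-native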
A successful build should show something like this:
$ ls -ltr /usr/lib/hadoop-0.20/build/native/Linux-amd64-64/lib
total 212
-rw-r--r-- 1 root root 12953 Jul 21 16:43 Makefile
-rwxr-xr-x 1 root root 73327 Jul 21 16:43 libhadoop.so.1.0.0
-rw-r--r-- 1 root root 850 Jul 21 16:43 libhadoop.la
-rw-r--r-- 1 root root 114008 Jul 21 16:43 libhadoop.a
lrwxrwxrwx 1 root root 18 Jul 21 17:04 libhadoop.so.1 -> libhadoop.so.1.0.0
lrwxrwxrwx 1 root root 18 Jul 21 17:04 libhadoop.so -> libhadoop.so.1.0.0
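At runtime, hadoop typically looks for the native libs under $HADOOP_HOME/lib/native/<platform>, so you will usually want to copy the build output there (a sketch; the platform directory Linux-amd64-64 should match your build output above):
$ cd /usr/lib/hadoop-0.20
$ sudo mkdir -p lib/native/Linux-amd64-64
$ sudo cp -d build/native/Linux-amd64-64/lib/libhadoop.* lib/native/Linux-amd64-64/
Since our nodes were identical, the same files could then simply be scp'd to the other nodes.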
Run a test job to make sure the native libs are being picked up properly. Use the test jar matching your version of hadoop. In the output below, verify that the native libs are loaded by checking for the line "Successfully loaded & initialized native-zlib library".
$ hadoop jar /usr/lib/hadoop-0.20/hadoop-0.20.1+169.88-test.jar testsequencefile -seed 0 -count 1000 -compressType RECORD xxx -codec org.apache.hadoop.io.compress.GzipCodec -check
10/07/22 11:46:27 INFO io.TestSequenceFile: count = 1000
10/07/22 11:46:27 INFO io.TestSequenceFile: megabytes = 1
10/07/22 11:46:27 INFO io.TestSequenceFile: factor = 10
10/07/22 11:46:27 INFO io.TestSequenceFile: create = true
10/07/22 11:46:27 INFO io.TestSequenceFile: seed = 0
10/07/22 11:46:27 INFO io.TestSequenceFile: rwonly = false
10/07/22 11:46:27 INFO io.TestSequenceFile: check = true
10/07/22 11:46:27 INFO io.TestSequenceFile: fast = false
10/07/22 11:46:27 INFO io.TestSequenceFile: merge = false
10/07/22 11:46:27 INFO io.TestSequenceFile: compressType = RECORD
10/07/22 11:46:27 INFO io.TestSequenceFile: compressionCodec = org.apache.hadoop.io.compress.GzipCodec
10/07/22 11:46:27 INFO io.TestSequenceFile: file = xxx
10/07/22 11:46:27 INFO io.TestSequenceFile: creating 1000 records with RECORD compression
10/07/22 11:46:27 INFO util.NativeCodeLoader: Loaded the native-hadoop library
10/07/22 11:46:27 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
10/07/22 11:46:27 INFO compress.CodecPool: Got brand-new compressor
10/07/22 11:46:27 INFO compress.CodecPool: Got brand-new decompressor
10/07/22 11:46:28 INFO compress.CodecPool: Got brand-new decompressor
10/07/22 11:46:28 INFO io.TestSequenceFile: done sorting 1000 debug
10/07/22 11:46:28 INFO io.TestSequenceFile: sorting 1000 records in memory for debug
Looking online, some other folks have had issues with the above test: they had to manually add the native libs to JAVA_LIBRARY_PATH. This was fixed in https://issues.apache.org/jira/browse/HADOOP-4839. If you are hitting that issue, here is a link on how to manually add it to the path:
http://www.mail-archive.com/common-user@hadoop.apache.org/msg04761.html
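The workaround boils down to exporting the library path yourself, e.g. in hadoop-env.sh (a sketch; the directory must match wherever your native libs actually live):
# add to /usr/lib/hadoop-0.20/conf/hadoop-env.sh
export JAVA_LIBRARY_PATH=/usr/lib/hadoop-0.20/lib/native/Linux-amd64-64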
Tip (from the comments): to find the Java installation on your machine that includes the header files, search for jni.h:
$ apt-file search jni.h
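apt-file itself is not installed by default on Debian/Ubuntu; if the command is missing:
$ sudo apt-get install apt-file
$ sudo apt-file update
$ apt-file search jni.h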