Tuesday, August 03, 2010

Hadoop LZO Installation : Errors and Resolution

A record of the errors we hit during Hadoop LZO installation and how we resolved them. We previously followed the code base and instructions at

http://code.google.com/p/hadoop-gpl-compression/wiki/FAQ

but later switched to

http://github.com/kevinweil/hadoop-lzo

as there were some improvements and bug fixes.

Error:



java.lang.RuntimeException: native-lzo library not available
at com.hadoop.compression.lzo.LzopCodec.createDecompressor(LzopCodec.java:91)
at com.hadoop.mapreduce.LzoSplitRecordReader.initialize(LzoSplitRecordReader.java:52)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:582)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)


Resolution: Make sure the LZO library is on the TaskTracker's classpath. The command

$~: ps auxw | grep tasktracker

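should show the LZO lib in the classpath list. Because this error concerns the native library, also check that the directory holding it appears in -Djava.library.path. If either is missing, follow the "Building and configuring" instructions on Kevin's site (linked above).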
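
The ps output for a TaskTracker is one very long line, so the LZO entries are easy to miss. Below is a minimal sketch of a filter that prints only the LZO-related entries, one per line. The function name lzo_entries is made up for illustration, and the usage line is left commented out because it assumes a TaskTracker is running on the machine.

```shell
#!/bin/sh
# Read a Java command line on stdin and print each LZO-related entry once.
# Spaces and colons are turned into newlines so classpath entries are split apart.
lzo_entries() {
    tr ' :' '\n\n' | grep -i 'lzo' | sort -u
}

# Hypothetical usage on a TaskTracker node; [t] stops grep from matching itself.
# ps auxww | grep '[t]asktracker' | lzo_entries
```

An empty result means that neither the jar nor any other LZO path is visible to the TaskTracker process.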

Error:


java.lang.ClassCastException: com.hadoop.compression.lzo.LzopCodec$LzopDecompressor
cannot be cast to com.hadoop.compression.lzo.LzopDecompressor
at com.hadoop.mapreduce.LzoSplitRecordReader.initialize(LzoSplitRecordReader.java:52)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:418)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:582)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

java.lang.IllegalAccessError com.hadoop.compression.lzo.LzopDecompressor cannot access
superclass com.hadoop.compression.lzo.LzoDecompressor


Resolution:

Both errors above were caused by mixing installations of the two code bases. We resolved them by starting from scratch and deleting all old LZO libs.
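
To spot a mixed installation before starting from scratch, something along these lines may help. It is only a sketch: the function names are invented for illustration, the default directory is taken from the jar path used later in this post and may differ on your machine, and the libgplcompression pattern assumes the native library carries that name.

```shell
#!/bin/sh
# List jars (and native libs) from either LZO code base under a directory tree.
# The default directory is an assumption; pass your own Hadoop install path.
find_lzo_jars() {
    find "${1:-/usr/lib/hadoop-0.20}" -type f \
        \( -name 'hadoop-gpl-compression*.jar' \
           -o -name 'hadoop-lzo*.jar' \
           -o -name 'libgplcompression*' \) | sort
}

# Succeed only if no hadoop-gpl-compression jar is left behind.
no_mixed_install() {
    ! find_lzo_jars "$1" | grep -q 'hadoop-gpl-compression'
}
```

Running find_lzo_jars on every node and removing anything that belongs to the old code base is the manual cleanup described above.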

Testing that the LZO native libs work:

- Create a sample lzo file


$~: echo "hello world" > test.log
$~: lzop test.log

The above should create a test.log.lzo file

$~: hadoop fs -copyFromLocal test.log.lzo /tmp
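
A quick local check of lzop itself can rule out a missing or broken lzop package before Hadoop gets involved. The following is a rough sketch that compresses the same sample text in a temporary directory and decompresses it again; the function name lzo_roundtrip and its return codes are made up for this example.

```shell
#!/bin/sh
# Round-trip "hello world" through lzop in a scratch directory.
# Returns 0 on success, 1 on mismatch or failure, 2 if lzop is not installed.
lzo_roundtrip() {
    command -v lzop >/dev/null 2>&1 || { echo "lzop not installed" >&2; return 2; }
    dir=$(mktemp -d) || return 1
    echo "hello world" > "$dir/test.log"
    # lzop writes test.log.lzo next to the input; -dc decompresses to stdout.
    if lzop "$dir/test.log" && [ "$(lzop -dc "$dir/test.log.lzo")" = "hello world" ]; then
        rc=0
    else
        rc=1
    fi
    rm -rf "$dir"
    return $rc
}
```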


Local installation test:


$~: hadoop jar /usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.4.jar com.hadoop.compression.lzo.LzoIndexer /tmp/test.log.lzo
10/08/03 16:40:01 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
10/08/03 16:40:01 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library
10/08/03 16:40:03 INFO lzo.LzoIndexer: [INDEX] LZO Indexing file /tmp/test.log.lzo, size 0.00 GB...
10/08/03 16:40:03 INFO lzo.LzoIndexer: Completed LZO Indexing in 1.40 seconds (0.00 MB/s). Index size is 0.01 KB.
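
The two lines to look for in that output are "Loaded native gpl library" and "Successfully loaded & initialized native-lzo library". A small sketch that checks for both in captured output is shown below; the function name native_lzo_loaded is hypothetical, and the commented usage line assumes the same jar path as above.

```shell
#!/bin/sh
# Read indexer output on stdin; succeed only if both native-load messages appear.
native_lzo_loaded() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'Loaded native gpl library' &&
    printf '%s\n' "$log" | grep -q 'Successfully loaded & initialized native-lzo library'
}

# Hypothetical usage (Hadoop logs to stderr, hence 2>&1):
# hadoop jar /usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.4.jar \
#     com.hadoop.compression.lzo.LzoIndexer /tmp/test.log.lzo 2>&1 | native_lzo_loaded && echo OK
```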


Distributed Test:


$~: hadoop jar /usr/lib/hadoop-0.20/lib/hadoop-lzo-0.4.4.jar com.hadoop.compression.lzo.DistributedLzoIndexer /tmp/test.log.lzo
10/08/03 16:42:53 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
10/08/03 16:42:53 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library
10/08/03 16:42:53 INFO lzo.DistributedLzoIndexer: Adding LZO file /tmp/test.log.lzo to indexing list (no index currently exists)
10/08/03 16:42:53 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
10/08/03 16:44:24 INFO input.FileInputFormat: Total input paths to process : 1
10/08/03 16:44:24 INFO mapred.JobClient: Running job: job_201007251750_0072
10/08/03 16:44:25 INFO mapred.JobClient: map 0% reduce 0%
10/08/03 16:44:38 INFO mapred.JobClient: map 100% reduce 0%
10/08/03 16:44:40 INFO mapred.JobClient: Job complete: job_201007251750_0072
10/08/03 16:44:40 INFO mapred.JobClient: Counters: 6
10/08/03 16:44:40 INFO mapred.JobClient: Job Counters
10/08/03 16:44:40 INFO mapred.JobClient: Launched map tasks=1
10/08/03 16:44:40 INFO mapred.JobClient: Data-local map tasks=1
10/08/03 16:44:40 INFO mapred.JobClient: FileSystemCounters
10/08/03 16:44:40 INFO mapred.JobClient: HDFS_BYTES_READ=60
10/08/03 16:44:40 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=8
10/08/03 16:44:40 INFO mapred.JobClient: Map-Reduce Framework
10/08/03 16:44:40 INFO mapred.JobClient: Map input records=1
10/08/03 16:44:40 INFO mapred.JobClient: Spilled Records=0
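
For the distributed run, the counters give a quick sanity check: one input record was read and a few bytes (the index) were written. If you want to pull a counter out of saved job output, a sketch like the following could work. The function name counter_value is invented here, and it assumes the counter name contains no regular-expression metacharacters.

```shell
#!/bin/sh
# Read JobClient output on stdin and print the numeric value of the named counter.
# If the counter appears more than once, the last occurrence wins.
counter_value() {
    sed -n "s/.*[[:space:]]$1=\([0-9][0-9]*\)\$/\1/p" | tail -n 1
}

# Hypothetical usage against a saved log file:
# counter_value HDFS_BYTES_WRITTEN < job.log
```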

4 comments:

  1. Anonymous, 2:17 PM

    Nice article. I was struggling with com.hadoop.compression.lzo.LzopCodec$LzopDecompressor
    cannot be cast to com.hadoop.compression.lzo.LzopDecompressor all day long. You made my day.

  2. Anonymous, 1:31 PM

    Thanks! I was having the same ClassCastException, and your post pointed me in the right direction.

    I had previously installed the hadoop-gpl-compression library, and this was getting loaded before the hadoop-lzo jar. So, I just deleted the hadoop-gpl-compression and everything started working.

  3. Anonymous, 6:55 AM

    it on the classpath and both the local and distributed tests work. But I still get the error when I run Pig, e.g. Cloudera's Pig example:
    A = LOAD 'input';
    B = FILTER A BY $0 MATCHES '.*dfs[a-z.]+.*';
    DUMP B;

    It runs, shows the exceptions, but completes anyway?

  4. Very helpful. After several different attempts I had no luck until I tried this test and realized I needed to install the "lzop" package. It all works as expected now.

    Thanks!
