Compression Level is ignored. #142

@wilcoln

I want to compress a file that is already stored in HDFS using different compression levels.
To do so, I wrote the following program:

Compress.java

import ...
import com.hadoop.compression.lzo.LzoCodec;

public class Compress {

  // Reducer that writes each input line as the output key with an empty value.
  public static class VoidReducer extends Reducer<LongWritable, Text, Text, Text> {

    @Override
    public void reduce(LongWritable key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      for (Text value : values) {
        context.write(value, new Text(""));
      }
    }
  }

  public static void main(String[] args) throws Exception {

    // args: <input path> <output path> <compression level>
    Configuration conf = new Configuration();
    int level = Integer.parseInt(args[2]);
    conf.setInt("io.compression.codec.lzo.compression.level", level);

    Job job = Job.getInstance(conf);
    job.setJobName("Compresser Job");
    job.setJarByClass(Compress.class);
    job.setMapperClass(Mapper.class); // base Mapper passes records through unchanged
    job.setReducerClass(VoidReducer.class);
    job.setNumReduceTasks(1);

    TextInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    FileOutputFormat.setCompressOutput(job, true);

    FileOutputFormat.setOutputCompressorClass(job, LzoCodec.class);
    // submit and wait for completion
    job.waitForCompletion(true);
  }
}

Then I run the following commands:

$ javac -classpath $(hadoop classpath) *.java
$ jar -cvf Compress.jar Compress*.class
$ hadoop jar Compress.jar Compress file.txt test1 1
$ hadoop jar Compress.jar Compress file.txt test7 7

The file file.txt is 1 GB in size. When I then check the sizes of test1 and test7 with
hdfs dfs -du -s -h, I get 594.6 M for each.
Both outputs have exactly the same size, which indicates that the compression level is ignored.
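
The size comparison above can also be illustrated locally, without a cluster. The following is only a sketch of the reasoning: it uses the JDK's java.util.zip.Deflater as a stand-in codec (it is not LZO and does not touch hadoop-lzo), and the class name LevelCheck and the helpers compressedSize and sampleText are hypothetical names made up for this example. The idea is that when a codec honours its level setting, compressing the same input at two levels normally produces different output sizes, so byte-identical sizes are a hint that the setting never reached the compressor.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.Deflater;

public class LevelCheck {

    // Compress `input` with the JDK Deflater (a stand-in, not LZO) at the
    // given level and return the number of compressed bytes produced.
    static int compressedSize(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.size();
    }

    // Build deterministic pseudo-text from a small vocabulary (fixed seed),
    // so repeated runs compress exactly the same bytes.
    static byte[] sampleText(int words) {
        String[] vocab = {
            "hadoop", "codec", "level", "block", "stream", "buffer", "reduce", "mapper",
            "output", "input", "split", "record", "format", "config", "native", "size"
        };
        Random rng = new Random(42);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < words; i++) {
            sb.append(vocab[rng.nextInt(vocab.length)]);
            sb.append(i % 12 == 11 ? '\n' : ' ');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleText(200_000);
        int level1 = compressedSize(data, 1);
        int level7 = compressedSize(data, 7);
        System.out.println("input=" + data.length
            + " level1=" + level1 + " level7=" + level7);
        System.out.println(level1 == level7
            ? "sizes identical: the level may not be applied"
            : "sizes differ: the level is applied");
    }
}
```

Identical sizes are a hint rather than a proof, since two levels can occasionally produce the same output on some inputs; comparing more than two levels makes the check more convincing.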
