9. Custom FileOutputFormat: writing classified output

Keywords: Big Data Apache Hadoop Java

1. Requirement

By default, the map and reduce output of an MR job is written under the path set by FileOutputFormat.setOutputPath(). Sometimes, however, the results need to be split by category, for example error records written to one file and valid records to another. To do this, you extend FileOutputFormat with your own class that decides which file each record goes to.

2. Custom code

  • 1. Custom OutputFormat code

      import org.apache.hadoop.fs.FSDataOutputStream;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.io.NullWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.RecordWriter;
      import org.apache.hadoop.mapreduce.TaskAttemptContext;
      import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
      import java.io.IOException;
    
      /**
       * @author liufu
       */
      public class MyOutPutFormat extends FileOutputFormat<Text, NullWritable> {
          @Override
          public RecordWriter<Text, NullWritable> getRecordWriter(TaskAttemptContext context) throws IOException, InterruptedException {
              Path tocrawlPath = new Path("d:/flow/crawlout/tocrawl.log");
              Path enhancedPath = new Path("d:/flow/enhanced/enhanced.log");
              FileSystem fs = FileSystem.get(context.getConfiguration());
              FSDataOutputStream tocrawlOs = fs.create(tocrawlPath);
              FSDataOutputStream enhancedOs = fs.create(enhancedPath);
    
              return new MyRecordWriter(tocrawlOs,enhancedOs);
          }
    
          static class MyRecordWriter extends RecordWriter<Text, NullWritable>{
    
              FSDataOutputStream tocrawlOs = null;
              FSDataOutputStream enhancedOs = null;
              public MyRecordWriter(FSDataOutputStream tocrawlOs, FSDataOutputStream enhancedOs) {
                  this.tocrawlOs = tocrawlOs;
                  this.enhancedOs = enhancedOs;
              }
    
              /**
               * Writes one final output record of the MR job to external storage.
               * Keys containing "tocrawl" go to the to-crawl file; all others go to the enhanced file.
               * No separator is appended, so the key itself must carry any line terminator.
               */
              @Override
              public void write(Text key, NullWritable value) throws IOException, InterruptedException {
                  // Text stores UTF-8 bytes; only the first getLength() bytes of the backing array are valid
                  if(key.toString().contains("tocrawl")){
                      tocrawlOs.write(key.getBytes(), 0, key.getLength());
                  }else{
                      enhancedOs.write(key.getBytes(), 0, key.getLength());
                  }
              }
              @Override
              public void close(TaskAttemptContext context) throws IOException, InterruptedException {
                  // Close both streams when the task finishes
                  if(tocrawlOs!=null) tocrawlOs.close();
                  if(enhancedOs!=null) enhancedOs.close();
              }
          }
      }
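
The routing decision inside write() can be tried without a cluster. The following is only a sketch of the same idea using plain java.io streams: the class name RecordRouter and the sample records are hypothetical, and unlike the record writer above it appends a newline after every record.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for MyRecordWriter that works on any OutputStream.
class RecordRouter {
    private final OutputStream tocrawlOs;
    private final OutputStream enhancedOs;

    RecordRouter(OutputStream tocrawlOs, OutputStream enhancedOs) {
        this.tocrawlOs = tocrawlOs;
        this.enhancedOs = enhancedOs;
    }

    // Same rule as in the article: records containing "tocrawl" go to one
    // stream, everything else goes to the other.
    void write(String record) {
        OutputStream target = record.contains("tocrawl") ? tocrawlOs : enhancedOs;
        try {
            target.write(record.getBytes(StandardCharsets.UTF_8));
            target.write('\n'); // assumption: one record per line
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream tocrawl = new ByteArrayOutputStream();
        ByteArrayOutputStream enhanced = new ByteArrayOutputStream();
        RecordRouter router = new RecordRouter(tocrawl, enhanced);

        router.write("http://example.com/a tocrawl");
        router.write("http://example.com/b enhanced-content");

        System.out.print("tocrawl: " + new String(tocrawl.toByteArray(), StandardCharsets.UTF_8));
        System.out.print("enhanced: " + new String(enhanced.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

Keeping the rule in one small method makes it easy to swap the substring check for a different classification, such as a status field or a prefix.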
    
  • 2. How to configure and use it

Implementation: (refer to TextOutputFormat.class)
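
The driver code is not shown here, so the following is only a sketch of how the custom format might be registered on a job. ClassifyDriver, LineMapper, the job name, and the use of args[0] and args[1] are hypothetical; MyOutPutFormat is the class from section 1 and is assumed to be in the same package. It requires the Hadoop MapReduce client libraries on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class ClassifyDriver {

    // Hypothetical map-only step: emits each input line as the key.
    // A newline is appended because MyRecordWriter writes the key bytes
    // without adding any separator of its own.
    public static class LineMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
        private final Text out = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            out.set(line.toString() + "\n");
            context.write(out, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "classify-output");
        job.setJarByClass(ClassifyDriver.class);

        job.setMapperClass(LineMapper.class);
        job.setNumReduceTasks(0); // map-only, to keep the sketch short

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // Statement 1: register the custom output format
        job.setOutputFormatClass(MyOutPutFormat.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // Statement 2: still set a default output directory for the job
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because MyOutPutFormat opens the same two fixed paths in every task, this sketch assumes the job runs a single task; with several tasks, each one would create and overwrite the same files.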

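Important note

Is the second statement in the driver, the call to FileOutputFormat.setOutputPath(), still needed when the custom class writes to its own paths? Yes. A FileOutputFormat job also writes a _SUCCESS marker file when it finishes, so you must still specify a default output directory for it.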

Posted by jonathanellis on Sat, 14 Dec 2019 12:29:26 -0800