Implementing MapReduce in Eclipse

Keywords: Big Data, XML, Hadoop, Eclipse, Windows

1. Prepare the environment

  • Update Hadoop's mapred-site.xml and yarn-site.xml configuration files under Windows to match the ones on the virtual machine.
  • Copy the mapred-site.xml and yarn-site.xml configuration files into the project.
  • Add the Hadoop dependency packages to the project's build path (a Maven sketch follows this list).
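If you manage the project with Maven rather than adding the jars to the Eclipse build path by hand, a single hadoop-client dependency pulls in the common, HDFS, and MapReduce client jars. A minimal sketch (the version number is an assumption; match it to your cluster's Hadoop release):

		<dependency>
			<groupId>org.apache.hadoop</groupId>
			<artifactId>hadoop-client</artifactId>
			<!-- Assumed version; use the one matching your cluster -->
			<version>2.7.3</version>
		</dependency>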

2. Modes of operation

  • Run locally (Eclipse starts multiple threads on the local machine to simulate the execution of map tasks and reduce tasks). Mainly used in test environments.
    This requires changing mapreduce.framework.name to local in the mapred-site.xml configuration file (see the snippet after this list).
  • Submit the job to run on the cluster. Mainly used in production environments.
      You need to first build the project into a jar package and copy it to the virtual machine, then execute it with the hadoop jar command.
  • Submit jobs directly from Eclipse on the local machine to run on the cluster.
    You need to first build the project into a jar package and place it locally, for example on drive D. Then call job.setJar("the path of the jar package") in the program. Finally, modify the mapred-site.xml configuration file to
			<property>
			     <name>mapreduce.framework.name</name>
			     <value>yarn</value>
			</property>
			<property>
			     <name>mapreduce.app-submission.cross-platform</name>
			     <value>true</value>
			</property>
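
For the local run mode described in the first bullet, the same file would instead contain:

			<property>
			     <name>mapreduce.framework.name</name>
			     <value>local</value>
			</property>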

3. A simple WordCount example that counts the number of times each word appears in a text

  • Main function
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WC {
	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		//Create the configuration object (true loads the default resources)
		Configuration conf = new Configuration(true);
		
		//Create the job object
		Job job = Job.getInstance(conf);
		
		//Set the class containing the current main function
		job.setJarByClass(WC.class);
		
		//Set the location of the local jar package; only needed for the third mode of operation
		job.setJar("d:/wc.jar");

		//Set the input path
		FileInputFormat.setInputPaths(job, "/input/wc");

		//Set the output path
		Path outputPath = new Path("/output/");
		FileSystem fs = outputPath.getFileSystem(conf);
		//If the output path already exists, delete it; otherwise the job fails
		if (fs.exists(outputPath)) {
			fs.delete(outputPath, true);
		}
		FileOutputFormat.setOutputPath(job, outputPath);

		//Set the map class
		job.setMapperClass(WCMapper.class);
		
		//Set the map output types: the key is a word (Text), the value is the count 1 (IntWritable)
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(IntWritable.class);

		//Set the reduce class
		job.setReducerClass(WCReduce.class);

		//Set the types of the final (reduce) output
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		//Set the number of reduce tasks
		job.setNumReduceTasks(2);

		//Submit the job and wait for it to finish, printing progress information
		job.waitForCompletion(true);
	}
}
  • Map class
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
	//Reuse the output key and value objects instead of allocating new ones per record
	Text myKey = new Text();
	IntWritable myValue = new IntWritable(1);

	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		//Split the line into words on spaces
		String[] words = StringUtils.split(value.toString(), ' ');
		for (String word : words) {
			//Emit (word, 1) for every word in the line
			myKey.set(word);
			context.write(myKey, myValue);
		}
	}
}
  • Reduce class
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
	@Override
	protected void reduce(Text key, Iterable<IntWritable> values, Context context)
			throws IOException, InterruptedException {
		//Sum the 1s emitted by the mappers for this word
		int sum = 0;
		for (IntWritable value : values) {
			sum += value.get();
		}
		//Emit (word, total count)
		context.write(key, new IntWritable(sum));
	}
}
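
As a quick sanity check, suppose the input directory /input/wc holds one file whose contents are the hypothetical line below. Because the job uses two reduce tasks, the counts are split across part-r-00000 and part-r-00001; which word lands in which file depends on the default hash partitioner.

	Input:
		hello world hello hadoop

	Output (both part files combined):
		hadoop	1
		hello	2
		world	1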
