1. Prepare the environment
- Update Hadoop's mapred-site.xml and yarn-site.xml configuration files under Windows so that they match the ones on the virtual machine.
- Copy the mapred-site.xml and yarn-site.xml configuration files into the project.
- Add the required dependency packages; a minimal Maven sketch follows this list.
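A minimal sketch of the dependency, assuming the project uses Maven; the version number here is an assumption and should match the Hadoop version running on the cluster:

```xml
<!-- Assumed coordinates; pick the version that matches your cluster -->
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.7.3</version>
</dependency>
```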
2. Run modes
- Run locally (start multiple threads in local Eclipse to simulate map task and reduce task execution). Mainly used in a test environment.
  Requires setting mapreduce.framework.name to local in the mapred-site.xml configuration file.
- Package the job and submit it to the cluster. Mainly used in a production environment.
  Build the project into a jar package first, copy it to the virtual machine, and run it with the hadoop jar command (see the example after this list).
- Submit the job directly from Eclipse on the local machine and have it run in the cluster.
  Build the jar package to a local path (for example, on drive D), call job.setJar("path to the jar package") in the program, and add the following to the mapred-site.xml configuration file:
```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.app-submission.cross-platform</name>
  <value>true</value>
</property>
```
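For the second mode, the command on the virtual machine looks like the sketch below, assuming the project was packaged as wc.jar and the driver class WC (shown in the next section) is in the default package:

```sh
hadoop jar wc.jar WC
```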
3. A simple WordCount example that counts how many times each word appears in a piece of text
- Main function
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WC {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Create the configuration object (true loads the default resources)
        Configuration conf = new Configuration(true);
        // Create the job object
        Job job = Job.getInstance(conf);
        // Set the class where the main function is located
        job.setJarByClass(WC.class);
        // Set the path of the local jar package; only needed for the third run mode
        job.setJar("d:/wc.jar");
        // Set the input path
        FileInputFormat.setInputPaths(job, "/input/wc");
        // Set the output path; if it already exists, delete it first,
        // because the job fails when the output path exists
        Path outputPath = new Path("/output/");
        FileSystem fs = outputPath.getFileSystem(conf);
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
        }
        FileOutputFormat.setOutputPath(job, outputPath);
        // Set the map class
        job.setMapperClass(WCMapper.class);
        // Set the types of the map output: the key is a word, the value is 1
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Set the reduce class
        job.setReducerClass(WCReduce.class);
        // Set the number of reduce tasks
        job.setNumReduceTasks(2);
        // Submit the job, wait for it to finish, and print progress
        job.waitForCompletion(true);
    }
}
```
- Map class
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reuse the output key and value objects instead of allocating one per record
    Text myKey = new Text();
    IntWritable myValue = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line into words on spaces
        String[] words = StringUtils.split(value.toString(), ' ');
        for (String word : words) {
            // Emit each word with a count of 1
            myKey.set(word);
            context.write(myKey, myValue);
        }
    }
}
```
- Reduce class
```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WCReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Sum the 1s emitted by the mappers for this word
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // Emit the word together with its total count
        context.write(key, new IntWritable(sum));
    }
}
```
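To make the data flow concrete, here is a hypothetical input file under /input/wc and the combined result of the job. With two reduce tasks the counts are split across part-r-00000 and part-r-00001 by the default hash partitioner, but taken together they look like this:

```text
# /input/wc/words.txt (hypothetical input)
hello world
hello hadoop

# combined contents of the part-r-* output files
hadoop	1
hello	2
world	1
```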