Inherit Mapper, specifying its generic types
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable>
LongWritable -> the start offset of the line (input key)
Text -> the line of text that was read (input value)
Text -> the word being output (output key)
LongWritable -> the count being output (output value)
Of the four generic parameters, the first two specify the types of the mapper's input data: the first is the type of the input key and the second is the type of the input value; the last two are the types of the output key and output value
The input and output data of both map and reduce are encapsulated as key-value pairs
By default, in the input data the framework passes to our mapper, the key is the starting byte offset of the line of text to be processed, and the value is the content of that line
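For example (a hypothetical two-line input file, assuming Unix line endings), the framework calls map once per line, passing the line's starting byte offset as the key and the line itself as the value:

hello world     ->  map(0,  "hello world",  context)
hello hadoop    ->  map(12, "hello hadoop", context)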
Serialization problem:
To transmit key-value data across the network, the types must support serialization. Java has its own serialization mechanism, but the serialized data carries redundant metadata, which is a drawback for MapReduce's analysis of massive amounts of data. Hadoop therefore implements its own, more compact serialization.
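As a rough sketch of what Hadoop's own serialization looks like: the built-in types used here (LongWritable, Text) implement the org.apache.hadoop.io.Writable interface, which writes only the raw field data. A custom value type would do the same; the class WordCountWritable below and its single field are hypothetical, purely for illustration:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// Hypothetical custom value type: only the raw fields are written,
// without the class metadata that Java's built-in serialization adds
public class WordCountWritable implements Writable {
    private long count;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(count);      // serialize: write the field itself
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        count = in.readLong();     // deserialize: read the fields back in the same order
    }

    public long get() { return count; }
    public void set(long count) { this.count = count; }
}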
Inherit Mapper and override the map method
The MapReduce framework calls this method once for every line of data it reads
The specific business logic is written in the method body; the data to be processed has already been passed in by the framework through the method's parameters, key and value
@Override
protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
}
Implement the business logic:
public class WCMapper extends Mapper<LongWritable, Text, Text, LongWritable>{

    // The MapReduce framework calls this method once for every line of data it reads
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The specific business logic is written in the method body; the data to be processed
        // has already been passed in by the framework through the method's parameters, key and value
        // key is the starting offset of this line of data, value is the text content of this line

        // Convert the contents of this line to a String
        String line = value.toString();

        // Split the line of text on a specific separator (a space here)
        String[] words = StringUtils.split(line, " ");

        // Traverse the word array and output each word in k-v form, k: the word, v: 1
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}
Inherit Reducer and implement the reduce method
public class WCReducer extends Reducer<Text, LongWritable, Text, LongWritable>{

    // After map processing, the framework caches all the kv pairs, groups them by key,
    // and calls the reduce method once per group, passing <key, values{}>
    // e.g. <hello,{1,1,1,1,1,1.....}>
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {

        long count = 0;
        // Traverse the list of values and add them up
        for (LongWritable value : values) {
            count += value.get();
        }

        // Output the count for this word
        context.write(key, new LongWritable(count));
    }
}
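To make the data flow concrete, here is what the classes above produce for the same hypothetical two-line input used earlier:

input lines:     hello world
                 hello hadoop
map output:      (hello,1) (world,1) (hello,1) (hadoop,1)
grouped by the framework before reduce:
                 <hello,{1,1}>  <world,{1}>  <hadoop,{1}>
reduce output:   (hello,2) (world,1) (hadoop,1)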
After the map and reduce code is written, there still needs to be a class that describes the whole job:
which mapper and reducer classes to use, and where the map and reduce tasks are distributed. The project also has to be packaged into a jar.
One complete run of this business logic is called a job. The job description tells the cluster which jar to use, which mapper and reducer classes to run, the path of the data to process, and where to put the output.
/**
 * Used to describe a specific job:
 * for example, which class the job uses as the map in its logic and which as the reduce,
 * the path of the data the job should process,
 * and the path where the job's output will be put
 * ....
 * @author duanhaitao@itcast.cn
 */
public class WCRunner {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job wcjob = Job.getInstance(conf);

        // Set which jar package contains the classes used by this job
        wcjob.setJarByClass(WCRunner.class);

        // Mapper and reducer classes used by this job
        wcjob.setMapperClass(WCMapper.class);
        wcjob.setReducerClass(WCReducer.class);

        // Specify the kv types of the reduce output
        wcjob.setOutputKeyClass(Text.class);
        wcjob.setOutputValueClass(LongWritable.class);

        // Specify the kv types of the mapper output
        wcjob.setMapOutputKeyClass(Text.class);
        wcjob.setMapOutputValueClass(LongWritable.class);

        // Specify the storage path of the input data to be processed
        FileInputFormat.setInputPaths(wcjob, new Path("hdfs://weekend110:9000/wc/srcdata/"));

        // Specify the storage path for the output of the processing results
        FileOutputFormat.setOutputPath(wcjob, new Path("hdfs://weekend110:9000/wc/output3/"));

        // Submit the job to the cluster and wait for it to finish
        wcjob.waitForCompletion(true);
    }
}
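Two small notes on the driver. waitForCompletion returns a boolean indicating whether the job succeeded, so a common variant (not in the code above) turns it into the process exit code. Also, the output directory must not exist yet when the job is submitted; MapReduce refuses to overwrite an existing output path, which is why a fresh directory such as output3 is used for each run.

boolean success = wcjob.waitForCompletion(true);
System.exit(success ? 0 : 1);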
Package the project into a jar and upload it to the hadoop cluster
Start hadoop yarn
Run the job with hadoop jar, giving the jar file and the driver class (as sketched below)
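For reference, the sequence of commands looks roughly like this; the jar name wc.jar is a placeholder, and WCRunner should be fully qualified with its package if it has one:

start-dfs.sh                  # HDFS must be running for the input/output paths above
start-yarn.sh                 # start YARN (ResourceManager / NodeManagers)
hadoop jar wc.jar WCRunner    # run the job: the jar file, then the driver class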