Level 1: Statistics of user churn
Task description
This task: write a MapReduce program that computes user churn statistics, based on the user behavior data.
Relevant knowledge
This is an intermediate-difficulty MapReduce programming exercise that simulates the statistical analysis of e-commerce data in a realistic scenario, so it assumes you have already mastered the basics of MapReduce.
If you are not yet familiar with MapReduce, complete the basic MapReduce training on this platform first, then return to this exercise.
Data file format description
This is the e-commerce data used in the exercise. It is in CSV format, the file is named user_behavior.csv, and it contains 9,948 lines. The first few lines look like this:
```
1002309,1008608,mobile phone,pv
1002573,1009007,headset,pv
1001541,1008614,mobile phone,pv
1001192,1008612,mobile phone,pv
1001016,1008909,tablet PC,buy
1001210,1008605,mobile phone,pv
1001826,1008704,notebook,pv
1002208,1008906,tablet PC,pv
1002308,1008702,notebook,pv
1002080,1008702,notebook,cart
1001525,1008702,notebook,cart
1002749,1008702,notebook,pv
1002134,1008704,notebook,cart
1002497,1008608,mobile phone,pv
···
---9948 lines in total---
```
- Each row has 4 columns: user id, commodity id, commodity category, and user behavior;
- There are 5 commodity categories: mobile phone, tablet PC, notebook, smart watch, and headset;
- For user behavior, pv means click to browse, cart means add to the shopping cart, fav means add to favorites, and buy means purchase (a minimal parsing sketch follows this list).
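For orientation, here is a minimal sketch, in plain Java, of how one sample row splits into its four fields; the variable names are illustrative only:

```java
// Split one sample row of user_behavior.csv into its four fields
String[] fields = "1002309,1008608,mobile phone,pv".split(",");
String userId   = fields[0]; // "1002309"
String itemId   = fields[1]; // "1008608"
String category = fields[2]; // "mobile phone"
String behavior = fields[3]; // "pv"
```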
User churn statistics
That is, count how many times each of the four user behaviors occurs: the number of clicks to browse (pv), the number of purchases (buy), and so on.
Programming requirements
Following the prompts, complete the code in the editor on the right to count the number of each user behavior.
- The main method is already given; the Job and the I/O paths are configured in it and do not need to be changed;
- The input and output keys and values of map and reduce are already given;
- You only need to write the body of the map and reduce methods.
Expected output format:
```
buy,total
cart,total
fav,total
pv,total
```
Test description
The platform tests the code you write; if your MapReduce output matches the expected output, the test passes.
Note: for display purposes, the tab characters in the MapReduce output are shown as commas on the web page. In the actual reduce output, key and value are still separated by tabs; this is purely a display change and affects neither the programming nor the evaluation.
Start your mission. I wish you success!
Code implementation
```java
package educoder;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * UserLoseDriver
 */
public class UserLoseDriver {

    public static class ThisMap extends Mapper<Object, Text, Text, IntWritable> {
        // Reusable private field holding the constant 1
        private static IntWritable one = new IntWritable(1);

        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /*** Write the map content here ***/
            /********** Begin **********/
            // Split each row of data
            String[] atts = value.toString().split(",");
            // Get the behavior attribute
            String behavior = atts[3];
            // Use the behavior as the key and 1 as the value of the map output
            context.write(new Text(behavior), one);
            /********** End **********/
        }
    }

    public static class ThisReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            /*** Write the reduce content here ***/
            /********** Begin **********/
            // Sum all the values of the same key
            int sum = 0;
            for (IntWritable one : values) {
                sum += one.get();
            }
            // Write the reduce output
            context.write(key, new IntWritable(sum));
            /********** End **********/
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "User churn query");
        job.setJarByClass(UserLoseDriver.class);
        job.setMapperClass(ThisMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(ThisReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
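To try this job outside the platform, you would typically package it into a jar and submit it with something like `hadoop jar user-behavior.jar educoder.UserLoseDriver <input path> <output path>` — the jar name here is only a placeholder; on the platform itself the input and output paths are supplied for you, so nothing needs to be changed.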
Level 2: Ranking of hits of all commodities
Task description
This task: write a MapReduce program that ranks commodities by hits, based on the user behavior data.
Relevant knowledge
This is an intermediate-difficulty MapReduce programming exercise that simulates the statistical analysis of e-commerce data in a realistic scenario, so it assumes you have already mastered the basics of MapReduce.
If you are not yet familiar with MapReduce, complete the basic MapReduce training on this platform first, then return to this exercise.
Data file format description
This is the e-commerce data used in the exercise. It is in CSV format, the file is named user_behavior.csv, and it contains 9,948 lines. The first few lines look like this:
```
1002309,1008608,mobile phone,pv
1002573,1009007,headset,pv
1001541,1008614,mobile phone,pv
1001192,1008612,mobile phone,pv
1001016,1008909,tablet PC,buy
1001210,1008605,mobile phone,pv
1001826,1008704,notebook,pv
1002208,1008906,tablet PC,pv
1002308,1008702,notebook,pv
1002080,1008702,notebook,cart
1001525,1008702,notebook,cart
1002749,1008702,notebook,pv
1002134,1008704,notebook,cart
1002497,1008608,mobile phone,pv
···
---9948 lines in total---
```
- Each row has 4 columns: user id, commodity id, commodity category, and user behavior;
- There are 5 commodity categories: mobile phone, tablet PC, notebook, smart watch, and headset;
- For user behavior, pv means click to browse, cart means add to the shopping cart, fav means add to favorites, and buy means purchase.
Ranking of commodity hits
That is, for each commodity id, count the records whose behavior is pv (click to browse), and sort the reduce output by that click count from largest to smallest.
cleanup() method
You may need the cleanup() method in this exercise. cleanup() is the last method a Mapper/Reducer object executes, after all of its map()/reduce() calls have finished, which makes it a good place for final processing or for releasing resources. The implementation inherited from the parent class is empty and does nothing.
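As a minimal sketch of the pattern (the key/value types here are just an example, not prescribed by the exercise):

```java
public static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Called once per key; accumulate state into instance fields here
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Called exactly once, after the last reduce() call: a good place to
        // emit results that depend on all keys, or to release resources
    }
}
```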
Programming requirements
Following the prompts, complete the code in the editor on the right to compute the ranking of commodity hits.
- The main method is already given; the Job and the I/O paths are configured in it and do not need to be changed;
- The input and output keys and values of map and reduce are already given;
- You only need to write the body of the map and reduce methods.
Expected output format (sorted by hits from largest to smallest):
```
commodity id,Hits
commodity id,Hits
···
```
Test description
The platform tests the code you write; if your MapReduce output matches the expected output, the test passes.
Note: for display purposes, the tab characters in the MapReduce output are shown as commas on the web page. In the actual reduce output, key and value are still separated by tabs; this is purely a display change and affects neither the programming nor the evaluation.
Start your mission. I wish you success!
Code implementation
```java
package educoder;

import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * ItemClickRankDriver
 */
public class ItemClickRankDriver {

    public static class ThisMap extends Mapper<Object, Text, Text, IntWritable> {
        private static IntWritable one = new IntWritable(1);

        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /*** Write the map content here ***/
            /********** Begin **********/
            // 1. Split each row of data
            String[] atts = value.toString().split(",");
            // 2. Get the commodity id
            String item = atts[1];
            // 3. Get the behavior attribute
            String behavior = atts[3];
            // 4. If the behavior is 'pv', write it to the map output
            if (behavior.equals("pv")) {
                context.write(new Text(item), one);
            }
            /********** End **********/
        }
    }

    public static class ThisReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Instance field used to save the pairs processed by the reduce method
        List<Object[]> list = new LinkedList<>();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            /*** Write the reduce content here ***/
            /********** Begin **********/
            // Sum the values of the same key, then store the key and the sum in the list
            int sum = 0;
            for (IntWritable one : values) {
                sum += one.get();
            }
            list.add(new Object[] { key.toString(), Integer.valueOf(sum) });
            /********** End **********/
        }

        // cleanup(): the last method the reduce object executes, after all reduce calls have finished
        @Override
        protected void cleanup(Reducer<Text, IntWritable, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            // Sort the list by sum, in ascending order
            list = list.stream()
                       .sorted((o1, o2) -> Integer.compare((int) o1[1], (int) o2[1]))
                       .collect(Collectors.toList());
            // Traverse from back to front, i.e. from largest to smallest
            for (int i = list.size() - 1; i >= 0; i--) {
                Object[] o = list.get(i);
                // Write to the reduce output
                context.write(new Text((String) o[0]), new IntWritable((int) o[1]));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Ranking of product hits");
        job.setJarByClass(ItemClickRankDriver.class);
        job.setMapperClass(ThisMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(ThisReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
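A note on this design: sorting inside cleanup() produces one globally ordered output only because the job runs with a single reducer, which is Hadoop's default. If more reducers were configured, each would sort only its own partition, so to preserve a global ranking one would pin the reducer count explicitly, for example:

```java
job.setNumReduceTasks(1); // one reducer => one globally sorted output file
```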
Level 3: Count the commodity with the highest hits in each commodity category
Task description
This task: write a MapReduce program that finds the commodity with the highest hits in each commodity category, based on the user behavior data.
Relevant knowledge
This is an intermediate-difficulty MapReduce programming exercise that simulates the statistical analysis of e-commerce data in a realistic scenario, so it assumes you have already mastered the basics of MapReduce.
If you are not yet familiar with MapReduce, complete the basic MapReduce training on this platform first, then return to this exercise.
Data file format description
This is the e-commerce data used in the exercise. It is in CSV format, the file is named user_behavior.csv, and it contains 9,948 lines. The first few lines look like this:
```
1002309,1008608,mobile phone,pv
1002573,1009007,headset,pv
1001541,1008614,mobile phone,pv
1001192,1008612,mobile phone,pv
1001016,1008909,tablet PC,buy
1001210,1008605,mobile phone,pv
1001826,1008704,notebook,pv
1002208,1008906,tablet PC,pv
1002308,1008702,notebook,pv
1002080,1008702,notebook,cart
1001525,1008702,notebook,cart
1002749,1008702,notebook,pv
1002134,1008704,notebook,cart
1002497,1008608,mobile phone,pv
···
---9948 lines in total---
```
- Each row has 4 columns: user id, commodity id, commodity category, and user behavior;
- There are 5 commodity categories: mobile phone, tablet PC, notebook, smart watch, and headset;
- For user behavior, pv means click to browse, cart means add to the shopping cart, fav means add to favorites, and buy means purchase.
Programming requirements
Following the prompts, complete the code in the editor on the right to find the commodity with the highest hits in each commodity category.
- The main method is already given; the Job and the I/O paths are configured in it and do not need to be changed;
- The input and output keys and values of map and reduce are already given;
- You only need to write the body of the map and reduce methods.
Expected output format:
```
Commodity type,Top hits id
Commodity type,Top hits id
···
```
Test description
The platform tests the code you write; if your MapReduce output matches the expected output, the test passes.
Note: for display purposes, the tab characters in the MapReduce output are shown as commas on the web page. In the actual reduce output, key and value are still separated by tabs; this is purely a display change and affects neither the programming nor the evaluation.
Start your mission. I wish you success!
Code implementation
```java
package educoder;

import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * ItemClickTopOneEachTypeDriver
 */
public class ItemClickTopOneEachTypeDriver {

    public static class ThisMap extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /*** Write the map content here ***/
            /********** Begin **********/
            // Same as in the previous levels, so not described again
            String[] atts = value.toString().split(",");
            String item = atts[1];
            String type = atts[2];
            String behavior = atts[3];
            if (behavior.equals("pv")) {
                context.write(new Text(type), new Text(item));
            }
            /********** End **********/
        }
    }

    public static class ThisReduce extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            /*** Write the reduce content here ***/
            /********** Begin **********/
            // Tip: first count the clicks of every commodity id, then find the maximum among those counts
            // 1. A map saves the click count of each commodity id
            Map<String, Integer> map = new HashMap<>();
            // 2. Count the occurrences of each value in values
            for (Text value : values) {
                String item = value.toString();
                Integer count = !map.containsKey(item) ? 1 : map.get(item) + 1;
                map.put(item, count);
            }
            // 3. Find the entry with the largest value in the map
            Map.Entry<String, Integer> itemMax = Collections.max(map.entrySet(),
                    (entry1, entry2) -> Integer.compare(entry1.getValue(), entry2.getValue()));
            // 4. Write the result to the reduce output
            context.write(key, new Text(itemMax.getKey()));
            /********** End **********/
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "The most clicked products in each product category");
        job.setJarByClass(ItemClickTopOneEachTypeDriver.class);
        job.setMapperClass(ThisMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setReducerClass(ThisReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
Level 4: Statistics on the proportion of five commodity categories
Task description
This task: write a MapReduce program that computes the proportion of each of the five commodity categories, based on the user behavior data.
Relevant knowledge
This is an intermediate-difficulty MapReduce programming exercise that simulates the statistical analysis of e-commerce data in a realistic scenario, so it assumes you have already mastered the basics of MapReduce.
If you are not yet familiar with MapReduce, complete the basic MapReduce training on this platform first, then return to this exercise.
Data file format description
This is the e-commerce data used in the exercise. It is in CSV format, the file is named user_behavior.csv, and it contains 9,948 lines. The first few lines look like this:
```
1002309,1008608,mobile phone,pv
1002573,1009007,headset,pv
1001541,1008614,mobile phone,pv
1001192,1008612,mobile phone,pv
1001016,1008909,tablet PC,buy
1001210,1008605,mobile phone,pv
1001826,1008704,notebook,pv
1002208,1008906,tablet PC,pv
1002308,1008702,notebook,pv
1002080,1008702,notebook,cart
1001525,1008702,notebook,cart
1002749,1008702,notebook,pv
1002134,1008704,notebook,cart
1002497,1008608,mobile phone,pv
···
---9948 lines in total---
```
- Each row has 4 columns: user id, commodity id, commodity category, and user behavior;
- There are 5 commodity categories: mobile phone, tablet PC, notebook, smart watch, and headset;
- For user behavior, pv means click to browse, cart means add to the shopping cart, fav means add to favorites, and buy means purchase.
Proportion of commodity categories
Count the number of records in each commodity category; dividing a category's count by the total count over all categories gives that category's proportion.
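A quick worked example with made-up numbers: if, say, the notebook category accounted for 2,487 of the 9,948 rows, its proportion would be 2487 / 9948 = 0.25. In Java:

```java
// Hypothetical counts, for illustration only
int notebookCount = 2487;
int totalCount = 9948;
double ratio = (double) notebookCount / totalCount; // 0.25
```

Note the cast to double: plain integer division would truncate the result to 0.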
cleanup() method
You may need the cleanup() method in this exercise. cleanup() is the last method a Mapper/Reducer object executes, after all of its map()/reduce() calls have finished, which makes it a good place for final processing or for releasing resources (see the sketch in Level 2). The implementation inherited from the parent class is empty and does nothing.
Programming requirements
Following the prompts, complete the code in the editor on the right to compute the proportion of each of the five commodity categories.
- The main method is already given; the Job and the I/O paths are configured in it and do not need to be changed;
- The input and output keys and values of map and reduce are already given;
- You only need to write the body of the map and reduce methods.
Expected output format:
```
Commodity category,Proportion of the total
Commodity category,Proportion of the total
···
```
Test description
The platform tests the code you write; if your MapReduce output matches the expected output, the test passes.
Note: for display purposes, the tab characters in the MapReduce output are shown as commas on the web page. In the actual reduce output, key and value are still separated by tabs; this is purely a display change and affects neither the programming nor the evaluation.
Start your mission. I wish you success!
Code implementation
```java
package educoder;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * ItemTypeRatioDriver
 */
public class ItemTypeRatioDriver {

    public static class ThisMap extends Mapper<Object, Text, Text, IntWritable> {
        private static IntWritable one = new IntWritable(1);

        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /*** Write the map content here ***/
            /********** Begin **********/
            String[] atts = value.toString().split(",");
            String type = atts[2];
            context.write(new Text(type), one);
            /********** End **********/
        }
    }

    public static class ThisReduce extends Reducer<Text, IntWritable, Text, DoubleWritable> {
        // Saves the count produced by each reduce call
        Map<String, Integer> map = new HashMap<>();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            /*** Write the reduce content here ***/
            /********** Begin **********/
            int count = 0;
            for (IntWritable one : values) {
                count += one.get();
            }
            map.put(key.toString(), count);
            /********** End **********/
        }

        // cleanup() must be overridden to compute the proportions once all counts are in
        @Override
        protected void cleanup(Reducer<Text, IntWritable, Text, DoubleWritable>.Context context)
                throws IOException, InterruptedException {
            // Sum the counts of all commodity categories
            int sum = 0;
            for (int v : map.values()) {
                sum += v;
            }
            // Compute and write the proportion of each commodity category
            for (String key : map.keySet()) {
                int value = map.get(key);
                double ratio = ((double) value) / sum;
                context.write(new Text(key), new DoubleWritable(ratio));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Proportion of five commodity categories");
        job.setJarByClass(ItemTypeRatioDriver.class);
        job.setMapperClass(ThisMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(ThisReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
Level 5: Count the number of purchases of each commodity category
Task description
This task: write a MapReduce program that counts the number of purchases of each commodity category, based on the user behavior data.
Relevant knowledge
This is an intermediate-difficulty MapReduce programming exercise that simulates the statistical analysis of e-commerce data in a realistic scenario, so it assumes you have already mastered the basics of MapReduce.
If you are not yet familiar with MapReduce, complete the basic MapReduce training on this platform first, then return to this exercise.
Data file format description
This is the e-commerce data used in the exercise. It is in CSV format, the file is named user_behavior.csv, and it contains 9,948 lines. The first few lines look like this:
```
1002309,1008608,mobile phone,pv
1002573,1009007,headset,pv
1001541,1008614,mobile phone,pv
1001192,1008612,mobile phone,pv
1001016,1008909,tablet PC,buy
1001210,1008605,mobile phone,pv
1001826,1008704,notebook,pv
1002208,1008906,tablet PC,pv
1002308,1008702,notebook,pv
1002080,1008702,notebook,cart
1001525,1008702,notebook,cart
1002749,1008702,notebook,pv
1002134,1008704,notebook,cart
1002497,1008608,mobile phone,pv
···
---9948 lines in total---
```
- Each row has 4 columns: user id, commodity id, commodity category, and user behavior;
- There are 5 commodity categories: mobile phone, tablet PC, notebook, smart watch, and headset;
- For user behavior, pv means click to browse, cart means add to the shopping cart, fav means add to favorites, and buy means purchase.
Programming requirements
Following the prompts, complete the code in the editor on the right to count the number of purchases of each commodity category.
- The main method is already given; the Job and the I/O paths are configured in it and do not need to be changed;
- The input and output keys and values of map and reduce are already given;
- You only need to write the body of the map and reduce methods.
Expected output format:
```
Commodity type,Number of purchases
Commodity type,Number of purchases
···
```
Test description
The platform tests the code you write; if your MapReduce output matches the expected output, the test passes.
Note: for display purposes, the tab characters in the MapReduce output are shown as commas on the web page. In the actual reduce output, key and value are still separated by tabs; this is purely a display change and affects neither the programming nor the evaluation.
Start your mission. I wish you success!
Code implementation
```java
package educoder;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * ItemTypeBuyCountDriver
 */
public class ItemTypeBuyCountDriver {

    public static class ThisMap extends Mapper<Object, Text, Text, IntWritable> {
        private static IntWritable one = new IntWritable(1);

        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /*** Write the map content here ***/
            /********** Begin **********/
            String[] atts = value.toString().split(",");
            String type = atts[2];
            // Only count rows whose behavior is 'buy'
            if (atts[3].equals("buy")) {
                context.write(new Text(type), one);
            }
            /********** End **********/
        }
    }

    public static class ThisReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            /*** Write the reduce content here ***/
            /********** Begin **********/
            int count = 0;
            for (IntWritable one : values) {
                count += one.get();
            }
            context.write(key, new IntWritable(count));
            /********** End **********/
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Total number of purchases of various commodities");
        job.setJarByClass(ItemTypeBuyCountDriver.class);
        job.setMapperClass(ThisMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setReducerClass(ThisReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
Level 6: Count the purchases of the most-clicked commodity in each of the five categories
Task description
This task: write a MapReduce program that, for each of the five commodity categories, counts how many times its most-clicked commodity was purchased, based on the user behavior data.
Relevant knowledge
This is an intermediate-difficulty MapReduce programming exercise that simulates the statistical analysis of e-commerce data in a realistic scenario, so it assumes you have already mastered the basics of MapReduce.
If you are not yet familiar with MapReduce, complete the basic MapReduce training on this platform first, then return to this exercise.
Data file format description
This is the e-commerce data used in the exercise. It is in CSV format, the file is named user_behavior.csv, and it contains 9,948 lines. The first few lines look like this:
```
1002309,1008608,mobile phone,pv
1002573,1009007,headset,pv
1001541,1008614,mobile phone,pv
1001192,1008612,mobile phone,pv
1001016,1008909,tablet PC,buy
1001210,1008605,mobile phone,pv
1001826,1008704,notebook,pv
1002208,1008906,tablet PC,pv
1002308,1008702,notebook,pv
1002080,1008702,notebook,cart
1001525,1008702,notebook,cart
1002749,1008702,notebook,pv
1002134,1008704,notebook,cart
1002497,1008608,mobile phone,pv
···
---9948 lines in total---
```
- Each row has 4 columns: user id, commodity id, commodity category, and user behavior;
- There are 5 commodity categories: mobile phone, tablet PC, notebook, smart watch, and headset;
- For user behavior, pv means click to browse, cart means add to the shopping cart, fav means add to favorites, and buy means purchase.
Programming requirements
Following the prompts, complete the code in the editor on the right to count, for each of the five commodity categories, how many times its most-clicked commodity was purchased.
- The main method is already given; the Job and the I/O paths are configured in it and do not need to be changed;
- The input and output keys and values of map and reduce are already given;
- You only need to write the body of the map and reduce methods.
Expected output format:
```
Commodity type,Id of the most-clicked commodity in this type,Number of purchases
Commodity type,Id of the most-clicked commodity in this type,Number of purchases
···
```
Test description
The platform tests the code you write; if your MapReduce output matches the expected output, the test passes.
Note: for display purposes, the tab characters in the MapReduce output are shown as commas on the web page. In the actual reduce output, key and value are still separated by tabs; this is purely a display change and affects neither the programming nor the evaluation.
Start your mission. I wish you success!
Code implementation
```java
package educoder;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * ItemMaxClickBuyCountDriver
 */
public class ItemMaxClickBuyCountDriver {

    public static class ThisMap extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /*** Write the map content here ***/
            /********** Begin **********/
            String[] atts = value.toString().split(",");
            String type = atts[2];
            // Use the whole row as the map output value, because several of its
            // attributes will be needed later in the reducer
            context.write(new Text(type), value);
            /********** End **********/
        }
    }

    public static class ThisReduce extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            /*** Write the reduce content here ***/
            /********** Begin **********/
            Map<String, Integer> map = new HashMap<>();
            List<String> value_list = new ArrayList<>();
            // 1. The values need to be traversed several times, so first copy the
            //    single-pass Iterable into a list
            for (Text v : values) {
                value_list.add(v.toString());
            }
            // 2. Count the clicks (pv records only) of every commodity
            for (String v : value_list) {
                String[] atts = v.split(",");
                String item = atts[1];
                if (atts[3].equals("pv")) {
                    Integer count = !map.containsKey(item) ? 1 : map.get(item) + 1;
                    map.put(item, count);
                }
            }
            // 3. Find the commodity with the largest number of clicks
            String itemClickMax = Collections.max(map.entrySet(),
                    (entry1, entry2) -> Integer.compare(entry1.getValue(), entry2.getValue())).getKey();
            // 4. Count how many times that commodity was bought
            int buyCount = 0;
            for (String v : value_list) {
                String[] atts = v.split(",");
                if (atts[1].equals(itemClickMax) && atts[3].equals("buy")) {
                    buyCount++;
                }
            }
            // 5. Write the commodity category, the most-clicked commodity id and its
            //    purchase count to the reducer output
            context.write(key, new Text(itemClickMax + "\t" + buyCount));
            /********** End **********/
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "The number of purchases of the goods with the highest number of hits in the five commodity categories");
        job.setJarByClass(ItemMaxClickBuyCountDriver.class);
        job.setMapperClass(ThisMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setReducerClass(ThisReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
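One detail worth keeping in mind about this reducer: the values Iterable that Hadoop passes to reduce() can be traversed only once, and the framework reuses the underlying Writable objects between iterations. That is why the code above first copies each value into a String list before making its two passes over the data (one to find the most-clicked commodity, one to count its purchases).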