Experiment 8 project case: e-commerce data analysis

Keywords: big data, data analysis, data mining, MapReduce

Level 1: User churn statistics

Task description

In this task, you will write a MapReduce program that computes user churn statistics from the user behavior data.

Relevant knowledge

This is an intermediate-difficulty MapReduce programming exercise that simulates statistical analysis of e-commerce data in a realistic scenario, so it assumes you have already mastered basic MapReduce usage.
If you are not yet familiar with MapReduce, complete the basic MapReduce training on this platform first, then return to this exercise.

Data file format description

This exercise uses e-commerce data in CSV format; the file is named user_behavior.csv and contains 9,948 lines. The first few lines look like this:

1002309,1008608,mobile phone,pv
1002573,1009007,headset,pv
1001541,1008614,mobile phone,pv
1001192,1008612,mobile phone,pv
1001016,1008909,tablet PC,buy
1001210,1008605,mobile phone,pv
1001826,1008704,notebook,pv
1002208,1008906,tablet PC,pv
1002308,1008702,notebook,pv
1002080,1008702,notebook,cart
1001525,1008702,notebook,cart
1002749,1008702,notebook,pv
1002134,1008704,notebook,cart
1002497,1008608,mobile phone,pv
···
---9948 lines in total---
  • Each row has 4 columns: user id, commodity id, commodity category, and user behavior;
  • There are 5 commodity categories: mobile phone, tablet PC, notebook, smart watch, and headset;
  • The user behaviors are pv (click/browse), cart (add to shopping cart), fav (add to favorites), and buy (purchase).

User churn

Here, the user churn statistic simply counts the occurrences of each of the four user behaviors, i.e., the number of pv (click/browse) records, the number of buy records, and so on.
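Conceptually, this is a word count over the fourth CSV column. The same counting logic can be sketched in plain Java, outside Hadoop, using a few rows from the sample above (a sketch for intuition, not the Hadoop solution itself):

```java
import java.util.HashMap;
import java.util.Map;

public class BehaviorCountSketch {
    // Count occurrences of the behavior column (index 3), mirroring the
    // map step (emit behavior -> 1) followed by the reduce step (sum).
    static Map<String, Integer> countBehaviors(String[] lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String behavior = line.split(",")[3];
            counts.merge(behavior, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = {              // rows taken from the data excerpt
            "1002309,1008608,mobile phone,pv",
            "1001016,1008909,tablet PC,buy",
            "1002080,1008702,notebook,cart",
            "1001525,1008702,notebook,cart",
            "1002749,1008702,notebook,pv"
        };
        System.out.println(countBehaviors(sample)); // counts: pv=2, cart=2, buy=1
    }
}
```

In the MapReduce version below, the map step performs the `split` and emits `(behavior, 1)`, and the reduce step performs the summation.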

Programming requirements

According to the prompts, complete the code in the editor on the right to count the user churn statistics.

  • The main method is already given; the Job and the I/O paths are configured and must not be changed;
  • The input and output keys and values of map and reduce are already given;
  • You only need to write the bodies of the map and reduce methods.

Expected output format:

buy,total
cart,total
fav,total
pv,total

Test description

The platform will run the code you write; if your MapReduce output matches the expected output, the level passes.

Note: for display reasons, the tab characters in the MapReduce output are shown as commas on the web page. In the actual reduce output, the key and value are still separated by tabs; this affects only the display, not the programming or the evaluation.

Start your mission. I wish you success!

Code implementation

package educoder;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * UserLoseDriver
 */
public class UserLoseDriver {

    public static class ThisMap extends Mapper<Object, Text, Text, IntWritable> {
        // Reusable IntWritable holding the constant 1
        private static final IntWritable one = new IntWritable(1);
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /*** Write the map content here****/
            /********** Begin **********/
            // Split the CSV row into its fields
            String[] atts = value.toString().split(",");
            // Extract the behavior column
            String behavior = atts[3];
            // Emit the behavior as the key and 1 as the value
            context.write(new Text(behavior), one);
            /********** End **********/
        }
    }
    public static class ThisReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            /*** Write the reduce content here****/
            /********** Begin **********/
            //Count the total number of values for the same key
            int sum = 0;
            for(IntWritable one : values){
                sum += one.get();
            }
            //Write to reduce output
            context.write(key, new IntWritable(sum));
            /********** End **********/
        }
    }
    public static void main(String[] args) throws Exception{
        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "User churn query");

        job.setJarByClass(UserLoseDriver.class);
        job.setMapperClass(ThisMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setReducerClass(ThisReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Level 2: Ranking of commodity hits

Task description

In this task, you will write a MapReduce program that ranks commodities by their number of clicks, based on the user behavior data.

Relevant knowledge

The prerequisites and the data file are the same as in Level 1: user_behavior.csv, a CSV file of 9,948 lines with columns user id, commodity id, commodity category, and user behavior. See Level 1 for the full format description.

Ranking of commodity hits

Count the number of records whose behavior is pv (click/browse) for each commodity id, and sort the reduce output by click count from largest to smallest.

cleanup() method

The cleanup() method may be useful in this level. cleanup() is the last method a Mapper/Reducer object executes, after all of its map()/reduce() calls have finished; it is typically used to release resources or to emit final results. The implementation inherited from the parent class is empty and does nothing.
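The pattern this level uses can be seen outside Hadoop as well: accumulate (id, count) pairs during the reduce phase, then sort and emit once at the end, which is the work cleanup() performs. A plain-Java sketch with made-up click counts:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CleanupSortSketch {
    // Sort (commodity id, click count) pairs by count, descending --
    // the one-time work done in cleanup() after all reduce() calls.
    static List<Map.Entry<String, Integer>> sortByCountDesc(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> b.getValue() - a.getValue());
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // hypothetical counts
        counts.put("1008608", 3);
        counts.put("1008702", 7);
        counts.put("1009007", 1);
        for (Map.Entry<String, Integer> e : sortByCountDesc(counts)) {
            System.out.println(e.getKey() + "\t" + e.getValue()); // 1008702 first
        }
    }
}
```

Sorting must wait until all counts are known, which is why it cannot happen inside reduce() itself.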

Programming requirements

According to the prompts, complete the code in the editor on the right to rank commodities by their number of clicks.

  • The main method is already given; the Job and the I/O paths are configured and must not be changed;
  • The input and output keys and values of map and reduce are already given;
  • You only need to write the bodies of the map and reduce methods.

Expected output format (sorted by hits, descending):

commodity id,hits
commodity id,hits
···

Test description

The platform will run the code you write; if your MapReduce output matches the expected output, the level passes.

Note: for display reasons, the tab characters in the MapReduce output are shown as commas on the web page. In the actual reduce output, the key and value are still separated by tabs; this affects only the display, not the programming or the evaluation.

Start your mission. I wish you success!

Code implementation

package educoder;

import java.io.IOException;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collectors;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * ItemClickRankDriver
 */
public class ItemClickRankDriver {

    public static class ThisMap extends Mapper<Object, Text, Text, IntWritable> {
        // Reusable IntWritable holding the constant 1
        private static final IntWritable one = new IntWritable(1);
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /*** Write the map content here****/
            /********** Begin **********/
            //1. Split each row of data
            String[] atts = value.toString().split(",");
            //2. Get commodity id
            String item = atts[1];
            //3. Get behavior attributes
            String behavior = atts[3];
            // 4. Emit only records whose behavior is "pv"
            if (behavior.equals("pv")) {
                context.write(new Text(item), one);
            }
            /********** End **********/
        }
    }
    public static class ThisReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        // Buffer that accumulates (commodity id, click count) pairs across reduce calls
        List<Object[]> list = new LinkedList<>();
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            /*** Write the reduce content here****/
            /********** Begin **********/
            // Count the total number of the same key, and write the key and sum to the list
            int sum = 0;
            for (IntWritable one : values) {
                sum += one.get();
            }
            list.add(new Object[] { key.toString(), Integer.valueOf(sum) });
            /********** End **********/
        }
        // cleanup() runs once, after all reduce() calls have finished
        @Override
        protected void cleanup(Reducer<Text, IntWritable, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            // Sort the buffered pairs by click count, ascending
            list = list.stream().sorted((o1, o2) -> (int) o1[1] - (int) o2[1]).collect(Collectors.toList());
            // Traverse from the back, so the output is in descending order
            for(int i=list.size()-1; i>=0; i--){
                Object[] o = list.get(i);
                //Write to reduce output
                context.write(new Text((String) o[0]), new IntWritable((int) o[1]));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "Ranking of product hits");

        job.setJarByClass(ItemClickRankDriver.class);
        job.setMapperClass(ThisMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setReducerClass(ThisReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Level 3: The most-clicked commodity in each category

Task description

In this task, you will write a MapReduce program that finds, for each commodity category, the commodity with the highest number of clicks, based on the user behavior data.

Relevant knowledge

The prerequisites and the data file are the same as in Level 1: user_behavior.csv, a CSV file of 9,948 lines with columns user id, commodity id, commodity category, and user behavior. See Level 1 for the full format description.

Programming requirements

According to the prompts, complete the code in the editor on the right to find the most-clicked commodity in each category.

  • The main method is already given; the Job and the I/O paths are configured and must not be changed;
  • The input and output keys and values of map and reduce are already given;
  • You only need to write the bodies of the map and reduce methods.
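The key step in this level's reduce is selecting the entry with the largest value from a count map. In plain Java that idiom is `Collections.max` over `entrySet()`; a minimal sketch with hypothetical per-id click counts:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class MaxEntrySketch {
    // Return the key whose value is largest -- the most-clicked
    // commodity id within one category.
    static String maxKey(Map<String, Integer> clicks) {
        return Collections.max(clicks.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        Map<String, Integer> clicks = new HashMap<>(); // hypothetical counts
        clicks.put("1008605", 12);
        clicks.put("1008608", 31);
        clicks.put("1008612", 9);
        System.out.println(maxKey(clicks)); // prints 1008608
    }
}
```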

Expected output format:

Commodity category,most-clicked commodity id
Commodity category,most-clicked commodity id
···

Test description

The platform will run the code you write; if your MapReduce output matches the expected output, the level passes.

Note: for display reasons, the tab characters in the MapReduce output are shown as commas on the web page. In the actual reduce output, the key and value are still separated by tabs; this affects only the display, not the programming or the evaluation.

Start your mission. I wish you success!

Code implementation

package educoder;

import java.io.IOException;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * ItemClickTopOneEachTypeDriver
 */
public class ItemClickTopOneEachTypeDriver {

    public static class ThisMap extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /*** Write the map content here****/
            /********** Begin **********/
            // Same parsing as in the previous levels
            String[] atts = value.toString().split(",");
            String item = atts[1];
            String type = atts[2];
            String behavior = atts[3];
            if (behavior.equals("pv")) {
                context.write(new Text(type), new Text(item));
            }
            /********** End **********/
        }
    }
    public static class ThisReduce extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
            /*** Write the reduce content here****/
            /********** Begin **********/
            // Tip: first count the occurrences of each commodity id, then find the largest count
            // 1. A map holds the number of occurrences of each commodity id
            Map<String, Integer> map = new HashMap<>();
            // 2. Count the quantity of each value in values
            for (Text value : values) {
                String item = value.toString();
                Integer count = !map.containsKey(item) ? 1 : map.get(item) + 1;
                map.put(item, count);
            }
            // 3. Find the key value pair with the largest value in the map
            Map.Entry<String, Integer> itemMax = Collections.max(map.entrySet(), (entry1, entry2) -> {
                return entry1.getValue() - entry2.getValue();
            });
            // 4. Write the result to reduce output
            context.write(key, new Text(itemMax.getKey()));
            /********** End **********/
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "The most clicked products in each product category");

        job.setJarByClass(ItemClickTopOneEachTypeDriver.class);
        job.setMapperClass(ThisMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setReducerClass(ThisReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Level 4: Proportion of each commodity category

Task description

In this task, you will write a MapReduce program that computes the proportion of each of the five commodity categories, based on the user behavior data.

Relevant knowledge

The prerequisites and the data file are the same as in Level 1: user_behavior.csv, a CSV file of 9,948 lines with columns user id, commodity id, commodity category, and user behavior. See Level 1 for the full format description.

Proportion of commodity categories

Count the number of records in each commodity category, then divide each category's count by the total count over all categories to obtain that category's proportion.
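As a worked example with hypothetical counts (the real totals come from user_behavior.csv): if mobile phone accounts for 4000 records out of 8000 in total, its proportion is 4000 / 8000 = 0.5. Note the cast to double, without which Java performs integer division:

```java
public class RatioSketch {
    // Proportion of one category's count within the total
    static double ratio(int count, int total) {
        return (double) count / total;   // cast avoids integer division (4000/8000 would be 0)
    }

    public static void main(String[] args) {
        int mobile = 4000, notebook = 3000, headset = 1000; // hypothetical counts
        int total = mobile + notebook + headset;            // 8000
        System.out.println("mobile phone\t" + ratio(mobile, total));   // 0.5
        System.out.println("notebook\t" + ratio(notebook, total));     // 0.375
        System.out.println("headset\t" + ratio(headset, total));       // 0.125
    }
}
```

The reducer below applies exactly this division in its cleanup() step, once all category counts are known.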

cleanup() method

As in Level 2, you may need the cleanup() method, which runs once after all reduce() calls have finished; see Level 2 for its description.

Programming requirements

According to the prompts, complete the code in the editor on the right to compute the proportion of each of the five commodity categories.

  • The main method is already given; the Job and the I/O paths are configured and must not be changed;
  • The input and output keys and values of map and reduce are already given;
  • You only need to write the bodies of the map and reduce methods.

Expected output format:

Commodity category,proportion of the total
Commodity category,proportion of the total
···

Test description

The platform will run the code you write; if your MapReduce output matches the expected output, the level passes.

Note: for display reasons, the tab characters in the MapReduce output are shown as commas on the web page. In the actual reduce output, the key and value are still separated by tabs; this affects only the display, not the programming or the evaluation.

Start your mission. I wish you success!

Code implementation

package educoder;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * ItemTypeRatioDriver
 */
public class ItemTypeRatioDriver {

    public static class ThisMap extends Mapper<Object, Text, Text, IntWritable> {
        private static IntWritable one = new IntWritable(1);
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /*** Write the map content here****/
            /********** Begin **********/
            String[] atts = value.toString().split(",");
            String type = atts[2];
            context.write(new Text(type), one);
            /********** End **********/
        }
    }
    public static class ThisReduce extends Reducer<Text, IntWritable, Text, DoubleWritable> {
        // Buffer of per-category counts, filled by reduce() and consumed in cleanup()
        Map<String,Integer> map = new HashMap<>();
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            /*** Write the reduce content here****/
            /********** Begin **********/
            int count = 0;
            for (IntWritable one : values) {
                count += one.get();
            }
            map.put(key.toString(), count);
            /********** End **********/
        }
        // cleanup() emits the final ratios once all counts have been collected
        @Override
        protected void cleanup(Reducer<Text, IntWritable, Text, DoubleWritable>.Context context)
                throws IOException, InterruptedException {
            // Get the sum of the quantities of all product categories
            int sum = 0;
            for (int v : map.values()) {
                sum += v;
            }
            // Get the proportion of each commodity category
            for (String key : map.keySet()) {
                int value = map.get(key);
                double ratio = ((double) value) / sum;
                context.write(new Text(key), new DoubleWritable(ratio));
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "Proportion of five commodity categories");

        job.setJarByClass(ItemTypeRatioDriver.class);
        job.setMapperClass(ThisMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setReducerClass(ThisReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Level 5: Number of purchases per commodity category

Task description

In this task, you will write a MapReduce program that counts the number of purchases (buy records) of each commodity category, based on the user behavior data.

Relevant knowledge

The prerequisites and the data file are the same as in Level 1: user_behavior.csv, a CSV file of 9,948 lines with columns user id, commodity id, commodity category, and user behavior. See Level 1 for the full format description.

Programming requirements

According to the prompts, complete the code in the editor on the right to count the number of purchases of each commodity category.

  • The main method is already given; the Job and the I/O paths are configured and must not be changed;
  • The input and output keys and values of map and reduce are already given;
  • You only need to write the bodies of the map and reduce methods.

Expected output format:

Commodity category,number of purchases
Commodity category,number of purchases
···

Test description

The platform will run the code you write; if your MapReduce output matches the expected output, the level passes.

Note: for display reasons, the tab characters in the MapReduce output are shown as commas on the web page. In the actual reduce output, the key and value are still separated by tabs; this affects only the display, not the programming or the evaluation.

Start your mission. I wish you success!

Code implementation

package educoder;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * ItemTypeBuyCountDriver
 */
public class ItemTypeBuyCountDriver {

    public static class ThisMap extends Mapper<Object, Text, Text, IntWritable> {
        private static IntWritable one = new IntWritable(1);
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /*** Write the map content here****/
            /********** Begin **********/
            String[] atts = value.toString().split(",");
            String type = atts[2];
            if (atts[3].equals("buy")) {
                context.write(new Text(type), one);
            }
            /********** End **********/
        }
    }
    public static class ThisReduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            /*** Write the reduce content here****/
            /********** Begin **********/
            int count = 0;
            for (IntWritable one : values) {
                count += one.get();
            }
            context.write(key, new IntWritable(count));
            /********** End **********/
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "Total number of purchases of various commodities");

        job.setJarByClass(ItemTypeBuyCountDriver.class);
        job.setMapperClass(ThisMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setReducerClass(ThisReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Level 6: Number of purchases of the most-clicked commodity in each category

Task description

In this task, you will write a MapReduce program that, for each of the five commodity categories, counts the number of purchases of that category's most-clicked commodity, based on the user behavior data.

Relevant knowledge

This training is an intermediate difficulty MapReduce programming exercise, which simulates the statistical analysis of e-commerce data in real scenes. Therefore, it is assumed that you have mastered the basic use of MapReduce.

If you don't know about MapReduce, you can first carry out the basic MapReduce training on this platform, and then continue this training.

Data file format description

This is the e-commerce data used in programming. It is in CSV format and the file name is user_behavior.csv, 9948 lines in size. Examples of the first few lines are as follows:

1002309,1008608,mobile phone,pv
1002573,1009007,headset,pv
1001541,1008614,mobile phone,pv
1001192,1008612,mobile phone,pv
1001016,1008909,tablet PC,buy
1001210,1008605,mobile phone,pv
1001826,1008704,notebook,pv
1002208,1008906,tablet PC,pv
1002308,1008702,notebook,pv
1002080,1008702,notebook,cart
1001525,1008702,notebook,cart
1002749,1008702,notebook,pv
1002134,1008704,notebook,cart
1002497,1008608,mobile phone,pv
···
---9948 lines in total---
  • Each row of data (4 columns) represents: user id, commodity id, commodity category and user behavior;
  • The commodity categories are mobile phone, tablet PC, notebook, smart watch and headset, a total of 5 categories;
  • Among the user behaviors, pv means clicking to browse, cart means adding to the shopping cart, fav means adding to favorites, and buy means buying.
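As a quick sanity check, a single record can be split into its four fields before writing any MapReduce code. This is a minimal sketch (the class name is illustrative; the sample line is taken from the excerpt above):

```java
public class RecordParseDemo {

    // Splits one CSV record from user_behavior.csv into its four fields
    public static String[] parse(String line) {
        return line.split(",");
    }

    public static void main(String[] args) {
        String[] atts = parse("1002309,1008608,mobile phone,pv");
        System.out.println("user id:  " + atts[0]);
        System.out.println("item id:  " + atts[1]);
        System.out.println("category: " + atts[2]);
        System.out.println("behavior: " + atts[3]);
    }
}
```

The same `split(",")` call is all the mapper needs, because none of the field values in this file contain embedded commas.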

Programming requirements

According to the prompt, supplement the code in the editor on the right to count, for each of the five commodity categories, how many times the most-clicked commodity was purchased.

  • The main method has already been given; the Job and I/O paths are configured in it and do not need to be changed;
  • The input and output keys and values of map and reduce have been given;
  • You only need to fill in the bodies of the map and reduce methods.

Expected output format:

Commodity type,Id of the most-clicked commodity in this type,Number of purchases
Commodity type,Id of the most-clicked commodity in this type,Number of purchases
···

Test description

The platform will test the code you write: if your MapReduce output matches the expected output, you pass.

Note: for display reasons, the tab characters in the MapReduce output are shown as commas on the web side. In the actual reduce results, keys and values are still separated by tabs; this is only a display change and does not affect programming or evaluation.

Start your mission. I wish you success!

Code implementation

package educoder;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Driver for Level 6: purchases of the most-clicked commodity per category.
 */
public class ItemMaxClickBuyCountDriver {

    public static class ThisMap extends Mapper<Object, Text, Text, Text> {
        @Override
        protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            /*** Write the map content here****/
            /********** Begin **********/
            String[] atts = value.toString().split(",");
            String type = atts[2];
            // Emit the whole record as the map value; the reducer needs the commodity id and behavior fields
            context.write(new Text(type), value);
            /********** End **********/
        }
    }
    public static class ThisReduce extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            /*** Write the reduce content here****/
            /********** Begin **********/
            Map<String, Integer> map = new HashMap<>();
            List<String> value_list = new ArrayList<>();
            // 1. The values iterable can only be traversed once, so copy it into a list first
            for (Text v : values) {
                value_list.add(v.toString());
            }
            // 2. Count the clicks (pv records) for each commodity id
            for (String v : value_list) {
                String[] atts = v.split(",");
                if (atts[3].equals("pv")) {
                    map.merge(atts[1], 1, Integer::sum);
                }
            }
            // 3. Find the commodity id with the most clicks
            String itemClickMax = Collections.max(map.entrySet(),
                    Map.Entry.comparingByValue()).getKey();
            // 4. Count the purchases (buy records) of that commodity
            int buyCount = 0;
            for (String v : value_list) {
                String[] atts = v.split(",");
                if (atts[1].equals(itemClickMax) && atts[3].equals("buy")) {
                    buyCount++;
                }
            }
            // 5. Emit: commodity category, most-clicked commodity id, number of purchases
            context.write(key, new Text(itemClickMax + "\t" + buyCount));
            /********** End **********/
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        Job job = Job.getInstance(conf, "The number of purchases of the goods with the highest number of hits in the five commodity categories");

        job.setJarByClass(ItemMaxClickBuyCountDriver.class);
        job.setMapperClass(ThisMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setReducerClass(ThisReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
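The reduce logic can also be exercised outside of Hadoop. The sketch below (class name and sample records are illustrative; the records reuse lines from the data excerpt) counts clicks per commodity id, picks the most-clicked one, and then counts its purchases, mirroring steps 2 to 4 of the reducer:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceLogicDemo {

    // Returns "category \t most-clicked commodity id \t purchase count"
    static String topItemBuyCount(String category, List<String> records) {
        // Count clicks (pv records) per commodity id
        Map<String, Integer> clicks = new HashMap<>();
        for (String r : records) {
            String[] atts = r.split(",");
            if (atts[3].equals("pv")) {
                clicks.merge(atts[1], 1, Integer::sum);
            }
        }
        // Commodity id with the most clicks
        String top = Collections.max(clicks.entrySet(),
                Map.Entry.comparingByValue()).getKey();
        // Purchases (buy records) of that commodity
        int buys = 0;
        for (String r : records) {
            String[] atts = r.split(",");
            if (atts[1].equals(top) && atts[3].equals("buy")) {
                buys++;
            }
        }
        return category + "\t" + top + "\t" + buys;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "1001826,1008704,notebook,pv",
                "1002308,1008702,notebook,pv",
                "1002749,1008702,notebook,pv",
                "1002080,1008702,notebook,buy");
        // 1008702 has 2 clicks (vs 1 for 1008704) and 1 purchase
        System.out.println(topItemBuyCount("notebook", sample));
    }
}
```

Testing the per-category logic on a small in-memory list like this is much faster than resubmitting a full job, and the same helper could be dropped into a unit test.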

Posted by kane007 on Sat, 04 Dec 2021 19:59:57 -0800