Implementing MapReduce Programs on Windows

Keywords: Hadoop, Apache Maven, Java

Goal: count the number of credit card defaulters at a bank.

CSV download address

Default rules: PAY_1-PAY_6: PAY_1 is the repayment status in September 2005; PAY_2 is the repayment status in August 2005; ...; PAY_6 is the repayment status in April 2005. The numeric suffixes in BILL_AMT1-BILL_AMT6 and PAY_AMT1-PAY_AMT6 follow the same month order.
The values of PAY_1-PAY_6 mean: 0 = repaid on time; 1 = repayment delayed one month; 2 = repayment delayed two months; 3 = repayment delayed three months; ...; 9 = repayment delayed nine months or more.
The monthly payment PAY_AMT must not be lower than the minimum monthly repayment set by the bank; otherwise it counts as a default. If PAY_AMT is greater than BILL_AMT, the surplus stays on the credit card for future spending; if the amount paid is less than last month's bill but above the minimum repayment, it counts as a delayed repayment.
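For orientation, each row of the UCI credit card dataset has 25 columns: ID, LIMIT_BAL, SEX, EDUCATION, MARRIAGE, AGE, the six PAY_* fields, BILL_AMT1-BILL_AMT6, PAY_AMT1-PAY_AMT6, and finally default.payment.next.month, where 1 marks a defaulter and 0 a non-defaulter. That last column (index 24 when counting from 0) is the field the mapper below extracts and counts.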

Requirement:

Write a program on the Hadoop platform that counts the number of defaulting bank users.

Implementation:

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>Hadoop</groupId>
    <artifactId>BankDefaulter_MapReduce</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.8.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.8.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.8.0</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.8.0</version>
        </dependency>

        <dependency>
            <groupId>au.com.bytecode</groupId>
            <artifactId>opencsv</artifactId>
            <version>2.4</version>
        </dependency>



    </dependencies>
    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>bankfinddefaulter.FindDefaulter</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

FindDefaulter.java

package bankfinddefaulter;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FindDefaulter {

    public static void main(String[] args) throws Exception {
        // Job.getInstance replaces the constructor new Job(), which is deprecated
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(FindDefaulter.class);

        // Location of the CSV file on HDFS
        FileInputFormat.addInputPath(job, new Path("hdfs://172.18.74.236:9000/input/UCI_Credit_Card.csv"));
        // Output directory on HDFS (must not already exist when the job starts)
        FileOutputFormat.setOutputPath(job, new Path("hdfs://172.18.74.236:9000/out"));

        // Associate the custom Mapper and Reducer classes
        job.setMapperClass(BankMapper.class);
        job.setReducerClass(BankReducer.class);

        // Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Set the final (reduce) output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
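If the HDFS addresses change often, a variant that reads the paths from the command line avoids recompiling. This is a sketch, not part of the original post; the class name FindDefaulterArgs is made up for illustration:

package bankfinddefaulter;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FindDefaulterArgs {

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: FindDefaulterArgs <input path> <output path>");
            System.exit(2);
        }

        Job job = Job.getInstance(new Configuration(), "bank defaulter count");
        job.setJarByClass(FindDefaulterArgs.class);

        // Paths come from the command line, e.g.
        // hadoop jar ... /input/UCI_Credit_Card.csv /out
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(BankMapper.class);
        job.setReducerClass(BankReducer.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}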

BankReducer.java

package bankfinddefaulter;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;


public class BankReducer extends
        Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values,
                          Context context) throws IOException, InterruptedException {

        int count = 0;

        // Each value is a 1 emitted by the mapper; summing them gives the number
        // of records per key (and, unlike counting iterations, stays correct if
        // the class is also used as a combiner)
        for (IntWritable value : values) {
            count += value.get();
        }

        context.write(key, new IntWritable(count));
    }

}
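Because the reducer simply sums its input values and its input and output types match, the same class can also serve as a combiner to reduce shuffle traffic. An optional one-line addition to main(), not in the original post:

job.setCombinerClass(BankReducer.class);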

BankMapper.java

package bankfinddefaulter;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import au.com.bytecode.opencsv.CSVParser;

public class BankMapper extends
        Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // key is the byte offset of the current line; offset 0 is the CSV
        // header row, so skip it
        if (key.get() > 0) {

            String[] lines = new CSVParser().parseLine(value.toString());

            // Column 25 of the CSV (index 24) is default.payment.next.month:
            // 1 = defaulted, 0 = did not default
            context.write(new Text(lines[24]), new IntWritable(1));
        }
    }
}
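If the input might contain blank or truncated lines, a slightly more defensive map body keeps one malformed row from failing the task with an ArrayIndexOutOfBoundsException. A sketch, not from the original post:

        String line = value.toString().trim();
        // Skip the header (offset 0) and any blank lines
        if (key.get() > 0 && !line.isEmpty()) {
            String[] fields = new CSVParser().parseLine(line);
            // Emit only when the default-flag column actually exists
            if (fields.length >= 25) {
                context.write(new Text(fields[24]), new IntWritable(1));
            }
        }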

Method 1

Write the business code in IDEA, package the program into a jar with mvn, upload it to the Hadoop platform, and run it there.

In IDEA's Terminal console, enter

mvn clean package

This command relies on the following section of pom.xml:

<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>bankfinddefaulter.FindDefaulter</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
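Because the assembly plugin appends the descriptor id to the artifact name (artifactId-version-jar-with-dependencies.jar), this build produces target/BankDefaulter_MapReduce-1.0-SNAPSHOT-jar-with-dependencies.jar.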

Running the jar package on the Hadoop platform

After packaging the project into a jar, upload it to the Hadoop platform and enter

hadoop jar BankDefaulter_MapReduce-1.0-SNAPSHOT-jar-with-dependencies.jar
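When the job finishes, the per-key counts can be read directly from HDFS; part-r-00000 is the default name of the first reducer's output file. The 0 line below is inferred from the dataset's 30,000 records and is shown for illustration:

hdfs dfs -cat /out/part-r-00000
0	23364
1	6636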

In this output, the line with key 1 gives the number of defaulting users: 6636 in total.

Method 2

Running locally in IDEA

After the Hadoop development environment has been set up on Windows, run FindDefaulter.java directly in the IDE.

View and download the output from the /out directory through the NameNode web UI at 172.18.74.236:50070.
