Counting the number of credit card defaulters at a bank
Default rules: PAY_1-PAY_6 record the repayment status month by month. PAY_1 is the repayment status in September 2005; PAY_2 is the repayment status in August 2005; ...; PAY_6 is the repayment status in April 2005. The numeric suffixes in BILL_AMT1-BILL_AMT6 and PAY_AMT1-PAY_AMT6 follow the same month order.
The values of PAY_1-PAY_6 mean: 0 = repaid on time; 1 = repayment delayed one month; 2 = repayment delayed two months; 3 = repayment delayed three months; ...; 9 = repayment delayed nine months or more.
The monthly payment PAY_AMT must not be lower than the minimum monthly repayment set by the bank; otherwise the account is in default. If PAY_AMT is greater than BILL_AMT, the surplus is kept on the credit card for future spending; if the amount paid is less than last month's bill but at least the minimum repayment, it counts as delayed repayment.
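The rule above can be sketched as a small helper. This is only an illustration: the `minPayment` parameter and the `classify` method are hypothetical, since the dataset itself does not carry the bank's minimum-repayment figure.

```java
public class RepaymentRule {
    // Classify one month's repayment under the rules described above.
    // Returns "default" if the payment is below the bank's minimum,
    // "delayed" if it covers the minimum but not the full bill,
    // and "on time" if it covers the full bill (any surplus stays
    // on the card for future spending).
    static String classify(double payAmt, double billAmt, double minPayment) {
        if (payAmt < minPayment) return "default";
        if (payAmt < billAmt) return "delayed";
        return "on time";
    }

    public static void main(String[] args) {
        System.out.println(classify(500, 3000, 1000));  // below minimum -> default
        System.out.println(classify(1500, 3000, 1000)); // above minimum, below bill -> delayed
        System.out.println(classify(3000, 3000, 1000)); // full bill -> on time
    }
}
```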
Requirement:
Write a MapReduce program on the Hadoop platform to count the number of defaulting users.
Implementation:
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>Hadoop</groupId>
    <artifactId>BankDefaulter_MapReduce</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.8.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.8.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.8.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.8.0</version>
        </dependency>
        <dependency>
            <groupId>au.com.bytecode</groupId>
            <artifactId>opencsv</artifactId>
            <version>2.4</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <archive>
                        <manifest>
                            <mainClass>bankfinddefaulter.FindDefaulter</mainClass>
                        </manifest>
                    </archive>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
FindDefaulter.java
package bankfinddefaulter;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FindDefaulter {

    public static void main(String[] args) throws Throwable {
        Job job = Job.getInstance();
        job.setJarByClass(FindDefaulter.class);

        // Input: the CSV file on HDFS
        FileInputFormat.addInputPath(job, new Path("hdfs://172.18.74.236:9000/input/UCI_Credit_Card.csv"));
        // Output directory (must not already exist)
        FileOutputFormat.setOutputPath(job, new Path("hdfs://172.18.74.236:9000/out"));

        // Attach the custom mapper and reducer
        job.setMapperClass(BankMapper.class);
        job.setReducerClass(BankReducer.class);

        // Map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
BankReducer.java
package bankfinddefaulter;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class BankReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // The mapper emits a 1 for every record, so counting the values
        // gives the number of records that share this key
        int count = 0;
        for (IntWritable value : values) {
            count++;
        }
        context.write(key, new IntWritable(count));
    }
}
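The reducer's counting logic can be checked outside Hadoop with plain Java collections. This standalone sketch (`ReduceSketch` is a made-up name, not part of the project) mirrors the loop in BankReducer:

```java
import java.util.Arrays;
import java.util.List;

public class ReduceSketch {
    // Mimic the reducer: count the values grouped under one key.
    // In the real job every value is a 1 emitted by the mapper, so
    // the count equals the number of records with that key.
    static int reduce(List<Integer> values) {
        int count = 0;
        for (int v : values) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Three records grouped under the same key
        System.out.println(reduce(Arrays.asList(1, 1, 1))); // prints 3
    }
}
```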
BankMapper.java
package bankfinddefaulter;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import au.com.bytecode.opencsv.CSVParser;

public class BankMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Skip the CSV header line (the record at byte offset 0)
        if (key.get() > 0) {
            String[] lines = new CSVParser().parseLine(value.toString());
            // Emit the 25th column (index 24, the default flag) as the key
            context.write(new Text(lines[24]), new IntWritable(1));
        }
    }
}
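The mapper's column selection can be demonstrated on a synthetic CSV line. This sketch uses `String.split` instead of opencsv's CSVParser so it stays self-contained (which is fine here because the sample line has no quoted fields); the row contents are made up, not taken from the real dataset:

```java
public class MapSketch {
    // Extract the 25th column (index 24), which holds the
    // default flag the mapper uses as its output key.
    static String extractLabel(String csvLine) {
        String[] fields = csvLine.split(",");
        return fields[24];
    }

    public static void main(String[] args) {
        // Build a synthetic row with 25 comma-separated fields;
        // the last field is the default flag.
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < 24; i++) {
            row.append(i).append(",");
        }
        row.append("1"); // default flag = 1
        System.out.println(extractLabel(row.toString())); // prints 1
    }
}
```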
Method 1
Write the business code in IDEA, package the program into a jar with mvn, then upload it to the Hadoop platform and run it.
In the Terminal console inside IDEA, enter
mvn clean package
This command relies on the build section of pom.xml
<build>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>bankfinddefaulter.FindDefaulter</mainClass>
                    </manifest>
                </archive>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
Running the jar on the Hadoop platform
After the project has been packaged into a jar, upload it to the Hadoop platform and enter
hadoop jar BankDefaulter_MapReduce-1.0-SNAPSHOT-jar-with-dependencies.jar
In the output, the line with key 1 gives the number of defaulting users: 6636 in total.
Method 2
Run the job locally in IDEA.
After setting up the Hadoop development environment on Windows, run FindDefaulter.java directly.
Then view and download the output from the /out directory through the HDFS web UI at 172.18.74.236:50070.