Find the same letter puzzle

Keywords: PHP Hadoop

Dataset import HDFS

Command line access to the dataset just uploaded to HDFS

[hadoop@master hadoop-2.6.0]$ bin/hdfs dfs -ls /anagram/

MapReduce program compilation and operation:

Step 1: in the Map stage, sort each word alphabetically to generate sortedWord, and then output the key/value key value pair (sortedWord,word).

//Writing Map process
	public static class Anagramsmapper extends Mapper<LongWritable, Text, Text, Text> {
		
		public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			String text = value.toString(); //Converts the alphabet value of the entered Text type to the String type
			
			char[] textCharArray = text.toCharArray(); //Convert alphabet of String type to character array
			
			Arrays.sort(textCharArray); //Sort character arrays
			
			String sortedText = new String(textCharArray); //Convert the sorted character array to String string String
			
			context.write(new Text(sortedText), value);  //Write context, output key (sorted alphabet) and output value (original alphabet)
			
		}
		
	}

Step 2: in the Reduce stage, count all anagrams made up of the same letters in each group.

//Write Reduce process
	public static class Anagramsreducer extends Reducer<Text, Text, Text, Text> {
		public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
			StringBuffer res = new StringBuffer(); //Create an empty StringBuffer instance res
			
			int count = 0;  //Counter initial value is 0
			
			//Start traversing values
			for (Text text : values) {
				
				//If there is a value in the res array, add a "," sign as the separator before adding a new value
				if(res.length() > 0) {
					res.append(",");
				}
				
				//Add values to res
				res.append(text);
				
				//count
				count++;
			}
			
			//Only words with two or more identical letters are displayed
			if(count > 1) {
				context.write(key, new Text(res.toString()));
			}
			
			
		}
		
	}

Step 3: unit test and debug the code.

public class AnagramsMapperTest {
	private Mapper mapper;
	private MapDriver driver;
	
	@Before
	public void init() {
		mapper = new Anagrams.Anagramsmapper();
		driver = new MapDriver(mapper);
	}
	
	@Test
	public void test() throws IOException {
		String line = "gfedcba";  //Customize this letter to verify that the output will be sorted correctly
		driver.withInput(new LongWritable(), new Text(line))
		      .withOutput(new Text("abcdefg"),new Text("gfedcba"))  //Verify that the output Key is alphabetized and the output Value is unchanged
		      .runTest();
		
	}

}

  

public class AnagramsReduceTest {
	private Reducer reducer;
	private ReduceDriver driver;
	@Before
	public void init() {
		reducer = new Anagrams.Anagramsreducer();
		driver = new ReduceDriver(reducer);
	}
	
	@Test
	public void test() throws IOException {
		Text key = new Text("abcdefg"); //Create a new Key, the output is fixed
		List values = new ArrayList();  //Write 4 sets of letter Value values in the new array list to verify whether they are finally output in the predetermined format
		values.add(new Text("gfedcba"));
		values.add(new Text("decgfba"));
		values.add(new Text("fedgcba"));
		values.add(new Text("gcbfeda"));
		
		driver.withInput(key, values)
		      .withOutput(key, new Text("gfedcba,decgfba,fedgcba,gcbfeda"))  //Verify output in this format
		      .runTest();
		
	}
	

}

Step 4: compile and package the project as Anagram.jar, and use the client to upload Anagram.jar to the / home/hadoop/Temp directory of hadoop.

Step 5: use cd /home/hadoop/Temp to switch to the current directory, and execute the task through Hadoop jar anagram.jar com.hadoop.base.anagram/anagram/anagram.txt/anagram/out command line.

Step 6: output the final result of the task to HDFS, and use the hadoop fs -cat /anagram/out/part-r-00000 command to view the result.

Posted by AbydosGater on Mon, 04 Nov 2019 09:53:13 -0800