Requirement: convert the IP addresses in a log file to host names.
The format of the log file is as follows:
10.100.122.132 - [17/Jun/2013:22:53:58] "GET /bgs/greenbg.gif HTTP 1.1" 200 50
10.100.122.133 - [17/Jun/2013:22:53:58] "GET /bgs/redbg.gif HTTP 1.1" 200 50
Result after conversion:
PC-20161220MYVT - [17/Jun/2013:22:53:58] "GET /bgs/greenbg.gif HTTP 1.1" 200 50
sog - [17/Jun/2013:22:53:58] "GET /bgs/redbg.gif HTTP 1.1" 200 50
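Each entry starts with the address, followed by a space and the rest of the line, so the conversion comes down to cutting the line at the first space and putting the host name in front of the remainder. The following is only a minimal sketch of that step; the class LogLineSplitter and its methods are hypothetical names made up for illustration, and the host name is passed in rather than looked up.

```java
// Hypothetical helper for illustration; not part of the programs in this article.
final class LogLineSplitter {
    private LogLineSplitter() {
    }

    /**
     * Splits an entry at its first space into {address, rest}.
     * The rest keeps its leading space. A line without any space is
     * returned as {line, ""}.
     */
    static String[] split(String entry) {
        int index = entry.indexOf(' ');
        if (index < 0) {
            return new String[] {entry, ""};
        }
        return new String[] {entry.substring(0, index), entry.substring(index)};
    }

    /** Rebuilds the entry with the address replaced by the given host name. */
    static String replaceAddress(String entry, String hostname) {
        // The rest already begins with a space, so no separator is added here.
        return hostname + split(entry)[1];
    }

    public static void main(String[] args) {
        String line = "10.100.122.132 - [17/Jun/2013:22:53:58] "
                + "\"GET /bgs/greenbg.gif HTTP 1.1\" 200 50";
        System.out.println(split(line)[0]);
        System.out.println(replaceAddress(line, "PC-20161220MYVT"));
    }
}
```

Because the remainder keeps its leading space, adding another space between the host name and the remainder would produce two spaces in the output.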
Solution 1: Sequential Processing
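import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class MainThread {
    public static void main(String[] args) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream("a.txt"), StandardCharsets.UTF_8));
             BufferedWriter bw = new BufferedWriter(new FileWriter("a_01.txt", true))) {
            for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
                int index = entry.indexOf(' ');
                if (index < 0) { // no address to separate: copy the line unchanged
                    bw.append(entry);
                    bw.newLine();
                    continue;
                }
                String address = entry.substring(0, index);
                // theRest keeps its leading space, so no extra separator is needed
                String theRest = entry.substring(index);
                String hostname = InetAddress.getByName(address).getHostName();
                bw.append(hostname + theRest);
                bw.newLine();
            }
        } catch (IOException e) {
            // Also covers FileNotFoundException and UnknownHostException
            e.printStackTrace();
        }
    }
}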
Problem: the program spends most of its time waiting for DNS responses and does nothing else while it waits.
Solution 2: Using Thread Pool
The main thread reads the log file and hands each entry (each line) to a thread pool for processing.
Because DNS resolution is slow, other threads can run while one is blocked on a lookup (if DNS resolution were not slow, there would be no need for multithreading). Note that the main thread still reads and submits the entries sequentially, and the futures are stored and read back in reading order, so the output keeps the order of the input.
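The ordering guarantee can be checked without touching DNS. The sketch below is an illustration under stated assumptions, not the article's program: the class OrderedFuturesDemo is a hypothetical name, and Thread.sleep stands in for the blocking lookup. Earlier inputs are deliberately made slower, so the tasks finish in reverse order, yet the results come back in submission order because the futures are read in the order they were stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; the sleep simulates a slow DNS lookup.
final class OrderedFuturesDemo {
    private OrderedFuturesDemo() {
    }

    /** Submits one simulated lookup per input and returns the results in input order. */
    static List<String> resolveAll(List<String> inputs, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < inputs.size(); i++) {
                final String input = inputs.get(i);
                // Earlier inputs sleep longer, so they tend to finish later.
                final long delayMillis = 20L * (inputs.size() - i);
                Callable<String> task = () -> {
                    Thread.sleep(delayMillis);
                    return "host-of-" + input;
                };
                futures.add(executor.submit(task));
            }
            List<String> results = new ArrayList<>();
            // Reading the futures in list order restores the input order,
            // whatever order the tasks actually completed in.
            for (Future<String> future : futures) {
                try {
                    results.add(future.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException(e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException(e.getCause());
                }
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        // prints [host-of-10.0.0.1, host-of-10.0.0.2, host-of-10.0.0.3]
        System.out.println(resolveAll(Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3"), 3));
    }
}
```

The price of this simplicity is that every future has to be kept until it is read, so memory use grows with the number of submitted lines.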
2.1 DNSResolverTask
import java.net.InetAddress;
import java.util.concurrent.Callable;

public class DNSResolverTask implements Callable<String> {
    private final String line;

    public DNSResolverTask(String line) {
        this.line = line;
    }

    @Override
    public String call() {
        try {
            // Separate out the IP address
            int index = line.indexOf(' ');
            String address = line.substring(0, index);
            // theRest keeps its leading space, so no extra separator is needed
            String theRest = line.substring(index);
            // Many visitors request several pages per visit, so the same address
            // shows up on many lines. DNS lookups are expensive, and repeating one
            // for every occurrence in the log would be wasteful.
            // The InetAddress class caches the addresses it has looked up. If the
            // same address is requested again, it comes from the cache much faster
            // than from DNS.
            String hostname = InetAddress.getByName(address).getHostName();
            return hostname + theRest;
        } catch (Exception ex) {
            // On any failure, fall back to the unmodified line
            return line;
        }
    }
}
2.2 MainThread
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MainThread {
    private static final int NUM_THREADS = 4;

    public static void main(String[] args) throws IOException {
        ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
        Queue<LogEntry> results = new LinkedList<>();
        try {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(new FileInputStream("a.txt"), StandardCharsets.UTF_8))) {
                // The main thread reads entries much faster than the workers can resolve them.
                // A DNSResolverTask is created for each line; the loop keeps the reading order.
                for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
                    DNSResolverTask task = new DNSResolverTask(entry);
                    // If DNS resolution did not block, multithreading would not be needed.
                    // Because it is slow, other threads run while one is blocked, so submitting is fast.
                    Future<String> future = executor.submit(task);
                    // Idea 1: write directly to the file here? The write itself should not be too slow.
                    results.add(new LogEntry(entry, future));
                }
            }
            // try-with-resources closes the writer, which also flushes it
            try (BufferedWriter bw = new BufferedWriter(new FileWriter("a_02.txt", true))) {
                for (LogEntry result : results) {
                    try {
                        bw.append(result.future.get());
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        bw.append(result.original);
                    } catch (ExecutionException e) {
                        bw.append(result.original);
                    }
                    bw.newLine();
                }
            }
        } finally {
            executor.shutdown();
        }
    }

    private static class LogEntry {
        // The original log line
        final String original;
        final Future<String> future;

        LogEntry(String original, Future<String> future) {
            this.original = original;
            this.future = future;
        }
    }
}
Problem: log files can be huge, and the LinkedList holds one entry for every line until the whole file has been read, so the program can take up a lot of memory.
Solution 3: Using producer-consumer queues
To avoid this, move the output into a separate thread that shares a queue with the input thread. Because earlier entries are written out while later ones are still being read and resolved, the queue does not have to hold the whole file at once. This raises another problem, though: an empty queue does not prove that the work is finished, since more input may still be on its way, so a separate signal is needed to indicate that the output is complete. The simplest approach is to count the input lines and stop once the same number of lines has been written.
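Counting lines is not the only possible completion signal. As a point of comparison, here is a minimal sketch of a different technique, which the code in this article does not use: the producer puts a special end marker (a sentinel) into the queue after the last line, and the consumer stops when it takes that marker. The sketch also assumes a bounded ArrayBlockingQueue, so that put() blocks the producer when the consumer falls behind. The class SentinelQueueDemo and the END marker are hypothetical names, and plain strings stand in for the queued futures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical demo class; strings stand in for the queued log entries.
final class SentinelQueueDemo {
    // End-of-input marker. A distinct String object compared by identity,
    // so no real line can ever be mistaken for it.
    private static final String END = new String("END");

    private SentinelQueueDemo() {
    }

    /** Passes all lines through a bounded queue and returns what the consumer received. */
    static List<String> pipe(List<String> lines, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        List<String> written = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            try {
                // take() blocks while the queue is empty; the loop ends at the sentinel.
                for (String item = queue.take(); item != END; item = queue.take()) {
                    written.add(item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (String line : lines) {
                queue.put(line); // blocks while the queue is full
            }
            queue.put(END); // signal that no more input will follow
            consumer.join(); // after join, the consumer's writes are visible here
        } catch (InterruptedException e) {
            consumer.interrupt();
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 1; i <= 5; i++) {
            lines.add("line " + i);
        }
        System.out.println(pipe(lines, 2));
    }
}
```

Compared with counting lines, the sentinel avoids reading the input file twice, at the cost of reserving one value that must never occur as real data.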
3.1 DNSResolveTask
import java.net.InetAddress;
import java.util.concurrent.Callable;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DNSResolveTask implements Callable<String> {
    private static final Logger logger = LoggerFactory.getLogger(DNSResolveTask.class);
    private final String line;

    public DNSResolveTask(String line) {
        this.line = line;
    }

    @Override
    public String call() {
        try {
            // Separate out the IP address
            int index = line.indexOf(' ');
            String address = line.substring(0, index);
            // theRest keeps its leading space, so no extra separator is needed
            String theRest = line.substring(index);
            // Many visitors request several pages per visit, so the same address
            // shows up on many lines. DNS lookups are expensive, and repeating one
            // for every occurrence in the log would be wasteful.
            // The InetAddress class caches the addresses it has looked up. If the
            // same address is requested again, it comes from the cache much faster
            // than from DNS.
            String hostname = InetAddress.getByName(address).getHostName();
            // logger.info("return a line to queue");
            return hostname + theRest;
        } catch (Exception ex) {
            // On any failure, fall back to the unmodified line
            return line;
        }
    }
}
3.2 WriterTask
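import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class WriterTask implements Runnable {
    private static final Logger logger = LoggerFactory.getLogger(WriterTask.class);
    // Number of lines still to be written
    private int lineCount;
    private final LinkedBlockingQueue<MainThread.LogEntry> queue;

    public WriterTask(LinkedBlockingQueue<MainThread.LogEntry> queue, int lineCount) {
        this.queue = queue;
        this.lineCount = lineCount;
    }

    @Override
    public void run() {
        // try-with-resources closes (and flushes) the writer when the loop ends
        try (BufferedWriter bw = new BufferedWriter(new FileWriter("a_03.txt", true))) {
            while (lineCount > 0) {
                // take() blocks until an entry is available instead of busy-waiting
                MainThread.LogEntry entry = queue.take();
                logger.info("write a line");
                try {
                    bw.append(entry.future.get());
                } catch (ExecutionException e) {
                    bw.append(entry.original);
                }
                bw.newLine();
                lineCount--;
            }
        } catch (InterruptedException e) {
            // Stop writing when the thread is interrupted
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}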
3.3 MainThread
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MainThread {
    private static final Logger logger = LoggerFactory.getLogger(MainThread.class);
    private static final int NUM_THREADS = 4;

    public static void main(String[] args) throws IOException {
        final String fileName = "a.txt";
        // Count the lines up front so the writer knows when it is done
        int lineCount = getLineCount(fileName);
        ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
        LinkedBlockingQueue<LogEntry> results = new LinkedBlockingQueue<>();
        // The writer occupies one pool thread; the remaining threads do the lookups
        executor.execute(new WriterTask(results, lineCount));
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
                DNSResolveTask task = new DNSResolveTask(entry);
                // If DNS resolution did not block, multithreading would not be needed.
                // Because it is slow, other threads run while one is blocked, so submitting is fast.
                Future<String> future = executor.submit(task);
                // Idea 1: write directly to the file here? The write itself should not be too slow.
                // Idea 2: put the entry in a queue, with this thread acting as the producer
                logger.info("add a line to queue");
                results.add(new LogEntry(entry, future));
            }
        } finally {
            // Tasks that were already submitted, including the writer, still run
            executor.shutdown();
        }
    }

    private static int getLineCount(String fileName) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            int lineCount = 0;
            while (in.readLine() != null) {
                lineCount++;
            }
            return lineCount;
        }
    }

    static class LogEntry {
        // The original log line
        final String original;
        final Future<String> future;

        LogEntry(String original, Future<String> future) {
            this.original = original;
            this.future = future;
        }
    }
}