Converting IP Addresses in a Web Server Log File to Host Names

Keywords: DNS

Requirement: Convert the IP address at the start of each log line to a host name.

The format of the log file is as follows:

10.100.122.132 - [17/Jun/2013:22:53:58] "GET /bgs/greenbg.gif HTTP 1.1" 200 50

10.100.122.133 - [17/Jun/2013:22:53:58] "GET /bgs/redbg.gif HTTP 1.1" 200 50

Result after conversion:

PC-20161220MYVT  - [17/Jun/2013:22:53:58] "GET /bgs/greenbg.gif HTTP 1.1" 200 50

sog  - [17/Jun/2013:22:53:58] "GET /bgs/redbg.gif HTTP 1.1" 200 50

 

Solution 1: Sequential Processing

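import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetAddress;

public class MainThread {
    public static void main(String[] args) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream("a.txt"), "UTF-8"));
             BufferedWriter bw = new BufferedWriter(new FileWriter("a_01.txt", true))) {
            for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
                // The IP address is everything before the first space
                int index = entry.indexOf(' ');
                if (index < 0) {
                    // No space (for example, a blank line): copy the line unchanged
                    bw.append(entry);
                    bw.newLine();
                    continue;
                }
                String address = entry.substring(0, index);
                // theRest keeps its leading space
                String theRest = entry.substring(index);
                // Blocks until the DNS lookup returns
                String hostname = InetAddress.getByName(address).getHostName();
                bw.append(hostname + " " + theRest);
                bw.newLine();
            }
        } catch (IOException e) {
            // Also covers UnsupportedEncodingException and FileNotFoundException
            e.printStackTrace();
        }
    }
}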

Problem: Each DNS lookup blocks, so the program spends most of its time waiting for DNS responses and does no other work in the meantime.

Solution 2: Using Thread Pool

The main thread reads the log file and hands each log entry (each line) to a thread pool for processing.

Because DNS lookups are slow and blocking, other threads can make progress while one thread waits (if the lookups were fast, there would be no need for multithreading). Note that the main thread still reads the file sequentially and stores the futures in the order the lines were read, so the output keeps the order of the input.
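The ordering guarantee can be checked in isolation. The following is a minimal sketch, not the code used in this post: `OrderedFuturesDemo` and `process` are hypothetical names, and `Thread.sleep` stands in for a slow DNS lookup. Earlier tasks are made to finish last, yet the results still come back in submission order because the futures are read in that order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedFuturesDemo {

    // Submits one task per input and collects the results in submission order.
    static List<String> process(List<String> inputs) {
        ExecutorService executor = Executors.newFixedThreadPool(4);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < inputs.size(); i++) {
                final String input = inputs.get(i);
                // Earlier tasks sleep longer, so they complete after later ones.
                final long delayMillis = (inputs.size() - i) * 50L;
                futures.add(executor.submit(() -> {
                    Thread.sleep(delayMillis); // stands in for a slow DNS lookup
                    return input.toUpperCase();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                // get() blocks until this particular task has finished
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("a", "b", "c"))); // prints [A, B, C]
    }
}
```

The completion order of the tasks does not matter here; only the order in which the futures are stored and read does.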

2.1 DNSResolverTask

import java.net.InetAddress;
import java.util.concurrent.Callable;

public class DNSResolverTask implements Callable<String> {

    private final String line;

    public DNSResolverTask(String line) {
        this.line = line;
    }

    @Override
    public String call() {
        try {
            // Separate the IP address from the rest of the line
            int index = line.indexOf(' ');
            String address = line.substring(0, index);
            // theRest keeps its leading space
            String theRest = line.substring(index);
            // Many visitors request several pages per visit, so the same address
            // appears on many lines. DNS lookups are expensive, and repeating one
            // for every occurrence of an address would be wasteful.
            // The InetAddress class caches the addresses it has resolved, so a
            // repeated request can be answered much faster than a fresh DNS query.
            String hostname = InetAddress.getByName(address).getHostName();
            return hostname + " " + theRest;
        } catch (Exception ex) {
            // On any failure (malformed line, lookup error), keep the original line
            return line;
        }
    }
}
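If you would rather not depend on the internal caching mentioned in the comments above, the cache can be made explicit. The sketch below is an assumption-laden illustration rather than part of the original solution: `CachingResolver` is a hypothetical class, and the lookup is passed in as a function so the example runs without a network. In real use the function might wrap `InetAddress.getByName(address).getHostName()`.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class CachingResolver {

    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> lookup;

    // lookup maps an IP address string to a host name (hypothetical hook)
    CachingResolver(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    // Runs the lookup at most once per address; later calls hit the map.
    // Note: computeIfAbsent may make other threads that update the same
    // map bin wait while a slow lookup is in progress.
    String resolve(String address) {
        return cache.computeIfAbsent(address, lookup);
    }

    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        // Fake lookup that only counts how often it is called
        CachingResolver resolver = new CachingResolver(address -> {
            lookups.incrementAndGet();
            return "host-" + address;
        });
        resolver.resolve("10.100.122.132");
        resolver.resolve("10.100.122.132"); // served from the cache
        resolver.resolve("10.100.122.133");
        System.out.println(lookups.get()); // prints 2
    }
}
```

A single `CachingResolver` instance would have to be shared by all tasks for the cache to be useful, which is why it is built on a thread-safe map.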

2.2 MainThread

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MainThread {

    private final static int NUM_THREADS = 4;

    public static void main(String[] args) throws IOException {
        ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
        Queue<LogEntry> results = new LinkedList<LogEntry>();

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream("a.txt"), "UTF-8"))) {
            // The main thread reads log entries much faster than the worker
            // threads can resolve them.

            // Read the file and create a DNSResolverTask for each line.
            // The for loop submits the lines in file order.
            for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
                DNSResolverTask task = new DNSResolverTask(entry);
                // If the DNS lookup did not block, there would be no need for multithreading.
                // Because it is slow, other threads can run while one is blocked;
                // submit() itself returns quickly.
                Future<String> future = executor.submit(task);
                LogEntry result = new LogEntry(entry, future);
                // Idea 1: write each result directly to the output file?
                // The write itself should not be too slow.

                results.add(result);
            }
        }

        // try-with-resources flushes and closes the writer, even if a write fails
        try (BufferedWriter bw = new BufferedWriter(new FileWriter("a_02.txt", true))) {
            for (LogEntry result : results) {
                try {
                    // Blocks until the lookup for this line has finished
                    bw.append(result.future.get());
                } catch (InterruptedException | ExecutionException e) {
                    // Fall back to the unmodified log line
                    bw.append(result.original);
                }
                bw.newLine();
            }
        } finally {
            executor.shutdown();
        }
    }

    private static class LogEntry {
        // The original log line, used as a fallback if the lookup fails
        String original;
        Future<String> future;

        LogEntry(String original, Future<String> future) {
            this.original = original;
            this.future = future;
        }
    }
}

Problem: Log files can be huge, and the LinkedList holds an entry for every line until it is written, so the program can take up a lot of memory.

Solution 3: Using producer-consumer queues

To avoid this, you can put the output in a separate thread that shares a queue with the input thread. Because earlier log entries are written out while later ones are still being read and resolved, the queue should not grow too large. But that brings another problem: an empty queue is not enough to prove that the work is finished, since more entries may still arrive, so you need a separate signal indicating that the output has been completed. The easiest way is to count the input lines and stop once the same number of lines has been written.
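As an alternative to counting lines, the completion signal can travel through the queue itself. The sketch below swaps in that technique (an end-of-input sentinel, sometimes called a poison pill) together with a bounded queue; it is not the approach used in the code that follows. `SentinelQueueDemo`, `copyThroughQueue` and `END` are hypothetical names, and a list stands in for the output file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class SentinelQueueDemo {

    // End-of-input marker, compared by identity so it cannot collide with a real line
    private static final String END = new String("END");

    static List<String> copyThroughQueue(List<String> lines, int capacity) {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        List<String> written = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks while the queue is empty, so there is no busy-waiting
                for (String line = queue.take(); line != END; line = queue.take()) {
                    written.add(line); // stands in for writing to the output file
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (String line : lines) {
                // put() blocks while the queue is full, which bounds memory use
                queue.put(line);
            }
            queue.put(END); // tell the consumer that no more lines will arrive
            consumer.join(); // after join(), reading 'written' here is safe
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return written;
    }

    public static void main(String[] args) {
        System.out.println(copyThroughQueue(Arrays.asList("a", "b", "c"), 2)); // prints [a, b, c]
    }
}
```

With a sentinel the input file only has to be read once, and the bounded queue makes the producer wait whenever the consumer falls behind.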

3.1 DNSResolveTask

import java.net.InetAddress;
import java.util.concurrent.Callable;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class DNSResolveTask implements Callable<String> {

    private static final Logger logger = LoggerFactory.getLogger(DNSResolveTask.class);

    private final String line;

    public DNSResolveTask(String line) {
        this.line = line;
    }

    @Override
    public String call() {
        try {
            // Separate the IP address from the rest of the line
            int index = line.indexOf(' ');
            String address = line.substring(0, index);
            // theRest keeps its leading space
            String theRest = line.substring(index);
            // Many visitors request several pages per visit, so the same address
            // appears on many lines. DNS lookups are expensive, and repeating one
            // for every occurrence of an address would be wasteful.
            // The InetAddress class caches the addresses it has resolved, so a
            // repeated request can be answered much faster than a fresh DNS query.
            String hostname = InetAddress.getByName(address).getHostName();
            //logger.info("return a line to queue");
            return hostname + " " + theRest;
        } catch (Exception ex) {
            // On any failure (malformed line, lookup error), keep the original line
            return line;
        }
    }
}

3.2 WriterTask

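import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class WriterTask implements Runnable {

    private static final Logger logger = LoggerFactory.getLogger(WriterTask.class);

    private int lineCount;
    private final LinkedBlockingQueue<MainThread.LogEntry> queue;

    public WriterTask(LinkedBlockingQueue<MainThread.LogEntry> queue, int lineCount) {
        this.queue = queue;
        this.lineCount = lineCount;
    }

    @Override
    public void run() {
        // try-with-resources closes the writer once all lines are written
        try (BufferedWriter bw = new BufferedWriter(new FileWriter("a_03.txt", true))) {
            while (lineCount > 0) {
                // take() blocks until an entry is available, which avoids
                // spinning on isEmpty()
                MainThread.LogEntry entry = queue.take();
                try {
                    logger.info("write a line");
                    bw.append(entry.future.get());
                } catch (ExecutionException e) {
                    // Lookup failed: fall back to the original line
                    bw.append(entry.original);
                }
                bw.newLine();
                bw.flush();
                lineCount--;
            }
        } catch (InterruptedException e) {
            // Interrupted while waiting: restore the flag and stop writing
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}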

3.3 MainThread

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MainThread {

    static Logger logger = LoggerFactory.getLogger(MainThread.class);

    private final static int NUM_THREADS = 4;

    public static void main(String[] args) throws IOException {

        final String fileName = "a.txt";

        // Count the lines first so the writer knows when it has written everything
        int lineCount = getLineCount(fileName);

        ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
        LinkedBlockingQueue<LogEntry> results = new LinkedBlockingQueue<>();

        // The writer occupies one of the pool's threads for the whole run
        executor.execute(new WriterTask(results, lineCount));

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), "UTF-8"))) {
            for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
                DNSResolveTask task = new DNSResolveTask(entry);
                // If the DNS lookup did not block, there would be no need for multithreading.
                // Because it is slow, other threads can run while one is blocked;
                // submit() itself returns quickly.
                Future<String> future = executor.submit(task);
                LogEntry result = new LogEntry(entry, future);
                // Idea 1: write each result directly to the output file?
                // The write itself should not be too slow.

                // Idea 2: put it in a queue, with this thread acting as the producer
                logger.info("add a line to queue");
                results.add(result);
            }
        } finally {
            // Stop accepting new tasks; tasks already submitted still run to completion
            executor.shutdown();
        }
    }

    private static int getLineCount(String fileName) throws IOException {
        // try-with-resources closes the reader once the count is done
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), "UTF-8"))) {
            int lineCount = 0;
            while (in.readLine() != null) {
                lineCount++;
            }
            return lineCount;
        }
    }

    static class LogEntry {
        // The original log line, used as a fallback if the lookup fails
        String original;
        Future<String> future;

        LogEntry(String original, Future<String> future) {
            this.original = original;
            this.future = future;
        }
    }
}

3.4 Implementation results


Posted by AwptiK on Sun, 16 Jun 2019 13:02:06 -0700