Lucene 3.0 Optimized Operation

What is Lucene?

Lucene is mainly used for in-site search, that is, searching for resources within a system: posts on a BBS, articles on a blog, goods in an online store, and so on. Once we have learned Lucene, we can add full-text search to our own projects. The exercise for this topic is a multi-condition search over the goods in an online mall, with the matching text highlighted in the results.

Operating index libraries using Lucene's API

An index library is a directory that contains a number of binary files. As with a database, all of the data is stored as files in the file system. We do not manipulate these binary files directly; we go through the API provided by Lucene, just as we use SQL statements to manipulate a database.

Operations on an index library fall into two kinds: management and querying. IndexWriter is used to manage the index library and IndexSearcher is used to query it. Lucene's data structures are Document and Field: a Document represents one record and a Field represents one attribute of that record. A Document holds multiple Fields, and the value of a Field is a String, because Lucene processes text.

We only need to convert the objects in our program into Documents for Lucene to manage them. The data in the search results likewise comes back as a collection of Documents.
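The conversion described above can be sketched without Lucene on the classpath. In this minimal sketch, a plain map of field name to String value stands in for Document and Field; the `Goods` class, its fields, and `GoodsMapper` are hypothetical names chosen for illustration and do not come from the original project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical domain object; the name and fields are illustrative only.
class Goods {
    final int id;
    final String name;
    final double price;

    Goods(int id, String name, double price) {
        this.id = id;
        this.name = name;
        this.price = price;
    }
}

// Stand-in for the object <-> Document conversion: a map of
// field name -> String value plays the role of Document and Field.
class GoodsMapper {
    // Every attribute is turned into text before it is stored.
    static Map<String, String> toFields(Goods goods) {
        Map<String, String> fields = new LinkedHashMap<String, String>();
        fields.put("id", String.valueOf(goods.id));
        fields.put("name", goods.name);
        fields.put("price", String.valueOf(goods.price));
        return fields;
    }

    // Search results come back as text and are parsed into the object again.
    static Goods fromFields(Map<String, String> fields) {
        return new Goods(
                Integer.parseInt(fields.get("id")),
                fields.get("name"),
                Double.parseDouble(fields.get("price")));
    }
}
```

With Lucene 3.0 available, each map entry would roughly correspond to one call such as `doc.add(new Field(name, value, Field.Store.YES, Field.Index.ANALYZED))` on a `Document`, and reading a value back to `doc.get(name)`.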

1. Problems of IndexWriter

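IndexWriter: only one instance can be open on an index directory at a time, because every writer must obtain the directory's lock file. It should therefore be managed as a singleton. IndexWriter is itself thread-safe, so a single global IndexWriter can be shared by all threads.

The following test throws an exception (a LockObtainFailedException), because the second IndexWriter cannot obtain the lock that the first one still holds: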

@Test
public void testIndexWriter() throws Exception {
    IndexWriter indexWriter = new IndexWriter(Configuraction.getDirectory(), Configuraction.getAnalyzer(), MaxFieldLength.LIMITED);
    // Fails here: the first writer has not been closed and still holds the lock
    IndexWriter indexWriter2 = new IndexWriter(Configuraction.getDirectory(), Configuraction.getAnalyzer(), MaxFieldLength.LIMITED);
}
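Why the second writer fails can be illustrated with a small stand-alone sketch. It does not use Lucene's own lock classes; `LockedWriter` and `LockDemo` are hypothetical names, and the sketch only assumes that a writer claims a `write.lock` file in the index directory when it opens and releases it when it closes.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical stand-in for a writer that is guarded by a lock file.
class LockedWriter {
    private final File lockFile;

    LockedWriter(File indexDir) {
        lockFile = new File(indexDir, "write.lock");
        try {
            // createNewFile is atomic and returns false if the file already exists
            if (!lockFile.createNewFile()) {
                throw new IllegalStateException("Lock already held: " + lockFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Releasing the lock allows the next writer to open the directory.
    void close() {
        lockFile.delete();
    }
}

class LockDemo {
    static File newTempIndexDir() {
        try {
            return Files.createTempDirectory("index").toFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if a writer can currently be opened on the directory.
    static boolean canOpen(File dir) {
        try {
            new LockedWriter(dir).close();
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }
}
```

While one `LockedWriter` is open, `LockDemo.canOpen(dir)` returns false; after `close()` it returns true again. This is the same reason a second IndexWriter on the same directory is rejected until the first one is closed.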

IndexWriter optimization

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;

public class LuceneOperUtil {
    // Single IndexWriter shared by the whole application
    private static IndexWriter indexWriter = null;

    public static IndexWriter getIndexWriter() {
        return indexWriter;
    }

    static {
        try {
            indexWriter = new IndexWriter(Configuraction.getDirectory(),
                    Configuraction.getAnalyzer(), MaxFieldLength.LIMITED);
            // Optimize the index and close the writer when the JVM exits
            Runtime.getRuntime().addShutdownHook(new Thread() {
                @Override
                public void run() {
                    try {
                        indexWriter.optimize();
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        closeIndexWriter(indexWriter);
                    }
                }
            });
        } catch (Exception e) {
            // The original created the exception without throwing it
            throw new RuntimeException(e);
        }
    }

    public static void closeIndexWriter(IndexWriter indexWriter) {
        try {
            indexWriter.close();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

2. Problems of IndexSearcher

As with IndexWriter, a first attempt is to make IndexSearcher global. The code is as follows:
import org.apache.lucene.search.IndexSearcher;

public class LuceneOperUtil {

    private static IndexSearcher indexSearcher = null;

    public static IndexSearcher getIndexSearcher() {
        return indexSearcher;
    }

    public static void closeIndexSearcher(IndexSearcher indexSearcher) {
        try {
            indexSearcher.close();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static {
        try {
            indexSearcher = new IndexSearcher(Configuraction.getDirectory());
            // Register a shutdown hook that closes the searcher when the JVM exits
            Runtime.getRuntime().addShutdownHook(new Thread() {
                @Override
                public void run() {
                    closeIndexSearcher(indexSearcher);
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

Problems with a global IndexSearcher:

An IndexSearcher cannot see any update made to the index after it was created. Therefore, it cannot simply be used as a singleton the way IndexWriter is.

Creating a new IndexSearcher in every method that needs one does work, but performance is very poor. A better way is to share one IndexSearcher across the project.

Only when IndexWriter has changed the index is the current IndexSearcher closed. A new IndexSearcher is then created the next time one is needed.
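The strategy above (share one reader, drop it when the index changes, recreate it lazily) can be sketched independently of Lucene. In this minimal sketch, `TinyIndex` and `Snapshot` are hypothetical stand-ins: `Snapshot` plays the role of IndexSearcher and sees only the entries that existed when it was created.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for an index that a writer appends entries to.
class TinyIndex {
    private final List<String> entries = new ArrayList<String>();
    private Snapshot current; // shared reader, created lazily

    // Stand-in for IndexSearcher: a point-in-time view of the index.
    static class Snapshot {
        private final List<String> view;

        Snapshot(List<String> entries) {
            // Copy the entries so that later writes are not visible here
            this.view = Collections.unmodifiableList(new ArrayList<String>(entries));
        }

        int size() {
            return view.size();
        }
    }

    // A write changes the index, so the shared snapshot is dropped.
    synchronized void add(String entry) {
        entries.add(entry);
        current = null;
    }

    // Readers share one snapshot; it is recreated only after a write.
    synchronized Snapshot getSnapshot() {
        if (current == null) {
            current = new Snapshot(entries);
        }
        return current;
    }
}
```

A snapshot obtained before `add` keeps reporting the old size, which mirrors the stale IndexSearcher problem, while the next call to `getSnapshot()` returns a fresh view. Between writes, every caller receives the same instance, so nothing is recreated unnecessarily.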

IndexSearcher optimization

import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.search.IndexSearcher;

public class LuceneOperUtil {
    private static IndexWriter indexWriter = null;
    // volatile so that the double-checked locking below publishes the searcher safely
    private static volatile IndexSearcher indexSearcher = null;

    // The synchronized block prevents several threads from creating IndexSearchers at the same time
    public static IndexSearcher getIndexSearcher() {
        // If there is no IndexSearcher, create a new one
        if (indexSearcher == null) {
            synchronized (LuceneOperUtil.class) {
                if (indexSearcher == null) {
                    try {
                        indexSearcher = new IndexSearcher(Configuraction
                                .getDirectory());
                    } catch (Exception e) {
                        throw new RuntimeException(e);
                    }
                }
            }
        }
        return indexSearcher;
    }

    public static IndexWriter getIndexWriter() {
        // The index is about to change, so close the current IndexSearcher.
        // A new one is not created immediately, to save resources.
        synchronized (LuceneOperUtil.class) {
            closeIndexSearcher(indexSearcher);
            indexSearcher = null;
        }
        return indexWriter;
    }

    static {
        try {
            indexWriter = new IndexWriter(Configuraction.getDirectory(),
                    Configuraction.getAnalyzer(), MaxFieldLength.LIMITED);
            // Set the merge factor
            indexWriter.setMergeFactor(5);
            // Register a shutdown hook that runs when the JVM exits
            Runtime.getRuntime().addShutdownHook(new Thread() {
                @Override
                public void run() {
                    try {
                        closeIndexSearcher(indexSearcher);
                        indexWriter.optimize();
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        LuceneOperUtil.closeIndexWriter(indexWriter);
                    }
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void closeIndexWriter(IndexWriter indexWriter) {
        try {
            if (indexWriter != null) {
                indexWriter.close();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void closeIndexSearcher(IndexSearcher indexSearcher) {
        try {
            if (indexSearcher != null) {
                indexSearcher.close();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

This completes the Lucene 3.0 optimized operations. This blog post is simply a record of what I learned while studying Lucene.

Posted by Dang on Wed, 26 Jun 2019 15:48:44 -0700
