Lucene Note 01: Introduction to Lucene and Basic Index Creation

Keywords: Big Data Database Junit Java

I. Why Lucene exists

Ordinary database search over text content usually relies on LIKE queries, which are very inefficient because the database typically has to scan the rows one by one. This is where Lucene comes in. Lucene searches through an index, which works like the index table at the front of a dictionary: instead of paging through the whole dictionary, we look the entry up in the index table, which is much faster. That is the point of full-text search: Lucene first builds an index over the whole body of content and then searches against that index. In addition, if the content to be searched lives in files rather than in database tables, it is hard to search with a database, but Lucene can handle it.
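To make the dictionary analogy concrete, here is a minimal sketch of an inverted index in plain Java. It is only an illustration of the idea, not Lucene's actual data structure or file format; the class name TinyInvertedIndex and its methods are made up for this example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each term to the ids of the documents that contain it.
// Hypothetical illustration only; Lucene's real index is far more elaborate.
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split the text into lowercase terms and record docId under each term.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Look the term up directly; no document text is scanned at query time.
    public Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Lucene builds an index");
        index.add(2, "A database LIKE query scans every row");
        index.add(3, "The index makes search fast");
        System.out.println(index.search("index"));   // prints [1, 3]
        System.out.println(index.search("missing")); // prints []
    }
}
```

The work of reading every document happens once, when the index is built. A query is then a single map lookup, which is why an index-based search stays fast as the amount of text grows.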

II. Downloading Lucene

You can download the version of Lucene you need from the Lucene download page. This note uses version 3.5.0. Note that Lucene does not maintain backward compatibility across versions, and the differences between versions can be large.

III. Setting up the Lucene project

For simplicity, create a plain Java project and download the junit-4.7 jar. Put the two jars, lucene-core-3.5.0.jar and junit-4.7.jar, into the lib directory of the project and add them to the build path. Lucene can be roughly divided into three parts: indexing, word segmentation (analysis), and search.
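Of these three parts, word segmentation is the least obvious, so here is a rough sketch of what it means. The code is plain Java and does not use Lucene; the class name ToyAnalyzer and its stop word list are assumptions made for this example, and Lucene's own analyzers are considerably more sophisticated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy word segmentation: lowercase the text, split on anything that is not
// a letter or digit, and drop common stop words.
class ToyAnalyzer {
    // Hypothetical stop word list, chosen for this example only.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "of"));

    public static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [index, lucene, project]
        System.out.println(analyze("The Index of a Lucene project"));
    }
}
```

The terms produced by this step are what gets written to the index, and the same step is applied to the query text at search time so that both sides match.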

IV. Creating the index

The index() method below creates the index. After it runs, you can find some files in the E:\Lucene\IndexLibrary directory; these files are the index. Later, the Luke tool can be used to inspect the contents of these index files.

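// Imports required by index() (Lucene 3.5.0)
import java.io.File;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public void index() {
    IndexWriter indexWriter = null;
    try {
        // Create the Directory that stores the index; it can live in memory or on disk
        // Directory directory = new RAMDirectory(); // in memory (needs org.apache.lucene.store.RAMDirectory)
        Directory directory = FSDirectory.open(new File("E:\\Lucene\\IndexLibrary")); // on disk
        // Create the IndexWriter that writes the index
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, new StandardAnalyzer(Version.LUCENE_35));
        indexWriter = new IndexWriter(directory, indexWriterConfig);
        // listFiles() returns null if the source directory does not exist
        File[] files = new File("E:\\Lucene\\SearchSource").listFiles();
        if (files == null) {
            return;
        }
        for (File file : files) {
            if (!file.isFile()) {
                continue; // skip subdirectories; FileReader cannot open them
            }
            // Create a Document object; each source file becomes one Document
            Document document = new Document();
            // Add Fields to the Document. A Document has many attributes,
            // such as size, path, creation time, etc. These are called Fields.
            // A Reader-based Field is analyzed and indexed, but not stored
            document.add(new Field("content", new FileReader(file)));
            document.add(new Field("fileName", file.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.add(new Field("path", file.getAbsolutePath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            // Add the Document to the index with the IndexWriter
            indexWriter.addDocument(document);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        // Closing the IndexWriter commits the pending changes and releases the write lock
        if (indexWriter != null) {
            try {
                indexWriter.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}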
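As a quick sanity check before opening the index in Luke, you can list what was written to the index directory. The following is a small sketch using only the Java standard library; the class name IndexDirectoryCheck and the method countFiles are made up for this example, and the path is the one used in this note.

```java
import java.io.File;

// Lists the regular files in a directory, for example the Lucene index directory.
class IndexDirectoryCheck {
    // Prints each regular file with its size and returns how many were found.
    // Returns 0 if the path does not exist or is not a directory.
    public static int countFiles(File dir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return 0;
        }
        int count = 0;
        for (File entry : entries) {
            if (entry.isFile()) {
                System.out.println(entry.getName() + " (" + entry.length() + " bytes)");
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Index path taken from this note; adjust it for your machine.
        int found = countFiles(new File("E:\\Lucene\\IndexLibrary"));
        System.out.println(found + " index file(s) found");
    }
}
```

If the count is zero after running index(), the source directory was probably empty or missing, or an exception was printed during indexing.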

 

Posted by pwicks on Sat, 14 Dec 2019 11:15:00 -0800