Implementing a full-text search algorithm for a search engine (based on Lucene)

Keywords: Database

Back when I was working on To turntable net, I published search code that was not full-text; friends who need it can find it on my blog. This post is mainly about how to do full-text search. I spent a long time designing a new project, viewpoint, which places much higher demands on full-text search, so I put a lot of time into studying it. You can try it out first: Click me to search (because of a Jane Book bug I can't give the full search link, so please search for it yourself). Without further ado, here is the code.

public Map<String,Object>  articleSearchAlgorithms(SearchCondition condition,IndexSearcher searcher) throws ParseException, IOException{

            Map<String,Object> map =new HashMap<String,Object>();
             String[] filedsList=condition.getFiledsList();
             String keyWord=condition.getKeyWord();
             int currentPage=condition.getCurrentPage();
             int pageSize=condition.getPageSize();
             String sortField=condition.getSortField();
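             boolean isASC=condition.isDESC(); //Despite its name, this holds the "descending" flag; it is passed to SortField as the reverse argument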
             String sDate=condition.getsDate();
            String eDate=condition.geteDate();
            String classify=condition.getClassify();

            //Escape special characters in the keyword
            keyWord=escapeExprSpecialWord(keyWord);

            BooleanQuery q1 = new BooleanQuery();
            BooleanQuery q2 = new BooleanQuery();
             BooleanQuery booleanQuery = new BooleanQuery(); //boolean query

             if(classify!=null&&(classify.equals("guanzhi")||classify.equals("opinion")||classify.equals("write"))){
                 String typeId="1";//Default type, used when classify is "write"
                 if(classify.equals("guanzhi")){
                     typeId="2";
                 }
                 if(classify.equals("opinion")){
                     typeId="3";
                 }
                 Query termQuery = new TermQuery(new Term("typeId",typeId)); 
                 q1.add(termQuery,BooleanClause.Occur.MUST);
             }

             if(sDate!=null&&eDate!=null){//Apply the date-range filter only when both bounds are given
                Query rangeQuery = new TermRangeQuery("writingTime", new BytesRef(sDate), new BytesRef(eDate),true, true);
                q1.add(rangeQuery,BooleanClause.Occur.MUST);
             }

            Sort sort = new Sort(); // sort
            sort.setSort(SortField.FIELD_SCORE);
            if(sortField!=null){
                sort.setSort(new SortField(sortField, SortField.Type.STRING, isASC));
            }

            int start = (currentPage - 1) * pageSize;
            int hm = start + pageSize;

            TopFieldCollector res = TopFieldCollector.create(sort,hm,false, false, false, false);

            //Exact-match query
            Term t0=new Term(filedsList[1],keyWord);
            TermQuery termQuery = new TermQuery(t0);//Exact and prefix matching are the two high-precision queries
            q2.add(termQuery,BooleanClause.Occur.SHOULD);

            //prefix match 
            Term t1=new Term(filedsList[1],keyWord);
            PrefixQuery prefixQuery=new PrefixQuery(t1);
            q2.add(prefixQuery,BooleanClause.Occur.SHOULD);

            //Phrase and fuzzy matching, suited to tokenized content
            for(int i=0;i<filedsList.length;i++){ //Run the term queries over every field except filedsList[1]
                if(i!=1){
                    PhraseQuery phraseQuery=new PhraseQuery();
                    Term ts0=new Term(filedsList[i],keyWord);
                    phraseQuery.add(ts0);

                    FuzzyQuery fQuery=new FuzzyQuery(new Term(filedsList[i],keyWord),2);//Fuzzy query, maximum edit distance 2

                    q2.add(phraseQuery,BooleanClause.Occur.SHOULD);
                    q2.add(fQuery,BooleanClause.Occur.SHOULD);//Approximate (fuzzy) match
                }
            }

            MultiFieldQueryParser  queryParser = new MultiFieldQueryParser(Version.LUCENE_47,filedsList,analyzer);
            queryParser.setDefaultOperator(QueryParser.AND_OPERATOR);
            Query query = queryParser.parse(keyWord);

            q2.add(query,BooleanClause.Occur.SHOULD);

            //Add a sub-query only when it has clauses; an empty MUST sub-query would match nothing
            if(!q1.clauses().isEmpty()){
                booleanQuery.add(q1,BooleanClause.Occur.MUST);
            }
            if(!q2.clauses().isEmpty()){
                booleanQuery.add(q2,BooleanClause.Occur.MUST);
            }

            searcher.search(booleanQuery, res);
            long amount = res.getTotalHits(); 
            TopDocs tds = res.topDocs(start, pageSize);
            map.put("amount",amount);
            map.put("tds",tds);
            map.put("query",booleanQuery);
            return map;
    }
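
The paging arithmetic in the method above (start, hm, and the final topDocs(start, pageSize) call) can be tried out without Lucene. The following is only a minimal sketch: PageWindow and its methods are hypothetical names that do not appear in the original code, and a plain list stands in for the ranked hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stdlib-only illustration of the paging arithmetic; PageWindow is a hypothetical name.
final class PageWindow {

    // How many top hits must be collected so that page `currentPage` is fully covered.
    // Mirrors: start = (currentPage - 1) * pageSize; hm = start + pageSize.
    static int hitsToCollect(int currentPage, int pageSize) {
        int start = (currentPage - 1) * pageSize;
        return start + pageSize;
    }

    // Slices one page out of the ranked hits: skip `start` entries, keep up to `pageSize`.
    static <T> List<T> page(List<T> rankedHits, int currentPage, int pageSize) {
        if (currentPage < 1 || pageSize < 1) {
            throw new IllegalArgumentException("currentPage and pageSize must be >= 1");
        }
        int start = (currentPage - 1) * pageSize;
        if (start >= rankedHits.size()) {
            return new ArrayList<>(); // the requested page lies past the last hit
        }
        int end = Math.min(start + pageSize, rankedHits.size());
        return new ArrayList<>(rankedHits.subList(start, end));
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("d1", "d2", "d3", "d4", "d5");
        System.out.println(page(hits, 2, 2)); // second page, two hits per page
    }
}
```

Because the collector has to keep start + pageSize hits, deep pages get more expensive; that is a property of this approach in general, not something specific to this code.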

Note that the search conditions in articleSearchAlgorithms are specific to Viewpoint net. Adapt them to your own search criteria; it is hard to write something here that fits every reader.
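
The listing calls escapeExprSpecialWord but does not show it. The sketch below is one guess at what such a helper might do, namely backslash-escaping the characters that Lucene's classic query syntax treats as special. QueryEscaper is a hypothetical name, and the real helper may behave differently; Lucene's own static QueryParser.escape does the same job and is probably the better choice in practice.

```java
// Hypothetical stand-in for escapeExprSpecialWord; the original helper is not shown.
final class QueryEscaper {

    // Characters with a special meaning in Lucene's classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character with a backslash so the parser reads it literally.
    static String escape(String keyword) {
        if (keyword == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(keyword.length() + 8);
        for (int i = 0; i < keyword.length(); i++) {
            char c = keyword.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("C++ (lucene)"));
    }
}
```

Escaping of this kind only matters for text that goes through the query parser. A TermQuery, PrefixQuery or FuzzyQuery takes its term literally, so it may be worth passing the unescaped keyword to those.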

public Map<String, Object> searchArticle(SearchCondition condition) throws Exception{

        Map<String,Object> map =new HashMap<String,Object>();
        List<Write> list=new ArrayList<Write>();

         DirectoryReader reader=condition.getReader();
         String URL=condition.getURL();
         boolean isHighligth=condition.isHighlight();
         String keyWord=condition.getKeyWord();
         IndexSearcher searcher=getSearcher(reader,URL);

        try{
            Map<String,Object> output=articleSearchAlgorithms(condition,searcher);
            if(output==null){
                map.put("amount",0L);
                map.put("source",null);
                return map;
            }

            map.put("amount", output.get("amount"));
            TopDocs tds = (TopDocs) output.get("tds");
            ScoreDoc[] sd = tds.scoreDocs;
            Query query =(Query) output.get("query");

            for (int i = 0; i < sd.length; i++) {

                Document doc = searcher.doc(sd[i].doc);

                String id = doc.get("id");
                /********** start: the fields that need post-processing are handled together **********/
                String temp=doc.get("title");
                String title =temp; //Not highlighted by default
                if(isHighligth){
                    //Highlight article title
                    Highlighter highlighterTitle = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
                    highlighterTitle.setTextFragmenter(new SimpleFragmenter(40)); // Fragment size in characters
                    TokenStream ts = analyzer.tokenStream("title", new StringReader(temp));
                    title= highlighterTitle.getBestFragment(ts,temp); 
                    if(title==null){
                        title=temp.replace(keyWord,"<span style='color:red'>"+keyWord+"</span>");//Workaround: the highlighter returns null when it finds no fragment, so highlight by plain replacement
                    }
                }

                String temp1=HtmlEnDecode.htmlEncode(doc.get("content"));//HTML-escape with our own helper
                String content=temp1;//Not highlighted by default

                if(isHighligth){
                    //Highlight, content
                    Highlighter highlighterContent = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
                    highlighterContent.setTextFragmenter(new SimpleFragmenter(Constant.HIGHLIGHT_CONTENT_LENGTH)); // Fragment size in characters
                    //temp1=StringEscapeUtils.escapeHtml(temp1); //This also escapes Chinese characters, which makes the highlight fail
                    TokenStream ts1 = analyzer.tokenStream("content", new StringReader(temp1));
                    content = highlighterContent.getBestFragment(ts1,temp1);

                    if(content==null){
                        content=temp1.replace(keyWord,"<span style='color:red'>"+keyWord+"</span>");//Workaround: the highlighter returns null when it finds no fragment, so highlight by plain replacement

                        //Truncate by hand in this case; in the other cases the highlighter already returns a shortened fragment
                        content=subContent(content);//Truncate
                        content=HtmlEnDecode.htmldecode(content);//html decoding
                        content=SubStringHTML.sub(content,Constant.HIGHLIGHT_CONTENT_LENGTH);
                    }
                }
                /*---------- end: the fields that need post-processing are handled together ----------*/

                Write write=writeDao.getArticle(Long.parseLong(id));
                if(write!=null){
                    write.setTitle(title);
                    write.setContent(content);

                    Date writingTime=write.getWritingTime();
                    String timeGap=DateUtil.dateGap(writingTime);//timeGap
                    write.setTimeGap(timeGap);

                    list.add(write);
                }
            }

        }catch(Exception e){
            e.printStackTrace();
        }
        map.put("source",list);
        return map;
    }

Note that the above is the actual search code. Different applications have different requirements, so wrap the result objects and query the database according to your own needs. The code is given in full, with nothing held back, and it works as is.
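
When getBestFragment returns null, searchArticle falls back to a plain String.replace around the keyword. For the title, that puts stored text and the user's keyword into HTML without escaping them, assuming the title is stored unescaped (the post does not say). Below is a minimal, stdlib-only sketch of a fallback that escapes as it highlights. FallbackHighlighter and its methods are hypothetical names, not part of the original code.

```java
// Hypothetical fallback highlighter; not part of the original code.
final class FallbackHighlighter {

    private static final String OPEN = "<span style='color:red'>";
    private static final String CLOSE = "</span>";

    // Escapes the five characters that are significant in HTML text and attributes.
    static String escapeHtml(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break;
                case '\'': sb.append("&#39;"); break;
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    // Finds the keyword in the raw text, then escapes each piece separately,
    // so a keyword such as "amp" can never match inside an entity like "&amp;".
    static String highlight(String text, String keyword) {
        if (text == null) {
            return "";
        }
        if (keyword == null || keyword.isEmpty()) {
            return escapeHtml(text);
        }
        StringBuilder out = new StringBuilder();
        int from = 0;
        int hit;
        while ((hit = text.indexOf(keyword, from)) >= 0) {
            out.append(escapeHtml(text.substring(from, hit)));
            out.append(OPEN).append(escapeHtml(keyword)).append(CLOSE);
            from = hit + keyword.length();
        }
        out.append(escapeHtml(text.substring(from)));
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Lucene <b>search</b>", "search"));
    }
}
```

The match is case-sensitive, like the String.replace call it stands in for.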

If you have any questions, you can join QQ group 284205104. If the group is full, please go to To the turntable, find the latest group, and join that one. Thank you for reading.

Posted by judgy on Thu, 07 May 2020 08:46:09 -0700