Using haystack to add a full-text search engine to Django

Keywords: Python Django pip Database

Preface

Django is a powerful web framework for Python. With a few plug-ins, it is easy to add search to a website.

The search engine used here is Whoosh, a full-text search engine implemented in pure Python. It is compact and simple.

Searching Chinese text requires Chinese word segmentation, for which jieba is used.

Using Whoosh directly in a Django project means handling a number of low-level details yourself. haystack is a search framework that adds search to Django directly, without you having to deal with the details of index building, query parsing, and so on.

haystack supports several search engines: not only Whoosh but also Solr, Elasticsearch, and others. You can switch engines directly, often without modifying the search code.
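
As a rough sketch of what switching engines looks like, the snippet below builds the HAYSTACK_CONNECTIONS setting from a table of backend configurations. The engine paths are the ones shipped with django-haystack 2.x (newer releases add versioned Elasticsearch backends); the labels, the helper name haystack_connections, and the Solr and Elasticsearch URLs are assumptions made for illustration only.

```python
import os

# Stand-in for the BASE_DIR that Django's settings.py normally defines.
BASE_DIR = os.getcwd()

# Candidate backend configurations, keyed by a hypothetical short label.
BACKENDS = {
    "whoosh": {
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        "PATH": os.path.join(BASE_DIR, "whoosh_index"),
    },
    "solr": {
        "ENGINE": "haystack.backends.solr_backend.SolrEngine",
        "URL": "http://127.0.0.1:8983/solr",  # assumed local Solr instance
    },
    "elasticsearch": {
        "ENGINE": "haystack.backends.elasticsearch_backend.ElasticsearchSearchEngine",
        "URL": "http://127.0.0.1:9200/",  # assumed local Elasticsearch instance
        "INDEX_NAME": "haystack",
    },
}


def haystack_connections(backend="whoosh"):
    """Return a HAYSTACK_CONNECTIONS dict for the chosen backend label."""
    return {"default": dict(BACKENDS[backend])}


# Only this one setting changes when the engine is swapped.
HAYSTACK_CONNECTIONS = haystack_connections("whoosh")
```

After changing the engine, the index has to be rebuilt for the new backend, since each engine stores its index in its own format.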

Configuring search

1. Install the required packages

pip install django-haystack
pip install whoosh
pip install jieba

2. Configure the Django settings

Modify settings.py and add the haystack application:

INSTALLED_APPS = (
    ...
    'haystack',  # Put haystack at the end
)

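Add the haystack configuration to settings.py (the file must import os for the PATH entry; the whoosh_cn_backend engine is the copy created in step 8):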

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'haystack.backends.whoosh_cn_backend.WhooshEngine',
        'PATH': os.path.join(BASE_DIR, 'whoosh_index'),
    }
}

# With this setting, the index is updated automatically whenever the database changes.
HAYSTACK_SIGNAL_PROCESSOR = 'haystack.signals.RealtimeSignalProcessor'

3. Add the URL

In the project-level urls.py, add the URL route for the search function:

from django.conf.urls import include, url

urlpatterns = [
    ...
    url(r'^search/', include('haystack.urls')),
]

4. Add an index in the application directory

In the application's directory, create a file named search_indexes.py:

from haystack import indexes
# Change this import to your own model
from .models import GoodsInfo

# Name the class <model class> + Index. For example, for the model class GoodsInfo, the class here is GoodsInfoIndex.
class GoodsInfoIndex(indexes.SearchIndex, indexes.Indexable):
    text = indexes.CharField(document=True, use_template=True)

    def get_model(self):
        # Modify this for your own model
        return GoodsInfo

    def index_queryset(self, using=None):
        return self.get_model().objects.all()

Notes:
1) Change the three places marked by the comments above to match your own model.
2) This file tells haystack how to index existing data. get_model returns the Django model to index; haystack then handles reading the database and building the index, so you do not need to deal with those details.
3) text = indexes.CharField(document=True, use_template=True) declares the main document field that is searched. use_template=True means that a template file, created in the next step, lists which model fields go into the index.

5. Create the index template file

Create a file named <model class name>_text.txt under "templates/search/indexes/<application name>/" in the project.

For example, if the model class above is named GoodsInfo, create goodsinfo_text.txt (all lowercase). This file lists the model fields to index, written as follows (replace only the field names; keep object as it is):

{{ object.field1 }}
{{ object.field2 }}
{{ object.field3 }}
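
To make the role of the template concrete, here is a minimal sketch of the idea: the values of the listed fields are joined into one block of text, and that text is what gets indexed. This is not haystack's code; build_document and the gDescription field are hypothetical names used only for illustration (gName appears in the results page of this tutorial).

```python
from types import SimpleNamespace


def build_document(obj, field_names):
    """Join the named attributes of obj, one value per line."""
    return "\n".join(str(getattr(obj, name)) for name in field_names)


# A stand-in for a GoodsInfo row; a real project would use a model instance.
goods = SimpleNamespace(gName="Green tea", gDescription="Loose leaf, 100 g")

# Equivalent in spirit to a template containing
# {{ object.gName }} and {{ object.gDescription }} on separate lines.
document = build_document(goods, ["gName", "gDescription"])
```

A search for any word in document would then match this object, regardless of which field the word came from.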

6. Create the search results page

Under templates/search/, create a search.html page:

<!DOCTYPE html>
<html>
<head>
    <title></title>
</head>
<body>
{% if query %}
    <h3>The search results are as follows:</h3>
    {% for result in page.object_list %}
        <a href="/{{ result.object.id }}/">{{ result.object.gName }}</a><br/>
    {% empty %}
        <p>Nothing was found.</p>
    {% endfor %}

    {% if page.has_previous or page.has_next %}
        <div>
            {% if page.has_previous %}<a href="?q={{ query }}&amp;page={{ page.previous_page_number }}">{% endif %}&laquo; Previous page{% if page.has_previous %}</a>{% endif %}
        |
            {% if page.has_next %}<a href="?q={{ query }}&amp;page={{ page.next_page_number }}">{% endif %}Next page &raquo;{% if page.has_next %}</a>{% endif %}
        </div>
    {% endif %}
{% endif %}
</body>
</html>

7. Use the jieba Chinese word segmenter

In the backends folder of the haystack installation, create a file named ChineseAnalyzer.py. The folder is at a path such as '/home/python/.virtualenvs/django_py2/lib/python2.7/site-packages/haystack/backends'.

import jieba
from whoosh.analysis import Tokenizer, Token


class ChineseTokenizer(Tokenizer):
    def __call__(self, value, positions=False, chars=False,
                 keeporiginal=False, removestops=True,
                 start_pos=0, start_char=0, mode='', **kwargs):
        t = Token(positions, chars, removestops=removestops, mode=mode,
                  **kwargs)
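        # cut_all=True is jieba's "full mode": it emits every word it can find,
        # so the resulting tokens may overlap.
        seglist = jieba.cut(value, cut_all=True)
        for w in seglist:
            t.original = t.text = w
            t.boost = 1.0
            # str.find returns the first occurrence of w, so offsets are
            # approximate when the same word appears more than once.
            if positions:
                t.pos = start_pos + value.find(w)
            if chars:
                t.startchar = start_char + value.find(w)
                t.endchar = start_char + value.find(w) + len(w)
            yield t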


def ChineseAnalyzer():
    return ChineseTokenizer()
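
One detail of the tokenizer above is worth seeing in isolation: value.find(w) always reports the first occurrence of a word, so a repeated word gets the same offset each time. The sketch below contrasts that with a scan that remembers where the previous token ended. It uses plain strings instead of jieba, the function names are hypothetical, and the sequential version assumes tokens arrive in order without overlapping, which does not hold for cut_all=True. jieba also provides a tokenize function that reports start and end offsets, which may be a better fit when exact offsets matter.

```python
def token_offsets_with_find(value, tokens):
    """Offsets as computed in the tokenizer above: first occurrence only."""
    return [value.find(tok) for tok in tokens]


def token_offsets_sequential(value, tokens):
    """Offsets found by searching from the end of the previous token."""
    offsets = []
    cursor = 0
    for tok in tokens:
        start = value.find(tok, cursor)
        offsets.append(start)
        if start != -1:
            cursor = start + len(tok)
    return offsets


value = "abcdab"
tokens = ["ab", "cd", "ab"]

first_only = token_offsets_with_find(value, tokens)   # the second "ab" maps to 0
sequential = token_offsets_sequential(value, tokens)  # the second "ab" maps to 4
```

For plain keyword search the difference rarely matters; it only becomes visible for features that depend on exact offsets.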

8. Switch the Whoosh backend to Chinese word segmentation

Copy whoosh_backend.py in the backends directory above to a new file named whoosh_cn_backend.py, then open the copy and make the following changes:

# Import the Chinese analyzer just added, near the top of the file.
from .ChineseAnalyzer import ChineseAnalyzer

# Throughout the file, find
analyzer=StemmingAnalyzer()
# and change every occurrence to
analyzer=ChineseAnalyzer()
# There are probably two or three occurrences in total.
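
The manual edit above could also be scripted. The sketch below works on the text of the file: it swaps the analyzer call and inserts the import before the first ordinary import line, so that any from __future__ import stays first. patch_backend_source and the sample text are hypothetical; the sample is a made-up stand-in, not the real contents of whoosh_backend.py.

```python
IMPORT_LINE = "from .ChineseAnalyzer import ChineseAnalyzer\n"
OLD_CALL = "analyzer=StemmingAnalyzer()"
NEW_CALL = "analyzer=ChineseAnalyzer()"


def patch_backend_source(source):
    """Return source with the analyzer swapped and the import added once."""
    lines = source.replace(OLD_CALL, NEW_CALL).splitlines(keepends=True)
    if IMPORT_LINE in lines:
        return "".join(lines)  # already patched
    # Insert before the first import that is not a __future__ import.
    insert_at = 0
    for index, line in enumerate(lines):
        is_import = line.startswith(("import ", "from "))
        if is_import and not line.startswith("from __future__"):
            insert_at = index
            break
    lines.insert(insert_at, IMPORT_LINE)
    return "".join(lines)


# Made-up sample used only to show the effect of the patch.
sample = (
    "from __future__ import unicode_literals\n"
    "from whoosh.analysis import StemmingAnalyzer\n"
    "field = TEXT(stored=True, analyzer=StemmingAnalyzer())\n"
)
patched_sample = patch_backend_source(sample)
```

In practice you would read whoosh_backend.py, pass its text through the function, and write the result to whoosh_cn_backend.py, then check the result by eye before using it.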

9. Generate the index

Build the index manually:

python manage.py rebuild_index

10. Add the search entry point

Add a search box to the page:

<form method='get' action="/search/" target="_blank">
    <input type="text" name="q">
    <input type="submit" value="query">
</form>
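
Submitting this form sends a GET request to /search/ with the text in the q parameter, which is the same shape as the pagination links in the results page. As a small illustration (Python 3, standard library only), the hypothetical helper below builds such a URL; search_url is not part of haystack.

```python
from urllib.parse import urlencode


def search_url(query, page=None, base="/search/"):
    """Build the URL that the search form or a pagination link requests."""
    params = {"q": query}
    if page is not None:
        params["page"] = page
    return base + "?" + urlencode(params)


first_page = search_url("green tea")
second_page = search_url("茶", page=2)  # non-ASCII text is percent-encoded as UTF-8
```

This can be handy for linking to a search from elsewhere in the site or for writing tests against the search view.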

Rich Customization

The steps above quickly set up a basic search engine. haystack offers much more customization to meet specific needs.

Refer to the official documentation: http://django-haystack.readthedocs.io/en/master/

Custom search view

In the configuration above, search requests are routed to haystack.urls. To customize the search view and add functionality, you can replace this routing.

The content of haystack.urls is very simple:

from django.conf.urls import url  
from haystack.views import SearchView  
  
urlpatterns = [  
    url(r'^$', SearchView(), name='haystack_search'),  
]  

So write a view that inherits from SearchView, and point the search URL at the custom view.

from haystack.views import SearchView

class MySearchView(SearchView):
    # Override the relevant attributes or methods
    template = 'search_result.html'

Read the source code or documentation of SearchView to see what each method does, then override what you need.
The example above overrides the template attribute, which changes the template used for the search results page.

Highlight

In the search results page template, you can use the highlight tag (load it first with {% load highlight %}):

{% highlight <text_block> with <query> [css_class "class_name"] [html_tag "span"] [max_length 200] %}

text_block is the full text and query is the keyword to highlight. The optional parameters set the CSS class name and the HTML tag that wraps the highlighted keyword, and the maximum length of the highlighted excerpt.

The source code for highlighting is located in haystack/templatetags/highlight.py and haystack/utils/highlighting.py; these files can be copied and modified to implement custom highlighting.
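
To give a feel for what such a custom highlighter does, here is a minimal standalone sketch: it truncates the text, escapes it for HTML, and wraps each query word in a tag. It is not haystack's implementation (which, among other things, picks the excerpt around the matches rather than from the start of the text). The function name and the default values for html_tag and css_class are assumptions chosen to mirror the tag's parameters.

```python
import html
import re


def highlight(text_block, query, html_tag="span", css_class="highlighted",
              max_length=200):
    """Wrap every query word found in text_block in an HTML tag."""
    snippet = text_block[:max_length]
    words = [re.escape(word) for word in query.split()]
    if not words:
        return html.escape(snippet)
    pattern = re.compile("(" + "|".join(words) + ")", re.IGNORECASE)
    opening = '<{0} class="{1}">'.format(html_tag, css_class)
    closing = "</{0}>".format(html_tag)
    pieces = []
    # With one capturing group, re.split puts the matches at odd indexes.
    for index, part in enumerate(pattern.split(snippet)):
        escaped = html.escape(part)
        pieces.append(opening + escaped + closing if index % 2 else escaped)
    return "".join(pieces)


result = highlight("Django makes search easy", "search")
```

Escaping each piece separately keeps markup in the original text from leaking into the page while leaving the inserted tags intact.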

ref.

  1. http://django-haystack.readthedocs.io/en/master/
  2. http://blog.csdn.net/ac_hell/article/details/52875927

Posted by kataras on Thu, 30 May 2019 12:02:25 -0700