ElasticSearch 7.x Series, Part 4: Hands-On Practice

Keywords: JDBC ElasticSearch Database Windows

Contents

Preface
ES series installation
Data synchronization
How to search ES in .NET code
End

Preface

The first three parts of this series covered installation and basic usage. This part is hands-on practice.

Whether at work or in a personal project, the first problems you run into are:

How do I install the ElasticSearch stack, including ES, ES head, Kibana and an analyzer (word segmentation) plugin?
How do I synchronize the tables in my database to ES and keep them updated incrementally?
How do I call ES from my client? How do I highlight matches? How do I run full-text and conditional searches?

When I set out to learn ElasticSearch with these three questions in mind, I found no good articles online. Even now, on May 30, 2020, most of the ElasticSearch articles I find are translations of the official documentation, and the ones about data synchronization write documents one at a time with the IndexDocument method of the ES client. That surprises me: would you really push the millions of rows in your table through the ES insert method?

On the client side I use .NET. Types have been deprecated since ES 7.x, and the .NET client is NEST. Here too I could not find good articles online. Most are simple translations of the official documentation, without even a highlighting or multi-criteria search example.

So I wrote this article, the fourth and hands-on part of the series. Follow along and you will get:

Incremental synchronization of database tables to ES, covering inserts and updates but not deletes
Searching ES from .NET, including full-text search, single-field search and multi-criteria search

P.S. Lucca, I know you're reading this 🐷

ES series installation

See the first three parts of this series. You need to install ES, ES head, an analyzer plugin, Kibana and Logstash.

ES head and Kibana are only visualization tools, so they are optional. Still, I recommend installing at least ES head.

The installation steps are in my first three parts. I use the Windows versions here. For a Docker installation on Linux, just search online.

Data synchronization

I have several tables with millions of rows in total. Inserting them through the ES insert method is clearly impractical, so I use Logstash to synchronize the data.

For example, I have three tables: a News table, a Video table and an Article table, and my site search covers all three. Because ES 7.x deprecates types, there are currently two ways to lay out the indexes:

One index per table, three indexes in total
One index for all three tables, with an added estype field to tell the rows apart

Both work. I chose the first: three tables, three indexes.

With that decided, we can write the Logstash configuration file. First determine whether your database is SQL Server or MySQL, download the matching JDBC driver, and install the JDK that Logstash needs.

I use SQL Server, so I searched for "Microsoft SQL Server JDBC" and downloaded the driver.

I keep the driver under Logstash's bin directory, in a new folder named jdbcconfig.

Put the driver in that folder, then create the configuration file there and give it a name. Mine is jdbc.conf.

input {
    # One jdbc block per table. The blocks differ only in the SELECT
    # statement and in the type tag that the output section routes on.
    jdbc {
      jdbc_driver_library => "D:\Vae\ElasticSearch\logstash-7.6.2\logstash-7.6.2\bin\jdbcconfig\mssql-jdbc-8.2.2.jre8.jar"
      jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
      jdbc_connection_string => "jdbc:sqlserver://192.168.100.100:1433;DatabaseName=VaeDB;"
      jdbc_user => "sa"
      jdbc_password => "666666"
      # Cron syntax: run the statement once a minute
      schedule => "* * * * *"
      statement => "select NewsID as Id,Title as title,CreateDate as createDate,Content as content,CONVERT (VARCHAR (30),UpdateDate,25) AS UpdateDate from News where UpdateDate > :sql_last_value"
      # The jdbc input lowercases column names by default. Keep them as
      # written so that Id, createDate and UpdateDate match the rest of this file.
      lowercase_column_names => false
      use_column_value => true
      tracking_column => "UpdateDate"
      tracking_column_type => "timestamp"
      type => "News"
    }
    jdbc {
      jdbc_driver_library => "D:\Vae\ElasticSearch\logstash-7.6.2\logstash-7.6.2\bin\jdbcconfig\mssql-jdbc-8.2.2.jre8.jar"
      jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
      jdbc_connection_string => "jdbc:sqlserver://192.168.100.100:1433;DatabaseName=VaeDB;"
      jdbc_user => "sa"
      jdbc_password => "666666"
      schedule => "* * * * *"
      statement => "select ArticleId as Id,Title as title,CreateDate as createDate,Content as content,CONVERT (VARCHAR (30),UpdateDate,25) AS UpdateDate from Article where UpdateDate > :sql_last_value"
      lowercase_column_names => false
      use_column_value => true
      tracking_column => "UpdateDate"
      tracking_column_type => "timestamp"
      type => "Article"
    }
    jdbc {
      jdbc_driver_library => "D:\Vae\ElasticSearch\logstash-7.6.2\logstash-7.6.2\bin\jdbcconfig\mssql-jdbc-8.2.2.jre8.jar"
      jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
      jdbc_connection_string => "jdbc:sqlserver://192.168.100.100:1433;DatabaseName=VaeDB;"
      jdbc_user => "sa"
      jdbc_password => "666666"
      schedule => "* * * * *"
      statement => "select VideoId as Id,Title as title,CreateDate as createDate,Content as content,CONVERT (VARCHAR (30),UpdateDate,25) AS UpdateDate from Video where UpdateDate > :sql_last_value"
      lowercase_column_names => false
      use_column_value => true
      tracking_column => "UpdateDate"
      tracking_column_type => "timestamp"
      type => "Video"
    }
}

filter {
    # Copy the row's Id into @metadata so the output can use it as the
    # document ID. @metadata fields are not written to ES.
    mutate {
        add_field => {
            "[@metadata][NewsID]" => "%{Id}"
            "[@metadata][ArticleId]" => "%{Id}"
            "[@metadata][VideoId]" => "%{Id}"
        }
    }
}

output {
  if [type] == "News"{
    elasticsearch {
      hosts  => "192.168.100.100:9200"
      index => "news"
      action => "index"
      document_id => "%{[@metadata][NewsID]}"
    }
  }
  if [type] == "Article"{
    elasticsearch {
      hosts  => "192.168.100.100:9200"
      index => "article"
      action => "index"
      document_id => "%{[@metadata][ArticleId]}"
    }
  }
  if [type] == "Video"{
    elasticsearch {
      hosts  => "192.168.100.100:9200"
      index => "video"
      action => "index"
      document_id => "%{[@metadata][VideoId]}"
    }
  }
}

Part 3 of this series, the Logstash article, explains this configuration file in detail. If anything here is unclear, have a look at it.

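Notes

Replace jdbc_driver_library with the path to your own driver
Change jdbc_connection_string to your own database connection string, and likewise the user name and password below it
Change the ElasticSearch hosts in the output to your own ES address
The three jdbc blocks share the default last_run_metadata_path, the file that stores :sql_last_value. Give each block its own path so the tables do not overwrite one another's checkpoint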

OK. With the configuration file written, let's run it. Open Windows PowerShell in Logstash's bin directory and enter:

.\logstash -f jdbcconfig/jdbc.conf

After a while, open ES head or Kibana and you will see that the data from the three tables has been synchronized into the three corresponding indexes in ES.

Because I use UpdateDate as the tracking field, both newly inserted and updated rows are picked up incrementally. The polling frequency is whatever you set in the schedule option. If this is unclear, see Part 3.

Logstash takes care of inserts and updates for us. What about deletes?

The official guidance gives two approaches:

1. Use soft deletes in the database: an IsDelete field changes from 0 to 1, and a periodic job then cleans up the rows with IsDelete = 1 in both the database and ES
2. When your code deletes a row from the database, call the ES delete method directly to remove the document from ES as well

Consider both. If you only have a few tables, approach 2 works well.
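Here is a minimal sketch of approach 2 with NEST. The class and method names are made up for illustration, and it relies on the ES document ID being the row's Id, which is how the Logstash output above sets document_id.

```csharp
using Nest;

public static class EsDeleteExample
{
    // Call this right after the row has been deleted from the database.
    // indexName is the index that the table is synchronized to,
    // for example "news", "article" or "video".
    public static bool DeleteFromIndex(IElasticClient client, string indexName, long id)
    {
        // The database key doubles as the ES document ID,
        // so no search is needed to find the document.
        var response = client.Delete(new DeleteRequest(indexName, id));

        // IsValid reports whether NEST considers the call successful.
        return response.IsValid;
    }
}
```

If the delete in ES fails, the document stays searchable until you retry, so it is worth logging the failure and the ID.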

How to search ES in .NET code

Follow along with this step. The data is already in ES, so how do we write the code? In your .NET project, search for NEST in NuGet and install it.

Along the way I ran into a problem with searching multiple indexes.

How to receive data from multiple indexes?

The fields of my News table and Video table are different, but a NEST search call deserializes every hit into a single model type.

I thought of two ways. The first is to write a generic method, as follows:

client.Search<T>(s => s
        .Index(indexName)
        .Query(q => q
            .Match(m => m
                // Does not compile for an unconstrained T: T has no Title property
                .Field(f => f.Title)
                .Query(keyword))
        ));

But NEST is annoying here. I can write the generic method, of course, but then the Title lambda in the search reports an error:

.Field(f => f.Title)

The documentation does not seem to cover this, but the articles I found online use this form instead:

.Field("title")

That looks fine, so the generic version should work once the lambda that reports the error is replaced with the field name. I did not try it myself, so give it a try.
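A sketch of the generic version with a string field name could look like this. The class and method names are hypothetical, and like the snippet above it has not been run against the three indexes in this article.

```csharp
using Nest;

public static class EsSearchExample
{
    // Searches the "title" field of one or more indexes and
    // deserializes the hits into T.
    public static ISearchResponse<T> SearchByTitle<T>(
        IElasticClient client, string[] indexNames, string keyword)
        where T : class
    {
        return client.Search<T>(s => s
            .Index(indexNames)
            .Query(q => q
                .Match(m => m
                    // A string field name does not require T to expose
                    // a Title property at compile time.
                    .Field("title")
                    .Query(keyword))));
    }
}
```

The string has to match the field name stored in ES exactly, and the compiler can no longer catch a typo in it. Another option is to constrain T to an interface that declares Title and keep the lambda.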

So I took the other approach: one model that receives every hit, my own view model AllInformationViewModel.

I put all the table fields I use into it once. The results from the three tables only contain Id, title, content and createDate.

My ES query results show exactly these fields anyway, so keeping all the names consistent is convenient for me.
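A minimal sketch of what such a model could look like. The property names are the ones used by the mapping code later in this article. The types are assumptions, for example Score is nullable because NEST reports hit.Score as double?.

```csharp
using System;

// One model for hits from the article, news and video indexes.
// Property types are assumed; adjust them to your own tables.
public class AllInformationViewModel
{
    public int Id { get; set; }
    public string KeyName { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }
    public double? Score { get; set; }
    public string Etype { get; set; }
    public string Picture { get; set; }
    public DateTime CreateDate { get; set; }
}
```

With NEST's default field name inference, Title maps to title and CreateDate to createDate, which matches the aliases in the Logstash SELECT statements.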

Single index and multiple indexes

Searching only the Video table is a single-index search. Searching the Video, News and Article tables together is a multi-index search.

// One entry searches a single index, several entries search all of them
string[] indexName = new string[] { "article", "news", "video" };

var search = client.Search<AllInformationViewModel>(s => s
        .Index(indexName));

Perfect. Whether indexName holds one index or several, the same call works.

Single-field search and full-text search

Searching the title only matches and highlights the title. Searching the full text covers both the title and the body content, and highlights both.

// Search the title only
client.Search<AllInformationViewModel>(s => s
        .Index(indexName)
        .Query(q => q
            .Match(m => m
                .Field(f => f.Title)
                .Query(keyword))
        ));

// Full-text search
client.Search<AllInformationViewModel>(s => s
        .Index(indexName)
        .Query(q => q
            .QueryString(qs => qs
                .Query(keyword).DefaultOperator(Operator.And))
        ));

Multiple criteria: time range + paging + highlighting

I'll just post the code. Here is a full-text search with a time range, paging and highlighting on both fields:

return client.Search<AllInformationViewModel>(s => s
        .Index(indexName)
        // From is a document offset, not a page number.
        // This assumes PageIndex is 1-based.
        .From((pageInfo.PageIndex - 1) * pageInfo.PageSize)
        .Size(pageInfo.PageSize)
        .Query(q => q
            .QueryString(qs => qs
                .Query(keyword).DefaultOperator(Operator.And))
            // && combines both clauses into a bool query, so a hit must match both
            && q
            .DateRange(d => d
                .Field(f => f.CreateDate)
                .GreaterThanOrEquals(startTime)
                .LessThan(endTime)
            )
        )
        .Highlight(h => h
            .PreTags("<em>")
            .PostTags("</em>")
            .Fields(
                fs => fs
                    .Field(p => p.Title),
                fs => fs
                    .Field(p => p.Content)
            )));

One more thing worth noting: the highlights that ES returns live in each Hit's Highlight property, so you have to assign them to your model yourself.

if (search.Hits?.Count() > 0)
{
    foreach (var hit in search.Hits)
    {
        // Start from the stored source, then overwrite Title and Content
        // with the highlighted fragments when ES returned any.
        var allInformationViewModel = new AllInformationViewModel
        {
            Id = int.Parse(hit.Id),
            KeyName = hit.Source.KeyName,
            Title = hit.Source.Title,
            Content = hit.Source.Content,
            Score = hit.Score,
            Etype = hit.Source.Etype,
            Picture = hit.Source.Picture,
            CreateDate = hit.Source.CreateDate
        };
        foreach (var highlightField in hit.Highlight)
        {
            if (highlightField.Key == "title")
            {
                // If there are several title fragments, the last one wins
                foreach (var highlight in highlightField.Value)
                {
                    allInformationViewModel.Title = highlight;
                }
            }
            else if (highlightField.Key == "content")
            {
                // Join at most three fragments, each followed by "..."
                allInformationViewModel.Content = string.Empty;
                int num = 0;
                foreach (var highlight in highlightField.Value)
                {
                    // Strip HTML tags from the fragment but keep the <em> markers
                    allInformationViewModel.Content += DataValidator.CleanHTMLExceptem(highlight) + "...";
                    num += 1;
                    if (num >= 3)
                    {
                        break;
                    }
                }
            }
        }
        allInformationViewModels.Add(allInformationViewModel);
    }
}

The code above is simple: it reads the fragments straight out of Highlight. A title highlight is assigned to Title, and a content highlight is assigned to Content. The content may have several highlighted fragments, so I take three and join them with "...".

Most body content contains HTML tags, so we strip them, except for the em tags, because em is our highlight tag.
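The article does not show DataValidator.CleanHTMLExceptem, so here is a minimal sketch of one way to write such a helper with a regular expression. The class and method names are hypothetical, and this is a stand-in rather than the original implementation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlCleaner
{
    // Matches any tag whose name is not "em": the negative lookahead
    // rejects "<em", "</em" and their attributes, and \b keeps tags
    // such as <embed> from being mistaken for <em>.
    private static readonly Regex TagsExceptEm =
        new Regex(@"<(?!/?em\b)[^>]*>", RegexOptions.IgnoreCase);

    // Removes every HTML tag except <em> and </em>.
    public static string StripTagsExceptEm(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }
        return TagsExceptEm.Replace(html, string.Empty);
    }
}
```

For example, HtmlCleaner.StripTagsExceptEm("<p>Hello <em>world</em></p>") returns "Hello <em>world</em>". A regular expression is enough for short highlight fragments, but it is not a full HTML parser: it does not handle a > inside an attribute value, and it leaves the text inside script or style tags in place.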

End

With this hands-on part you have the most basic functionality: data synchronization plus search. The front-end page is very simple, so I did not write it up. I just rendered the results with ul and li. On the back end I defined the method parameters and wrapped ES in a Helper class with a static client instance and the basic methods.
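A minimal sketch of a Helper class with a static client instance. The class name is hypothetical, the address is the ES host from the Logstash output with an assumed http scheme, and the search methods from the earlier sections would be added to it.

```csharp
using System;
using Nest;

public static class EsHelper
{
    // Assumed address: replace with your own ES node
    private static readonly Uri NodeUri = new Uri("http://192.168.100.100:9200");

    // One shared client for the whole application,
    // created on first use
    private static readonly Lazy<ElasticClient> LazyClient =
        new Lazy<ElasticClient>(() => new ElasticClient(new ConnectionSettings(NodeUri)));

    public static ElasticClient Client => LazyClient.Value;
}
```

Creating the client does not contact the cluster, so a wrong address only shows up on the first request.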

There is still ES security and more NEST syntax to learn, but the rest closely follows the documentation.

Posted by MarcAndreTalbot on Sat, 30 May 2020 23:57:21 -0700
