Query And Fetch & Query Then Fetch & DFS Query And Fetch & DFS Query Then Fetch

Keywords: Java, Elasticsearch

Regular retrieval in ES offers four search types (two further types, SCAN and COUNT, are deprecated and no longer recommended):

Query And Fetch(Q_A_F)

Query Then Fetch(Q_T_F)

DFS Query And Fetch(DFS_Q_A_F)

DFS Query Then Fetch(DFS_Q_T_F)

 

DFS

ES uses the TF-IDF statistic to score how relevant a retrieved document is. The idea is:

If a word or phrase appears frequently in one document but rarely in the rest of the corpus, it is considered to have good discriminating power and is well suited for ranking and classification.

For details of the algorithm, see the Baidu Encyclopedia entry on TF-IDF.
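
As a rough sketch, the textbook weight is tf * log(n/df). Note this is a simplification for illustration only, not Lucene's exact similarity formula:

    // Simplified textbook TF-IDF (Lucene's real similarity adds square-root
    // TF scaling, field-length normalization, and other factors):
    //   tf = occurrences of the term in the document
    //   n  = total number of documents
    //   df = number of documents that contain the term
    static double tfIdf(int tf, long n, long df) {
        return tf * Math.log((double) n / df);
    }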

 

In ES this algorithm runs into a problem in a distributed environment: term and document frequencies are computed per shard. Suppose an index consists of five shards and the term being searched is frequent on only one of them. If every shard scores its documents using only its local statistics, the shard-local IDF values differ, so equally relevant documents can receive very different scores depending on which shard they happen to live on. This is not the result we expect.

A simple solution is to score documents with global term/document frequencies, which brings the results closer to actual usage; this is exactly what the DFS prequery phase collects. Usually the default Query Then Fetch meets the need, unless the retrieval results differ greatly from expectations.
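
A quick numeric illustration of the skew, using the simplified IDF above and made-up shard statistics:

    // Two shards of 100 docs each; the term is common on shard 1, rare on shard 2.
    double idfShard1 = Math.log(100.0 / 50.0); // local IDF on shard 1: ~0.69
    double idfShard2 = Math.log(100.0 / 1.0);  // local IDF on shard 2: ~4.61
    double idfGlobal = Math.log(200.0 / 51.0); // global IDF:           ~1.37
    // Scored locally, the lone match on shard 2 beats comparable documents on
    // shard 1 merely because of where it is stored; the DFS prequery swaps the
    // local df for the global value so that scores are comparable across shards.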

Query Then Fetch

  • Send the query to each shard
  • Find all matching documents and calculate scores using local Term/Document Frequencies
  • Build a priority queue of results (sort, pagination with from/to, etc)
  • Return metadata about the results to the requesting node. Note that the actual documents are not sent yet, just the scores
  • Scores from all the shards are merged and sorted on the requesting node, docs are selected according to query criteria
  • Finally, the actual docs are retrieved from individual shards where they reside.
  • Results are returned to the client
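
For reference, this is roughly how a Query Then Fetch search looks with the classic Java TransportClient API from the same era as the source shown below; the index and field names are made up, and client is assumed to be an existing org.elasticsearch.client.Client:

    // QUERY_THEN_FETCH is the default; it is set explicitly here for clarity.
    SearchResponse response = client.prepareSearch("articles")       // hypothetical index
            .setSearchType(SearchType.QUERY_THEN_FETCH)
            .setQuery(QueryBuilders.matchQuery("title", "elasticsearch"))
            .setFrom(0).setSize(10)                                  // pagination (query phase)
            .get();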

DFS Query Then Fetch

  • Prequery each shard asking about Term and Document frequencies
  • Send the query to each shard
  • Find all matching documents and calculate scores using global Term/Document Frequencies calculated from the prequery.
  • Build a priority queue of results (sort, pagination with from/to, etc)
  • Return metadata about the results to the requesting node. Note that the actual documents are not sent yet, just the scores
  • Scores from all the shards are merged and sorted on the requesting node, docs are selected according to query criteria
  • Finally, the actual docs are retrieved from individual shards where they reside.
  • Results are returned to the client
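
Switching to DFS only requires changing the search type; the extra prequery round trip happens transparently (same made-up names as above):

    // Same query, but scored with globally collected term/document frequencies.
    SearchResponse response = client.prepareSearch("articles")
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .setQuery(QueryBuilders.matchQuery("title", "elasticsearch"))
            .get();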

Official blog: https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch

Chinese translation: http://www.jianshu.com/p/c7529b98993e

 

Q_A_F and Q_T_F

There is relatively little material on this pair, but we can read the source code to see how the two differ in use.

As can be seen in org.elasticsearch.action.search.TransportSearchAction, when the number of shards to search is 1, ES automatically rewrites the search type to Query And Fetch.

    @Override
    protected void doExecute(SearchRequest searchRequest, ActionListener<SearchResponse> listener) {
        // optimize search type for cases where there is only one shard group to search on
        if (optimizeSingleShard && searchRequest.searchType() != SCAN && searchRequest.searchType() != COUNT) {
            try {
                ClusterState clusterState = clusterService.state();
                String[] concreteIndices = indexNameExpressionResolver.concreteIndices(clusterState, searchRequest);
                Map<String, Set<String>> routingMap = indexNameExpressionResolver.resolveSearchRouting(clusterState, searchRequest.routing(), searchRequest.indices());
                int shardCount = clusterService.operationRouting().searchShardsCount(clusterState, concreteIndices, routingMap);
                if (shardCount == 1) {
                    // if we only have one group, then we always want Q_A_F, no need for DFS, and no need to do THEN since we hit one shard
                    searchRequest.searchType(QUERY_AND_FETCH);
                }
            } catch (IndexNotFoundException | IndexClosedException e) {
                // ignore these failures, we will notify the search response if its really the case from the actual action
            } catch (Exception e) {
                logger.debug("failed to optimize search type, continue as normal", e);
            }
        }

        AbstractSearchAsyncAction searchAsyncAction;
        switch(searchRequest.searchType()) {
            case DFS_QUERY_THEN_FETCH:
                searchAsyncAction = new SearchDfsQueryThenFetchAsyncAction(logger, searchService, clusterService,
                        indexNameExpressionResolver, searchPhaseController, threadPool, searchRequest, listener);
                break;
            case QUERY_THEN_FETCH:
                searchAsyncAction = new SearchQueryThenFetchAsyncAction(logger, searchService, clusterService,
                        indexNameExpressionResolver, searchPhaseController, threadPool, searchRequest, listener);
                break;
            case DFS_QUERY_AND_FETCH:
                searchAsyncAction = new SearchDfsQueryAndFetchAsyncAction(logger, searchService, clusterService,
                        indexNameExpressionResolver, searchPhaseController, threadPool, searchRequest, listener);
                break;
            case QUERY_AND_FETCH:
                searchAsyncAction = new SearchQueryAndFetchAsyncAction(logger, searchService, clusterService,
                        indexNameExpressionResolver, searchPhaseController, threadPool, searchRequest, listener);
                break;
            case SCAN:
                searchAsyncAction = new SearchScanAsyncAction(logger, searchService, clusterService, indexNameExpressionResolver,
                        searchPhaseController, threadPool, searchRequest, listener);
                break;
            case COUNT:
                searchAsyncAction = new SearchCountAsyncAction(logger, searchService, clusterService, indexNameExpressionResolver,
                        searchPhaseController, threadPool, searchRequest, listener);
                break;
            default:
                throw new IllegalStateException("Unknown search type: [" + searchRequest.searchType() + "]");
        }
        searchAsyncAction.start();
    }

 

In the corresponding Action classes:

In the first phase, Q_A_F acquires data with a combined fetch (sendExecuteFetch), while Q_T_F runs only the query (sendExecuteQuery).

In the second phase, Q_A_F merges the results directly, while Q_T_F iterates over the shards, executes a Fetch against each to retrieve the matching documents, and then merges.

The upshot: because Q_A_F targets a single shard, there is no need to acquire data in two stages; the documents come back with the first round trip. Q_T_F must first collect per-shard document metadata (ids and scores), and only after merging and sorting does it fetch the actual documents.

Q_A_F

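    // From SearchQueryAndFetchAsyncAction: the first phase already fetches the
    // documents, so the second phase only sorts, merges, and responds.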
    @Override
    protected void sendExecuteFirstPhase(DiscoveryNode node, ShardSearchTransportRequest request,
                                         ActionListener<QueryFetchSearchResult> listener) {
        searchService.sendExecuteFetch(node, request, listener);
    }

    @Override
    protected void moveToSecondPhase() throws Exception {
        threadPool.executor(ThreadPool.Names.SEARCH).execute(new ActionRunnable<SearchResponse>(listener) {
            @Override
            public void doRun() throws IOException {
                boolean useScroll = request.scroll() != null;
                sortedShardList = searchPhaseController.sortDocs(useScroll, firstResults);
                final InternalSearchResponse internalResponse = searchPhaseController.merge(sortedShardList, firstResults,
                    firstResults, request);
                String scrollId = null;
                if (request.scroll() != null) {
                    scrollId = TransportSearchHelper.buildScrollId(request.searchType(), firstResults, null);
                }
                listener.onResponse(new SearchResponse(internalResponse, scrollId, expectedSuccessfulOps, successfulOps.get(),
                    buildTookInMillis(), buildShardFailures()));
            }

            @Override
            public void onFailure(Throwable t) {
                ......
            }
        });
    }

 

Q_T_F

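    // From SearchQueryThenFetchAsyncAction: the first phase returns only doc
    // ids and scores; the second phase fetches the documents shard by shard.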
    @Override
    protected void sendExecuteFirstPhase(DiscoveryNode node, ShardSearchTransportRequest request,
                                         ActionListener<QuerySearchResultProvider> listener) {
        searchService.sendExecuteQuery(node, request, listener);
    }

    @Override
    protected void moveToSecondPhase() throws Exception {
        boolean useScroll = request.scroll() != null;
        sortedShardList = searchPhaseController.sortDocs(useScroll, firstResults);
        searchPhaseController.fillDocIdsToLoad(docIdsToLoad, sortedShardList);

        if (docIdsToLoad.asList().isEmpty()) {
            finishHim();
            return;
        }

        final ScoreDoc[] lastEmittedDocPerShard = searchPhaseController.getLastEmittedDocPerShard(
            request, sortedShardList, firstResults.length()
        );
        final AtomicInteger counter = new AtomicInteger(docIdsToLoad.asList().size());
        for (AtomicArray.Entry<IntArrayList> entry : docIdsToLoad.asList()) {
            QuerySearchResultProvider queryResult = firstResults.get(entry.index);
            DiscoveryNode node = nodes.get(queryResult.shardTarget().nodeId());
            ShardFetchSearchRequest fetchSearchRequest = createFetchRequest(queryResult.queryResult(), entry, lastEmittedDocPerShard);
            executeFetch(entry.index, queryResult.shardTarget(), counter, fetchSearchRequest, node);
        }
    }

    void executeFetch(final int shardIndex, final SearchShardTarget shardTarget, final AtomicInteger counter,
                      final ShardFetchSearchRequest fetchSearchRequest, DiscoveryNode node) {
        searchService.sendExecuteFetch(node, fetchSearchRequest, new ActionListener<FetchSearchResult>() {
            @Override
            public void onResponse(FetchSearchResult result) {
                result.shardTarget(shardTarget);
                fetchResults.set(shardIndex, result);
                if (counter.decrementAndGet() == 0) {
                    finishHim();
                }
            }

            @Override
            public void onFailure(Throwable t) {
                // the search context might not be cleared on the node where the fetch was executed for example
                // because the action was rejected by the thread pool. in this case we need to send a dedicated
                // request to clear the search context. by setting docIdsToLoad to null, the context will be cleared
                // in TransportSearchTypeAction.releaseIrrelevantSearchContexts() after the search request is done.
                docIdsToLoad.set(shardIndex, null);
                onFetchFailure(t, fetchSearchRequest, shardIndex, shardTarget, counter);
            }
        });
    }

    private void finishHim() {
        threadPool.executor(ThreadPool.Names.SEARCH).execute(new ActionRunnable<SearchResponse>(listener) {
            @Override
            public void doRun() throws IOException {
                final InternalSearchResponse internalResponse = searchPhaseController.merge(sortedShardList, firstResults,
                    fetchResults, request);
                String scrollId = null;
                if (request.scroll() != null) {
                    scrollId = TransportSearchHelper.buildScrollId(request.searchType(), firstResults, null);
                }
                listener.onResponse(new SearchResponse(internalResponse, scrollId, expectedSuccessfulOps,
                    successfulOps.get(), buildTookInMillis(), buildShardFailures()));
                releaseIrrelevantSearchContexts(firstResults, docIdsToLoad);
            }

            @Override
            public void onFailure(Throwable t) {
                try {
                    ReduceSearchPhaseException failure = new ReduceSearchPhaseException("fetch", "", t, buildShardFailures());
                    if (logger.isDebugEnabled()) {
                        logger.debug("failed to reduce search", failure);
                    }
                    super.onFailure(failure);
                } finally {
                    releaseIrrelevantSearchContexts(firstResults, docIdsToLoad);
                }
            }
        });
    }
