Preventing Flink task interruption when sinking to Elasticsearch

Keywords: Big Data ElasticSearch flink

Preface

Our Flink real-time computing platform has been under construction for about half a year. Elasticsearch is used in part of the storage layer, and I started with Flink from scratch. Over these six months I have run into many pitfalls while moving from traditional development to big data development. Elasticsearch includes a variety of circuit breakers to prevent OOM errors. Because some of our current business queries are expensive (see the allow_expensive_queries setting), a single query can trip a circuit breaker, and when that happens the bulk requests sent by the real-time job's Elasticsearch sink can be rejected as well.
Of course, the Flink connector provides several failure-handling mechanisms:

  1. IgnoringFailureHandler: ignores every failure raised by the Elasticsearch sink connector;
  2. NoOpFailureHandler: does not handle failures; it logs the exception with its stack trace and rethrows it, which fails the job (this is the default);
  3. RetryRejectedExecutionFailureHandler: re-adds the request for retry when an EsRejectedExecutionException (or a subclass) is found in the failure's cause chain; all other failures are rethrown.

When data is updated frequently and an occasionally lost write is tolerable, IgnoringFailureHandler keeps failed writes to ES from affecting the Flink job. For sensitive statistics, however, failed records must be retried, which calls for RetryRejectedExecutionFailureHandler. Its source code only handles EsRejectedExecutionException and its subclasses, whereas a circuit-breaker rejection surfaces as an ElasticsearchStatusException, which has no inheritance relationship with EsRejectedExecutionException. To keep the Flink job from failing when the Elasticsearch cluster trips a circuit breaker, we need specific handling by writing our own ActionRequestFailureHandler.
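To see why the built-in retry handler misses the circuit-breaker case, here is a minimal plain-Java sketch of the cause-chain lookup it relies on (Flink's ExceptionUtils.findThrowable walks getCause() in a similar way). StandInRejectedException and StandInStatusException are hypothetical stand-ins for EsRejectedExecutionException and ElasticsearchStatusException, so the sketch runs without the Elasticsearch client.

```java
import java.util.Optional;

// Hypothetical stand-in for EsRejectedExecutionException.
class StandInRejectedException extends RuntimeException {
    StandInRejectedException(String msg) { super(msg); }
}

// Hypothetical stand-in for ElasticsearchStatusException (e.g. a 429 from a tripped breaker).
// It is deliberately NOT a subclass of StandInRejectedException.
class StandInStatusException extends RuntimeException {
    StandInStatusException(String msg) { super(msg); }
}

final class CauseChain {
    private CauseChain() {}

    /** Walks the cause chain and returns the first throwable that is an instance of {@code type}. */
    static <T extends Throwable> Optional<T> findThrowable(Throwable failure, Class<T> type) {
        Throwable current = failure;
        while (current != null) {
            if (type.isInstance(current)) {
                return Optional.of(type.cast(current));
            }
            current = current.getCause();
        }
        return Optional.empty();
    }

    /** Mirrors the built-in rule: retry only if a "rejected" exception is somewhere in the chain. */
    static boolean builtInHandlerWouldRetry(Throwable failure) {
        return findThrowable(failure, StandInRejectedException.class).isPresent();
    }
}
```

A wrapped rejection is found even when it is not the top-level exception, but a status exception never matches, whatever HTTP status it carries, because it is not a subclass of the searched type.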

Custom failure handling

Strategy enum

For better extensibility, we first define a strategy enum, ElasticsearchExceptionHandlerStrategy. The code is as follows:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

import lombok.Getter;
import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.ElasticsearchStatusException;
import org.elasticsearch.common.util.concurrent.EsRejectedExecutionException;

/**
 * Strategies describing which exception types a failure handler should act on.
 *
 * @author liweigao
 * @date 2021/12/2 11:17 PM
 */
@Getter
public enum ElasticsearchExceptionHandlerStrategy {

    /**
     * Empty list: no specific exceptions. The handler decides what to do
     * (for example, delegate to its parent class).
     */
    DEFAULT(Collections.emptyList()),

    /**
     * Every Throwable. Use with care: this also matches non-recoverable errors.
     */
    ALL_EXCEPTION(Collections.singletonList(Throwable.class)),

    /**
     * All exceptions wrapped by Elasticsearch.
     *
     * @see org.elasticsearch.ElasticsearchException
     */
    ELASTICSEARCH_EXCEPTION(Collections.singletonList(ElasticsearchException.class)),

    /**
     * Elasticsearch status errors and rejected executions, e.g. the 429
     * returned when the cluster trips a circuit breaker.
     * TODO: can be refined further for specific exceptions.
     *
     * @see org.elasticsearch.ElasticsearchStatusException
     * @see org.elasticsearch.rest.RestStatus
     * @see EsRejectedExecutionException
     */
    ELASTICSEARCH_STATUS_AND_REJECTED_EXCEPTION(Arrays.asList(
            ElasticsearchStatusException.class,
            EsRejectedExecutionException.class));

    /** Exception types this strategy covers; the handlers search the whole cause chain for them. */
    private final List<Class<? extends Throwable>> exceptionClass;

    ElasticsearchExceptionHandlerStrategy(List<Class<? extends Throwable>> exceptionClass) {
        this.exceptionClass = exceptionClass;
    }
}
```

Four strategies are defined:

  1. ALL_EXCEPTION: all exceptions (any Throwable);
  2. ELASTICSEARCH_EXCEPTION: all Elasticsearch exceptions (ElasticsearchException and its subclasses);
  3. ELASTICSEARCH_STATUS_AND_REJECTED_EXCEPTION: EsRejectedExecutionException and ElasticsearchStatusException;
  4. DEFAULT: an empty list; the behavior is left to the handler (for example, falling back to the parent class).

The ElasticsearchExceptionHandlerStrategy enum can be extended to fit your business needs.
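As an example of such an extension, the sketch below uses a simplified stand-in enum built only from JDK exception types so it runs on its own. DemoStrategy and its NETWORK_EXCEPTION constant are hypothetical; in the real enum you would add the new constant next to the existing ones. For brevity, matches() checks only the top-level exception, whereas the handlers search the whole cause chain.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Simplified, hypothetical stand-in for ElasticsearchExceptionHandlerStrategy. */
enum DemoStrategy {
    DEFAULT(Collections.emptyList()),
    ALL_EXCEPTION(Collections.singletonList(Throwable.class)),
    // Hypothetical extension: treat connection failures and socket timeouts as handleable.
    NETWORK_EXCEPTION(Arrays.asList(SocketTimeoutException.class, ConnectException.class));

    private final List<Class<? extends Throwable>> exceptionClass;

    DemoStrategy(List<Class<? extends Throwable>> exceptionClass) {
        this.exceptionClass = exceptionClass;
    }

    List<Class<? extends Throwable>> getExceptionClass() {
        return exceptionClass;
    }

    /** True if the failure itself is an instance of any configured class (cause chain not searched). */
    boolean matches(Throwable failure) {
        for (Class<? extends Throwable> c : exceptionClass) {
            if (c.isInstance(failure)) {
                return true;
            }
        }
        return false;
    }
}
```

Because both handlers only read getExceptionClass(), a new constant needs no handler changes.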

Failure handler classes
  • RetryExecutionFailureHandler: retries requests whose failure matches the strategy's exception classes. If the strategy is DEFAULT, handling is delegated to the parent class, RetryRejectedExecutionFailureHandler. The code is as follows:
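```java
import java.util.Objects;

import lombok.extern.slf4j.Slf4j;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.apache.flink.streaming.connectors.elasticsearch.util.RetryRejectedExecutionFailureHandler;
import org.apache.flink.util.ExceptionUtils;
import org.elasticsearch.action.ActionRequest;

/**
 * Retries failed requests according to {@link ElasticsearchExceptionHandlerStrategy}.
 * With a null or DEFAULT strategy it falls back to RetryRejectedExecutionFailureHandler.
 *
 * @author liweigao
 * @date 2021/12/2 11:27 PM
 */
@Slf4j
public class RetryExecutionFailureHandler extends RetryRejectedExecutionFailureHandler {

    private static final long serialVersionUID = 1L;

    /** May be null; null behaves like DEFAULT. */
    private final ElasticsearchExceptionHandlerStrategy strategy;

    public RetryExecutionFailureHandler(ElasticsearchExceptionHandlerStrategy strategy) {
        this.strategy = strategy;
    }

    @Override
    public void onFailure(ActionRequest action, Throwable failure, int restStatusCode, RequestIndexer indexer)
            throws Throwable {
        // No specific exceptions configured: let the parent retry EsRejectedExecutionException only
        if (Objects.isNull(strategy) || strategy.getExceptionClass().isEmpty()) {
            super.onFailure(action, failure, restStatusCode, indexer);
            return;
        }

        log.error("Failed Elasticsearch item request: {}", failure.getMessage(), failure);
        for (Class<? extends Throwable> exceptionClass : strategy.getExceptionClass()) {
            // findThrowable searches the whole cause chain, not just the top-level exception
            if (ExceptionUtils.findThrowable(failure, exceptionClass).isPresent()) {
                // Re-add the request so it is sent again in a later bulk request
                indexer.add(action);
                return;
            }
        }
        // Rethrow all other failures; this fails the sink and therefore the job
        throw failure;
    }
}
```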

  • IgnoringExceptionFailureHandler: ignores failures that match the strategy's exception classes and rethrows all others. If the strategy is DEFAULT (or none is given), it ignores every failure, like IgnoringFailureHandler. The code is as follows:
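```java
import java.util.Objects;

import lombok.extern.slf4j.Slf4j;
import org.apache.flink.streaming.connectors.elasticsearch.ActionRequestFailureHandler;
import org.apache.flink.streaming.connectors.elasticsearch.RequestIndexer;
import org.apache.flink.util.ExceptionUtils;
import org.elasticsearch.action.ActionRequest;

/**
 * Ignores specific exceptions. If no strategy is specified (or DEFAULT is used),
 * all exceptions are ignored.
 *
 * @author liweigao
 * @date 2021/12/2 11:35 PM
 */
@Slf4j
public class IgnoringExceptionFailureHandler implements ActionRequestFailureHandler {

    private static final long serialVersionUID = 1L;

    private final ElasticsearchExceptionHandlerStrategy strategy;

    /** Ignores all failures. */
    public IgnoringExceptionFailureHandler() {
        this(ElasticsearchExceptionHandlerStrategy.DEFAULT);
    }

    public IgnoringExceptionFailureHandler(ElasticsearchExceptionHandlerStrategy strategy) {
        this.strategy = strategy;
    }

    @Override
    public void onFailure(ActionRequest action, Throwable failure, int restStatusCode, RequestIndexer indexer)
            throws Throwable {
        // No specific exceptions configured: silently drop every failed request
        if (Objects.isNull(strategy) || strategy.getExceptionClass().isEmpty()) {
            return;
        }
        log.error("Failed Elasticsearch item request: {}", failure.getMessage(), failure);
        for (Class<? extends Throwable> exceptionClass : strategy.getExceptionClass()) {
            if (ExceptionUtils.findThrowable(failure, exceptionClass).isPresent()) {
                // Matching failure: drop the request and keep the job running
                return;
            }
        }
        // Rethrow all other failures
        throw failure;
    }
}
```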
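The decisions the two handlers make can be summarized with a small plain-Java model. It is a sketch, not the handlers themselves: Outcome and HandlerModel are hypothetical names, and a plain list of exception classes stands in for the strategy so nothing from Flink or Elasticsearch is needed.

```java
import java.util.List;

/** Outcome of handling one failed request in this simplified model. */
enum Outcome { RETRY, IGNORE, RETHROW, DELEGATE_TO_PARENT }

final class HandlerModel {
    private HandlerModel() {}

    /** True if any throwable in the cause chain is an instance of one of the classes. */
    private static boolean inCauseChain(Throwable failure, List<Class<? extends Throwable>> classes) {
        for (Throwable t = failure; t != null; t = t.getCause()) {
            for (Class<? extends Throwable> c : classes) {
                if (c.isInstance(t)) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Models RetryExecutionFailureHandler.onFailure. */
    static Outcome retryHandler(List<Class<? extends Throwable>> classes, Throwable failure) {
        if (classes == null || classes.isEmpty()) {
            return Outcome.DELEGATE_TO_PARENT;
        }
        return inCauseChain(failure, classes) ? Outcome.RETRY : Outcome.RETHROW;
    }

    /** Models IgnoringExceptionFailureHandler.onFailure. */
    static Outcome ignoreHandler(List<Class<? extends Throwable>> classes, Throwable failure) {
        if (classes == null || classes.isEmpty()) {
            return Outcome.IGNORE;
        }
        return inCauseChain(failure, classes) ? Outcome.IGNORE : Outcome.RETHROW;
    }
}
```

Note the asymmetry for DEFAULT: the retry handler falls back to its parent, while the ignore handler drops every failure.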

Elasticsearch sink configuration

The configuration looks like this (the sink function body is elided):

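```java
// Assumes flink-connector-elasticsearch7; the elasticsearch6 connector has the same builder methods
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase;
import org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkFunction;
import org.apache.flink.streaming.connectors.elasticsearch7.ElasticsearchSink;

// httpHosts is a List<HttpHost> for the cluster; the sink function body is elided
ElasticsearchSink.Builder<Object> builder = new ElasticsearchSink.Builder<>(httpHosts,
        new ElasticsearchSinkFunction<Object>() {...});
// Enable backoff retries for bulk requests that the cluster temporarily rejects
builder.setBulkFlushBackoff(true);
// Maximum number of backoff retries
builder.setBulkFlushBackoffRetries(2);
// Delay between retries in ms (the initial delay for EXPONENTIAL)
builder.setBulkFlushBackoffDelay(2000L);
// Backoff type. CONSTANT: fixed delay, e.g. with a 2 s delay three retries fire at about 2 s -> 4 s -> 6 s.
// EXPONENTIAL: the delay grows with each retry (computed by Elasticsearch's BackoffPolicy)
builder.setBulkFlushBackoffType(ElasticsearchSinkBase.FlushBackoffType.CONSTANT);
// Flush when the buffered actions reach this size (MB)
builder.setBulkFlushMaxSizeMb(10);
// Flush at least every 2000 ms
builder.setBulkFlushInterval(2000L);
// Flush when this many actions are buffered
builder.setBulkFlushMaxActions(1000);
// Failure handler: DEFAULT delegates to RetryRejectedExecutionFailureHandler;
// ELASTICSEARCH_STATUS_AND_REJECTED_EXCEPTION also retries circuit-breaker (status) errors
builder.setFailureHandler(new RetryExecutionFailureHandler(ElasticsearchExceptionHandlerStrategy.DEFAULT));
```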
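With CONSTANT backoff every retry waits the same delay, so the cumulative times in the comment above are simple arithmetic. The sketch below computes them, assuming request latency is negligible; ConstantBackoff is a hypothetical helper, and the EXPONENTIAL schedule, which Elasticsearch's BackoffPolicy computes, is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: when each retry fires under a constant backoff, relative to the first failure. */
final class ConstantBackoff {
    private ConstantBackoff() {}

    /**
     * @param delayMillis value passed to setBulkFlushBackoffDelay
     * @param retries     value passed to setBulkFlushBackoffRetries
     * @return cumulative wait in ms before each retry, ignoring request latency
     */
    static List<Long> retryTimes(long delayMillis, int retries) {
        List<Long> times = new ArrayList<>();
        long elapsed = 0;
        for (int i = 0; i < retries; i++) {
            elapsed += delayMillis; // every retry waits the same fixed delay
            times.add(elapsed);
        }
        return times;
    }
}
```

For example, retryTimes(2000L, 3) gives [2000, 4000, 6000], while the configured value of 2 retries stops after the 4000 ms attempt.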

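Note that the failure retry mechanism depends on checkpointing: with flushOnCheckpoint enabled (the default), a checkpoint does not complete until all pending requests, including retried ones, have been flushed. See the ElasticsearchSinkBase source for details.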
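The dependency on checkpoints can be illustrated with a simplified, single-threaded model. It assumes the behavior of recent Flink Elasticsearch connectors, where requests re-added by a failure handler are buffered and only handed back to the bulk processor when the next record arrives or when a checkpoint is taken. BufferedRetrySink and its methods are hypothetical names, not Flink APIs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of buffering re-added requests until the next record or checkpoint. */
final class BufferedRetrySink {
    private final List<String> retryBuffer = new ArrayList<>(); // requests re-added by the failure handler
    private final List<String> sent = new ArrayList<>();         // requests handed to the bulk client

    /** Called from the bulk callback when a failure handler re-adds a request. */
    void reAdd(String request) {
        retryBuffer.add(request);
    }

    /** Called for every incoming record: resend pending retries first, then send the new record. */
    void invoke(String record) {
        drainRetries();
        sent.add(record);
    }

    /** Called when a checkpoint is taken: pending retries are resent before it completes. */
    void snapshotState() {
        drainRetries();
    }

    private void drainRetries() {
        sent.addAll(retryBuffer);
        retryBuffer.clear();
    }

    List<String> sent() {
        return sent;
    }

    int pendingRetries() {
        return retryBuffer.size();
    }
}
```

In this model, if the stream goes quiet right after a failure, the re-added request waits for the next checkpoint, which is why checkpointing matters for timely retries.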

Summary

These are just my humble opinions as a newcomer, and feedback is welcome. I also recommend taking a look at Flink's publishing platform. Remember: there is no universally optimal configuration; the right settings can only be found for your specific scenario.

Posted by banzaimonkey on Mon, 06 Dec 2021 15:50:31 -0800