Writing to Elasticsearch from Flume

Keywords: Java ElasticSearch Apache Netty

Customizing the Flume elasticsearch sink source code

Recently we tried to write messages to Elasticsearch through Flume, but Flume does not ship a sink for every ES version; it only keeps support for version 0.9. This is probably because ES releases frequently and differs substantially between versions, so it is impractical for each Flume release to track every ES version.

Version compatibility issues

The following describes how I got Flume 1.7 to write to ES 6.8. I fell into numerous pits along the way. One episode: I first downloaded the latest Flume source code (1.9) from the official website. Because the es sink code had changed very little between releases, I assumed that a sink package built from the latest source would work without problems. Only after development was finished did I find that the resulting sink package could not run on Flume 1.7, so I had to download the 1.7 source code and redo the adjustments.

 

Flume source code download

Flume is a top-level Apache open source project, and its source code can be downloaded directly from the Apache official website. After downloading, open the source in an IDE; I use IDEA. Flume has two release codelines, 0.9.x and 1.x. Note that the source version you download should match the Flume version you run. The Flume project depends on a lot of packages, which are pulled from the Maven central repository, so importing the project for the first time is a long process and needs a stable network connection. It took me about three hours to import all the packages.

 

Code modification

In the Flume source code, the es sink code is all under the flume/flume-ng-sinks/flume-ng-elasticsearch-sink sub module, and the implementation is quite simple.

apache-flume-1.7.0-src
|—flume-ng-elasticsearch-sink
|  |—client
|  |  |—ElasticSearchClient.java
|  |  |—ElasticSearchClientFactory.java
|  |  |—ElasticSearchRestClient.java
|  |  |—ElasticSearchTransportClient.java
|  |  |—NoSuchClientTypeException.java
|  |  |—RoundRobinList.java
|  |—AbstractElasticSearchIndexRequestBuilderFactory.java
|  |—ContentBuilderUtil.java
|  |—ElasticSearchDynamicSerializer.java
|  |—ElasticSearchIndexRequestBuilderFactory.java
|  |—ElasticSearchLogStashEventSerializer.java
|  |—ElasticSearchSink.java
|  |—ElasticSearchSinkConstants.java
|  |—EventSerializerIndexRequestBuilderFactory.java
|  |—IndexNameBuilder.java
|  |—SimpleIndexNameBuilder.java
|  |—TimeBasedIndexNameBuilder.java
|  |—TimestampedEvent.java
|  |—pom.xml
|—pom.xml

1. In pom.xml, change the dependency version of the es related packages to 6.8.5

2. Adjust the es sink code to use the 6.8.5 interface

3. In the pom.xml of the flume-ng-elasticsearch-sink sub project, add the transport dependency, which provides the 6.8.5 transport client

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
</dependency>

 

4. In the pom.xml of the flume-ng-elasticsearch-sink sub project, add the httpclient dependency, which the 6.8.5 client also needs

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
</dependency>

 

Packaging deployment

After the modification, build the project and deploy the resulting flume-ng-elasticsearch-sink-1.7.0.jar to ${FLUME_HOME}/lib/

Copy all elastic related packages from the es environment to ${FLUME_HOME}/lib/

Also copy the packages that the elasticsearch sink depends on from the local Maven repository to ${FLUME_HOME}/lib/. Many of these were only identified one error at a time:

elasticsearch-6.8.5.jar
elasticsearch-cli-6.8.5.jar
elasticsearch-core-6.8.5.jar
elasticsearch-rest-client-6.8.5.jar
elasticsearch-secure-sm-6.8.5.jar
elasticsearch-ssl-config-6.8.5.jar
elasticsearch-x-content-6.8.5.jar
httpasyncclient-4.1.2.jar
jackson-core-asl-1.9.3.jar.bak
lang-mustache-client-6.8.5.jar
netty-3.9.4.Final.jar
netty-buffer-4.1.32.Final.jar
netty-codec-4.1.32.Final.jar
netty-codec-http-4.1.32.Final.jar
netty-common-4.1.32.Final.jar
netty-handler-4.1.32.Final.jar
netty-resolver-4.1.32.Final.jar
netty-transport-4.1.32.Final.jar
parent-join-client-6.8.5.jar
percolator-client-6.8.5.jar
rank-eval-client-6.8.5.jar
reindex-client-6.8.5.jar
transport-6.8.5.jar
transport-netty4-client-6.8.5.jar
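
Rather than discovering missing jars one startup error at a time, the lib directory can be compared against the list up front. The following is only a minimal sketch: the class name MissingJarCheck and the REQUIRED subset are my own illustrative choices (not part of Flume), and it assumes FLUME_HOME is set in the environment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

// Hypothetical helper, not part of Flume: reports which expected jars are absent.
public class MissingJarCheck {

    // A subset of the jar names from the list above; extend it as needed.
    static final List<String> REQUIRED = Arrays.asList(
            "elasticsearch-6.8.5.jar",
            "elasticsearch-core-6.8.5.jar",
            "elasticsearch-x-content-6.8.5.jar",
            "elasticsearch-ssl-config-6.8.5.jar",
            "transport-6.8.5.jar",
            "transport-netty4-client-6.8.5.jar",
            "netty-common-4.1.32.Final.jar",
            "httpasyncclient-4.1.2.jar");

    /** Returns the required jar names that do not appear in the set of present file names. */
    static List<String> findMissing(Set<String> present, List<String> required) {
        List<String> missing = new ArrayList<>();
        for (String jar : required) {
            if (!present.contains(jar)) {
                missing.add(jar);
            }
        }
        return missing;
    }

    /** Collects the file names (not full paths) directly inside dir. */
    static Set<String> listFileNames(Path dir) throws IOException {
        Set<String> names = new HashSet<>();
        try (Stream<Path> files = Files.list(dir)) {
            files.forEach(p -> names.add(p.getFileName().toString()));
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: FLUME_HOME is set; otherwise fall back to the current directory.
        String flumeHome = System.getenv().getOrDefault("FLUME_HOME", ".");
        Path libDir = Paths.get(flumeHome, "lib");
        if (!Files.isDirectory(libDir)) {
            System.out.println("Not a directory: " + libDir);
            return;
        }
        List<String> missing = findMissing(listFileNames(libDir), REQUIRED);
        if (missing.isEmpty()) {
            System.out.println("All listed jars are present in " + libDir);
        } else {
            missing.forEach(jar -> System.out.println("Missing: " + jar));
        }
    }
}
```

The check only compares file names, so it cannot tell whether a jar with a different version suffix would also work.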

 

 

Custom flume interceptor

Pitfalls

The following are several of the missing-package errors encountered:

FAIL_ON_SYMBOL_HASH_OVERFLOW

11 March 2020 12:16:31,586 ERROR [lifecycleSupervisor-1-2] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:251) - Unable to start SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@29ce66c0 counterGroup:{ name:null counters:{} } } - Exception follows.
java.lang.NoSuchFieldError: FAIL_ON_SYMBOL_HASH_OVERFLOW
at org.elasticsearch.common.xcontent.json.JsonXContent.<clinit>(JsonXContent.java:57)
at org.elasticsearch.common.xcontent.XContentType$1.xContent(XContentType.java:56)
at org.elasticsearch.common.settings.Setting.arrayToParsableString(Setting.java:1318)
at org.elasticsearch.common.settings.Setting.access$800(Setting.java:87)
at org.elasticsearch.common.settings.Setting$ListSetting.lambda$new$0(Setting.java:1343)
at org.elasticsearch.common.settings.Setting$ListSetting.innerGetRaw(Setting.java:1353)
at org.elasticsearch.common.settings.Setting.getRaw(Setting.java:461)
at org.elasticsearch.common.settings.Setting.lambda$listSetting$35(Setting.java:1269)
at org.elasticsearch.common.settings.Setting.listSetting(Setting.java:1286)
at org.elasticsearch.common.settings.Setting.listSetting(Setting.java:1269)
at org.elasticsearch.transport.TransportSettings.<clinit>(TransportSettings.java:47)
at org.elasticsearch.client.transport.TransportClient.newPluginService(TransportClient.java:105)
at org.elasticsearch.client.transport.TransportClient.buildTemplate(TransportClient.java:135)
at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:288)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:128)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:114)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:104)
at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.openClient(ElasticSearchTransportClient.java:206)
at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.<init>(ElasticSearchTransportClient.java:79)
at org.apache.flume.sink.elasticsearch.client.ElasticSearchClientFactory.getClient(ElasticSearchClientFactory.java:48)
at org.apache.flume.sink.elasticsearch.ElasticSearchSink.start(ElasticSearchSink.java:354)
at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:45)
at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:249)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
11 March 2020 12:16:31,590 INFO [lifecycleSupervisor-1-2] (org.apache.flume.sink.elasticsearch.ElasticSearchSink.stop:381) - ElasticSearch sink {} stopping

 

Problem: the jackson packages that the sink was built against differ in version from those in the Flume environment

Solution: replace the jackson packages in the Flume environment with the versions used in the local build
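
A NoSuchFieldError like this usually means an older copy of a class is being loaded ahead of the expected one. A small reflective probe can show whether a field exists on the class that is actually loaded, and which jar that class came from. This is a rough sketch only: FieldProbe is a hypothetical name of my own, and to my knowledge FAIL_ON_SYMBOL_HASH_OVERFLOW is a constant of jackson-core's JsonFactory.Feature enum, which is what the main method assumes.

```java
import java.security.CodeSource;

// Hypothetical diagnostic, not part of Flume or Elasticsearch.
public class FieldProbe {

    /** True if the named class can be loaded and has a public field (or enum constant) with this name. */
    static boolean hasField(String className, String fieldName) {
        try {
            // initialize=false: look the class up without running its static initializers
            Class<?> cls = Class.forName(className, false, FieldProbe.class.getClassLoader());
            cls.getField(fieldName);
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }

    /** Location (jar or directory) the class was loaded from, or a note if it is unknown. */
    static String origin(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
    }

    public static void main(String[] args) {
        // Assumption: the missing field belongs to jackson-core's JsonFactory.Feature enum.
        String name = "com.fasterxml.jackson.core.JsonFactory$Feature";
        String field = "FAIL_ON_SYMBOL_HASH_OVERFLOW";
        System.out.println(name + " has " + field + ": " + hasField(name, field));
        try {
            Class<?> cls = Class.forName(name, false, FieldProbe.class.getClassLoader());
            System.out.println("Loaded from: " + origin(cls));
        } catch (ClassNotFoundException | LinkageError e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

Run it with the same classpath as the Flume agent (for example ${FLUME_HOME}/lib/*); if the field is reported as absent, the printed location points at the jar that needs replacing.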

 

ClassNotFound:io.netty.util.NettyRuntime

Problem: the netty-common package is missing

Solution: copy all dependent packages under the netty directory of the local Maven repository to the Flume environment

 

 

ClassNotFound:SslConfigurationLoader

Problem: the elasticsearch-ssl-config package is missing

Solution: all elasticsearch packages need to be added to Flume

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-ssl-config</artifactId>
    <version>6.7.1</version>
</dependency>

 

ClassNotFound:SchemeIOSessionStrategy

 

unner: { policy:org.apache.flume.sink.DefaultSinkProcessor@6d310488 counterGroup:{ name:null counters:{} } } - Exception follows.
java.lang.NoClassDefFoundError: org/apache/http/nio/conn/SchemeIOSessionStrategy
at org.elasticsearch.index.reindex.ReindexPlugin.getSettings(ReindexPlugin.java:94)
at org.elasticsearch.plugins.PluginsService.lambda$getPluginSettings$0(PluginsService.java:89)
at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
at org.elasticsearch.plugins.PluginsService.getPluginSettings(PluginsService.java:89)
at org.elasticsearch.client.transport.TransportClient.buildTemplate(TransportClient.java:147)
at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:288)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:128)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:114)
at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:104)
at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.openClient(ElasticSearchTransportClient.java:206)
at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.<init>(ElasticSearchTransportClient.java:79)
at org.apache.flume.sink.elasticsearch.client.ElasticSearchClientFactory.getClient(ElasticSearchClientFactory.java:48)
at org.apache.flume.sink.elasticsearch.ElasticSearchSink.start(ElasticSearchSink.java:354)
at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:45)
at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:249)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.http.nio.conn.SchemeIOSessionStrategy
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 29 more

 

Solution: the httpasyncclient package needs to be copied to Flume
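
The ClassNotFound errors above each surfaced only after a restart. A preflight check that tries to load the classes named in the stack traces can report all of them in one pass. This is a sketch under assumptions: ClasspathPreflight is a hypothetical class name, and the probe list contains only the two fully qualified class names that appear in the errors above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical preflight check, not part of Flume.
public class ClasspathPreflight {

    /** Returns the class names from the input that cannot be loaded on the current classpath. */
    static List<String> missingClasses(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = ClasspathPreflight.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the class can be found and linked
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Class names taken from the errors above; add more as they turn up.
        List<String> probes = Arrays.asList(
                "io.netty.util.NettyRuntime",
                "org.apache.http.nio.conn.SchemeIOSessionStrategy");
        List<String> missing = missingClasses(probes);
        if (missing.isEmpty()) {
            System.out.println("All probed classes are loadable");
        } else {
            missing.forEach(name -> System.out.println("Not loadable: " + name));
        }
    }
}
```

It needs to run with the same classpath as the Flume agent to be meaningful.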

 

Two types of clients

The Flume elasticsearch sink can use two kinds of clients to access es:

PreBuiltTransportClient

The transport client, which connects on port 9300

HttpClient

The REST client, which connects on port 9200
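
Because the two clients talk to different ports, a host entry without an explicit port has to be resolved differently depending on the client type. The sketch below illustrates that idea only; EsClientEndpoint, ClientType and resolve are hypothetical names of my own and are not taken from the Flume sink code.

```java
// Hypothetical illustration, not the Flume sink's actual code.
public class EsClientEndpoint {

    /** The two client kinds described above, each with its conventional default port. */
    enum ClientType {
        TRANSPORT(9300),
        REST(9200);

        final int defaultPort;

        ClientType(int defaultPort) {
            this.defaultPort = defaultPort;
        }
    }

    /** Accepts "host" or "host:port" and fills in the client type's default port when none is given. */
    static String resolve(String hostName, ClientType type) {
        String[] parts = hostName.trim().split(":");
        String host = parts[0];
        int port = parts.length > 1 ? Integer.parseInt(parts[1].trim()) : type.defaultPort;
        return host + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(resolve("es-node-1", ClientType.TRANSPORT));
        System.out.println(resolve("es-node-1", ClientType.REST));
        System.out.println(resolve("es-node-1:9201", ClientType.REST));
    }
}
```

Pointing the transport client at 9200, or the REST client at 9300, is an easy mistake to make, which is why the default is tied to the client type here.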



Posted by saikiran on Sun, 15 Mar 2020 22:02:52 -0700