Customizing the Flume Elasticsearch sink source code
Recently we tried to write messages to Elasticsearch (ES) through Flume, but Flume does not provide a sink for each ES version; it has only kept support for version 0.9. This is probably because ES releases frequently and the differences between versions are large, so it is impractical for every Flume release to target every ES version.
Version compatibility issues
The following is how I wrote to ES 6.8 from Flume 1.7. I ran into numerous pitfalls along the way. One of them: I first downloaded the latest Flume source code (1.9) from the official website. Because the ES sink code had changed very little between versions, I assumed that a sink package built from the latest source would work without problems. Only after development was complete did I find that the resulting sink package could not run on 1.7, so I had to download the Flume 1.7 source code and redo the adjustments.
Flume source code download
Flume is a top-level Apache open source project, and its source code can be downloaded directly from the Apache official website. After downloading, open it in an IDE; I use IDEA. Flume has two release code lines, 0.9.x and 1.x. Note that the Flume source version you download should match the Flume version you actually run. The Flume project depends on a large number of packages, which are pulled from the Maven Central repository, so importing the project for the first time is a long process that needs a stable network connection. It took me about three hours to import all the packages.
Code modification
In the Flume source code, all of the ES sink code lives in the flume-ng-sinks/flume-ng-elasticsearch-sink submodule, and the implementation is quite simple.
apache-flume-1.7.0-src
|—flume-ng-elasticsearch-sink
|  |—client
|  |  |—ElasticSearchClient.java
|  |  |—ElasticSearchClientFactory.java
|  |  |—ElasticSearchRestClient.java
|  |  |—ElasticSearchTransportClient.java
|  |  |—NoSuchClientTypeException.java
|  |  |—RoundRobinList.java
|  |—AbstractElasticSearchIndexRequestBuilderFactory.java
|  |—ContentBuilderUtil.java
|  |—ElasticSearchDynamicSerializer.java
|  |—ElasticSearchIndexRequestBuilderFactory.java
|  |—ElasticSearchLogStashEventSerializer.java
|  |—ElasticSearchSink.java
|  |—ElasticSearchSinkConstants.java
|  |—EventSerializerIndexRequestBuilderFactory.java
|  |—IndexNameBuilder.java
|  |—SimpleIndexNameBuilder.java
|  |—TimeBasedIndexNameBuilder.java
|  |—TimestampedEvent.java
|  |—pom.xml
|—pom.xml
1. In pom.xml, change the version of the ES-related dependencies to 6.8.5.
2. Adjust the ES sink code to use the 6.8.5 client API.
3. In the pom.xml of the flume-ng-elasticsearch-sink submodule, add the transport dependency, which provides the 6.8.5 transport client:
<dependency>
  <groupId>org.elasticsearch.client</groupId>
  <artifactId>transport</artifactId>
</dependency>
4. In the pom.xml of the flume-ng-elasticsearch-sink submodule, add the httpclient dependency, which the 6.8.5 client needs:
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
</dependency>
Packaging and deployment
After the modifications, build the flume-ng-elasticsearch-sink-1.7.0.jar package and deploy it to ${FLUME_HOME}/lib/.
Copy all Elasticsearch-related packages from the ES environment to ${FLUME_HOME}/lib/.
Copy the packages that the Elasticsearch sink depends on from the local machine to ${FLUME_HOME}/lib/. Many of these dependencies were only identified one by one, from the errors their absence caused:
elasticsearch-6.8.5.jar
elasticsearch-cli-6.8.5.jar
elasticsearch-core-6.8.5.jar
elasticsearch-rest-client-6.8.5.jar
elasticsearch-secure-sm-6.8.5.jar
elasticsearch-ssl-config-6.8.5.jar
elasticsearch-x-content-6.8.5.jar
httpasyncclient-4.1.2.jar
jackson-core-asl-1.9.3.jar.bak
lang-mustache-client-6.8.5.jar
netty-3.9.4.Final.jar
netty-buffer-4.1.32.Final.jar
netty-codec-4.1.32.Final.jar
netty-codec-http-4.1.32.Final.jar
netty-common-4.1.32.Final.jar
netty-handler-4.1.32.Final.jar
netty-resolver-4.1.32.Final.jar
netty-transport-4.1.32.Final.jar
parent-join-client-6.8.5.jar
percolator-client-6.8.5.jar
rank-eval-client-6.8.5.jar
reindex-client-6.8.5.jar
transport-6.8.5.jar
transport-netty4-client-6.8.5.jar
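Instead of discovering missing packages one error at a time, the classpath can be probed up front. The following is only a minimal sketch: the class name `ClasspathProbe` and its method are hypothetical helpers, not part of Flume or Elasticsearch, and the class names checked in `main` are simply the ones that appear in the error messages in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: reports which classes cannot be loaded from the current classpath. */
final class ClasspathProbe {

    /** Returns the subset of classNames that this class's class loader cannot load. */
    static List<String> findMissing(List<String> classNames) {
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run its static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Class names copied from the errors described in this article.
        List<String> required = Arrays.asList(
                "io.netty.util.NettyRuntime",
                "org.apache.http.nio.conn.SchemeIOSessionStrategy",
                "org.elasticsearch.transport.client.PreBuiltTransportClient");
        for (String name : findMissing(required)) {
            System.out.println("missing: " + name);
        }
    }
}
```

To be meaningful, the probe would have to run with the same classpath as the Flume agent, for example with every jar under ${FLUME_HOME}/lib on the classpath. It only checks the classes it is given, so it shortens the trial-and-error loop rather than replacing it.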
Pitfalls
The following are several of the missing-package errors I ran into:
FAIL_ON_SYMBOL_HASH_OVERFLOW
11 March 2020 12:16:31,586 ERROR [lifecycleSupervisor-1-2] (org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run:251) - Unable to start SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@29ce66c0 counterGroup:{ name:null counters:{} } } - Exception follows.
java.lang.NoSuchFieldError: FAIL_ON_SYMBOL_HASH_OVERFLOW
    at org.elasticsearch.common.xcontent.json.JsonXContent.<clinit>(JsonXContent.java:57)
    at org.elasticsearch.common.xcontent.XContentType$1.xContent(XContentType.java:56)
    at org.elasticsearch.common.settings.Setting.arrayToParsableString(Setting.java:1318)
    at org.elasticsearch.common.settings.Setting.access$800(Setting.java:87)
    at org.elasticsearch.common.settings.Setting$ListSetting.lambda$new$0(Setting.java:1343)
    at org.elasticsearch.common.settings.Setting$ListSetting.innerGetRaw(Setting.java:1353)
    at org.elasticsearch.common.settings.Setting.getRaw(Setting.java:461)
    at org.elasticsearch.common.settings.Setting.lambda$listSetting$35(Setting.java:1269)
    at org.elasticsearch.common.settings.Setting.listSetting(Setting.java:1286)
    at org.elasticsearch.common.settings.Setting.listSetting(Setting.java:1269)
    at org.elasticsearch.transport.TransportSettings.<clinit>(TransportSettings.java:47)
    at org.elasticsearch.client.transport.TransportClient.newPluginService(TransportClient.java:105)
    at org.elasticsearch.client.transport.TransportClient.buildTemplate(TransportClient.java:135)
    at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:288)
    at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:128)
    at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:114)
    at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:104)
    at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.openClient(ElasticSearchTransportClient.java:206)
    at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.<init>(ElasticSearchTransportClient.java:79)
    at org.apache.flume.sink.elasticsearch.client.ElasticSearchClientFactory.getClient(ElasticSearchClientFactory.java:48)
    at org.apache.flume.sink.elasticsearch.ElasticSearchSink.start(ElasticSearchSink.java:354)
    at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:45)
    at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
    at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:249)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
11 March 2020 12:16:31,590 INFO [lifecycleSupervisor-1-2] (org.apache.flume.sink.elasticsearch.ElasticSearchSink.stop:381) - ElasticSearch sink {} stopping
Problem: the jackson version that the sink was built against is inconsistent with the jackson packages in the Flume environment.
Solution: replace the jackson packages in the Flume environment with all of the jackson packages used by the local build.
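A NoSuchFieldError like this one usually means that an older jar is being loaded ahead of the expected one. The sketch below is one way to check which jar a class comes from and whether it exposes a given field. `FieldProbe` is a hypothetical helper, and the class name `com.fasterxml.jackson.core.JsonFactory$Feature` is my assumption about where FAIL_ON_SYMBOL_HASH_OVERFLOW is declared; the stack trace itself only names the field.

```java
import java.security.CodeSource;

/** Hypothetical helper for diagnosing version conflicts between jars. */
final class FieldProbe {

    /** True if the class can be loaded and has a public field (or enum constant) with this name. */
    static boolean hasPublicField(String className, String fieldName) {
        try {
            Class.forName(className, false, FieldProbe.class.getClassLoader()).getField(fieldName);
            return true;
        } catch (ClassNotFoundException | NoSuchFieldException | LinkageError e) {
            return false;
        }
    }

    /** Location (jar or directory) the class was loaded from, or a short explanation if unavailable. */
    static String loadedFrom(String className) {
        try {
            Class<?> c = Class.forName(className, false, FieldProbe.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "bootstrap class path";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        // Assumption: this is the jackson class that declares the field named in the error.
        String cls = "com.fasterxml.jackson.core.JsonFactory$Feature";
        System.out.println(cls + " loaded from: " + loadedFrom(cls));
        System.out.println("has FAIL_ON_SYMBOL_HASH_OVERFLOW: "
                + hasPublicField(cls, "FAIL_ON_SYMBOL_HASH_OVERFLOW"));
    }
}
```

Run against the Flume lib directory, a result of `false` together with the printed jar location would point at the jackson jar that needs replacing.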
ClassNotFound:io.netty.util.NettyRuntime
Problem: the netty-common package is missing.
Solution: copy all of the dependency packages under the netty directory of the local Maven repository directly to the Flume environment.
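The copy step can be scripted. Below is a small sketch that assumes the default Maven repository layout; `JarCopier` and its parameters are hypothetical names. Because a local repository may hold several netty versions side by side, the sketch copies only jars whose file name contains a given version string, which is a precaution of mine rather than something the original procedure did.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: copies matching jars from a repository directory into a lib directory. */
final class JarCopier {

    /**
     * Copies every *.jar under sourceRoot (searched recursively) whose file name contains
     * versionMarker into targetDir, overwriting existing files. Returns the copied file names.
     */
    static List<String> copyJars(Path sourceRoot, Path targetDir, String versionMarker) throws IOException {
        Files.createDirectories(targetDir);
        List<Path> jars;
        try (Stream<Path> walk = Files.walk(sourceRoot)) {
            jars = walk.filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".jar"))
                    .filter(p -> p.getFileName().toString().contains(versionMarker))
                    .sorted()
                    .collect(Collectors.toList());
        }
        List<String> copied = new ArrayList<>();
        for (Path jar : jars) {
            Files.copy(jar, targetDir.resolve(jar.getFileName()), StandardCopyOption.REPLACE_EXISTING);
            copied.add(jar.getFileName().toString());
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: JarCopier <repo-netty-dir> <flume-lib-dir> <version>");
            return;
        }
        // Example arguments (assumed paths): ~/.m2/repository/io/netty  $FLUME_HOME/lib  4.1.32.Final
        for (String name : copyJars(Paths.get(args[0]), Paths.get(args[1]), args[2])) {
            System.out.println("copied " + name);
        }
    }
}
```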
ClassNotFound:SslConfigurationLoader
Problem: the elasticsearch-ssl-config package is missing.
Solution: all of the Elasticsearch packages need to be added to Flume, including this one:
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-ssl-config</artifactId>
  <version>6.8.5</version>
</dependency>
ClassNotFound:SchemeIOSessionStrategy
...SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@6d310488 counterGroup:{ name:null counters:{} } } - Exception follows.
java.lang.NoClassDefFoundError: org/apache/http/nio/conn/SchemeIOSessionStrategy
    at org.elasticsearch.index.reindex.ReindexPlugin.getSettings(ReindexPlugin.java:94)
    at org.elasticsearch.plugins.PluginsService.lambda$getPluginSettings$0(PluginsService.java:89)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1374)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.elasticsearch.plugins.PluginsService.getPluginSettings(PluginsService.java:89)
    at org.elasticsearch.client.transport.TransportClient.buildTemplate(TransportClient.java:147)
    at org.elasticsearch.client.transport.TransportClient.<init>(TransportClient.java:288)
    at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:128)
    at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:114)
    at org.elasticsearch.transport.client.PreBuiltTransportClient.<init>(PreBuiltTransportClient.java:104)
    at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.openClient(ElasticSearchTransportClient.java:206)
    at org.apache.flume.sink.elasticsearch.client.ElasticSearchTransportClient.<init>(ElasticSearchTransportClient.java:79)
    at org.apache.flume.sink.elasticsearch.client.ElasticSearchClientFactory.getClient(ElasticSearchClientFactory.java:48)
    at org.apache.flume.sink.elasticsearch.ElasticSearchSink.start(ElasticSearchSink.java:354)
    at org.apache.flume.sink.DefaultSinkProcessor.start(DefaultSinkProcessor.java:45)
    at org.apache.flume.SinkRunner.start(SinkRunner.java:79)
    at org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:249)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.http.nio.conn.SchemeIOSessionStrategy
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 29 more
Problem: the httpasyncclient package is missing.
Solution: copy the httpasyncclient package to the Flume environment.
Two types of clients
The Flume Elasticsearch sink can use two kinds of clients to access ES:
PreBuiltTransportClient
The transport client, which connects through port 9300.
HttpClient
The REST client, which connects through port 9200.
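Since the two clients talk to different ports, a host list written for one client will not work unchanged for the other. As an illustration only, the sketch below fills in the matching default port for hosts given without one. `EsHostResolver` and the type names "transport" and "rest" are assumptions made for this example, not code taken from the sink.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: fills in the default ES port that matches the chosen client type. */
final class EsHostResolver {

    /** Default ports as described above: 9300 for the transport client, 9200 for the REST client. */
    static int defaultPort(String clientType) {
        switch (clientType) {
            case "transport":
                return 9300;
            case "rest":
                return 9200;
            default:
                throw new IllegalArgumentException("unknown client type: " + clientType);
        }
    }

    /** Expands a comma-separated host list into host:port entries, keeping explicit ports as given. */
    static List<String> resolve(String clientType, String hostNames) {
        int port = defaultPort(clientType);
        List<String> result = new ArrayList<>();
        for (String raw : hostNames.split(",")) {
            String host = raw.trim();
            if (host.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            result.add(host.contains(":") ? host : host + ":" + port);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(resolve("transport", "es1,es2:9301"));
        System.out.println(resolve("rest", "es1,es2"));
    }
}
```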