04 - system built-in interceptor usage & custom interceptor

Keywords: Hadoop

Use of system built-in interceptor

While events flow through a Flume agent, Flume can modify or drop them in flight. This is realized through interceptors, which have the following characteristics:

  • An interceptor must implement the org.apache.flume.interceptor.Interceptor interface.
  • An interceptor can modify or drop events based on any criteria the developer chooses.
  • Interceptors follow the chain-of-responsibility pattern: multiple interceptors can be applied in a specified order.
  • The list of events returned by one interceptor is passed to the next interceptor in the chain.
  • To drop events, an interceptor simply omits them from the list it returns, as shown in the sketch below.
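
A minimal sketch of the drop-by-omission pattern (DropEmptyInterceptor is a hypothetical name for illustration, not part of Flume; only the two intercept methods matter here):

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.ArrayList;
import java.util.List;

// Hypothetical example: drop every event whose body is empty.
public class DropEmptyInterceptor implements Interceptor {

    public void initialize() {
    }

    // Per-event hook: return the event to keep it, or null to mark it for dropping.
    public Event intercept(Event event) {
        byte[] body = event.getBody();
        return (body == null || body.length == 0) ? null : event;
    }

    // Batch hook: events omitted from the returned list are deleted from the flow.
    public List<Event> intercept(List<Event> events) {
        List<Event> kept = new ArrayList<Event>(events.size());
        for (Event event : events) {
            if (intercept(event) != null) {
                kept.add(event);
            }
        }
        return kept;
    }

    public void close() {
    }

    public static class Builder implements Interceptor.Builder {
        public Interceptor build() {
            return new DropEmptyInterceptor();
        }
        public void configure(Context context) {
        }
    }
}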

Common interceptors:

  1. Timestamp Interceptor: adds the current timestamp (in milliseconds) to the event headers. The key is timestamp and the value is the current timestamp. It is not used very much.
  2. Host Interceptor: adds the host name or IP address of the Flume agent to the event headers. The key name is host (it can also be customized).
  3. Static Interceptor: adds a fixed key and value to the event headers.

Case demonstration

Using the timestamp interceptor (plus a host and a static interceptor): the data source is SyslogTcp, the channel is a memory channel, and the final destination is HDFS.

Configuration scheme

[root@tianqinglong01 flumeconf]# vi ts.conf
a1.sources = r1
a1.channels = c1
a1.sinks = s1

a1.sources.r1.type=syslogtcp
a1.sources.r1.host=tianqinglong01
a1.sources.r1.port=6666
a1.sources.r1.interceptors=i1 i2 i3
a1.sources.r1.interceptors.i1.type=timestamp
a1.sources.r1.interceptors.i1.preserveExisting=false
a1.sources.r1.interceptors.i2.type=host
a1.sources.r1.interceptors.i2.preserveExisting=false
a1.sources.r1.interceptors.i2.useIP=true
a1.sources.r1.interceptors.i2.hostHeader=hostname
a1.sources.r1.interceptors.i3.type=static
a1.sources.r1.interceptors.i3.preserveExisting=false
a1.sources.r1.interceptors.i3.key=hn
a1.sources.r1.interceptors.i3.value=tianqinglong01

a1.channels.c1.type=memory

a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://tianqinglong01:8020/flume/%Y/%m/%d/%H%M
a1.sinks.s1.hdfs.filePrefix=%{hostname}
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
a1.sinks.s1.hdfs.useLocalTimeStamp=true

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Start agent service

[root@tianqinglong flumeconf]# flume-ng agent -c ../conf -f ./ts.conf -n a1 -Dflume.root.logger=INFO,console

Test

[root@tianqinglong ~]# echo "hello world hello interceptor" | nc tianqinglong01 6666

# nc requires installation
yum install -y nmap-ncat
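
With the three interceptors configured above, every event that reaches the sink should carry three extra headers; the values below are illustrative placeholders. The timestamp header drives the %Y/%m/%d/%H%M escapes in hdfs.path, and the hostname header fills the %{hostname} file prefix:

# illustrative event headers (timestamp and IP are placeholders)
{timestamp=1636776508000, hostname=192.168.10.101, hn=tianqinglong01}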

Use of selectors

Explanation

The Channel selector in Flume acts at the source stage: it is the component that decides which Channel(s) each event accepted by the source is written to. The selector tells the Channel processor, which then writes the events to the chosen Channel(s).

Interaction of various components in Agent

Because Flume does not use a two-phase commit, events are written to one Channel and committed before being written to the next Channel. If an exception occurs while writing to one Channel, events already committed to other Channels cannot be rolled back. When such an exception occurs, the Channel processor throws a ChannelException and the transaction fails. If the Source then retries writing the same events (most sources can retry; Syslog, Exec and similar sources cannot, because there is no way to regenerate the same data), the events are written again to the Channels whose earlier commit already succeeded, which is how duplicates can appear in Flume.

The Channel selector is configured through the Channel processor. A selector can mark one group of Channels as required and another group as optional.
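
For example, with the multiplexing selector (described below) a mapping can be declared optional. A minimal sketch using the selector.optional syntax from the Flume user guide, with illustrative channel names:

a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
# events with state=USER must be committed to c1 (required)
a1.sources.r1.selector.mapping.USER = c1
# they are also written to c3 on a best-effort basis (optional)
a1.sources.r1.selector.optional.USER = c3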

Flume ships two kinds of selectors. If no selector is specified in the Source configuration, the replicating Channel selector is used automatically:

  • replicating: this selector copies each event to every Channel listed in the Source's channels parameter.
  • multiplexing: a Channel selector for dynamically routing events; it chooses which Channel(s) each event is written to based on the value of a specific event header.

Case demonstration: replicating selector

Configuration scheme

[root@tianqinglong01 flumeconf]# vi rep.conf
a1.sources = r1
a1.channels = c1 c2
a1.sinks = s1 s2

a1.sources.r1.type=syslogtcp
a1.sources.r1.host = tianqinglong01
a1.sources.r1.port = 6666
a1.sources.r1.selector.type=replicating

a1.channels.c1.type=memory

a1.channels.c2.type=memory

a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://tianqinglong01:8020/flume/%Y/%m/%d/rep
a1.sinks.s1.hdfs.filePrefix=s1sink
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
a1.sinks.s1.hdfs.useLocalTimeStamp=true

a1.sinks.s2.type=hdfs
a1.sinks.s2.hdfs.path=hdfs://tianqinglong01:8020/flume/%Y/%m/%d/rep
a1.sinks.s2.hdfs.filePrefix=s2sink
a1.sinks.s2.hdfs.fileSuffix=.log
a1.sinks.s2.hdfs.inUseSuffix=.tmp
a1.sinks.s2.hdfs.rollInterval=60
a1.sinks.s2.hdfs.rollSize=1024
a1.sinks.s2.hdfs.rollCount=10
a1.sinks.s2.hdfs.idleTimeout=0
a1.sinks.s2.hdfs.batchSize=100
a1.sinks.s2.hdfs.fileType=DataStream
a1.sinks.s2.hdfs.writeFormat=Text
a1.sinks.s2.hdfs.round=true
a1.sinks.s2.hdfs.roundValue=1
a1.sinks.s2.hdfs.roundUnit=second
a1.sinks.s2.hdfs.useLocalTimeStamp=true

a1.sources.r1.channels=c1 c2
a1.sinks.s1.channel=c1
a1.sinks.s2.channel=c2

Start agent service

[root@tianqinglong flumeconf]# flume-ng agent -c ../conf -f ./rep.conf -n a1 -Dflume.root.logger=INFO,console

Test

[root@tianqinglong ~]# echo "hello world hello interceptor" | nc tianqinglong01 6666

# nc requires installation
yum install -y nmap-ncat
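
Because the replicating selector copies each event to both channels, the two sinks should write the same content under the same HDFS directory. A quick check (the date components of the path depend on when the test is run):

[root@tianqinglong ~]# hdfs dfs -ls /flume/$(date +%Y/%m/%d)/rep
# expect one file prefixed s1sink and one prefixed s2sink with identical bodies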

Case demonstration: Multiplexing selector

Configuration scheme

[root@tianqinglong01 flumeconf]# vi mul.conf
a1.sources = r1
a1.channels = c1 c2
a1.sinks = s1 s2

# an HTTP source is used here so the test below can set the "state" header on each event
a1.sources.r1.type=http
a1.sources.r1.bind = tianqinglong01
a1.sources.r1.port = 6666
a1.sources.r1.selector.type=multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.USER = c1
a1.sources.r1.selector.mapping.ORDER = c2
a1.sources.r1.selector.default = c1


a1.channels.c1.type=memory

a1.channels.c2.type=memory

a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://tianqinglong01:8020/flume/%Y/%m/%d/mul
a1.sinks.s1.hdfs.filePrefix=s1sink
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
a1.sinks.s1.hdfs.useLocalTimeStamp=true

a1.sinks.s2.type=hdfs
a1.sinks.s2.hdfs.path=hdfs://tianqinglong01:8020/flume/%Y/%m/%d/mul
a1.sinks.s2.hdfs.filePrefix=s2sink
a1.sinks.s2.hdfs.fileSuffix=.log
a1.sinks.s2.hdfs.inUseSuffix=.tmp
a1.sinks.s2.hdfs.rollInterval=60
a1.sinks.s2.hdfs.rollSize=1024
a1.sinks.s2.hdfs.rollCount=10
a1.sinks.s2.hdfs.idleTimeout=0
a1.sinks.s2.hdfs.batchSize=100
a1.sinks.s2.hdfs.fileType=DataStream
a1.sinks.s2.hdfs.writeFormat=Text
a1.sinks.s2.hdfs.round=true
a1.sinks.s2.hdfs.roundValue=1
a1.sinks.s2.hdfs.roundUnit=second
a1.sinks.s2.hdfs.useLocalTimeStamp=true

a1.sources.r1.channels=c1 c2
a1.sinks.s1.channel=c1
a1.sinks.s2.channel=c2

Start agent service

[root@tianqinglong flumeconf]# flume-ng agent -c ../conf -f ./mul.conf -n a1 -Dflume.root.logger=INFO,console

Test

[root@tianqinglong ~]# curl -X POST -d '[{"headers":{"state":"ORDER"},"body":"this is my multiplex to c2"}]' http://tianqinglong01:6666
[root@tianqinglong ~]# curl -X POST -d '[{"headers":{"state":"USER"},"body":"this is my content"}]' http://tianqinglong01:6666
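
With the mappings above, the ORDER event is routed to c2 (written by s2) and the USER event to c1 (written by s1); any event without a matching state header falls back to the default channel c1. A quick check (date components depend on when the test is run):

[root@tianqinglong ~]# hdfs dfs -ls /flume/$(date +%Y/%m/%d)/mul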

Custom interceptor

Requirements

To improve Flume's scalability, users can define their own interceptors.

Events whose body starts with a digit go to hdfs://tianqinglong01:8020/flume/number.log via sink s1.
Events whose body starts with a letter go to hdfs://tianqinglong01:8020/flume/character.log via sink s2.
Events whose body starts with anything else go to hdfs://tianqinglong01:8020/flume/other.log via sink s3.

pom.xml

<dependency>
    <groupId>org.apache.flume</groupId>
    <artifactId>flume-ng-core</artifactId>
    <version>1.8.0</version>
</dependency>

Code

package com.qf;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.util.List;

public class MyInterceptor implements Interceptor {
    private static final String LOCATION_KEY = "location";
    private static final String LOCATION_NUMBER = "number";
    private static final String LOCATION_CHARACTER = "character";
    private static final String LOCATION_OTHER = "other";

    // Tag each event with a "location" header based on the first byte of its body.
    public Event intercept(Event event) {
        byte[] body = event.getBody();
        if (body == null || body.length == 0) {
            // empty body: classify as "other" rather than risk an index error
            event.getHeaders().put(LOCATION_KEY, LOCATION_OTHER);
        } else if (body[0] >= '0' && body[0] <= '9') {
            event.getHeaders().put(LOCATION_KEY, LOCATION_NUMBER);
        } else if ((body[0] >= 'a' && body[0] <= 'z') || (body[0] >= 'A' && body[0] <= 'Z')) {
            event.getHeaders().put(LOCATION_KEY, LOCATION_CHARACTER);
        } else {
            event.getHeaders().put(LOCATION_KEY, LOCATION_OTHER);
        }
        return event;
    }

    // Batch hook: tag every event in the list.
    public List<Event> intercept(List<Event> events) {
        for (Event event : events) {
            intercept(event);
        }
        return events;
    }

    public void initialize() {
    }

    public void close() {
    }

    // Flume instantiates the interceptor through this builder,
    // referenced in the configuration as com.qf.MyInterceptor$MyBuilder.
    public static class MyBuilder implements Interceptor.Builder {

        public Interceptor build() {
            return new MyInterceptor();
        }

        public void configure(Context context) {
        }
    }
}

Package upload

Package the interceptor with Maven, then upload the jar, together with any dependencies (such as fastjson, if used), to Flume's lib directory.
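
A typical sequence, assuming an illustrative project directory, jar name, and Flume installation path (adjust all three to your environment):

[root@tianqinglong01 ~]# cd /path/to/project
[root@tianqinglong01 project]# mvn clean package
[root@tianqinglong01 project]# cp target/my-interceptor-1.0.jar /usr/local/flume/lib/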

Configuration scheme

[root@tianqinglong01 flumeconf]# vi mytest.conf
a1.sources=r1
a1.channels=c1 c2 c3
a1.sinks=s1 s2 s3
a1.sources.r1.channels=c1 c2 c3
a1.sinks.s1.channel=c1
a1.sinks.s2.channel=c2
a1.sinks.s3.channel=c3
#Set the properties of source
a1.sources.r1.type=syslogtcp
a1.sources.r1.host=tianqinglong01
a1.sources.r1.port=12345
#Set interceptor
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=com.qf.MyInterceptor$MyBuilder
#Set the properties of the selector
a1.sources.r1.selector.type=multiplexing
a1.sources.r1.selector.header=location
a1.sources.r1.selector.mapping.number=c1
a1.sources.r1.selector.mapping.character=c2
a1.sources.r1.selector.mapping.other=c3
#Set channel properties
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c3.type=memory
a1.channels.c3.capacity=1000
#Set the properties of sink
a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://tianqinglong01:8020/flume/customInterceptor/s1/%Y-%m-%d-%H
a1.sinks.s1.hdfs.useLocalTimeStamp=true
a1.sinks.s1.hdfs.filePrefix=regex
a1.sinks.s1.hdfs.rollInterval=0
a1.sinks.s1.hdfs.rollSize=102400
a1.sinks.s1.hdfs.rollCount=30
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text

a1.sinks.s2.type=hdfs
a1.sinks.s2.hdfs.path=hdfs://tianqinglong01:8020/flume/customInterceptor/s2/%Y-%m-%d-%H
a1.sinks.s2.hdfs.useLocalTimeStamp=true
a1.sinks.s2.hdfs.filePrefix=regex
a1.sinks.s2.hdfs.rollInterval=0
a1.sinks.s2.hdfs.rollSize=102400
a1.sinks.s2.hdfs.rollCount=30
a1.sinks.s2.hdfs.fileType=DataStream
a1.sinks.s2.hdfs.writeFormat=Text

a1.sinks.s3.type=hdfs
a1.sinks.s3.hdfs.path=hdfs://tianqinglong01:8020/flume/customInterceptor/s3/%Y-%m-%d-%H
a1.sinks.s3.hdfs.useLocalTimeStamp=true
a1.sinks.s3.hdfs.filePrefix=regex
a1.sinks.s3.hdfs.rollInterval=0
a1.sinks.s3.hdfs.rollSize=102400
a1.sinks.s3.hdfs.rollCount=30
a1.sinks.s3.hdfs.fileType=DataStream
a1.sinks.s3.hdfs.writeFormat=Text

Start agent

[root@tianqinglong01 flumeconf]# flume-ng agent -c ../conf/ -f ./mytest.conf -n a1 -Dflume.root.logger=INFO,console

Test

echo "hello world" | nc  tianqinglong01 12345
echo "123 hello world" | nc  tianqinglong01 12345
echo ".123 hello world" | nc  tianqinglong01 12345
