Flume configuration file methods and Maven

Keywords: Hadoop Maven flume

Configuration introduction

In general:

sources are used to receive data
sinks are used to send data onward
channels are used to buffer data in between

The following are some component types that will be used later.
1. Source component types (used to receive data sent from somewhere)

Netcat Source
Accepts request data from a client; often used in testing and development

Exec Source
Runs a given Unix command and takes the command's output as the data source

Spooling Directory Source
Monitors a specified directory for new files and parses events from the files that appear

Kafka Source
Gets data from a Kafka cluster

Sequence Generator Source
A sequence generator; the counter starts at 0 and increments by 1 per event, up to Long.MAX_VALUE

Avro Source
Accepts data requested from an Avro client, similar to Netcat Source
Commonly used to build Flume clusters and to carry data over RPC

2. Channel component types (used to buffer data)
Memory Channel
Caches Event objects in memory
Advantage: fast
Disadvantage: risk of data loss

JDBC Channel
Saves Event objects to a database; currently only Derby is supported
Advantage: safe
Disadvantage: low efficiency

File Channel
Saves Event objects to a file
Advantage: safe
Disadvantage: low efficiency

Kafka Channel
Writes Events to a Kafka cluster
Advantages: high availability, data backup

3. Sink component types (mainly write data out to files, other hosts, or other systems)
Logger Sink
Outputs the collected data in the form of log messages

HDFS Sink
Writes the collected data out to the HDFS distributed file system; two file formats are supported: text and sequence file
Note: with the DataStream file type, the collected data is not serialized
In the setup below, a new data file directory is generated every ten minutes (see the roll-settings sketch after this list)

File Roll Sink
A sink based on rolling output files; the collected data is written to the local file system

Null Sink
All collected data is discarded

HBase Sink
Writes the collected data out to the HBase non-relational database
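
How often the HDFS Sink rolls files, and how the ten-minute directories mentioned above come about, is controlled by a handful of sink properties. A minimal sketch (these are standard Flume HDFS Sink property names; the values are illustrative, not defaults):

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:9000/user/flume/logs/%Y%m%d/%H%M
a1.sinks.k1.hdfs.fileType = DataStream
#roll a new file every 600 seconds; 0 disables time-based rolling
a1.sinks.k1.hdfs.rollInterval = 600
#0 disables count-based and size-based rolling
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollSize = 0
#round the %H%M escape down to 10-minute buckets, so a new directory appears every ten minutes
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
#time escapes need a timestamp header (e.g. from a timestamp interceptor) unless this is set
a1.sinks.k1.hdfs.useLocalTimeStamp = true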

Installing Flume (version 1.9 is used here)

Version 1.6 or above is recommended for this article; some steps need hands-on debugging, and older versions are more troublesome.

1. Before installing, download the plug-in needed to test Flume later:

yum install -y nc   #installs the nc (netcat) plug-in

Download the installation package we want to use from the official Flume download page.

Upload the Flume archive to the server and unzip it. If these steps are unfamiliar, look them up online.
2. Configure the environment

export FLUME_HOME=<your Flume installation path>
export PATH=$PATH:$FLUME_HOME/bin
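
To make the variables take effect and verify the installation, a quick check (assuming the exports were added to a shell profile such as /etc/profile):

source /etc/profile
flume-ng version   #should print Flume 1.9.0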


3. Start Hadoop. A fully distributed setup is recommended, since multiple agents are involved later.
4. How to start a Flume agent from a configuration file

[root@master conf]# flume-ng agent -n a1 -c ./ -f ./example.conf -Dflume.root.logger=INFO,console

Or add the following to the flume-env.sh file:

export JAVA_OPTS="-Dflume.root.logger=INFO,console"

Various configuration files

Note that configuration files 1 and 2 belong on different machines; pay close attention to each file's contents.

1. Tip:

At present I am using two machines, with Hadoop 3.x and Flume 1.9:
192.168.120.129 is my primary machine
192.168.120.134 is my secondary machine

2. Various configuration methods

All files below go in the conf directory under the Flume directory.

1. Non-persistently saved data; file name: example.conf

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#Configure the source type as NetCat, listening on localhost, port 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

#Configure the sink type as logger
a1.sinks.k1.type = logger

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#Bind source and sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Open a new session window and connect to Flume:
nc localhost 44444   (the port number)
Then just type in text; each line appears in the agent's logger output.

2. Persistently saved data

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1
a1.channels = c1 c2
#Configure the source type as NetCat, listening on localhost, port 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

#Configure the sink type as logger
a1.sinks.k1.type = logger

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /usr/local/src/flume/checkpoint
a1.channels.c2.dataDirs = /usr/local/src/flume/data

#Bind source and sink to channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c2

Open a new session window and connect to Flume:
nc localhost 44444   (the port number)
Then just type in text; each line appears in the agent's logger output.

3. Single log monitoring

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#Configure the source type as exec, tailing app.log
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F app.log

#Configure the sink type as logger
a1.sinks.k1.type = logger

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#Bind source and sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Open a new session window and append to the monitored file (this configuration has no NetCat source, so nc does not apply here):
echo "hello" >> app.log
Each appended line appears in the agent's logger output.
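
Note that tail -F app.log is a relative path, so the agent must be started from the directory containing app.log (or the command should use an absolute path). A possible launch, assuming this configuration is saved as exec.conf (a hypothetical name):

[root@master conf]# flume-ng agent -n a1 -c ./ -f ./exec.conf -Dflume.root.logger=INFO,console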

4. Multiple log monitoring

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#Configure the source type as TAILDIR, which tails file groups and records read positions in a JSON file
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.positionFile = /usr/local/src/flume/conf/position.json
a1.sources.r1.filegroups.f1 = /usr/local/src/flume/conf/app.log
a1.sources.r1.filegroups.f2 = /usr/local/src/flume/conf/logs/.*log

a1.sinks.k1.type = logger

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#Bind source and sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Open a new session window and append to one of the monitored files (there is no NetCat source here):
echo "hello" >> /usr/local/src/flume/conf/app.log
Each appended line appears in the agent's logger output.
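
TAILDIR tails the configured files and any new files matching the f2 pattern; to see output immediately, create and append to them, for example (paths as configured above):

mkdir -p /usr/local/src/flume/conf/logs
echo "from f1" >> /usr/local/src/flume/conf/app.log
echo "from f2" >> /usr/local/src/flume/conf/logs/test.log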

5. Multi-agent monitoring

Document I:

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

#Configure the source type as NetCat, listening on localhost, port 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

#Configure the sink types: k1 as logger, k2 as avro
a1.sinks.k1.type = logger
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = 192.168.120.129
a1.sinks.k2.port = 55555

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory

#Bind source and sink to channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

Document II:

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1 r2
a1.sinks = k1
a1.channels = c1

#Configure the source type as TAILDIR, tracking read positions in a JSON file
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.positionFile = /usr/local/src/flume/conf/position.json
a1.sources.r1.filegroups.f1 = /usr/local/src/flume/conf/app.log
a1.sources.r1.filegroups.f2 = /usr/local/src/flume/conf/logs/.*log

a1.sources.r2.type = avro
a1.sources.r2.bind = 192.168.120.129
a1.sources.r2.port = 55555

#Configure the sink type as logger
a1.sinks.k1.type = logger

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#Bind source and sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sources.r2.channels = c1

Open a new session window on the machine running Document I and connect to Flume:
nc localhost 44444   (the port number)
Typed text appears in Document I's logger and, via the Avro hop, in Document II's logger.
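
The Avro sink in Document I can only connect once the Avro source in Document II is listening, so start the downstream agent first. A possible launch sequence (the file names doc1.conf and doc2.conf are hypothetical):

#on 192.168.120.129, start the downstream agent first
flume-ng agent -n a1 -c ./ -f ./doc2.conf -Dflume.root.logger=INFO,console
#then start the upstream agent on the other machine
flume-ng agent -n a1 -c ./ -f ./doc1.conf -Dflume.root.logger=INFO,console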

6. Interceptor:

Document 1:

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

#Configure the source type as NetCat, listening on localhost, port 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
#Add interceptor
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host

#Configure the sink types: k1 as logger, k2 as avro
a1.sinks.k1.type = logger

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = 192.168.120.129
a1.sinks.k2.port = 55555

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory

#Bind source and sink to channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

Document 2:

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1 r2
a1.sinks = k1
a1.channels = c1

#Configure the source type as TAILDIR, tracking read positions in a JSON file
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.positionFile = /usr/local/src/flume/conf/position.json
a1.sources.r1.filegroups.f1 = /usr/local/src/flume/conf/app.log
a1.sources.r1.filegroups.f2 = /usr/local/src/flume/conf/logs/.*log

a1.sources.r2.type = avro
a1.sources.r2.bind = 192.168.120.129
a1.sources.r2.port = 55555

#Configure the sink type as logger
a1.sinks.k1.type = logger

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#Bind source and sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sources.r2.channels = c1

Open a new session window on the machine running Document 1 and connect to Flume:
nc localhost 44444   (the port number)
Then just type in text.

7. Using interceptors:

This looks familiar

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

#Configure the source type as NetCat, listening on localhost, port 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

#Add interceptor
a1.sources.r1.interceptors = i1 i2 i3 i4 i5
a1.sources.r1.interceptors.i1.type = host
a1.sources.r1.interceptors.i2.type = timestamp

#Static interceptor: add a fixed header
a1.sources.r1.interceptors.i3.type = static
a1.sources.r1.interceptors.i3.key = datacenter
a1.sources.r1.interceptors.i3.value = beijing

#Add UUID
a1.sources.r1.interceptors.i4.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder

#Mask text: replace any six-digit run with ******
a1.sources.r1.interceptors.i5.type = search_replace
a1.sources.r1.interceptors.i5.searchPattern = \\d{6}
a1.sources.r1.interceptors.i5.replaceString = ******

#Configure the sink types: k1 as logger, k2 as avro
a1.sinks.k1.type = logger
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = 192.168.120.129
a1.sinks.k2.port = 55555

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory

#Bind source and sink to channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

Open a new session window and connect to Flume:
nc localhost 44444   (the port number)
Then just type in text; a sample logged event is sketched below.
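
With this interceptor chain, a line such as 123456 hello typed through nc would be logged roughly as follows (header values are illustrative; the logger sink prints the body as hex plus text, and the UUID interceptor's default header name is id):

Event: { headers:{host=192.168.120.129, timestamp=1632236000000, datacenter=beijing, id=2f9a...} body: 2A 2A 2A 2A 2A 2A 20 68 65 6C 6C 6F ****** hello }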

8. Custom interceptor

Maven is used here.
The relevant files are in the code test resource directory (link at the end).
MyHostInterceptor.java is used; the type to configure is the package name plus the class name:

Mzj.Demo.MyHostInterceptor$Builder

The packaged jar is placed in the flume/lib directory.
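
A possible packaging sequence, using the pom.xml shown at the end of this article (artifactId Flume_Demo, version 1.0-SNAPSHOT):

mvn clean package
cp target/Flume_Demo-1.0-SNAPSHOT.jar $FLUME_HOME/lib/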

File 1:
#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1 r2
a1.sinks = k1
a1.channels = c1

#Configure the source type as TAILDIR, tracking read positions in a JSON file
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.positionFile = /usr/local/src/flume/conf/position.json
a1.sources.r1.filegroups.f1 = /usr/local/src/flume/conf/app.log
a1.sources.r1.filegroups.f2 = /usr/local/src/flume/conf/logs/.*log

a1.sources.r2.type = avro
a1.sources.r2.bind = 192.168.120.129
a1.sources.r2.port = 55555 
            
#Add custom interceptor 
a1.sources.r2.interceptors = i1
a1.sources.r2.interceptors.i1.type = Mzj.Demo.MyHostInterceptor$Builder
           
a1.sinks.k1.type = logger

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100 
    
#Bind source and sink to channel
a1.sources.r2.channels = c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

File 2:

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = host

a1.sinks.k1.type = logger

a1.sinks.k2.type = avro
a1.sinks.k2.hostname = 192.168.120.129
a1.sinks.k2.port = 55555

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory

#Bind source and sink to channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2

Open a new file 2 session window and link flume.
NC localhost 44444 (port number)
Then just type in the text

9. Channel selector

These agents all run on the same machine.
agent1:

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1 k2 k3 k4
a1.channels = c1 c2 c3 c4

#Configure the source type as NetCat, listening on localhost, port 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

#Configure the sink1 type as Logger
a1.sinks.k1.type = logger

#Configure sink2,3,4 as Avro
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = 192.168.120.129
a1.sinks.k2.port = 4040

a1.sinks.k3.type = avro
a1.sinks.k3.hostname = 192.168.120.129
a1.sinks.k3.port = 4041

a1.sinks.k4.type = avro
a1.sinks.k4.hostname = 192.168.120.129
a1.sinks.k4.port = 4042

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

a1.channels.c3.type = memory
a1.channels.c3.capacity = 1000
a1.channels.c3.transactionCapacity = 100

a1.channels.c4.type = memory
a1.channels.c4.capacity = 1000
a1.channels.c4.transactionCapacity = 100

#Bind source and sink to channel
a1.sources.r1.channels = c1 c2 c3 c4
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.sinks.k3.channel = c3
a1.sinks.k4.channel = c4

#Channel selector
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1 c2
a1.sources.r1.selector.mapping.US = c1 c3
a1.sources.r1.selector.default = c1 c4

#Interceptor: stamps every event with state=US, so the multiplexing selector above
#routes each event to c1 and c3, i.e. to the local logger and the agent on port 4041
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = state
a1.sources.r1.interceptors.i1.value = US

agent2:

a2.sources = r1
a2.sinks = k1
a2.channels = c1

a2.sources.r1.type = avro
a2.sources.r1.bind = 192.168.120.129
a2.sources.r1.port = 4040

a2.sinks.k1.type = logger

a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

agent3:

a2.sources = r1
a2.sinks = k1
a2.channels = c1

a2.sources.r1.type = avro
a2.sources.r1.bind = 192.168.120.129
a2.sources.r1.port = 4041

a2.sinks.k1.type = logger

a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

agent4:

a2.sources = r1
a2.sinks = k1
a2.channels = c1

a2.sources.r1.type = avro
a2.sources.r1.bind = 192.168.120.129
a2.sources.r1.port = 4042

a2.sinks.k1.type = logger

a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1

Open a new session window and connect to Flume:
nc localhost 44444   (the port number)
Then just type in text; because the interceptor sets state=US, each line appears on a1's logger and on the agent reading port 4041.
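
Note that agent2, agent3, and agent4 all use the agent name a2 inside their files, and the name passed with -n must match, so each is started with -n a2 and its own configuration file, in its own session window. A possible startup order (downstream agents first; the file names are hypothetical):

flume-ng agent -n a2 -c ./ -f ./agent2.conf -Dflume.root.logger=INFO,console
flume-ng agent -n a2 -c ./ -f ./agent3.conf -Dflume.root.logger=INFO,console
flume-ng agent -n a2 -c ./ -f ./agent4.conf -Dflume.root.logger=INFO,console
flume-ng agent -n a1 -c ./ -f ./agent1.conf -Dflume.root.logger=INFO,console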

10. Sink failover:

agent1:
#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1 k2 k3 k4
a1.channels = c1

#Configure the source type as NetCat, listening on localhost, port 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

#Configure the sink group: with the failover processor, the highest-priority sink (here k4,
#priority 20) handles all events; on failure traffic falls back to the next-highest priority,
#and maxpenalty caps a failed sink's back-off time in milliseconds
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2 k3 k4
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.priority.k3 = 15
a1.sinkgroups.g1.processor.priority.k4 = 20
a1.sinkgroups.g1.processor.maxpenalty = 10000

#Configure the sink1 type as Logger
a1.sinks.k1.type = logger

#Configure sink2,3,4 as Avro
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = 192.168.120.129 
a1.sinks.k2.port = 4040
    
a1.sinks.k3.type = avro 
a1.sinks.k3.hostname = 192.168.120.129
a1.sinks.k3.port = 4041  
    
a1.sinks.k4.type = avro 
a1.sinks.k4.hostname = 192.168.120.129
a1.sinks.k4.port = 4042

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#Bind source and sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
a1.sinks.k3.channel = c1
a1.sinks.k4.channel = c1

The other agent(n) files are the same as the corresponding agent(n) files in configuration 9 (except agent1).

Open a new session window on the machine running agent1 and connect to Flume:
nc localhost 44444   (the port number)
Then just type in text.

11. Sink processor load balancing

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1 k2 k3 k4
a1.channels = c1

#Configure the source type as NetCat, listening on localhost, port 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

#Define the sink group: with load_balance and selector = random, each event goes to one
#randomly chosen sink in the group; backoff = true temporarily blacklists failing sinks
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2 k3 k4
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = random

#Configure the sink1 type as Logger
a1.sinks.k1.type = logger

#Configure sink2,3,4 as Avro
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = 192.168.120.129
a1.sinks.k2.port = 4040

a1.sinks.k3.type = avro
a1.sinks.k3.hostname = 192.168.120.129
a1.sinks.k3.port = 4041

a1.sinks.k4.type = avro
a1.sinks.k4.hostname = 192.168.120.129
a1.sinks.k4.port = 4042

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#Bind source and sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
a1.sinks.k3.channel = c1
a1.sinks.k4.channel = c1

The other agent(n) files are the same as the corresponding agent(n) files in configuration 9 (except agent1).
Open a new session window and connect to Flume:
nc localhost 44444   (the port number)
Then just type in text.

12. Exporting data to HDFS

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
#The directory under hdfs://<host>:9000/<path> is created automatically when data is sent
#Configure the sink type as hdfs
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://192.168.120.129:9000/user/flume/logs
a1.sinks.k1.hdfs.fileType = DataStream

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#Bind source and sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Open a new session window and connect to Flume:
nc localhost 44444   (the port number)
Then just type in text.
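
After typing a few lines, the result can be checked from HDFS (the HDFS Sink's default file prefix is FlumeData):

hadoop fs -ls /user/flume/logs
hadoop fs -cat /user/flume/logs/FlumeData.*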

13. Multiple agents uploading to HDFS

Document 1:

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#Configure the source type as Avro, listening on 192.168.120.129, port 4040
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.120.129
a1.sources.r1.port = 4040

#Configure the sink1 type as HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://192.168.120.129:9000/user/flume/logs
a1.sinks.k1.hdfs.fileType = DataStream

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

#Bind source and sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Document 2:

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

#Configure the source type as NetCat, listening on localhost, port 44444
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
#Configure the sink1 type as Avro
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.120.129
a1.sinks.k1.port = 4040

#Configure the channel type as memory, the maximum capacity of memory queue is 1000, and the maximum number of Events received from source or sent to sink in a transaction is 100
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000 
a1.channels.c1.transactionCapacity = 100

#Bind source and sink to channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Open a new session window on the machine running Document 2 and connect to Flume:
nc localhost 44444   (the port number)
Then just type in text.

hadoop fs -cat <file path>   (the written text appears, confirming it reached HDFS)

14. Custom source

Maven is used here.
The relevant files are in the code test resource directory (link at the end).
s1.java is used; the type to configure is the package name plus the class name:

Mzj.Demo.s1

The packaged jar is placed in the flume/lib directory.

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = Mzj.Demo.s1

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

15. Custom sink

Maven is used here.
The relevant files are in the code test resource directory (link at the end).
s2.java is used; the type to configure is the package name plus the class name:

Mzj.Demo.s2

The packaged jar is placed in the flume/lib directory.

#Define the agent name as a1
#Set the names of the 3 components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = Mzj.Demo.s1
a1.sinks.k1.type = Mzj.Demo.s2

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Code test resources

Extract link
Extraction code: 6666
Packaging: run the Maven package goal, then copy the generated jar into flume/lib.

Introduction to maven resources:
pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>Mzj_baby</groupId>
    <artifactId>Flume_Demo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-core</artifactId>
            <version>1.9.0</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.32</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

MyHostInterceptor.java:

package Mzj.Demo;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Vast
 * 2021/9/15
 */
public class MyHostInterceptor implements Interceptor {
    private String name;


    private static final Logger logger = LoggerFactory
            .getLogger(MyHostInterceptor.class);
    @Override
    public void initialize() {
        this.name = "";
    }

    @Override
    public Event intercept(Event event) {
        //Handle the event; an event contains a message body and headers
        //If the event's host header is 192.168.120.134, discard the event
        if(event.getHeaders().get("host").equals("192.168.120.134")){
            logger.info("The source is 134, discarding event");
            return null;
        }

        Map<String,String> map = new HashMap<String,String>();
        map.put("state","CZ");
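        //Note: setHeaders replaces the entire header map, so headers added upstream (such as host) are discarded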
        event.setHeaders(map);

        return event;
    }

    @Override
    //Handle all events
    public List<Event> intercept(List<Event> events) {
        List<Event> eventList = new ArrayList<Event>();
        for (Event event: events){
            Event event1 = intercept(event);
            if (event1 != null){
                eventList.add(event1);
            }

        }
        return eventList;
    }

    @Override
    public void close() {

    }



    public static class Builder implements Interceptor.Builder {

        @Override
        public Interceptor build() {
            return  new MyHostInterceptor();
        }

        @Override
        public void configure(Context context) {

        }
    }
}

s1.java

package Mzj.Demo;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.event.SimpleEvent;
import org.apache.flume.source.AbstractSource;
/**
 * Vast
 * 2021/9/20
 */
public class s1 extends AbstractSource implements Configurable, PollableSource {
    //Processing data
    @Override
    public Status process() throws EventDeliveryException {
        Status status = null;
        try {
            //Self simulated data transmission
            for (int i = 0; i< 10;i++){
                Event event = new SimpleEvent();
                event.setBody(("data:"+i).getBytes());
                getChannelProcessor().processEvent(event);
                //Data preparation consumption
                status = Status.READY;
                Thread.sleep(5000);
            }
        } catch (Exception e) {
            e.printStackTrace();
            status = Status.BACKOFF;
        }
        return status;
    }
    @Override
    public long getBackOffSleepIncrement() {
        return 0;
    }
    @Override
    public long getMaxBackOffSleepInterval() {
        return 0;
    }
    @Override
    public void configure(Context context) {
    }
}

s2.java

package Mzj.Demo;

import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
 * Vast
 * 2021/9/21
 */
public class s2 extends AbstractSink implements Configurable {
    private static final Logger logger = LoggerFactory
            .getLogger(s2.class);
    //Processing data
    @Override
    public Status process() throws EventDeliveryException {
        Status status = null;
        //Get the Channel bound by sink
        Channel ch = getChannel();
        //Get transaction
        Transaction transaction = ch.getTransaction();
        try {
            transaction.begin();
            //Receive data from the Channel
            Event event = ch.take();
            //The data could be sent to external storage here
            if(event == null){
                status = Status.BACKOFF;
            }else {
                logger.info(new String(event.getBody()));
                status = Status.READY;
            }
            transaction.commit();
        }catch (Exception e){
            logger.error(e.getMessage());
            //Roll back before close(); closing an open transaction without commit or rollback fails
            transaction.rollback();
            status = Status.BACKOFF;
        }finally {
            transaction.close();
        }
        return status;
    }
    @Override
    public void configure(Context context) {
    }
}
