05 Case - Reading Directory Files in Real Time

Keywords: vim

Case: reading directory files into HDFS in real time

Case requirement: use Flume to monitor all new files in a directory and upload them to HDFS

Implementation steps:

1. Create the configuration file "flume-dir-hdfs.conf"
touch flume-dir-hdfs.conf

vim flume-dir-hdfs.conf

	a3.sources = r3
	a3.sinks = k3
	a3.channels = c3
	# Describe/configure the source
	a3.sources.r3.type = spooldir
	a3.sources.r3.spoolDir = /opt/module/flume/upload
	a3.sources.r3.fileSuffix = .COMPLETED
	a3.sources.r3.fileHeader = true
	# Ignore files ending in .tmp; they will not be uploaded
	a3.sources.r3.ignorePattern = ([^ ]*\.tmp)

	# Describe the sink
	a3.sinks.k3.type = hdfs
	a3.sinks.k3.hdfs.path = hdfs://hadoop102:9000/flume/upload/%Y%m%d/%H
	# Prefix for uploaded files
	a3.sinks.k3.hdfs.filePrefix = upload-
	# Whether to round down the time-based folder path
	a3.sinks.k3.hdfs.round = true
	# Create a new folder every roundValue units of time
	a3.sinks.k3.hdfs.roundValue = 1
	# Unit of time for roundValue
	a3.sinks.k3.hdfs.roundUnit = hour
	# Whether to use the local timestamp
	a3.sinks.k3.hdfs.useLocalTimeStamp = true
	# Number of Events to buffer before flushing to HDFS
	a3.sinks.k3.hdfs.batchSize = 100
	# File type; DataStream writes uncompressed plain text
	a3.sinks.k3.hdfs.fileType = DataStream
	# Roll to a new file every 600 seconds
	a3.sinks.k3.hdfs.rollInterval = 600
	# Roll size for each file, roughly 128 MB
	a3.sinks.k3.hdfs.rollSize = 134217700
	# File rolling is independent of the number of Events
	a3.sinks.k3.hdfs.rollCount = 0
	# Minimum number of block replicas
	a3.sinks.k3.hdfs.minBlockReplicas = 1

	# Use a channel which buffers events in memory
	a3.channels.c3.type = memory
	a3.channels.c3.capacity = 1000
	a3.channels.c3.transactionCapacity = 100

	# Bind the source and sink to the channel
	a3.sources.r3.channels = c3
	a3.sinks.k3.channel = c3
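Before starting the agent, the spoolDir from the configuration above must already exist; the Spooling Directory Source will not start if the directory is missing. A minimal preparation step, assuming the path from the config:

```shell
# Pre-create the spool directory referenced by a3.sources.r3.spoolDir;
# the Spooling Directory Source fails to start if this path is absent.
mkdir -p /opt/module/flume/upload
```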
2. Start the folder-monitoring agent
	bin/flume-ng agent --conf conf/ \
		--name a3 --conf-file job/flume-dir-hdfs.conf
	
	When using the Spooling Directory Source, note that:
		(1) Do not create or continuously modify files in the monitored directory
		(2) Uploaded files are renamed with the .COMPLETED suffix
		(3) The monitored folder is scanned for changes every 500 milliseconds
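	The rename-and-ignore behavior described in these notes can be simulated locally. This is only a sketch of the bookkeeping (the real source also reads the file contents into Events); the file names mirror those used in step 3:

```shell
# Local simulation of how the Spooling Directory Source treats files:
# files matching ignorePattern (*.tmp) are skipped entirely, while
# processed files are renamed with the fileSuffix (.COMPLETED).
spool="$(mktemp -d)"
touch "$spool/hao.txt" "$spool/hao.tmp" "$spool/hao.log"
for f in "$spool"/*; do
  case "$f" in
    *.tmp) ;;                      # ignored, left unchanged
    *) mv "$f" "$f.COMPLETED" ;;   # marked as processed
  esac
done
ls "$spool"   # hao.log.COMPLETED  hao.tmp  hao.txt.COMPLETED
```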
3. Add files to the upload folder
	cd /opt/module/flume
	mkdir upload
	touch upload/hao.txt
	touch upload/hao.tmp
	touch upload/hao.log
4. View the data on HDFS
	"hadoop102:50070"
5. Wait 1 s and check the upload folder again
	cd /opt/module/flume/upload
	ll				# hao.txt and hao.log now carry the .COMPLETED suffix; hao.tmp is ignored and left unchanged

Posted by daanoz on Tue, 01 Oct 2019 15:44:36 -0700