Several Methods for Large Log Segmentation

When a log file grows to gigabytes, viewing specific content in vi becomes very slow, so we need to split the large log into smaller pieces.
To compare the effects of the various splitting methods, here is the basic information about the test log I used:

# ls -lrth test.log
-rw-r--r-- 1 root root 645M May 30 20:42 test.log
# wc -l test.log
8856340 test.log

1. Splitting with split

The split command is designed specifically to split a large file into many small ones. Here is a brief description of its main options.

Option  Meaning
-b      Size of each output file, in bytes
-C      Maximum number of bytes per line in each output file
-d      Use numeric suffixes (combine with -a to set the suffix length)
-l      Number of lines per output file

To keep the log as readable as possible, we split the large log file by line count, and specify the prefix and suffix format of the resulting files.

#Suffixes are two-digit numbers; the prefix is test.log
split -l 1000000 test.log -d -a 2 test.log
#Results after segmentation
ls -lrth
total 1.3G
-rw-r--r-- 1 root root 645M May 30 20:42 test.log
-rw-r--r-- 1 root root  73M May 30 20:55 test.log00
-rw-r--r-- 1 root root  73M May 30 20:55 test.log01
-rw-r--r-- 1 root root  73M May 30 20:55 test.log02
-rw-r--r-- 1 root root  73M May 30 20:55 test.log03
-rw-r--r-- 1 root root  73M May 30 20:55 test.log04
-rw-r--r-- 1 root root  73M May 30 20:55 test.log05
-rw-r--r-- 1 root root  73M May 30 20:55 test.log06
-rw-r--r-- 1 root root  73M May 30 20:55 test.log07
-rw-r--r-- 1 root root  64M May 30 20:55 test.log08
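
The same split invocation can be tried safely on a small stand-in file, and its output checked against the original. The sketch below uses a made-up sample.log (not the 645M log from the article) so the result can be verified with cmp:

```shell
# Stand-in for the large test.log: a small sample file
seq 1 10000 > sample.log

# Split into pieces of 3000 lines each, two-digit numeric suffixes,
# prefix "sample.log" -> sample.log00 ... sample.log03
split -l 3000 -d -a 2 sample.log sample.log

# Sanity check: the pieces concatenated equal the original
cat sample.log0? | cmp -s - sample.log && echo "no data lost"
```

Because split cuts on line boundaries here, the concatenation of the pieces is byte-for-byte identical to the original, which is exactly why it is the safest choice for logs.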

2. Splitting with dd

dd bs=1M count=300 if=test.log of=newlog.1
dd bs=1M count=300 if=test.log of=newlog.2 skip=300
dd bs=1M count=300 if=test.log of=newlog.3 skip=600

The result of the split:

ls -lrth
total 1.3G
-rw-r--r-- 1 root root 645M May 30 20:42 test.log
-rw-r--r-- 1 root root 300M May 30 21:07 newlog.1
-rw-r--r-- 1 root root 300M May 30 21:07 newlog.2
-rw-r--r-- 1 root root  45M May 30 21:07 newlog.3

In the commands above, bs is the block size, count is the number of blocks to copy, if is the input file, and of is the output file; skip tells dd how many blocks of the input to skip before it starts copying.
dd cannot split the file into the desired pieces in a single command, and since it cuts on byte boundaries rather than line boundaries, a single log line will very likely end up split across two files.
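
Typing count and skip by hand for every chunk does not scale, so the three dd commands above can be generalized into a loop that computes them from the file size. This is only a sketch; sample.bin and the newlog.N names are illustrative, not from the article:

```shell
# Create a small test file (~2.4 MiB) as a stand-in for the big log
head -c 2500000 /dev/zero > sample.bin

CHUNK=1                        # chunk size in MiB
SIZE=$(wc -c < sample.bin)     # total size in bytes
i=0
# Keep cutting 1 MiB chunks until we have passed the end of the file
while [ $(( i * CHUNK * 1048576 )) -lt "$SIZE" ]; do
  dd bs=1M count="$CHUNK" skip=$(( i * CHUNK )) \
     if=sample.bin of="newlog.$(( i + 1 ))" 2>/dev/null
  i=$(( i + 1 ))
done
ls -l newlog.*
```

The last chunk simply comes out short (dd copies whatever remains), and the caveat above still applies: the cuts fall at byte offsets, not at line ends.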

3. Splitting with head and tail

These two commands extract part of a file, so combining them with redirection also splits a file. The approach is quite limited, though: each pass only divides the file into two parts, so for a very large file you have to repeat the process to get the pieces you want.

head -n $LINES test.log > newlog        # first $LINES lines
tail -n +$(( LINES + 1 )) test.log > newlog  # everything after them

Since both commands are well known, we won't discuss them further.
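
A minimal two-way split looks like this (demo.log, part1.log, and part2.log are made-up names for the sketch). Note the +$(( LINE + 1 )) in tail: starting at $LINE instead would duplicate the boundary line in both parts:

```shell
# Split a file into two parts at line $LINE
seq 1 100 > demo.log
LINE=60
head -n "$LINE" demo.log > part1.log             # lines 1..60
tail -n +$(( LINE + 1 )) demo.log > part2.log    # lines 61..100
cat part1.log part2.log | cmp -s - demo.log && echo "clean cut"
```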

4. Splitting with sed

The idea is to use sed to extract the content between specific line numbers and redirect it to a new file.

sed -n '1,2000000p' test.log > test.log.1
sed -n '2000001,4000000p' test.log > test.log.2
sed -n '4000001,6000000p' test.log > test.log.3
sed -n '6000001,8000000p' test.log > test.log.4
sed -n '8000001,$p' test.log > test.log.5

Here $ stands for the last line. If the file needs to be cut into many pieces, this too calls for a loop.
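
Such a loop might look like the sketch below (big.log, CHUNK, and the part.N names are my own choices, not from the article). The trailing ${END}q makes sed stop reading once the range has been printed, so later chunks don't force a scan of the whole file past their end:

```shell
seq 1 10000 > big.log
CHUNK=3000
TOTAL=$(wc -l < big.log)
i=0
while [ $(( i * CHUNK )) -lt "$TOTAL" ]; do
  START=$(( i * CHUNK + 1 ))
  END=$(( (i + 1) * CHUNK ))
  # Print lines START..END, then quit early instead of reading to EOF
  sed -n "${START},${END}p;${END}q" big.log > "part.$(( i + 1 ))"
  i=$(( i + 1 ))
done
```

Each iteration still rescans the file from the top, so for huge logs split remains much cheaper than this loop.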

5. Splitting with awk

The principle is similar to sed. Since awk is used less often for this, here is just a small example:

awk '{if (NR<120000) print $0}' test.log > a.txt
awk '{if (NR>=120000) print $0}' test.log > b.txt
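
Unlike the two-way split above, awk can also write every piece in a single pass over the input, because it can redirect print to a filename computed from NR. A sketch (app.log and the chunk.N naming are my own, not from the article):

```shell
seq 1 10000 > app.log
# Start a new output file every 3000 lines: chunk.1, chunk.2, ...
awk 'NR % 3000 == 1 { n++ } { print > ("chunk." n) }' app.log
wc -l chunk.*
```

This reads the input only once, which is the one real advantage awk has over the sed loop for this job.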

All in all, split is still the most convenient to use.

Reference Blog: http://blog.csdn.net/wind0513/article/details/5871293

Posted by khovorka on Wed, 26 Jun 2019 14:47:03 -0700