Several Methods for Large Log Segmentation
When a log file grows to the gigabyte range, viewing specific content in it with vi becomes very slow. At that point we need to split the large log into smaller pieces.
To compare the effects of the various splitting methods, the basic information of the test log I selected is as follows:
# ls -lrth test.log
-rw-r--r-- 1 root root 645M May 30 20:42 test.log
# wc -l test.log
8856340 test.log
1. Splitting with split
The split command is designed specifically for dividing a large file into many small files. Here is a brief description of its options:
Option | Meaning |
---|---|
-b | Size of each output file, in bytes |
-C | Maximum number of bytes per line in each output file |
-d | Use numeric suffixes; use -a to specify the suffix length |
-l | Number of lines per output file |
To keep the log as readable as possible, we split the large file by lines and specify the prefix and suffix of the output files.
#Two-digit numeric suffixes; the prefix is test.log
split -l 1000000 test.log -d -a 2 test.log
#The result of the split
ls -lrth
total 1.3G
-rw-r--r-- 1 root root 645M May 30 20:42 test.log
-rw-r--r-- 1 root root 73M May 30 20:55 test.log00
-rw-r--r-- 1 root root 73M May 30 20:55 test.log01
-rw-r--r-- 1 root root 73M May 30 20:55 test.log02
-rw-r--r-- 1 root root 73M May 30 20:55 test.log03
-rw-r--r-- 1 root root 73M May 30 20:55 test.log04
-rw-r--r-- 1 root root 73M May 30 20:55 test.log05
-rw-r--r-- 1 root root 73M May 30 20:55 test.log06
-rw-r--r-- 1 root root 73M May 30 20:55 test.log07
-rw-r--r-- 1 root root 64M May 30 20:55 test.log08
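After splitting, it is worth checking that nothing was lost. A minimal sketch, using a small sample file as a stand-in for the real log (the file name and piece size are illustrative):

```shell
# Small sample file standing in for the real 645M log
seq 1 100 > sample.log
# Same options as above: split by lines, two-digit numeric suffixes
split -l 30 sample.log -d -a 2 sample.log
# The pieces together must contain exactly as many lines as the original
cat sample.log?? | wc -l    # should report 100
```

The `sample.log??` glob matches only the two-digit-suffix pieces, not the original file, so the counts can be compared directly.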
2. Splitting with dd
dd bs=1M count=300 if=test.log of=newlog.1
dd bs=1M count=300 if=test.log of=newlog.2 skip=300
dd bs=1M count=300 if=test.log of=newlog.3 skip=600
The result of the split:
ls -lrth
total 1.3G
-rw-r--r-- 1 root root 645M May 30 20:42 test.log
-rw-r--r-- 1 root root 300M May 30 21:07 newlog.1
-rw-r--r-- 1 root root 300M May 30 21:07 newlog.2
-rw-r--r-- 1 root root 45M May 30 21:07 newlog.3
In the commands above, bs is the block size, count is the number of blocks to copy, if is the input file, and of is the output file.
A single dd invocation does not split the file into all the desired pieces at once, and because dd works on bytes rather than lines, a single log line will very likely be cut across two files.
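The fixed skip arithmetic above can be put in a loop so the number of pieces need not be known in advance. A sketch, where the chunk size and the newlog.N names are illustrative:

```shell
# Split test.log into 1MiB chunks with dd, looping until we pass the end
CHUNK=1     # chunk size in MiB (illustrative)
i=0
while :; do
    dd bs=1M count=$CHUNK if=test.log of=newlog.$i skip=$((i * CHUNK)) 2>/dev/null
    # An empty chunk means we read past the end of the file; drop it and stop
    [ -s newlog.$i ] || { rm -f newlog.$i; break; }
    i=$((i + 1))
done
```

Concatenating the chunks in order reproduces the original file byte for byte, which is a quick way to verify the split.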
3. Splitting with head + tail
Using these two commands to take part of a file and redirecting the output also achieves a split, but with a big restriction: each invocation only divides the file into two parts, so for a particularly large file you have to split repeatedly to get the desired result.
head -n $LINES test.log > newlog
tail -n +$((LINES + 1)) test.log > newlog2
Since everyone is familiar with these two commands, I won't say more about them.
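As a concrete sketch of the two-part split, with an illustrative split point N (part1.log and part2.log are placeholder names):

```shell
# Split test.log into two parts at line N
N=4000000                                     # split point (illustrative)
head -n "$N" test.log > part1.log             # lines 1..N
tail -n +"$((N + 1))" test.log > part2.log    # lines N+1..end
```

Because both commands work on whole lines, no log line is ever cut in half, unlike the dd approach.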
4. Splitting with sed
The idea is to extract the content between specific line numbers with sed and redirect it:
sed -n '1,2000000p' test.log > test.log.1
sed -n '2000001,4000000p' test.log > test.log.2
sed -n '4000001,6000000p' test.log > test.log.3
sed -n '6000001,8000000p' test.log > test.log.4
sed -n '8000001,$p' test.log > test.log.5
Here $ stands for the last line. If the file has to be split into many pieces, this method also needs a loop.
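The fixed ranges above can be generated in such a loop. A minimal sketch, assuming an illustrative piece size in STEP:

```shell
# Split test.log into STEP-line pieces named test.log.1, test.log.2, ...
STEP=2000000                  # lines per piece (illustrative)
TOTAL=$(wc -l < test.log)
start=1
i=1
while [ "$start" -le "$TOTAL" ]; do
    end=$((start + STEP - 1))
    # "${end}q" makes sed quit after the range instead of reading to EOF
    sed -n "${start},${end}p; ${end}q" test.log > test.log.$i
    start=$((end + 1))
    i=$((i + 1))
done
```

Without the `q` command each pass would scan the whole file, so for N pieces sed would read the log N times; quitting early trims that cost considerably.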
5. Splitting with awk
The principle is similar to sed. Since awk is not used much for this, here is just a small example:
awk '{if (NR<120000) print $0}' test.log > a.txt
awk '{if (NR>=120000) print $0}' test.log > b.txt
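Unlike the two-file example above, awk can also produce many pieces in a single pass over the log. A sketch, where the step value and the chunk_ prefix are illustrative:

```shell
# One pass over the log; start a new output file every `step` lines
awk -v step=1000000 '
    (NR - 1) % step == 0 { if (out) close(out); out = "chunk_" (++n) }
    { print > out }
' test.log
```

Calling close() on each finished piece keeps awk from holding many file descriptors open at once, which matters when the log splits into hundreds of pieces.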
All things considered, split is still the most comfortable to use.
Reference Blog: http://blog.csdn.net/wind0513/article/details/5871293