2021-09-08-Nginx Note 2: Nginx's access.log

Keywords: Operations & Maintenance, Nginx, crawler

Uses of the access.log

  1. Count which IPs requests come from and how often each one visits over a period of time
  2. Find the most frequently visited pages, the distribution of HTTP response status codes, and interface performance
  3. Count interface requests per second, minute, hour and day

Default Configuration Explained

  1. Nginx's default log configuration (commented out in the default nginx.conf)
#log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
#                  '$status $body_bytes_sent "$http_referer" '
#                  '"$http_user_agent" "$http_x_forwarded_for"';
  • log_format: defines the format of the log
  • main: the name given to this log format; the access_log line below refers to it by name and sets the path of the log file written in that format
#access_log  logs/host.access.log  main;
  • Here is an example log entry for one request:
192.168.0.1 - - [29/Aug/2021:15:17:08 +0800] "GET /js/abc.js HTTP/1.1" 200 5 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"

  1. $remote_addr corresponds to 192.168.0.1 in the log above: the client's IP address.
  2. $remote_user corresponds to the second hyphen "-" and holds the remote user information; when there is no user, it is filled with "-".
  3. [$time_local] corresponds to [29/Aug/2021:15:17:08 +0800]: the time of the client's access.
  4. "$request" corresponds to the request line: "GET /js/abc.js HTTP/1.1".
  5. $status corresponds to the response status code, here 200.
  6. $body_bytes_sent corresponds to 5: the size of the response body in bytes.
  7. "$http_referer" corresponds to the page the request came from. If the page was opened directly by entering its URL, there is no referer and the field is filled with "-". Flash-sale ("seckill") features can use this value to judge whether a user's source is legitimate, and it is also used for hotlink protection, e.g. for images: a CDN server checks the user's referer, and if the request does not come from the official website, the resource is not served.
  8. "$http_user_agent" corresponds to the client software used for the access, such as Google Chrome or UC Browser. Anti-crawler mechanisms rely on this field: crawler scripts often leave it empty, and requests with an empty user agent can simply be refused.
  9. "$http_x_forwarded_for" is the X-Forwarded-For request header, the mechanism proxies such as Nginx use to pass the user's real IP on to downstream services. Without it, a service behind a proxy only sees the proxy's IP, not the client's.
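To check how these fields line up in practice, here is a minimal sketch in POSIX sh with awk that splits one line in the main format into named fields. It assumes the full main format, including the trailing "$http_x_forwarded_for". It also relies on Nginx's default log escaping, which writes a literal double quote inside a variable as \x22, so splitting on double quotes never cuts a field in half. The function name parse_main_line is hypothetical.

```shell
# Split one line in the "main" log_format into named fields.
# Using the double quote as the field separator keeps $request, $http_referer,
# $http_user_agent and $http_x_forwarded_for intact even though they contain spaces.
parse_main_line() {
  awk -F'"' '{
    # $1 = "remote_addr - remote_user [time_local] ", $3 = " status body_bytes_sent "
    split($1, head, " ")
    split($3, mid, " ")
    time_local = head[4] " " head[5]                             # "[29/Aug/2021:15:17:08 +0800]"
    time_local = substr(time_local, 2, length(time_local) - 2)   # drop the brackets
    print "remote_addr="          head[1]
    print "remote_user="          head[3]
    print "time_local="           time_local
    print "request="              $2
    print "status="               mid[1]
    print "body_bytes_sent="      mid[2]
    print "http_referer="         $4
    print "http_user_agent="      $6
    print "http_x_forwarded_for=" $8
  }'
}

# Demo: a sample line in the "main" format (user agent shortened).
printf '%s\n' '192.168.0.1 - - [29/Aug/2021:15:17:08 +0800] "GET /js/abc.js HTTP/1.1" 200 5 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)" "-"' \
  | parse_main_line
```

The sketch splits on quotes rather than spaces for a reason. The user agent contains spaces, so its position in a space-split line varies from browser to browser. Its position between quotes does not.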

Nginx Access Statistics and Analysis

View the 100 IPs with the most requests

  1. Use awk, the Linux text-processing command. It works line by line and by default splits each line on whitespace (a custom separator can be set with -F):
awk '{print $1}' access_temp.log | sort -n |uniq -c | sort -rn | head -n 100
  • Command breakdown

awk: the command
'{print $1}': print the first field of each line, split on spaces (the client IP)
|: the pipe, which feeds the output of one command into the next
sort -n: sort by numeric value; this sorts the first column so that identical IPs end up next to each other
uniq -c: deduplicate adjacent identical lines; -c prefixes each line with the number of times it was repeated
sort -rn: -r reverses the order and -n sorts numerically; this sort applies to the visit counts produced by uniq -c
head -n 100: keep the first 100 lines
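The same pipeline can be wrapped in a small shell function that reads the log on standard input. This is a sketch, not a standard tool, and top_ips is a hypothetical name. It uses a plain sort for the first step, since uniq -c only needs identical IPs to be adjacent, not numerically ordered.

```shell
# Top-N client IPs, reading an access log on stdin.
# uniq -c only merges *adjacent* identical lines, so the IPs must be sorted first;
# the second sort then orders the resulting counts from highest to lowest.
top_ips() {
  n=${1:-100}                      # how many IPs to show; 100 as in the text
  awk '{print $1}' \
    | sort \
    | uniq -c \
    | sort -rn \
    | head -n "$n" \
    | awk '{print $1, $2}'         # trim the padding uniq -c adds: "count ip"
}

# Demo on a few sample lines (only the first column matters here).
top_ips 2 <<'EOF'
192.168.0.1 - - [29/Aug/2021:15:17:08 +0800] "GET /js/abc.js HTTP/1.1" 200 5
192.168.0.2 - - [29/Aug/2021:15:17:09 +0800] "GET / HTTP/1.1" 200 612
192.168.0.1 - - [29/Aug/2021:15:17:10 +0800] "GET /js/abc.js HTTP/1.1" 304 0
192.168.0.1 - - [29/Aug/2021:15:17:11 +0800] "GET /index.html HTTP/1.1" 200 612
EOF
```

On a server it would be run as top_ips 100 < access_temp.log.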

Top 20 Most Visited URLs

  1. Command ($7 is the requested URL path when the line is split on spaces)
cat access_temp.log |awk '{print $7}'| sort|uniq -c| sort -rn| head -20 | more
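Two of the uses listed at the start follow the same column | sort | uniq -c pattern: status-code counts, and request counts per second, minute, hour or day. Below is a minimal sketch. It assumes the space-split column positions used above ($4 for the timestamp, $9 for the status), and the function names status_counts and requests_per are hypothetical. Malformed request lines can shift these positions.

```shell
# When a line is split on spaces, $9 is $status and $4 is "[29/Aug/2021:15:17:08".

# Number of responses per HTTP status code, most frequent first: "code count".
status_counts() {
  awk '{print $9}' | sort | uniq -c | sort -rn | awk '{print $2, $1}'
}

# Number of requests per time bucket: "bucket count". The argument is how many
# characters of $4 to keep after the "[": 11 = day, 14 = hour, 17 = minute, 20 = second.
# A plain sort keeps the buckets in time order only within a single month.
requests_per() {
  awk -v len="${1:-17}" '{print substr($4, 2, len)}' \
    | sort \
    | uniq -c \
    | awk '{print $2, $1}'
}

# Demo: requests per minute.
requests_per 17 <<'EOF'
192.168.0.1 - - [29/Aug/2021:15:17:08 +0800] "GET /js/abc.js HTTP/1.1" 200 5
192.168.0.1 - - [29/Aug/2021:15:17:40 +0800] "GET /js/abc.js HTTP/1.1" 304 0
192.168.0.2 - - [29/Aug/2021:15:18:02 +0800] "GET /missing HTTP/1.1" 404 153
EOF
```

Because of that ordering caveat, these reports suit a log that is rotated daily or monthly.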

Custom Log Format: Measuring Interface Response Time

  1. Add $request_time to the log format. A closely related variable is $upstream_response_time:

$request_time: the time from reading the first byte of the user's request until the response has been completely sent, i.e. the time to receive the request data, plus the time for the application to respond, plus the time to send the response data
$upstream_response_time: the time spent on the backend (upstream), from when Nginx starts connecting to the backend until it has received the response data and closed the connection
$request_time is therefore generally larger than $upstream_response_time, and the difference grows when the user's network is poor or the response is large, because transferring the data to the client takes longer
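To see where the gap between the two timings comes from, the sketch below prints both values and their difference for each proxied request. It assumes a hypothetical log_format ending in "$upstream_response_time $request_time", which is not the format configured in this note. It skips two kinds of request: those with no upstream, which are logged as "-", and those with several upstream attempts, which are logged as comma- or colon-separated lists. The function name client_side_gap is hypothetical.

```shell
# Assumes a HYPOTHETICAL log_format whose last two columns are
#   ... $upstream_response_time $request_time
# For each proxied request print: URL, upstream time, total time, and the gap,
# i.e. roughly the time Nginx spent on the client side of the request.
client_side_gap() {
  awk '
    $(NF-1) == "-" { next }                     # no upstream (e.g. a static file)
    $(NF-2) ~ /,$/ || $(NF-2) == ":" { next }   # several upstream attempts; skipped
    { printf "%s %s %s %.3f\n", $7, $(NF-1), $NF, $NF - $(NF-1) }
  '
}

# Demo on made-up lines: one proxied API call, one static file.
client_side_gap <<'EOF'
192.168.0.1 - - [29/Aug/2021:15:17:08 +0800] "GET /api/list HTTP/1.1" 200 20480 "-" "curl/7.68.0" "-" 0.120 0.870
192.168.0.1 - - [29/Aug/2021:15:17:09 +0800] "GET /js/abc.js HTTP/1.1" 200 5 "-" "curl/7.68.0" "-" - 0.000
EOF
```

A large gap on a request with a small upstream time points at the client side of the request (a slow network or a large response) rather than at the backend.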

  1. How to add it
log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
                  '$status $body_bytes_sent "$http_referer" '
                  '"$http_user_agent" "$http_x_forwarded_for" $request_time';
access_log /var/log/nginx/access.log main;
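  • Resulting log entries: the first line has no $request_time field; in the other two, the trailing "-" is $http_x_forwarded_for and 0.000 is $request_time (these requests took well under 1 s)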
192.168.0.1 - - [29/Aug/2021:15:17:08 +0800] "GET /js/abc.js HTTP/1.1" 200 5 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36"
192.168.0.1 - - [29/Aug/2021:16:12:02 +0800] "GET /js/abc.js HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36" "-" 0.000
192.168.0.1 - - [29/Aug/2021:16:12:25 +0800] "GET /js/abc.js HTTP/1.1" 304 0 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36" "-" 0.000

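  1. Find slow interfaces: print the URL ($7) of every request whose $request_time (the last column) exceeds 2 seconds, then list the 5 most frequent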
cat time_temp.log|awk '($NF > 2){print $7}'|sort -n|uniq -c|sort -nr|head -5

Note: $NF is the last field of a line, so awk '{print $NF}' prints the last column.
Normal business responses: no more than 500 ms
Simple business logic or hot page data: about 10 ms; anything over 100 ms is poor
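The 2-second filter above only lists URLs. A per-URL summary makes it easier to compare against these guidelines. Below is a minimal sketch, assuming $request_time is the last column and the URL is $7, as in the commands above. url_latency_report is a hypothetical name and the sample lines are made up.

```shell
# Per-URL latency summary, slowest average first:
#   "avg URL count=N max=M slow=K"
# slow counts requests above the threshold (default 0.5 s, the 500 ms guideline).
# Assumes $request_time is the last column ($NF) and the URL is $7.
url_latency_report() {
  awk -v t="${1:-0.5}" '
    BEGIN { t += 0 }
    {
      url = $7; rt = $NF + 0
      count[url]++; total[url] += rt
      if (rt > max[url]) max[url] = rt
      if (rt > t) slow[url]++
    }
    END {
      for (url in count)
        printf "%.3f %s count=%d max=%.3f slow=%d\n",
               total[url] / count[url], url, count[url], max[url], slow[url]
    }
  ' | sort -rn
}

# Demo on made-up lines.
url_latency_report 0.5 <<'EOF'
192.168.0.1 - - [29/Aug/2021:16:12:02 +0800] "GET /api/order HTTP/1.1" 200 512 "-" "curl/7.68.0" "-" 0.080
192.168.0.1 - - [29/Aug/2021:16:12:03 +0800] "GET /api/order HTTP/1.1" 200 512 "-" "curl/7.68.0" "-" 0.920
192.168.0.1 - - [29/Aug/2021:16:12:25 +0800] "GET /js/abc.js HTTP/1.1" 304 0 "-" "curl/7.68.0" "-" 0.000
EOF
```

Passing 0.1 instead of 0.5 applies the stricter 100 ms limit suggested for simple business logic and hot pages.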

Posted by lbaxterl on Tue, 07 Sep 2021 10:39:49 -0700