A web crawler, also known as a web spider or web robot, is a program used to fetch content from a target website.
By function, a web crawler can be divided into three parts: data acquisition, data processing, and data storage.
The basic working framework for these three parts is shown below:
Function: Download Web p ...
1. Python Basics
1.Python environment installation
1.1 download Python
Official website: https://www.python.org/
1.2 installing Python
The installation is foolproof: simply follow the installer's prompts.
1.3 test whether the installation is successful
Press Win + R, type cmd, and run python in the console. If you get the error "'python' is not an internal or external command, nor is it a ...
Posted by lady_bug on Sat, 18 Sep 2021 09:40:35 -0700
Tuples can be thought of as read-only lists: elements can be queried but not modified. Indexing and slicing therefore work on tuples just as they do on strings. Examples: (1, 2, 3) and ("a", "b", "c")
The list is one of the basic data types in Python; other languages also have data types similar to ...
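A minimal sketch of the difference described above: slicing works on both types, but only lists allow in-place modification.

```python
# Tuples support querying (indexing, slicing) but not modification.
t = (1, 2, 3)
print(t[0])      # 1
print(t[1:])     # (2, 3)

# Lists support the same queries AND in-place modification.
lst = [1, 2, 3]
lst[0] = 10
print(lst)       # [10, 2, 3]

# Attempting to modify a tuple raises a TypeError.
try:
    t[0] = 10
except TypeError as e:
    print("cannot modify tuple:", e)
```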
Posted by calande on Sat, 18 Sep 2021 02:29:18 -0700
Task 1: data collection
The web page http://pm25.in/beijing contains the air quality monitoring data of 12 monitoring points in Beijing. Write a program to capture the monitoring points, AQI, and air quality index categories from the page (the web page samples are saved in the src1 directory under the source material folde ...
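A minimal sketch of the extraction step, using a hypothetical HTML fragment in place of the real page; the actual markup on pm25.in may differ, so the pattern would need adjusting.

```python
import re

# Hypothetical sample of the table rows on http://pm25.in/beijing;
# in a real run this HTML would come from fetching the page or from
# the saved samples in the src1 directory.
sample_html = """
<tr><td>Wanshou Xigong</td><td>85</td><td>Good</td></tr>
<tr><td>Dingling</td><td>60</td><td>Good</td></tr>
"""

# Each row: monitoring point, AQI value, air-quality category.
rows = re.findall(
    r"<tr><td>(.+?)</td><td>(\d+)</td><td>(.+?)</td></tr>", sample_html
)
for point, aqi, category in rows:
    print(point, aqi, category)
```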
Posted by SpectralDesign.Net on Thu, 16 Sep 2021 10:35:35 -0700
Misc24 BMP height change
Tip: the flag is above the picture
bmp format file
The real flag is above the visible picture; it appears once you change the height declared in the file header.
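One way to do this programmatically is to patch the biHeight field of the BITMAPINFOHEADER; a minimal sketch, assuming a standard BMP layout (the file names are hypothetical):

```python
import struct

def set_bmp_height(data: bytes, new_height: int) -> bytes:
    # biHeight is a 4-byte little-endian signed int at file offset 0x16
    # (14-byte file header + 8 bytes into the BITMAPINFOHEADER).
    return data[:0x16] + struct.pack("<i", new_height) + data[0x1A:]

# Usage (hypothetical file names):
# patched = set_bmp_height(open("flag.bmp", "rb").read(), 400)
# open("flag_fixed.bmp", "wb").write(patched)
```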
Misc25 PNG height change
Tip: the flag is under the picture
Just change the height
Tip: the flag is still under the picture, but by how much should the height be changed?
Open t ...
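Unlike BMP, a PNG stores its height in the IHDR chunk, which is protected by a CRC, so the checksum must be recomputed after the edit. A minimal sketch, assuming a standard PNG layout:

```python
import struct
import zlib

def set_png_height(data: bytes, new_height: int) -> bytes:
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR",
    # then 13 bytes of chunk data (width at offsets 16..20, height at
    # 20..24, both big-endian), then a 4-byte CRC computed over
    # bytes 12..29 (chunk type + chunk data).
    patched = data[:20] + struct.pack(">I", new_height) + data[24:29]
    crc = zlib.crc32(patched[12:29])
    return patched + struct.pack(">I", crc) + data[33:]
```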
Posted by kaspari22 on Sat, 11 Sep 2021 23:02:31 -0700
This article uses 58 Tongcheng's recruitment website as an example to learn how to crawl content protected by font encryption.
58 Tongcheng Recruitment
As the screenshot shows, the text in the page source is replaced by question marks, so a normal crawl retrieves only those question marks.
Right-click the page and ...
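Once the site's custom font file has been downloaded and inspected (for example with fontTools), decoding boils down to mapping the obfuscated codepoints back to the characters they render as. A sketch with a purely hypothetical mapping; the real codepoints and values must be read from the actual font file:

```python
# Hypothetical mapping from the custom font's private-use codepoints
# to the digits they render as; the actual mapping must be extracted
# from the downloaded font file.
FONT_MAP = {
    "\ue031": "0",
    "\ue032": "1",
    "\ue033": "2",
}

def decode(text: str) -> str:
    # Replace each obfuscated codepoint with its real character;
    # everything else passes through unchanged.
    return "".join(FONT_MAP.get(ch, ch) for ch in text)

print(decode("salary: \ue032\ue031k"))  # salary: 10k
```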
Posted by Renegade85 on Sat, 11 Sep 2021 10:21:56 -0700
Recently, I followed the Dark Horse Programmer course to learn crawling with the requests library and successfully batch-processed NCBI documents. The problem is that the crawling eff ...
Posted by dharprog on Thu, 09 Sep 2021 20:18:19 -0700
1. Overview of Functions
If you need the same block of code several times while developing a program, you can organize that independent, self-contained block into a small reusable module to improve writing efficiency and code reuse. That module is a function.
We've been exposed to a number of functions, such as input(), print(), range(), len(), and so on, ...
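A minimal example of defining such a reusable module once and calling it many times:

```python
def greet(name):
    """Return a greeting for name; defined once, reused everywhere."""
    return "Hello, " + name

# The same code block is now reused with a single call each time.
print(greet("Python"))   # Hello, Python
print(greet("crawler"))  # Hello, crawler
```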
Posted by kenchucky on Tue, 07 Sep 2021 17:13:49 -0700
Uses of the access.log file
- Count source IPs and their access frequency over a period of time
- Find the most frequently visited pages, HTTP response status codes, and interface performance
- Aggregate interface request volume per second, minute, hour, and day
Default Configuration Resolution
nginx default log configuration
#log_format main '$remote_addr - $remote_use ...
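In this default format, `$remote_addr` is the first whitespace-separated field of every line, so counting IP frequency is straightforward; a sketch over a few hypothetical log lines (a real script would read them from the access.log file instead):

```python
import collections

# Hypothetical lines in nginx's default log format; in practice these
# would come from open("/var/log/nginx/access.log").
lines = [
    '1.2.3.4 - - [07/Sep/2021:10:39:49 -0700] "GET / HTTP/1.1" 200 612 "-" "curl/7.68.0"',
    '1.2.3.4 - - [07/Sep/2021:10:40:01 -0700] "GET /api HTTP/1.1" 404 153 "-" "curl/7.68.0"',
    '5.6.7.8 - - [07/Sep/2021:10:41:12 -0700] "GET / HTTP/1.1" 200 612 "-" "Mozilla/5.0"',
]

# $remote_addr is the first field of each line.
counts = collections.Counter(line.split()[0] for line in lines)
print(counts.most_common())  # [('1.2.3.4', 2), ('5.6.7.8', 1)]
```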
Posted by lbaxterl on Tue, 07 Sep 2021 10:39:49 -0700
Automatic clock-in on Yi Campus
Recently, Xiao Du started at a university in Yunnan and has had to clock in and report personal information on Yi Campus every day. That daily routine was mind-numbing, so I decided to automate the clock-in.
The principle is similar to the previous automatic punch in p ...
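A hedged sketch of that principle: reproduce the POST request the app sends. The URL and form fields below are hypothetical stand-ins; the real ones (plus any session cookies or tokens) would have to be captured from the app's actual sign-in request, e.g. with browser dev tools or a proxy.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields standing in for the real ones.
url = "https://example.com/checkin"
payload = urllib.parse.urlencode(
    {"name": "XiaoDu", "temperature": "36.5"}
).encode()

req = urllib.request.Request(url, data=payload, method="POST")
req.add_header("Content-Type", "application/x-www-form-urlencoded")
# urllib.request.urlopen(req) would actually submit the clock-in;
# it is not called here because the endpoint is hypothetical.
print(req.get_method(), req.full_url)
```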
Posted by sleepydad on Mon, 06 Sep 2021 21:05:59 -0700