Battery-theft experience teaches you to crawl a proxy site and build an IP proxy pool

Keywords: Python crawler


This time we crawl a free IP proxy site.
Ahem — what's the cheeky move here? We crawl the IP addresses the site provides, then use those very IPs to crawl the site's own resources.
I won't post the site's link here; let's leave them a little face.
Now for the usual routine, the Crawler's Five Steps: case the joint, observe, get in, grab, fence the goods.

The Crawler's Five Steps

Casing the joint

As the name implies, casing the joint means scoping out the place we're about to hit. If you were crawling Baidu, the spot would be their address, www.baidu.com. The same goes here — but as mentioned above, this operation is a bit shameless, so the victim's address won't be published.

Observation

Observation means studying the structure of the page: figuring out where what we want lives and how to get it into our hands — in other words, getting a floor plan of the victim's house.

Through observation we find that he has hidden what we want inside a pocket named tr (each table row). We jot that down in a little notebook to prepare for the action later.

Getting in

We've cased the joint, and we know where the target hides the goods; time to move in closer.
Ahem — remember to pack the wrench, screwdriver, and hammer before approaching the target:
fake_useragent provides disguises (User-Agent strings) so the victim doesn't recognize us
requests is the key that opens the victim's door
random picks a disguise at random, so we look more like someone the victim deals with all the time
lxml is the most important tool in this caper; everything before was setup for this step — prying loose what we want

from fake_useragent import UserAgent  # pulls User-Agent strings from the network
import requests  # used to send HTTP requests
import random  # for random choices
from lxml import etree  # xpath parsing module

With the tools of the trade in hand, let's get the key into his house.
To avoid being spotted, we also put on an invisibility cloak: an IP proxy.

proxies_list = [
'106.55.15.244:8889',
'119.183.250.1:9000',  # note: every entry needs a trailing comma, or Python silently concatenates adjacent strings
'220.173.37.128:7890',
'223.96.90.216:8085',
'47.100.34.198:8001',
'103.103.3.6:8080',
'27.192.200.7:9000',
'113.237.3.178:9999',
'45.228.188.241:999',
'211.24.95.49:47615',
'191.101.39.193:80',
'103.205.15.97:8080',
'185.179.30.130:8080',
'190.108.88.97:999',
'182.87.136.228:9999',
'167.172.180.46:33555',
'58.255.7.90:9999',
'190.85.244.70:999',
'175.146.211.158:9999',
'36.56.102.35:9999',
'131.153.151.250:8003',
'195.9.61.22:45225',
'43.239.152.254:8080',
]  # ip list
random_proxies = random.choice(proxies_list)  # randomly pick an ip from the list
proxies = {'http': random_proxies}  # build the proxies dict

That proxies_list above is the wardrobe where we keep our invisibility cloaks. You can see I stocked quite a few, because each cloak has a time limit — if one gets discovered, we swap in another so we can keep visiting his house. And since these cloaks are imperfect and defective, they need one round of tailoring: the proxies dict is that tailoring, dressing a bare ip:port string into the form requests expects.
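One design note: random.choice runs once here, so every request wears the same cloak for the whole session. A minimal sketch of picking a fresh cloak per request instead (pick_cloak is my own name for illustration; pure selection logic, no network involved):

```python
import random

proxies_list = [
    '106.55.15.244:8889',
    '119.183.250.1:9000',
    '220.173.37.128:7890',
]

def pick_cloak(pool):
    """Return a proxies dict wearing a freshly chosen ip:port from the pool."""
    return {'http': random.choice(pool)}

# Each call may return a different disguise:
cloak = pick_cloak(proxies_list)
print(cloak)
```

Calling pick_cloak once per request means a banned cloak only costs you a single fetch, not the whole run.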

ua = UserAgent()  # create a UserAgent instance
user_agent = ua.random  # pick a random UA
headers = {'User-Agent': user_agent}  # build the headers

Here comes top-tier plastic surgery, imported from afar.
ua = UserAgent() is the face-selection stage: from a vast sea of candidates, we choose which star's face to bring home.
user_agent = ua.random looks a bit careless — other people spend a fortune on their grooming, and here you pick one at random; not professional at all. Ahem.
headers = {'User-Agent': user_agent} is where the operation begins: the nose has been chosen, so we pick up the scalpel and graft on the new face. headers receives our best face, and when we finally go to the victim's place to pick things up, we wear it so he can't tell we're up to no good.

Everything is ready — time to open the door with the screwdriver.

res = requests.get(url, headers=headers, proxies=proxies).text  # send the request

Above is the tool we use to open the door: url is the address, headers is the face we had done abroad, and proxies is our invisibility cloak.
requests is the master tool; the bit before the :// tells you what kind of door you're facing — reinforced (https) or plain wood (http).
A quick aside: the reinforced door just adds a layer of armor called TLS/SSL — that's the s tacked on the end. requests handles that layer for you, so you still open it with get; you'd only switch to post when the page expects form data to be submitted.
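One practical consequence: requests picks a proxy by the target URL's scheme, so an https target needs an 'https' key in the proxies dict. A minimal sketch (the ip:port below is just the first entry from the list above):

```python
random_proxies = '106.55.15.244:8889'  # e.g. a pick from proxies_list

# One key per scheme; requests routes each request through
# whichever entry matches the target URL's scheme.
proxies = {
    'http': 'http://' + random_proxies,
    'https': 'http://' + random_proxies,  # https traffic tunneled through the http proxy
}
print(proxies['https'])
```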

Grab

Once through the door, we go back to step two: we already know where he hid the goods, so we use our xpath to pull them out step by step.
The lock-pick here is the XPath Helper extension for Chrome; see my other article for the install tutorial.
Article link

With the hiding spot found, open the tool and type in the matching incantation to fetch the data: enter the XPath in the upper-left box, and what it matches appears in the upper-right.
Then feed the same incantation to the program.

res_1 = etree.HTML(res)
res_lis = res_1.xpath('//tbody//tr/td[1]/text()')  # xpath statement (tag names are lowercase — xpath is case-sensitive)
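Since xpath is case-sensitive, it's worth sanity-checking the expression on a toy page first. A small sketch with a hand-written table standing in for the victim's HTML:

```python
from lxml import etree

# A toy page in the same shape as the target's table.
sample = """
<table>
  <tbody>
    <tr><td>1.2.3.4:80</td><td>high anonymity</td></tr>
    <tr><td>5.6.7.8:8080</td><td>high anonymity</td></tr>
  </tbody>
</table>
"""

root = etree.HTML(sample)
rows = root.xpath('//tbody//tr/td[1]/text()')  # first cell of each row
print(rows)  # the ip:port strings
```

Write '//Tbody…' instead and the same call silently returns an empty list — no error, just nothing to loop over.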

Inventory of stolen goods

We've found what we want; next we count the stolen goods.
Before counting, remember to prepare a burlap sack to hold them.

ip_list = []  # the burlap sack: stores the usable ip addresses

We grabbed a lot, but some of it is junk, so we go through the haul piece by piece, keep the good and leave the bad with him.
Code straight below:

for li in res_lis:  # loop over the extracted ip addresses
    print(li)  # check that extraction worked
    ip_lis = {
        'http': 'http://{}'.format(li)  # note the double slash in http://
    }  # build a proxies dict from the candidate ip
    url = 'http://baidu.com'  # any fast site works as a test target; Baidu responds quickly
    try:  # a dead proxy raises, so wrap the request in try
        # Test the candidate itself: pass ip_lis here, not the scraping proxy.
        res = requests.get(url, headers=headers, proxies=ip_lis, timeout=6).text
        ip_list.append(li)  # keep the working ip
        print('{} available'.format(li))
    except Exception:
        print('{} not available'.format(li))
        continue
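The validation step can also be packaged into a reusable function. A hedged sketch — check_proxy is my own name, and the fetch parameter exists only so the check can be exercised offline with a fake fetcher:

```python
import requests

def check_proxy(ip_port, timeout=6, fetch=requests.get):
    """Return True if the candidate ip:port can fetch a test page."""
    candidate = {'http': 'http://{}'.format(ip_port)}
    try:
        fetch('http://baidu.com', proxies=candidate, timeout=timeout)
        return True
    except Exception:
        return False

# Offline demo with fake fetchers instead of live requests:
def always_up(url, **kwargs):
    return 'ok'

def always_down(url, **kwargs):
    raise OSError('proxy dead')

print(check_proxy('1.2.3.4:80', fetch=always_up))    # True
print(check_proxy('1.2.3.4:80', fetch=always_down))  # False
```

In the real run you'd call it with the defaults, e.g. ip_list = [li for li in res_lis if check_proxy(li)].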

Fencing the goods

After that last sorting pass, it's time to pack the good stuff into a sack to carry away.

print(ip_list)  # print the final list of working ips
with open('ip.txt', 'a') as f:  # write to a local file
    f.write(str(ip_list))

We pack the good stuff into a big sack named ip.txt; the 'a' (append mode) is how we stuff it in.
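Writing str(ip_list) dumps Python list syntax into the file, which is awkward to read back later. A small sketch of an alternative, one ip per line (same file name, but written to a temp directory here purely for the demo):

```python
import os
import tempfile

ip_list = ['1.2.3.4:80', '5.6.7.8:8080']  # stand-in for the harvested ips

path = os.path.join(tempfile.mkdtemp(), 'ip.txt')
with open(path, 'a') as f:  # append mode, same as the original
    for ip in ip_list:
        f.write(ip + '\n')  # one ip per line

# Reading the pool back in is now a one-liner:
with open(path) as f:
    pool = f.read().splitlines()
print(pool)
```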

Complete process

# Author: this week's battery thief
# Time of crime: 2021/11/21 7:44
from fake_useragent import UserAgent  # pulls User-Agent strings from the network
import requests  # used to send HTTP requests
import random  # for random choices
from lxml import etree  # xpath parsing module

url = 'http://www.xiladaili.com/gaoni/2/'  # destination url address

ua = UserAgent()  # create a UserAgent instance
user_agent = ua.random  # pick a random UA
headers = {'User-Agent': user_agent}  # build the headers
proxies_list = [
    '106.55.15.244:8889',
    '119.183.250.1:9000',  # every entry needs its trailing comma
    '220.173.37.128:7890',
    '223.96.90.216:8085',
    '47.100.34.198:8001',
    '103.103.3.6:8080',
    '27.192.200.7:9000',
    '113.237.3.178:9999',
    '45.228.188.241:999',
    '211.24.95.49:47615',
    '191.101.39.193:80',
    '103.205.15.97:8080',
    '185.179.30.130:8080',
    '190.108.88.97:999',
    '182.87.136.228:9999',
    '167.172.180.46:33555',
    '58.255.7.90:9999',
    '190.85.244.70:999',
    '175.146.211.158:9999',
    '36.56.102.35:9999',
    '131.153.151.250:8003',
    '195.9.61.22:45225',
    '43.239.152.254:8080',
]  # ip list
random_proxies = random.choice(proxies_list)  # randomly pick an ip from the list
proxies = {'http': random_proxies}  # build the proxies dict

res = requests.get(url, headers=headers, proxies=proxies).text  # send the request
# print(res)  # uncomment early on to check that the request works and a response prints
res_1 = etree.HTML(res)
res_lis = res_1.xpath('//tbody//tr/td[1]/text()')  # xpath statement (lowercase tag names)
ip_list = []  # list that stores the usable ip addresses
for li in res_lis:  # loop over the extracted ip addresses
    print(li)  # check that extraction worked
    ip_lis = {
        'http': 'http://{}'.format(li)  # note the double slash in http://
    }  # build a proxies dict from the candidate ip
    url = 'http://baidu.com'  # any fast site works as a test target; Baidu responds quickly
    try:  # a dead proxy raises, so wrap the request in try
        # Test the candidate itself: pass ip_lis here, not the scraping proxy.
        res = requests.get(url, headers=headers, proxies=ip_lis, timeout=6).text
        ip_list.append(li)  # keep the working ip
        print('{} available'.format(li))
    except Exception:
        print('{} not available'.format(li))
        continue
print(ip_list)  # print the final list of working ips
with open('ip.txt', 'a') as f:  # write to a local file
    f.write(str(ip_list))

Result Display


Posted by nsarisk on Sun, 21 Nov 2021 10:07:19 -0800