Python crawler: image download app 1.0

Keywords: Python network xml Windows

Today I'd like to share something fun: using Python to crawl pictures and save them to a local folder. The website is https://www.pexels.com/
This is an English-language site, so the search terms should be in English. The goal today is to search for and download images with Python, and to build this into a web app.

Straight to the code

from bs4 import BeautifulSoup
import requests

headers ={
    'accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cookie':'__cfduid=dcb472bad94316522ad55151de6879acc1479632720; locale=en; _ga=GA1.2.1575445427.1479632759; _gat=1; _hjIncludedInSample=1',
    'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36'
}

url_path = 'https://www.pexels.com/search/'
content= input('Please enter the picture you want to download:')
url = url_path + content + '/'
wb_data = requests.get(url,headers=headers)
soup = BeautifulSoup(wb_data.text,'lxml')
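imgs = soup.select('a > img')
# Collect the src attribute of every image that sits directly inside a link.
photo_urls = []
for img in imgs:
    photo = img.get('src')
    photo_urls.append(photo)

# The target folder must already exist.
path = 'C://Users/Administrator/Desktop/photo/'

i = 1
for item in photo_urls:
    if item is None:
        # Some img tags have no src attribute; skip them.
        continue
    data = requests.get(item, headers=headers)
    if '?' in item:
        # The url carries a query string, so name the file after the search term.
        filename = content + str(i) + '.jpeg'
        i = i + 1
    else:
        # Otherwise reuse the last 10 characters of the url as the file name.
        filename = item[-10:]
    # The with block closes the file even if the write fails.
    with open(path + filename, 'wb') as fp:
        fp.write(data.content)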

Code analysis

1. First, I searched for snow and girl on the website. The resulting urls are:
https://www.pexels.com/search/snow/
https://www.pexels.com/search/girl/
So I read the search term with the input function and build the url myself.
2. Parse the page, find the url of each image, and put it in a list. I won't go into detail on this step.
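Building the url by plain string concatenation works for single words such as snow or girl. As a minimal sketch, a search term with spaces or other special characters could be percent-encoded first. The helper name build_search_url is hypothetical, and whether the site accepts encoded multi-word terms in this form is an assumption that has not been checked here.

```python
from urllib.parse import quote

# Same base url as in the post.
SEARCH_ROOT = 'https://www.pexels.com/search/'


def build_search_url(keyword):
    """Return the search url for a keyword (hypothetical helper).

    Assumption: the site accepts percent-encoded search terms in the path.
    """
    # safe='' also encodes '/', so the keyword cannot add extra path segments.
    return SEARCH_ROOT + quote(keyword.strip(), safe='') + '/'


if __name__ == '__main__':
    print(build_search_url('snow'))
    print(build_search_url('snow girl'))
```

The returned string can be passed to requests.get in place of the concatenated url.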
At first I used urlretrieve to download the images, but it always reported an error, possibly because the site is hosted abroad. So I requested each image url again with requests.get and passed the headers.
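One possible explanation for the urlretrieve errors, offered only as a guess, is that urlretrieve sends Python's default User-Agent (Python-urllib/x.y) rather than browser-like headers, and some sites reject such requests. The sketch below shows the same "send headers with the download" idea using only the standard library instead of requests. The names build_request and download and the reduced header set are hypothetical.

```python
import shutil
from urllib.request import Request, urlopen

# Hypothetical minimal header set; the post sends a fuller one with requests.
BROWSER_HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/53.0.2785.143 Safari/537.36',
}


def build_request(url, headers=None):
    """Wrap a url in a Request that carries browser-like headers."""
    return Request(url, headers=headers or BROWSER_HEADERS)


def download(url, dest_path, timeout=10):
    """Stream the response body to dest_path without loading it all into memory."""
    with urlopen(build_request(url), timeout=timeout) as response, \
            open(dest_path, 'wb') as fp:
        shutil.copyfileobj(response, fp)
```

A call such as download(item, path + filename) would then take the place of the requests.get and fp.write pair in the loop.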
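Why the if statement? Some of the src values I crawl from this site are None, so I skip them. The remaining images are in jpeg or png format, so they are saved separately: urls that contain a '?' are saved as .jpeg files named after the search term, and the others keep the tail of their url as the file name.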
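The branching on None, '?' and the file format can also be expressed as one small function that derives the file name from the url itself. This is a sketch of an alternative naming scheme, not the one used in the code above: it names every file after the search term and takes the extension from the url path. The function name pick_filename, the .jpeg fallback and the example urls are assumptions made for illustration.

```python
import posixpath
from urllib.parse import urlsplit


def pick_filename(src, keyword, index, default_ext='.jpeg'):
    """Return a local file name for an image url, or None if src is missing.

    Hypothetical helper: the extension comes from the url path, so a query
    string such as '?h=350' does not end up in the file name.
    """
    if src is None:
        # img tags without a src attribute are skipped by the caller.
        return None
    url_path = urlsplit(src).path  # drops the query string and fragment
    ext = posixpath.splitext(url_path)[1].lower()
    # Fall back to default_ext when the url path has no extension.
    return f'{keyword}{index}{ext or default_ext}'


if __name__ == '__main__':
    # Made-up example urls, for illustration only.
    print(pick_filename('https://images.example.com/p/photo-1.jpeg?h=350', 'snow', 1))
    print(pick_filename('https://images.example.com/p/logo.png', 'snow', 2))
```

Because the name never includes a slice of the raw url, it cannot contain a '/' or '?' that would make open fail.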

Posted by Fingers68 on Sun, 15 Dec 2019 06:47:59 -0800