Web Crawler Practice

Today we practice crawling a website and summarize a crawling template that can be reused for similar sites. We take http://www.simm.cas.cn/xwzx/kydt/ as the example. The goal is to crawl each news item's title, release time, article link, picture links, and source. We mainly use the requests, re, Beautiful Soup, and json modules. Go ...
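As a rough illustration of the kind of template described above, here is a minimal sketch in Python. It is not the article's own code: the list-item markup, the `ITEM_RE` pattern, the `parse_list` and `fetch` helpers, and `SAMPLE_HTML` are all hypothetical, since the real page's HTML is not shown here. To stay runnable offline it does the extraction with re and json only; Beautiful Soup is not used, and requests is imported only inside `fetch`. Picture links and source, which would presumably come from the individual article pages, are left out.

```python
import json
import re
from urllib.parse import urljoin

BASE_URL = "http://www.simm.cas.cn/xwzx/kydt/"

# Hypothetical markup: the real page's HTML is not shown in the article,
# so this pattern is an assumption made for illustration only.
ITEM_RE = re.compile(
    r'<li>\s*<a href="(?P<link>[^"]+)"[^>]*>(?P<title>[^<]+)</a>\s*'
    r'<span>(?P<date>\d{4}-\d{2}-\d{2})</span>\s*</li>'
)


def parse_list(html, base_url=BASE_URL):
    """Extract title, release date, and absolute link for each news item."""
    items = []
    for m in ITEM_RE.finditer(html):
        items.append({
            "title": m.group("title").strip(),
            "date": m.group("date"),
            # Resolve relative hrefs such as "./201901/..." against the list URL.
            "link": urljoin(base_url, m.group("link")),
        })
    return items


def fetch(url, timeout=10):
    """Download a page with requests (imported lazily so parsing runs offline)."""
    import requests

    resp = requests.get(url, timeout=timeout)
    resp.raise_for_status()
    # Let requests guess the encoding from the body; useful for non-UTF-8 pages.
    resp.encoding = resp.apparent_encoding
    return resp.text


# Made-up sample input in the assumed format, not real data from the site.
SAMPLE_HTML = """
<ul>
  <li><a href="./201901/t20190101_1.html">Sample news A</a><span>2019-01-01</span></li>
  <li><a href="./201901/t20190102_2.html">Sample news B</a><span>2019-01-02</span></li>
</ul>
"""

if __name__ == "__main__":
    # Serialize the parsed records as JSON, keeping non-ASCII titles readable.
    print(json.dumps(parse_list(SAMPLE_HTML), ensure_ascii=False, indent=2))
```

To run it against the live site, one would call `parse_list(fetch(BASE_URL))` after adjusting `ITEM_RE` to the page's actual markup.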

Posted by kir10s on Thu, 24 Jan 2019 10:15:13 -0800

[Jsoup in action] Simulated Browser: Use of Jsoup Tool Classes and retry Strategy for Failed Retries (3)

Other sister chapters on getting a Document object from a URL: Simulated Browser: Getting Web Page Data Simply (1); Simulated Browser: post Simulated Log-in to Get Web Page Data (2); Simulated Browser: Use of Jsoup Tool Class and retry Strategy for Failed Retries (3). Tool class: as the name implies, it is a utility meant to be used by other code. It only pro ...

Posted by FVxSF on Mon, 31 Dec 2018 21:00:08 -0800