Help needed: BeautifulSoup can't parse the Baidu homepage
In the Python 3 interpreter, first import the required libraries. Here I only use urllib.request and BeautifulSoup for testing.
After the import, enter the following code in the shell to display the corresponding results.
>>> url_1 = r'https://www.baidu.com'
>>> url_2 = r'https://baidu.com'
>>> page_1 = urllib.request.urlopen(url_1)
>>> page_2 = urllib.request.urlopen(url_2)
>>> soup_1 = BeautifulSoup(page_1.read(), 'html.parser')
>>> soup_2 = BeautifulSoup(page_2.read(), 'html.parser')
>>> soup_1('a')
[]
>>> soup_2('a')
[<a href="/" id="result_logo" onmousedown="return c({'fm':'tab','tab':'logo'})"><img alt="To Baidu Homepage" class="index-logo-src" src="//...
 ..., <a href="javascript:;" name="ime_cl">Close</a>, <a class="mnav" href="http://tieba.baidu.com" name="tj_trtieba">post</a>, ...
 (rest of the output omitted; it continues with the remaining links on the page)
So the HTML fetched from url_1 yields no <a> tags in BeautifulSoup, while the HTML fetched from url_2 parses as expected, and the only difference between url_1 and url_2 is the leading www. I am quite puzzled by this and don't know why it happens. Could someone please help me understand it?
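One way to narrow this down might be to look at what each URL actually returns before BeautifulSoup is involved. An empty result from soup_1('a') could simply mean that the HTML served for that request contains no <a> tags at all (for instance, a short redirect or script-only page sent to non-browser clients), rather than a parser failure. That is only a guess and is not confirmed here. The sketch below uses the standard library only; AnchorCounter, count_anchors and describe are hypothetical helper names, the anchor counter is a rough stand-in for len(soup('a')), and the "Mozilla/5.0" User-Agent value is an assumption chosen for comparison.

```python
import urllib.request
from html.parser import HTMLParser


class AnchorCounter(HTMLParser):
    """Counts <a> start tags; a rough stand-in for len(soup('a'))."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag == "a":
            self.count += 1


def count_anchors(html_text):
    """Return the number of <a> tags found in html_text."""
    parser = AnchorCounter()
    parser.feed(html_text)
    parser.close()
    return parser.count


def describe(url, user_agent=None):
    """Fetch url and summarize what the server actually sent back."""
    # Assumption: the server may answer differently depending on User-Agent.
    headers = {"User-Agent": user_agent} if user_agent else {}
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read()
        final_url = response.geturl()  # differs from url after a redirect
    text = body.decode("utf-8", errors="replace")
    return {
        "final_url": final_url,
        "bytes": len(body),
        "anchors": count_anchors(text),
        "start": text[:200],  # first characters, to eyeball the page
    }


# Example (needs network access):
# for url in ("https://www.baidu.com", "https://baidu.com"):
#     print(describe(url))
#     print(describe(url, user_agent="Mozilla/5.0"))
```

If the two URLs report very different byte counts or final URLs, or the numbers change once a User-Agent is supplied, the difference most likely lies in the server's response rather than in BeautifulSoup.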