Of course, learning Python on the road will be difficult, there is no good learning materials, how to learn?There is no clear recommendation to join the communication group number in Python Learning: there are like-minded little partners in the 973783996 group, they help each other, there are good video learning tutorials and PDF in the group!
Frequency Statistics of HuanZhuGeGe
Word Cloud Label Generation from Frequency Statistics of HuanZhuGeGe
It's like turning the "Report on the Work of the Chinese Government 2016" into a cipher cloud
And then Time of the Child
Take the picture of the little swallow as the background of the word cloud
Statistics on the frequency of Ci in The Heroes of Ejection Carving and the Ci Cloud Background of Guo Jing's Opera Photos
Is there a full instant vision?
A Web-side movie database interaction
You can see the whole history of Hong Kong movies, from early co-filming of Shanghai films to Hu Jinshuan's martial arts films to Bruce Lee's era, then Jackie Chan, then Stephen Chow
Word frequency analysis of responsibility requirements to extract necessary skills
Use the crawler to climb down the picture of Goddess of Knowledge
Finally, let's show you the Python code:
Word Frequency Statistics and Word Cloud Codes
from wordcloud import WordCloud import jieba import PIL import matplotlib.pyplot as plt import numpy as np def wordcloudplot(txt): path = 'd:/jieba/msyh.ttf' path = unicode(path, 'utf8').encode('gb18030') alice_mask = np.array(PIL.Image.open('d:/jieba/she.jpg')) wordcloud = WordCloud(font_path=path, background_color="white", margin=5, width=1800, height=800, mask=alice_mask, max_words=2000, max_font_size=60, random_state=42) wordcloud = wordcloud.generate(txt) wordcloud.to_file('d:/jieba/she2.jpg') plt.imshow(wordcloud) plt.axis("off") plt.show() def main(): a = [] f = open(r'd:\jieba\book\she.txt', 'r').read() words = list(jieba.cut(f)) for word in words: if len(word) > 1: a.append(word) txt = r' '.join(a) wordcloudplot(txt) if __name__ == '__main__': main()
Code to crawl the goddess
import requests import urllib import re import random from time import sleep def main(): url = 'xxx' headers = {xxx} i = 925 for x in xrange(1020, 2000, 20): data = {'start': '1000', 'offset': str(x), '_xsrf': 'a128464ef225a69348cef94c38f4e428'} content = requests.post(url, headers=headers, data=data, timeout=10).text imgs = re.findall('<img src=\\\\\"(.*?)_m.jpg', content) for img in imgs: try: img = img.replace('\\', '') pic = img + '.jpg' path = 'd:\\bs4\\zhihu\\jpg4\\' + str(i) + '.jpg' urllib.urlretrieve(pic, path) print ('Downloaded' + str(i) + u'Picture') i += 1 sleep(random.uniform(0.5, 1)) except: print ('Catch 1 Leak') pass sleep(random.uniform(0.5, 1)) if __name__ == '__main__': main()