1. Install jieba (Chinese word segmentation) and wordcloud
pip3 install jieba wordcloud (use pip instead of pip3 if that is the command on your system)
2. Open the text to build the word cloud from and give it a name
Use with open(...) as f
3. Word segmentation
Load a custom dictionary with jieba.load_userdict, then cut the text into seg_list
4. Count word frequencies
Define an empty dictionary; fill it with a loop
5. Filter with stop words
Put the words segmented from the text into a list;
Open the stop-word text and give it a name
Loop over the list (removal conditions: word frequency < 20 / single-character word, checked with len / word is in the stop-word list)
Delete the words that meet any condition with .pop()
******The steps above yield the segmented words and their frequencies******
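As a rough illustration of steps 4 and 5, the sketch below counts words with a plain dictionary and then filters them. The token list, the stop-word set, the helper names count_words and filter_words, and the threshold min_count=2 are all made up for this small example (the notes use a threshold of 20 on a real text); the tokens stand in for what jieba.cut would return.

```python
from collections import Counter


def count_words(tokens):
    """Count how often each token occurs, using a loop over a plain dict."""
    tf = {}
    for token in tokens:
        tf[token] = tf.get(token, 0) + 1
    return tf


def filter_words(tf, stopwords, min_count=2):
    """Return a new dict without rare words, single-character words, or stop words."""
    return {
        word: count
        for word, count in tf.items()
        if count >= min_count and len(word) >= 2 and word not in stopwords
    }


# Hypothetical tokens standing in for the output of jieba.cut
tokens = ["发展", "的", "发展", "经济", "的", "经济", "发展", "创新", "和"]
stopwords = {"的", "和"}  # hypothetical stop-word set

tf = count_words(tokens)
assert tf == dict(Counter(tokens))  # the loop gives the same counts as collections.Counter

kept = filter_words(tf, stopwords, min_count=2)
print(kept)  # {'发展': 3, '经济': 2}
```

Building a new dictionary avoids deleting keys from a dictionary while looping over it, which is the reason the notes copy the keys into a separate list before calling .pop().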
6. Check and change the working directory
import os; os.getcwd() returns the current directory; os.chdir(path) changes it
7. Generate the word cloud
name = WordCloud(options for the image).generate_from_frequencies(dictionary of word frequencies)
8. Set the font
font = r'path' (the location of the font file on the computer)
9. Display the word cloud
plt.imshow draws it; plt.axis('off') removes the coordinate axes; plt.show() displays the picture
******The steps above produce a rectangular word cloud******
10. Set the shape of the word cloud
from PIL import Image
import numpy as np
Also add the mask to the WordCloud options used above
The code is as follows:
import os

import jieba
import matplotlib.pyplot as plt
import numpy as np  # NumPy is a numerical extension library for Python; np is the usual alias
from PIL import Image
from wordcloud import WordCloud

# 'r' opens the file read-only, with the file pointer at the beginning of the file.
with open('China145.txt', 'r', encoding='utf-8') as f:
    renmin = f.read()  # renmin is just a name chosen by the author

# Word segmentation
# load_userdict imports a custom dictionary so that jieba keeps the listed terms as whole words.
jieba.load_userdict('China145cut.txt')
seg_list = jieba.cut(renmin, cut_all=False)  # cut_all=False selects accurate mode

# Count word frequencies
tf = {}  # tf is the name of the frequency dictionary
for seg in seg_list:
    if seg in tf:
        tf[seg] += 1  # tf[seg] is the value stored under the key seg
    else:
        tf[seg] = 1

# Filter with stop words
ci = list(tf.keys())  # copy the keys into a list called ci, so tf can be changed inside the loop
with open('chinesestopwords.txt', 'r', encoding='utf-8') as ft:
    # Build a set of stop words from your own txt file (assumes the words are separated
    # by whitespace or line breaks); testing against the raw string would match substrings.
    stopword = set(ft.read().split())
for seg in ci:
    if tf[seg] < 20 or len(seg) < 2 or seg in stopword or "-" in seg:
        tf.pop(seg)  # .pop deletes the specified key
print(tf)

print(os.getcwd())  # current working directory; relative paths are resolved against it

# Add the shape
mask = np.array(Image.open('heart.jpg'))
# Add the font
font = r'c:\Windows\Fonts\simfang.ttf'
# generate_from_frequencies takes a word-to-frequency dictionary; generate takes raw text.
wc = WordCloud(background_color='white', mask=mask, font_path=font,
               width=800, height=600).generate_from_frequencies(tf)
plt.imshow(wc)
plt.axis('off')
plt.show()  # display the image
wc.to_file('wc.jpg')  # save the word cloud as a jpg
Relevant knowledge:
[text cannot be parsed] When creating a new text file, use Save As and choose UTF-8 as the encoding
[encoding='utf-8'] Add this argument to open() if the file cannot be read or is displayed as garbled text
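A small sketch of why the encoding argument matters: the file is written as UTF-8, read back with the matching encoding, and then read with a mismatched one. The file name sample.txt and the sample text are made up for illustration, and ascii is used as the mismatched codec only because it fails in a predictable way.

```python
import os
import tempfile

text = "词云"  # two Chinese characters meaning "word cloud"

# Write the sample to a temporary file, explicitly encoded as UTF-8
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading with the matching encoding returns the original text
with open(path, "r", encoding="utf-8") as f:
    decoded = f.read()

# Reading with a mismatched encoding raises an error here
# (other mismatched codecs may instead return garbled characters)
try:
    with open(path, "r", encoding="ascii") as f:
        f.read()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(decoded == text, ascii_failed)  # True True
```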
[os module] provides methods to process files and directories
os.getcwd() returns the current working directory
os.chdir(path) changes the current working directory
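The two calls can be tried out safely with a temporary folder. In this sketch the folder created by tempfile.mkdtemp and the file name note.txt are stand-ins for your own data folder and files.

```python
import os
import tempfile

original = os.getcwd()        # remember the starting directory
target = tempfile.mkdtemp()   # hypothetical folder standing in for your data folder

os.chdir(target)              # relative paths now resolve inside target
with open("note.txt", "w", encoding="utf-8") as f:
    f.write("hello")
created_in_target = os.path.exists(os.path.join(target, "note.txt"))

os.chdir(original)            # switch back to where we started
restored = os.getcwd() == original

print(created_in_target, restored)
```

This is why a script that opens 'China145.txt' by its bare name only works when the working directory is the folder that contains the file.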
[jieba word segmentation]
Related links: Python jieba word segmentation study notes - Liu Shuai - Cnblogs (Blog Garden)
[with open as] read and write files
r means open in read-only mode, and the pointer is at the beginning of the text
Related links: Reading and writing files in Python with with open() as - xrinosvip's blog - CSDN blog
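A minimal sketch of the common modes, using a temporary file named demo.txt (a made-up name). It shows 'w' and 'a' for writing, 'r' for reading from the beginning, and that the file is closed automatically when the with block ends.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file name

with open(path, "w", encoding="utf-8") as f:  # 'w' creates or overwrites the file
    f.write("first line\n")

with open(path, "a", encoding="utf-8") as f:  # 'a' appends to the end of the file
    f.write("second line\n")

with open(path, "r", encoding="utf-8") as f:  # 'r' is read-only, pointer at the start
    first = f.readline()  # reads one line and moves the pointer forward
    rest = f.read()       # reads everything after the pointer

print(repr(first), repr(rest), f.closed)  # 'first line\n' 'second line\n' True
```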
[imshow] plt.imshow draws an array as an image. It is often used for heat maps, a common data-analysis method that shows differences in the data through color and brightness; here it draws the word cloud.
Related links: plt.imshow() - Small program scarlet blog - CSDN blog
[PIL] picture processing module
Related links: https://www.jb51.net/article/184195.htm
After-class practice
Fang Siqi's First Love Paradise tells the story of Fang Siqi, a girl who loves literature, who is sexually assaulted by her teacher Li Guohua and eventually suffers a mental collapse. According to the word cloud (with the protagonist's name excluded), the main keywords of the text are: teacher, like, no, sister, don't, and so on. To Siqi, Li Guohua was at first a respectable Chinese teacher with deep attainments in literature. Under the teacher's influence, however, Siqi could not resist and suffered greatly; she had to force herself to "like" the teacher in order to find a moment of relief. Sister is Siqi's neighbor, a young woman who also loves literature but has long lived with domestic violence; her story is the second thread of the book. Sister is Siqi's comfort, but her experience also imperceptibly affects Siqi.