Senior Python programmers teach you how to count the frequency of character names in Romance of the Three Kingdoms. It's very simple

Keywords: Python encoding Lambda Database

Senior Python programmers teach you a simple and interesting program:
Using a third-party library, we can count how often each character's name appears in Romance of the Three Kingdoms.

jieba is a third-party library that segments Chinese text into words according to the association probability between Chinese characters. It is very easy to use.

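import jieba

def getWords():
    # Read the whole novel as UTF-8 text
    with open('novels/threekingdoms.txt', 'r', encoding='utf-8') as f:
        txt = f.read()
    # Segment the text into a list of words
    words = jieba.lcut(txt)
    counts = {}
    for word in words:
        # Skip single-character tokens (mostly punctuation and particles)
        if len(word) == 1:
            continue
        counts[word] = counts.get(word, 0) + 1
    # Sort (word, count) pairs by count, highest first
    word_list = list(counts.items())
    word_list.sort(key=lambda x: x[1], reverse=True)
    return word_list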
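
The counting and sorting step can also be written with collections.Counter from the standard library. The following is only a minimal sketch, not the method used in this post: count_tokens and sample_tokens are hypothetical names, and the short token list stands in for the output of jieba.lcut so that the snippet runs without the novel file.

```python
from collections import Counter

def count_tokens(tokens, min_len=2):
    """Count tokens that are at least min_len characters long.

    Returns (token, count) pairs sorted by count, highest first.
    """
    counts = Counter(tok for tok in tokens if len(tok) >= min_len)
    return counts.most_common()

# Hypothetical, already-segmented sample standing in for jieba.lcut(txt)
sample_tokens = ['曹操', '曰', '孔明', '曹操', '，', '孔明', '曹操', '玄德']
top_words = count_tokens(sample_tokens)
print(top_words)
```

Counter.most_common() does the sort by count for us, so the explicit list(...).sort(...) step is not needed in this variant.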

Next, refine the result by hand: remove the words that are not personal names, and merge the different words that refer to the same person into one entry.
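
Before running this on the whole novel, the two clean-up steps can be tried on a tiny, made-up count table. This is a sketch under assumptions: clean_counts and the toy_* values are hypothetical and the numbers are invented for illustration. The strings are Chinese because that is what jieba.lcut returns for a Chinese text.

```python
def clean_counts(counts, excludes, merges):
    """Return a new dict with excluded words dropped and aliases merged."""
    cleaned = {w: n for w, n in counts.items() if w not in excludes}
    for canonical, aliases in merges:
        for alias in aliases:
            if alias == canonical:
                continue  # guard against a name listed as its own alias
            # pop() returns 0 when the alias never occurred, so no KeyError
            cleaned[canonical] = cleaned.get(canonical, 0) + cleaned.pop(alias, 0)
    return cleaned

# Made-up counts for illustration only
toy_counts = {'刘备': 5, '玄德': 7, '皇叔': 2, '将军': 9, '曹操': 6}
toy_excludes = {'将军'}
toy_merges = [('刘备', ('玄德', '皇叔')), ('曹操', ('孟德',))]

result = clean_counts(toy_counts, toy_excludes, toy_merges)
print(sorted(result.items(), key=lambda item: item[1], reverse=True))
```

Using get and pop with a default of 0 means an alias that never appears in the text is simply ignored instead of raising an error. The full-text version of the same idea follows.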

import jieba

def countWords(excludes, merges):
    with open('novels/threekingdoms.txt', 'r', encoding='utf-8') as f:
        txt = f.read()
    words = jieba.lcut(txt)
    counts = {}
    # Skip single-character tokens (including punctuation) and words in excludes
    for word in words:
        if len(word) == 1 or word in excludes:
            continue
        counts[word] = counts.get(word, 0) + 1
    # Merge the different names of the same person into one entry
    for canonical, aliases in merges:
        for name in aliases:
            if name == canonical:
                continue  # a name listed as its own alias must not be merged or deleted
            # pop() avoids a KeyError when an alias never appears in the text
            counts[canonical] = counts.get(canonical, 0) + counts.pop(name, 0)
    word_list = list(counts.items())
    word_list.sort(key=lambda x: x[1], reverse=True)
    return word_list

excludes = {'But say','Two person','Must not','Principal','Your Majesty','Hanzhong','See only','Public will','last emperor of a dynasty','Shu troops','Mount a horse','Yell','Prefect','This person','Madam',
 'First master','later generations','behind','inside of a city','Emperor','One side','why not','Army','Neglect Report','Sir','Common people','Why','Can not','such','How',
 'Then?','Vanguard','Inferior to','hurry','Original','Cause','Jiangdong','Dismount','Cry','Exactly','Xuzhou','suddenly','Jingzhou','About','Army horse',
 'therefore','Chengdu','Disappear','Unknown','Great defeat','Events','after','First army','Draw troops','rise in arms','in the armed forces','Reception','Lead troops','Next day','Overjoyed',
 'Recruit troops','Great alarm','Sure','Think','Rage','Must not','At heart','Below','A sound','Catch up with','Grain and grass','The world','Soochow','Therefore','Governor',
 'Cao Bing','Together','decompose','Return','Pay separately','have to','Mount a horse','Three thousand','General','Xu Du','subsequently','Report','today','Afraid to','Wei Bing',
 'Front','Soldiers','And say','Public officials','Luoyang','Lead troops','Deliberation','Sergeant','A starry night','Elite soldier','On the city','Plan','Refuse to','meet','His words',
 'One day','While doing','Civil and military','Xiangyang','Get ready','How','Go to war','personally','There must be','one person','Men and horses','Ignorance','Who','this matter','In',
 'Ambush','Qishan','Multiplication','see suddenly','Laugh','Fancheng','Brother','Chopped-off head','Stand on','Xichuan','Herald','In the first place','Five hundred','A Biao','Stick to',
 'here','Between','surrender','Five thousand','Ambush','Changan','Three way','dispatch an envoy','General','Guan Xing','Military adviser','Imperial court','The three armed forces','king','See you later',
 'General general','Inevitable','Officers and men','It's night.','Path' }

merges = [ ('Liu Bei',('Xuan de','Xuan de Yue','Xuan de asked','xuande','Yuan de da','Xuan de Zi','Wen de Wen','Uncle Huang','Liu Huang Shu')),
 ('Guan Yu',('Guan Gong','Yun Chang','Guan Yun Chang')),
 ('Kong Ming',('Zhuge Liang','Kongming said','Kong Ming Xiao','Kong Ming Zhi','Kong Ming Zi')),
 ('Cao Cao',('The prime minister','Meng de','Cao Gong','Cao meng de')),
 ('Zhang Fei',('Yide','Zhang Yi De'))
 ]

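word_list = countWords(excludes, merges)
# Print the 30 most frequent names as rank, name, count
for i, (word, count) in enumerate(word_list[:30], start=1):
    # chr(12288) is the full-width (CJK) space, used to pad the name column
    print('{0:^10}{1:{3}^10}{2:^15}'.format(i, word, count, chr(12288)))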
The results are as follows. Of course, words such as "general", "hero", "eldest brother" and "gentleman" cannot be attributed to a specific person, so only the names that can be identified are counted, and the figures are only a rough reference.

1 Liu Bei 1578
2 Cao Cao 1485
3 Kong Ming 1485
4 Guan Yu 820
5 Zhang Fei 393
6 Lü Bu 300
7 Zhao Yun 278
8 Sun Quan 264
9 Sima Yi 221
10 Zhou Yu 217
11 Yuan Shao 191
12 Ma Chao 185
13 Wei Yan 180
14 Huang Zhong 168
15 Jiang Wei 151
16 Ma Dai 127
17 Pang De 122
18 Meng Huo 122
19 Liu Biao 120
20 Xiahou Dun 116
21 Dong Zhuo 114
22 Sun Ce 108
23 Lu Su 107
24 Xu Huang 97
25 Sima Zhao 89
26 Xiahou Yuan 88
27 Wang Ping 88
28 Liu Zhang 85
29 Yuan Shu 84
30 Lü Meng 83
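
A note on how the print call lines up the columns: in most monospaced fonts a Chinese character is about twice as wide as an ASCII space, so padding the name column with ordinary spaces tends to leave ragged columns. Passing chr(12288), the full-width space, as the fill character avoids that. The sketch below isolates the format string; format_row and FULL_WIDTH_SPACE are hypothetical names introduced only for this illustration.

```python
FULL_WIDTH_SPACE = chr(12288)  # U+3000, the full-width (ideographic) space

def format_row(rank, name, count):
    """Centre rank, name and count in columns of width 10, 10 and 15."""
    # {1:{3}^10} centres argument 1 in 10 characters, filled with argument 3
    return '{0:^10}{1:{3}^10}{2:^15}'.format(rank, name, count, FULL_WIDTH_SPACE)

row = format_row(1, '刘备', 1578)
print(row)
```

The rank and count columns keep the default ASCII space as fill, since digits are half-width anyway; only the name column needs the full-width fill.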


The above is a small Python example, for your reference only.

Posted by erth on Tue, 03 Dec 2019 03:31:59 -0800