This article mainly introduces the generation of tag cloud code analysis based on Python 3. The example code is introduced in detail in this article, which has a certain reference learning value for everyone's study or work. You can refer to the following for friends who need it
Tag cloud is the most popular way to present big data. In python3, it can also achieve the effect of tag cloud. The map is as follows:
First, install the following libraries:
#!/usr/bin/python3.4 # -*- coding: utf-8 -*- # http://www.lfd.uci.edu/~gohlke/pythonlibs/#cx_freeze # Download pygame from universal warehouse # pip3 download simplejson
There are also the most important libraries:
pip3 install pytagcloud
Or go to the official website to download: https://pypi.python.org/pypi/pytagcloud/
After installation, use the example on the official website to do the following:
from pytagcloud import create_tag_image, make_tags from pytagcloud.lang.counter import get_tag_counts YOUR_TEXT = "A tag cloud is a visual representation for text data, typically\ used to depict keyword metadata on websites, or to visualize free form text." tags = make_tags(get_tag_counts(YOUR_TEXT), maxsize=120) create_tag_image(tags, 'cloud_large.png', size=(900, 600), fontname='Lobster')
Decisive error reporting:
Traceback (most recent call last): File "D:/code/pythonwork/Text.py", line 96, in <module> tags = make_tags(get_tag_counts(YOUR_TEXT), maxsize=120) File "C:\Python34\lib\site-packages\pytagcloud\lang\counter.py", line 25, in get_tag_counts return sorted(counted.iteritems(), key=itemgetter(1), reverse=True) AttributeError: 'dict' object has no attribute 'iteritems'
See the problems found in the Library:
# counter.py return sorted(counted.iteritems(), key=itemgetter(1), reverse=True)
It turns out that python3.4 does not support writing:
In Python 2. X, items() is used to return a copy of the list of all items (key / value pairs) in D), which takes up extra memory.
iteritems() is used to return the iterations [Returns an iterator on all items(key/value pairs) in D] after the operation of its own dictionary list. It does not occupy extra memory.
In Python 3.x, the two methods of iteritems() and viewitems() have been abolished, and the result of items() is the same as that of viewitems() in 2.x. Replacing iteritems () with items () in 3.x can be used for loop traversal.
But when I change to:
# counter.py return sorted(counted.items(), key=itemgetter(1), reverse=True)
I found that there was no error in the operation, but no tag cloud was generated. After printing it over and over again, I finally found the problem:
from pytagcloud import create_tag_image
This is to generate a tuple:
# counts =[('cloud', 3), # ('words', 2), # ('code', 1), # ('word', 1), # ('appear', 1)]
But the items() in Python 3 can't achieve this effect, so I'll write it myself.
Read the txt file, and divide each line into array elements according to the space:
arr = [] file = open('../tagcloud/tag_file.txt', 'r') data = file.read().split('\r\n') for content in data: contents = validatecontent(content).split() for word in contents: arr.append(word)
['BAISC', 'Python', 'BASICA', 'GVBASIC', 'GWBASIC', 'Python', 'ETBASIC', 'QBASIC', 'Quick', 'Basic', 'Turbo', 'Basic', 'True', 'Python', 'java', 'Basic', 'Visual', 'Basic', 'Visual', 'Basic', 'Net', 'Power', 'Basic', 'Python', 'java', 'SQL', 'VB', 'Small', 'Basic', 'Free', 'Basic', 'DarkBASIC', 'VBScript', 'Visual', 'Basic', 'For', 'ApplicationsVBA', 'REALbasic', 'C', 'C', 'Turbo', 'C', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML', 'Borland', 'C', 'C', 'Builder', 'CCLI', 'Python', 'java', 'ObjectiveC', 'C#', 'Microsoft', 'Visual', 'C', 'Pascal', 'Delphi', 'Turbo', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML', 'Pascal', 'Object', 'Pascal', 'Free', 'Pascal', 'Lazarus', 'FORTRAN', 'MATLAB', 'Scilab', 'GNU', 'Octave', 'R', 'SPlus', 'Mathematica', 'Maple', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML', 'Julia', 'xBaseClipper', 'Visual', 'FoxPro', 'SQLPLSQL', 'TSQL', 'SQLPSM', 'LINQ', 'Xquer', 'Lua', 'Python', 'java', 'SQL', 'VB', 'Perl', 'PHP', 'Python', 'Ruby', 'ASP', 'JSP', 'TclTk', 'VBScript', 'AppleScript', 'AAuto', 'ActionScript', 'DMDScript', 'ECMAScript', 'JavaScript', 'JScript', 'TypeScript', 'sh', 'bash', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML', 'sed', 'awk', 'PowerShell', 'csh', 'tcsh', 'ksh', 'zsh', 'XMLSVG', 'XML', 'Schema', 'Python', 'java', 'XSLT', 'XHTML', 'MathML', 'XAML', 'SSML', 'SGML', 'HTML', 'Python', 'java', 'SQL', 'VB', 'Curl', 'SVG', 'XML', 'Schema', 'XSLT', 'XHTML', 'MathML', 'XAML', 'SSML', 'Java', 'Jython', 'JRuby', 'JScheme', 'Groovy', 'Kawa', 'Scala', 'Clojure', 'ALGOL', 'APLJ', 'Ada', 'Falcon', 'Forth', 'Io', 'MUMPS', 'PLI', 'PostScript', 'REXX', 'SAC', 'Self', 'Simula', 'Swift', 'IronPython', 'IronRuby', 'COBOL', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML']
Where validatecontent is the function of the illegal character at first:
# Remove illegal characters from content (Windows) def validatecontent(content): # '/\:*?"<>|' rstr = r"[\/\\\:\*\?\"\<\>\|\.\*\+\-\(\)\"\'\(\)\!\?\"\"\,\. \;\: \{\}\{\}\=\%\*\~\·]" new_content = re.sub(rstr, "", content) return new_content
Count each element:
from collections import Counter counts = Counter(arr).items() print(counts)
The effect is:
dict_items([('For', 1), ('SQL', 8), ('JRuby', 1), ('Builder', 1), ('HTML', 6), ('LINQ', 1), ('BAISC', 1), ('BASICA', 1), ('PHP', 6), ('Octave', 1), ('csh', 1), ('PostScript', 1), ('awk', 1), ('Ruby', 1), ('AppleScript', 1), ('Object', 1), ('java', 11), ('TclTk', 1), ('Xquer', 1), ('ksh', 1), ('zsh', 1), ('ETBASIC', 1), ('AAuto', 1), ('Borland', 1), ('SVG', 1), ('Jython', 1), ('Simula', 1), ('IronPython', 1), ('Python', 14), ('Microsoft', 1), ('ActionScript', 1), ('XHTML', 2), ('REXX', 1), ('COBOL', 1), ('Scilab', 1), ('Ada', 1), ('Basic', 9), ('GVBASIC', 1), ('ECMAScript', 1), ('TypeScript', 1), ('Falcon', 1), ('Clojure', 1), ('ASP', 1), ('ALGOL', 1), ('XMLSVG', 1), ('GWBASIC', 1), ('VBScript', 2), ('CCLI', 1), ('Lazarus', 1), ('Julia', 1), ('JSP', 1), ('PowerShell', 1), ('IronRuby', 1), ('Power', 1), ('FORTRAN', 1), ('Self', 1), ('Perl', 1), ('Small', 1), ('FoxPro', 1), ('REALbasic', 1), ('GNU', 1), ('Mathematica', 1), ('True', 1), ('Visual', 5), ('JScheme', 1), ('Maple', 1), ('Quick', 1), ('Turbo', 3), ('SAC', 1), ('JScript', 1), ('APLJ', 1), ('sh', 1), ('Kawa', 1), ('Pascal', 4), ('TSQL', 1), ('SPlus', 1), ('C', 6), ('xBaseClipper', 1), ('tcsh', 1), ('SQLPSM', 1), ('ApplicationsVBA', 1), ('SSML', 2), ('R', 1), ('Groovy', 1), ('XSLT', 2), ('MUMPS', 1), ('bash', 1), ('DarkBASIC', 1), ('SGML', 1), ('XAML', 2), ('VB', 8), ('Curl', 1), ('Schema', 2), ('MATLAB', 1), ('MathML', 2), ('Lua', 1), ('Net', 1), ('ObjectiveC', 1), ('JavaScript', 1), ('Java', 1), ('Io', 1), ('Free', 2), ('Delphi', 1), ('sed', 1), ('XML', 2), ('Forth', 1), ('C#', 1), ('SQLPLSQL', 1), ('QBASIC', 1), ('DMDScript', 1), ('Swift', 1), ('Scala', 1), ('PLI', 1)])
Finally, it can be directly substituted:
tags = make_tags(counts, maxsize=120) create_tag_image(tags, 'cloud_large.png', size=(900, 600), fontname='Lobster')
Specific corrections need to be pondered slowly, such as text size, image size, background color, etc.
The tag cloud is finished here, but it does not support Chinese, because there is no suitable TTF font file. Prepare a TTF Chinese font, such as Microsoft yahei.ttf, and move it to
# C:\Python34\Lib\site-packages\pytagcloud\fonts
Next, change the fonts.json file and add something similar to css according to the style:
{ "name": "MicrosoftYaHei", "ttf": "MicrosoftYaHei.ttf", "web": "none" }
Notice the comma before and after. Finally, change the code here:
create_tag_image(tags, 'cloud_large.png', size=(900, 600), fontname='MicrosoftYaHei')
Run, get it! Chinese rendering:
Recommend our Python learning base to see how the seniors learn! From basic Python script, crawler, django, data mining and other programming technologies, as well as sorting out zero basic data to project actual combat data, to every little partner who loves to learn Python! Every day, seniors regularly explain Python technology, share some learning methods and small details that need attention, and click to join our python learners' gathering place