Analysis of generating tag cloud code based on Python 3

Keywords: Python Java SQL PHP

This article mainly introduces the generation of tag cloud code analysis based on Python 3. The example code is introduced in detail in this article, which has a certain reference learning value for everyone's study or work. You can refer to the following for friends who need it

Tag cloud is the most popular way to present big data. In python3, it can also achieve the effect of tag cloud. The map is as follows:

First, install the following libraries:

#!/usr/bin/python3.4
# -*- coding: utf-8 -*-
# http://www.lfd.uci.edu/~gohlke/pythonlibs/#cx_freeze
# Download pygame from universal warehouse
# pip3 download simplejson

There are also the most important libraries:

pip3 install pytagcloud

Or go to the official website to download: https://pypi.python.org/pypi/pytagcloud/
After installation, use the example on the official website to do the following:

from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts
 
YOUR_TEXT = "A tag cloud is a visual representation for text data, typically\
used to depict keyword metadata on websites, or to visualize free form text."
 
tags = make_tags(get_tag_counts(YOUR_TEXT), maxsize=120)
 
create_tag_image(tags, 'cloud_large.png', size=(900, 600), fontname='Lobster')

Decisive error reporting:

Traceback (most recent call last):
 File "D:/code/pythonwork/Text.py", line 96, in <module>
  tags = make_tags(get_tag_counts(YOUR_TEXT), maxsize=120)
 File "C:\Python34\lib\site-packages\pytagcloud\lang\counter.py", line 25, in get_tag_counts
  return sorted(counted.iteritems(), key=itemgetter(1), reverse=True)
AttributeError: 'dict' object has no attribute 'iteritems'

See the problems found in the Library:

# counter.py
return sorted(counted.iteritems(), key=itemgetter(1), reverse=True)

It turns out that python3.4 does not support writing:

In Python 2. X, items() is used to return a copy of the list of all items (key / value pairs) in D), which takes up extra memory.

iteritems() is used to return the iterations [Returns an iterator on all items(key/value pairs) in D] after the operation of its own dictionary list. It does not occupy extra memory.

In Python 3.x, the two methods of iteritems() and viewitems() have been abolished, and the result of items() is the same as that of viewitems() in 2.x. Replacing iteritems () with items () in 3.x can be used for loop traversal.

But when I change to:

# counter.py
return sorted(counted.items(), key=itemgetter(1), reverse=True)

I found that there was no error in the operation, but no tag cloud was generated. After printing it over and over again, I finally found the problem:

from pytagcloud import create_tag_image

This is to generate a tuple:

# counts =[('cloud', 3),
# ('words', 2),
# ('code', 1),
# ('word', 1),
# ('appear', 1)]

But the items() in Python 3 can't achieve this effect, so I'll write it myself.

Read the txt file, and divide each line into array elements according to the space:

arr = []
 file = open('../tagcloud/tag_file.txt', 'r')
 data = file.read().split('\r\n')
 for content in data:
  contents = validatecontent(content).split()
  for word in contents:
    arr.append(word)
['BAISC', 'Python', 'BASICA', 'GVBASIC', 'GWBASIC', 'Python', 'ETBASIC', 'QBASIC', 'Quick', 'Basic', 'Turbo', 'Basic', 'True', 'Python', 'java', 'Basic', 'Visual', 'Basic', 'Visual', 'Basic', 'Net', 'Power', 'Basic', 'Python', 'java', 'SQL', 'VB', 'Small', 'Basic', 'Free', 'Basic', 'DarkBASIC', 'VBScript', 'Visual', 'Basic', 'For', 'ApplicationsVBA', 'REALbasic', 'C', 'C', 'Turbo', 'C', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML', 'Borland', 'C', 'C', 'Builder', 'CCLI', 'Python', 'java', 'ObjectiveC', 'C#', 'Microsoft', 'Visual', 'C', 'Pascal', 'Delphi', 'Turbo', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML', 'Pascal', 'Object', 'Pascal', 'Free', 'Pascal', 'Lazarus', 'FORTRAN', 'MATLAB', 'Scilab', 'GNU', 'Octave', 'R', 'SPlus', 'Mathematica', 'Maple', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML', 'Julia', 'xBaseClipper', 'Visual', 'FoxPro', 'SQLPLSQL', 'TSQL', 'SQLPSM', 'LINQ', 'Xquer', 'Lua', 'Python', 'java', 'SQL', 'VB', 'Perl', 'PHP', 'Python', 'Ruby', 'ASP', 'JSP', 'TclTk', 'VBScript', 'AppleScript', 'AAuto', 'ActionScript', 'DMDScript', 'ECMAScript', 'JavaScript', 'JScript', 'TypeScript', 'sh', 'bash', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML', 'sed', 'awk', 'PowerShell', 'csh', 'tcsh', 'ksh', 'zsh', 'XMLSVG', 'XML', 'Schema', 'Python', 'java', 'XSLT', 'XHTML', 'MathML', 'XAML', 'SSML', 'SGML', 'HTML', 'Python', 'java', 'SQL', 'VB', 'Curl', 'SVG', 'XML', 'Schema', 'XSLT', 'XHTML', 'MathML', 'XAML', 'SSML', 'Java', 'Jython', 'JRuby', 'JScheme', 'Groovy', 'Kawa', 'Scala', 'Clojure', 'ALGOL', 'APLJ', 'Ada', 'Falcon', 'Forth', 'Io', 'MUMPS', 'PLI', 'PostScript', 'REXX', 'SAC', 'Self', 'Simula', 'Swift', 'IronPython', 'IronRuby', 'COBOL', 'Python', 'java', 'SQL', 'VB', 'PHP', 'HTML']

Where validatecontent is the function of the illegal character at first:

# Remove illegal characters from content (Windows)
def validatecontent(content):
  # '/\:*?"<>|'
  rstr = r"[\/\\\:\*\?\"\<\>\|\.\*\+\-\(\)\"\'\(\)\!\?\"\"\,\. \;\: \{\}\{\}\=\%\*\~\·]"
  new_content = re.sub(rstr, "", content)
  return new_content

Count each element:

from collections import Counter
counts = Counter(arr).items()
print(counts)

The effect is:

dict_items([('For', 1), ('SQL', 8), ('JRuby', 1), ('Builder', 1), ('HTML', 6), ('LINQ', 1), ('BAISC', 1), ('BASICA', 1), ('PHP', 6), ('Octave', 1), ('csh', 1), ('PostScript', 1), ('awk', 1), ('Ruby', 1), ('AppleScript', 1), ('Object', 1), ('java', 11), ('TclTk', 1), ('Xquer', 1), ('ksh', 1), ('zsh', 1), ('ETBASIC', 1), ('AAuto', 1), ('Borland', 1), ('SVG', 1), ('Jython', 1), ('Simula', 1), ('IronPython', 1), ('Python', 14), ('Microsoft', 1), ('ActionScript', 1), ('XHTML', 2), ('REXX', 1), ('COBOL', 1), ('Scilab', 1), ('Ada', 1), ('Basic', 9), ('GVBASIC', 1), ('ECMAScript', 1), ('TypeScript', 1), ('Falcon', 1), ('Clojure', 1), ('ASP', 1), ('ALGOL', 1), ('XMLSVG', 1), ('GWBASIC', 1), ('VBScript', 2), ('CCLI', 1), ('Lazarus', 1), ('Julia', 1), ('JSP', 1), ('PowerShell', 1), ('IronRuby', 1), ('Power', 1), ('FORTRAN', 1), ('Self', 1), ('Perl', 1), ('Small', 1), ('FoxPro', 1), ('REALbasic', 1), ('GNU', 1), ('Mathematica', 1), ('True', 1), ('Visual', 5), ('JScheme', 1), ('Maple', 1), ('Quick', 1), ('Turbo', 3), ('SAC', 1), ('JScript', 1), ('APLJ', 1), ('sh', 1), ('Kawa', 1), ('Pascal', 4), ('TSQL', 1), ('SPlus', 1), ('C', 6), ('xBaseClipper', 1), ('tcsh', 1), ('SQLPSM', 1), ('ApplicationsVBA', 1), ('SSML', 2), ('R', 1), ('Groovy', 1), ('XSLT', 2), ('MUMPS', 1), ('bash', 1), ('DarkBASIC', 1), ('SGML', 1), ('XAML', 2), ('VB', 8), ('Curl', 1), ('Schema', 2), ('MATLAB', 1), ('MathML', 2), ('Lua', 1), ('Net', 1), ('ObjectiveC', 1), ('JavaScript', 1), ('Java', 1), ('Io', 1), ('Free', 2), ('Delphi', 1), ('sed', 1), ('XML', 2), ('Forth', 1), ('C#', 1), ('SQLPLSQL', 1), ('QBASIC', 1), ('DMDScript', 1), ('Swift', 1), ('Scala', 1), ('PLI', 1)])

Finally, it can be directly substituted:

tags = make_tags(counts, maxsize=120)
create_tag_image(tags, 'cloud_large.png', size=(900, 600), fontname='Lobster')

Specific corrections need to be pondered slowly, such as text size, image size, background color, etc.

The tag cloud is finished here, but it does not support Chinese, because there is no suitable TTF font file. Prepare a TTF Chinese font, such as Microsoft yahei.ttf, and move it to
# C:\Python34\Lib\site-packages\pytagcloud\fonts
Next, change the fonts.json file and add something similar to css according to the style:

{
    "name": "MicrosoftYaHei",
    "ttf": "MicrosoftYaHei.ttf",
    "web": "none"
  }

Notice the comma before and after. Finally, change the code here:

create_tag_image(tags, 'cloud_large.png', size=(900, 600), fontname='MicrosoftYaHei')

Run, get it! Chinese rendering:
Recommend our Python learning base to see how the seniors learn! From basic Python script, crawler, django, data mining and other programming technologies, as well as sorting out zero basic data to project actual combat data, to every little partner who loves to learn Python! Every day, seniors regularly explain Python technology, share some learning methods and small details that need attention, and click to join our python learners' gathering place

Published 18 original articles, won praise 2, visited 10000+
Private letter follow

Posted by jerbecca on Wed, 19 Feb 2020 08:19:56 -0800