Baby: Mom, how do you read these "Chinese characters"? Mom: I'll teach you in Python!

Fun pypinyin Library

Today I came across a fun library called pypinyin, which converts Chinese characters into Pinyin.

For further study, here is the project page (its documentation is written in Chinese): https://pypi.org/project/pypinyin/

The pypinyin library has the following features:

  • Intelligently matches the most accurate Pinyin based on phrases;
  • Supports polyphonic characters (heteronyms) and traditional characters;
  • Supports a variety of Pinyin and Zhuyin (phonetic) styles.

pypinyin is a third-party Python library, so install it before use:

pip install pypinyin

Then, import the library.

import pypinyin
from pypinyin import pinyin

Using the pypinyin Library

Let's start with a basic example.

from pypinyin import pinyin
print(pinyin("吃饭"))  # "having dinner"; one inner list per character: [['chī'], ['fàn']]

Some characters are polyphonic, meaning they have more than one reading. Pass heteronym=True to get every reading:

from pypinyin import pinyin
print(pinyin("冯", heteronym=True))  # "Feng" (a surname)
print(pinyin("朝", heteronym=True))  # "towards"
print(pinyin("和", heteronym=True))  # "with"

Notice that pinyin() returns a nested, two-dimensional list: one inner list per character, each holding one or more readings. That is awkward to work with!
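
If you only want one reading per character, the nested result can also be flattened by hand. Below is a minimal stdlib-only sketch; `sample` is a hard-coded list shaped like a pinyin() return value (an assumption for illustration, not live library output), and `flatten_readings` is a hypothetical helper, not part of pypinyin:

```python
from itertools import chain

# Hypothetical sample shaped like pinyin()'s nested return value:
# one inner list per character; with heteronym=True an inner list
# can hold several candidate readings.
sample = [["zhōng", "zhòng"], ["xīn"]]


def flatten_readings(nested, first_only=True):
    """Turn a nested pinyin-style list into a flat list of strings.

    first_only=True keeps only the first reading of each character;
    first_only=False keeps every reading, in order.
    """
    if first_only:
        return [readings[0] for readings in nested if readings]
    return list(chain.from_iterable(nested))


print(flatten_readings(sample))         # ['zhōng', 'xīn']
print(flatten_readings(sample, False))  # ['zhōng', 'zhòng', 'xīn']
```

Taking the first reading is only a heuristic: for a polyphonic character it may not be the reading that fits the context.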

Can pypinyin give us a one-dimensional list directly? Yes, with lazy_pinyin:

from pypinyin import lazy_pinyin
print(lazy_pinyin("数据分析与统计之美"))  # "The beauty of data analysis and statistics"

Here comes the problem!

The list is one-dimensional, but the tones are gone. That's because lazy_pinyin uses Style.NORMAL (no tones) by default.

The fix is to choose a different output style with the style parameter:

from pypinyin import lazy_pinyin, Style
print(lazy_pinyin("数据分析与统计之美", style=Style.TONE))

It turns out that the Style class lets us choose the output style. These 14 styles are the commonly used ones, as defined in the library:

#: Normal style, without tones. For example: 中国 (China) -> ``zhong guo``
NORMAL = 0

#: Standard tone style, with tone marks on the vowels (default style). For example: 中国 -> ``zhōng guó``
TONE = 1

#: Tone style 2: the tone is a number [1-4] placed right after the vowel that carries it. For example: 中国 -> ``zho1ng guo2``
TONE2 = 2

#: Tone style 3: the tone is a number [1-4] placed at the end of each syllable. For example: 中国 -> ``zhong1 guo2``
TONE3 = 8

#: Initials style: only the initial of each syllable is returned (note: some syllables have no initial, see `#27`_). For example: 中国 -> ``zh g``
INITIALS = 3

#: First-letter style: only the first letter of each syllable is returned. For example: 中国 -> ``z g``
FIRST_LETTER = 4

#: Finals style: only the final of each syllable is returned, without tones. For example: 中国 -> ``ong uo``
FINALS = 5

#: Standard finals style, with tone marks on the final. For example: 中国 -> ``ōng uó``
FINALS_TONE = 6

#: Finals style 2: the tone is a number [1-4] placed right after the vowel that carries it. For example: 中国 -> ``o1ng uo2``
FINALS_TONE2 = 7

#: Finals style 3: the tone is a number [1-4] placed at the end of each final. For example: 中国 -> ``ong1 uo2``
FINALS_TONE3 = 9

#: Zhuyin (Bopomofo) style, with tones; the first tone (yinping) is not marked. For example: 中国 -> ``ㄓㄨㄥ ㄍㄨㄛˊ``
BOPOMOFO = 10

#: Zhuyin (Bopomofo) style, first symbol only. For example: 中国 -> ``ㄓ ㄍ``
BOPOMOFO_FIRST = 11

#: Pinyin mapped to Cyrillic (Russian) letters, with the tone as a number [1-4] after each syllable. For example: 中国 -> ``чжун1 го2``
CYRILLIC = 12

#: Pinyin mapped to Cyrillic (Russian) letters, first letter only. For example: 中国 -> ``ч г``
CYRILLIC_FIRST = 13
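
To see how these styles relate to each other, here is a stdlib-only sketch (not pypinyin's implementation). It turns a tone-marked syllable (TONE style) into the tone-number form (TONE3 style) and splits a toneless syllable into initial and final, the pieces behind INITIALS and FINALS. The helpers tone_to_tone3 and split_initial are hypothetical, and the initial handling is simplified: y and w are not treated as initials, and pypinyin applies additional rules for finals that this sketch skips.

```python
import unicodedata

# Hypothetical helpers for illustration; not part of pypinyin.

# Combining tone marks that NFD decomposition splits off a vowel,
# mapped to the tone numbers used by styles such as TONE3.
TONE_MARKS = {
    "\u0304": "1",  # macron, as in ā
    "\u0301": "2",  # acute accent, as in á
    "\u030c": "3",  # caron, as in ǎ
    "\u0300": "4",  # grave accent, as in à
}

# Standard Pinyin initials; two-letter ones are listed first so that
# "zh" is matched before "z". y and w are deliberately left out.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]


def tone_to_tone3(syllable):
    """Convert a tone-marked syllable to tone-number form: 'zhōng' -> 'zhong1'.

    A syllable without a tone mark (neutral tone) gets no digit.
    """
    tone = ""
    kept = []
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            kept.append(ch)  # base letters, plus the diaeresis of ü
    return unicodedata.normalize("NFC", "".join(kept)) + tone


def split_initial(plain):
    """Split a toneless syllable into (initial, final): 'zhong' -> ('zh', 'ong').

    Syllables without an initial return an empty string, e.g. 'an' -> ('', 'an').
    """
    for initial in INITIALS:
        if plain.startswith(initial):
            return initial, plain[len(initial):]
    return "", plain


print(tone_to_tone3("zhōng"), tone_to_tone3("guó"))  # zhong1 guo2
print(split_initial("zhong"), split_initial("guo"))  # ('zh', 'ong') ('g', 'uo')
```

Unicode decomposition (NFD) does the heavy lifting here: each tone-marked vowel becomes a plain vowel plus a combining mark, so the tone can be read off without a lookup table of every accented letter.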

What happens if your text contains characters other than Chinese, such as English letters and emoji?

from pypinyin import lazy_pinyin, Style
print(lazy_pinyin("Hello,我是😀", style=Style.TONE))  # "Hello, I am 😀"

The Chinese characters are converted, while the English text and the emoji are kept unchanged in the result. What if we don't want them in the output?

That is what the errors parameter is for: errors='ignore' drops characters that have no Pinyin. Let's look at an example.

from pypinyin import lazy_pinyin, Style

print(lazy_pinyin("Hello,我是😀", style=Style.TONE, errors="ignore"))

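To handle mixed text, a converter first has to decide which characters have Pinyin at all. Here is a minimal stdlib-only sketch of that split. It assumes, as a simplification, that only the CJK Unified Ideographs block (U+4E00 to U+9FFF) counts as Chinese; split_runs and keep_han_only are hypothetical helpers, with keep_han_only giving a rough analogue of errors='ignore'. Besides 'ignore', pypinyin's errors parameter also accepts 'default', 'replace', or a callable.

```python
import re

# Simplification: treat only the CJK Unified Ideographs block
# (U+4E00 to U+9FFF) as Chinese characters.
HAN_RUN = re.compile(r"[\u4e00-\u9fff]+")


def split_runs(text):
    """Split text into (is_han, chunk) pairs, preserving order."""
    runs = []
    pos = 0
    for match in HAN_RUN.finditer(text):
        if match.start() > pos:
            runs.append((False, text[pos:match.start()]))
        runs.append((True, match.group()))
        pos = match.end()
    if pos < len(text):
        runs.append((False, text[pos:]))
    return runs


def keep_han_only(text):
    """Rough analogue of errors='ignore': drop everything that is not Han."""
    return "".join(chunk for is_han, chunk in split_runs(text) if is_han)


print(split_runs("Hello,我是😀"))
# [(False, 'Hello,'), (True, '我是'), (False, '😀')]
print(keep_han_only("Hello,我是😀"))  # 我是
```

Keeping non-Han runs as whole chunks, rather than character by character, is what lets a "keep unchanged" mode return text like 'Hello,' in one piece.
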
If you are not satisfied with the Pinyin that pypinyin returns, you can supply your own phrase dictionary with the load_phrases_dict function.

Let's take a look at an example:

from pypinyin import lazy_pinyin, Style

print(lazy_pinyin("黄同学", style=Style.TONE))  # "Classmate Huang"; prints ['huáng', 'tóng', 'xué']

In "Classmate Huang" (黄同学), 同 is clearly read with the second tone (tóng). To show how a custom entry works, let's override it with the fourth tone (tòng):

from pypinyin import lazy_pinyin, load_phrases_dict, Style

# One list of readings per character in the phrase
personalized_dict = {"黄同学": [["huáng"], ["tòng"], ["xué"]]}

load_phrases_dict(personalized_dict)

print(lazy_pinyin("黄同学", style=Style.TONE))

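Why does a phrase entry win over the per-character readings? One common approach is to match the longest known phrase first and fall back to single characters. The toy sketch below uses forward maximum matching with tiny hand-written dictionaries; CHAR_DICT, PHRASE_DICT and to_pinyin are hypothetical, and pypinyin's real segmentation is more elaborate, so treat this only as an illustration of the idea.

```python
# Toy dictionaries (hypothetical, hand-written for illustration only).
CHAR_DICT = {"黄": "huáng", "同": "tóng", "学": "xué"}
PHRASE_DICT = {"同学": ["tóng", "xué"]}


def to_pinyin(text, phrases=PHRASE_DICT, chars=CHAR_DICT):
    """Convert text greedily: longest phrase match first, then single chars.

    Characters found in neither dictionary are kept unchanged.
    """
    longest = max((len(p) for p in phrases), default=1)
    result = []
    i = 0
    while i < len(text):
        # Try phrase lengths from longest down to 2 characters.
        for size in range(min(longest, len(text) - i), 1, -1):
            word = text[i:i + size]
            if word in phrases:
                result.extend(phrases[word])
                i += size
                break
        else:
            # No phrase matched: fall back to a single character.
            ch = text[i]
            result.append(chars.get(ch, ch))
            i += 1
    return result


print(to_pinyin("黄同学"))  # ['huáng', 'tóng', 'xué']

# A user-supplied phrase, similar in spirit to load_phrases_dict:
custom = {**PHRASE_DICT, "黄同学": ["huáng", "tòng", "xué"]}
print(to_pinyin("黄同学", phrases=custom))  # ['huáng', 'tòng', 'xué']
```

Adding the three-character entry makes the longer match take precedence over "同学", which mirrors the effect of load_phrases_dict in the example above.
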
That's all for today's article.

Posted by ugriffin on Wed, 24 Nov 2021 21:34:17 -0800