Mysql review notes - Fundamentals 7 [overview of database character set]

Keywords: Database MySQL

Character set, basic concepts of character order and their relations

MySQL provides different character sets for data storage. In the data comparison operation, different character order support is provided.

MySQL provides different levels of settings, including server level, database level, table level and column level, which can provide very accurate settings.

What is character set and character order? In short:

  1. character set: defines characters and their encoding.
  2. Character collation: defines the comparison rules of characters.

Example:

There are four characters: A,B,a,b,The codes of these four characters are A = 0, B = 1, a = 2, b = 3. Characters here + The encoding constitutes the character set( character set). 

What if we want to compare the size of two characters? For example A,B,perhaps a,b,The most intuitive way to compare them is to use their coding, for example, because 0 < 1,therefore A < B. 

In addition, for A,a,Although they are encoded differently, we think the upper and lower case characters should be equal, that is to say A == a. 

Two comparison rules are defined above. The collection of these comparison rules is collation. 

	If it is also uppercase and lowercase characters, compare their encoding size;

	If two characters are case sensitive, they are equal.

Character set and character order supported by MySQL

MySQL supports multiple character sets and character sequences.

  • A character set corresponds to at least one character order (generally 1 to many).

  • Two different character sets cannot have the same character order.

  • Each character set has a default character order.

View supported character sets

You can view the character sets supported by MYSQL in the following ways.

show character set ;

```mysql
armscii8	ARMSCII-8 Armenian	armscii8_general_ci	1
ascii	US ASCII	ascii_general_ci	1
big5	Big5 Traditional Chinese	big5_chinese_ci	2
binary	Binary pseudo charset	binary	1
cp1250	Windows Central European	cp1250_general_ci	1
cp1251	Windows Cyrillic	cp1251_general_ci	1
cp1256	Windows Arabic	cp1256_general_ci	1
cp1257	Windows Baltic	cp1257_general_ci	1
cp850	DOS West European	cp850_general_ci	1
cp852	DOS Central European	cp852_general_ci	1
cp866	DOS Russian	cp866_general_ci	1
cp932	SJIS for Windows Japanese	cp932_japanese_ci	2
dec8	DEC West European	dec8_swedish_ci	1
eucjpms	UJIS for Windows Japanese	eucjpms_japanese_ci	3
euckr	EUC-KR Korean	euckr_korean_ci	2
gb18030	China National Standard GB18030	gb18030_chinese_ci	4
gb2312	GB2312 Simplified Chinese	gb2312_chinese_ci	2
gbk	GBK Simplified Chinese	gbk_chinese_ci	2
geostd8	GEOSTD8 Georgian	geostd8_general_ci	1
greek	ISO 8859-7 Greek	greek_general_ci	1
hebrew	ISO 8859-8 Hebrew	hebrew_general_ci	1
hp8	HP West European	hp8_english_ci	1
keybcs2	DOS Kamenicky Czech-Slovak	keybcs2_general_ci	1
koi8r	KOI8-R Relcom Russian	koi8r_general_ci	1
koi8u	KOI8-U Ukrainian	koi8u_general_ci	1
latin1	cp1252 West European	latin1_swedish_ci	1
latin2	ISO 8859-2 Central European	latin2_general_ci	1
latin5	ISO 8859-9 Turkish	latin5_turkish_ci	1
latin7	ISO 8859-13 Baltic	latin7_general_ci	1
macce	Mac Central European	macce_general_ci	1
macroman	Mac West European	macroman_general_ci	1
sjis	Shift-JIS Japanese	sjis_japanese_ci	2
swe7	7bit Swedish	swe7_swedish_ci	1
tis620	TIS620 Thai	tis620_thai_ci	1
ucs2	UCS-2 Unicode	ucs2_general_ci	2
ujis	EUC-JP Japanese	ujis_japanese_ci	3
utf16	UTF-16 Unicode	utf16_general_ci	4
utf16le	UTF-16LE Unicode	utf16le_general_ci	4
utf32	UTF-32 Unicode	utf32_general_ci	4
utf8	UTF-8 Unicode	utf8_general_ci	3
utf8mb4	UTF-8 Unicode	utf8mb4_0900_ai_ci	4

You can also view the character sets supported by MYSQL in the following ways.

use information_schema;
select * from CHARACTER_SETS;

```mysql
big5	big5_chinese_ci	Big5 Traditional Chinese	2
dec8	dec8_swedish_ci	DEC West European	1
cp850	cp850_general_ci	DOS West European	1
hp8	hp8_english_ci	HP West European	1
koi8r	koi8r_general_ci	KOI8-R Relcom Russian	1
latin1	latin1_swedish_ci	cp1252 West European	1
latin2	latin2_general_ci	ISO 8859-2 Central European	1
swe7	swe7_swedish_ci	7bit Swedish	1
ascii	ascii_general_ci	US ASCII	1
ujis	ujis_japanese_ci	EUC-JP Japanese	3
sjis	sjis_japanese_ci	Shift-JIS Japanese	2
hebrew	hebrew_general_ci	ISO 8859-8 Hebrew	1
tis620	tis620_thai_ci	TIS620 Thai	1
euckr	euckr_korean_ci	EUC-KR Korean	2
koi8u	koi8u_general_ci	KOI8-U Ukrainian	1
gb2312	gb2312_chinese_ci	GB2312 Simplified Chinese	2
greek	greek_general_ci	ISO 8859-7 Greek	1
cp1250	cp1250_general_ci	Windows Central European	1
gbk	gbk_chinese_ci	GBK Simplified Chinese	2
latin5	latin5_turkish_ci	ISO 8859-9 Turkish	1
armscii8	armscii8_general_ci	ARMSCII-8 Armenian	1
utf8	utf8_general_ci	UTF-8 Unicode	3
ucs2	ucs2_general_ci	UCS-2 Unicode	2
cp866	cp866_general_ci	DOS Russian	1
keybcs2	keybcs2_general_ci	DOS Kamenicky Czech-Slovak	1
macce	macce_general_ci	Mac Central European	1
macroman	macroman_general_ci	Mac West European	1
cp852	cp852_general_ci	DOS Central European	1
latin7	latin7_general_ci	ISO 8859-13 Baltic	1
cp1251	cp1251_general_ci	Windows Cyrillic	1
utf16	utf16_general_ci	UTF-16 Unicode	4
utf16le	utf16le_general_ci	UTF-16LE Unicode	4
cp1256	cp1256_general_ci	Windows Arabic	1
cp1257	cp1257_general_ci	Windows Baltic	1
utf32	utf32_general_ci	UTF-32 Unicode	4
binary	binary	Binary pseudo charset	1
geostd8	geostd8_general_ci	GEOSTD8 Georgian	1
cp932	cp932_japanese_ci	SJIS for Windows Japanese	2
eucjpms	eucjpms_japanese_ci	UJIS for Windows Japanese	3
gb18030	gb18030_chinese_ci	China National Standard GB18030	4
utf8mb4	utf8mb4_0900_ai_ci	UTF-8 Unicode	4

be careful

When using SHOW CHARACTER SET to view, you can also add WHERE or LIKE qualifications.

Example 1: use the WHERE qualification.

mysql> SHOW CHARACTER SET WHERE Charset="utf8";
+---------+---------------+-------------------+--------+
| Charset | Description   | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci   |      3 |
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)

Example 2: use the LIKE qualification.

mysql> SHOW CHARACTER SET LIKE "utf8%";
+---------+---------------+--------------------+--------+
| Charset | Description   | Default collation  | Maxlen |
+---------+---------------+--------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci    |      3 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci |      4 |
+---------+---------------+--------------------+--------+
2 rows in set (0.00 sec)

View supported character order

Similarly, you can view the character order supported by MYSQL in the following ways.

Method 1: view through show collection.

You can see that the utf8 character set has more than 10 character orders. Judge whether the Default character order is determined by whether the Default value is Yes.

mysql> SHOW COLLATION WHERE Charset = 'utf8';
+--------------------------+---------+-----+---------+----------+---------+
| Collation                | Charset | Id  | Default | Compiled | Sortlen |
+--------------------------+---------+-----+---------+----------+---------+
| utf8_general_ci          | utf8    |  33 | Yes     | Yes      |       1 |
| utf8_bin                 | utf8    |  83 |         | Yes      |       1 |
...slightly

Method 2: query information_schema.collections.

mysql> USE information_schema;
mysql> SELECT * FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8";
+--------------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME           | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------------+--------------------+-----+------------+-------------+---------+
| utf8_general_ci          | utf8               |  33 | Yes        | Yes         |       1 |
| utf8_bin                 | utf8               |  83 |            | Yes         |       1 |
| utf8_unicode_ci          | utf8               | 192 |            | Yes         |       8 |

Naming convention for character order

The name of character order is prefixed with its corresponding character set, as shown below. For example, character order utf8_general_ci indicates that it is the character order of character set utf8.

More rules can be referred to Official documents.

MariaDB [information_schema]> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8" limit 2; 
+--------------------+-----------------+
| CHARACTER_SET_NAME | COLLATION_NAME  |
+--------------------+-----------------+
| utf8               | utf8_general_ci |
| utf8               | utf8_bin        |
+--------------------+-----------------+
2 rows in set (0.00 sec)

Posted by lotrfan on Tue, 23 Nov 2021 01:48:27 -0800