Character set, basic concepts of character order and their relations
MySQL provides different character sets for data storage. In the data comparison operation, different character order support is provided.
MySQL provides different levels of settings, including server level, database level, table level and column level, which can provide very accurate settings.
What is character set and character order? In short:
- character set: defines characters and their encoding.
- Character collation: defines the comparison rules of characters.
Example:
There are four characters: A,B,a,b,The codes of these four characters are A = 0, B = 1, a = 2, b = 3. Characters here + The encoding constitutes the character set( character set). What if we want to compare the size of two characters? For example A,B,perhaps a,b,The most intuitive way to compare them is to use their coding, for example, because 0 < 1,therefore A < B. In addition, for A,a,Although they are encoded differently, we think the upper and lower case characters should be equal, that is to say A == a. Two comparison rules are defined above. The collection of these comparison rules is collation. If it is also uppercase and lowercase characters, compare their encoding size; If two characters are case sensitive, they are equal.
Character set and character order supported by MySQL
MySQL supports multiple character sets and character sequences.
-
A character set corresponds to at least one character order (generally 1 to many).
-
Two different character sets cannot have the same character order.
-
Each character set has a default character order.
View supported character sets
You can view the character sets supported by MYSQL in the following ways.
show character set ;
```mysql armscii8 ARMSCII-8 Armenian armscii8_general_ci 1 ascii US ASCII ascii_general_ci 1 big5 Big5 Traditional Chinese big5_chinese_ci 2 binary Binary pseudo charset binary 1 cp1250 Windows Central European cp1250_general_ci 1 cp1251 Windows Cyrillic cp1251_general_ci 1 cp1256 Windows Arabic cp1256_general_ci 1 cp1257 Windows Baltic cp1257_general_ci 1 cp850 DOS West European cp850_general_ci 1 cp852 DOS Central European cp852_general_ci 1 cp866 DOS Russian cp866_general_ci 1 cp932 SJIS for Windows Japanese cp932_japanese_ci 2 dec8 DEC West European dec8_swedish_ci 1 eucjpms UJIS for Windows Japanese eucjpms_japanese_ci 3 euckr EUC-KR Korean euckr_korean_ci 2 gb18030 China National Standard GB18030 gb18030_chinese_ci 4 gb2312 GB2312 Simplified Chinese gb2312_chinese_ci 2 gbk GBK Simplified Chinese gbk_chinese_ci 2 geostd8 GEOSTD8 Georgian geostd8_general_ci 1 greek ISO 8859-7 Greek greek_general_ci 1 hebrew ISO 8859-8 Hebrew hebrew_general_ci 1 hp8 HP West European hp8_english_ci 1 keybcs2 DOS Kamenicky Czech-Slovak keybcs2_general_ci 1 koi8r KOI8-R Relcom Russian koi8r_general_ci 1 koi8u KOI8-U Ukrainian koi8u_general_ci 1 latin1 cp1252 West European latin1_swedish_ci 1 latin2 ISO 8859-2 Central European latin2_general_ci 1 latin5 ISO 8859-9 Turkish latin5_turkish_ci 1 latin7 ISO 8859-13 Baltic latin7_general_ci 1 macce Mac Central European macce_general_ci 1 macroman Mac West European macroman_general_ci 1 sjis Shift-JIS Japanese sjis_japanese_ci 2 swe7 7bit Swedish swe7_swedish_ci 1 tis620 TIS620 Thai tis620_thai_ci 1 ucs2 UCS-2 Unicode ucs2_general_ci 2 ujis EUC-JP Japanese ujis_japanese_ci 3 utf16 UTF-16 Unicode utf16_general_ci 4 utf16le UTF-16LE Unicode utf16le_general_ci 4 utf32 UTF-32 Unicode utf32_general_ci 4 utf8 UTF-8 Unicode utf8_general_ci 3 utf8mb4 UTF-8 Unicode utf8mb4_0900_ai_ci 4
You can also view the character sets supported by MYSQL in the following ways.
use information_schema; select * from CHARACTER_SETS;
```mysql big5 big5_chinese_ci Big5 Traditional Chinese 2 dec8 dec8_swedish_ci DEC West European 1 cp850 cp850_general_ci DOS West European 1 hp8 hp8_english_ci HP West European 1 koi8r koi8r_general_ci KOI8-R Relcom Russian 1 latin1 latin1_swedish_ci cp1252 West European 1 latin2 latin2_general_ci ISO 8859-2 Central European 1 swe7 swe7_swedish_ci 7bit Swedish 1 ascii ascii_general_ci US ASCII 1 ujis ujis_japanese_ci EUC-JP Japanese 3 sjis sjis_japanese_ci Shift-JIS Japanese 2 hebrew hebrew_general_ci ISO 8859-8 Hebrew 1 tis620 tis620_thai_ci TIS620 Thai 1 euckr euckr_korean_ci EUC-KR Korean 2 koi8u koi8u_general_ci KOI8-U Ukrainian 1 gb2312 gb2312_chinese_ci GB2312 Simplified Chinese 2 greek greek_general_ci ISO 8859-7 Greek 1 cp1250 cp1250_general_ci Windows Central European 1 gbk gbk_chinese_ci GBK Simplified Chinese 2 latin5 latin5_turkish_ci ISO 8859-9 Turkish 1 armscii8 armscii8_general_ci ARMSCII-8 Armenian 1 utf8 utf8_general_ci UTF-8 Unicode 3 ucs2 ucs2_general_ci UCS-2 Unicode 2 cp866 cp866_general_ci DOS Russian 1 keybcs2 keybcs2_general_ci DOS Kamenicky Czech-Slovak 1 macce macce_general_ci Mac Central European 1 macroman macroman_general_ci Mac West European 1 cp852 cp852_general_ci DOS Central European 1 latin7 latin7_general_ci ISO 8859-13 Baltic 1 cp1251 cp1251_general_ci Windows Cyrillic 1 utf16 utf16_general_ci UTF-16 Unicode 4 utf16le utf16le_general_ci UTF-16LE Unicode 4 cp1256 cp1256_general_ci Windows Arabic 1 cp1257 cp1257_general_ci Windows Baltic 1 utf32 utf32_general_ci UTF-32 Unicode 4 binary binary Binary pseudo charset 1 geostd8 geostd8_general_ci GEOSTD8 Georgian 1 cp932 cp932_japanese_ci SJIS for Windows Japanese 2 eucjpms eucjpms_japanese_ci UJIS for Windows Japanese 3 gb18030 gb18030_chinese_ci China National Standard GB18030 4 utf8mb4 utf8mb4_0900_ai_ci UTF-8 Unicode 4
be careful
When using SHOW CHARACTER SET to view, you can also add WHERE or LIKE qualifications.
Example 1: use the WHERE qualification.
mysql> SHOW CHARACTER SET WHERE Charset="utf8"; +---------+---------------+-------------------+--------+ | Charset | Description | Default collation | Maxlen | +---------+---------------+-------------------+--------+ | utf8 | UTF-8 Unicode | utf8_general_ci | 3 | +---------+---------------+-------------------+--------+ 1 row in set (0.00 sec)
Example 2: use the LIKE qualification.
mysql> SHOW CHARACTER SET LIKE "utf8%"; +---------+---------------+--------------------+--------+ | Charset | Description | Default collation | Maxlen | +---------+---------------+--------------------+--------+ | utf8 | UTF-8 Unicode | utf8_general_ci | 3 | | utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 | +---------+---------------+--------------------+--------+ 2 rows in set (0.00 sec)
View supported character order
Similarly, you can view the character order supported by MYSQL in the following ways.
Method 1: view through show collection.
You can see that the utf8 character set has more than 10 character orders. Judge whether the Default character order is determined by whether the Default value is Yes.
mysql> SHOW COLLATION WHERE Charset = 'utf8'; +--------------------------+---------+-----+---------+----------+---------+ | Collation | Charset | Id | Default | Compiled | Sortlen | +--------------------------+---------+-----+---------+----------+---------+ | utf8_general_ci | utf8 | 33 | Yes | Yes | 1 | | utf8_bin | utf8 | 83 | | Yes | 1 | ...slightly
Method 2: query information_schema.collections.
mysql> USE information_schema; mysql> SELECT * FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8"; +--------------------------+--------------------+-----+------------+-------------+---------+ | COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN | +--------------------------+--------------------+-----+------------+-------------+---------+ | utf8_general_ci | utf8 | 33 | Yes | Yes | 1 | | utf8_bin | utf8 | 83 | | Yes | 1 | | utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
Naming convention for character order
The name of character order is prefixed with its corresponding character set, as shown below. For example, character order utf8_general_ci indicates that it is the character order of character set utf8.
More rules can be referred to Official documents.
MariaDB [information_schema]> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8" limit 2; +--------------------+-----------------+ | CHARACTER_SET_NAME | COLLATION_NAME | +--------------------+-----------------+ | utf8 | utf8_general_ci | | utf8 | utf8_bin | +--------------------+-----------------+ 2 rows in set (0.00 sec)