MySQL Character Set Setting Details

Keywords: MySQL Database encoding MariaDB

1. Overview of Contents

During the use of MySQL, it is important to understand the concepts of character set, character order, and the impact of different settings on data storage and comparison.The problem of "random code" that many students encounter in their daily work is probably caused by the inadequate understanding of character set and character order and incorrect setting.

From shallow to deep, this paper introduces the following:

  1. Basic Concepts and Relations of Character Set and Character Order
  2. Character Set, Character Order Setting Level, Relations between Setting Levels Supported by MySQL
  3. Viewing and setting of server, database, table, column character set and character order
  4. When should the character set, character order be set

2. Concepts and Relations of Character Set and Character Order

MySQL provides different character set support for data storage.Different character order support is provided for data comparison.

MySQL provides different levels of settings, including server, database, table, and column levels, which can provide very precise settings.

What is character set, character order?Simply put:

  1. character set: Defines the character and its encoding.
  2. collation: Defines the rules for comparing characters.

For instance:

There are four characters: A, B, a, B. The four characters are coded as A = 0, B = 1, a = 2, b = 3.The character + encoding here constitutes a character set.

What if we want to compare two characters?For example, A, B, or a, b, the most intuitive way to compare is to use their encoding, such as A < B because 0 < 1.

In addition, for A and a, although they are coded differently, we feel that the upper and lower case characters should be equal, that is, A == a.

This defines two comparison rules, the collection of which is collation.

  1. Also uppercase and lowercase characters, compare their encoding size;
  2. If the two characters are case-sensitive, they are equal.

3. Character set and character order supported by MySQL

MySQL supports multiple character sets and character orders.

  1. A character set corresponds to at least one character order (typically 1-to-many).
  2. Two different character sets cannot have the same character order.
  3. Each character set has a default character order.

The above is abstract. Let's look at the next few sections to see what's going on.

1. View supported character sets

You can view the character sets supported by MYSQL in the following ways.

Mode 1:

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
...ellipsis

Mode 2:

mysql> use information_schema;
mysql> select * from CHARACTER_SETS;
+--------------------+----------------------+-----------------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION                 | MAXLEN |
+--------------------+----------------------+-----------------------------+--------+
| big5               | big5_chinese_ci      | Big5 Traditional Chinese    |      2 |
| dec8               | dec8_swedish_ci      | DEC West European           |      1 |
...ellipsis

WHERE or LIKE qualifications can also be added when viewing using SHOW CHARACTER SET.

Example 1: Use WHERE restrictions.

mysql> SHOW CHARACTER SET WHERE Charset="utf8";
+---------+---------------+-------------------+--------+
| Charset | Description   | Default collation | Maxlen |
+---------+---------------+-------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci   |      3 |
+---------+---------------+-------------------+--------+
1 row in set (0.00 sec)

Example 2: Use LIKE qualification.

mysql> SHOW CHARACTER SET LIKE "utf8%";
+---------+---------------+--------------------+--------+
| Charset | Description   | Default collation  | Maxlen |
+---------+---------------+--------------------+--------+
| utf8    | UTF-8 Unicode | utf8_general_ci    |      3 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci |      4 |
+---------+---------------+--------------------+--------+
2 rows in set (0.00 sec)

2. View supported character order

Similarly, you can view the character order supported by MYSQL as follows.

Mode 1: View through SHOW COLLATION.

You can see that the utf8 character set has more than 10 character orders.Determine whether the default character order is determined by whether the Default value is Yes.

mysql> SHOW COLLATION WHERE Charset = 'utf8';
+--------------------------+---------+-----+---------+----------+---------+
| Collation                | Charset | Id  | Default | Compiled | Sortlen |
+--------------------------+---------+-----+---------+----------+---------+
| utf8_general_ci          | utf8    |  33 | Yes     | Yes      |       1 |
| utf8_bin                 | utf8    |  83 |         | Yes      |       1 |
...slightly

Mode 2: Query information_schema.COLLATIONS.

mysql> USE information_schema;
mysql> SELECT * FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8";
+--------------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME           | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------------+--------------------+-----+------------+-------------+---------+
| utf8_general_ci          | utf8               |  33 | Yes        | Yes         |       1 |
| utf8_bin                 | utf8               |  83 |            | Yes         |       1 |
| utf8_unicode_ci          | utf8               | 192 |            | Yes         |       8 |

3. Naming Specifications for Character Order

The name of the character order, prefixed by its corresponding character set, as shown below.For example, the character order utf8_general_ci indicates that it is the character order of the character set utf8.

More rules to refer to Official Documents.

MariaDB [information_schema]> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM COLLATIONS WHERE CHARACTER_SET_NAME="utf8" limit 2; 
+--------------------+-----------------+
| CHARACTER_SET_NAME | COLLATION_NAME  |
+--------------------+-----------------+
| utf8               | utf8_general_ci |
| utf8               | utf8_bin        |
+--------------------+-----------------+
2 rows in set (0.00 sec)

4. Character Set and Character Order of server

Purpose: When you create a database without specifying a character set, character order, server character set, server character order will be the default character set, sorting rules for the database.

How to specify: When the MySQL service starts, it can be specified with command line parameters.You can also specify it through a variable in the configuration file.

server default character set, character order: When compiling MySQL, specify by compilation parameters.

The character_set_server and collation_server correspond to the server character set and server character order, respectively.

1. View server character set, character order

These two system variables correspond to character_set_server and collation_server, respectively.

mysql> SHOW VARIABLES LIKE "character_set_server";
mysql> SHOW VARIABLES LIKE "collation_server";

2. Specify when starting a service

You can specify the server character set and character order when the MySQL service starts.If not specified, the default character order is latin1, latin1_swedish_ci

mysqld --character-set-server=latin1 \
       --collation-server=latin1_swedish_ci

Individually specify the server character set, in which case the server character order is latin1's default character order latin1_swedish_ci.

mysqld --character-set-server=latin1

3. Profile Specification

In addition to specifying it in command line parameters, it can also be specified in the configuration file, as shown below.

[client]
default-character-set=utf8

[mysql]
default-character-set=utf8

[mysqld]
collation-server = utf8_unicode_ci
init-connect='SET NAMES utf8'
character-set-server = utf8

4. Runtime modifications

Example: Runtime modifications (they will fail after restart and need to be written into the configuration file if you want them to remain the same after restart)

mysql> SET character_set_server = utf8 ;

5. Specify default character set, character order at compile time

Default values for character_set_server, collation_server can be specified at MySQL compilation time through compilation options:

cmake . -DDEFAULT_CHARSET=latin1 \
           -DDEFAULT_COLLATION=latin1_german1_ci

5. Character set and character order of database

Purpose: Specify the character set, character order at the database level.You can specify different character sets/character orders for databases under the same MySQL service.

1. Set the character set/character order of the data

You can specify the character set and collation of the database through CHARACTER SET and COLLATE when you create or modify the database.

Create a database:

CREATE DATABASE db_name
    [[DEFAULT] CHARACTER SET charset_name]
    [[DEFAULT] COLLATE collation_name]

Modify the database:

ALTER DATABASE db_name
    [[DEFAULT] CHARACTER SET charset_name]
    [[DEFAULT] COLLATE collation_name]

Example: Create a database test_schema with the character set set set set to utf8, where the default collation rule is utf8_general_ci.

CREATE DATABASE `test_schema` DEFAULT CHARACTER SET utf8;

2. View the character set/character order of the database

There are three ways to view the character set/character order of a database.

Example 1: View the character set and collation of test_schema.(You need to switch the default database)

mysql> use test_schema;
Database changed
mysql> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8                     | utf8_general_ci      |
+--------------------------+----------------------+
1 row in set (0.00 sec)

Example 2: You can also view the character set and database of test_schema by following the command (you do not need to switch the default database)

mysql> SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME  FROM information_schema.SCHEMATA WHERE schema_name="test_schema";
+-------------+----------------------------+------------------------+
| SCHEMA_NAME | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME |
+-------------+----------------------------+------------------------+
| test_schema | utf8                       | utf8_general_ci        |
+-------------+----------------------------+------------------------+
1 row in set (0.00 sec)

Example 3: You can also view the character set by looking at the statement that created the database.

mysql> SHOW CREATE DATABASE test_schema;
+-------------+----------------------------------------------------------------------+
| Database    | Create Database                                                      |
+-------------+----------------------------------------------------------------------+
| test_schema | CREATE DATABASE `test_schema` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+-------------+----------------------------------------------------------------------+
1 row in set (0.00 sec)

3. How to determine the character set and character order of database

  • When a database is created, CHARACTER SET or COLLATE is specified, whichever character set or collation corresponds.
  • When creating a database, if no character set or collation is specified, then character_set_server or collation_server will prevail.

6. Character set and character order of table s

The syntax for creating and modifying tables is as follows. Character set and character order can be set through CHARACTER SET and COLLATE.

CREATE TABLE tbl_name (column_list)
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]]

ALTER TABLE tbl_name
    [[DEFAULT] CHARACTER SET charset_name]
    [COLLATE collation_name]

1. Create a table and specify the character set/character order

As an example, utf8 is specified as the character set, and the character order is the default.

CREATE TABLE `test_schema`.`test_table` (
  `id` INT NOT NULL COMMENT '',
  PRIMARY KEY (`id`)  COMMENT '')
DEFAULT CHARACTER SET = utf8;

2. View the character set/character order of the table

Similarly, there are three ways to view the character set/character order of a table.

Mode 1: Check the table status through SHOW TABLE STATUS, noting that Collation is utf8_general_ci and the corresponding character set is utf8.

MariaDB [blog]> SHOW TABLE STATUS FROM test_schema \G;
*************************** 1. row ***************************
           Name: test_table
         Engine: InnoDB
        Version: 10
     Row_format: Compact
           Rows: 0
 Avg_row_length: 0
    Data_length: 16384
Max_data_length: 0
   Index_length: 0
      Data_free: 11534336
 Auto_increment: NULL
    Create_time: 2018-01-09 16:10:42
    Update_time: NULL
     Check_time: NULL
      Collation: utf8_general_ci
       Checksum: NULL
 Create_options: 
        Comment: 
1 row in set (0.00 sec)

Mode 2: View information_schema.TABLES.

mysql> USE test_schema;
mysql> SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA = "test_schema" AND TABLE_NAME = "test_table";
+-----------------+
| TABLE_COLLATION |
+-----------------+
| utf8_general_ci |
+-----------------+

Mode 3: Confirm by SHOW CREATE TABLE.

mysql> SHOW CREATE TABLE test_table;
+------------+----------------------------------------------------------------------------------------------------------------+
| Table      | Create Table                                                                                                   |
+------------+----------------------------------------------------------------------------------------------------------------+
| test_table | CREATE TABLE `test_table` (
  `id` int(11) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+------------+----------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

3. How to determine table character set and character order

Assume that the values of CHARACTER SET and COLLATE are charset_name and collation_name, respectively.If you create a table:

  • If charset_name and collation_name are specified, then charset_name and collation_name are used.
  • Only charset_name is specified, but collation_name is not, then charset_name is used in the character set and the character order is the default character order corresponding to charset_name.
  • Only collation_name is specified, but charset_name is not, then the character order is collation_name, and the character set is the character set associated with collation_name.
  • If charset_name and collation_name are not clear, the character set and character order settings of the database are used.

7. Character set and sorting of column

Columns of type CHAR, VARCHAR, TEXT can specify character set/character order with the following syntax:

col_name {CHAR | VARCHAR | TEXT} (col_length)
    [CHARACTER SET charset_name]
    [COLLATE collation_name]

1. Add column and specify character set/collation

Examples are as follows: (create table s like this)

mysql> ALTER TABLE test_table ADD COLUMN char_column VARCHAR(25) CHARACTER SET utf8;

2. View the character set/character order of column

Examples are as follows:

mysql> SELECT CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS WHERE TABLE_SCHEMA="test_schema" AND TABLE_NAME="test_table" AND COLUMN_NAME="char_column";
+--------------------+-----------------+
| CHARACTER_SET_NAME | COLLATION_NAME  |
+--------------------+-----------------+
| utf8               | utf8_general_ci |
+--------------------+-----------------+
1 row in set (0.00 sec)

3. column Character Set/Collation Determination

Assume that the values of CHARACTER SET and COLLATE are charset_name, collation_name, respectively:

  • If both charset_name and collation_name are explicit, the character set and character order are charset_name and collation_name.
  • If only charset_name is specified and collation_name is not, the character set is charset_name and the character order is the default character order of charset_name.
  • Only collation_name is specified, and charset_name is not, then the character order is collation_name and the character set is the character set associated with collation_name.
  • If charset_name and collation_name are not clear, the character set and character order of the table shall prevail.

8. Selection: When to set the character set and character order

In general, you can configure it in three places:

  1. Configure when creating the database.
  2. mysql server is configured when it starts.
  3. When compiling mysql from source code, configure it by compiling parameters

1. Mode 1: Configure the database when it is created

This method is flexible and insurable and does not depend on the default character set/character order.When you create a database, you specify the character set/character order. When you subsequently create a table or column, you inherit the character set/character order of the corresponding database if you do not specify it specifically.

CREATE DATABASE mydb
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;

2. Mode 2: Configure mysql server at startup

You can add the following configuration so that when mysql server starts, it configures character-set-server, collation-server.

When you create a database/table/column with mysql client without displaying the declarative character set/character order, character-set-server/collation-server is used as the default character set/character order.

In addition, the character set/character order for client and server connections needs to be set through SET NAMES.

[mysqld]
character-set-server=utf8
collation-server=utf8_general_ci

3. Mode 3: When compiling mysql from source code, set it by compiling parameters

If -DDEFAULT_CHARSET and -DDEFAULT_COLLATION are specified at compile time, then:

  • When a database or table is created, it is used as the default character set/character order.
  • Clients connect to server s as the default character set/character order.(No separate SET NAMES)
shell> cmake . -DDEFAULT_CHARSET=utf8 \
           -DDEFAULT_COLLATION=utf8_general_ci

9. Write after

This paper describes in detail the character set and character order related content in MySQL, which is mainly for data storage and comparison.In fact, there is an important part of the content has not been involved: for the connected character set, character order settings.

Posted by FarhanKhalaf on Fri, 17 May 2019 15:14:30 -0700