Why does technology sharing | check(col_name < > ') turn away spaces

Keywords: MySQL char

1. Problem description

I saw a blank question from my colleagues in the group two days ago. The general phenomenon is as follows:

mysql> select @@version;
+-----------+
| @@version |
+-----------+
| 8.0.25    |
+-----------+
1 row in set (0.00 sec)

mysql> create table t1(
    -> c1 int,
    -> c2 varchar(4) check(c2<>'')  #No spaces between single quotes
    -> )engine=innodb;
Query OK, 0 rows affected (0.21 sec)

mysql> insert into t1 select 1,'  ';  #Insert two spaces in the c2 field
ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated.

Check defines c2 < > '', and inserts a space into the c2 field to prompt that the check constraint is violated.

Why is' '(there is one or more spaces between single quotes) in the insert statement judged as'' (there is no space between single quotes), resulting in insertion failure?

2. Involving knowledge

2.1,Stored and Retrieved

https://dev.mysql.com/doc/refman/8.0/en/char.html

When CHAR values are stored, they are right-padded with spaces to the specified length. When CHAR values are retrieved, trailing spaces are removed unless the PAD_CHAR_TO_FULL_LENGTH SQL mode is enabled.

VARCHAR values are not padded when they are stored. Trailing spaces are retained when values are stored and retrieved, in conformance with standard SQL.

CHAR(N): when the number of characters inserted is less than N, it will add spaces to the right of the string until the total number of characters reaches n; When the query returns data, the space at the end of the string will be removed by default, unless SQL_MODE setting pad_ CHAR_ TO_ FULL_ Length (the manual shows 8.0.13 deprecated and 8.0.25 can still be used).

VARCHAR(N): when the number of inserted characters is less than N, it will not supplement spaces on the right side of the string, and the insert content will be stored intact; If there is a space on the right of the original string, the space will be retained during storage and query return.

2.2,Collation Pad Attribute

https://dev.mysql.com/doc/refman/8.0/en/char.html

Values in CHAR, VARCHAR, and TEXT columns are sorted and compared according to the character set collation assigned to the column.

MySQL collations have a pad attribute of PAD SPACE, other than Unicode collations based on UCA 9.0.0 and higher, which have a pad attribute of NO PAD.

For CHAR, VARCHAR and TEXT fields, the sorting and comparison operations depend on the Collation on the field. The Pad attribute of Collation controls the processing of spaces at the end of the string. You can use information_ In the schema.collections table, view the Pad attribute used by the collage:

mysql> select collation_name,pad_attribute from information_schema.collations;
+----------------------------+---------------+
| collation_name             | pad_attribute |
+----------------------------+---------------+
| armscii8_general_ci        | PAD SPACE     |
...
| utf8mb4_0900_bin           | NO PAD        |
+----------------------------+---------------+
272 rows in set (0.01 sec)

2.3,Trailing Space Handling in Comparisons

https://dev.mysql.com/doc/refman/8.0/en/charset-binary-collations.html#charset-binary-collations-trailing-space-comparisons

For nonbinary strings (CHAR, VARCHAR, and TEXT values), the string collation pad attribute determines treatment in comparisons of trailing spaces at the end of strings:

• For PAD SPACE collations, trailing spaces are insignificant in comparisons; strings are compared without regard to trailing spaces.

• NO PAD collations treat trailing spaces as significant in comparisons, like any other character.

"Comparison" in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant, regardless of collation.

PAD SPACE: space at the end of the string is ignored in sorting and comparison operations. NO PAD: in sorting and comparison operations, spaces at the end of the string are regarded as ordinary characters and cannot be ignored.

3. Problem solving

The following operations are based on MySQL 8.0.25 Community Edition

3.1. Viewing collations used in fields

mysql> show full fields in t1;
+-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| Field | Type       | Collation          | Null | Key | Default | Extra | Privileges                      | Comment |
+-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
| c1    | int        | NULL               | YES  |     | NULL    |       | select,insert,update,references |         |
| c2    | varchar(4) | utf8mb4_unicode_ci | YES  |     | NULL    |       | select,insert,update,references |         |
+-------+------------+--------------------+------+-----+---------+-------+---------------------------------+---------+
2 rows in set (0.00 sec)

The collision of c2 column is utf8mb4_unicode_ci.

3.2. View the Pad attribute of collision

mysql> select COLLATION_NAME,PAD_ATTRIBUTE from INFORMATION_SCHEMA.COLLATIONS where COLLATION_NAME in('utf8mb4_unicode_ci','utf8mb4_0900_ai_ci');
+--------------------+---------------+
| COLLATION_NAME     | PAD_ATTRIBUTE |
+--------------------+---------------+
| utf8mb4_0900_ai_ci | NO PAD        |
| utf8mb4_unicode_ci | PAD SPACE     |
+--------------------+---------------+
1 row in set (0.00 sec)

utf8mb4_ unicode_ The Pad attribute of Ci is PAD SPACE. It can be seen from 2.3 that c2 column ignores the space at the end of the string in sorting and comparison operations.

Therefore, during check comparison, the spaces in the inserted '' will be ignored. Obviously, ignoring the spaces conflicts with the check constraint, and the insertion fails.

mysql> SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci;
Query OK, 0 rows affected (0.00 sec)

mysql> select '  ' = '';
+--------+
| ' '='' |
+--------+
|      1 |
+--------+
1 row in set (0.00 sec)

3.3. How to make a check constraint take effect according to normal logic

The general rule here is that spaces are spaces and should not be ignored. Just change the c2 field to the collision of NO PAD, and the space can be inserted normally:

mysql> insert into t1 select 1,'  ';  #Insert two spaces in the c2 field
ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated.

mysql> alter table t1 modify c2 varchar(4) collate utf8mb4_0900_ai_ci;  #Modified to collage of NO PAD
Query OK, 0 rows affected (0.15 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into t1 select 1,'  ';  #Insert two spaces in the c2 field
Query OK, 1 row affected (0.12 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t1 select 1,'';  #There is no space between ''
ERROR 3819 (HY000): Check constraint 't1_chk_1' is violated.

mysql> select c1,c2,hex(c2) from t1;
+------+------+---------+
| c1   | c2   | hex(c2) |
+------+------+---------+
|    1 |      | 2020    |
+------+------+---------+
1 row in set (0.01 sec)

4. Expand

4.1. If c2 column is of CHAR type, is it the same as the previous question

Same. When sorting and comparing CHAR, VARCHAR and TEXT, the spaces at the end of the string are processed according to the Pad attribute of the column collision. At this time, the string used for comparison operation is the content of insert.

4.2 what is the expression form in WHERE condition

Create a new table and insert data

mysql> create table t3(
    -> c1 int,
    -> c2 char(4) collate utf8mb4_unicode_ci,
    -> c3 char(4) collate utf8mb4_0900_ai_ci,
    -> c4 varchar(4) collate utf8mb4_unicode_ci,
    -> c5 varchar(4) collate utf8mb4_0900_ai_ci
    -> )engine=innodb;
Query OK, 0 rows affected (0.29 sec)

mysql> insert into t3 select 1,'a','a','a','a';
Query OK, 1 row affected (0.09 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t3 select 2,'a ','a ','a ','a ';  #Each column contains 1 space
Query OK, 1 row affected (0.20 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t3 select 3,'a   ','a   ','a  ','a  ';  #There are 3 spaces in the first two columns and 2 spaces in the last two columns
Query OK, 1 row affected (0.17 sec)
Records: 1  Duplicates: 0  Warnings: 0

mysql> insert into t3 select 4,'a  ','a  ','a   ','a   ';  #There are 2 spaces in the first two columns and 3 spaces in the last two columns
Query OK, 1 row affected (0.14 sec)
Records: 1  Duplicates: 0  Warnings: 0

Observe the return result of WHERE condition. The return of char type is controlled by PAD_CHAR_TO_FULL_LENGTH impact (reference 2.1)

mysql> set sql_mode='';
Query OK, 0 rows affected (0.00 sec)

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c2='a';
+------+------+------+------+------+---------+---------+----------+----------+
| c1   | c2   | c3   | c4   | c5   | hex(c2) | hex(c3) | hex(c4)  | hex(c5)  |
+------+------+------+------+------+---------+---------+----------+----------+
|    1 | a    | a    | a    | a    | 61      | 61      | 61       | 61       |
|    2 | a    | a    | a    | a    | 61      | 61      | 6120     | 6120     |
|    3 | a    | a    | a    | a    | 61      | 61      | 612020   | 612020   |
|    4 | a    | a    | a    | a    | 61      | 61      | 61202020 | 61202020 |
+------+------+------+------+------+---------+---------+----------+----------+
4 rows in set (0.00 sec)

c2 char->The space at the end of the string is removed from the returned data

c2 utf8mb4_unicode_ci->PAD SPACE->Sort and compare operations, ignoring spaces at the end of the string

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c3='a';
+------+------+------+------+------+---------+---------+----------+----------+
| c1   | c2   | c3   | c4   | c5   | hex(c2) | hex(c3) | hex(c4)  | hex(c5)  |
+------+------+------+------+------+---------+---------+----------+----------+
|    1 | a    | a    | a    | a    | 61      | 61      | 61       | 61       |
|    2 | a    | a    | a    | a    | 61      | 61      | 6120     | 6120     |
|    3 | a    | a    | a    | a    | 61      | 61      | 612020   | 612020   |
|    4 | a    | a    | a    | a    | 61      | 61      | 61202020 | 61202020 |
+------+------+------+------+------+---------+---------+----------+----------+
4 rows in set (0.01 sec)

c3 char->The space at the end of the string is removed from the returned data

c3 utf8mb4_0900_ai_ci->NO PAD->Sorting and comparison operations, the spaces at the end of the string are regarded as ordinary characters

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c4='a';
+------+------+------+------+------+---------+---------+----------+----------+
| c1   | c2   | c3   | c4   | c5   | hex(c2) | hex(c3) | hex(c4)  | hex(c5)  |
+------+------+------+------+------+---------+---------+----------+----------+
|    1 | a    | a    | a    | a    | 61      | 61      | 61       | 61       |
|    2 | a    | a    | a    | a    | 61      | 61      | 6120     | 6120     |
|    3 | a    | a    | a    | a    | 61      | 61      | 612020   | 612020   |
|    4 | a    | a    | a    | a    | 61      | 61      | 61202020 | 61202020 |
+------+------+------+------+------+---------+---------+----------+----------+
4 rows in set (0.00 sec)

c4 varchar->Returns the space at the time of data retention insertion

c4 utf8mb4_unicode_ci->PAD SPACE->Sort and compare operations, ignoring spaces at the end of the string

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c5='a';
+------+------+------+------+------+---------+---------+---------+---------+
| c1   | c2   | c3   | c4   | c5   | hex(c2) | hex(c3) | hex(c4) | hex(c5) |
+------+------+------+------+------+---------+---------+---------+---------+
|    1 | a    | a    | a    | a    | 61      | 61      | 61      | 61      |
+------+------+------+------+------+---------+---------+---------+---------+
1 row in set (0.00 sec)

c5 varchar->Returns the space at the time of data retention insertion

c5 utf8mb4_0900_ai_ci->NO PAD->Sorting and comparison operations, the spaces at the end of the string are regarded as ordinary characters


mysql> set sql_mode='PAD_CHAR_TO_FULL_LENGTH';
Query OK, 0 rows affected, 1 warning (0.01 sec)

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c2='a';
+------+------+------+------+------+----------+----------+----------+----------+
| c1   | c2   | c3   | c4   | c5   | hex(c2)  | hex(c3)  | hex(c4)  | hex(c5)  |
+------+------+------+------+------+----------+----------+----------+----------+
|    1 | a    | a    | a    | a    | 61202020 | 61202020 | 61       | 61       |
|    2 | a    | a    | a    | a    | 61202020 | 61202020 | 6120     | 6120     |
|    3 | a    | a    | a    | a    | 61202020 | 61202020 | 612020   | 612020   |
|    4 | a    | a    | a    | a    | 61202020 | 61202020 | 61202020 | 61202020 |
+------+------+------+------+------+----------+----------+----------+----------+
4 rows in set (0.00 sec)

c2 char->PAD_CHAR_TO_FULL_LENGTH->Return the supplementary space to the right of the data string

c2 utf8mb4_unicode_ci->PAD SPACE->Sort and compare operations, ignoring spaces at the end of the string

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c3='a';
Empty set (0.00 sec)

c3 char->PAD_CHAR_TO_FULL_LENGTH->Return the supplementary space to the right of the data string

c3 utf8mb4_0900_ai_ci->NO PAD->Sorting and comparison operations, the spaces at the end of the string are regarded as ordinary characters

1~4 that 's ok c3 Column return values contain spaces, and c3 Columnar Collation yes NO PAD,Spaces at the end of the string cannot be ignored, where Filter cannot find record

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c4='a';
+------+------+------+------+------+----------+----------+----------+----------+
| c1   | c2   | c3   | c4   | c5   | hex(c2)  | hex(c3)  | hex(c4)  | hex(c5)  |
+------+------+------+------+------+----------+----------+----------+----------+
|    1 | a    | a    | a    | a    | 61202020 | 61202020 | 61       | 61       |
|    2 | a    | a    | a    | a    | 61202020 | 61202020 | 6120     | 6120     |
|    3 | a    | a    | a    | a    | 61202020 | 61202020 | 612020   | 612020   |
|    4 | a    | a    | a    | a    | 61202020 | 61202020 | 61202020 | 61202020 |
+------+------+------+------+------+----------+----------+----------+----------+
4 rows in set (0.00 sec)
c4 varchar->Returns the space at the time of data retention insertion
c4 utf8mb4_unicode_ci->PAD SPACE->Sort and compare operations, ignoring spaces at the end of the string

mysql> select *,hex(c2),hex(c3),hex(c4),hex(c5) from t3 where c5='a';
+------+------+------+------+------+----------+----------+---------+---------+
| c1   | c2   | c3   | c4   | c5   | hex(c2)  | hex(c3)  | hex(c4) | hex(c5) |
+------+------+------+------+------+----------+----------+---------+---------+
|    1 | a    | a    | a    | a    | 61202020 | 61202020 | 61      | 61      |
+------+------+------+------+------+----------+----------+---------+---------+
1 row in set (0.00 sec)

c5 varchar->Returns the space at the time of data retention insertion

c5 utf8mb4_0900_ai_ci->NO PAD->Sorting and comparison operations, the spaces at the end of the string are regarded as ordinary characters

At this time, the string used for comparison operation is the content of Retrieved. When char and VARCHAR return data, the space at the end of the string is handled differently, and PAD_CHAR_TO_FULL_LENGTH only affects char types.

4.3. Impact on unique index

https://dev.mysql.com/doc/refman/8.0/en/char.html

For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters results in a duplicate-key error. For example, if a table contains 'a', an attempt to store 'a ' causes a duplicate-key error.

If there is a unique index (single column, character type) and the number of spaces in the tail of the inserted data is different, a duplicate key error may be reported:

mysql> select c1,c4,c5,hex(c4),hex(c5) from t3;
+------+------+------+----------+----------+
| c1   | c4   | c5   | hex(c4)  | hex(c5)  |
+------+------+------+----------+----------+
|    1 | a    | a    | 61       | 61       |
|    2 | a    | a    | 6120     | 6120     |
|    3 | a    | a    | 612020   | 612020   |
|    4 | a    | a    | 61202020 | 61202020 |
+------+------+------+----------+----------+
4 rows in set (0.00 sec)

mysql> alter table t3 add unique(c4);
ERROR 1062 (23000): Duplicate entry 'a' for key 't3.c4'

mysql> alter table t3 add unique(c5);
Query OK, 0 rows affected (0.44 sec)
Records: 0  Duplicates: 0  Warnings: 0

You can see that c4 column failed to create a unique index, and c5 column succeeded in creating a unique index.

c4 utf8mb4_ unicode_ CI - > pad space - > sorting and comparison operations, ignoring the spaces at the end of the string, and 4 lines of data are repeated.

c5 utf8mb4_ 0900_ ai_ CI - > no pad - > sorting and comparison operations, the spaces at the end of the string are regarded as ordinary characters, and the four lines of data are different.

5. Summary

Stored

-CHAR(N)VARCHAR(N)
StoredIf the character is less than N, fill in the space on the rightKeep the space at the time of insertion, and no additional space will be added on the right

Retrieved

SQL_MODECHAR(N)VARCHAR(N)
Default ValueRemove the spaces at the end of the stringKeep spaces when inserting
PAD_CHAR_TO_FULL_LENGTHReturn the complete string, and fill in the space on the right when it is less than nKeep spaces when inserting

Comparison (excluding like)

|Pad attribute | char (n) / varchar (n) | -- | -- | -- pad space | ignore the space at the end of the string | NO PAD | the space at the end of the string is regarded as a normal character and cannot be ignored|

Enjoy GreatSQL :)

Article recommendation:

Technology sharing | MGR best practice https://mp.weixin.qq.com/s/66u5K7a9u8GcE2KPn4kCaA

Technology sharing | the way to repair MGR Bug in Wanli database https://mp.weixin.qq.com/s/IavpeP93haOKVBt7eO8luQ

Macos system compiler percona and the difference of some functions on Macos system https://mp.weixin.qq.com/s/jAbwicbRc1nQ0f2cIa_2nQ

Technology sharing | using systemd to manage single machine and multiple instances of MySQL https://mp.weixin.qq.com/s/iJjXwd0z1a6isUJtuAAHtQ

Product | GreatSQL to create a better MGR ecology https://mp.weixin.qq.com/s/ByAjPOwHIwEPFtwC5jA28Q

Product | GreatSQL MGR optimization reference https://mp.weixin.qq.com/s/5mL_ERRIjpdOuONian8_Ow

About GreatSQL

GreatSQL is a MySQL branch maintained by Wanli database. It focuses on improving the reliability and performance of MGR and supports the parallel query feature of InnoDB. It is a branch version of MySQL suitable for financial applications.

Gitee: https://gitee.com/GreatSQL/GreatSQL

GitHub: https://github.com/GreatSQL/GreatSQL

Wechat & QQ group:

You can scan code to add GreatSQL community assistant wechat friends, send verification information "add group" to join GreatSQL/MGR exchange wechat group, or directly scan code to join GreatSQL/MGR exchange QQ group.

This article is composed of blog one article multi posting platform OpenWrite release!

Posted by aaronrb on Thu, 18 Nov 2021 21:55:04 -0800