字符串类型字段末尾有空格匹配问题

现象:如果字段的值末尾有空格,那么在根据这个字段去查询时,会出现“不精确”的匹配,去掉了末尾的空格。

查了下MySQL的文档,原来是取决于MySQL的collation规则,当然也和MySQL的版本相关。

针对MySQL 5.7以及之前的版本(https://dev.mysql.com/doc/refman/5.7/en/char.html),是这样的:

All MySQL collations are of type PAD SPACE. This means that all CHAR, VARCHAR, and TEXT values are compared without regard to any trailing spaces. “Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant. This is true for all MySQL versions, and is not affected by the server SQL mode.

也就是说,在5.7版本以及之前的版本,在做查询比较时,CHAR,VARCHAR,TEXT类型的字段,会把末尾的空格去掉再比较,并且无法改变这个规则。如果一定要精确的匹配,可以使用“LIKE”而非“=”。

而针对MySQL 8以及以后版本(https://dev.mysql.com/doc/refman/8.0/en/char.html),则是这样的:

Most MySQL collations have a pad attribute of PAD SPACE. The exceptions are Unicode collations based on UCA 9.0.0 and higher, which have a pad attribute of NO PAD. To determine the pad attribute for a collation, use the INFORMATION_SCHEMA.COLLATIONS table, which has a PAD_ATTRIBUTE column.

在MySQL 8之后,部分collation可以支持选择NO PAD,也就是说可以选择不忽略末尾的空格再比较。

在MySQL的系统表(INFORMATION_SCHEMA.COLLATIONS)中,可以查询PAD_ATTRIBUTE列的值,来查看不同的collation对应的规则。

比如在MySQL 8的版本(5.7以及以下不存在PAD_ATTRIBUTE列,执行会报找不到列)中执行如下SQL:

SELECT COLLATION_NAME, PAD_ATTRIBUTE FROM INFORMATION_SCHEMA.COLLATIONS WHERE COLLATION_NAME LIKE 'utf8mb4_bin';

会显示两个结果:

+---------------------+---------------------+
| COLLATION_NAME   | PAD_ATTRIBUTE |
+---------------------+---------------------+
| utf8mb4_bin      | PAD SPACE |
+---------------------+---------------------+
| utf8mb4_0900_bin | NO PAD |
+---------------------+---------------------+

所以,可以通过修改字符集的COLLATION来实现对末尾空格处理逻辑。当然,采用LIKE匹配也可以实现精确匹配的效果。

由于处理末尾的规则不一样,所以当使用PAD SPACE规则时,如果该列有唯一性约束,那么通过末尾增加空格来区分两个值也是行不通的。

For those cases where trailing pad characters are stripped or comparisons ignore them, if a column has an index that requires unique values, inserting into the column values that differ only in number of trailing pad characters will result in a duplicate-key error. For example, if a table contains ‘a’, an attempt to store ‘a ‘ causes a duplicate-key error.

大小写匹配问题

这个其实也是跟库表的COLLATION相关的。按照COLLATION的命名规则(https://dev.mysql.com/doc/refman/5.7/en/charset-collation-names.html), 带有ci后缀的那些COLLATION是忽略大小写的(Case Insensitive)。

可以通过执行

SELECT * FROM INFORMATION_SCHEMA.COLLATIONS

查看字符集与其对应的COLLATON列表。