MySQL 优化逗号分隔符字段查询

最近在优化旧业务时，发现一个慢查询的问题，单表数据量190w，当查询某个包含逗号分隔符字段时，该字段有添加二级索引，因为是 VARCAHR 类型逗号分隔的值（ 100,300,32455,1435 ），所有检索起来只能 LIKE %,100,%，或者使用 FIND_IN_SET，但是这两种方式都会全表扫描。

fulltext 全文索引

之前一直知道有这么一个东西，但是基本没有使用过，感谢这个作者的博客^1，对我的帮助。

首先我们需要对该字段创建 fulltext 索引

1	ALTER TABLE `mall_db`.`t_recruit_link` ADD FULLTEXT INDEX fulltext_link_chain(link_chain);

然后我们就可以使用 MATCH 和 AGAINST 进行查询了：

1	SELECT COUNT(1) FROM t_recruit_link WHERE MATCH(link_chain) AGAINST(',1235,3232,')

MATCH AGAINST

全文索引的默认分隔符是标点符号和 stopwords，其中前者正是我们需要的特性。全文索引按照逗号将 MATCH 和 ASGAINST 里的字符串做分割，然后进行匹配。（其实可以看作是分词，只不过它是按照逗号分割的）

stopwords

stopwords 是英文中的一些无意义词，搜索的时候不需要它们，类似汉语中的助词等等。但在我们的使用中显然不是用来做搜索的，因此可以在 my.cnf 文件里，加上ft_stopword_file=’’来禁用它

查询 innodb 默认停词表：

1	SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;

查询出来有 36 条记录

a
about
an
are
as
at
be
by
com
de
en
for
from
how
i
in
is
it
la
of
on
or
that
the
this
to
was
what
when
where
who
will
with
und
the
www

失效情况

必须对字段建立 fulltext 索引。
每个被标点符号分隔的 word 长度必须在 3 个字符以上，如果太短，会被自动忽略掉，这个时候可以考虑让 id 从一个比较大的值开始自增，比如 1000，这样就够长了。
匹配到了 stopwords，比如你的字段是这样的 “hello,what”，what 是 MYSQL 的一个默认的 stopwords，它会被自动忽略。

查看全文索引相关变量：

1	show variables like '%ft%';

Variable	Value
ft_boolean_syntax	+ -><()~*:””&\|
ft_max_word_len	84
ft_min_word_len	4
ft_query_expansion_limit	20
ft_stopword_file	(built-in)
innodb_ft_aux_table
innodb_ft_cache_size	8000000
innodb_ft_enable_diag_print	OFF
innodb_ft_enable_stopword	ON
innodb_ft_max_token_size	84
innodb_ft_min_token_size	3
innodb_ft_num_word_optimize	2000
innodb_ft_result_cache_limit	2000000000
innodb_ft_server_stopword_table
innodb_ft_sort_pll_degree	2
innodb_ft_total_cache_size	640000000
innodb_ft_user_stopword_table

MySQL 优化逗号分隔符字段查询

MySQL 优化逗号分隔符字段查询

fulltext 全文索引

MATCH AGAINST

stopwords

失效情况

参考资料