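Elasticsearch 6.3 (ES): searching for a keyword without splitting it into tokens, similar to a MySQL LIKE clause.
The MySQL query looked like the one below. It relied heavily on LIKE and LOCATE for fuzzy matching, so a single query took more than 8 seconds. After moving the fuzzy matching to ES, the whole query completes within 1 second, more than 8 times faster.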
SELECT
ecc.id,
ecc.customer_code AS customerCode,
...,
ei.industry_name AS industryName
FROM
ec_cust_customer ecc
LEFT JOIN ec_industry ei ON ecc.CUSTOMER_TRADE = ei.id
LEFT JOIN ec_cust_follow_info ecfi ON ecc.id = ecfi.CUST_ID
AND ecfi.id = ( SELECT fi.id FROM ec_cust_follow_info fi WHERE fi.CUST_ID = ecc.id ORDER BY fi.id DESC LIMIT 1 )
LEFT JOIN ec_cust_linkman ecl ON ecl.cust_id = ecc.ID
AND ecl.id = ( SELECT ecll.id FROM ec_cust_linkman ecll WHERE ecll.CUST_ID = ecc.id ORDER BY ecll.id DESC LIMIT 1 )
WHERE
ecc.STATUS = 0
AND deleted = 0
AND (
ecc.customer_code IN (
SELECT
r.customer_code
FROM
ec_cust_change_records r
WHERE
LOCATE( '名称', r.project_name ) > 0
AND ( LOCATE( '铁城', r.before_content ) > 0 OR LOCATE( '铁城', r.after_content ) > 0 )
)
OR LOCATE( '铁城', ecc.customer_name ) > 0
)
ORDER BY
ecc.last_update_date DESC
LIMIT 200
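Optimization steps:
Because the query joins several tables and their data cannot be merged into a single index, the optimization combines ES and MySQL in the following steps.
1. Sync the two tables qdy_ecloud_common_db.ec_cust_customer and qdy_ecloud_common_db.ec_cust_change_records to ES indices.
2. When searching for customers, first match the customer name, and the customer name before and after a change, against these two ES indices.
3. Use the customer codes returned by step 2 to replace the LIKE or LOCATE conditions in the existing SQL query.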
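Step 3 can be sketched roughly as follows: once ES has returned the matching customer codes, the LOCATE conditions in the WHERE clause reduce to a plain IN list. This is only a minimal sketch. The function name build_customer_filter, the sample codes, and the %s placeholder style (as used by common Python MySQL drivers) are assumptions, not part of the original implementation.

```python
def build_customer_filter(customer_codes):
    """Return (sql_fragment, params) that replaces the LOCATE conditions.

    customer_codes: codes already matched in ES (hypothetical input).
    """
    # De-duplicate while keeping the order returned by ES.
    codes = list(dict.fromkeys(customer_codes))
    if not codes:
        # No ES hit: the filter must match nothing.
        return "AND 1 = 0", []
    placeholders = ", ".join(["%s"] * len(codes))
    return "AND ecc.customer_code IN (" + placeholders + ")", codes


fragment, params = build_customer_filter(["C001", "C002", "C001"])
print(fragment)
print(params)
```

Passing the codes as parameters rather than concatenating them into the SQL string keeps the statement safe against injection.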
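The Elasticsearch implementation and how it works are described below.
The query uses the match_phrase (phrase matching) mode. With it, a document whose afterContent contains only "张三" is not returned when searching for "张三丰".
The query in the standard query DSL: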
GET xyz_test/_search
{
"query":
{
"bool":
{
"must":
[
{
"match_phrase":
{ "afterContent": "张三丰"}
}
]
}
}
}
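The standard ES query DSL offers several matching methods:
1. match: a fuzzy, full-text match. The input is analyzed first. With the default standard analyzer (which the mapping below uses, since it sets no analyzer), the name "张三丰" is split into the single characters "张", "三" and "丰"; other analyzers, for example a Chinese word-segmentation plugin, may also emit longer tokens such as "张三", "三丰" and "张三丰". With the default OR operator, any document whose field contains at least one of these tokens is returned.
2. match_phrase: phrase matching. The input "张三丰" is still analyzed, but all of its terms must appear in the field next to each other and in the same order, so the keyword behaves as if it were not split.
To illustrate term order, take an English example.
For the query "see you" against the following three documents, only the first two are returned:
{"chat": "see you"}
{"chat": "My dear see you tomorrow"}
{"chat": "you see"}
3. term: exact matching. The input is not analyzed and must equal an indexed term exactly. In the "see you" example, a result is returned only if ES indexed "see you" as a single unsplit term (for example in a keyword field).
4. query_string: the input is parsed with the query string syntax. An unquoted "see you" is interpreted as see OR you by default, so term order does not matter and all three documents above are returned, including the third one, "you see".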
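The "see you" example can be reproduced with a toy model in plain Python. This is only an illustration of the matching rules, not how ES is implemented: the whitespace tokenizer and the three helper functions are simplifying assumptions.

```python
def tokenize(text):
    # Toy analyzer: lowercase and split on whitespace (an assumption;
    # real ES analyzers are more elaborate).
    return text.lower().split()


def match(doc, query):
    # match: one query token present in the document is enough.
    doc_tokens = set(tokenize(doc))
    return any(t in doc_tokens for t in tokenize(query))


def match_phrase(doc, query):
    # match_phrase: all query tokens, adjacent and in the same order.
    d, q = tokenize(doc), tokenize(query)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


def term(doc, query):
    # term on an unanalyzed (keyword) field: the whole value must be equal.
    return doc == query


docs = ["see you", "My dear see you tomorrow", "you see"]
for name, fn in [("match", match), ("match_phrase", match_phrase), ("term", term)]:
    print(name, [d for d in docs if fn(d, "see you")])
```

In this model match returns all three documents, match_phrase the first two, and term only the first. An unquoted query_string search behaves like match here, because it also accepts any one of the terms.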
Creating the index:
PUT xyz_test
{
"settings": {
"number_of_shards": 4,
"number_of_replicas": 1
}
}
Creating the mapping:
PUT xyz_test/_doc/_mapping
{
"dynamic_templates": [
{
"ident": {
"match_mapping_type": "string",
"match_pattern": "regex",
"match": ".*[iI]d$",
"mapping": {"type": "keyword"}
}
},
{
"full_text_search": {
"match_mapping_type": "string",
"mapping": {
"type": "text"
}
}
}
]
}
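The effect of the two dynamic templates can be checked with a small Python sketch. ES evaluates the pattern with Java regular expressions, but this particular pattern behaves the same way in Python. The helper name mapped_type and the sample field names other than afterContent are assumptions chosen for illustration.

```python
import re

ID_PATTERN = re.compile(r".*[iI]d$")  # same pattern as the "ident" template


def mapped_type(field_name):
    # Mirrors the template order: the "ident" rule is tried first,
    # every other string field falls through to "full_text_search".
    if ID_PATTERN.fullmatch(field_name):
        return "keyword"
    return "text"


for name in ["id", "custId", "customerCode", "afterContent"]:
    print(name, "->", mapped_type(name))
```

String fields whose names end in id or Id are indexed as keyword (exact values), while all other strings become analyzed text fields that match_phrase can search.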
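Java code implementing the non-splitting search:
private List<String> findCustCodeFES(String search) {
    logger.info("Fuzzy customer search in ES, search term={}", search);
    List<String> custCodeList = new ArrayList<>();
    try {
        SearchRequest searchRequest = new SearchRequest(esIndexConfig.getCustChangeRecords());
        // projectName must contain the phrase "名称"
        BoolQueryBuilder projectNameBuilder = QueryBuilders.boolQuery();
        projectNameBuilder.must(QueryBuilders.matchPhraseQuery("projectName", "名称"));
        // The search term must appear in the content before or after the change
        BoolQueryBuilder contentBuilder = QueryBuilders.boolQuery();
        contentBuilder.should(QueryBuilders.matchPhraseQuery("beforeContent", search));
        contentBuilder.should(QueryBuilders.matchPhraseQuery("afterContent", search));
        // The two conditions are combined with AND
        projectNameBuilder.must(contentBuilder);
        // The original listing ends here. The rest is an assumed completion:
        // "restHighLevelClient" and the "customerCode" field are hypothetical names.
        searchRequest.source(new SearchSourceBuilder().query(projectNameBuilder));
        SearchResponse response = restHighLevelClient.search(searchRequest);
        for (SearchHit hit : response.getHits()) {
            custCodeList.add(String.valueOf(hit.getSourceAsMap().get("customerCode")));
        }
    } catch (IOException e) {
        logger.error("ES customer search failed", e);
    }
    return custCodeList;
}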
Fetching query results with Python:
from elasticsearch import Elasticsearch

# "ip..." is left elided as in the original; 9200 is the default ES HTTP port
es = Elasticsearch(["ip..."], http_auth=('username', 'pwd'), port=9200)
print(es.search(index='xyz_test', q='afterContent:积'))
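The bool query built in the Java method can also be written as a plain search body for the Python client. A minimal sketch: the helper name build_change_record_query is hypothetical, and the commented-out call assumes a reachable cluster and an index or alias holding the change records.

```python
def build_change_record_query(search):
    """Search body: projectName contains the phrase "名称" AND the search
    term appears as a phrase in beforeContent OR afterContent."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"projectName": "名称"}},
                    {
                        "bool": {
                            "should": [
                                {"match_phrase": {"beforeContent": search}},
                                {"match_phrase": {"afterContent": search}},
                            ]
                        }
                    },
                ]
            }
        }
    }


body = build_change_record_query("铁城")
print(body)
# With a connected client (assumption):
# es.search(index="ec_cust_change_records_alias", body=body)
```

Because the inner bool contains only should clauses, at least one of them has to match, which gives the OR semantics of the original SQL.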
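About the DTS sync: DTS does not support DDL changes on the source MySQL tables. After every DDL change, the DTS sync therefore has to be stopped and reconfigured so that it syncs into a new v2 index instead of the original v1 index. Once the new sync has finished, the ES alias is switched from v1 to v2 in a single request. For the online service, the only impact is that data stops updating for about 15 minutes; the switch itself takes milliseconds.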
POST /_aliases
{
"actions": [
{
"add": {
"alias": "ec_cust_change_records_alias",
"index": "ec_cust_change_records_v2"
}
},
{
"remove": {
"alias": "ec_cust_change_records_alias",
"index": "ec_cust_change_records_v1"
}
},
{
"add": {
"alias": "ec_cust_customer_alias",
"index": "ec_cust_customer_v2"
}
},
{
"remove": {
"alias": "ec_cust_customer_alias",
"index": "ec_cust_customer_v1"
}
}
]
}
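The request body above can be generated for any number of indices. The sketch below assumes the naming convention visible in the example (index name plus _v1 / _v2, alias name plus _alias); the helper name build_alias_swap is hypothetical.

```python
def build_alias_swap(names, old_suffix, new_suffix):
    """Build the body for POST /_aliases that moves each alias from
    <name>_<old_suffix> to <name>_<new_suffix> in one atomic request."""
    actions = []
    for name in names:
        alias = name + "_alias"
        actions.append({"add": {"alias": alias, "index": name + "_" + new_suffix}})
        actions.append({"remove": {"alias": alias, "index": name + "_" + old_suffix}})
    return {"actions": actions}


body = build_alias_swap(["ec_cust_change_records", "ec_cust_customer"], "v1", "v2")
print(body)
# With a connected client (assumption): es.indices.update_aliases(body=body)
```

All actions in one _aliases request are applied atomically, which is why readers of the alias never see a moment where it points at no index.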
Parts of this article are based on the following post:
https://www.cnblogs.com/buxizhizhoum/p/9874703.html