Elasticsearch 6.3 (ES): searching for a person's name as a keyword without splitting it into tokens

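Elasticsearch 6.3 (ES) keyword search without token splitting, similar to a MySQL LIKE statement.
The MySQL query looked like the one below. It relies heavily on LIKE and LOCATE for fuzzy matching, so a single query took more than 8 seconds. After the ES optimization the whole query completes within 1 second, more than 8 times faster.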

SELECT
	ecc.id,
	ecc.customer_code AS customerCode,
	...,
	ei.industry_name AS industryName 
FROM
	ec_cust_customer ecc
	LEFT JOIN ec_industry ei ON ecc.CUSTOMER_TRADE = ei.id
	LEFT JOIN ec_cust_follow_info ecfi ON ecc.id = ecfi.CUST_ID 
	AND ecfi.id = ( SELECT fi.id FROM ec_cust_follow_info fi WHERE fi.CUST_ID = ecc.id ORDER BY fi.id DESC LIMIT 1 )
	LEFT JOIN ec_cust_linkman ecl ON ecl.cust_id = ecc.ID 
	AND ecl.id = ( SELECT ecll.id FROM ec_cust_linkman ecll WHERE ecll.CUST_ID = ecc.id ORDER BY ecll.id DESC LIMIT 1 ) 
WHERE
	ecc.STATUS = 0 
	AND deleted = 0 
	AND (
		ecc.customer_code IN (
		SELECT
			r.customer_code 
		FROM
			ec_cust_change_records r 
		WHERE
			LOCATE( '名称', r.project_name ) > 0 
			AND ( LOCATE( '铁城', r.before_content ) > 0 OR LOCATE( '铁城', r.after_content ) > 0 ) 
		) 
		OR LOCATE( '铁城', ecc.customer_name ) > 0 
	) 
ORDER BY
	ecc.last_update_date DESC 
	LIMIT 200

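Optimization steps:
Because several tables are joined and their data cannot be consolidated into a single index, the optimization combines ES and MySQL in the following steps.
1. Sync the two tables qdy_ecloud_common_db.ec_cust_customer and qdy_ecloud_common_db.ec_cust_change_records into ES indexes.
2. When searching for a customer, first match the customer name and the before/after-change customer names against these two ES indexes.
3. Use the customer codes obtained in step 2 to replace the LIKE or LOCATE conditions in the existing SQL.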
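
Step 3 can be sketched roughly as follows: the customer codes returned by ES become a parameterized IN list that stands in for the LOCATE conditions. This is only a minimal sketch. The helper name `build_customer_filter`, the `%s` placeholder style (as used by common MySQL drivers), and the sample codes are assumptions, not part of the original system.

```python
def build_customer_filter(customer_codes):
    """Build a WHERE fragment and its parameters from codes found in ES.

    Hypothetical helper: it replaces the LOCATE/LIKE conditions of the
    original SQL with an IN list over customer codes.
    """
    # De-duplicate while keeping the order in which ES returned the codes.
    codes = list(dict.fromkeys(customer_codes))
    if not codes:
        # No ES hit means no customer can match: emit an always-false condition.
        return "1 = 0", []
    placeholders = ", ".join(["%s"] * len(codes))
    return "ecc.customer_code IN ({})".format(placeholders), codes


# Sample codes are made up for illustration.
fragment, params = build_customer_filter(["C001", "C002", "C001"])
print(fragment)  # ecc.customer_code IN (%s, %s)
print(params)    # ['C001', 'C002']
```

Passing the codes as bound parameters rather than concatenating them into the SQL string avoids quoting problems and SQL injection.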

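The ES implementation and how it works:
The query uses match_phrase (phrase matching), written below in the standard query DSL.
If afterContent only contains "张三", a search for "张三丰" does not return that document.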

GET xyz_test/_search
{
    "query": 
    {
        "bool": 
        {
        "must": 
        [
            {
                "match_phrase": 
                { "afterContent": "张三丰"}
             }
        ]
        }
     }
}

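The standard ES query DSL offers the following matching methods:
1. match: a full-text (fuzzy) query. The input is analyzed first. For the name "张三丰", the default standard analyzer splits Chinese text into the single characters "张", "三", "丰"; other analyzers may also produce multi-character tokens such as "张三", "三丰", "张三丰". With the default OR operator, any document whose field contains at least one of these tokens is returned.
2. match_phrase: phrase matching. The input is still analyzed, but all resulting terms must appear in the field, adjacent and in the same order, so a search for the name "张三丰" behaves like a search for the unsplit string.
To illustrate the term-order requirement, take an English example.
For the query "see you" and the three documents below, only the first two are returned:
{"chat":"see you"}
{"chat":"My dear see you tomorrow"}
{"chat":"you see"}
3. term: exact query. The input is not analyzed and must exactly equal a term stored in the index. In the "see you" example above, it only matches if ES indexed "see you" as one unsplit term (for example in a keyword field).
4. query_string: parses the input with the query string syntax. An unquoted "see you" is treated as see OR you by default, so term order does not matter. The query above returns all three documents, including the third one, "you see".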
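
The difference between match, match_phrase and term on the three example documents can be imitated in plain Python. This is only a simplified model under stated assumptions: whitespace splitting with lowercasing stands in for the analyzer, and the function names are made up for illustration. It does not call ES.

```python
def tokens(text):
    # Whitespace tokenization with lowercasing: a rough stand-in for an analyzer.
    return text.lower().split()


def match(query, field):
    # match with the default OR operator: any query token occurs in the field.
    field_tokens = set(tokens(field))
    return any(t in field_tokens for t in tokens(query))


def match_phrase(query, field):
    # match_phrase: all query tokens occur adjacent and in the same order.
    q, f = tokens(query), tokens(field)
    return any(f[i:i + len(q)] == q for i in range(len(f) - len(q) + 1))


def term(query, field):
    # term: the query is not analyzed; it must equal one indexed token as-is.
    return query in tokens(field)


docs = ["see you", "My dear see you tomorrow", "you see"]
print([d for d in docs if match("see you", d)])         # all three documents
print([d for d in docs if match_phrase("see you", d)])  # the first two documents
print([d for d in docs if term("see you", d)])          # [] because the field was split
```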

Creating the index:

PUT xyz_test
{
    "settings": {
        "number_of_shards":  4,
        "number_of_replicas": 1
    }
}

Creating the mapping:

PUT xyz_test/_doc/_mapping
{
    "dynamic_templates": [
		{
			"ident": {
				"match_mapping_type": "string",
				"match_pattern": "regex",
				"match": ".*[iI]d$",
				"mapping": {"type": "keyword"}
			}
		},
		{
    		"full_text_search": {
                "match_mapping_type": "string",
                "mapping": {
                	"type": "text"
                 }
            }
        }
	]
}
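
To see which string-valued fields the two dynamic templates above would map to keyword and which to text, the regex can be tried in plain Python. This is a sketch under assumptions: `re.fullmatch` stands in for the whole-name regex match that ES applies, the helper name `dynamic_type` is made up, and all field names except afterContent are hypothetical.

```python
import re

# Same pattern as the "ident" template: names ending in "id" or "Id".
ID_PATTERN = re.compile(r".*[iI]d$")


def dynamic_type(field_name):
    # Templates are tried in order and the first matching one wins,
    # so the "ident" template is checked before "full_text_search".
    if ID_PATTERN.fullmatch(field_name):
        return "keyword"
    return "text"


for name in ["id", "custId", "customer_id", "afterContent", "idCard"]:
    print(name, dynamic_type(name))
```

Fields mapped to keyword are indexed as a single term, which suits identifiers. Fields mapped to text are analyzed, which is what match_phrase needs.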

Java code for the query without token splitting:

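private List<String> findCustCodeFES(String search) {
	logger.info("Fuzzy customer search via ES, search term={}", search);
	List<String> custCodeList = new ArrayList<>();
	try {
		SearchRequest searchRequest = new SearchRequest(esIndexConfig.getCustChangeRecords());

		BoolQueryBuilder projectNameBuilder = QueryBuilders.boolQuery();
		projectNameBuilder.must(QueryBuilders.matchPhraseQuery("projectName", "名称"));
		// Match the search term in the content before or after the change
		BoolQueryBuilder contentBuilder = QueryBuilders.boolQuery();
		contentBuilder.should(QueryBuilders.matchPhraseQuery("beforeContent", search));
		contentBuilder.should(QueryBuilders.matchPhraseQuery("afterContent", search));
		// The two conditions are combined with AND
		projectNameBuilder.must(contentBuilder);

		// The original listing is cut off here. Everything below is an assumed completion:
		// restHighLevelClient and the "customerCode" source field are hypothetical names.
		searchRequest.source(new SearchSourceBuilder().query(projectNameBuilder));
		SearchResponse response = restHighLevelClient.search(searchRequest);
		for (SearchHit hit : response.getHits().getHits()) {
			custCodeList.add(String.valueOf(hit.getSourceAsMap().get("customerCode")));
		}
	} catch (IOException e) {
		logger.error("ES customer search failed", e);
	}
	return custCodeList;
}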

Fetching query results with Python:

from elasticsearch import Elasticsearch
es = Elasticsearch(["ip..."], http_auth=('username', 'pwd'), port=9200)
print(es.search(index='xyz_test', q='afterContent:积'))
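
The bool/match_phrase query built in the Java code can also be expressed as a request body in Python. This is a minimal sketch: the function name `build_change_record_query` is an assumption, and the snippet only builds and prints the body, so it runs without a cluster.

```python
import json


def build_change_record_query(search):
    """Return a search body with the same shape as the Java bool query."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match_phrase": {"projectName": "名称"}},
                    {
                        # A bool with only should clauses requires at least
                        # one of them to match, i.e. before OR after content.
                        "bool": {
                            "should": [
                                {"match_phrase": {"beforeContent": search}},
                                {"match_phrase": {"afterContent": search}},
                            ]
                        }
                    },
                ]
            }
        }
    }


body = build_change_record_query("铁城")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

With a connected client, the body could be sent with something like es.search(index='ec_cust_change_records_alias', body=body). The index name here is taken from the alias example in this article and may differ in your setup.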

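Data is synced with DTS, which does not support DDL changes on the source MySQL tables. After every DDL change, therefore, the DTS sync has to be stopped and reconfigured to write into a new index version (from v1 to v2). Once the new sync has completed, the ES alias is switched from v1 to v2 in a single step. The impact on online business is limited to roughly 15 minutes during which data is not updated; the switch itself takes only milliseconds.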

POST /_aliases
{
  "actions": [
    {
      "add": {
        "alias": "ec_cust_change_records_alias",
        "index": "ec_cust_change_records_v2"
      }
    },
    {
      "remove": {
        "alias": "ec_cust_change_records_alias",
        "index": "ec_cust_change_records_v1"
      }
    },
    {
      "add": {
        "alias": "ec_cust_customer_alias",
        "index": "ec_cust_customer_v2"
      }
    },
    {
      "remove": {
        "alias": "ec_cust_customer_alias",
        "index": "ec_cust_customer_v1"
      }
    }
  ]
}
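
When several aliases have to be switched at once, the request body above can be generated instead of written by hand. This is a sketch under assumptions: the helper name `alias_swap_actions` and the `_v1`/`_v2` suffix convention are illustrative, and the snippet only builds the body.

```python
def alias_swap_actions(aliases, old_suffix="_v1", new_suffix="_v2"):
    """Build an _aliases body that moves each alias from the old to the new index.

    `aliases` is a list of (alias_name, index_base_name) pairs.
    """
    actions = []
    for alias, base in aliases:
        actions.append({"add": {"alias": alias, "index": base + new_suffix}})
        actions.append({"remove": {"alias": alias, "index": base + old_suffix}})
    return {"actions": actions}


body = alias_swap_actions([
    ("ec_cust_change_records_alias", "ec_cust_change_records"),
    ("ec_cust_customer_alias", "ec_cust_customer"),
])
print(body)
```

All actions in one _aliases request are applied atomically, which is why the switch is not visible to readers as an intermediate state. With the Python client the body could be sent via es.indices.update_aliases(body=body).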

Parts of this article draw on the following post:
https://www.cnblogs.com/buxizhizhoum/p/9874703.html
