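1.1 What is Elasticsearch
Elasticsearch is a search server built on Lucene. It provides a distributed, multi-tenant full-text search engine exposed through a RESTful web interface. Elasticsearch is written in Java and was originally released as open source under the Apache License; it is a popular enterprise search engine. It is widely used in cloud environments and offers near-real-time search that is stable, reliable, fast, and easy to install and use.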
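1.2 Common Elasticsearch terms
Document: a document is the smallest unit of data that can be indexed and searched. It can be thought of as one row of a database table, with two differences: a document is hierarchical (a document can contain other documents, and a field can contain further fields and values; think of JSON data) and schema-free (not all documents need to have the same fields).
Type: a type is a logical container for documents, comparable to a table in a database. The definition of the fields in a type is called a mapping; for example, a name field can be mapped as text or keyword (the string type of early versions). Mapping types are deprecated in 7.x, and the index JSON later in this article is typeless.
Index: an index is a container for mapping types, comparable to a database in a relational system; it is an independent collection of a large number of documents.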
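Node: a node is one Elasticsearch instance. Starting Elasticsearch on a server gives you one node; starting it on another server gives you a second node. Several nodes can also run on the same server by starting several Elasticsearch processes. For an application that uses Elasticsearch, it is transparent whether the cluster has one node or many: by default it can connect to any node in the cluster and access the complete data set, as if the cluster consisted of a single node.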
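Shard: a shard is the smallest unit Elasticsearch works with. Each shard is a Lucene index: a directory of files containing an inverted index. The inverted index lets Elasticsearch tell you which documents contain a given term (word) without scanning every document; it records which documents each term appears in, at which positions, and how many times.
Shards are either primary shards or replica shards. Replicas improve search performance and fault tolerance: Elasticsearch load-balances search requests across the primary and replica copies of an index, and when a primary shard is lost one of its replicas is promoted to become the new primary. Replicas can be added and removed at runtime, but primaries cannot: the number of primary shards must be decided when the index is created. Too few shards limit scalability; too many hurt performance.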
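Horizontal and vertical scaling: horizontal scaling means adding more nodes to the cluster, usually by starting Elasticsearch instances on additional servers. Vertical scaling means giving the existing nodes more hardware resources, such as assigning more processors to a virtual machine, or adding more memory, more CPUs, or faster disks to a physical machine.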
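The inverted index described under "Shard" above can be illustrated with a toy version in plain Java. This is only a sketch under simplifying assumptions: the class name TinyInvertedIndex is made up for illustration, tokens are split on whitespace, and nothing here reflects how Lucene actually stores its data on disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: term -> (document id -> positions of the term in that document).
// Whitespace tokenization is a simplification; Lucene uses configurable analyzers.
class TinyInvertedIndex {
    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Tokenize the text and record, for every term, where it occurs in this document.
    void add(int docId, String text) {
        String[] tokens = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Ids of the documents containing the term, found without scanning any document text.
    List<Integer> docsContaining(String term) {
        Map<Integer, List<Integer>> byDoc = postings.get(term.toLowerCase());
        return byDoc == null ? new ArrayList<>() : new ArrayList<>(byDoc.keySet());
    }

    // Positions of the term inside one document; the list size is the term frequency.
    List<Integer> positions(String term, int docId) {
        Map<Integer, List<Integer>> byDoc = postings.get(term.toLowerCase());
        if (byDoc == null || !byDoc.containsKey(docId)) {
            return new ArrayList<>();
        }
        return byDoc.get(docId);
    }
}
```

Looking up a term is a single map access, which is why the cost does not grow with the amount of text stored in the documents.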
1.3 Distributed indexing and search (multiple nodes, multiple shards)
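Indexing: the node that receives the index request first decides which shard the document belongs to. By default documents are spread evenly across shards: for each document the shard is determined by a hash of its ID string, every shard covers an equally sized hash range, and so each shard is equally likely to receive a new document. Once the target shard is known, the receiving node forwards the document to the node that holds that shard's primary. The document is indexed on the primary and then on all replicas of that shard; the index request returns successfully once all available replicas have finished indexing the document.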
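The routing step can be sketched in a few lines. This is a simplified illustration, not the Elasticsearch implementation: the ShardRouter class is hypothetical, and String.hashCode() stands in for the Murmur3 hash that Elasticsearch applies to the routing value.

```java
// Simplified document routing: pick a primary shard from the document id.
// Assumption: String.hashCode() replaces the Murmur3 hash used by Elasticsearch;
// only the "hash modulo number of primary shards" idea is illustrated here.
class ShardRouter {
    private final int numberOfPrimaryShards;

    ShardRouter(int numberOfPrimaryShards) {
        if (numberOfPrimaryShards <= 0) {
            throw new IllegalArgumentException("numberOfPrimaryShards must be positive");
        }
        this.numberOfPrimaryShards = numberOfPrimaryShards;
    }

    // floorMod keeps the result in [0, numberOfPrimaryShards) even for negative hashes.
    int shardFor(String docId) {
        return Math.floorMod(docId.hashCode(), numberOfPrimaryShards);
    }
}
```

Because the shard number depends on the number of primary shards, changing that number would send existing ids to different shards, which is one way to see why the primary shard count is fixed when the index is created.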
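Search: the node that receives the search request forwards it to a set of shards that together contain all the data. Elasticsearch picks an available copy of each shard (primary or replica) in round-robin fashion (7.x additionally applies adaptive replica selection by default) and forwards the search request to it. It then collects the results from these shards, merges them into a single response, and returns that response to the client application.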
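The two halves of the search process, choosing a shard copy in turn and merging the per-shard results, might look roughly like this. It is a sketch only: the ScatterGather and Hit names are invented for illustration, one counter is shared across all shards for brevity, and the real coordinating node does considerably more (query and fetch phases, pagination, and so on).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class ScatterGather {
    // A scored hit as returned by one shard.
    static final class Hit {
        final String docId;
        final double score;

        Hit(String docId, double score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private final AtomicInteger counter = new AtomicInteger();

    // Round-robin: successive calls cycle through the copies (primary and replicas) of a shard.
    String pickCopy(List<String> copies) {
        return copies.get(Math.floorMod(counter.getAndIncrement(), copies.size()));
    }

    // Gather: merge the per-shard hit lists and keep the best `size` hits overall.
    static List<Hit> merge(List<List<Hit>> perShardHits, int size) {
        List<Hit> all = new ArrayList<>();
        perShardHits.forEach(all::addAll);
        all.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return new ArrayList<>(all.subList(0, Math.min(size, all.size())));
    }
}
```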
2. Installing Elasticsearch on Windows
Official Elasticsearch download page: https://www.elastic.co/cn/downloads/elasticsearch
Older versions can also be downloaded.
Installation steps: https://blog.csdn.net/kevlin_v/article/details/94616871
Installing the IK Chinese analyzer and the pinyin analyzer: https://blog.csdn.net/qq_28988969/article/details/79540620
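3. Setting up an Elasticsearch cluster
The project uses two Windows servers, 10.254.24.54 and 10.254.24.55, so I set up a simple cluster with just two nodes. A single-machine pseudo-cluster can be set up the same way.
3.1 Editing the elasticsearch.yml configuration file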
Server 10.254.24.55
# Cluster name; it must be identical on every node in the cluster.
cluster.name: my-esCluster
# Node name; it must be unique within the cluster.
node.name: node1
# Whether this node is master-eligible: true or false.
node.master: true
# Whether this node can hold data: true or false.
node.data: true
# Where index data is stored.
path.data: D:\TPI\elastic\data
# Where log files are stored.
#path.logs: /opt/elasticsearch/logs
# Whether to lock the process memory: true or false.
#bootstrap.memory_lock: true
# Address to listen on, used to reach this node; 0.0.0.0 binds to all interfaces.
network.host: 0.0.0.0
# HTTP port exposed to clients; defaults to 9200.
http.port: 9200
# TCP transport port; defaults to 9300.
transport.tcp.port: 9300
# Ensures a node sees at least N master-eligible nodes before a master is elected. Defaults to 1; larger clusters can use a larger value (2-4). 7.x accepts this pre-7.x setting but ignores it.
discovery.zen.minimum_master_nodes: 1
node.max_local_storage_nodes: 2
# Added in 7.x: addresses of the master-eligible nodes; nodes listed here can be elected master once the service is up.
discovery.seed_hosts: ["10.254.24.55:9300", "10.254.24.54:9301"]
discovery.zen.fd.ping_timeout: 1m
discovery.zen.fd.ping_retries: 5
# Added in 7.x: required to elect a master when bootstrapping a brand-new cluster.
cluster.initial_master_nodes: ["node1", "node2"]
# Whether cross-origin requests are allowed; must be true when using the head plugin.
http.cors.enabled: true
# "*" allows requests from any origin.
http.cors.allow-origin: "*"
action.destructive_requires_name: true
action.auto_create_index: .security,.monitoring*,.watches,.triggered_watches,.watcher-history*
xpack.security.enabled: false
xpack.monitoring.enabled: true
xpack.graph.enabled: false
xpack.watcher.enabled: false
xpack.ml.enabled: false
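Server 10.254.24.54
# Cluster name; it must be identical on every node in the cluster.
cluster.name: my-esCluster
# Node name; it must be unique within the cluster.
node.name: node2
# Whether this node is master-eligible: true or false.
node.master: true
# Whether this node can hold data: true or false.
node.data: true
# Where index data is stored.
path.data: D:\elasticsearch\data
# Where log files are stored.
#path.logs: /opt/elasticsearch/logs
# Whether to lock the process memory: true or false.
#bootstrap.memory_lock: true
# Address to listen on, used to reach this node; 0.0.0.0 binds to all interfaces.
network.host: 0.0.0.0
# HTTP port exposed to clients; defaults to 9200.
http.port: 9201
# TCP transport port; defaults to 9300.
transport.tcp.port: 9301
# Ensures a node sees at least N master-eligible nodes before a master is elected. Defaults to 1; larger clusters can use a larger value (2-4). 7.x accepts this pre-7.x setting but ignores it.
discovery.zen.minimum_master_nodes: 1
node.max_local_storage_nodes: 2
# Added in 7.x: addresses of the master-eligible nodes; nodes listed here can be elected master once the service is up. Include the port number, otherwise master election may fail at startup.
discovery.seed_hosts: ["10.254.24.55:9300", "10.254.24.54:9301"]
discovery.zen.fd.ping_timeout: 1m
discovery.zen.fd.ping_retries: 5
# Added in 7.x: required to elect a master when bootstrapping a brand-new cluster.
cluster.initial_master_nodes: ["node1", "node2"]
# Whether cross-origin requests are allowed; must be true when using the head plugin.
http.cors.enabled: true
# "*" allows requests from any origin.
http.cors.allow-origin: "*"
action.destructive_requires_name: true
action.auto_create_index: .security,.monitoring*,.watches,.triggered_watches,.watcher-history*
xpack.security.enabled: false
xpack.monitoring.enabled: true
xpack.graph.enabled: false
xpack.watcher.enabled: false
xpack.ml.enabled: false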
For a single-machine pseudo-cluster configuration, see: https://blog.csdn.net/csdn565973850/article/details/104772551/
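The comment on discovery.zen.minimum_master_nodes in the configuration above talks about "N master-eligible nodes". For releases before 7.x, the usual guidance was to set that value to a majority of the master-eligible nodes. The small sketch below (the MasterQuorum class is hypothetical) only does that arithmetic; in 7.x the cluster manages the voting quorum itself.

```java
// Majority ("quorum") of master-eligible nodes, following the pre-7.x guidance
// for discovery.zen.minimum_master_nodes: floor(n / 2) + 1.
class MasterQuorum {
    static int quorum(int masterEligibleNodes) {
        if (masterEligibleNodes < 1) {
            throw new IllegalArgumentException("need at least one master-eligible node");
        }
        return masterEligibleNodes / 2 + 1;
    }

    // How many master-eligible nodes can be lost while a majority still remains.
    static int tolerableFailures(int masterEligibleNodes) {
        return masterEligibleNodes - quorum(masterEligibleNodes);
    }
}
```

For the two master-eligible nodes used here the majority is 2, so a two-node cluster cannot lose a node and still hold a majority; three master-eligible nodes is the smallest setup that tolerates one failure.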
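3.2 Editing the jvm.options configuration file
Set the initial and maximum size of the JVM heap. Both servers in this project have 16 GB of RAM, so I set the initial and maximum heap size to 8g.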
-Xms8g
-Xmx8g
Cluster planning and management: https://www.cnblogs.com/leeSmall/p/9220535.html
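The 8g chosen above matches the commonly cited rule of thumb of giving Elasticsearch no more than half of the machine's RAM while staying below roughly 32 GB, so that the JVM can keep using compressed object pointers. A sketch of that calculation follows; the HeapSizing class and the 31 GB cap are illustrative assumptions, not values taken from this project.

```java
class HeapSizing {
    // Assumed cap: stay below roughly 32 GB so the JVM can use compressed object pointers.
    static final int COMPRESSED_OOPS_CAP_GB = 31;

    // Half of physical RAM, capped; the rest is left to the OS file system cache.
    static int suggestedHeapGb(int physicalRamGb) {
        if (physicalRamGb < 2) {
            throw new IllegalArgumentException("physicalRamGb must be at least 2");
        }
        return Math.min(physicalRamGb / 2, COMPRESSED_OOPS_CAP_GB);
    }

    // The two jvm.options lines, with identical initial and maximum heap size.
    static String[] jvmOptions(int physicalRamGb) {
        int heap = suggestedHeapGb(physicalRamGb);
        return new String[] {"-Xms" + heap + "g", "-Xmx" + heap + "g"};
    }
}
```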
3.3 Start and access
10.254.24.54:9201
10.254.24.55:9200
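Besides opening these addresses in a browser, the cluster state can be checked from code through the _cluster/health endpoint. The following is a minimal sketch using only the JDK; the ClusterHealthCheck class is hypothetical, the timeouts are arbitrary, and it assumes security is disabled as in the configuration above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

class ClusterHealthCheck {
    // Builds the URI of the cluster health endpoint for one node.
    static URI healthUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/_cluster/health?pretty");
    }

    // Performs a plain GET and returns the response body as text.
    static String fetch(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setConnectTimeout(1000);
        conn.setReadTimeout(5000);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } finally {
            conn.disconnect();
        }
    }
}
```

Calling ClusterHealthCheck.fetch(ClusterHealthCheck.healthUri("10.254.24.55", 9200)) should return a JSON document; when both nodes have joined, its number_of_nodes field is expected to be 2.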
4. Integrating Spring Boot with the Elasticsearch Java High Level REST Client
4.1 pom.xml
<!--elasticsearch-->
<!--elasticsearch-rest-high-level-client-->
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.6.2</version>
    <exclusions>
        <exclusion>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-client</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- https://mvnrepository.com/artifact/org.elasticsearch/elasticsearch -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.6.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.elasticsearch.client/elasticsearch-rest-client -->
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-client</artifactId>
    <version>7.6.2</version>
</dependency>
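In my experience the Elasticsearch Java High Level REST Client is fairly demanding about the Spring Boot version. My project was originally on Spring Boot 2.0.0; after the integration it would not start and reported no error at all. After switching to 2.2.2 it started successfully. In addition, if the project already declares an httpclient dependency, a version conflict with it can also cause errors at startup.
Comment out the httpclient dependency the project previously declared: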
<!-- httpclient begin -->
<!--<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.3.5</version>
</dependency>-->
<!-- httpclient end -->
4.2 application.yml
elasticsearch:
  nodes: 10.254.24.55:9200, 10.254.24.54:9201
  schema: http
  max-connect-total: 50
  max-connect-per-route: 10
  connection-request-timeout-millis: 500
  socket-timeout-millis: 30000
  connect-timeout-millis: 1000
  # Name of the index used for completion suggest in my project
  indexName: suggest
4.3 Integrating RestHighLevelClient
Write the builder:
package cnki.tpi.config;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import java.util.List;
/**
 * ClassName: EsClientBuilder
 * Description: Fluent builder for RestHighLevelClient. Instances are created through the static build(...) factory, not by Spring.
 *
 * @author 小黄
 * date: 2020/4/13 20:07
 */
public class EsClientBuilder {
    private int connectTimeoutMillis = 1000;
    private int socketTimeoutMillis = 30000;
    private int connectionRequestTimeoutMillis = 500;
    private int maxConnectPerRoute = 10;
    private int maxConnectTotal = 30;
    private final List<HttpHost> httpHosts;
    private EsClientBuilder(List<HttpHost> httpHosts) {
        this.httpHosts = httpHosts;
    }
    public EsClientBuilder setConnectTimeoutMillis(int connectTimeoutMillis) {
        this.connectTimeoutMillis = connectTimeoutMillis;
        return this;
    }
    public EsClientBuilder setSocketTimeoutMillis(int socketTimeoutMillis) {
        this.socketTimeoutMillis = socketTimeoutMillis;
        return this;
    }
    public EsClientBuilder setConnectionRequestTimeoutMillis(int connectionRequestTimeoutMillis) {
        this.connectionRequestTimeoutMillis = connectionRequestTimeoutMillis;
        return this;
    }
    public EsClientBuilder setMaxConnectPerRoute(int maxConnectPerRoute) {
        this.maxConnectPerRoute = maxConnectPerRoute;
        return this;
    }
    public EsClientBuilder setMaxConnectTotal(int maxConnectTotal) {
        this.maxConnectTotal = maxConnectTotal;
        return this;
    }
    public static EsClientBuilder build(List<HttpHost> httpHosts) {
        return new EsClientBuilder(httpHosts);
    }
    public RestHighLevelClient create() {
        HttpHost[] httpHostArr = httpHosts.toArray(new HttpHost[0]);
        RestClientBuilder builder = RestClient.builder(httpHostArr);
        // Per-request timeouts
        builder.setRequestConfigCallback(requestConfigBuilder -> {
            requestConfigBuilder.setConnectTimeout(connectTimeoutMillis);
            requestConfigBuilder.setSocketTimeout(socketTimeoutMillis);
            requestConfigBuilder.setConnectionRequestTimeout(connectionRequestTimeoutMillis);
            return requestConfigBuilder;
        });
        // Connection pool limits
        builder.setHttpClientConfigCallback(httpClientBuilder -> {
            httpClientBuilder.setMaxConnTotal(maxConnectTotal);
            httpClientBuilder.setMaxConnPerRoute(maxConnectPerRoute);
            return httpClientBuilder;
        });
        return new RestHighLevelClient(builder);
    }
}
Configuration class:
package cnki.tpi.config;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.util.Assert;
import org.springframework.util.StringUtils;
import java.util.ArrayList;
import java.util.List;
/**
* ClassName: ElasticConfig
* Description:
*
* @author 小黄
* date: 2020/4/13 17:44
*/
@Configuration
public class ElasticConfig {
    @Value("${elasticsearch.nodes}")
    private List<String> nodes;
    @Value("${elasticsearch.schema}")
    private String schema;
    @Value("${elasticsearch.max-connect-total}")
    private Integer maxConnectTotal;
    @Value("${elasticsearch.max-connect-per-route}")
    private Integer maxConnectPerRoute;
    @Value("${elasticsearch.connection-request-timeout-millis}")
    private Integer connectionRequestTimeoutMillis;
    @Value("${elasticsearch.socket-timeout-millis}")
    private Integer socketTimeoutMillis;
    @Value("${elasticsearch.connect-timeout-millis}")
    private Integer connectTimeoutMillis;
    @Bean
    public RestHighLevelClient getRestHighLevelClient() {
        List<HttpHost> httpHosts = new ArrayList<>();
        for (String node : nodes) {
            try {
                // StringUtils.split returns null when the delimiter is missing
                String[] parts = StringUtils.split(node, ":");
                Assert.notNull(parts, "Must be defined as 'host:port'");
                Assert.state(parts.length == 2, "Must be defined as 'host:port'");
                httpHosts.add(new HttpHost(parts[0], Integer.parseInt(parts[1]), schema));
            } catch (RuntimeException ex) {
                throw new IllegalStateException(
                        "Invalid ES nodes " + "property '" + node + "'", ex);
            }
        }
        return EsClientBuilder.build(httpHosts)
                .setConnectionRequestTimeoutMillis(connectionRequestTimeoutMillis)
                .setConnectTimeoutMillis(connectTimeoutMillis)
                .setSocketTimeoutMillis(socketTimeoutMillis)
                .setMaxConnectTotal(maxConnectTotal)
                .setMaxConnectPerRoute(maxConnectPerRoute)
                .create();
    }
}
5. Implementing completion suggest
5.1 Core operations interface IBaseElasticService
package cnki.tpi.service;
import cnki.tpi.entity.dto.ElasticSuggestDto;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import java.util.Collection;
import java.util.List;
public interface IBaseElasticService {
    void createIndex(String idxName, String idxSQL);
    void insertOrUpdateOne(String idxName, ElasticSuggestDto entity);
    void insertBatch(String idxName, List<ElasticSuggestDto> list);
    <T> void deleteBatch(String idxName, Collection<T> idList);
    List<String> searchCompletionSuggest(String idxName, SearchSourceBuilder builder);
    void deleteIndex(String idxName);
    void deleteByQuery(String idxName, QueryBuilder builder);
}
5.2 Core operations implementation BaseElasticServiceImpl
package cnki.tpi.service.impl;
import cnki.tpi.entity.dto.ElasticSuggestDto;
import cnki.tpi.service.IBaseElasticService;
import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.support.IndicesOptions;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.reindex.DeleteByQueryRequest;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.suggest.Suggest;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.util.Collection;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;
/**
* ClassName: BaseElasticServiceImpl
* Description:
*
* @author 小黄
* date: 2020/4/13 15:41
*/
@Service
@Slf4j
public class BaseElasticServiceImpl implements IBaseElasticService {
    @Autowired
    RestHighLevelClient restHighLevelClient;
    /**
     * Creates an index.
     * @param idxName index name
     * @param idxSQL  index mapping as a JSON string
     */
    @Override
    public void createIndex(String idxName, String idxSQL) {
        try {
            // Nothing to do when the index is already there
            if (this.indexExist(idxName)) {
                log.info("idxName={} already exists, idxSql={}", idxName, idxSQL);
                return;
            }
            CreateIndexRequest request = new CreateIndexRequest(idxName);
            buildSetting(request);
            request.mapping(idxSQL, XContentType.JSON);
            // request.settings() can be used to specify the settings manually
            CreateIndexResponse res = restHighLevelClient.indices().create(request, RequestOptions.DEFAULT);
            if (!res.isAcknowledged()) {
                throw new RuntimeException("Index creation was not acknowledged");
            }
        } catch (Exception e) {
            // Surface the failure to the caller
            throw new RuntimeException("Failed to create index " + idxName, e);
        }
    }
    /**
     * Checks whether an index exists.
     * @param idxName index name
     * @return true if the index exists
     * @throws Exception if the request fails
     */
    private boolean indexExist(String idxName) throws Exception {
        GetIndexRequest request = new GetIndexRequest(idxName);
        request.local(false);
        request.humanReadable(true);
        request.includeDefaults(false);
        request.indicesOptions(IndicesOptions.lenientExpandOpen());
        return restHighLevelClient.indices().exists(request, RequestOptions.DEFAULT);
    }
    /**
     * Sets the number of shards and replicas.
     * @param request the create-index request to configure
     */
    private void buildSetting(CreateIndexRequest request) {
        request.settings(Settings.builder().put("index.number_of_shards", 6)
                .put("index.number_of_replicas", 1));
    }
    /**
     * Inserts a document, or overwrites the existing document with the same id.
     * @param idxName index name
     * @param entity  document object
     */
    @Override
    public void insertOrUpdateOne(String idxName, ElasticSuggestDto entity) {
        IndexRequest request = new IndexRequest(idxName);
        //log.info("Data : id={},entity={}",entity.getId(),JSON.toJSONString(entity.getData()));
        request.id(entity.getId());
        request.source(entity.getData(), XContentType.JSON);
        try {
            restHighLevelClient.index(request, RequestOptions.DEFAULT);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
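    /**
     * Inserts documents in bulk.
     * @param idxName index name
     * @param list    documents to insert
     */
    @Override
    public void insertBatch(String idxName, List<ElasticSuggestDto> list) {
        // An empty bulk request is rejected by request validation, so return early
        if (list == null || list.isEmpty()) {
            return;
        }
        BulkRequest request = new BulkRequest();
        list.forEach(item -> request.add(new IndexRequest(idxName).id(item.getId())
                .source(item.getData(), XContentType.JSON)));
        try {
            // A bulk call can succeed as a whole while individual items fail
            if (restHighLevelClient.bulk(request, RequestOptions.DEFAULT).hasFailures()) {
                log.warn("Bulk insert into idxName={} reported failures", idxName);
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }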
    /**
     * Deletes documents in bulk.
     * @param idxName index name
     * @param idList  ids of the documents to delete
     */
    @Override
    public <T> void deleteBatch(String idxName, Collection<T> idList) {
        // An empty bulk request is rejected by request validation, so return early
        if (idList == null || idList.isEmpty()) {
            return;
        }
        BulkRequest request = new BulkRequest();
        idList.forEach(item -> request.add(new DeleteRequest(idxName, item.toString())));
        try {
            restHighLevelClient.bulk(request, RequestOptions.DEFAULT);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    /**
     * Runs a suggest search and returns the text of every suggested option.
     * @param idxName index name
     * @param builder search source containing the suggest section
     * @return suggested texts
     */
    @Override
    public List<String> searchCompletionSuggest(String idxName, SearchSourceBuilder builder) {
        SearchRequest request = new SearchRequest(idxName);
        request.source(builder);
        try {
            SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);
            // Flatten the options of the first entry of each suggestion into a list of strings
            return StreamSupport.stream(Spliterators.spliteratorUnknownSize(response.getSuggest().iterator(), Spliterator.ORDERED), false)
                    .flatMap(suggestion -> suggestion.getEntries().get(0).getOptions().stream())
                    .map((Suggest.Suggestion.Entry.Option option) -> option.getText().toString())
                    .collect(Collectors.toList());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    /**
     * Deletes an index.
     * @param idxName index name
     */
    @Override
    public void deleteIndex(String idxName) {
        try {
            if (!this.indexExist(idxName)) {
                log.info("idxName={} does not exist", idxName);
                return;
            }
            restHighLevelClient.indices().delete(new DeleteIndexRequest(idxName), RequestOptions.DEFAULT);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    /**
     * Deletes all documents matching a query.
     * @param idxName index name
     * @param builder query selecting the documents to delete
     */
    @Override
    public void deleteByQuery(String idxName, QueryBuilder builder) {
        DeleteByQueryRequest request = new DeleteByQueryRequest(idxName);
        request.setQuery(builder);
        // Batch size per round; the maximum is 10000
        request.setBatchSize(10000);
        // Continue on version conflicts instead of aborting
        request.setConflicts("proceed");
        try {
            restHighLevelClient.deleteByQuery(request, RequestOptions.DEFAULT);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
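insertBatch and deleteBatch above put every element into a single bulk request. For large inputs it may be worth splitting the list into smaller chunks first. The helper below is a plain-Java sketch of that idea; the Batches class is hypothetical and not part of the project.

```java
import java.util.ArrayList;
import java.util.List;

class Batches {
    // Splits a list into consecutive chunks of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            // Copy the view so each chunk is independent of the source list
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }
}
```

A caller could then write something like Batches.partition(list, 1000).forEach(chunk -> elasticService.insertBatch(idxName, chunk)); the chunk size of 1000 is only an illustrative value and would need tuning against document size.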
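5.3 Creating the index
For convenience I created the index directly with Postman; the createIndex method in the code above can also be used. Note that createIndex passes its second argument to request.mapping(...), so it expects only the contents of the "mappings" object; the analysis settings shown below would have to be set on the request separately.
Index JSON: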
{
  "settings": {
    "number_of_replicas": 1,
    "number_of_shards": 6,
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "ik_max_word"
        },
        "first_py_letter_analyzer": {
          "tokenizer": "first_py_letter"
        },
        "full_pinyin_letter_analyzer": {
          "tokenizer": "full_pinyin_letter"
        }
      },
      "tokenizer": {
        "first_py_letter": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_full_pinyin": false,
          "keep_original": false,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "trim_whitespace": true,
          "keep_none_chinese_in_first_letter": false,
          "none_chinese_pinyin_tokenize": false,
          "keep_none_chinese": true,
          "keep_none_chinese_in_joined_full_pinyin": true
        },
        "full_pinyin_letter": {
          "type": "pinyin",
          "keep_separate_first_letter": false,
          "keep_full_pinyin": false,
          "keep_original": false,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "keep_first_letter": false,
          "keep_none_chinese_in_first_letter": false,
          "none_chinese_pinyin_tokenize": false,
          "keep_none_chinese": true,
          "keep_joined_full_pinyin": true,
          "keep_none_chinese_in_joined_full_pinyin": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "analyzer": "default",
        "fields": {
          "keyword_pinyin": {
            "type": "completion",
            "analyzer": "full_pinyin_letter_analyzer"
          },
          "keyword_first_py": {
            "type": "completion",
            "analyzer": "first_py_letter_analyzer"
          }
        }
      }
    }
  }
}
5.4 Inserting index data and completion suggest search
IGmdiSuggestService
package cnki.tpi.service.gmdi;
import java.util.List;
public interface IGmdiSuggestService {
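    /**
     * Inserts the lines of a txt file into the ES suggest index.
     * @param filePath path of the txt file, one suggestion per line
     * @throws Exception if reading the file or indexing fails
     */
    void insertSuggest(String filePath) throws Exception;
    /**
     * Returns auto-completion suggestions for the text the user has typed.
     * @param searchValue text typed by the user
     * @return suggested completions
     */
    List<String> searchCompletionSuggest(String searchValue);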
}
GmdiSuggestServiceImpl
package cnki.tpi.service.gmdi.impl;
import cn.hutool.core.io.FileUtil;
import cnki.tpi.entity.dto.ElasticSuggestDto;
import cnki.tpi.service.IBaseElasticService;
import cnki.tpi.service.gmdi.IGmdiSuggestService;
import cnki.tpi.util.MD5;
import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.suggest.SuggestBuilder;
import org.elasticsearch.search.suggest.SuggestBuilders;
import org.elasticsearch.search.suggest.SuggestionBuilder;
import org.elasticsearch.search.suggest.completion.FuzzyOptions;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Service;
import java.io.BufferedReader;
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.regex.Pattern;
/**
* ClassName: GmdiSuggestServiceImpl
* Description:
*
* @author 小黄
* date: 2020/4/13 16:05
*/
@Slf4j
@Service
public class GmdiSuggestServiceImpl implements IGmdiSuggestService {
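    @Autowired
    private IBaseElasticService elasticService;
    @Value("${elasticsearch.indexName}")
    private String indexName;
    @Override
    public void insertSuggest(String filePath) throws Exception {
        // FileUtil.file resolves the path without creating the file when it is missing
        File file = FileUtil.file(filePath);
        // try-with-resources closes the reader even if indexing fails part-way
        try (BufferedReader reader = FileUtil.getReader(file, "UTF-8")) {
            String suggest;
            while ((suggest = reader.readLine()) != null) {
                if (suggest.trim().isEmpty()) {
                    // Skip blank lines, which carry no suggestion text
                    continue;
                }
                // The MD5 of the text is the document id, so re-importing a line overwrites it instead of duplicating it
                String id = MD5.md5(suggest);
                HashMap<String, String> map = new HashMap<>();
                map.put("suggest", suggest);
                ElasticSuggestDto suggestDto = new ElasticSuggestDto(id, map);
                elasticService.insertOrUpdateOne(indexName, suggestDto);
            }
        }
    }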
    @Override
    public List<String> searchCompletionSuggest(String searchValue) {
        boolean lettersOnly = checkLetter(searchValue);
        // Letters-only input is treated as pinyin; anything else is matched against the original text
        String field = lettersOnly ? "suggest.keyword_pinyin" : "suggest";
        // Build the request body with SearchSourceBuilder
        List<String> result = elasticService.searchCompletionSuggest(indexName, getSearchSourceBuilder(field, searchValue, false));
        if (result.isEmpty() && lettersOnly) {
            // Fuzzy pinyin query
            result = elasticService.searchCompletionSuggest(indexName, getSearchSourceBuilder(field, searchValue, true));
            if (result.isEmpty()) {
                // Pinyin first-letter query
                result = elasticService.searchCompletionSuggest(indexName,
                        getSearchSourceBuilder("suggest.keyword_first_py", searchValue, false));
            }
        }
        return result;
    }
    /**
     * Builds the completion suggestion for one field.
     *
     * @param field   completion field to query
     * @param value   prefix typed by the user
     * @param isFuzzy whether to allow fuzzy matching
     * @return the suggestion builder
     */
    private SuggestionBuilder<?> getCompletionSuggestionBuilder(String field, String value, boolean isFuzzy) {
        if (isFuzzy) {
            return SuggestBuilders.completionSuggestion(field).prefix(value,
                    FuzzyOptions.builder().setFuzziness(2).build()).skipDuplicates(true).size(10);
        } else {
            return SuggestBuilders.completionSuggestion(field).prefix(value).skipDuplicates(true).size(10);
        }
    }
    /**
     * Builds the search source that carries the suggest section.
     *
     * @param field   completion field to query
     * @param value   prefix typed by the user
     * @param isFuzzy whether to use a fuzzy query
     * @return the search source builder
     */
    private SearchSourceBuilder getSearchSourceBuilder(String field, String value, boolean isFuzzy) {
        SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
        SuggestionBuilder<?> completionSuggestionBuilder = getCompletionSuggestionBuilder(field, value, isFuzzy);
        SuggestBuilder suggestBuilder = new SuggestBuilder();
        suggestBuilder.addSuggestion("search-suggest", completionSuggestionBuilder);
        sourceBuilder.suggest(suggestBuilder);
        return sourceBuilder;
    }
    /**
     * Checks whether the input consists of ASCII letters only.
     *
     * @param value text to check
     * @return true if the check passes, false otherwise
     */
    private boolean checkLetter(String value) {
        String regex = "^[A-Za-z]+$";
        return Pattern.matches(regex, value);
    }
    /**
     * Checks whether the input consists of Chinese characters only.
     *
     * @param chinese text to check
     * @return true if the check passes, false otherwise
     */
    public static boolean checkChinese(String chinese) {
        String regex = "^[\u4E00-\u9FA5]+$";
        return Pattern.matches(regex, chinese);
    }
}
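The lookup order used in searchCompletionSuggest above (exact prefix first, then fuzzy pinyin, then pinyin initials) is a "first non-empty result wins" chain. A plain-Java sketch of that pattern follows; SuggestFallback is a hypothetical helper, and the suppliers stand in for the calls to elasticService.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

class SuggestFallback {
    // Runs the lookups in order and returns the first non-empty result.
    // Later lookups are not executed once an earlier one has produced hits.
    @SafeVarargs
    static List<String> firstNonEmpty(Supplier<List<String>>... lookups) {
        for (Supplier<List<String>> lookup : lookups) {
            List<String> result = lookup.get();
            if (result != null && !result.isEmpty()) {
                return result;
            }
        }
        return Collections.emptyList();
    }
}
```

Each supplier would wrap one query, for example () -> elasticService.searchCompletionSuggest(indexName, getSearchSourceBuilder(field, searchValue, true)) for the fuzzy step, so the more expensive queries only run when the cheaper ones return nothing.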
https://www.jianshu.com/p/12d791cd29c1
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html#completion-suggester
http://www.appblog.cn/2019/06/04/ElasticSearch%207.x%20%E9%9B%86%E6%88%90RestHighLevelClient/
https://blog.csdn.net/qq_28988969/article/details/79540620
https://blog.csdn.net/wwd0501/article/details/80885987
https://blog.csdn.net/qushaming/article/details/90479091
https://www.jianshu.com/p/9e2c6a8e1b54
https://blog.csdn.net/wwd0501/article/details/80595201
https://blog.csdn.net/lixiaohai_918/article/details/89569611
https://www.cnblogs.com/feiquan/p/11888812.html
https://blog.csdn.net/woshiaotian/article/details/41479099
https://www.jianshu.com/p/acc8e86cc772
https://www.cnblogs.com/yfb918/p/10784951.html
https://www.cnblogs.com/leeSmall/p/9220535.html
https://blog.csdn.net/qq_35981283/article/details/86627170
https://blog.csdn.net/wzygis/article/details/51698309
https://blog.csdn.net/csdn565973850/article/details/104772551/
https://www.cnblogs.com/lianliang/p/7953891.html
https://stackoverflow.com/questions/53823154/elasticsearch-java-high-level-rest-client-suggest-search
http://www.likecs.com/default/index/show?id=39948
https://blog.csdn.net/chengyuqiang/article/details/89841544