While building the site, I wanted full-text search over my notes and articles, so I decided to integrate Elasticsearch (ES) and use its analyzers to implement site-wide full-text search.
Download the ES archive from the official website, unzip it, and edit elasticsearch.yml under the config directory:
cluster.name: legolas
node.name: node-1
http.port: 9200
# transport.tcp.port: 9300  # port used for communication between cluster nodes
network.host: 127.0.0.1
# CORS settings required by the elasticsearch-head plugin
http.cors.enabled: true
http.cors.allow-origin: "*"
Run elasticsearch.bat and open http://localhost:9200/ in a browser; if no error is reported, ES is up.
To browse the cluster more conveniently we can install the elasticsearch-head plugin. First install a Node.js environment,
then install grunt: npm install -g grunt-cli, and check the grunt version with grunt --version.
Download elasticsearch-head (it must not be placed inside elasticsearch's plugins or modules directories; putting it in the root directory is fine), unzip it, and edit head/Gruntfile.js:
connect: {
    server: {
        options: {
            port: 9100,
            hostname: '*',
            base: '.',
            keepalive: true
        }
    }
}
Install elasticsearch-head: run npm install inside the elasticsearch-head directory.
To run the plugin, start Node.js from the head source directory: grunt server,
then open http://localhost:9100/ in a browser.
If startup fails with errors coming from Gruntfile.js because the following packages are missing, install them with:
npm install grunt-contrib-clean --registry=https://registry.npm.taobao.org
npm install grunt-contrib-concat --registry=https://registry.npm.taobao.org
npm install grunt-contrib-watch --registry=https://registry.npm.taobao.org
npm install grunt-contrib-connect --registry=https://registry.npm.taobao.org
npm install grunt-contrib-copy --registry=https://registry.npm.taobao.org
npm install grunt-contrib-jasmine --registry=https://registry.npm.taobao.org
Install the IK analyzer:
Find the matching release on GitHub (the plugin version must match your ES version), download it, build it locally with mvn package, take the zip from target/releases, create an ik folder under the plugins directory, and unzip it there.
The pinyin analyzer is installed the same way; just replace ik with pinyin.
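To actually apply the IK analyzer at index time, a common approach (sketched here with assumed field names that match the queries later in this post; this is not code from the original project) is to create the index with an explicit mapping before the first sync, using the TransportClient configured in the Spring section below. ik_max_word at index time with ik_smart at search time is the usual pairing recommended for IK:
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

public class ArticleIndexInitializer {

    // Sketch: create the "article" index with IK analyzers on the text fields.
    public static void createArticleIndex(TransportClient client) throws Exception {
        XContentBuilder mapping = XContentFactory.jsonBuilder()
                .startObject()
                    .startObject("properties")
                        .startObject("title")
                            .field("type", "text")
                            .field("analyzer", "ik_max_word")
                            .field("search_analyzer", "ik_smart")
                        .endObject()
                        .startObject("content_text")
                            .field("type", "text")
                            .field("analyzer", "ik_max_word")
                            .field("search_analyzer", "ik_smart")
                        .endObject()
                    .endObject()
                .endObject();

        client.admin().indices().prepareCreate("article")
                .addMapping("doc", mapping) // type "doc" matches setTypes("doc") used later
                .get();
    }
}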
1. After downloading and unzipping logstash, create a mysql folder in its root directory and copy mysql-connector-java-5.1.27.jar into it, then write jdbc.conf together with the query used for each table to be synced; for a full sync, select * from [table] is enough.
jdbc.conf configuration:
# logstash: sync the MySQL database to Elasticsearch
input {
stdin {
}
jdbc {
type =>"t_article"
# mysql 数据库链接
jdbc_connection_string => "jdbc:mysql://192.168.1.131:3306/ds"
# username and password
jdbc_user => "root"
jdbc_password => "**********"
jdbc_driver_library => "E:\Program\logstash-6.7.0\mysql\mysql-connector-java-5.1.27.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_paging_enabled => "true"
jdbc_page_size => "50000"
# SQL to execute: the absolute path plus file name of the sql file created in the previous step
statement_filepath => "E:\Program\logstash-6.7.0\mysqletc\article.sql"
# schedule: the cron-style fields are (left to right) minute, hour, day of month, month, day of week; all * means run every minute
schedule => "* * * * *"
}
jdbc {
type =>"t_note"
# mysql 数据库链接
jdbc_connection_string => "jdbc:mysql://192.168.1.131:3306/ds"
# username and password
jdbc_user => "root"
jdbc_password => "**********"
jdbc_driver_library => "E:\Program\logstash-6.7.0\mysql\mysql-connector-java-5.1.27.jar"
jdbc_driver_class => "com.mysql.jdbc.Driver"
jdbc_paging_enabled => "true"
jdbc_page_size => "50000"
# SQL to execute: the absolute path plus file name of the sql file created in the previous step
statement_filepath => "E:\Program\logstash-6.7.0\mysqletc\note.sql"
# schedule: the cron-style fields are (left to right) minute, hour, day of month, month, day of week; all * means run every minute
schedule => "* * * * *"
}
}
filter {
json {
source => "message"
remove_field => ["message"]
}
}
output {
if [type]=="t_article"{
elasticsearch {
# Elasticsearch host and port
hosts => ["192.168.1.131:9200"]
# index name
index => "article"
#user => "elastic"
#password => "123456"
# document id: must be the primary-key (auto-increment) column of the source table
document_id => "%{id}"
}
}
if [type]=="t_note"{
elasticsearch {
hosts => ["192.168.1.131:9200"]
index => "note"
document_id => "%{id}"
}
}
stdout {
# output in JSON Lines format
codec => json_lines
}
}
note.sql:
select * from t_note where locked = 0
2. Run logstash (from the logstash/bin directory): logstash -f ../mysql/jdbc.conf
logstash will then sync the MySQL data into Elasticsearch automatically.
To integrate Elasticsearch with Spring, first add the elasticsearch transport client dependency:
<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>transport</artifactId>
    <version>6.6.0</version>
    <exclusions>
        <exclusion>
            <artifactId>jackson-core</artifactId>
            <groupId>com.fasterxml.jackson.core</groupId>
        </exclusion>
    </exclusions>
</dependency>
Next we define a TransportClient (the Java API client for connecting to ES) and hand it over to the Spring container to manage. The TransportClient acts as an external visitor that sends requests to the ES cluster; from the cluster's point of view it is an outside party and does not affect how the cluster runs.
import java.net.InetAddress;
import java.net.UnknownHostException;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.TransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class MyConfig {
@Bean
public TransportClient client() throws UnknownHostException {
TransportAddress node = new TransportAddress(
InetAddress.getByName("192.168.1.118"), 9300
);
//.put("client.transport.sniff", true)自动嗅探整个集群的状态,把集群中其他ES节点的ip添加到本地的客户端列表中
Settings settings = Settings.builder().put("cluster.name", "legolas").build();
TransportClient client = new PreBuiltTransportClient(settings);
client.addTransportAddress(node);
return client;
}
}
Then, in the service layer, we call the TransportClient instance to perform create, delete, update and query operations against ES.
public String queryArticles(@PathVariable("content") String content, @PathVariable("num") Integer num, ModelMap model) {
try {
List<String> contentSearchTerm = ESUtils.handlingSearchContent(client, "article", content);
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
boolQuery.must(QueryBuilders.termQuery("user_id", Consts.DEFAULT_LEGOLASID));
boolQuery.must(QueryBuilders.termQuery("locked", 0));
SearchRequestBuilder builder = client.prepareSearch("article").setTypes("doc");
builder.setQuery(boolQuery);
BoolQueryBuilder termQuery = QueryBuilders.boolQuery();
if (contentSearchTerm != null && contentSearchTerm.size() > 0) {
for (String searchTerm : contentSearchTerm) {
termQuery.should(QueryBuilders.matchPhrasePrefixQuery("title", searchTerm))
.should(QueryBuilders.matchPhrasePrefixQuery("content_text", searchTerm));
}
}
boolQuery.must(termQuery);
builder.setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
.setQuery(boolQuery)
.setFrom(10 * (num - 1))
.setSize(10)
.addSort("update_time", SortOrder.DESC);
SearchResponse responseTemp = builder.execute().actionGet();
long totalHits = responseTemp.getHits().getTotalHits();
// total page count, 10 hits per page
long total = totalHits % 10 == 0 ? totalHits / 10 : totalHits / 10 + 1;
if (total == 0) {
total = 1;
}
List<ArticleSum> result = new ArrayList<>();
SearchResponse response = builder.get();
for (SearchHit hit : response.getHits()) {
try {
ArticleSum recogcardInfo = MyHightlightBuilder.setArticleSumHighlighter(hit, contentSearchTerm);
result.add(recogcardInfo);
} catch (Exception e) {
e.printStackTrace();
}
}
model.addAttribute("articles", result);
model.addAttribute("pageNum", num);
model.addAttribute("totalPage", total);
} catch (Exception e) {
// if Elasticsearch fails, fall back to a database query matching on the article title
PageInfo result = articleService.listUnLockedArticlesByUserIdAndTitleWithPage(Consts.DEFAULT_LEGOLASID, content, 1);
List atcs = result.getList();
MyHightlightBuilder.setArticleTitleHighlighter(atcs, content);
model.addAttribute("articles", atcs);
model.addAttribute("pageNum", result.getPageNum());
model.addAttribute("totalPage", result.getPages());
e.printStackTrace();
return "views/index";
}
return "views/index";
}
Once we have the content request parameter, we first tokenize it with the ES analyzer: in the ESUtils class we define a handlingSearchContent method, which calls Elasticsearch's IK analyzer through getIkAnalyzeSearchTerms.
public static List<String> handlingSearchContent(TransportClient client, String index, String searchContent) {
List<String> searchTermResultList = new ArrayList<>();
try {
// split on commas to get the list of search terms
List<String> searchTermList = Arrays.asList(searchContent.split(","));
// run each term through the IK analyzer and collect the tokens
searchTermList.forEach(searchTerm -> {
// keep the original search term itself in the result list
searchTermResultList.add(searchTerm);
// append this term's IK tokens
searchTermResultList.addAll(getIkAnalyzeSearchTerms(client, index, searchTerm));
});
} catch (Exception e) {
e.printStackTrace();
}
return searchTermResultList;
}
public static List<String> getIkAnalyzeSearchTerms(TransportClient client, String index, String searchContent) {
AnalyzeRequestBuilder ikRequest = new AnalyzeRequestBuilder(client,
AnalyzeAction.INSTANCE, index, searchContent);
ikRequest.setTokenizer("ik_smart"); // use the IK smart (coarse-grained) tokenizer
//ikRequest.setTokenizer("ik_max_word");
List<AnalyzeResponse.AnalyzeToken> ikTokenList = ikRequest.execute().actionGet().getTokens();
// collect the text of each token
List<String> searchTermList = new ArrayList<>();
ikTokenList.forEach(ikToken -> searchTermList.add(ikToken.getTerm()));
return handlingIkResultTerms(searchTermList);
}
private static List<String> handlingIkResultTerms(List<String> searchTermList) {
// here we only keep tokens longer than one character; adjust as needed
List phraseList = new ArrayList<>();
searchTermList.forEach(term -> {
if (term.length() > 1) {
phraseList.add(term);
}
});
return phraseList;
}
QueryBuilders.boolQuery builds the default compound query for combining multiple leaf or compound query clauses. Its must method behaves like AND and should like OR; matchPhrasePrefixQuery performs a phrase-prefix match. The tokens produced by the analyzer are combined into the boolQuery in a loop, and the combined query is finally executed through the SearchRequestBuilder.
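To isolate just the query composition described above, here is a stripped-down sketch (the index, type, and field names are the ones used in queryArticles; the two search terms are placeholder values):
// minimal sketch: must ≈ AND, should ≈ OR
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery()
        .must(QueryBuilders.termQuery("user_id", Consts.DEFAULT_LEGOLASID))
        .must(QueryBuilders.termQuery("locked", 0));
// a hit only needs to match one of the phrase-prefix clauses
BoolQueryBuilder shouldQuery = QueryBuilders.boolQuery();
for (String searchTerm : Arrays.asList("elasticsearch", "全文检索")) { // placeholder terms
    shouldQuery.should(QueryBuilders.matchPhrasePrefixQuery("title", searchTerm))
            .should(QueryBuilders.matchPhrasePrefixQuery("content_text", searchTerm));
}
boolQuery.must(shouldQuery); // nest the OR group inside the outer AND
SearchResponse response = client.prepareSearch("article")
        .setTypes("doc")
        .setQuery(boolQuery)
        .setSize(10)
        .get();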
We also define a utility class to highlight the search keywords in the query results. Create a MyHightlightBuilder class and define a setNoteHighlighter method (you can define a separate method for each entity type):
public static Note setNoteHighlighter(SearchHit hit, List<String> searchTerms) {
String sourceAsString = hit.getSourceAsString();
// convert the JSON source string into the corresponding entity
Note recogcardInfo = JSON.parseObject(sourceAsString, Note.class);
// wrap every occurrence of each search term in the content field with highlight tags
StringBuilder sb = new StringBuilder(recogcardInfo.getContent());
for (String searchTerm : searchTerms) {
int n = StringUtils.appearNumber(sb.toString(), searchTerm);
for (int i = 1; i <= n; i++) {
sb.insert(StringUtils.positionAppearN(sb.toString(), searchTerm, i), PRE_TAG[4]);
sb.insert(StringUtils.positionAppearN(sb.toString(), searchTerm, i) + searchTerm.length(), END_TAG);
}
}
recogcardInfo.setContent(sb.toString());
return recogcardInfo;
}
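setNoteHighlighter relies on the PRE_TAG/END_TAG constants and two StringUtils helpers that are not shown here. As a rough sketch of what they could look like (an assumption, not the original project code): PRE_TAG would be an array of opening highlight tags with END_TAG as the matching closing tag, appearNumber counts the occurrences of a term, and positionAppearN returns the start position of its n-th occurrence:
// Hypothetical helpers assumed by setNoteHighlighter above; the real project code may differ.
public class StringUtils {

    // number of (non-overlapping) occurrences of searchTerm in text
    public static int appearNumber(String text, String searchTerm) {
        int count = 0;
        int index = 0;
        while ((index = text.indexOf(searchTerm, index)) != -1) {
            count++;
            index += searchTerm.length();
        }
        return count;
    }

    // start position of the n-th (1-based) occurrence of searchTerm in text, or -1 if absent
    public static int positionAppearN(String text, String searchTerm, int n) {
        int index = -searchTerm.length();
        for (int i = 0; i < n; i++) {
            index = text.indexOf(searchTerm, index + searchTerm.length());
            if (index == -1) {
                return -1;
            }
        }
        return index;
    }
}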
In the service layer we loop over the hits and apply the highlighting method to each result:
SearchResponse response = builder.get();
for (SearchHit hit : response.getHits()) {
// to get the raw source directly, use hit.getSourceAsMap();
// convert each hit's JSON source into an entity and apply highlighting
Note recogcardInfo = MyHightlightBuilder.setNoteHighlighter(hit, contentSearchTerm);
result.add(recogcardInfo);
}
Updating a document in Elasticsearch:
UpdateRequest update = new UpdateRequest("article", "doc", article.getId().toString());
XContentBuilder builder = XContentFactory.jsonBuilder().startObject();
builder.field("title", article.getTitle())
.field("summary", article.getSummary())
.field("content", article.getContent())
.field("content_text", article.getContentText())
.field("count", article.getCount())
.field("locked", article.getLocked());
builder.endObject();
update.doc(builder);
client.update(update).actionGet(); // wait for the update to complete
Deleting a document from Elasticsearch:
client.prepareDelete("article", "doc", id.toString());