In the Java full-text search ecosystem, Solr and Elasticsearch are the two leading options.
First, let's look at how they differ:
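1) While Solr is building an index its search performance drops, so it is weak at real-time search; Elasticsearch handles near-real-time search efficiently.
2) Solr relies on ZooKeeper for distributed coordination, while Elasticsearch ships with its own built-in distributed coordination.
3) Solr accepts more data formats, such as JSON, XML and CSV, while Elasticsearch works primarily with JSON.
4) Solr provides more features out of the box, while Elasticsearch focuses on core functionality and leaves many advanced features to third-party plugins.
5) Solr performs better than Elasticsearch in traditional search applications, but is noticeably slower in real-time search applications.
6) Solr is a strong solution for traditional search applications, while Elasticsearch is better suited to newer real-time search applications.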
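For these reasons we choose Elasticsearch.
For security reasons Elasticsearch refuses to start as root, so a regular user has to be created first.
Create the user dev with the password 123456: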
[root@dev-2 ~]# useradd dev
[root@dev-2 ~]# passwd dev
Changing password for user dev.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[root@dev-2 ~]# id dev
Download:
[root@dev-2 elasticsearch]# wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.0.tar.gz
--2018-09-05 11:32:45-- https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.0.tar.gz
Resolving artifacts.elastic.co (artifacts.elastic.co)... 107.21.253.15, 184.72.242.47, 107.21.237.95, ...
Connecting to artifacts.elastic.co (artifacts.elastic.co)|107.21.253.15|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 91423553 (87M) [application/x-gzip]
Saving to: ‘elasticsearch-6.3.0.tar.gz’
100%[======================================>] 91,423,553 947KB/s in 1m 55s
2018-09-05 11:34:31 (776 KB/s) - ‘elasticsearch-6.3.0.tar.gz’ saved [91423553/91423553]
[root@dev-2 elasticsearch]#
Extract the archive:
[root@dev-2 elasticsearch]# tar -xvf elasticsearch-6.3.0.tar.gz
Configure the environment variables:
[root@dev-2 elasticsearch-6.3.0]# vim /etc/profile
# import elasticsearch
export SEARCH_HOME='/opt/soft/elasticsearch/elasticsearch-6.3.0'
export PATH=$PATH:$SEARCH_HOME/bin
[root@dev-2 elasticsearch-6.3.0]# source /etc/profile
Set the owner to dev:
[root@dev-2 elasticsearch]# ll
total 89284
drwxr-xr-x. 8 root root 143 Sep 05 11:43 elasticsearch-6.3.0
[root@dev-2 elasticsearch]# chown -R dev:dev elasticsearch-6.3.0
[root@dev-2 elasticsearch]# ll
total 89284
drwxr-xr-x. 8 dev dev 143 Sep 05 11:43 elasticsearch-6.3.0
[root@dev-2 elasticsearch]#
Adjust the JVM options
The values of -Xms and -Xmx must be identical.
[root@dev-2 elasticsearch-6.3.0]# cd config/
[root@dev-2 config]# ls
elasticsearch.yml jvm.options log4j2.properties role_mapping.yml roles.yml users users_roles
[root@dev-2 config]# vim jvm.options
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms1g
-Xmx1g
[root@dev-2 config]#
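The rule that -Xms and -Xmx must match can be checked before starting the node. The following is a minimal sketch, not part of Elasticsearch: the class name `JvmOptionsCheck` and its methods are hypothetical, and it compares the flag values as plain strings, so `1g` and `1024m` would be reported as different even though they are the same size.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: checks that -Xms and -Xmx in a jvm.options file carry the same value.
class JvmOptionsCheck {

    // Returns the text following the given flag prefix (e.g. "-Xms"), or null if the flag is absent.
    // Comment lines starting with '#' are skipped; the last occurrence wins.
    static String flagValue(List<String> lines, String prefix) {
        String value = null;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.startsWith("#")) {
                continue;
            }
            if (trimmed.startsWith(prefix)) {
                value = trimmed.substring(prefix.length());
            }
        }
        return value;
    }

    // True only when both flags are present and their values are textually equal.
    static boolean heapFlagsMatch(List<String> lines) {
        String xms = flagValue(lines, "-Xms");
        String xmx = flagValue(lines, "-Xmx");
        return xms != null && xms.equalsIgnoreCase(xmx);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java JvmOptionsCheck.java <path-to-jvm.options>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        System.out.println(heapFlagsMatch(lines)
                ? "-Xms and -Xmx match"
                : "-Xms and -Xmx differ or are missing");
    }
}
```

It could be run against the file edited above, for example `java JvmOptionsCheck.java config/jvm.options` on a JDK that supports single-file source launch.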
Adjust the startup settings
[root@dev-2 config]# vim elasticsearch.yml
# Set the bind address to a specific IP (IPv4 or IPv6):
# On a cloud server use the address below; on a local server use that server's own address.
network.host: 0.0.0.0
#
# Set a custom port for HTTP:
#
http.port: 9200
#
# Enable cross-origin (CORS) support; the default is false
#
http.cors.enabled: true
#
# Origins allowed for cross-origin requests (the regular expression below allows all origins)
#
http.cors.allow-origin: /.*/
[root@dev-2 config]#
Switch to the dev user and start Elasticsearch:
[root@dev-2 elasticsearch-6.3.0]# su dev
[dev@dev-2 elasticsearch-6.3.0]$ elasticsearch
[2018-09-05T11:50:22,771][INFO ][o.e.n.Node ] [] initializing ...
Startup fails, however, with the following errors:
[2018-09-05T11:50:40,429][INFO ][o.e.b.BootstrapChecks ] [eme3ye6] bound or publishing to a nonloopback address, enforcing bootstrap checks
ERROR: [4] bootstrap checks failed
[1]: initial heap size [268435456] not equal to maximum heap size [536870912]; this can cause resize pauses and prevents mlockall from locking the entire heap
[2]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[3]: max number of threads [3852] for user [dev] is too low, increase to at least [4096]
[4]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
[2018-09-05T11:50:40,506][INFO ][o.e.n.Node ] [eme3ye6] stopping ...
[2018-09-05T11:50:40,606][INFO ][o.e.n.Node ] [eme3ye6] stopped
[2018-09-05T11:50:40,606][INFO ][o.e.n.Node ] [eme3ye6] closing ...
[2018-09-05T11:50:40,619][INFO ][o.e.n.Node ] [eme3ye6] closed
Switch back to root and edit the /etc/security/limits.conf file:
[root@dev-2 elasticsearch-6.3.0]# vim /etc/security/limits.conf
#@student - maxlogins 4
* soft nofile 65536
* hard nofile 131072
* soft nproc 4096
* hard nproc 4096
# End of file
[root@dev-2 elasticsearch-6.3.0]#
Edit the /etc/sysctl.conf file:
[root@dev-2 etc]# vim /etc/sysctl.conf
vm.max_map_count=655360
[root@dev-2 etc]# sysctl -p
vm.max_map_count = 655360
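To confirm that the new kernel setting is in effect, the value can be read back from `/proc/sys/vm/max_map_count` and compared with the 262144 minimum quoted in the bootstrap error above. The sketch below assumes a Linux host; the class name `MaxMapCountCheck` is hypothetical and only illustrates the comparison.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: compares vm.max_map_count with the minimum reported by the bootstrap check.
class MaxMapCountCheck {

    // Minimum taken from the error message "increase to at least [262144]".
    static final long REQUIRED = 262144L;

    // Parses the content of /proc/sys/vm/max_map_count and checks it against the minimum.
    static boolean isSufficient(String procFileContent) {
        return Long.parseLong(procFileContent.trim()) >= REQUIRED;
    }

    public static void main(String[] args) throws IOException {
        // This path exists on Linux only.
        byte[] raw = Files.readAllBytes(Paths.get("/proc/sys/vm/max_map_count"));
        String content = new String(raw, StandardCharsets.UTF_8);
        System.out.println("vm.max_map_count = " + content.trim()
                + (isSufficient(content) ? " (sufficient)" : " (too low)"));
    }
}
```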
Switch to the dev user again and start Elasticsearch:
[dev@dev-2 elasticsearch-6.3.0]$ nohup elasticsearch &
[dev@dev-2 elasticsearch-6.3.0]$ cat nohup.out
....
[2018-09-05T12:15:27,438][INFO ][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [eme3ye6] publish_address {47.98.143.72:9200}, bound_addresses {47.98.143.72:9200}
[2018-09-05T12:15:27,438][INFO ][o.e.n.Node ] [eme3ye6] started
[2018-09-05T12:15:27,949][WARN ][o.e.x.s.a.s.m.NativeRoleMappingStore] [eme3ye6] Failed to clear cache for realms [[]]
[2018-07-04T10:17:37,982][INFO ][o.e.l.LicenseService ] [eme3ye6] license [36a89374-568c-4f0d-83ce-37475ca3015c] mode [basic] - valid
[2018-07-04T10:17:37,996][INFO ][o.e.g.GatewayService ] [eme3ye6] recovered [0] indices into cluster_state
....
[dev@dev-2 elasticsearch-6.3.0]$ jps
21874 Jps
4969 Elasticsearch
[dev@dev-2 elasticsearch-6.3.0]$ netstat -lnp|grep 9200
tcp 0 0 0.0.0.0:9200 0.0.0.0:* LISTEN 4969/java
[dev@dev-2 elasticsearch-6.3.0]$ curl 47.98.143.72:9200
{
"name" : "jDY6CwJ",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "LhuAiJqpRVqKOxIUZ0Xa3Q",
"version" : {
"number" : "6.3.0",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "424e937",
"build_date" : "2018-06-11T23:38:03.357887Z",
"build_snapshot" : false,
"lucene_version" : "7.3.1",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}
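The same check can be made from Java, which is handy when the application should verify at startup that it is talking to the expected Elasticsearch version. This is a minimal sketch using only the JDK: the class name `EsVersionProbe` is hypothetical, the default URL `http://localhost:9200` is an assumption, and the regular expression is a shortcut that relies on `number` appearing only inside the `version` object, as in the response above. A real client would use a JSON parser.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: fetches the Elasticsearch root endpoint and extracts version.number.
class EsVersionProbe {

    private static final Pattern NUMBER = Pattern.compile("\"number\"\\s*:\\s*\"([^\"]+)\"");

    // Returns the first "number" value found in the JSON text, or null if there is none.
    static String extractVersion(String json) {
        Matcher m = NUMBER.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    // Performs a plain HTTP GET and returns the response body as a string.
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(3000);
        conn.setReadTimeout(5000);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // The default address is an assumption; pass the real node address as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:9200";
        System.out.println("Elasticsearch version: " + extractVersion(fetch(url)));
    }
}
```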
**Download the IK analyzer from GitHub in the same version as Elasticsearch:**
IK analyzer download page
Create an ik folder under plugins/ and extract the contents of the archive into it.
[root@dev-2 plugins]# pwd
/opt/soft/elasticsearch/elasticsearch-6.3.0/plugins
[dev@dev-2 plugins]$ mkdir ik
[dev@dev-2 plugins]$ cd ik
[dev@dev-2 ik]$ unzip elasticsearch-analysis-ik-6.3.0.zip -d .
[dev@dev-2 ik]$ ls
commons-codec-1.9.jar config httpclient-4.5.2.jar plugin-descriptor.properties
commons-logging-1.2.jar elasticsearch-analysis-ik-6.3.0.jar httpcore-4.4.4.jar
Restart Elasticsearch:
[dev@dev-2 elasticsearch-6.3.0]$ nohup bin/elasticsearch &
[1] 17149
[dev@dev-2 elasticsearch-6.3.0]$ jps
17236 Jps
17149 Elasticsearch
[dev@dev-2 elasticsearch-6.3.0]$ cat nohup.out |grep ik
[2018-09-05T13:42:57,317][INFO ][o.e.p.PluginsService ] [eme3ye6] loaded plugin [analysis-ik]
[2018-09-05T13:43:21,229][INFO ][o.w.a.d.Monitor ] try load config from /opt/soft/elasticsearch/elasticsearch-6.3.0/config/analysis-ik/IKAnalyzer.cfg.xml
[2018-09-05T13:43:49,372][INFO ][o.w.a.d.Monitor ] try load config from /opt/soft/elasticsearch/elasticsearch-6.3.0/plugins/ik/config/IKAnalyzer.cfg.xml
[dev@dev-2 elasticsearch-6.3.0]$
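Q: How do I use the IK analyzer to tokenize and query my own data?
A: Declare which fields should be analyzed when you create the index and type, much like defining a table schema in MySQL before inserting rows.
Example: for the index testindex and the type testtype, have the name field use ik_max_word for both indexing and searching.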
curl -XPOST http://localhost:9200/testindex/testtype/_mapping -H 'Content-Type:application/json' -d'
{
"properties": {
"name": {
"type": "text",
"analyzer": "ik_max_word",
"search_analyzer": "ik_max_word"
}
}
}'
For a detailed example, see the examples in the IK GitHub repository.
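When several fields need the same analyzer, writing the mapping body by hand gets repetitive. The sketch below builds the same JSON structure as the curl example for any number of text fields. The class name `IkMappingBody` is hypothetical, and the code does no JSON escaping, so it assumes plain field and analyzer names without quotes or backslashes.

```java
// Hypothetical helper: builds a mapping body like the one sent with curl above.
class IkMappingBody {

    // Produces {"properties":{"<field>":{"type":"text","analyzer":...,"search_analyzer":...}, ...}}.
    // No escaping is performed, so field and analyzer names must be plain identifiers.
    static String mappingFor(String analyzer, String... fields) {
        StringBuilder sb = new StringBuilder("{\"properties\":{");
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append('"').append(fields[i]).append("\":{")
              .append("\"type\":\"text\",")
              .append("\"analyzer\":\"").append(analyzer).append("\",")
              .append("\"search_analyzer\":\"").append(analyzer).append("\"}");
        }
        return sb.append("}}").toString();
    }

    public static void main(String[] args) {
        // Prints a body equivalent to the curl example for the single field "name".
        System.out.println(mappingFor("ik_max_word", "name"));
    }
}
```

The resulting string can be used as the request body of the `_mapping` call shown above.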
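Note: because the data set used by this system is fairly large, I loaded it from the database into Elasticsearch ahead of time; the steps are described in detail below so they can be reused for later searches.
This search system uses Spring Boot together with Jest and MySQL to quickly implement full-text search against an Elasticsearch instance running on Alibaba Cloud.
Main components:
Jest: a REST client for accessing ES
elasticsearch: provides the full-text search
thymeleaf: web front-end template framework
a) Maven configuration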
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>cheng.demo.springcloud</groupId>
  <artifactId>jest-elasticsearch</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>jest-elasticsearch</name>
  <url>http://maven.apache.org</url>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>1.5.6.RELEASE</version>
  </parent>
  <dependencies>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <dependency>
      <groupId>io.searchbox</groupId>
      <artifactId>jest</artifactId>
    </dependency>
    <dependency>
      <groupId>net.java.dev.jna</groupId>
      <artifactId>jna</artifactId>
    </dependency>
    <dependency>
      <groupId>javax.persistence</groupId>
      <artifactId>persistence-api</artifactId>
      <version>1.0</version>
    </dependency>
    <dependency>
      <groupId>org.mybatis.spring.boot</groupId>
      <artifactId>mybatis-spring-boot-starter</artifactId>
      <version>1.2.0</version>
    </dependency>
    <dependency>
      <groupId>tk.mybatis</groupId>
      <artifactId>mapper</artifactId>
      <version>3.3.9</version>
    </dependency>
    <dependency>
      <groupId>org.mybatis.generator</groupId>
      <artifactId>mybatis-generator-maven-plugin</artifactId>
      <version>1.3.5</version>
    </dependency>
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.40</version>
    </dependency>
    <dependency>
      <groupId>org.springframework</groupId>
      <artifactId>spring-jdbc</artifactId>
      <version>4.3.6.RELEASE</version>
    </dependency>
    <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>druid</artifactId>
      <version>1.0.11</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.2</version>
    </dependency>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-devtools</artifactId>
      <optional>true</optional>
    </dependency>
    <dependency>
      <groupId>org.springframework.boot</groupId>
      <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
    <dependency>
      <groupId>org.webjars</groupId>
      <artifactId>jquery</artifactId>
      <version>3.3.0</version>
    </dependency>
    <dependency>
      <groupId>org.webjars</groupId>
      <artifactId>bootstrap</artifactId>
      <version>4.0.0</version>
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-maven-plugin</artifactId>
        <configuration>
          <fork>true</fork>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
b) YAML configuration file
server:
  port: 7081
spring:
  elasticsearch:
    jest:
      uris:
        - http://47.98.143.72:9200
      read-timeout: 5000
  datasource:
    driver-class-name: com.mysql.jdbc.Driver
    url: jdbc:mysql://10.10.2.233:3306/irs_cms
    username: root
    password: xxxxxx
  thymeleaf:
    cache: false
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: com.cheng.elasticsearch.entity
@Document(indexName = BusVod.INDEX, type = BusVod.ORDER_TYPE, shards = 6, replicas = 2, refreshInterval = "-1")
public class BusVod implements Serializable{
private static final long serialVersionUID = -763638353551774166L;
// Index name
public static final String INDEX = "movie-test";
// Type name
public static final String ORDER_TYPE = "movie-type";
private String name;
private String director;
private String actor;
private String releaseTime;
private String provider;
private String vodDesc;
private String screenWriter;
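// Movie description; can be tokenized with the IK analyzer (ik_max_word, as in the mapping example above)
@Field(type = FieldType.String, searchAnalyzer = "ik_max_word", analyzer = "ik_max_word")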
private String summaryLong;
public BusVod(String name, String director, String actor, String releaseTime, String provider, String vodDesc, String screenWriter, String summaryLong) {
// omitted
}
public BusVod() {
}
// getters and setters omitted...
}
a) Java class (BusVodMapper.java)
@Repository
public interface BusVodMapper extends Mapper<BusVod> {
// Query all movie resource records from the database
List<BusVod> findList();
}
b) MyBatis XML file (BusVodMapper.xml)
<mapper namespace="com.cheng.elasticsearch.mapper.BusVodMapper">
  <resultMap id="resMap" type="com.cheng.elasticsearch.entity.BusVod">
    <result column="name" property="name" jdbcType="VARCHAR" />
    <result column="director" property="director" jdbcType="VARCHAR" />
    <result column="actor" property="actor" jdbcType="VARCHAR" />
    <result column="release_time" property="releaseTime" jdbcType="VARCHAR" />
    <result column="provider" property="provider" jdbcType="VARCHAR" />
    <result column="vod_desc" property="vodDesc" jdbcType="VARCHAR" />
    <result column="screen_writer" property="screenWriter" jdbcType="VARCHAR"/>
    <result column="summary_long" property="summaryLong" jdbcType="VARCHAR"/>
  </resultMap>
  <sql id="column">
    bus_vod.name name,
    bus_vod.director director,
    bus_vod.actor actor,
    bus_vod.release_time release_time,
    bus_vod.provider provider,
    bus_vod.vod_desc vod_desc,
    bus_vod.screen_writer screen_writer,
    bus_vod.summary_long summary_long
  </sql>
  <select id="findList" resultMap="resMap">
    select
    <include refid="column"></include>
    from bus_vod
  </select>
</mapper>
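a) Interface
public interface BusVodService {
// Load all movie records from the database so they can be imported into Elasticsearch
List<BusVod> findList();
// Bulk-save the given records to Elasticsearch
void saveBusvod(List<BusVod> busVodList);
// Search for content matching the given keyword
List<BusVod> searchEntity(String keyword);
}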
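b) Implementation class
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import io.searchbox.client.JestClient;
import io.searchbox.core.Bulk;
import io.searchbox.core.Index;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.builder.SearchSourceBuilder;
// Package as in the Elasticsearch 2.x client library managed by Spring Boot 1.5.x
import org.elasticsearch.search.highlight.HighlightBuilder;
import com.cheng.elasticsearch.entity.BusVod;
import com.cheng.elasticsearch.mapper.BusVodMapper;

@Service
public class BusVodServiceImpl implements BusVodService {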
private static final Logger LOGGER = LoggerFactory.getLogger(BusVodServiceImpl.class);
@Autowired
private JestClient jestClient;
@Autowired
private BusVodMapper mapper;
@Override
public List<BusVod> findList() {
List<BusVod> busVods = mapper.findList();
return busVods;
}
/**
* Bulk-save the given records to ES
*/
@Override
public void saveBusvod(List<BusVod> busVodList) {
Bulk.Builder bulk = new Bulk.Builder();
for(BusVod busVod : busVodList) {
Index index = new Index.Builder(busVod).index(BusVod.INDEX).type(BusVod.ORDER_TYPE).build();
bulk.addAction(index);
}
try {
jestClient.execute(bulk.build());
LOGGER.info("ES bulk insert finished");
} catch (IOException e) {
// Log the full stack trace through the logger instead of printStackTrace()
LOGGER.error("ES bulk insert failed", e);
}
}
/**
* Search for movie resources matching a keyword
* @param keyword
* @return
*/
@Override
public List<BusVod> searchEntity(String keyword) {
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
if (keyword != null) {
boolQueryBuilder.should(QueryBuilders.commonTermsQuery("actor",keyword));
boolQueryBuilder.should(QueryBuilders.commonTermsQuery("director", keyword));
boolQueryBuilder.should(QueryBuilders.commonTermsQuery("name", keyword));
boolQueryBuilder.should(QueryBuilders.commonTermsQuery("provider", keyword));
boolQueryBuilder.should(QueryBuilders.commonTermsQuery("releaseTime", keyword));
boolQueryBuilder.should(QueryBuilders.commonTermsQuery("screenWriter", keyword));
boolQueryBuilder.should(QueryBuilders.commonTermsQuery("summaryLong", keyword));
boolQueryBuilder.should(QueryBuilders.commonTermsQuery("vodDesc", keyword));
}
HighlightBuilder highlightBuilder = new HighlightBuilder();
highlightBuilder.field("summaryLong"); // highlight summaryLong
highlightBuilder.field("name");
highlightBuilder.preTags("").postTags(""); // highlight tags
highlightBuilder.fragmentSize(500); // length of each highlighted fragment
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.highlight(highlightBuilder);
searchSourceBuilder.query(boolQueryBuilder).size(10); // number of hits to return; this could be turned into pagination
Search search = new Search.Builder(searchSourceBuilder.toString())
.addIndex(BusVod.INDEX).addType(BusVod.ORDER_TYPE).build();
try {
SearchResult result = jestClient.execute(search);
System.out.println("Movies found by this query: " + result.getTotal());
List<SearchResult.Hit<BusVod,Void>> hits = result.getHits(BusVod.class);
System.out.println(hits.size());
List<BusVod> busVodlists = new ArrayList<BusVod>();
for (SearchResult.Hit<BusVod, Void> hit : hits) {
BusVod source = hit.source;
// Apply the highlighted fragments when present
Map<String, List<String>> highlight = hit.highlight;
if (highlight!=null){
List<String> summaryLong = highlight.get("summaryLong"); // highlighted summaryLong
if(summaryLong!=null){
source.setSummaryLong(summaryLong.get(0));
}
List<String> name = highlight.get("name"); // highlighted name
if(name!=null){
source.setName(name.get(0));
}
}
BusVod busVod = new BusVod();
busVod.setName(source.getName());
busVod.setSummaryLong(source.getSummaryLong());
busVod.setActor(source.getActor());
busVod.setDirector(source.getDirector());
busVod.setScreenWriter(source.getScreenWriter());
busVod.setVodDesc(source.getVodDesc());
busVod.setReleaseTime(source.getReleaseTime());
busVod.setProvider(source.getProvider());
busVodlists.add(busVod);
}
return busVodlists;
} catch (IOException e) {
// Log the full stack trace through the logger instead of printStackTrace()
LOGGER.error("ES search failed", e);
}
return null;
}
}
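Because the data set is described as large, sending every record in a single bulk request may produce a very big request body. One common approach is to split the list into fixed-size batches and call `saveBusvod` once per batch. The sketch below shows only the splitting step with plain JDK classes; the class name `BulkBatcher` and the batch size are assumptions, not part of the original project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: splits a list into consecutive batches of at most batchSize elements.
class BulkBatcher {

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Prints [[1, 2, 3], [4, 5, 6], [7]]
        System.out.println(partition(ids, 3));
    }
}
```

In the service, a loop such as `for (List<BusVod> batch : BulkBatcher.partition(busVodList, 1000)) { saveBusvod(batch); }` would then issue one bulk request per batch; the value 1000 is only a placeholder to be tuned.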
Call the service interface and return the data to the page (VodController)
@Controller
@RequestMapping("/bus/vod")
public class VodController {
@Autowired
private BusVodService busVodService;
// Calling this endpoint with a tool such as Fiddler or Postman imports the database data into Elasticsearch
@RequestMapping("/list")
public String getList(HttpServletRequest request) {
List<BusVod> busVods = busVodService.findList();
List<BusVod> addList = new ArrayList<BusVod>();
for(BusVod busVod : busVods){
BusVod newEntity = new BusVod(busVod.getName(),busVod.getDirector(),busVod.getActor(),
busVod.getReleaseTime(), busVod.getProvider(),busVod.getVodDesc(), busVod.getScreenWriter(), busVod.getSummaryLong());
addList.add(newEntity);
}
busVodService.saveBusvod(addList);
return "";
}
// Full-text search endpoint
@RequestMapping(value = "/query", method = RequestMethod.GET)
public String query(@RequestParam(name = "keyword", required = false) String keyword,
ModelMap map ) {
List<BusVod> busVodList = busVodService.searchEntity(keyword);
map.addAttribute("movieLists", busVodList);
return "moviesList";
}
}
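The fixed `size(10)` in `searchEntity` returns only the first ten hits. For pagination, the search source also needs a `from` offset, which `SearchSourceBuilder` exposes through `from(int)` next to `size(int)`. The sketch below shows only the offset arithmetic; the class name `PageWindow` and the 1-based `page` request parameter are assumptions, since the original controller has no such parameter.

```java
// Hypothetical helper: converts a 1-based page number into the "from" offset of a search request.
class PageWindow {

    // Pages below 1 are treated as page 1 so the offset never becomes negative.
    static int fromOffset(int page, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int safePage = Math.max(page, 1);
        return (safePage - 1) * pageSize;
    }

    public static void main(String[] args) {
        int pageSize = 10;
        for (int page = 1; page <= 3; page++) {
            System.out.println("page " + page + " -> from=" + fromOffset(page, pageSize)
                    + ", size=" + pageSize);
        }
    }
}
```

With this in place, the service could call `searchSourceBuilder.from(PageWindow.fromOffset(page, 10)).size(10)` once the controller passes a page number through.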
The front-end page uses a Thymeleaf template (moviesList.html)
<html lang="en" xmlns:th="http://www.thymeleaf.org">
<head>
  <meta charset="UTF-8"/>
  <title>Movie Search</title>
  <link rel='stylesheet' href='/webjars/bootstrap/css/bootstrap.min.css'/>
  <script src="/webjars/jquery/jquery.min.js"></script>
  <script src="/webjars/bootstrap/js/bootstrap.min.js"></script>
</head>
<body style="width: 1000px; margin-left: 200px;">
<form action="/bus/vod/query" class="px-5 py-3" >
  <div class="input-group">
    <input name="keyword" type="text" class="form-control" placeholder="Enter search terms" aria-label="Enter search terms" aria-describedby="basic-addon2"/>
    <div class="input-group-append">
      <button class="btn btn-outline-secondary" type="submit">Search</button>
    </div>
  </div>
</form>
<ul class="list-group">
  <li th:each="movie : ${movieLists}" class="list-group-item">
    <div class="row">
      <a th:href="${movie.getName()}">
        <h4 scope="row" th:utext="${movie.name}" ></h4>
      </a>
      <h6 scope="row" th:text="${movie.getActor()}" class="align-bottom" ></h6>
    </div>
    <div class="row">
      <h6 scope="row" th:utext="${movie.summaryLong}"/>
    </div>
  </li>
  <hr/>
</ul>
</body>
</html>