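I. Overview of Full-Text Search
1. Types of data:
(1). Structured data: data with a fixed format or a limited length, such as database records and metadata.
(2). Unstructured data: data of variable length or without a fixed format, such as emails and Word documents.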
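2. Ways of searching data:
(1). Structured data:
Databases are searched with SQL; metadata is searched with the operating system's own mechanisms.
(2). Unstructured data:
a. Sequential scanning: read the data from beginning to end looking for matches. For small files this is the most direct and convenient approach, but for large files it is very slow.
b. Full-text search: extract part of the information from the unstructured data and reorganize it so that it has some structure, then search this structured data, which makes searching comparatively fast. The information extracted from the unstructured data and reorganized in this way is called an index, and this process of first building an index and then searching the index is called full-text search.
Difference: sequential scanning has to scan all of the data on every search, whereas the index only needs to be built once and can then be reused by every later search (it only needs updating when the data changes).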
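3. Steps of full-text search:
a. Build the text library; b. Build the index; c. Run the search; d. Filter the results.
4. Technologies for full-text search:
a. Lucene; b. Elasticsearch; c. Solr
Lucene is a search engine library, and both Elasticsearch and Solr are built on top of it. They differ as follows:
1. Solr relies on ZooKeeper for distributed management, while Elasticsearch has distributed coordination built in;
2. Solr supports more data formats, such as JSON, XML and CSV, while Elasticsearch only supports JSON;
3. Solr ships with more features out of the box, while Elasticsearch concentrates on core functionality and leaves advanced features to third-party plugins;
4. Solr performs better in traditional search applications, but is noticeably less efficient than Elasticsearch in real-time search applications.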
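To make the difference between sequential scanning and index-based search concrete, here is a toy Java sketch of an inverted index. This is the kind of term-to-document structure Lucene builds, greatly simplified, and the sketch follows the four steps listed above. The class name, the sample documents, the letters-only tokenizer and the length-based filter in step d are all hypothetical. Real engines add analyzers, relevance scoring and on-disk segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy example (not Lucene's implementation): sequential scan vs. a tiny inverted index.
public class InvertedIndexDemo {

    // Step a: the text library (hypothetical sample documents, addressed by list position).
    static final List<String> DOCS = Arrays.asList(
            "Had I not seen the Sun",
            "There is room in the halls of pleasure",
            "When you are old");

    // Very simple analyzer: lowercase and split on anything that is not a letter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Sequential scan: every query re-reads and re-tokenizes every document.
    static List<Integer> scan(List<String> docs, String term) {
        String wanted = term.toLowerCase(Locale.ROOT);
        List<Integer> hits = new ArrayList<>();
        for (int id = 0; id < docs.size(); id++) {
            if (tokenize(docs.get(id)).contains(wanted)) {
                hits.add(id);
            }
        }
        return hits;
    }

    // Step b: build the index once, mapping each term to the ids of the documents containing it.
    static Map<String, SortedSet<Integer>> buildIndex(List<String> docs) {
        Map<String, SortedSet<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String token : tokenize(docs.get(id))) {
                index.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // Step c: a search is now a single map lookup instead of a pass over all documents.
    static SortedSet<Integer> search(Map<String, SortedSet<Integer>> index, String term) {
        return index.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    public static void main(String[] args) {
        Map<String, SortedSet<Integer>> index = buildIndex(DOCS);
        System.out.println(scan(DOCS, "the"));    // [0, 1]
        System.out.println(search(index, "the")); // [0, 1]

        // Step d: filter the hits, here (arbitrarily) keeping only documents shorter than 30 characters.
        List<Integer> filtered = new ArrayList<>();
        for (int id : search(index, "the")) {
            if (DOCS.get(id).length() < 30) {
                filtered.add(id);
            }
        }
        System.out.println(filtered);             // [0]
    }
}
```

Both approaches return the same hits. The scan pays the tokenizing cost on every query, while the index pays it once in buildIndex.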
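II. Elasticsearch Core Concepts
1. Near real-time: for performance reasons, a newly indexed document is not searchable immediately. Elasticsearch refreshes each index periodically (every 1 second by default, controlled by index.refresh_interval). A refresh writes the newly indexed documents into a new segment in the filesystem cache, which makes them searchable. The segments are persisted to disk later, when Elasticsearch flushes them, rather than on every refresh.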
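2. Cluster: a cluster is a collection of one or more nodes that together hold all of the application's data and provide indexing and search across all of the nodes. Clusters let the system scale horizontally and keep it available. Every Elasticsearch cluster needs a unique name, which defaults to "elasticsearch".
3. Node: a single server in a cluster that stores data and takes part in the cluster's indexing and search operations. Each node is identified by a unique name, which by default is derived from a random UUID. A single cluster can contain any number of nodes.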
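4. Index: a collection of similar documents.
5. Type: a further subdivision of the documents within an index. Types are deprecated: since Elasticsearch 6.0 an index can hold only one type, and later versions remove types altogether.
6. Document: the basic unit of information that gets indexed. Each document belongs to a type within an index and is represented in JSON.
7. Shards and replicas: Elasticsearch can split an index into multiple shards, each of which stores part of the index's data. Elasticsearch takes care of allocating the shards and aggregating their results. Each shard should have at least one replica, which is a copy kept on a different node, so that its data survives the loss of a node.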
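Elasticsearch picks the primary shard for a document with a formula of the form shard_num = hash(_routing) % num_primary_shards, where _routing defaults to the document's _id. This is also why the number of primary shards cannot simply be changed on an existing index. The sketch below illustrates that idea, together with the rule that a replica is never placed on the same node as its primary. Two simplifications apply. String.hashCode() stands in for the Murmur3 hash that Elasticsearch actually uses, and the round-robin allocate method is a hypothetical toy, not Elasticsearch's shard allocator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of shard routing and replica placement (not Elasticsearch's code).
public class ShardRoutingSketch {

    // Primary shard for a routing value (the _id by default).
    // Assumption: String.hashCode() replaces Elasticsearch's Murmur3 hash.
    static int shardFor(String routing, int numPrimaryShards) {
        // floorMod keeps the result in [0, numPrimaryShards) even for negative hash codes
        return Math.floorMod(routing.hashCode(), numPrimaryShards);
    }

    // Toy allocation: primary s goes to node s % nodes and its replicas to the following nodes,
    // so no copy shares a node with its primary (this requires replicas < nodes).
    static List<String> allocate(int numPrimaryShards, int replicas, int nodes) {
        if (replicas >= nodes) {
            throw new IllegalArgumentException("not enough nodes to keep replicas off their primary's node");
        }
        List<String> plan = new ArrayList<>();
        for (int s = 0; s < numPrimaryShards; s++) {
            plan.add("P" + s + "->node" + (s % nodes));
            for (int r = 1; r <= replicas; r++) {
                plan.add("R" + s + "->node" + ((s + r) % nodes));
            }
        }
        return plan;
    }

    public static void main(String[] args) {
        // The same id always routes to the same shard
        System.out.println(shardFor("WY18-mQBHJLIRX6qlM-w", 5));
        System.out.println(allocate(3, 1, 2)); // [P0->node0, R0->node1, P1->node1, R1->node0, P2->node0, R2->node1]
    }
}
```

Because the result depends on num_primary_shards, changing that number would send existing ids to different shards. Elasticsearch therefore keeps the count stable for an index.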
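III. Integrating Elasticsearch with Spring Boot
1. build.gradle
Add the Spring Data Elasticsearch dependency. Adding the starter library to build.gradle is all that is needed (newer Gradle versions use implementation instead of compile):
compile('org.springframework.boot:spring-boot-starter-data-elasticsearch')
2. Download and install Elasticsearch (Ubuntu)
a. Download it from the official site, https://www.elastic.co/downloads/elasticsearch. I downloaded the tar archive.
b. Extract it and run the following from its bin directory (on Windows, run elasticsearch.bat instead):
./elasticsearch
c. Open http://localhost:9200
If you get JSON like the following, Elasticsearch has started successfully: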
{
  "name" : "_lPqbC0",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "fjSbVpIsQiOr423l7NGkjg",
  "version" : {
    "number" : "6.3.2",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "053779d",
    "build_date" : "2018-07-20T05:20:23.451332Z",
    "build_snapshot" : false,
    "lucene_version" : "7.3.1",
    "minimum_wire_compatibility_version" : "5.6.0",
    "minimum_index_compatibility_version" : "5.0.0"
  },
  "tagline" : "You Know, for Search"
}
For Elasticsearch configuration, see: https://www.elastic.co/guide/en/elasticsearch/reference/current/settings.html
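3. Elasticsearch in practice
This builds on the hello-world project. If you don't know how to set up the project, see this blog post: https://blog.csdn.net/weixin_40132006/article/details/81218167.
a. Edit application.properties and add the following:
# Elasticsearch server address (9300 is the transport port used by the Java client, not the 9200 HTTP port)
spring.data.elasticsearch.cluster-nodes=localhost:9300
# Connection timeout
spring.data.elasticsearch.properties.transport.tcp.connect_timeout=120s
b. Create the document class
In the com.libo.spring.boot.blog.domain.es package, create the document class EsBlog, which is used only to store documents.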
package com.libo.spring.boot.blog.domain.es;

import java.io.Serializable;

import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;

// Stored in the "blog" index under the "blog" type
@Document(indexName = "blog", type = "blog")
public class EsBlog implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id // document id; generated by Elasticsearch when left null
    private String id;
    private String summary;
    private String content;
    private String title;

    protected EsBlog() {
        // No-arg constructor required by the JPA convention; protected to prevent direct use
    }

    public EsBlog(String title, String summary, String content) {
        this.title = title;
        this.summary = summary;
        this.content = content;
    }

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getSummary() {
        return summary;
    }

    public void setSummary(String summary) {
        this.summary = summary;
    }

    public String getContent() {
        return content;
    }

    public void setContent(String content) {
        this.content = content;
    }

    @Override
    public String toString() {
        return String.format("User[id=%s, title='%s', summary='%s', content='%s']", id, title, summary, content);
    }
}
c. Create the repository
In the com.libo.spring.boot.blog.repository.es package, define the repository interface EsBlogRepository:
package com.libo.spring.boot.blog.repository.es;

import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

import com.libo.spring.boot.blog.domain.es.EsBlog;

// EsBlog is the document class; String is the type of its @Id field
public interface EsBlogRepository extends ElasticsearchRepository<EsBlog, String> {

    // Query derived from the method name: title, summary OR content contains the given value; results are paged
    Page<EsBlog> findByTitleContainingOrSummaryContainingOrContentContaining(String title, String summary, String content, Pageable pageable);
}
d. In the test directory, create the com.libo.spring.boot.blog.repository.es package and the test case EsBlogRepositoryTest:
package com.libo.spring.boot.blog.repository.es;

import org.junit.Before;
import org.junit.Test;
import org.junit.runner.RunWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.test.context.junit4.SpringRunner;

import com.libo.spring.boot.blog.domain.es.EsBlog;

@RunWith(SpringRunner.class)
@SpringBootTest
public class EsBlogRepositoryTest {

    @Autowired
    private EsBlogRepository esBlogRepository;

    @Before
    public void initRepositoryData() {
        // Remove all existing data
        esBlogRepository.deleteAll();
        // Insert three poems as test data
        esBlogRepository.save(new EsBlog("Had I not seen the Sun",
                "I could have borne the shade",
                "But Light a newer Wilderness. My Wilderness has made."));
        esBlogRepository.save(new EsBlog("There is room in the halls of pleasure",
                "For a long and lordly train",
                "But one by one we must all file on, Through the narrow aisles of pain."));
        esBlogRepository.save(new EsBlog("When you are old",
                "When you are old and grey and full of sleep",
                "And nodding by the fire,take down this book."));
    }

    @Test
    public void testFindDistinctEsBlogByTitleContainingOrSummaryContainingOrContentContaining() {
        Pageable pageable = PageRequest.of(0, 20);
        // Search 1: title contains "Sun", OR summary contains "is", OR content contains "down"
        String title = "Sun";
        String summary = "is";
        String content = "down";
        Page<EsBlog> page = esBlogRepository.findByTitleContainingOrSummaryContainingOrContentContaining(title, summary, content, pageable);
        System.out.println("------------start 1");
        for (EsBlog blog : page) {
            System.out.println(blog.toString());
        }
        System.out.println("------------end 1");
        // Search 2: "the" in any of the three fields
        title = "the";
        summary = "the";
        content = "the";
        page = esBlogRepository.findByTitleContainingOrSummaryContainingOrContentContaining(title, summary, content, pageable);
        System.out.println("------------start 2");
        for (EsBlog blog : page) {
            System.out.println(blog.toString());
        }
        System.out.println("------------end 2");
    }
}
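Pageable pageable = PageRequest.of(0, 20); creates a page request for the first page (page indexes start at 0) with at most 20 results. Before the test runs, the @Before method clears the repository and saves three poems as test data. The test then prints the search results to the console. In the first search, the first poem matches "Sun" in its title and the third poem matches "down" in its content. In the second search, every poem contains "the" in at least one field: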
------------start 1
User[id=WY18-mQBHJLIRX6qlM-w, title='Had I not seen the Sun', summary='I could have borne the shade', content='But Light a newer Wilderness. My Wilderness has made.']
User[id=W418-mQBHJLIRX6qlc9I, title='When you are old', summary='When you are old and grey and full of sleep', content='And nodding by the fire,take down this book.']
------------end 1
------------start 2
User[id=WY18-mQBHJLIRX6qlM-w, title='Had I not seen the Sun', summary='I could have borne the shade', content='But Light a newer Wilderness. My Wilderness has made.']
User[id=Wo18-mQBHJLIRX6qlc8i, title='There is room in the halls of pleasure', summary='For a long and lordly train', content='But one by one we must all file on, Through the narrow aisles of pain.']
User[id=W418-mQBHJLIRX6qlc9I, title='When you are old', summary='When you are old and grey and full of sleep', content='And nodding by the fire,take down this book.']
------------end 2
Note: Elasticsearch must be running before you run the test.
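Spring Data derives the query from the method name. The three ...Containing conditions are joined by Or, so a blog is returned if its title contains title, or its summary contains summary, or its content contains content. A rough in-memory analogue in plain Java reproduces which poems the test above returns. The Blog class and the find method are hypothetical. Plain String.contains also ignores Elasticsearch's text analysis (tokenizing, lowercasing), relevance scoring and paging, so this only approximates what the real query does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Rough in-memory analogue of findByTitleContainingOrSummaryContainingOrContentContaining (hypothetical).
public class DerivedQuerySketch {

    // Hypothetical stand-in for EsBlog, reduced to the three searched fields.
    static final class Blog {
        final String title;
        final String summary;
        final String content;

        Blog(String title, String summary, String content) {
            this.title = title;
            this.summary = summary;
            this.content = content;
        }
    }

    // The three conditions are OR-ed, mirroring the "Or" keywords in the method name.
    static List<Blog> find(List<Blog> blogs, String title, String summary, String content) {
        List<Blog> hits = new ArrayList<>();
        for (Blog b : blogs) {
            if (b.title.contains(title) || b.summary.contains(summary) || b.content.contains(content)) {
                hits.add(b);
            }
        }
        return hits;
    }

    // The same three poems that the test saves before it runs.
    static final List<Blog> POEMS = Arrays.asList(
            new Blog("Had I not seen the Sun", "I could have borne the shade",
                    "But Light a newer Wilderness. My Wilderness has made."),
            new Blog("There is room in the halls of pleasure", "For a long and lordly train",
                    "But one by one we must all file on, Through the narrow aisles of pain."),
            new Blog("When you are old", "When you are old and grey and full of sleep",
                    "And nodding by the fire,take down this book."));

    public static void main(String[] args) {
        // Prints the titles of poems 1 and 3, the same poems as search 1
        for (Blog b : find(POEMS, "Sun", "is", "down")) {
            System.out.println(b.title);
        }
        System.out.println(find(POEMS, "the", "the", "the").size()); // 3, as in search 2
    }
}
```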
e. Create the controller
In the com.libo.spring.boot.blog.controller package, create the controller BlogController to handle requests:
package com.libo.spring.boot.blog.controller;

import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

import com.libo.spring.boot.blog.domain.es.EsBlog;
import com.libo.spring.boot.blog.repository.es.EsBlogRepository;

@RestController
@RequestMapping("/blogs")
public class BlogController {

    @Autowired
    private EsBlogRepository esBlogRepository;

    // GET /blogs?title=...&summary=...&content=...&pageIndex=0&pageSize=10
    @GetMapping
    public List<EsBlog> list(@RequestParam(value = "title", required = false, defaultValue = "") String title,
            @RequestParam(value = "summary", required = false, defaultValue = "") String summary,
            @RequestParam(value = "content", required = false, defaultValue = "") String content,
            @RequestParam(value = "pageIndex", required = false, defaultValue = "0") int pageIndex,
            @RequestParam(value = "pageSize", required = false, defaultValue = "10") int pageSize) {
        Pageable pageable = PageRequest.of(pageIndex, pageSize);
        Page<EsBlog> page = esBlogRepository.findByTitleContainingOrSummaryContainingOrContentContaining(title, summary, content, pageable);
        // Return only the hits on the requested page
        return page.getContent();
    }
}
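Before starting the application, make sure the Elasticsearch server is running. Once it has started, run the test case to populate the data. You can then test with the following URL:
http://localhost:8080/blogs?title=i&summary=love&content=you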
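The controller passes its pageIndex and pageSize parameters to PageRequest.of(pageIndex, pageSize). The page index there is zero-based, so the first hit on a page sits at offset pageIndex * pageSize, which is what Pageable.getOffset() returns. The sketch below shows that arithmetic on an in-memory list. The page helper is hypothetical: it only illustrates which hits a given pageIndex/pageSize pair selects, not how Spring Data Elasticsearch builds the request. Like PageRequest.of, it rejects a negative index or a size below 1.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical in-memory illustration of zero-based paging.
public class PagingSketch {

    // Offset of the first element on a zero-based page.
    static long offset(int pageIndex, int pageSize) {
        return (long) pageIndex * pageSize;
    }

    // Returns the slice of 'all' that falls on the requested page; pages past the end are empty.
    static <T> List<T> page(List<T> all, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize < 1) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize >= 1");
        }
        long from = offset(pageIndex, pageSize);
        if (from >= all.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(from + pageSize, all.size());
        return all.subList((int) from, to);
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(page(hits, 0, 2)); // [a, b]
        System.out.println(page(hits, 2, 2)); // [e]
        System.out.println(page(hits, 3, 2)); // []
    }
}
```

With the controller's defaults (pageIndex=0, pageSize=10), a request without paging parameters returns the first 10 hits.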