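Full-text search (retrieval) works as follows: an indexing program scans every word in a document and builds an index entry for each word, recording how many times and where it occurs. A query is then answered by looking the term up in the index, much like looking a word up in a dictionary.
Because the lookup goes through an index, it is usually much faster than the equivalent SQL query (such as a `LIKE '%keyword%'` scan).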
The process is as follows:
1. Build the text store
2. Build the index
3. Run the search
4. Filter the results
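The idea of indexing every word with its count and positions can be sketched as a toy inverted index. This is only an illustration in plain Java, not how Lucene is implemented; the class name `InvertedIndexSketch`, its methods, and the whitespace/punctuation tokenizer are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: term -> (document id -> positions of the term in that document). */
class InvertedIndexSketch {
    private final Map<String, Map<Integer, List<Integer>>> postings = new HashMap<>();

    /** Step 2 (build the index): scan the text word by word and record each position. */
    void add(int docId, String text) {
        // Hypothetical tokenizer: lowercase, split on anything that is not a letter or digit.
        String[] words = text.toLowerCase().split("[^\\p{L}\\p{N}]+");
        int position = 0;
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // split() can yield an empty first element
            }
            postings.computeIfAbsent(word, k -> new HashMap<>())
                    .computeIfAbsent(docId, k -> new ArrayList<>())
                    .add(position);
            position++;
        }
    }

    /** Step 3 (run the search): look the term up in the index instead of scanning every document. */
    Set<Integer> search(String term) {
        Map<Integer, List<Integer>> hits =
                postings.getOrDefault(term.toLowerCase(), Collections.emptyMap());
        return new TreeSet<>(hits.keySet());
    }

    /** Number of occurrences of the term in one document. */
    int count(String term, int docId) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptyMap())
                .getOrDefault(docId, Collections.emptyList())
                .size();
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "the quick brown fox");
        index.add(2, "the lazy dog and the fox");
        System.out.println(index.search("fox"));
        System.out.println(index.count("the", 2));
    }
}
```

A lookup costs one map access regardless of how many documents were indexed, which is the reason an index-based search can beat a scan over the raw text.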
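Lucene: https://lucene.apache.org/core/
Solr: https://solr.apache.org/
Elasticsearch: https://www.elastic.co/cn/elasticsearch
Lucene is a search engine library; Elasticsearch and Solr are both full-text retrieval systems built on top of Lucene.
Comparison of Elasticsearch and Solr (based on fairly old versions, for reference only)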
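Elasticsearch is a highly scalable open-source full-text search and analytics engine. It lets users store, search, and analyze large volumes of data quickly and in near real time, and it is typically used to back enterprise applications with complex search requirements.
Near real-time, not real-time
There is a slight delay (usually 1 second) between indexing a document and that document becoming searchable. The delay exists mainly to keep query performance high.
Making search truly real-time requires a refresh, which means sacrificing either indexing efficiency (refresh after every index operation) or query efficiency (refresh before every query). Elasticsearch takes a middle path and refreshes automatically every n seconds.
After Elasticsearch indexes a new document, it does not sync it to disk right away. The document first goes into an in-memory buffer; each refresh writes the buffer out as a new segment in the filesystem cache, where it becomes searchable, and the data is synced to disk later. Content we have just indexed or modified is therefore not immediately searchable, but it becomes visible within about 1 second.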
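The trade-off described above can be pictured with a toy model: writes land in a buffer and only become visible to search after a refresh. This is a simplified sketch, not Elasticsearch's actual implementation; the class `NearRealTimeSketch` and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of near-real-time search: buffered writes become searchable only after refresh(). */
class NearRealTimeSketch {
    private final List<String> buffer = new ArrayList<>();     // indexed, not yet searchable
    private final List<String> searchable = new ArrayList<>(); // visible to search

    /** Indexing is cheap: the document is only appended to the buffer. */
    void index(String doc) {
        buffer.add(doc);
    }

    /** Stands in for the periodic refresh (every 1s by default in Elasticsearch). */
    void refresh() {
        searchable.addAll(buffer);
        buffer.clear();
    }

    /** Search only sees documents that were present at the last refresh. */
    boolean search(String keyword) {
        for (String doc : searchable) {
            if (doc.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        NearRealTimeSketch es = new NearRealTimeSketch();
        es.index("hello search");
        System.out.println(es.search("hello")); // false: not refreshed yet
        es.refresh();
        System.out.println(es.search("hello")); // true: visible after the refresh
    }
}
```

Calling `refresh()` after every `index()` or before every `search()` would make the model real-time at the cost of doing the refresh work far more often, which is the compromise the periodic refresh avoids.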
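Index: a collection of similar documents
Type: a further subdivision of the documents within an index (deprecated in Elasticsearch 7.x)
Document: the basic unit that can be indexed; it corresponds to a type within an index
Shard: when the data volume is large, an index is split into multiple shards, each storing part of the index's data, to improve performance and throughput
Replica: for safety, the data in each shard should have at least one replica copy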
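How a document ends up in one particular shard can be sketched as "hash the routing key, then take it modulo the number of primary shards". The snippet below is a minimal illustration of that idea only: `ShardRoutingSketch` is a hypothetical class, and `String.hashCode()` is a stand-in for the hash function Elasticsearch actually uses.

```java
/** Minimal sketch of hash-based shard selection. */
class ShardRoutingSketch {

    /** Picks a primary shard for a routing key (by default the document id). */
    static int shardFor(String routingKey, int numberOfPrimaryShards) {
        if (numberOfPrimaryShards <= 0) {
            throw new IllegalArgumentException("numberOfPrimaryShards must be positive");
        }
        // Stand-in hash; floorMod keeps the result non-negative for negative hash codes.
        return Math.floorMod(routingKey.hashCode(), numberOfPrimaryShards);
    }

    public static void main(String[] args) {
        for (String id : new String[] {"doc-1", "doc-2", "doc-3"}) {
            System.out.println(id + " -> shard " + shardFor(id, 3));
        }
    }
}
```

Because the shard is derived from the key and the shard count, the same document id always maps to the same shard as long as the number of primary shards stays fixed.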
https://www.elastic.co/cn/downloads/elasticsearch
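Mind the version: with spring-boot 2.x, do not use the latest release; use 7.x.x
From the command line, go to the bin directory and run elasticsearch to start the service; press Ctrl/Command + C to stop it
Open localhost:9200 to check whether the Elasticsearch node is running. You may run into a security authentication issue; see the problems section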
{
"name": "zhangxingxingdeMacBook-Pro.local",
"cluster_name": "elasticsearch",
"cluster_uuid": "DwgXhzhwQ9WS0drElcEZmg",
"version": {
"number": "7.11.1", // current Elasticsearch version
"build_flavor": "default",
"build_type": "tar",
"build_hash": "ff17057114c2199c9c1bbecc727003a907c0db7a",
"build_date": "2021-02-15T13:44:09.394032Z",
"build_snapshot": false,
"lucene_version": "8.7.0", // Lucene version
"minimum_wire_compatibility_version": "6.8.0",
"minimum_index_compatibility_version": "6.0.0-beta1"
},
"tagline": "You Know, for Search"
}
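Since the server version matters for compatibility with spring-boot, the same check can be done from code. The sketch below fetches the root endpoint with the JDK's `HttpURLConnection` and pulls `version.number` out of the response with a regular expression; the class `EsNodeCheckSketch`, the 2-second timeouts, and the regex-based parsing are assumptions chosen to keep the example dependency-free (a JSON library would be the more robust choice).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Checks whether a local Elasticsearch node answers and which version it reports. */
class EsNodeCheckSketch {
    private static final Pattern VERSION_NUMBER =
            Pattern.compile("\"number\"\\s*:\\s*\"([^\"]+)\"");

    /** Returns the value of "number" from the root response, or null if it is absent. */
    static String extractVersionNumber(String responseBody) {
        Matcher m = VERSION_NUMBER.matcher(responseBody);
        return m.find() ? m.group(1) : null;
    }

    /** Plain HTTP GET; throws IOException if the node is down or refuses the connection. */
    static String fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(2000);
        conn.setReadTimeout(2000);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        try {
            String body = fetch("http://localhost:9200");
            System.out.println("Elasticsearch version: " + extractVersionNumber(body));
        } catch (IOException e) {
            System.out.println("Node not reachable: " + e.getMessage());
        }
    }
}
```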
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
@Document(indexName = "blog")
@Table(name = "article")
public class EsBlog implements Serializable {
private static final long serialVersionUID = 1L;
@Id // primary key
private String id;
private String title;
private String author;
private String content;
protected EsBlog(){}
public EsBlog(String title, String author, String content){
this.title = title;
this.author = author;
this.content = content;
}
......
@Override
public String toString(){
return String.format(
"Article[id=%s, title='%s', author='%s', content='%s']",
id, title, author, content
);
}
}
@Repository
public interface EsBlogRepository extends ElasticsearchRepository<EsBlog, String> {
Page<EsBlog> findByTitleContainingOrAuthorContainingOrContentContaining(String title, String author, String content, Pageable pageable);
}
Note: enable package scanning on the application startup class, otherwise the bean cannot be found at injection time
@EnableJpaRepositories(basePackages = "com.xxx.xxx")
@RunWith(SpringRunner.class)
@SpringBootTest(classes = SpringApplicationSock.class) // start spring-boot so the IoC container is available
public class EsBlogRepositoryTest {
@Autowired
private EsBlogRepository esBlogRepository;
@Before
public void initRepositoryData(){
// clear all existing data
esBlogRepository.deleteAll();
// initialize the data and save it to the ES repository
esBlogRepository.save(new EsBlog("静夜思", "李白", "床前明月光,疑是地上霜。举头望明月,低头思故乡。"));
esBlogRepository.save(new EsBlog("咏柳", "贺知章", "碧玉妆成一树高,万条垂下绿丝绦。不知细叶谁裁出,二月春风似剪刀。"));
esBlogRepository.save(new EsBlog("悯农", "李绅", "锄禾日当午,汗滴禾下土。谁知盘中餐,粒粒皆辛苦。"));
}
@Test
public void testFindDistincEsBlogTitleContainingOrSummaryContainingOrContentContaining(){
// create a page request (first page, up to 20 items)
Pageable pageable = PageRequest.of(0, 20);
String title = "咏";
String author = "王";
String content = "月";
Page<EsBlog> page = esBlogRepository.findByTitleContainingOrAuthorContainingOrContentContaining(title, author, content, pageable);
System.out.println("=================start");
for(EsBlog blog : page){
System.out.println(blog.toString());
}
System.out.println("=================end");
}
}
Inspect the repository (list the indices)
http://localhost:9200/_cat/indices?v=
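With the query conditions in the test above, only two records are returned: "静夜思" (its content contains "月") and "咏柳" (its title contains "咏"); no author contains "王".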
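The reason the test returns two records can be checked by hand with an in-memory emulation of the three OR-ed "containing" conditions. This is only a plain substring approximation, not what Elasticsearch executes (the real query goes through analyzers and the index); the class `ContainingOrSketch` is a hypothetical name, and the data is the three poems saved in the test.

```java
import java.util.ArrayList;
import java.util.List;

/** Emulates title-contains OR author-contains OR content-contains on the test data. */
class ContainingOrSketch {
    // Each row: title, author, content (same data as the repository test).
    static final String[][] POEMS = {
        {"静夜思", "李白", "床前明月光,疑是地上霜。举头望明月,低头思故乡。"},
        {"咏柳", "贺知章", "碧玉妆成一树高,万条垂下绿丝绦。不知细叶谁裁出,二月春风似剪刀。"},
        {"悯农", "李绅", "锄禾日当午,汗滴禾下土。谁知盘中餐,粒粒皆辛苦。"}
    };

    /** Returns the titles of all poems matching at least one of the three conditions. */
    static List<String> matchingTitles(String title, String author, String content) {
        List<String> result = new ArrayList<>();
        for (String[] poem : POEMS) {
            boolean matches = poem[0].contains(title)
                    || poem[1].contains(author)
                    || poem[2].contains(content);
            if (matches) {
                result.add(poem[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same arguments as the test: title "咏", author "王", content "月".
        System.out.println(matchingTitles("咏", "王", "月")); // prints [静夜思, 咏柳]
    }
}
```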
Inspect the blog index (mappings and settings)
http://localhost:9200/blog
{
"blog": {
"aliases": {},
"mappings": {
"properties": {
"_class": {
"type": "keyword",
"index": false,
"doc_values": false
},
"author": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"content": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"title": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
},
"settings": {
"index": {
"routing": {
"allocation": {
"include": {
"_tier_preference": "data_content"
}
}
},
"refresh_interval": "1s",
"number_of_shards": "1",
"provided_name": "blog",
"creation_date": "1703233943853",
"store": {
"type": "fs"
},
"number_of_replicas": "1",
"uuid": "0ELJkqnmTg-tDwritULELA",
"version": {
"created": "7110199"
}
}
}
}
}
@RestController
@RequestMapping("/blogs")
public class EsBlogController {
@Autowired
private EsBlogRepository esBlogRepository;
@GetMapping
public List<EsBlog> list(
@RequestParam(value = "title", required = false, defaultValue = "") String title,
@RequestParam(value = "author", required = false, defaultValue = "") String author,
@RequestParam(value = "content", required = false, defaultValue = "") String content,
@RequestParam(value = "pageIndex", required = false, defaultValue = "0") int pageIndex,
@RequestParam(value = "pageSize", required = false, defaultValue = "10") int pageSize
){
Pageable pageable = PageRequest.of(pageIndex, pageSize);
Page<EsBlog> page = esBlogRepository.findByTitleContainingOrAuthorContainingOrContentContaining(title, author, content, pageable);
return page.getContent();
}
}
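The `pageIndex` and `pageSize` parameters of the controller are zero-based, as in `PageRequest.of(pageIndex, pageSize)`: page 0 is the first page. What that means for a result list can be sketched in plain Java as an offset of `pageIndex * pageSize` followed by at most `pageSize` items. The helper `PagingSketch.page` is a hypothetical illustration, not part of Spring Data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** In-memory illustration of zero-based paging. */
class PagingSketch {

    /** Returns the items of the requested page, or an empty list past the end. */
    static <T> List<T> page(List<T> all, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long from = (long) pageIndex * pageSize; // offset of the first item on the page
        if (from >= all.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(all.size(), from + pageSize);
        return new ArrayList<>(all.subList((int) from, to));
    }

    public static void main(String[] args) {
        List<Integer> items = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(page(items, 0, 2)); // prints [1, 2]
        System.out.println(page(items, 2, 2)); // prints [5]
    }
}
```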
1. The Elasticsearch service starts normally, but http://localhost:9200 cannot be reached from the browser; recent versions may show this problem
received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/[0:0:0:0:0:0:0:1]:9200, remoteAddress=/[0:0:0:0:0:0:0:1]:63470}
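Fix:
Elasticsearch has security authentication enabled by default here; it needs to be turned off
In config/elasticsearch.yml, change the following two `true` values to `false`
2. Starting the test fails with: Unsatisfied dependency expressed through field 'esBlogRepository';
Spring Boot was not started, so there is no IoC container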
https://blog.csdn.net/weixin_43801567/article/details/96643032
3、Unable to parse response body for Response{requestLine=POST /blog/_doc?timeout=1m HTTP/1.1, host=http://localhost:9200, response=HTTP/1.1 201 Created}
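The client cannot parse the response from the ES server; the spring-boot version may be too old for the ES version
spring-boot 2.7.3 with ES 8.11.3 has this problem; switching ES to 7.11.1 works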
https://blog.csdn.net/weixin_38201936/article/details/121746906
https://blog.csdn.net/qq_50652600/article/details/125521823