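Elasticsearch vs. Solr

When you only search data that already exists (no concurrent indexing), Solr is faster.
When indexes are built in real time, Solr suffers from I/O blocking and its query performance drops; Elasticsearch has a clear advantage here.
As the data volume grows, Solr's search efficiency decreases, while Elasticsearch shows no significant change.
- One reported result of moving a search infrastructure from Solr to Elasticsearch was an immediate 50x improvement in search performance.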

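Elasticsearch vs. Solr: summary

- ES works basically out of the box (unzip and run), which is very simple; installing Solr is slightly more involved.
- Solr relies on ZooKeeper for distributed coordination, whereas Elasticsearch has distributed coordination built in.
- Solr supports more data formats, such as JSON, XML, and CSV, while Elasticsearch only supports JSON.
- Solr ships more features officially, while Elasticsearch focuses on core functionality and leaves most advanced features to plugins and companion tools; for example, a graphical interface relies on Kibana.
- Solr queries are fast, but index updates (inserts and deletes) are slow; it suits query-heavy applications such as e-commerce.
- ES builds indexes quickly (queries over static data are slower by comparison), so real-time queries are fast; it suits searches like those at Facebook or Sina.
- Solr is a strong solution for traditional search applications, but Elasticsearch is a better fit for newer real-time search applications.
- Solr is more mature, with a larger and more established community of users, developers, and contributors. Elasticsearch has comparatively fewer developers and maintainers, changes quickly, and costs more to learn and use. (ES is still the overall trend!)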

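Installing Elasticsearch

Prerequisites: JDK 1.8 is the minimum requirement. You will also want an Elasticsearch client and a UI tool.

For Java development, the Elasticsearch version must match the version of the core Java client jar we use later. Also make sure your JDK environment works.

Download

Official site: https://www.elastic.co/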

image-20201209213216467.png

image-20201209213910696.png

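Download page: https://www.elastic.co/cn/downloads/elasticsearch

Downloading from the official site can be extremely slow; use a proxy, or find a copy that someone has already downloaded.

These notes use Windows.

The three ELK components all work straight after unzipping.

Installing on Windows

1. Unzip the archive and it is ready to use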

image-20201210202916024.png

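2. Get to know the directory layout

bin      startup scripts
config   configuration files
    log4j2             logging configuration
    jvm.options        JVM settings
    elasticsearch.yml  Elasticsearch configuration (default port 9200; CORS settings go here)
lib      dependency jars
logs     logs
modules  functional modules
plugins  plugins (for example the IK analyzer)

3. Start it and visit port 9200. (Note: the first startup on my machine failed; adding the line xpack.ml.enabled: false to the yaml config file fixed it.)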

image-20201210204317760.png

4. Test access

image-20201210204527886.png

Installing a visual UI: the es head plugin

This step assumes some front-end (Vue) background and a working Node.js environment.

1. Download: https://github.com/mobz/elasticsearch-head/

2. Start it

npm install
npm run start

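3. The connection test reveals a cross-origin (CORS) problem, so configure it in the ES yaml config file

# Enable CORS
http.cors.enabled: true
http.cors.allow-origin: "*"

4. Restart the ES server, then connect again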

image-20201215210742269.png

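As beginners, we can simply treat ES as a database: you can create indices (databases) and documents (the data in them).

We treat head purely as a data display tool; all of our later queries are run in Kibana.

About ELK

ELK is an acronym formed from the initials of three open-source frameworks: Elasticsearch, Logstash, and Kibana. It is also known as the Elastic Stack. Elasticsearch is a distributed, near-real-time search platform built on Lucene and accessed through a RESTful interface. It can serve as the underlying framework for large-scale full-text search scenarios like those of Baidu or Google, which shows how capable its search features are; it is often abbreviated as es. Logstash is the central data-flow engine of ELK: it collects data in different formats from different sources (files, data stores, MQ), filters it, and outputs it to different destinations (files, MQ, Redis, Elasticsearch, Kafka, and so on). Kibana presents Elasticsearch data through friendly pages and provides real-time analysis.

Collect and clean data --> search and store --> display in Kibana

Most developers, when ELK comes up, will say it is a collective name for a log-analysis architecture. In practice ELK is not limited to log analysis; it can support any data collection and analysis scenario. Log collection and analysis is simply the most representative use case, not the only one.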

image-20201215212613105.png

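Installing Kibana

Kibana is an open-source analytics and visualization platform for Elasticsearch, used to search, view, and interact with data stored in Elasticsearch indices. With Kibana you can perform advanced data analysis and present it through a variety of charts. Kibana makes large volumes of data easier to understand. It is simple to operate, and its browser-based interface lets you quickly build dashboards that show Elasticsearch query results in real time. Setting up Kibana is very simple: with no coding or extra infrastructure, you can install Kibana and start monitoring Elasticsearch indices within a few minutes.

Official site: https://www.elastic.co/cn/kibana

The Kibana version must match the ES version!

After the download finishes, unzipping takes a while.

The upside: ELK components basically all work out of the box.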

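Startup test

1. The directory after unzipping

image-20201215213625726.png

2. Start it

image-20201215213842202.png

3. Test access

image-20201215214028735.png

4. Dev tools (you could also test with Postman, curl, head, or a Chrome extension)

image-20201215214323111.png

All of our later operations are written here.

5. Localization: edit Kibana's yaml config file (for Chinese, set i18n.locale: "zh-CN" in kibana.yml), then restart Kibana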

image-20201215214850564.png

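ES core concepts

1. Index

2. Field types (mapping)

3. Documents

Overview

In the earlier sections we learned what ES is and installed and started the service. So how does ES store data, what does its data structure look like, and how does it implement search? Let's first go over the core Elasticsearch concepts.

What are clusters, nodes, indices, types, documents, shards, and mappings?

Elasticsearch is document-oriented. Here is a side-by-side comparison with a relational database:

Relational DB	Elasticsearch
database	index (indices)
tables	types
rows	documents
columns	fields

An Elasticsearch cluster can contain multiple indices (databases); each index can contain multiple types (tables); each type contains multiple documents (rows); and each document contains multiple fields (columns).

Physical design:

Behind the scenes, Elasticsearch splits each index into multiple shards, and each shard can be migrated between servers in the cluster.

A single node is already a cluster! The default cluster name is elasticsearch.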

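The IK analyzer plugin

What is the IK analyzer?

Tokenization means splitting a piece of Chinese (or other) text into individual keywords. When we search, our input is tokenized, the data in the database or index is tokenized as well, and the two are then matched. The default handling of Chinese treats every character as a separate word, so "大程子" is split into "大", "程", "子". That is clearly not what we want, so we install the Chinese analyzer IK to solve the problem.

If you need to work with Chinese text, the IK analyzer is the recommended choice.

IK provides two tokenization algorithms, ik_smart and ik_max_word. ik_smart produces the coarsest segmentation (fewest tokens), and ik_max_word produces the finest-grained segmentation. We test both below.

Installation

1. https://github.com/medcl/elasticsearch-analysis-ik

2. After downloading, place it in the Elasticsearch plugins directory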

image-20201219115615988.png

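3. Restart ES and watch the log: you can see the IK analyzer being loaded

image-20201219115758398.png

4. The elasticsearch-plugin command (elasticsearch-plugin list) shows which plugins are loaded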

image-20201219115952052.png

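5. Test with Kibana

Compare the output of the different analyzers

ik_smart: coarsest segmentation (fewest tokens)

image-20201219120729001.png

ik_max_word: finest-grained segmentation, exhausting every combination found in the dictionary

image-20201219120917616.png

Now we enter "超级喜欢大程子学Java"

image-20201219121225719.png

Problem found: 大程子 was split apart!

Words like this that you need kept whole must be added to the analyzer's dictionary yourself.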
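To make the role of the dictionary concrete, here is a small Java sketch. It is a toy and not IK's actual algorithm: `perCharacter` mimics the one-token-per-character behaviour described earlier, and `longestMatch` is a plain forward longest-match segmenter over a hypothetical in-memory dictionary. The class and method names are made up for illustration, and the code assumes characters in the Basic Multilingual Plane.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class DictionarySegmentSketch {

    /** One token per character, like the default handling of Chinese text. */
    static List<String> perCharacter(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    /**
     * Toy forward longest-match segmentation: at each position, take the longest
     * dictionary word that starts there; otherwise emit a single character.
     */
    static List<String> longestMatch(String text, Set<String> dictionary, int maxWordLength) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            int end = Math.min(text.length(), i + maxWordLength);
            String found = null;
            // Try the longest candidate first; stop before single characters.
            for (int j = end; j > i + 1; j--) {
                String candidate = text.substring(i, j);
                if (dictionary.contains(candidate)) {
                    found = candidate;
                    break;
                }
            }
            if (found == null) {
                found = text.substring(i, i + 1); // unknown: fall back to one character
            }
            tokens.add(found);
            i += found.length();
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> base = Set.of("超级", "喜欢");
        Set<String> extended = Set.of("超级", "喜欢", "大程子");
        // Without the custom word, 大程子 falls apart into single characters.
        System.out.println(longestMatch("超级喜欢大程子", base, 3));
        // With the custom word in the dictionary, it stays whole.
        System.out.println(longestMatch("超级喜欢大程子", extended, 3));
    }
}
```

The only difference between the two calls in `main` is one dictionary entry, which is the same effect that adding a word to a custom dic file has on the real analyzer.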

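Adding your own configuration to the IK analyzer

image-20201219121802928.png

Restart ES and watch the details

image-20201219121956666.png

Test 大程子 again and check the result

image-20201219122120800.png

From now on, whenever we need custom words, we just add them to our own dic file.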

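About the REST style

REST is a software architecture style rather than a standard; it only provides a set of design principles and constraints. It is mainly used for software in which clients and servers interact. Software designed in this style can be simpler, more clearly layered, and easier to extend with mechanisms such as caching.

Basic REST commands:

method	URL	description
PUT	localhost:9200/index_name/type_name/doc_id	create a document (with a specified id)
POST	localhost:9200/index_name/type_name	create a document (with a random id)
POST	localhost:9200/index_name/type_name/doc_id/_update	update a document
DELETE	localhost:9200/index_name/type_name/doc_id	delete a document
GET	localhost:9200/index_name/type_name/doc_id	get a document by id
POST	localhost:9200/index_name/type_name/_search	query all data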
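As a rough illustration of how the commands in the table map to HTTP requests, the sketch below builds (but does not send) a few of them with the JDK's `java.net.http` API (Java 11+). The class name, the helper names, and the base address `http://localhost:9200` are assumptions for this example. It uses `_doc` in place of a type name, matching the `/wang_index/_doc/1` form used in the Java API tests later in these notes.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

final class RestCommandSketch {

    // Assumed address of a local node; adjust for your environment.
    static final String BASE = "http://localhost:9200";

    /** PUT /{index}/_doc/{id}: create a document with a chosen id. */
    static HttpRequest putDocument(String index, String id, String json) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .PUT(BodyPublishers.ofString(json))
                .build();
    }

    /** GET /{index}/_doc/{id}: fetch a document by id. */
    static HttpRequest getDocument(String index, String id) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index + "/_doc/" + id))
                .GET()
                .build();
    }

    /** DELETE /{index}/_doc/{id}: delete a document by id. */
    static HttpRequest deleteDocument(String index, String id) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index + "/_doc/" + id))
                .DELETE()
                .build();
    }

    /** POST /{index}/_search with a match_all body: query all data. */
    static HttpRequest searchAll(String index) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(BodyPublishers.ofString("{\"query\":{\"match_all\":{}}}"))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = putDocument("wangcp", "1", "{\"name\":\"test\"}");
        // Print the method and target so the mapping to the table is visible.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send one of these requests you would pass it to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running node.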

Basic index operations

1. Create an index

PUT /index_name/~type_name~/doc_id
{request body}

image-20201219130054445.png

The index was created automatically and the data was added successfully. This is why, early on, you can learn ES as if it were a database.

image-20201219130505146.png

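3. Does the name field need an explicit type? After all, a relational database requires one.

String types: text, keyword
Numeric types: long, integer, short, byte, double, float, half_float, scaled_float
Date type: date
Boolean type: boolean
Binary type: binary
And so on.

4. Specify the field types

image-20201219131359794.png

You can retrieve this mapping with a GET request to see the details.

image-20201219131508825.png

5. View the default information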

image-20201219131853418.png

image-20201219132010336.png

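If you do not specify types for your document fields, ES assigns default field types for you.

Extra: you can inspect the state of your indices with commands; GET _cat/ exposes a lot of information about the current ES instance.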

image-20201219132904082.png

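To modify a document you can still submit it with PUT, which overwrites the existing one. There is also a newer approach.

The old way

image-20201219133232534.png

The current way

image-20201219133542459.png

Deleting an index

Deletion is done with the DELETE command; the request path determines whether an index or a single document is deleted.

The RESTful style is what ES recommends.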

Basic document operations (the core of ES)

Basic operations

1. Add data

PUT /wangcp/user/3
{
  "name":"李四",
  "age":30,
  "desc":"emm,不知道如何形容",
  "tags":["靓女","旅游","唱歌"]
}

image-20201219140150147.png

2. Retrieve data with GET

image-20201219140841850.png

3. Update data with PUT

image-20201219141046399.png

POST _update is the recommended way to update.

image-20201219141510378.png
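The practical difference between the two update styles can be sketched with plain Java maps. This models only the semantics, not the Elasticsearch client: a PUT replaces the stored document with the new body, while a POST _update with a partial doc changes only the listed fields. The class and method names are hypothetical, and the merge here is shallow (Elasticsearch merges nested objects recursively).

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class UpdateSemanticsSketch {

    /** PUT: the request body replaces the stored document entirely. */
    static Map<String, Object> putOverwrite(Map<String, ?> stored, Map<String, ?> body) {
        // The previous content is discarded; fields missing from the body are lost.
        return new LinkedHashMap<>(body);
    }

    /** POST _update with a partial doc: only the supplied fields change. */
    static Map<String, Object> postUpdate(Map<String, ?> stored, Map<String, ?> partialDoc) {
        Map<String, Object> merged = new LinkedHashMap<>(stored);
        merged.putAll(partialDoc); // shallow merge, a simplification
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> stored = new LinkedHashMap<>();
        stored.put("name", "李四");
        stored.put("age", 30);

        Map<String, Object> change = Map.of("age", 31);
        System.out.println(putOverwrite(stored, change)); // name is gone
        System.out.println(postUpdate(stored, change));   // name is kept
    }
}
```

This is why the partial update is the safer default: forgetting a field in a PUT body silently drops it from the document.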

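Simple search

GET wangcp/user/1

A simple conditional query: based on the default mapping rules, ES can already run basic queries.

image-20201219142152908.png

image-20201219152938535.png

Complex searches (sorting, pagination, highlighting, fuzzy queries, exact queries)

image-20201219153320470.png

image-20201219154445137.png

Filtering the output: when you do not want every field, similar to select name, desc in SQL

image-20201219154806222.png

When we later operate ES from Java, the methods and objects correspond to the keys used here.

Sorting

image-20201219155219339.png

Paginated queries

image-20201219155446713.png

Result offsets start at 0, just like indexes in every data structure you have learned.

/search/{current}/{pagesize}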
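Since offsets start at 0, a URL such as /search/{current}/{pagesize} needs a small conversion before it can be used as `from` and `size`. The sketch below shows one way to do it; it assumes that page numbers in the URL start at 1, and the class and method names are made up for this example.

```java
final class PagingSketch {

    /** Converts a 1-based page number into the 0-based "from" offset. */
    static int fromOffset(int current, int pageSize) {
        if (current < 1 || pageSize < 1) {
            throw new IllegalArgumentException("current and pageSize must be >= 1");
        }
        return (current - 1) * pageSize;
    }

    /** Builds a match_all search body with from/size paging. */
    static String searchBody(int current, int pageSize) {
        return "{\"from\":" + fromOffset(current, pageSize)
                + ",\"size\":" + pageSize
                + ",\"query\":{\"match_all\":{}}}";
    }

    public static void main(String[] args) {
        // Page 3 with 10 results per page starts at offset 20.
        System.out.println(searchBody(3, 10));
    }
}
```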

Boolean queries

must (and): all conditions must match, like where id=1 and name=xxx

image-20201219160832737.png

should (or): at least one condition must match, like where id=1 or name=xxx

image-20201219170009162.png

must_not (not)

image-20201219170140743.png

Filter

image-20201219170525438.png

gt   greater than
gte  greater than or equal to
lt   less than
lte  less than or equal to

image-20201219170813765.png
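The logic of these clauses can be mimicked with ordinary Java predicates. The sketch below is an analogy only and does not call Elasticsearch or compute relevance scores: `must` keeps documents matching every clause, `should` keeps documents matching at least one (the behaviour of a bool query containing only should clauses), `mustNot` drops documents matching any clause, and `ageRange` plays the role of a gte/lt range filter. All names are hypothetical; the sample users reuse names from the bulk-insert test later in these notes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class BoolQuerySketch {

    static final class User {
        final String name;
        final int age;

        User(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    static List<User> sampleUsers() {
        return List.of(new User("wangcp1", 3), new User("wangcp2", 9), new User("dachengzi1", 3));
    }

    /** must (and): every clause has to match. */
    @SafeVarargs
    static List<User> must(List<User> docs, Predicate<User>... clauses) {
        return docs.stream()
                .filter(d -> Arrays.stream(clauses).allMatch(c -> c.test(d)))
                .collect(Collectors.toList());
    }

    /** should (or): at least one clause has to match. */
    @SafeVarargs
    static List<User> should(List<User> docs, Predicate<User>... clauses) {
        return docs.stream()
                .filter(d -> Arrays.stream(clauses).anyMatch(c -> c.test(d)))
                .collect(Collectors.toList());
    }

    /** must_not (not): no clause may match. */
    @SafeVarargs
    static List<User> mustNot(List<User> docs, Predicate<User>... clauses) {
        return docs.stream()
                .filter(d -> Arrays.stream(clauses).noneMatch(c -> c.test(d)))
                .collect(Collectors.toList());
    }

    /** Range filter on age: gte <= age < lt. */
    static Predicate<User> ageRange(int gte, int lt) {
        return u -> u.age >= gte && u.age < lt;
    }

    public static void main(String[] args) {
        List<User> docs = sampleUsers();
        System.out.println(must(docs, u -> u.name.equals("wangcp1"), u -> u.age == 3).size());
        System.out.println(should(docs, u -> u.name.equals("wangcp1"), u -> u.age == 9).size());
        System.out.println(mustNot(docs, u -> u.age == 3).size());
        System.out.println(must(docs, ageRange(3, 9)).size());
    }
}
```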

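Matching multiple conditions

image-20201219171230616.png

Exact queries

A term query looks up the exact term directly in the inverted index.

About analysis:

term: looks up the exact value, without analysis
match: runs the query text through an analyzer first (the text is analyzed, and the analyzed result is then used for the query)

Two field types: text (analyzed) and keyword (not analyzed)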
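The interplay between term, match, text, and keyword is easier to see on a toy inverted index. The sketch below is a simplified model, not Elasticsearch's implementation: an "analyzed" field (like text) is lowercased and split on whitespace before indexing, while a non-analyzed field (like keyword) stores the whole value as a single term. The stand-in analyzer and all names are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TermVsMatchSketch {

    private final boolean analyzed; // true models a text field, false a keyword field
    private final Map<String, Set<Integer>> inverted = new HashMap<>();

    TermVsMatchSketch(boolean analyzed) {
        this.analyzed = analyzed;
    }

    /** Stand-in analyzer: lowercase and split on whitespace (real analyzers differ). */
    static List<String> analyze(String text) {
        String normalized = text.trim().toLowerCase(Locale.ROOT);
        return normalized.isEmpty() ? List.of() : Arrays.asList(normalized.split("\\s+"));
    }

    private List<String> terms(String value) {
        return analyzed ? analyze(value) : List.of(value);
    }

    /** Adds a document's field value to the inverted index: term -> doc ids. */
    void index(int docId, String value) {
        for (String t : terms(value)) {
            inverted.computeIfAbsent(t, k -> new TreeSet<>()).add(docId);
        }
    }

    /** term query: exact lookup of one term, no analysis of the input. */
    Set<Integer> term(String exact) {
        return inverted.getOrDefault(exact, Set.of());
    }

    /** match query: analyze the input the same way as the field, then look up each term. */
    Set<Integer> match(String query) {
        Set<Integer> hits = new TreeSet<>();
        for (String t : terms(query)) {
            hits.addAll(term(t));
        }
        return hits;
    }

    public static void main(String[] args) {
        TermVsMatchSketch text = new TermVsMatchSketch(true);
        text.index(1, "Java Study Notes");
        text.index(2, "Java Guide");
        System.out.println(text.term("Java Study Notes")); // no such single term in a text field
        System.out.println(text.match("Study Java"));      // analyzed, so both documents hit

        TermVsMatchSketch keyword = new TermVsMatchSketch(false);
        keyword.index(1, "Java Study Notes");
        System.out.println(keyword.term("Java Study Notes")); // whole value is one term
    }
}
```

The takeaway matches the notes above: a term query against a text field only hits if the input equals one of the analyzed tokens, whereas a keyword field can only be hit by its complete, unmodified value.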

image-20201219172328426.png

image-20201219172422088.png

image-20201219172950308.png

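Exact queries matching multiple values

image-20201219173729773.png

Highlighted queries

image-20201219174218055.png

image-20201219180346109.png

MySQL can do all of this too, only less efficiently.

Matching
Conditional matching
Exact matching
Range matching
Filtering returned fields
Multi-condition queries
Highlighted queries
Inverted index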

Integrating with Spring Boot

Find the official documentation

image-20201220130215774.png

image-20201220130410095.png

image-20201220130619521.png

1. Find the native dependency


    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>7.10.1</version>
    </dependency>

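2. Find the client object

image-20201220131856940.png

3. Study the methods in this class

Configure a basic project

Important: make sure the dependency version we import matches our ES version.

image-20201220133733729.png

image-20201220134326749.png

Objects provided in the source code

image-20201221133723591.png

Although three classes are imported here, they are static nested classes; there is really only one core class.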

/*
 * Copyright 2012-2019 the original author or authors.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      https://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.springframework.boot.autoconfigure.elasticsearch.rest;

import java.time.Duration;

import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.Credentials;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

import org.springframework.beans.factory.ObjectProvider;
import org.springframework.boot.autoconfigure.condition.ConditionalOnClass;
import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
import org.springframework.boot.context.properties.PropertyMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Elasticsearch rest client infrastructure configurations.
 *
 * @author Brian Clozel
 * @author Stephane Nicoll
 */
class RestClientConfigurations {

    @Configuration(proxyBeanMethods = false)
    static class RestClientBuilderConfiguration {

        // RestClientBuilder
        @Bean
        @ConditionalOnMissingBean
        RestClientBuilder elasticsearchRestClientBuilder(RestClientProperties properties,
                ObjectProvider<RestClientBuilderCustomizer> builderCustomizers) {
            HttpHost[] hosts = properties.getUris().stream().map(HttpHost::create).toArray(HttpHost[]::new);
            RestClientBuilder builder = RestClient.builder(hosts);
            PropertyMapper map = PropertyMapper.get();
            map.from(properties::getUsername).whenHasText().to((username) -> {
                CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
                Credentials credentials = new UsernamePasswordCredentials(properties.getUsername(),
                        properties.getPassword());
                credentialsProvider.setCredentials(AuthScope.ANY, credentials);
                builder.setHttpClientConfigCallback(
                        (httpClientBuilder) -> httpClientBuilder.setDefaultCredentialsProvider(credentialsProvider));
            });
            builder.setRequestConfigCallback((requestConfigBuilder) -> {
                map.from(properties::getConnectionTimeout).whenNonNull().asInt(Duration::toMillis)
                        .to(requestConfigBuilder::setConnectTimeout);
                map.from(properties::getReadTimeout).whenNonNull().asInt(Duration::toMillis)
                        .to(requestConfigBuilder::setSocketTimeout);
                return requestConfigBuilder;
            });
            builderCustomizers.orderedStream().forEach((customizer) -> customizer.customize(builder));
            return builder;
        }

    }

    @Configuration(proxyBeanMethods = false)
    @ConditionalOnClass(RestHighLevelClient.class)
    static class RestHighLevelClientConfiguration {

        // RestHighLevelClient: the high-level client, the one covered here and used in the later project
        @Bean
        @ConditionalOnMissingBean
        RestHighLevelClient elasticsearchRestHighLevelClient(RestClientBuilder restClientBuilder) {
            return new RestHighLevelClient(restClientBuilder);
        }

        // RestClient: the plain low-level client
        @Bean
        @ConditionalOnMissingBean
        RestClient elasticsearchRestClient(RestClientBuilder builder,
                ObjectProvider<RestHighLevelClient> restHighLevelClient) {
            RestHighLevelClient client = restHighLevelClient.getIfUnique();
            if (client != null) {
                return client.getLowLevelClient();
            }
            return builder.build();
        }

    }

    @Configuration(proxyBeanMethods = false)
    static class RestClientFallbackConfiguration {

        @Bean
        @ConditionalOnMissingBean
        RestClient elasticsearchRestClient(RestClientBuilder builder) {
            return builder.build();
        }

    }

}

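Testing the API in practice

1. Create an index

// Assumes a RestHighLevelClient field named client is available in the test class
@Test
void testCreateIndex() throws IOException {
    // 1. Build the create-index request, equivalent to PUT wang_index
    CreateIndexRequest request = new CreateIndexRequest("wang_index");
    // 2. Execute the request with the client
    CreateIndexResponse createIndexResponse =
        client.indices().create(request, RequestOptions.DEFAULT);
    System.out.println(createIndexResponse);
}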

2. Check whether an index exists

@Test
void testExistIndex() throws IOException {
    GetIndexRequest request = new GetIndexRequest("wang_index");
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    System.out.println(exists);
}

3. Delete an index

@Test
void testDeleteIndex() throws IOException {
    DeleteIndexRequest request = new DeleteIndexRequest("wang_index");
    AcknowledgedResponse delete = client.indices().delete(request, RequestOptions.DEFAULT);
    System.out.println(delete.isAcknowledged());
}

4. Create a document

// Test adding a document
@Test
void testAddDocument() throws IOException {
    // Create the object to store
    User user = new User("大程子", 3);
    // Create the request
    IndexRequest request = new IndexRequest("wang_index");
    // Equivalent to PUT /wang_index/_doc/1
    request.id("1");
    request.timeout(TimeValue.timeValueSeconds(1));

    // Put our data into the request as JSON
    IndexRequest source = request.source(JSON.toJSONString(user), XContentType.JSON);

    // Send the request with the client and get the response
    IndexResponse indexResponse = client.index(source, RequestOptions.DEFAULT);

    System.out.println(indexResponse);
    System.out.println(indexResponse.status()); // same status the console command returns: CREATED
}

6. Check whether a document exists

// Check whether a document exists
@Test
void testIsExists() throws IOException {
    GetRequest getRequest = new GetRequest("wang_index", "1");
    // Do not fetch the _source content; we only need to know whether it exists
    getRequest.fetchSourceContext(new FetchSourceContext(false));
    getRequest.storedFields("_none_");

    boolean exists = client.exists(getRequest, RequestOptions.DEFAULT);
    System.out.println(exists);
}

7. Get a document

// Get a document's content
@Test
void testGetDocument() throws IOException {
    GetRequest getRequest = new GetRequest("wang_index", "1");
    GetResponse getResponse = client.get(getRequest, RequestOptions.DEFAULT);
    System.out.println(getResponse.getSourceAsString()); // print the document content
    System.out.println(getResponse); // the full response matches what the console command returns
}

8. Update a document

// Update a document
@Test
void testUpdateDocument() throws IOException {
    UpdateRequest updateRequest = new UpdateRequest("wang_index", "1");
    updateRequest.timeout("1s");

    User user = new User("大程子的技术成长路", 18);
    updateRequest.doc(JSON.toJSONString(user), XContentType.JSON);

    UpdateResponse updateResponse = client.update(updateRequest, RequestOptions.DEFAULT);
    System.out.println(updateResponse.status());
}

9. Delete a document

// Delete a document
@Test
void testDeleteDocument() throws IOException {
    DeleteRequest request = new DeleteRequest("wang_index", "1");
    request.timeout("1s");

    DeleteResponse delete = client.delete(request, RequestOptions.DEFAULT);
    System.out.println(delete.status());
}

10. Bulk-insert data

// Special case: real projects usually insert data in bulk
@Test
void testBulkRequest() throws IOException {
    BulkRequest bulkRequest = new BulkRequest();
    bulkRequest.timeout("10s");

    ArrayList<User> userList = new ArrayList<>();
    userList.add(new User("wangcp1",3));
    userList.add(new User("wangcp3",6));
    userList.add(new User("wangcp2",9));
    userList.add(new User("wangcp4",12));
    userList.add(new User("wangcp5",15));
    userList.add(new User("wangcp6",18));
    userList.add(new User("dachengzi1",3));
    userList.add(new User("dachengzi2",6));
    userList.add(new User("dachengzi3",9));

    for (int i = 0; i < userList.size(); i++) {
        bulkRequest.add(
            // For bulk updates or bulk deletes, swap in the corresponding request type here
            new IndexRequest("wang_index")
            .id("" + (i+1))
            .source(JSON.toJSONString(userList.get(i)),XContentType.JSON));
    }

    BulkResponse bulkResponse = client.bulk(bulkRequest, RequestOptions.DEFAULT);
    System.out.println(bulkResponse.hasFailures()); // whether anything failed; false means success
}

11. Search

// Search
// SearchRequest: the search request
// SearchSourceBuilder: builds the search conditions
// HighlightBuilder: builds highlighting
// TermQueryBuilder: builds an exact (term) query
@Test
void testSearch() throws IOException {
    SearchRequest searchRequest = new SearchRequest("wang_index");
    // Build the search conditions
    SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
    sourceBuilder.highlighter();

    // Query conditions can be built with the QueryBuilders utility
    // QueryBuilders.termQuery: exact lookup
    // QueryBuilders.matchAllQuery(): match everything
    TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name","wangcp1");
    //        MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
    sourceBuilder.query(termQueryBuilder);
    // Set the maximum query time
    sourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));

    searchRequest.source(sourceBuilder);
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    System.out.println(JSON.toJSONString(searchResponse.getHits()));
    System.out.println("=======================================");
    for (SearchHit documentFields : searchResponse.getHits().getHits()) {
        System.out.println(documentFields.getSourceAsMap());
    }
}

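These are my day-to-day notes from learning Elasticsearch. If you spot gaps or mistakes, please point them out in the comments. Let's learn and improve together.

Learning and Using Elasticsearch (Integrating Elasticsearch with Spring Boot)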