Developing with Elasticsearch (2)

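In the first article we got elasticsearch and kibana installed through docker, so this time we go straight into hands-on practice.

We first prepare some data, then work through a number of common scenarios, implementing each one both with the Query DSL and with Spring Data Elasticsearch.

Query DSL: Elasticsearch provides an executable, JSON-style DSL (domain-specific language) for defining queries, which is called the Query DSL.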

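1. Preparation

1.1. Preparing the index data

Below we use DSL requests to maintain an index named operation_log, which records the operation logs of each module in the system.

1. Create the index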

PUT /operation_log

2. Define the mapping

PUT /operation_log/_mapping
{
  "properties": {
    "ip": {
      "type": "keyword"
    },
    "trace_id": {
      "type": "keyword"
    },
    "operation_time": {
      "type": "date",
      "format": "yyyy-MM-dd HH:mm:ss"
    },
    "module": {
      "type": "keyword"
    },
    "action_code": {
      "type": "keyword"
    },
    "location": {
      "type": "text",
      "analyzer": "ik_max_word",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "object_id": {
      "type": "keyword"
    },
    "object_name": {
      "type": "text",
      "analyzer": "ik_max_word",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "operator_id": {
      "type": "keyword"
    },
    "operator_name": {
      "type": "keyword"
    },
    "operator_dept_id": {
      "type": "keyword"
    },
    "operator_dept_name": {
      "type": "text",
      "analyzer": "ik_max_word",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    },
    "changes": {
      "type": "nested",
      "properties": {
        "field_name": {
          "type": "keyword"
        },
        "old_value": {
          "type": "keyword"
        },
        "new_value": {
          "type": "keyword"
        }
      }
    }
  }
}

3. Create documents

The documents below are added one at a time. They could also be inserted in a batch with _bulk, but for now we stick to the basics.

POST /operation_log/_doc
{
  "ip": "10.1.11.1",
  "trace_id": "670021ff9a2dc6b7",
  "operation_time": "2022-05-02 09:31:18",
  "module": "企业组织",
  "action_code": "UPDATE",
  "location": "企业组织->员工管理->身份管理",
  "object_id": "xxxxx-1",
  "object_name": "成德善",
  "operator_id": "operator_id-1",
  "operator_name": "张三",
  "operator_dept_id": "operator_dept_id-1",
  "operator_dept_name": "研发中心-后端一部",
  "changes": [
    {
      "field_name": "手机号码",
      "old_value": "13055660000",
      "new_value": "13055770001"
    },
    {
      "field_name": "姓名",
      "old_value": "成德善",
      "new_value": "成秀妍"
    }
  ]
}

// Insert the following 6 documents with the same request

// data-2

{
  "ip": "22.1.11.0",
  "trace_id": "990821e89a2dc653",
  "operation_time": "2022-09-05 11:31:10",
  "module": "资源中心",
  "action_code": "UPDATE",
  "location": "资源中心->文件管理->文件权限",
  "object_id": "fffff-1",
  "object_name": "《2022员工绩效打分细则》",
  "operator_id": "operator_id-2",
  "operator_name": "李四",
  "operator_dept_id": "operator_dept_id-2",
  "operator_dept_name": "人力资源部",
  "changes": [
    {
      "field_name": "查看权限",
      "old_value": "仅李四可查看",
      "new_value": "全员可查看"
    },
    {
      "field_name": "编辑权限",
      "old_value": "仅李四可查看",
      "new_value": "人力资源部可查看"
    }
  ]
}

// data-3

{
  "ip": "22.1.11.0",
  "trace_id": "780821e89b2dc653",
  "operation_time": "2022-10-02 12:31:10",
  "module": "资源中心",
  "action_code": "DELETE",
  "location": "资源中心->文件管理",
  "object_id": "fffff-1",
  "object_name": "《2022员工绩效打分细则》",
  "operator_id": "operator_id-3",
  "operator_name": "王五",
  "operator_dept_id": "operator_dept_id-2",
  "operator_dept_name": "人力资源部",
  "changes": []
}

// data-4

{
  "ip": "10.1.11.1",
  "trace_id": "670021e89a2dc7b6",
  "operation_time": "2022-05-03 09:35:10",
  "module": "企业组织",
  "action_code": "ADD",
  "location": "企业组织->员工管理->身份管理",
  "object_id": "xxxxx-2",
  "object_name": "成宝拉",
  "operator_id": "operator_id-1",
  "operator_name": "张三",
  "operator_dept_id": "operator_dept_id-1",
  "operator_dept_name": "研发中心-后端一部",
  "changes": [
    {
      "field_name": "姓名",
      "new_value": "成宝拉"
    },
    {
      "field_name": "性别",
      "new_value": "女"
    },
    {
      "field_name": "手机号码",
      "new_value": "13055770002"
    },
    {
      "field_name": "邮箱",
      "new_value": "[email protected]"
    }
  ]
}

// data-5

{
  "ip": "10.1.11.5",
  "trace_id": "670021e89a2dc655",
  "operation_time": "2022-05-05 10:35:12",
  "module": "企业组织",
  "action_code": "DELETE",
  "location": "企业组织->员工管理->身份管理",
  "object_id": "xxxxx-1",
  "object_name": "成德善",
  "operator_id": "operator_id-2",
  "operator_name": "李四",
  "operator_dept_id": "operator_dept_id-2",
  "operator_dept_name": "人力资源部",
  "changes": []
}

// data-6

{
  "ip": "10.0.0.0",
  "trace_id": "670021ff9a28ei6",
  "operation_time": "2022-10-02 09:31:00",
  "module": "资源中心",
  "action_code": "DELETE",
  "location": "资源中心->文件管理",
  "object_id": "fffff-a",
  "object_name": "《有空字符串的文档》",
  "operator_id": "operator_id-a",
  "operator_dept_id": "",
  "operator_dept_name": "",
  "operator_name": "路人A",
  "changes": []
}

// data-7

{
  "ip": "10.0.0.0",
  "trace_id": "670021ff9a28768",
  "operation_time": "2022-10-02 09:32:00",
  "module": "资源中心",
  "action_code": "DELETE",
  "location": "资源中心->文件管理",
  "object_id": "fffff-b",
  "object_name": "《有NULL的文档》",
  "operator_id": "operator_id-b",
  "operator_name": "路人B",
  "changes": []
}

1.2. Preparing the Spring project

1. pom.xml

        
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>

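This pulls in spring-boot-starter-data-elasticsearch. Our Spring Boot parent version is 2.7.4, so the starter version is also 2.7.4, which corresponds to spring-data-elasticsearch 4.4.3.

The Spring Data site lists the recommended pairing of spring-data-elasticsearch and elasticsearch versions, and it is best to follow it. In this example the elasticsearch version is 7.17.6.

For the Spring code in the rest of this article, the best reference is still the official Spring Data documentation.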

2. application

spring:
  elasticsearch:
    uris: http://localhost:9200
  jackson:
    default-property-inclusion: non_null

3. EO

The class that maps to the index needs @Document, and its fields need @Field.

OperationLog.java

@Data
@Document(indexName = "operation_log")
public class OperationLog {
    @Id
    private String id;

    @Field(type = FieldType.Keyword)
    private String ip;

    @Field(value = "trace_id", type = FieldType.Keyword)
    private String traceId;

    // format = {} must not be omitted; it clears the default built-in formats so that only the custom pattern applies
    @Field(value = "operation_time", type = FieldType.Date, format = {}, pattern = "yyyy-MM-dd HH:mm:ss")
    @JsonFormat(pattern = "yyyy.MM.dd HH:mm:ss", timezone = "GMT+8")
    private LocalDateTime operationTime;

    @Field(type = FieldType.Keyword)
    private String module;

    @Field(value = "action_code", type = FieldType.Keyword)
    private String actionCode;

    @Field(type = FieldType.Text, analyzer = "ik_max_word")
    private String location;

    @Field(value = "object_id", type = FieldType.Keyword)
    private String objectId;

    @Field(value = "object_name", type = FieldType.Text, analyzer = "ik_max_word")
    private String objectName;

    @Field(value = "operator_id", type = FieldType.Keyword)
    private String operatorId;

    @Field(value = "operator_name", type = FieldType.Keyword)
    private String operatorName;

    @Field(value = "operator_dept_id", type = FieldType.Keyword)
    private String operatorDeptId;

    @Field(value = "operator_dept_name", type = FieldType.Text, analyzer = "ik_max_word")
    private String operatorDeptName;

    @Field(type = FieldType.Nested)
    private List<OperationLogChange> changes;

}

OperationLogChange.java

@Data
public class OperationLogChange {
    @Field(value = "field_name", type = FieldType.Keyword)
    private String fieldName;

    @Field(value = "old_value", type = FieldType.Keyword)
    private String oldValue;

    @Field(value = "new_value", type = FieldType.Keyword)
    private String newValue;
}

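2. Queries

Personally I am not fond of implementing DAO-layer methods by extending ElasticsearchRepository, mainly because it is too restrictive and inflexible. As I read it, the official documentation does not push that approach much either and leans towards calling ElasticsearchRestTemplate methods.

The query chapter of the official documentation introduces three kinds of query:

CriteriaQuery: the standard way to query. It is fine for simple queries but falls short on complex ones.
NativeSearchQuery: the native way to query. Its structure closely follows the Query DSL, so complex queries are not a concern.
StringQuery: executes a Query DSL string directly.

I recommend NativeSearchQuery. If you ever face a query you really cannot express with it, you can occasionally fall back to StringQuery. All Spring examples below are therefore based on NativeSearchQuery.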

2.1. match_all

1. DSL

GET /operation_log/_search
{
  "query": {
    "match_all": {}
  }
}

2. spring

@AllArgsConstructor
@RestController
@RequestMapping("/dql")
public class DqlController {
    private final ElasticsearchRestTemplate esRestTemplate;

    @GetMapping("")
    public List<OperationLog> findAll() {
        Query query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.matchAllQuery())
                .build();
        return esRestTemplate.search(query, OperationLog.class).stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());
    }
}

2.2. match(term)

1. DSL

GET /operation_log/_search
{
  "query": {
    "match": {
      "module": "资源中心"
    }
  }
}

2. spring

        Query query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.matchQuery("module", "资源中心"))
                .build();
        return esRestTemplate.search(query, OperationLog.class).stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());

2.3. nested

1. DSL

GET operation_log/_search
{
  "query": {
    "nested": {
      "path": "changes",
      "query": {
        "term": {
          "changes.field_name": "姓名"
        }
      }
    }
  }
}

2. spring

        Query query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.nestedQuery("changes",
                        QueryBuilders.termQuery("changes.field_name", "姓名"),
                        ScoreMode.None))
                .build();
        return esRestTemplate.search(query, OperationLog.class).stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());

2.4. bool(and) - 1

1. DSL

GET operation_log/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "action_code": "UPDATE"
          }
        },
        {
          "nested": {
            "path": "changes",
            "query": {
              "term": {
                "changes.field_name": "姓名"
              }
            }
          }
        }
      ]
    }
  }
}

2. spring

        Query query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.boolQuery()
                        .must(QueryBuilders.termQuery("action_code", "UPDATE"))
                        .must(QueryBuilders.nestedQuery("changes",
                                QueryBuilders.termQuery("changes.field_name", "姓名"), ScoreMode.None)))
                .build();
        return esRestTemplate.search(query, OperationLog.class).stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());

2.5. bool(and) - 2

1. DSL

GET operation_log/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "term": {
            "action_code": "UPDATE"
          }
        }
      ],
      "must": [
        {
          "nested": {
            "path": "changes",
            "query": {
              "term": {
                "changes.field_name": "姓名"
              }
            }
          }
        }
      ]
    }
  }
}

2. spring

        Query query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.boolQuery()
                        .mustNot(QueryBuilders.termQuery("action_code", "UPDATE"))
                        .must(QueryBuilders.nestedQuery("changes",
                                QueryBuilders.termQuery("changes.field_name", "姓名"), ScoreMode.None)))
                .build();
        return esRestTemplate.search(query, OperationLog.class).stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());

2.6. bool(or), exists

1. DSL

GET /operation_log/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {
                "term": {
                  "operator_dept_name.keyword": ""
                }
              }
            ]
          }
        },
        {
          "bool": {
            "must_not": [
              {
                "exists": {
                  "field": "operator_dept_name"
                }
              }
            ]
          }
        }
      ]
    }
  }
}

2. spring

        Query query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.boolQuery()
                        .should(QueryBuilders.boolQuery()
                                .must(QueryBuilders.termQuery("operator_dept_name.keyword", "")))
                        .should(QueryBuilders.boolQuery()
                                .mustNot(QueryBuilders.existsQuery("operator_dept_name"))))
                .build();
        return esRestTemplate.search(query, OperationLog.class).stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());
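
The intent of the two should clauses above is "department name is an empty string OR the field is missing". As a rough in-memory analogy only (plain JDK, no Elasticsearch involved; the class name BlankDeptFilter and the map-based sample are made up for illustration), the same condition can be sketched like this:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BlankDeptFilter {
    // Mirrors the two should clauses: the value is "" OR the field is absent (null here).
    static boolean isBlankOrMissing(String deptName) {
        return deptName == null || deptName.isEmpty();
    }

    // Returns the operator ids whose department name is empty or missing.
    static List<String> operatorsWithoutDept(Map<String, String> deptNameByOperatorId) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : deptNameByOperatorId.entrySet()) {
            if (isBlankOrMissing(entry.getValue())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> sample = new LinkedHashMap<>();
        sample.put("operator_id-1", "dept-1"); // has a department
        sample.put("operator_id-a", "");       // like data-6: empty string
        sample.put("operator_id-b", null);     // like data-7: field missing
        System.out.println(operatorsWithoutDept(sample));
    }
}
```

With the sample data from section 1.1, this corresponds to expecting data-6 and data-7 back from the query.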

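2.7. _source, sort

If you only want some of the fields of a document returned, use _source, which has two properties:

includes: fields to include in the result.
excludes: fields to leave out of the result.

When both are present, excludes takes precedence over includes.
When _source only needs includes, the includes key can be omitted and the list written directly: "_source":[field,...]

sort controls the ordering of the results.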
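
The precedence rule (excludes wins over includes) can be illustrated with a small plain-JDK sketch. It only does exact name matching on a flat field list, which is a simplification I am assuming for illustration; real source filtering also handles wildcards and nested paths. The class name SourceFilterSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SourceFilterSketch {
    // A field is kept if it is listed in includes (or includes is empty)
    // and it is not listed in excludes; on conflict, excludes wins.
    static List<String> apply(List<String> fields, List<String> includes, List<String> excludes) {
        List<String> kept = new ArrayList<>();
        for (String field : fields) {
            boolean included = includes.isEmpty() || includes.contains(field);
            if (included && !excludes.contains(field)) {
                kept.add(field);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("ip", "module", "location", "operator_name", "operation_time");
        List<String> includes = Arrays.asList("module", "location", "operator_name", "operation_time");
        List<String> excludes = Arrays.asList("module");
        // module is in both lists, so it is dropped
        System.out.println(apply(fields, includes, excludes)); // [location, operator_name, operation_time]
    }
}
```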

1. DSL

GET /operation_log/_search
{
  "query": {
    "match": {
      "location": "文件"
    }
  },
  "_source": {
    "includes": [
      "module",
      "location",
      "operator_name",
      "operation_time", 
      "changes.field_name"
    ],
    "excludes": [
      "module"
    ]
  },
  "sort": [
    {
      "operation_time": {
        "order": "asc"
      }
    }
  ]
}

// This is also equivalent to
{
  "query": {
    "match": {
      "location": "文件"
    }
  },
  "_source": [
    "location",
    "operator_name",
    "operation_time",
    "changes.field_name"
  ],
  "sort": [
    {
      "operation_time": {
        "order": "asc"
      }
    }
  ]
}

2. spring

        SourceFilter sourceFilter = new FetchSourceFilterBuilder()
                .withIncludes("module", "location", "operator_name", "operation_time", "changes.field_name")
                .withExcludes("module")
                .build();
        Query query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.matchQuery("location", "文件"))
                .withSourceFilter(sourceFilter)
                .withSort(Sort.by(Sort.Direction.ASC, "operation_time"))
                .build();
        return esRestTemplate.search(query, OperationLog.class).stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());

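2.8. highlight

This section introduces the main settings inside highlight.

pre_tags, post_tags: these two settings define the tags that wrap each matched fragment in the result, similar to wrapping text in an HTML tag such as <em> on the front end.

fields: the fields to highlight. For example, listing name highlights the name field and listing keyWords highlights the keyWords field; the example below highlights location.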
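
To make the role of pre_tags and post_tags concrete, here is a toy plain-JDK sketch that wraps a matched term in a tag pair. It is only a literal string replacement and does not reproduce how Elasticsearch analyzes text or picks fragments. The <em> tags, the ASCII sample text (standing in for the location values) and the class name HighlightTagSketch are all assumptions for illustration.

```java
class HighlightTagSketch {
    // Wraps every literal occurrence of term in text with the given tags.
    static String wrap(String text, String term, String preTag, String postTag) {
        return text.replace(term, preTag + term + postTag);
    }

    public static void main(String[] args) {
        // prints: resource-center-><em>file</em>-manager
        System.out.println(wrap("resource-center->file-manager", "file", "<em>", "</em>"));
    }
}
```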

1. DSL

GET /operation_log/_search
{
  "query": {
    "match": {
      "location": "文件"
    }
  },
  "highlight": {
    "fields": {
      "location": {}
    },
    "pre_tags": "",
    "post_tags": ""
  }
}

2. spring

        String matchField = "location";
        HighlightBuilder highlightBuilder = new HighlightBuilder()
                .field(matchField)
                .preTags("")
                .postTags("");
        Query query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.matchQuery(matchField, "文件"))
                .withHighlightBuilder(highlightBuilder)
                .build();
        return esRestTemplate.search(query, OperationLog.class).stream()
                .map(hit -> {
                    OperationLog operationLog = hit.getContent();
                    operationLog.setLocation(hit.getHighlightField(matchField).get(0));
                    return operationLog;
                })
                .collect(Collectors.toList());

2.9. pageable

1. DSL

GET /operation_log/_search
{
  "query": {
    "match_all": {}
  },
  "from": 0,
  "size": 5,
  "sort": [
    {
      "operation_time": {
        "order": "desc"
      }
    }
  ]
}

2. spring

        Query query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.matchAllQuery())
                .withPageable(PageRequest.of(0, 5, Sort.by(Sort.Direction.DESC, "operation_time")))
                .build();
        return esRestTemplate.search(query, OperationLog.class).stream()
                .map(SearchHit::getContent)
                .collect(Collectors.toList());
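
The DSL request pages with from and size, while the Spring version passes a zero-based page index and a page size to PageRequest.of. The relationship between the two is from = page * size, which the following plain-JDK sketch spells out (the class name PageOffsetSketch is made up for illustration):

```java
class PageOffsetSketch {
    // Converts a zero-based page index and a page size into the DSL "from" offset.
    static int from(int page, int size) {
        if (page < 0 || size < 1) {
            throw new IllegalArgumentException("page must be >= 0 and size must be >= 1");
        }
        return page * size;
    }

    public static void main(String[] args) {
        System.out.println(from(0, 5)); // 0, the first page as in the example above
        System.out.println(from(2, 5)); // 10, the third page
    }
}
```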

3. Modifications

3.1. Single-document modifications

3.1.1. insert

The data preparation stage already showed an insert example.

DSL

POST /operation_log/_doc
{
  "ip": "0.0.0.0",
  "module": "测试数据"
}

spring

        OperationLog operationLog = new OperationLog();
        operationLog.setIp("0.0.0.0");
        operationLog.setModule("测试数据");
        return esRestTemplate.save(operationLog);

3.1.2. update-(save)

For inserts, Spring Boot uses the save method, and the same method works for updates. You do need the document id, though; here id=13OkA4QBMgWicIn2wBwM.

DSL

PUT /operation_log/_doc/13OkA4QBMgWicIn2wBwM
{
  "ip": "0.0.0.0",
  "module": "测试数据1"
}

spring

esRestTemplate.save(operationLog);

3.1.3. update-(document)

DSL

POST /operation_log/_update/13OkA4QBMgWicIn2wBwM
{
  "doc": {
    "module":"测试数据1"
  }
}

spring

        Document document = Document.create();
        document.put("module", "测试数据1");
        UpdateQuery updateQuery = UpdateQuery
                .builder(id)
                .withDocument(document)
                .build();
        esRestTemplate.update(updateQuery, IndexCoordinates.of("operation_log"));

3.1.4. update-(script)

DSL

POST /operation_log/_update/13OkA4QBMgWicIn2wBwM
{
  "script": {
    "source": "ctx._source.module = params.module",
    "params": {
      "module": "测试数据1"
    }
  }
}

spring

        Map<String, Object> params = new HashMap<>();
        params.put("module", "测试数据1");
        UpdateQuery updateQuery = UpdateQuery
                .builder(id)
                .withScript("ctx._source.module = params.module")
                .withParams(params)
                .build();
        esRestTemplate.update(updateQuery, IndexCoordinates.of("operation_log"));

3.1.5. delete

DSL

DELETE /operation_log/_doc/13OkA4QBMgWicIn2wBwM

spring

        esRestTemplate.delete(id, OperationLog.class);

3.2. Bulk modifications (bulk)

Bulk insert DSL

POST /operation_log/_bulk
{"create":{"_index":"operation_log"}}
{"ip":"0.0.0.0","module":"测试数据1"}
{"create":{"_index":"operation_log"}}
{"ip":"0.0.0.0","module":"测试数据2"}
{"create":{"_index":"operation_log"}}
{"ip":"0.0.0.0","module":"测试数据3"}

Bulk update DSL

POST /operation_log/_bulk
{"update":{"_id":"2HP9A4QBMgWicIn26BzR"}}
{"doc":{"module":"测试数据11"}}
{"update":{"_id":"2XP9A4QBMgWicIn26BzR"}}
{"script":{"source":"ctx._source.module = params.module","params":{"module":"测试数据22"}}}

Bulk delete DSL

POST /operation_log/_bulk
{"delete":{"_id":"2HP9A4QBMgWicIn26BzR"}}
{"delete":{"_id":"2XP9A4QBMgWicIn26BzR"}}
{"delete":{"_id":"2nP9A4QBMgWicIn26BzR"}}

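You may have noticed that the bulk update request uses both kinds of update, doc and script, at the same time. In fact, _bulk can submit all three kinds of action shown above (create, update, delete) together in a single request.
In projects this is rarely done, though; the operations are usually kept separate. For bulk inserts, for example, the save method already accepts a collection of entities, although underneath it still calls bulkOperation.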
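
To show what a mixed _bulk request body looks like, here is a plain-JDK sketch that assembles the newline-delimited body by hand. It makes no HTTP call and does no JSON escaping. The class and method names are made up for illustration, the ids are the ones from the DSL examples above, and the ASCII module values stand in for the test data.

```java
class BulkBodySketch {
    // Action line plus source line for inserting one document.
    static String create(String index, String sourceJson) {
        return "{\"create\":{\"_index\":\"" + index + "\"}}\n" + sourceJson + "\n";
    }

    // Action line plus partial-document line for updating one document.
    static String updateDoc(String id, String partialDocJson) {
        return "{\"update\":{\"_id\":\"" + id + "\"}}\n{\"doc\":" + partialDocJson + "}\n";
    }

    // A delete action consists of the action line only.
    static String delete(String id) {
        return "{\"delete\":{\"_id\":\"" + id + "\"}}\n";
    }

    // One body mixing all three action types; every line ends with a newline.
    static String mixedBody() {
        return create("operation_log", "{\"ip\":\"0.0.0.0\",\"module\":\"test-1\"}")
                + updateDoc("2HP9A4QBMgWicIn26BzR", "{\"module\":\"test-11\"}")
                + delete("2XP9A4QBMgWicIn26BzR");
    }

    public static void main(String[] args) {
        System.out.print(mixedBody());
    }
}
```

The printed text could be pasted under POST /operation_log/_bulk in the Kibana console.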

spring bulkUpdate

    @PatchMapping("bulk-update")
    public void bulkUpdate() {
        Map<String, Object> params = new HashMap<>();
        params.put("module", "测试数据2");
        String scriptStr = "ctx._source.module = params.module";
        Query query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.termQuery("ip", "0.0.0.0"))
                .build();
        List<UpdateQuery> updateQueryList = esRestTemplate.search(query, OperationLog.class)
                .stream()
                .map(SearchHit::getContent)
                .map(obj -> UpdateQuery.builder(obj.getId())
                        .withScript(scriptStr)
                        .withParams(params)
                        .build())
                .collect(Collectors.toList());
        esRestTemplate.bulkUpdate(updateQueryList, OperationLog.class);
    }

3.3. updateByQuery

DSL

POST /operation_log/_update_by_query
{
  "script": {
    "source": "ctx._source.module = params.module",
    "params": {
      "module": "测试数据1"
    }
  },
  "query": {
    "term": {
      "ip": "0.0.0.0"
    }
  }
}

spring

    @PatchMapping("update-by-query")
    public void updateByQuery() {
        Map<String, Object> params = new HashMap<>();
        params.put("module", "测试数据2");
        String scriptStr = "ctx._source.module = params.module";
        UpdateQuery updateQuery = UpdateQuery
                .builder(new NativeSearchQueryBuilder()
                        .withQuery(QueryBuilders.termQuery("ip", "0.0.0.0"))
                        .build())
                .withScript(scriptStr)
                .withScriptType(ScriptType.INLINE)
                .withLang("painless")
                .withParams(params)
                .build();
        esRestTemplate.updateByQuery(updateQuery, IndexCoordinates.of("operation_log"));
    }

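Comparing this with the bulkUpdate method above shows some differences:

updateByQuery only supports Script; it does not support updating with a Document.
When updateByQuery updates through a Script, auxiliary parameters such as scriptType and lang must be passed explicitly. bulkUpdate needs them as well, but there the underlying method fills them in, whereas nothing does so for updateByQuery. (I ran into this in practice and only found out by reading the wrapper method.)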

3.4. deleteByQuery

DSL

POST /operation_log/_delete_by_query
{
  "query": {
    "term": {
      "ip": "0.0.0.0"
    }
  }
}

spring

        Query query = new NativeSearchQueryBuilder()
                .withQuery(QueryBuilders.termQuery("ip", "0.0.0.0"))
                .build();
        esRestTemplate.delete(query, OperationLog.class);

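delete_by_query does not physically delete documents right away. It changes the version and adds a deletion marker to each matched document; later searches still scan everything and then filter out the documents that carry the marker. As a result, the disk space used by the index is not released immediately after the API call. The documents are only physically removed at the next segment merge, and that is when the disk space is freed. On the contrary, marking the matched documents as deleted itself takes up disk space, so when you trigger this API you may find that disk usage goes up instead of down.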
