Using ElasticSearch in Spring Boot and Highlight-Searching Document Attachment Content

Spring Boot usage example

Adding the dependency

        
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
        </dependency>

Configuration file

Note that TransportClient (port 9300) is deprecated from ES 7.0 onwards and will be removed completely in 8.0, so this example uses the HTTP port 9200.

Example code

@Document and @Field: the ES mapping annotations supported in Spring Boot; they mean exactly what their names say

@Data
@EqualsAndHashCode(callSuper = true)
@Accessors(chain = true)
@TableName("user")
@Document(indexName = "example", type = "user", shards = 1, replicas = 0)
public class User extends BaseModel {

    private static final long serialVersionUID = 1L;
    
    @TableField("user_name")
    @Field(type = FieldType.Keyword)
    private String userName;

    @TableField("password")
    @Field(type = FieldType.Keyword)
    private String password;

    @TableField("age")
    // Numeric mapping so that range queries on age compare numbers, not strings
    @Field(type = FieldType.Integer)
    private Integer age;

    @TableField("content")
    @Field(type = FieldType.Text)
    private String content;

    @Field(type = FieldType.Text)
    private String fileData;
    
    // Attachment type: holds what the attachment processor extracts from fileData
    private Attachment attachment;
}

@Data
public class Attachment implements Serializable {
    private static final long serialVersionUID = 1L;

    private String contentType;
    private String author;
    private String language;
    private String title;
    private String content;
}

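Spring Boot supports CRUD repository interfaces: you do not need to write an implementation, you just inject the interface and use it, but the method names must follow the required naming convention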

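public interface ElasticRepository extends ElasticsearchRepository<User, String> {
    // Example: query derived from the method name, matching on the content field
    Page<User> findByContent(String content, Pageable pageable);
    // Example: look up a single document by id
    User queryById(String id);
}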

Highlight-searching document content with the ingest-attachment plugin

Installing the plugin

Go to the bin directory of ES

# Install on Linux
./elasticsearch-plugin install ingest-attachment
# Install on Windows
.\elasticsearch-plugin.bat install ingest-attachment

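Creating the pipeline with Kibana

- attachment: the pipeline name
- fileData: the field to be decoded
- remove: removes the encoded source data once it has been processed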

PUT _ingest/pipeline/attachment
{
  "description": "Single-file pipeline",
  "processors": [
    {
      "attachment": {
        "field": "fileData",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "fileData"
      }
    }
  ]
}
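
If Kibana is not at hand, the same pipeline could presumably be created by sending the request straight to the ES HTTP port. The following is only a sketch using the JDK 11+ HTTP client; the class name PipelineInstaller, the base URL http://localhost:9200 and the description text are assumptions made for illustration and are not part of the original project.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: creates the ingest pipeline over plain HTTP instead of Kibana.
final class PipelineInstaller {

    // Same processors as the Kibana request: parse fileData, then drop the raw field.
    static final String PIPELINE_JSON =
            "{\"description\":\"single-file pipeline\",\"processors\":["
            + "{\"attachment\":{\"field\":\"fileData\",\"ignore_missing\":true}},"
            + "{\"remove\":{\"field\":\"fileData\"}}]}";

    private PipelineInstaller() {
    }

    // Builds PUT {baseUrl}/_ingest/pipeline/{pipelineName} without sending it.
    static HttpRequest buildRequest(String baseUrl, String pipelineName) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/_ingest/pipeline/" + pipelineName))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(PIPELINE_JSON))
                .build();
    }

    // Sends the request and returns the raw response body; needs a reachable ES node.
    static String install(String baseUrl, String pipelineName) throws IOException, InterruptedException {
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(baseUrl, pipelineName), HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + " " + response.body();
    }
}
```

Calling PipelineInstaller.install("http://localhost:9200", "attachment") once at startup would be one way to use it; a secured cluster would additionally need credentials, which this sketch leaves out.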

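Spring Boot example: highlighted file search
First, a quick look at the APIs that can be used

RestHighLevelClient: the API officially supported by ES; it covers a wide range of operations
ElasticsearchRestTemplate: a template API that wraps RestHighLevelClient and also supports the basic operations
ElasticsearchRepository: the CRUD API supported by Spring Boot; it handles the basic operations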

Indexing data

RestHighLevelClient official documentation

This uses RestHighLevelClient; I could not find a way to set the pipeline with the other APIs.

        User user = new User();
        user.setAge(12);
        user.setContent("asd");
        user.setUserName("asd");
        user.setId("12412412");
        try {
            File file = new File("C:\\Users\\Administrator\\Desktop\\答辩分组名单.xlsx");
            user.setFileData(Base64.encode(file));
            IndexRequest indexRequest = new IndexRequest().setPipeline("attachment").index("example")
                    .type("user").id(user.getId()).source(JSON.toJSONString(user), XContentType.JSON);
            restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
        } catch (Exception e) {
            e.printStackTrace();
        }
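
The Base64.encode(file) call above is not the JDK class; it looks like a third-party utility (possibly Hutool, which is an assumption). If you would rather avoid the extra dependency, a rough JDK-only equivalent could look like the sketch below. The class name AttachmentEncoder is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of the original project.
final class AttachmentEncoder {

    private AttachmentEncoder() {
    }

    // Encodes raw bytes as a standard Base64 string, the input format of the attachment processor.
    static String encodeBytes(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Reads the whole file into memory before encoding, so it only suits reasonably small files.
    static String encodeFile(Path path) throws IOException {
        return encodeBytes(Files.readAllBytes(path));
    }
}
```

With this helper the call would become user.setFileData(AttachmentEncoder.encodeFile(file.toPath())).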

Querying data

Highlighting requires a custom result mapper
See the code for the logic; these are all simple examples

    @GetMapping("detailByEs/{id}")
    public Page<User> detailByEs(@PathVariable("id") String id) {
        // Note: the path variable is used as the search keyword here
        NativeSearchQueryBuilder searchQueryBuilder = new NativeSearchQueryBuilder();
        searchQueryBuilder.withPageable(Pageable.unpaged())
                // Full-text match against the text extracted from the attachment
                .withQuery(QueryBuilders.boolQuery().must(QueryBuilders.matchQuery("attachment.content", id)))
                // Filter: only keep users older than 11
                .withFilter(QueryBuilders.rangeQuery("age").gt(11))
                // Highlight the matched terms in the extracted text
                .withHighlightBuilder(new HighlightBuilder().field("attachment.content"));

        return elasticsearchTemplate.queryForPage(
                searchQueryBuilder.build(), User.class, esResultMapper);
    }

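import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.SearchResultMapper;
import org.springframework.data.elasticsearch.core.aggregation.AggregatedPage;
import org.springframework.data.elasticsearch.core.aggregation.impl.AggregatedPageImpl;
import org.springframework.stereotype.Component;

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

@Component
public class EsResultMapper implements SearchResultMapper {

    @Override
    public <T> AggregatedPage<T> mapResults(SearchResponse response, Class<T> aClass, Pageable pageable) {
        // Total number of hits
        long totalHits = response.getHits().getTotalHits();
        // Mapped records (generic), used to build the AggregatedPage
        List<T> list = new ArrayList<>();
        // The search hits (the actual records); an empty result simply yields an empty page
        SearchHits hits = response.getHits();
        for (SearchHit hit : hits) {
            // Convert the source JSON into a Map
            Map<String, Object> map = hit.getSourceAsMap();
            // Map of highlighted fields
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            for (Map.Entry<String, HighlightField> highlightField : highlightFields.entrySet()) {
                // Highlight key: keep only the last segment, e.g. "attachment.content" -> "content"
                String key = highlightField.getKey();
                String[] keys = key.split("\\.");
                key = keys[keys.length - 1];
                // Highlight value
                HighlightField value = highlightField.getValue();
                // fragments[0] usually already holds the highlighted result;
                // the fragments are concatenated in case there are several
                Text[] fragments = value.getFragments();
                StringBuilder sb = new StringBuilder();
                for (Text text : fragments) {
                    sb.append(text);
                }
                // The highlighted field is written into the Map under the shortened key.
                // A highlighted field nested in an inner Map (a Map inside the Map) is not handled here.
                map.put(key, sb.toString().replaceAll("[\n\t]", ""));
            }

            // Blank out the attachment, then convert the Map into the target object
            map.put("attachment", "");
            T item = JSON.parseObject(JSONObject.toJSONString(map), aClass);
            list.add(item);
        }
        // Return the result together with the paging information
        return new AggregatedPageImpl<>(list, pageable, totalHits);
    }

    @Override
    public <T> T mapSearchHit(SearchHit searchHit, Class<T> type) {
        Map<String, Object> map = searchHit.getSourceAsMap();
        return JSON.parseObject(JSONObject.toJSONString(map), type);
    }

}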
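
The mapper's own comment notes that the nested case is not handled: the key "attachment.content" is shortened to "content", so the highlighted text lands on the top-level content field. One possible way to keep the highlight at its original nested position is sketched below, using only java.util. The class HighlightPaths and the method putByPath are hypothetical names, not part of Spring Data or the ES client.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: writes a value at a dotted path inside a nested source map.
final class HighlightPaths {

    private HighlightPaths() {
    }

    // Walks "a.b.c" through nested maps, creating intermediate maps when they are missing,
    // and stores the value under the last segment.
    @SuppressWarnings("unchecked")
    static void putByPath(Map<String, Object> source, String dottedPath, Object value) {
        String[] keys = dottedPath.split("\\.");
        Map<String, Object> current = source;
        for (int i = 0; i < keys.length - 1; i++) {
            Object child = current.get(keys[i]);
            if (!(child instanceof Map)) {
                // Replace a missing or non-map value with a fresh nested map
                child = new LinkedHashMap<String, Object>();
                current.put(keys[i], child);
            }
            current = (Map<String, Object>) child;
        }
        current.put(keys[keys.length - 1], value);
    }
}
```

In the mapper this would stand in for map.put(key, ...), called with the full highlight key, and the line that blanks out the attachment would then have to be dropped. Whether that is preferable depends on whether the caller wants the full extracted text back in the response.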
