Elasticsearch 7.6.x: JD.com Search in Practice

Table of Contents

    • Project Overview
    • 1. Crawler
    • 2. Data Storage
    • 3. Page Display

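Project Overview

This hands-on project follows the Elasticsearch video tutorial by 狂神说Java on Bilibili. It has three parts: a crawler, data storage and search, and page display.
Tech stack: jsoup, Spring Boot, Elasticsearch, and Vue.
Final result:
[Image 1: final result screenshot]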

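1. Crawler

The crawler uses jsoup to parse JD.com search result pages and extract each product's information (name), price, and image.
When extracting images, watch out for lazy loading: the img tag's src attribute only holds a default placeholder, and the real image URL is in the source-data-lazy-img attribute.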

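import java.io.IOException;
import java.net.URL;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class jsoupUtils {
    // Fetches the JD search page for the keyword and extracts image, name, and price per item.
    public static List<goods> getTargetGoods(String keywords) throws IOException {
        // Encode the keyword so non-ASCII input (e.g. Chinese) still yields a valid URL.
        String url = "https://search.jd.com/Search?keyword=" + URLEncoder.encode(keywords, "UTF-8");
        // Fetch and parse the page with a 3-second timeout.
        Document document = Jsoup.parse(new URL(url), 3000);
        List<goods> goodsArrayList = new ArrayList<>();
        Element list = document.getElementById("J_goodsList");
        if (list == null) {
            // Result container not found (layout changed or request blocked): return nothing.
            return goodsArrayList;
        }
        Elements li = list.getElementsByTag("li");
        for (Element element : li) {
            // The real image URL is in the lazy-load attribute, not in src.
            String img = element.getElementsByTag("img").eq(0).attr("source-data-lazy-img");
            String name = element.getElementsByClass("p-name").eq(0).text();
            String price = element.getElementsByClass("p-price").eq(0).text();
            goods goods = new goods();
            goods.setImg(img);
            goods.setName(name);
            goods.setPrice(price);
            goodsArrayList.add(goods);
        }
        return goodsArrayList;
    }
}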
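The lazy-loading note can also be expressed as a small helper that prefers the lazy-load attribute and falls back to src. This is only a sketch: `ImageUrls` and `resolve` are hypothetical names that are not part of the project, and adding `https:` to protocol-relative URLs (`//host/path`) is an assumption about what the page may return.

```java
final class ImageUrls {
    private ImageUrls() {}

    /**
     * Returns the lazy-load value when it is non-blank, otherwise the src value.
     * Protocol-relative URLs ("//host/path") get an explicit https: scheme (assumption).
     */
    static String resolve(String lazyAttr, String srcAttr) {
        String chosen;
        if (lazyAttr != null && !lazyAttr.trim().isEmpty()) {
            chosen = lazyAttr.trim();
        } else {
            chosen = (srcAttr == null) ? "" : srcAttr.trim();
        }
        if (chosen.startsWith("//")) {
            return "https:" + chosen;
        }
        return chosen;
    }
}
```

Inside the crawler loop it could be called with the two attribute values read from the first img element, for example `ImageUrls.resolve(imgs.attr("source-data-lazy-img"), imgs.attr("src"))`, where `imgs` is the result of `element.getElementsByTag("img").eq(0)`.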

2. Data Storage

The product data collected by the crawler is stored in Elasticsearch.


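    // Crawls JD for the keyword and indexes every product into the jd_goods index in one bulk request.
    public Boolean BulkGoods(String keywords) throws IOException {
        List<goods> goodsList = jsoupUtils.getTargetGoods(keywords);
        BulkRequest bulkRequest = new BulkRequest();
        bulkRequest.timeout("10s");
        for (goods item : goodsList) {
            // No explicit id is set, so Elasticsearch generates one for each document.
            bulkRequest.add(new IndexRequest("jd_goods").source(JSON.toJSONString(item), XContentType.JSON));
        }
        BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        // hasFailures() is true if any single item failed, so negate it to report success.
        return !bulk.hasFailures();
    }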
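One results page is small, but if many keywords are crawled the list can grow, and it may be worth sending it in fixed-size batches rather than one large bulk request. The following is a minimal sketch under that assumption; `Batches` and `partition` are hypothetical names, not part of the project or of the Elasticsearch client.

```java
import java.util.ArrayList;
import java.util.List;

final class Batches {
    private Batches() {}

    /** Splits items into consecutive sublists of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch would then be turned into its own BulkRequest. An empty input produces no batches at all, which also avoids sending a bulk request that contains no actions when the crawler finds nothing.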

Search, and highlight the matched field:


    public List<Map<String, Object>> SearchGoods(String keywords, int pageNum, int pageSize) throws Exception {
        if (pageNum <= 0) {
            pageNum = 1;
        }
        SearchRequest jd_goods = new SearchRequest("jd_goods");
        // Build the search conditions.
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();

        // Highlight matches in the name field.
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        highlightBuilder.field("name");
        highlightBuilder.requireFieldMatch(false);
        // NOTE: both tag strings are empty in this copy of the article, most likely because
        // the HTML was stripped during extraction. Supply the markup you want around matches.
        highlightBuilder.preTags("");
        highlightBuilder.postTags("");
        searchSourceBuilder.highlighter(highlightBuilder);

        // Pagination: from() takes a zero-based document offset, not a page number.
        searchSourceBuilder.from((pageNum - 1) * pageSize);
        searchSourceBuilder.size(pageSize);
        // termQuery looks the keyword up as one exact term, without analyzing it.
        TermQueryBuilder queryBuilder = QueryBuilders.termQuery("name", keywords);
        searchSourceBuilder.query(queryBuilder);
        searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
        jd_goods.source(searchSourceBuilder);

        SearchResponse searchResponse = restHighLevelClient.search(jd_goods, RequestOptions.DEFAULT);
        ArrayList<Map<String, Object>> goodlist = new ArrayList<>();
        for (SearchHit hit : searchResponse.getHits().getHits()) {
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            HighlightField name = highlightFields.get("name");
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            if (name != null) {
                // Replace the stored name with the concatenated highlighted fragments.
                StringBuilder title = new StringBuilder();
                for (Text fragment : name.fragments()) {
                    title.append(fragment);
                }
                sourceAsMap.put("name", title.toString());
            }
            goodlist.add(sourceAsMap);
        }
        return goodlist;
    }
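Elasticsearch's `from` parameter is a zero-based document offset rather than a page number, so a 1-based page number has to be converted before it is passed in. The conversion can be sketched as follows; `Paging` and `offset` are hypothetical names used only for illustration.

```java
final class Paging {
    private Paging() {}

    /** Converts a 1-based page number into a zero-based document offset. */
    static int offset(int pageNum, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // Treat zero and negative page numbers as the first page.
        int page = Math.max(pageNum, 1);
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(page - 1, pageSize);
    }
}
```

With a page size of 10, page 1 starts at offset 0 and page 3 at offset 20. Note that by default Elasticsearch rejects requests where `from + size` exceeds 10,000 (the `index.max_result_window` setting), so deep paging needs a different approach.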

3. Page Display

The controller layer exposes the REST endpoints:

    // Crawls JD for the keyword and bulk-indexes the results into Elasticsearch.
    @GetMapping("/parse/{keyword}")
    public Boolean BulkIntoEs(@PathVariable("keyword") String keyword) throws IOException {
        return bulkService.BulkGoods(keyword);
    }

    // Returns one page of search results with the name field highlighted.
    @GetMapping("/search/{keyword}/{pageNum}/{pageSize}")
    public List<Map<String, Object>> SearchGoods(
            @PathVariable("keyword") String keyword,
            @PathVariable("pageNum") int pageNum,
            @PathVariable("pageSize") int pageSize
    ) throws Exception {
        return bulkService.SearchGoods(keyword, pageNum, pageSize);
    }

The front end is omitted here.
Project code:
https://github.com/Icedzzz/ElasticsearchDemoAndProject

Final result:
[Image 2: final result screenshot]
