Building a JD.com-Style Search with ElasticSearch

1. Create the project skeleton and import the dependencies and front-end assets
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-thymeleaf</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>

    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.11.2</version>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-devtools</artifactId>
        <scope>runtime</scope>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-configuration-processor</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <optional>true</optional>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>com.alibaba</groupId>
        <artifactId>fastjson</artifactId>
        <version>1.2.74</version>
    </dependency>
</dependencies>

Project structure:
[Image 2: project structure]

2. Define the server port in the configuration file
server.port=9090
spring.thymeleaf.cache=false

Write the controller layer

@Controller
public class IndexController {

    @RequestMapping({"/","/index"})
    public String index(){
        return "index";
    }
}

Open the page to test it:
[Image 3: result of the page test]

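3. Scrape the web page data

Create a utility class and an entity class: the utility class scrapes data from the web page, and the entity class holds the scraped data.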

@Data
@NoArgsConstructor
@AllArgsConstructor
public class PaperContent {
    private String img;
    private String price;
    private String title;
}

First, inspect the front-end structure of the page:
[Images 4 and 5: HTML structure of the page]

Write the scraping utility class, following the HTML structure of the page:

    // Needs org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element,
    // org.jsoup.select.Elements, java.net.URL, java.util.ArrayList and java.util.List
    public List<PaperContent> parseHtmlElement(String keyWord) throws Exception {
        List<PaperContent> paperContents = new ArrayList<>();
        // keyWord is the term to search for
        String url = "http://search.jd.com/Search?keyword=" + keyWord;
        // Fetch and parse the page, with a 30-second timeout
        Document document = Jsoup.parse(new URL(url), 30000);
        // Look up the product list container by its id
        Element element = document.getElementById("J_goodsList");
//        System.out.println(element);
        // Get the <li> tags inside the container
        Elements elements = element.getElementsByTag("li");
//        System.out.println(elements);
        // Extract the content; each el here is one <li> tag
        for (Element el : elements) {
            PaperContent paperContent = new PaperContent();
            // Image of the product in this <li>.
            // Images are lazy-loaded, so the URL is in data-lazy-img rather than src
            String img = el.getElementsByTag("img").eq(0).attr("data-lazy-img");
            // Price of the product in this <li>
            String price = el.getElementsByClass("p-price").eq(0).text();
            // Book title of the product in this <li>
            String title = el.getElementsByClass("p-name").eq(0).text();
            System.out.println("===============================");
            System.out.println(img);
            System.out.println(price);
            System.out.println(title);
            paperContent.setImg(img);
            paperContent.setPrice(price);
            paperContent.setTitle(title);
            paperContents.add(paperContent);
        }
        return paperContents;
    }

Test:

    public static void main(String[] args) throws Exception {
        new PageParse().parseHtmlElement("java").forEach(System.out::println);
    }
    

Check the console to confirm that data was scraped:
[Image 6: console output]
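The scraper appends the keyword to the URL as-is, which works for a plain word such as "java" but can break for keywords containing spaces, `&`, or non-ASCII characters. A minimal sketch of encoding the keyword first, using only the JDK; the class name `SearchUrlBuilder` and the method `build` are hypothetical helpers, not part of the original project:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of the original project.
class SearchUrlBuilder {
    private static final String BASE = "http://search.jd.com/Search?keyword=";

    // Encodes the keyword as UTF-8 form data so that spaces, '&' and
    // non-ASCII characters cannot break the query string.
    static String build(String keyword) {
        try {
            return BASE + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is supported by every JVM, so this is not expected to happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("spring boot"));
    }
}
```

The resulting string could then be passed to `new URL(...)` in `parseHtmlElement` in place of the plain concatenation.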

4. Store the scraped data in ElasticSearch

Write a configuration class that connects to the local Elasticsearch instance

    @Configuration
    public class ElasticSearchConfig {
    
        @Bean
        public RestHighLevelClient restHighLevelClient(){
            RestHighLevelClient client=new RestHighLevelClient(
                    RestClient.builder(new HttpHost("127.0.0.1",9200,"http"))
            );
              return client;
        }
    }
    

Write the service layer that stores the scraped data in ElasticSearch. This assumes the index already exists.

        @Autowired
        private RestHighLevelClient restHighLevelClient;

        // Parse the scraped data and store it in ElasticSearch
        public Boolean parseContent(String keyword) throws Exception {
            // Scrape the products for this keyword
            List<PaperContent> paperContents = new PageParse().parseHtmlElement(keyword);
            BulkRequest bulkRequest = new BulkRequest();
            bulkRequest.timeout("2m");

            // Add one index request per scraped product to the bulk request
            for (PaperContent paperContent : paperContents) {
                bulkRequest.add(new IndexRequest("jd_commodity")
                        .source(JSON.toJSONString(paperContent), XContentType.JSON));
            }

            // Send all queued documents to ElasticSearch in one bulk insert
            BulkResponse bulk = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
            return !bulk.hasFailures();
        }
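To make the bulk insert less of a black box: Elasticsearch's `_bulk` REST endpoint takes newline-delimited JSON, where every document is preceded by an action line naming the target index. The sketch below builds such a body by hand with the JDK only. It is an illustration of the wire format, not what the project does (there the client and fastjson handle it); `BulkBodySketch`, `quote`, and `bulkBody` are hypothetical names, and the escaping covers only backslashes, double quotes, and newlines.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the _bulk body format; not part of the original project.
class BulkBodySketch {

    // Simplified JSON string quoting: handles backslash, double quote and newline only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    // Emits one action line plus one source line per document.
    // Every line, including the last one, ends with '\n'.
    static String bulkBody(String index, List<String> jsonDocs) {
        StringBuilder sb = new StringBuilder();
        for (String doc : jsonDocs) {
            sb.append("{\"index\":{\"_index\":").append(quote(index)).append("}}\n");
            sb.append(doc).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "{\"title\":" + quote("Java in Action") + ",\"price\":" + quote("59.00") + "}",
                "{\"title\":" + quote("Thinking in \"Java\"") + ",\"price\":" + quote("88.00") + "}");
        System.out.print(bulkBody("jd_commodity", docs));
    }
}
```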
    

Write the controller layer to connect it to the front end

     @RequestMapping("/parse/{keyword}")
        public Boolean pushElk(@PathVariable("keyword") String keyword) throws Exception {
            return jdService.parseContent(keyword);
        }
    

Visit the URL to run a test:
[Image 7: response of the parse request]
Check whether the data now exists in Elasticsearch:
[Image 8: documents in the jd_commodity index]

5. Display the data from ElasticSearch on the front-end page
        // Fetch data from ElasticSearch (highlighting is added in the last step)
        public List<Map<String,Object>> getContent(String keyword, int pageNo, int pageSize) throws IOException {
            if (pageNo <= 1) {
                pageNo = 1;
            }
            // Conditional search
            SearchRequest searchRequest = new SearchRequest("jd_commodity");
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            // Pagination: from is a document offset, so convert the 1-based page number
            searchSourceBuilder.from((pageNo - 1) * pageSize);
            searchSourceBuilder.size(pageSize);
            // Exact match (term query) on the title field
            TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("title", keyword);
            searchSourceBuilder.query(termQueryBuilder);
            searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS));
            searchRequest.source(searchSourceBuilder);
            SearchResponse searchs = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
            // Parse the results
            List<Map<String,Object>> list = new ArrayList<>();
            for (SearchHit hit : searchs.getHits().getHits()) {
                list.add(hit.getSourceAsMap());
            }
            return list;
        }
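A note on pagination: in Elasticsearch, `from` is the zero-based offset of the first hit to return, counted in documents, while `size` is the number of hits per page. A page number therefore has to be converted into an offset. A minimal sketch of that conversion, assuming 1-based page numbers with anything below 1 treated as the first page; `PageOffset` is a hypothetical helper name:

```java
// Hypothetical helper, not part of the original project.
class PageOffset {

    // Converts a 1-based page number into the zero-based document offset
    // that Elasticsearch expects in "from".
    static int from(int pageNo, int pageSize) {
        int page = Math.max(pageNo, 1); // 0 or negative page numbers fall back to page 1
        return (page - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(from(1, 20)); // first page starts at document 0
        System.out.println(from(2, 20)); // second page starts at document 20
    }
}
```

With this convention, the front end's request for page 0 with 20 results per page resolves to offset 0, the same as page 1.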
    

Call the method from the controller layer

    @RequestMapping("/parse/{keyword}/{pageNo}/{pageSize}")
        public List<Map<String,Object>> getElk(@PathVariable("keyword") String keyword,
                                               @PathVariable("pageNo") int pageNo,
                                               @PathVariable("pageSize") int pageSize) throws IOException {
            return jdService.getContent(keyword, pageNo, pageSize);
        }
    

Visit the configured URL:
[Image 9: JSON returned by the search endpoint]
Integrate with the front end:
(HTML code)

    <!DOCTYPE html>
    <html xmlns:th="http://www.thymeleaf.org">
    <head>
        <meta charset="utf-8"/>
        <title>ElasticSearch JD.com Clone in Practice</title>
        <link rel="stylesheet" th:href="@{/css/style.css}"/>
        <script th:src="@{/js/jquery.min.js}"></script>
    </head>
    <body class="pg">
    <div class="page">
        <div id="app" class=" mallist tmall- page-not-market ">

            <div id="header" class=" header-list-app">
                <div class="headerLayout">
                    <div class="headerCon ">

                        <h1 id="mallLogo">
                            <img th:src="@{/images/jdlogo.png}" alt="">
                        </h1>
                        <div class="header-extra">

                            <div id="mallSearch" class="mall-search">
                                <form name="searchTop" class="mallSearch-form clearfix">
                                    <fieldset>
                                        <legend>Tmall Search</legend>
                                        <div class="mallSearch-input clearfix">
                                            <div class="s-combobox" id="s-combobox-685">
                                                <div class="s-combobox-input-wrap">
                                                    <input v-model="keyword" type="text" autocomplete="off" id="mq"
                                                           class="s-combobox-input" aria-haspopup="true">
                                                </div>
                                            </div>
                                            <button type="submit" @click.prevent="searchKey" id="searchbtn">Search</button>
                                        </div>
                                    </fieldset>
                                </form>
                                <ul class="relKeyTop">
                                    <li><a>Java</a></li>
                                    <li><a>Front end</a></li>
                                    <li><a>Linux</a></li>
                                    <li><a>Big data</a></li>
                                    <li><a>ElasticSearch</a></li>
                                </ul>
                            </div>
                        </div>
                    </div>
                </div>
            </div>

            <div id="content">
                <div class="main">

                    <form class="navAttrsForm">
                        <div class="attrs j_NavAttrs" style="display:block">
                            <div class="brandAttr j_nav_brand">
                                <div class="j_Brand attr">
                                    <div class="attrKey">
                                        Brand
                                    </div>
                                    <div class="attrValues">
                                        <ul class="av-collapse row-2">
                                            <li><a href="#"> 王说 </a></li>
                                            <li><a href="#"> Java </a></li>
                                        </ul>
                                    </div>
                                </div>
                            </div>
                        </div>
                    </form>

                    <div class="filter clearfix">
                        <a class="fSort fSort-cur">Overall<i class="f-ico-arrow-d"></i></a>
                        <a class="fSort">Popularity<i class="f-ico-arrow-d"></i></a>
                        <a class="fSort">New<i class="f-ico-arrow-d"></i></a>
                        <a class="fSort">Sales<i class="f-ico-arrow-d"></i></a>
                        <a class="fSort">Price<i class="f-ico-triangle-mt"></i><i class="f-ico-triangle-mb"></i></a>
                    </div>

                    <div class="view grid-nosku">
                        <div class="product" v-for="result in results">
                            <div class="product-iWrap">
                                <!-- Product cover -->
                                <div class="productImg-wrap">
                                    <a class="productImg">
                                        <img :src="result.img">
                                    </a>
                                </div>
                                <!-- Price -->
                                <p class="productPrice">
                                    <em v-text="result.price"></em>
                                </p>
                                <!-- Title -->
                                <p class="productTitle">
                                    <a v-html="result.title"></a>
                                </p>
                                <!-- Shop name -->
                                <div class="productShop">
                                    <span v-text="result.shopnum"></span>
                                </div>
                                <!-- Sales info -->
                                <p class="productStatus">
                                    <span>Monthly sales <em>999</em></span>
                                    <span>Reviews <a>3</a></span>
                                </p>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
    <script src="https://unpkg.com/axios/dist/axios.min.js"></script>
    <script src="https://cdn.jsdelivr.net/npm/vue@2/dist/vue.js"></script>
    <script>
        new Vue({
            el: "#app",
            data: {
                "keyword": '', // the search keyword
                "results": [] // results returned by the back end
            },
            methods: {
                searchKey() {
                    var keyword = this.keyword;
                    console.log(keyword);
                    // axios.get('searchHighLight/' + keyword + '/0/20').then(response => {
                    axios.get('parse/' + keyword + '/0/20').then(response => {
                        console.log(response);
                        this.results = response.data;
                    })
                }
            }
        });
    </script>
    </body>
    </html>
    

Test:
The relevant data can be retrieved.
[Image 10: search results on the page]

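Highlighting

Highlighting: configure the field to highlight, then replace the original field with the highlighted version.
Add the following code to the method that reads the content:
[Images 11 and 12: screenshots of the highlighting code]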
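The "replace the original field" step can be sketched without the Elasticsearch client. The assumption here is that the search was configured to highlight `title` (with the Java high-level client this is usually done through `HighlightBuilder`), so that each hit comes back with a list of highlighted fragments for that field. Those fragments are joined and written over the plain `title` in the hit's source map. The class `HighlightMerge`, the method `applyHighlight`, and the red `<span>` tags are all hypothetical, and the fragment list stands in for whatever the client returns:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, library-free illustration of the merge step; not the author's code.
class HighlightMerge {

    // Returns a copy of the source map in which `field` is replaced by the
    // joined highlight fragments. If there are no fragments, the original
    // value is kept unchanged.
    static Map<String, Object> applyHighlight(Map<String, Object> source,
                                              String field,
                                              List<String> fragments) {
        Map<String, Object> result = new HashMap<>(source); // the input map is not modified
        if (fragments != null && !fragments.isEmpty()) {
            result.put(field, String.join("", fragments));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> source = new HashMap<>();
        source.put("title", "Java in Action");
        source.put("price", "59.00");
        List<String> fragments = Arrays.asList("<span style='color:red'>Java</span>", " in Action");
        System.out.println(applyHighlight(source, "title", fragments).get("title"));
    }
}
```

Because the page renders the title with `v-html`, the highlight tags in the replaced field are interpreted as markup rather than shown as text.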
Test:
[Image 13: search results with highlighted keywords]
Done.
