Elasticsearch: JD Hands-On Project

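This project is practice for building fluency with Elasticsearch: a crawler fetches product data from JD (jd.com), the data is stored in Elasticsearch, and the application then queries it.

Materials for this chapter: https://pan.baidu.com/s/1fu_KHu5VCBKorbgJgbvVwA (extraction code: 3ij8)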

I. Create a Spring Boot project

II. Write the code

1. Import dependencies

  •     <properties>
            <java.version>1.8</java.version>
            <elasticsearch.version>7.8.0</elasticsearch.version>
        </properties>
        <dependencies>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-thymeleaf</artifactId>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-web</artifactId>
            </dependency>
    
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-devtools</artifactId>
                <scope>runtime</scope>
                <optional>true</optional>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-configuration-processor</artifactId>
                <optional>true</optional>
            </dependency>
            <dependency>
                <groupId>org.projectlombok</groupId>
                <artifactId>lombok</artifactId>
                <optional>true</optional>
            </dependency>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-starter-test</artifactId>
                <scope>test</scope>
            </dependency>
            <!-- jsoup: parses HTML pages -->
            <!-- For crawling video, look into Apache Tika instead -->
            <dependency>
                <groupId>org.jsoup</groupId>
                <artifactId>jsoup</artifactId>
                <version>1.10.2</version>
            </dependency>
            <!-- fastjson -->
            <dependency>
                <groupId>com.alibaba</groupId>
                <artifactId>fastjson</artifactId>
                <version>1.2.70</version>
            </dependency>
        </dependencies>
    

2. Import the front-end assets

3. Write the application.properties configuration file

  • # Server port
    server.port=8011
    # Disable Thymeleaf template caching
    spring.thymeleaf.cache=false
    

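4. Write IndexController and test the index page

  • import org.springframework.stereotype.Controller;
    import org.springframework.web.bind.annotation.GetMapping;
    
    @Controller
    public class IndexController {
    
        // Serve the index template for both "/" and "/index"
        @GetMapping({"/", "index"})
        public String index() {
            return "index";
        }
    }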
    

5. Write the crawler

  1. Analyze the JD search page

    • http://search.jd.com/search?keyword=java
    • Inspect the page elements
      • id of the product list: J_goodsList
  2. Scrape the data (fetch the page returned by the request and keep the useful parts)

    • Create HtmlParseUtil with a simple first version

    • import java.io.IOException;
      import java.net.URL;
      
      import org.jsoup.Jsoup;
      import org.jsoup.nodes.Document;
      import org.jsoup.nodes.Element;
      import org.jsoup.select.Elements;
      
      public class HtmlParseUtil {
          public static void main(String[] args) throws IOException {
              // Requires network access
              // Request URL
              String url = "http://search.jd.com/search?keyword=java";
              // 1. Parse the page (jsoup returns a Document, like the browser's document object)
              Document document = Jsoup.parse(new URL(url), 30000);
              // Operations available on document in JavaScript are available on this object too
              // 2. Get the element by id
              Element j_goodsList = document.getElementById("J_goodsList");
              // 3. Get every li under the J_goodsList ul
              Elements lis = j_goodsList.getElementsByTag("li");
              // 4. Get the img, price and name under each li
              for (Element li : lis) {
                  String img = li.getElementsByTag("img").eq(0).attr("src"); // first image under the li
                  String name = li.getElementsByClass("p-name").eq(0).text();
                  String price = li.getElementsByClass("p-price").eq(0).text();
      
                  System.out.println("=======================");
                  System.out.println("img : " + img);
                  System.out.println("name : " + name);
                  System.out.println("price : " + price);
              }
          }
      }
      
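    • Sites with a very large number of images usually lazy-load all of them.

    • // Print the li tags
      Elements lis = j_goodsList.getElementsByTag("li");
      System.out.println(lis);
      
    • Printing all the li tags shows that the img tags have no src attribute set; the image address is held in data-lazy-img instead.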
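Because the image address can sit in data-lazy-img rather than src, a scraper has to decide which attribute to read. The following is a minimal sketch of that decision, not the project's code: it models the attributes of one img tag as a plain Map so that it runs without jsoup. The class name ImageAttr, the method pickImageUrl, the fallback to src, and the handling of protocol-relative values are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class ImageAttr {

    // Prefer the lazy-load attribute; fall back to src when it is missing or blank.
    static String pickImageUrl(Map<String, String> attrs) {
        String url = attrs.getOrDefault("data-lazy-img", "");
        if (url.trim().isEmpty()) {
            url = attrs.getOrDefault("src", "");
        }
        url = url.trim();
        // Assumption: a protocol-relative value ("//host/path") gets an explicit scheme.
        if (url.startsWith("//")) {
            url = "https:" + url;
        }
        return url;
    }

    public static void main(String[] args) {
        Map<String, String> lazy = new HashMap<>();
        lazy.put("data-lazy-img", "//img.example.com/a.jpg"); // hypothetical address
        System.out.println(pickImageUrl(lazy));
    }
}
```

With jsoup the same idea would read the two attributes through attr(...), which returns an empty string when the attribute is absent, so the blank check above carries over unchanged.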

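  3. Implement the JD scraper

    1. Create the entity class

      • import java.io.Serializable;
        
        import lombok.AllArgsConstructor;
        import lombok.Data;
        import lombok.NoArgsConstructor;
        
        @Data
        @AllArgsConstructor
        @NoArgsConstructor
        public class Content implements Serializable {
            private String name;   // product name
            private String img;    // image address
            private String price;  // price text as shown on the page
        }
        
    2. Wrap the scraper in a utility class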

      • import java.io.IOException;
        import java.net.URL;
        import java.util.ArrayList;
        import java.util.List;
        
        import org.jsoup.Jsoup;
        import org.jsoup.nodes.Document;
        import org.jsoup.nodes.Element;
        import org.jsoup.select.Elements;
        
        public class HtmlParseUtil {
            public static void main(String[] args) throws IOException {
                System.out.println(parseJD("java"));
            }
        
            public static List<Content> parseJD(String keyword) throws IOException {
                // Requires network access
                // Request URL
                String url = "http://search.jd.com/search?keyword=" + keyword;
                // 1. Parse the page (jsoup returns a Document, like the browser's document object)
                Document document = Jsoup.parse(new URL(url), 30000);
                // If j_goodsList below turns out to be null, fetch the page this way instead:
                //Document document = Jsoup.connect(url).userAgent("Mozilla/5.0 (Windows NT 5.1; zh-CN) AppleWebKit/535.12 (KHTML, like Gecko) Chrome/22.0.1229.79 Safari/535.12").timeout(30000).get();
                // 2. Get the element by id
                Element j_goodsList = document.getElementById("J_goodsList");
                // 3. Get every li under the J_goodsList ul
                Elements lis = j_goodsList.getElementsByTag("li");
                // 4. Get the img, price and name under each li and collect them in a list
                List<Content> contents = new ArrayList<>();
                for (Element li : lis) {
                    // The site lazy-loads images, so read data-lazy-img instead of src
                    String img = li.getElementsByTag("img").eq(0).attr("data-lazy-img"); // first image under the li
                    String name = li.getElementsByClass("p-name").eq(0).text();
                    String price = li.getElementsByClass("p-price").eq(0).text();
                    // Wrap the values in an object and add it to the list
                    contents.add(new Content(name, img, price));
                }
                // 5. Return the list
                return contents;
            }
        }
        
        
      • [Screenshot: output of the run]
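parseJD appends the keyword to the URL as-is, which works for "java" but not for keywords containing spaces or Chinese characters. A minimal sketch of percent-encoding the keyword with the JDK's URLEncoder follows; the class name SearchUrls and the method searchUrl are hypothetical, and only the base URL is taken from the code above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class SearchUrls {

    private static final String BASE = "http://search.jd.com/search?keyword=";

    // Build the search URL with the keyword encoded for use in a query string.
    static String searchUrl(String keyword) {
        try {
            // URLEncoder uses form encoding: a space becomes '+',
            // non-ASCII characters become %XX sequences of their UTF-8 bytes.
            return BASE + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is available on every JVM, so this branch is not expected to run.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(searchUrl("java"));
        System.out.println(searchUrl("spring boot"));
    }
}
```

In parseJD this would replace the string concatenation on the url line; everything after that line stays the same.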

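6. Write the config class

  • import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.springframework.context.annotation.Bean;
    import org.springframework.context.annotation.Configuration;
    
    @Configuration
    public class ElasticSearchConfig {
    
        // Expose a client for the local Elasticsearch node as a Spring bean
        @Bean
        public RestHighLevelClient restHighLevelClient() {
            return new RestHighLevelClient(
                    RestClient.builder(
                        new HttpHost("127.0.0.1", 9200, "http")
                    )
            );
        }
    }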
    

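7. Write the service

  • import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.TimeUnit;
    
    import com.alibaba.fastjson.JSON;
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.action.search.SearchRequest;
    import org.elasticsearch.action.search.SearchResponse;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.text.Text;
    import org.elasticsearch.common.unit.TimeValue;
    import org.elasticsearch.common.xcontent.XContentType;
    import org.elasticsearch.index.query.QueryBuilders;
    import org.elasticsearch.index.query.TermQueryBuilder;
    import org.elasticsearch.search.SearchHit;
    import org.elasticsearch.search.SearchHits;
    import org.elasticsearch.search.builder.SearchSourceBuilder;
    import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
    import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Service;
    
    @Service
    public class ContentService {
    
        @Autowired
        private RestHighLevelClient client;
    
        // 1. Scrape the data and put it into the es index
        public Boolean parseContent(String keyword) throws IOException {
            // Fetch the content
            List<Content> contents = HtmlParseUtil.parseJD(keyword);
            // Put the content into es with a single bulk request
            BulkRequest request = new BulkRequest();
            request.timeout("2m"); // adjust to the actual workload
            for (int i = 0; i < contents.size(); i++) {
                // ids 1..n are reused on every call, so a new crawl overwrites the previous documents
                request.add(
                        new IndexRequest("jd_goods")
                                .id(String.valueOf(i + 1))
                                .source(JSON.toJSONString(contents.get(i)), XContentType.JSON)
                );
            }
            BulkResponse responses = client.bulk(request, RequestOptions.DEFAULT);
            return !responses.hasFailures();
        }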
    
        // 2. Search by keyword, with pagination
        public List<Map<String, Object>> search(String keyword, Integer pageIndex, Integer pageSize) throws IOException {
            if (pageIndex < 0) {
                pageIndex = 0;
            }
    
            // Search request against the jd_goods index
            SearchRequest request = new SearchRequest("jd_goods");
            // Build the request body
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
            // Exact (term) query: look up keyword in the name field
            TermQueryBuilder termQueryBuilder = QueryBuilders.termQuery("name", keyword);
            // Put the term query into the request body
            searchSourceBuilder.query(termQueryBuilder);
            searchSourceBuilder.timeout(new TimeValue(60, TimeUnit.SECONDS)); // 60 s
            // Pagination: from() is the offset of the first hit, size() the number of hits to return
            searchSourceBuilder.from(pageIndex);
            searchSourceBuilder.size(pageSize);
    
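            // Highlighting: wrap matches in the name field with markup.
            // The tags in the source are empty strings, which would highlight nothing;
            // a red <span> is assumed here because the page renders name with v-html.
            HighlightBuilder highlightBuilder = new HighlightBuilder();
            highlightBuilder.field("name");
            highlightBuilder.preTags("<span style='color:red'>");
            highlightBuilder.postTags("</span>");
            searchSourceBuilder.highlighter(highlightBuilder);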
    
            request.source(searchSourceBuilder);
            SearchResponse searchResponse = client.search(request, RequestOptions.DEFAULT);
            // Parse the results
            SearchHits hits = searchResponse.getHits();
            List<Map<String, Object>> results = new ArrayList<>();
            for (SearchHit hit : hits.getHits()) {
                // Overwrite the original field value with the highlighted one
                Map<String, Object> sourceAsMap = hit.getSourceAsMap();
                // Highlighted fields of this hit
                Map<String, HighlightField> highlightFields = hit.getHighlightFields();
                HighlightField name = highlightFields.get("name");
                // Replace name only when a highlight exists
                if (name != null) {
                    Text[] fragments = name.fragments();
                    StringBuilder newName = new StringBuilder();
                    for (Text text : fragments) {
                        newName.append(text);
                    }
                    sourceAsMap.put("name", newName.toString());
                }
                results.add(sourceAsMap);
            }
            return results;
        }
    }
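The last part of search replaces the stored name with its highlighted version whenever the hit carries highlight fragments. The sketch below isolates that step so it can be run without an Elasticsearch node. It is an illustration under assumptions: plain strings stand in for the client's Text fragments, and the class name HighlightMerge and method applyHighlight are hypothetical. Unlike the service, it copies the map instead of modifying it in place.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HighlightMerge {

    // Return a copy of source in which field holds the concatenated fragments,
    // or an unchanged copy when there are no fragments.
    static Map<String, Object> applyHighlight(Map<String, Object> source,
                                              String field,
                                              List<String> fragments) {
        Map<String, Object> merged = new HashMap<>(source); // leave the input untouched
        if (fragments != null && !fragments.isEmpty()) {
            merged.put(field, String.join("", fragments));
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Object> hit = new HashMap<>();
        hit.put("name", "java book");
        hit.put("price", "59.00");
        // <em> is used here only as sample markup
        List<String> fragments = Arrays.asList("<em>java</em> book");
        System.out.println(applyHighlight(hit, "name", fragments).get("name"));
    }
}
```

Fields without a highlight, such as price in this example, pass through unchanged, which matches what the loop in search does for img and price.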
    

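8. Write the controller

  • import java.io.IOException;
    import java.util.List;
    import java.util.Map;
    
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;
    
    @RestController
    public class ContentController {
    
        @Autowired
        private ContentService contentService;
    
        // Scrape JD for the keyword and index the results
        @GetMapping("/parse/{keyword}")
        public Boolean parse(@PathVariable("keyword") String keyword) throws IOException {
            return contentService.parseContent(keyword);
        }
    
        // Search the indexed products
        @GetMapping("/search/{keyword}/{pageIndex}/{pageSize}")
        public List<Map<String, Object>> search(@PathVariable("keyword") String keyword,
                                                @PathVariable("pageIndex") Integer pageIndex,
                                                @PathVariable("pageSize") Integer pageSize) throws IOException {
            return contentService.search(keyword, pageIndex, pageSize);
        }
    }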
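The search endpoint accepts pageIndex and pageSize, and the service hands pageIndex directly to SearchSourceBuilder.from(). Since from() expects the offset of the first hit rather than a page number, a request for /search/java/1/20 skips one document instead of one page. If the path variable is meant to be a page number, the conversion could look like the sketch below. The 1-based page convention, the clamping of invalid values, and the names Paging and offset are assumptions, not something the original code specifies.

```java
class Paging {

    // Convert a 1-based page number and a page size into the zero-based
    // offset that from() expects.
    static int offset(int pageNumber, int pageSize) {
        int page = Math.max(pageNumber, 1); // treat invalid pages as the first page
        int size = Math.max(pageSize, 1);   // a page holds at least one hit
        return (page - 1) * size;
    }

    public static void main(String[] args) {
        System.out.println(offset(1, 20)); // first page starts at offset 0
        System.out.println(offset(3, 20)); // third page starts at offset 40
    }
}
```

Under that convention the service would call searchSourceBuilder.from(Paging.offset(pageIndex, pageSize)) and keep size(pageSize) as it is.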
    

9. Test results

  • [Screenshots: test results]
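One thing to keep in mind when testing: ContentService.search uses termQuery, which compares the keyword unchanged against the terms stored in the index. Assuming jd_goods is created automatically by the bulk request, name is a text field processed by the default standard analyzer, which lowercases words and indexes each Chinese character as a separate term. The sketch below is only a rough stand-in for that analyzer (the class TermVsTokens and its methods are hypothetical), meant to show why "java" matches while "Java" or a multi-character Chinese keyword does not.

```java
import java.util.ArrayList;
import java.util.List;

class TermVsTokens {

    // Rough approximation of the standard analyzer: lowercase runs of
    // letters/digits, and emit each Han character as its own token.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            boolean han = Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN;
            if (Character.isLetterOrDigit(cp) && !han) {
                word.appendCodePoint(Character.toLowerCase(cp));
                continue;
            }
            if (word.length() > 0) { // a non-Han word just ended
                out.add(word.toString());
                word.setLength(0);
            }
            if (han) {
                out.add(new String(Character.toChars(cp)));
            }
        }
        if (word.length() > 0) {
            out.add(word.toString());
        }
        return out;
    }

    // A term query matches only if the keyword equals one indexed token exactly.
    static boolean termMatches(String indexedText, String keyword) {
        return tokens(indexedText).contains(keyword);
    }

    public static void main(String[] args) {
        System.out.println(tokens("Java in Action"));
        System.out.println(termMatches("Java in Action", "java"));
        System.out.println(termMatches("Java in Action", "Java"));
    }
}
```

If broader matching is wanted, one option is QueryBuilders.matchQuery("name", keyword), which analyzes the keyword before searching; a Chinese analyzer such as IK would additionally index whole words instead of single characters.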

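III. Separate front end and back end (a simple use of Vue)

1. Download and include vue.min.js and axios.min.js

  1. If Node.js is installed, follow the steps below; otherwise download the two files yourself.

  2. Create a folder, open a command line in it, and run the following commands:

  3. npm install vue
    npm install axios
    
2. Include the resources in the page

  1. <script th:src="@{/js/vue.min.js}"></script>
    <script th:src="@{/js/axios.min.js}"></script>
    
  2. The complete front-end code is as follows: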

    • <!DOCTYPE html>
      <html xmlns:th="http://www.thymeleaf.org">
      
      <head>
          <meta charset="utf-8"/>
          <title>狂神说Java - ES JD Clone Project</title>
          <link rel="stylesheet" th:href="@{/css/style.css}"/>
      </head>
      
      <body class="pg">
      <div class="page" id="app">
          <div id="mallPage" class=" mallist tmall- page-not-market ">
      
              <div id="header" class=" header-list-app">
                  <div class="headerLayout">
                      <div class="headerCon ">
                          <h1 id="mallLogo">
                              <img th:src="@{/images/jdlogo.png}" alt="">
                          </h1>
      
                          <div class="header-extra">
                              <div id="mallSearch" class="mall-search">
                                  <form name="searchTop" class="mallSearch-form clearfix">
                                      <fieldset>
                                          <legend>Tmall Search</legend>
                                          <div class="mallSearch-input clearfix">
                                              <div class="s-combobox" id="s-combobox-685">
                                                  <div class="s-combobox-input-wrap">
                                                      <input v-model="keyword" type="text" autocomplete="off" value="dd" id="mq"
                                                             class="s-combobox-input" aria-haspopup="true">
                                                  </div>
                                              </div>
                                              <button type="submit" @click.prevent="searchKey" id="searchbtn">Search</button>
                                          </div>
                                      </fieldset>
                                  </form>
                                  <ul class="relKeyTop">
                                      <li><a>狂神说 Java</a></li>
                                      <li><a>狂神说 Front End</a></li>
                                      <li><a>狂神说 Linux</a></li>
                                      <li><a>狂神说 Big Data</a></li>
                                      <li><a>狂神 on Personal Finance</a></li>
                                  </ul>
                              </div>
                          </div>
                      </div>
                  </div>
              </div>
      
              <div id="content">
                  <div class="main">
                      <form class="navAttrsForm">
                          <div class="attrs j_NavAttrs" style="display:block">
                              <div class="brandAttr j_nav_brand">
                                  <div class="j_Brand attr">
                                      <div class="attrKey">
                                          Brand
                                      </div>
                                      <div class="attrValues">
                                          <ul class="av-collapse row-2">
                                              <li><a href="#"> 狂神说 </a></li>
                                              <li><a href="#"> Java </a></li>
                                          </ul>
                                      </div>
                                  </div>
                              </div>
                          </div>
                      </form>
      
                      <div class="filter clearfix">
                          <a class="fSort fSort-cur">Overall<i class="f-ico-arrow-d"></i></a>
                          <a class="fSort">Popularity<i class="f-ico-arrow-d"></i></a>
                          <a class="fSort">New<i class="f-ico-arrow-d"></i></a>
                          <a class="fSort">Sales<i class="f-ico-arrow-d"></i></a>
                          <a class="fSort">Price<i class="f-ico-triangle-mt"></i><i class="f-ico-triangle-mb"></i></a>
                      </div>
      
                      <div class="view grid-nosku">
      
                          <div class="product" v-for="result in results">
                              <div class="product-iWrap">
                                  <div class="productImg-wrap">
                                      <a class="productImg">
                                          <img :src="result.img">
                                      </a>
                                  </div>
                                  <p class="productPrice">
                                      <em>{{result.price}}</em>
                                  </p>
                                  <p class="productTitle">
                                      <a v-html="result.name"> </a>
                                  </p>
                                  <div class="productShop">
                                      <span>Shop: 狂神说Java </span>
                                  </div>
                                  <p class="productStatus">
                                      <span>Monthly sales <em>999 orders</em></span>
                                      <span>Reviews <a>3</a></span>
                                  </p>
                              </div>
                          </div>
                      </div>
                  </div>
              </div>
          </div>
      </div>
      <script th:src="@{/js/axios.min.js}"></script>
      <script th:src="@{/js/vue.min.js}"></script>
      <script>
          new Vue({
              el: '#app',
              data: {
                  keyword: '',
                  results: []
              },
              methods: {
                  searchKey() {
                      let keyword = this.keyword;
                      console.log(keyword);
                      // GET search/{keyword}/{pageIndex}/{pageSize}
                      axios.get('search/' + keyword + "/1/20").then(response => {
                          console.log(response);
                          this.results = response.data;
                      });
                  }
              }
          })
      </script>
      
      </body>
      </html>
      

IV. Final test result

  • [Screenshot: final result]
