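Elasticsearch Study Notes

ES

What is ES?

ES (Elasticsearch) is an open-source, highly scalable, distributed full-text search engine and the core of the Elastic Stack.

I. Elasticsearch HTTP Operations

Elasticsearch is a document-oriented database: each piece of data is stored as a document. To make the concepts easier to grasp, we compare how Elasticsearch stores document data with how the relational database MySQL stores data.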

[Image 1: comparison of Elasticsearch and MySQL storage concepts]

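Inverted index: an index from terms to the IDs of the documents that contain them, i.e. look up the id by name (a forward index looks up the content by id).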
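
To make the idea of an inverted index concrete, here is a minimal sketch in plain Java. It is not how Elasticsearch or Lucene implement it internally; it only shows the mapping from each term to the set of document IDs containing it. The class name InvertedIndexSketch, the sample documents, and the whitespace tokenization are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only; real search engines use far more elaborate structures.
class InvertedIndexSketch {
    // term -> sorted set of IDs of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Split the text on whitespace, lowercase each token, and record the document ID under it.
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Return the IDs of all documents containing the term (empty set if none).
    Set<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1001, "Xiaomi phone");
        index.add(1002, "Huawei phone");
        System.out.println(index.lookup("phone"));  // IDs of both documents
        System.out.println(index.lookup("huawei")); // ID of the second document only
    }
}
```

Searching by term is then a single map lookup instead of a scan over every document, which is the reason full-text engines are built around this structure.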

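1. Index operations

1) Create an index

Compared with a relational database, creating an index is equivalent to creating a database.

In Postman, send a PUT request to the ES server: http://127.0.0.1:9200/shopping

2) List all indices

In Postman, send a GET request to the ES server: http://127.0.0.1:9200/_cat/indices?v

3) View a single index

In Postman, send a GET request to the ES server: http://127.0.0.1:9200/shopping

4) Delete an index

In Postman, send a DELETE request to the ES server: http://127.0.0.1:9200/shopping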
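
For readers who prefer code to Postman, the four index requests can be sketched with the JDK's built-in java.net.http API (Java 11 or later). This is only a sketch: the class name IndexRequestsSketch and its method names are hypothetical, and the requests are built and printed rather than sent, so no ES server is needed to run it.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;

// Builds (but does not send) the HTTP requests for the index operations.
class IndexRequestsSketch {
    // Address of the local ES server used throughout these notes
    static final String BASE = "http://127.0.0.1:9200";

    // PUT /{index} creates the index
    static HttpRequest createIndex(String name) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + name))
                .PUT(BodyPublishers.noBody())
                .build();
    }

    // GET /_cat/indices?v lists all indices
    static HttpRequest listIndices() {
        return HttpRequest.newBuilder(URI.create(BASE + "/_cat/indices?v")).GET().build();
    }

    // GET /{index} shows a single index
    static HttpRequest getIndex(String name) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + name)).GET().build();
    }

    // DELETE /{index} deletes the index
    static HttpRequest deleteIndex(String name) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + name)).DELETE().build();
    }

    public static void main(String[] args) {
        HttpRequest[] requests = {
            createIndex("shopping"), listIndices(), getIndex("shopping"), deleteIndex("shopping")
        };
        for (HttpRequest request : requests) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

To actually send one of these, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) while the ES server is running.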

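PUT is idempotent, but POST is not: when a document is created with POST and no ID is given, every request returns a different generated id.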

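2. Document operations

1) Create a document

Now that the index exists, we can create documents and add data. A document here is comparable to a row of table data in a relational database, and the data is submitted in JSON format.

In Postman, send a POST request to the ES server: http://127.0.0.1:9200/shopping/_doc

POST http://127.0.0.1:9200/shopping/_doc/1001 ---> the document gets the fixed ID given in the URL (1001) instead of a generated one

Request body:

{
    "title":"小米手机",
    "category":"小米",
    "images":"http://www.gulixueyuan.com/xm.jpg",
    "price":3999.00
}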

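2) Query a document

GET http://127.0.0.1:9200/shopping/_doc/1001

Query all documents: GET http://127.0.0.1:9200/shopping/_search

Note: if querying all documents returns a 400 error, check the request body; it must not contain leftover data from an earlier request.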

3) Overwrite (update) a document

PUT replaces the whole document: http://127.0.0.1:9200/shopping/_doc/1001

{
  "title":"华为手机",
  "category":"小米",
  "images":"http://www.gulixueyuan.com/xm.jpg",
  "price":3999.00
}

POST updates only the given fields: http://127.0.0.1:9200/shopping/_update/1001

{
  "doc" : {
    "title":"华为"
  }
}

DELETE removes a document: http://127.0.0.1:9200/shopping/_doc/1002

4) Conditional query

GET conditional query: http://127.0.0.1:9200/shopping/_search?q=category:小米

or GET http://127.0.0.1:9200/shopping/_search with the request body:

{
    "query":{
        "match":{
            "category":"小米"
        }
    }
}

5) Paginated query of all documents

Computing from: (current page - 1) * page size

{
    "query":{
        "match_all":{
           
        }
    },
    "from": 0,
    "size":2
}
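
The from formula above can be captured in a small helper. This is a sketch under the assumption that page numbers start at 1; the class and method names are hypothetical.

```java
// Converts a 1-based page number and a page size into the zero-based "from" offset.
class PageOffsetSketch {
    static int from(int currentPage, int pageSize) {
        if (currentPage < 1 || pageSize < 1) {
            throw new IllegalArgumentException("currentPage and pageSize must be >= 1");
        }
        // from = (current page - 1) * page size
        return (currentPage - 1) * pageSize;
    }

    public static void main(String[] args) {
        System.out.println(from(1, 2));  // 0: the first page starts at the first hit
        System.out.println(from(2, 2));  // 2
        System.out.println(from(3, 10)); // 20
    }
}
```

With "size": 2, page 1 therefore uses "from": 0 as in the request body above, and page 2 uses "from": 2.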

6) Query all documents, returning only selected fields (_source) / with sorting

{
    "query":{
        "match_all":{
           
        }
    },
    "from": 0,
    "size":2,
    "_source":["title"]
}

{
    "query":{
        "match_all":{
           
        }
    },
    "from": 0,
    "size":2,
    "sort":{
        "price":{
            "order":"desc"
        }
    }
}

7) Multi-condition query

must: all conditions must hold at the same time

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "category":"小米"
                    }
                },
                {
                    "match": {
                        "price":"3999"
                    }
                }
            ]
        }
    }
}

should: at least one condition must hold

{
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "category":"小米"
                    }
                },
                {
                    "match": {
                        "title":"华为"
                    }
                }
            ]
        }
    }
}

8) Range query

{
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "category":"小米"
                    }
                },
                {
                    "match": {
                        "title":"华为"
                    }
                }
            ],
            "filter" : {
                "range": {
                    "price":{
                        "gte":"3998"
                    }
                }
            }
        }
    }
}

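9) Highlight query

match: full-text search (the query text is analyzed into terms, and any matching term counts)

match_phrase: phrase match (the terms must appear together and in order)

{
    "query": {
        "match_phrase": {
            "category":"小米"
        }
    },
    "highlight": {
        "fields": {
            "category":{}
        }
    }
}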

10) Aggregation query

{
    "aggs": {   // aggregation
        "price_group": {  // any name you like
            "terms" : {  // group by value
                "field" : "price" // field to group on
            }
        }
    },
    // do not return the original hits
    "size" : 0
}

11) Mappings

PUT http://127.0.0.1:9200/user/

PUT http://127.0.0.1:9200/user/_mapping

 

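index: whether the field is indexed; defaults to true. true: the field is indexed and can be searched. false: the field is not indexed and cannot be searched.

store: whether the field value is stored separately from _source; defaults to false. You can store a field independently by setting "store": true. Retrieving an independently stored field is much faster than parsing it out of _source, but it also takes more space, so set it according to actual business needs.

analyzer: the analyzer (tokenizer) for the field; it will be covered in a separate blog post.

{
    "properties": {
        "name" : {
            "type": "text",
            "index": true
        },
        "sex" : {
            "type": "keyword",
            "index": true
        },
        "tel" : {
            "type": "keyword",
            "index": false
        }
    }
}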

POST http://127.0.0.1:9200/user/_doc/1001

{
     "name":"ansel",
     "sex": "male",
     "tel" : "1234"
}

GET http://127.0.0.1:9200/user/_search

{
    "query": {
        "match": {
            "name": "a"
        }
    }
}

II. Java API

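<dependencies>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>7.8.0</version>
    </dependency>

    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>7.8.0</version>
    </dependency>

    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-api</artifactId>
        <version>2.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.2</version>
    </dependency>

    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.9.9</version>
    </dependency>

    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
</dependencies>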

1. Create an index

package com.ansel.esdemo.test;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;

import java.io.IOException;

/**
 @title es-demo
 @author Ansel Zhong
 @Date 2023/3/15
 @Description
 */
public class client_index_create {
  public static void main(String[] args) throws IOException {
    // 1. Create the client
    RestHighLevelClient esClient =
            new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http")));

    // Create the index
    CreateIndexRequest request = new CreateIndexRequest("hero");
    // CreateIndexRequest request = new CreateIndexRequest("user");
    CreateIndexResponse createIndexResponse =
            esClient.indices().create(request, RequestOptions.DEFAULT);

    System.out.println("Response ===> " + createIndexResponse.isAcknowledged());
    // 2. Close the client
    esClient.close();
  }
}

2. Query (and delete) an index

package com.ansel.esdemo.test;

import org.apache.http.HttpHost;
import org.elasticsearch.action.admin.indices.delete.DeleteIndexRequest;
import org.elasticsearch.action.support.master.AcknowledgedResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.elasticsearch.client.indices.GetIndexResponse;

import java.io.IOException;

/**
 @title es-demo
 @author Ansel Zhong
 @Date 2023/3/15
 @Description
 */
public class client_index_search {
  public static void main(String[] args) throws IOException {
    // 1. Create the client
    RestHighLevelClient esClient =
            new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http")));

    // Query the index
    GetIndexRequest request = new GetIndexRequest("student");
    GetIndexResponse getIndexResponse = esClient.indices().get(request, RequestOptions.DEFAULT);

    // Delete an index
    DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("hero");
    AcknowledgedResponse delete =
            esClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
    System.out.println("Delete acknowledged: " + delete.isAcknowledged());

    System.out.println(getIndexResponse.getAliases());
    System.out.println(getIndexResponse.getMappings());
    System.out.println(getIndexResponse.getSettings());
    // 2. Close the client
    esClient.close();
  }
}

3. Insert a document

package com.ansel.esdemo.test;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

import java.io.IOException;

/**
 @title es-demo
 @author Ansel Zhong
 @Date 2023/3/15
 @Description
 */
public class client_doc_insert {
  public static void main(String[] args) throws IOException {
    // 1. Create the client
    RestHighLevelClient esClient =
            new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http")));

    // Insert a document
    IndexRequest request = new IndexRequest();
    request.index("hero").id("1001");
    // Hero is a plain Java bean with age, name, and sex properties (its source is not shown in these notes)
    Hero hero = new Hero();
    hero.setAge(10);
    hero.setName("Jack");
    hero.setSex("male");
    // Data sent to ES must be in JSON format
    ObjectMapper mapper = new ObjectMapper();
    String str = mapper.writeValueAsString(hero);
    request.source(str, XContentType.JSON);

    IndexResponse response = esClient.index(request, RequestOptions.DEFAULT);
    System.out.println(response.getResult());

    // 2. Close the client
    esClient.close();
  }
}

4. Update a document

package com.ansel.esdemo.test;

import org.apache.http.HttpHost;
import org.elasticsearch.action.update.UpdateRequest;
import org.elasticsearch.action.update.UpdateResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

import java.io.IOException;

/**
 @title es-demo
 @author Ansel Zhong
 @Date 2023/3/15
 @Description
 */
public class client_doc_update {
  public static void main(String[] args) throws IOException {
    // 1. Create the client
    RestHighLevelClient esClient =
            new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http")));

    // Partially update the document: only the sex field is changed
    UpdateRequest request = new UpdateRequest();
    request.index("hero").id("1001");
    request.doc(XContentType.JSON, "sex", "Female");

    UpdateResponse response = esClient.update(request, RequestOptions.DEFAULT);
    System.out.println("result = " + response.getResult());

    // 2. Close the client
    esClient.close();
  }
}

5. Get a document

package com.ansel.esdemo.test;

import org.apache.http.HttpHost;
import org.elasticsearch.action.get.GetRequest;
import org.elasticsearch.action.get.GetResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

import java.io.IOException;

/**
 @title es-demo
 @author Ansel Zhong
 @Date 2023/3/16
 @Description
 */
public class Client_doc_Get {
  public static void main(String[] args) throws IOException {
    // 1. Create the client
    RestHighLevelClient esClient = new RestHighLevelClient
            (RestClient.builder(new HttpHost("localhost", 9200, "http")));

    // Get the document by ID
    GetRequest request = new GetRequest();
    request.index("hero").id("1001");
    GetResponse response = esClient.get(request, RequestOptions.DEFAULT);
    System.out.println(response.getSourceAsString());

    // Close the client
    esClient.close();
  }
}

6. Delete a document

package com.ansel.esdemo.test;

import org.apache.http.HttpHost;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

import java.io.IOException;

/**
 @title es-demo
 @author Ansel Zhong
 @Date 2023/3/16
 @Description
 */
public class Client_doc_Delete {
  public static void main(String[] args) throws IOException {
    // 1. Create the client
    RestHighLevelClient esClient = new RestHighLevelClient
            (RestClient.builder(new HttpHost("localhost", 9200, "http")));

    // Delete the document by ID
    DeleteRequest request = new DeleteRequest();
    request.index("hero").id("1002");
    DeleteResponse response = esClient.delete(request, RequestOptions.DEFAULT);
    System.out.println(response.toString());

    // Close the client
    esClient.close();
  }
}

7. Bulk insert

package com.ansel.esdemo.test;

import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

import java.io.IOException;

/**
 @title es-demo
 @author Ansel Zhong
 @Date 2023/3/16
 @Description
 */
public class Client_doc_bulk {
  public static void main(String[] args) throws IOException {
    // 1. Create the client
    RestHighLevelClient esClient = new RestHighLevelClient
            (RestClient.builder(new HttpHost("localhost", 9200, "http")));

    // Collect several index requests (IDs 1003 to 1006) into one bulk request
    BulkRequest request = new BulkRequest();
    for (int i = 3; i < 7; i++) {
      IndexRequest indexReq = new IndexRequest();
      indexReq.index("hero").id("100" + i).source(XContentType.JSON, "name", "ID 100" + i);
      request.add(indexReq);
    }

    BulkResponse response = esClient.bulk(request, RequestOptions.DEFAULT);
    // Time taken
    System.out.println(response.getTook());
    System.out.println(response.getItems());
    esClient.close();
  }
}

8. Bulk delete

package com.ansel.esdemo.test;

import org.apache.http.HttpHost;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;

import java.io.IOException;

/**
 @title es-demo
 @author Ansel Zhong
 @Date 2023/3/16
 @Description
 */
public class Client_doc_bulk_delete {
  public static void main(String[] args) throws IOException {
    // 1. Create the client
    RestHighLevelClient esClient = new RestHighLevelClient
            (RestClient.builder(new HttpHost("localhost", 9200, "http")));

    // Collect several delete requests (IDs 1003 to 1006) into one bulk request
    BulkRequest request = new BulkRequest();
    for (int i = 3; i < 7; i++) {
      DeleteRequest deleteRequest = new DeleteRequest();
      deleteRequest.index("hero").id("100" + i);
      request.add(deleteRequest);
    }

    BulkResponse response = esClient.bulk(request, RequestOptions.DEFAULT);
    // Time taken
    System.out.println(response.getTook());
    System.out.println(response.getItems());

    // Close the client
    esClient.close();
  }
}

9. Query all documents

package com.ansel.esdemo.test;

import org.apache.http.HttpHost;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;

import java.io.IOException;

/**
 @title es-demo
 @author Ansel Zhong
 @Date 2023/3/16
 @Description
 */
public class Client_doc_query {
  public static void main(String[] args) throws IOException {
    // 1. Create the client
    RestHighLevelClient esClient =
            new RestHighLevelClient(RestClient.builder(new HttpHost("localhost", 9200, "http")));

    // Query all documents in the given indices
    SearchRequest request = new SearchRequest();
    request.indices("hero", "student");

    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchAllQuery());
    request.source(searchSourceBuilder);

    SearchResponse response = esClient.search(request, RequestOptions.DEFAULT);
    SearchHits hits = response.getHits();
    System.out.println(response.getTook());
    for (SearchHit hit : hits) {
      System.out.println(hit.getSourceAsString());
    }

    esClient.close();
  }
}

10. Conditional query

SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Exact (term-level) match on the age field
searchSourceBuilder.query(QueryBuilders.termQuery("age", 6));
request.source(searchSourceBuilder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);

SearchHits hits = resp.getHits();
for (SearchHit hit : hits) {
  System.out.println(hit.getSourceAsString());
}

11. Paginated query

SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.query(QueryBuilders.matchAllQuery());
// from = (current page - 1) * page size
searchSourceBuilder.from(0);
searchSourceBuilder.size(2);
request.source(searchSourceBuilder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);

SearchHits hits = resp.getHits();
for (SearchHit hit : hits) {
  System.out.println(hit.getSourceAsString());
}

Sorting: searchSourceBuilder.sort("age", SortOrder.DESC); (SortOrder is in the org.elasticsearch.search.sort package)

12. Filter fields

// Filter the fields returned in _source
SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder builder = new SearchSourceBuilder();
builder.query(QueryBuilders.matchAllQuery());

// Fields to include and fields to exclude
String[] excludes = {};
String[] includes = {"name"};
builder.fetchSource(includes, excludes);
request.source(builder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit : hits) {
  System.out.println(hit.getSourceAsString());
}

13. must & should multi-condition query

SearchRequest request = new SearchRequest();
request.indices("hero");

SearchSourceBuilder builder = new SearchSourceBuilder();
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
// must: every clause has to match
boolQueryBuilder.must(QueryBuilders.matchQuery("age", 6));
boolQueryBuilder.must(QueryBuilders.matchQuery("sex", "male"));
// mustNot: documents matching this clause are excluded
boolQueryBuilder.mustNot(QueryBuilders.matchQuery("sex", "female"));
// Use boolQueryBuilder.should(...) instead for "at least one of" conditions
builder.query(boolQueryBuilder);
request.source(builder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit : hits) {
  System.out.println(hit.getSourceAsString());
}

14. Range query

// Range query: 4 <= age <= 7
SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder builder = new SearchSourceBuilder();
RangeQueryBuilder rangeQueryBuilder = QueryBuilders.rangeQuery("age");
rangeQueryBuilder.gte(4);
rangeQueryBuilder.lte(7);
builder.query(rangeQueryBuilder);
request.source(builder);

SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit : hits) {
  String result = hit.getSourceAsString();
  System.out.println(result);
}

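15. Fuzzy query

wildcardQuery

SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// The query has to be set on the source builder, otherwise it is never sent.
// A wildcard pattern needs * (any characters) or ? (one character); a bare "A" only matches the exact term "A".
// On an analyzed text field the indexed terms are usually lowercase, so a lowercase pattern may be needed.
searchSourceBuilder.query(QueryBuilders.wildcardQuery("name", "A*"));
request.source(searchSourceBuilder);
SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit : hits) {
  System.out.println(hit.getSourceAsString());
}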
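
In a wildcard pattern, * stands for any sequence of characters (including none) and ? stands for exactly one character. The sketch below emulates that matching rule with java.util.regex so it can be tried without an ES server. It only illustrates the rule and is not how Elasticsearch evaluates wildcard queries; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Emulates wildcard matching: '*' = any run of characters, '?' = exactly one character.
class WildcardSketch {
    static boolean matches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                // Quote every other character so it is matched literally
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        // matches() requires the whole term to match, like a term-level query
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("a*", "ansel"));    // true
        System.out.println(matches("a", "ansel"));     // false: no wildcard, so only "a" itself matches
        System.out.println(matches("an?el", "ansel")); // true
    }
}
```

This is why a pattern without * or ? behaves like an exact term match rather than a "contains" search.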

fuzzyQuery

SearchRequest request = new SearchRequest();
request.indices("hero");

SearchSourceBuilder builder = new SearchSourceBuilder();
// Match terms within a small edit distance of "Ansel"; AUTO picks the distance from the term length
FuzzyQueryBuilder fuzzyBuilder = QueryBuilders.fuzzyQuery("name", "Ansel").fuzziness(Fuzziness.AUTO);
builder.query(fuzzyBuilder);

request.source(builder);

SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit : hits) {
  String result = hit.getSourceAsString();
  System.out.println("=============================================================");
  System.out.println(result);
  System.out.println("=============================================================");
}
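
A fuzzy query matches terms that lie within a small edit distance of the search term. As a rough illustration of what "edit distance" means, here is the classic dynamic-programming Levenshtein distance in plain Java. It is a sketch with a hypothetical class name, not Elasticsearch's implementation; in particular, the fuzzy query by default also counts swapping two adjacent characters as a single edit, which this sketch does not.

```java
// Classic Levenshtein distance: the minimum number of single-character
// insertions, deletions, and substitutions needed to turn a into b.
class EditDistanceSketch {
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Distance from the empty prefix of a to each prefix of b
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            // Reuse the two rows instead of allocating a full matrix
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("ansel", "ansel")); // 0
        System.out.println(levenshtein("ansel", "ansal")); // 1: one substitution
        System.out.println(levenshtein("ansel", "anse"));  // 1: one deletion
    }
}
```

With Fuzziness.AUTO, a five-letter term such as "Ansel" is allowed one edit, so a term at distance 1 can match while a term at distance 2 cannot.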

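16. Highlight query

SearchRequest request = new SearchRequest();
request.indices("hero");
SearchSourceBuilder builder = new SearchSourceBuilder();
TermsQueryBuilder termsQueryBuilder = QueryBuilders.termsQuery("sex", "male");
builder.query(termsQueryBuilder);
HighlightBuilder highlightBuilder = new HighlightBuilder();
// The tag strings are empty in these notes; set them to the markup
// that should be placed before and after each highlighted term.
highlightBuilder.preTags("");
highlightBuilder.postTags("");
// A field to highlight has to be named, otherwise no highlights are returned
highlightBuilder.field("sex");
builder.highlighter(highlightBuilder);
request.source(builder);

SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
SearchHits hits = resp.getHits();
for (SearchHit hit : hits) {
  System.out.println(hit.getSourceAsString());
  // Highlighted fragments are returned separately from _source
  System.out.println(hit.getHighlightFields());
}
esClient.close();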

III. Spring Boot Integration

Dependencies

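<properties>
    <java.version>8</java.version>
    <!-- remember to change this to match your ES version -->
    <elasticsearch.version>7.3.0</elasticsearch.version>
</properties>

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>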

Config

package com.ansel.esdemo.config;

import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 @title es-demo
 @author Ansel Zhong
 @Date 2023/3/16
 @Description
 */
@Configuration
public class EsClientConfig {

  // Expose the ES client as a Spring bean so it can be injected with @Autowired
  @Bean
  public RestHighLevelClient restHighLevelClient() {
    RestHighLevelClient esClient = new RestHighLevelClient
            (RestClient.builder(new HttpHost("localhost", 9200, "http")));
    return esClient;
  }
}

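IV. Web Crawler

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.2</version>
</dependency>
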
package com.ansel.utils;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;
import java.net.URL;

/**
 @title search-practicec
 @author Ansel Zhong
 @Date 2023/3/16
 @Description
 */
public class HtmlParseUtils {
  public static void main(String[] args) throws IOException {
    // URL to request
    String url = "https://search.jd.com/Search?keyword=java&enc=utf-8&wq=java&pvid=ef80dbba286c45908d43cd70bed881c3";
    // Fetch and parse the page (30-second timeout)
    Document document = Jsoup.parse(new URL(url), 30000);
    Element element = document.getElementById("J_goodsList");
    //System.out.println(element.html());
    Elements elements = element.getElementsByTag("li");
    // Extract the content of each list item
    for (Element el : elements) {
      // Pages with many images usually lazy-load them,
      // so read the data-lazy-img attribute instead of src
      String img = el.getElementsByTag("img").eq(0)
              .attr("data-lazy-img");
      String price = el.getElementsByClass("p-price").eq(0)
              .text();
      String name = el.getElementsByClass("p-name")
              .eq(0)
              .text();
      System.out.println("=================================");
      System.out.println(img);
      System.out.println(price);
      System.out.println(name);
    }
  }
}

// The same parsing logic as a reusable method of HtmlParseUtils (needs java.util.List and java.util.ArrayList).
// Content is a plain Java bean with name, src, and price properties (its source is not shown in these notes).
public static List<Content> parseJD(String keywords) throws IOException {
  List<Content> contents = new ArrayList<>();

  // URL to request: the keyword parameter is the search term itself
  String url = "https://search.jd.com/Search?keyword=" + keywords;
  // Fetch and parse the page (30-second timeout)
  Document document = Jsoup.parse(new URL(url), 30000);
  Element element = document.getElementById("J_goodsList");
  //System.out.println(element.html());
  Elements elements = element.getElementsByTag("li");
  // Extract the content of each list item
  for (Element el : elements) {
    // Pages with many images usually lazy-load them,
    // so read the data-lazy-img attribute instead of src
    String img = el.getElementsByTag("img").eq(0)
            .attr("data-lazy-img");
    String price = el.getElementsByClass("p-price").eq(0)
            .text();
    String name = el.getElementsByClass("p-name")
            .eq(0)
            .text();
    System.out.println("=================================");
    System.out.println(img);
    System.out.println(price);
    System.out.println(name);
    Content content = new Content();
    content.setName(name);
    content.setSrc(img);
    content.setPrice(price);
    contents.add(content);
  }
  return contents;
}
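
Because parseJD puts the keywords straight into the query string, search terms containing spaces, non-ASCII characters, or symbols such as + should be URL-encoded first. A minimal sketch of that step using only the JDK (Java 10 or later) follows; the class SearchUrlSketch and its build method are hypothetical helpers, not part of the original code.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds the search URL with the keyword safely encoded for use in a query string.
class SearchUrlSketch {
    static String build(String keywords) {
        return "https://search.jd.com/Search?keyword="
                + URLEncoder.encode(keywords, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("java"));      // plain ASCII letters are left unchanged
        System.out.println(build("java book")); // the space is encoded as +
        System.out.println(build("c++"));       // each + is encoded as %2B
    }
}
```

The returned string can then be passed to new URL(...) in place of the plain concatenation.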

Service

@Autowired
private RestHighLevelClient esClient;

@Override
public Boolean parseContent(String keywords) throws IOException {
  // Crawl the search results, then bulk-index them into the jd_goods index
  List<Content> contents = HtmlParseUtils.parseJD(keywords);
  BulkRequest request = new BulkRequest();
  request.timeout(TimeValue.timeValueSeconds(1));

  for (Content content : contents) {
    IndexRequest indexRequest = new IndexRequest();
    indexRequest.index("jd_goods");
    indexRequest.source(JSON.toJSONString(content), XContentType.JSON);
    request.add(indexRequest);
  }
  BulkResponse resp = esClient.bulk(request, RequestOptions.DEFAULT);
  // hasFailures() is true when at least one item failed, so negate it to report success
  return !resp.hasFailures();
}

@Override
public List<Map<String, Object>> Search(String keywords, int pageNo, int pageSize) throws IOException {
  List<Map<String, Object>> list = new ArrayList<>();
  // Wildcard query on the name field of the index filled by parseContent
  SearchRequest request = new SearchRequest("jd_goods");
  SearchSourceBuilder builder = new SearchSourceBuilder();
  builder.query(QueryBuilders.wildcardQuery("name", keywords));
  // Paging: from is a document offset, not a page number (assumes pageNo starts at 1)
  builder.from((pageNo - 1) * pageSize);
  builder.size(pageSize);
  // Run the search
  request.source(builder);
  SearchResponse resp = esClient.search(request, RequestOptions.DEFAULT);
  SearchHits hits = resp.getHits();
  for (SearchHit hit : hits) {
    list.add(hit.getSourceAsMap());
  }

  return list;
}

ERROR & NOTE

1. elasticsearch-head cannot connect --> cross-origin (CORS) issue

Start head: npm run start

Add the following to the yml file in the Elasticsearch config/ directory (elasticsearch.yml):

http.cors.enabled: true
http.cors.allow-origin: "*"

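2. Tomcat startup error

The cause may be the Jackson dependencies; replace them with:

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <exclusions>
        <exclusion>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-annotations</artifactId>
    <version>2.9.8</version>
</dependency>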

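3. JSON conversion

You can also use fastjson:

<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.47</version>
</dependency>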


