Java-网络爬虫(一)

文章目录

  • 前言
  • 一、网络爬虫
    • 1. 介绍
    • 2. 爬虫协议
    • 3. 法律法规
  • 二、相关知识
    • 1. HttpClient
    • 2. Jsoup
  • 三、综合案例
    • 1. 案例一
    • 2. 案例二
  • 四、总结


前言

在大数据时代,信息采集是一项重要的工作,而互联网中的数据是海量的,如果单纯靠人力进行信息获取,不仅低效繁琐,而且搜集的成本也会提高,如何自动高效地获取互联网中的数据是一个重要的问题,而爬虫技术就是针对这些问题而生的。

一、网络爬虫

1. 介绍

网络爬虫(Web crawler)又称为网络蜘蛛或网络机器人,是一种自动化程序,用于在互联网上浏览和抓取信息,是互联网时代一项普遍运用的网络信息搜集技术。

Java-网络爬虫(一)_第1张图片

该项技术最早应用于搜索引擎领域,是搜索引擎获取数据来源的支撑性技术之一。随着数据资源的爆炸式增长,网络爬虫的应用场景和商业模式变得更加广泛和多样,较为常见的有新闻平台的内容汇聚和生成、电子商务平台的价格对比功能、基于气象数据的天气预报应用等等。

一个出色的网络爬虫工具能够处理大量的数据,大大节省了人类在该类工作上所花费的时间。网络爬虫作为数据抓取的实践工具,构成了互联网开放和信息资源共享理念的基石,如同互联网世界的一群工蜂,不断地推动网络空间的建设和发展。

原理:

传统爬虫从一个或者若干个初始网页的 URL 开始,通过模拟浏览器行为,自动访问并解析网页。它们可以跟踪链接,从一个网页到另一个网页,逐层遍历整个互联网。通过取网页的HTML源代码,并从中提取有用的信息,如文本、图像、链接等。

Java-网络爬虫(一)_第2张图片

功能与价值:

网络爬虫技术是互联网开放共享精神的重要实现工具。允许收集者通过爬虫技术收集数据是数据开放共享的重要措施,网络爬虫能够通过聚合信息、提供链接,为数据所有者的网站带来更多的访问量,这些善意、适量的数据抓取行为,符合数据所有者开放共享数据的预期。

从功能上来讲,爬虫一般分为数据采集、处理、存储三个部分。

爬虫的应用:

  • 实现和优化搜索引擎
  • 获取更多的数据源

2. 爬虫协议

爬虫的功能十分强大,但是我们并不能为所欲为的使用爬虫,爬虫需要遵循 robots 协议,该协议是国际互联网界通行的道德规范,每一个爬虫都应该遵守。

Robots 协议(也称为爬虫协议、机器人协议等)的全称是 “网络爬虫排除标准”(Robots Exclusion Protocol),网站通过 Robots 协议告诉搜索引擎哪些页面可以抓取,哪些页面不能抓取,该协议属于一个规范,并不能保证网站的隐私。

Robots 协议是国际互联网界通行的道德规范,基于以下原则:

  1. 搜索技术应服务于人类,同时尊重信息提供者的意愿,并维护其隐私权。

  2. 网站有义务保证其使用者的个人信息和隐私不被侵犯。

在使用爬虫的时候我们应当注意一下几点:

  1. 拒绝访问和抓取有关不良信息的网站。

  2. 注意版权意识,对于原创内容,未经允许不要将信息用于其他用途,特别是商业方面。

  3. 严格遵循 robots.txt 协议。

  4. 爬虫协议查看方式

大部分网站都会提供自己的 robots.txt 文件,这个文件会告诉我们该网站的爬取准则,查看方式是在域名加 /robots.txt 并回车。

例如百度的爬虫协议:https://www.baidu.com/robots.txt

Java-网络爬虫(一)_第3张图片

  • User-agent:为访问用户
  • Allow:允许爬行的目录
  • Disallow:不允许爬行的目录
  • Sitemap:网站地图,告诉爬虫这个页面是网站地图

从上述协议可以看到百度对于普通使用者为:

User-agent: *
Disallow: /

则表示禁止所有搜索引擎访问网站的任何部分。

而对于 Baiduspider 这类用户

User-agent: Baiduspider
Disallow: /baidu
Disallow: /s?
Disallow: /ulink?
Disallow: /link?
Disallow: /home/news/data/
Disallow: /bh

则不能爬取 /baidu、/s?、/ulink?... 下面目录的数据。


3. 法律法规

网络爬虫规制的必要性:

  • (一)恶意抓取侵害他人权益和经营自由通过网络爬虫访问和收集网站数据行为本身已经产生了相当规模的网络流量,但是,有分析表明其中三分之二的数据抓取行为是恶意的,并且这一比例还在不断上升:恶意机器人可以掠夺资源、削弱竞争对手。恶意机器人往往被滥用于从一个站点抓取内容,然后将该内容发布至另一个站点,而不显示数据源或链接,这一不当手段将帮助非法组织建立虚假网站,产生欺诈风险,以及对知识产权、商业秘密的窃取行为。
  • (二)恶意爬虫危及网络安全从行为本身来讲,恶意爬虫会对目标网站产生 DDOS 攻击的效果,当有成百上千的爬虫机器人与同一网站进行交互,网站将会失去对真实目标的判断,其很难确定哪些流量来自真实用户,哪些流量来自机器人。若平台使用了掺杂虚假访问行为的缺陷数据,做出相关的营销决策,可能会导致大量时间和金钱的损失。尽管 robots 协议作为国际通行的行业规范,能够帮助网站在 robot.txt文件中明确列出限制抓取的信息范围,但并不能从根本上阻止机器人的恶意爬虫行为,其协议本身无法为网站提供任何技术层面的保护。目前恶意的网络爬虫行为已经给互联网平台带来了一定的商业和技术风险,影响了其正常的平台运营和业务开展。
  • (三)现行法律规制方式及其不足之处网络爬虫的不当访问、收集、干扰行为应当受到法律规制。目前,我国已有法律对网络爬虫进行规制主要集中在刑法有关计算机信息系统犯罪的相关条文上。从刑法所追求的法益来看,刑法规范的是对目标网站造成严重影响并具有社会危害性的数据抓取行为。若行为人违反刑法的相关规定,通过网络爬虫访问收集一般网站所存储、处理或传输的数据,可能构成刑法中的非法获取计算机信息系统数据罪;如果在数据抓取过程中实施了非法控制行为,可能构成非法控制计算机信息系统罪。此外,由于使用网络爬虫造成对目标网站的功能干扰,导致其访问流量增大、系统响应变缓,影响正常运营的,也可能构成破坏计算机信息系统罪。

由于刑法的谦抑性,其只能在网络爬虫行为产生严重社会危害而无刑罚以外手段进行规制的情形下起到惩治效果,而对于网络爬虫妨碍其他网站正常运行、过量访问收集数据等一般性危害行为很难起到规制作用,因此我国需要建立在刑法以外的行政规制手段,构建完善的刑事责任、行政责任乃至民事责任体系,以保护互联网平台的合法权益,维护网络空间的正常秩序。

完善网络爬虫规制方式的建议:

从网络爬虫的相关案例来看,其使用者往往有充分的理由做出可能涉嫌违法的数据抓取行为,其辩护理由通常包括:“我可以用公开访问的数据做任何事”“这是合理使用行为”“这与搜索引擎行为类似”“只是使用了自动脚本,而未使用在建立网站上”“我已经遵守了它们的 robots 协议”“该网站没有 robots 协议”“这些数据我只是个人研究使用,并没有商业目的”。由此可见,依托行为是否具有恶意或者通过主观层面来判断爬虫行为违法与否是具有难度的。网络爬虫规制的目标是在数据资源开放共享与互联网平台经营自由、网站安全之间取得平衡,遵循技术中立性原则,对网络爬虫进行规制应当基于客观结果,即是否妨碍网站的正常运行或者对他人合法权益造成严重危害。

数字时代,在数据利用成为网络产业中心的背景下,亟待确立数据访问、获取的规则。在技术手段、市场手段之外,需要采用法律手段规制爬虫技术的应用,对特定的数据访问场景进行规范。通过数据安全立法设置爬虫技术严重影响网站正常运行的判断标准,对具有危害性的网络爬虫行为进行适当规制,是我国安全与发展并重互联网治理根本准则在数据治理领域的体现,其目标是在数据活动各方主体中找到平衡点,兼顾数据开放共享与数据所有者经营自由和安全、社会公共利益,确保数据依法有序自由流动。

谨慎使用的技术:

  1. 爬虫访问频次要控制,别把对方服务器搞崩溃
  2. 涉及个人隐私的信息不能爬
  3. 突破网站的反爬措施,后果很严重
  4. 不要把爬取的数据做不正当竞争
  5. 付费内容,不要抓
  6. 突破网络反爬措施的代码,最好不要上传到网络上

二、相关知识

1. HttpClient

因为爬虫技术是模仿游览器行为,那么必然是需要发送 HTTP 请求,在 Javaapache 有提供支持 HTTP 协议的客户端编程工具包 HttpClient,可以使用 HttpClient 来发送请求,

例如:使用 HttpClient 请求 https://www.rgbku.com/chaxun.html(rgb颜色查询器)

Java-网络爬虫(一)_第4张图片

那么代码可以这样写:

import org.apache.http.Consts;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

import java.io.IOException;
import java.util.Objects;

public class HttpClientDemo {
    public static void main(String[] args) {
        // 创建 httpClient 对象
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // 创建 httpGet 对象,设置访问 URL
        HttpGet httpGet = new HttpGet("https://www.rgbku.com/chaxun.html");
        CloseableHttpResponse response = null;
        try {
            // 发送请求
            response = httpClient.execute(httpGet);
            // 根据状态码判断是否响应成功(一般是 200)
            if (response.getStatusLine().getStatusCode() == 200) {
                // 解析响应
                HttpEntity entity = response.getEntity();
                String html = EntityUtils.toString(entity, Consts.UTF_8);
                // 打印响应内容
                System.out.println(html);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // 关闭资源
            try {
                httpClient.close();
                if (Objects.nonNull(response)) {
                    response.close();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

可以从打印信息中就能看出已获取到该网站的 HTML 信息了

Java-网络爬虫(一)_第5张图片

HttpClient 不仅可以发送 GET 请求,还能够发起 POSTPUTDELETE 等等各种请求,同时还能携带参数、tokencookie 和设置 User-Agent 等功能,可以做到很好的模拟用户在游览器上面访问网站。

GET 请求:

    /**
     * 发送不带参数的 GET 请求
     */
    public static void sendGet() throws Exception{
        // 创建 httpClient 对象
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // 创建 httpGet 对象,设置访问 URL
        HttpGet httpGet = new HttpGet("https://www.xxxx.com");
        CloseableHttpResponse response = httpClient.execute(httpGet);
        // 对响应信息进行处理 ...

        // 关闭资源
        response.close();
        httpClient.close();
    }

    /**
     * 发送带参数的 GET 请求
     */
    public static void sendGetHasParam() throws Exception{
        // 创建 httpClient 对象
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // 创建 URIBuilder
        URIBuilder uriBuilder = new URIBuilder("https://www.xxxx.com");
        // 设置参数
        uriBuilder
                .setParameter("param1", "value1")
                .setParameter("param2", "value2");
        // 创建 httpGet 对象,设置 URI
        HttpGet httpGet = new HttpGet(uriBuilder.build());
        CloseableHttpResponse response = httpClient.execute(httpGet);
        // 对响应信息进行处理 ...

        // 关闭资源
        response.close();
        httpClient.close();
    }

POST 请求:

    /**
     * 发送不带参数的 POST 请求
     */
    public static void sendPost() throws Exception{
        // 创建 httpClient 对象
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // 创建 httpPost 对象,设置访问 URL
        HttpPost httpPost = new HttpPost("https://www.xxxx.com");
        CloseableHttpResponse response = httpClient.execute(httpPost);
        // 对响应信息进行处理 ...

        // 关闭资源
        response.close();
        httpClient.close();
    }

    /**
     * 发送带参数的 POST 请求
     */
    public static void sendPostHasParam() throws Exception{
        // 创建 httpClient 对象
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // 创建 httpPost 对象,设置访问 URL
        HttpPost httpPost = new HttpPost("https://www.xxxx.com");
        // 封装表单中的参数
        List<NameValuePair> params = new ArrayList<>();
        params.add(new BasicNameValuePair("param1", "value1"));
        params.add(new BasicNameValuePair("param2", "value2"));
        /*
         * 创建表单的 entity 对象
         *      parameters:表单数据
         *      charset:编码
         */
        UrlEncodedFormEntity entity = new UrlEncodedFormEntity(params, Consts.UTF_8);
        // 设置表单的 entity 对象到 Post 请求中
        httpPost.setEntity(entity);
        CloseableHttpResponse response = httpClient.execute(httpPost);
        // 对响应信息进行处理 ...

        // 关闭资源
        response.close();
        httpClient.close();
    }

连接池:

每次发送请求时都需要创建 HttpClient,会有频繁创建和销毁的问题,对性能会有一定的影响,可以使用连接池来解决这个问题

    /**
     * 连接池
     */
    public static void poolManager() throws Exception {
        // 创建连接池管理器
        PoolingHttpClientConnectionManager connectionManager = new PoolingHttpClientConnectionManager();
        // 设置最大连接数
        connectionManager.setMaxTotal(100);
        // 设置每个主机的最大连接数:因为在爬取数据的时候可通会访问多个主机,如果不设置可能会导致连接不均衡
        connectionManager.setDefaultMaxPerRoute(10);
        // 使用连接池管理器发起请求
        doGet(connectionManager);
        doGet(connectionManager);
    }

    /**
     * 通过连接池发送 http 请求
     * @param connectionManager 连接池
     */
    private static void doGet(PoolingHttpClientConnectionManager connectionManager) throws Exception {
        // 从连接池中获取 HttpClient 对象
        CloseableHttpClient httpClient = HttpClients.custom().setConnectionManager(connectionManager).build();
        // 创建 httpGet 对象,设置访问 URL
        HttpGet httpGet = new HttpGet("https://www.xxxx.com");
        CloseableHttpResponse response = httpClient.execute(httpGet);
        // 对响应信息进行处理 ...

        // 关闭资源
        response.close();
        // 这里要注意的是 httpClient 不需要再关闭了,因为是连接池管理的
        // httpClient.close();
    }

设置请求信息:

有时候因为网络或者目标服务器的原因,请求需要更长的时间才能完成,或者需要改变 User-Agent 的设置才能正常发起请求时,这个时候就需要自定义设置这些参数

    /**
     * 配置请求信息
     */
    private static void setRequestInfo() throws Exception {
        // 创建 httpClient 对象
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // 创建 httpGet 对象,设置访问 URL
        HttpGet httpGet = new HttpGet("https://www.xxxx.com");
        
        // 配置请求信息
        RequestConfig requestConfig = RequestConfig.custom()
                // 创建连接的最长时间,单位是毫秒
                .setConnectTimeout(1000)
                // 设置获取连接的最长时间,单位是毫秒
                .setConnectionRequestTimeout(500)
                // 设置数据传输的最长时间
                .setSocketTimeout(10 * 1000)
                // 还可以设置其它的设置 ...
                .build();
        
        // 设置请求信息
        httpGet.setConfig(requestConfig);

        CloseableHttpResponse response = httpClient.execute(httpGet);
        // 对响应信息进行处理 ...

        // 关闭资源
        response.close();
        httpClient.close();
    }

虽然说 HttpClient 已经具备了爬数据的功能,但是使用 HttpClient 得到的响应信息比较难解析其中的内容,要对 html 进行进行大量的字符串处理,编写正则表达式去匹配想要获取的信息,所以通常情况下我们并不会使用这种方式进行数据分析。


2. Jsoup

Jsoup 是一款 JavaHTML 解析器,可直接解析某个 URL 地址、HTML 文本内容,它提供了一套非常省力的 API,可通过 DOMCSS 以及类似于 jQuery 的操作方法来取出操作数据。

主要功能如下:

  1. 从一个 URL、文件或字符串中解析 HTML
  2. 使用 DOMCSS 选择器来查找,取出数据
  3. 可操作 HTML 元素、属性、文本

引入依赖:


<dependency>
    <groupId>org.jsoupgroupId>
    <artifactId>jsoupartifactId>
    <version>1.15.3version>
dependency>

示例:还是以 https://www.rgbku.com/chaxun.html(rgb颜色查询器) 这个网址为例,获取该网址的 title 内容

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

import java.net.URL;

public class JsoupDemo {
    public static void main(String[] args) throws Exception {
        // 解析 URL
        Document document = Jsoup.parse(new URL("https://www.rgbku.com/chaxun.html"), 1000);
        // 比如我想要获取 html 文件中  部分的内容</span>
        <span class="token class-name">String</span> title <span class="token operator">=</span> document
                <span class="token comment">// 获取所有的 title 标签</span>
                <span class="token punctuation">.</span><span class="token function">getElementsByTag</span><span class="token punctuation">(</span><span class="token string">"title"</span><span class="token punctuation">)</span>
                <span class="token comment">// 拿到第一个</span>
                <span class="token punctuation">.</span><span class="token function">first</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
                <span class="token comment">// 获取标签中的文本内容</span>
                <span class="token punctuation">.</span><span class="token function">text</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 打印</span>
        <span class="token class-name">System</span><span class="token punctuation">.</span>out<span class="token punctuation">.</span><span class="token function">println</span><span class="token punctuation">(</span><span class="token string">"title = "</span> <span class="token operator">+</span> title<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>
</code></pre> 
  <p>日志信息:</p> 
  <p><a href="http://img.e-com-net.com/image/info8/17b0e4b16709449d9cf95595ebeaa400.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/17b0e4b16709449d9cf95595ebeaa400.jpg" alt="Java-网络爬虫(一)_第6张图片" width="650" height="122" style="border:1px solid black;"></a></p> 
  <p>虽然使用 <code>Jsoup</code> 可以替代 <code>HttpClient</code> 直接发起请求解析数据,但是往往不会这样使用,因为实际的开发过程中,需要使用到多线程、连接池、代理等等方式,而 <code>Jsoup</code> 对这些的支持不是很友好,所以一般把 <code>Jsoup</code> 仅仅作为 <code>Html</code> 解析工具使用</p> 
  <p><strong>(一)加载文档:</strong></p> 
  <pre><code class="prism language-java">    <span class="token comment">/**
     * 通过 URL 加载文档
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">loadingUrl</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token comment">/*
         * 解析 URL:
         *      spec:访问问的 url
         *      timeoutMillis:超时时间
         */</span>
        <span class="token class-name">Document</span> document <span class="token operator">=</span> <span class="token class-name">Jsoup</span><span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">URL</span><span class="token punctuation">(</span><span class="token string">"https://www.rgbku.com/chaxun.html"</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token number">1000</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 解析 document</span>

    <span class="token punctuation">}</span>

    <span class="token comment">/**
     * 通过字符串加载文档
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">loadingString</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token class-name">String</span> html <span class="token operator">=</span> <span class="token string">"html-content"</span><span class="token punctuation">;</span>
        <span class="token comment">// 解析字符串</span>
        <span class="token class-name">Document</span> document <span class="token operator">=</span> <span class="token class-name">Jsoup</span><span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span>html<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 解析 document</span>

    <span class="token punctuation">}</span>

    <span class="token comment">/**
     * 通过文件架子啊文档
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">loadingFile</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token comment">// html 文件</span>
        <span class="token class-name">File</span> file <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">File</span><span class="token punctuation">(</span><span class="token string">"D:\\demo.html"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 解析字符串</span>
        <span class="token class-name">Document</span> document <span class="token operator">=</span> <span class="token class-name">Jsoup</span><span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span>file<span class="token punctuation">,</span> <span class="token string">"utf8"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 解析 document</span>

    <span class="token punctuation">}</span>
</code></pre> 
  <p><strong>(二)提取数据:</strong></p> 
  <p>获取元素</p> 
  <ul> 
   <li>方式一:使用 <code>DOM</code> 方法提取文档数据 
    <ul> 
     <li>getElementById(String id):根据 <code>id</code> 获取元素</li> 
     <li>getElementsByTag(String tag):根据标签获取元素</li> 
     <li>getElementsByClass(String className):根据 <code>class</code> 获取元素</li> 
     <li>getElementsByAttribute(String key):根据属性获取元素</li> 
     <li>getElementsByAttributeValue(String key, String value):根据属性和属性值获取元素</li> 
     <li>…</li> 
    </ul> </li> 
  </ul> 
  <p>示例:</p> 
  <pre><code class="prism language-java">    <span class="token comment">/**
     * 使用DOM方法获取元素
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">getElementByDom</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token comment">// 加载 document</span>
        <span class="token class-name">Document</span> document <span class="token operator">=</span> <span class="token class-name">Jsoup</span><span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">URL</span><span class="token punctuation">(</span><span class="token string">"https://www.xxxx.com"</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token number">1000</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

        <span class="token comment">// 根据 id 获取元素</span>
        <span class="token class-name">Element</span> idElement <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementById</span><span class="token punctuation">(</span><span class="token string">"id"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 根据标签获取元素</span>
        <span class="token class-name">Elements</span> tagElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementsByTag</span><span class="token punctuation">(</span><span class="token string">"tag_name"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 根据 class 获取元素</span>
        <span class="token class-name">Elements</span> classElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementsByClass</span><span class="token punctuation">(</span><span class="token string">"class_name"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 根据属性获取元素</span>
        <span class="token class-name">Elements</span> attributeElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementsByAttribute</span><span class="token punctuation">(</span><span class="token string">"attribute"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 通过属性值获取元素</span>
        <span class="token class-name">Elements</span> attributeValueElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementsByAttributeValue</span><span class="token punctuation">(</span><span class="token string">"attribute"</span><span class="token punctuation">,</span> <span class="token string">"value"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
</code></pre> 
  <ul> 
   <li>方式二:使用选择器获取元素 
    <ul> 
     <li>select(String cssQuery):通过选择器获取元素</li> 
    </ul> </li> 
  </ul> 
  <p>示例:</p> 
  <pre><code class="prism language-java">    <span class="token comment">/**
     * 使用选择器获取元素
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">getElementBySelector</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token comment">// 加载 document</span>
        <span class="token class-name">Document</span> document <span class="token operator">=</span> <span class="token class-name">Jsoup</span><span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">URL</span><span class="token punctuation">(</span><span class="token string">"https://www.xxxx.com"</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token number">1000</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

        <span class="token comment">// 通过 id 查找元素</span>
        <span class="token class-name">Element</span> idElement <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"#id"</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">first</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 通过标签名称查找元素</span>
        <span class="token class-name">Elements</span> tagElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"tag_name"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 通过 class 名称查找元素</span>
        <span class="token class-name">Elements</span> classElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">".class_name"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 通过属性获取元素</span>
        <span class="token class-name">Elements</span> attributeElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"[attribute]"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 通过属性值获取元素</span>
        <span class="token class-name">Elements</span> attributeValueElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"[attribute=value]"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>

        <span class="token comment">/*
         * 选择器可以任意的组合使用
         */</span>

        <span class="token comment">// tag#id:标签+ID</span>
        <span class="token class-name">Elements</span> tagIdElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"tag_name#id"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// tag.class:标签+class</span>
        <span class="token class-name">Elements</span> tagClassElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"tag_name.class_name"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// tag[attribute]:标签+属性名</span>
        <span class="token class-name">Elements</span> tagAttributeElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"tag_name[attribute]"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// tag[attribute].class:标签+属性名+class</span>
        <span class="token class-name">Elements</span> tagAttributeClassElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"tag_name[attribute].class_name"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// ancestor child:查询某个元素下的子元素</span>
        <span class="token class-name">Elements</span> ancestorChildElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"ancestor child"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// parent > child:查询直接子元素</span>
        <span class="token class-name">Elements</span> parentChildElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"parent > child"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// parent > *:查找所有子元素</span>
        <span class="token class-name">Elements</span> allChildElements <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">select</span><span class="token punctuation">(</span><span class="token string">"parent > *"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
</code></pre> 
  <p>处理元素数据</p> 
  <ul> 
   <li>attr(String key):获取属性</li> 
   <li>attr(String key, String value):设置属性</li> 
   <li>attributes():获取所有属性</li> 
   <li>id():获取 id</li> 
   <li>className():获取类名</li> 
   <li>classNames():获取类名集</li> 
   <li>text():获取文本内容</li> 
   <li>text(String value):设置文本内容</li> 
   <li>html():获取内部 HTML 内容</li> 
   <li>html(String value):设置内部 HTML 内容</li> 
   <li>outerHtml():获取外部 HTML 值</li> 
   <li>data():获取数据内容(例如 script 和 style 标签)</li> 
   <li>tag():获取标签</li> 
   <li>tagName():获取标签名称</li> 
   <li>…</li> 
  </ul> 
  <hr> 
  <h2>三、综合案例</h2> 
  <p> </p> 
  <p>爬虫的工作流程通常包括以下几个步骤:</p> 
  <ol> 
   <li> <p>确定起始点:需要指定一个或多个起始 <code>URL</code> 作为抓取的入口点。</p> </li> 
   <li> <p>下载网页:使用 <code>HTTP</code> 或 <code>HTTPS</code> 协议向服务器发送请求,下载网页的 <code>HTML</code> 源代码。</p> </li> 
   <li> <p>解析网页:解析 <code>HTML</code> 源代码,提取出所需的信息。</p> </li> 
   <li> <p>处理数据:对提取的数据进行处理和清洗,以便后续分析和存储。</p> </li> 
   <li> <p>跟踪链接:从当前网页中提取所有链接,并将它们添加到待抓取的 <code>URL</code> 队列中,以便进一步遍历。</p> </li> 
   <li> <p>控制抓取速度:为了避免给服务器带来过大的负载,爬虫通常会设置抓取速度限制,包括请求间隔时间和并发请求数量。</p> </li> 
   <li> <p>存储数据:将提取的数据保存到数据库、文件或其他存储介质中,以便后续使用和分析。</p> </li> 
  </ol> 
  <p>以下案例我会将抓取到的数据存放在 <code>excel</code> 文件中,会使用到 <code>EasyExcel</code>,对于 <code>EasyExcel</code> 的使用可参考博客: Java-easyExcel入门教程</p> 
  <h3>1. 案例一</h3> 
  <p> </p> 
  <p>将 RBG 颜色查询器页面中的数据抓取出来之后保存到 <code>excel</code> 表格中</p> 
  <p><a href="http://img.e-com-net.com/image/info8/7d87da4a106a4e1f90820559e6c85441.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/7d87da4a106a4e1f90820559e6c85441.jpg" alt="Java-网络爬虫(一)_第7张图片" width="650" height="343" style="border:1px solid black;"></a></p> 
  <p>分析:</p> 
  <p><a href="http://img.e-com-net.com/image/info8/45600732ada84e63a16804cf1a88d033.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/45600732ada84e63a16804cf1a88d033.jpg" alt="Java-网络爬虫(一)_第8张图片" width="650" height="540" style="border:1px solid black;"></a></p> 
  <p>从该网址的 <code>HTML</code> 源码分析可得,RGB 的数据来源于 <code><table></code> 表格中,<code><tbody></code> 定义了表格主题,用于存放数据,每个单元格的数据存放在 <code><td></code> 标签中,所以只要拿到 中的文本数据,再剔除掉表头相关的数据即可。</p> 
  <p>代码实现:</p> 
  <p><code>RgbEntity.java</code></p> 
  <pre><code class="prism language-java"><span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>alibaba<span class="token punctuation">.</span>excel<span class="token punctuation">.</span>annotation<span class="token punctuation">.</span></span><span class="token class-name">ExcelProperty</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>alibaba<span class="token punctuation">.</span>excel<span class="token punctuation">.</span>annotation<span class="token punctuation">.</span>write<span class="token punctuation">.</span>style<span class="token punctuation">.</span></span><span class="token operator">*</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>alibaba<span class="token punctuation">.</span>excel<span class="token punctuation">.</span>enums<span class="token punctuation">.</span>poi<span class="token punctuation">.</span></span><span class="token class-name">BorderStyleEnum</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>alibaba<span class="token punctuation">.</span>excel<span class="token punctuation">.</span>enums<span class="token punctuation">.</span>poi<span class="token punctuation">.</span></span><span class="token class-name">FillPatternTypeEnum</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>alibaba<span class="token punctuation">.</span>excel<span class="token punctuation">.</span>enums<span class="token punctuation">.</span>poi<span class="token punctuation">.</span></span><span class="token class-name">HorizontalAlignmentEnum</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">io<span class="token punctuation">.</span>swagger<span class="token punctuation">.</span>annotations<span class="token punctuation">.</span></span><span class="token class-name">ApiModelProperty</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">AllArgsConstructor</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">Builder</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">Data</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">NoArgsConstructor</span></span><span class="token punctuation">;</span>

<span class="token comment">/**
 * RGB 实体类
 */</span>
<span class="token annotation punctuation">@Data</span>
<span class="token annotation punctuation">@Builder</span>
<span class="token annotation punctuation">@AllArgsConstructor</span>
<span class="token annotation punctuation">@NoArgsConstructor</span>
<span class="token comment">// 头背景设置</span>
<span class="token annotation punctuation">@HeadStyle</span><span class="token punctuation">(</span>fillPatternType <span class="token operator">=</span> <span class="token class-name">FillPatternTypeEnum</span><span class="token punctuation">.</span><span class="token constant">SOLID_FOREGROUND</span><span class="token punctuation">,</span> horizontalAlignment <span class="token operator">=</span> <span class="token class-name">HorizontalAlignmentEnum</span><span class="token punctuation">.</span><span class="token constant">CENTER</span><span class="token punctuation">,</span> borderLeft <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">,</span> borderTop <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">,</span> borderRight <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">,</span> borderBottom <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">)</span>
<span class="token comment">//标题高度</span>
<span class="token annotation punctuation">@HeadRowHeight</span><span class="token punctuation">(</span><span class="token number">40</span><span class="token punctuation">)</span>
<span class="token comment">//内容高度</span>
<span class="token annotation punctuation">@ContentRowHeight</span><span class="token punctuation">(</span><span class="token number">30</span><span class="token punctuation">)</span>
<span class="token comment">//内容居中,左、上、右、下的边框显示</span>
<span class="token annotation punctuation">@ContentStyle</span><span class="token punctuation">(</span>horizontalAlignment <span class="token operator">=</span> <span class="token class-name">HorizontalAlignmentEnum</span><span class="token punctuation">.</span><span class="token constant">CENTER</span><span class="token punctuation">,</span> borderLeft <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">,</span> borderTop <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">,</span> borderRight <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">,</span> borderBottom <span class="token operator">=</span> <span class="token class-name">BorderStyleEnum</span><span class="token punctuation">.</span><span class="token constant">THIN</span><span class="token punctuation">)</span>
<span class="token keyword">public</span> <span class="token keyword">class</span> <span class="token class-name">RgbEntity</span> <span class="token punctuation">{</span>

    <span class="token annotation punctuation">@ApiModelProperty</span><span class="token punctuation">(</span>value <span class="token operator">=</span> <span class="token string">"英文代码"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ExcelProperty</span><span class="token punctuation">(</span><span class="token string">"英文代码"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ColumnWidth</span><span class="token punctuation">(</span><span class="token number">15</span><span class="token punctuation">)</span>
    <span class="token keyword">private</span> <span class="token class-name">String</span> engName<span class="token punctuation">;</span>

    <span class="token annotation punctuation">@ApiModelProperty</span><span class="token punctuation">(</span>value <span class="token operator">=</span> <span class="token string">"中文名"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ExcelProperty</span><span class="token punctuation">(</span><span class="token string">"中文名"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ColumnWidth</span><span class="token punctuation">(</span><span class="token number">15</span><span class="token punctuation">)</span>
    <span class="token keyword">private</span> <span class="token class-name">String</span> zhName<span class="token punctuation">;</span>

    <span class="token annotation punctuation">@ApiModelProperty</span><span class="token punctuation">(</span>value <span class="token operator">=</span> <span class="token string">"十六进制"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ExcelProperty</span><span class="token punctuation">(</span><span class="token string">"十六进制"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ColumnWidth</span><span class="token punctuation">(</span><span class="token number">15</span><span class="token punctuation">)</span>
    <span class="token keyword">private</span> <span class="token class-name">String</span> code<span class="token punctuation">;</span>

    <span class="token annotation punctuation">@ApiModelProperty</span><span class="token punctuation">(</span>value <span class="token operator">=</span> <span class="token string">"RGB颜色值"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ExcelProperty</span><span class="token punctuation">(</span><span class="token string">"RGB颜色值"</span><span class="token punctuation">)</span>
    <span class="token annotation punctuation">@ColumnWidth</span><span class="token punctuation">(</span><span class="token number">15</span><span class="token punctuation">)</span>
    <span class="token keyword">private</span> <span class="token class-name">String</span> value<span class="token punctuation">;</span>
<span class="token punctuation">}</span>
</code></pre> 
  <p><code>ReptileDemo.class</code></p> 
  <pre><code class="prism language-java"><span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>alibaba<span class="token punctuation">.</span>excel<span class="token punctuation">.</span></span><span class="token class-name">EasyExcel</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">com<span class="token punctuation">.</span>mike<span class="token punctuation">.</span>server<span class="token punctuation">.</span>system<span class="token punctuation">.</span>domain<span class="token punctuation">.</span>excel<span class="token punctuation">.</span></span><span class="token class-name">RgbEntity</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">org<span class="token punctuation">.</span>jsoup<span class="token punctuation">.</span></span><span class="token class-name">Jsoup</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">org<span class="token punctuation">.</span>jsoup<span class="token punctuation">.</span>nodes<span class="token punctuation">.</span></span><span class="token class-name">Document</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">org<span class="token punctuation">.</span>jsoup<span class="token punctuation">.</span>nodes<span class="token punctuation">.</span></span><span class="token class-name">Element</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">org<span class="token punctuation">.</span>jsoup<span class="token punctuation">.</span>select<span class="token punctuation">.</span></span><span class="token class-name">Elements</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>io<span class="token punctuation">.</span></span><span class="token class-name">File</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>io<span class="token punctuation">.</span></span><span class="token class-name">FileOutputStream</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>io<span class="token punctuation">.</span></span><span class="token class-name">OutputStream</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>net<span class="token punctuation">.</span></span><span class="token class-name">URL</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>util<span class="token punctuation">.</span></span><span class="token class-name">ArrayList</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>util<span class="token punctuation">.</span></span><span class="token class-name">List</span></span><span class="token punctuation">;</span>

<span class="token keyword">public</span> <span class="token keyword">class</span> <span class="token class-name">ReptileDemo</span> <span class="token punctuation">{</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">main</span><span class="token punctuation">(</span><span class="token class-name">String</span><span class="token punctuation">[</span><span class="token punctuation">]</span> args<span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token function">demo01</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>

    <span class="token comment">/**
     * 案例一:将 RBG 颜色查询器页面中的数据抓取出来之后保存到 excel 表格中
     */</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">void</span> <span class="token function">demo01</span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token keyword">throws</span> <span class="token class-name">Exception</span> <span class="token punctuation">{</span>
        <span class="token comment">// 获取 document 文档</span>
        <span class="token class-name">Document</span> document <span class="token operator">=</span> <span class="token class-name">Jsoup</span><span class="token punctuation">.</span><span class="token function">parse</span><span class="token punctuation">(</span><span class="token keyword">new</span> <span class="token class-name">URL</span><span class="token punctuation">(</span><span class="token string">"https://www.rgbku.com/chaxun.html"</span><span class="token punctuation">)</span><span class="token punctuation">,</span> <span class="token number">1000</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 通过 id = color 获取 tbody 元素</span>
        <span class="token class-name">Element</span> tbodyElement <span class="token operator">=</span> document<span class="token punctuation">.</span><span class="token function">getElementById</span><span class="token punctuation">(</span><span class="token string">"color"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 获取 tbody 下所有的 <tr> 标签</span>
        <span class="token keyword">assert</span> tbodyElement <span class="token operator">!=</span> <span class="token keyword">null</span><span class="token punctuation">;</span>
        <span class="token class-name">Elements</span> childrenElements <span class="token operator">=</span> tbodyElement<span class="token punctuation">.</span><span class="token function">children</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 创建 RgbEntity 集合存放数据</span>
        <span class="token class-name">List</span><span class="token generics"><span class="token punctuation"><</span><span class="token class-name">RgbEntity</span><span class="token punctuation">></span></span> list <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">ArrayList</span><span class="token generics"><span class="token punctuation"><</span><span class="token punctuation">></span></span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token comment">// 遍历</span>
        <span class="token keyword">for</span> <span class="token punctuation">(</span><span class="token class-name">Element</span> childrenElement <span class="token operator">:</span> childrenElements<span class="token punctuation">)</span> <span class="token punctuation">{</span>
            <span class="token class-name">String</span> tagName <span class="token operator">=</span> childrenElement<span class="token punctuation">.</span><span class="token function">tagName</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token class-name">String</span> align <span class="token operator">=</span> childrenElement<span class="token punctuation">.</span><span class="token function">attr</span><span class="token punctuation">(</span><span class="token string">"align"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token comment">// 筛选出每一行的数据,剔除表头</span>
            <span class="token keyword">if</span> <span class="token punctuation">(</span><span class="token string">"tr"</span><span class="token punctuation">.</span><span class="token function">equals</span><span class="token punctuation">(</span>tagName<span class="token punctuation">)</span> <span class="token operator">&&</span> <span class="token operator">!</span><span class="token string">"center"</span><span class="token punctuation">.</span><span class="token function">equals</span><span class="token punctuation">(</span>align<span class="token punctuation">)</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
                <span class="token comment">// 获取每个单元格的数据</span>
                <span class="token class-name">Elements</span> tdElements <span class="token operator">=</span> childrenElement<span class="token punctuation">.</span><span class="token function">children</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token comment">/*
                 * 根据 html 源码分析可得:
                 *      (1)每个 <tr> 标签下都有 5 个 <td> 标签
                 *      (2)这五个 <td> 标签中的内容分别对应:颜色 英文代码 中文名 十六进制 RGB颜色值
                 */</span>
                <span class="token class-name">String</span> engName <span class="token operator">=</span> tdElements<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">text</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token class-name">String</span> zhName <span class="token operator">=</span> tdElements<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token number">2</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">text</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token class-name">String</span> code <span class="token operator">=</span> tdElements<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token number">3</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">text</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
                <span class="token class-name">String</span> value <span class="token operator">=</span> tdElements<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">text</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
				<span class="token comment">// 添加到集合中</span>
                list<span class="token punctuation">.</span><span class="token function">add</span><span class="token punctuation">(</span><span class="token class-name">RgbEntity</span><span class="token punctuation">.</span><span class="token function">builder</span><span class="token punctuation">(</span><span class="token punctuation">)</span>
                        <span class="token punctuation">.</span><span class="token function">engName</span><span class="token punctuation">(</span>engName<span class="token punctuation">)</span>
                        <span class="token punctuation">.</span><span class="token function">zhName</span><span class="token punctuation">(</span>zhName<span class="token punctuation">)</span>
                        <span class="token punctuation">.</span><span class="token function">code</span><span class="token punctuation">(</span>code<span class="token punctuation">)</span>
                        <span class="token punctuation">.</span><span class="token function">value</span><span class="token punctuation">(</span>value<span class="token punctuation">)</span>
                        <span class="token punctuation">.</span><span class="token function">build</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
            <span class="token punctuation">}</span>
        <span class="token punctuation">}</span>

        <span class="token comment">// 写入到 excel 中</span>
        <span class="token class-name">File</span> file <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">File</span><span class="token punctuation">(</span><span class="token string">"D:\\rgb.xlsx"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token class-name">OutputStream</span> os <span class="token operator">=</span> <span class="token keyword">new</span> <span class="token class-name">FileOutputStream</span><span class="token punctuation">(</span>file<span class="token punctuation">)</span><span class="token punctuation">;</span>
        <span class="token class-name">EasyExcel</span><span class="token punctuation">.</span><span class="token function">write</span><span class="token punctuation">(</span>os<span class="token punctuation">,</span> <span class="token class-name">RgbEntity</span><span class="token punctuation">.</span><span class="token keyword">class</span><span class="token punctuation">)</span>
                <span class="token punctuation">.</span><span class="token function">sheet</span><span class="token punctuation">(</span><span class="token string">"Sheet1"</span><span class="token punctuation">)</span><span class="token punctuation">.</span><span class="token function">doWrite</span><span class="token punctuation">(</span>list<span class="token punctuation">)</span><span class="token punctuation">;</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>
</code></pre> 
  <p>运行代码生成 <code>excel</code> 文件:</p> 
  <p><a href="http://img.e-com-net.com/image/info8/7feaaac1860e4c15935e8f4d8369a30e.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/7feaaac1860e4c15935e8f4d8369a30e.jpg" alt="Java-网络爬虫(一)_第9张图片" width="650" height="260" style="border:1px solid black;"></a></p> 
  <hr> 
  <h3>2. 案例二</h3> 
  <p> </p> 
  <p>通过食品营养成分查询平台获取所有食品营养成分数据,并持久化到 <code>excel</code> 文件中</p> 
  <p><a href="http://img.e-com-net.com/image/info8/ee43b4616406495cb4c4607d754530a7.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/ee43b4616406495cb4c4607d754530a7.jpg" alt="Java-网络爬虫(一)_第10张图片" width="650" height="390" style="border:1px solid black;"></a></p> 
  <p><a href="http://img.e-com-net.com/image/info8/c4687987c140416f85fc562309a9b29d.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/c4687987c140416f85fc562309a9b29d.jpg" alt="Java-网络爬虫(一)_第11张图片" width="650" height="500" style="border:1px solid black;"></a></p> 
  <p>这个案例的代码就不太方便展示了,我就简单的说下实现的逻辑:</p> 
  <p><a href="http://img.e-com-net.com/image/info8/2eb2e79a345d4c8f808aa944a6c34a85.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/2eb2e79a345d4c8f808aa944a6c34a85.jpg" alt="Java-网络爬虫(一)_第12张图片" width="650" height="375" style="border:1px solid black;"></a></p> 
  <p>通过分析 <code>HTML</code> 源码可知,从大类(一级分类)列表页面中可以获取到大类的名称和图片以及进入到类别(二级分类)列表页的 <code>URL</code>,在类别(二级分类)的页面中又可以获取到类别的名称和进入到食品列表页的 <code>URL</code>,在食品列表页中可以获取到食品的名称和进入到食品成分页面的 <code>URL</code>,在食品成分页中再拿到所有的成分数据。</p> 
  <p>实体类设计:</p> 
  <pre><code class="prism language-java"><span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">AllArgsConstructor</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">Builder</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">Data</span></span><span class="token punctuation">;</span>
<span class="token keyword">import</span> <span class="token import"><span class="token namespace">lombok<span class="token punctuation">.</span></span><span class="token class-name">NoArgsConstructor</span></span><span class="token punctuation">;</span>

<span class="token keyword">import</span> <span class="token import"><span class="token namespace">java<span class="token punctuation">.</span>util<span class="token punctuation">.</span></span><span class="token class-name">List</span></span><span class="token punctuation">;</span>

<span class="token annotation punctuation">@Data</span>
<span class="token annotation punctuation">@Builder</span>
<span class="token annotation punctuation">@NoArgsConstructor</span>
<span class="token annotation punctuation">@AllArgsConstructor</span>
<span class="token keyword">public</span> <span class="token keyword">class</span> <span class="token class-name">FoodInfo</span> <span class="token punctuation">{</span>

    <span class="token comment">/**
     * 大类集
     */</span>
    <span class="token class-name">List</span><span class="token generics"><span class="token punctuation"><</span><span class="token class-name">Category</span><span class="token punctuation">></span></span> categoryList<span class="token punctuation">;</span>

    <span class="token comment">/**
     * 大类(一级分类)
     */</span>
    <span class="token annotation punctuation">@Data</span>
    <span class="token annotation punctuation">@Builder</span>
    <span class="token annotation punctuation">@NoArgsConstructor</span>
    <span class="token annotation punctuation">@AllArgsConstructor</span>
    <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">class</span> <span class="token class-name">Category</span> <span class="token punctuation">{</span>
        <span class="token comment">// 大类名称</span>
        <span class="token keyword">private</span> <span class="token class-name">String</span> name<span class="token punctuation">;</span>
        <span class="token comment">// 大类图片</span>
        <span class="token keyword">private</span> <span class="token class-name">String</span> imageUrl<span class="token punctuation">;</span>
        <span class="token comment">// 类别集</span>
        <span class="token keyword">private</span> <span class="token class-name">List</span><span class="token generics"><span class="token punctuation"><</span><span class="token class-name">Type</span><span class="token punctuation">></span></span> typeList<span class="token punctuation">;</span>

        <span class="token comment">/**
         * 类别(二级分类)
         */</span>
        <span class="token annotation punctuation">@Data</span>
        <span class="token annotation punctuation">@Builder</span>
        <span class="token annotation punctuation">@NoArgsConstructor</span>
        <span class="token annotation punctuation">@AllArgsConstructor</span>
        <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">class</span> <span class="token class-name">Type</span> <span class="token punctuation">{</span>
            <span class="token comment">// 类别名称</span>
            <span class="token keyword">private</span> <span class="token class-name">String</span> name<span class="token punctuation">;</span>
            <span class="token comment">// 食物集</span>
            <span class="token keyword">private</span> <span class="token class-name">List</span><span class="token generics"><span class="token punctuation"><</span><span class="token class-name">Food</span><span class="token punctuation">></span></span> foodList<span class="token punctuation">;</span>

            <span class="token comment">/**
             * 食物
             */</span>
            <span class="token annotation punctuation">@Data</span>
            <span class="token annotation punctuation">@Builder</span>
            <span class="token annotation punctuation">@NoArgsConstructor</span>
            <span class="token annotation punctuation">@AllArgsConstructor</span>
            <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">class</span> <span class="token class-name">Food</span> <span class="token punctuation">{</span>
                <span class="token comment">// 食物名称</span>
                <span class="token keyword">private</span> <span class="token class-name">String</span> name<span class="token punctuation">;</span>
                <span class="token comment">// 组成成分集</span>
                <span class="token keyword">private</span> <span class="token class-name">List</span><span class="token generics"><span class="token punctuation"><</span><span class="token class-name">Component</span><span class="token punctuation">></span></span> componentList<span class="token punctuation">;</span>

                <span class="token comment">/**
                 * 组成成分
                 */</span>
                <span class="token annotation punctuation">@Data</span>
                <span class="token annotation punctuation">@Builder</span>
                <span class="token annotation punctuation">@NoArgsConstructor</span>
                <span class="token annotation punctuation">@AllArgsConstructor</span>
                <span class="token keyword">public</span> <span class="token keyword">static</span> <span class="token keyword">class</span> <span class="token class-name">Component</span> <span class="token punctuation">{</span>
                    <span class="token comment">// 营养素类型</span>
                    <span class="token keyword">private</span> <span class="token class-name">String</span> nutrientType<span class="token punctuation">;</span>
                    <span class="token comment">// 项目</span>
                    <span class="token keyword">private</span> <span class="token class-name">String</span> itemName<span class="token punctuation">;</span>
                    <span class="token comment">// 含量</span>
                    <span class="token keyword">private</span> <span class="token class-name">String</span> value<span class="token punctuation">;</span>
                    <span class="token comment">// 同类排名</span>
                    <span class="token keyword">private</span> <span class="token class-name">String</span> sort<span class="token punctuation">;</span>
                    <span class="token comment">// 同类均值</span>
                    <span class="token keyword">private</span> <span class="token class-name">String</span> avgValue<span class="token punctuation">;</span>
                <span class="token punctuation">}</span>
            <span class="token punctuation">}</span>
        <span class="token punctuation">}</span>
    <span class="token punctuation">}</span>
<span class="token punctuation">}</span>
</code></pre> 
  <p>这里要注意的首先是 <code>URL</code> 的拼接,因为链接是相对路径的形式,其次是要创建连接池避免资源的浪费,设置访问间隔的时间别把对方服务器搞崩溃。</p> 
  <hr> 
  <h2>四、总结</h2> 
  <p> </p> 
  <p>在实际应用中,爬虫可能需要处理一些挑战和限制,如动态网页、反爬虫机制、登录和验证码等。为了应对这些问题,爬虫可能需要使用代理、用户代理伪装、验证码识别等技术。</p> 
  <p>值得注意的是,尽管爬虫可以自动化地抓取网页,但在使用爬虫时,需要遵守法律法规和网站的使用规则,避免侵犯他人的权益或引起不良后果。</p> 
  <hr> 
  <p><strong>参考文献:</strong></p> 
  <p>爬虫协议:https://www.dotcpp.com/course/317</p> 
  <p>网络爬虫的法律规制:http://www.cac.gov.cn/2019-06/16/c_1124630015.htm?from=singlemessage</p> 
  <p>Java 爬虫之 JSoup 使用教程:https://my.oschina.net/suveng/blog/4796066</p> 
  <p>JSoup教程:https://www.yiibai.com/jsoup</p> 
 </div> 
</div>
                            </div>
                        </div>
                    </div>
                    <!--PC和WAP自适应版-->
                    <div id="SOHUCS" sid="1742720754677465088"></div>
                    <script type="text/javascript" src="/views/front/js/chanyan.js"></script>
                    <!-- 文章页-底部 动态广告位 -->
                    <div class="youdao-fixed-ad" id="detail_ad_bottom"></div>
                </div>
                <div class="col-md-3">
                    <div class="row" id="ad">
                        <!-- 文章页-右侧1 动态广告位 -->
                        <div id="right-1" class="col-lg-12 col-md-12 col-sm-4 col-xs-4 ad">
                            <div class="youdao-fixed-ad" id="detail_ad_1"> </div>
                        </div>
                        <!-- 文章页-右侧2 动态广告位 -->
                        <div id="right-2" class="col-lg-12 col-md-12 col-sm-4 col-xs-4 ad">
                            <div class="youdao-fixed-ad" id="detail_ad_2"></div>
                        </div>
                        <!-- 文章页-右侧3 动态广告位 -->
                        <div id="right-3" class="col-lg-12 col-md-12 col-sm-4 col-xs-4 ad">
                            <div class="youdao-fixed-ad" id="detail_ad_3"></div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
    <div class="container">
        <h4 class="pt20 mb15 mt0 border-top">你可能感兴趣的:(入门教程,日常积累,java,爬虫,开发语言)</h4>
        <div id="paradigm-article-related">
            <div class="recommend-post mb30">
                <ul class="widget-links">
                    <li><a href="/article/1835509897106649088.htm"
                           title="Long类型前后端数据不一致" target="_blank">Long类型前后端数据不一致</a>
                        <span class="text-muted">igotyback</span>
<a class="tag" taget="_blank" href="/search/%E5%89%8D%E7%AB%AF/1.htm">前端</a>
                        <div>响应给前端的数据浏览器控制台中response中看到的Long类型的数据是正常的到前端数据不一致前后端数据类型不匹配是一个常见问题,尤其是当后端使用Java的Long类型(64位)与前端JavaScript的Number类型(最大安全整数为2^53-1,即16位)进行数据交互时,很容易出现精度丢失的问题。这是因为JavaScript中的Number类型无法安全地表示超过16位的整数。为了解决这个问</div>
                    </li>
                    <li><a href="/article/1835509769822105600.htm"
                           title="LocalDateTime 转 String" target="_blank">LocalDateTime 转 String</a>
                        <span class="text-muted">igotyback</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a>
                        <div>importjava.time.LocalDateTime;importjava.time.format.DateTimeFormatter;publicclassMain{publicstaticvoidmain(String[]args){//获取当前时间LocalDateTimenow=LocalDateTime.now();//定义日期格式化器DateTimeFormatterformat</div>
                    </li>
                    <li><a href="/article/1835509391361667072.htm"
                           title="Linux下QT开发的动态库界面弹出操作(SDL2)" target="_blank">Linux下QT开发的动态库界面弹出操作(SDL2)</a>
                        <span class="text-muted">13jjyao</span>
<a class="tag" taget="_blank" href="/search/QT%E7%B1%BB/1.htm">QT类</a><a class="tag" taget="_blank" href="/search/qt/1.htm">qt</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a><a class="tag" taget="_blank" href="/search/sdl2/1.htm">sdl2</a><a class="tag" taget="_blank" href="/search/linux/1.htm">linux</a>
                        <div>需求:操作系统为linux,开发框架为qt,做成需带界面的qt动态库,调用方为java等非qt程序难点:调用方为java等非qt程序,也就是说调用方肯定不带QApplication::exec(),缺少了这个,QTimer等事件和QT创建的窗口将不能弹出(包括opencv也是不能弹出);这与qt调用本身qt库是有本质的区别的思路:1.调用方缺QApplication::exec(),那么我们在接口</div>
                    </li>
                    <li><a href="/article/1835504723210366976.htm"
                           title="第四天旅游线路预览——从换乘中心到喀纳斯湖" target="_blank">第四天旅游线路预览——从换乘中心到喀纳斯湖</a>
                        <span class="text-muted">陟彼高冈yu</span>
<a class="tag" taget="_blank" href="/search/%E5%9F%BA%E4%BA%8EGoogle/1.htm">基于Google</a><a class="tag" taget="_blank" href="/search/earth/1.htm">earth</a><a class="tag" taget="_blank" href="/search/studio/1.htm">studio</a><a class="tag" taget="_blank" href="/search/%E7%9A%84%E6%97%85%E6%B8%B8%E8%A7%84%E5%88%92%E5%92%8C%E9%A2%84%E8%A7%88/1.htm">的旅游规划和预览</a><a class="tag" taget="_blank" href="/search/%E6%97%85%E6%B8%B8/1.htm">旅游</a>
                        <div>第四天:从贾登峪到喀纳斯风景区入口,晚上住宿贾登峪;换乘中心有4路车,喀纳斯①号车,去喀纳斯湖,路程时长约5分钟;将上面的的行程安排进行动态展示,具体步骤见”Googleearthstudio进行动态轨迹显示制作过程“、“Googleearthstudio入门教程”和“Googleearthstudio进阶教程“相关内容,得到行程如下所示:Day4-2-480p</div>
                    </li>
                    <li><a href="/article/1835498925755297792.htm"
                           title="DIV+CSS+JavaScript技术制作网页(旅游主题网页设计与制作)云南大理" target="_blank">DIV+CSS+JavaScript技术制作网页(旅游主题网页设计与制作)云南大理</a>
                        <span class="text-muted">STU学生网页设计</span>
<a class="tag" taget="_blank" href="/search/%E7%BD%91%E9%A1%B5%E8%AE%BE%E8%AE%A1/1.htm">网页设计</a><a class="tag" taget="_blank" href="/search/%E6%9C%9F%E6%9C%AB%E7%BD%91%E9%A1%B5%E4%BD%9C%E4%B8%9A/1.htm">期末网页作业</a><a class="tag" taget="_blank" href="/search/html%E9%9D%99%E6%80%81%E7%BD%91%E9%A1%B5/1.htm">html静态网页</a><a class="tag" taget="_blank" href="/search/html5%E6%9C%9F%E6%9C%AB%E5%A4%A7%E4%BD%9C%E4%B8%9A/1.htm">html5期末大作业</a><a class="tag" taget="_blank" href="/search/%E7%BD%91%E9%A1%B5%E8%AE%BE%E8%AE%A1/1.htm">网页设计</a><a class="tag" taget="_blank" href="/search/web%E5%A4%A7%E4%BD%9C%E4%B8%9A/1.htm">web大作业</a>
                        <div>️精彩专栏推荐作者主页:【进入主页—获取更多源码】web前端期末大作业:【HTML5网页期末作业(1000套)】程序员有趣的告白方式:【HTML七夕情人节表白网页制作(110套)】文章目录二、网站介绍三、网站效果▶️1.视频演示2.图片演示四、网站代码HTML结构代码CSS样式代码五、更多源码二、网站介绍网站布局方面:计划采用目前主流的、能兼容各大主流浏览器、显示效果稳定的浮动网页布局结构。网站程</div>
                    </li>
                    <li><a href="/article/1835498547785592832.htm"
                           title="【华为OD机试真题2023B卷 JAVA&JS】We Are A Team" target="_blank">【华为OD机试真题2023B卷 JAVA&JS】We Are A Team</a>
                        <span class="text-muted">若博豆</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E7%AE%97%E6%B3%95/1.htm">算法</a><a class="tag" taget="_blank" href="/search/%E5%8D%8E%E4%B8%BA/1.htm">华为</a><a class="tag" taget="_blank" href="/search/javascript/1.htm">javascript</a>
                        <div>华为OD2023(B卷)机试题库全覆盖,刷题指南点这里WeAreATeam时间限制:1秒|内存限制:32768K|语言限制:不限题目描述:总共有n个人在机房,每个人有一个标号(1<=标号<=n),他们分成了多个团队,需要你根据收到的m条消息判定指定的两个人是否在一个团队中,具体的:1、消息构成为:abc,整数a、b分别代</div>
                    </li>
                    <li><a href="/article/1835496149843275776.htm"
                           title="关于城市旅游的HTML网页设计——(旅游风景云南 5页)HTML+CSS+JavaScript" target="_blank">关于城市旅游的HTML网页设计——(旅游风景云南 5页)HTML+CSS+JavaScript</a>
                        <span class="text-muted">二挡起步</span>
<a class="tag" taget="_blank" href="/search/web%E5%89%8D%E7%AB%AF%E6%9C%9F%E6%9C%AB%E5%A4%A7%E4%BD%9C%E4%B8%9A/1.htm">web前端期末大作业</a><a class="tag" taget="_blank" href="/search/javascript/1.htm">javascript</a><a class="tag" taget="_blank" href="/search/html/1.htm">html</a><a class="tag" taget="_blank" href="/search/css/1.htm">css</a><a class="tag" taget="_blank" href="/search/%E6%97%85%E6%B8%B8/1.htm">旅游</a><a class="tag" taget="_blank" href="/search/%E9%A3%8E%E6%99%AF/1.htm">风景</a>
                        <div>⛵源码获取文末联系✈Web前端开发技术描述网页设计题材,DIV+CSS布局制作,HTML+CSS网页设计期末课程大作业|游景点介绍|旅游风景区|家乡介绍|等网站的设计与制作|HTML期末大学生网页设计作业,Web大学生网页HTML:结构CSS:样式在操作方面上运用了html5和css3,采用了div+css结构、表单、超链接、浮动、绝对定位、相对定位、字体样式、引用视频等基础知识JavaScrip</div>
                    </li>
                    <li><a href="/article/1835496148601761792.htm"
                           title="HTML网页设计制作大作业(div+css) 云南我的家乡旅游景点 带文字滚动" target="_blank">HTML网页设计制作大作业(div+css) 云南我的家乡旅游景点 带文字滚动</a>
                        <span class="text-muted">二挡起步</span>
<a class="tag" taget="_blank" href="/search/web%E5%89%8D%E7%AB%AF%E6%9C%9F%E6%9C%AB%E5%A4%A7%E4%BD%9C%E4%B8%9A/1.htm">web前端期末大作业</a><a class="tag" taget="_blank" href="/search/web%E8%AE%BE%E8%AE%A1%E7%BD%91%E9%A1%B5%E8%A7%84%E5%88%92%E4%B8%8E%E8%AE%BE%E8%AE%A1/1.htm">web设计网页规划与设计</a><a class="tag" taget="_blank" href="/search/html/1.htm">html</a><a class="tag" taget="_blank" href="/search/css/1.htm">css</a><a class="tag" taget="_blank" href="/search/javascript/1.htm">javascript</a><a class="tag" taget="_blank" href="/search/dreamweaver/1.htm">dreamweaver</a><a class="tag" taget="_blank" href="/search/%E5%89%8D%E7%AB%AF/1.htm">前端</a>
                        <div>Web前端开发技术描述网页设计题材,DIV+CSS布局制作,HTML+CSS网页设计期末课程大作业游景点介绍|旅游风景区|家乡介绍|等网站的设计与制作HTML期末大学生网页设计作业HTML:结构CSS:样式在操作方面上运用了html5和css3,采用了div+css结构、表单、超链接、浮动、绝对定位、相对定位、字体样式、引用视频等基础知识JavaScript:做与用户的交互行为文章目录前端学习路线</div>
                    </li>
                    <li><a href="/article/1835492740536823808.htm"
                           title="node.js学习" target="_blank">node.js学习</a>
                        <span class="text-muted">小猿L</span>
<a class="tag" taget="_blank" href="/search/node.js/1.htm">node.js</a><a class="tag" taget="_blank" href="/search/node.js/1.htm">node.js</a><a class="tag" taget="_blank" href="/search/%E5%AD%A6%E4%B9%A0/1.htm">学习</a><a class="tag" taget="_blank" href="/search/vim/1.htm">vim</a>
                        <div>node.js学习实操及笔记温故node.js,node.js学习实操过程及笔记~node.js学习视频node.js官网node.js中文网实操笔记githubcsdn笔记为什么学node.js可以让别人访问我们编写的网页为后续的框架学习打下基础,三大框架vuereactangular离不开node.jsnode.js是什么官网:node.js是一个开源的、跨平台的运行JavaScript的运行</div>
                    </li>
                    <li><a href="/article/1835490218845761536.htm"
                           title="Python爬虫解析工具之xpath使用详解" target="_blank">Python爬虫解析工具之xpath使用详解</a>
                        <span class="text-muted">eqa11</span>
<a class="tag" taget="_blank" href="/search/python/1.htm">python</a><a class="tag" taget="_blank" href="/search/%E7%88%AC%E8%99%AB/1.htm">爬虫</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a>
                        <div>文章目录Python爬虫解析工具之xpath使用详解一、引言二、环境准备1、插件安装2、依赖库安装三、xpath语法详解1、路径表达式2、通配符3、谓语4、常用函数四、xpath在Python代码中的使用1、文档树的创建2、使用xpath表达式3、获取元素内容和属性五、总结Python爬虫解析工具之xpath使用详解一、引言在Python爬虫开发中,数据提取是一个至关重要的环节。xpath作为一门</div>
                    </li>
                    <li><a href="/article/1835483159630802944.htm"
                           title="nosql数据库技术与应用知识点" target="_blank">nosql数据库技术与应用知识点</a>
                        <span class="text-muted">皆过客,揽星河</span>
<a class="tag" taget="_blank" href="/search/NoSQL/1.htm">NoSQL</a><a class="tag" taget="_blank" href="/search/nosql/1.htm">nosql</a><a class="tag" taget="_blank" href="/search/%E6%95%B0%E6%8D%AE%E5%BA%93/1.htm">数据库</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%95%B0%E6%8D%AE/1.htm">大数据</a><a class="tag" taget="_blank" href="/search/%E6%95%B0%E6%8D%AE%E5%88%86%E6%9E%90/1.htm">数据分析</a><a class="tag" taget="_blank" href="/search/%E6%95%B0%E6%8D%AE%E7%BB%93%E6%9E%84/1.htm">数据结构</a><a class="tag" taget="_blank" href="/search/%E9%9D%9E%E5%85%B3%E7%B3%BB%E5%9E%8B%E6%95%B0%E6%8D%AE%E5%BA%93/1.htm">非关系型数据库</a>
                        <div>Nosql知识回顾大数据处理流程数据采集(flume、爬虫、传感器)数据存储(本门课程NoSQL所处的阶段)Hdfs、MongoDB、HBase等数据清洗(入仓)Hive等数据处理、分析(Spark、Flink等)数据可视化数据挖掘、机器学习应用(Python、SparkMLlib等)大数据时代存储的挑战(三高)高并发(同一时间很多人访问)高扩展(要求随时根据需求扩展存储)高效率(要求读写速度快)</div>
                    </li>
                    <li><a href="/article/1835476093189058560.htm"
                           title="Java 重写(Override)与重载(Overload)" target="_blank">Java 重写(Override)与重载(Overload)</a>
                        <span class="text-muted">叨唧唧的</span>

                        <div>Java重写(Override)与重载(Overload)重写(Override)重写是子类对父类的允许访问的方法的实现过程进行重新编写,返回值和形参都不能改变。即外壳不变,核心重写!重写的好处在于子类可以根据需要,定义特定于自己的行为。也就是说子类能够根据需要实现父类的方法。重写方法不能抛出新的检查异常或者比被重写方法申明更加宽泛的异常。例如:父类的一个方法申明了一个检查异常IOExceptio</div>
                    </li>
                    <li><a href="/article/1835473830873755648.htm"
                           title="简单了解 JVM" target="_blank">简单了解 JVM</a>
                        <span class="text-muted">记得开心一点啊</span>
<a class="tag" taget="_blank" href="/search/jvm/1.htm">jvm</a>
                        <div>目录♫什么是JVM♫JVM的运行流程♫JVM运行时数据区♪虚拟机栈♪本地方法栈♪堆♪程序计数器♪方法区/元数据区♫类加载的过程♫双亲委派模型♫垃圾回收机制♫什么是JVMJVM是JavaVirtualMachine的简称,意为Java虚拟机。虚拟机是指通过软件模拟的具有完整硬件功能的、运行在一个完全隔离的环境中的完整计算机系统(如:JVM、VMwave、VirtualBox)。JVM和其他两个虚拟机</div>
                    </li>
                    <li><a href="/article/1835471058648526848.htm"
                           title="1分钟解决 -bash: mvn: command not found,在Centos 7中安装Maven" target="_blank">1分钟解决 -bash: mvn: command not found,在Centos 7中安装Maven</a>
                        <span class="text-muted">Energet!c</span>
<a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a>
                        <div>1分钟解决-bash:mvn:commandnotfound,在Centos7中安装Maven检查Java环境1下载Maven2解压Maven3配置环境变量4验证安装5常见问题与注意事项6总结检查Java环境Maven依赖Java环境,请确保系统已经安装了Java并配置了环境变量。可以通过以下命令检查:java-version如果未安装,请先安装Java。1下载Maven从官网下载:前往Apach</div>
                    </li>
                    <li><a href="/article/1835469672334585856.htm"
                           title="Java企业面试题3" target="_blank">Java企业面试题3</a>
                        <span class="text-muted">马龙强_</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a>
                        <div>1.break和continue的作用(智*图)break:用于完全退出一个循环(如for,while)或一个switch语句。当在循环体内遇到break语句时,程序会立即跳出当前循环体,继续执行循环之后的代码。continue:用于跳过当前循环体中剩余的部分,并开始下一次循环。如果是在for循环中使用continue,则会直接进行条件判断以决定是否执行下一轮循环。2.if分支语句和switch分</div>
                    </li>
                    <li><a href="/article/1835468916290318336.htm"
                           title="JVM、JRE和 JDK:理解Java开发的三大核心组件" target="_blank">JVM、JRE和 JDK:理解Java开发的三大核心组件</a>
                        <span class="text-muted">Y雨何时停T</span>
<a class="tag" taget="_blank" href="/search/Java/1.htm">Java</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a>
                        <div>Java是一门跨平台的编程语言,它的成功离不开背后强大的运行环境与开发工具的支持。在Java的生态中,JVM(Java虚拟机)、JRE(Java运行时环境)和JDK(Java开发工具包)是三个至关重要的核心组件。本文将探讨JVM、JDK和JRE的区别,帮助你更好地理解Java的运行机制。1.JVM:Java虚拟机(JavaVirtualMachine)什么是JVM?JVM,即Java虚拟机,是Ja</div>
                    </li>
                    <li><a href="/article/1835464504918503424.htm"
                           title="Java面试题精选:消息队列(二)" target="_blank">Java面试题精选:消息队列(二)</a>
                        <span class="text-muted">芒果不是芒</span>
<a class="tag" taget="_blank" href="/search/Java%E9%9D%A2%E8%AF%95%E9%A2%98%E7%B2%BE%E9%80%89/1.htm">Java面试题精选</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/kafka/1.htm">kafka</a>
                        <div>一、Kafka的特性1.消息持久化:消息存储在磁盘,所以消息不会丢失2.高吞吐量:可以轻松实现单机百万级别的并发3.扩展性:扩展性强,还是动态扩展4.多客户端支持:支持多种语言(Java、C、C++、GO、)5.KafkaStreams(一个天生的流处理):在双十一或者销售大屏就会用到这种流处理。使用KafkaStreams可以快速的把销售额统计出来6.安全机制:Kafka进行生产或者消费的时候会</div>
                    </li>
                    <li><a href="/article/1835462485629562880.htm"
                           title="白骑士的Java教学基础篇 2.5 控制流语句" target="_blank">白骑士的Java教学基础篇 2.5 控制流语句</a>
                        <span class="text-muted">白骑士所长</span>
<a class="tag" taget="_blank" href="/search/Java/1.htm">Java</a><a class="tag" taget="_blank" href="/search/%E6%95%99%E5%AD%A6/1.htm">教学</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a>
                        <div>欢迎继续学习Java编程的基础篇!在前面的章节中,我们了解了Java的变量、数据类型和运算符。接下来,我们将探讨Java中的控制流语句。控制流语句用于控制程序的执行顺序,使我们能够根据特定条件执行不同的代码块,或重复执行某段代码。这是编写复杂程序的基础。通过学习这一节内容,你将掌握如何使用条件语句和循环语句来编写更加灵活和高效的代码。条件语句条件语句用于根据条件的真假来执行不同的代码块。if语句‘</div>
                    </li>
                    <li><a href="/article/1835462232612368384.htm"
                           title="python语法——三目运算符" target="_blank">python语法——三目运算符</a>
                        <span class="text-muted">HappyRocking</span>
<a class="tag" taget="_blank" href="/search/python/1.htm">python</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a><a class="tag" taget="_blank" href="/search/%E4%B8%89%E7%9B%AE%E8%BF%90%E7%AE%97%E7%AC%A6/1.htm">三目运算符</a>
                        <div>在java中,有三目运算符,如:intc=(a>b)?a:b表示c取两者中的较大值。但是在python,不能直接这样使用,估计是因为冒号在python有分行的关键作用。那么在python中,如何实现类似功能呢?可以使用ifelse语句,也是一行可以完成,格式为:aifbelsec表示如果b为True,则表达式等于a,否则等于c。如:c=(aif(a>b)elseb)同样是完成了取最大值的功能。</div>
                    </li>
                    <li><a href="/article/1835457442260021248.htm"
                           title="ArrayList 源码解析" target="_blank">ArrayList 源码解析</a>
                        <span class="text-muted">程序猿进阶</span>
<a class="tag" taget="_blank" href="/search/Java%E5%9F%BA%E7%A1%80/1.htm">Java基础</a><a class="tag" taget="_blank" href="/search/ArrayList/1.htm">ArrayList</a><a class="tag" taget="_blank" href="/search/List/1.htm">List</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E9%9D%A2%E8%AF%95/1.htm">面试</a><a class="tag" taget="_blank" href="/search/%E6%80%A7%E8%83%BD%E4%BC%98%E5%8C%96/1.htm">性能优化</a><a class="tag" taget="_blank" href="/search/%E6%9E%B6%E6%9E%84%E8%AE%BE%E8%AE%A1/1.htm">架构设计</a><a class="tag" taget="_blank" href="/search/idea/1.htm">idea</a>
                        <div>ArrayList是Java集合框架中的一个动态数组实现,提供了可变大小的数组功能。它继承自AbstractList并实现了List接口,是顺序容器,即元素存放的数据与放进去的顺序相同,允许放入null元素,底层通过数组实现。除该类未实现同步外,其余跟Vector大致相同。每个ArrayList都有一个容量capacity,表示底层数组的实际大小,容器内存储元素的个数不能多于当前容量。当向容器中添</div>
                    </li>
                    <li><a href="/article/1835454921990828032.htm"
                           title="Java爬虫框架(一)--架构设计" target="_blank">Java爬虫框架(一)--架构设计</a>
                        <span class="text-muted">狼图腾-狼之传说</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E6%A1%86%E6%9E%B6/1.htm">框架</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E4%BB%BB%E5%8A%A1/1.htm">任务</a><a class="tag" taget="_blank" href="/search/html%E8%A7%A3%E6%9E%90%E5%99%A8/1.htm">html解析器</a><a class="tag" taget="_blank" href="/search/%E5%AD%98%E5%82%A8/1.htm">存储</a><a class="tag" taget="_blank" href="/search/%E7%94%B5%E5%AD%90%E5%95%86%E5%8A%A1/1.htm">电子商务</a>
                        <div>一、架构图那里搜网络爬虫框架主要针对电子商务网站进行数据爬取,分析,存储,索引。爬虫:爬虫负责爬取,解析,处理电子商务网站的网页的内容数据库:存储商品信息索引:商品的全文搜索索引Task队列:需要爬取的网页列表Visited表:已经爬取过的网页列表爬虫监控平台:web平台可以启动,停止爬虫,管理爬虫,task队列,visited表。二、爬虫1.流程1)Scheduler启动爬虫器,TaskMast</div>
                    </li>
                    <li><a href="/article/1835454543471669248.htm"
                           title="Java:爬虫框架" target="_blank">Java:爬虫框架</a>
                        <span class="text-muted">dingcho</span>
<a class="tag" taget="_blank" href="/search/Java/1.htm">Java</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E7%88%AC%E8%99%AB/1.htm">爬虫</a>
                        <div>一、ApacheNutch2【参考地址】Nutch是一个开源Java实现的搜索引擎。它提供了我们运行自己的搜索引擎所需的全部工具。包括全文搜索和Web爬虫。Nutch致力于让每个人能很容易,同时花费很少就可以配置世界一流的Web搜索引擎.为了完成这一宏伟的目标,Nutch必须能够做到:每个月取几十亿网页为这些网页维护一个索引对索引文件进行每秒上千次的搜索提供高质量的搜索结果简单来说Nutch支持分</div>
                    </li>
                    <li><a href="/article/1835450890077696000.htm"
                           title="python怎么将png转为tif_png转tif" target="_blank">python怎么将png转为tif_png转tif</a>
                        <span class="text-muted">weixin_39977276</span>

                        <div>发国外的文章要求图片是tif,cmyk色彩空间的。大小尺寸还有要求。比如网上大神多,找到了一段代码,感谢!https://www.jianshu.com/p/ec2af4311f56https://github.com/KevinZc007/image2Tifimportjava.awt.image.BufferedImage;importjava.io.File;importjava.io.Fi</div>
                    </li>
                    <li><a href="/article/1835448239864770560.htm"
                           title="JavaScript 中,深拷贝(Deep Copy)和浅拷贝(Shallow Copy)" target="_blank">JavaScript 中,深拷贝(Deep Copy)和浅拷贝(Shallow Copy)</a>
                        <span class="text-muted">跳房子的前端</span>
<a class="tag" taget="_blank" href="/search/%E5%89%8D%E7%AB%AF%E9%9D%A2%E8%AF%95/1.htm">前端面试</a><a class="tag" taget="_blank" href="/search/javascript/1.htm">javascript</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a><a class="tag" taget="_blank" href="/search/ecmascript/1.htm">ecmascript</a>
                        <div>在JavaScript中,深拷贝(DeepCopy)和浅拷贝(ShallowCopy)是用于复制对象或数组的两种不同方法。了解它们的区别和应用场景对于避免潜在的bugs和高效地处理数据非常重要。以下是对深拷贝和浅拷贝的详细解释,包括它们的概念、用途、优缺点以及实现方式。1.浅拷贝(ShallowCopy)概念定义:浅拷贝是指创建一个新的对象或数组,其中包含了原对象或数组的基本数据类型的值和对引用数</div>
                    </li>
                    <li><a href="/article/1835444076007223296.htm"
                           title="JAVA·一个简单的登录窗口" target="_blank">JAVA·一个简单的登录窗口</a>
                        <span class="text-muted">MortalTom</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a><a class="tag" taget="_blank" href="/search/%E5%AD%A6%E4%B9%A0/1.htm">学习</a>
                        <div>文章目录概要整体架构流程技术名词解释技术细节资源概要JavaSwing是Java基础类库的一部分,主要用于开发图形用户界面(GUI)程序整体架构流程新建项目,导入sql.jar包(链接放在了文末),编译项目并运行技术名词解释一、特点丰富的组件提供了多种可视化组件,如按钮(JButton)、文本框(JTextField)、标签(JLabel)、下拉列表(JComboBox)等,可以满足不同的界面设计</div>
                    </li>
                    <li><a href="/article/1835441932583661568.htm"
                           title="ESP32-C3入门教程 网络篇⑩——基于esp_https_ota和MQTT实现开机主动升级和被动触发升级的OTA功能" target="_blank">ESP32-C3入门教程 网络篇⑩——基于esp_https_ota和MQTT实现开机主动升级和被动触发升级的OTA功能</a>
                        <span class="text-muted">小康师兄</span>
<a class="tag" taget="_blank" href="/search/ESP32-C3%E5%85%A5%E9%97%A8%E6%95%99%E7%A8%8B/1.htm">ESP32-C3入门教程</a><a class="tag" taget="_blank" href="/search/https/1.htm">https</a><a class="tag" taget="_blank" href="/search/%E6%9C%8D%E5%8A%A1%E5%99%A8/1.htm">服务器</a><a class="tag" taget="_blank" href="/search/esp32/1.htm">esp32</a><a class="tag" taget="_blank" href="/search/OTA/1.htm">OTA</a><a class="tag" taget="_blank" href="/search/MQTT/1.htm">MQTT</a>
                        <div>文章目录一、前言二、软件流程三、部分源码四、运行演示一、前言本文基于VSCodeIDE进行编程、编译、下载、运行等操作基础入门章节请查阅:ESP32-C3入门教程基础篇①——基于VSCode构建HelloWorld教程目录大纲请查阅:ESP32-C3入门教程——导读ESP32-C3入门教程网络篇⑨——基于esp_https_ota实现史上最简单的ESP32OTA远程固件升级功能二、软件流程</div>
                    </li>
                    <li><a href="/article/1835438028009598976.htm"
                           title="WebMagic:强大的Java爬虫框架解析与实战" target="_blank">WebMagic:强大的Java爬虫框架解析与实战</a>
                        <span class="text-muted">Aaron_945</span>
<a class="tag" taget="_blank" href="/search/Java/1.htm">Java</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E7%88%AC%E8%99%AB/1.htm">爬虫</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a>
                        <div>文章目录引言官网链接WebMagic原理概述基础使用1.添加依赖2.编写PageProcessor高级使用1.自定义Pipeline2.分布式抓取优点结论引言在大数据时代,网络爬虫作为数据收集的重要工具,扮演着不可或缺的角色。Java作为一门广泛使用的编程语言,在爬虫开发领域也有其独特的优势。WebMagic是一个开源的Java爬虫框架,它提供了简单灵活的API,支持多线程、分布式抓取,以及丰富的</div>
                    </li>
                    <li><a href="/article/1835437775344726016.htm"
                           title="博客网站制作教程" target="_blank">博客网站制作教程</a>
                        <span class="text-muted">2401_85194651</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/maven/1.htm">maven</a>
                        <div>首先就是技术框架:后端:Java+SpringBoot数据库:MySQL前端:Vue.js数据库连接:JPA(JavaPersistenceAPI)1.项目结构blog-app/├──backend/│├──src/main/java/com/example/blogapp/││├──BlogApplication.java││├──config/│││└──DatabaseConfig.java</div>
                    </li>
                    <li><a href="/article/1835435506645692416.htm"
                           title="00. 这里整理了最全的爬虫框架(Java + Python)" target="_blank">00. 这里整理了最全的爬虫框架(Java + Python)</a>
                        <span class="text-muted">有一只柴犬</span>
<a class="tag" taget="_blank" href="/search/%E7%88%AC%E8%99%AB%E7%B3%BB%E5%88%97/1.htm">爬虫系列</a><a class="tag" taget="_blank" href="/search/%E7%88%AC%E8%99%AB/1.htm">爬虫</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a>
                        <div>目录1、前言2、什么是网络爬虫3、常见的爬虫框架3.1、java框架3.1.1、WebMagic3.1.2、Jsoup3.1.3、HttpClient3.1.4、Crawler4j3.1.5、HtmlUnit3.1.6、Selenium3.2、Python框架3.2.1、Scrapy3.2.2、BeautifulSoup+Requests3.2.3、Selenium3.2.4、PyQuery3.2</div>
                    </li>
                    <li><a href="/article/1835429581205630976.htm"
                           title="JAVA学习笔记之23种设计模式学习" target="_blank">JAVA学习笔记之23种设计模式学习</a>
                        <span class="text-muted">victorfreedom</span>
<a class="tag" taget="_blank" href="/search/Java%E6%8A%80%E6%9C%AF/1.htm">Java技术</a><a class="tag" taget="_blank" href="/search/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F/1.htm">设计模式</a><a class="tag" taget="_blank" href="/search/android/1.htm">android</a><a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/%E5%B8%B8%E7%94%A8%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F/1.htm">常用设计模式</a>
                        <div>博主最近买了《设计模式》这本书来学习,无奈这本书是以C++语言为基础进行说明,整个学习流程下来效率不是很高,虽然有的设计模式通俗易懂,但感觉还是没有充分的掌握了所有的设计模式。于是博主百度了一番,发现有大神写过了这方面的问题,于是博主迅速拿来学习。一、设计模式的分类总体来说设计模式分为三大类:创建型模式,共五种:工厂方法模式、抽象工厂模式、单例模式、建造者模式、原型模式。结构型模式,共七种:适配器</div>
                    </li>
                                <li><a href="/article/77.htm"
                                       title="算法 单链的创建与删除" target="_blank">算法 单链的创建与删除</a>
                                    <span class="text-muted">换个号韩国红果果</span>
<a class="tag" taget="_blank" href="/search/c/1.htm">c</a><a class="tag" taget="_blank" href="/search/%E7%AE%97%E6%B3%95/1.htm">算法</a>
                                    <div>
先创建结构体
struct student {
	int data;
	//int tag;//标记这是第几个
	struct student *next;
};
//  addone 用于将一个数插入已从小到大排好序的链中
struct student *addone(struct student *h,int x){
		if(h==NULL)  //??????
			</div>
                                </li>
                                <li><a href="/article/204.htm"
                                       title="《大型网站系统与Java中间件实践》第2章读后感" target="_blank">《大型网站系统与Java中间件实践》第2章读后感</a>
                                    <span class="text-muted">白糖_</span>
<a class="tag" taget="_blank" href="/search/java%E4%B8%AD%E9%97%B4%E4%BB%B6/1.htm">java中间件</a>
                                    <div>       断断续续花了两天时间试读了《大型网站系统与Java中间件实践》的第2章,这章总述了从一个小型单机构建的网站发展到大型网站的演化过程---整个过程会遇到很多困难,但每一个屏障都会有解决方案,最终就是依靠这些个解决方案汇聚到一起组成了一个健壮稳定高效的大型系统。 
  
       看完整章内容,</div>
                                </li>
                                <li><a href="/article/331.htm"
                                       title="zeus持久层spring事务单元测试" target="_blank">zeus持久层spring事务单元测试</a>
                                    <span class="text-muted">deng520159</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/DAO/1.htm">DAO</a><a class="tag" taget="_blank" href="/search/spring/1.htm">spring</a><a class="tag" taget="_blank" href="/search/jdbc/1.htm">jdbc</a>
                                    <div>今天把zeus事务单元测试放出来,让大家指出他的毛病, 
1.ZeusTransactionTest.java 单元测试 
  
package com.dengliang.zeus.webdemo.test;

import java.util.ArrayList;
import java.util.List;

import org.junit.Test;
import </div>
                                </li>
                                <li><a href="/article/458.htm"
                                       title="Rss 订阅 开发" target="_blank">Rss 订阅 开发</a>
                                    <span class="text-muted">周凡杨</span>
<a class="tag" taget="_blank" href="/search/html/1.htm">html</a><a class="tag" taget="_blank" href="/search/xml/1.htm">xml</a><a class="tag" taget="_blank" href="/search/%E8%AE%A2%E9%98%85/1.htm">订阅</a><a class="tag" taget="_blank" href="/search/rss/1.htm">rss</a><a class="tag" taget="_blank" href="/search/%E8%A7%84%E8%8C%83/1.htm">规范</a>
                                    <div>  
              RSS是 Really Simple Syndication的缩写(对rss2.0而言,是这三个词的缩写,对rss1.0而言则是RDF Site Summary的缩写,1.0与2.0走的是两个体系)。 
  
RSS</div>
                                </li>
                                <li><a href="/article/585.htm"
                                       title="分页查询实现" target="_blank">分页查询实现</a>
                                    <span class="text-muted">g21121</span>
<a class="tag" taget="_blank" href="/search/%E5%88%86%E9%A1%B5%E6%9F%A5%E8%AF%A2/1.htm">分页查询</a>
                                    <div>在查询列表时我们常常会用到分页,分页的好处就是减少数据交换,每次查询一定数量减少数据库压力等等。 
按实现形式分前台分页和服务器分页: 
前台分页就是一次查询出所有记录,在页面中用js进行虚拟分页,这种形式在数据量较小时优势比较明显,一次加载就不必再访问服务器了,但当数据量较大时会对页面造成压力,传输速度也会大幅下降。 
服务器分页就是每次请求相同数量记录,按一定规则排序,每次取一定序号直接的数据</div>
                                </li>
                                <li><a href="/article/712.htm"
                                       title="spring jms异步消息处理" target="_blank">spring jms异步消息处理</a>
                                    <span class="text-muted">510888780</span>
<a class="tag" taget="_blank" href="/search/jms/1.htm">jms</a>
                                    <div>spring JMS对于异步消息处理基本上只需配置下就能进行高效的处理。其核心就是消息侦听器容器,常用的类就是DefaultMessageListenerContainer。该容器可配置侦听器的并发数量,以及配合MessageListenerAdapter使用消息驱动POJO进行消息处理。且消息驱动POJO是放入TaskExecutor中进行处理,进一步提高性能,减少侦听器的阻塞。具体配置如下: </div>
                                </li>
                                <li><a href="/article/839.htm"
                                       title="highCharts柱状图" target="_blank">highCharts柱状图</a>
                                    <span class="text-muted">布衣凌宇</span>
<a class="tag" taget="_blank" href="/search/hightCharts/1.htm">hightCharts</a><a class="tag" taget="_blank" href="/search/%E6%9F%B1%E5%9B%BE/1.htm">柱图</a>
                                    <div>第一步:导入 exporting.js,grid.js,highcharts.js;第二步:写controller 
  
@Controller@RequestMapping(value="${adminPath}/statistick")public class StatistickController {  private UserServi</div>
                                </li>
                                <li><a href="/article/966.htm"
                                       title="我的spring学习笔记2-IoC(反向控制 依赖注入)" target="_blank">我的spring学习笔记2-IoC(反向控制 依赖注入)</a>
                                    <span class="text-muted">aijuans</span>
<a class="tag" taget="_blank" href="/search/spring/1.htm">spring</a><a class="tag" taget="_blank" href="/search/mvc/1.htm">mvc</a><a class="tag" taget="_blank" href="/search/Spring+%E6%95%99%E7%A8%8B/1.htm">Spring 教程</a><a class="tag" taget="_blank" href="/search/spring3+%E6%95%99%E7%A8%8B/1.htm">spring3 教程</a><a class="tag" taget="_blank" href="/search/Spring+%E5%85%A5%E9%97%A8/1.htm">Spring 入门</a>
                                    <div>IoC(反向控制 依赖注入)这是Spring提出来了,这也是Spring一大特色。这里我不用多说,我们看Spring教程就可以了解。当然我们不用Spring也可以用IoC,下面我将介绍不用Spring的IoC。 
IoC不是框架,她是java的技术,如今大多数轻量级的容器都会用到IoC技术。这里我就用一个例子来说明: 
如:程序中有 Mysql.calss 、Oracle.class 、SqlSe</div>
                                </li>
                                <li><a href="/article/1093.htm"
                                       title="TLS java简单实现" target="_blank">TLS java简单实现</a>
                                    <span class="text-muted">antlove</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/ssl/1.htm">ssl</a><a class="tag" taget="_blank" href="/search/keystore/1.htm">keystore</a><a class="tag" taget="_blank" href="/search/tls/1.htm">tls</a><a class="tag" taget="_blank" href="/search/secure/1.htm">secure</a>
                                    <div>  
1. SSLServer.java 
package ssl;

import java.io.FileInputStream;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.security.KeyStore;
import </div>
                                </li>
                                <li><a href="/article/1220.htm"
                                       title="Zip解压压缩文件" target="_blank">Zip解压压缩文件</a>
                                    <span class="text-muted">百合不是茶</span>
<a class="tag" taget="_blank" href="/search/Zip%E6%A0%BC%E5%BC%8F%E8%A7%A3%E5%8E%8B/1.htm">Zip格式解压</a><a class="tag" taget="_blank" href="/search/Zip%E6%B5%81%E7%9A%84%E4%BD%BF%E7%94%A8/1.htm">Zip流的使用</a><a class="tag" taget="_blank" href="/search/%E6%96%87%E4%BB%B6%E8%A7%A3%E5%8E%8B/1.htm">文件解压</a>
                                    <div>  
 ZIP文件的解压缩实质上就是从输入流中读取数据。Java.util.zip包提供了类ZipInputStream来读取ZIP文件,下面的代码段创建了一个输入流来读取ZIP格式的文件; 
ZipInputStream in = new ZipInputStream(new FileInputStream(zipFileName)); 
  
  
&n</div>
                                </li>
                                <li><a href="/article/1347.htm"
                                       title="underscore.js 学习(一)" target="_blank">underscore.js 学习(一)</a>
                                    <span class="text-muted">bijian1013</span>
<a class="tag" taget="_blank" href="/search/JavaScript/1.htm">JavaScript</a><a class="tag" taget="_blank" href="/search/underscore/1.htm">underscore</a>
                                    <div>        工作中需要用到underscore.js,发现这是一个包括了很多基本功能函数的js库,里面有很多实用的函数。而且它没有扩展 javascript的原生对象。主要涉及对Collection、Object、Array、Function的操作。       学</div>
                                </li>
                                <li><a href="/article/1474.htm"
                                       title="java jvm常用命令工具——jstatd命令(Java Statistics Monitoring Daemon)" target="_blank">java jvm常用命令工具——jstatd命令(Java Statistics Monitoring Daemon)</a>
                                    <span class="text-muted">bijian1013</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/jvm/1.htm">jvm</a><a class="tag" taget="_blank" href="/search/jstatd/1.htm">jstatd</a>
                                    <div>1.介绍 
        jstatd是一个基于RMI(Remove Method Invocation)的服务程序,它用于监控基于HotSpot的JVM中资源的创建及销毁,并且提供了一个远程接口允许远程的监控工具连接到本地的JVM执行命令。 
        jstatd是基于RMI的,所以在运行jstatd的服务</div>
                                </li>
                                <li><a href="/article/1601.htm"
                                       title="【Spring框架三】Spring常用注解之Transactional" target="_blank">【Spring框架三】Spring常用注解之Transactional</a>
                                    <span class="text-muted">bit1129</span>
<a class="tag" taget="_blank" href="/search/transactional/1.htm">transactional</a>
                                    <div>Spring可以通过注解@Transactional来为业务逻辑层的方法(调用DAO完成持久化动作)添加事务能力,如下是@Transactional注解的定义: 
  
/*
 * Copyright 2002-2010 the original author or authors.
 *
 * Licensed under the Apache License, Version </div>
                                </li>
                                <li><a href="/article/1728.htm"
                                       title="我(程序员)的前进方向" target="_blank">我(程序员)的前进方向</a>
                                    <span class="text-muted">bitray</span>
<a class="tag" taget="_blank" href="/search/%E7%A8%8B%E5%BA%8F%E5%91%98/1.htm">程序员</a>
                                    <div>作为一个普通的程序员,我一直游走在java语言中,java也确实让我有了很多的体会.不过随着学习的深入,java语言的新技术产生的越来越多,从最初期的javase,我逐渐开始转变到ssh,ssi,这种主流的码农,.过了几天为了解决新问题,webservice的大旗也被我祭出来了,又过了些日子jms架构的activemq也开始必须学习了.再后来开始了一系列技术学习,osgi,restful.....</div>
                                </li>
                                <li><a href="/article/1855.htm"
                                       title="nginx lua开发经验总结" target="_blank">nginx lua开发经验总结</a>
                                    <span class="text-muted">ronin47</span>

                                    <div>使用nginx lua已经两三个月了,项目接开发完毕了,这几天准备上线并且跟高德地图对接。回顾下来lua在项目中占得必中还是比较大的,跟PHP的占比差不多持平了,因此在开发中遇到一些问题备忘一下  1:content_by_lua中代码容量有限制,一般不要写太多代码,正常编写代码一般在100行左右(具体容量没有细心测哈哈,在4kb左右),如果超出了则重启nginx的时候会报 too long pa</div>
                                </li>
                                <li><a href="/article/1982.htm"
                                       title="java-66-用递归颠倒一个栈。例如输入栈{1,2,3,4,5},1在栈顶。颠倒之后的栈为{5,4,3,2,1},5处在栈顶" target="_blank">java-66-用递归颠倒一个栈。例如输入栈{1,2,3,4,5},1在栈顶。颠倒之后的栈为{5,4,3,2,1},5处在栈顶</a>
                                    <span class="text-muted">bylijinnan</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a>
                                    <div>
import java.util.Stack;

public class ReverseStackRecursive {

	/**
	 * Q 66.颠倒栈。
	 * 题目:用递归颠倒一个栈。例如输入栈{1,2,3,4,5},1在栈顶。
	 * 颠倒之后的栈为{5,4,3,2,1},5处在栈顶。
	 *1. Pop the top element
	 *2. Revers</div>
                                </li>
                                <li><a href="/article/2109.htm"
                                       title="正确理解Linux内存占用过高的问题" target="_blank">正确理解Linux内存占用过高的问题</a>
                                    <span class="text-muted">cfyme</span>
<a class="tag" taget="_blank" href="/search/linux/1.htm">linux</a>
                                    <div>Linux开机后,使用top命令查看,4G物理内存发现已使用的多大3.2G,占用率高达80%以上: 
Mem:   3889836k total,  3341868k used,   547968k free,   286044k buffers 
Swap:  6127608k total,&nb</div>
                                </li>
                                <li><a href="/article/2236.htm"
                                       title="[JWFD开源工作流]当前流程引擎设计的一个急需解决的问题" target="_blank">[JWFD开源工作流]当前流程引擎设计的一个急需解决的问题</a>
                                    <span class="text-muted">comsci</span>
<a class="tag" taget="_blank" href="/search/%E5%B7%A5%E4%BD%9C%E6%B5%81/1.htm">工作流</a>
                                    <div> 
 
     当我们的流程引擎进入IRC阶段的时候,当循环反馈模型出现之后,每次循环都会导致一大堆节点内存数据残留在系统内存中,循环的次数越多,这些残留数据将导致系统内存溢出,并使得引擎崩溃。。。。。。 
 
      而解决办法就是利用汇编语言或者其它系统编程语言,在引擎运行时,把这些残留数据清除掉。</div>
                                </li>
                                <li><a href="/article/2363.htm"
                                       title="自定义类的equals函数" target="_blank">自定义类的equals函数</a>
                                    <span class="text-muted">dai_lm</span>
<a class="tag" taget="_blank" href="/search/equals/1.htm">equals</a>
                                    <div>仅作笔记使用 
 

public class VectorQueue {

	private final Vector<VectorItem> queue;

	private class VectorItem {
		private final Object item;
		private final int quantity;

		public VectorI</div>
                                </li>
                                <li><a href="/article/2490.htm"
                                       title="Linux下安装R语言" target="_blank">Linux下安装R语言</a>
                                    <span class="text-muted">datageek</span>
<a class="tag" taget="_blank" href="/search/R%E8%AF%AD%E8%A8%80+linux/1.htm">R语言 linux</a>
                                    <div>命令如下:sudo gedit  /etc/apt/sources.list1、deb http://mirrors.ustc.edu.cn/CRAN/bin/linux/ubuntu/ precise/ 2、deb http://dk.archive.ubuntu.com/ubuntu hardy universesudo apt-key adv --keyserver ke</div>
                                </li>
                                <li><a href="/article/2617.htm"
                                       title="如何修改mysql 并发数(连接数)最大值" target="_blank">如何修改mysql 并发数(连接数)最大值</a>
                                    <span class="text-muted">dcj3sjt126com</span>
<a class="tag" taget="_blank" href="/search/mysql/1.htm">mysql</a>
                                    <div>MySQL的连接数最大值跟MySQL没关系,主要看系统和业务逻辑了 
  
方法一:进入MYSQL安装目录 打开MYSQL配置文件 my.ini 或 my.cnf查找 max_connections=100 修改为 max_connections=1000 服务里重起MYSQL即可 
  方法二:MySQL的最大连接数默认是100客户端登录:mysql -uusername -ppass</div>
                                </li>
                                <li><a href="/article/2744.htm"
                                       title="单一功能原则" target="_blank">单一功能原则</a>
                                    <span class="text-muted">dcj3sjt126com</span>
<a class="tag" taget="_blank" href="/search/%E9%9D%A2%E5%90%91%E5%AF%B9%E8%B1%A1%E7%9A%84%E7%A8%8B%E5%BA%8F%E8%AE%BE%E8%AE%A1/1.htm">面向对象的程序设计</a><a class="tag" taget="_blank" href="/search/%E8%BD%AF%E4%BB%B6%E8%AE%BE%E8%AE%A1/1.htm">软件设计</a><a class="tag" taget="_blank" href="/search/%E7%BC%96%E7%A8%8B%E5%8E%9F%E5%88%99/1.htm">编程原则</a>
                                    <div>单一功能原则[
编辑]            
SOLID    原则    
 
 单一功能原则 
 开闭原则 
 Liskov代换原则 
 接口隔离原则 
 依赖反转原则 
      
 
 查   
 论   
 编 
      
在面向对象编程领域中,单一功能原则(Single responsibility principle)规定每个类都应该有</div>
                                </li>
                                <li><a href="/article/2871.htm"
                                       title="POJO、VO和JavaBean区别和联系" target="_blank">POJO、VO和JavaBean区别和联系</a>
                                    <span class="text-muted">fanmingxing</span>
<a class="tag" taget="_blank" href="/search/VO/1.htm">VO</a><a class="tag" taget="_blank" href="/search/POJO/1.htm">POJO</a><a class="tag" taget="_blank" href="/search/javabean/1.htm">javabean</a>
                                    <div>POJO和JavaBean是我们常见的两个关键字,一般容易混淆,POJO全称是Plain Ordinary Java Object / Plain Old Java Object,中文可以翻译成:普通Java类,具有一部分getter/setter方法的那种类就可以称作POJO,但是JavaBean则比POJO复杂很多,JavaBean是一种组件技术,就好像你做了一个扳子,而这个扳子会在很多地方被</div>
                                </li>
                                <li><a href="/article/2998.htm"
                                       title="SpringSecurity3.X--LDAP:AD配置" target="_blank">SpringSecurity3.X--LDAP:AD配置</a>
                                    <span class="text-muted">hanqunfeng</span>
<a class="tag" taget="_blank" href="/search/SpringSecurity/1.htm">SpringSecurity</a>
                                    <div>前面介绍过基于本地数据库验证的方式,参考http://hanqunfeng.iteye.com/blog/1155226,这里说一下如何修改为使用AD进行身份验证【只对用户名和密码进行验证,权限依旧存储在本地数据库中】。 
  
将配置文件中的如下部分删除: 
  <!-- 认证管理器,使用自定义的UserDetailsService,并对密码采用md5加密-->  
  </div>
                                </li>
                                <li><a href="/article/3125.htm"
                                       title="mac mysql 修改密码" target="_blank">mac mysql 修改密码</a>
                                    <span class="text-muted">IXHONG</span>
<a class="tag" taget="_blank" href="/search/mysql/1.htm">mysql</a>
                                    <div>$ sudo /usr/local/mysql/bin/mysqld_safe –user=root & //启动MySQL(也可以通过偏好设置面板来启动)$ sudo /usr/local/mysql/bin/mysqladmin -uroot password yourpassword //设置MySQL密码(注意,这是第一次MySQL密码为空的时候的设置命令,如果是修改密码,还需在-</div>
                                </li>
                                <li><a href="/article/3252.htm"
                                       title="设计模式--抽象工厂模式" target="_blank">设计模式--抽象工厂模式</a>
                                    <span class="text-muted">kerryg</span>
<a class="tag" taget="_blank" href="/search/%E8%AE%BE%E8%AE%A1%E6%A8%A1%E5%BC%8F/1.htm">设计模式</a>
                                    <div>抽象工厂模式: 
 
    工厂模式有一个问题就是,类的创建依赖于工厂类,也就是说,如果想要拓展程序,必须对工厂类进行修改,这违背了闭包原则。我们采用抽象工厂模式,创建多个工厂类,这样一旦需要增加新的功能,直接增加新的工厂类就可以了,不需要修改之前的代码。 
 
    总结:这个模式的好处就是,如果想增加一个功能,就需要做一个实现类,</div>
                                </li>
                                <li><a href="/article/3379.htm"
                                       title="评"高中女生军训期跳楼”" target="_blank">评"高中女生军训期跳楼”</a>
                                    <span class="text-muted">nannan408</span>

                                    <div>   首先,先抛出我的观点,各位看官少点砖头。那就是,中国的差异化教育必须做起来。 
   孔圣人有云:有教无类。不同类型的人,都应该有对应的教育方法。目前中国的一体化教育,不知道已经扼杀了多少创造性人才。我们出不了爱迪生,出不了爱因斯坦,很大原因,是我们的培养思路错了,我们是第一要“顺从”。如果不顺从,我们的学校,就会用各种方法,罚站,罚写作业,各种罚。军</div>
                                </li>
                                <li><a href="/article/3506.htm"
                                       title="scala如何读取和写入文件内容?" target="_blank">scala如何读取和写入文件内容?</a>
                                    <span class="text-muted">qindongliang1922</span>
<a class="tag" taget="_blank" href="/search/java/1.htm">java</a><a class="tag" taget="_blank" href="/search/jvm/1.htm">jvm</a><a class="tag" taget="_blank" href="/search/scala/1.htm">scala</a>
                                    <div>直接看如下代码: 
 
package file

import java.io.RandomAccessFile
import java.nio.charset.Charset

import scala.io.Source
import scala.reflect.io.{File, Path}

/**
 * Created by qindongliang on 2015/</div>
                                </li>
                                <li><a href="/article/3633.htm"
                                       title="C语言算法之百元买百鸡" target="_blank">C语言算法之百元买百鸡</a>
                                    <span class="text-muted">qiufeihu</span>
<a class="tag" taget="_blank" href="/search/c/1.htm">c</a><a class="tag" taget="_blank" href="/search/%E7%AE%97%E6%B3%95/1.htm">算法</a>
                                    <div>中国古代数学家张丘建在他的《算经》中提出了一个著名的“百钱买百鸡问题”,鸡翁一,值钱五,鸡母一,值钱三,鸡雏三,值钱一,百钱买百鸡,问翁,母,雏各几何? 
代码如下: 
#include <stdio.h>
int main()
{
	int cock,hen,chick;                               /*定义变量为基本整型*/
	for(coc</div>
                                </li>
                                <li><a href="/article/3760.htm"
                                       title="Hadoop集群安全性:Hadoop中Namenode单点故障的解决方案及详细介绍AvatarNode" target="_blank">Hadoop集群安全性:Hadoop中Namenode单点故障的解决方案及详细介绍AvatarNode</a>
                                    <span class="text-muted">wyz2009107220</span>
<a class="tag" taget="_blank" href="/search/NameNode/1.htm">NameNode</a>
                                    <div>正如大家所知,NameNode在Hadoop系统中存在单点故障问题,这个对于标榜高可用性的Hadoop来说一直是个软肋。本文讨论一下为了解决这个问题而存在的几个solution。 
1. Secondary NameNode 
原理:Secondary NN会定期的从NN中读取editlog,与自己存储的Image进行合并形成新的metadata image 
优点:Hadoop较早的版本都自带,</div>
                                </li>
                </ul>
            </div>
        </div>
    </div>

<div>
    <div class="container">
        <div class="indexes">
            <strong>按字母分类:</strong>
            <a href="/tags/A/1.htm" target="_blank">A</a><a href="/tags/B/1.htm" target="_blank">B</a><a href="/tags/C/1.htm" target="_blank">C</a><a
                href="/tags/D/1.htm" target="_blank">D</a><a href="/tags/E/1.htm" target="_blank">E</a><a href="/tags/F/1.htm" target="_blank">F</a><a
                href="/tags/G/1.htm" target="_blank">G</a><a href="/tags/H/1.htm" target="_blank">H</a><a href="/tags/I/1.htm" target="_blank">I</a><a
                href="/tags/J/1.htm" target="_blank">J</a><a href="/tags/K/1.htm" target="_blank">K</a><a href="/tags/L/1.htm" target="_blank">L</a><a
                href="/tags/M/1.htm" target="_blank">M</a><a href="/tags/N/1.htm" target="_blank">N</a><a href="/tags/O/1.htm" target="_blank">O</a><a
                href="/tags/P/1.htm" target="_blank">P</a><a href="/tags/Q/1.htm" target="_blank">Q</a><a href="/tags/R/1.htm" target="_blank">R</a><a
                href="/tags/S/1.htm" target="_blank">S</a><a href="/tags/T/1.htm" target="_blank">T</a><a href="/tags/U/1.htm" target="_blank">U</a><a
                href="/tags/V/1.htm" target="_blank">V</a><a href="/tags/W/1.htm" target="_blank">W</a><a href="/tags/X/1.htm" target="_blank">X</a><a
                href="/tags/Y/1.htm" target="_blank">Y</a><a href="/tags/Z/1.htm" target="_blank">Z</a><a href="/tags/0/1.htm" target="_blank">其他</a>
        </div>
    </div>
</div>
<footer id="footer" class="mb30 mt30">
    <div class="container">
        <div class="footBglm">
            <a target="_blank" href="/">首页</a> -
            <a target="_blank" href="/custom/about.htm">关于我们</a> -
            <a target="_blank" href="/search/Java/1.htm">站内搜索</a> -
            <a target="_blank" href="/sitemap.txt">Sitemap</a> -
            <a target="_blank" href="/custom/delete.htm">侵权投诉</a>
        </div>
        <div class="copyright">版权所有 IT知识库 CopyRight © 2000-2050 E-COM-NET.COM , All Rights Reserved.
<!--            <a href="https://beian.miit.gov.cn/" rel="nofollow" target="_blank">京ICP备09083238号</a><br>-->
        </div>
    </div>
</footer>
<!-- 代码高亮 -->
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shCore.js"></script>
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shLegacy.js"></script>
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shAutoloader.js"></script>
<link type="text/css" rel="stylesheet" href="/static/syntaxhighlighter/styles/shCoreDefault.css"/>
<script type="text/javascript" src="/static/syntaxhighlighter/src/my_start_1.js"></script>





</body>

</html>