Web Scraping, Lecture 2: The urllib and requests Modules

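Contents

  • I. The urllib module
    • 1. What is the urllib module?
      • 1. The urllib.request module
      • 2. The response object
      • 3. The urllib.parse module
      • 5. Exercise 1: Search Baidu for user-supplied input and save the page
      • 6. Exercise 2: Search Baidu Tieba for user-supplied input and save multiple pages
      • 7. Refactoring the code
  • II. The requests module
    • 1. Installation
    • 2. Common requests methods
    • 3. Attributes and methods of the response object
    • 4. Sending POST requests with requests
    • 5. Setting proxies in requests
    • 6. Handling untrusted SSL certificates
    • 7. Cookies
    • 8. Sessions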

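I. The urllib module

1. What is the urllib module?

urllib is Python's built-in package for making network requests.
Why learn it?
1. Some older scraping projects are built on it.
2. Some scraping tasks need requests and urllib used together.
3. It is part of the standard library, so there is nothing to install.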

Example 1

# Save the "future car" image to disk with requests
import requests

# .content is the raw response body as bytes
image_bytes = requests.get(
    'https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fphotocdn.sohu.com%2F20120823%2FImg351337268.jpg&refer=http%3A%2F%2Fphotocdn.sohu.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1621479722&t=5cda4533dad4056d5809bf5f2450a22f').content
with open('future_car.jpg', 'wb') as f:
    f.write(image_bytes)


Example 2

# Save the same image to disk with urllib's urlretrieve
from urllib.request import urlretrieve

# urlretrieve downloads straight to a file and returns a (filename, headers) tuple
filename, headers = urlretrieve(
    'https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fphotocdn.sohu.com%2F20120823%2FImg351337268.jpg&refer=http%3A%2F%2Fphotocdn.sohu.com&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1621479722&t=5cda4533dad4056d5809bf5f2450a22f',
    'future_car.jpg')


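1. The urllib.request module

Python 2: urllib2 and urllib
Python 3: urllib and urllib2 were merged into the urllib package. Commonly used calls:

  • urllib.request.urlopen("url"): sends a request to the site and returns the response
  • byte_data = response.read()
  • text = response.read().decode("utf-8")
  • urllib.request.Request("url", headers=dict): builds a request object with custom headers; this is needed because urlopen() called with a plain URL string cannot set a custom User-Agent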
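
As a minimal sketch of the last bullet, the helper below builds a Request that carries a custom User-Agent and inspects it without sending anything. The function name build_request and the example.com URL are assumptions made for illustration; Request, get_header, full_url and get_method come from the standard urllib.request module.

```python
import urllib.request


def build_request(url, user_agent='Mozilla/5.0'):
    """Return a Request object that carries a custom User-Agent header."""
    headers = {'User-Agent': user_agent}
    return urllib.request.Request(url, headers=headers)


req = build_request('https://www.example.com/')
# urllib stores header names via str.capitalize(), hence 'User-agent'
print(req.get_header('User-agent'))  # Mozilla/5.0
print(req.full_url)                  # https://www.example.com/
print(req.get_method())              # GET, because no data was attached
```

Passing the object to urllib.request.urlopen(req) would then send the request with that header.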

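Example 1

# Using urllib.request
# urllib.request.urlopen('url')
# Purpose: send a request to the site and get the response
import urllib.request

response = urllib.request.urlopen('https://www.baidu.com/')
print(type(response))  # <class 'http.client.HTTPResponse'>
print(response.read())
'''
b'\r\n\r\n\t\r\n\r\n\r\n\t\r\n\r\n'
'''   # 1. The result is bytes and must be decoded  2. The data is incomplete (the site applies anti-scraping checks), so a User-Agent header must be added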

Example 2

import urllib.request

headers = {'User-Agent': 'Mozilla/5.0'}
req = urllib.request.Request('https://www.baidu.com/', headers=headers)
response = urllib.request.urlopen(req)
print(response.read().decode('utf-8'))
# With the User-Agent set, the full page is returned

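2. The response object

  • read(): reads the body of the server's response
  • getcode(): returns the HTTP status code
  • geturl(): returns the URL the data actually came from (useful for detecting redirects)

Example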

import urllib.request

# Send a request to the given URL and get back the server's response (a file-like object)
url = "http://www.baidu.com"
# Percent-encode
newUrl2 = urllib.request.quote(url)
print(newUrl2)  # http%3A//www.baidu.com
# Decode
newUrl1 = urllib.request.unquote(newUrl2)
print(newUrl1)  # http://www.baidu.com

response = urllib.request.urlopen(newUrl1)
data = response.read()
print(data)
'''
b'\n\n\n    
'''
# Return the response headers
print(response.info())
'''
Bdpagetype: 1
Bdqid: 0x0932eb0001c8f
Cache-Control: private
Content-Type: text/html;charset=utf-8
Date: Thu, 22 Apr 2021 02:28:48 GMT
Expires: Thu, 22 Apr 2021 02:27:56 GMT
P3p: CP=" OTI DSP COR IVA OUR IND COM "
P3p: CP=" OTI DSP COR IVA OUR IND COM "
Server: BWS/1.1
Set-Cookie: BAIDUID=1153AD40DBAF90F5435353FC10B:FG=1; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=214453447; path=/; domain=.baidu.com
Set-Cookie: BIDUPSID=1153AD40DBAF90F5432D31787A9FC10B; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: PSTM=163534548; expires=Thu, 31-Dec-37 23:55:55 GMT; max-age=2147483647; path=/; domain=.baidu.com
Set-Cookie: BAIDUID=1153AD35DBAF90F65468EA7EB4533595:FG=1; max-age=345435300; expires=Fri, 22-Apr-22 02:28:48 GMT; domain=.baidu.com; path=/; version=1; comment=bd
Set-Cookie: BDSVRTM=0; path=/
Set-Cookie: BD_HOME=1; path=/
Set-Cookie: H_PS_PSSID=33345_3345_315353_3364_3465_265740_28657; path=/; domain=.baidu.com
Traceid: 1745674674688087589780678584646519
Vary: Accept-Encoding
Vary: Accept-Encoding
X-Ua-Compatible: IE=Edge,chrome=1
Connection: close
Transfer-Encoding: chunked'''
# Return the status code
print(response.getcode())  # 200
# if response.getcode() == 200 or response.getcode() == 304:
#     # process the page
#     pass

# Return the URL that was actually fetched
print(response.geturl())  # http://www.baidu.com
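
The three response methods can be tried without touching a real site. The sketch below starts a throwaway HTTP server on localhost and fetches from it; HelloHandler and fetch_summary are hypothetical names, and the empty ProxyHandler is an assumption added only so that a proxy configured in the environment does not intercept the local request.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny handler that always answers with a short UTF-8 page."""

    def do_GET(self):
        body = '你好, urllib'.encode('utf-8')
        self.send_response(200)
        self.send_header('Content-Type', 'text/html; charset=utf-8')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def fetch_summary():
    """Fetch the local page and collect what the response object exposes."""
    server = HTTPServer(('127.0.0.1', 0), HelloHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = 'http://127.0.0.1:%d/' % server.server_port
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as response:
            return {
                'code': response.getcode(),
                'url': response.geturl(),
                'content_type': response.info().get('Content-Type'),
                'text': response.read().decode('utf-8'),
            }
    finally:
        server.shutdown()
        server.server_close()


summary = fetch_summary()
print(summary)
```

read() yields bytes, so the decode step is what turns the body back into readable text.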

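3. The urllib.parse module

Common functions

  • urlencode(dict): encodes a dict into a query string
  • quote(string): percent-encodes a single string (the argument is a str, not a dict)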
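
A short sketch contrasting the two functions, using only the standard library. The pn value of 50 is an arbitrary illustration.

```python
from urllib.parse import quote, unquote, urlencode

# quote() percent-encodes one string; each Chinese character becomes three %XX groups
encoded = quote('羽哥')
print(encoded)  # %E7%BE%BD%E5%93%A5

# unquote() reverses the encoding
decoded = unquote(encoded)
print(decoded)  # 羽哥

# urlencode() turns a dict into a complete query string, quoting every value
query = urlencode({'kw': '羽哥', 'pn': 50})
print(query)  # kw=%E7%BE%BD%E5%93%A5&pn=50
```

urlencode is the better fit when there are several parameters, since it also inserts the = and & separators.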

Example 1

import urllib.request

# Percent-encoding: a Chinese character is 3 bytes in UTF-8, so it becomes three %XX groups
url = 'https://tieba.baidu.com/f?fr=wwwt&ie=utf-8&kw=%E7%BE%BD%E5%93%A5'
url1 = 'https://tieba.baidu.com/f?fr=wwwt&ie=utf-8&kw=羽哥'

# res = urllib.request.urlopen(url1)
# When urllib requests a URL that contains Chinese characters, they must first be converted to percent-encoded hex bytes: %E7%BE%BD%E5%93%A5
import urllib.parse

# Tieba expects the keyword under 'kw', matching the URLs above
params = {'kw': '羽哥'}
result = urllib.parse.urlencode(params)
print(result)  # kw=%E7%BE%BD%E5%93%A5
new_url = 'https://tieba.baidu.com/f?fr=wwwt&ie=utf-8&' + result

Example 2

import urllib.request
url2 = "https://tieba.baidu.com/f?kw=%E7%BE%BD%E5%93%A5"

# Decode
newUrl = urllib.request.unquote(url2)
print(newUrl)
'''
https://tieba.baidu.com/f?kw=羽哥'''
# Encode: with the default safe='/', the ':', '?' and '=' are escaped as well
newUrl2 = urllib.request.quote(newUrl)
print(newUrl2)
'''
https%3A//tieba.baidu.com/f%3Fkw%3D%E7%BE%BD%E5%93%A5'''
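
The output above shows that quoting a whole URL also escapes its delimiters. One way around this, sketched below, is the safe parameter of quote(); the particular set of characters passed to safe is a choice made for this example.

```python
from urllib.parse import quote

url = 'https://tieba.baidu.com/f?kw=羽哥'

# Default safe='/' escapes ':', '?' and '=' too, which breaks the URL
broken = quote(url)
print(broken)  # https%3A//tieba.baidu.com/f%3Fkw%3D%E7%BE%BD%E5%93%A5

# Listing the URL delimiters in `safe` leaves them untouched
fixed = quote(url, safe=':/?=&')
print(fixed)  # https://tieba.baidu.com/f?kw=%E7%BE%BD%E5%93%A5
```

Quoting only the parameter value and concatenating it onto the base URL, as Example 1 does with urlencode, avoids the problem altogether.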

Case study 1: Scraping Honor of Kings HD wallpapers

# Scrape Honor of Kings HD wallpapers
# Page analysis:
# Home page: https://pvp.qq.com/web201605/wallpaper.shtml
# First page: https://pvp.qq.com/web201605/wallpaper.shtml
# Last page: https://pvp.qq.com/web201605/wallpaper.shtml  (same URL, so the next page of images is loaded dynamically)
# Each page shows 4*5=20 images; there are 25 pages (page 25 has 6 images), so 20*24+6=486 HD images in total
# Image 1: http://shp.qpic.cn/ishow/2735042018/1618915966_84828260_2160_sProdImgNo_7.jpg/0
# Image 2: http://shp.qpic.cn/ishow/2735041519/1618485631_84828260_22420_sProdImgNo_7.jpg/0
# Image 3: http://shp.qpic.cn/ishow/2735040920/1617970550_84828260_22886_sProdImgNo_7.jpg/0
# Image 486: http://shp.qpic.cn/ishow/2735122518/1545733077_-888937974_7302_sProdImgNo_7.jpg/0
# No_2 means 1024x768, No_5 means 1440x900, No_7 means 1920x1200; image URLs end in .jpg/0
# Data loaded by JS for page 1: https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0&page=0&iOrder=0&iSortNumClose=1&jsoncallback=jQuery17103347171427601099_1619236609881&iAMSActivityId=51991&_everyRead=true&iTypeId=2&iFlowId=267733&iActId=2735&iModuleId=2735&_=1619236924589
# Data loaded by JS for page 2: https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0&page=1&iOrder=0&iSortNumClose=1&jsoncallback=jQuery17103347171427601099_1619236609882&iAMSActivityId=51991&_everyRead=true&iTypeId=2&iFlowId=267733&iActId=2735&iModuleId=2735&_=1619237018804
# Data loaded by JS for page 25: https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0&page=24&iOrder=0&iSortNumClose=1&jsoncallback=jQuery17103347171427601099_1619236609886&iAMSActivityId=51991&_everyRead=true&iTypeId=2&iFlowId=267733&iActId=2735&iModuleId=2735&_=1619237111998
import requests
from requests.packages.urllib3.exceptions import InsecureRequestWarning
import re
import urllib.parse


class WangzheSpider:
    def __init__(self):
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36'}

    def get_url(self):
        urls = []
        for i in range(25):
            url = 'https://apps.game.qq.com/cgi-bin/ams/module/ishow/V1.0/query/workList_inc.cgi?activityId=2735&sVerifyCode=ABCD&sDataType=JSON&iListNum=20&totalpage=0&page={}&iOrder=0&iSortNumClose=1&jsoncallback=jQuery17103347171427601099&iAMSActivityId=51991&_everyRead=true&iTypeId=2&iFlowId=267733&iActId=2735'.format(
                i)
            urls.append(url)
        return urls

    def req_page(self, url):
        # verify=False skips TLS certificate verification; returns the body as bytes
        response = requests.get(url, verify=False, headers=self.headers).content
        return response

    def write_page(self, image_url, filename):
        print('Saving %s' % filename)
        image_bytes = self.req_page(image_url)
        with open(filename, 'wb') as f:
            f.write(image_bytes)
        print('%s saved' % filename)

    def main(self):
        # Note: the ./image directory must already exist
        urls = self.get_url()
        hero_names = []
        hero_images = []
        image_pattern = re.compile(r'"sProdImgNo_7":"(.*?)",', re.S)
        name_pattern = re.compile(r'"sProdName":"(.*?)",', re.S)
        for page_url in urls:
            response = self.req_page(page_url).decode('utf-8')
            # The values in the response are percent-encoded, so unquote them
            hero_images += [urllib.parse.unquote(item) for item in image_pattern.findall(response)]
            hero_names += [urllib.parse.unquote(item) for item in name_pattern.findall(response)]
        # print(hero_names, hero_images)
        for name, image_url in zip(hero_names, hero_images):
            # Switch from the sProdImgNo_7 .../200 URL to the sProdImgNo_5 .../0 URL
            full_url = image_url.replace('7.jpg/200', '5.jpg/0')
            filename = './image/%s.jpg' % name
            self.write_page(full_url, filename)


if __name__ == '__main__':
    requests.packages.urllib3.disable_warnings(InsecureRequestWarning)
    spider = WangzheSpider()
    spider.main()
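
The spider above pulls fields out of the response with regular expressions. Since the URL carries a jsoncallback parameter, the response is presumably JSONP, so an alternative is to strip the callback wrapper and parse the JSON. The sketch below is an assumption-laden illustration: the sample payload, the 'List' key and the helper names make_sample, parse_jsonp and extract_wallpapers are all hypothetical, and the real response may be shaped differently. Only the field names sProdName and sProdImgNo_7 come from the code above.

```python
import json
import re
from urllib.parse import quote, unquote


def make_sample():
    """Build a hypothetical JSONP payload with percent-encoded values."""
    item = {
        'sProdName': quote('demo name'),
        'sProdImgNo_7': quote('http://shp.qpic.cn/ishow/demo_sProdImgNo_7.jpg/200', safe=''),
    }
    return 'jQuery17103347171427601099(' + json.dumps({'List': [item]}) + ')'


def parse_jsonp(text):
    """Strip the callback wrapper name(...) and parse the JSON inside."""
    match = re.search(r'^[^(]*\((.*)\)\s*;?\s*$', text, re.S)
    if match is None:
        raise ValueError('not a JSONP payload')
    return json.loads(match.group(1))


def extract_wallpapers(text):
    """Return (name, image_url) pairs with the percent-encoding removed."""
    payload = parse_jsonp(text)
    return [(unquote(item['sProdName']), unquote(item['sProdImgNo_7']))
            for item in payload['List']]


pairs = extract_wallpapers(make_sample())
print(pairs)
```

Parsing the JSON keeps each name paired with its own image URL, whereas two separate regex passes rely on both lists having the same order and length.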

Sending a POST request

Example: a simple translation tool

# Goal: a simple translation tool
import urllib.request
import urllib.parse
import json
# Ask for the text to translate
content = input("Enter the text to translate: ")
# Target URL
# url = 'https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'  # the '_o' must be removed, otherwise the server returns {"errorCode":50}
url = 'https://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
headers = {'User-Agent': 'omitted'}
# Form data to send
data = {
    'i': content,
    'from': 'AUTO',
    'to': 'AUTO',
    'smartresult': 'dict',
    'client': 'fanyideskweb',
    'salt': '16100656169337',
    'sign': '94a4176fmdr3lc0rcb9e8r410ra3a158',
    'lts': '1619064616935',
    'bv': 'b77c859e9me719deegcc45c5s42bda',
    'doctype': 'json',
    'version': '2.1',
    'keyfrom': 'fanyi.web',
    'action': 'FY_BY_REALTlME',
}
# urlencode the form and convert it to bytes; supplying data= makes urllib send a POST
data = urllib.parse.urlencode(data)
data = bytes(data, 'utf-8')
req = urllib.request.Request(url, data=data, headers=headers)
res = urllib.request.urlopen(req)
html = res.read().decode('utf-8')
# print(html)
'''{"type":"ZH_CN2EN","errorCode":0,"elapsedTime":0,"translateResult":[[{"src":"你好","tgt":"hello"}]]}'''  # This is a JSON-formatted string.
# Parse the data
# JSON string --> Python dict
r_dict = json.loads(html)
print(r_dict['translateResult'][0][0]['tgt'])
'''
Enter the text to translate: 你好
hello'''
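
The step that turns the request into a POST is easy to miss: urllib switches methods as soon as a bytes body is attached. A minimal sketch that checks this without sending anything; build_post and the example.com URL are assumptions made for illustration.

```python
import urllib.parse
import urllib.request


def build_post(url, form):
    """Encode a form dict and attach it to a Request; urllib then uses POST."""
    body = urllib.parse.urlencode(form).encode('utf-8')
    return urllib.request.Request(url, data=body, headers={'User-Agent': 'Mozilla/5.0'})


req = build_post('https://www.example.com/translate', {'i': '你好', 'doctype': 'json'})
print(req.get_method())  # POST
print(req.data)          # b'i=%E4%BD%A0%E5%A5%BD&doctype=json'
```

The body must be bytes: passing the str returned by urlencode directly raises a TypeError when the request is sent.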

5. Exercise 1: Search Baidu for user-supplied input and save the page

import urllib.parse
import urllib.request
# url = "https://www.baidu.com/s?wd=%E7%BE%BD%E5%93%A5"
# Build the URL
key = input("Enter the search term: ")
wd = {'wd': key}
result = urllib.parse.urlencode(wd)  # percent-encode
url = 'https://www.baidu.com/s?' + result
# Create the request object
headers = {'User-Agent': 'omitted'}
req = urllib.request.Request(url, headers=headers)
# Get the response object
response = urllib.request.urlopen(req)
# Read the data
html = response.read().decode('utf-8')
# Save the data
with open('%s.html' % key, 'w', encoding='utf-8') as f:
    f.write(html)

6. Exercise 2: Search Baidu Tieba for user-supplied input and save multiple pages

# Baidu Tieba exercise
# Read the name of the Tieba forum to scrape
# Page through the results from a start page to an end page
# Save the data

import urllib.parse
import urllib.request

# 1. Analyze the pages:
'''
Page 1: https://tieba.baidu.com/f?kw=%E5%92%8C%E5%B9%B3%E7%B2%BE%E8%8B%B1&ie=utf-8&pn=0
Page 2: https://tieba.baidu.com/f?kw=%E5%92%8C%E5%B9%B3%E7%B2%BE%E8%8B%B1&ie=utf-8&pn=50
Last page: https://tieba.baidu.com/f?kw=%E5%92%8C%E5%B9%B3%E7%B2%BE%E8%8B%B1&ie=utf-8&pn=506500
10131 pages in total'''
# url = "https://tieba.baidu.com/f?"
# 2. Build the URL
name = input("Enter the Tieba forum name to search: ")
begin = int(input("Enter the start page: "))
end = int(input("Enter the end page: "))
kw = {'kw': name}
result = urllib.parse.urlencode(kw)
headers = {'User-Agent': 'omitted'}
for i in range(begin, end + 1):
    pn = (i - 1) * 50  # pn advances by 50 per page, so page i starts at (i - 1) * 50
    url = 'https://tieba.baidu.com/f?' + result + '&pn=' + str(pn)  # &ie=utf-8 can be omitted
    # 3. Create the request object
    req = urllib.request.Request(url, headers=headers)
    # 4. Get the response object
    res = urllib.request.urlopen(req, timeout=20)
    # Read the data
    html = res.read().decode('utf-8')
    # Save the data
    print('Scraping page_%d.html' % i)
    with open('page_%d.html' % i, 'w', encoding='utf-8') as f:
        f.write(html)
'''
Enter the Tieba forum name to search: 和平精英
Enter the start page: 1
Enter the end page: 3
Scraping page_1.html
Scraping page_2.html
Scraping page_3.html
'''
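
The paging rule can be isolated into a small pure function, which makes it easy to check before any request is sent. This is a sketch: tieba_page_urls and its page_size parameter are hypothetical names, and the step of 50 is taken from the page analysis above.

```python
from urllib.parse import urlencode


def tieba_page_urls(name, begin, end, page_size=50):
    """Return the list URLs for pages begin..end (1-based, inclusive)."""
    base_url = 'https://tieba.baidu.com/f?'
    urls = []
    for page in range(begin, end + 1):
        # Page 1 starts at pn=0, page 2 at pn=50, and so on
        query = urlencode({'kw': name, 'pn': (page - 1) * page_size})
        urls.append(base_url + query)
    return urls


urls = tieba_page_urls('和平精英', 1, 3)
for u in urls:
    print(u)
```

Letting urlencode build the whole query, pn included, avoids assembling the string by hand.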

7. Refactoring the code

# Exercise: search Baidu Tieba for user-supplied input and save multiple pages
import urllib.parse
import urllib.request


class BaiduSpider:
    def __init__(self):
        self.headers = {'User-Agent': 'omitted'}
        self.base_url = 'https://tieba.baidu.com/f?'

    def readPage(self, url):
        # Send the request and return the decoded HTML
        req = urllib.request.Request(url, headers=self.headers)
        res = urllib.request.urlopen(req, timeout=20)
        html = res.read().decode('utf-8')
        return html

    def writePage(self, filename, html):
        # Note: the target directory must already exist
        with open(filename, 'w', encoding='utf-8') as f:
            f.write(html)
            print('Saved')

    def main(self):
        name = input("Enter the Tieba forum name to search: ")
        begin = int(input("Enter the start page: "))
        end = int(input("Enter the end page: "))
        kw = {'kw': name}
        result = urllib.parse.urlencode(kw)
        for i in range(begin, end + 1):
            pn = (i - 1) * 50
            url = self.base_url + result + '&pn=' + str(pn)
            html = self.readPage(url)
            filename = './file/page_%d.html' % i
            self.writePage(filename, html)


if __name__ == '__main__':
    spider = BaiduSpider()
    spider.main()
'''
Enter the Tieba forum name to search: 法拉利
Enter the start page: 5
Enter the end page: 8
Saved
Saved
Saved
Saved'''

II. The requests module

1. Installation

  • pip install requests
  • or install it through your IDE's package manager

2. Common requests methods

  • requests.get(url)

Example 1

import requests
r = requests.get('http://www.baidu.com/').text
print(r)  # prints the page content

Example 2

import requests

'''
response = requests.get(url, headers=headers)
1. url is the base URL, without query parameters
2. The key-value pairs in params become the query parameters
response = requests.get(url, params=params, headers=headers)
'''
# Case 1
# https://tieba.baidu.com/f?kw=%E6%B5%B7%E8%B4%BC%E7%8E%8B
url = 'https://tieba.baidu.com/f?'
params = {'kw': '海贼王', 'pn': '250'}
headers = {'User-Agent': 'omitted'}
response1 = requests.get(url, params=params, headers=headers, verify=False)
# print(response1.text)  # returns the HTML of page 6 of the 海贼王 (One Piece) Tieba forum

# Case 2
# https://tieba.baidu.com/f?kw=%E6%B5%B7%E8%B4%BC%E7%8E%8B
url = 'https://tieba.baidu.com/f?kw=海贼王&pn=250'
headers = {'User-Agent': 'omitted'}
response2 = requests.get(url, headers=headers, verify=False)
# print(response2.text)  # returns the HTML of page 6 of the 海贼王 (One Piece) Tieba forum

# Case 3
url = 'https://qq.yh31.com/zjbq/2920180.html'
headers = {'User-Agent': 'omitted'}
response3 = requests.get(url, headers=headers, verify=False)
# print(response3.text)  # the text is garbled: the <title> comes out as mojibake such as "喜羊羊QQ表æƒ..." instead of readable Chinese

# print(response3.content.decode('utf-8'))  # returns the HTML correctly
'''
response.content is the raw bytes fetched from the site, with no processing applied
response.text is response.content decoded to str by requests
requests guesses which encoding to decode with

If the text comes out garbled:
Option 1
response.content.decode('utf-8')

Option 2
response3.encoding = 'utf-8'
print(response3.text)
'''
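
To see what the params argument from Example 2 does to the URL, the request can be prepared without being sent. This is a sketch using requests.Request and its prepare() method; nothing goes over the network.

```python
import requests

# Build the request without sending it, to inspect how params are encoded
prepared = requests.Request(
    'GET',
    'https://tieba.baidu.com/f',
    params={'kw': '海贼王', 'pn': '250'},
).prepare()
print(prepared.url)  # https://tieba.baidu.com/f?kw=%E6%B5%B7%E8%B4%BC%E7%8E%8B&pn=250
```

The Chinese keyword is percent-encoded automatically, which is why no manual call to urllib.parse is needed when using params.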

3. Attributes and methods of the response object

  • response.text: the body as a Unicode string (str)
  • response.content: the body as raw bytes
  • response.content.decode('utf-8'): decodes the bytes manually
  • response.url: the URL of the response
  • response.encoding = 'utf-8' followed by print(response.text): sets the encoding that .text decodes with

Example

import requests

r = requests.get('http://www.baidu.com')
print(r.text)  # the Chinese text is garbled because the wrong encoding was guessed
'''
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta 
...
...
href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a> äº¬ICP证030173号  <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
'''
print(r.content)  # raw bytes
'''
b'<!DOCTYPE html>\r\n<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta content=always name=referrer><link rel=stylesheet type=text/css href=http://s1.bdstatic.com/r/www/cache/bdorz/baidu.min
...
...
src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>\r\n'
'''
print(r.content.decode('utf-8'))  # decoded manually, so the text is readable
'''
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta 
...
...
Baidu <a href=http://www.baidu.com/duty/>使用百度前必读</a>  <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a> 京ICP证030173号  <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
'''
print(r.url)  # http://www.baidu.com/

r.encoding = 'utf-8'
print(r.text)  # readable now that the encoding is set
'''
<!DOCTYPE html>
<!--STATUS OK--><html> <head><meta http-equiv=content-type content=text/html;charset=utf-8><meta http-equiv=X-UA-Compatible content=IE=Edge><meta 
...
...
Baidu <a href=http://www.baidu.com/duty/>使用百度前必读</a>  <a href=http://jianyi.baidu.com/ class=cp-feedback>意见反馈</a> 京ICP证030173号  <img src=//www.baidu.com/img/gs.gif> </p> </div> </div> </div> </body> </html>
'''
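
The garbled "äº¬" in the first output can be reproduced with plain Python, which shows what goes wrong: UTF-8 bytes are being decoded with a single-byte charset. The sketch below assumes the wrong charset is ISO-8859-1, which is consistent with the output above.

```python
raw = '京'.encode('utf-8')          # the three bytes a UTF-8 page actually sends
print(raw)                          # b'\xe4\xba\xac'

wrong = raw.decode('iso-8859-1')    # each byte is treated as one character
print(wrong)                        # äº¬

right = raw.decode('utf-8')         # the correct charset restores the character
print(right)                        # 京

# Text that was already decoded wrongly can be repaired by reversing the step
repaired = wrong.encode('iso-8859-1').decode('utf-8')
print(repaired)                     # 京
```

Setting r.encoding = 'utf-8' before reading r.text has the same effect as the correct decode here.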

4. Sending POST requests with requests

Example 1

import requests

r = requests.post('http://httpbin.org/post', data={'key': 'value'}).text
print(r)  # prints the response body
'''
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "key": "value"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Content-Length": "9", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.25.1", 
    "X-Amzn-Trace-Id": "Root=1-6083f17f-02c7589e52e792b40c5960af"
  }, 
  "json": null, 
  "origin": "omitted", 
  "url": "http://httpbin.org/post"
}'''
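
The Content-Type and Content-Length values echoed back by httpbin come from how requests encodes the data dict. A sketch that prepares the same request locally, without sending it, makes this visible:

```python
import requests

prepared = requests.Request(
    'POST', 'http://httpbin.org/post', data={'key': 'value'}
).prepare()

print(prepared.body)                       # key=value
print(prepared.headers['Content-Type'])    # application/x-www-form-urlencoded
print(prepared.headers['Content-Length'])  # 9
```

Unlike urllib, requests accepts the dict directly and handles both the URL-encoding and the conversion to a request body.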

Example 2: a simple translation tool

# A simple translation tool
import requests
import json

# Ask for the text to translate
content = input("Enter the text to translate: ")
# Target URL
# url = 'https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'  # the '_o' must be removed, otherwise the server returns {"errorCode":50}
url = 'https://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
headers = {'User-Agent': 'Mozilla/5.0'}
# Form data to send
data = {
    'i': content,
    'from': 'AUTO',
    'to': 'AUTO',
    'smartresult': 'dict',
    'client': 'fanyideskweb',
    'salt': '16190646169357',
    'sign': '94a417e26fdc3c0cb59e843108a3a158',
    'lts': '1619064616935',
    'bv': 'b77c8593ce9e7129dee4cc45ac542b2a',
    'doctype': 'json',
    'version': '2.1',
    'keyfrom': 'fanyi.web',
    'action': 'FY_BY_REALTlME',
}
response = requests.post(url, data=data, headers=headers, verify=False)
html = response.text
# print(html)
'''{"type":"ZH_CN2EN","errorCode":0,"elapsedTime":0,"translateResult":[[{"src":"你好","tgt":"hello"}]]}'''  # This is a JSON-formatted string.
# Parse the data
# JSON string --> Python dict
r_dict = json.loads(html)
print(r_dict['translateResult'][0][0]['tgt'])
'''
Enter the text to translate: 猫
The cat'''

Example 3: a simple translation tool, using JS reverse engineering

Step 1: Identify the URL: https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule
Step 2: Inspect and analyze the parameters carried by the request:

i: 中国
from: AUTO
to: AUTO
smartresult: dict
client: fanyideskweb
salt: 16195811191097
sign: ffc58c4904e3b84538f1a324b00a141d
lts: 1619581119109
bv: b77c8593ce9e7129dee4cc45ac542b2a
doctype: json
version: 2.1
keyfrom: fanyi.web
action: FY_BY_REALTlME

Here i is simply the input text. The three parameters that need to be worked out are:
salt: 16195811191097
sign: ffc58c4904e3b84538f1a324b00a141d
lts: 1619581119109

Double-click fanyi.min.js:1 in the Initiator column, then click {} to pretty-print the file. Press Ctrl+F and search for 'salt'.

First: r = "" + (new Date).getTime()
Paste (new Date).getTime() into the Console to inspect it: it is a 13-digit timestamp, i.e. milliseconds.
Python code that reproduces the timestamp:

import time
r = str(int(time.time()*1000))

Next: i = r + parseInt(10 * Math.random(), 10), where parseInt(10 * Math.random(), 10) is a random integer from 0 to 9.
Reproducing i:

import random
i = random.randint(0, 9)
i = r + str(i)

Finally: sign: n.md5("fanyideskweb" + e + i + "Tbh5E8=q6U3EXe+&L[4c@"). This is an MD5 hash. To find out what e is, set a breakpoint and inspect it: it turns out to be the input text.
Reproducing sign:

import hashlib
def data_new(e):
    str_sign = "fanyideskweb" + e + i + "Tbh5E8=q6U3EXe+&L[4c@"
    md5 = hashlib.md5()
    md5.update(str_sign.encode())
    sign = md5.hexdigest()
    # print(sign)  # e8b710fe24c560f01dbb1f724899bdfd
    data = {
        'i': e,
        'from': 'AUTO',
        'to': 'AUTO',
        'smartresult': 'dict',
        'client': 'fanyideskweb',
        'salt': i,
        'sign': sign,
        'lts': r,
        'bv': 'b77c8593ce9e7129dee4cc45ac542b2a',
        'doctype': 'json',
        'version': '2.1',
        'keyfrom': 'fanyi.web',
        'action': 'FY_BY_REALTlME',
    }
    return data

data = data_new(e)
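
The three derived parameters can also be produced by one function with no global state, which makes the relationship between them easy to check. This is a sketch under the assumptions of the walkthrough above: make_sign_params, now_ms and digit are hypothetical names, and the client string and secret are copied from the JavaScript quoted earlier. Whether the server still accepts this scheme is not verified here.

```python
import hashlib
import random
import time

CLIENT = 'fanyideskweb'
SECRET = 'Tbh5E8=q6U3EXe+&L[4c@'  # constant copied from the walkthrough above


def make_sign_params(text, now_ms=None, digit=None):
    """Return the salt, sign and lts values for the given input text."""
    # lts: 13-digit millisecond timestamp
    lts = str(int(time.time() * 1000)) if now_ms is None else str(now_ms)
    # salt: the timestamp followed by one random digit 0-9
    if digit is None:
        digit = random.randint(0, 9)
    salt = lts + str(digit)
    # sign: MD5 of client + text + salt + secret
    sign = hashlib.md5((CLIENT + text + salt + SECRET).encode('utf-8')).hexdigest()
    return {'salt': salt, 'sign': sign, 'lts': lts}


# Fixed inputs make the result reproducible
params = make_sign_params('中国', now_ms=1619581119109, digit=7)
print(params['lts'])   # 1619581119109
print(params['salt'])  # 16195811191097
print(params['sign'])
```

Calling make_sign_params(text) with no extra arguments uses the current time and a fresh random digit, as the page's JavaScript does.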
  <pre><code class="prism language-python"><span class="token comment"># 简单的翻译小软件 不去掉'_o',进行js逆向</span>
<span class="token comment"># 分析:</span>
<span class="token comment">#     'salt': '16190646169357',</span>
<span class="token comment">#     'sign': '94a417e26fdc3c0cb59e843108a3a158',</span>
<span class="token comment">#     'lts': '1619064616935',</span>
<span class="token keyword">import</span> random
<span class="token keyword">import</span> time
<span class="token keyword">import</span> requests
<span class="token keyword">import</span> json
<span class="token keyword">import</span> hashlib
<span class="token keyword">from</span> requests<span class="token punctuation">.</span>packages<span class="token punctuation">.</span>urllib3<span class="token punctuation">.</span>exceptions <span class="token keyword">import</span> InsecureRequestWarning
requests<span class="token punctuation">.</span>packages<span class="token punctuation">.</span>urllib3<span class="token punctuation">.</span>disable_warnings<span class="token punctuation">(</span>InsecureRequestWarning<span class="token punctuation">)</span>

e <span class="token operator">=</span> <span class="token builtin">input</span><span class="token punctuation">(</span><span class="token string">"请输入您要翻译的内容:"</span><span class="token punctuation">)</span>
r <span class="token operator">=</span> <span class="token builtin">str</span><span class="token punctuation">(</span><span class="token builtin">int</span><span class="token punctuation">(</span>time<span class="token punctuation">.</span>time<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token operator">*</span><span class="token number">1000</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
i <span class="token operator">=</span> random<span class="token punctuation">.</span>randint<span class="token punctuation">(</span><span class="token number">0</span><span class="token punctuation">,</span> <span class="token number">9</span><span class="token punctuation">)</span>
i <span class="token operator">=</span> r <span class="token operator">+</span> <span class="token builtin">str</span><span class="token punctuation">(</span>i<span class="token punctuation">)</span>
<span class="token comment"># print(r,i)</span>
<span class="token comment"># 目标url</span>
url <span class="token operator">=</span> <span class="token string">'https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'</span>
headers <span class="token operator">=</span> <span class="token punctuation">{</span>
    <span class="token string">'User-Agent'</span><span class="token punctuation">:</span> <span class="token string">'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36'</span><span class="token punctuation">,</span>
    <span class="token string">'Referer'</span><span class="token punctuation">:</span> <span class="token string">'https://fanyi.youdao.com/'</span><span class="token punctuation">,</span>
    <span class="token string">'Cookie'</span><span class="token punctuation">:</span> <span class="token string">'your cookie'</span>
    <span class="token punctuation">}</span>
<span class="token comment"># Form data to send with the request</span>
<span class="token keyword">def</span> <span class="token function">data_new</span><span class="token punctuation">(</span>e<span class="token punctuation">)</span><span class="token punctuation">:</span>
    str_sign <span class="token operator">=</span> <span class="token string">"fanyideskweb"</span> <span class="token operator">+</span> e <span class="token operator">+</span> i <span class="token operator">+</span> <span class="token string">"Tbh5E8=q6U3EXe+&L[4c@"</span>
    md5 <span class="token operator">=</span> hashlib<span class="token punctuation">.</span>md5<span class="token punctuation">(</span><span class="token punctuation">)</span>
    md5<span class="token punctuation">.</span>update<span class="token punctuation">(</span>str_sign<span class="token punctuation">.</span>encode<span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span>
    sign <span class="token operator">=</span> md5<span class="token punctuation">.</span>hexdigest<span class="token punctuation">(</span><span class="token punctuation">)</span>
    <span class="token comment"># print(sign)  # e8b710fe24c560f01dbb1f724899bdfd</span>
    data <span class="token operator">=</span> <span class="token punctuation">{</span>
        <span class="token string">'i'</span><span class="token punctuation">:</span> e<span class="token punctuation">,</span>
        <span class="token string">'from'</span><span class="token punctuation">:</span> <span class="token string">'AUTO'</span><span class="token punctuation">,</span>
        <span class="token string">'to'</span><span class="token punctuation">:</span> <span class="token string">'AUTO'</span><span class="token punctuation">,</span>
        <span class="token string">'smartresult'</span><span class="token punctuation">:</span> <span class="token string">'dict'</span><span class="token punctuation">,</span>
        <span class="token string">'client'</span><span class="token punctuation">:</span> <span class="token string">'fanyideskweb'</span><span class="token punctuation">,</span>
        <span class="token string">'salt'</span><span class="token punctuation">:</span> i<span class="token punctuation">,</span>
        <span class="token string">'sign'</span><span class="token punctuation">:</span> sign<span class="token punctuation">,</span>
        <span class="token string">'lts'</span><span class="token punctuation">:</span> r<span class="token punctuation">,</span>
        <span class="token string">'bv'</span><span class="token punctuation">:</span> <span class="token string">'b77c8593ce9e7129dee4cc45ac542b2a'</span><span class="token punctuation">,</span>
        <span class="token string">'doctype'</span><span class="token punctuation">:</span> <span class="token string">'json'</span><span class="token punctuation">,</span>
        <span class="token string">'version'</span><span class="token punctuation">:</span> <span class="token string">'2.1'</span><span class="token punctuation">,</span>
        <span class="token string">'keyfrom'</span><span class="token punctuation">:</span> <span class="token string">'fanyi.web'</span><span class="token punctuation">,</span>
        <span class="token string">'action'</span><span class="token punctuation">:</span> <span class="token string">'FY_BY_REALTlME'</span><span class="token punctuation">,</span>
    <span class="token punctuation">}</span>
    <span class="token keyword">return</span> data

data <span class="token operator">=</span> data_new<span class="token punctuation">(</span>e<span class="token punctuation">)</span>
response <span class="token operator">=</span> requests<span class="token punctuation">.</span>post<span class="token punctuation">(</span>url<span class="token punctuation">,</span> data<span class="token operator">=</span>data<span class="token punctuation">,</span> headers<span class="token operator">=</span>headers<span class="token punctuation">,</span> verify<span class="token operator">=</span><span class="token boolean">False</span><span class="token punctuation">)</span>
html <span class="token operator">=</span> response<span class="token punctuation">.</span>text
<span class="token keyword">print</span><span class="token punctuation">(</span>html<span class="token punctuation">)</span>
<span class="token triple-quoted-string string">'''{"translateResult":[[{"tgt":"The fox","src":"狐狸"}]],"errorCode":0,"type":"zh-CHS2en","smartResult":{"entries":["","[脊椎] fox\r\n"],"type":1}}{"type":"ZH_CN2EN","errorCode":0,"elapsedTime":0,"translateResult":[[{"src":"你好","tgt":"hello"}]]}'''</span>  <span class="token comment"># Two sample responses; each one is a JSON string.</span>
<span class="token comment"># Parse the data</span>
<span class="token comment"># JSON str --> Python dict</span>
r_dict <span class="token operator">=</span> json<span class="token punctuation">.</span>loads<span class="token punctuation">(</span>html<span class="token punctuation">)</span>
<span class="token keyword">print</span><span class="token punctuation">(</span>r_dict<span class="token punctuation">[</span><span class="token string">'translateResult'</span><span class="token punctuation">]</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">[</span><span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">[</span><span class="token string">'tgt'</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
<span class="token triple-quoted-string string">'''
Enter the text to translate: 狐狸
The fox'''</span>

</code></pre> 
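  <p>The signing step in the example above can be pulled out into two small helpers. This is only a sketch of the hashing done in <code>data_new</code>: the names <code>make_salt</code> and <code>make_sign</code> are made up for illustration, and the secret is passed in as a parameter (the value <code>demo-secret</code> below is a placeholder, not the site's key, which may change at any time).</p> 
  <pre><code class="prism language-python">import hashlib
import random
import time


def make_salt():
    """Return (lts, salt): a millisecond timestamp, and the same value plus one random digit."""
    lts = str(int(time.time() * 1000))
    salt = lts + str(random.randint(0, 9))
    return lts, salt


def make_sign(text, salt, secret, client="fanyideskweb"):
    """MD5 hex digest of client + text + salt + secret, mirroring data_new above."""
    raw = client + text + salt + secret
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


lts, salt = make_salt()
# "demo-secret" is a placeholder, not the real key used by the site
sign = make_sign("狐狸", salt, "demo-secret")
print(lts, salt, sign)
</code></pre> 
  <p>Keeping the salt and the sign in separate functions makes it easier to update one of them if the site changes how it builds either value.</p> 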
  <h2>5. Setting a proxy in requests</h2> 
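  <pre><code class="prism language-python"><span class="token comment"># Proxy IPs</span>
<span class="token comment"># When a crawler requests a site too often in a short time (or trips some other check), the site may identify it as a crawler.</span>
<span class="token comment"># At that point the IP has to change, and a proxy IP is the usual way to do it; this is one response to anti-crawling measures.</span>
<span class="token comment"># Purpose: 1. hide the real IP  2. cope with anti-crawling measures</span>
<span class="token comment"># Anonymity levels: 1. transparent: the server knows you use a proxy and also knows your real IP  2. anonymous: the server knows you use a proxy but not your real IP</span>
<span class="token comment">#                   3. high anonymity: the server knows neither that you use a proxy nor your real IP</span>
<span class="token comment"># Using the Wandou (豌豆) IP proxy service: 1. register  2. set the whitelist (add your own public IP)  3. click Tools, then extract the API</span>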
<span class="token keyword">import</span> requests

url <span class="token operator">=</span> <span class="token string">'http://httpbin.org/ip'</span>
<span class="token comment"># Candidate proxies</span>
ips <span class="token operator">=</span> <span class="token punctuation">[</span>
<span class="token string">'223.240.245.57:23564'</span><span class="token punctuation">,</span>
<span class="token string">'223.241.51.205:3617'</span><span class="token punctuation">,</span>
<span class="token string">'114.232.64.153:36410'</span><span class="token punctuation">,</span>
<span class="token string">'183.141.100.99:3617'</span><span class="token punctuation">,</span>
<span class="token string">'61.191.85.17:36410'</span><span class="token punctuation">,</span>
<span class="token string">'114.98.148.7:36410'</span><span class="token punctuation">,</span>
<span class="token string">'60.174.189.138:766'</span><span class="token punctuation">,</span>
<span class="token string">'117.57.21.134:3617'</span><span class="token punctuation">,</span>
<span class="token string">'117.70.39.253:5412'</span><span class="token punctuation">,</span>
<span class="token string">'114.227.163.5:766'</span><span class="token punctuation">,</span>
<span class="token string">'125.123.120.238:36410'</span><span class="token punctuation">,</span>
<span class="token string">'114.100.1.181:3617'</span><span class="token punctuation">,</span>
<span class="token string">'42.59.102.21:23564'</span><span class="token punctuation">,</span>
<span class="token string">'183.92.238.218:36410'</span><span class="token punctuation">,</span>
<span class="token string">'121.233.207.1:5412'</span><span class="token punctuation">,</span>
<span class="token string">'223.240.247.104:3617'</span><span class="token punctuation">,</span>
<span class="token string">'60.174.188.26:36410'</span><span class="token punctuation">,</span>
<span class="token string">'182.87.241.109:766'</span><span class="token punctuation">,</span>
<span class="token string">'114.227.11.169:766'</span><span class="token punctuation">,</span>
<span class="token string">'218.91.0.33:894'</span><span class="token punctuation">,</span>
<span class="token punctuation">]</span>
available_list <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token punctuation">]</span>
<span class="token comment"># Try every proxy and keep the ones that answer within the timeout</span>
<span class="token keyword">for</span> ip <span class="token keyword">in</span> ips<span class="token punctuation">:</span>
    <span class="token keyword">print</span><span class="token punctuation">(</span>ip<span class="token punctuation">)</span>
    <span class="token keyword">try</span><span class="token punctuation">:</span>
        response <span class="token operator">=</span> requests<span class="token punctuation">.</span>get<span class="token punctuation">(</span>url<span class="token punctuation">,</span> proxies<span class="token operator">=</span><span class="token punctuation">{</span><span class="token string">'http'</span><span class="token punctuation">:</span> ip<span class="token punctuation">}</span><span class="token punctuation">,</span> timeout<span class="token operator">=</span><span class="token number">0.5</span><span class="token punctuation">)</span>
        <span class="token keyword">print</span><span class="token punctuation">(</span>response<span class="token punctuation">.</span>text<span class="token punctuation">)</span>
        available_list<span class="token punctuation">.</span>append<span class="token punctuation">(</span>ip<span class="token punctuation">)</span>
    <span class="token keyword">except</span> Exception<span class="token punctuation">:</span>
        <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">"Request failed"</span><span class="token punctuation">)</span>
<span class="token keyword">print</span><span class="token punctuation">(</span>available_list<span class="token punctuation">)</span>
<span class="token triple-quoted-string string">'''
['114.232.64.153:36410']'''</span>
</code></pre> 
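  <p>The proxy check above can also be wrapped in reusable helpers. The sketch below is one possible shape, not the only one: <code>build_proxies</code> and <code>check_proxy</code> are hypothetical names, the proxy is assumed to be a plain HTTP proxy, and the 3 second timeout is an arbitrary choice. Only <code>build_proxies</code> is run here; <code>check_proxy</code> needs network access and a live proxy.</p> 
  <pre><code class="prism language-python">def build_proxies(address):
    """Map both URL schemes to the same HTTP proxy, e.g. '1.2.3.4:8080'."""
    proxy_url = 'http://' + address
    return {'http': proxy_url, 'https': proxy_url}


def check_proxy(address, url='http://httpbin.org/ip', timeout=3):
    """Return True if a request sent through the proxy succeeds within the timeout."""
    import requests  # imported here so build_proxies can be used on its own

    try:
        response = requests.get(url, proxies=build_proxies(address), timeout=timeout)
        return response.ok
    except requests.RequestException:
        return False


print(build_proxies('114.232.64.153:36410'))
</code></pre> 
  <p>Writing the scheme explicitly in the proxy URL avoids relying on how a given requests version fills in a missing scheme.</p> 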
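  <h2>6. Handling untrusted SSL certificates</h2> 
  <p>What is an SSL certificate?</p> 
  <ul> 
   <li>An SSL certificate is a kind of digital certificate, comparable to an electronic copy of a driver's license, passport, or business license. Because it is installed on a server, it is also called an SSL server certificate. It follows the SSL protocol and is issued by a trusted certificate authority (CA) after the CA has verified the server's identity. It provides server authentication and encrypted data transfer.</li> 
  </ul> 
  <p>Test site: https://inv-veri.chinatax.gov.cn/</p> 
  <p>Example</p> 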
  <pre><code class="prism language-python"><span class="token keyword">import</span> requests
<span class="token comment"># response = requests.get('https://inv-veri.chinatax.gov.cn/').text</span>
<span class="token comment"># print(response)</span>
<span class="token triple-quoted-string string">'''
requests.exceptions.SSLError: HTTPSConnectionPool(host='inv-veri.chinatax.gov.cn', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1091)')))'''</span>
response <span class="token operator">=</span> requests<span class="token punctuation">.</span>get<span class="token punctuation">(</span><span class="token string">'https://inv-veri.chinatax.gov.cn/'</span><span class="token punctuation">,</span> verify<span class="token operator">=</span><span class="token boolean">False</span><span class="token punctuation">)</span><span class="token punctuation">.</span>text
<span class="token keyword">print</span><span class="token punctuation">(</span>response<span class="token punctuation">)</span>  <span class="token comment"># with verify=False the certificate is not checked and the page content is returned normally</span>

</code></pre> 
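  <h2>7. Cookies</h2> 
  <p>Cookie: identifies the user through information recorded on the client. HTTP is a stateless protocol: the client and the server are connected only for one request/response exchange, and the next request opens a new connection, so the server would treat it as coming from a new client. To keep continuity between the two and let the server know that a request comes from the same user as before, the client information has to be stored somewhere.</p> 
  <p>Uses:<br> 1. Simulating a login<br> Simulate a logged-in visit to Zhihu<br> Target URL: 'https://www.zhihu.com/hot'<br> Send the request and get the response</p> 
  <p>Example 1</p> 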
  <pre><code class="prism language-python"><span class="token comment"># Simulate a logged-in visit to Zhihu</span>
<span class="token comment"># Target url=https://www.zhihu.com/hot</span>
<span class="token comment"># Send the request and get the response</span>
<span class="token keyword">import</span> requests

url <span class="token operator">=</span> <span class="token string">'https://www.zhihu.com/hot'</span>
headers <span class="token operator">=</span> <span class="token punctuation">{</span>
    <span class="token string">'Cookie'</span><span class="token punctuation">:</span> <span class="token string">'omitted'</span><span class="token punctuation">,</span>
    <span class="token string">'User-Agent'</span><span class="token punctuation">:</span> <span class="token string">'omitted'</span>
<span class="token punctuation">}</span>
response <span class="token operator">=</span> requests<span class="token punctuation">.</span>get<span class="token punctuation">(</span>url<span class="token punctuation">,</span> headers<span class="token operator">=</span>headers<span class="token punctuation">)</span><span class="token punctuation">.</span>text
<span class="token keyword">print</span><span class="token punctuation">(</span>response<span class="token punctuation">)</span>  <span class="token comment"># Without a login the post-login page cannot be shown; adding 'Cookie' returns the normal data</span>
</code></pre> 
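  <p>Besides a raw <code>Cookie</code> header, requests also accepts cookies as a dict through its <code>cookies</code> parameter. The sketch below shows one way to turn a header string copied from the browser into such a dict. The helper name <code>cookie_header_to_dict</code> and the cookie names and values are made up for illustration.</p> 
  <pre><code class="prism language-python">def cookie_header_to_dict(header):
    """Turn 'k1=v1; k2=v2' (as copied from the browser) into a dict."""
    cookies = {}
    for pair in header.split(';'):
        # Split on the first '=' only, since cookie values may contain '='
        name, sep, value = pair.strip().partition('=')
        if sep:
            cookies[name] = value
    return cookies


# Made-up cookie names and values, for illustration only
cookies = cookie_header_to_dict('session_id=abc123; theme=dark')
print(cookies)
# The dict can then be passed as requests.get(url, headers=headers, cookies=cookies)
</code></pre> 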
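  <p>2. Countering anti-crawling measures<br> The 12306 official site<br> Ticket search: Hangzhou to Shanghai, the 5th --> Search<br> Question 1: why does the page show data that is missing from the page source?<br> Summary: when data is visible on the page but absent from the source, the server has probably sent the data in more than one response, which is why it cannot be found in the page source.<br> Question 2: how can the keyword G9314 be found?<br> The page as a whole does not change, only part of it does: that is ajax.<br> Solutions:<br> 1. Analyze the real data endpoint, query<br> 2. Use selenium<br> <a href="http://img.e-com-net.com/image/info8/284c6e5ca9e64e2aafcd2326ce2c048b.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/284c6e5ca9e64e2aafcd2326ce2c048b.jpg" alt="爬虫 第二讲 urllib模块和requests模块_第10张图片" width="650" height="366" style="border:1px solid black;"></a></p> 
  <p>Example 2</p> 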
  <pre><code class="prism language-python"><span class="token keyword">import</span> re
<span class="token keyword">import</span> requests
<span class="token keyword">from</span> requests<span class="token punctuation">.</span>packages<span class="token punctuation">.</span>urllib3<span class="token punctuation">.</span>exceptions <span class="token keyword">import</span> InsecureRequestWarning


<span class="token keyword">class</span> <span class="token class-name">Ticket12306</span><span class="token punctuation">:</span>

    <span class="token keyword">def</span> <span class="token function">__init__</span><span class="token punctuation">(</span>self<span class="token punctuation">)</span><span class="token punctuation">:</span>
        self<span class="token punctuation">.</span>headers <span class="token operator">=</span> <span class="token punctuation">{</span>
            <span class="token string">'User-Agent'</span><span class="token punctuation">:</span> <span class="token string">'Mozilla/5.0 '</span>
        <span class="token punctuation">}</span>

    <span class="token keyword">def</span> <span class="token function">requests_url</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> cookie<span class="token punctuation">,</span> url<span class="token punctuation">)</span><span class="token punctuation">:</span>
        cookie<span class="token punctuation">.</span>update<span class="token punctuation">(</span>self<span class="token punctuation">.</span>headers<span class="token punctuation">)</span>
        response <span class="token operator">=</span> requests<span class="token punctuation">.</span>get<span class="token punctuation">(</span>url<span class="token punctuation">,</span> headers<span class="token operator">=</span>cookie<span class="token punctuation">,</span> verify<span class="token operator">=</span><span class="token boolean">False</span><span class="token punctuation">)</span>
        response<span class="token punctuation">.</span>encoding <span class="token operator">=</span> <span class="token string">'utf-8'</span>
        <span class="token keyword">return</span> response

    <span class="token keyword">def</span> <span class="token function">get_station</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> cookie<span class="token punctuation">,</span> url<span class="token punctuation">)</span><span class="token punctuation">:</span>
        response <span class="token operator">=</span> self<span class="token punctuation">.</span>requests_url<span class="token punctuation">(</span>cookie<span class="token punctuation">,</span> url<span class="token punctuation">)</span>
        station <span class="token operator">=</span> re<span class="token punctuation">.</span>findall<span class="token punctuation">(</span>r<span class="token string">'([\u4e00-\u9fa5]+)\|([A-Z]+)'</span><span class="token punctuation">,</span> response<span class="token punctuation">.</span>text<span class="token punctuation">)</span>    <span class="token comment"># \u4e00-\u9fa5 covers the common Chinese characters, so each match pairs a Chinese station name with its uppercase code</span>
        <span class="token comment"># Convert the list of pairs into a dict</span>
        station_data <span class="token operator">=</span> <span class="token builtin">dict</span><span class="token punctuation">(</span>station<span class="token punctuation">)</span>
        <span class="token comment"># Swap each key with its value</span>
        station_names <span class="token operator">=</span> <span class="token punctuation">{</span><span class="token punctuation">}</span>  <span class="token comment"># empty dict that receives the swapped key/value pairs</span>
        <span class="token keyword">for</span> item <span class="token keyword">in</span> station_data<span class="token punctuation">:</span>
            station_names<span class="token punctuation">[</span>station_data<span class="token punctuation">[</span>item<span class="token punctuation">]</span><span class="token punctuation">]</span> <span class="token operator">=</span> item
        <span class="token keyword">return</span> station_names

    <span class="token keyword">def</span> <span class="token function">main</span><span class="token punctuation">(</span>self<span class="token punctuation">,</span> cookie_1<span class="token punctuation">,</span> url_1<span class="token punctuation">,</span> cookie_2<span class="token punctuation">,</span> url_2<span class="token punctuation">)</span><span class="token punctuation">:</span>
        response <span class="token operator">=</span> self<span class="token punctuation">.</span>requests_url<span class="token punctuation">(</span>cookie_1<span class="token punctuation">,</span> url_1<span class="token punctuation">)</span>
        json_tickets <span class="token operator">=</span> response<span class="token punctuation">.</span>json<span class="token punctuation">(</span><span class="token punctuation">)</span>
        data_list <span class="token operator">=</span> json_tickets<span class="token punctuation">[</span><span class="token string">'data'</span><span class="token punctuation">]</span><span class="token punctuation">[</span><span class="token string">'result'</span><span class="token punctuation">]</span>
        station_names <span class="token operator">=</span> self<span class="token punctuation">.</span>get_station<span class="token punctuation">(</span>url<span class="token operator">=</span>url_2<span class="token punctuation">,</span> cookie<span class="token operator">=</span>cookie_2<span class="token punctuation">)</span>
        <span class="token keyword">for</span> item <span class="token keyword">in</span> data_list<span class="token punctuation">:</span>
            data <span class="token operator">=</span> item<span class="token punctuation">.</span>split<span class="token punctuation">(</span><span class="token string">'|'</span><span class="token punctuation">)</span>
            l <span class="token operator">=</span> <span class="token builtin">list</span><span class="token punctuation">(</span>data<span class="token punctuation">[</span><span class="token number">13</span><span class="token punctuation">]</span><span class="token punctuation">)</span>
            l<span class="token punctuation">.</span>insert<span class="token punctuation">(</span><span class="token number">4</span><span class="token punctuation">,</span> <span class="token string">"-"</span><span class="token punctuation">)</span>
            l<span class="token punctuation">.</span>insert<span class="token punctuation">(</span><span class="token number">7</span><span class="token punctuation">,</span> <span class="token string">"-"</span><span class="token punctuation">)</span>
            data<span class="token punctuation">[</span><span class="token number">13</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token string">''</span><span class="token punctuation">.</span>join<span class="token punctuation">(</span>l<span class="token punctuation">)</span>
            <span class="token keyword">print</span><span class="token punctuation">(</span><span class="token string">"Train: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">3</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Departure station: "</span> <span class="token operator">+</span> station_names<span class="token punctuation">[</span>data<span class="token punctuation">[</span><span class="token number">6</span><span class="token punctuation">]</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Arrival station: "</span> <span class="token operator">+</span> station_names<span class="token punctuation">[</span>data<span class="token punctuation">[</span><span class="token number">7</span><span class="token punctuation">]</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Departure time: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">8</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Arrival time: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">9</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Duration: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">10</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Bookable: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">11</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Origin station: "</span> <span class="token operator">+</span> station_names<span class="token punctuation">[</span>data<span class="token punctuation">[</span><span class="token number">4</span><span class="token punctuation">]</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Terminal station: "</span> <span class="token operator">+</span> station_names<span class="token punctuation">[</span>data<span class="token punctuation">[</span><span class="token number">5</span><span class="token punctuation">]</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Travel date: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">13</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Business/premier class: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">32</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"First class: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">31</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Second class/second-class compartment: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">30</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Deluxe soft sleeper: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">21</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Soft sleeper/first-class sleeper: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">23</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"EMU sleeper: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">33</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Hard sleeper/second-class sleeper: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">28</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Soft seat: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">24</span><span class="token punctuation">]</span><span class="token punctuation">,</span>
                  <span class="token string">"Hard seat: "</span> <span class="token operator">+</span> data<span class="token punctuation">[</span><span class="token number">29</span><span class="token punctuation">]</span><span class="token punctuation">)</span>


<span class="token keyword">if</span> __name__ <span class="token operator">==</span> <span class="token string">'__main__'</span><span class="token punctuation">:</span>
    requests<span class="token punctuation">.</span>packages<span class="token punctuation">.</span>urllib3<span class="token punctuation">.</span>disable_warnings<span class="token punctuation">(</span>InsecureRequestWarning<span class="token punctuation">)</span>
    get_ticket_12306 <span class="token operator">=</span> Ticket12306<span class="token punctuation">(</span><span class="token punctuation">)</span>
    cookie_1 <span class="token operator">=</span> <span class="token punctuation">{</span>
        <span class="token string">'Cookie'</span><span class="token punctuation">:</span> <span class="token string">'omitted'</span><span class="token punctuation">,</span> <span class="token punctuation">}</span>
    url_1 <span class="token operator">=</span> <span class="token string">'https://kyfw.12306.cn/otn/leftTicket/query?leftTicketDTO.train_date=2021-05-05&leftTicketDTO.from_station=HZH&leftTicketDTO.to_station=SHH&purpose_codes=ADULT'</span>
    cookie_2 <span class="token operator">=</span> <span class="token punctuation">{</span>
        <span class="token string">'Cookie'</span><span class="token punctuation">:</span> <span class="token string">'omitted'</span><span class="token punctuation">}</span>
    url_2 <span class="token operator">=</span> <span class="token string">'https://kyfw.12306.cn/otn/resources/js/framework/station_name.js?station_version=1.9188'</span>
    get_ticket_12306<span class="token punctuation">.</span>main<span class="token punctuation">(</span>cookie_1<span class="token punctuation">,</span> url_1<span class="token punctuation">,</span> cookie_2<span class="token punctuation">,</span> url_2<span class="token punctuation">)</span>

</code></pre> 
  <p><a href="http://img.e-com-net.com/image/info8/a54a4d5c454b4bcd9755e16c61217a02.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/a54a4d5c454b4bcd9755e16c61217a02.jpg" alt="爬虫 第二讲 urllib模块和requests模块_第11张图片" width="650" height="282" style="border:1px solid black;"></a><br> Summary:<br> Each record is delimited by '|', so we need to know where the individual fields sit.<br> Splitting on '|' yields a list, and the list index gives the position of each field, which the later logic can build on.</p> 
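  <p>One way to make those positions easier to work with is to name them once and reuse the mapping. The sketch below assumes the same indices as the example above (3 for the train number, 8 and 9 for the times, 10 for the duration, 13 for the date). The names <code>FIELD_INDEX</code> and <code>parse_record</code> are hypothetical, and the sample record is fabricated: only the train number G9314 comes from the text, the other values are invented for illustration.</p> 
  <pre><code class="prism language-python">FIELD_INDEX = {
    'train': 3,
    'depart_time': 8,
    'arrive_time': 9,
    'duration': 10,
    'date': 13,
}


def parse_record(record):
    """Split one '|' delimited record and pick out the named fields."""
    parts = record.split('|')
    row = {name: parts[index] for name, index in FIELD_INDEX.items()}
    # 20210505 becomes 2021-05-05
    raw_date = row['date']
    row['date'] = raw_date[:4] + '-' + raw_date[4:6] + '-' + raw_date[6:]
    return row


# Fabricated record with only the fields of interest filled in
fields = [''] * 14
fields[3] = 'G9314'
fields[8] = '07:10'
fields[9] = '08:05'
fields[10] = '00:55'
fields[13] = '20210505'
sample = '|'.join(fields)
print(parse_record(sample))
</code></pre> 
  <p>If the site reorders its fields, only the mapping has to change.</p> 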
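  <h2>8. Sessions</h2> 
  <p>Session: identifies the user through information recorded on the server. Here, session means an ongoing conversation between the client and the server.</p> 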
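  <p>On the client side, requests offers <code>requests.Session</code>, which carries cookies and default headers from one request to the next, so the server can recognise the same session. A minimal sketch, assuming requests is installed: the cookie name <code>demo_token</code> and its value are made up, and no request is actually sent; the request is only prepared so that the merged headers can be inspected.</p> 
  <pre><code class="prism language-python">import requests

session = requests.Session()
# Headers set on the session are sent with every request made through it
session.headers.update({'User-Agent': 'Mozilla/5.0'})
# Made-up cookie; in practice the server sets cookies in its responses
session.cookies.set('demo_token', 'abc123')

# Prepare (but do not send) a request to see what the session would attach
request = requests.Request('GET', 'https://kyfw.12306.cn/otn/resources/login.html')
prepared = session.prepare_request(request)
print(prepared.headers['User-Agent'])
print(prepared.headers.get('Cookie'))
</code></pre> 
  <p>Because the cookie jar lives on the session object, a captcha check and a later login sent through the same session share their cookies.</p> 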
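  <p>Case study: getting past the 12306 image captcha<br> URL: https://kyfw.12306.cn/otn/resources/login.html<br> 1. Correct account, wrong password, wrong captcha<br> 2. Correct account, wrong password, correct captcha<br> 3. Correct account, correct password, correct captcha: ok</p> 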
  <ul> 
   <li> <p>Case 1:<br> Inspect the js file loaded when the captcha is wrong<br> <a href="http://img.e-com-net.com/image/info8/ea7baa1feb6a4abda1515db669a4bb35.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/ea7baa1feb6a4abda1515db669a4bb35.jpg" alt="爬虫 第二讲 urllib模块和requests模块_第12张图片" width="650" height="270" style="border:1px solid black;"></a><br> <a href="http://img.e-com-net.com/image/info8/8f0e9d2383f54364b8a08020f07978b5.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/8f0e9d2383f54364b8a08020f07978b5.jpg" alt="爬虫 第二讲 urllib模块和requests模块_第13张图片" width="650" height="166" style="border:1px solid black;"></a><br> Returned result: /**/jQuery191018415675635795536_1619398855906({result_message: "验证码校验失败", result_code: "5"}); (the message means "captcha verification failed")</p> </li> 
   <li> <p>Case 2:<br> Request URL: https://kyfw.12306.cn/passport/captcha/captcha-check?callback=jQuery19109716061695448353_1619405746616&answer=197%2C45&rand=sjrand&login_site=E&_=1619405746618<br> Parameters sent:<br> callback: jQuery19109716061695448353_1619405746616<br> answer: 197,45<br> rand: sjrand<br> login_site: E<br> _: 1619405746618<br> <a href="http://img.e-com-net.com/image/info8/40b0ce5cf41c431483956a3611f11e0f.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/40b0ce5cf41c431483956a3611f11e0f.jpg" alt="爬虫 第二讲 urllib模块和requests模块_第14张图片" width="650" height="233" style="border:1px solid black;"></a><br> Returned result: /**/jQuery19109716061695448353_1619405746616({"result_message":"验证码校验成功","result_code":"4"}); (the message means "captcha verification succeeded")</p> </li> 
   <li> <p>The XHR login request is sent at the same time:<br> Request URL: https://kyfw.12306.cn/passport/web/login<br> Parameters sent:<br> sessionId: 01d2TIqaddEzxCU28_GKB5Vcx6pP744fcOUyRfChk3c8ipKNCQPUrw6EEG98nN5ql6XKac_gGEGflLST6xpxnGguWMsIsoEuz0kKp9vrymPCFPwwIWh5-mCKTHbuJ6JjJm9GOGs3FoKnRG4ekQumkiHipl-wh1fnhhp61Bca3DA7Eovt_bdEryA7r-P1XrPVhVRegW3nON-AG5VHfGAR7ESg<br> sig: 05XqrtZ0EaFgmmqIQes-s-CJXzPZeUryxboUG9ElN6m-Gluzj13p46YPFqGVUE13mwXLW9LePExNtTkfJbYwQx-SiDQkK0HgJuFMYzM4p78PFxKeRNvi0NcYUY_IvyYkChfVWcqh3BWyF92Tiszkl7vqhX7-KltDfOK_bDcSEC2-Bm7srz5Pm38t5tc6pY-tmg-CO_6Z8xNxewxRapD0iP30diKryST_sDaSZDYJNFYHFaUJU2g-Dpi_XenL-nsYWqCD7RBriG6_I3-IMPUHLq6d5yFpBFfH7act7AMeQErOAkktFlZ9147ZpgWCtCYmyosyaBjFn8j4_HQW9ZQlh_Agxq8w7fEASqbOQNfLm2HUM1Z6zD-wn314_uKIkFv2QiTQSNCXnM8LKGpZ9NRO_5J3FdUaNyYgPBu0uZ1chQAtaDXVkPG-z0HdogKCoeBSAyBEdv5Sx7EdbjOaTUSbuyiuhheYynx6CpZ6ZE0aItv3A<br> if_check_slide_passcode_token: FFFF0N000000000085DE:1619405993688:0.096504509317295041<br> scene: nc_login<br> tk:<br> username: 18582868483<br> password: @grRrViQiBQgpTr59DNzcVw==<br> appid: otn<br> <a href="http://img.e-com-net.com/image/info8/8c956c8c5133451e9ca079385345acd4.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/8c956c8c5133451e9ca079385345acd4.jpg" alt="爬虫 第二讲 urllib模块和requests模块_第15张图片" width="650" height="239" style="border:1px solid black;"></a></p> </li> 
  </ul> 
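  <p>The captcha-check response shown above is JSONP: a JSON object wrapped in a callback call. A minimal sketch of unwrapping it is given here; the helper name <code>parse_jsonp</code> is hypothetical and the sample string is the response quoted in the list above.</p>

```python
import json
import re


def parse_jsonp(text):
    """Extract the JSON payload from a JSONP string such as
    '/**/callbackName({...});' and return it as a dict."""
    # Capture everything between the first '(' and the last ')'.
    match = re.search(r'\((.*)\)\s*;?\s*$', text, re.S)
    if match is None:
        raise ValueError('not a JSONP response')
    return json.loads(match.group(1))


sample = ('/**/jQuery19109716061695448353_1619405746616('
          '{"result_message":"验证码校验成功","result_code":"4"});')
result = parse_jsonp(sample)
print(result['result_code'], result['result_message'])
```

  <p>Once parsed, the script can branch on <code>result_code</code> instead of comparing raw response text.</p>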
  <p>Logic: the captcha must be verified first; only then can the username and password be submitted.<br> 1. Identify the target URL: <strong>https://kyfw.12306.cn/passport/captcha/captcha-check</strong></p> 
  <p>2. Send a POST request carrying this data:<br> callback: jQuery19109716061695448353_1619405746616<br> answer: 197,45<br> rand: sjrand<br> login_site: E<br> _: 1619405746618</p> 
  <p>3. Get the 12306 captcha image<br> Method 1:<br> Right-click the image in the page and copy its address:<br> img_url=''</p> 
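  <p>To see how these parameters relate to the Request URL captured earlier, the sketch below encodes them with <code>urllib.parse.urlencode</code>. It only builds the string and sends nothing; the variable names are illustrative.</p>

```python
from urllib.parse import urlencode

# Parameters exactly as captured in the browser's network panel.
params = {
    'callback': 'jQuery19109716061695448353_1619405746616',
    'answer': '197,45',
    'rand': 'sjrand',
    'login_site': 'E',
    '_': '1619405746618',
}

query = urlencode(params)  # the comma in 'answer' becomes %2C
url = 'https://kyfw.12306.cn/passport/captcha/captcha-check?' + query
print(url)
```

  <p>The comma in <code>answer</code> is percent-encoded as <code>%2C</code>, which is why the captured URL shows <code>answer=197%2C45</code>.</p>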
  <pre><code class="prism language-python"><span class="token comment"># Base64 is an encoding, not encryption: the output merely looks like ciphertext.</span>
<span class="token comment"># It represents arbitrary binary data using 64 printable characters:</span>
<span class="token comment"># A-Z, a-z, 0-9, + and /.</span>
<span class="token keyword">import</span> base64

img <span class="token operator">=</span> <span class="token string">''</span>  <span class="token comment"># the base64 string copied from the page goes here</span>
img_data <span class="token operator">=</span> base64<span class="token punctuation">.</span>b64decode<span class="token punctuation">(</span>img<span class="token punctuation">)</span>  <span class="token comment"># returns binary data</span>
<span class="token keyword">print</span><span class="token punctuation">(</span><span class="token builtin">type</span><span class="token punctuation">(</span>img_data<span class="token punctuation">)</span><span class="token punctuation">)</span>  <span class="token comment"># &lt;class 'bytes'&gt;</span>
<span class="token keyword">with</span> <span class="token builtin">open</span><span class="token punctuation">(</span><span class="token string">'code.png'</span><span class="token punctuation">,</span> <span class="token string">'wb'</span><span class="token punctuation">)</span> <span class="token keyword">as</span> fn<span class="token punctuation">:</span>
    fn<span class="token punctuation">.</span>write<span class="token punctuation">(</span>img_data<span class="token punctuation">)</span>
<span class="token triple-quoted-string string">'''
The copied address is image data encoded with base64.
Decoding it as-is fails with:
binascii.Error: Incorrect padding
Fix: remove the leading "data:image/jpg;base64," before decoding.
'''</span>
</code></pre> 
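  <p>The fix described in the note above (dropping the <code>data:image/jpg;base64,</code> header before decoding) can be sketched as follows. The helper name <code>decode_data_uri</code> is hypothetical, and the bytes used in the round trip are made up rather than a real captcha image.</p>

```python
import base64


def decode_data_uri(data):
    """Decode a base64 string, with or without a 'data:...;base64,' header."""
    if data.startswith('data:') and ',' in data:
        # Keep only the payload that follows the first comma.
        data = data.split(',', 1)[1]
    # Restore any '=' padding that was lost when the string was copied.
    data += '=' * (-len(data) % 4)
    return base64.b64decode(data)


# Round trip with placeholder bytes standing in for real image data.
fake_image = b'\x89PNG fake image bytes'
encoded = 'data:image/jpg;base64,' + base64.b64encode(fake_image).decode('ascii')
decoded = decode_data_uri(encoded)
print(decoded == fake_image)
```

  <p>The returned bytes can then be written to <code>code.png</code> in <code>'wb'</code> mode exactly as in the block above.</p>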
  <p><a href="http://img.e-com-net.com/image/info8/45c545a7314542fcabb3f1388e007990.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/45c545a7314542fcabb3f1388e007990.jpg" alt="Crawler Lecture 2: the urllib and requests modules, image 16" width="650" height="282" style="border:1px solid black;"></a><br> Method 2:<br> <a href="http://img.e-com-net.com/image/info8/4cb0b0b69f4e4472811a28c6fb29f436.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/4cb0b0b69f4e4472811a28c6fb29f436.jpg" alt="Image" width="650" height="69"></a><br> Step 1: find the request URL of the captcha image. Request URL: https://kyfw.12306.cn/passport/captcha/captcha-image64?login_site=E&module=login&rand=sjrand&1619414089185&callback=jQuery19109716061695448353_1619405746616&_=1619405746621<br> Step 2: open it in the browser to inspect the data: https://kyfw.12306.cn/passport/captcha/captcha-image64?login_site=E&module=login&rand=sjrand<br> <a href="http://img.e-com-net.com/image/info8/e8a8c83954b244e495605d25131e8d17.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/e8a8c83954b244e495605d25131e8d17.jpg" alt="Crawler Lecture 2: the urllib and requests modules, image 17" width="650" height="365" style="border:1px solid black;"></a><br> Step 3: remove the 64 from the URL in the browser's address bar<br> <a href="http://img.e-com-net.com/image/info8/027479a0beb245c491c6ccf3a7f6d057.jpg" target="_blank"><img src="http://img.e-com-net.com/image/info8/027479a0beb245c491c6ccf3a7f6d057.jpg" alt="Crawler Lecture 2: the urllib and requests modules, image 18" width="650" height="365" style="border:1px solid black;"></a></p> 
  <p>Summary: requesting https://kyfw.12306.cn/passport/captcha/captcha-image?login_site=E&module=login&rand=sjrand returns the image directly, without base64 encoding.</p> 
  <p>4. Click the correct images</p> 
  <pre><code class="prism language-python"><span class="token comment"># Passing the 12306 image captcha</span>
<span class="token keyword">import</span> requests
<span class="token keyword">from</span> requests<span class="token punctuation">.</span>packages<span class="token punctuation">.</span>urllib3<span class="token punctuation">.</span>exceptions <span class="token keyword">import</span> InsecureRequestWarning
requests<span class="token punctuation">.</span>packages<span class="token punctuation">.</span>urllib3<span class="token punctuation">.</span>disable_warnings<span class="token punctuation">(</span>InsecureRequestWarning<span class="token punctuation">)</span>

req <span class="token operator">=</span> requests<span class="token punctuation">.</span>session<span class="token punctuation">(</span><span class="token punctuation">)</span>  <span class="token comment"># keep one session so cookies persist between requests</span>
headers <span class="token operator">=</span> <span class="token punctuation">{</span>
    <span class="token string">'User-Agent'</span><span class="token punctuation">:</span> <span class="token string">'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36'</span><span class="token punctuation">}</span>


<span class="token keyword">def</span> <span class="token function">get_img</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">:</span>
    <span class="token comment"># Fetch the captcha image</span>
    pic_response <span class="token operator">=</span> req<span class="token punctuation">.</span>get<span class="token punctuation">(</span>
        <span class="token string">'https://kyfw.12306.cn/passport/captcha/captcha-image?login_site=E&module=login&rand=sjrand'</span><span class="token punctuation">,</span> headers<span class="token operator">=</span>headers<span class="token punctuation">,</span>verify<span class="token operator">=</span><span class="token boolean">False</span><span class="token punctuation">)</span><span class="token punctuation">.</span>content
    <span class="token keyword">with</span> <span class="token builtin">open</span><span class="token punctuation">(</span><span class="token string">'code.png'</span><span class="token punctuation">,</span> <span class="token string">'wb'</span><span class="token punctuation">)</span> <span class="token keyword">as</span> f<span class="token punctuation">:</span>
        f<span class="token punctuation">.</span>write<span class="token punctuation">(</span>pic_response<span class="token punctuation">)</span>


<span class="token keyword">def</span> <span class="token function">login</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">:</span>
    <span class="token comment"># Measure the click positions from the top-left corner of the captcha image (e.g. with a screenshot tool)</span>
    codeStr <span class="token operator">=</span> <span class="token builtin">input</span><span class="token punctuation">(</span><span class="token string">'Enter the captcha coordinates: '</span><span class="token punctuation">)</span>
    data <span class="token operator">=</span> <span class="token punctuation">{</span>
        <span class="token string">'answer'</span><span class="token punctuation">:</span> codeStr<span class="token punctuation">,</span>
        <span class="token string">'rand'</span><span class="token punctuation">:</span> <span class="token string">'sjrand'</span><span class="token punctuation">,</span>
        <span class="token string">'login_site'</span><span class="token punctuation">:</span> <span class="token string">'E'</span>
    <span class="token punctuation">}</span>
    response <span class="token operator">=</span> req<span class="token punctuation">.</span>post<span class="token punctuation">(</span><span class="token string">'https://kyfw.12306.cn/passport/captcha/captcha-check'</span><span class="token punctuation">,</span> data<span class="token operator">=</span>data<span class="token punctuation">,</span> headers<span class="token operator">=</span>headers<span class="token punctuation">,</span>verify<span class="token operator">=</span><span class="token boolean">False</span><span class="token punctuation">)</span>
    <span class="token keyword">print</span><span class="token punctuation">(</span>response<span class="token punctuation">.</span>text<span class="token punctuation">)</span>  <span class="token comment"># with empty input: {"result_message":"验证码校验失败,信息为空","result_code":"8"} (captcha check failed, empty input)</span>


get_img<span class="token punctuation">(</span><span class="token punctuation">)</span>
login<span class="token punctuation">(</span><span class="token punctuation">)</span>
<span class="token triple-quoted-string string">'''
Sample run:
Enter the captcha coordinates: 50,40,185,114
{"result_message":"验证码校验成功","result_code":"4"}'''</span>
</code></pre> 
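  <p>Because an empty or malformed answer is rejected by the server (result code 8 in the script above), it can help to validate the typed coordinates before posting them. The following is only a sketch: <code>parse_answer</code> and <code>format_answer</code> are hypothetical helpers, not part of requests or of the original script.</p>

```python
def parse_answer(text):
    """Turn '50,40,185,114' into [(50, 40), (185, 114)]."""
    # Accept full-width commas as well, since they are easy to type by mistake.
    parts = text.replace(',', ',').split(',')
    numbers = [int(part) for part in parts if part.strip()]
    if not numbers or len(numbers) % 2 != 0:
        raise ValueError('expected a non-empty, even number of coordinates')
    # Pair the values up as (x, y) points.
    return list(zip(numbers[0::2], numbers[1::2]))


def format_answer(points):
    """Join (x, y) points back into the comma-separated answer string."""
    return ','.join(f'{x},{y}' for x, y in points)


points = parse_answer('50,40,185,114')
print(points)
print(format_answer(points))
```

  <p>In <code>login()</code>, the string returned by <code>format_answer</code> could be used as the <code>answer</code> field in place of the raw input.</p>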
 </div> 
</div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>
        </div>
    </div>
    <div class="container">
        <div id="paradigm-article-related">
            <div class="recommend-post mb30">
                <ul class="widget-links">
                    <li><a href="/article/1900056128948072448.htm"
                           title="Python爬虫:从人民网提取视频链接的完整指南" target="_blank">Python爬虫:从人民网提取视频链接的完整指南</a>
                        <span class="text-muted">小白学大数据</span>
<a class="tag" taget="_blank" href="/search/python/1.htm">python</a><a class="tag" taget="_blank" href="/search/python/1.htm">python</a><a class="tag" taget="_blank" href="/search/%E7%88%AC%E8%99%AB/1.htm">爬虫</a><a class="tag" taget="_blank" href="/search/%E9%9F%B3%E8%A7%86%E9%A2%91/1.htm">音视频</a><a class="tag" taget="_blank" href="/search/%E5%BC%80%E5%8F%91%E8%AF%AD%E8%A8%80/1.htm">开发语言</a><a class="tag" taget="_blank" href="/search/%E5%A4%A7%E6%95%B0%E6%8D%AE/1.htm">大数据</a>
                        <div>无论是用于数据分析、内容提取还是资源收集,Python爬虫都因其高效性和易用性而备受开发者青睐。本文将通过一个实际案例——从人民网提取视频链接,详细介绍如何使用Python构建一个完整的爬虫程序。我们将涵盖从基础的网络请求到HTML解析,再到最终提取视频链接的全过程。一、爬虫技术概述网络爬虫(WebCrawler)是一种自动化的程序,用于在互联网上浏览网页并收集信息。它通过模拟浏览器的行为,发送H</div>
                    </li>
                </ul>
            </div>
        </div>
    </div>

<footer id="footer" class="mb30 mt30">
    <div class="container">
        <div class="copyright">Copyright © 2000-2050 IT知识库 (E-COM-NET.COM). All Rights Reserved.
        </div>
    </div>
</footer>
<!-- 代码高亮 -->
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shCore.js"></script>
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shLegacy.js"></script>
<script type="text/javascript" src="/static/syntaxhighlighter/scripts/shAutoloader.js"></script>
<link type="text/css" rel="stylesheet" href="/static/syntaxhighlighter/styles/shCoreDefault.css"/>
<script type="text/javascript" src="/static/syntaxhighlighter/src/my_start_1.js"></script>





</body>

</html>