Scraping Tieba emoticon images with a Python crawler

This article uses urllib under Python 3.7 to scrape emoticon images from a Baidu Tieba thread.

The modules used are urllib and re (plus ssl, discussed below). The implementation is as follows:

import urllib.request
import re
import ssl

url = "https://tieba.baidu.com/p/5059180075?red_tag=0069685467"

def baidu(url):
    # Disable HTTPS certificate verification globally (see the note below)
    ssl._create_default_https_context = ssl._create_unverified_context
    req = urllib.request.Request(url)
    data = urllib.request.urlopen(req).read().decode('utf-8')
    return data

def parse(html):
    # Assumed pattern: Tieba marks in-post images with the class "BDE_Image";
    # capture the src attribute of each one
    pat = r'<img class="BDE_Image"[^>]*src="([^"]+)"'
    return re.findall(pat, html)
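The pattern above is a reconstruction (the original was cut off), so here is a small offline sketch running it against hypothetical HTML in the style of a Tieba post; the class name and the URLs are illustrative, not taken from a real page:

```python
import re

# Hypothetical snippet mimicking Tieba post markup
sample_html = '''
<img class="BDE_Image" src="https://imgsa.baidu.com/forum/pic/item/a1.jpg">
<p>some text</p>
<img class="BDE_Image" src="https://imgsa.baidu.com/forum/pic/item/b2.jpg">
'''

# Match each in-post image tag and capture its src attribute
pat = r'<img class="BDE_Image"[^>]*src="([^"]+)"'
print(re.findall(pat, sample_html))
# → ['https://imgsa.baidu.com/forum/pic/item/a1.jpg', 'https://imgsa.baidu.com/forum/pic/item/b2.jpg']
```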

Note that the code also imports the ssl module. Starting with Python 2.7.9 (and in Python 3), urllib verifies the SSL certificate whenever it opens an HTTPS URL. Without the declaration above, it raises an error like the following:

urllib.error.URLError:

So the declaration must appear in the code. There is also a second way to declare it, passing an unverified context per request:

context = ssl._create_unverified_context()
res = urllib.request.urlopen(req, context=context)
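As a minimal sketch of the per-request approach (the fetch helper is my own wrapper, not from the article), note that the unverified context skips both hostname checking and certificate validation, which can be verified offline:

```python
import ssl
import urllib.request

def fetch(url):
    """Fetch a page over HTTPS without certificate verification.

    Per-request alternative to patching ssl._create_default_https_context.
    """
    context = ssl._create_unverified_context()
    with urllib.request.urlopen(url, context=context) as res:
        return res.read().decode("utf-8")

# The unverified context disables hostname checks and cert validation:
ctx = ssl._create_unverified_context()
print(ctx.check_hostname)               # False
print(ctx.verify_mode is ssl.CERT_NONE) # True
```

Disabling verification exposes the request to man-in-the-middle attacks, so this is best limited to scraping experiments like this one.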

Below is a version that scrapes multiple pages:

import urllib.request
import ssl
import re

def main():
    ssl._create_default_https_context = ssl._create_unverified_context
    temp = 1  # running counter used to number the saved image files
    for i in range(1, 3):
        url = "https://tieba.baidu.com/p/5059180075?pn=%s" % i
        req = urllib.request.Request(url)
        data = urllib.request.urlopen(req).read().decode("utf-8")
        # Assumed pattern, matching the single-page version above
        pat = r'<img class="BDE_Image"[^>]*src="([^"]+)"'
        for img_url in re.findall(pat, data):
            urllib.request.urlretrieve(img_url, "%s.jpg" % temp)
            temp += 1
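The multi-page loop above can be factored so the fetch step is injectable, which makes the extraction logic testable without network access (the helper, the stub pages, and the pattern are my own assumptions following the code above):

```python
import re

# Assumed pattern for Tieba in-post images, as in the code above
PAT = r'<img class="BDE_Image"[^>]*src="([^"]+)"'

def collect_img_urls(pages, fetch):
    """Fetch each page via `fetch` and collect all matched image URLs."""
    urls = []
    for page in pages:
        urls.extend(re.findall(PAT, fetch(page)))
    return urls

# Offline demo with a stub fetcher returning hypothetical markup per page
fake_pages = {
    "p1": '<img class="BDE_Image" src="https://example.com/1.jpg">',
    "p2": '<img class="BDE_Image" src="https://example.com/2.jpg">',
}
print(collect_img_urls(["p1", "p2"], fake_pages.get))
# → ['https://example.com/1.jpg', 'https://example.com/2.jpg']
```

In the real crawler, `fetch` would be a function that opens the `?pn=` page URL with urllib and returns the decoded HTML.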

