Python 3 Web Scraping: Crawling All Chapter URLs from an Index and Printing Them with a for Loop

Chapter index of the novel Heaven Sword and Dragon Sabre (倚天屠龙记):
https://www.2biqukan.com/fiction/zsczu/contents.html

Implementation

from urllib import request
from bs4 import BeautifulSoup
    
if __name__ == "__main__":
    index_url = "https://www.2biqukan.com/fiction/zsczu.html"
    header = {
        'User-Agent': 'Mozilla/5.0 (Linux; Android 4.1.1; Nexus 7 Build/JRO03D) AppleWebKit/'
                      '535.19 (KHTML, like Gecko) Chrome/18.0.1025.166  Safari/535.19'
    }
    # Build a Request from the index URL and headers
    url_req = request.Request(index_url, headers=header)
    # Open the URL and get the response
    response = request.urlopen(url_req)
    # Read the response body and decode it as UTF-8 to get the HTML
    html = response.read().decode('utf-8', 'ignore')
    # Parse the fetched HTML with BeautifulSoup
    html_soup = BeautifulSoup(html, 'lxml')
    index = BeautifulSoup(str(html_soup.find_all('ul', class_='list-group novel-index row')), 'lxml')
    #print(index.find_all(['ul', 'li']))
    # Flag: becomes True once we enter the chapter <ul>, so only its <li> items are printed
    body_flag = False
    for element in index.find_all(['ul', 'li']):
        if element.name == 'ul':
            body_flag = True
        if body_flag and element.name == 'li':
            chapter_name = element.string
            chapter_url = "https://www.2biqukan.com" + element.a.get('href')
            print(" {} link: {}".format(chapter_name, chapter_url))
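The ul/li walk above can be condensed into a single CSS selector. Here is a minimal offline sketch of that idea, assuming the index markup is a `<ul class="list-group novel-index row">` containing `<li><a href=...>` chapter links; the helper name `extract_chapter_links` and the sample HTML are illustrative, not from the original post:

```python
from bs4 import BeautifulSoup

BASE = "https://www.2biqukan.com"

def extract_chapter_links(html, base=BASE):
    """Return (chapter_name, absolute_url) pairs from the index HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    # One CSS selector replaces the ul/li flag walk
    links = soup.select('ul.list-group.novel-index.row li a')
    return [(a.get_text(strip=True), base + a.get('href', '')) for a in links]

# Tiny offline sample mimicking the assumed markup
sample = '''
<ul class="list-group novel-index row">
  <li><a href="/fiction/zsczu/1.html">第一章</a></li>
  <li><a href="/fiction/zsczu/2.html">第二章</a></li>
</ul>
'''

for name, url in extract_chapter_links(sample):
    print("{} link: {}".format(name, url))
```

Testing against a small inline sample like this avoids hitting the live site while developing the parsing logic.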

