A Simple Crawler That Scrapes Zhihu Daily and Saves Each Post's Page Locally

Zhihu Daily crawler

# coding=utf-8

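import os

import requests
from lxml import html


def spider_zhihudaily():
    url = "http://daily.zhihu.com/"
    base_url = "https://daily.zhihu.com"
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36"}
    # Fetch the home page and parse it into an element tree.
    response = requests.get(url, headers=headers)
    # print(response.encoding)
    url_data = response.text
    selector = html.fromstring(url_data)
    # Each <a> under the main content rows is one daily post.
    ul_list = selector.xpath('//div[@class="main-content-wrap"]/div[@class="row"]//a')

    # Create the output directory up front; open() does not create it.
    os.makedirs("./zhihu_html", exist_ok=True)

    for ul in ul_list:
        # "?" is not allowed in Windows file names, so drop it from the title.
        title = ul.xpath("span/text()")[0].replace("?", "")
        print(title)
        # The href is site-relative, so prefix it with the site root.
        link = base_url + ul.xpath("@href")[0]
        print(link)

        img_url = ul.xpath("img/@src")
        print(img_url[0])

        # Download the post page and save it as <title>.html.
        content = requests.get(link, headers=headers).text
        # The with block closes the file even if the write fails.
        with open('./zhihu_html/{0}.html'.format(title), 'w', encoding='utf-8') as f:
            f.write(content)

        print("--------------------------")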


if __name__ == "__main__":
    spider_zhihudaily()
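
The crawler builds each post URL by prefixing the site root to the `href` it finds on the page. As an alternative, here is a minimal sketch that resolves links with the standard library's `urljoin`, which also leaves already-absolute URLs (such as the image links) untouched. The names `BASE_URL` and `absolute_url` are hypothetical helpers introduced only for this illustration; they are not part of the original script.

```python
from urllib.parse import urljoin

# Site root, matching the prefix used in the crawler (hypothetical constant name).
BASE_URL = "https://daily.zhihu.com/"


def absolute_url(href, base=BASE_URL):
    """Resolve an href taken from the page against the site root."""
    return urljoin(base, href)


if __name__ == "__main__":
    # Site-relative link, as found in the @href attribute of each post.
    print(absolute_url("/story/9713313"))
    # Already-absolute link, as found in the img/@src attribute.
    print(absolute_url("https://pic1.zhimg.com/v2-36cbffea9cdf1590fd90a1cfe4184bec.jpg"))
```

With this helper the same call works for both the post link and the image link, so the loop would not need to know which of them is relative.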

# Output of running the code

C:\Users\ws\.virtualenvs\pytools\Scripts\python.exe D:/python/pytools/chapter01_spider_book/spider_zhihudaily.py
阿杜的亚文化再研究
https://daily.zhihu.com/story/9713313
https://pic1.zhimg.com/v2-36cbffea9cdf1590fd90a1cfe4184bec.jpg
--------------------------
老一辈人常说的「脚气在土里走走好了」,有什么原理?
https://daily.zhihu.com/story/9713335
https://pic1.zhimg.com/v2-bd24d473b75d213a015f4fe9b6080840.jpg
--------------------------
如果三代蜘蛛侠对决,谁胜算更大?
https://daily.zhihu.com/story/9713283
https://pic4.zhimg.com/v2-3d06d6821d6db4bde782be5ae7b692f3.jpg
--------------------------
威少和保罗互换东家,能盘活火箭和雷霆吗?
https://daily.zhihu.com/story/9713376
https://pic2.zhimg.com/v2-cfd8375cb3af92e6fb265814faf0ab01.jpg
--------------------------
为什么大部分人会对喝自己的口水感到恶心?
https://daily.zhihu.com/story/9713245
https://pic4.zhimg.com/v2-ca9cb3863923fc574aadd9332fc980c3.jpg
--------------------------
「无反相机」是如何发展起来的?
https://daily.zhihu.com/story/9713258
https://pic3.zhimg.com/v2-53d720366dee5c6c061f2a292248508e.jpg
--------------------------
瞎扯 · 如何正确地吐槽
https://daily.zhihu.com/story/9713369
https://pic1.zhimg.com/v2-88354bf286e898517b6a91db45ab6ec4.jpg
--------------------------
《长安十二时辰》中有哪些值得剖析的细节和彩蛋?
https://daily.zhihu.com/story/9713242
https://pic4.zhimg.com/v2-a3ba34ca7385c9a04fcfa4df2c95b3bb.jpg
--------------------------
未检疫水果入境,这事到底多严重?
https://daily.zhihu.com/story/9713230
https://pic4.zhimg.com/v2-1c82311ae5b9eb6133819cb566b7066b.jpg
--------------------------
如何看待游戏《恋与制作人》宣布动画化?
https://daily.zhihu.com/story/9713296
https://pic3.zhimg.com/v2-bf3a3bfecd18aad6d0525943ec034e22.jpg
--------------------------
怎样避免早上起床时的口臭?
https://daily.zhihu.com/story/9713234
https://pic3.zhimg.com/v2-5641bc119681279f94ad00b780492d12.jpg
--------------------------
瞎扯 · 如何正确地吐槽
https://daily.zhihu.com/story/9713164
https://pic1.zhimg.com/v2-29b8253fc73d604e9c3b3331d07507f0.jpg
--------------------------
女友婚前,让我过户婚前房子的一半给她,该怎么办?
https://daily.zhihu.com/story/9713262
https://pic1.zhimg.com/v2-e201ee5ec7aa14b3cbdb78bb6466e34c.jpg
--------------------------
人的听觉系统是怎样对声音进行定位的?
https://daily.zhihu.com/story/9713210
https://pic3.zhimg.com/v2-95e4632978715690366e4cc65adf75ba.jpg
--------------------------
儿子的玩伴很聪明也算很有心机,还应该让孩子跟他一起玩吗?
https://daily.zhihu.com/story/9713204
https://pic3.zhimg.com/v2-17325b8d8847598b78b6a87923973042.jpg
--------------------------
为什么滑石粉作为一种已知致癌物,还被添加进化妆品里?
https://daily.zhihu.com/story/9713224
https://pic4.zhimg.com/v2-1bdfd07e9a202258c1b2adebee504a27.jpg
--------------------------
瞎扯 · 如何正确地吐槽
https://daily.zhihu.com/story/9713298
https://pic4.zhimg.com/v2-0e0b61ed9b282eeb180b132b663a9aa3.jpg
--------------------------
如何评价《怪奇物语》第三季
https://daily.zhihu.com/story/9713215
https://pic1.zhimg.com/v2-70d018a670e42f857dc38903f3148d6c.jpg
--------------------------
开普勒是如何得出开普勒三大定律的?
https://daily.zhihu.com/story/9713194
https://pic2.zhimg.com/v2-caca938de9a4fd46175f31630107fa51.jpg
--------------------------
中国有什么 ACG 爱好者圣地巡礼的地方?
https://daily.zhihu.com/story/9713184
https://pic4.zhimg.com/v2-fca6cb00b1da7d6ec0244c78d20d74af.jpg
--------------------------
为什么把鱼放进可乐和雪碧中浸泡 30 天,鱼没了?
https://daily.zhihu.com/story/9713253
https://pic4.zhimg.com/v2-a1a6a6286dcb6249abb0f12cba7e0a7f.jpg
--------------------------
瞎扯 · 如何正确地吐槽
https://daily.zhihu.com/story/9713107
https://pic3.zhimg.com/v2-c3e0252364705040c480e4cd2900503e.jpg
--------------------------
为什么最近的地震如此频繁?
https://daily.zhihu.com/story/9713203
https://pic3.zhimg.com/v2-8fcae8e7e92a40d67002cff00b0e0e42.jpg
--------------------------
如何看待伦纳德与快船达成 41.42 亿美元签约协议?
https://daily.zhihu.com/story/9713169
https://pic2.zhimg.com/v2-77848b0003dc703c89a2d7d8a1578b6d.jpg
--------------------------
对科学执着追求的人,可以整晚打游戏吗?
https://daily.zhihu.com/story/9713115
https://pic3.zhimg.com/v2-f73ebfeb032d7a12e1e01aa77e40343e.jpg
--------------------------
怎样快速去除嘴里的蒜味?
https://daily.zhihu.com/story/9713140
https://pic3.zhimg.com/v2-ac354902d2f1572c03d6501d21c1f6be.jpg
--------------------------
瞎扯 · 如何正确地吐槽
https://daily.zhihu.com/story/9713158
https://pic1.zhimg.com/v2-0a84aa7f699d493504e7e1cce5d50374.jpg
--------------------------
小事 · 医生,我的丈夫,还有多久才……死?
https://daily.zhihu.com/story/9713139
https://pic2.zhimg.com/v2-b7ca3f6664db39c41b8d78607e553fe9.jpg
--------------------------
原生动物是如何演化成后生动物的?
https://daily.zhihu.com/story/9713001
https://pic4.zhimg.com/v2-43f652b7055ea70c646eb4a4c484c4c3.jpg
--------------------------
为什么有时候吃了油炸物就会喉咙痛?
https://daily.zhihu.com/story/9713137
https://pic2.zhimg.com/v2-38033e1a5fe2dc2357928c23f0fa6e61.jpg
--------------------------

Process finished with exit code 0
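
Several posts in the output above share the same title (for example 瞎扯 · 如何正确地吐槽), so files named by title alone overwrite one another. The following is a minimal sketch of one way around that, assuming the numeric story id at the end of each link is unique. The helpers `build_filename` and `save_page` are hypothetical names introduced for this illustration and do not appear in the original script.

```python
import re
from pathlib import Path

# Characters that Windows does not allow in file names.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]')


def build_filename(title, link):
    """Return a file name of the form '<story id>_<sanitized title>.html'."""
    # The story id is the last path segment of a link such as /story/9713369.
    story_id = link.rstrip("/").rsplit("/", 1)[-1]
    safe_title = INVALID_CHARS.sub("", title).strip()
    return "{0}_{1}.html".format(story_id, safe_title)


def save_page(out_dir, title, link, content):
    """Write the page content under out_dir and return the path written."""
    out_path = Path(out_dir)
    # Create the directory if it is missing; do nothing if it already exists.
    out_path.mkdir(parents=True, exist_ok=True)
    target = out_path / build_filename(title, link)
    target.write_text(content, encoding="utf-8")
    return target
```

Inside the crawler loop, a call such as `save_page("./zhihu_html", title, link[0], content)` could replace the `open` and `write` pair. Prefixing the story id keeps the title readable in the file name while making each name distinct.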
