Python Web Scraping for Beginners: Fetching Page Source

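There are many libraries commonly used for scraping: requests, lxml, re, bs4, urllib, and so on.
urllib ships with Python, while requests is a third-party library that covers the same ground and is really quite pleasant to use (each has its strengths; we won't compare them here).
Here we look at how to use requests for scraping.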

>>> import requests
>>> # Request the Baidu home page
>>> response = requests.get("https://www.baidu.com")
>>> response
<Response [200]>

The result is 200, OK.

>>> response.status_code
200
>>> response.status_code==requests.codes.ok
True

So Python can fetch the data; that is what OK means.
Now let's try getting the text of the page itself.

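>>> requests.get('http://www.baidu.com').text
'\r\n ç\x99¾åº¦ä¸\x80ä¸\x8bï¼\x8cä½\xa0å°±ç\x9f¥é\x81\x93 \r\n'

The Chinese text comes out garbled here: these are the UTF-8 bytes of "百度一下,你就知道" decoded as ISO-8859-1, which is what requests falls back to when the response headers declare no charset. Setting response.encoding = "utf-8" before reading .text should give readable output.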
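To make the encoding problem above concrete, here is a minimal sketch that reproduces the garbling without any network access and then undoes it. The helper name `repair_latin1_mojibake` is hypothetical, and the sketch assumes the text really is UTF-8 that was decoded as ISO-8859-1.

```python
def repair_latin1_mojibake(text: str) -> str:
    """Undo a wrong ISO-8859-1 decode of bytes that were really UTF-8.

    Assumption: `text` was produced by decoding UTF-8 bytes as ISO-8859-1
    (Latin-1). For any other kind of garbling this will raise or return
    nonsense.
    """
    return text.encode("latin-1").decode("utf-8")


# Reproduce the problem: UTF-8 bytes read with the wrong codec.
original = "百度一下,你就知道"
garbled = original.encode("utf-8").decode("latin-1")

# Reverse it by going back to the raw bytes and decoding them correctly.
repaired = repair_latin1_mojibake(garbled)
print(repaired)
```

With requests itself, the simpler route is usually to set `response.encoding = "utf-8"` (or `response.encoding = response.apparent_encoding`) before reading `response.text`, so the garbled string is never produced.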


Fetching a page's content, ready for later querying and analysis, is very simple.

The main thing is to try it yourself on a few sites you like.


Let's try again:

>>> response = requests.get("https://www.douban.com")
>>> response
<Response [418]>

Why 418 instead of 200?
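418 is "I'm a teapot", a joke status code from RFC 2324 that Douban apparently returns to requests it takes for a bot. In other words, we asked for a page and got handed a teapot.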
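Status codes like these can be looked up with the standard library's `http.HTTPStatus`. The sketch below is only an illustration; `describe_status` is a hypothetical helper, and codes that the running Python version does not know (418 was added to `HTTPStatus` in Python 3.9) fall back to a generic message.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return the numeric code with its standard reason phrase, if known."""
    try:
        status = HTTPStatus(code)
    except ValueError:
        # HTTPStatus raises ValueError for codes it has no entry for.
        return f"{code} (unknown to this Python version)"
    return f"{code} {status.phrase}"


print(describe_status(200))
print(describe_status(418))
```

In a scraper, `response.status_code` from requests can be passed straight to such a helper when logging failed requests.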

Let's see how Baidu and Douban differ:
append robots.txt to each site's home page URL and take a look.


It seems some sites can be scraped freely, while others notice the scraper and trigger their anti-scraping measures!
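Reading robots.txt by eye works, but the rules can also be checked in code with the standard library's `urllib.robotparser`. This is a sketch under stated assumptions: the robots.txt content below is made up for illustration (it is not Baidu's or Douban's actual file), and `is_allowed` and the `MyScraper` agent name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content, used only for this example.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "MyScraper", "https://example.com/index.html"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "MyScraper", "https://example.com/private/data"))
```

For a live site, `RobotFileParser` can instead download the file itself via `set_url(...)` followed by `read()`.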

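Haha, this is only the first hurdle, and we can get around it right away!

Dress the scraper up in a disguise and let it play the spy properly.

Adding a headers argument is enough: we can simply put on the "outfit" of a real browser.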


>>> headers= { "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36" }
>>> url = "https://www.douban.com"
>>> r = requests.get(url,headers=headers)
>>> r
<Response [200]>
>>> r.text
'\n\n\n\n\n\n\n\n\n\n豆瓣\n\n\n\n ... © 2005-2020 douban.com, all rights reserved 北京豆网科技有限公司\n ...'

(Output truncated. The full string is the source of the Douban home page, covering sections such as 热点内容, 电影, 小组, 读书, 音乐 and 同城.)
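If several pages need the same disguise, the header can be set once on a `requests.Session` rather than passed on every call. The following is a minimal sketch, not a definitive implementation: `make_session` and `fetch_html` are hypothetical helper names, the 10-second timeout is an arbitrary choice, and the User-Agent string is the one used above.

```python
import requests

# Same browser User-Agent string as in the example above.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"
)


def make_session(user_agent: str = BROWSER_USER_AGENT) -> requests.Session:
    """Create a session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_html(session: requests.Session, url: str, timeout: float = 10.0) -> str:
    """Fetch a page and return its text, raising on 4xx/5xx responses."""
    response = session.get(url, timeout=timeout)
    # Turns statuses such as 418 into an exception instead of silent bad data.
    response.raise_for_status()
    return response.text


session = make_session()
```

Calling `fetch_html(session, "https://www.douban.com")` would then return the page source, or raise `requests.HTTPError` if the site answers with an error status.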


And with that, our Python scraper has fetched the page source.

If it ran for you, remember to leave a like.

Other methods will be covered in later updates, so feel free to follow.
