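Plenty of Python libraries come up in web scraping: requests, lxml, re, bs4, urllib, and others.
urllib ships with Python, while requests is a third-party library (built on urllib3) that covers the same ground and is pleasant to use. Each has its strengths, and we won't compare them here.
This post looks at how to use requests for scraping.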
import requests
# Fetch the Baidu home page
response = requests.get("https://www.baidu.com")
>>> response
<Response [200]>
The result is 200, meaning OK.
>>> response.status_code
200
>>> response.status_code==requests.codes.ok
True
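A status code is easier to read once it is mapped to its reason phrase and its class (2xx, 4xx, and so on). Here is a minimal sketch using only the standard library's `http.HTTPStatus`. The helper name `describe_status` is made up for this illustration and is not part of requests.

```python
from http import HTTPStatus


def describe_status(code: int) -> str:
    """Return a short human-readable description of an HTTP status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Codes that the standard library does not know about
        phrase = "Unknown"
    if 200 <= code < 300:
        kind = "success"
    elif 300 <= code < 400:
        kind = "redirect"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {phrase} ({kind})"


print(describe_status(200))  # 200 OK (success)
print(describe_status(404))  # 404 Not Found (client error)
```

With requests itself, `response.ok` is True for any status code below 400, and `response.raise_for_status()` raises an exception on 4xx and 5xx responses.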
Python can fetch the data, so far so good.
Next, let's try to get the page's text.
>>> requests.get('http://www.baidu.com').text
'\r\n ç\x99¾åº¦ä¸\x80ä¸\x8bï¼\x8cä½\xa0å°±ç\x9f¥é\x81\x93 å\x85³äº\x8eç\x99¾åº¦ About Baidu
©2017 Baidu 使ç\x94¨ç\x99¾åº¦å\x89\x8då¿\x85读 æ\x84\x8fè§\x81å\x8f\x8dé¦\x88 京ICPè¯\x81030173å\x8f·
\r\n'
>>>
Getting the page content, ready for later searching and analysis, is that simple.
The main thing is to try it yourself on a few websites you like.
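The garbled characters in the Baidu output are a decoding problem rather than a download problem. The bytes are UTF-8, but they were decoded as ISO-8859-1, which is what requests falls back to when a text response declares no charset. The sketch below shows the effect being reversed on a plain string, so it runs without a network connection. The sample string is copied from the start of the garbled title.

```python
# The first characters of the garbled Baidu title, as printed in the output
garbled = "ç\x99¾åº¦ä¸\x80ä¸\x8b"

# Undo the wrong decoding: recover the raw bytes, then decode them as UTF-8
raw_bytes = garbled.encode("latin-1")
fixed = raw_bytes.decode("utf-8")

print(fixed)  # 百度一下
```

With requests, the usual fix is to set `response.encoding = "utf-8"` (or `response.encoding = response.apparent_encoding`) before reading `response.text`.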
Let's go again:
>>> response = requests.get("https://www.douban.com")
>>> response
<Response [418]>
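Why 418 instead of 200?
418 is "I'm a teapot", a joke status code from RFC 2324. Here it means the server has turned us away.
Let's see how Baidu and Douban differ:
Append robots.txt to each site's home page URL and have a look.
It seems some sites can be scraped as-is, while others spot the crawler and trigger their anti-scraping defenses!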
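Reading robots.txt by eye works, but the standard library can also interpret it. The sketch below feeds a hypothetical robots.txt (made up for this example, not copied from Baidu or Douban) to `urllib.robotparser`, so it runs without touching the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this example;
# a real file lives at the site's home page URL plus /robots.txt
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /private/
Allow: /
"""

parser = RobotFileParser()
# parse() takes the file's lines, so no network request is made here
parser.parse(SAMPLE_ROBOTS_TXT.splitlines())

# can_fetch(useragent, url) reports whether that agent may fetch the URL
print(parser.can_fetch("*", "https://example.com/"))                 # True
print(parser.can_fetch("*", "https://example.com/search?q=python"))  # False
```

For a live site, `parser.set_url(...)` followed by `parser.read()` downloads and parses the real file. Keep in mind that robots.txt only states the site's wishes. A 418 response comes from the server itself.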
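Time to put the crawler in disguise and have it play the spy properly.
Adding a headers argument does the trick: it dresses the request in a browser's "clothes".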
>>> headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"}
>>> url = "https://www.douban.com"
>>> r = requests.get(url, headers=headers)
>>> r
<Response [200]>
>>> r.text
'\n\n\n\n\n\n\n\n\n\n豆瓣 \n\n ... 热门话题\n · · · · · ·\n ... 热点内容\n · · · · · ·\n ... © 2005-2020 douban.com, all rights reserved 北京豆网科技有限公司\n ...'
The string runs on for the whole home page: the 热门话题 (hot topics), 热点内容 (trending content), 豆瓣时间, 小组 (groups) and 音乐 (music) sections, all the way down to the footer.
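For comparison, the same disguise can be put on with the built-in urllib mentioned at the start. This sketch only builds the request object and inspects it, so nothing is sent over the network. The function name `build_request` is invented for this example.

```python
from urllib.request import Request

# Same browser-like User-Agent string as in the requests example
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36"
)


def build_request(url: str) -> Request:
    """Build a GET request that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": BROWSER_UA})


req = build_request("https://www.douban.com")
# urllib normalizes header names with str.capitalize(), hence "User-agent"
print(req.get_method())                            # GET
print(req.get_header("User-agent") == BROWSER_UA)  # True
```

Passing the object to `urllib.request.urlopen(req, timeout=10)` would actually send it. Whether the server answers 200 depends on the site's anti-scraping rules at the time.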
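And with that, our Python crawler has fetched the page content.
If it ran for you, remember to leave a like.
More methods will follow in later updates, so give the blog a follow.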