First of all, thanks to 小甲鱼's "Geek Python: The Efficiency Revolution" course. It is well taught, easy to follow, and great for beginners.
If you are interested, you can visit https://fishc.com.cn/forum-319-1.html to support 小甲鱼. Thank you all.
If you want to learn the requests library, see: https://fishc.com.cn/forum.php?mod=viewthread&tid=95893&extra=page%3D1%26filter%3Dtypeid%26typeid%3D701
1. First, let's analyze the page and locate the elements.
We start by fetching the page source to see what it contains:
# -*- coding:UTF-8 -*-
import requests

def get_url(url):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029."
                      "110 Safari/537.36 SE 2.X MetaSr 1.0"}
    res = requests.get(url, headers=headers)
    return res

def main():
    url = input("Enter the song page URL: ")
    res = get_url(url)
    # save the raw page source so we can inspect it
    with open("res.txt", "w", encoding="utf-8") as file:
        file.write(res.text)

if __name__ == "__main__":
    main()
We find that the saved source does not contain the hot comments we are after.
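You can confirm this programmatically with a simple substring search on the saved source. A minimal sketch (the `hotComments` marker and the sample HTML are assumptions for illustration, not real page content):

```python
def comments_in_source(html, marker="hotComments"):
    # The hot comments arrive via a separate POST request,
    # so the static page source should not contain this marker.
    return marker in html

# a stand-in for the saved page source in res.txt
sample = "<html><body><div id='comment-box'></div></body></html>"
print(comments_in_source(sample))  # False: comments are loaded dynamically
```

When the marker is missing from the static HTML, that is the cue that the data comes from a separate request, which is exactly what step 2 hunts down.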
2. Throttle the page load in the browser's developer tools; the moment the hot comments appear, cancel loading and find the request that carried them:
Request URL: https://music.163.com/weapi/v1/resource/comments/R_SO_4_1356350562?csrf_token=643432a22c0bfd772c33e2726c942e48
Request Method: POST
Now we fetch that target resource ourselves, using requests to mimic the browser's request:
# -*- coding:UTF-8 -*-
import requests

def get_comments(url):
    name_id = url.split('=')[1]  # grab the song id from the URL
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029."
                      "110 Safari/537.36 SE 2.X MetaSr 1.0",
        "Referer": "https://music.163.com/"
    }
    params = "jvRGxPQYIeDQiiYsS8qg51ryAhi9TwM0H3NGLu7B9re4EOw9/a7jHRW0P5jhupFbSamLsjHvSpivhbtFiTObUOR2mYA7nFh5KUxaXn3bYh8GXy9sGTbxLeFCuY0KoNAfwWICK0n9ZRPlBHQ1CGBiohOq8+FDDPVBJhbcYgOSPhpTiZ22Ea+/xoYuk7UHnXHty093tfxAXJU032N1uaksCQmMzHxafQ1OA0BroKvyEMA="
    encSecKey = "969f735e7bc94d2b6a6f8371dd89e27d16161ea019a7d2b31391c257452c358678e7ffc11c45712a7f1e47fb1bea81dcf0dbb6f6335045766c06ef1fcc3758987cd30a8674510a062bf626dc2aed8b24c25e7a92ecb1ea38ac514e937f69343923a669d9024ff7a65f8154a35f854de05b67a56dd46d7fa5c136b02c414ce0ea"
    data = {
        "params": params,
        "encSecKey": encSecKey
    }
    # parameterize the target URL so the script works for any song id
    target_url = "https://music.163.com/weapi/v1/resource/comments/R_SO_4_{}?csrf_token=".format(name_id)
    # reconstruct the POST request; press F12 to see how the browser sends it
    res = requests.post(target_url, headers=headers, data=data)
    return res

def main():
    url = input("Enter the song page URL: ")
    res = get_comments(url)
    with open("data.txt", "w", encoding="utf-8") as file:
        file.write(res.text)

if __name__ == "__main__":
    main()
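A side note on `url.split('=')[1]`: it works for a plain `...song?id=...` link, but breaks if the URL carries extra parameters or uses the fragment form `https://music.163.com/#/song?id=...`. A slightly more robust sketch using only the standard library (the id in the example is the one from the Request URL above):

```python
from urllib.parse import urlparse, parse_qs

def extract_song_id(url):
    # the id may sit in the query string, or inside the fragment
    # for URLs of the form https://music.163.com/#/song?id=...
    parsed = urlparse(url)
    query = parsed.query or urlparse(parsed.fragment).query
    ids = parse_qs(query).get("id")
    return ids[0] if ids else None

print(extract_song_id("https://music.163.com/#/song?id=1356350562"))  # 1356350562
```

Either approach yields the `name_id` that gets formatted into `target_url`.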
3. Extract the data we want. Save the returned content as a .json file, open it in Firefox, and analyze where the data we need lives.
Now we know exactly where our data is.
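For reference, the hot comments sit under the top-level `hotComments` key, with the author name nested at `user.nickname`. A minimal sketch of navigating that structure (the sample payload below is hand-made to mimic the assumed response shape, not real API output):

```python
import json

# hand-made sample mimicking the assumed shape of the API response
sample = json.dumps({
    "hotComments": [
        {"user": {"nickname": "Alice"}, "content": "great song"}
    ],
    "total": 1
})

data = json.loads(sample)
lines = [each["user"]["nickname"] + ": " + each["content"]
         for each in data["hotComments"]]
print(lines)  # ['Alice: great song']
```

The full script below walks the same path to write each nickname and comment to a file.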
Here is the complete code:
# -*- coding:UTF-8 -*-
import requests
import json

def get_hot_comment(res):
    comment_json = json.loads(res.text)  # decode the JSON string into a Python object
    hot_comments = comment_json['hotComments']
    print(hot_comments)
    with open('hot_comment.txt', 'w', encoding='utf-8') as file:
        for each in hot_comments:
            file.write(each['user']['nickname'] + ':\n\n')
            file.write(each['content'] + '\n')
            file.write('-'*50 + '\n')

def get_comments(url):
    name_id = url.split('=')[1]
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029."
                      "110 Safari/537.36 SE 2.X MetaSr 1.0",
        "Referer": "https://music.163.com/"
    }
    params = "jvRGxPQYIeDQiiYsS8qg51ryAhi9TwM0H3NGLu7B9re4EOw9/a7jHRW0P5jhupFbSamLsjHvSpivhbtFiTObUOR2mYA7nFh5KUxaXn3bYh8GXy9sGTbxLeFCuY0KoNAfwWICK0n9ZRPlBHQ1CGBiohOq8+FDDPVBJhbcYgOSPhpTiZ22Ea+/xoYuk7UHnXHty093tfxAXJU032N1uaksCQmMzHxafQ1OA0BroKvyEMA="
    encSecKey = "969f735e7bc94d2b6a6f8371dd89e27d16161ea019a7d2b31391c257452c358678e7ffc11c45712a7f1e47fb1bea81dcf0dbb6f6335045766c06ef1fcc3758987cd30a8674510a062bf626dc2aed8b24c25e7a92ecb1ea38ac514e937f69343923a669d9024ff7a65f8154a35f854de05b67a56dd46d7fa5c136b02c414ce0ea"
    data = {
        "params": params,
        "encSecKey": encSecKey
    }
    target_url = "https://music.163.com/weapi/v1/resource/comments/R_SO_4_{}?csrf_token=".format(name_id)
    res = requests.post(target_url, headers=headers, data=data)
    return res

def main():
    url = input("Enter the song page URL: ")
    res = get_comments(url)
    get_hot_comment(res)

if __name__ == "__main__":
    main()
And that's the result.
Pretty interesting, isn't it?