Scraping 360 Mobile Assistant Reviews with Python: Baidu Maps as an Example

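I wanted to do some competitive analysis, so I started by scraping user reviews of a few apps from an app store as raw material. This time the target is the 360 Mobile Assistant (360手机助手) website; the scraped review files for Baidu Maps and Amap (高德地图) are linked at the end.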

Page link: http://zhushou.360.cn/detail/index/soft_id/7655?recrefer=SE_D_%E7%99%BE%E5%BA%A6%E5%9C%B0%E5%9B%BE#nogo

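Taking neutral reviews as the example: open the browser developer tools (F12) and click "查看更多评论" (View more comments). A getComments request appears (URL shown below). Analyzing its parameters gives:
start is the index of the first comment to return; count is the number of comments loaded per request (by experiment it can be raised to at most count = 50); type takes three values, best, good and bad, for positive, neutral and negative reviews respectively; level corresponds to type and takes the values 1, 2 and 3 in the same order; the remaining parameters do not affect the data returned.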
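
The parameter analysis above can be turned into a small URL builder. This is only a sketch: the helper name build_comment_url and the placeholder callback name "cb" are assumptions of mine, not part of the original script, and whether the server accepts an arbitrary callback name has not been verified here.

```python
from urllib.parse import urlencode

# type -> level mapping, as described in the parameter analysis above.
TYPE_TO_LEVEL = {"best": 1, "good": 2, "bad": 3}

BASE_URL = "https://comment.mobilem.360.cn/comment/getComments"


def build_comment_url(baike, comment_type, start=0, count=50, callback="cb"):
    """Return a getComments URL for one page of comments (hypothetical helper)."""
    params = {
        "callback": callback,  # placeholder JSONP callback name (assumption)
        "baike": baike,        # app name as shown in the captured request
        "c": "message",
        "a": "getmessage",
        "start": start,        # index of the first comment
        "count": count,        # comments per request, at most 50 per the text
        "type": comment_type,
        "level": TYPE_TO_LEVEL[comment_type],
    }
    # urlencode percent-encodes the Chinese characters and turns spaces into "+".
    return BASE_URL + "?" + urlencode(params)


print(build_comment_url("百度手机地图 for android", "good", start=10, count=10))
```

Keeping type and level in one mapping avoids requesting, say, type=good with level=1 by mistake.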

https://comment.mobilem.360.cn/comment/getComments?callback=jQuery1720035670320680676326_1571120471046&baike=%E7%99%BE%E5%BA%A6%E6%89%8B%E6%9C%BA%E5%9C%B0%E5%9B%BE+for+android&c=message&a=getmessage&start=10&count=10&type=good&level=2&_=1571120557227

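The response contains the app version, comment time, rating score, comment text and so on. With a small change (stripping the JSONP wrapper around it) it can be parsed as JSON to extract the fields we want: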

try{jQuery1720035670320680676326_1571120471046({"errno":0,"error":"","data":{"total":"1550","messages":[{"likes":"9","replies":"0","weight":"0","create_time":"2019-07-27 20:57:03","version_name":"10.17.2","score":"2","text_score":"0","m_type":"0","puid":"0","pid":"0","support_type":"0","content":"\u4e00\u8d77\u5f88\u597d\u7528\u7684\uff0c\u73b0\u5728\u5bfc\u822a\u8def\u7ebf\u90fd\u4e0d\u52a8\u4e86\uff0cGPS\u4fe1\u53f7\u5dee\uff01","imgs":"","username":"\u514b\u4ec0\u7c73\u5c14\u56fd\u738b","image_url":"http:\/\/p1.qhmsg.com\/dm\/50_50_100\/t01be171c8c069b324b.jpg","msgid":"58675464","type":"good","qid":"223943267","isadmin":"","liked":"0"}
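
One way to do that "small change" is to cut the JSON object out of the try{callback(...);}catch(e){} wrapper with a regular expression and hand it to json.loads, which also decodes the \uXXXX escapes. The sketch below is one possible approach; the helper name parse_jsonp and the shortened sample response are made up for illustration.

```python
import json
import re


def parse_jsonp(text):
    """Extract and decode the JSON object wrapped in a JSONP callback call."""
    # Match from the first "({" to the last "})" so that parentheses inside
    # comment text stay part of the payload.
    match = re.search(r"\((\{.*\})\)", text, re.S)
    if match is None:
        raise ValueError("no JSON payload found in response")
    return json.loads(match.group(1))


# Shortened, made-up response in the same shape as the real one above.
sample = (r'try{cb({"errno":0,"error":"","data":{"total":"1","messages":'
          r'[{"content":"\u5bfc\u822a (GPS)","score":"2"}]}});}catch(e){}')
data = parse_jsonp(sample)
for message in data["data"]["messages"]:
    print(message["score"], message["content"])
```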

Next, take the corresponding URL, vary its parameters to scrape the pages, and save the results to a local file:

import requests
import re
import json
import time

headers = {
    "Accept":"*/*",
    "Accept-Encoding":"gzip, deflate, sdch",
    "Accept-Language":"zh-CN,zh;q=0.8",
    "Connection":"keep-alive",
    "Cookie":"__huid=11mrH/1/uQtfZUIEZInizlWZyTeXPCGtKxxUrq+259Bvw=; __guid=231226694.4010251614273200128.1551939329000.2239; quCryptCode=2V3Qdc%252BWb6%252BTkB0SrDzTRxtKXqtVsQrJM16piVT%252Bajy9rkSCTh45KHL61p8oIcGG9Z8S%252Bo0SjO4%253D; quCapStyle=1; Q=u%3D360H3160831966%26n%3D%26le%3D%26m%3DZGH5WGWOWGWOWGWOWGWOWGWOAGHm%26qid%3D3160831966%26im%3D1_t01923d359dad425928%26src%3Dpcw_zhushou%26t%3D1; T=s%3D5c36d642ca398fca5e724a27ee1b556e%26t%3D1567936771%26lm%3D%26lf%3D2%26sk%3D32df2010e1a826de391fe4da733dbcde%26mt%3D1567936771%26rc%3D%26v%3D2.0%26a%3D0; __DC_gid=59612149.365858392.1567936731372.1567936766331.3",
    "Host":"comment.mobilem.360.cn",
    "Referer":"http://zhushou.360.cn/detail/index/soft_id/7655?recrefer=SE_D_%E7%99%BE%E5%BA%A6%E5%9C%B0%E5%9B%BE",
    "User-Agent":"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.98 Safari/537.36 LBBROWSER"
}
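start = 0
# Output path from the original post; a raw string keeps the backslashes literal.
out_path = r"F:\pycharm\py3\实习\百度地图_360应用市场_好评.txt"
# Loop over a generous range and stop as soon as a page comes back empty.
with open(out_path, "a", encoding="utf-8") as f:
    for i in range(1, 50):
        print("Starting at comment {0}".format(start))
        # baike is the Baidu Maps value from the URL above, matching the file name;
        # for Amap use baike=%E9%AB%98%E5%BE%B7%E5%9C%B0%E5%9B%BE+android and rename the file.
        url2 = "http://comment.mobilem.360.cn/comment/getComments?callback=jQuery1720334417598110619_1568026585094&baike=%E7%99%BE%E5%BA%A6%E6%89%8B%E6%9C%BA%E5%9C%B0%E5%9B%BE+for+android&c=message&a=getmessage&start=" + str(start) + "&count=50&type=best&level=1&"
        r = requests.get(url2, headers=headers)
        # Strip the JSONP wrapper so that the payload parses as JSON.
        s = re.findall(r'\{"errno"(.*)\);\}catch\(e\)\{\}', r.text)
        if not s:
            break
        s1 = json.loads('{"errno"' + s[0])
        messages = s1["data"]["messages"]
        if not messages:
            break
        for message in messages:
            print(message["create_time"], message["content"])
            f.write(str(message["create_time"]) + "," + message["content"] + "," + str(message["version_name"]) + "," + str(message["score"]) + "," + message["type"] + "\n")
        start = start + 50
        time.sleep(2)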
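
Comment text often contains commas or line breaks, which a plain comma-joined line cannot represent unambiguously. As an optional variation (not what the script above does), the standard csv module can quote such fields. The helper name write_messages and the demo message are made up for illustration.

```python
import csv
import io

# Same fields, in the same order, as the script writes.
FIELDS = ["create_time", "content", "version_name", "score", "type"]


def write_messages(messages, fileobj):
    """Write the selected fields of each message as one CSV row."""
    writer = csv.writer(fileobj)
    for message in messages:
        writer.writerow([message[field] for field in FIELDS])


# Made-up message whose content contains a comma and a line break.
demo = [{"create_time": "2019-07-27 20:57:03", "content": "good, but\nslow",
         "version_name": "10.17.2", "score": "2", "type": "good"}]
buffer = io.StringIO()
write_messages(demo, buffer)

# Reading the text back yields the original field values unchanged.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows)
```

When writing to a real file instead of a StringIO buffer, open it with newline="" and encoding="utf-8", as the csv documentation recommends.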

Sample output:
[Image: screenshot of the run output]
Download the Baidu Maps and Amap user review files:
https://download.csdn.net/download/qq_37089628/11866281
