Python crawler: Case 4: Sina Weibo Index (微指数)

On the home page of Sina's Weibo Index (微指数), entering a keyword such as 欢乐颂 redirects to: http://data.weibo.com/index/hotword?wid=1091324230349&wname=欢乐颂

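I don't know what kind of ID wid is, or what rule maps a keyword to its wid, so I deleted the parameter and sent the request again. The page still loads without it.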
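As a small illustration of that observation, the page URL can be built from the keyword alone. This is only a sketch: the helper name hotword_url is made up for this example, and the optional wid argument is kept just to show both forms of the URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Page URL taken from the post; hotword_url is a hypothetical helper name.
HOTWORD_URL = "http://data.weibo.com/index/hotword"


def hotword_url(keyword, wid=None):
    """Build the hot-word page URL; wid is optional, as observed above."""
    params = {}
    if wid is not None:
        params["wid"] = wid
    params["wname"] = keyword  # urlencode percent-encodes the keyword as UTF-8
    return HOTWORD_URL + "?" + urlencode(params)


url = hotword_url("欢乐颂")
print(url)
# http://data.weibo.com/index/hotword?wname=%E6%AC%A2%E4%B9%90%E9%A2%82
print(parse_qs(urlsplit(url).query))
# {'wname': ['欢乐颂']}
```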

The hot-word trend is a chart. Moving the mouse over it shows the figure for each day, just like 360 Index and Baidu Index.

Also like 360 Index, Weibo Index returns all of the data as JSON in a single request.

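With a network inspection tool we can find the request http://data.weibo.com/index/ajax/getchartdata?month=default&__rnd=1464188164238. Its response holds all the data for the overall trend and for the PC & mobile trend.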

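What I haven't figured out yet is that the __rnd value is different for every keyword. I don't know how to obtain this value, what pattern it follows, or how to get this URL automatically. If I can't solve that, I can only collect data for a single keyword.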

For now, let's collect data for a single keyword.

# coding=utf-8
import requests


class xl():
    def pc(self):
        # Request the chart-data endpoint directly, with no extra headers
        r = requests.get("http://data.weibo.com/index/ajax/getchartdata?month=default&__rnd=1464188164238")
        return r.text


x = xl()
print(x.pc())


Result:

csrf


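The response is just "csrf": the server evidently treats the bare request as a cross-site request forgery. So the request has to carry the same header information the browser sends.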

# coding=utf-8
import requests
from urllib.parse import quote


class xl():
    def pc(self, name):
        # URL-encode the keyword so it can be placed in the Referer header
        url_name = quote(name)
        # Headers copied from the browser request; the Cookie comes from that browser session
        headers = {
            'Host': 'data.weibo.com',
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:46.0) Gecko/20100101 Firefox/46.0',
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
            'Accept-Language': 'zh-CN,zh;q=0.8,en-US;q=0.5,en;q=0.3',
            'Accept-Encoding': 'gzip, deflate',
            'Content-Type': 'application/x-www-form-urlencoded',
            'X-Requested-With': 'XMLHttpRequest',
            'Referer': 'http://data.weibo.com/index/hotword?wname=' + url_name,
            'Cookie': 'UOR=www.baidu.com,data.weibo.com,www.baidu.com; SINAGLOBAL=1213237876483.9214.1464074185942; ULV=1464183246396:2:2:2:3463179069239.6826.1464183246393:1464074185944; DATA=usrmdinst_12; _s_tentry=www.baidu.com; Apache=3463179069239.6826.1464183246393; WBStore=8ca40a3ef06ad7b2|undefined; PHPSESSID=3mn5oie7g3cm954prqan14hbg5',
            'Connection': 'keep-alive'
        }
        r = requests.get("http://data.weibo.com/index/ajax/getchartdata?month=default&__rnd=1464188164238", headers=headers)
        return r.text


x = xl()
print(x.pc("欢乐颂"))

Result:

{"data":[{"zt":[{"day_key":"2016-04-25","wid":"1091324230349","value":"224539"},{"day_key":"2016-04-26","wid":"1091324230349","value":"157686"},{"day_key":"2016-04-27","wid":"1091324230349","value":"180757"},{"day_key":"2016-04-28","wid":"1091324230349","value":"219171"},{"day_key":"2016-04-29","wid":"1091324230349","value":"165993"},{"day_key":"2016-04-30","wid":"1091324230349","value":"141948"},{"day_key":"2016-05-01","wid":"1091324230349","value":"126398"},{"day_key":"2016-05-02","wid":"1091324230349","value":"174244"},{"day_key":"2016-05-03","wid":"1091324230349","value":"180751"},{"day_key":"2016-05-04","wid":"1091324230349","value":"212351"},{"day_key":"2016-05-05","wid":"1091324230349","value":"252814"},{"day_key":"2016-05-06","wid":"1091324230349","value":"340472"},{"day_key":"2016-05-07","wid":"1091324230349","value":"316276"},{"day_key":"2016-05-08","wid":"1091324230349","value":"260587"},{"day_key":"2016-05-09","wid":"1091324230349","value":"222790"},{"day_key":"2016-05-10","wid":"1091324230349","value":"200010"},{"day_key":"2016-05-11","wid":"1091324230349","value":"224717"},{"day_key":"2016-05-12","wid":"1091324230349","value":"166743"},{"day_key":"2016-05-13","wid":"1091324230349","value":"103426"},{"day_key":"2016-05-14","wid":"1091324230349","value":"135842"},{"day_key":"2016-05-15","wid":"1091324230349","value":"75692"},{"day_key":"2016-05-16","wid":"1091324230349","value":"68669"},{"day_key":"2016-05-17","wid":"1091324230349","value":"79509"},{"day_key":"2016-05-18","wid":"1091324230349","value":"110907"},{"day_key":"2016-05-19","wid":"1091324230349","value":"44296"},{"day_key":"2016-05-20","wid":"1091324230349","value":"82582"},{"day_key":"2016-05-21","wid":"1091324230349","value":"41602"},{"day_key":"2016-05-22","wid":"1091324230349","value":"27270"},{"day_key":"2016-05-23","wid":"1091324230349","value":"31520"},{"day_key":"2016-05-24","wid":"1091324230349","value":"29199"},{"word":"\u6b22\u4e50\u9882"}],"yd":[{"daykey":"2016-04-25","pc":"67488",
"mobile":"157051"},{"daykey":"2016-04-26","pc":"47711","mobile":"109975"},{"daykey":"2016-04-27","pc":"43718","mobile":"137039"},{"daykey":"2016-04-28","pc":"43571","mobile":"175600"},{"daykey":"2016-04-29","pc":"42836","mobile":"123157"},{"daykey":"2016-04-30","pc":"41607","mobile":"100341"},{"daykey":"2016-05-01","pc":"25525","mobile":"100873"},{"daykey":"2016-05-02","pc":"45209","mobile":"129035"},{"daykey":"2016-05-03","pc":"52973","mobile":"127778"},{"daykey":"2016-05-04","pc":"57490","mobile":"154861"},{"daykey":"2016-05-05","pc":"71589","mobile":"181225"},{"daykey":"2016-05-06","pc":"133376","mobile":"207096"},{"daykey":"2016-05-07","pc":"92976","mobile":"223300"},{"daykey":"2016-05-08","pc":"49791","mobile":"210796"},{"daykey":"2016-05-09","pc":"62232","mobile":"160558"},{"daykey":"2016-05-10","pc":"59730","mobile":"140280"},{"daykey":"2016-05-11","pc":"80675","mobile":"144042"},{"daykey":"2016-05-12","pc":"81176","mobile":"85567"},{"daykey":"2016-05-13","pc":"40298","mobile":"63128"},{"daykey":"2016-05-14","pc":"42531","mobile":"93311"},{"daykey":"2016-05-15","pc":"13055","mobile":"62637"},{"daykey":"2016-05-16","pc":"20792","mobile":"47877"},{"daykey":"2016-05-17","pc":"41057","mobile":"38452"},{"daykey":"2016-05-18","pc":"70896","mobile":"40011"},{"daykey":"2016-05-19","pc":"13487","mobile":"30809"},{"daykey":"2016-05-20","pc":"45656","mobile":"36926"},{"daykey":"2016-05-21","pc":"20755","mobile":"20847"},{"daykey":"2016-05-22","pc":"7732","mobile":"19538"},{"daykey":"2016-05-23","pc":"13396","mobile":"18124"},{"daykey":"2016-05-24","pc":"10143","mobile":"19056"}]}],"len":1,"keyword":["\u6b22\u4e50\u9882"]}

The complete JSON payload comes back.
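To turn that payload into per-day rows, something like the following could work. It is a sketch rather than part of the original script: parse_chart_data is a made-up name, and SAMPLE is only the first two days copied from the response above. Note that the zt list uses the key day_key while yd uses daykey, and that the last element of zt is a {"word": ...} entry with no date, which the sketch skips.

```python
import json

# First two days excerpted from the response shown above (not the full payload).
SAMPLE = r'''{"data":[{"zt":[
{"day_key":"2016-04-25","wid":"1091324230349","value":"224539"},
{"day_key":"2016-04-26","wid":"1091324230349","value":"157686"},
{"word":"\u6b22\u4e50\u9882"}],
"yd":[
{"daykey":"2016-04-25","pc":"67488","mobile":"157051"},
{"daykey":"2016-04-26","pc":"47711","mobile":"109975"}]}],
"len":1,"keyword":["\u6b22\u4e50\u9882"]}'''


def parse_chart_data(text):
    """Return (keyword, rows) with one row per day; hypothetical helper."""
    payload = json.loads(text)
    series = payload["data"][0]
    # zt entries: overall value per day; skip the trailing {"word": ...} entry
    overall = {row["day_key"]: int(row["value"])
               for row in series["zt"] if "day_key" in row}
    rows = []
    for row in series["yd"]:
        day = row["daykey"]
        rows.append({
            "day": day,
            "total": overall.get(day),  # None if zt has no entry for that day
            "pc": int(row["pc"]),
            "mobile": int(row["mobile"]),
        })
    return payload["keyword"][0], rows


keyword, rows = parse_chart_data(SAMPLE)
for row in rows:
    print(keyword, row["day"], row["total"], row["pc"], row["mobile"])
# 欢乐颂 2016-04-25 224539 67488 157051
# 欢乐颂 2016-04-26 157686 47711 109975
```

In this excerpt the overall value equals pc plus mobile for both days, which is a handy sanity check when parsing.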

zt is the overall trend data

yd is the PC & mobile trend data

"keyword":["the keyword appears here"]

I tried a few more keywords and looked at the URL http://data.weibo.com/index/ajax/getchartdata?month=default&__rnd=xxxxxx each time. The __rnd parameter can be left empty; it is most likely a timestamp.
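The timestamp guess can be checked offline. The sketch below assumes __rnd is milliseconds since the Unix epoch, which the post does not confirm. The helper names (rnd_to_utc, make_rnd, build_chart_url) are invented for this example. Under that assumption the captured value decodes to 2016-05-25, one day after the last date in the data above, and a fresh value can be generated from the current time.

```python
import time
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint taken from the post; the helpers below are hypothetical.
CHART_URL = "http://data.weibo.com/index/ajax/getchartdata"


def rnd_to_utc(rnd):
    """Interpret __rnd as milliseconds since the Unix epoch (an assumption)."""
    seconds, millis = divmod(int(rnd), 1000)
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(microsecond=millis * 1000)


def make_rnd():
    """Current time in milliseconds, mimicking the captured value."""
    return int(time.time() * 1000)


def build_chart_url(rnd=None):
    """Build the chart-data URL with a given or freshly generated __rnd."""
    params = {"month": "default", "__rnd": make_rnd() if rnd is None else rnd}
    return CHART_URL + "?" + urlencode(params)


print(rnd_to_utc(1464188164238))
# 2016-05-25 14:56:04.238000+00:00
print(build_chart_url(1464188164238))
# http://data.weibo.com/index/ajax/getchartdata?month=default&__rnd=1464188164238
```

Since the chart URL carries no keyword at all, the server presumably picks the keyword from the Referer or the session cookie. That part is a guess and would need to be verified against the live site.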




