Scraping Google Trends with Python

Data link for the past 7 days: https://trends.google.com/trends/api/widgetdata/multiline?hl=zh-CN&tz=-480&req={"time":"2020-05-11T03:38:59 2020-05-18T03:38:59","resolution":"HOUR","locale":"zh-CN","comparisonItem":[{"geo":{},"complexKeywordsRestriction":{"keyword":[{"type":"BROAD","value":"NBA"}]}}],"requestOptions":{"property":"","backend":"CM","category":0}}&token=APP6_UEAAAAAXsNU002YsOS6N9Eb5Z_2BpV-LTY0_AGz&tz=-480
In this link, the req and token parameters are the two values we have to obtain ourselves.
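
To see which parts of such a link vary, it can be split with the standard library. This is a minimal sketch that uses a shortened, hypothetical link of the same shape; the token is a placeholder rather than a real one, and split_widget_url is a name made up for illustration.

```python
import json
from urllib.parse import parse_qs, quote, urlsplit

# Shortened, hypothetical example of a widgetdata link; the token is a placeholder
req_json = '{"time":"2020-05-11T03:38:59 2020-05-18T03:38:59","resolution":"HOUR"}'
sample_url = (
    'https://trends.google.com/trends/api/widgetdata/multiline'
    '?hl=zh-CN&tz=-480&req=' + quote(req_json) + '&token=PLACEHOLDER_TOKEN'
)


def split_widget_url(url):
    """Return the req payload (as a dict) and the token from a widgetdata link."""
    query = parse_qs(urlsplit(url).query)
    return json.loads(query['req'][0]), query['token'][0]


req, token = split_widget_url(sample_url)
print(req['resolution'], token)
```
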
Getting the req and token parameters:

import json

import requests


def get_token(keyword):
  headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36',
    'x-client-data': 'CIu2yQEIo7bJAQjEtskBCKmdygEIy67KAQjQr8oBCLywygEIl7XKAQjttcoBCI66ygEYx7fKAQ==',
    'referer': 'https://trends.google.com/trends/explore?date=today%201-m&q=bitcoin,blockchain,eth',
    'cookie': '__utmc=10102256; __utma=10102256.31392724.1583402727.1586332529.1586398363.11; __utmz=10102256.1586398363.11.11.utmcsr=shimo.im|utmccn=(referral)|utmcmd=referral|utmcct=/docs/qxW86VTXr8DK6HJX; __utmt=1; __utmb=10102256.9.9.1586398779015; ANID=AHWqTUlRutPWkqC3UpC_-5XoYk6zqoDW3RQX5ePFhLykky73kQ0BpL32ATvqV3O0; CONSENT=WP.284bc1; NID=202=xLozp9-VAAGa2d3d9-cqyqmRjW9nu1zmK0j50IM4pdzJ6wpWTO_Z49JN8W0s1OJ8bySeirh7pSMew1WdqRF890iJLX4HQwwvVkRZ7zwsBDxzeHIx8MOWf27jF0mVCxktZX6OmMmSA0txa0zyJ_AJ3i9gmtEdLeopK5BO3X0LWRA; 1P_JAR=2020-4-9-2'
  }
  # Doubled braces are literal braces for str.format; only {} is replaced by the keyword
  url = 'https://trends.google.com/trends/api/explore?hl=zh-CN&tz=-480&req={{"comparisonItem":[{{"keyword":"{}","geo":"","time":"now 7-d"}}],"category":0,"property":""}}&tz=-480'.format(keyword)
  r = requests.get(url, headers=headers)
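  # Skip the 5-character non-JSON prefix before parsing
  data = json.loads(r.text[5:])
  # The first widget carries the request payload and token used by the multiline request
  req = data['widgets'][0]['request']
  token = data['widgets'][0]['token']
  result = {'req': req, 'token': token}
  return result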
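
Hand-writing JSON inside a format string is easy to get wrong, because every literal brace has to be doubled. As an alternative sketch (not the approach used above), the payload can be built as a dictionary and serialized with json.dumps. build_explore_url is a hypothetical helper name, and the field values simply mirror the ones in get_token.

```python
import json
from urllib.parse import urlencode


def build_explore_url(keyword, time_range='now 7-d'):
    """Build the explore URL with the req payload serialized by json.dumps."""
    req = {
        'comparisonItem': [{'keyword': keyword, 'geo': '', 'time': time_range}],
        'category': 0,
        'property': '',
    }
    # urlencode percent-encodes the JSON so quotes, braces and spaces are URL-safe
    query = urlencode({'hl': 'zh-CN', 'tz': '-480', 'req': json.dumps(req)})
    return 'https://trends.google.com/trends/api/explore?' + query


print(build_explore_url('NBA'))
```

Because the keyword goes through json.dumps and urlencode, keywords containing quotes or ampersands do not break the query string.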

Fetching the trend chart data:

def google(keyword):
  """Google Trends interest over time"""
  info = get_token(keyword)
  # req is a dict; serialize it to JSON text instead of formatting the dict directly
  req = json.dumps(info['req'])
  token = info['token']
  url = 'https://trends.google.com/trends/api/widgetdata/multiline'
  # Let requests URL-encode the query parameters
  params = {'hl': 'zh-CN', 'tz': '-480', 'req': req, 'token': token}
  r = requests.get(url, params=params)
  if r.status_code == 200:
    # Skip the 6-character non-JSON prefix; json.loads decodes \u escapes itself
    data = json.loads(r.text[6:])['default']['timelineData']
    for data_e in data:
      timestamp = int(data_e['time']) * 1000  # seconds to milliseconds
      value = data_e['value'][0]
      print(timestamp, value, keyword)
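
Printing is fine for a quick check, but returning structured rows is often more convenient for later processing. The sketch below converts timelineData entries into (datetime, value) pairs. The sample entries are made up for illustration and contain only the time and value fields that the code above reads; timeline_to_rows is a hypothetical helper name.

```python
from datetime import datetime, timezone


def timeline_to_rows(timeline_data):
    """Convert timelineData entries into (UTC datetime, value) tuples."""
    rows = []
    for entry in timeline_data:
        # 'time' is treated as a Unix timestamp in seconds, stored as a string
        moment = datetime.fromtimestamp(int(entry['time']), tz=timezone.utc)
        rows.append((moment, entry['value'][0]))
    return rows


# Made-up sample entries, not real Google Trends data
sample = [
    {'time': '1589166000', 'value': [42]},
    {'time': '1589169600', 'value': [57]},
]
for moment, value in timeline_to_rows(sample):
    print(moment.isoformat(), value)
```
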

Usage: google('NBA')
Output (partial data):
[Image 1: screenshot of the output]
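Most responses from Google's endpoints cannot be parsed into a dictionary directly: a few leading characters have to be stripped first (5 for the explore response and 6 for the multiline response in the code above), so watch out for this!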
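
Rather than hard-coding the number of characters to skip, one option is to cut everything before the first opening brace. This is a minimal sketch that assumes the payload is always a JSON object; parse_prefixed_json is a hypothetical helper name, and the sample strings are made up to imitate prefixes of 5 and 6 characters.

```python
import json


def parse_prefixed_json(text):
    """Parse a response whose JSON object is preceded by non-JSON characters."""
    start = text.find('{')
    if start == -1:
        raise ValueError('no JSON object found in response text')
    return json.loads(text[start:])


# Made-up responses imitating the two prefix lengths handled above
explore_like = parse_prefixed_json(")]}'\n{\"widgets\": []}")
multiline_like = parse_prefixed_json(")]}',\n{\"default\": {\"timelineData\": []}}")
print(explore_like, multiline_like)
```
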
