Fetching data with the GA API through a proxy, and troubleshooting

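Error message:

TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.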

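Troubleshooting steps:

  1. Rule out a misconfigured network. Make sure the system is set as follows (Win7/Win10): Internet Properties –> LAN settings –> Automatically detect settings
  2. Have a proxy IP and port ready, for example: 192.168.1.123:9878
  3. Use the code below to check whether the proxy works. A word of caution: do not take the shortcut of configuring the proxy in the browser. Even if the browser can open Google, the same request may still fail from code. Likewise, running a system-wide proxy tool or setting a global proxy can also produce the error above.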
import httplib2
import socks

hObj = httplib2.Http(proxy_info = httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, "192.168.1.123", 9878))
#response, content = hObj.request('https://www.w3.org')
response, content = hObj.request('https://www.google.com')
# content
response

If everything is working, this returns some response information. After that, try the GA API code.
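If the request above times out instead, a coarser check can tell whether the proxy port accepts TCP connections at all. The following is a minimal sketch using only the standard library. The function name `proxy_is_reachable` and the 5-second default timeout are assumptions made for illustration and are not part of the original code. A `True` result only shows that something is listening on that port, not that the proxy can actually reach Google.

```python
import socket


def proxy_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts
        return False


# Example call with the proxy address from step 2 (replace with your own):
# proxy_is_reachable("192.168.1.123", 9878)
```

If this returns `False`, the problem is probably with the proxy address or the local network rather than with httplib2 or the GA client.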

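Before running the code, prepare the following:

  • Create a Google API project (proj)
  • Enable the API for the project
  • Create a service account and add it to the view under the GA account
  • Create a JSON key for the service account and download it to the code's working directory
  • Install the required Python libraries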
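Before connecting, it may be worth confirming that the downloaded key file really is a service account key, and reading out the account's email address, since that is the address to add to the GA view. The sketch below uses only the standard library. The helper name `check_key_file` and the choice of which fields to check are assumptions for illustration.

```python
import json

# Fields that a service account key file downloaded from Google normally contains
REQUIRED_FIELDS = ("type", "client_email", "private_key")


def check_key_file(path):
    """Load a service account JSON key and return its client_email.

    Raises ValueError if the file does not look like a service account key.
    """
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    missing = [name for name in REQUIRED_FIELDS if name not in key]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    if key["type"] != "service_account":
        raise ValueError("unexpected key type: " + str(key["type"]))
    return key["client_email"]
```

Calling `check_key_file` with the path to your own key file returns the email address that needs read access to the view.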

Install the required Python libraries

pip install google-api-python-client
pip install oauth2client
pip install httplib2
pip install pandas
pip install PySocks
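To confirm that the installation worked, one option is to check that every module imported later in this post can be found. This is a small sketch with the standard library only. The list of module names is taken from the import statements in the code below, and the helper name `missing_modules` is an assumption.

```python
import importlib.util

# Top-level module names imported by the code in this post
REQUIRED_MODULES = ["googleapiclient", "oauth2client", "httplib2", "socks", "pandas"]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means everything needed is installed
print(missing_modules(REQUIRED_MODULES))
```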

4. Connect to the Google Analytics service

import os
from oauth2client.service_account import ServiceAccountCredentials
from googleapiclient.discovery import build
import httplib2
import socks
import pandas as pd

class Conn_GA:
    """
    Connection to the Google Analytics service.
    """

    def __init__(self):
        # View ID (kept as a string because the real ID is masked here)
        self.fot_viewID = '11421xxxx'
        # Path to the service account JSON key file
        self.KEY_FILE_LOCATION_JSON = os.path.abspath("D:/python_work/ga_data_API/jianshu1/proj-for-service-303b815284.json")
        self.SCOPE = ['https://www.googleapis.com/auth/analytics.readonly']
        # Service object
        self.service = None

    def get_service_v3_json(self, api_name='analytics', api_version='v3'):
        """Get a service that communicates with a Google API.

        The auth scope and the key file path are taken from self.SCOPE
        and self.KEY_FILE_LOCATION_JSON.

        Args:
          api_name: string The name of the API to connect to.
          api_version: string The API version to connect to.
        Returns:
          A service that is connected to the specified API.
        """
        credentials = ServiceAccountCredentials.from_json_keyfile_name(self.KEY_FILE_LOCATION_JSON, self.SCOPE)
        print(credentials)
        print("ServiceAccountCredentials !!")
        
        # Plain httplib2.Http(), no proxy
        hp = httplib2.Http()

        # Workaround for SSL errors (this disables certificate validation, so use it for debugging only)
        #hp = httplib2.Http(disable_ssl_certificate_validation=True)

        # With a proxy
        #hp = httplib2.Http(proxy_info = httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, "192.168.1.123", 9878))
        # Source: https://www.cnblogs.com/congbo/archive/2012/08/16/2641079.html

        http = credentials.authorize(hp)
        
        
        print(http)
        print("authorize !!")
        service = build(api_name, api_version, http=http)
        print("build !!")
        return service
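Switching between the proxy and non-proxy variants above means commenting lines in and out. One possible alternative, sketched here under stated assumptions, is to read the proxy address from an environment variable and parse it. `GA_PROXY` is a hypothetical variable name and `parse_proxy` a hypothetical helper. Neither comes from the original code.

```python
import os


def parse_proxy(text):
    """Split "host:port" into (host, port); return None for an empty value."""
    if not text:
        return None
    host, sep, port = text.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("expected host:port, got %r" % text)
    return host, int(port)


# GA_PROXY is a hypothetical variable name, e.g. GA_PROXY=192.168.1.123:9878
proxy = parse_proxy(os.environ.get("GA_PROXY", ""))
if proxy is None:
    print("no proxy configured")
else:
    print("proxy host=%s port=%d" % proxy)
```

When a proxy is configured, the resulting host and port could be passed to `httplib2.ProxyInfo(socks.PROXY_TYPE_HTTP, host, port)` exactly as in the commented-out proxy line in `get_service_v3_json`.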
5. A class for fetching the data
class GAData:

    def __init__(self):
        # Service object
        self.service_v3 = None
        # Start and end of the date range, in days ago
        self.startDays = 2
        self.endDays = 1

    def connect(self):
        # Reuse the GA service if it is already connected, to save time on each fetch
        if self.service_v3 is None:
            try:
                self.service_v3 = Conn_GA().get_service_v3_json()
            except Exception as e:
                # Print the exception to make debugging easier, then retry once
                print(repr(e))
                print("connect error !!!")
                self.service_v3 = Conn_GA().get_service_v3_json()

    def set_days(self, start=1, end=1):
        # Set the date range as numbers of days ago; GA accepts "NdaysAgo", which is convenient
        self.startDays = start
        self.endDays = end

    # Note: replace view_id with the view ID of your own GA account
    def get_ga_data(self, view_id='11421xxxx', start_date='7daysAgo', end_date='yesterday',
                    metrics="ga:sessions",
                    dimensions='ga:date',
                    filters=None,
                    max_results=None, **kwargs):
        # Make sure the GA service is connected whenever data is requested
        self.connect()
        # The start_date/end_date arguments are overridden by the values from set_days()
        start_date = '{}daysAgo'.format(self.startDays)
        end_date = '{}daysAgo'.format(self.endDays)
        # print(start_date, end_date)
        # Optional parameters can be left as None and the request still runs
        ga_data = self.service_v3.data().ga().get(ids='ga:' + view_id, start_date=start_date, end_date=end_date,
                                                  metrics=metrics, dimensions=dimensions,
                                                  filters=filters, max_results=max_results, **kwargs).execute()
        # 'rows' is absent when the query matches no data
        cleaned = pd.DataFrame(ga_data.get('rows', []))
        return cleaned
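The DataFrame built from `rows` alone has no column names, and every cell is a string. The v3 response also carries a `columnHeaders` list, which can be used to label the columns and convert integer metrics. The following is a sketch in plain Python. `rows_to_records` is a hypothetical helper, and the `sample` dict is hand-written in the shape of a v3 response, reusing the two rows shown in this post.

```python
def rows_to_records(ga_data):
    """Turn a Core Reporting API v3 response dict into a list of dicts keyed by column name."""
    headers = ga_data.get("columnHeaders", [])
    names = [h["name"] for h in headers]
    types = [h.get("dataType", "STRING") for h in headers]
    records = []
    for row in ga_data.get("rows", []):
        record = {}
        for name, data_type, value in zip(names, types, row):
            # The API returns every cell as a string; convert integer metrics
            record[name] = int(value) if data_type == "INTEGER" else value
        records.append(record)
    return records


# Hand-written sample in the shape of a v3 response
sample = {
    "columnHeaders": [
        {"name": "ga:date", "columnType": "DIMENSION", "dataType": "STRING"},
        {"name": "ga:sessions", "columnType": "METRIC", "dataType": "INTEGER"},
    ],
    "rows": [["20190407", "58288"], ["20190408", "57997"]],
}
print(rows_to_records(sample))
```

The returned list can be passed to `pd.DataFrame(...)` to get a frame whose columns are named `ga:date` and `ga:sessions`.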
6. Fetch the data
G = GAData()
df = G.get_ga_data()
df


ServiceAccountCredentials !!

authorize !!
build !!
0 20190407 58288
1 20190408 57997
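The two rows above follow from the defaults `startDays = 2` and `endDays = 1`, which produce the range `2daysAgo` to `1daysAgo`. To see which calendar dates a relative value stands for, a small resolver like the one below can help. It is a sketch only: `resolve_ga_date` is a hypothetical helper, the reference date 2019-04-09 in the usage note is an assumption chosen to match the output, and GA itself resolves relative dates in the view's time zone, which this code ignores.

```python
import re
from datetime import date, timedelta


def resolve_ga_date(value, today=None):
    """Resolve 'today', 'yesterday', 'NdaysAgo' or 'YYYY-MM-DD' to a datetime.date."""
    today = today or date.today()
    if value == "today":
        return today
    if value == "yesterday":
        return today - timedelta(days=1)
    match = re.fullmatch(r"(\d+)daysAgo", value)
    if match:
        return today - timedelta(days=int(match.group(1)))
    # Anything else is expected to be an absolute date
    return date.fromisoformat(value)
```

For example, `resolve_ga_date("2daysAgo", today=date(2019, 4, 9))` gives 2019-04-07, the date in the first row of the output.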

References:

python httplib2: errors when using a proxy
https://www.cnblogs.com/congbo/archive/2012/08/16/2641079.html
Differences between httplib, httplib2, urllib, and requests
https://blog.csdn.net/hotdust/article/details/77927800
Fetching web page data with Python httplib2 (basic usage)
https://blog.csdn.net/y396397735/article/details/79606033
Python: getting familiar with httplib2
https://blog.csdn.net/leehark/article/details/7079761
Web access without HTTPS certificate verification
https://my.oschina.net/lenglingx/blog/184505?p=1
The best way to learn by exploring with httplib2
https://www.cnblogs.com/liuzq/p/5225387.html
httplib2 official documentation
https://pypi.org/project/httplib2/
Google OAuth 2.0 authentication guide (Chinese translation)
http://wiki.jikexueyuan.com/project/google-oauth-2/service-accounts.html
