Python: Scraping Business Entity Data from the Shenzhen Government Open Data Platform

Main steps:
1. Register on the Shenzhen Government Open Data Platform to obtain an appKey for calling the API.
2. Subscribe to the API you need, then copy the API address from the interface test page in your personal center.
3. Call the API repeatedly, iterating the page parameter according to the total amount of data (see the sketch after this list for deriving the page count).
PS: Everyone's Cookie is different, so remember to change that parameter.
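
The page range hardcoded in the code below (30000 to 54667) was read off the platform by hand. If the JSON response carries a record total, the last page can be derived instead. A minimal sketch, assuming a hypothetical `total` field and a page size of 100 (verify both against your own API test response):

import requests

def probe_total_pages(app_key, page_size=100):
    # Fetch page 1 and read the record total from the response body.
    # 'total' is a hypothetical field name; check your own test response.
    url = "https://opendata.sz.gov.cn/api/1564501785/1/service.xhtml"
    resp = requests.get(url, params={"page": 1, "appKey": app_key}, timeout=10)
    resp.raise_for_status()
    total_records = int(resp.json().get("total", 0))
    return -(-total_records // page_size)  # ceiling division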

The core code:

'''
    Scrapes Shenzhen business entity (商事主体) data from
    https://opendata.sz.gov.cn/api/1564501785/1/service.xhtml
'''
import json
import requests
import pandas as pd


def get_business(i):
    """Fetch one page of results; returns {} on any failure so the caller can skip it."""
    url = "https://opendata.sz.gov.cn/api/1564501785/1/service.xhtml"
    header = {
        'Accept': '*/*',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'Connection': 'keep-alive',
        'Host': 'opendata.sz.gov.cn',
        'Origin': 'https://opendata.sz.gov.cn',
        'Referer': 'https://opendata.sz.gov.cn/maintenance/personal/toApiTest',
        # Replace with your own Cookie; every account's is different.
        'Cookie': '_trs_uv=k1q8o8my_2368_4sr9; JSESSIONID=717e9762-8a93-45b0-9f58-c514de3ab211',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36 SE 2.X MetaSr 1.0',
    }
    formdata = {
        "page": i,
        "appKey": "1f7f94fa62e64224a86ffea30255267c",  # replace with your own appKey
    }
    try:
        response = requests.get(url, params=formdata, headers=header, timeout=10)
        if response.status_code == 200:
            return json.loads(response.text)
    except Exception as e:
        print(e, i)
    return {}


def main_spider():
    # Page range observed on the platform at the time of writing; adjust it to the current total.
    for i in range(30000, 54667):
        page = str(i)
        json_data = get_business(page)
        if 'data' in json_data:
            df = pd.DataFrame.from_dict(json_data['data'], orient='columns')
            # Append without a header row so repeated runs don't interleave headers.
            df.to_csv(csv_dir + '深圳市最新商事主体.csv', mode='a', index=False, header=False)


if __name__ == "__main__":
    csv_dir = 'D:/data/csv/'  # forward slashes avoid Windows backslash-escape pitfalls
    main_spider()
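
Since to_csv appends with header=False, the accumulated CSV has no header row, so supply column names when loading it back. A sketch with placeholder names (take the real field names from the API's data dictionary on the platform):

import pandas as pd

# Placeholder column names; substitute the field names from the API's data dictionary.
columns = ['name', 'reg_no', 'address', 'status']
df = pd.read_csv('D:/data/csv/深圳市最新商事主体.csv', header=None, names=columns)
print(df.shape)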

If you need the data, send me a private message; if you have questions, leave a comment.
If it is urgent, open one of my other articles to find my contact details.
