中国商标网 (China Trademark Website) crawler

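I had some free time recently and picked a few of the more troublesome websites to practice on. That reminded me I had once said I would tackle the trademark site, so today I went back for another look.

An article I reposted earlier: 商标局网请收下我的膝盖

Looking at it again, the request parameters are now surprisingly plain. It seems many of the anti-crawler restrictions have been removed.

I then tried simulating the requests. They succeeded and returned the expected values.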

The interfaces used are:

http://sbgg.saic.gov.cn:9080/tmann/annInfoView/selectInfoidBycode.html

http://sbgg.saic.gov.cn:9080/tmann/annInfoView/imageView.html

http://sbgg.saic.gov.cn:9080/tmann/annInfoView/annSearchDG.html

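Combined, these interfaces let you query by different conditions and download the final image. One thing to note: the response is a list of image links, and the one we need is the entry at index 3.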
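The index-3 rule can be isolated in a small helper so that a short or malformed response fails with a clear message. This is only a minimal sketch, assuming the `imageView.html` response body is JSON carrying an `imaglist` array (the key the original code reads). The helper name `pick_image_url` and the sample links are hypothetical.

```python
import json


def pick_image_url(response_text, index=3):
    """Return the image link at `index` from an imageView-style response.

    Assumes the body is JSON shaped like {"imaglist": [url0, url1, ...]}.
    `pick_image_url` is a hypothetical helper, not part of the site's API.
    """
    payload = json.loads(response_text)
    images = payload.get("imaglist", [])
    if index >= len(images):
        raise ValueError(f"expected more than {index} links, got {len(images)}")
    return images[index]


# Made-up sample body, only to show the call shape.
sample = json.dumps(
    {"imaglist": [f"http://example.invalid/{n}.jpg" for n in range(5)]}
)
print(pick_image_url(sample))
```

Raising `ValueError` instead of letting an `IndexError` escape makes it easier to tell a changed response layout from an ordinary bug.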

A simple version of the code follows (for learning reference only):

import json
import time

import requests

with open("搜索结果1.json", "r", encoding="utf-8") as f:
    data = f.read()


def run(ann_num, page_no, ann_type_code):
    url = "http://sbgg.saic.gov.cn:9080/tmann/annInfoView/selectInfoidBycode.html"

    headers = {
        "Accept": "application/json, text/javascript, */*; q=0.01",
        "Accept-Encoding": "gzip, deflate",
        "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
        "Connection": "keep-alive",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Cookie": "",# cookie
        "Host": "sbgg.saic.gov.cn:9080",
        "Origin": "http://sbgg.saic.gov.cn:9080",
        "Referer": "http://sbgg.saic.gov.cn:9080/tmann/annInfoView/annSearch.html?annNum=",
        "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest",
    }

    data = {
        "annNum": ann_num,
        "annTypecode": ann_type_code,
    }
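    # Step 1: resolve the announcement number and type code to an id.
    response = requests.post(url=url, headers=headers, data=data, timeout=15)
    info_id = response.text  # renamed from `id` to avoid shadowing the builtin
    print(info_id)

    image_view_url = "http://sbgg.saic.gov.cn:9080/tmann/annInfoView/imageView.html"

    data2 = {
        "id": info_id,
        "pageNum": page_no,
        "flag": "1",
    }
    # Step 2: request the list of image links for that id and page.
    response2 = requests.post(url=image_view_url, headers=headers, data=data2, timeout=15)
    # Parse the body as JSON; eval() would execute whatever the server sends back.
    result = json.loads(response2.text)
    image = result["imaglist"][3]  # the link we need is at index 3
    print(image)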


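if __name__ == '__main__':
    # This code is for learning reference only.
    data_dict = json.loads(data)  # the saved file is JSON, so parse it instead of eval()
    total = data_dict["total"]  # total number of trademarks
    rows = data_dict["rows"]  # result rows, one per trademark
    print(total)
    for i in rows:
        page_no = i["page_no"]  # page number
        tm_name = i["tm_name"]  # trademark name
        ann_type_code = i["ann_type_code"]  # request parameter
        tmname = i["tmname"]  # trademark name
        reg_name = i["reg_name"]  # company name
        ann_type = i["ann_type"]  # announcement type, e.g. 商标初步审定公告
        ann_num = i["ann_num"]  # announcement issue number
        reg_num = i["reg_num"]  # trademark id
        row_id = i["id"]  # request id
        rn = i["rn"]  # position
        app_date = i["ann_date"]  # application date (read from the ann_date field)
        regname = i["regname"]  # applicant name (?)

        # Only preliminary approval announcements are fetched.
        if ann_type == "商标初步审定公告":
            run(ann_num, page_no, ann_type_code)
            time.sleep(5)  # pause between requests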
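The filtering step in the main loop can also be written as a small function over the parsed search result, which keeps the selection logic separate from the network calls. This is a sketch under the assumption that the saved file has the `total` / `rows` layout used above. The name `rows_of_type` and the sample records are hypothetical.

```python
import json

PRELIMINARY = "商标初步审定公告"  # the announcement type the original loop filters on


def rows_of_type(search_result, ann_type=PRELIMINARY):
    """Yield (ann_num, page_no, ann_type_code) for rows of the given type."""
    for row in search_result.get("rows", []):
        if row.get("ann_type") == ann_type:
            yield row["ann_num"], row["page_no"], row["ann_type_code"]


# Made-up records for illustration; real rows carry more fields.
sample_text = json.dumps({
    "total": 2,
    "rows": [
        {"ann_type": PRELIMINARY, "ann_num": "1600", "page_no": 12, "ann_type_code": "A"},
        {"ann_type": "other", "ann_num": "1601", "page_no": 3, "ann_type_code": "B"},
    ],
})
matches = list(rows_of_type(json.loads(sample_text)))
print(matches)
```

Each tuple it yields has the argument order that `run(ann_num, page_no, ann_type_code)` expects, so the caller can unpack it directly and keep the pause between requests.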

 
