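Python Web Scraping in Practice: JD.com Product Data

Preface

Good morning, good afternoon and good evening ❤ ~ Welcome to this article

Today I'll show you how to collect JD.com product information in bulk with Python!!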

Third-party libraries:

  • requests >>> pip install requests

  • parsel >>> pip install parsel

Development environment:

  • Python 3.8

  • PyCharm Professional

Implementation steps

I. Analysis

Find the data source (the URL of the request that actually returns the product data):

https://api.m.jd.com/?appid=search-pc-java&functionId=pc_search_s_new&client=pc&clientVersion=1.0.0&t=1697545127305&body=%7B%22keyword%22%3A%22iPhone%22%2C%22qrst%22%3A%221%22%2C%22wq%22%3A%22iPhone%22%2C%22ev%22%3A%22exbrand_Apple%5E%22%2C%22pvid%22%3A%22c2a8f09dbfa044a6a12f860e20edb6c7%22%2C%22isList%22%3A0%2C%22page%22%3A%223%22%2C%22s%22%3A%2256%22%2C%22click%22%3A%220%22%2C%22log_id%22%3A%221697544397338.9790%22%2C%22show_items%22%3A%22%22%7D&loginType=3&uuid=122270672.1675327822068798256204.1675327822.1696749738.1697544369.7&area=18_1482_48942_49058&h5st=20231017201847323%3Bg5giig9tnm63gij2%3Bf06cc%3Btk03wbde31c7218nmTOuI4vmUG1gibUwyDKLNpF6B_t1uk9ukpSq3k_k19h74PyUWE_Fz9mV-ggz4JCtsVbQZVSId9dC%3B7710a41bb85a10fe65109f794fb3b815%3B4.1%3B1697545127323%3Bee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980b96c3953b1ab788029ae792b39e113ccac142f09e3a1fa8c3f25055353b835ed0bf65228424626b8a9e1d2c030999d9be97a9dee9fb20116ceb0deb8736546109bc1cf5b91d1dfa2b39c79b3b0f0a5a036cdc921a1f147179b291c830dc87a6d3d0c3885fe721d5f0391a55bb4bf663963282084e04c7f24e6d3bcb219f4cbb08d3f76f13bca81938336d1934b88ded260caabac20e37a63dd3f6a093fd5dd2d936e95b67fee9654732d8a2908d96fe4b8d0a0b9d9b65996563d4cb94925fd651106c8e7c1234f63f57b1baa40324d6e8969e5c7b48e35e2c4bc5d325e88db237e42c33d6b256ebc720e76f574f34b&x-api-eid-token=jdd03BFXLLB72GO2GWA4OW3JSYXJPOVRF3WAKAKETOTSMNISZ6VIJTLEVQKEHWUA6VLD7ORS2QYC55PWBVUZVPZTXPDCHZUAAAAMLHWDXLUYAAAAACL4BJCT4CQASEEX
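The `body` query parameter in this URL is URL-encoded JSON, which is where the search keyword and the pagination fields live. Here is a minimal sketch that decodes it with only the standard library (`urllib.parse` and `json`). The URL is a shortened copy of the one above that keeps only the first few parameters (h5st, uuid and the token are left out), and `decode_query` is a hypothetical helper name, not part of any JD API.

```python
import json
from urllib.parse import parse_qs, urlsplit

# Shortened copy of the captured request URL (h5st, uuid and token parameters omitted)
CAPTURED_URL = (
    "https://api.m.jd.com/?appid=search-pc-java&functionId=pc_search_s_new"
    "&client=pc&clientVersion=1.0.0&t=1697545127305"
    "&body=%7B%22keyword%22%3A%22iPhone%22%2C%22qrst%22%3A%221%22%2C%22wq%22%3A%22iPhone%22"
    "%2C%22ev%22%3A%22exbrand_Apple%5E%22%2C%22pvid%22%3A%22c2a8f09dbfa044a6a12f860e20edb6c7%22"
    "%2C%22isList%22%3A0%2C%22page%22%3A%223%22%2C%22s%22%3A%2256%22%2C%22click%22%3A%220%22"
    "%2C%22log_id%22%3A%221697544397338.9790%22%2C%22show_items%22%3A%22%22%7D"
)


def decode_query(url):
    """Hypothetical helper: return the query string as a dict, with 'body' parsed from JSON."""
    # parse_qs percent-decodes the values and returns a list per key; keep the first one
    query = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
    query["body"] = json.loads(query["body"])
    return query


params = decode_query(CAPTURED_URL)
print(params["functionId"])                         # pc_search_s_new
print(params["body"]["page"], params["body"]["s"])  # 3 56
```

The decoded body is the same JSON object listed as 2-1-body in the pagination analysis below.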

II. Code implementation

  1. Send the request (visit the site)

  2. Extract the data: pull the needed fields out of the response

  3. Save the data

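Detail page: comment count, sales volume, product description, shop rating

Pagination: how to crawl multiple pages

Each result page is loaded in two parts, with 30 items per part.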
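Judging from the request bodies captured below (requests 2-1 and 2-2 carry `page` values 3 and 4, requests 3-1 and 3-2 carry 5 and 6), each visible result page N seems to be fetched as two requests whose `page` parameter is 2N-1 and 2N. A small sketch of that mapping, verified only against those captures; `api_pages` is a hypothetical helper name.

```python
def api_pages(visible_page):
    """Return the two 'page' request values for one visible result page.

    Based on the captured bodies: visible page 2 -> (3, 4), visible page 3 -> (5, 6).
    """
    first = 2 * visible_page - 1   # first half of the page
    return first, first + 1        # second half (its body carries "scrolling": "y")


for visible in (1, 2, 3):
    print(visible, api_pages(visible))
# 1 (1, 2)
# 2 (3, 4)
# 3 (5, 6)
```

Under this mapping, the loop over `page` values 1 to 120 in the code below covers visible pages 1 to 60.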

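The pagination pattern:

Compare the two requests of page 2 with the two requests of page 3 (the label 2-1 means page 2, first request; 2-2 means page 2, second request):

t (request 2-1): 1697545127305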
2-1-body: {"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","isList":0,"page":"3","s":"56","click":"0","log_id":"1697544397338.9790","show_items":""}
3-1-body: {"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","isList":0,"page":"5","s":"116","click":"0","log_id":"1697546973358.2929","show_items":""}
2-2-body: {"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","page":"4","s":"86","scrolling":"y","log_id":"1697545127114.3155","tpl":"3_M","isList":0,"show_items":""}
3-2-body: {"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","page":"6","s":"146","scrolling":"y","log_id":"1697547015728.2990","tpl":"3_M","isList":0,"show_items":""}
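Comparing the captured bodies key by key makes the pattern easier to spot than reading the raw JSON. A minimal sketch using the standard `json` module and three of the bodies above, copied verbatim; `changed_keys` is a hypothetical helper.

```python
import json

BODY_2_1 = '{"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","isList":0,"page":"3","s":"56","click":"0","log_id":"1697544397338.9790","show_items":""}'
BODY_3_1 = '{"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","isList":0,"page":"5","s":"116","click":"0","log_id":"1697546973358.2929","show_items":""}'
BODY_2_2 = '{"keyword":"iPhone","qrst":"1","wq":"iPhone","ev":"exbrand_Apple^","pvid":"c2a8f09dbfa044a6a12f860e20edb6c7","page":"4","s":"86","scrolling":"y","log_id":"1697545127114.3155","tpl":"3_M","isList":0,"show_items":""}'


def changed_keys(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose value differs.

    A key missing from one body shows up as None on that side.
    """
    da, db = json.loads(a), json.loads(b)
    return {k: (da.get(k), db.get(k))
            for k in sorted(da.keys() | db.keys())
            if da.get(k) != db.get(k)}


print(changed_keys(BODY_2_1, BODY_3_1))
# {'log_id': ('1697544397338.9790', '1697546973358.2929'), 'page': ('3', '5'), 's': ('56', '116')}
print(sorted(changed_keys(BODY_2_1, BODY_2_2)))
# ['click', 'log_id', 'page', 's', 'scrolling', 'tpl']
```

Between pages 2 and 3 only `page`, `s` and `log_id` change. The second request of a page also swaps `click` for `scrolling` and `tpl`.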

page increases by 1 with each request

s increases by 30 with each request (except for the first step, from 1 to 26):

1-1: s 1
1-2: s 26
2-1: s 56
2-2: s 86
...
...
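The offsets listed above fit a simple rule: s is 1 for the first request, and 26 + 30 × (page − 2) from the second request on. Here is a sketch of that rule, cross-checked against every observed pair. `offset_s` is a hypothetical name, and the rule has only been verified for the values captured here. It produces the same values as the `s` counter in the code below.

```python
# (page, s) pairs observed in the captured requests: 1-1, 1-2, 2-1, 2-2, 3-1, 3-2
OBSERVED = [(1, 1), (2, 26), (3, 56), (4, 86), (5, 116), (6, 146)]


def offset_s(page):
    """Start offset 's' for a given 'page' request parameter (1-based)."""
    if page == 1:
        return 1
    # from the second request on, every request advances s by 30
    return 26 + 30 * (page - 2)


for page, s in OBSERVED:
    assert offset_s(page) == s, (page, s)
print([offset_s(p) for p in range(1, 9)])  # [1, 26, 56, 86, 116, 146, 176, 206]
```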
The h5st parameter also differs from request to request:

h5st (2-1): 20231017201847323;g5giig9tnm63gij2;f06cc;tk03wbde31c7218nmTOuI4vmUG1gibUwyDKLNpF6B_t1uk9ukpSq3k_k19h74PyUWE_Fz9mV-ggz4JCtsVbQZVSId9dC;7710a41bb85a10fe65109f794fb3b815;4.1;1697545127323;ee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980b96c3953b1ab788029ae792b39e113ccac142f09e3a1fa8c3f25055353b835ed0bf65228424626b8a9e1d2c030999d9be97a9dee9fb20116ceb0deb8736546109bc1cf5b91d1dfa2b39c79b3b0f0a5a036cdc921a1f147179b291c830dc87a6d3d0c3885fe721d5f0391a55bb4bf663963282084e04c7f24e6d3bcb219f4cbb08d3f76f13bca81938336d1934b88ded260caabac20e37a63dd3f6a093fd5dd2d936e95b67fee9654732d8a2908d96fe4b8d0a0b9d9b65996563d4cb94925fd651106c8e7c1234f63f57b1baa40324d6e8969e5c7b48e35e2c4bc5d325e88db237e42c33d6b256ebc720e76f574f34b
h5st (3-1): 20231017204933583;g5giig9tnm63gij2;f06cc;tk03wbde31c7218nmTOuI4vmUG1gibUwyDKLNpF6B_t1uk9ukpSq3k_k19h74PyUWE_Fz9mV-ggz4JCtsVbQZVSId9dC;0ed5b74f81ac6ded4aeee2f615d6e03f;4.1;1697546973583;ee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980b96c3953b1ab788029ae792b39e113ccac142f09e3a1fa8c3f25055353b835ed0bf65228424626b8a9e1d2c030999d9be97a9dee9fb20116ceb0deb8736546109bc1cf5b91d1dfa2b39c79b3b0f0a5a036cdc921a1f147179b291c830dc87a6d3d0c3885fe721d5f0391a55bb4bf663963282084e04c7f24e6d3bcb219f4cbfe11ef406022b163c00824a22a034ed25520965f3f71ba25eca1fe340990d9a3c4d0100fbc84b1e9094cbe21ed8f59acc7a3bfd1bdd706f19bc06fd1d9a10233e68a2c851f66633c3357188dfeec7cc88dc36ba5ab73fac1ee81fd17352694c31f5a0096b50478e73a7b645153333271
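You can split the two h5st values above on ";" and compare them segment by segment to see which parts change between requests. Judging by their timestamps, they were captured for requests 2-1 and 3-1. In these two samples, the first segment also appears to be the seventh segment (epoch milliseconds) written out as UTC+8 local time. The sketch below checks that observation on the captured values only. It does not produce a valid h5st. The helper names and the UTC+8 assumption are ours, and the long final segment is left out for brevity.

```python
from datetime import datetime, timedelta, timezone

# First seven ';'-separated segments of the two captured h5st values
TK = "tk03wbde31c7218nmTOuI4vmUG1gibUwyDKLNpF6B_t1uk9ukpSq3k_k19h74PyUWE_Fz9mV-ggz4JCtsVbQZVSId9dC"
H5ST_A = ";".join(["20231017201847323", "g5giig9tnm63gij2", "f06cc", TK,
                   "7710a41bb85a10fe65109f794fb3b815", "4.1", "1697545127323"])
H5ST_B = ";".join(["20231017204933583", "g5giig9tnm63gij2", "f06cc", TK,
                   "0ed5b74f81ac6ded4aeee2f615d6e03f", "4.1", "1697546973583"])

BEIJING = timezone(timedelta(hours=8))  # assumption: segment 0 is written in UTC+8


def differing_segments(a, b):
    """Indices of the ';'-separated segments that differ between two h5st strings."""
    return [i for i, (x, y) in enumerate(zip(a.split(";"), b.split(";"))) if x != y]


def ms_to_stamp(ms, tz=BEIJING):
    """Format epoch milliseconds as yyyyMMddHHmmssSSS in the given time zone."""
    dt = datetime.fromtimestamp(ms // 1000, tz=tz)
    return dt.strftime("%Y%m%d%H%M%S") + f"{ms % 1000:03d}"


print(differing_segments(H5ST_A, H5ST_B))  # [0, 4, 6]
for h5st in (H5ST_A, H5ST_B):
    parts = h5st.split(";")
    # does segment 0 equal segment 6 rendered as UTC+8 time?
    print(parts[0] == ms_to_stamp(int(parts[6])))  # True
```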

Code

import csv          # built-in module, no installation needed
import json         # built-in, used to build the "body" parameter
import time         # built-in, used for the millisecond timestamp "t"

import parsel       # third-party (pip install parsel), extracts content from the HTML
import requests     # third-party (pip install requests), sends the HTTP requests


# Create the CSV file and write the header row once
with open("jingdong.csv", mode='w', newline='', encoding='utf-8') as f:
    csv.writer(f).writerow(['title', 'price', 'shop', 'detail_url'])

headers = {
    # Paste the Cookie request header from your own logged-in browser session here;
    # the value in the original post belonged to the author's account
    'Cookie': 'YOUR_JD_COOKIE',
    'Origin': '**masked in the original**',
    'Referer': '**masked in the original**',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36'
}

# Body fields that stay the same in every captured request
common = {"keyword": "iPhone", "qrst": "1", "wq": "iPhone",
          "ev": "exbrand_Apple^", "pvid": "c2a8f09dbfa044a6a12f860e20edb6c7"}

s = 1
for page in range(1, 121):
    t = int(time.time() * 1000)  # current time in milliseconds
    # Offset "s": 1 for page 1, 26 for page 2, then +30 for each later request
    if page == 2:
        s = 26
    elif page > 2:
        s += 30
    if page > 2 and page % 2 == 0:
        # Even page values after 2 use the "scrolling" body (a page's second request)
        fields = {"page": str(page), "s": str(s), "scrolling": "y",
                  "log_id": "1697545127114.3155", "tpl": "3_M", "isList": 0, "show_items": ""}
    else:
        # Pages 1 and 2 and all odd page values use the "click" body
        log_id = "1697547020245.6899" if page <= 2 else "1697544397338.9790"
        fields = {"isList": 0, "page": str(page), "s": str(s), "click": "0",
                  "log_id": log_id, "show_items": ""}
    # Compact separators reproduce the exact JSON string used in the original code
    body = json.dumps({**common, **fields}, separators=(',', ':'))
    params = {
        'appid': 'search-pc-java',
        'functionId': 'pc_search_s_new',
        'client': 'pc',
        'clientVersion': '1.0.0',
        't': str(t),
        'body': body,
        'loginType': '3',
        'uuid': '122270672.1675327822068798256204.1675327822.1696749738.1697544369.7',
        'area': '18_1482_48942_49058',
        'h5st': '20231017205657848;g5giig9tnm63gij2;f06cc;tk03wbde31c7218nmTOuI4vmUG1gibUwyDKLNpF6B_t1uk9ukpSq3k_k19h74PyUWE_Fz9mV-ggz4JCtsVbQZVSId9dC;825dbf6bd60713fa1ddad5e95d169108;4.1;1697547417848;ee3cf7f6b94dc20e9265d83066bb9ceece4bb89e2b7e8bf5afb1bfd928788174bfa06c210ddd4437d8a2e234330c3a3980b96c3953b1ab788029ae792b39e113ccac142f09e3a1fa8c3f25055353b835ed0bf65228424626b8a9e1d2c030999d9be97a9dee9fb20116ceb0deb8736546109bc1cf5b91d1dfa2b39c79b3b0f0a5a036cdc921a1f147179b291c830dc87a6d3d0c3885fe721d5f0391a55bb4bf663963282084e04c7f24e6d3bcb219f4cb08a33c86f2c515c368479ab2fffd0f4935b373832965c1ba9aa292710f7023e99dac2e1bde15cd796fe1601c5425e954a8cebb66dc24031fb337c7d79d2a6f46c875d77cbc102770fd5125f99aaa366d5abac9c006c2f0275731844dd1353f808489e029e35b485616771b972ae3bb95',
        'x-api-eid-token': 'jdd03BFXLLB72GO2GWA4OW3JSYXJPOVRF3WAKAKETOTSMNISZ6VIJTLEVQKEHWUA6VLD7ORS2QYC55PWBVUZVPZTXPDCHZUAAAAMLHWDXLUYAAAAACL4BJCT4CQASEEX',
    }
    # Request endpoint as seen in the captured URL in Part I (masked in the original code)
    url = 'https://api.m.jd.com/'
    # 1. Send the request
    response = requests.get(url=url, params=params, headers=headers)
    # 2. Extract the data: parse the returned HTML
    html_data = response.text
    select = parsel.Selector(html_data)
    # Each product sits in an <li> under //ul[@class="gl-warp clearfix"]
    lis = select.xpath('//ul[@class="gl-warp clearfix"]/li')
    for li in lis:
        # string(...) returns all text inside the element, concatenated
        title = li.xpath('string(.//div[@class="p-name p-name-type-2"])').get("").strip()
        price = li.xpath('string(.//div[@class="p-price"])').get("").strip()
        shop = li.xpath('string(.//div[@class="p-shop"])').get("").strip()
        # The href has no scheme, so prepend "https:" (skip it if the link is missing)
        href = li.xpath('.//div[@class="p-name p-name-type-2"]/a/@href').get("")
        detail_url = "https:" + href if href else ""
        print(title, price, shop, detail_url)
        # 3. Save the data: append one row to the CSV file
        with open("jingdong.csv", mode='a', newline='', encoding='utf-8') as f:
            csv.writer(f).writerow([title, price, shop, detail_url])
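Step 3 in the code above reopens jingdong.csv in append mode for every product row. One alternative, sketched here with the standard csv module, opens the file once and writes all rows with `csv.DictWriter`. The function name `save_rows` and the sample row are hypothetical placeholders, not scraped data.

```python
import csv

FIELDS = ['title', 'price', 'shop', 'detail_url']


def save_rows(path, rows):
    """Write a header plus one CSV line per product dict, opening the file only once."""
    with open(path, mode='w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)  # accepts any iterable of dicts, including a generator


# Hypothetical placeholder row, shaped like the values extracted in step 2
sample = [{'title': 'example title', 'price': '0.00',
           'shop': 'example shop', 'detail_url': 'https://example.com/item'}]
save_rows('jingdong_sample.csv', sample)
```

In the scraping loop you would collect the rows in a list, or pass a generator, and call `save_rows` once at the end. The trade-off is that nothing is saved if the run stops partway through, which the original per-row append avoids.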

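Closing words

That's about it for today's share!

Let me know in the comments what you'd like to see next, and I'll write it when I see it (ง •_•)ง

If you liked this, follow me, or like, bookmark and comment on the article!!!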
