Monitoring Face Mask Availability in Real Time with a Python Crawler and Email Alerts

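Lately a lot of people in my WeChat groups have been asking "Where can I buy face masks?" and "Did brand X's masks go on sale at 12:00?"
That gave me an idea: monitor the purchase status of a few mask brands on JD.com (京东) and send an email as soon as one becomes available, with a link in the email that opens the purchase page in the app. Here is a screenshot of the result (example):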

[Figure: example result]


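As the figure above shows, the link for one brand's mask is https://item.jd.com/100006784140.html, where 100006784140 is the product ID. Everything we fetch next (price, shop, image URL, whether the item can be purchased) is keyed on this product ID. Two APIs are involved: one returns the product details and the other returns the stock status.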

Product details

https://cdnware.m.jd.com/c1/skuDetail/apple/11.3.0/100006784140.json
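I found this URL by capturing the mobile app's traffic with Charles (nicknamed 青花瓷), then stripped the parameters that turned out to be unnecessary. I picked up this method from tutorial videos on scraping JD data.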
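As a rough sketch of what comes back from this endpoint, the function below pulls out the three fields that jd.py reads further down (shop name, product name, small image URL). The function name parse_sku_detail and the sample payload are hypothetical; the sample only mirrors the nesting used in jd.py and its values are made up, so a real response may contain more fields or differ in detail.

```python
import json


def parse_sku_detail(text):
    """Extract the fields this article uses from a skuDetail JSON response."""
    ware = json.loads(text)["wareInfo"]
    basic = ware["basicInfo"]
    images = basic.get("wareImage") or []  # tolerate a missing image list
    return {
        "shopName": ware["shopInfo"]["shop"]["name"],
        "productName": basic["name"],
        "productImg": images[0]["small"] if images else None,
    }


# Hand-written sample (assumption): same nesting as in jd.py, invented values.
sample = json.dumps({
    "wareInfo": {
        "shopInfo": {"shop": {"name": "Example Shop"}},
        "basicInfo": {
            "name": "Example KN95 mask",
            "wareImage": [{"small": "https://example.com/small.jpg"}],
        },
    }
})

info = parse_sku_detail(sample)
print(info["shopName"], "|", info["productName"])
```

Keeping the parsing in one small function makes it easier to adjust if the response layout changes.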

Stock status

https://c0.3.cn/stocks?type=getstocks&skuIds=100006784140&area=2_2813_51976_0
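This one can be found directly in the browser's developer tools: inspect the product page and search for '无货' (out of stock) to locate the request. To use it, just substitute the product ID.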


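The original link is long and carries many parameters, but only three of them matter: type, skuIds, and area. skuIds can hold the IDs of several products, and area appears to be the selected delivery region.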
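To make the three parameters concrete, here is a minimal sketch that assembles the stock URL. The helper name build_stock_url is hypothetical, and joining several IDs with commas is an assumption: the article only says that skuIds can carry more than one product ID, not how they are separated.

```python
from urllib.parse import urlencode

STOCK_ENDPOINT = "https://c0.3.cn/stocks"


def build_stock_url(sku_ids, area="2_2813_51976_0"):
    """Build the stock query URL from the three parameters that matter."""
    params = {
        "type": "getstocks",
        # Assumption: multiple IDs are comma-separated.
        "skuIds": ",".join(str(s) for s in sku_ids),
        "area": area,
    }
    # safe="," keeps the comma readable instead of encoding it as %2C
    return STOCK_ENDPOINT + "?" + urlencode(params, safe=",")


print(build_stock_url(["100006784140"]))
```

With a single ID this reproduces the URL shown above.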

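Project file structure

utils/http.py

This is a utility file that returns a random request header (headers); jd.py imports it as utils.http. I use a random header because I ran into request errors during long monitoring runs, so each check picks a User-Agent at random. You may also need to rotate the IP address used for the requests; I have not tested whether JD applies anti-crawling measures to a single IP that keeps polling.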

import random

def getheaders():
    # Assorted desktop browsers
    userAgentList = [
        # Opera
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36 OPR/26.0.1656.60",
        "Opera/8.0 (Windows NT 5.1; U; en)",
        "Mozilla/5.0 (Windows NT 5.1; U; en; rv:1.8.1) Gecko/20061208 Firefox/2.0.0 Opera 9.50",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 9.50",
        # Firefox
        "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
        "Mozilla/5.0 (X11; U; Linux x86_64; zh-CN; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
        # Safari
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.57.2 (KHTML, like Gecko) Version/5.1.7 Safari/534.57.2",
        # chrome
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36",
        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11",
        "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.648.133 Safari/534.16",
        # 360 Browser
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/30.0.1599.101 Safari/537.36",
        "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
        # Taobao Browser
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
        # Cheetah Browser (LBBROWSER)
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
        # QQ Browser
        "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E)",
        # Sogou Browser
        "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 SE 2.X MetaSr 1.0",
        "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; SE 2.X MetaSr 1.0)",
        # Maxthon
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.4.3.4000 Chrome/30.0.1599.101 Safari/537.36",
        # UC Browser
        "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 UBrowser/4.0.3214.0 Safari/537.36",
    ]
    headers = {'User-Agent': random.choice(userAgentList)}
    return headers

if __name__ == '__main__':
    # Quick manual check: print one randomly chosen header
    print(getheaders())
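Since the random header is there to cope with request errors during long runs, one possible extension is to retry a failed request with a freshly chosen header. The sketch below is not part of the original project: fetch_with_retry, random_headers and the shortened USER_AGENTS list are hypothetical, and the actual HTTP call is passed in as a function so the retry logic can be tried without touching the network.

```python
import random
import time

# Shortened list for illustration; the full list lives in getheaders().
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/39.0.2171.71 Safari/537.36",
]


def random_headers():
    """Return a headers dict with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}


def fetch_with_retry(fetch, url, attempts=3, delay=0.0):
    """Call fetch(url, headers) up to `attempts` times, with new headers each time."""
    last_error = None
    for _ in range(attempts):
        try:
            return fetch(url, random_headers())
        except Exception as ex:  # broad on purpose: this is only a sketch
            last_error = ex
            time.sleep(delay)
    raise last_error


# Simulated fetch that fails twice and then succeeds.
calls = []


def flaky_fetch(url, headers):
    calls.append(headers["User-Agent"])
    if len(calls) < 3:
        raise ConnectionError("simulated failure")
    return "ok"


result = fetch_with_retry(flaky_fetch, "https://example.com")
print(result, len(calls))
```

In the real crawler, fetch could wrap something like requests.get(url, headers=headers, timeout=10).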

settings.py

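Holds the configuration: the mailbox username and password, the recipient list, the check interval, and the list of mask product links to monitor.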

# Username
FROM_ADDR     = 'your 163 mailbox username'
# Password
USER_PASSWORD = 'your mailbox password'

# Recipient list
TO_ADDRS = [
            '[email protected]',
            '[email protected]',
            ]

# Interval between stock checks, in minutes
CHECK_ISPURCHASE_INTERVAL = 2

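# JD product list
PRODUCT_URLS = [
    # 'https://item.jd.com/17882998284.html',  # test URL, currently in stock
    'https://item.jd.com/65429694202.html',  # Delta Plus (代尔塔)
    'https://item.jd.com/1612617212.html',  # Delta Plus 104019 N95, EU-standard FFP3, maintenance-free P3, dust and PM2.5 haze protection, with exhalation valve, 10 masks (2 boxes)
    'https://item.jd.com/1612617211.html',  # Delta Plus 104019 N95, EU-standard FFP3, maintenance-free P3, dust and PM2.5 haze protection, with exhalation valve, 5 masks (1 box)
    'https://item.m.jd.com/product/4846196.html',
    'https://item.m.jd.com/product/100006046599.html',  # Honeywell H950V-G10 靓呼吸 cute-pet edition for girls, zebra, KN95 foldable dust and haze mask, 5 per box, custom
    'https://item.m.jd.com/product/100010893244.html',  # Honeywell D7051V-RS2 靓呼吸 KN95 dust mask, ear-loop with valve, fresh rose, unisex cycling, 5 per box, custom
    'https://item.m.jd.com/product/45095729071.html',  # GIKO (吉可) KN95 mask, head-strap, three-layer filter, 1200H, haze, dust and industrial dust protection, disposable, breathable, 20-pack
    'https://item.jd.com/35371843915.html',  # 3M 9501VT KN95 PM2.5 haze and industrial dust mask, variant 9502VT, with exhalation valve, head-strap, 25 masks
    'https://item.jd.com/35371843916.html',  # 3M 9501VT KN95 PM2.5 haze and industrial dust mask, variant 9501V, with exhalation valve and nose pad, ear-loop, 25 masks
    'https://item.jd.com/100006784140.html',  # Honeywell H950V dust and haze mask, ear-loop, foldable, with valve, KN95, pollen protection, 25 per box, custom
    'https://item.jd.com/100005919167.html',  # Honeywell H950V-FUN 肆版, JD-exclusive, KN95 ear-loop foldable mask with valve, 12-pack, custom
    'https://item.jd.com/100010638554.html',  # Honeywell H950V-FUN 肆版, JD-exclusive, KN95 ear-loop foldable mask with valve, 5-pack
]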
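The list mixes desktop links (item.jd.com/<id>.html) and mobile links (item.m.jd.com/product/<id>.html). In both, the product ID is the file name of the last path segment, so a small helper can extract it the same way for either form. The name sku_id_from_url is hypothetical, and the regular expression assumes IDs are purely numeric, which holds for every URL listed here.

```python
import re
from urllib.parse import urlparse


def sku_id_from_url(url):
    """Return the numeric product ID from a JD product URL."""
    path = urlparse(url).path  # drops any query string or fragment
    match = re.search(r"/(\d+)\.html$", path)
    if match is None:
        raise ValueError("no product ID found in " + url)
    return match.group(1)


for url in (
    "https://item.jd.com/100006784140.html",
    "https://item.m.jd.com/product/4846196.html",
):
    print(sku_id_from_url(url))
```

Raising on an unexpected URL surfaces a typo in the settings at startup rather than in the middle of a monitoring run.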

mail.py

The file is named mail.py, matching the from mail import Email line in jd.py. Calling it email.py would shadow the standard-library email package that it imports from.

import smtplib
from email.mime.text import MIMEText
from settings import FROM_ADDR, USER_PASSWORD, TO_ADDRS

class Email(object):
    def __init__(self):
        self.host = 'smtp.163.com'
        self.port = 465  # SMTP over SSL
        self.user = FROM_ADDR
        self.password = USER_PASSWORD
        self.receivers = TO_ADDRS

    def sendMail(self, subject='', body=''):
        msg = MIMEText(body, 'plain', 'utf-8')  # body is the message text, sent as plain text
        msg['From'] = self.user
        msg['Subject'] = subject
        msg['To'] = ','.join(self.receivers)  # works for one recipient or many
        try:
            server = smtplib.SMTP_SSL(self.host, self.port, timeout=10)  # 10-second timeout
            server.login(self.user, self.password)
            server.sendmail(self.user, self.receivers, msg.as_string())
            server.quit()
        except (smtplib.SMTPException, OSError) as ex:
            # Also catch OSError so connection errors and timeouts do not stop the monitor
            print(str(ex))
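The message-building part can be checked without an SMTP server. The sketch below separates it into a standalone function; build_message and the example.com addresses are hypothetical stand-ins, not values from the project.

```python
from email.mime.text import MIMEText


def build_message(sender, receivers, subject, body):
    """Build a plain-text UTF-8 email addressed to one or more recipients."""
    msg = MIMEText(body, "plain", "utf-8")
    msg["From"] = sender
    msg["To"] = ", ".join(receivers)
    msg["Subject"] = subject
    return msg


msg = build_message(
    "sender@example.com",
    ["a@example.com", "b@example.com"],
    "Masks are back in stock",
    "Product link: https://item.jd.com/100006784140.html",
)
print(msg["To"])
print(msg.get_payload(decode=True).decode("utf-8"))
```

Printing the decoded payload is a quick way to confirm that the product link survives encoding before a real email is sent.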

jd.py

import json
from settings import PRODUCT_URLS
import requests
from mail import Email
import schedule
from settings import CHECK_ISPURCHASE_INTERVAL
import time
from utils.http import getheaders

class JingDong(object):

    def __init__(self):
        self.productUrls = PRODUCT_URLS
        self.email = Email()
        self.session = requests.Session()
        self.session.headers = getheaders()

    def sendInfo(self):

        for productInfo in self.getProductInfos():
            testTime = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime())  # check time, used in the log line
            if productInfo['StockStateName'] == '现货':  # '现货' is the value returned for "in stock"
                subject = 'Masks are back in stock!'
                body = 'Dear user:\n\nThe product you are watching is back in stock. Go grab it!' \
                       '\n[Shop]\n' \
                       '{}\n[Product]\n' \
                       '{}\n[Link]\n' \
                       '{}\n\nRingo'\
                    .format(productInfo['shopName'],productInfo['productName'],productInfo['url'])
                # Send the email notification when the item is in stock
                self.email.sendMail(subject= subject, body= body)
                # Remove the URL from the list once it is in stock, so the email is not sent repeatedly
                self.productUrls.remove(productInfo['url'])
            else:
                print('{}-JD product: {}, currently out of stock'.format(testTime,productInfo['productName']))


    def getProductInfos(self):

        # Iterate over a copy: sendInfo removes URLs from self.productUrls while it consumes this generator
        for url in list(self.productUrls):
            productInfo = {}  # product info dict, a fresh one per product
            self.session.headers.update(getheaders())  # pick a new random User-Agent for each product
            skuID = url.rsplit('/',maxsplit=1)[1].split('.')[0]
            skuDetailUrl = 'https://cdnware.m.jd.com/c1/skuDetail/apple/11.3.0/{}.json'.format(skuID)
            res = self.session.get(skuDetailUrl, timeout=10)
            result = json.loads(res.text)
            productInfo['shopName'] = result['wareInfo']['shopInfo']['shop']['name']  # shop name
            productInfo['productName'] = result['wareInfo']['basicInfo']['name']  # product name
            productInfo['productImg'] = result['wareInfo']['basicInfo']['wareImage'][0]['small']  # product image URL
            stockUrl = 'https://c0.3.cn/stocks?type=getstocks&skuIds={}&area=2_2813_51976_0'.format(skuID)  # stock URL
            stockRes = self.session.get(stockUrl, timeout=10)
            productInfo['StockStateName'] = json.loads(stockRes.text)[skuID]['StockStateName']
            productInfo['url'] = url
            yield productInfo

    @classmethod
    def start(cls):
        # Create an instance of this class
        JD = cls()
        JD.sendInfo()
        # Then repeat at the configured interval
        schedule.every(CHECK_ISPURCHASE_INTERVAL).minutes.do(JD.sendInfo)
        while True:
            schedule.run_pending()
            time.sleep(2)

if __name__ == '__main__':

    JingDong.start()
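The notify-once rule in sendInfo (send an email, then stop watching that URL) can also be written as a pure function that returns two new lists, so the watched list is never modified while it is being iterated. This is only a sketch: split_by_stock is a hypothetical name, and the sample data is invented, including '无货' as the out-of-stock value.

```python
IN_STOCK = "现货"  # the value jd.py compares StockStateName against


def split_by_stock(products):
    """Split product dicts into (in stock, still waiting) without mutating the input."""
    in_stock, waiting = [], []
    for product in products:
        target = in_stock if product["StockStateName"] == IN_STOCK else waiting
        target.append(product)
    return in_stock, waiting


# Invented sample data for illustration only.
products = [
    {"url": "https://item.jd.com/100006784140.html", "StockStateName": "现货"},
    {"url": "https://item.jd.com/100005919167.html", "StockStateName": "无货"},
    {"url": "https://item.jd.com/100010638554.html", "StockStateName": "现货"},
]

ready, waiting = split_by_stock(products)
print([p["url"] for p in ready])
print([p["url"] for p in waiting])
```

After emailing about the products in ready, the next round would only check the URLs left in waiting.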

Finally, just run jd.py. The project also includes status checks for some masks on Taobao, but that part still has problems, so I am leaving it out for now.
