Crawler 07: Scraping Discounted Flight Tickets from Alitrip

https://sjipiao.alitrip.com/cheap_flight_search.htm?tripType=0&depCityName=&depCity=&arrCityName=&arrCity=&depDate=2016-09-04&range=30&searchGapDay=3

After filling in the departure city, the arrival city, and the number of days around the departure date, the URL changes to:

https://sjipiao.alitrip.com/cheap_flight_search.htm?tripType=0&depCityName=%C9%CF%BA%A3&depCity=SHA&arrCityName=%B1%B1%BE%A9&arrCity=BJS&depDate=2016-09-04&range=30&searchGapDay=3

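The depCity= parameter changed from empty to SHA, and arrCity= changed from empty to BJS.

The depDate= and range= values likewise change with the selected dates.

This gives a first idea of how to choose the cities and the dates.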
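The observation above can be sketched as a small URL builder. This is only an illustration written for Python 3 (the script later in this post targets Python 2); the function name build_search_url and its argument names are hypothetical. The byte values in the observed URL (%C9%CF%BA%A3 for 上海) are GBK, so the sketch assumes the site expects GBK-encoded city names.

```python
from urllib.parse import urlencode

SEARCH_PAGE = "https://sjipiao.alitrip.com/cheap_flight_search.htm"


def build_search_url(dep_name, dep_code, arr_name, arr_code, dep_date,
                     date_range=30, gap_day=3):
    """Rebuild the search-page URL from the parameters seen in the address bar."""
    params = [
        ("tripType", 0),
        ("depCityName", dep_name),
        ("depCity", dep_code),
        ("arrCityName", arr_name),
        ("arrCity", arr_code),
        ("depDate", dep_date),
        ("range", date_range),
        ("searchGapDay", gap_day),
    ]
    # Assumption: city names are percent-encoded as GBK bytes, not UTF-8.
    return SEARCH_PAGE + "?" + urlencode(params, encoding="gbk")


url = build_search_url("上海", "SHA", "北京", "BJS", "2016-09-04")
print(url)
```

With the Shanghai to Beijing values, this should reproduce the second URL shown above.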

Capturing the network traffic reveals a promising request named cheapFlight.htm.

It is sent as a GET request.

The request URL is https://sjipiao.alitrip.com/search/cheapFlight.htm?startDate=2016-09-04&endDate=2016-09-04&routes=SHA-BJS&_ksTS=1472729701579_680&callback=jsonp681&ruleId=4&flag=1
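As a rough sketch, the same request URL can be assembled from its parts. The code is written for Python 3; build_api_url is a hypothetical helper, and the _ksTS and callback values are simply copied from the captured request because their exact meaning is not confirmed here.

```python
from urllib.parse import urlencode

API = "https://sjipiao.alitrip.com/search/cheapFlight.htm"


def build_api_url(start_date, end_date, dep_code, arr_code,
                  ks_ts="1472729701579_680", callback="jsonp681"):
    """Assemble the cheapFlight.htm query string in the captured order."""
    params = [
        ("startDate", start_date),
        ("endDate", end_date),
        ("routes", "%s-%s" % (dep_code, arr_code)),  # e.g. SHA-BJS
        ("_ksTS", ks_ts),        # copied from the capture (assumed reusable)
        ("callback", callback),  # name of the JSONP wrapper function
        ("ruleId", 4),
        ("flag", 1),
    ]
    return API + "?" + urlencode(params)


api_url = build_api_url("2016-09-04", "2016-09-04", "SHA", "BJS")
print(api_url)
```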

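Opening it shows JSON data wrapped in a JSONP callback (note callback=jsonp681 in the URL), and this is what we will scrape next.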
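Because the body is JSONP rather than bare JSON, the callback wrapper has to be removed before parsing. A minimal Python 3 sketch follows; strip_jsonp is a hypothetical helper, and the sample string is a made-up stand-in for the real response, which is assumed to look like jsonp681(...).

```python
import json


def strip_jsonp(text):
    """Return the parsed JSON found between the outermost parentheses."""
    start = text.index("(") + 1   # raises ValueError if there is no wrapper
    end = text.rindex(")")
    return json.loads(text[start:end])


# Made-up sample in the assumed shape of the response.
sample = 'jsonp681({"data": {"flights": []}})'
payload = strip_jsonp(sample)
print(payload["data"]["flights"])
```

Slicing between the first "(" and the last ")" avoids depending on the exact callback name, which may differ between requests.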


Analyzing it shows that the part we need starts at flights:

"flights":[{"depCode":"SHA","depName":"\u4e0a\u6d77","arrCode":"BJS","arrName":"\u5317\u4eac","price":419,"depDate":"2016-09-04","week":6,"discount":3.4,"priceDesc":"\u5143"

From this we get the departure city, destination, price, discount, and departure date.

Parse it with json.loads and store the result in a dictionary.
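To illustrate the parsing step, here is a small Python 3 sketch that loads one record and keeps only the fields of interest. The sample is the record shown above, closed off by hand and wrapped in the data/flights structure that the script below reads; summarize and its output key names are hypothetical.

```python
import json

# One record from the response above, trimmed to the fields shown in this post.
sample = r'''{"data": {"flights": [{"depCode": "SHA", "depName": "\u4e0a\u6d77",
"arrCode": "BJS", "arrName": "\u5317\u4eac", "price": 419,
"depDate": "2016-09-04", "week": 6, "discount": 3.4, "priceDesc": "\u5143"}]}}'''


def summarize(flight):
    """Pick the fields we care about out of one flight record."""
    return {
        "from": flight["depName"],
        "to": flight["arrName"],
        "price": flight["price"],
        "discount": flight["discount"],
        "date": flight["depDate"],
    }


rows = [summarize(f) for f in json.loads(sample)["data"]["flights"]]
print(rows)
```

json.loads decodes the \uXXXX escapes, so depName and arrName come back as the readable city names.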

What remains is the code:

# -*- coding: utf-8 -*-
import datetime
import json
import urllib
import re
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

def getinfo(startdate, enddate, startplace, endplace):
    # Build the cheapFlight request; _ksTS and callback are copied from the captured request.
    url = ('https://sjipiao.alitrip.com/search/cheapFlight.htm?startDate=%s&endDate=%s&'
           'routes=%s-%s&_ksTS=1472727272319_680&callback=jsonp681&ruleId=4&flag=1'
           % (startdate, enddate, startplace, endplace))
    flight_html = urllib.urlopen(url).read()
    # The response is JSONP, so keep only the JSON between the parentheses.
    re_rule = re.compile(r'\(\s*(.+)\)', re.S)
    json_data = re_rule.findall(flight_html)[0]
    flight_json = json.loads(json_data)
    flights = flight_json['data']['flights']
    return flights

def trip(flight):
    # Print one line per fare: route, price with discount, and departure date.
    for f in flight:
        depname = '%s - ' % f['depName']
        arrname = '%s\t' % f['arrName']
        price = '\tPrice: %s%s (discount: %s)\t' % (f['price'], f['priceDesc'], f['discount'])
        departdate = '\tDate: %s' % f['depDate']
        print depname + arrname + price + departdate

today = datetime.date.today()
delay = int(raw_input('Enter a number (at most 45 days ahead): how many days of flights to search \n '))
enddate = today + datetime.timedelta(delay)
endstr = str(enddate)
print 'Flight dates ' + str(today) + ' To ' + endstr + '\n'
print 'To look up a city code, enter the city on Alitrip and read depCity= and arrCity= in the URL \n'
startplac = raw_input('Enter the departure city code in uppercase, e.g. SHA \n')
endplac = raw_input('Enter the destination city code in uppercase, e.g. BJS \n ')
flights = getinfo(today, enddate=endstr, startplace=startplac, endplace=endplac)
print '==================Flight information follows=================='
trip(flights)
print '================All ticket information has been shown================\n'
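The script asks for at most 45 days but never checks the number it receives. A possible guard, sketched in Python 3, is shown below; date_window and MAX_DAYS are hypothetical names, and the 45-day limit is taken from the prompt text rather than from any documented rule of the site.

```python
import datetime

MAX_DAYS = 45  # limit mentioned in the script's prompt


def date_window(today, days, max_days=MAX_DAYS):
    """Return (startDate, endDate) strings for a search of `days` days."""
    if not 0 <= days <= max_days:
        raise ValueError("days must be between 0 and %d" % max_days)
    end = today + datetime.timedelta(days=days)
    # isoformat() yields YYYY-MM-DD, the format used in the request URL.
    return today.isoformat(), end.isoformat()


start, end = date_window(datetime.date(2016, 9, 4), 30)
print(start, end)
```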
