Tools:
PyCharm 2018.3.5
Anaconda3-2018.12 + Python 3.7.1
Preparation:
Target page: Ctrip > Hotels > Shanghai hotels
Find the page URL
1. All libraries used in this project
import requests
import json
import re
import csv
import demjson
import pymysql
import time
2. Connecting to the MySQL database
conn = pymysql.Connect(host='localhost', port=3306, user=' ', passwd=' ', db='jiudian')
curor = conn.cursor()
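Once the connection is open, the scraped rows still have to be written to a table. The sketch below is one possible way to do that with a parameterized query. The table name `hotels` and its column names are assumptions made for illustration; the original post does not show the schema. `save_rows` is a hypothetical helper that accepts the connection created above.

```python
# Hypothetical table and column names; adjust them to the real schema.
INSERT_SQL = (
    "INSERT INTO hotels "
    "(id, name, star, grade, score, price, url, address) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
)


def save_rows(conn, rows):
    """Insert all rows with one parameterized statement and commit once."""
    cursor = conn.cursor()
    try:
        # %s placeholders let the driver escape the values itself.
        cursor.executemany(INSERT_SQL, rows)
        conn.commit()
    except Exception:
        # Undo the partial batch so the table is not left half-written.
        conn.rollback()
        raise
    finally:
        cursor.close()
    return len(rows)
```

Passing values as parameters rather than formatting them into the SQL string avoids quoting problems with hotel names and addresses.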
3. Fetching the web page
headers={
"Connection": "keep-alive",
"origin":"http://hotels.ctrip.com",
"Host": "hotels.ctrip.com",
"referer": "# URL of the page to fetch",
"user-agent":"look this up with F12 (browser developer tools)",
}
4. Getting the actual data source
data={
"cityId":2,
"cityPY":" shanghai",
"cityCode":"021",
"cityLat": 31.03,
"cityLng": 121.22,
"page":i,
}
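The `page` field is set from a loop variable `i`, which implies one POST request per result page. A minimal sketch of that loop follows, assuming the form is sent with `requests.post`. The function name `fetch_pages`, the `post` and `delay` parameters, and the timeout value are illustrative choices, and `url` stands for the request URL found during preparation, which the post does not spell out.

```python
import time


def fetch_pages(url, headers, base_form, pages, post=None, delay=1.0):
    """Yield (page, response_text) for each page number in `pages`."""
    if post is None:
        # Imported lazily so a stand-in `post` can be supplied for testing.
        import requests
        post = requests.post
    for page in pages:
        # Copy the shared form fields and set the page number for this request.
        form = dict(base_form, page=page)
        resp = post(url, headers=headers, data=form, timeout=10)
        resp.raise_for_status()  # stop early on HTTP errors
        yield page, resp.text
        time.sleep(delay)  # pause between requests to reduce the chance of being blocked
```

A call might look like `for page, text in fetch_pages(url, headers, data, range(1, 4)): ...`, where `data` is the dict above without the `page` entry.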
5. Extracting the required fields
# aa is assumed to be the parsed JSON response for one page
for n in range(0, 25):
    dianming = aa["hotelPositionJSON"][n]["name"]
    # eval converts the list-style string into a list; it executes whatever
    # the string contains, so only use it on data you trust
    jiage = eval(aa["HotelMaiDianData"]["value"]["htllist"])[n]["amount"]
    xinji = aa["hotelPositionJSON"][n]["star"][-2:]
    dangci = aa["hotelPositionJSON"][n]["stardesc"]
    pingfen = aa["hotelPositionJSON"][n]["score"]
    lianjie = "http://hotels.ctrip.com" + aa["hotelPositionJSON"][n]["url"]
    dizhi = aa["hotelPositionJSON"][n]["address"]
    ss += 1
    lists.append([ss, dianming, xinji, dangci, pingfen, jiage + "元", lianjie, dizhi])
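Because `eval` runs any expression it is given, a parser that only accepts literal data may be a safer way to turn the `htllist` string into a list. The sketch below is an alternative, not the method used in the post. It assumes the string is either valid JSON or a Python-style literal; `parse_list_string` is a hypothetical helper, and the sample strings in the usage note are made up.

```python
import ast
import json


def parse_list_string(text):
    """Convert a list-style string into a Python list without executing code."""
    try:
        # Works when the string is valid JSON (double-quoted keys and values).
        return json.loads(text)
    except ValueError:
        # Falls back to Python literal syntax (for example single quotes).
        return ast.literal_eval(text)
```

For example, `parse_list_string('[{"amount": "300"}]')[0]["amount"]` and `parse_list_string("[{'amount': '300'}]")[0]["amount"]` both return `"300"`.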
6. Saving as CSV
with open("shjiudian.csv", "w", encoding="utf-8", newline="") as f:
    k = csv.writer(f, dialect="excel")
    # Header: count, hotel name, star rating, grade, score, price, link, address
    k.writerow(["数量", "酒店名", "星级", "档次", "评分", "价格", "链接", "地址"])
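The snippet above writes only the header row. A minimal sketch of writing the collected rows as well is shown here, assuming each row is a list in the same order as the header (as built in step 5). `write_csv` and `HEADER` are names introduced for illustration.

```python
import csv

# Same header as above: count, hotel name, star rating, grade, score, price, link, address
HEADER = ["数量", "酒店名", "星级", "档次", "评分", "价格", "链接", "地址"]


def write_csv(path, rows, encoding="utf-8"):
    """Write the header followed by every collected row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding=encoding, newline="") as f:
        writer = csv.writer(f, dialect="excel")
        writer.writerow(HEADER)
        writer.writerows(rows)
```

If the file is meant to be opened directly in Excel, passing `encoding="utf-8-sig"` may help Excel detect the encoding of the Chinese text.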
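7. Run the program. There are more than ten thousand records and the run may be blocked partway through, so it takes a long time; please be patient.
When the run finishes, the CSV file will appear.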
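Since a long run can be interrupted when the site blocks a request, one option is to retry a failed call after a growing pause. The sketch below is a generic retry helper and is not part of the original program. `with_retries`, its parameters, and the delay values are assumptions chosen for illustration.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); after a failure wait base_delay * 2**k seconds and retry."""
    for k in range(attempts):
        try:
            return func()
        except Exception:
            if k == attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** k)  # 1s, 2s, 4s, ... with the defaults
```

Wrapping the per-page request in `with_retries(lambda: ...)` keeps a single blocked request from ending the whole run.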