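This project uses web crawling to scrape attraction data for each city from a travel website, then analyzes that data to derive each city's popularity, its popular snacks, and the lodging near its attractions.
Users can look up the information they need in a browser, see which destinations are currently popular, and use each city's attraction, nearby food, and lodging data to plan the trip that suits them best.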
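The main functions of the Python-based city tourism data collection and analysis system include:
Tourism data collection covers scraping basic information about popular cities, along with their attractions, food, and hotels. Taking the attraction scraper for popular cities as an example: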
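    import requests
    from bs4 import BeautifulSoup  # the 'lxml' parser below also needs the lxml package


    def get_top_jd(city_code):
        """Scrape the top attractions for the city identified by city_code."""
        top_jd_url = "http://www.xxxx.cn/jd/{}/gonglve.html".format(city_code)
        headers = {
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'Content-Type': 'application/x-www-form-urlencoded',
            'Accept-Encoding': 'gzip, deflate, compress',
            'Accept-Language': 'en-us;q=0.5,en;q=0.3',
            'Cache-Control': 'max-age=0',
            'Connection': 'keep-alive',
            'Host': 'www.mafengwo.cn',  # must match the host used in top_jd_url
            'Cookie': 'Your cookies',   # placeholder: paste your own cookie string
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36',
        }
        # A timeout keeps one stalled request from blocking the whole crawl
        response = requests.get(top_jd_url, headers=headers, timeout=10)
        response.encoding = 'utf8'
        soup = BeautifulSoup(response.text, 'lxml')

        # Each attraction card on the page is a div with classes "item clearfix"
        items = soup.select('div.item.clearfix')
        # City overview text; raises AttributeError if the span is missing,
        # which the caller treats as "no data"
        gaikuang = soup.find('span', id='mdd_poi_desc').text.strip()

        top_jds = []
        for item in items:
            top_jd = item.h3.a.text.strip()          # attraction name
            comment_count = item.h3.em.text.strip()  # number of comments, as text
            intro = item.p.text.strip()              # short introduction
            image = item.img['src']                  # image URL
            # Keys: attraction name, comment count, introduction, image
            top_jds.append({'景点名称': top_jd, '评论个数': comment_count, '简介': intro, '图片': image})
        return gaikuang, top_jds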
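The scraped comment count is kept as text, so it has to be converted to a number before cities can be compared by popularity. The following is a minimal sketch of one way to do that, not the project's actual analysis code. The helper names `parse_count` and `city_heat` are hypothetical, the sample records are made up, and defining a city's "heat" as the sum of its attractions' comment counts is an assumption.

```python
import re


def parse_count(text):
    """Return the first integer found in text (commas ignored), or 0 if none."""
    match = re.search(r"\d+", text.replace(",", ""))
    return int(match.group()) if match else 0


def city_heat(top_jds):
    """Sum the comment counts of a city's attractions (an assumed heat metric)."""
    # The crawl loop stores a placeholder string when scraping fails,
    # so anything that is not a list counts as zero heat.
    if not isinstance(top_jds, list):
        return 0
    return sum(parse_count(jd["评论个数"]) for jd in top_jds)


# Hypothetical records in the shape returned by get_top_jd
sample = [
    {"景点名称": "A", "评论个数": "1,204 条点评", "简介": "", "图片": ""},
    {"景点名称": "B", "评论个数": "87", "简介": "", "图片": ""},
]
print(city_heat(sample))  # 1291
```

If the site formats counts differently (for example with a unit such as 万), the regular expression would need to be adapted.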
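The crawler loops over the popular cities of every province in the country and collects their top attractions, snacks, and lodging information: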
…
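    # Assumes `import time`, sheng_info (province name -> list of cities) and the
    # get_top_* helpers are defined in the elided code above.
    city_lvyou_info = []
    for sheng, city_info in sheng_info.items():
        # Strip stray newlines for display only; the original key is used for lookup
        sheng = sheng.replace('\n', '')
        print('--> Scraping cities in province {} ...'.format(sheng))
        for city in city_info:
            # city[0] is the city name, city[1] is the URL of its page
            print('Scraping city {} ...'.format(city[0]))

            # Top attractions: http://www.xxxxxx.cn/jd/10065/gonglve.html
            city_code = city[1].split('/')[-1].split('.')[0]
            try:
                gaikuang, top_jds = get_top_jd(city_code)
            except Exception:
                gaikuang, top_jds = '', '{}'
                print('No data')
            time.sleep(1)  # pause between requests

            # Popular snacks of the city: http://www.xxxxxx.cn/cy/10065/tese.html
            try:
                top_xiaochi = get_top_xiaochi(city_code)
            except Exception:
                top_xiaochi = '{}'
                print('No data')
            time.sleep(1)

            # Lodging near attractions. The original idea was to rate value for money
            # against online data; the current plan is to call the API in real time
            # and return a plain list, without the value-for-money analysis.
            try:
                # '直辖市' is the province-level key for municipalities
                top_jiudian = get_top_jiudian(city[0], is_zhixiashi=int(sheng == '直辖市'))
            except Exception:
                top_jiudian = '[]'
                print('No data')
            time.sleep(1)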
…
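The loop above repeats the same pattern three times: call a scraper, fall back to a placeholder on failure, then pause. As a rough sketch, that pattern could be factored into one helper. The names `extract_city_code` and `fetch_or_default`, the `delay` parameter, and the sample path are hypothetical and do not come from the project.

```python
import time


def extract_city_code(url):
    """Take the last path segment and drop its extension: '.../10065.html' -> '10065'."""
    return url.split("/")[-1].split(".")[0]


def fetch_or_default(fetch, default, *args, delay=1.0, **kwargs):
    """Call fetch(*args, **kwargs); return default if it raises. Always pause afterwards."""
    try:
        return fetch(*args, **kwargs)
    except Exception as exc:
        print("No data: {}".format(exc))
        return default
    finally:
        # Runs on both success and failure, like the time.sleep(1) calls in the loop
        time.sleep(delay)


def failing_fetch(city_code):
    """Stand-in for a scraper whose target page is missing."""
    raise ValueError("no page for " + city_code)


code = extract_city_code("/hypothetical/path/10065.html")
print(code)
print(fetch_or_default(failing_fetch, "{}", code, delay=0))
```

With such a helper, each step in the loop would shrink to a single call, for example `top_xiaochi = fetch_or_default(get_top_xiaochi, '{}', city_code)`. Catching `Exception` rather than using a bare `except` still lets Ctrl+C interrupt a long crawl.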
Popular snack analysis
Project repository:
https://gitee.com/asoonis/feed-neo