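All data in this article comes from the Kaggle dataset Hotel booking demand.
Tools used: MySQL, Excel
Have you ever wondered when the best time of year to book a hotel room is? Or what length of stay gets the best daily rate? What if you wanted to predict whether a hotel is likely to receive a disproportionately high number of special requests? This hotel booking dataset can help you explore these questions!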
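1) Hotel operations analysis (comparing booking demand and occupancy between the city hotel and the resort hotel, guest volume trends, channels, and so on)
2) Customer analysis (lead time, length of stay, meal packages, special requests, customer type, and so on)
3) When is the best time of year for a customer to book a hotel?
4) How can the hotels increase revenue?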
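The two images above are screenshots of the fields and a sample of the data. The raw data has 32 fields and 119,390 rows.
My understanding of each field is as follows: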
Field | Description |
---|---|
hotel | Hotel type (H1 = Resort Hotel, H2 = City Hotel) |
is_canceled | Whether the booking was canceled (1 = canceled, 0 = not canceled) |
lead_time | Lead time (number of days between booking and arrival) |
arrival_date_year | Arrival date (year) |
arrival_date_month | Arrival date (month) |
arrival_date_day_of_month | Arrival date (day of month) |
arrival_date_week_number | Arrival date (week number within the year) |
stays_in_weekend_nights | Number of weekend nights stayed |
stays_in_week_nights | Number of weekday nights stayed |
adults | Number of adults |
children | Number of children |
babies | Number of babies |
meal | Meal package booked (Undefined/SC: no meal package; BB: Bed & Breakfast; HB: Half board (breakfast and one other meal, usually dinner); FB: Full board (breakfast, lunch and dinner)) |
market_segment | Market segment ("TA" means "Travel Agents" and "TO" means "Tour Operators") |
distribution_channel | Distribution channel |
is_repeated_guest | Whether the guest is a repeat guest (1 = yes, 0 = no) |
previous_cancellations | Number of earlier bookings the customer canceled before the current booking |
previous_bookings_not_canceled | Number of earlier bookings the customer did not cancel before the current booking |
reserved_room_type | Room type reserved |
assigned_room_type | Room type actually assigned by the hotel (may differ from the reserved type because of timing and other factors) |
deposit_type | Deposit type |
agent | ID of the travel agency that made the booking |
company | ID of the company that made the booking |
days_in_waiting_list | Time the hotel took to confirm the booking (from the customer placing the order to the hotel confirming it) |
customer_type | Customer type (Contract: when the booking has an allotment or other type of contract associated to it; Group: when the booking is associated to a group; Transient: when the booking is not part of a group or contract, and is not associated to other transient booking; Transient-party: when the booking is transient, but is associated to at least other transient booking) |
adr | Average daily rate (average spend per night) |
required_car_parking_spaces | Number of parking spaces requested by the customer |
total_of_special_requests | Total number of special requests made by the customer |
reservation_status | Final booking status (Canceled: booking was canceled by the customer; Check-Out: customer has checked in but already departed; No-Show: customer did not check-in and did inform the hotel of the reason why) |
reservation_status_date | Date on which the final booking status was set |
The analysis is split into two broad directions: customers and hotel operations.
-- 1. Select the subset of fields needed for the customer profile and create the customer table.
CREATE TABLE hotel_customer
AS (SELECT hotel,lead_time,adults,children,babies,country,meal,stays_in_week_nights,stays_in_weekend_nights,booking_changes,customer_type,required_car_parking_spaces,total_of_special_requests
FROM hotel_bookings);
-- 2. Clean the data: check completeness
SELECT count(hotel),count(lead_time),count(adults),count(children),count(babies),count(country),count(meal),count(stays_in_week_nights),count(stays_in_weekend_nights),count(booking_changes),count(customer_type),count(required_car_parking_spaces),count(total_of_special_requests)
FROM hotel_customer;
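COUNT(column) counts only non-NULL values, so with the query above a gap shows up only when one count is lower than the others. A minimal sketch that reports the gaps directly is below. It assumes that the import may have loaded missing values as empty strings or as the literal text 'NULL' rather than real NULLs; that depends on how the CSV was imported and is not confirmed by the article. The output column names are illustrative.

```sql
-- Count missing values per column instead of comparing COUNT() results by eye.
-- Assumption: a missing value may be a real NULL, an empty string, or the text 'NULL'.
SELECT COUNT(*)                                        AS total_rows,
       SUM(country IS NULL OR country IN ('', 'NULL')) AS country_missing,
       SUM(meal IS NULL OR meal IN ('', 'NULL'))       AS meal_missing,
       SUM(children IS NULL)                           AS children_missing
FROM hotel_customer;
```

The same pattern can be repeated for the remaining columns; MySQL evaluates each condition to 1 or 0, so SUM gives the number of matching rows.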
Only the country field has a small number of missing records, so I fill them with the mode.
-- First find the mode
SELECT country
FROM hotel_customer
WHERE country IS NOT NULL
GROUP BY country
ORDER BY count(1) DESC
LIMIT 1;
-- Fill missing values with the mode
UPDATE hotel_customer
SET country=IFNULL(country,'PRT');
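The fill above hard-codes the value returned by the mode query. As a sketch of an alternative that keeps the two steps connected, the mode can be stored in a MySQL user variable and reused; the variable name @mode_country is hypothetical.

```sql
-- Store the most frequent non-NULL country in a user variable.
SELECT country INTO @mode_country
FROM hotel_customer
WHERE country IS NOT NULL
GROUP BY country
ORDER BY COUNT(*) DESC
LIMIT 1;

-- Fill only the rows that are actually missing a value.
UPDATE hotel_customer
SET country = @mode_country
WHERE country IS NULL;
```

Restricting the UPDATE with WHERE country IS NULL touches only the rows that need filling, and the script keeps working if the data changes and the mode is no longer the same country.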
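-- 3. To simplify later analysis, add two fields: total guests (总人数) and length of stay (停留时间). The source columns are all INT, so they can be added directly.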
ALTER TABLE hotel_customer ADD COLUMN 总人数 INT NULL;
ALTER TABLE hotel_customer ADD COLUMN 停留时间 INT NULL;
UPDATE hotel_customer SET 总人数=adults+children+babies;
UPDATE hotel_customer SET 停留时间=stays_in_week_nights+stays_in_weekend_nights;
-- Check the data
SELECT *
FROM hotel_customer;
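In MySQL, adding NULL to a number yields NULL, so a row with a missing adults, children, or babies value would end up with a NULL 总人数 and would not be caught by a later check for 总人数=0. The article reports no gaps in these columns, so the following is only a safeguard sketch; treating a missing count as 0 is an assumption, not the article's method.

```sql
-- Rows whose derived columns could not be computed because an input was NULL.
SELECT COUNT(*) AS rows_with_null_totals
FROM hotel_customer
WHERE 总人数 IS NULL OR 停留时间 IS NULL;

-- One possible fix (assumption): treat a missing guest count as 0.
UPDATE hotel_customer
SET 总人数 = COALESCE(adults, 0) + COALESCE(children, 0) + COALESCE(babies, 0)
WHERE 总人数 IS NULL;
```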
-- 4. Check that the data is valid: look for records where the total number of guests is 0
SELECT *
FROM hotel_customer
WHERE 总人数=0;
Records with a total of 0 guests exist, so I delete them and check again.
DELETE FROM hotel_customer WHERE 总人数=0;
SELECT *
FROM hotel_customer
WHERE 总人数=0;
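Cleaning of the customer table is complete.
-- 1. Select the subset of fields for the hotel operations analysis and create the table, to study booking occupancy, cancellation rate, customer mix, room types, deposits, guest volume, channels, and so on.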
CREATE TABLE hotel_business AS
(SELECT hotel,is_canceled,reservation_status,is_repeated_guest,customer_type,reserved_room_type,deposit_type,assigned_room_type,distribution_channel,arrival_date_year,arrival_date_month,arrival_date_week_number,arrival_date_day_of_month,market_segment,agent,adr,company
FROM hotel_bookings);
-- 2. Clean the data: check for missing values
SELECT count(hotel),count(is_canceled),count(reservation_status),count(is_repeated_guest),count(customer_type),count(reserved_room_type),COUNT(deposit_type),COUNT(assigned_room_type),count(distribution_channel),count(arrival_date_year),count(arrival_date_month),count(arrival_date_week_number),count(arrival_date_day_of_month),count(market_segment),count(agent),count(adr),count(company)
FROM hotel_business;
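The result shows that the agent and company fields have missing values, as shown in the figure below.
The company field has too many missing values, so I drop the column.
A missing agent means the booking did not go through a travel agency, so it can be filled with 'sc'. (I do not fill with 0 because I am not sure whether 0 is a valid agent ID, so I use a text placeholder instead.)
-- Fill agent with 'sc' and drop the company column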
ALTER TABLE hotel_business DROP COLUMN company;
UPDATE hotel_business SET agent=IFNULL(agent,'sc');
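The doubt about whether 0 is a real agent ID can be checked directly, and the text placeholder 'sc' only works if agent is stored as a text column. A short sketch of both checks follows; it assumes nothing beyond the hotel_business table created above.

```sql
-- Does any booking already use 0 as an agent ID?
-- The quoted '0' compares correctly whether agent is numeric or text.
SELECT COUNT(*) AS rows_with_agent_zero
FROM hotel_business
WHERE agent = '0';

-- Inspect the column type: a text placeholder such as 'sc' needs a VARCHAR/TEXT column.
SHOW COLUMNS FROM hotel_business LIKE 'agent';
```

If the first query returns 0 rows, filling with 0 would have been safe as well; the text placeholder simply avoids having to rely on that.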
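-- 3. Convert the English month names to numbers, merge the three date columns into one, and drop the original three.
-- Add a new column month holding the English month name converted to a number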
ALTER TABLE hotel_business ADD COLUMN month VARCHAR(10) NULL;
UPDATE hotel_business SET month=month(STR_TO_DATE(arrival_date_month,'%M'));
-- Merge the three date columns into one
ALTER TABLE hotel_business ADD COLUMN arrival_date VARCHAR(255) NULL;
UPDATE hotel_business SET arrival_date=CONCAT(arrival_date_year,'-',month,'-',arrival_date_day_of_month);
-- Drop the three original date columns and the helper month column
ALTER TABLE hotel_business
DROP arrival_date_year,
DROP arrival_date_month,
DROP arrival_date_day_of_month,
DROP month;
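The steps above store the arrival date as text such as '2015-7-1'. Parsing a month name on its own with STR_TO_DATE produces an incomplete date (month set, year and day zero), which some server SQL modes, for example NO_ZERO_IN_DATE, may reject. As a hedged alternative, to be run in place of the steps above while the three original columns still exist, the full date can be parsed in one pass into a real DATE column. The column name arrival_dt is hypothetical.

```sql
-- Add a proper DATE column (hypothetical name) alongside the original columns.
ALTER TABLE hotel_business ADD COLUMN arrival_dt DATE NULL;

-- Build e.g. '2015-July-1' and parse it: %Y = 4-digit year, %M = month name, %e = day of month.
UPDATE hotel_business
SET arrival_dt = STR_TO_DATE(
        CONCAT(arrival_date_year, '-', arrival_date_month, '-', arrival_date_day_of_month),
        '%Y-%M-%e');

-- Any rows left NULL failed to parse and should be inspected before dropping the originals.
SELECT COUNT(*) AS unparsed_rows
FROM hotel_business
WHERE arrival_dt IS NULL;
```

A DATE column lets functions such as QUARTER() and YEAR() be applied directly and sorts chronologically, which a text value like '2015-10-1' versus '2015-7-1' does not.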
-- 4. Check whether adr (average daily rate) has negative values; if so, delete those records
SELECT adr
FROM hotel_business
WHERE adr<0;
DELETE FROM hotel_business WHERE adr<0;
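Before running the DELETE, a quick summary of adr shows how many rows it would remove and whether other suspicious values exist. This is a sketch only; the article checks for negative values, and the zero-value count is an extra added here for illustration.

```sql
-- Summarise adr: range, average, and counts of negative and zero values.
SELECT MIN(adr)           AS min_adr,
       MAX(adr)           AS max_adr,
       ROUND(AVG(adr), 2) AS avg_adr,
       SUM(adr < 0)       AS negative_rows,
       SUM(adr = 0)       AS zero_rows
FROM hotel_business;
```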
-- 5. Check the categorical fields for invalid values
SELECT reservation_status,count(1)
FROM hotel_business
GROUP BY reservation_status;
SELECT customer_type,count(1)
FROM hotel_business
GROUP BY customer_type;
SELECT distribution_channel,count(1)
FROM hotel_business
GROUP BY distribution_channel;
SELECT market_segment,count(1)
FROM hotel_business
GROUP BY market_segment;
Both distribution_channel and market_segment contain Undefined values, so I delete those rows.
DELETE FROM hotel_business WHERE distribution_channel='Undefined' OR market_segment='Undefined';
Cleaning of the hotel operations table is complete.
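1. Both hotels have very high booking cancellation rates, and the hotels need to investigate why.
2. The city hotel accounts for the larger share of bookings, which suggests that guests prefer the city hotel.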
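The queries behind these two findings are not shown in the article. One way they could be computed from hotel_business is sketched below, assuming is_canceled holds 0/1 values as described in the field table. The output column names are illustrative.

```sql
-- Bookings, share of all bookings, and cancellation rate per hotel.
SELECT hotel,
       COUNT(*)                                 AS bookings,
       ROUND(100 * COUNT(*) / (SELECT COUNT(*) FROM hotel_business), 2)
                                                AS share_of_bookings_pct,
       SUM(is_canceled)                         AS canceled_bookings,
       ROUND(100 * AVG(is_canceled), 2)         AS cancel_rate_pct
FROM hotel_business
GROUP BY hotel;
```

Because is_canceled is 0 or 1, its average is the fraction of canceled bookings, so no separate numerator and denominator query is needed.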
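Most guests are new customers, which is normal for hotels, but the hotels could take steps to raise the repeat-booking rate, such as regularly sending offers to past guests and adding post-stay follow-up.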
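The split between new and repeat guests can be computed in the same way. This is a sketch under the assumption that is_repeated_guest holds 0/1 values, as in the field table; the article's own query is not shown.

```sql
-- Share of bookings made by repeat guests, per hotel.
SELECT hotel,
       COUNT(*)                               AS bookings,
       SUM(is_repeated_guest)                 AS repeat_guest_bookings,
       ROUND(100 * AVG(is_repeated_guest), 2) AS repeat_guest_pct
FROM hotel_business
GROUP BY hotel;
```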
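1. The results show that both hotels charge more in the second and third quarters; the resort hotel's rates in particular rise sharply in the third quarter.
2. The city hotel reaches its peak guest volume in the second quarter of each year, after which volume declines until the first quarter of the following year. The city hotel should plan staff and resources by quarter, ensuring enough capacity at the peak and reducing input in the low season to cut costs.
3. The resort hotel's guest volume stays fairly stable. It could try to increase volume, for example by not raising rates too much in the second and third quarters.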
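A quarterly breakdown like the one these conclusions rely on could be produced with a query along the following lines. It is a sketch with two assumptions that the article does not state: arrival_date is the text column built during cleaning (for example '2015-7-1'), and only bookings that were not canceled count toward guest volume and average rate.

```sql
-- Guest volume and average daily rate per hotel, year, and quarter.
SELECT hotel,
       YEAR(d)            AS yr,
       QUARTER(d)         AS qtr,
       COUNT(*)           AS bookings,
       ROUND(AVG(adr), 2) AS avg_adr
FROM (
    -- Parse the text date: %Y = 4-digit year, %c = numeric month, %e = day of month.
    SELECT hotel, adr, STR_TO_DATE(arrival_date, '%Y-%c-%e') AS d
    FROM hotel_business
    WHERE is_canceled = 0      -- assumption: exclude canceled bookings
) AS t
GROUP BY hotel, YEAR(d), QUARTER(d)
ORDER BY hotel, yr, qtr;
```

Grouping by year as well as quarter keeps partial years from being mixed together, which matters if the data does not cover every quarter of every year.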