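1. Using value_counts()
country_data = pd.DataFrame(data_clean['country'].value_counts())
No rows are filtered out, so the result has 178 rows.
country_data = pd.DataFrame(data_clean.loc[data_clean['is_canceled'] == 0]['country'].value_counts())
Only rows with is_canceled == 0 are kept, so the result has just 166 rows.
The row labels of the result are the country names, and the column is named country (pandas 2.0 and later name it count instead).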
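To see why filtering rows can shrink the result, here is a minimal sketch on made-up data. The frame `toy` and its values are hypothetical stand-ins for data_clean; only the column names country and is_canceled come from the notes.

```python
import pandas as pd

# Hypothetical toy data standing in for data_clean
toy = pd.DataFrame({
    "country": ["PRT", "GBR", "PRT", "FRA", "PRT", "ESP"],
    "is_canceled": [0, 0, 1, 0, 0, 1],
})

# Count every row: one result row per distinct country
all_counts = toy["country"].value_counts()

# Count only the bookings that were not canceled
kept_counts = toy.loc[toy["is_canceled"] == 0, "country"].value_counts()

print(len(all_counts))   # number of distinct countries overall
print(len(kept_counts))  # fewer, because ESP only appears in a canceled row
```

A country that occurs only in filtered-out rows disappears from the counts, which is the same effect as the drop from 178 to 166 rows described above.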
2. Renaming a column with rename()
country_data.rename(columns={"country": "numbers of country"}, inplace=True)
3. sum() totals a column
total_country = country_data['numbers of country'].sum()
4. round(expression, 2)
Rounds the result of the expression to two decimal places.
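Steps 3 and 4 are presumably combined to build the "%" column used by the map further down. The following is a sketch under that assumption, with invented counts; the notes themselves do not show this step.

```python
import pandas as pd

# Hypothetical counts, indexed by country code
country_data = pd.DataFrame(
    {"numbers of country": [4, 2, 1]},
    index=["PRT", "GBR", "FRA"],
)

# Step 3: total over the whole column
total_country = country_data["numbers of country"].sum()

# Step 4: share of each country in percent, rounded to two decimals
country_data["%"] = round(
    country_data["numbers of country"] / total_country * 100, 2
)
print(country_data)
```

round() works element-wise here because a pandas Series supports Python's built-in round(); `Series.round(2)` would be an equivalent spelling.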
5. Drawing a pie chart with plotly.express
import plotly.express as px
fig = px.pie(country_data,
values="numbers of country",
names="country",
title="Home country of guests",
template="seaborn")
fig.update_traces(textposition="inside", textinfo="value+percent+label")
fig.show()
guest_map = px.choropleth(country_data,
locations=country_data.index,
color=country_data["%"],
hover_name=country_data.index,
color_continuous_scale=px.colors.sequential.Plasma,
title="Home country of guests")
guest_map.show()
import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(12, 8))
sns.boxplot(x="reserved_room_type",
y="adr_p",
hue="hotel",
data=guest_price,
hue_order=["City Hotel", "Resort Hotel"],
fliersize=0)
plt.title("Price of room types per night and person", fontsize=16)
plt.xlabel("Room type", fontsize=16)
plt.ylabel("Price [EUR]", fontsize=16)
plt.legend(loc="upper right")
plt.ylim(0, 160)
plt.show()
3. pd.Categorical(values, categories=desired_order, ordered=True)
>>> c = pd.Categorical(['a', 'b', 'c', 'a', 'b', 'c'], ordered=True, categories=['c', 'b', 'a'])
>>> c
[a, b, c, a, b, c]
Categories (3, object): [c < b < a]
>>> c.min()
'c'
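An ordered categorical is useful when a column should sort in a custom order rather than alphabetically, for example month names on a plot axis. The sketch below assumes that is the purpose here; the shortened month list and the prices are made up for illustration.

```python
import pandas as pd

# Desired order (shortened to four months for the sketch)
months = ["January", "February", "March", "April"]

# Hypothetical prices per month, deliberately out of order
month_price = pd.DataFrame({
    "arrival_date_month": ["March", "January", "April", "February"],
    "adr_p": [40.0, 35.0, 45.0, 38.0],
})

# Convert the column to an ordered categorical with the custom order
month_price["arrival_date_month"] = pd.Categorical(
    month_price["arrival_date_month"], categories=months, ordered=True
)

# Sorting now follows calendar order instead of alphabetical order
sorted_price = month_price.sort_values("arrival_date_month")
print(sorted_price)
```

Without the conversion, sort_values would place April first because plain strings sort alphabetically.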
plt.figure(figsize=(12, 8))
sns.lineplot(x = "arrival_date_month", y="adr_p", hue="hotel", data=month_price,
hue_order = ["City Hotel", "Resort Hotel"], ci="sd", size="hotel", sizes=(2.5, 2.5))
plt.title("Room price per night and person over the year", fontsize=16)
plt.xlabel("Month", fontsize=16)
plt.xticks(rotation=45)
plt.ylabel("Price [EUR]", fontsize=16)
plt.show()
import pandas as pd
df = pd.DataFrame({'key':['a','b','c','b','a','b','c','a'],'data':[1,3,5,7,8,9,2,2]})
df1=df['data'].groupby(df['key']).sum()
df = pd.DataFrame({'key1':['a','b','c','b','a'],'key2':['one','two','one','two','one'],'data':[1,3,5,7,8]})
data=df['data'].groupby([df['key1'],df['key2']]).sum()
print(data)
df = pd.DataFrame({'key1':['a','b','c','b','a'],'key2':['one','two','one','two','one'],'data1':[1,3,5,7,8],'data2':[2,4,5,6,8]})
a= df.groupby('key2')['data2'].sum()
print(a)
import pandas as pd
df = pd.DataFrame({'key':['a','b','c','b','a','b','c','a'],'data':[1,3,5,7,8,9,2,2]})
df1= pd.DataFrame({'key1':['a','a','g','i','f','s','o'],'data1':[1,2,5,4,8,9,4]})
# df1=df['data'].groupby(df['key']).sum()
full_data=pd.concat([df,df1],ignore_index=True)
print(full_data)
plt.figure(figsize=(12, 8))
sns.lineplot(x = "month", y="guests", hue="hotel", data=full_guest_num)
plt.title("Average number of hotel guests per month", fontsize=16)
plt.xlabel("Month", fontsize=16)
plt.xticks(rotation=45)
plt.ylabel("Number of guests", fontsize=16)
plt.show()
1. Drawing a bar chart
plt.figure(figsize=(16, 8))
sns.barplot(x = "stay_num", y = "stay%", hue="hotel", data=full_stay)
plt.title("Length of stay", fontsize=16)
plt.xlabel("Number of nights", fontsize=16)
plt.ylabel("Guests [%]", fontsize=16)
plt.legend(loc="upper right")
plt.xlim(0,22)
plt.show()
fig = px.pie(segment,
values=segment.values,
names=segment.index,
title="Bookings by market segment",
template="seaborn")
fig.update_traces(rotation=-90, textinfo="percent+label")
fig.show()
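2. Drawing a box plot
Given the x-axis column x, the y-axis data y, and a hue label, seaborn computes the box-plot statistics automatically. Note that the code below calls sns.barplot with ci="sd", which draws bars with standard-deviation error bars; box plots are drawn with sns.boxplot.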
plt.figure(figsize=(12, 8))
sns.barplot(x="market_segment",
y="adr_p",
hue="reserved_room_type",
data=data_clean,
ci="sd",
errwidth=1,
capsize=0.1)
plt.title("ADR by market segment and room type", fontsize=16)
plt.xlabel("Market segment", fontsize=16)
plt.xticks(rotation=45)
plt.ylabel("ADR per person [EUR]", fontsize=16)
plt.legend(loc="upper left")
plt.show()
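1. count: the number of elements in the array; for the four-element array 1, 3, 5, 9 it returns 4;
2. mean: the average of the array; the mean of 1, 3, 5, 9 is 4.5;
3. std: the standard deviation of the array;
4. min: the smallest value in the array;
5. 25%, 50%, 75%: the values at three percentile positions of the array, i.e. the quartiles in statistics.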
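These fields match what pandas' describe() reports, so the list can be checked with a short sketch, assuming describe() is the function being summarized. The values 1, 3, 5, 9 are the ones mentioned in the list.

```python
import pandas as pd

# The four values used in the explanation above
s = pd.Series([1, 3, 5, 9])

# describe() returns count, mean, std, min, the quartiles, and max
stats = s.describe()
print(stats)

# Individual statistics can be read by label
print(stats["count"], stats["mean"], stats["50%"])
```

The std reported by describe() is the sample standard deviation (dividing by n - 1), and the quartiles use linear interpolation between neighboring values by default.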