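This project uses pandas for data analysis and matplotlib for visualization to examine how Starbucks stores are distributed around the world.
The dataset comes from Kaggle and contains 13 fields: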
# Show the output of every expression in a cell, not only the last one
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all' # default is 'last_expr'
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
os.chdir(r'E:\python_learn\train')
file_name='directory.csv'
data = pd.read_csv(file_name)
data.head()
| | Brand | Store Number | Store Name | Ownership Type | Street Address | City | State/Province | Country | Postcode | Phone Number | Timezone | Longitude | Latitude |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Starbucks | 47370-257954 | Meritxell, 96 | Licensed | Av. Meritxell, 96 | Andorra la Vella | 7 | AD | AD500 | 376818720 | GMT+1:00 Europe/Andorra | 1.53 | 42.51 |
1 | Starbucks | 22331-212325 | Ajman Drive Thru | Licensed | 1 Street 69, Al Jarf | Ajman | AJ | AE | NaN | NaN | GMT+04:00 Asia/Dubai | 55.47 | 25.42 |
2 | Starbucks | 47089-256771 | Dana Mall | Licensed | Sheikh Khalifa Bin Zayed St. | Ajman | AJ | AE | NaN | NaN | GMT+04:00 Asia/Dubai | 55.47 | 25.39 |
3 | Starbucks | 22126-218024 | Twofour 54 | Licensed | Al Salam Street | Abu Dhabi | AZ | AE | NaN | NaN | GMT+04:00 Asia/Dubai | 54.38 | 24.48 |
4 | Starbucks | 17127-178586 | Al Ain Tower | Licensed | Khaldiya Area, Abu Dhabi Island | Abu Dhabi | AZ | AE | NaN | NaN | GMT+04:00 Asia/Dubai | 54.54 | 24.51 |
The analysis focuses on where Starbucks stores are located:
# Inspect the structure of the data
data.info() # → 25600 rows, 13 columns
RangeIndex: 25600 entries, 0 to 25599
Data columns (total 13 columns):
Brand 25600 non-null object
Store Number 25600 non-null object
Store Name 25600 non-null object
Ownership Type 25600 non-null object
Street Address 25598 non-null object
City 25585 non-null object
State/Province 25600 non-null object
Country 25600 non-null object
Postcode 24078 non-null object
Phone Number 18739 non-null object
Timezone 25600 non-null object
Longitude 25599 non-null float64
Latitude 25599 non-null float64
dtypes: float64(2), object(11)
memory usage: 2.5+ MB
data.isna().sum() # count missing values per column
Brand 0
Store Number 0
Store Name 0
Ownership Type 0
Street Address 2
City 15
State/Province 0
Country 0
Postcode 1522
Phone Number 6861
Timezone 0
Longitude 1
Latitude 1
dtype: int64
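Raw counts are easier to judge next to the share of rows they represent. A minimal sketch of that calculation, using a small made-up frame rather than the real `directory.csv` (the `toy` frame and its values are assumptions for illustration only):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; the values are invented.
toy = pd.DataFrame({
    'City': ['Cairo', np.nan, 'Ajman', np.nan],
    'Postcode': [np.nan, np.nan, np.nan, 'AD500'],
    'Country': ['EG', 'EG', 'AE', 'AD'],
})

# isna() returns a boolean frame; the mean of booleans is the fraction of True values.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Applied to `data`, the same expression would show what fraction of rows each count above represents.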
# Inspect the rows where City is missing
data[data['City'].isna()]
| | Brand | Store Number | Store Name | Ownership Type | Street Address | City | State/Province | Country | Postcode | Phone Number | Timezone | Longitude | Latitude |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5069 | Starbucks | 31657-104436 | سان ستيفانو | Licensed | طريق الكورنيش أبراج سان ستيفانو | NaN | ALX | EG | NaN | 20120800287 | GMT+2:00 Africa/Cairo | 29.96 | 31.24 |
5088 | Starbucks | 32152-109504 | النايل سيتى | Licensed | كورنيش النيل أبراج النايل سيتى | NaN | C | EG | NaN | 20120800307 | GMT+2:00 Africa/Cairo | 31.23 | 30.07 |
5089 | Starbucks | 32314-115172 | أسكندرية الصحراوى | Licensed | الكيلو 28 طريق الاسكندرية الصحراوى, سيتى سنتر ... | NaN | C | EG | NaN | 20185022214 | GMT+2:00 Africa/Cairo | 31.03 | 30.06 |
5090 | Starbucks | 31479-105246 | مكرم عبيد | Licensed | شارع مكرم عبيد, سيتى ستارز مول | NaN | C | EG | NaN | 20120800332 | GMT+2:00 Africa/Cairo | 31.34 | 30.09 |
5091 | Starbucks | 31756-107161 | سيتى ستارز 1 | Licensed | شارع عمر بن الخطاب, سيتى ستارز مول | NaN | C | EG | NaN | 20120800350 | GMT+2:00 Africa/Cairo | 31.33 | 30.06 |
5092 | Starbucks | 1397-139244 | سيتى ستارز 3 | Licensed | شارع عمر بن الخطاب, كارفور المعادى | NaN | C | EG | NaN | 20120029885 | GMT+2:00 Africa/Cairo | 31.33 | 30.06 |
5093 | Starbucks | 32191-116645 | معادى سيتى سنتر | Licensed | القطامية الطريق الدائرى | NaN | C | EG | NaN | 20185002677 | GMT+2:00 Africa/Cairo | 31.30 | 29.99 |
5094 | Starbucks | 3664-142484 | سليمان أباظة | Licensed | 34شارع سليمان أباظة المهندسين, تيفولى مول | NaN | C | EG | NaN | 129007799 | GMT+2:00 Africa/Cairo | 31.20 | 30.06 |
5095 | Starbucks | 3562-131562 | تيفولى | Licensed | ألماظة ميدان الجوهر شارع أحمد فوزى, صالة السفر 1 | NaN | C | EG | NaN | 018-0819995 | GMT+2:00 Africa/Cairo | 31.34 | 30.08 |
5096 | Starbucks | 31646-106547 | مطار القاهرة | Licensed | صالة السفر 1- مطار القاهرة, فندق سنافير | NaN | C | EG | NaN | 20120800335 | GMT+2:00 Africa/Cairo | 31.41 | 30.11 |
5097 | Starbucks | 31755-107182 | سنافير - نعمه بيه | Licensed | فندق سنافير - نعمة بيه, المركاتو مول2 | NaN | JS | EG | NaN | 20120800327 | GMT+2:00 Africa/Cairo | 34.33 | 27.91 |
5098 | Starbucks | 32389-107342 | المركاتو مول2 | Licensed | الهضبة - الملركاتو2 بجوار المسرح الرومانى, مول... | NaN | JS | EG | NaN | 20185022217 | GMT+2:00 Africa/Cairo | 34.33 | 27.92 |
5099 | Starbucks | 32490-111349 | خان لاجونا | Licensed | خليج نبق مول - خان لاجونا | NaN | JS | EG | NaN | 20189888547 | GMT+2:00 Africa/Cairo | 34.43 | 28.04 |
9871 | Starbucks | 26909-228505 | Vivacity Megamall | Licensed | NA, Na | NaN | 13 | MY | NaN | 82263673 | GMT+08:00 Asia/Kuala_Lumpur | 110.36 | 1.53 |
10767 | Starbucks | 31429-102231 | ابراج البيت 1 | Licensed | شارع اجياد- باب الملك عبد العزيز | NaN | 2 | SA | NaN | 96625719012 | GMT+03:00 Asia/Riyadh | 39.83 | 21.42 |
data['City'] = data['City'].fillna(data['State/Province'])
data[data['Country']=='EG'][['City','State/Province']] # check how City was filled for Egypt (EG)
| | City | State/Province |
---|---|---|
5069 | ALX | ALX |
5070 | Cairo | C |
5071 | Cairo | C |
5072 | Cairo | C |
5073 | Cairo | C |
5074 | Cairo | C |
5075 | Cairo | C |
5076 | Cairo | C |
5077 | Cairo | C |
5078 | Cairo | C |
5079 | Cairo | C |
5080 | Cairo | C |
5081 | Cairo | C |
5082 | Cairo | C |
5083 | Cairo | C |
5084 | Cairo | C |
5085 | Cairo | C |
5086 | Cairo | C |
5087 | Cairo | C |
5088 | C | C |
5089 | C | C |
5090 | C | C |
5091 | C | C |
5092 | C | C |
5093 | C | C |
5094 | C | C |
5095 | C | C |
5096 | C | C |
5097 | JS | JS |
5098 | JS | JS |
5099 | JS | JS |
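When `fillna` is given a Series, it aligns on the row index and replaces only the missing entries, which is why rows 5070 to 5087 still read `Cairo` while the filled rows carry the province code. A small sketch on invented rows (the `toy` frame is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical rows mimicking the Egyptian records above.
toy = pd.DataFrame({
    'City': ['Cairo', np.nan, np.nan],
    'State/Province': ['C', 'C', 'JS'],
})

# Only the NaN entries take the value from the same row of State/Province.
toy['City'] = toy['City'].fillna(toy['State/Province'])
print(toy['City'].tolist())
```

The filled values are province codes rather than city names, so `C` and `Cairo` are treated as different cities by any later `value_counts()` on `City`.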
data['Brand'].unique()
array(['Starbucks', 'Teavana', 'Evolution Fresh', 'Coffee House Holdings'],
dtype=object)
starbucks = data[data['Brand']=='Starbucks']
starbucks.head()
| | Brand | Store Number | Store Name | Ownership Type | Street Address | City | State/Province | Country | Postcode | Phone Number | Timezone | Longitude | Latitude |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Starbucks | 47370-257954 | Meritxell, 96 | Licensed | Av. Meritxell, 96 | Andorra la Vella | 7 | AD | AD500 | 376818720 | GMT+1:00 Europe/Andorra | 1.53 | 42.51 |
1 | Starbucks | 22331-212325 | Ajman Drive Thru | Licensed | 1 Street 69, Al Jarf | Ajman | AJ | AE | NaN | NaN | GMT+04:00 Asia/Dubai | 55.47 | 25.42 |
2 | Starbucks | 47089-256771 | Dana Mall | Licensed | Sheikh Khalifa Bin Zayed St. | Ajman | AJ | AE | NaN | NaN | GMT+04:00 Asia/Dubai | 55.47 | 25.39 |
3 | Starbucks | 22126-218024 | Twofour 54 | Licensed | Al Salam Street | Abu Dhabi | AZ | AE | NaN | NaN | GMT+04:00 Asia/Dubai | 54.38 | 24.48 |
4 | Starbucks | 17127-178586 | Al Ain Tower | Licensed | Khaldiya Area, Abu Dhabi Island | Abu Dhabi | AZ | AE | NaN | NaN | GMT+04:00 Asia/Dubai | 54.54 | 24.51 |
# Size of the dataset
starbucks.shape
print('The dataset contains %i rows and %i fields'%(starbucks.shape[0],starbucks.shape[1]))
(25249, 13)
The dataset contains 25249 rows and 13 fields
# Number of countries
country_count=len(starbucks['Country'].unique())
print('The dataset covers %i countries'%country_count)
The dataset covers 73 countries
# Number of cities
city_count = len(starbucks['City'].unique())
print('The dataset covers %i cities'%city_count)
The dataset covers 5406 cities
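`Series.nunique()` is a shorter way to get the same kind of count. One difference to keep in mind: `nunique()` skips missing values by default, while `len(s.unique())` counts `NaN` as a value. Since `City` was filled earlier and `Country` has no missing values, both approaches should agree here. A sketch on invented values:

```python
import numpy as np
import pandas as pd

# Hypothetical values, including one missing entry.
s = pd.Series(['Cairo', 'Ajman', 'Cairo', np.nan])

with_nan = len(s.unique())            # counts 'Cairo', 'Ajman', and NaN
without_nan = s.nunique()             # NaN is skipped by default
explicit = s.nunique(dropna=False)    # NaN counted again

print(with_nan, without_nan, explicit)
```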
# Global distribution of Starbucks stores
# Which countries have the most stores?
world = starbucks['Country'].value_counts()[:10] # top 10 countries by store count
world
US 13311
CN 2734
CA 1415
JP 1237
KR 993
GB 901
MX 579
TW 394
TR 326
PH 298
Name: Country, dtype: int64
# Visualization
plt.style.use('ggplot')
world.plot(kind='bar',figsize=(10,8),alpha=0.5,rot=45,color='b')
plt.xticks(fontsize=14)
plt.xlabel('country',fontsize=14,labelpad=12)
plt.ylabel('counts',fontsize=14,labelpad=12)
plt.title('Global distribution of Starbucks by country',fontsize=14,pad=12)
# Write each count just above its bar
for i,j in zip(range(len(world)),world):
    plt.text(i,j+0.5,j,ha='center',va='bottom',fontsize=12)
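On matplotlib 3.4 or later, `Axes.bar_label` can place the value labels without a manual loop. A sketch under that version assumption, with the ten counts printed above typed in by hand (the `Agg` backend is chosen only so the snippet runs without a display):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Top-10 country counts copied from the output above.
world = pd.Series(
    [13311, 2734, 1415, 1237, 993, 901, 579, 394, 326, 298],
    index=['US', 'CN', 'CA', 'JP', 'KR', 'GB', 'MX', 'TW', 'TR', 'PH'],
)

ax = world.plot(kind='bar', figsize=(10, 8), alpha=0.5, rot=45, color='b')
# ax.containers[0] holds the bars drawn by the plot call above.
labels = ax.bar_label(ax.containers[0], fontsize=12, padding=2)
ax.set_xlabel('country')
ax.set_ylabel('counts')
```

The same call works for the horizontal `barh` chart, where the labels are placed to the right of each bar.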
# Which cities have the most stores?
city = starbucks['City'].value_counts()[:10]
city
上海市 542
Seoul 243
北京市 234
New York 230
London 215
Toronto 186
Mexico City 180
Chicago 179
Las Vegas 153
Seattle 151
Name: City, dtype: int64
# Visualization
city.plot(kind='bar',figsize=(10,8),color='orange',alpha=0.8,rot=45)
plt.xticks(fontsize=14)
plt.xlabel('city',fontsize=14,labelpad=12)
plt.ylabel('counts',fontsize=14,labelpad=12)
plt.title('Global distribution of Starbucks by city',fontsize=14,pad=12)
# Write each count just above its bar
for i,j in zip(range(len(city)),city):
    plt.text(i,j+0.5,j,ha='center',va='bottom',fontsize=12)
# Filter the stores located in China
cn = starbucks[starbucks['Country']=='CN']
cn.shape
(2734, 13)
cn_city = cn['City'].value_counts()[:10]
cn_city
上海市 542
北京市 234
杭州市 117
深圳市 113
广州市 106
Hong Kong 104
成都市 98
苏州市 90
南京市 73
武汉市 67
Name: City, dtype: int64
# Visualization
cn_city.plot(kind='barh',color='g',alpha=0.8,figsize=(10,8))
plt.yticks(fontsize=14)
plt.ylabel('city',fontsize=14,labelpad=12)
plt.xlabel('counts',fontsize=14,labelpad=12)
plt.title('Distribution of Starbucks in China by city',fontsize=14,pad=12)
# Write each count to the right of its bar
for i,j in zip(cn_city,range(len(cn_city))):
    plt.text(i+5,j,i,ha='left',va='center',fontsize=12)
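1. Starbucks is widely seen as a symbol of the "petty bourgeois" (小资) lifestyle, and the distribution of its stores appears closely tied to the economic level of countries and cities.
2. Globally, the United States has the most stores (13,311), followed by China (2,734) and Canada (1,415). The United States is a major economic power and China's economy has been growing rapidly, so the high ranking of these two countries is consistent with their economic development.
3. Within China, the distribution also tracks the economic development of cities: stores are concentrated in first-tier cities such as Beijing, Shanghai, Guangzhou, and Shenzhen. Shanghai has the most, with 542 stores, more than twice as many as second-placed Beijing (234).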
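The ratios behind these statements can be checked directly from the counts printed earlier. A quick sketch, with the figures copied by hand from the outputs above:

```python
total = 25249                  # Starbucks-branded rows in the dataset
us, cn = 13311, 2734           # stores in the US and in China
shanghai, beijing = 542, 234   # stores in the two largest Chinese cities

us_share = us / total
cn_share = cn / total
ratio = shanghai / beijing

print(f'US share: {us_share:.1%}')
print(f'CN share: {cn_share:.1%}')
print(f'Shanghai / Beijing: {ratio:.2f}')
```

By these figures the US accounts for roughly half of all Starbucks-branded stores in the dataset, China for about a tenth, and Shanghai has about 2.3 times as many stores as Beijing.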