https://medium.com/analytics-vidhya/introduction-to-interactive-geoplots-with-plotly-and-mapbox-9249889358eb
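Anyone who owns a smartphone today is familiar with location tracking. Almost every app you use wants access to it to understand the demographics of its customer base. Ride-hailing services such as Uber and Ola provide rides based on location, time, and traffic. Thanks to geoplots, you can now visualize this kind of location data!
Data visualization tools are getting ever better at revealing patterns and insights through plots. Thanks to advances in these tools, you can even build interactive 3D plots on your own machine.
In this article, we will explore the concept and use of geoplots. To do so, we will use Python's popular plotly library together with Mapbox maps (more on those later). We will work with the New York City Taxi Fare Prediction dataset throughout, so go ahead and download it (it is available on Kaggle). Let's dig in!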
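Prerequisites
- pandas
- Matplotlib
Let's start by importing the essential libraries. You need to have plotly installed on your machine before running the following code: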
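# import the necessary libraries
import numpy as np
import pandas as pd
import plotly
import plotly.offline as offline
import plotly.graph_objs as go
# Online mode only works this way in plotly < 4.0; since 4.0 the module
# lives in the separate chart_studio package (chart_studio.plotly):
# import plotly.plotly as py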
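By default, plotly (before version 4.0) works in online mode, which requires you to generate a personal API key once you reach the public API limit. If you want to share visualizations with others and modify data points dynamically to see the updated visualization, online mode does that for you. (Since plotly 4.0, rendering is offline by default and the online features live in the separate chart_studio package.)
However, if you want to work offline in a Jupyter Notebook, add the following lines:
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)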
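Coming to our dataset: it has over a million rows! A typical machine may struggle to handle all of these data points without heating up (or even crashing). With pandas, we can read just the first n rows of the data: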
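A minimal sketch of this step, assuming the column layout of the Kaggle train.csv and using a small made-up in-memory CSV in place of the real file. The nrows argument limits how many rows pandas reads, and pd.to_datetime turns the timestamps into a datetime column, which the .dt date features later in the article require.

```python
import io

import pandas as pd

# Made-up stand-in for the first lines of the Kaggle train.csv; only the
# header follows that file's column layout, the rows are hypothetical.
CSV_TEXT = """key,fare_amount,pickup_datetime,pickup_longitude,pickup_latitude,dropoff_longitude,dropoff_latitude,passenger_count
2012-03-05 08:12:00.0000001,7.5,2012-03-05 08:12:00 UTC,-73.9871,40.7213,-73.9772,40.7527,1
2012-03-09 19:40:00.0000002,12.0,2012-03-09 19:40:00 UTC,-73.9855,40.7580,-74.0086,40.7061,2
2012-03-10 09:05:00.0000003,52.0,2012-03-10 09:05:00 UTC,-73.7781,40.6413,-73.9857,40.7484,1
"""

N_ROWS = 2  # with the real file, something like 30_000

# nrows stops reading after the first N_ROWS data rows, so the rest of the
# file is never loaded into memory.
train = pd.read_csv(io.StringIO(CSV_TEXT), nrows=N_ROWS)

# Strip the literal " UTC" suffix and parse to datetime64 so that the .dt
# accessor used for the date features works.
train["pickup_datetime"] = pd.to_datetime(
    train["pickup_datetime"].str.replace(" UTC", "", regex=False),
    format="%Y-%m-%d %H:%M:%S",
)

print(train.shape)
print(train[["pickup_datetime", "fare_amount"]])
```

With the real file you would pass its path instead, for example pd.read_csv("train.csv", nrows=30000), assuming the download is named train.csv. Note that nrows takes the first rows rather than a random sample, and that the timestamps are labelled UTC, so the hours extracted later are UTC hours; .dt.tz_localize("UTC").dt.tz_convert("America/New_York") would convert them to local New York time.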
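You need a personal access token from Mapbox to draw custom maps. The plots are built from two objects:
- the data
- the layout of the plot
The data object is a Python list of plotly go.Scattermapbox traces. Their parameters are passed as keyword arguments, with nested settings such as marker given as Python dictionaries. For more details on the parameters and implementation, see the plotly documentation.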
shaz13_custom_style = "mapbox://styles/shaz13/cjiog1iqa1vkd2soeu5eocy4i"
# set the geospatial data
data = [go.Scattermapbox(
            lat=train['pickup_latitude'],
            lon=train['pickup_longitude'],
            customdata=train['key'],
            mode='markers',
            marker=dict(
                size=4,
                color='gold',
                opacity=.8,
            ),
        )]
# set the plot layout
layout = go.Layout(autosize=False,
                   mapbox=dict(accesstoken="YOUR_ACCESS_TOKEN",
                               bearing=10,
                               pitch=60,
                               zoom=13,
                               center=dict(lat=40.721319,
                                           lon=-73.987130),
                               style=shaz13_custom_style),
                   width=900,
                   height=600,
                   title="Pickup Locations in New York")
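Almost done! Now all you need to do is wrap data and layout into a dict called a figure, which bundles the data points and the map layout into our fig object. You can then draw it simply with the iplot function:
fig = dict(data=data, layout=layout)
iplot(fig)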
Another great thing you can do is zoom into the plot and inspect all the tiny points:
You can experiment with the full gallery of themes that Mapbox offers. You can also design your own theme in Mapbox Studio.
To let others use your custom theme, make it public and copy its map style link from your dashboard:
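Exploring the plots
There is a lot to explore in these maps. With plotly's built-in features, we can visualize two sets of conditions in the same plot! A good example is plotting early and late pickup and drop-off locations across New York City.
First, let's extract datetime features from the timestamp: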
# pickup_datetime must be a datetime column (e.g. converted with pd.to_datetime)
train['pickup_datetime_month'] = train['pickup_datetime'].dt.month
train['pickup_datetime_year'] = train['pickup_datetime'].dt.year
# .dt.day_name() replaces .dt.weekday_name, which was removed in pandas 1.0
train['pickup_datetime_day_of_week_name'] = train['pickup_datetime'].dt.day_name()
train['pickup_datetime_day_of_week'] = train['pickup_datetime'].dt.weekday
train['pickup_datetime_day_of_hour'] = train['pickup_datetime'].dt.hour
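Great! Now we have the year, month, hour, and weekday (as both a number and a name) for every ride. Let's look for some patterns among New Yorkers!
A typical work week starts on Monday, so we'll segment our data by weekday. pickup_datetime_day_of_week is the numeric version of pickup_datetime_day_of_week_name, starting at 0 for Monday.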
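A quick check of that numbering on three made-up timestamps, using the same pandas accessors together with the article's hour cut-offs (fewer than 10 for early, more than 18 for late):

```python
import pandas as pd

# Three made-up timestamps: Monday morning, Friday evening, Saturday morning
df = pd.DataFrame({
    "pickup_datetime": pd.to_datetime([
        "2019-06-24 08:15:00",  # Monday
        "2019-06-28 20:40:00",  # Friday
        "2019-06-29 09:05:00",  # Saturday
    ])
})
df["day_of_week"] = df["pickup_datetime"].dt.weekday   # Monday=0 ... Sunday=6
df["day_name"] = df["pickup_datetime"].dt.day_name()
df["hour"] = df["pickup_datetime"].dt.hour

# Weekdays are 0-4; the hour cut-offs mirror the article's < 10 and > 18
business = df[df["day_of_week"] < 5]
early = business[business["hour"] < 10]
late = business[business["hour"] > 18]

print(df[["day_of_week", "day_name", "hour"]])
```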
# Weekdays (Monday=0 ... Friday=4)
business_train = train[train['pickup_datetime_day_of_week'] < 5]
# Bin the time of day into early and late business hours
early_business_hours = business_train[business_train['pickup_datetime_day_of_hour'] < 10]
late_business_hours = business_train[business_train['pickup_datetime_day_of_hour'] > 18]
data = [go.Scattermapbox(
            lat=early_business_hours['dropoff_latitude'],
            lon=early_business_hours['dropoff_longitude'],
            customdata=early_business_hours['key'],
            mode='markers',
            marker=dict(
                size=5,
                color='gold',
                opacity=.8),
            name='early_business_hours'),
        go.Scattermapbox(
            lat=late_business_hours['dropoff_latitude'],
            lon=late_business_hours['dropoff_longitude'],
            customdata=late_business_hours['key'],
            mode='markers',
            marker=dict(
                size=5,
                color='cyan',
                opacity=.8),
            name='late_business_hours')]
layout = go.Layout(autosize=False,
                   mapbox=dict(accesstoken="YOUR_ACCESS_TOKEN",
                               bearing=10,
                               pitch=60,
                               zoom=13,
                               center=dict(
                                   lat=40.721319,
                                   lon=-73.987130),
                               style="mapbox://styles/shaz13/cjiog1iqa1vkd2soeu5eocy4i"),
                   width=900,
                   height=600,
                   title="Early vs. Late Business Day Drop-off Locations")
fig = dict(data=data, layout=layout)
iplot(fig)
Looks good. Many of these locations are likely offices or workplaces. It would be interesting to compare them with the weekends.
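The weekend split repeats the weekday filtering with the day condition flipped, so the two can share one function. A sketch of a hypothetical helper, split_early_late (its name and default cut-offs of 10 and 18 are assumptions, the cut-offs taken from the weekday code above); using one pair of cut-offs for both day types keeps the early and late groups from overlapping:

```python
import pandas as pd


def split_early_late(df, weekend, early_before=10, late_after=18):
    """Hypothetical helper: return (early, late) rows for weekdays or weekends.

    Assumes df has the article's 'pickup_datetime_day_of_week' (Monday=0)
    and 'pickup_datetime_day_of_hour' columns.
    """
    dow = df["pickup_datetime_day_of_week"]
    subset = df[dow >= 5] if weekend else df[dow < 5]
    hour = subset["pickup_datetime_day_of_hour"]
    return subset[hour < early_before], subset[hour > late_after]


# Made-up rides: (day of week, hour)
trips = pd.DataFrame({
    "pickup_datetime_day_of_week": [0, 2, 5, 6, 6],
    "pickup_datetime_day_of_hour": [8, 21, 7, 23, 14],
})
early_wk, late_wk = split_early_late(trips, weekend=False)
early_we, late_we = split_early_late(trips, weekend=True)
print(len(early_wk), len(late_wk), len(early_we), len(late_we))
```

The Sunday 14:00 ride falls in neither weekend group, which is the intended effect of a shared late cut-off.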
# Weekend (Saturday=5, Sunday=6)
weekend_train = train[train['pickup_datetime_day_of_week'] >= 5]
early_weekend_hours = weekend_train[weekend_train['pickup_datetime_day_of_hour'] < 10]
# late = after 18:00, matching the weekday split
late_weekend_hours = weekend_train[weekend_train['pickup_datetime_day_of_hour'] > 18]
data = [go.Scattermapbox(
            lat=early_weekend_hours['dropoff_latitude'],
            lon=early_weekend_hours['dropoff_longitude'],
            customdata=early_weekend_hours['key'],
            mode='markers',
            marker=dict(
                size=5,
                color='violet',
                opacity=.8),
            name='early_weekend_hours'),
        go.Scattermapbox(
            lat=late_weekend_hours['dropoff_latitude'],
            lon=late_weekend_hours['dropoff_longitude'],
            customdata=late_weekend_hours['key'],
            mode='markers',
            marker=dict(
                size=5,
                color='orange',
                opacity=.8),
            name='late_weekend_hours')]
layout = go.Layout(autosize=False,
                   mapbox=dict(accesstoken="YOUR_ACCESS_TOKEN",
                               bearing=10,
                               pitch=60,
                               zoom=13,
                               center=dict(
                                   lat=40.721319,
                                   lon=-73.987130),
                               style="mapbox://styles/shaz13/cjiog1iqa1vkd2soeu5eocy4i"),
                   width=900,
                   height=600,
                   title="Early vs. Late Weekend Drop-off Locations")
fig = dict(data=data, layout=layout)
iplot(fig)
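Even subtle information like this can reveal hidden patterns in the data, such as passenger behavior, flight times versus cab bookings, and so on. Here are some interesting patterns I observed in a random sample of 30,000 rows:
- New Yorkers tend to take cabs more often late in the business day than early in it.
- We know that fares depend on the distance and time traveled. But how often does a particular location attract higher fares? And why?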
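The high-fare filter that follows flags fares more than three standard deviations above the mean. A small sketch of that rule on made-up fares (the values are hypothetical) shows how it isolates a single expensive trip:

```python
import pandas as pd

# Made-up fares: twenty ordinary trips plus one unusually expensive one
fares = pd.DataFrame({"fare_amount": [8.0, 9.0, 10.0, 11.0, 12.0] * 4 + [95.0]})

# Same rule as the article: keep fares more than 3 standard deviations above the mean.
# pandas' .std() is the sample standard deviation (ddof=1) by default.
threshold = fares.fare_amount.mean() + 3 * fares.fare_amount.std()
high_fares = fares[fares["fare_amount"] > threshold]
print(round(threshold, 2), high_fares["fare_amount"].tolist())
```

Because the outlier itself inflates the mean and standard deviation, this rule only flags fares that stand far apart from the rest; on a very small sample it may flag nothing at all.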
# Fares more than 3 standard deviations above the mean
high_fares = train[train['fare_amount'] > train.fare_amount.mean() + 3 * train.fare_amount.std()]
data = [go.Scattermapbox(
            lat=high_fares['pickup_latitude'],
            lon=high_fares['pickup_longitude'],
            customdata=high_fares['key'],
            mode='markers',
            marker=dict(
                size=8,
                color='violet',
                opacity=.8),
            name='high_fares_pick_up'),
        go.Scattermapbox(
            lat=high_fares['dropoff_latitude'],
            lon=high_fares['dropoff_longitude'],
            customdata=high_fares['key'],
            mode='markers',
            marker=dict(
                size=8,
                color='gold',
                opacity=.8),
            name='high_fares_drop_off')]
layout = go.Layout(autosize=False,
                   mapbox=dict(accesstoken="YOUR_ACCESS_TOKEN",
                               bearing=10,
                               pitch=60,
                               zoom=13,
                               center=dict(
                                   lat=40.721319,
                                   lon=-73.987130),
                               style="mapbox://styles/shaz13/cjk4wlc1s02bm2smsqd7qtjhs"),
                   width=900,
                   height=600,
                   title="High Fare Locations")
fig = dict(data=data, layout=layout)
iplot(fig)
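End Notes
The data we are given often holds many hidden insights and patterns that we have to tease out by playing with it. Creativity, curiosity, and imagination are the key skills you need for this kind of analysis (along with data science, of course!). My Kaggle kernel has the complete tutorial and code implementation.
Please share your thoughts, ideas, and feedback in the comments below.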