Interactive data visualization

Introduction (what we’ll create):

Plotly has set the benchmark for interactivity among the available map-visualization Python libraries. It is built on plotly.js, which in turn builds on the JavaScript library D3.js. There’s hardly anything you can’t do with Plotly: display data on hover, zoom into the map, pan the map, add buttons and sliders, create live animations, and the list goes on.

This tutorial will introduce Plotly. We will make a Choropleth visualization in this tutorial, like the one shown below. However, Plotly can be used equally well for creating scatter visualizations. We will cover scatter visualizations using Plotly in the ‘Plotly + Mapbox’ tutorial and the ‘Plotly + Datashader’ tutorial.

Structure of the tutorial:

The tutorial is structured into the following sections:

Pre-requisites
Installing Plotly
Converting shapefile to GeoJSON
    Method 1: Using OGR
    Method 2: Using GeoPandas
Getting started with the tutorial
When to use this library

Pre-requisites:

This tutorial assumes that you are familiar with Python and that you have Python downloaded and installed on your machine. If you are not familiar with Python but have some programming experience in other languages, you may still be able to follow this post, depending on your proficiency.

It is recommended, but not necessary, that you go through the GeoPandas tutorial to get an overall idea of shapefiles.

Installing Plotly:

If you are using Anaconda, you can run:

conda install -c plotly plotly=4.8.1

Otherwise, you can try the pip installer:

pip install plotly==4.8.1
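
After either install command, it can be useful to confirm which version actually ended up in the active environment. The snippet below is a minimal sketch using only the standard library (Python 3.8+); the helper name installed_version is our own and not part of Plotly.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string such as '4.8.1', or None if plotly is not installed.
print(installed_version("plotly"))
```

If this prints None, the install most likely went into a different environment than the one running your notebook.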

For more information related to the installation, you can see https://plotly.com/python/getting-started/

Converting shapefile to GeoJSON:

Unlike GeoPandas, plotly doesn’t read shapefiles. Instead, it requires a GeoJSON file for reading the geometry and attributes of the relevant shapes. If you have a GeoJSON file available, you can skip this section.

If you don’t have a GeoJSON file directly, you can convert a shapefile to GeoJSON.

We will discuss two methods of converting a shapefile to GeoJSON.
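
For readers who have not seen GeoJSON before, here is a minimal sketch of the structure both methods produce: a FeatureCollection whose features each carry properties (the attributes) and a geometry. The state name and the square polygon below are made up purely for illustration.

```python
import json

# A hypothetical one-feature collection; the name and the square are invented.
feature_collection = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "properties": {"state_name": "EXAMPLE STATE"},
            "geometry": {
                "type": "Polygon",
                # One linear ring of [longitude, latitude] pairs;
                # the first and last points are identical to close the ring.
                "coordinates": [
                    [[77.0, 28.0], [78.0, 28.0], [78.0, 29.0], [77.0, 29.0], [77.0, 28.0]]
                ],
            },
        }
    ],
}

# GeoJSON is plain JSON, so the standard json module can write and read it.
geojson_text = json.dumps(feature_collection)
round_tripped = json.loads(geojson_text)
print(round_tripped["features"][0]["properties"]["state_name"])
```

Real files differ mainly in scale: many features, and far more coordinate pairs per ring.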

Method 1: Using OGR

This is the method that has been shown in the notebook. It makes use of the Geospatial Data Abstraction Library (GDAL). Actually, to be more specific, it makes use of the OGR library, which comes along with GDAL, to perform manipulations on geospatial vector data. For more information on GDAL, see https://pypi.org/project/GDAL/.

Once you have GDAL installed, import ogr and call the ESRI Shapefile driver.

from osgeo import ogr  # bare "import ogr" only works with older GDAL releases

# We used compressed shapefiles obtained from mapshaper.org
driver = ogr.GetDriverByName('ESRI Shapefile')
shp_path = 'shape_files\\India_States_2020_compressed\\India_states.shp'
data_source = driver.Open(shp_path, 0)  # 0 opens the file read-only

Here, you can see that we are using compressed shapefiles. Visualizations generated by plotly tend to be very heavy, and their size grows roughly in proportion to the size of the geospatial data: if the shapefile shrinks by 50%, the visualization shrinks by approximately the same proportion. So we took our original shapefiles and compressed them using the online tool https://mapshaper.org/. To know more about how to perform the conversions using mapshaper, click here. The compressed shapefiles are approximately 10% of the size of the originals.
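
To get a feel for how geometry detail drives output size, here is a small standard-library sketch. Note that it uses a different and much simpler technique than mapshaper (which simplifies shapes by removing vertices): it only rounds coordinates to fewer decimal places and compares the serialized sizes. The ring and the helper names are made up for illustration.

```python
import json


def round_coordinates(node, ndigits=2):
    """Recursively round every number in a nested GeoJSON coordinates array."""
    if isinstance(node, (int, float)):
        return round(node, ndigits)
    return [round_coordinates(child, ndigits) for child in node]


def serialized_size(obj):
    """Size in bytes of the compact JSON encoding of obj."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


# Made-up ring with full-precision coordinates.
ring = [[93.70123456, 7.22123456], [93.71234567, 7.23234567], [93.70123456, 7.22123456]]
rounded = round_coordinates(ring)

print(serialized_size(ring), "->", serialized_size(rounded))
```

Fewer characters per coordinate means a smaller GeoJSON, and the HTML that embeds it shrinks accordingly.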

Once we have the shapefiles opened, we extract the individual features (including geometry and attributes) and store them in the correct JSON format (this is done directly by the ExportToJson method). Once that is done, we dump the JSON file to local storage.

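import json

# Collect every feature of the first layer into a GeoJSON FeatureCollection
fc = {
    'type': 'FeatureCollection',
    'features': []
}
lyr = data_source.GetLayer(0)
for feature in lyr:
    fc['features'].append(feature.ExportToJson(as_object=True))

# Dump the collection to local storage
with open('json_files\\India_States_2020_compressed.json', 'w') as f:
    json.dump(fc, f)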
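
Before moving on, it may be worth sanity-checking what was exported. The sketch below counts the features and the geometry types in a FeatureCollection dictionary. The summarize helper and the tiny sample collection are our own stand-ins, and the sketch assumes every feature has a geometry.

```python
from collections import Counter


def summarize(feature_collection):
    """Return (feature count, Counter of geometry types) for a FeatureCollection dict."""
    features = feature_collection["features"]
    geometry_types = Counter(f["geometry"]["type"] for f in features)
    return len(features), geometry_types


# Tiny made-up collection standing in for the exported file.
sample = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {}, "geometry": {"type": "Polygon", "coordinates": []}},
        {"type": "Feature", "properties": {}, "geometry": {"type": "MultiPolygon", "coordinates": []}},
    ],
}
count, types = summarize(sample)
print(count, dict(types))
```

With the real file, you would load it with json.load and pass the resulting dictionary to summarize; a feature count that differs from the number of shapes you expect is an early sign that something went wrong in the conversion.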

Method 2: Using GeoPandas

This method is much more straightforward.

import geopandas as gpd

# Set the filepath and load
fp = "shape_files\\India_States_2020_compressed\\India_States.shp"
# Read the file stored in variable fp
map_df = gpd.read_file(fp)
# Export it as GeoJSON
map_df.to_file("json_files\\India_States_2020_compressed_gpd.json", driver='GeoJSON')

You just need to read the shapefile using GeoPandas and then export it to GeoJSON in a single line, specifying the driver as ‘GeoJSON’. However, the GeoJSON files created with this method tend to be slightly larger. For instance, for the same shapefile, the GeoJSON file created using OGR was 800 KB, while the one created using GeoPandas was 900 KB. That’s about 12.5% larger with GeoPandas.
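
The 12.5% figure is simply the difference relative to the OGR output. As a small worked example (the function name is our own):

```python
def percent_increase(baseline, other):
    """Percentage by which `other` exceeds `baseline`."""
    return (other - baseline) / baseline * 100.0


# Sizes in KB as quoted above: OGR output vs GeoPandas output.
print(percent_increase(800, 900))  # 12.5
```

To compare your own files, you could pass in the byte counts returned by os.path.getsize for the two GeoJSON files.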

Getting started with the tutorial:

GitHub repo: https://github.com/carnot-technologies/MapVisualizations

Relevant notebook: PlotlyChoroplethDemo.ipynb

View notebook on NBViewer: Click Here

Importing relevant packages:

import numpy as np
import pandas as pd
import plotly.express as px
import json
from osgeo import ogr  # bare "import ogr" only works with older GDAL releases
# import geopandas as gpd

As can be seen from the import packages, we will be using plotly.express to create the visualization.

Understanding the GeoJSON file:

Now that we have the GeoJSON file, let’s open it.

with open('json_files\\India_States_2020_compressed.json') as f:
  India_states = json.load(f)

You will see that the properties key of each feature holds that feature’s attributes, including its name, which, in our case, is the name of a state.

India_states["features"][0]['properties']
>> {'dtname': 'North  & Middle Andaman',
 'stcode11': '35',
 'dtcode11': '639',
 'year_stat': '2011_c',
 'Dist_LGD': 632,
 'State_LGD': 35,
 'JID': 178,
 'state_name': 'ANDAMAN & NICOBAR',
 'FID': 0}

Let us dig deeper into the geometry and have a look at one coordinate pair (GeoJSON stores each pair as [longitude, latitude]):

India_states["features"][0]['geometry']['coordinates'][0][0][0]
>> [93.7, 7.22]

As you can see, we have coordinates accurate up to 2 decimal places, or about 1.1 km, which is more than sufficient for us.
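
The 1.1 km figure can be checked with a short calculation. The sketch below assumes a spherical Earth with a mean radius of 6371 km, which is an approximation, and the function names are our own. Along a meridian, one degree is a fixed arc length; along a parallel, it shrinks with the cosine of the latitude.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; treating the Earth as a sphere is an approximation


def latitude_degrees_to_km(delta_degrees):
    """Arc length in km spanned by delta_degrees of latitude."""
    return delta_degrees * math.pi * EARTH_RADIUS_KM / 180.0


def longitude_degrees_to_km(delta_degrees, at_latitude):
    """Arc length in km spanned by delta_degrees of longitude at a given latitude."""
    return latitude_degrees_to_km(delta_degrees) * math.cos(math.radians(at_latitude))


# Two decimal places correspond to a step of 0.01 degrees.
print(round(latitude_degrees_to_km(0.01), 2))  # 1.11
print(round(longitude_degrees_to_km(0.01, at_latitude=7.22), 2))
```

So a resolution of 0.01 degrees corresponds to roughly 1.1 km, slightly less in the east-west direction as you move away from the equator.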

Loading the data and creating the visualization:

We will use the state_dummy_data_no_null.csv file present in the data folder. We will revisit the ‘no-null’ part shortly.

#Load the csv file and check its contents. 
#Make sure that there is one entry for each state in the geojson
 
df = pd.read_csv('data/state_dummy_data_no_null.csv')
df.head()

Now, with plotly express, generating the visualization boils down to just a couple of lines of code.

max_value = df['count'].max()
fig = px.choropleth(df, geojson=India_states, locations='st_nm',
                    color='count',
                    color_continuous_scale="Viridis",
                    range_color=(0, max_value),
                    featureidkey="properties.state_name",
                    projection="mercator"
                    )
fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
fig.show()

Here, we are specifying df as the dataframe of interest. India_states is the GeoJSON data we loaded earlier. The st_nm column in df contains the names of the states for which we have data. We want each shape to be colored according to the count column in df, with the upper bound of the color range set to the maximum value of count. In the GeoJSON, the property that contains the names corresponding to the locations field is state_name. Finally, the map projection will be mercator. You can get the list of map projections along with their interpretations on Wikipedia.
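
To make the role of featureidkey more concrete, here is a conceptual sketch of the matching it implies: the dotted path is followed inside each feature, and the value found there is what the entries of the locations column are compared against. This is only an illustration in plain Python, not Plotly’s actual implementation; the helper names and the two sample features are made up.

```python
def resolve_key(feature, dotted_key):
    """Follow a dotted path such as 'properties.state_name' through nested dicts."""
    value = feature
    for part in dotted_key.split("."):
        value = value[part]
    return value


def index_features(geojson, dotted_key):
    """Map each feature's identifier to its position in the 'features' list."""
    return {resolve_key(f, dotted_key): i for i, f in enumerate(geojson["features"])}


# Made-up features; only the 'properties' part matters for matching.
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"state_name": "GOA"}, "geometry": None},
        {"type": "Feature", "properties": {"state_name": "KERALA"}, "geometry": None},
    ],
}
lookup = index_features(geojson, "properties.state_name")
print(lookup)  # {'GOA': 0, 'KERALA': 1}
```

A row whose location value is not a key in such a lookup has no shape to color, which is why the names in the data must match the GeoJSON property exactly.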

Now, we want the visualization to be limited only to the locations of interest, and not span the lat-lon range of the entire world. That is achieved by setting fitbounds="locations" in the update_geos method. To get the list of all arguments of the update_geos method, click here. Finally, we set all margins to zero and display our visualization. To get the list of all arguments of the update_layout method, click here. Do explore the different arguments of the update_geos and update_layout methods. There’s a lot you can do with them. Congratulations on your first interactive visualization. Take your time and play around with it!

Please note that plotly express is a higher-level library. If you need a lower level library with enhanced control, you can switch to plotly graph objects. See the references for more information.

Now, let us visit the no_null part. We have explicitly made sure that data corresponding to each shape in the GeoJSON is present in the CSV. Where there was no data present, we have added a 0 for the count. Why? Because plotly renders only those shapes for which some data is present. Let us plot only the first 25 rows of the dataframe, taking df.head(25) . The resulting visualization looks like this:

Now, you certainly don’t want this kind of visualization, especially when there is no background to provide context. We will, however, make good use of the null shapes when we add a background to the visualization using Mapbox. Until then, it is best to include data for all shapes. To get a list of all the shape names in the GeoJSON, you can run the following loop:

for i in range(0, len(India_states["features"])):
    print(India_states["features"][i]["properties"]["state_name"])

This will also help you identify any spelling and case differences between the shape names in your data and the shape names in the GeoJSON.
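
Rather than comparing the printed names by eye, the check can be automated. The sketch below uses only built-in types; the helper names, the three sample shapes, and the counts dictionary are all hypothetical. It reports shapes that have no exactly matching name in the data, and data names that differ from a shape name only by case or surrounding spaces.

```python
def missing_shapes(geojson, data_names, key="state_name"):
    """Return shape names present in the GeoJSON but absent from the data, sorted."""
    shape_names = {f["properties"][key] for f in geojson["features"]}
    return sorted(shape_names - set(data_names))


def case_mismatches(geojson, data_names, key="state_name"):
    """Return (data name, shape name) pairs that differ only by case or outer spaces."""
    shapes = {f["properties"][key].strip().lower(): f["properties"][key]
              for f in geojson["features"]}
    return [(name, shapes[name.strip().lower()]) for name in data_names
            if name not in shapes.values() and name.strip().lower() in shapes]


# Made-up shapes and data; "Goa" differs from "GOA" only by case.
geojson = {"features": [{"properties": {"state_name": n}} for n in ("GOA", "KERALA", "SIKKIM")]}
counts = {"Goa": 5, "KERALA": 12}

print(missing_shapes(geojson, counts))   # ['GOA', 'SIKKIM']
print(case_mismatches(geojson, counts))  # [('Goa', 'GOA')]
```

After fixing the spelling mismatches, any shapes still reported as missing are the ones that need a row with a count of 0.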

Saving the visualization:

Once you are ready with your visualization, you can export it as a standalone HTML file, to share with your friends without sharing the source code, or to include in your website.

fig.write_html('html_files\\plotly_choropleth_demo.html')

You can also push these visualizations to Chart Studio and then get a direct embed link for your blog or website. We’ll discuss that in the Plotly with Mapbox tutorial.

When to use this library:

This library is incredibly powerful. But its power comes at a cost: file size. If you need an interactive visualization, just use this library without a second thought. However, if you are going to add the visualization to a PDF report or a static presentation, you can still use plotly and download the visualization as a PNG, but it will consume more resources than a library like GeoPandas.

We are trying to fix some broken benches in the Indian agriculture ecosystem through technology, to improve farmers’ income. If you share the same passion, join us in the pursuit, or simply drop us a line.

Translated from: https://medium.com/tech-carnot/interactive-map-based-visualization-using-plotly-44e8ad419b97