When it comes to data visualization, there are many possible tools: Matplotlib, Plotly, Bokeh… Which one fits my short-term goals within a notebook, and is also a good choice for the longer term, in production? And what does production mean?
Now that you have a nice machine learning model, or have completed some data mining or analysis, you need to present and promote this amazing work. You may initially reuse some notebooks to produce a few charts… but soon colleagues or clients request access to the data or ask for other views or parameters. What should you do? Which tools and libraries should you use? Is there a one-size-fits-all solution for all stages of your work?
Data visualization has a very wide scope, ranging from simple charts included in a report to complex interactive dashboards. The former is within reach of anybody who knows Excel, whereas the latter is closer to a software product that may require the full software development cycle and methodology.
Between these two extremes, data scientists face many non-trivial choices. This post raises some of the questions that come up along the way, with tips and answers for each. The chosen starting point is Python within a Jupyter notebook; the target is a Web dashboard in production.
Which library for the chart type I want?
Getting the right chart type is always the first issue we think about.
You have a great new idea for the data visualization, your boss is in love with Sunburst graphs, but is this doable with the charting libraries you are using?
The mainstream open-source charting libraries in Python, Matplotlib (with Seaborn), Plotly, and Bokeh, support more or less the same set of chart types. They also accept pixel matrices (raster images), which allows for extensions such as displaying word clouds.
Here is a sample of drawing a word cloud from the Word Cloud Python library documentation¹:
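```python
import numpy as np
import matplotlib.pyplot as plt
from wordcloud import WordCloud

text = "square"

# Build a circular mask: pixels outside the circle are set to 255 (masked out).
x, y = np.ogrid[:300, :300]
mask = (x - 150) ** 2 + (y - 150) ** 2 > 130 ** 2
mask = 255 * mask.astype(int)

# repeat=True keeps repeating the word until the available area is filled.
wc = WordCloud(background_color="white", repeat=True, mask=mask)
wc.generate(text)

# The word cloud is displayed as an image (a pixel matrix) by Matplotlib.
plt.axis("off")
plt.imshow(wc, interpolation="bilinear")
plt.show()
```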
Network graphs
Network graphs are a specific category that is not natively handled by the libraries listed above. The main Python library for networks is NetworkX. By default, NetworkX uses Matplotlib as its drawing backend². Graphviz (through PyGraphviz) is the de facto standard graph drawing library and can be coupled with NetworkX³. With quite a few lines of code, you may also use Plotly to draw the graph⁴. The integration between NetworkX and Bokeh is also documented⁵.
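As a minimal sketch of the default NetworkX and Matplotlib pairing, the snippet below draws a small example graph that ships with NetworkX. The graph, the layout seed, and the styling values are illustrative choices, not taken from the referenced documentation.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Small example graph bundled with NetworkX (Zachary's karate club).
G = nx.karate_club_graph()

# Compute node positions once; the seed makes the layout reproducible.
pos = nx.spring_layout(G, seed=42)

fig, ax = plt.subplots(figsize=(6, 6))
# draw_networkx renders nodes, edges and labels on a Matplotlib Axes.
nx.draw_networkx(G, pos=pos, ax=ax, node_size=200, font_size=8)
ax.set_axis_off()
```

In a notebook the figure is displayed at the end of the cell; in a script, `plt.show()` or `fig.savefig(...)` would be needed.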
Geographic plots
Geographically located information and maps are also a specific subfield of data visualization. Plotting maps raises many challenges, among them:
- Handling large and complex contours (e.g. country borders)
- Showing more or less information depending on the zoom, also known as the level of detail. This affects both the readability of the plot and its complexity, which turns into loading latency and memory footprint
- Handling geographical coordinates, that is, single or multiple projections from non-Euclidean spaces to the 2D Euclidean space (e.g. latitude-longitude to UTM with a given UTM zone)
- Availability of information in multiple file formats, even if there are de facto standards like GeoJSON and shapefile
Depending on the chosen plotting library, you may have to do little or a lot of pre-processing. The common open-source libraries are PROJ, for geographical coordinates, and GDAL, for file formats and for translating between formats or coordinate systems.
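To illustrate the coordinate-handling challenge, here is a hedged sketch that projects a latitude-longitude point to UTM using pyproj, the Python interface to PROJ. The choice of pyproj, the sample point, and the UTM zone (31N, EPSG:32631) are assumptions made for the example.

```python
from pyproj import Transformer

# EPSG:4326 is WGS84 latitude/longitude; EPSG:32631 is UTM zone 31N.
# always_xy=True forces the (longitude, latitude) argument order.
to_utm = Transformer.from_crs("EPSG:4326", "EPSG:32631", always_xy=True)

lon, lat = 2.3522, 48.8566  # illustrative point (central Paris)
easting, northing = to_utm.transform(lon, lat)
print(f"easting={easting:.0f} m, northing={northing:.0f} m")
```

A point outside zone 31 would need a different target EPSG code, which is exactly the kind of pre-processing decision the plotting library may or may not make for you.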
Matplotlib has no direct support for plotting maps; it relies on pixel matrices (a raster image), as explained in the gallery⁶. This is a last resort that you should avoid unless your only target is a static image.
Plotly demonstrates map plots⁷ based on Mapbox. Many features are available, but at least one is missing: the management of the level of detail in scatter plots.
Bokeh has some support for maps, including an integration of Google Maps⁸, but it seems quite crude.
[Figure: Map with areas (contours), clusters of markers, and hover information, created with Folium. © the author]
Folium is a Python library wrapping the Leaflet JavaScript library. Leaflet is used in many collaborative and commercial websites, for example OpenStreetMap. Folium is very effective for drawing maps from open data.
The open-source reference for geographic data manipulation is QGIS, an OSGeo project. It is a desktop application, but it includes a Python console, and the PyQGIS API⁹ can also be used directly.
Dataframes
Pandas and its DataFrame are a must for data science in Python. What is the impact on chart and dashboard creation?
On the one hand, Pandas DataFrames and Series have a plot API. By default, it relies on Matplotlib as the backend. However, Plotly is also available as a plotting backend for Pandas¹⁰, and support for Bokeh is available through a GitHub project¹¹.
On the other hand, the Pandas plots might not fit your requirements, and you may wonder how to use dataframes in plots other than by passing columns and series as vectors. Plotly Express has this capability, with support for column-oriented data¹².
Last resort: d3.js
If none of these libraries and their extensions can produce the chart you are looking for, you may switch to d3.js, the foundational charting library for browsers. This means leaving the Python world and entering JavaScript's domain. The possibilities and customization options are vast, as shown in the example gallery¹³. However, you will need to handle many aspects of the graph that other libraries take care of for you: axes, legend, interactivity…
How can I build a view with multiple charts?
In the design of a dataviz, the layout or composition of charts comes with the requirement to display several features in multiple charts. You have probably already experienced the pluses and minuses of Matplotlib's subplots, starting with quirky imperative commands like `plt.subplot(1, 2, 1)` or the even weirder equivalent `plt.subplot(121)`. Even if this is enough to reach your goal, I would suggest using the alternative and cleaner `plt.subplots()` API, which returns a figure and an array of axes. You might still feel limited, not only in interactivity (dealt with in the next section) but also in layout capabilities.
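A minimal sketch of the `plt.subplots()` style, with two illustrative curves side by side:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# One row, two columns; axes is a NumPy array of Axes objects.
fig, axes = plt.subplots(1, 2, figsize=(8, 3), sharey=True)
axes[0].plot(x, np.sin(x))
axes[0].set_title("sin")
axes[1].plot(x, np.cos(x))
axes[1].set_title("cos")
fig.tight_layout()
```

Each chart is addressed through its own Axes object, so there is no hidden "current subplot" state to keep track of.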
Enhancements to layouts
Matplotlib allows for uneven widths and heights through calls to the `Figure.add_subplot` method, such as `fig.add_subplot(3, 1, (1, 2))`, which makes a subplot that spans the upper 2/3 of the figure¹⁴. Seaborn introduces one enhancement, the scatter matrix¹⁵.
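The uneven layout can be sketched like this; the titles are placeholders.

```python
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(6, 6))

# On a 3x1 grid, the tuple (1, 2) spans cells 1 to 2: the upper two thirds.
ax_top = fig.add_subplot(3, 1, (1, 2))
# Cell 3 is the remaining lower third.
ax_bottom = fig.add_subplot(3, 1, 3)

ax_top.set_title("main chart")
ax_bottom.set_title("detail")
```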
Plotly offers similar capabilities, including uneven subplots. However, I find the API rather limited. For example, I found no way to set the font size of subplot titles or to share a legend. Bokeh has similar capabilities¹⁶.
Plotly's Express API goes further, with marginal distribution plots drawn as histograms or rugs¹⁷, and a synchronized overview-detail chart called a "range slider"¹⁸. This leads us to the interactivity of graphs, detailed in the next section.
A way forward
But what if these layout helpers are not enough for my purpose? Possible answers are many, ranging from the Plotly Dash solution to full HTML layout or SVG custom design with d3.js.
Plotly Dash offers an intermediate solution in which you stick to Python but can build more complex and more interactive dashboards than with the plotting libraries alone. Still, it requires some basic HTML knowledge, and sooner or later you will dive into cascading style sheets (CSS).
How can I interact with my graph?
You are very pleased with the chart but it feels so static, there is not even a zoom!
Interactivity covers many different things. It starts with common operations like zoom and pan. The next step is synchronized graphs: zoom and pan are applied simultaneously to several charts that share an axis. You might also be interested in synchronized selection across two graphs, also called brushing (see the example in Bokeh¹⁹). Matplotlib offers such interactivity with all of its rendering backends except within a notebook²⁰. There is, however, a solution based on Matplotlib: mpld3 handles this and might provide all you need. Still, the trend is to use newer libraries like Plotly or Bokeh, which support zoom and pan in notebooks "out of the box".
Then come dynamic annotations. They range from hover information shown when the mouse is over a marker to highlighting of line plots. Regarding hover, whichever library is used (Matplotlib with mpld3 and plugins, Plotly, Bokeh), it means attaching an HTML div to each marker, and probably some CSS as well.
More complex interactions relate to filtering or querying the data. This can be close to the zoom function when the filter modifies a range (e.g. a daily / weekly / monthly selector for a time series), or it can be a selector on a series of facets, or an even more complex association. Selectors are available in Plotly as custom controls²¹ and in Bokeh as widgets²².
The common plotting libraries provide basic capabilities for interactivity, up to the creation of some widgets, but, as for advanced layouts, I would suggest switching directly to Plotly Dash, which is more versatile.
Is the rendering of my dataviz dashboard fluid?
The more complex the dashboard or the larger the data, the longer it takes to process, and thus the longer it takes to render. Waiting a few seconds to view a static plot may be acceptable. Waiting more than 1 second is no longer acceptable when the graph is interactive, with widgets and controls.
If fluidity is gone, you have four main solutions, with increasing complexity:
- Simplify the dashboard, with fewer plots and fewer controls. That may be acceptable, but then you should ask why such complexity was needed in the first place.
- Simplify the data, that is, show less data or data with less granularity. That may provide a good tradeoff between features and accuracy. It leads to the next solution…
- Pre-process the data offline to pre-assemble what is shown in the dashboard. That probably means storing new series or new tables, which eventually leads to other issues linked to data management. In the end, you will do a lot more data engineering and will probably reach a dead end with too much data and too many tables. The solution to the dead end is even more data engineering, with…
- Online processing on dedicated servers and the design of an API.
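The second and third options can be sketched with Pandas: reduce the granularity once, offline, and let the dashboard load the small result. The minute-level series below is synthetic.

```python
import numpy as np
import pandas as pd

# Illustrative raw data: one value per minute over one week.
index = pd.date_range("2024-01-01", periods=7 * 24 * 60, freq="min")
raw = pd.Series(np.arange(len(index), dtype=float), index=index)

# Pre-aggregate offline: the dashboard then loads 7 points instead of 10,080.
daily = raw.resample("D").mean()
```

The aggregated series would typically be stored (a file or a table) so the dashboard never touches the raw data, which is where the data management issues mentioned above start.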
A server with an API is not the first thing you had in mind, but you end up dealing with this software project sooner than you think. It is better to anticipate it than to delay until there is no other solution and project deadlines are approaching fast.
Defining an API often involves several teams and skills: data scientists to define the why, data engineers to define the what, and infrastructure engineers to define the how, including performance, security, persistence, and integrity.
Plotly Dash allows for an initial API since it is based on the Flask framework. See the Dash documentation on integration with a Flask app, which defines a custom Flask route that could serve data, i.e. an API²³. This still provides no proper API access management, including authentication. On that latter aspect, Dash is very limited²⁴.
While developing the dataviz, I feel like I am fast/slow/don't know
How much effort will it take you to develop and publish?
Some tools are effective, in that they deliver the expected result, but not efficient, in that getting to the result takes a large amount of time. For example, d3.js is known as a very powerful and complete data-visualization and charting library, but at the same time it requires you to deal with many things that libraries with a higher level of abstraction provide by default.
Productivity comes not only from using the right level of abstraction, but also from an API that is easy to use and well documented. I would say that none of the surveyed Python charting libraries is easy to master.
Matplotlib's API is quite complex when dealing with axes and label formats; it is not always consistent and is quite redundant. As an example, see the above comment on `plt.subplot()`. That is not the only case: the function `plt.xlabel()` is equivalent to the method `ax.set_xlabel()` on the Axes object.
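The redundancy can be seen by writing the same labelled plot in both styles; the data points are arbitrary.

```python
import matplotlib.pyplot as plt

# Implicit "pyplot" style: each call acts on the current Axes.
plt.figure()
plt.plot([0, 1, 2], [0, 1, 4])
plt.xlabel("time")
label_pyplot = plt.gca().get_xlabel()

# Explicit object-oriented style: same result, on a named Axes.
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
ax.set_xlabel("time")
label_oo = ax.get_xlabel()
```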
Plotly's API is no better. First, you must choose between two API sets: the Express set, which is quite simple but limited and mostly targeted at dataframes, and the Graph Objects set, which is more complete and complements the Express set but lacks some of the nice high-level features found in Express. Then you will have to deal with the Plotly documentation, which is, to me, really difficult to get through. Searching with either the Plotly website's internal search or a Web search engine seldom leads you to the API you are looking for. And you may have to deal with the documentation of the underlying data model in JSON.
Bokeh's API is probably leaner and better documented, but it has some oddities, such as requiring two separate instructions to plot a line chart and its associated markers.
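A small sketch of that oddity, assuming a recent Bokeh version: the line and its markers are two separate glyph calls on the same figure.

```python
from bokeh.plotting import figure

x = [1, 2, 3, 4]
y = [3, 1, 4, 2]

p = figure(title="line with markers")
p.line(x, y, line_width=2)  # first glyph: the line
p.scatter(x, y, size=8)     # second glyph: the markers
```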
I really need a nice and slick web app, should I be afraid of it?
Your dashboard is successful and will be deployed as a product internally to your organization, available to clients, or even directly exposed on the Internet.
As a data scientist, you lack the skills to handle that and will get help from software specialists. However, you are asked to estimate the effort or scope of this development. This highly depends on the path to production, on the framework used there, and on the framework that you have been using until now.
Getting to production with (part of) the current framework
Plotly standalone figures can be exported as static HTML. Bokeh provides some schemes to embed its plots²⁵. Matplotlib with mpld3 has an HTML output²⁷. However, this solution targets illustrations rather than dashboards.
In some cases, Plotly Dash may go all the way to production as the main Web app or as an embedded widget²³. As said earlier, in such a setup you will need to inspect the security of your system before jumping to online production. Regarding security, as a data designer, you mainly need to check that you are exposing only the intended data and nothing more.
Getting to production with a single page application
Today, most of the Web applications we use are based on a pattern called single page application (SPA): the application is loaded once in the Web browser and then interacts with the server through some asynchronous API without reloading the Web page. This is what all of us now expect from a nice Web application.
An SPA has two separate components: the browser-side application, written in JavaScript with a framework like Angular or React.js, and the server-side application or service, which may be written with many frameworks and languages: Java, C#, PHP, JavaScript… and even Python.
Dash already does part of this. In fact, Dash uses one of the leading browser-side frameworks, React.js, and its server side is based on Flask and Python. But, as said above, you may reach some limits of Dash.
Besides the transition through Dash, Plotly and Bokeh have another advantage: they are also available in JavaScript, as Plotly JS (with a React.js wrapper²⁶) and Bokeh JS. In fact, the Python version of Plotly is a wrapper around the JavaScript library. This implies that, given some plots or dashboards in Python based on Plotly, Dash, or Bokeh, most of the concepts and chart properties can be reused in the equivalent JavaScript implementation.
Conclusion
In this post, we have walked the path of a data-visualization dashboard from experiments within notebooks up to production. We have seen that the traditional plotting library, Matplotlib, still has strong features and is usually the default backend for specialized libraries like NetworkX and Pandas. But Matplotlib also lags in some aspects, like integration and interactivity. Learning another framework may be a good investment and will help you move forward to production.
Two alternative frameworks are presented: Plotly and Bokeh. Both bring value as they are more modern than Matplotlib. Both have a major advantage when it comes to bringing the dashboard to production: they are based on JavaScript plotting frameworks, and most of the Python plotting code can be translated directly into its JavaScript equivalent.
Plotly has another advantage on the path to production: it is integrated with Dash, a framework for developing simple dashboards as single-page applications while sticking to Python. The required JavaScript, including React components, and the server API are generated smoothly by Dash.
We have also seen that, as a data scientist or data-visualization designer, you should anticipate requirements like interactivity, and their implications that may lead to the development of an API to serve data.
Originally published at: https://towardsdatascience.com/which-library-should-i-use-for-my-dashboard-c432726a52bf