Deploying Data Dashboards Automatically, Reliably, and Securely

Today’s data projects need interactive visualizations to stand out and impress clients or decision-makers in your organization. A growing list of open source dashboarding frameworks allows data scientists to build user interfaces without having to learn JavaScript or HTML.

For example, Voilà, in its simplest form, displays a Jupyter notebook as a user-friendly web app: code cells are hidden, and the notebook runs automatically from top to bottom without the user having to shift+enter their way through it. Python-based widget libraries provide simple user controls such as sliders and dropdowns.

Data Scientists and Analysts can start using these frameworks by following a short tutorial. And they clearly have the technical ability to find a way to host the resulting web apps, perhaps on their own laptop temporarily, or by deploying through an AWS account.

But this simple deployment step turns out to be the most common blocker for the adoption of dashboarding frameworks in an organization.

Although the process can be straightforward, it is unrewarding in itself: a series of tedious commands to run just when the data scientist has proudly finished a brand-new analysis. Even worse, it is error-prone and can leave sensitive data exposed.

In traditional software development, automated testing and continuous integration make releasing new features fun rather than a chore. Conversely, any barrier to sharing a data dashboard stifles innovation and discourages new iterations.

Leaving the deployment choices to your data scientists is unwise, as technically astute as they undoubtedly are. They aren’t going to have time to secure the servers, and will likely choose the easiest authentication system. For secure regular deployment, you really need a unified approach to hosting across your data team that can be approved wholesale by your information security department.

These dashboards are technically web apps. They don’t need IT to spend three months auditing them for security, but there needs to be an approved method for deploying them, one that won’t horrify the IT department the way the ad-hoc methods your data scientists will use if left to their own devices would.

You need to know where all your dashboards are running and how they are authenticated. Otherwise, there may be outdated and insecure servers running out there in the wild exposing your networks and sensitive data. When employees leave your organization, you need to be able to terminate their access to these dashboards. If the employee happens to be the one running a handful of dashboards on their personal AWS account, you don’t want to rely on them to remember what was running where so you can turn them off or transfer ownership!

The ContainDS Product Suite

Free open source ContainDS software products can provide a unified deployment platform for your data scientists, allowing them to share dashboards based on open source frameworks in an automated, secure, and reproducible way.

Any open source dashboarding framework can be utilized. Supported as standard are Voilà, Streamlit, Plotly Dash, Bokeh, Panel, and R Shiny. Those should be a good starting point for any projects your data science team is likely to face!

ContainDS is a collection of two main products and related open source technologies.

ContainDS Dashboards documentation (image by author)

ContainDS Dashboards is a platform for hosting and sharing dashboards over the internet or an internal network, with named authenticated users — perhaps specific colleagues or clients.

Sometimes even this is too open. Where dashboards need to be shared offline, whether for lack of internet access or because a contract forbids accessing the data over a network, ContainDS Desktop is an app for Windows or Mac that lets you run dashboards on your local machine and share them with others as single flat files.

Here we’ll focus on the online dashboards software.

Online Dashboards through JupyterHub

ContainDS Dashboards is an extension for the popular JupyterHub software. This makes it especially easy to install if you already have a JupyterHub in use, but setting one up for the first time is not too complicated and it can be useful to have anyway.

JupyterHub is a way to centrally manage Jupyter notebook environments for your whole team. The standard installation allows each user to spin up their own Jupyter notebook, and the ContainDS Dashboards extension allows them to directly start user-friendly dashboards instead, sharing them with other authenticated users.

Different ‘distributions’ of JupyterHub provide different approaches to maintenance and scalability. There are lots of bespoke options, but the two main paths are Zero to JupyterHub, which runs on Kubernetes and allows seamless scaling of resources over multiple machines for large numbers of users or projects; and The Littlest JupyterHub, which sets up JupyterHub on a single VM (given the range of VMs available on cloud providers these days, this can still support surprisingly heavy usage!).

Loading your app’s files

There are two main ways to choose app files for deploying a dashboard: your ‘file source’ can either be from your existing Jupyter server tree or from a Git repo (public or private).

Creating a Voilà dashboard from a GitHub repo (image by author)

If you are already a heavy user of Jupyter notebooks, and perhaps just want to deploy notebooks as Voilà or Panel apps, then it might make sense to use the first option, Jupyter Tree. You can edit your notebooks as normal, then, once you’re happy, head to the Dashboards menu in JupyterHub to enter the path to your notebook and see it deployed automatically as a new dashboard.

Even better, there is now a companion Jupyter extension so you can create a dashboard directly from JupyterLab or Notebook with one click.

Alternatively, if you are used to editing your Streamlit, R Shiny, or Plotly Dash apps on your local machine, it might be more convenient to check your code into a Git repo and then instruct ContainDS Dashboards to pull it straight from your repo and deploy it. You can use public or private Git repos, and GitHub integration means you can log in to JupyterHub with one click through your GitHub account and automatically grant access to your repos in the process.

Using either file source method, you can also select from multiple Conda environments if you’ve made them available to your JupyterHub users.
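
The choices described so far (a file source, a path, an optional Git repo, and an optional Conda environment) can be pictured as a small record that is checked before deployment. The following is only a sketch under stated assumptions: `DashboardSpec`, `validate`, and the identifier strings in `FILE_SOURCES` and `FRAMEWORKS` are hypothetical names chosen for illustration, not the ContainDS Dashboards data model or API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical identifiers for the two file sources and the frameworks
# named in this article; the real software may use different names.
FILE_SOURCES = ("jupytertree", "gitrepo")
FRAMEWORKS = ("voila", "streamlit", "plotlydash", "bokeh", "panel", "rshiny")


@dataclass
class DashboardSpec:
    """Illustrative description of one dashboard to deploy."""
    name: str
    framework: str
    file_source: str
    path: str                        # notebook or script path within the source
    git_repo: Optional[str] = None   # only meaningful for the 'gitrepo' source
    conda_env: Optional[str] = None  # None means the default environment


def validate(spec: DashboardSpec) -> List[str]:
    """Return a list of problems; an empty list means the spec looks usable."""
    problems: List[str] = []
    if spec.file_source not in FILE_SOURCES:
        problems.append(f"unknown file source: {spec.file_source}")
    if spec.framework not in FRAMEWORKS:
        problems.append(f"unknown framework: {spec.framework}")
    if spec.file_source == "gitrepo" and not spec.git_repo:
        problems.append("a Git repo URL is required for the 'gitrepo' file source")
    if spec.file_source == "jupytertree" and spec.git_repo:
        problems.append("git_repo is set but the file source is 'jupytertree'")
    if not spec.path:
        problems.append("a path to the notebook or script is required")
    return problems
```

For instance, a Voilà notebook taken from the Jupyter tree needs only a name, a framework, and a path, whereas a Streamlit app with the `gitrepo` source and no repo URL would be reported as incomplete.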

It’s also important to select the correct ‘framework’ from the dropdown to ensure the right mechanism is used to serve the dashboard. As already listed, Voilà, Streamlit, Plotly Dash, Bokeh, Panel, and R Shiny are currently supported out-of-the-box, but it is easy to add any custom framework that works as a web app.

Once deployed, dashboards are really just like separate Jupyter servers, but instead of running Jupyter notebook they directly run the server software of your chosen framework. If you’ve ever tried the Voilà Preview button in a Jupyter notebook, you will be familiar with the end result, but with ContainDS Dashboards the deployment has no Jupyter front-end at all. Your apps will be deployed as pure web apps. This is exactly what you need for apps that you are going to share with others: the end users should not be able to run arbitrary code on your server.
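
To make the role of the ‘framework’ choice concrete, here is a minimal sketch of how a framework name could be turned into the command that starts that framework’s own server. The command-line tools shown are the frameworks’ standard CLIs, but the mapping and the `build_launch_command` helper are assumptions for illustration; this is not how ContainDS Dashboards is implemented internally. Plotly Dash and R Shiny are left out to keep the sketch short, since they are started through the Python and R interpreters rather than a dedicated command.

```python
from typing import Dict, List

# Illustrative command templates built from each framework's own CLI.
# This table is an assumption for the example, not ContainDS internals.
LAUNCH_TEMPLATES: Dict[str, List[str]] = {
    "voila": ["voila", "{path}", "--port={port}", "--no-browser"],
    "streamlit": ["streamlit", "run", "{path}",
                  "--server.port={port}", "--server.headless=true"],
    "bokeh": ["bokeh", "serve", "{path}", "--port={port}"],
    "panel": ["panel", "serve", "{path}", "--port={port}"],
}


def build_launch_command(framework: str, path: str, port: int) -> List[str]:
    """Return the argument list that would start the chosen framework's server."""
    try:
        template = LAUNCH_TEMPLATES[framework]
    except KeyError:
        raise ValueError(f"Unsupported framework: {framework!r}") from None
    # Fill in the placeholders; the result is suitable for subprocess.Popen.
    return [part.format(path=path, port=port) for part in template]
```

Note that none of these commands starts a Jupyter front-end, which matches the point above: the end user only ever reaches the rendered app, never a notebook where code could be edited.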

Sharing with Other Users

The new Dashboards menu that is added to JupyterHub is not only used to register a new dashboard for deployment, but also serves as a list of contents for any dashboards that have been shared with you.

Dashboards main menu screen (image by author)

When you create the dashboard, you can choose whether to make it available to all users in your JupyterHub, or just to selected named users. JupyterHub allows a wide range of authentication methods, so, for example, using LDAP or Google single sign-on, all your colleagues can easily access your dashboards through an account that will be automatically created for them.
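
The sharing rule itself is simple enough to sketch in a few lines. The example below is a hypothetical model, assuming a `Dashboard` record with an `allow_all` flag and a set of named users; these names are invented for illustration and do not reflect the actual ContainDS Dashboards code.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Dashboard:
    """Hypothetical record of a dashboard and who may open it."""
    name: str
    owner: str
    allow_all: bool = True  # shared with every authenticated hub user
    allowed_users: Set[str] = field(default_factory=set)  # used when allow_all is False


def can_view(dashboard: Dashboard, username: str) -> bool:
    """Decide whether an already-authenticated user may open the dashboard."""
    if username == dashboard.owner:
        return True
    if dashboard.allow_all:
        return True
    return username in dashboard.allowed_users
```

Under this model, ending someone’s access to a restricted dashboard amounts to removing their username from `allowed_users`, and the check only ever runs for users the hub has already authenticated.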

Selecting named users for dashboard access (image by author)

Authorized users can click into any dashboard that has been shared with them, click to confirm the OAuth consent screen, then immediately start interacting with the dashboard.

Example dashboard visualizations running inside ContainDS Dashboards (image by author)

Extendable and Configurable

Everything about JupyterHub is highly configurable: from where you host it (Kubernetes, on a cloud VM, or on your internal network) to how users authenticate at login.

The same applies to ContainDS Dashboards — you have full control over the way it behaves, and you can even plug in your own dashboarding visualization frameworks (e.g. Flask-based web apps) just by editing the configuration files.

Conclusion

That was a quick overview of ContainDS Dashboards, explaining how easy it is for a data scientist to deploy a new interactive visualization to share with clients or colleagues.

Your data scientists are already experimenting with the new visualization frameworks that have appeared on the open source landscape over the last few years. Their apps work great on their development machines, but it’s always a pain when they need to deploy them.

Often, they fall back on exporting a PDF or just copying and pasting graphs into emails instead. This is a real missed opportunity to allow decision-makers to truly immerse themselves in the data models.

If the dashboard does end up being deployed, it’s often not in an IT-approved manner: it relies on minimal authentication and is hosted on arbitrary cloud servers.

For medium-to-large data science teams, different projects have different needs, and data scientists want to choose the open source frameworks that make sense for their own skills and the project’s requirements.

To overcome these problems, ContainDS Dashboards provides a unified deployment and sharing model that can be administered by IT and used effortlessly by data science teams whatever technologies they are using to drive their analyses.

For installation details, see the ContainDS Dashboards documentation.

Dan Lester is a co-founder of Ideonate, making tools for data scientists, including ContainDS, a data science deployment platform for teams working on discrete projects.

Translated from: https://towardsdatascience.com/deploying-data-dashboards-automatically-reliably-and-securely-372ef802ca3c
