Today we are going to be talking about deploying a Dash app on Kubernetes with a Helm chart, using the AWS managed Kubernetes service, EKS.
For this post, I'm going to assume that you have an EKS cluster up and running, because I want to focus more on the strategy behind a real-time data visualization platform. If you don't, please check out my post Deploy RShiny with Kubernetes using AWS EKS and Terraform.
Dash is a data visualization platform written in Python.
Dash is the most downloaded, trusted framework for building ML & data science web apps.
Dash empowers teams to build data science and ML apps that put the power of Python, R, and Julia in the hands of business users. Full stack apps that would typically require a front-end, backend, and dev ops team can now be built and deployed in hours by data scientists with Dash. https://plotly.com/dash/
If you’d like to know what the Dash people say about Dash on Kubernetes you can read all about that here.
Pretty much though, Dash is stateless, meaning you can throw it behind a load balancer and DOMINATE. Then, because Kubernetes is taking over the world, can load balance, and can autoscale applications, with the cluster itself being elastic, Dash + Kubernetes is a pretty perfect match!
Data Visualization Infrastructure
Historical View
I've been studying Dash as a part of my ongoing fascination with real-time data visualization of large-scale genomic data sets. I suppose other datasets too, but I like biology, so here we are.
I think we are reaching new data visualization capabilities with all the shiny new infrastructure out there. When I started off in bioinformatics, all the best data viz apps were desktop applications written in Java. If you'd like to see an example, go check out the Integrative Genomics Viewer (IGV).
There are definite downsides to desktop applications. They just don’t scale for the ever-increasing amount and resolution of data. With this approach, there is a limit to the amount of data you can visualize in real-time.
Then there's the approach of setting up pipelines that output a bunch of JPEGs or PDFs, maybe wrapping them into an HTML report. This works well enough, assuming your pipeline is working as expected, but it doesn't allow you to update parameters or generally peruse your data in real-time.
Fancy New Data Viz Apps
We're no longer limited to a desktop app to get the level of interactivity we're all looking for when interacting with data. Frameworks such as RShiny and Dash do such a nice job of integrating server-side computing with front-end widgetizing (is that a word?) that you can really start to think about how to scale your applications.
Compute Infrastructure and Scaling
Now, with all sorts of neat cloud computing, we can have automatically scaling clusters. You can say things like "hey, Kubernetes, when any one of my pods is above 50% CPU, throw some more pods on there".
Python and R both have very nice data science ecosystems. You can scale your Python apps with Dask and/or Spark, and your R apps with Spark, by completely separating your web layer (Dash/Shiny) from your heavy-duty computation (Dask/Spark).
If you’d like to read more about how I feel about Dask go check out my blog post Deploy and Scale your Dask Cluster with Kubernetes. (As a side note you can install the Dask chart as is alongside your Dash app and then you can use both!)
Then, of course, you can throw some networked file storage up in your Kubernetes cluster, because seriously, no matter how fancy my life gets, I'm still using bash, ssh, and networked file storage!
Let's Build!
We’re using this app. I did not write this app, so many thanks to the Plotly Sample apps! ;-)
I've already built it into a Docker container, and this post is going to focus on the Helm chart and Kubernetes deployment.
Customize the Bitnami/NGINX Helm Chart
I base most of my Helm charts off of the NGINX Bitnami Helm chart. Today is no exception. I will be changing a few things, though!
Then, for the sake of brevity, I'm going to leave out all the metrics-server and load-from-git niceness. I actually like these features and include them in my production deployments, but they're kind of overkill for just throwing a Dash app up on AWS EKS. If you're working towards a production instance, I'd recommend going through the bitnami/nginx Helm chart and seeing everything it has there, particularly the metrics, along with replacing the default Ingress with the NGINX Ingress.
We're not going to be modifying any files today. Instead, we will abuse the power of the --set flag in the helm CLI.
A quick note about SSL
Getting into SSL is a bit beyond the scope of this tutorial, but here are two resources to get you started.
The first is a Digital Ocean tutorial on securing your application with the NGINX Ingress. I recommend giving this article a thorough read, as it will give you a very good conceptual understanding of setting up HTTPS. In order to do this, you will need to swap the default Ingress for the NGINX Ingress.
The second is an article by Bitnami. It is a very clear tutorial on using helm charts to get up and running with HTTPS, and I think it does an excellent job of walking you through the steps as simply as possible.
If you don’t care about understanding the ins and outs of https with Kubernetes just go with the Bitnami tutorial. ;-)
Install the Helm Chart
Let’s install the Helm Chart. First off, we’ll start with the simplest configuration in order to test that nothing too strange is happening.
# Install the bitnami/nginx helm chart with a RELEASE_NAME dash
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm upgrade --install dash bitnami/nginx \
--set image.repository="jerowe/dash-sample-app-dash-cytoscape-lda" \
--set image.tag="1.0" \
--set containerPort=8050
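If you'd rather keep the configuration in a file than on the command line, the same settings can be expressed as a values override. This is just a sketch; the key names mirror the --set flags above, which come from the bitnami/nginx chart:

```yaml
# dash-values.yaml -- equivalent to the --set flags above
image:
  repository: jerowe/dash-sample-app-dash-cytoscape-lda
  tag: "1.0"
containerPort: 8050
```

You would then install with helm upgrade --install dash bitnami/nginx -f dash-values.yaml, which is easier to version-control than a long command line.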
Get the Dash Service
Alrighty. So you can either pay attention to what the notes say at install time, fetch the notes again with Helm, or use the Kubernetes CLI (kubectl).
Use Helm
This assumes that the information you need is actually in the Helm notes.
helm get notes dash
This should spit out:
export SERVICE_IP=$(kubectl get svc --namespace default dash-nginx --template "{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}")
echo "NGINX URL: http://$SERVICE_IP/"
If you’re on AWS you’ll see something similar to this:
NAME         TYPE           CLUSTER-IP      EXTERNAL-IP                                                               PORT(S)                      AGE
dash-nginx   LoadBalancer   172.20.48.151   acf3379ed0fcb4790afc8036310259dc-994191965.us-east-1.elb.amazonaws.com   80:31019/TCP,443:32684/TCP   18m
kubernetes   ClusterIP      172.20.0.1
Use kubectl to get the SVC
You can also just run
kubectl get svc | grep dash
Or you can use the JSON output to programmatically grab your SVC. This is handy if you want to update it with a CI/CD service.
export EXPOSED_URL=$(kubectl get svc --namespace default dash-nginx -o json | jq -r '.status.loadBalancer.ingress[]?.hostname')
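To see what that jq filter actually pulls out, here it is run against a trimmed-down sample of kubectl get svc -o json output. The hostname below is made up for illustration; on a live cluster you'd pipe kubectl straight into jq as above.

```shell
# Sample of what "kubectl get svc dash-nginx -o json" returns (abridged, hypothetical hostname)
cat > /tmp/sample-svc.json <<'EOF'
{
  "status": {
    "loadBalancer": {
      "ingress": [
        { "hostname": "acf3379e-994191965.us-east-1.elb.amazonaws.com" }
      ]
    }
  }
}
EOF

# Same jq filter as above, run against the sample instead of a live cluster
EXPOSED_URL=$(jq -r '.status.loadBalancer.ingress[]?.hostname' /tmp/sample-svc.json)
echo "$EXPOSED_URL"   # -> acf3379e-994191965.us-east-1.elb.amazonaws.com
```

The `[]?` makes the filter tolerant of a service whose load balancer hasn't been provisioned yet (the ingress array is simply absent, and jq emits nothing instead of erroring).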
Check out Dash on AWS!
Grab that URL and check out our Dash App!
SCALE
Now, this is where things get fun! We can scale the application dynamically or manually. The Helm chart is already set up to work behind a load balancer, so we can have one Dash app or as many Dash apps as we have compute power for. The load balancer takes care of serving all of them under a single URL in a way that's completely hidden from the end user.
Manually Scale the Number of Dash Apps Running
First, we’re going to manually scale the number of Dash Apps running by increasing the number of replicas. This is pretty standard in web land.
helm upgrade --install dash bitnami/nginx \
--set image.repository="jerowe/dash-sample-app-dash-cytoscape-lda" \
--set image.tag="1.0" \
--set containerPort=8050 \
--set replicaCount=3
Now when you run kubectl get pods, you should see 3 dash-nginx-* pods.
Then, when you run kubectl get svc, you'll see that there is still one LoadBalancer service for the Dash app.
That's the most straightforward way to statically scale your app. Let's bring it back down, because next we're going to dynamically scale it!
helm upgrade --install dash bitnami/nginx \
--set image.repository="jerowe/dash-sample-app-dash-cytoscape-lda" \
--set image.tag="1.0" \
--set containerPort=8050 \
--set replicaCount=1
Dynamically Scale Your Dash App with a Horizontal Pod Autoscaler
The Kubernetes Horizontal Pod Autoscaler allows you to dynamically scale your application based on the CPU or memory load on the pods. Pretty much, you set a rule that says once the pods reach some percentage of their total CPU, increase the number of pods.
This is kind of tricky for people, so I’m going to show the code from the Nginx helm chart.
Here it is in the values.yaml.
# https://github.com/bitnami/charts/blob/master/bitnami/nginx/values.yaml#L497
## Autoscaling parameters
##
autoscaling:
enabled: false
# minReplicas: 1
# maxReplicas: 10
# targetCPU: 50
# targetMemory: 50
And here it is in the templates/hpa.yaml.
# https://github.com/bitnami/charts/blob/master/bitnami/nginx/templates/hpa.yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
name: {{ template "nginx.fullname" . }}
labels: {{- include "nginx.labels" . | nindent 4 }}
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ template "nginx.fullname" . }}
minReplicas: {{ .Values.autoscaling.minReplicas }}
maxReplicas: {{ .Values.autoscaling.maxReplicas }}
metrics:
{{- if .Values.autoscaling.targetCPU }}
- type: Resource
resource:
name: cpu
targetAverageUtilization: {{ .Values.autoscaling.targetCPU }}
{{- end }}
{{- if .Values.autoscaling.targetMemory }}
- type: Resource
resource:
name: memory
targetAverageUtilization: {{ .Values.autoscaling.targetMemory }}
{{- end }}
{{- end }}
You set your minimum number of replicas, your maximum number of replicas, and the amount of Memory/CPU you’re targeting.
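For intuition, the autoscaler's core rule is roughly desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization), per the Kubernetes HPA documentation. Here's a quick sketch of that arithmetic in shell; the hpa_desired function name is mine, not part of any tool:

```shell
# Rough HPA math: how many replicas would the autoscaler want?
hpa_desired() {
  # $1 = current replicas, $2 = current avg CPU utilization (%), $3 = target utilization (%)
  awk -v c="$1" -v u="$2" -v t="$3" 'BEGIN {
    d = c * u / t
    if (d == int(d)) print d; else print int(d) + 1   # ceil
  }'
}

hpa_desired 2 90 50   # 2 pods at 90% CPU with a 50% target -> 4
hpa_desired 3 25 50   # 3 pods at 25% CPU with a 50% target -> 2
```

This also explains why the targets below look low: with a tiny targetCPU, even light traffic pushes the ratio above 1 and triggers a scale-up, which is handy for a demo.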
Now, this is a blog post and not particularly indicative of the real world. I'm going to show you some values to make this interesting, but when you're doing your own autoscaling you'll really have to play around with them.
First, we’ll have to install the metrics server.
helm repo add stable https://charts.helm.sh/stable
helm repo update
helm install metrics stable/metrics-server
If you get an error here about the name metrics already being taken, you probably have the metrics-server chart installed already. It's installed by default on a lot of platforms, or it may have been required by another chart.
In order to get this to work you need to set limits on the resources of the deployment itself so Kubernetes has a baseline for scaling.
I'm using a t2.medium instance for my worker groups, which has 2 CPUs, 24 CPU credits/hour, and 4GB of memory. You should be fine if you're on a larger instance, but if you're on a smaller one you may need to play with the resource values.
helm upgrade --install dash bitnami/nginx \
--set image.repository="jerowe/dash-sample-app-dash-cytoscape-lda" \
--set image.tag="1.0" \
--set autoscaling.enabled=true \
--set autoscaling.minReplicas=2 \
--set autoscaling.maxReplicas=3 \
--set autoscaling.targetCPU=1 \
--set resources.limits.cpu="200m" \
--set resources.limits.memory="200Mi" \
--set resources.requests.cpu="200m" \
--set resources.requests.memory="200Mi" \
--set containerPort=8050
We should see, at a minimum, 2 pods (or 1 pod up with one scaling up).
kubectl get pods
Then we can describe our HPA.
kubectl get hpa
kubectl describe hpa dash-nginx
I set the resource requirements very low, so grab your SVC and just keep refreshing the page. You should see the pods start scaling up.
kubectl get svc | grep dash
Figure Out Your Autoscaling Values with htop
Now, what you should really do is install Prometheus/Grafana and use the metrics server to keep track of what's happening on your cluster.
But sometimes you just don't want to be bothered, and you can always just exec into a Kubernetes pod and run htop.
kubectl get pods | grep dash
# grab the POD_NAME - something like dash-nginx-75c5c8649-rbdj7
kubectl exec -it POD_NAME bash
Then, depending on when you read this, you may need to install htop.
apt-get update && apt-get install -y htop
htop
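If you have the metrics server installed anyway, kubectl top pods gives you the same at-a-glance numbers without shelling into a pod. Here's a small awk filter for spotting hot pods, shown against sample output since we can't assume a live cluster here; the pod names and numbers are made up:

```shell
# Stand-in for "kubectl top pods" output, so the filter can be demonstrated without a cluster
kubectl_top_sample() {
  cat <<'EOF'
NAME                          CPU(cores)   MEMORY(bytes)
dash-nginx-75c5c8649-rbdj7    180m         120Mi
dash-nginx-75c5c8649-x2k9p    40m          110Mi
EOF
}

# Print any pod using more than 100m CPU
# (on a live cluster: kubectl top pods | awk 'NR > 1 { ... }')
kubectl_top_sample | awk 'NR > 1 { cpu = $2; sub(/m$/, "", cpu); if (cpu + 0 > 100) print $1, $2 }'
```

That prints only the first pod in the sample, since 180m is above the 100m threshold. Numbers like these are a decent starting point for picking your resources.requests and autoscaling targets.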
Wrap Up
That’s it! I hope you see how you can gradually build up your data science and visualization infrastructure, piece by piece, in order to do real-time data visualization of large data sets!
If you have any questions, comments, or tutorials requests please reach out to me at [email protected], or leave a comment below. ;-)
Helpful Commands
Here are some helpful commands for navigating your Dash deployment on Kubernetes.
Helm Helpful Commands
# List the helm releases
helm list
# Add the Bitnami Repo
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
# Install a helm release from a helm chart (the nice way to integrate with CI/CD)
helm upgrade --install RELEASE_NAME bitnami/nginx
# Install a helm chart from a local filesystem
helm upgrade --install RELEASE_NAME ./path/to/folder/with/values.yaml
# Get the notes
helm get notes RELEASE_NAME
Kubectl Helpful Commands
# Get a list of all running pods
kubectl get pods
# Describe the pod. This is very useful for troubleshooting!
kubectl describe pod PODNAME
# Drop into a shell in your pod. It's like docker run.
kubectl exec -it PODNAME bash
# Get the logs from stdout on your pod
kubectl logs PODNAME
# Get all the services and urls
kubectl get svc
Originally published at https://www.dabbleofdevops.com.
Translated from: https://levelup.gitconnected.com/deploy-dash-with-helm-on-kubernetes-with-aws-eks-cabe035c0565