What I learned building the StateOfVeganism?
By now, we all know that news and media shape our views on the topics we discuss. Of course, this differs from person to person. Some might be influenced a little more than others, but some opinion is always communicated.

Considering this, I thought it would be really interesting to see the continuous development of the mood directed towards a specific topic or person in the media.

For me, Veganism is an interesting topic, especially since it is frequently mentioned in the media. Since the media’s opinion shapes the opinions of people, it would be interesting to see what “sentiment” they communicate.
This is what this whole project is about. It collects news that talks about or mentions Veganism, finds out the context in which it was mentioned, and analyses whether it propagates negativity or positivity.
Of course, a huge percentage of the analysed articles should be classified as “Neutral” if the writers do a good job in only communicating information, so we should keep that in mind, too.
I realized that this was an incredible opportunity to pick up a new toolset, especially when I thought about the sheer number of articles published daily. So, I thought about building a scalable architecture — one that is cheap/free in the beginning when there is no traffic and only a few articles, but scales easily and infinitely once the number of mentions or traffic increases. I heard the cloud calling.
Planning is everything, especially when we want to make sure that the architecture scales right from the beginning.
Starting on paper is a good thing, because it enables you to be extremely rough and quick in iterating.
Your first draft will never be your final one, and if it is, you’ve probably forgotten to question your decisions.
For me, the process of coming up with a suitable and, even more importantly, reasonable architecture was the key thing I wanted to improve with this project. The different components seemed pretty “easy” to implement and build, but coming up with the right system, the right communication, and a nice, clean data pipeline was the really interesting part.
In the beginning, I had some bottlenecks in my design which, at one point, would’ve brought my whole system to its knees. In that situation, I thought about just adding more “scalable” services like queues to queue the load and take care of it.
When I finally had a design which, I guessed, could handle a ton of load and was dynamically scalable, it was a mess: too many services, a lot of overhead, and an overall “dirty” structure.
When I looked at the architecture a few days later, I realised that there was so much I could optimise with a few changes. I started to remove all the queues and thought about replacing actual virtual machines with FaaS components. After that session, I had a much cleaner and still scalable design.
That was one of the mistakes I made quite early in the project. I started out by looking at what services IBM’s Bluemix could offer and went on from there. Which of them could I mix together and use in a design that would work with triggers, queues, and whatever else?
In the end, I could remove a lot of the overhead in terms of services by simply stepping away from it and thinking of the overall structure and technologies I needed, rather than the different implementations.
Broken down into a few distinct steps, the project should:
Every hour (in the beginning, since there are only a few articles at the moment; this could later be run every minute or even every second), get the news from some news API and store it.
So, what I finally ended up with was a CloudWatch Trigger which triggers a Lambda Function every hour. This Function gets the news data for the last hour from the NewsAPI. It then saves each article as a separate JSON file into an S3 bucket.
This bucket, upon ObjectPut, triggers another Lambda Function. This loads the JSON from S3, creates a “context” for the appearance of the part-word “vegan,” and sends the created context to the AWS Comprehend sentiment analysis. Once the function gets the sentiment information for the current article, it writes it to a DynamoDB table.
This Table is the root for the data displayed in the frontend. It gives the user a few filters with which they can explore the data a little bit more.
If you’re interested in a deeper explanation, jump down to the description of the separate components.
Before I knew that I was going with AWS, I tried out two other cloud providers. It’s a very basic and extremely subjective view on which provider to choose, but maybe this will help some other “Cloud-Beginners” choose.
I started out with IBM’s Bluemix Cloud, moved to Google Cloud, and finally ended up using AWS. Here are some of the “reasons” for my choice.
A lot of the points listed here really only tell how good the overall documentation and community is, how many of the issues I encountered had already been reported, and which ones had answers on StackOverflow.
Especially for beginners and people who’ve never worked with cloud technologies, this is definitely the case. The documentation and, even more importantly, the documented and explained examples were simply the best for AWS.
Of course, you don’t have to settle for a single provider. In my case, I could’ve easily used Google’s NLU tools because, in my opinion, they brought the better results. I just wanted to keep my whole system on one platform, and I can still change this later on if I want to.
The starter packs of all providers are actually really nice. You’ll get $300 on Google Cloud, which will enable you to do a lot of stuff. However, it’s also kind of dangerous, since you’ll be charged if you use up that amount and forget to turn off and destroy all the services building up the costs.
BlueMix only has very limited access to services on their free tier, which is a little bit unfortunate if you want to test out the full suite.
Amazon, for me, was the nicest one, since they also have a free tier which allows you to use nearly every feature (some only with the smallest instances, like the EC2 t2.micro).
Like I already mentioned, this is a very flat and subjective opinion on which one to go for… For me, AWS was the easiest and fastest to pick up without investing too much time upfront.
The whole project can basically be split into three main components that need work.
The Article Collection, which consists of the hourly cron job, the Lambda function that calls the NewsAPI, and the S3 bucket that stores all the articles.
The Data Enrichment part, which loads the article from S3, creates the context, and analyses it using Comprehend, plus the DynamoDB table that stores the enriched data for later use in the frontend.
And the Frontend, which gets displayed when users request the webpage. This component consists of a graphical user interface, a scalable server service which serves the webpage, and, again, the DynamoDB table.
The first and probably easiest part of the whole project was collecting all the articles and news that contain the keyword “vegan”. Luckily, there are a ton of APIs that provide such a service.
One of them is NewsAPI.org.
Their API is extremely easy to use and understandable. They have different endpoints. One of them is called “everything” which, as the name suggests, just returns all the articles that contain a given keyword.
Using Node.js here, it looks something like this:
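A minimal sketch (assuming Node 18+ with the built-in fetch; NEWS_API_KEY is a placeholder environment variable):

```javascript
// Fetch all articles from the last hour that must contain the word "vegan".
const getNews = async () => {
  const from = new Date(Date.now() - 60 * 60 * 1000).toISOString();
  const url =
    'https://newsapi.org/v2/everything' +
    '?q=%2Bvegan' +                        // "+vegan": the word must appear
    `&from=${from}` +                      // only articles from the last hour
    '&pageSize=100' +                      // articles returned per request
    `&apiKey=${process.env.NEWS_API_KEY}`; // keep the key out of the code

  const response = await fetch(url);
  const { articles } = await response.json();
  return articles;
};
```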
The + sign in front of the query string “vegan” simply means that the word must appear.
The pageSize defines how many articles per request will be returned. You definitely want to keep an eye on that. If, for example, your system has extremely limited memory, it makes sense to make more requests (using the provided cursor) in order to not crash the instance with responses that are too big.
The response from NewsAPI.org looks like this. If you’re interested in seeing more examples, head over to their website where they have a lot of examples displayed.
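A trimmed, illustrative record (the field names follow their API; the values here are made up):

```json
{
  "status": "ok",
  "totalResults": 12,
  "articles": [
    {
      "source": { "id": null, "name": "Example News" },
      "author": "Jane Doe",
      "title": "The rise of vegan street food",
      "description": "Plant-based food stalls are popping up everywhere.",
      "url": "https://example.com/vegan-street-food",
      "urlToImage": "https://example.com/image.jpg",
      "publishedAt": "2018-10-01T12:00:00Z",
      "content": "…"
    }
  ]
}
```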
As you can see, those article records only give a very basic view of the article itself. Terms like vegan, which appear in some context inside the article without being its main topic, are not represented in the title or description. Therefore, we need the Data Enrichment component, which we’ll cover a little bit later. However, this is exactly the type of JSON data that is stored in the S3 bucket, ready for further processing.
Trying an API locally and actually using it in the cloud are really similar. Of course, there are some catches where you don’t want to paste your API key into the actual code but rather use environment variables, but that’s about it.
AWS has a very neat GUI for their Lambda setup. It really helps you understand the structure of your component and visualise which services and elements are connected to it.
In the case of the first component, we have the CloudWatch Hourly Trigger on the “Input”-side and the Logging with CloudWatch and the S3 Bucket as a storage system on the “Output”-side.
So, after putting everything together, importing the Node.js SDK for AWS, and testing out the whole script locally, I finally deployed it as a Lambda Function.
The final script is actually pretty short and understandable:
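A condensed sketch of it (assuming the AWS SDK v2 that Lambda ships with; the bucket name and key scheme are illustrative, and getNews is the request shown earlier):

```javascript
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

exports.handler = async () => {
  // Get the news data for the last hour from the NewsAPI.
  const articles = await getNews();

  // Save each article as a separate JSON file into the S3 bucket.
  await Promise.all(
    articles.map((article) => {
      const key = `${article.publishedAt}-${article.source.name}`
        .replace(/[^a-zA-Z0-9-]/g, '-'); // avoid special characters in keys

      return s3
        .putObject({
          Bucket: 'state-of-veganism-articles',
          Key: `${key}.json`,
          Body: JSON.stringify(article),
        })
        .promise();
    })
  );
};
```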
The GUI has some nice testing features with which you can simply trigger your Function by hand.
But nothing worked…
After a few seconds of googling, I found the term “Policies”. I’d heard of them before, but never read up on them or tried to really understand them.
Basically, they describe what service/user/group is allowed to do what. This was the missing piece: I had to allow my Lambda function to write something to S3. (I won’t go into detail about it here, but if you want to skip to policies, feel free to head to the end of the article.)
A policy in AWS is a simple JSON-Style configuration which, in the case of my article collection function, looked like this:
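Roughly like this (the bucket name is a placeholder):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": "arn:aws:logs:*:*:*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::state-of-veganism-articles",
        "arn:aws:s3:::state-of-veganism-articles/*"
      ]
    }
  ]
}
```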
This is the config that describes the previously mentioned “Output”-Side of the function. In the statements, we can see that it gets access to different methods of the logging tools and S3.
The weird part about the assigned resource for the S3 bucket is that, if not stated otherwise in the options of your S3 bucket, you have to provide both the root and “everything below” as two separate resources.
The example given above allows the Lambda Function to do anything with the S3 bucket, but this is not how you should set up your system! Your components should only be allowed to do what they are designated to.
Once this was entered, I could finally see the records getting put into my S3 bucket.
When I tried to get the data back from the S3 bucket I encountered some problems. It just wouldn’t give me the JSON file for the key that was created. I had a hard time finding out what was wrong until at one point, I realised that, by default, AWS enables logging for your services.
This was gold!
When I looked into the logs, the problem jumped out at me right away: the key value that gets sent by the S3 trigger is URL-encoded. However, this problem was absolutely invisible when just looking at the S3 key names, where everything was displayed correctly.
The solution to this problem was pretty easy. I just replaced every special character with a dash, which won’t be replaced by some encoded value.
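In code, that boils down to something like this (the exact character set is an assumption):

```javascript
// Keep only characters that survive URL encoding untouched.
const toSafeKey = (rawKey) => rawKey.replace(/[^a-zA-Z0-9-]/g, '-');
```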
So, always make sure to not risk putting some special characters in keys. It might save you a ton of debugging and effort.
Since we now have all the articles as single records in our S3 bucket, we can think about enrichment. We have to combine some steps in order to fulfill our pipeline which, just to think back, was the following:
One of the really awesome things about Promises in JavaScript is that you can model pipelines exactly the way you would describe them in text. If we compare the code with the explanation of what steps will be taken, we can see the similarity.
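A sketch of that pipeline (the helper functions are placeholders for the steps described above):

```javascript
exports.handler = async (event) => {
  // The S3 ObjectPut event tells us which article to process.
  return getArticleFromS3(event)                   // 1. load the JSON from S3
    .then((article) => createContext(article))     // 2. build the "vegan" context
    .then((context) => analyseSentiment(context))  // 3. ask Comprehend for the sentiment
    .then((result) => writeToDynamoDB(result));    // 4. store the enriched record
};
```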
If you take a closer look at the first line of the code above, you can see the exported handler. This line is always predefined in Lambda Functions so that the runtime knows which method to call. This means that your own code belongs inside the curly braces of the async block.
For the Data Enrichment part, we need some more services. We want to be able to send data to and get data from Comprehend’s sentiment analysis, write our final record to DynamoDB, and also have logging.
Have you noticed the S3 Service on the “Output”-side? This is why I always put the Output in quotes, even though we only want to read data here. It’s displayed on the right hand side. I basically just list all the services our function interacts with.
The policy looks comparable to that of the article collection component. It just has some more resources and rules which define the relations between Lambda and the other services.
Even though Google Cloud, in my opinion, has the “better” NLU components, I just love the simplicity and unified API of AWS’ services. If you’ve used one of them, you think you know them all. For example, here’s how to get a record from S3 and how the sentiment detection works in Node.js:
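A sketch, again assuming the AWS SDK v2 (the bucket and key come out of the triggering S3 event):

```javascript
const AWS = require('aws-sdk');
const s3 = new AWS.S3();
const comprehend = new AWS.Comprehend();

// Load the raw article record referenced by the S3 ObjectPut event.
const getArticleFromS3 = async (event) => {
  const bucket = event.Records[0].s3.bucket.name;
  const key = event.Records[0].s3.object.key;
  const { Body } = await s3.getObject({ Bucket: bucket, Key: key }).promise();
  return JSON.parse(Body.toString('utf-8'));
};

// Send a piece of text to the Comprehend sentiment analysis.
const analyseSentiment = (text) =>
  comprehend
    .detectSentiment({ LanguageCode: 'en', Text: text })
    .promise(); // resolves to { Sentiment, SentimentScore, ... }
```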
Probably one of the most interesting tasks of the Data Enrichment Component was the creation of the “context” of the word vegan in the article.
Just as a reminder — we need this context, since a lot of articles only mention the word “Vegan” without having “Veganism” as a topic.
So, how do we extract parts from a text? I went for Regular Expressions. They are incredibly nice to use, and you can use playgrounds like Regex101 to play around and find the right regex for your use case.
The challenge was to come up with a regex that could find sentences containing the word “vegan”. Somehow, it was harder than I expected to make it generalise to whole text passages that also had line breaks and so on in them.
The final regex looks like this:
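Along these lines (a sketch; the exact expression may have differed):

```javascript
// Match whole sentences that contain the part-word "vegan".
const veganSentence = /[^.!?\n]*vegan[^.!?\n]*[.!?]?/gi;
```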
The problem was that for long texts, this was not working due to timeout problems. The solution in this case was pretty “straightforward”… I simply crawled the text and split it by line breaks, which made it way easier for the RegEx module to process.
In the end, the whole context “creation” was a mixture of splitting the text, filtering for passages that contained the word vegan, extracting the matching sentence from that passage, and joining it back together so that it could be used in the sentiment analysis.
Also, the title and description might play a role, so I added those to the context if they contained the word “vegan”.
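Condensed into code, that mixture looks something like this (helper names are illustrative; veganSentence is the regex from above):

```javascript
const createContext = (article) => {
  // Split the text and filter for passages that contain the word "vegan".
  const passages = (article.content || '')
    .split('\n')
    .filter((passage) => /vegan/i.test(passage));

  // Extract the matching sentences from those passages.
  const sentences = passages
    .map((passage) => passage.match(veganSentence) || [])
    .reduce((all, matches) => all.concat(matches), []);

  // Title and description count too, if they mention the word.
  const extras = [article.title, article.description]
    .filter((text) => text && /vegan/i.test(text));

  // Join everything back together for the sentiment analysis.
  return extras.concat(sentences).join(' ');
};
```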
Once all the code for the different steps was in place, I thought I could start building the frontend. But something wasn’t right. Some of the records just did not appear in my DynamoDB table…
When checking back with the status of my already running system, I realised that some of the articles would not be converted to a DynamoDB table entry at all.
After checking out the logs, I found this Exception which absolutely confused me…
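It was DynamoDB’s empty-string validation error, which reads roughly like this:

```
ValidationException: One or more parameter values were invalid:
An AttributeValue may not contain an empty string
```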
To be honest, this was a really weird behaviour since, as stated in the discussion, the semantics and usage of an empty String are absolutely different from those of a Null value.
However, since I couldn’t change anything about the design of the DynamoDB, I had to find a solution to avoid getting the empty String error.
In my case, it was really easy. I just iterated through the whole JSON object and checked whether there was an empty String or not. If there was, I just replaced the value with null. That’s it: it works like a charm and does not cause any problems. (I needed to check whether a value exists in the frontend, though, since getting the length of a null value throws an error.)
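A sketch of that sweep (recursing in case of nested objects and arrays):

```javascript
// Replace every empty string in a record with null so DynamoDB accepts it.
const replaceEmptyStrings = (value) => {
  if (value === '') return null;
  if (Array.isArray(value)) return value.map(replaceEmptyStrings);
  if (value && typeof value === 'object') {
    return Object.keys(value).reduce((out, key) => {
      out[key] = replaceEmptyStrings(value[key]);
      return out;
    }, {});
  }
  return value;
};
```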
The last part was to actually create a frontend and deploy it so people could visit the page and see the StateOfVeganism.
Of course, I was thinking about whether I should use one of those fancy frontend frameworks like Angular, React or Vue.js… But, well, I went absolutely old school: plain HTML, CSS and JavaScript.
The idea I had for the frontend was extremely minimalistic. Basically it was just a bar that was divided into three sections: Positive, Neutral and Negative. When clicking on either one of those, it would display some titles and links to articles that were classified with this sentiment.
In the end, that was exactly what it turned out to be. You can check out the page here. I thought about making it live at stateOfVeganism.com, but we’ll see…
Make sure to note the funny third article of the articles that have been classified as “Negative” ;)
Deploying the frontend on one of AWS’ services was something else I had to think about. I definitely wanted to take a service that already incorporated elastic scaling, so I had to decide between Elastic Container Service or Elastic Beanstalk (actual EC2 instances).
In the end, I went for Beanstalk, since I really liked the straightforward approach and the incredibly easy deployment. You can basically compare it to Heroku in the way you set it up.
Side note: I had some problems with my auto scaling group not being allowed to deploy EC2 instances, because I use the free tier on AWS. But after a few emails with AWS support, everything worked right out of the box.
I just deployed a Node.js Express server application that serves my frontend on every path.
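Something along these lines (a minimal sketch):

```javascript
const express = require('express');
const path = require('path');

const app = express();

// Serve everything in the "public" folder, index.html included.
app.use(express.static(path.join(__dirname, 'public')));

// Fall back to index.html on every other path.
app.get('*', (req, res) =>
  res.sendFile(path.join(__dirname, 'public', 'index.html'))
);

app.listen(process.env.PORT || 3000);
```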
This setup, by default, provides the index.html which resides in the “public” folder, which is exactly what I wanted.
Of course this is the most basic setup. For most applications, it’s not the recommended way, since you somehow have to provide the credentials in order to access the DynamoDB table. It would be better to do some server-side rendering and store the credentials in environment variables so that nobody can access them.
This is something you should never do. However, since I restricted the access of those credentials to only the scan method of the DynamoDB table, you can get the chance to dig deeper into my data if you’re interested.
I also restricted the number of requests that can be done, so that the credentials will “stop working” once the free monthly limit has been surpassed, just to make sure.
But feel free to look at the data and play around a little bit if you’re interested. Just make sure to not overdo it, since the API will stop providing the data to the frontend at some point.
When I started working with cloud technologies, I realised that there has to be a way to allow/restrict access between the single components and create relations. This is where policies come into play. They also help you do access management by giving you the tools you need to give specific users and groups permissions. At one point, you’ll probably struggle with this topic, so it makes sense to read up on it a little bit.
There are basically two types of policies in AWS. Both are simple JSON style configuration files. However, one of them is assigned to the resource itself, for example S3, and the other one gets assigned to roles, users, or groups.
The rough rules of thumb below show which policy type you might want to choose for your task.
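- You want to define what a specific user, group, or role may do, possibly across many services: use an IAM-Policy (identity-based).
- You want to define, on the resource itself, who may access it (for example, on an S3 bucket): use a Resource-Policy.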
So, what is the actual difference? This might become clearer when we compare examples of both policy types.
The first policy below is the IAM-Policy (or Identity-Based Policy). The second one is the Resource-(Based)-Policy.
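Reconstructed examples (the account ID and bucket name are placeholders). An IAM-Policy, which would be attached to a user, group, or role:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```

And a comparable Resource-Policy, attached to the bucket itself:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::123456789012:user/Alice",
          "arn:aws:iam::123456789012:root"
        ]
      },
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::my-bucket/*"
    }
  ]
}
```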
If we start to compare them line by line, we can’t see any difference until we reach the first statement which defines some rules related to some service. In this case, it’s S3.
In the Resource-Policy, we see an attribute called “Principal”, which is missing in the IAM-Policy. In the context of a Resource-Policy, this describes the entities that are “assigned” to this rule. In the example given above, these would be the users Alice and root.
On the other hand, to achieve the exact same result with IAM-Policies, we would have to assign the policy on the left to our existing users, Alice and root.
Depending on your use case, it might make sense to use one or the other. It’s also a question of your “style”, the conventions you follow, and your workplace.
StateOfVeganism is already live. However, this does not mean that there is nothing left to improve. One thing I definitely have to work on is, for example, that recipes from Pinterest get classified as “Neutral” rather than “Positive”. But the basic functionality is working as expected. The data pipeline works nicely, and if anything should go wrong, I will have nice logging with CloudWatch already enabled.
It’s been great to really think through and build such a system. Questioning my decisions was very helpful in optimising the whole architecture.
The next time you’re thinking about building a side project, think about building it with one of the cloud providers. It might be a bigger time investment in the beginning, but learning how to use and build systems with an infrastructure like AWS really helps you to grow as a developer.
I’d love to hear about your projects and what you build. Reach out and tell me about them.
Thank you for reading. Be sure to follow me on YouTube and to star StateOfVeganism on GitHub.
Don’t forget to hit the clap button and follow me on Twitter, GitHub, YouTube, and Facebook to accompany me on my journey.
I’m always looking for new opportunities. So please, feel free to contact me. I’d love to get in touch with you.
Also, I’m currently planning to do a half year internship in Singapore starting in March 2019. I’d like to meet as many of you as possible. If you live in Singapore, please reach out. Would love to have a chat over coffee or lunch.
Translated from: https://www.freecodecamp.org/news/how-to-build-a-fully-scalable-architecture-with-aws-5c4e8612565e/