Reinforcement Learning: A Surface-Level Explanation

Artificial intelligence (AI) has been a buzzword for the past five years or more, and more and more people are becoming familiar with artificial neural networks, which are commonly described as being trained in one of two ways: supervised learning or unsupervised learning. However, there is a third approach that does not really fall under either category, and it is called reinforcement learning.

Reinforcement learning is generally applied to an already established neural network model to encourage specific behaviors, so that the model achieves more of a favored outcome. The term is currently used as a buzzword, and in those cases it is simply treated as a black box.

In this article, I want to give a surface-level explanation of what reinforcement learning is by opening this “black box” that gets thrown around and is expected to do amazing things.

How It Works

What Is What?

The agent is basically the decision-maker. It can make smart decisions using an artificial neural network, or it can be a simple decision-maker that is only a little more advanced than an if-else statement. The decision-maker decides which action to take based on the condition or state it is currently in.

The environment is the “place” that the agent interacts with. It provides the agent with varying conditions or states.

The interpreter provides the agent with feedback on the action the agent chose, given the condition or state that the environment exposed to it.
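To make the three roles concrete, here is a minimal sketch of how they might fit together in Python. Everything in it is an assumption made for illustration and is not from the original article: the toy task (name the parity of a number), the class and function names (`Environment`, `Agent`, `interpreter`, `run_loop`), and the +1/-1 feedback values.

```python
import random


class Environment:
    """The "place" the agent interacts with; it exposes a state (here, an integer 0-4)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def observe(self):
        return self.rng.randint(0, 4)


def interpreter(state, action):
    """Feedback for the agent: +1 if the action names the state's parity, else -1."""
    correct = "even" if state % 2 == 0 else "odd"
    return 1 if action == correct else -1


class Agent:
    """A simple decision-maker that keeps a score for each (state, action) pair."""

    ACTIONS = ("even", "odd")

    def __init__(self):
        self.scores = {}

    def act(self, state):
        # Pick the action with the best score so far; ties go to the first action.
        return max(self.ACTIONS, key=lambda a: self.scores.get((state, a), 0))

    def learn(self, state, action, feedback):
        # Raise or lower the score of the chosen action by the feedback received.
        key = (state, action)
        self.scores[key] = self.scores.get(key, 0) + feedback


def run_loop(steps=100, seed=0):
    """Run the state -> action -> feedback cycle and return the total feedback."""
    env, agent = Environment(seed), Agent()
    total = 0
    for _ in range(steps):
        state = env.observe()                  # environment exposes a state
        action = agent.act(state)              # agent decides
        feedback = interpreter(state, action)  # interpreter judges the action
        agent.learn(state, action, feedback)   # agent adjusts its behavior
        total += feedback
    return total
```

In this toy setup the agent guesses wrong at most once per odd state; after that, the negative score steers it to the rewarded action, so the total feedback from `run_loop(100)` ends up close to 100.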

Linking It to Real (Known) Models

So let’s use some models that we’re familiar with, such as social media platforms: Facebook, Instagram, LinkedIn, and any other platform that encourages users to produce and share content with others, mainly over the web.

So basically, each of these social media platforms is itself a model in which users have options (in this case, actions): they choose which topics, images, and other content to share on the platform. However, sharing and posting alone does not really encourage someone to produce content, especially if they don’t know whether their audience (friends, family, colleagues, and maybe even strangers) enjoys what they are sharing. Hence the introduction of the “like” button, which is used to encourage users to share more.

Users treat this “like” button as a form of feedback. The more likes someone gets for sharing or posting specific topics or content, the more of that type of content the user will find and create. However, if a user shares content that gets fewer likes than other, more interesting content, the user will either share less of it or stop sharing it altogether.

This way of deciding whether to post (or do) more or less of a specific kind of content is what reinforcement learning is. Given an action (in this case, sharing content), the agent (the user) decides whether to do more or less of it based on the rewards it gets.
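The “post more of what gets likes” behavior can be sketched as a small simulation. This is only an illustration under stated assumptions, not how any real platform or user works: the topic names, the like probabilities, the function name `simulate_posting`, and the choice of an epsilon-greedy rule (mostly post the best-liked topic so far, occasionally try another one) are all hypothetical.

```python
import random


def simulate_posting(like_rates, rounds=2000, epsilon=0.1, seed=0):
    """Simulate a user who posts more of whatever has earned the most likes.

    like_rates maps each topic to the (hidden) probability that a post on it
    gets a like. Returns how often each topic was posted and the average
    likes the user observed for it.
    """
    rng = random.Random(seed)
    topics = list(like_rates)
    counts = {t: 0 for t in topics}
    avg_likes = {t: 0.0 for t in topics}

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Occasionally try a random topic to see how it is received.
            topic = rng.choice(topics)
        else:
            # Otherwise post the topic with the best average likes so far.
            topic = max(topics, key=lambda t: avg_likes[t])

        # The reward: one like with the topic's hidden probability, else none.
        likes = 1 if rng.random() < like_rates[topic] else 0

        # Update the running average of likes for the chosen topic.
        counts[topic] += 1
        avg_likes[topic] += (likes - avg_likes[topic]) / counts[topic]

    return counts, avg_likes


counts, avg_likes = simulate_posting({"cats": 0.8, "food": 0.5, "politics": 0.2})
print(counts)
```

With these made-up like rates, the simulated user ends up posting about the best-liked topic far more often than the others, which mirrors the behavior described above.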

So in a nutshell, reinforcement learning encourages the actions that let the agent collect as much reward as possible.
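For readers who prefer notation, this idea is commonly written as an objective. The symbols below are not from the original article and are introduced here only for illustration: $\pi$ is the agent's rule for choosing actions, $r_t$ is the reward received at step $t$, $T$ is the number of steps, and $\gamma \in [0, 1]$ is an optional discount that weights later rewards less.

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T} \gamma^{t} r_{t} \right]
```

With $\gamma = 1$ this reduces to picking the behavior that maximizes the expected sum of rewards, which is the “as much reward as possible” statement above.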

Translated from: https://medium.com/analytics-vidhya/reinforcement-learning-a-surface-level-explanation-75690f03840d
