Game theory and AI: where it all started and where it should all stop
by Elena Nisioti
Artificial Intelligence (AI) is full of questions that cannot be answered and answers that cannot be assigned to the correct questions. In the past, it paid for its persistence in misguided practices with periods of stagnation, known as AI winters. The calendar of AI, however, has just reached spring, and applications are flourishing.
Yet, there is a branch of AI that has long been neglected. I am talking about reinforcement learning, which has recently exhibited impressive results on games like Go (through AlphaGo) and Atari. But let's be honest: these were not reinforcement learning wins. What got deeper in these cases were the deep neural networks, not our understanding of reinforcement learning, which maintains the depth it achieved decades ago.
Even worse is the case of reinforcement learning applied to real-life problems. If training a robot to balance on a rope sounds hard, try training a team of robots to win a football game, or a team of drones to monitor a moving target.
Before we lose the branch, or even worse the tree, we must sharpen our understanding of these applications. Game theory is the most common approach to studying teams of players that share a common goal. It can lend us tools to guide learning algorithms in these settings.
But let’s see why the common approach is not a common sense approach.
To kill an error is as good a service as, and sometimes even better than, the establishing of a new truth or fact. — Charles Darwin
First, let’s dirty our hands with some terminology and basics of these areas.
首先,让我们动手掌握这些领域的一些术语和基础知识。
Game: as in the popular understanding, any setting where players take actions and the outcome depends on them.
Player: a strategic decision-maker within a game.
Strategy: a complete plan of actions a player will take, given the set of circumstances that might arise within the game.
Payoff: the gain a player receives from arriving at a particular outcome of a game.
Equilibrium: the point in a game where both players have made their decisions and an outcome is reached.
Nash equilibrium: an equilibrium in which no player can gain by changing their own strategy if the strategies of the other players remain unchanged.
Dominant strategy: occurs when one strategy is better than another strategy for one player, no matter how that player’s opponents may play.
The prisoner's dilemma is probably the most famous game in the literature. Its payoff matrix is worth a thousand words: to an experienced eye, it provides all the information necessary to describe a game. But let's be a bit less laconic.
The police arrest two criminals, criminal A and criminal B. Although quite notorious, the criminals cannot be imprisoned for the crime under investigation due to lack of evidence. But they can be held for lesser charges.
The length of their imprisonment will depend on what they will say in the interrogation room, which gives rise to the game. Each criminal (player) is given the chance to either stay silent or snitch on the other criminal (player). The payoff matrix depicts how many years each player will be imprisoned depending on the outcome. For example, if player A stays silent and player B snitches on them, player A will serve 3 years (-3) and player B will serve none (0).
If you review the payoff matrix carefully, you will find that the logical action for a player is to betray the other or, in game-theoretic terms, that betraying is the dominant strategy. This leads to the Nash equilibrium of the game, where each player has a payoff of -2.
Does something feel odd? Yes, or at least it should. If both players somehow agreed to remain silent, they would both get a higher reward of -1. The prisoner's dilemma is an example of a game where rationality leads to a worse result than cooperation would.
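To make the reasoning concrete, here is a minimal Python sketch, using the payoff numbers from the story above, that recovers the dominant strategy and the unique pure Nash equilibrium by brute force:

```python
# The prisoner's dilemma payoff matrix described above.
# Payoffs are years in prison (negative numbers):
# both silent -> -1 each, both snitch -> -2 each,
# one snitches -> 0 for the snitch, -3 for the silent player.

ACTIONS = ["silent", "snitch"]

# PAYOFF[(action_A, action_B)] = (payoff_A, payoff_B)
PAYOFF = {
    ("silent", "silent"): (-1, -1),
    ("silent", "snitch"): (-3,  0),
    ("snitch", "silent"): ( 0, -3),
    ("snitch", "snitch"): (-2, -2),
}

def best_response(player, opponent_action):
    """The action that maximises a player's payoff against a fixed opponent action."""
    index = 0 if player == "A" else 1
    def payoff_of(action):
        joint = (action, opponent_action) if player == "A" else (opponent_action, action)
        return PAYOFF[joint][index]
    return max(ACTIONS, key=payoff_of)

# "snitch" is dominant: it is the best response to every possible opponent action.
assert all(best_response("A", b) == "snitch" for b in ACTIONS)
assert all(best_response("B", a) == "snitch" for a in ACTIONS)

# A pure Nash equilibrium is a pair of mutual best responses.
equilibria = [(a, b) for a in ACTIONS for b in ACTIONS
              if best_response("A", b) == a and best_response("B", a) == b]
print(equilibria)  # [('snitch', 'snitch')]: payoff (-2, -2), worse than mutual silence (-1, -1)
```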
Game theory originated in economics, but is today an interdisciplinary area of study. Its father, John von Neumann (you will notice that Johns have serious career prospects in this area), was the first to give a strict formulation to the common notion of a game. He restricted his studies to games of two players, as they were easier to analyze.
He then co-authored a book with Oskar Morgenstern, which laid the foundations for expected utility theory and shaped the course of game theory. Around that time, John Nash introduced the concept of Nash equilibria, which helps describe the outcome of a game.
It did not take long to realize how vast the applications of game theory can be. From games to biology, philosophy and, wait for it, artificial intelligence. Game theory is nowadays closely related to settings where multiple players learn through reinforcement, an area called multi-agent reinforcement learning. Examples of applications in this case are teams of robots, where each player has to learn how to behave in the interest of its team.
Agent: equivalent to a player.
Reward: equivalent to a payoff.
State: all the information necessary to describe the situation an agent is in.
Action: the equivalent of a move in a game.
Policy: similar to a strategy, it defines the action an agent will take in particular states.
Environment: everything the agent interacts with during learning.
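To see how these terms fit together, here is a toy sketch; the one-dimensional environment and its reward below are made up for illustration and not taken from the article:

```python
# A made-up environment: an agent on a line of states 0..4 is rewarded
# whenever it reaches state 4. Each glossary term appears as a comment.

class LineEnvironment:                        # Environment: everything the agent interacts with
    def __init__(self):
        self.state = 0                        # State: all the agent needs to know about its situation

    def step(self, action):                   # Action: the agent's move, "left" or "right"
        delta = 1 if action == "right" else -1
        self.state = max(0, min(4, self.state + delta))
        reward = 1 if self.state == 4 else 0  # Reward: the payoff of this move
        return self.state, reward

def policy(state):                            # Policy: a mapping from states to actions
    return "right" if state < 4 else "left"

env = LineEnvironment()
state, total_reward = env.state, 0
for _ in range(10):                           # the Agent interacting for ten steps
    state, reward = env.step(policy(state))
    total_reward += reward
print(total_reward)
```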
Imagine the following scenario: a team of drones is unleashed into a forest in order to predict and locate fires early enough for the firefighters to respond. The drones are autonomous and must explore the forest, learn which conditions are likely to cause fire, and cooperate with each other, so that they cover wide areas of the forest using little battery and communication.
This application belongs to the area of environmental monitoring, where AI can lend its predictive skills to human intervention. In a technological world that is becoming increasingly complex and a physical world under threat, we can paraphrase Kipling’s quote to “Man could not be everywhere, and therefore he made drones.”
Decentralized architectures are another interesting application field. Technologies like the Internet of Things and Blockchain create immense networks. Information and processing are distributed across different physical entities, a trait that has been acknowledged to offer privacy, efficiency and democratization.
Regardless of whether you want to use sensors to minimize energy consumption in the households of a country, or replace the banking system, decentralized is the new sexy.
Making these networks smart, however, is challenging, as most of the AI algorithms we are proud of are data- and computation-hungry. Reinforcement learning algorithms can be employed for efficient data processing and for rendering the network adaptive to changes in its environment. In this case, it is interesting, and beneficial to overall efficiency, to study how the individual algorithms will cooperate.
Translating AI problems to simple games like the prisoner’s dilemma is tempting. This is a usual practice when testing new techniques, as it offers a computationally cheap and intuitive testbed. Nevertheless, it is important not to ignore the effect that the practical characteristics of the problem, such as noise, delays, and finite memory, have on the algorithm.
Perhaps the most misleading assumption in AI research is that of representing interaction with iterated static games. For example, an algorithm can apply the prisoner’s dilemma game every time it wants to make a decision, a formulation that assumes that the agent has not learned, or changed, along the way. But what about the effect learning will have on the behavior of the agent? Won’t interaction with others affect its strategy?
Research in this area has focused on the evolution of cooperation, and Robert Axelrod has studied the optimal strategies that arise in the iterated version of the prisoner's dilemma. The tournaments that Axelrod organized revealed that strategies that adapt with time and interaction, even ones as simple as Tit-for-Tat may sound, are very effective. The AI community has recently investigated learning under the sequential prisoner's dilemma, but research in this area is still at an early stage.
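As a rough illustration of why adaptive strategies pay off, here is a small sketch of the iterated game with the standard cooperate/defect payoffs (3, 0, 5, 1); the two strategies below are only a tiny sample of the pool Axelrod's tournaments actually contained:

```python
# Iterated prisoner's dilemma with the standard payoff values.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def tit_for_tat(opponent_history):
    # cooperate on the first round, then repeat the opponent's previous move
    return opponent_history[-1] if opponent_history else "C"

def always_defect(opponent_history):
    return "D"

def play(strategy_a, strategy_b, rounds=200):
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(history_b)            # each strategy only sees the other's past moves
        b = strategy_b(history_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        history_a.append(a)
        history_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, always_defect))      # (199, 204): one betrayal, then mutual defection
print(play(tit_for_tat, tit_for_tat))        # (600, 600): cooperation is sustained throughout
```

Tit-for-Tat never loses a match by more than the value of a single betrayal, and against other adaptive strategies it sustains mutual cooperation, which is roughly the property Axelrod's tournaments rewarded.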
What differentiates multi-agent from single-agent learning is the increased complexity. Training one deep neural network is already enough of a pain, while adding new networks, as parts of the agents, makes the problem exponentially harder.
One less obvious, but more important concern, is the lack of theoretical properties for this kind of problem. Single-agent reinforcement learning is a well-understood area, as Richard Bellman and Christopher Watkins have offered the algorithms and proofs necessary to learn. In the multi-agent case, however, the proofs lose their validity.
Just to illustrate some of the mind-puzzling difficulties that arise: an agent executes a learning algorithm to learn how to react optimally to its environment. In our case, the environment includes the other agents, which also execute the learning algorithm. Thus, the algorithm has to consider the effect its actions will have on an environment that is itself learning.
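For reference, here is the single-agent update those guarantees concern, Watkins' Q-learning rule, in a minimal sketch; the action set is a made-up placeholder:

```python
# One tabular Q-learning step. Its convergence proof assumes the environment's
# transition and reward distributions stay fixed -- exactly the assumption that
# breaks when the "environment" contains other agents that are learning too.

ACTIONS = ("left", "right")   # hypothetical action set, for illustration only

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Q is a dict mapping (state, action) pairs to estimated values."""
    best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
    target = reward + gamma * best_next               # bootstrapped return estimate
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (target - old)
```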
The concerns start where game theory started: in economics. Let’s begin with some assumptions made when studying a system under classical game theory.
Rationality: generally in game theory, and in order to derive Nash equilibria, perfect rationality is assumed. This roughly means that agents always act in their own interest.
Complete information: each agent knows everything about the game, including the rules, what the other players know, and what their strategies are.
Common knowledge: there is common knowledge of a fact p in a group of agents when: all the agents know p, they all know that all agents know p, they all know that they all know that all agents know p, and so on ad infinitum. There are interesting puzzles, like the blue-eyed islanders, that describe the effect common knowledge has on a problem.
In 1986 Kenneth Arrow expressed his reservations towards classical game theory.
In this paper, I want to disentangle some of the senses in which the hypothesis of rationality is used in economic theory. In particular, I want to stress that rationality is not a property of the individual alone, although it is usually presented that way. Rather, it gathers not only its force but also its very meaning from the social context in which it is embedded. It is most plausible under very ideal conditions. When these conditions cease to hold, the rationality assumptions become strained and possibly even self-contradictory.
If you find that Arrow is a bit harsh on classical game theory, how rational would you say your last purchases have been? Or, how much conscious thought and effort did you put into your meal today?
But Arrow is not so much worried about the assumption of rationality. He is worried about its implications. For an agent to be rational, you need to provide them with all the information necessary to make their decisions. This calls for omniscient players, which is bad in two ways: first, it creates impractical requirements for the storing and processing of information by the players. Second, game theory is no longer a theory of games, as you can replace all the players with a central ruler (and where is the fun in that?).
The value of information in this view is another point of interest. We have already discussed that possessing all the information is infeasible. But what about assuming players with limited knowledge? Would that help?
You may ask anyone involved in this area, but it suffices to say that optimization under uncertainty is tough. Yes, there are still the good old Nash equilibria. The problem is that there are infinitely many of them. Game theory does not provide you with arguments to evaluate them. So, even if you reach one, you shouldn't make such a big deal of it.
By this point you should suspect that AI applications are much more complicated than the examples classical game theory concerns itself with. Just to mention a few obstacles on the path of applying the Nash equilibrium approach in a robotic application: imagine being the captain of a team of robots playing football in RoboCup. How fast, strong, and intelligent are your players and your opponents? What strategies does the opponent team use? How should you reward your players? Is a goal the only reason for congratulating, or will applauding a good pass also improve the team’s behavior? Clearly, just being familiar with the rules of football will not win you the game.
If game theory has been raising debates for decades, if it has been founded on unrealistic assumptions and, for realistic tasks, if it offers complicated and little-understood solutions, why are we still going for it? Well, plainly enough, it’s the only thing we’ve got when it comes to group reasoning. If we actually understood how groups interact and cooperate to achieve their goals, psychology and politics would be much clearer.
Researchers in the area of multi-agent reinforcement learning either completely omit a discussion of the theoretical properties of their algorithms (and nevertheless often exhibit good results) or traditionally study the existence of Nash equilibria. The latter approach seems, to the eyes of a young researcher in the field, like a struggle to prove, under severe and unrealistic assumptions, the theoretical existence of solutions that, being infinite and of questionable value, will never be leveraged in practice.
The inception of evolutionary game theory is not recent, yet its far-reaching applications in the area of AI took a long time to be acknowledged. Originating in biology, it was introduced in 1973 by John Maynard Smith and George R. Price as an alternative to classical game theory. The alterations are so profound that we can talk about a whole new approach.
The subject of reasoning is no longer the player itself, but the population of players. Thus, probabilistic strategies are defined as the fraction of the population that makes a choice, not as the probability of one player choosing an action, as in classical game theory. This removes the necessity for rational, omniscient agents, as strategies evolve as patterns of behavior. The evolution process resembles Darwinian theory: players reproduce following the principles of survival of the fittest and random mutation, and the process can be elegantly described by a set of differential equations, termed the replicator dynamics.
This system has three important parts. A population represents the team of agents, and is characterized by a mixture of strategies. The game rules determine the payoffs of the population, which can also be seen as the fitness values of an evolutionary algorithm. Finally, the replicator rules describe how the population will evolve based on the fitness values and the mathematical properties of the evolution process.
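Here is a minimal sketch of the replicator rule on the prisoner's dilemma from earlier, where the population state is the mixture of "silent" and "snitch" players (forward-Euler integration with an arbitrarily chosen step size):

```python
import numpy as np

A = np.array([[-1.0, -3.0],     # payoff of "silent" against (silent, snitch)
              [ 0.0, -2.0]])    # payoff of "snitch" against (silent, snitch)

x = np.array([0.9, 0.1])        # population state: 90% silent, 10% snitch
dt = 0.01
for _ in range(2000):
    fitness = A @ x                         # game rules: payoff of each pure strategy
    average = x @ fitness                   # average payoff of the whole population
    x = x + dt * x * (fitness - average)    # replicator rule: above-average strategies grow
print(x)                                    # the "snitch" strategy takes over the population
```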
The notion and pursuit of Nash equilibria is replaced by evolutionarily stable strategies. A strategy bears this characterization if it is immune to invasion by a population of agents that follow another strategy, provided that the invading population is small. Thus, the behavior of the team can be studied with the well-understood machinery of stability of dynamical systems, such as Lyapunov stability.
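And here is a sketch of the Maynard Smith and Price stability condition, checked only for the pure strategies of the same game: a strategy is evolutionarily stable against a mutant if it does strictly better against itself, or ties and does strictly better against the mutant:

```python
def payoff(a, b):                # row player's payoff in the prisoner's dilemma
    return {("silent", "silent"): -1, ("silent", "snitch"): -3,
            ("snitch", "silent"):  0, ("snitch", "snitch"): -2}[(a, b)]

def is_ess(s, strategies):
    return all(payoff(s, s) > payoff(t, s)
               or (payoff(s, s) == payoff(t, s) and payoff(s, t) > payoff(t, t))
               for t in strategies if t != s)

strategies = ["silent", "snitch"]
print({s: is_ess(s, strategies) for s in strategies})
# {'silent': False, 'snitch': True} -- snitches can invade a silent population,
# but not the other way around, matching the replicator outcome above.
```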
The attainment of equilibrium requires a disequilibrium process. What does rational behavior mean in the presence of disequilibrium? Do individuals speculate on the equilibrating process? If they do, can the disequilibrium be regarded as, in some sense, a higher-order equilibrium process?
In the above passage, Arrow seems to be struggling to pinpoint the dynamic properties of a game. Could evolutionary game theory be an answer to his questions?
Quite recently, famous reinforcement learning algorithms, such as Q-learning, were studied under this new approach and significant conclusions were drawn. How this new tool is used ultimately depends on the application.
We can follow the forward approach, to derive the dynamic model of a learning algorithm. Or the inverse, where we start from some desired dynamic properties and engineer a learning algorithm that exhibits them.
We can use the replicator dynamics descriptively, to visualize convergence. Or prescriptively, to tune the algorithm in order to converge to optimal solutions. The latter can immensely reduce the complexity entailed in training deep networks for tough tasks that we face today, by removing the need for blind tuning.
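As a sketch of the descriptive route, two independent Boltzmann Q-learners play the repeated prisoner's dilemma below, and we track the probability each assigns to staying silent; this empirical trajectory is what a replicator-style model of the learning dynamics would be asked to predict. The learning rate, temperature and episode count are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
PAYOFF = np.array([[-1.0, -3.0],    # row: my action (0 = silent, 1 = snitch)
                   [ 0.0, -2.0]])   # column: the opponent's action

alpha, temperature = 0.05, 0.5
Q = [np.zeros(2), np.zeros(2)]      # one Q-vector per agent (a stateless, repeated game)

def boltzmann(q):
    """Softmax action probabilities at the given exploration temperature."""
    p = np.exp(q / temperature)
    return p / p.sum()

for episode in range(5001):
    probs = [boltzmann(Q[0]), boltzmann(Q[1])]
    actions = [rng.choice(2, p=probs[0]), rng.choice(2, p=probs[1])]
    rewards = [PAYOFF[actions[0], actions[1]], PAYOFF[actions[1], actions[0]]]
    for i in range(2):              # independent updates; no next state to bootstrap from
        Q[i][actions[i]] += alpha * (rewards[i] - Q[i][actions[i]])
    if episode % 1000 == 0:
        print(episode, [round(p[0], 2) for p in probs])   # probability of staying silent
```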
It’s not hard to trace when and why the paths of game theory and AI became convoluted. What’s harder, however, is to overlook the restrictions AI, and in particular multi-agent reinforcement learning, has to face when following classical game theoretic approaches.
Evolutionary game theory sounds promising, offering both theoretical tools and practical advantages, but we won’t really know until we try it. In this case, evolution will not arise naturally, but out of a conscious struggle of the research community for improvement. But isn’t that the essence of evolution?
It takes some effort to deviate from where inertia is pushing you, but reinforcement learning, despite general successes in AI, is in serious need of a lift.
Translated from: https://www.freecodecamp.org/news/game-theory-and-ai-where-it-all-started-and-where-it-should-all-stop-82f7bd53a3b4/