Contents

Introduction
Outline
Ordinary Engineering
Extraordinary Engineering
Long-term Goals, Near-term Research
Related Work
Conclusion
References
Introduction

There has been much recent discussion about AI risk, meaning specifically the potential pitfalls (both short-term and long-term) that AI with improved capabilities could create for society. Discussants include AI researchers such as Stuart Russell, Eric Horvitz, and Tom Dietterich, entrepreneurs such as Elon Musk and Bill Gates, and research institutes such as the Machine Intelligence Research Institute (MIRI) and Future of Humanity Institute (FHI); the director of the latter institute, Nick Bostrom, has even written a bestselling book on this topic. Finally, ten million dollars in funding has been earmarked for research on ensuring that AI will be safe and beneficial. Given this, I think it would be useful for AI researchers to discuss the nature and extent of risks that might be posed by increasingly capable AI systems, both short-term and long-term. As a PhD student in machine learning and artificial intelligence, I will use this essay to describe my own views on AI risk, in the hope of encouraging other researchers to detail their thoughts as well.
For the purposes of this essay, I will define “AI” to be technology that can carry out tasks with limited or no human guidance, “advanced AI” to be technology that performs substantially more complex and domain-general tasks than are possible today, and “highly capable AI” to be technology that can outperform humans in all or almost all domains. As the primary target audience of this essay is other researchers, I have used technical terms (e.g. weakly supervised learning, inverse reinforcement learning) whenever they were useful, though I have also tried to make the essay more generally accessible when possible.
Outline

I think it is important to distinguish between two questions. First, does artificial intelligence merit the same degree of engineering safety considerations as other technologies (such as bridges)? Second, does artificial intelligence merit additional precautions, beyond those that would be considered typical? I will argue that the answer is yes to the first, even in the short term, and that current engineering methodologies in the field of machine learning do not provide even a typical level of safety or robustness. Moreover, I will argue that the answer to the second question in the long term is likely also yes; namely, that there are important ways in which highly capable artificial intelligence could pose risks which are not addressed by typical engineering concerns.
The point of this essay is not to be alarmist; indeed, I think that AI is likely to be net-positive for humanity. Rather, the point of this essay is to encourage a discussion about the potential pitfalls posed by artificial intelligence, since I believe that research done now can mitigate many of these pitfalls. Without such a discussion, we are unlikely to understand which pitfalls are most important or likely, and thus unable to design effective research programs to prevent them.
A common objection to discussing risks posed by AI is that it seems somewhat early on to worry about such risks, and the discussion is likely to be more germane if we wait to have it until after the field of AI has advanced further. I think this objection is quite reasonable in the abstract; however, as I will argue below, I think we do have a reasonable understanding of at least some of the risks that AI might pose, that some of these will be realized even in the medium term, and that there are reasonable programs of research that can address these risks, which in many cases would also have the advantage of improving the usability of existing AI systems.
Ordinary Engineering

There are many issues related to AI safety that are just a matter of good engineering methodology. For instance, we would ideally like systems that are transparent, modular, robust, and work under well-understood assumptions. Unfortunately, machine learning as a field has not developed very good methodologies for obtaining any of these things, and so this is an important issue to remedy. In other words, I think we should put at least as much thought into building an AI as we do into building a bridge.
Just to be very clear, I do not think that machine learning researchers are bad engineers; looking at any of the open source tools such as Torch, Caffe, MLlib, and others makes it clear that many machine learning researchers are also good software engineers. Rather, I think that as a field our methodologies are not mature enough to address the specific engineering desiderata of statistical models (in contrast to the algorithms that create them). In particular, the statistical models obtained from machine learning algorithms tend to lack exactly these properties: they are opaque rather than transparent, monolithic rather than modular, fragile rather than robust, and dependent on assumptions that are neither explicit nor well understood.
That these issues plague machine learning systems is likely uncontroversial among machine learning researchers. However, in comparison to research focused on extending capabilities, very little is being done to address them. Research in this area therefore seems particularly impactful, especially given the desire to deploy machine learning systems in increasingly complex and safety-critical situations.
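To make the points about fragility and poorly understood assumptions concrete, here is a minimal sketch of my own (not from the essay; the data is synthetic and the use of scikit-learn is simply my choice of tooling): a classifier that looks reliable on held-out data from its training distribution loses much of its accuracy under a modest shift in the inputs, because the i.i.d. assumption it silently relies on no longer holds, the situation studied in the domain adaptation and transfer learning literature [2, 3].

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, shift=0.0):
    """Two Gaussian classes in 2-D; `shift` moves the inputs at test time."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(loc=y[:, None] * 2.0 + shift, scale=1.0, size=(n, 2))
    return X, y

X_train, y_train = sample(2000)
X_iid, y_iid = sample(2000)                     # same distribution as training
X_shifted, y_shifted = sample(2000, shift=1.5)  # inputs shifted, labels unchanged

model = LogisticRegression().fit(X_train, y_train)
print("accuracy on i.i.d. test data:   %.2f" % model.score(X_iid, y_iid))
print("accuracy on shifted test data:  %.2f" % model.score(X_shifted, y_shifted))
```

Nothing about the trained model itself signals that the second evaluation falls outside its operating assumptions, which is part of what makes such systems hard to deploy safely.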
Extraordinary Engineering

Does AI merit additional safety precautions, beyond those that are considered standard engineering practice in other fields? Here I am focusing only on the long-term impacts of advanced or highly capable AI systems.
My tentative answer is yes; there seem to be a few different ways in which AI could have bad effects, each of which seems individually unlikely but not implausible. Even if each of the risks identified so far is unlikely, (i) the total risk might be large, especially if there are additional unidentified risks, and (ii) the existence of multiple “near-misses” motivates closer investigation, as it may suggest some underlying principle that makes AI risk-laden. In the sequel I will focus on so-called “global catastrophic” risks, meaning risks that could affect a large fraction of the earth’s population in a material way. I have chosen to focus on these risks because I think there is an important difference between an AI system messing up in a way that harms a few people (which would be a legal liability but perhaps should not motivate a major effort in terms of precautions) and an AI system that could cause damage on a global scale. The latter would justify substantial precautions, and I want to make it clear that this is the bar I am setting for myself.
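To illustrate point (i) with purely hypothetical numbers (the probabilities and the independence assumption below are placeholders of mine, not estimates from the essay): several individually unlikely risks can still add up to a non-trivial total.

```python
# Hypothetical numbers for illustration only: ten independent failure modes,
# each judged to have only a 3% chance of ever materializing.
p_each, k = 0.03, 10
p_any = 1 - (1 - p_each) ** k  # probability that at least one occurs
print(f"probability that at least one risk materializes: {p_any:.2f}")  # ~0.26
```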
With that in place, below are a few ways in which advanced or highly capable AI could have specific global catastrophic risks.
Cyber-attacks. There are two trends which taken together make the prospect of AI-aided cyber-attacks seem worrisome. The first trend is simply the increasing prevalence of cyber-attacks; even this year we have seen Russia attack Ukraine, North Korea attack Sony, and China attack the U.S. Office of Personnel Management. Secondly, the “Internet of Things” means that an increasing number of physical devices will be connected to the internet. Assuming that software exists to autonomously control them, many internet-enabled devices such as cars could be hacked and then weaponized, leading to a decisive military advantage in a short span of time. Such an attack could be enacted by a small group of humans aided by AI technologies, which would make it hard to detect in advance. Unlike other weaponizable technology such as nuclear fission or synthetic biology, it would be very difficult to control the distribution of AI since it does not rely on any specific raw materials. Finally, note that even a team with relatively small computing resources could potentially “bootstrap” to much more computing power by first creating a botnet with which to do computations; to date, the largest botnet has spanned 30 million computers and several other botnets have exceeded 1 million.
Autonomous weapons. Beyond cyber-attacks, improved autonomous robotics technology combined with ubiquitous access to miniature UAVs (“drones”) could allow both terrorists and governments to wage a particularly pernicious form of remote warfare by creating weapons that are both cheap and hard to detect or defend against (due to their small size and high maneuverability). Beyond direct malicious intent, if autonomous weapons systems or other powerful autonomous systems malfunction then they could cause a large amount of damage.
Mis-optimization. A highly capable AI could acquire a large amount of power but pursue an overly narrow goal, and end up harming humans or human value while optimizing for this goal. This may seem implausible at face value, but as I will argue below, it is easier to improve AI capabilities than to improve AI values, making such a mishap possible in theory (a toy illustration of this dynamic appears after this list of risks).
Unemployment. It is already the case that increased automation is decreasing the number of available jobs, to the extent that some economists and policymakers are discussing what to do if the number of jobs is systematically smaller than the number of people seeking work. If AI systems allow a large number of jobs to be automated over a relatively short time period, then we may not have time to plan or implement policy solutions, and there could then be a large unemployment spike. In addition to the direct effects on the people who are unemployed, such a spike could also have indirect consequences by decreasing social stability on a global scale.
Opaque systems. It is also already the case that increasingly many tasks are being delegated to autonomous systems, from trades in financial markets to aggregation of information feeds. The opacity of these systems has led to issues such as the 2010 Flash Crash and will likely lead to larger issues in the future. In the long term, as AI systems become increasingly complex, humans may lose the ability to meaningfully understand or intervene in such systems, which could lead to a loss of sovereignty if autonomous systems are employed in executive-level functions (e.g. government, economy).
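Returning to the mis-optimization risk above, here is a toy sketch of my own (not from the essay; the objectives and the crude hill-climbing "agent" are invented purely for illustration): the optimizer is given a proxy objective that approximates what we care about only in a limited regime. With a small optimization budget it typically improves the true objective as a side effect; with a large budget it drives the proxy up and the true objective far down.

```python
import numpy as np

rng = np.random.default_rng(0)

IDEAL = 20.0  # the level of x that humans actually want (hypothetical)

def true_objective(x):
    """What we actually care about: keep x close to the ideal level."""
    return -(x - IDEAL) ** 2

def proxy_objective(x):
    """What the optimizer is told to maximize: a measurable quantity that
    tracks the true objective at low x but simply rewards 'more'."""
    return x

def optimize(objective, steps, step_size=0.5):
    """A crude hill climber; the number of steps stands in for capability."""
    x = 0.0
    for _ in range(steps):
        candidate = x + step_size * rng.normal()
        if objective(candidate) > objective(x):
            x = candidate
    return x

print(f"no optimization: true objective = {true_objective(0.0):.1f}")
for steps in (10, 100, 1000, 10000):
    x = optimize(proxy_objective, steps)
    print(f"steps={steps:6d}  proxy={proxy_objective(x):9.1f}  "
          f"true objective={true_objective(x):14.1f}")
```

The point of the sketch is only that increasing capability (here, the optimization budget) under a fixed, imperfect objective can be enough to turn a helpful system into a harmful one; nothing about the optimizer itself needs to change.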
Beyond these specific risks, it seems clear that, eventually, AI will be able to outperform humans in essentially every domain. At that point, it seems doubtful that humanity will continue to have direct causal influence over its future unless specific measures are put in place to ensure this. While I do not think this day will come soon, I think it is worth thinking now about how we might meaningfully control highly capable AI systems, and I also think that many of the risks posed above (as well as others that we haven’t thought of yet) will occur on a somewhat shorter time scale.
Let me end with some specific ways in which control of AI may be particularly difficult compared to other human-engineered systems:
In summary: there are several concrete global catastrophic risks posed by highly capable AI, and there are also several reasons to believe that highly capable AI would be difficult to control. Together, these suggest to me that the control of highly capable AI systems is an important problem posing unique research challenges.
Long-term Goals, Near-term Research

Above I presented an argument for why AI, in the long term, may require substantial precautionary efforts. Beyond this, I also believe that there is important research that can be done right now to reduce long-term AI risks. In this section I will elaborate on some specific research projects, though my list is not meant to be exhaustive.
The above constitute at least five concrete directions of research on which I think important progress can be made today, which would meaningfully improve the safety of advanced AI systems and which in many cases would likely have ancillary benefits in the short term, as well.
Related Work

At a high level, while I have implicitly provided a program of research above, there are other proposed research programs as well. Perhaps the earliest proposed program is from MIRI [6], which has focused on AI alignment problems that arise even in simplified settings (e.g. with unlimited computing power or easy-to-specify goals) in hopes of later generalizing to more complex settings. The Future of Life Institute (FLI) has also published a research priorities document [7, 8] with a broader focus, including non-technical topics such as regulation of autonomous weapons and economic shifts induced by AI-based technologies. I do not necessarily endorse either document, but think that both represent a big step in the right direction. Ideally, MIRI, FLI, and others will all justify why they think their problems are worth working on and we can let the best arguments and counterarguments rise to the top. This is already happening to some extent [9, 10, 11] but I would like to see more of it, especially from academics with expertise in machine learning and AI [12, 13].
In addition, several specific arguments I have advanced are similar to those already advanced by others. The issue of AI-driven unemployment has been studied by Brynjolfsson and McAfee [14], and is also discussed in the FLI research document. The problem of AI pursuing narrow goals has been elaborated through Bostrom’s “paperclipping argument” [15] as well as the orthogonality thesis [16], which states that beliefs and values are independent of each other. While I disagree with the orthogonality thesis in its strongest form, the arguments presented above for the difficulty of value learning can in many cases reach similar conclusions.
Omohundro [17] has argued that advanced agents would pursue certain instrumentally convergent drives under almost any value system, which is one way in which agent-like systems differ from systems without agency. Good [18] was the first to argue that AI capabilities could improve rapidly. Yudkowsky has argued that it would be easy for an AI to acquire power given few initial resources [19], though his example assumes the creation of advanced biotechnology.
Christiano has argued for the value of transparent AI systems, and proposed the “advisor games” framework as a potential operationalization of transparency [20].
Conclusion

To ensure the safety of AI systems, additional research is needed, both to meet ordinary short-term engineering desiderata and to develop the additional precautions specific to highly capable AI systems. In both cases, there are clear programs of research that can be undertaken today, which in many cases seem to be under-researched relative to their potential societal value. I therefore think that well-directed research towards improving the safety of AI systems is a worthwhile undertaking, with the additional benefit of motivating interesting new directions of research.
Acknowledgments
Thanks to Paul Christiano, Holden Karnofsky, Percy Liang, Luke Muehlhauser, Nick Beckstead, Nate Soares, and Howie Lempel for providing feedback on a draft of this essay.
References

[1] D. Sculley et al. Machine Learning: The High-Interest Credit Card of Technical Debt. 2014.
[2] Hal Daumé III and Daniel Marcu. Domain adaptation for statistical classifiers. Journal of Artificial Intelligence Research, pages 101–126, 2006.
[3] Sinno J. Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.
[4] Dimitris Bertsimas, David B. Brown, and Constantine Caramanis. Theory and applications of robust optimization. SIAM Review, 53(3):464–501, 2011.
[5] Andrew Ng and Stuart Russell. Algorithms for inverse reinforcement learning. In International Conference on Machine Learning, pages 663–670, 2000.
[6] Nate Soares and Benja Fallenstein. Aligning Superintelligence with Human Interests: A Technical Research Agenda. 2014.
[7] Stuart Russell, Daniel Dewey, and Max Tegmark. Research priorities for robust and beneficial artificial intelligence. 2015.
[8] Daniel Dewey, Stuart Russell, and Max Tegmark. A survey of research questions for robust and beneficial AI. 2015.
[9] Paul Christiano. The Steering Problem. 2015.
[10] Paul Christiano. Stable self-improvement as an AI safety problem. 2015.
[11] Luke Muehlhauser. How to study superintelligence strategy. 2014.
[12] Stuart Russell. Of Myths and Moonshine. 2014.
[13] Tom Dietterich and Eric Horvitz. Benefits and Risks of Artificial Intelligence. 2015.
[14] Erik Brynjolfsson and Andrew McAfee. The second machine age: work, progress, and prosperity in a time of brilliant technologies. WW Norton & Company, 2014.
[15] Nick Bostrom. Ethical Issues in Advanced Artificial Intelligence. Cognitive, Emotive and Ethical Aspects of Decision Making in Humans and in Artificial Intelligence, 2003.
[16] Nick Bostrom. “The superintelligent will: Motivation and instrumental rationality in advanced artificial agents.” Minds and Machines 22.2 (2012): 71-85.
[17] Stephen M. Omohundro. The Basic AI Drives. Frontiers in Artificial Intelligence and Applications (IOS Press), 2008.
[18] Irving J. Good. “Speculations concerning the first ultraintelligent machine.” Advances in computers 6.99 (1965): 31-83.
[19] Eliezer Yudkowsky. “Artificial intelligence as a positive and negative factor in global risk.” Global catastrophic risks 1 (2008): 303.
[20] Paul Christiano. Advisor Games. 2015.