Source: http://www.deeplearningbook.org/contents/intro.html#pf6
Inventors have long dreamed of creating machines that think. Ancient Greek myths tell of intelligent objects, such as animated statues of human beings and tables that arrive full of food and drink when called.
When programmable computers were first conceived, people wondered whether they might become intelligent, over a hundred years before one was built (Lovelace, 1842). Today, artificial intelligence (AI) is a thriving field with many practical applications and active research topics. We look to intelligent software to automate routine labor, understand speech or images, make diagnoses in medicine and support basic scientific research.
In the early days of artificial intelligence, the field rapidly tackled and solved problems that are intellectually difficult for human beings but relatively straightforward for computers, problems that can be described by a list of formal, mathematical rules. The true challenge to artificial intelligence proved to be solving the tasks that are easy for people to perform but hard for people to describe formally: problems that we solve intuitively, that feel automatic, like recognizing spoken words or faces in images.
This book is about a solution to these more intuitive problems. This solution is to allow computers to learn from experience and understand the world in terms of a hierarchy of concepts, with each concept defined in terms of its relation to simpler concepts. By gathering knowledge from experience, this approach avoids the need for human operators to formally specify all of the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones. If we draw a graph showing how these concepts are built on top of each other, the graph is deep, with many layers. For this reason, we call this approach to AI deep learning.
Many of the early successes of AI took place in relatively sterile and formal environments and did not require computers to have much knowledge about the world. For example, IBM's Deep Blue chess-playing system defeated world champion Garry Kasparov in 1997 (Hsu, 2002). Chess is of course a very simple world, containing only sixty-four locations and thirty-two pieces that can move in only rigidly circumscribed ways. Devising a successful chess strategy is a tremendous accomplishment, but the challenge is not due to the difficulty of describing the set of chess pieces and allowable moves to the computer. Chess can be completely described by a very brief list of completely formal rules, easily provided ahead of time by the programmer.
Ironically, abstract and formal tasks that are among the most difficult mental undertakings for a human being are among the easiest for a computer. Computers have long been able to defeat even the best human chess player, but are only recently matching some of the abilities of average human beings to recognize objects or speech. A person's everyday life requires an immense amount of knowledge about the world. Much of this knowledge is subjective and intuitive, and therefore difficult to articulate in a formal way. Computers need to capture this same knowledge in order to behave in an intelligent way. One of the key challenges in artificial intelligence is how to get this informal knowledge into a computer.
Several artificial intelligence projects have sought to hard-code knowledge about the world in formal languages. A computer can reason about statements in these formal languages automatically using logical inference rules. This is known as the knowledge base approach to artificial intelligence. None of these projects has led to a major success. One of the most famous such projects is Cyc (Lenat and Guha, 1989). Cyc is an inference engine and a database of statements in a language called CycL. These statements are entered by a staff of human supervisors. It is an unwieldy process. People struggle to devise formal rules with enough complexity to accurately describe the world. For example, Cyc failed to understand a story about a person named Fred shaving in the morning (Linde, 1992). Its inference engine detected an inconsistency in the story: it knew that people do not have electrical parts, but because Fred was holding an electric razor, it believed the entity "FredWhileShaving" contained electrical parts. It therefore asked whether Fred was still a person while he was shaving.
The difficulties faced by systems relying on hard-coded knowledge suggest that AI systems need the ability to acquire their own knowledge, by extracting patterns from raw data. This capability is known as machine learning. The introduction of machine learning allowed computers to tackle problems involving knowledge of the real world and make decisions that appear subjective. A simple machine learning algorithm called logistic regression can determine whether to recommend cesarean delivery (Mor-Yosef et al., 1990). A simple machine learning algorithm called naive Bayes can separate legitimate e-mail from spam e-mail.
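To make the spam example concrete, here is a minimal sketch of a naive Bayes filter built with scikit-learn's CountVectorizer and MultinomialNB; the four toy e-mails and their labels are invented purely for illustration.

```python
# Minimal naive Bayes spam filter (illustrative only; the toy data is made up).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "win money now, claim your free prize",        # spam
    "meeting rescheduled to friday at noon",       # legitimate
    "cheap loans, act now, limited offer",         # spam
    "please review the attached project report",   # legitimate
]
labels = ["spam", "ham", "spam", "ham"]

# Represent each e-mail as a vector of word counts (its "features").
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(emails)

# Fit the naive Bayes model and classify a new message.
model = MultinomialNB()
model.fit(X, labels)
print(model.predict(vectorizer.transform(["claim your free prize now"])))
```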
The performance of these simple machine learning algorithms depends heavily on the representation of the data they are given. For example, when logistic regression is used to recommend cesarean delivery, the AI system does not examine the patient directly. Instead, the doctor tells the system several pieces of relevant information, such as the presence or absence of a uterine scar. Each piece of information included in the representation of the patient is known as a feature. Logistic regression learns how each of these features of the patient correlates with various outcomes. However, it cannot influence the way that the features are defined in any way. If logistic regression were given an MRI scan of the patient, rather than the doctor's formalized report, it would not be able to make useful predictions. Individual pixels in an MRI scan have negligible correlation with any complications that might occur during delivery.
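The idea of a hand-designed representation can be illustrated with a small, hypothetical sketch: the patient is described by a handful of features, and logistic regression learns a weight for each. The feature names, values, and labels below are made up for illustration and carry no clinical meaning.

```python
# Hypothetical patient features (a hand-designed representation) fed to
# logistic regression; data and feature names are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: [uterine_scar, breech_presentation, maternal_age]
X = np.array([
    [1, 0, 34],
    [0, 1, 29],
    [0, 0, 25],
    [1, 1, 41],
])
y = np.array([1, 1, 0, 1])  # 1 = cesarean recommended, 0 = not

model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict([[0, 0, 30]]))  # prediction from the same kind of features
print(model.coef_)                  # how each feature correlates with the outcome
```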
This dependence on representations is a general phenomenon that appears throughout computer science and even daily life. In computer science, operations such as searching a collection of data can proceed exponentially faster if the collection is structured and indexed intelligently. People can easily perform arithmetic on Arabic numerals, but find arithmetic on Roman numerals much more time-consuming. It is not surprising that the choice of representation has an enormous effect on the performance of machine learning algorithms. For a simple visual example, see Fig. 1.1.
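A rough sketch of the search example, using only the Python standard library: the same collection supports a slow linear membership test, or a much faster binary search once it is kept in a sorted representation.

```python
# Same data, two representations: membership tests scan element by element
# in an unsorted list, but take roughly O(log n) steps with binary search
# over a sorted list.
import bisect

records = list(range(1_000_000))  # already sorted

def contains_linear(seq, x):
    return x in seq  # scans element by element

def contains_indexed(sorted_seq, x):
    i = bisect.bisect_left(sorted_seq, x)  # binary search on the sorted representation
    return i < len(sorted_seq) and sorted_seq[i] == x

print(contains_linear(records, 999_999), contains_indexed(records, 999_999))
```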
Many artificial intelligence tasks can be solved by designing the right set of features to extract for that task, then providing these features to a simple machine learning algorithm. For example, a useful feature for speaker identification from sound is an estimate of the size of the speaker's vocal tract. It therefore gives a strong clue as to whether the speaker is a man, woman, or child.
However, for many tasks, it is difficult to know what features should be extracted. For example, suppose that we would like to write a program to detect cars in photographs. We know that cars have wheels, so we might like to use the presence of a wheel as a feature. Unfortunately, it is difficult to describe exactly what a wheel looks like in terms of pixel values. A wheel has a simple geometric shape, but its image may be complicated by shadows falling on the wheel, the sun glaring off the metal parts of the wheel, the fender of the car or an object in the foreground obscuring part of the wheel, and so on.
One solution to this problem is to use machine learning to discover not only the mapping from representation to output but also the representation itself. This approach is known as representation learning. Learned representations often result in much better performance than can be obtained with hand-designed representations. They also allow AI systems to rapidly adapt to new tasks, with minimal human intervention. A representation learning algorithm can discover a good set of features for a simple task in minutes, or for a complex task in hours to months. Manually designing features for a complex task requires a great deal of human time and effort; it can take decades for an entire community of researchers.
The quintessential example of a representation learning algorithm is the autoencoder. An autoencoder is the combination of an encoder function that converts the input data into a different representation, and a decoder function that converts the new representation back into the original format. Autoencoders are trained to preserve as much information as possible when an input is run through the encoder and then the decoder, but are also trained to make the new representation have various nice properties. Different kinds of autoencoders aim to achieve different kinds of properties.
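As a hedged illustration of the encoder/decoder pairing, the sketch below trains a linear autoencoder on random toy data with plain gradient descent on the reconstruction error. It is not any particular model from the literature, only the minimal structure: an encoder to a smaller representation and a decoder back to the original format.

```python
# A minimal linear autoencoder on toy data: encode 8-dimensional inputs into a
# 2-dimensional representation and decode back, minimizing reconstruction error.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                 # toy dataset

W_enc = rng.normal(scale=0.1, size=(8, 2))    # encoder weights
W_dec = rng.normal(scale=0.1, size=(2, 8))    # decoder weights
lr = 0.01

for _ in range(1000):
    H = X @ W_enc                             # encoder: the new representation
    X_hat = H @ W_dec                         # decoder: back to the original format
    err = X_hat - X                           # reconstruction error
    # Gradient descent on the mean squared reconstruction error.
    grad_dec = H.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print("reconstruction MSE:", np.mean((X @ W_enc @ W_dec - X) ** 2))
```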
When designing features or algorithms for learning features, our goal is usually to separate the factors of variation that explain the observed data. In this context, we use the word "factors" simply to refer to separate sources of influence; the factors are usually not combined by multiplication. Such factors are often not quantities that are directly observed. Instead, they may exist either as unobserved objects or unobserved forces in the physical world that affect observable quantities. They may also exist as constructs in the human mind that provide useful simplifying explanations or inferred causes of the observed data. They can be thought of as concepts or abstractions that help us make sense of the rich variability in the data.
When analyzing a speech recording, the factors of variation include the speaker's age, their sex, their accent and the words that they are speaking. When analyzing an image of a car, the factors of variation include the position of the car, its color, and the angle and brightness of the sun.
A major source of difficulty in many real-world artificial intelligence applications is that many of the factors of variation influence every single piece of data we are able to observe. The individual pixels in an image of a red car might be very close to black at night. The shape of the car's silhouette depends on the viewing angle. Most applications require us to disentangle the factors of variation and discard the ones that we do not care about.
Of course, it can be very difficult to extract such high-level, abstract features from raw data. Many of these factors of variation, such as a speaker's accent, can be identified only using sophisticated, nearly human-level understanding of the data. When it is nearly as difficult to obtain a representation as to solve the original problem, representation learning does not, at first glance, seem to help us.
Deep learning solves this central problem in representation learning by introducing representations that are expressed in terms of other, simpler representations. Deep learning allows the computer to build complex concepts out of simpler concepts. Fig. 1.2 shows how a deep learning system can represent the concept of an image of a person by combining simpler concepts, such as corners and contours, which are in turn defined in terms of edges.
The quintessential example of a deep learning model is the feedforward deep network, or multilayer perceptron (MLP). A multilayer perceptron is just a mathematical function mapping some set of input values to output values. The function is formed by composing many simpler functions. We can think of each application of a different mathematical function as providing a new representation of the input.
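A minimal sketch of this composition, with random (untrained) weights chosen only to show the structure f3(f2(f1(x))):

```python
# A tiny multilayer perceptron viewed as a composition of simple functions.
# Weights are random; this only illustrates the structure, not a trained model.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)
W3, b3 = rng.normal(size=(3, 1)), np.zeros(1)

def f1(x): return np.maximum(0, x @ W1 + b1)  # first representation of the input
def f2(h): return np.maximum(0, h @ W2 + b2)  # a representation of that representation
def f3(h): return h @ W3 + b3                 # output layer

x = rng.normal(size=(1, 4))
y = f3(f2(f1(x)))                             # the MLP is just the composition
print(y)
```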
The idea of learning the right representation for the data provides one perspective on deep learning. Another perspective on deep learning is that it allows the computer to learn a multi-step computer program. Each layer of the representation can be thought of as the state of the computer's memory after executing another set of instructions in parallel. Networks with greater depth can execute more instructions in sequence. Being able to execute instructions sequentially offers great power because later instructions can refer back to the results of earlier instructions. According to this view of deep learning, not all of the information in a layer's representation of the input necessarily encodes factors of variation that explain the input. The representation is also used to store state information that helps to execute a program that can make sense of the input. This state information could be analogous to a counter or pointer in a traditional computer program. It has nothing to do with the content of the input specifically, but it helps the model to organize its processing.
There are two main ways of measuring the depth of a model. The first view is based on the number of sequential instructions that must be executed to evaluate the architecture. We can think of this as the length of the longest path through a flow chart that describes how to compute each of the model's outputs given its inputs. Just as two equivalent computer programs will have different lengths depending on which language the program is written in, the same function may be drawn as a flowchart with different depths depending on which functions we allow to be used as individual steps in the flowchart. Fig. 1.3 illustrates how this choice of language can give two different measurements for the same architecture.
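A small illustration of how the choice of allowed primitive steps changes the measured depth: the model below is an ordinary logistic regression unit, and the two step counts are just one possible convention, not a definitive rule.

```python
# The same function counted in two "languages": with a coarse primitive the
# depth is 1, with elementary operations it is 3. Purely illustrative.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b, x = np.array([0.5, -0.2]), 0.1, np.array([1.0, 2.0])

# Language A: one step, "apply a logistic regression unit" -> depth 1
y_a = sigmoid(w @ x + b)

# Language B: elementary steps -> depth 3
z1 = w @ x           # step 1: multiply-accumulate
z2 = z1 + b          # step 2: add bias
y_b = sigmoid(z2)    # step 3: nonlinearity

assert np.isclose(y_a, y_b)
```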
Another approach, used by deep probabilistic models, regards the depth of a model as being not the depth of the computational graph but the depth of the graph describing how concepts are related to each other. In this case, the depth of the flowchart of the computations needed to compute the representation of each concept may be much deeper than the graph of the concepts themselves. This is because the system's understanding of the simpler concepts can be refined given information about the more complex concepts. For example, an AI system observing an image of a face with one eye in shadow may initially only see one eye. After detecting that a face is present, it can then infer that a second eye is probably present as well. In this case, the graph of concepts includes only two layers (a layer for eyes and a layer for faces), but the graph of computations includes 2n layers if we refine our estimate of each concept given the other n times.
Because it is not always clear which of these two views, the depth of the computational graph or the depth of the probabilistic modeling graph, is most relevant, and because different people choose different sets of smallest elements from which to construct their graphs, there is no single correct value for the depth of an architecture, just as there is no single correct value for the length of a computer program. Nor is there a consensus about how much depth a model requires to qualify as "deep." However, deep learning can safely be regarded as the study of models that involve a greater amount of composition of either learned functions or learned concepts than traditional machine learning does.
To summarize, deep learning, the subject of this book, is an approach to AI. Specifically, it is a type of machine learning, a technique that allows computer systems to improve with experience and data. According to the authors of this book, machine learning is the only viable approach to building AI systems that can operate in complicated, real-world environments. Deep learning is a particular kind of machine learning that achieves great power and flexibility by learning to represent the world as a nested hierarchy of concepts, with each concept defined in relation to simpler concepts, and more abstract representations computed in terms of less abstract ones. Fig. 1.4 illustrates the relationship between these different AI disciplines. Fig. 1.5 gives a high-level schematic of how each works.