Reading Notes on "Machine Learning" (1997) [CMU, T.M. Mitchell] - Chapter 2

/* ******* ******* ******* ******* ******* ******* ******* */ [*******] /* ******* ******* ******* ******* ******* ******* ******* */

Chapter 1 notes: [ http://blog.csdn.net/qiqiduan_077/article/details/50021499 ]

START_DATE: 2015-12-04, END_DATE: -.

/* ******* ******* ******* ******* ******* ******* ******* */ [*******] /* ******* ******* ******* ******* ******* ******* ******* */


This chapter mainly introduces two basic ML concepts (namely concept learning and the general-to-specific ordering) and two basic theoretical ML algorithms (namely FIND-S and CANDIDATE-ELIMINATION). For me personally, reading Chapter 2 was fairly painful and tedious; in fact I read it twice (the first time I skimmed and only got halfway through; the second time I was still only half-understanding it and had to start over from the beginning. In the end, perhaps this comes down to my own *lack of skill* and *lack of patience*).


1. Chapter 2: CONCEPT LEARNING AND THE GENERAL-TO-SPECIFIC ORDERING

A. What is "Learning"?

"The problem of inducing general functions from specific training examples is central to learning."

In fact, Chapter 1 already explored this hard-to-answer question, but not yet deeply or thoroughly enough. Chapter 2 therefore begins with a more detailed treatment of this important theme (put plainly, "Learning" means being able, at the very least, to generalize from seen cases to unseen ones).


B. How should the ML term "concept learning" be understood?

(1). "Concept learning: acquiring the definition of a general category given a sample of positive and negative training examples of the category."
(2). "Concept learning: inferring a boolean-valued function from training examples of its input and output."
(3). "Concept learning can be formulated as a problem of searching through a predefined space of potential hypotheses for the hypothesis that best fits the training examples."
(4). "In general, any concept learning task can be described by the set of instances over which the target function is defined, the target function, the set of candidate hypotheses considered by the learner, and the set of available training examples [NOTE that the concept or function to be learned is called the target concept]."

All four passages above explain and illustrate the notion of "concept learning" in an intuitive way. Personally I prefer the second one (because it reduces "Learning" to an extreme: a simple "Yes or No" decision problem, even if this arguably oversimplifies things). To illustrate the abstract notion of "Learning", perhaps the best example is *how an infant learns*.
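To make definition (2) concrete, here is a minimal, hypothetical sketch (the attribute names and the toy target concept are my own inventions, not from the book): a concept is just a boolean-valued function over instances, and the training examples are (instance, label) pairs drawn from it.

```python
# A concept as a boolean-valued function: c(x) is True or False.
# This target concept is invented purely for illustration.
def target_concept(x):
    sky, temp = x
    return sky == "Sunny" and temp == "Warm"

# Instances the learner gets to see, paired with their labels.
X = [("Sunny", "Warm"), ("Rainy", "Cold"), ("Sunny", "Cold")]
D = [(x, target_concept(x)) for x in X]
print(D)  # [(('Sunny', 'Warm'), True), (('Rainy', 'Cold'), False), (('Sunny', 'Cold'), False)]
```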


"Concept learning can be viewed as the task of searching through a large space of hypotheses implicitly defined by the hypothesis representation. The goal of this search is to find the hypothesis that best fits the training examples. It is important to note that by selecting a hypothesis representation, the designer of the learning algorithm implicitly defines the space of all hypotheses that the program can ever represent and therefore can ever learn. In many cases this search can be efficiently organized by taking advantage of a naturally occurring structure over the hypothesis space—a general-to-specific ordering of hypotheses."

Chapter 1 already briefly described the relationship between ML and optimization (which corresponds to "the task of searching" above). This chapter again demonstrates, from the angle of "concept learning", how universal and important "optimization" is. Which also *painfully* implies that, sooner or later, one has to pick up concepts and tools such as limits, differentiability, convergence, and calculus!


C."Inductive learning algorithms"背后的假设?

"The inductive learning hypothesis: any hypothesis found to approximate the target function well over a sufficiently large set of training examples will also approximate the target function well over other unobserved examples."

This assumption was already mentioned in Chapter 1 (the crucial assumption). In short, although the underlying assumption is remarkably simple, it has fundamental explanatory power.

"By taking advantage of this naturally occurring structure over the hypothesis space,we can design learning algorithms that exhaustively search even infinite hypothesis spaces without explicitly enumerating every hypothesis."

One clever aspect of a well-designed search algorithm is that it can quickly find optimal or near-optimal solutions without brute force (enumerating every hypothesis). Brute force, though simple and crude, does not necessarily perform worse than other methods on some problems; but for most problems in ML, brute force is helpless in the face of an enormous search space! The essence of search algorithm design lies in: where does this cleverness come from? Steepest descent is perhaps one of the most classic examples, as the sketch below shows.
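A minimal sketch of steepest descent on a one-dimensional quadratic (the function, step size, and tolerances are all illustrative assumptions): the gradient steers the search, so no enumeration of candidate solutions is ever needed.

```python
# Steepest (gradient) descent on f(x) = (x - 3)^2: follow the negative
# gradient instead of enumerating candidate values of x.
def steepest_descent(grad, x0, lr=0.1, tol=1e-8, max_iter=1000):
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:       # (near-)stationary point reached
            break
        x -= lr * g            # step in the direction of steepest descent
    return x

grad = lambda x: 2.0 * (x - 3.0)       # f'(x) for f(x) = (x - 3)^2
print(steepest_descent(grad, x0=0.0))  # ~3.0, the minimizer, found in under a hundred steps
```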


D. "More general than" vs. "more specific than"?

Two hypotheses can stand in a set relation to each other (for example, one containing the other), which is precisely the "more general than" vs. "more specific than" relation. But this relation does not necessarily hold between every pair of hypotheses (it is a partial ordering). Moreover, multiple hypotheses can form a hierarchy. The book gives an easy-to-follow example [see Chapter 2]. These facts guide the design of the theoretical ML algorithms below, as the sketch after this paragraph illustrates.
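A minimal sketch of this partial ordering, assuming the conjunctive attribute-vector representation Chapter 2 uses (with "?" meaning "any value"; the "match nothing" constraint ∅ is omitted here for brevity):

```python
def more_general_or_equal(h1, h2):
    """h1 >=_g h2: every instance that satisfies h2 also satisfies h1.
    Holds attribute-wise: h1's constraint is '?' or equals h2's."""
    return all(c1 == "?" or c1 == c2 for c1, c2 in zip(h1, h2))

h_a = ("Sunny", "?", "?")
h_b = ("Sunny", "Warm", "?")
h_c = ("Rainy", "?", "?")

print(more_general_or_equal(h_a, h_b))  # True:  h_a covers everything h_b covers
print(more_general_or_equal(h_b, h_a))  # False: the relation is not symmetric
print(more_general_or_equal(h_a, h_c))  # False, and ...
print(more_general_or_equal(h_c, h_a))  # ... False: some pairs are simply incomparable
```

The last two lines are the "partial" in partial ordering: neither hypothesis contains the other.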


E."FIND-S"算法很简单?

"One way is to begin with the most specific possible hypothesis in H, then generalize this hypothesis each time it fails to cover an observed positive training example. In fact, the FIND-S algorithm simply ignores every negative example! The search moves from hypothesis to hypothesis, searching from the most specific to progressively more general hypotheses along one chain of the partial ordering.FIND-S is guaranteed to output the most specific hypothesis within H that is consistent with the positive training examples."

The FIND-S algorithm [FINDING A MAXIMALLY SPECIFIC HYPOTHESIS] exists as a theoretical algorithm and has its own distinctive academic value. In fact, many ML algorithms of real practical value can be viewed as extensions of it [later chapters may touch on this]. Still, FIND-S really is simple, so simple that it has no direct practical value! A minimal sketch follows.
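Here is a sketch of FIND-S under the same attribute-vector representation, run on the four EnjoySport training examples from TABLE 2.1 of the book (the code itself is my own reconstruction, not the book's):

```python
def find_s(examples, n_attrs):
    """Start from the most specific hypothesis and minimally generalize
    it on each positive example; negative examples are simply ignored."""
    h = ["0"] * n_attrs                  # '0' stands in for the empty constraint
    for x, positive in examples:
        if not positive:
            continue                     # FIND-S ignores every negative example
        for i in range(n_attrs):
            if h[i] == "0":
                h[i] = x[i]              # first positive example: copy its values
            elif h[i] != x[i]:
                h[i] = "?"               # conflicting value: generalize to '?'
    return tuple(h)

# EnjoySport examples: Sky, AirTemp, Humidity, Wind, Water, Forecast.
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High",   "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High",   "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High",   "Strong", "Cool", "Change"), True),
]
print(find_s(examples, 6))  # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```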


"We wouldprefer a learning algorithm that could determine whether it had converged and, if not, at least characterize its uncertainty regarding the true identity of the target concept."

When analyzing and designing ML algorithms, people have their own preferences. My personal feeling: such preferences are mostly built on practice and accumulated experience.


F."CANDIDATE-ELIMINATION"算法更有趣?

(1). "Thekey idea in the candidate-elimination algorithm is to output a description of the set of all hypotheses consistent with the training examples."

(2). "The candidate-elimination algorithm providesa useful conceptual framework for introducing several fundamental issues in ML."

(3). "A hypothesis is consistent with the training examples if it correctly classifies these examples." "Theversion space, denoted VS(H, D), with respect to hypothesis space H and training examples D, is the subset of hypotheses from H consistent with the training examples in D."

(4). "The LAST-THEN-ELIMINATE algorithm first initializes the version space to contain all hypotheses in H, then eliminates any hypothesis found inconsistent with any training example." "It has many advantages, including the fact that it is guaranteed to output all hypotheses consistent with the training data. 

Unfortunately, it requires exhaustively enumerating all hypotheses in H—an unrealistic requirement for all but the most trivial hypothesis spaces."

(5). "The Candidate-Elimination algorithm employs a much more compact representation of the version space. The version space is represented by its most general and least general members. These members form general and specific boundary sets that delimit the version space within the partially ordered hypothesis space."

(6). "The CANDIDATE-ELIMINATION algorithmrepresents the version space by storing only its most general members (labeled G) and its most specific (labeled S)."

(7). "Version space representation theorem:the version space is precisely the set of hypotheses contained in G, plus those contained in S, plus those that lie between G and S in the partially ordered hypothesis space." "positive training examples may force the S boundary of the version space to become increasingly general. Negative training examples play the complimentary role of forcing the G boundary to become increasingly specific."

(8). "the S boundary of the version space forms a summary of the previously encountered positive examples that can be used to determine whether any given hypothesis is consistent with these examples. Any hypothesis more general than S will, by definition, cover any example that S covers and thus will cover any past positive example. In a dual fashion, the G boundary summarizes the information from previously encountered negative examples. Any hypothesis more specific than G is assured to be consistent with past negative examples."

Compared with FIND-S, the CANDIDATE-ELIMINATION algorithm (see TABLE 2.5 in the book) is more interesting and more substantial, mainly because it exploits the ordering structure among hypotheses to design a clever search mechanism that avoids exhaustive search. Passage (8) above explains exactly how [a very clever design!!! A pity I only appreciated it on the second reading; fortunately, appreciate it I did].
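The book's TABLE 2.5 updates S and G incrementally; the sketch below takes a shortcut that is only feasible on a toy problem: it first runs the LIST-THEN-ELIMINATE idea from (4) (enumerate H, keep the consistent hypotheses), then extracts the S and G boundary sets from the resulting version space. The attribute values and examples are invented for illustration, and the ∅ constraint is again omitted.

```python
from itertools import product

def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    return all(c1 == "?" or c1 == c2 for c1, c2 in zip(h1, h2))

# Tiny 2-attribute hypothesis space: each constraint is a value or '?'.
H = list(product(("Sunny", "Rainy", "?"), ("Warm", "Cold", "?")))

examples = [(("Sunny", "Warm"), True), (("Rainy", "Cold"), False)]

# LIST-THEN-ELIMINATE: keep exactly the hypotheses consistent with the data.
VS = [h for h in H if all(matches(h, x) == y for x, y in examples)]

# Boundary sets: G holds the maximally general members of VS,
# S the maximally specific ones; everything else lies in between.
G = [h for h in VS
     if not any(g != h and more_general_or_equal(g, h) for g in VS)]
S = [h for h in VS
     if not any(s != h and more_general_or_equal(h, s) for s in VS)]

print("VS:", VS)  # [('Sunny', 'Warm'), ('Sunny', '?'), ('?', 'Warm')]
print("G: ", G)   # [('Sunny', '?'), ('?', 'Warm')]
print("S: ", S)   # [('Sunny', 'Warm')]
```

This reproduces, in miniature, the version space representation theorem quoted in (7): VS here is exactly S, plus G, plus whatever lies between them.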


NOTE [takeaway]: hierarchical structures (for example, trees) often have distinctive practical value in activities such as organizing and synthesizing information.


G. How can training samples be designed and improved?

"In general, the optimal query strategy for a concept learner is to generate instances that satisfy exactly half the hypotheses in the current version space. Because those instances whose classification is most ambiguous are precisely the instances whose true classification would provide the most new information for refining the version space."

Under ideal conditions [ "(1) there are no errors in the training examples, and (2) there is some hypothesis in H that correctly describes the target concept" ], and specifically for the CANDIDATE-ELIMINATION algorithm, the internal ordering structure makes it possible to design and improve the training samples themselves. Although this ideal situation almost never holds *in the real world*, it matters a great deal *in the world of theory*! A sketch of the half-split query strategy follows.
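This is a minimal sketch of that query strategy, reusing the toy version space from the previous sketch (the candidate query instances are invented): score each candidate by how close it comes to satisfying exactly half of the current version space, then ask the trainer about the best one.

```python
def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

# Toy version space carried over from the previous sketch.
VS = [("Sunny", "Warm"), ("Sunny", "?"), ("?", "Warm")]

# Candidate instances the learner could ask the trainer to label.
candidates = [("Sunny", "Warm"), ("Sunny", "Cold"), ("Rainy", "Warm")]

def split_score(x, vs):
    """Distance from a perfect half/half split; 0 is the ideal query."""
    positives = sum(matches(h, x) for h in vs)
    return abs(positives - len(vs) / 2)

best = min(candidates, key=lambda x: split_score(x, VS))
print(best)  # ('Sunny', 'Cold'): satisfies 1 of 3 hypotheses, the closest to half
```

Whichever label the trainer gives, roughly half of the version space is eliminated, which is why the book says the target concept can be pinned down in about log2(|VS|) such queries.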


H. What is "INDUCTIVE BIAS"?

(1). "How does the size of this hypothesis space influence the ability of the algorithm to generalize to unobserved instances? How does the size of the hypothesis space influence the number of training examples that must be observed? These arefundamental questions for inductive inference in general."

(2). "A fundamental property of inductive inference: a leaner that makes no a priori assumptions regarding the identity of the target concept hasno rational basis for classifying any unseen instances."

(3). "One advantage of viewing inductive inference systems in terms of their inductive bias is that it provides a non-procedural means of characterizing their policy for generalizing beyond the observed data. A second advantage is that itallows comparison of different learners according to the strength of the inductive bias they employ."

(4). "Inductive learning algorithms are able to classify unseen examples only because of their implicit inductive bias for selecting one consistent hypothesis over anotherAn unbiased learner cannot make inductive leaps to classify unseen examples."

NOTE: fortunately, "Bias", a core ML concept, will come up again and again in later chapters, so I will set it aside for now. Honestly, when I first encountered this concept I was confused for a long, long time (thankfully, practice makes perfect!).


Closing remark: having painfully gnawed through the tough bone that is *Chapter 2*, I look forward all the more to *Chapter 3*!
