The Science of Pattern Recognition
Achievements and Perspectives
Robert P.W. Duin1 and Elżbieta Pękalska2
1 ICT group, Faculty of Electr. Eng., Mathematics and Computer Science
Delft University of Technology, The Netherlands
2 School of Computer Science, University of Manchester, United Kingdom
Summary. Automatic pattern recognition is usually considered as an engineering area which focusses on the development and evaluation of systems that imitate or assist humans in their ability of recognizing patterns. It may, however, also be considered as a science that studies the faculty of human beings (and possibly other biological systems) to discover, distinguish, characterize patterns in their environment and accordingly identify new observations. The engineering approach to pattern recognition is in this view an attempt to build systems that simulate this phenomenon. By doing that, scientific understanding is gained of what is needed in order to recognize patterns, in general.
As in any science, understanding can be built from different, sometimes even opposite, viewpoints. We will therefore introduce the main approaches to the science of pattern recognition as two dichotomies of complementary scenarios. They give rise to four different schools, roughly defined under the terms of expert systems, neural networks, structural pattern recognition and statistical pattern recognition.
We will briefly describe what has been achieved by these schools, what is common and what is specific, which limitations are encountered and which perspectives arise for the future. Finally, we will focus on the challenges facing pattern recognition in the decades to come. These mainly concern weakening the assumptions of the models so that the corresponding procedures for learning and recognition become more widely applicable. In addition, new formalisms need to be developed.
1 Introduction
We are very familiar with the human ability of pattern recognition. Since our early years we have been able to recognize voices, faces, animals, fruits or inanimate objects. Before the speaking faculty is developed, an object like a ball is recognized, even if it barely resembles the balls seen before. So, apart from memory, the skills of abstraction and generalization are essential to find our way in the world. In later years we are able to deal with much more complex patterns that may not directly be based on sensory observations.
For example, we can observe the underlying theme in a discussion or subtle patterns in human relations. The latter may become apparent, e.g. only by listening to somebody’s complaints about his personal problems at work that again occur in a completely new job. Without a direct participation in the events, we are able to see both analogy and similarity in examples as complex as social interaction between people. Here, we learn to distinguish the pattern from just two examples.
The pattern recognition ability may also be found in other biological systems: the cat knows the way home, the dog recognizes his boss from the footsteps or the bee finds the delicious flower. In these examples a direct connection can be made to sensory experiences. Memory alone is insufficient; an important role is that of generalization from observations which are similar, although not identical to the previous ones. A scientific challenge is to find out how this may work.
Scientific questions may be approached by building models and, more explicitly, by creating simulators, i.e. artificial systems that roughly exhibit the same phenomenon as the object under study. Understanding will be gained while constructing such a system and evaluating it with respect to the real object. Such systems may be used to replace the original ones and may even improve some of their properties. On the other hand, they may also perform worse in other aspects. For instance, planes fly faster than birds but are far from being autonomous. We should realize, however, that what is studied in this case may not be the bird itself, but more importantly, the ability to fly.
Much can be learned about flying in an attempt to imitate the bird, but also by deviating from its exact behavior or appearance. By constructing fixed wings instead of freely movable ones, insight into how to fly grows.
Finally, there are engineering aspects that may gradually deviate from the original scientific question. These are concerned with how to fly for a long time, with heavy loads, or by making less noise, and slowly shift the point of attention to other domains of knowledge.
The above shows that a distinction can be made between the scientific study of pattern recognition as the ability to abstract and generalize from observations and the applied technical area of the design of artificial pattern recognition devices without neglecting the fact that they may highly profit from each other. Note that patterns can be distinguished on many levels, starting from simple characteristics of structural elements like strokes, through features of an individual towards a set of qualities in a group of individuals, to a composite of traits of concepts and their possible generalizations. A pattern may also denote a single individual as a representative for its population, model or concept. Pattern recognition deals, therefore, with patterns, regularities, characteristics or qualities that can be discussed on a low level of sensory measurements (such as pixels in an image) as well as on a high level of the derived and meaningful concepts (such as faces in images). In this work, we will focus on the scientific aspects, i.e. what we know about the way pattern recognition works and, especially, what can be learned from our attempts to build artificial recognition devices.
A number of authors have already discussed the science of pattern recognition based on their simulation and modeling attempts. One of the first, in the beginning of the sixties, was Sayre [64], who presented a philosophical study on perception, pattern recognition and classification. He made clear that classification is a task that can be fulfilled with some success, but recognition either happens or not. We can stimulate the recognition by focussing on some aspects of the question. Although we cannot set out to fully recognize an individual, we can at least start to classify objects on demand. The way Sayre distinguishes between recognition and classification is related to the two subfields discussed in traditional texts on pattern recognition, namely unsupervised and supervised learning. They fulfill two complementary tasks. They act as automatic tools in the hand of a scientist who sets out to find the regularities in nature.
Unsupervised learning (also related to exploratory analysis or cluster analysis) gives the scientist an automatic system to indicate the presence of yet unspecified patterns (regularities) in the observations. They have to be confirmed (verified) by him. Here, in the terms of Sayre, a pattern is recognized.
Supervised learning is an automatic system that verifies (confirms) the patterns described by the scientist based on a representation defined by him. This is done by an automatic classification followed by an evaluation.
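The two complementary tasks can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not a method taken from the text: the fruit weights, the class names and the helper functions `two_means` and `nearest_mean_classifier` are all hypothetical. The unsupervised part proposes two groups in unlabelled data, which the scientist still has to confirm; the supervised part verifies classes that the scientist has already named, by classification followed by evaluation.

```python
from statistics import mean

def two_means(values, iterations=20):
    """Unsupervised: propose two groups in 1-D data (a tiny k-means with k=2)."""
    low, high = min(values), max(values)            # initial group centres
    for _ in range(iterations):
        left = [v for v in values if abs(v - low) <= abs(v - high)]
        right = [v for v in values if abs(v - low) > abs(v - high)]
        if not right:                               # all values identical
            break
        low, high = mean(left), mean(right)
    return low, high

def nearest_mean_classifier(labelled):
    """Supervised: learn one mean per class named by the scientist."""
    by_class = {}
    for value, label in labelled:
        by_class.setdefault(label, []).append(value)
    means = {label: mean(vals) for label, vals in by_class.items()}
    return lambda v: min(means, key=lambda label: abs(v - means[label]))

# Hypothetical measurements (fruit weights in grams), without labels.
weights = [98, 102, 105, 110, 195, 200, 210]
centres = two_means(weights)        # two candidate patterns, still to be confirmed

# Hypothetical labelled examples: classification followed by evaluation.
classify = nearest_mean_classifier([(100, "plum"), (108, "plum"), (205, "apple")])
test_set = [(104, "plum"), (198, "apple")]
accuracy = sum(classify(v) == label for v, label in test_set) / len(test_set)
```

In Sayre's terms, `centres` only suggests that a pattern may be present, while `accuracy` measures how well an already defined classification task is fulfilled.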
In spite of Sayre’s discussion, the concepts of pattern recognition and classification are still frequently mixed up. In our discussion, classification is a significant component of the pattern recognition system, but unsupervised learning may also play a role there. Typically, such a system is first presented with a set of known objects, the training set, in some convenient representation. Learning relies on finding the data descriptions such that the system can correctly characterize, identify or classify novel examples. After appropriate preprocessing and adaptations, various mechanisms are employed to train the entire system well. Numerous models and techniques are used and their performances are evaluated and compared by suitable criteria. If the final goal is prediction, the findings are validated by applying the best model to unseen data. If the final goal is characterization, the findings may be validated by complexity of organization (relations between objects) as well as by interpretability of the results.
Fig. 1 shows the three main stages of pattern recognition systems: Representation, Generalization and Evaluation, and an intermediate stage of Adaptation [20]. The system is trained and evaluated by a set of examples, the Design Set. The components are:
• Design Set. It is used both for training and validating the system. Given the background knowledge, this set has to be chosen such that it is representative of the set of objects to be recognized by the trained system. There are various approaches to splitting it into suitable subsets for training, validation and testing. See e.g. [22, 32, 62, 77] for details.
• Representation. Real world objects have to be represented in a formal way in order to be analyzed and compared by mechanical means such as a computer. Moreover, the observations derived from the sensors or other formal representations have to be integrated with the existing, explicitly formulated knowledge either on the objects themselves or on the class they may belong to. The issue of representation is an essential aspect of pattern recognition and is different from classification. It largely influences the success of the stages to come.
• Adaptation. It is an intermediate stage between Representation and Generalization, in which representations, learning methodology or problem statement are adapted or extended in order to enhance the final recognition. This step may be neglected as being transparent, but its role is essential. It may reduce or simplify the representation, or it may enrich it by emphasizing particular aspects, e.g. by a nonlinear transformation of features that simplifies the next stage. Background knowledge may appropriately be (re)formulated and incorporated into a representation. If needed, additional representations may be considered to reflect other aspects of the problem. Exploratory data analysis (unsupervised learning) may be used to guide the choice of suitable learning strategies.
• Generalization or Inference. In this stage we learn a concept from a training set, the set of known and appropriately represented examples, in such a way that predictions can be made on some unknown properties of new examples. We either generalize towards a concept or infer a set of general rules that describe the qualities of the training data. The most common property is the class or pattern it belongs to, which is the above mentioned classification task.
• Evaluation. In this stage we estimate how our system performs on known training and validation data while training the entire system. If the results are unsatisfactory, then the previous steps have to be reconsidered.
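The components listed above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the system of Fig. 1: the synthetic "ball" and "bat" objects, the two-feature representation, the standardization step and the nearest-mean classifier are all hypothetical choices made here to keep the example short.

```python
import random
from statistics import mean, pstdev

def represent(obj):
    """Representation: map a raw object (here a dict) to a feature vector."""
    return [obj["width"], obj["height"]]

def fit_scaler(vectors):
    """Adaptation: standardize each feature using training statistics only."""
    columns = list(zip(*vectors))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]    # guard against zero spread
    return lambda x: [(v - m) / s for v, m, s in zip(x, mus, sigmas)]

def train_nearest_mean(vectors, labels):
    """Generalization: summarize each class by its mean vector."""
    means = {}
    for label in set(labels):
        members = [v for v, l in zip(vectors, labels) if l == label]
        means[label] = [mean(c) for c in zip(*members)]
    def predict(x):
        return min(means, key=lambda l: sum((a - b) ** 2
                                            for a, b in zip(x, means[l])))
    return predict

def evaluate(predict, vectors, labels):
    """Evaluation: fraction of misclassified validation objects."""
    return sum(predict(x) != y for x, y in zip(vectors, labels)) / len(labels)

# Design set: hypothetical labelled objects, split for training and validation.
rng = random.Random(0)
design_set = (
    [({"width": rng.gauss(2, 0.3), "height": rng.gauss(2, 0.3)}, "ball")
     for _ in range(20)]
    + [({"width": rng.gauss(6, 0.3), "height": rng.gauss(1, 0.3)}, "bat")
       for _ in range(20)]
)
rng.shuffle(design_set)
train, validation = design_set[:30], design_set[30:]

train_x = [represent(obj) for obj, _ in train]
scale = fit_scaler(train_x)
predict = train_nearest_mean([scale(x) for x in train_x],
                             [label for _, label in train])
error = evaluate(predict,
                 [scale(represent(obj)) for obj, _ in validation],
                 [label for _, label in validation])
```

If `error` were unsatisfactory, the earlier steps (the choice of features in `represent`, the adaptation in `fit_scaler`, or the classifier itself) would be the places to reconsider, which mirrors the feedback from Evaluation to the previous stages.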
Different disciplines emphasize or just exclusively study different parts of this system. For instance, perception and computer vision deal mainly with the representation aspects [21], while books on artificial neural networks [62], machine learning [4, 53] and pattern classification [15] are usually restricted to generalization. It should be noted that these and other studies with the words “pattern” and “recognition” in the title often almost entirely neglect the issue of representation. We think, however, that the main goal of the field of pattern recognition is to study generalization in relation to representation [20].
In the context of representations, and especially images, generalization has been thoroughly studied by Grenander [36]. What is very specific and worthwhile is that he deals with infinite representations (say, unsampled images), thereby avoiding the frequently returning discussions on dimensionality and directly focussing on a high, abstract level of pattern learning. We would like to mention two other scientists who present very general discussions on the pattern recognition system: Watanabe [75] and Goldfarb [31, 32]. They both emphasize the structural approach to pattern recognition that we will discuss later on. Here objects are represented in a form that focusses on their structure. A generalization over such structural representations is very difficult if one aims to learn the concept, i.e. the underlying, often implicit definition of a pattern class that is able to generate possible realizations. Goldfarb argues that traditionally used numeric representations are inadequate and that an entirely new, structural representation is necessary. We judge his research program as very ambitious, as he wants to learn the (generalized) structure of the concept from the structures of the examples. He thereby aims to make explicit what usually stays implicit. We admit that a way like his has to be followed if one ever wishes to reach more in concept learning than the ability to name the right class with a high probability, without having built a proper understanding.
In the next section we will discuss and relate well-known general scientific approaches to the specific field of pattern recognition. In particular, we would like to point out how these approaches differ due to fundamental differences in the scientific points of view from which they arise. As a consequence, they are often studied in different traditions based on different paradigms. We will try to clarify the underlying cause for the pattern recognition field. In the following sections we sketch some perspectives for pattern recognition and define a number of specific challenges.
2 Four Approaches to Pattern Recognition
In science, new knowledge is phrased in terms of existing knowledge. The starting point of this process is set by generally accepted evident views, or observations and facts that cannot be explained further. These foundations, however, are not the same for all researchers. Different types of approaches may be distinguished originating from different starting positions. Which perspective a particular researcher starts from is almost a matter of taste. As a consequence, different ‘schools’ may arise. The point of view, however, determines what we see. In other words, staying within a particular framework of thought we cannot achieve more than what is derived as a consequence of the corresponding assumptions and constraints. To create more complete and objective methods, we may try to integrate scientific results originating from different approaches into a single pattern recognition model. It is possible that confusion arises on how these results may be combined and where they essentially differ. But the combination of results of different approaches may also appear to be fruitful, not only for some applications, but also for the scientific understanding of the researcher that broadens the horizon of allowable starting points. This step towards a unified or integrated view is very important in science as only then a more complete understanding is gained or a whole theory is built.
Below we will describe four approaches to pattern recognition which arise from two different dichotomies of the starting points. Next, we will present some examples illustrating the difficulties of their possible interactions. This discussion is based on earlier publications [16, 17].
2.1 Platonic and Aristotelian Viewpoints
Two principally different approaches to almost any scientific field rely on the so-called Platonic and Aristotelian viewpoints. In a first attempt they may be understood as top-down and bottom-up ways of building knowledge. They are also related to deductive (or holistic) and inductive (or reductionistic) principles. These aspects will be discussed in Section 4.
The Platonic approach starts from generally accepted concepts and global ideas of the world. They constitute a coherent picture in which many details are undefined. The primary task of the Platonic researcher is to recognize in his observations the underlying concepts and ideas that are already accepted by him. Many theories of the creation of the universe or the world rely on this scenario. Examples are the drift of the continents and the extinction of the mammoths. These theories do not result from a reasoning based on observations, but merely from a more or less convincing global theory (depending on the listener!) that seems to extrapolate far beyond the hard facts. For the Platonic researcher, however, it is not an extrapolation, but an adaptation of previous formulations of the theory to new facts. That is the way this approach works: existing ideas that have been used for a long time are gradually adapted to new incoming observations. The change does not rely on an essential paradigm shift in the concept, but on finding better, more appropriate relations with the observed world in definitions and explanations. The essence of the theory has been constant for a long time. So, in practice the Platonic researcher starts from a theory which can be stratified into a number of hypotheses that can be tested. Observations are collected to test these hypotheses and, finally, if the results are positive, the theory is confirmed.
The observations are of primary interest in the Aristotelian approach. Scientific reasoning stays as close as possible to them. Speculation on large, global theories that go beyond the facts is avoided. The observations are always the foundation on which the researcher builds his knowledge. Based on them, patterns and regularities are detected or discovered, which are used to formulate some tentative hypotheses. These are further explored in order to arrive at general conclusions or theories. As such, the theories are not global, nor do they constitute high level descriptions. A famous guideline here is the so-called Occam’s razor principle that urges one to avoid theories that are more complex than strictly needed for explaining the observations. Arguments may arise, however, since the definition of complexity depends, e.g. on the mathematical formalism that is used.
The choice of a particular approach may be a matter of preference or determined by non-scientific grounds, such as upbringing. Nobody can judge what the basic truth is for somebody else. It may be held against the Aristotelians that they do not see the overall picture. The Platonic researchers, on the other hand, may be blamed for building castles in the air. Discussions between followers of these two approaches can be painful as well as fruitful. They may not see that their ground truths are different, leading to pointless debates. What is more important is the fact that they may become inspired by each other’s views. One may finally see real world examples of his concepts, while the other may embrace a concept that summarizes, or constitutes an abstraction of his observations.
2.2 Internal and External Observations
In the contemporary view science is ‘the observation, identification, description, experimental investigation, and theoretical explanation of phenomena’ or ‘any system of knowledge that is concerned with the physical world and its phenomena and that entails unbiased observations and systematic experimentation’. So, the aspect of observation that leads to a possible formation of a concept or theory is very important. Consequently, the research topic of the science of pattern recognition, which aims at the generalization from observations for knowledge building, is indeed scientific. Science is in the end a brief explanation summarizing the observations achieved through abstraction and their generalization.
Such an explanation may primarily be observed by the researcher in his own thinking. Pattern recognition research can thereby be performed by introspection. The researcher inspects himself how he generalizes from observations. The basis of this generalization is constituted by the primary observations. This may be an entire object (‘I just see that it is an apple’) or its attributes (‘it is an apple because of its color and shape’). We can also observe pattern recognition in action by observing other human beings (or animals) while they perform a pattern recognition task, e.g. when they recognize an apple. Now the researcher tries to find out by experiments and measurements how the subject decides for an apple on the basis of the stimuli presented to the senses. He thereby builds a model of the subject, from senses to decision making.
Both approaches result in a model. In the external approach, however, the senses may be included in the model. In the internal approach, this is either not possible or only very partially. We are usually not aware of what happens in our senses. Introspection thereby starts from what they offer to our thinking (and reasoning). As a consequence, models based on the internal approach have to be externally equipped with (artificial) senses, i.e. with sensors.
2.3 The Four Approaches
The following four approaches can be distinguished by combining the two dichotomies presented above:
(1) Introspection by a Platonic viewpoint: object modeling.
(2) Introspection by an Aristotelian viewpoint: generalization.
(3) Extrospection by an Aristotelian viewpoint: system modeling.
(4) Extrospection by a Platonic viewpoint: concept modeling.
These four approaches will now be discussed separately. We will identify some known procedures and techniques that may be related to these. See also Fig. 2.
Object modeling. This is based on introspection from a Platonic viewpoint. The researcher thereby starts from global ideas on how pattern recognition systems may work and tries to verify them in his own thinking and reasoning. He thereby may find, for instance, that particular color and shape descriptions of an object are sufficient for him to classify it as an apple. More generally, he may discover that he uses particular reasoning rules operating on a fixed set of possible observations. The so-called syntactic and structural approaches to pattern recognition [26] thereby belong to this area, as well as the case-based reasoning [3]. There are two important problems in this domain: how to constitute the general concept of a class from individual object descriptions and how to connect particular human qualitative observations such as ‘sharp edge’ or ‘egg shaped’ with physical sensor measurements.
Generalization. Let us leave the Platonic viewpoint and consider a researcher who starts from observations, but still relies on introspection. He wonders what he should do with just a set of observations without any framework. An important point is the nature of observations. Qualitative observations such as ‘round’, ‘egg-shaped’ or ‘gold colored’ can be judged as recognitions in themselves based on low-level outcomes of senses. It is difficult to neglect them and to access the outcomes of senses directly. One possibility for him is to use artificial senses, i.e. sensors, which will produce quantitative descriptions. The next problem, however, is how to generalize from such numerical outcomes. The physiological process is internally inaccessible. A researcher who wonders how he himself generalizes from low level observations given by numbers may rely on statistics. This approach thereby includes the area of statistical pattern recognition. If we consider low-level inputs that are not numerical, but expressed in attributed observations as ‘red, egg-shaped’, then the generalization may be based on logical or grammatical inference. However, as soon as the structure of objects or attributes is not generated from the observations, but derived (postulated) from a formal global description of the application knowledge, e.g. by using graph matching, the approach is effectively top-down and thereby starts from object or concept modeling.
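Generalization from attributed (non-numerical) observations by logical inference can be illustrated with a very small sketch. It is an assumption made for illustration, in the spirit of the Find-S procedure from the machine learning literature, and not a method described in the text: the attribute names, the example objects and the functions `generalize` and `matches` are hypothetical. The rule is built bottom-up, keeping only the attribute values shared by all observed positive examples.

```python
def generalize(positive_examples):
    """Keep only the attribute values shared by every positive example."""
    hypothesis = dict(positive_examples[0])
    for example in positive_examples[1:]:
        for attribute, value in list(hypothesis.items()):
            if example.get(attribute) != value:
                hypothesis[attribute] = "?"      # any value is allowed
    return hypothesis

def matches(hypothesis, example):
    """Check whether a new observation satisfies the learned rule."""
    return all(value == "?" or example.get(attribute) == value
               for attribute, value in hypothesis.items())

# Hypothetical attributed observations of one class.
apples = [
    {"color": "red", "shape": "round", "size": "small"},
    {"color": "red", "shape": "round", "size": "large"},
]
rule = generalize(apples)
is_apple = matches(rule, {"color": "red", "shape": "round", "size": "medium"})
is_egg_apple = matches(rule, {"color": "white", "shape": "egg-shaped", "size": "small"})
```

Here the structure of the rule is generated from the observations themselves. If the admissible attributes and their relations were instead postulated from a global description of the application, the procedure would become top-down, as noted above.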
System modeling. We now leave the internal platform and concentrate on research that is based on the external study of the pattern recognition abilities of humans and animals or their brains and senses. If this is done in a bottom-up way, the Aristotelian approach, then we are in the area of low level modeling of senses, nerves and possibly brains. These models are based on the physical and physiological knowledge of cells and the proteins and minerals that constitute them. Senses themselves usually do not directly generalize from observations. They may be constructed, however, in such a way that this process is strongly favored on a higher level. For instance, the way the eye (and the retina, in particular) is constructed is advantageous for the detection of edges and movements as well as for finding interesting details in a global, overall picture. The area of vision thereby profits from this approach. Researchers study how nerves process the signals they receive from the senses on a level close to the brain. Somehow this is combined towards a generalization of what is observed by the senses. Models of systems of multiple nerves are called neural networks. They turned out to have a good generalization ability and are thereby also used in technical pattern recognition applications in which the physiological origin is not relevant [4, 62].
系统建模:我们现在走出内在的体系方法,集中研究人和动物或他们头脑和感官产生模式识别能力的外在学习方法。如果采用自底向上的方法,即亚里士多德方法,我们便是在感官、神经和头脑这样低层次上建模的领域里。这些模型是基于细胞和蛋白质和组成它们的矿物质的物理和生理知识。感官本身通常不能直接从观察中得到结果,可能要进行构建,然而这种处理过程总是在高层次进行。例如,眼睛(确切地说是视网膜)辨认事物方法是通过边缘和运动检测,从全局(整个画面)来发现感兴趣的细节信息。视觉领域的研究便是得益于这种方法,通过研究神经如何处理从感应器官收到的信号,接近于对人脑的研究。多个神经的系统建模称为神经网络,他们有很好的推广能力,也被用在了与生理学无关的模式识别应用技术中[4,62]。
Concept modeling. In the external platform, the observations in the starting point are replaced by ideas and concepts. Here one still tries to externally model the given pattern recognition systems, but now in a top-down manner.
概念建模:属于外在的体系方法,以理论和概念为出发点,而不是所观察到的事物。这里仍然以外在建模方式来建立模式识别系统,但是这里是从顶向下的方法。
An example is the field of expert systems: by interviewing experts in a particular pattern recognition task, one attempts to find out what rules they use and in what way they use observations. Belief networks and probabilistic networks also belong to this area as far as they are defined by experts and not learned from observations. This approach can be distinguished from the above system modeling by the fact that it in no way attempts to model a physical or physiological system realistically. The building blocks are the ideas, concepts and rules, as they live in the mind of the researcher. They are adapted to the application by external inspection of an expert, e.g. by interviewing him. If this is done by the researcher internally by introspection, we have closed the circle and are back to what we have called object modeling, as the individual observations are our internal starting point. We admit that the difference between the two Platonic approaches is minor here (in contrast to the physiological level) as we can also try to interview ourselves to create an objective (!) model of our own concept definitions.
专家系统是这方面的例子:通过在特定的模式识别任务中研究专家的方法,研究他们所用的规则,研究他们怎么运用观察到的事物。信心网络和概率网络被专家设定,而不是从观察事物中得到,它们也属于概念建模方法。概念建模和系统建模的区别在于概念建模不会模仿现实事物而去建立物理或生理模型系统。概念建模建立在研究者头脑中的方法、概念和原则。通过外在地考察某个专家(如跟专家交谈)来建立应用系统。如果内在的自省式研究者用了这个方法,则我们接近形成了一个循环,回到前面所讲的对象建模,即以个体的观察事物为建模出发点。我们承认两个柏拉图方法的区别在这里区别是很小的(相对于生理学的层次),即我们也可以尝试通过内省来建立我们自己定义的概念的一个对象模型。
2.4 Examples of Interaction
2.4 四种方法交叉运用的例子
The four presented approaches are four ways to study the science of pattern recognition. The resulting knowledge is valid for those who share the same starting point. If the results are used for building artificial pattern recognition devices, then there is, of course, no reason to restrict oneself to a particular approach. Any model that works well may be considered. There are, however, certain difficulties in combining different approaches. These may be caused by differences in culture, assumptions or targets. We will present two examples, one for each of the two dichotomies.
上面介绍的四种方法是研究模式识别科学的四种途径。根据不同的出发点区别出了这四种方法。如果要建立一个人工模式识别设备,是不一定限制一定要用某一种方法的,任何方法模型都可能可以被用上。然而,困难的是怎么去综合运用这些方法,可能是因为不同的情况、假设或目标需要这样地去做。对这四种方法的两个大类我们将举两个例子来说明。
Artificial neural networks constitute an alternative technique to be used for generalization within the area of statistical pattern recognition. It took, however, almost ten years after their introduction around 1985 before neural networks were fully acknowledged in this field. In that period, the neural network community suffered from a lack of knowledge of the competing classification procedures. One of the basic misunderstandings in the pattern recognition field was caused by its dominating paradigm stating that learning systems should never be larger than strictly necessary, following the Occam’s razor principle. It could not be understood how heavily oversized systems such as neural networks would ever be able to generalize without adapting to peculiarities in the data (the so-called overtraining). At the same time, it was evident in the neural network community that the larger the neural network, the larger its flexibility, following the analogy that a brain with many neurons would perform better in learning than a brain with only a few. When this contradiction was finally solved (an example of Kuhn’s paradigm shifts [48]), the area of statistical pattern recognition was enriched with a new set of tools. Moreover, some principles were formulated towards the understanding of pattern recognition that otherwise would have been found only with great difficulty.
人工神经网络技术在统计模式识别领域中的推广能力上成为了一个替代技术。然而这个技术从1985年被介绍出来到被完全接受已花了十年时间。在这十年里,研究神经网络的人因缺少竞争分类方法知识而受挫折。在模式识别领域中有一个让引起误解的主流观点:学习系统不要比限定的需求更复杂,需遵从奥克母剃刀原则。这个原则让人无法明白系统要做到多大才能不用去适配数据的特殊点(所谓过学习)就能具有推广能力,如神经网络系统的大小。与此同时,在神经网络中可以被证明的是神经网络越大则适应性越强,这是依据这样的推理:具有较多神经的脑子比具有较少神经的脑子学习能力要更好。当这个矛盾最后被解决后(这是一个库恩范式转移的例子),统计模式识别领域才被广泛应用了起来。此外,已有一些原理可以被用来形式化地理解模式识别,其它的原理则很难被理解。
In general, it may be expected that the internal approach profits from the results in the external world. It is possible that thinking, the way we generalize from observations, changes after it is established how this works in nature. For instance, once we have learned how a specific expert solves his problems, this may be used more generally and thereby becomes a rule in structural pattern recognition. The external platform may thereby be used to enrich the internal one.
通常情况下,可能会被认为内在的方法是得益于对外部世界的推理结论。可能会这样认为:我们从观察中推广得到的方法在实际运用时会发生改变。例如,一旦我们知道一个专家怎么去解决他的问题,于是可以把他的方法更一般化,结果形成结构模式识别中的一条规则。外在的方法可以被用于完善内在方法。
A direct formal fertilization between the Platonic and Aristotelian approaches is more difficult to achieve. Individual researchers may build some understanding from studying each other’s insights, and thereby become mutually inspired. The Platonist may become aware of realizations of his ideas and concepts. The Aristotelian may see some possible generalizations of the observations he collected. It is, however, still one of the major challenges in science to formalize this process.
把柏拉图和亚里士多德方法从形式上直接联系起来是很难达到的,但是研究者个人可以从互相交流中得到一些理解,并互相得到启发。柏拉图派的人可能知道他的理论和概念的实现方法。亚里士多德派的人可能从他收集到的观察中得到推广理论,然而,在科学上形式化这个过程还是一个主要的挑战性工作。
How should existing knowledge be formulated such that it can be enriched by new observations? Everybody who tries to do this directly encounters the problem that observations may be used to reduce uncertainty (e.g. by the parameter estimation in a model), but that it is very difficult to formalize uncertainty in existing knowledge. Here we encounter a fundamental ‘paradox’ for a researcher summarizing his findings after years of observations and studies: he has found some answers, but almost always he has also generated more new questions. Growing knowledge comes with more questions. In any formal system, however, in which we manage to incorporate uncertainty (which is already very difficult), this uncertainty will be reduced after incorporating some observations. We need automatic hypothesis generation in order to generate new questions. How should the most likely ones be determined? We need to look from different perspectives in order to stimulate the creative process and bring sufficient inspiration and novelty to hypothesis generation. This is necessary in order to make a step towards building a complete theory. This, however, results in the computational complexity mentioned in the literature [60] when the Platonic structural approach to pattern recognition has to be integrated with the Aristotelian statistical approach.
怎么把现有的知识形式化,这样可以通过新观察的到数据来加以完善?每位想直接这样做的人都碰到这样的问题:通过观察数据可能可以减少不可靠性(例如在建模中的参数估计方法),但是在现有的知识体系中进行形式化非确定性问题是非常困难的。这里我们碰到一个研究者在总结他多年观察和研究得到的一个基本“缪论”:他找到了一些答案,但是几乎同时他又总是碰到新问题,得到的知识越多,产生的疑问也越多。然而,在任何正式系统中,我们可以设法引入不可靠性(但这是非常困难的),这个不可靠性在加入一些观察数据后会得到减少。我们需要一个自动化的假设产生方法来发现新问题。怎么去决定哪个最好呢?我需要从不同的角度看问题,以此来模拟这样的创造性处理过程并产生富有灵感和新奇的假设。这对于逐步建立一个完整的理论是需要的。然而,在附录[60]所引文章中提到:当柏拉图式的结构模式识别方法要集成亚里士多德统计模式识别方法时要考虑到计算复杂度问题。
The same problem may also be phrased differently: how can we express the uncertainty in higher level knowledge in such a way that it may be changed (upgraded) by low level observations? Knowledge is very often structural and thereby has a qualitative nature. On the lowest level, however, observations are often treated as quantities, certainly in automatic systems equipped with physical sensors. And here the Platonic–Aristotelian polarity meets the internal–external polarity: by crossing the border between concepts and observations we also encounter the border between qualitative symbolic descriptions and quantitative measurements.
同样的问题也可以以这样的不同方式来表示:怎样用更高层次的知识来表示不确定性并且可以通过低层次的观察来改变(或升级)?知识通常是具有结构的形式和对自然界定性的描述。然而,在最低层次配备了物理感应器的自动化系统中,观察数据通常是被量化了的数据。这里柏拉图---亚里士多德方法两个极端对应内在---外在两个极端:从概念方法到观察数据方法之间进行转变也相应会碰到从定性符号描述到定量测定的转变
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
3 Achievements
3.成就
In this section we will sketch in broad terms the state of the art in building systems for generalization and recognition. In practical applications it is not the primary goal to study the way of bridging the gap between observations and concepts in a scientific perspective. Still, we can learn a lot from the heuristic solutions that are created to assist the human analyst performing a recognition task. There are many systems that directly try to imitate the decision making process of a human expert, such as an operator guarding a chemical process, an inspector supervising the quality of industrial production or a medical doctor deriving a diagnosis from a series of medical tests. On the basis of systematic interviews the decision making can become explicit and imitated by a computer program: an expert system [54]. The possibility to improve such a system by learning from examples is usually very limited and restricted to logical inference that makes the rules as general as possible, and the estimation of the thresholds on observations. The latter is needed as the human expert is not always able to define exactly what he means, e.g. by ‘an unusually high temperature’.
在这节里我们将主要介绍推广和识别方面的系统构建的发展状况。在实际应用中,并不是主要研究在科学方法上如何把观察数据和概念联系起来,但是我们仍然可以从分析人类识别过程中得到很多启发。有很多系统直接地模仿专家的决策过程,如确保化学处理的正确操作,工业产品质量的监督,从一系列医学检查报告的病情诊断。分析这些决策的目标是要能够被计算机程序清楚表达和模仿:专家系统。从用例学习中来提高这个系统的可靠性,通常受限于和取决于运用逻辑推理,但逻辑推理能够使得规则尽可能地具有推广性,还有对观察数据的阀值估计也是很难的,专家所表达的意思总是无法被精确表示,例如“一个非常高的温度”。
In order to relate knowledge to observations, which are measurements in automatic systems, it is often necessary to relate knowledge uncertainty to imprecise, noisy, or generally invalid measurements. Several frameworks have been developed to this end, e.g. fuzzy systems [74], Bayesian belief networks [42] and reasoning under uncertainty [82]. Characteristic of these approaches is that the given knowledge is already structured and needs explicitly defined parameters of uncertainty. New observations may adapt these parameters by relating them to observational frequencies. The knowledge structure is not learned; it has to be given and is hard to modify. An essential problem is that the variability of the external observations may be probabilistic, but the uncertainty in knowledge is based on ‘belief’ or ‘fuzzy’ definitions. Combining them in a single mathematical framework is disputable [39].
在观察数据方面,这些数据从自动系统中测量得到,这里经常要考虑到数据不可靠性,如数据不精确,存在噪音,或者测量方法有问题。已有几个较为系统的方法解决了这些问题,如模糊理论,贝叶斯置信网络和不确定性推理,这些方法的特点是这些理论已自成体系,需要明确定义与不确定性有关的参数,通过观察发现的频率可以解决这些参数问题。这种识别方法是不需要学习的,方法一经确定下来就难以被修改。一个本质的问题是外在观察的变数是随机的,但是观察数据不可靠性却是用“概率”或“模糊”来表示,把它们组合到一个单一的数学框架下来实现是不太可能的。
In the above approaches either the general knowledge or the concept underlying a class of observations is directly modeled. In structural pattern recognition [26, 65] the starting point is the description of the structure of a single object. This can be done in several ways, e.g. by strings, contour descriptions, time sequences or other order-dependent data. Grammars that are inferred from a collection of strings are the basis of a syntactical approach to pattern recognition [26]. The incorporation of probabilities, e.g. needed for modeling the measurement noise, is not straightforward. Another possibility is the use of graphs. This is in fact already a reduction since objects are decomposed into highlights or landmarks, possibly given by attributes and also their relations, which may be attributed as well. Inferring a language from graphs is already much more difficult than from strings. Consequently, the generalization from a set of objects to a class is usually done by finding typical examples, prototypes, followed by graph matching [5, 78] for classifying new objects.
上面所谈到的方法,不管是一般化的知识还是对所观察到的数据进行概念性定义都是直接建模形式。而结构模式识别则起始于对单一对象的结构描述。这些描述形式有句子,轮廓描述,时序或其它有序的数据,从收集到的句子中进行推理得到的文法是运用上下文进行模式识别的基础。运用概率方法(因为测量中噪音的存在)并不是直接的建模方法。另一个可能用到的方法是运用图像比较,但自从采用把识别对象分解成各个特征或有意义的区域后这个方法实际上已较少被采用,而是采用图像的特征和特征之间的关系,这些关系也能一样地被用于识别中。从图像中发现语言种类比从句子中发现语言种类要困难得多。因此,从一个类别的一组对象进行推广通常被用来识别典型对象或原型,对于新的对象则采用图像匹配的方法来识别。
Generalization in structural pattern recognition is not straightforward. It is often based on the comparison of object descriptions using the entire available training set (the nearest neighbor rule) or a selected subset (the nearest prototype rule). Application knowledge is needed for defining the representation (strings, graphs) as well as for the dissimilarity measure to perform graph matching [51, 7]. The generalization may rely on an analysis of the matrix of dissimilarities, used to determine prototypes. More advanced techniques using the dissimilarity matrix will be described later.
结构模式识别中的推广无法直接进行,经常在可用整个训练集(采用最邻近法则)或其中一个子集(采用最邻近原型法则)采用对象描述比较方法。应用技术中需要确定表示方法(句子,图像),例如图像匹配时怎么确定图像间的不同点。推广可能依赖于对相异矩阵的分析,以此来确定原型。对于相异矩阵的更深层次应用将在后面介绍。
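The nearest prototype rule for string descriptions can be illustrated with a minimal sketch. The choices below are assumptions made for illustration and do not come from the text: the edit (Levenshtein) distance serves as the dissimilarity measure, the prototype of a class is taken to be its medoid (the training string with the smallest summed dissimilarity to the strings of its own class), and the toy strings and function names are hypothetical.

```python
def edit_distance(s, t):
    """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(t) + 1))          # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]                          # i deletions turn s[:i] into the empty string
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1,                # delete cs
                            curr[j - 1] + 1,            # insert ct
                            prev[j - 1] + (cs != ct)))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def medoid(strings):
    """Assumed prototype criterion: smallest summed dissimilarity to the own class."""
    return min(strings, key=lambda s: sum(edit_distance(s, t) for t in strings))


def nearest_prototype(prototypes, query):
    """Return the label whose prototype is closest to the query string."""
    return min(prototypes, key=lambda label: edit_distance(prototypes[label], query))


# Hypothetical training strings, e.g. symbol sequences describing contours.
training = {
    "x": ["aaab", "aabb", "aaabb"],
    "y": ["cdcd", "cdcc", "ccdcd"],
}
prototypes = {label: medoid(strings) for label, strings in training.items()}
```

Calling `nearest_prototype(prototypes, "aab")` compares the query with one string per class only, which is what distinguishes this rule from the nearest neighbor rule over the entire training set.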
The 1-Nearest-Neighbor Rule (1-NN) is the simplest and most natural classification rule. It should always be used as a reference. It has a good asymptotic performance for metric measures [10, 14]: its error is not worse than twice the Bayes error, i.e. the lowest error possible. It works well in practice for finite training sets. Fig. 3 shows how it performs on the Iris data set in comparison to the linear and quadratic classifiers based on the assumption of normal distributions [27]. The k-NN rule, based on a class majority vote over the k nearest neighbors in the training set, is, like the Parzen classifier, even Bayes consistent. These classifiers approximate the Bayes error for increasing training sets [14, 27].
1-最邻近法则(1-NN)是最简单和最为自然的分类法则。这个方法应当会总被考虑到。在距离度量中具有很好的渐近性能,不会超过最低贝叶斯错误率的两倍。在有限训练集中这个方法效果很好。如图3则是这个方法在Iris数据集中的应用结果图,并且跟线性二次分类方法作了比较,这个图假定是在一般条件下测试得到。K-NN是训练集中根据K维最邻近原则来判定的分类法则,如Parzen分类器,甚至跟贝叶斯方法具有一致性。这些分类器在下图中随着测试集增大而其贝叶斯错误率被估算出来。
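The k-NN rule described above can be written down in a few lines. The following is a minimal sketch, assuming numerical feature vectors and the Euclidean distance; the function name `knn_classify` and the toy data are hypothetical. With `k=1` it reduces to the 1-NN rule.

```python
import math
from collections import Counter


def knn_classify(train, query, k=1):
    """Majority vote over the k training objects nearest to the query.

    train is a list of (feature_vector, label) pairs. Ties in the vote are
    resolved in favour of the label that occurs first among the neighbours.
    """
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical two-class training set in a 2-dimensional feature space.
train = [((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
         ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b")]
```

Note that the function keeps the entire training set and sorts it for every query, which is the storage requirement discussed in the text.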
However, such results heavily rely on the assumption that the training examples are independently and identically drawn (iid) from the same distribution as the future objects to be tested. This assumption of a fixed and stationary distribution is very strong, but it yields the best possible classifier. There are, however, other reasons why it cannot be claimed that pattern recognition is solved by these statistical tools. The 1-NN and k-NN rules have to store the entire training set. The solution is thereby based on a comparison with all possible examples, including ones that are very similar, and asymptotically identical, to the new objects to be recognized. By this, a class or a concept is not learned, as the decision relies on memorizing all possible instances. There is simply no generalization.
然而,上面的结果严重依赖于这样的假设:训练用例是在相同属性上被测试时具有独立同分布性(iid)。这个固定不变的属性假设是十分必要的,它是分类器最好效果的保证。然而,这个也是为什么不能认为模式识别是统计分析工具应用的原因。1-NN和k-NN规则需要保存整个训练集,通过比较所有可能样例来进行识别,甚至十分相似的样例也要进行比较,渐近地识别出新的对象。这种方法中不用学习类别和概念,分类方法依赖于所保存的所有实例,这种方法简单但不具有推广性。
Other classification procedures, giving rise to two learning curves shown in Fig. 3, are based on specific model assumptions. The classifiers may perform well when the assumptions hold and may entirely fail, otherwise. An important observation is that models used in statistical learning procedures have almost necessarily a statistical formulation. Human knowledge, however, certainly in daily life, has almost nothing to do with statistics. Perhaps it is hidden in the human learning process, but it is not explicitly available in the context of human recognition. As a result, there is a need to look for effective model assumptions that are not phrased in statistical terms.
图3中的另两条学习曲线所表示的两个分类方法是基于特定的模型假设,在满足所定假设情况下分类器性能表现不错,否则可能是相反的结果。一个重要的方面是这个模型是运用统计学习的方法,大都必须拥有一个统计公式。然而人类日常生活中的认知是几乎不会运用统计方法的,也许这是隐藏在人类的学习过程中,但在人类对上下文识别中确实是没用到的。因此,有必有去寻找不用统计方法来进行有效识别的方法。
In Fig. 3 we can see that a more complex quadratic classifier performs initially worse than the other ones, but it behaves similarly to a simple linear classifier for large training sets. In general, complex problems may be better solved by complex procedures. This is illustrated in Fig. 4, in which the resulting error curves are shown as functions of complexity and training size.
从图3我们可以看出更复杂的二次分类器在开始时性能比其它要差,但在大训练集中类似于线性分类器。一般情况下复杂的问题用复杂的方法来识别效果较好,这个可以在图4中得到说明,图4中的曲线表示错误率跟功能复杂度和训练集大小的关系。
As in Fig. 3, small training sets require simple classifiers. Larger training sets may be used to train more complex classifiers, but the error will increase if the complexity is pushed too far. This is a well-known and frequently studied phenomenon in relation to the dimensionality (complexity) of the problem. Objects described by many features often rely on complex classifiers, which may thereby lead to worse results if the number of training examples is insufficient. This is the curse of dimensionality, also known as Rao’s paradox or the peaking phenomenon [44, 45]. It is caused by the fact that the classifiers generalize badly, due to a poor estimation of their parameters or their focus on (adaptation to) the noisy information or irrelevant details in the data. The same phenomenon can be observed while training complex neural networks without taking proper precautions. As a result, they will adapt to accidental data configurations, hence they will overtrain. This phenomenon is also well known outside the pattern recognition field. For instance, it is one of the reasons one has to be careful with extensive mass screening in health care: the more diseases and their relations are considered (the more complex the task), the more people will be unnecessarily sent to hospitals for further examinations.
由图3可知对于小的训练集可用较为简单的分类器,训练集越大则分类器可能需要越复杂,但是如果太大了则错误率会随之上升,这是众所周知的,即与经常被研究的问题维度(复杂度)有关。由于分类器的复杂,识别对象经常用很多特征来描述,如果测试用例没有足够多可能会导致更坏的结果,这就是所谓的维数灾难,也称为Rao悖论或峰值现象。这是由于分类器的推广性差造成的,这是由于对参数估计的偏差或考虑到数据中的噪音或不相关信息所产生的。如果不去预防这种情况的产生,训练复杂的神经网络也会出现这种现象。由此,为了去适配那些特殊而意外的数据导致产生了过学习。过学习在模式识别领域外的其它研究中也是较为常见的,例如这也是一个人在担心他的健康过程中会去经常拍片的原因:病人越不安及他们的亲人越关心(问题越复杂),其实越是不需要做更多的检查。
An important conclusion from this research is that the cardinality of the set of examples from which we want to infer a pattern concept bounds the complexity of the procedure used for generalization. Such a method should be simple if there are just a few examples. A somewhat complicated concept can only be learnt if sufficient prior knowledge is available and incorporated in such a way that the simple procedure is able to benefit from it.
这方面的研究得到一个重要的结论:我们从用例中推断得到模式概念,用例集的数量决定了推广过程的复杂度。如果只是一些用例,则这样的方法会较简单。如果有充足的先验知识可被用上且有适当方法来运用上,则用简单的方法就可以实现,稍微复杂的概念也能够被学习出来。
An extreme consequence of the lack of prior knowledge is given by Watanabe as the Ugly Duckling Theorem [75]. Assume that objects are described by a set of atomic properties and we consider predicates consisting of all possible logic combinations of these properties in order to train a pattern recognition system. Then, all pairs of objects are equally similar in terms of the number of predicates they share. This is caused by the fact that all atomic properties, their presence as well as their absence, have initially equal weights. As a result, the training set is of no use. Summarized briefly, if we do not know anything about the problem we cannot learn (generalize and/or infer) from observations.
一个缺乏先验知识的极端结论是Watanabe的丑小鸭定理(Ugly Duckling Theorem)[75].假设对象被描述成一个原子性质集,对这些性质进行所有可能的逻辑合并,合并后再进行组合成对象的属性,以此来训练一个模式识别系统。于是任何一对对对象在一些共有的属性上是同等相似的。这是由于对于所有原子性质,跟对象的存在与否无关,初始时二者都具有一样的权值。由此,这里训练集是没有用的。简要地总结一下,就是如果我们对这个问题什么都不了解,我们不可能从观察中学会(推广或推导)。
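The counting argument behind the Ugly Duckling Theorem can be checked by brute force for a few atomic properties. The sketch below assumes binary properties and identifies a predicate with its truth table over all possible objects (so that every logic combination of the properties appears exactly once); the function name is hypothetical. With n properties there are 2^n distinguishable objects and 2^(2^n) predicates, and any two distinct objects share 2^(2^n − 2) of them.

```python
from itertools import combinations, product


def shared_predicate_counts(n_properties):
    """Count, for every pair of distinct objects, the predicates true for both."""
    # Every object is one assignment of the binary atomic properties.
    objects = list(product([0, 1], repeat=n_properties))
    # Every predicate is one truth table: a truth value per possible object.
    predicates = list(product([False, True], repeat=len(objects)))
    counts = {}
    for i, j in combinations(range(len(objects)), 2):
        counts[(objects[i], objects[j])] = sum(
            1 for truth_table in predicates if truth_table[i] and truth_table[j]
        )
    return counts


counts_2 = shared_predicate_counts(2)   # 4 objects, 16 predicates
counts_3 = shared_predicate_counts(3)   # 8 objects, 256 predicates
```

Every value in `counts_2` and in `counts_3` is identical, so counting shared predicates cannot distinguish a similar pair of objects from a dissimilar one unless some predicates are given more weight than others.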
An entirely different reasoning pointing to the same phenomenon is the No-Free-Lunch Theorem formulated by Wolpert [81]. It states that all classifiers perform equally well if averaged over all possible classification problems. This also includes a random assignment of objects to classes. In order to understand this theorem it should be realized that the considered set of all possible classification problems includes all possible ways a given data set can be distributed over a set of classes. This again emphasizes that learning cannot be successful without any preference or knowledge.
对这相同现象有一个完全不同的论证:Wolpert的没有免费的午餐定理(No-Free-Lunch Therorem)[81]。他指出如果平衡所有可能的分类问题,则所有的分类器的性能是一样的,这也包括指对一个随机的分类方法。要理解这个定理则必须明白:对于所有可能的分类问题,包括所有的可能分类方法,总有一组数据可以被用于对一组类别的识别。这又强调了没有进行优化选择或缺少先验知识,这样的学习是不会成功的。
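A small enumeration can make this averaging argument concrete. The sketch below is a toy illustration, not Wolpert's formal statement: it assumes two classes, a handful of one-dimensional objects split into a fixed training part and a fixed test part, and it averages the test accuracy over all possible ways of labeling the objects. The two classifiers and all names are hypothetical.

```python
from itertools import product


def one_nn(train_x, train_y, x):
    """Label of the training object nearest to x."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def always_zero(train_x, train_y, x):
    """Ignores the training set entirely."""
    return 0


def average_test_accuracy(classifier, train_x, test_x):
    """Average accuracy on test_x over every possible labeling of all objects."""
    correct = total = 0
    n_train = len(train_x)
    for labels in product([0, 1], repeat=n_train + len(test_x)):
        train_y, test_y = labels[:n_train], labels[n_train:]
        for x, y in zip(test_x, test_y):
            correct += classifier(train_x, train_y, x) == y
            total += 1
    return correct / total


train_x = [0.0, 1.0, 4.0]
test_x = [0.9, 4.2]
acc_nn = average_test_accuracy(one_nn, train_x, test_x)
acc_zero = average_test_accuracy(always_zero, train_x, test_x)
```

Both averages equal one half: for any fixed training labels the prediction is fixed, while the true label of a test object is 0 in half of the labelings and 1 in the other half. The 1-NN rule only gains an advantage once labelings in which nearby objects tend to share a class are considered more probable than others.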
In essence, it has been established that without prior or background knowledge, no learning, no generalization from examples is possible. Concerning specific applications based on strong models for the classes, it has been shown that additional observations may lower the specified gaps or solve uncertainties in these models. In addition, if these uncertainties are formulated in statistical terms, it will be well possible to diminish their influence by a statistical analysis of the training set. It is, however, unclear what the minimum prior knowledge is that is necessary to make the learning from examples possible. This is of interest if we want to uncover the roots of concept formation, such as learning of a class from examples. There exists one principle, formulated at the very beginning of the study of automatic pattern recognition, which may point to a promising direction. This is the principle of compactness [1], also phrased as a compactness hypothesis. It states that we can only learn from examples or phenomena if their representation is such that small variations in these examples cause small deviations in the representation. This demands that the representation is based on a continuous transformation of the real world objects or phenomena. Consequently, it is assumed that a sufficiently small variation in the original object will not cause the change of its class membership. It will still be a realization of the same concept. Consequently, we may learn the class of objects that belong to the same concept by studying the domain of their corresponding representations.
实质上,没有先验或背景知识,从用例中进行学习或推广是不可能的,这点已被证实。关于以强大分类模型为基础的特定方向应用,已被表明通过观察数据可以减小某些方面的识别差距或者解决模型中的不确定问题。还有,如果是统计方法中的不确定问题,则通过对训练数据的统计分析可能可以很好地减少不确定问题造成的影响。然而,无法确定的是什么样的最小化先验知识对于从用例中实现学习是必须的,这对于我们揭示概念信息的根本是有用的,如从用例中学习某个分类。存在这样一个法则:这个法则叫紧性法则,或称紧性假设,在研究自动化模式识别的刚开始阶段,对各种有利于识别的知识进行整理,这个法则认为如果用例之间存在小的差异,但结果对用例的表示也偏差不大,对这样的表示方法我们只能从用例或现象中进行学习分类方法。对于现实世界中连续变化的对象或现象,用紧性来表示是必要的。因此,可以假定原始对象非常小的变化不会导致其分类归属的变化,仍将属于同一个概念。因此,通过研究分类对象相应的主要属性,我们可以进行学习以实现地属于同一概念的对象的分类方法。
The Ugly Duckling Theorem deals with discrete logical representations. These cannot be solved by the compactness hypothesis unless some metric is assumed that replaces the similarity measured by counting differences in predicates. The No-Free-Lunch Theorem clearly violates the compactness assumption as it makes object representations with contradictory labelings equally probable. In practice, however, we encounter only specific types of problems.
丑小鸭原理是用于处理离散的逻辑表示问题。离散的逻辑表示问题无法用紧性假设来解决,除非一些度量方法如相似度用差异个数来表示。没有免费的午餐原理明显违反了紧性假设:它用对立的等概率来表示对象。然而,实践中我们碰到的只是些特定种类的问题。
Building proper representations has become an important issue in pattern recognition [20]. For a long time this idea has been restricted to the reduction of overly large feature sets to the sizes for which generalization procedures can produce significant results, given the cardinality of the training set. Several procedures have been studied based on feature selection as well as linear and nonlinear feature extraction [45]. A pessimistic result was found that about any hierarchical ordering of (sub)space separability that fulfills the necessary monotonicity constraints can be constructed by an example based on normal distributions only [11]. Very advanced procedures are needed to find such ‘hidden’ subspaces in which classes are well separable [61]. It has to be doubted, however, whether such problems arise in practice, and whether such feature selection procedures are really necessary in problems with finite sample sizes. This doubt is further supported by an insight that feature reduction procedures should rely on global and not very detailed criteria if their purpose is to reduce the high dimensionality to a size which is in agreement with the given training set.
建立一个合适的表示方法成为模式识别中一个重要的问题。很长一段时间来这方面的考虑仅限于对太大特征集尺寸的缩小,给定训练集的势,大尺寸的特征集具有较好的推广性。已有几种方法可用于特征的选择,如线性和非线性特征提取[45]。有一个让人悲观的结论:如果满足单调性约束的特征空间(或子空间)可分离性,则可以只通过某个用例在普通的属性上进行等级化地排序。然而,这不得不让人怀疑在实践中这样的问题是否会出现,这样的特征选择方法对于有限的样本大小是否真的需要。更让人引起怀疑的原因是这样的一个观点:如果是为了减小从给定训练集中得出来的高维空间的维数,特征减少方法应当在全局上进行,而不是依赖于一个非常详细的标准。
Feed-forward neural networks are a very general tool that, among others, offer the possibility to train a single system built between sensor and classification [4, 41, 62]. They thereby cover the representation step in the input layer(s) and the generalization step in the output layer(s). These layers are simultaneously optimized. The number of neurons in the network should be sufficiently large to make the interesting optima tractable. This, however, brings the danger of overtraining. There exist several ways to prevent that by incorporating some regularization steps in the optimization process. This replaces the adaptation step in Fig. 1. A difficult point here, however, is that it is not sufficiently clear how to choose regularization of an appropriate strength. Another important aspect of neural networks is that the use of various regularization techniques enables one to control the nonlinearity of the resulting classifier. This also gives the possibility to use not only complex, but also moderately nonlinear functions. Neural networks are thereby one of the most general tools for building pattern recognition systems.
前向式反馈神经网络是一个非常流行的工具,可以实现在感应器和分类之间通过训练建立一个识别系统[4,41,62]。这个系统包括输入层的表示过程和输出层的推广过程。输入层和输出层同时可以被进行优化。神经网络中的神经个数必须足够多以达到所需要的性能要求,不过,也有可能会导致过学习,有几个方法可以防止产生过学习,在优化过程中综合一些调整方法,如图1中的适应性修改步骤就可以达到这个目的,有个困难是还不十分清楚怎么去选择可以产生更好效果的调整方法。神经网络其它重要的应用是可以用各种调整技术来控制分类器分类的非线性,可以选择复杂的函数,也可以选择非线性函数。神经网络在建立模式识别系统中成为最为流行的工具之一。
In statistical learning, Vapnik has rigorously studied the problem of adapting the complexity of the generalization procedure to a finite training set [72, 73]. The resulting Vapnik-Chervonenkis (VC) dimension, a complexity measure for a family of classification functions, gives a good insight into the mechanisms that determine the final performance (which depends on the training error and the VC dimension). The resulting error bounds, however, are too general for a direct use. One of the reasons is that, like in the No-Free-Lunch Theorem, the set of classification problems (positions and labeling of the data examples) is not restricted to the ones that obey the compactness assumption.
在统计学习中,Vapnik严密解决了在有限训练集中进行推广过程的复杂度衡量问题,定义了一个VC维概念(Vapnik-Chervonenkis),一个对分类泛函复杂度的度量方法,对于最后的性能分析提供了一个很好方法(根据训练错误率和VC维)。然而,直接用错误边界来进行识别是非常通用了。其中的一个原因是就如没有免费的午餐原理,分类问题(位置和用例数据标识)集不服从紧性假设。
One of the insights gained by studying the complexity measures of polynomial functions is that they have to be as simple as possible in terms of the number of their free parameters to be optimized. This was already realized by Cover in 1965 [9]. Vapnik extended this finding around 1994 to arbitrary non-linear classifiers [73]. In that case, however, the number of free parameters is not necessarily indicative of the complexity of a given family of functions, but the VC dimension is. In Vapnik’s terms, the VC dimension reflects the flexibility of a family of functions (such as polynomials or radial basis functions) to separate arbitrarily labeled and positioned n-element data in a vector space of a fixed dimension. This VC dimension should be finite and small to guarantee the good performance of the generalization function.
有一个观点:在研究多项式函数复杂度度量中发现,多项式函数复杂度可以简单地用要被优化的自由参数个数来衡量。Cover于1965年就已经发现了这个问题。Vapnik约在1994年左右把这个发现应用到任意非线性分类器中,然而,在那里他认为用自由参数的数目来度量一个函数集的复杂度是不必要的,而应该用VC维。在Vapnik描述中,VC维反映了一个函数集(诸如多项式函数或径基函数)的适应性,这个函数集用来在一个固定维数的特征空间中分开已被标识和被确定好位置的n-元数据。VC维应当是有限而且小以确保具有好的推广性。
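The notion of separating arbitrarily labeled data can be illustrated on the simplest family of functions. The sketch below assumes one-dimensional threshold classifiers of the form sign(s(x − t)) with orientation s = ±1 and checks by enumeration whether a given set of distinct points is shattered, i.e. whether every labeling of the points can be realized. The function names are hypothetical.

```python
from itertools import product


def achievable_labelings(points):
    """All labelings of distinct 1-D points realizable by a threshold and an orientation."""
    xs = sorted(points)
    # One threshold below all points, one between each adjacent pair, one above all.
    thresholds = ([xs[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
                  + [xs[-1] + 1.0])
    labelings = set()
    for t, sign in product(thresholds, (+1, -1)):
        labelings.add(tuple(sign * (x - t) > 0 for x in points))
    return labelings


def is_shattered(points):
    """True if every one of the 2^n labelings of the n points is achievable."""
    return len(achievable_labelings(points)) == 2 ** len(points)
```

Two points are shattered, but no three points are, because the labeling in which only the middle point belongs to one class cannot be produced by a single threshold. The VC dimension of this family is therefore 2.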
This idea was elegantly incorporated into the Support Vector Machine (SVM) [73], in which the number of parameters is as small as a suitably determined subset of the training objects (the support vectors) and is independent of the dimensionality of the vector space. One way to phrase this principle is that the structure of the classifier itself is simplified as far as possible (following the Occam’s razor principle). So, after a detour along huge neural networks possibly having many more parameters than training examples, pattern recognition was back to the small-is-beautiful principle, but now better understood and elegantly formulated.
这个思想被完美地运用到支持向量机(SVM)中,SVM中用于划分训练对象的参数个数(支持向量)很少,且不依赖于向量空间的维数。SVM方法可以说是把分类器结构尽可能地简化(遵从Occam剃刀原则)。所以在绕过具有比训练用例还多的参数的巨大神经网络系统后,模式识别又回到了“小而美”的原则,且现在已能够更好的理解和完美地形式化这个原则。
The use of kernels largely enriched the applicability of the SVM to nonlinear decision functions [66, 67, 73]. The kernel approach virtually generates nonlinear transformations of the combinations of the existing features. By using the representer theorem, a linear classifier in this nonlinear feature space can be constructed, because the kernel encodes generalized inner products of the original vectors only. Consequently, well-performing nonlinear classifiers built on training sets of almost any size in almost any feature space can be computed by using the SVM in combination with the ‘kernel trick’ [66].
核的应用大大地丰富了SVM在非线性分类器中的适用性。核方法实质是对特征空间的整体非线性变换。根据SVM所陈述的定理,在非线性特征空间可以建立一个线性分类器,因为核函数对原始特征向量进行了变换使之线性可分。因此,在任何特征空间尺度上的训练集中,SVM都可以发挥其很好的非线性分类性能,在此核函数是必不可少的。
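That a kernel encodes an inner product in a nonlinear feature space can be verified directly for a small case. The sketch below assumes two-dimensional inputs and the homogeneous polynomial kernel of degree 2, K(x, y) = (x·y)^2, whose explicit feature map is φ(x) = (x1^2, √2·x1·x2, x2^2). The function names are hypothetical.

```python
import math


def dot(u, v):
    """Standard inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2 on the original features."""
    return dot(x, y) ** 2


def feature_map(x):
    """Explicit nonlinear features whose inner product equals poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


x, y = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly2_kernel(x, y)
explicit_value = dot(feature_map(x), feature_map(y))
```

The two values agree up to rounding, so a linear classifier that only needs inner products can operate in the three-dimensional feature space without ever computing `feature_map`. For kernels of higher degree, or for radial basis functions, the explicit space becomes very large or infinite-dimensional, while the kernel evaluation stays cheap.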
This method still has a few limitations, however. It was originally designed for separable classes, hence it suffers when high overlap occurs. The use of slack variables, necessary for handling such an overlap, leads to a large number of support vectors and, consequently, to a large VC dimension. In such cases, other learning procedures have to be preferred. Another difficulty is that the class of admissible kernels is very narrow to guarantee the optimal solution. A kernel K has to be a (conditionally) positive semidefinite (cpd) function of two variables, as only then can it be interpreted as a generalized inner product in the reproducing kernel Hilbert space induced by K. Kernels were first considered as functions in Euclidean vector spaces, but they are now also designed to handle more general representations. Special-purpose kernels are defined in a number of applications such as text processing and shape recognition, in which good features are difficult to obtain. They use background knowledge from the application in which similarities between objects are defined in such a way that a proper kernel can be constructed. The difficulty is, again, the strong requirement of kernels as being cpd.
然而,SVM方法仍存在一些局限性。SVM原来是用于对象可分的分类中,因此当分类对象交迭严重时SVM方法就不适用了。就是运用处理对象交迭的松驰变量,也会导致产生大量的支持向量,由此产生VC维很大。这种情况下,最好采用其它的学习方法。另一个会导致的困难是能保证理想分类功能的可用的核函数很少。一个K核函数必须是两个变量的半正定(cpd)函数,只有这样的函数才能在Hilbert空间中应用。核函数初始被认为是欧几里得向量空间中的函数,但现在核函数已被应用在了更具一般性的表示方法。特定功能的核函数被用在了不同应用中,如文本处理和形状识别,这些应用中较难得到好的特征。对象间的相似性被定义了出来,依此运用应用背景知识就可以定义出适宜的核函数。这里的困难还是核函数必须是半正定的。
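The positive semidefiniteness requirement can be checked on the matrix of kernel values for a small set of objects. The sketch below is an illustration under stated assumptions: it tests plain positive semidefiniteness only (not the conditional variant), it uses the criterion that a symmetric matrix is positive semidefinite exactly when all of its principal minors are non-negative, and it is only practical for very small matrices. The Gaussian kernel, the hand-made similarity matrix and all names are hypothetical examples.

```python
import math
from itertools import combinations


def determinant(m):
    """Determinant by Laplace expansion along the first row (tiny matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for col, value in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * value * determinant(minor)
    return total


def is_positive_semidefinite(k, tol=1e-12):
    """Check that every principal minor of the symmetric matrix k is non-negative."""
    n = len(k)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            sub = [[k[i][j] for j in idx] for i in idx]
            if determinant(sub) < -tol:
                return False
    return True


def gram_matrix(points, kernel):
    """Matrix of kernel values between all pairs of points."""
    return [[kernel(a, b) for b in points] for a in points]


def gaussian(a, b):
    return math.exp(-(a - b) ** 2)


gaussian_gram = gram_matrix([0.0, 1.0, 3.0], gaussian)

# Hand-made similarities: object 1 resembles 2, 2 resembles 3, but 1 hardly resembles 3.
similarities = [[1.0, 0.9, 0.1],
                [0.9, 1.0, 0.9],
                [0.1, 0.9, 1.0]]
```

The Gaussian matrix passes the check, whereas the hand-made similarity matrix does not, although it may look like a perfectly reasonable description of three objects. This is the kind of application-defined proximity that falls outside the class of admissible kernels.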
The next step is the so-called dissimilarity representation [56] in which general proximity measures between objects can be used for their representation. The measure itself may be arbitrary, provided that it is meaningful for the problem. Proximity plays a key role in the quest for an integrated structural and statistical learning model, since it is a natural bridge between these two approaches [6, 56]. Proximity is the basic quality to capture the characteristics of a set of objects forming a group. It can be defined in various ways and contexts, based on sensory measurements, numerical descriptions, sequences, graphs, relations and other non-vectorial representations, as well as their combinations. A representation based on proximities is, therefore, universal.
下个步骤就是所谓的相异点表示法[56],对象间的一般相似性度量方法可以代替相异点表示法。度量方法本身可以是任意的,只要是能够解决问题就可以了。接近度在集成结构模式识别方法和统计学习方法中起着关键性的角色,在两个方法之间架起了一个天然桥梁[6,56]。在同一类的一组对象中获取其特征,接近度是一个基本考量方法。接近度可以用各种方法和上下文来定义,取决于感观度量、数字化描述、序列、图表、关联和其它的非向量表示法,也取决于于其中的组合方法。所以基于接近度的表示法是通用的。
Although some foundations have been laid down [56], the ways for effective learning from general proximity representations are still to be developed. Since such measures may not belong to the class of permissible kernels, the traditional SVM, as such, cannot be used. There exist alternative interpretations of indefinite kernels and their relation to pseudo-Euclidean and Krein spaces [38, 50, 55, 56, 58], in which learning is possible for non-Euclidean representations. In general, proximity representations are embedded into suitable vector spaces equipped with a generalized inner product or norm, in which numerical techniques can either be developed or adapted from the existing ones. It has been experimentally shown that many classification techniques may perform well for general dissimilarity representations.
虽然已有了一些理论基础,从接近度表示法来提高学习有效性仍被研究中。既然这方法和分类中的核函数无关,传统的SVM方法就无法适用这种情况了。存在对不定核的替代表示方法,这跟欧几里得几何和Krein空间有关[38,50,55,56,58],通过这个方法可以实现非欧几里得表示法的学习。一般地,接近率表示法被嵌入到具有一般化内积或范数的相应向量空间中,这个向量空间可以被标量化或适应性修改后的标量化。通过实验已表明一般相异点的表示法在许多分类技术中可以表现出很好的性能。
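As a minimal sketch of learning from a proximity representation, assume that the objects are strings, that the dissimilarity is the edit (Levenshtein) distance, and that each object is described by its vector of dissimilarities to a small set of prototypes. A nearest-mean classifier is then applied in the resulting vector space. The data, the prototypes and all function names are hypothetical and chosen only for illustration; this is one simple construction, not the specific procedure of any cited work.

```python
def edit_distance(s, t):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (cs != ct)))   # substitution
        prev = cur
    return prev[-1]

def represent(obj, prototypes):
    """Describe an object by its dissimilarities to the prototypes."""
    return [edit_distance(obj, p) for p in prototypes]

def class_means(design_set, prototypes):
    """Mean dissimilarity vector of each class in the design set."""
    sums, counts = {}, {}
    for obj, label in design_set:
        acc = sums.setdefault(label, [0.0] * len(prototypes))
        for k, value in enumerate(represent(obj, prototypes)):
            acc[k] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

def classify(obj, means, prototypes):
    """Assign the class whose mean dissimilarity vector is closest."""
    vec = represent(obj, prototypes)
    return min(means,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, means[lab])))

# Hypothetical design set of non-vectorial objects (strings).
design_set = [("aaab", "A"), ("aaba", "A"), ("abaa", "A"),
              ("bbba", "B"), ("bbab", "B"), ("babb", "B")]
prototypes = ["aaaa", "bbbb"]

means = class_means(design_set, prototypes)
print(classify("baaa", means, prototypes))
print(classify("abbb", means, prototypes))
```

The classifier never sees the strings themselves, only their dissimilarities, so the same code would work for any other objects for which a meaningful dissimilarity can be computed.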
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
4 Perspectives
4 看法
Pattern recognition deals with discovering, distinguishing, detecting or characterizing patterns present in the surrounding world. It relies on extraction and representation of information from the observed data, such that after integration with background knowledge, it ultimately leads to a formulation of new knowledge and concepts. The result of learning is that the knowledge already captured in some formal terms is used to describe the present interdependencies such that the relations between patterns are better understood (interpreted) or used for generalization. The latter means that a concept, e.g. of a class of objects, is formalized such that it can be applied to unseen examples of the same domain, inducing new information, e.g. the class label of a new object. In this process new examples should obey the same deduction process as applied to the original examples.
模式识别技术应用于发现、区分、检测或提取存在于我们周围世界中的模式,这依赖于怎么从观察数据中进行信息提取和表示,结合背景知识,最终得到新知识和概念的形式化内容。学习的结果是得到一个用于表示模式之间相互依赖的形式化知识,以此更好地理解(或解释)观察数据或进行推广。推广的意思是指一个概念(如对象的一个种类)被形式化后,这个概念可以被应用于相同领域未知的用例,包括新的信息,例如对一个新对象进行标识,且对于新用例的处理应当遵从应用于原来用例的相同的演绎过程。
In the next subsections we will first recapitulate the elements of logical reasoning that contribute to learning. Next, this will be related to the Platonic and Aristotelian scientific approaches discussed in Section 2. Finally, two novel pattern recognition paradigms are placed in this view.
在下面几节中我们将先概括应用于学习的逻辑推理的方法要素,接着在第二节中讨论有关柏拉图及亚里士多德科学研究方法,最后根据所讨论内容举两个模式识别的例子。
4.1 Learning by Logical Reasoning
4.1 逻辑推理性学习
Learning from examples is an active process of concept formation that relies on abstraction (focus on important characteristics or reduction of detail) and analogy (comparison between different entities or relations focussing on some aspect of their similarity). Learning often requires dynamical, multilevel (seeing the details leading to unified concepts, which further build higher level concepts) and possibly multi-strategy actions (e.g. in order to support good predictive power as well as interpretability). A learning task is basically defined by input data (design set), background knowledge or problem context and a learning goal [52]. Many inferential strategies need to be synergistically integrated to be successful in reaching this goal. The most important ones are inductive, deductive and abductive principles, which are briefly presented next. More formal definitions can be sought in the literature on formal logic, philosophy or e.g. in [23, 40, 52, 83].
学习用例是一种对概念信息的处理方法,也是一个很有用的处理方法,它依赖于对信息的抽象(专注于提取重要特征或减少细节描述)和分析(在不同实体之间进行比较或关注相似属性之间的关系)。学习经常需要是动态的、多层次的(如怎么得到一致性的概念以更进一步地建立在更高层次上的概念)和多重目的的(如使之具有较好的预测能力,即可判断能力)。学习任务的确立是基于输入数据(设计样本集),背景知识或问题的上下文,以及学习目标。最重要的是归纳、演译和诱导原理,下面会有简要介绍,更多的形式定义可以查看形式逻辑、哲学文献,在[23,40,52,83]中也有描述。
Inductive reasoning is the synthetic inference process of arriving at a conclusion or a general rule from a limited set of observations. This relies on a formation of a concept or a model, given the data. Although such a derived inductive conclusion cannot be proved, its reliability is supported by empirical observations. As long as the related deductions are not in contradiction with experiments, the inductive conclusion remains valid. If, however, future observations lead to contradiction, either an adaptation or a new inference is necessary to find a better rule. To make it more formal, induction learns a general rule R (concerning A and B) from numerous examples of A and B. In practice, induction is often realized in a quantitative way. Its strength then relies on probability theory and the law of large numbers, in which, given a large number of cases, one can describe their properties in the limit and the corresponding rate of convergence.
归纳性推理是一个综合的推导过程,从有限的观察数据中得到结论或一般性规则。归纳性推理依赖于给定数据情况下概念或模型的信息表示方法。虽然归纳出来的结论没办法被证明,但是它的可靠性是依靠实践经验观察出来的。只要相关的演译推理不会和实验得到的结论相抵触,归纳出来的结论就依然有效。然而,如果将来观察发现出现了冲突,这时就需要去适配或推导来寻找一个更好的规则。更为形式化地表示,即从许多用例A和B进行归纳学习得到一个一般性规则R。在实践中,归纳经常是通过量化的方法来实现。归纲性推理强有力的理论支持是概率理论和大数据量原则,在给定一个大数据量的用例中,才能够描述出它们的一定范围的性质及相应的覆盖率。
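The quantitative side of induction can be illustrated with a short simulation. The sketch below, which is an added example with hypothetical names and a hypothetical probability of 0.3, estimates the probability of an event by its relative frequency in n independent trials and prints Hoeffding's inequality, P(|estimate - p| >= eps) <= 2 exp(-2 n eps^2), as one standard statement of the rate of convergence.

```python
import math
import random

def estimate_probability(p_true, n, rng):
    """Relative frequency of an event with probability p_true in n trials."""
    hits = sum(1 for _ in range(n) if rng.random() < p_true)
    return hits / n

def hoeffding_bound(n, eps):
    """Upper bound on P(|estimate - p_true| >= eps) for n independent trials."""
    return min(1.0, 2.0 * math.exp(-2.0 * n * eps ** 2))

rng = random.Random(0)    # fixed seed so that the run is repeatable
p_true = 0.3              # hypothetical probability, unknown to the learner
for n in (10, 1000, 100000):
    estimate = estimate_probability(p_true, n, rng)
    print(n, round(estimate, 4), round(hoeffding_bound(n, 0.05), 4))
```

The bound holds only for independent, identically distributed trials, which is exactly the kind of assumption that has to be checked before an inductive conclusion is trusted.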
Deductive reasoning is the analytic inference process in which existing knowledge of known facts or agreed-upon rules is used to derive a conclusion. Such a conclusion does not yield ‘new’ knowledge since it is a logical consequence of what has already been known, but implicitly (it is not of a greater generality than the premises). Deduction, therefore, uses a logical argument to make explicit what has been hidden. It is also a valid form of proof provided that one starts from true premises. It has a predictive power, which makes it complementary to induction. In a pattern recognition system, both evaluation and prediction rely on deductive reasoning. To make it more formal, let us assume that A is a set of observations, B is a conclusion and R is a general rule. Let B be a logical consequence of A and R, i.e. (A ∧ R) |= B, where |= denotes entailment. In deductive reasoning, given A and using the rule R, the consequence B is derived.
演译推理是分析性推理过程,通过已知事实的现存知识或一致被认可的规则推导出一个结论。既然这样的结论是从已知的知识中进行逻辑推导的结果,所以不能算是“新”知识,但具有隐含性(它比前提条件不更具一般性)。演译,即是运用一套逻辑方法把隐藏在背后的知识清晰起来。它也是一个从真实的前提进行实证的有效形式。它具有预言性功能,能弥补归纳方法的不足。更形式化地表示可以这样:假设A是一组观察数据,B是一个结论,R是一个一般性规则,则B是A和R的逻辑推导结果,如(A ∧ R) |= B, |=表示蕴涵关系,演译推理过程中,给定A,运用规则R,结论B就能就此被推论出来。
Abductive reasoning is the constructive process of deriving the most likely or best explanations of known facts. This is a creative process, in which possible and feasible hypotheses are generated for further evaluation. Since both abduction and induction deal with incomplete information, induction may be viewed in some aspects as abduction and vice versa, which leads to some confusion between these two [23, 52]. Here, we assume they are different. Concerning the entailment (A ∧ R) |= B, having observed the consequence B in the context of the rule R, A is derived to explain B.
溯因推理是构建推理过程,从最象或最具有解释性的已知事实中推理出结论。这是一个创造性过程,可能或可行性假设是因进一步需要推定而产生的。既然溯因推理和归纳推理是在不完整的信息中进行推理,从某些方面可能可以把归纳推理看成溯因推理,反之亦然,这样在二者之间会导致些混淆。这里,我们假定他们是不一样的,看这样的蕴涵关系:(A ∧ R)|= B,表示从规则R的上下文中观察出B结论,A被用来解释B。
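The three inference modes can be placed side by side, using the symbols A (observations), R (rule) and B (conclusion) and the entailment (A ∧ R) |= B introduced above. The arrow notation below is shorthand assumed for this summary and is not taken from the cited literature; it reads "from the left-hand side, arrive at the right-hand side".

```latex
\begin{align*}
\text{Deduction:} \quad & A,\ R \;\Longrightarrow\; B
  && \text{(given observations $A$ and rule $R$, derive the consequence $B$)} \\
\text{Abduction:} \quad & B,\ R \;\Longrightarrow\; A
  && \text{(given the consequence $B$ and rule $R$, propose $A$ as an explanation)} \\
\text{Induction:} \quad & \{(A_i, B_i)\}_{i=1}^{n} \;\Longrightarrow\; R
  && \text{(given examples of $A$ and $B$, propose the rule $R$)}
\end{align*}
```

Only the first line is truth-preserving: if A and R hold, then B holds. The other two lines produce hypotheses that remain open to revision by later observations.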
In all learning paradigms there is an interplay between inductive, abductive and deductive principles. Both deduction and abduction make it possible to understand a phenomenon conceptually, while induction verifies it. More precisely, abduction generates or reformulates new (feasible) ideas or hypotheses, induction justifies the validity of these hypotheses with observed data and deduction evaluates and tests them. Concerning pattern recognition systems, abduction explores data, transforms the representation and suggests feasible classifiers for the given problem. It also generates new classifiers or reformulates the old ones. Abduction is present in an initial exploratory step or in the Adaptation stage; see Fig. 1. Induction trains the classifier in the Generalization stage, while deduction predicts the final outcome (such as a label) for the test data by applying the trained classifier in the Evaluation stage.
在所有的学习方法中,归纳、演译和溯因推理法则之间相互影响着。演译和溯因推理都有可能进行概念性地理解事物,归纳推理则用来检验它。更确切地说,溯因方法产生或变革新的(或可行的)思想或假设,归纳方法根据观察数据判断这些假设的合理性,演译方法评估和验证假设。对于模式识别系统,溯因方法探究源数据,转换表示方法,对既定的问题提出可行的分类方法,也可能产生新的分类方法或变革旧方法。溯因方法一般用于初始时的探究阶段或适配阶段(见图1),在推广阶段归纳方法被用来训练分类器,在评估阶段演译方法用来预测经过训练后的分类器在测试数据下的最后输出(如标识名)。
Since abduction is hardly emphasized in learning, we will give some more insight into it. In abduction, a peculiarity or an artifact is observed and a hypothesis is then created to explain it. Such a hypothesis is suggested based on existing knowledge or may extend it, e.g. by using analogy. So, the abductive process is creative and works towards new discovery. In data analysis, visualization facilitates the abductive process. In response to visual observations of irregularities or bizarre patterns, a researcher is inspired to look for clues that can be used to explain such an unexpected behavior. Mistakes and errors can therefore serve the purpose of discovery when strange results are examined with a critical mind. Note, however, that this process is very hard to implement in automatic recognition systems, as it would require encoding not only the detailed domain knowledge, but also techniques that are able to detect ‘surprises’ as well as strategies for their possible use. In fact, this requires a conscious interaction. Ultimately, only a human analyst can interactively respond in such cases, so abduction can be incorporated well into semi-automatic systems. In traditional pattern recognition systems, abduction is usually defined in terms of data and works over a pre-specified set of transformations, models or classifiers.
因为溯因推理在学习过程中较难被重视,我们就多做这方面的论述。在溯因推理中,一旦一个特性或一个典型结果被发现到,于是就会创建一个假设来解释它。这种假设是基于已存在的知识或知识的延伸来提出来的,例如运用类推方法。所以,溯因推理是创造性的过程,以新发现为目的。在数据分析中,可视化工具较便利于进行溯因推理。利用了对不规则或奇异的模式进行可视化观察,研究者得到灵感,找到用来解释意外现象的线索。当新的结果被批判性地检查时就会发现有错误或误差。然而,要注意的是溯因方法很难实现自动识别系统,因为这不仅需要消化运用相关细节知识,而且还要有在各种运用方法中检测产生异常的技术。实际上,这需要一个意识交互作用。总之,只有人类分析才能够交互地对这种事做出反应,所以溯因方法可以被很好地应用到半自动化系统中。在传统的模式识别系统中,溯因方法通常是用来定义有关数据并是在转换、建模和设计分类器之前进行。
4.2 Logical Reasoning Related to Scientific Approaches
4.2 与科学研究方法有关的逻辑推理
If pattern recognition (learning from examples) is merely understood as a process of concept formation from a set of observations, the inductive principle is the most appealing for this task. Indeed, it is the most widely emphasized in the literature, in which ‘learning’ is implicitly understood as ‘inductive learning’. Such reasoning leads to inferring new knowledge (a rule or model) which is hopefully valid not only for the known examples, but also for novel, unseen objects. Various validation measures or adaptation steps are taken to support the applicability of the determined model. Additionally, care has to be taken that the unseen objects obey the same assumptions as the original objects used in training. If this does not hold, such an empirical generalization becomes invalid. One should therefore exercise critical thinking while designing a complete learning system. It means that one has to be conscious of which assumptions are made and be able to quantify their sensibility, usability and validity with respect to the learning goal.
如果模式识别(从用例中学习)只是被理解成一个从一组观察中得到概念形成的过程,那么归纳方法是这个任务中最需要的,这在文献中最被广为强调:“学习”意味着“归纳学习”。这样的推理导致推论出新的知识(法则或模型),这个新知识不仅对已知的用例非常有效,而且对新出现的、未知的对象也有效。各种验证方法或适配步骤被用于支持决策模型的适用性。另外,要注意的是未知对象遵从同样的假设,即原来的对象被用在了训练中,如果不是这样,那么这样得到经验上的推广就变得无效。要设计一个完整的学习系统,就应当批判地去实践检验它,意思是必须意识到要做出什么样的假设和什么假设能够量化敏感性、可用性和有效性以达到学习目标。
On the other hand, deductive reasoning plays a significant role in the Platonic approach. This top-down scenario starts from a set of rules derived from expert knowledge on the problem domain or from a degree of belief in a hypothesis. The existing prior knowledge is first formulated in appropriate terms. These are further used to generate inductive inferences regarding the validity of the hypotheses in the presence of observed examples. So, deductive formalism (description of the object’s structure) or deductive predictions (based on the Bayes rule) precede inductive principles. A simple example in Bayesian inference is the well-known Expectation-Maximization (EM) algorithm used in problems with incomplete data [13]. The EM algorithm iterates between the E-step and the M-step until convergence. In the E-step, given a current (or initial) estimate of the unknown variable, a conditional expectation is found, which is then maximized in the M-step to derive a new estimate. The E-step is based on deduction, while the M-step relies on induction. In the case of Bayesian nets, which model a set of concepts (provided by an expert) through a network of conditional dependencies, predictions (deductions) are made from the (initial) hypotheses (beliefs over conditional dependencies) using the Bayes theorem. Then, inductive inferences regarding the hypotheses are drawn from the data. Note also that if the existing prior knowledge is captured in some rules, learning may become a simplification of these rules such that their logical combinations describe the problem.
在另一方面,演译推理在柏拉图式科学研究中起着重要的角色。这是自顶向下的过程,起始于一组规则,这些规则从某个领域的专家知识中得到,或从假设的可信度中得到。首先先验知识被以某种表示方法形式化,形式化后的知识就可以被用来在现有的观察数据中运用归纳推导法检验假设的有效性。所以演译形式(描述对象结构)或演译预测(基于贝叶斯法则)是在归纳法则之前的过程。在贝叶斯推导中一个简单的例子是大家都知道的最大期望算法(Expectation-Maximization (EM)),这种算法用在数据不完整的问题中。EM算法在E步骤和M步骤之间循环进行直到能够被收敛。在E步骤中,给定一个未知变量的当前(或初始)估值,找到一个条件期望值,期望值在M步骤中被最大化并得到一个新估值。E步骤是基于演译方法,M步骤是运用归纳方法。在贝叶斯网络中,通过条件依赖的贝叶斯网络为一个概念(由专家提供)集进行建模,运用贝叶斯理论得到的(初始)假设(建立在条件依赖上的把握)进行预测(归纳)。然后,归纳推导从数据上进行检验。也要注意的是如果已存在的先验知识是在一些法则中得到,学习可能是对这些法则的简化过程,这样形成的逻辑组合被用来描述所要解决的问题。
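The alternation between the two steps can be sketched for a standard textbook case, a mixture of two one-dimensional Gaussians in which the component that generated each point is the missing information. This example is an illustration added here; the data are simulated, and the initialization and the number of iterations are assumptions rather than recommendations. In the E-step the current parameters are taken as given and the posterior component probabilities are derived from them; in the M-step new parameters are estimated from the data weighted by these probabilities.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def em_two_gaussians(data, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with unknown component labels."""
    weight = [0.5, 0.5]
    mu = [min(data), max(data)]      # crude initial estimate
    sigma = [1.0, 1.0]
    for _ in range(n_iter):
        # E-step: with the parameters fixed, compute for every point the
        # posterior probability (responsibility) of each component.
        resp = []
        for x in data:
            p = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate the parameters that maximize the expected
        # complete-data log-likelihood (closed form for Gaussians).
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weight[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, data)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)   # guard against collapse
    return weight, mu, sigma

rng = random.Random(0)               # simulated data with well-separated means
data = ([rng.gauss(0.0, 1.0) for _ in range(200)] +
        [rng.gauss(5.0, 1.0) for _ in range(200)])
weight, mu, sigma = em_two_gaussians(data)
print(weight, mu, sigma)
```

A fixed iteration count is used for brevity; an implementation meant for real data would monitor the log-likelihood and stop when it no longer increases.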
In the Aristotelian approach to pattern recognition, observation of particulars and their explanation are essential for deriving a concept. As we already know, abduction plays a role here, especially for data exploration and characterization to explain or suggest a modification of the representation or an adaptation of the given classifier. Aristotelian learning often relies on Occam’s razor principle, which advocates choosing the simplest model or hypothesis among otherwise equivalent ones and can be implemented in a number of ways [8].
在模式识别亚里士多德式研究方法中,对细节的观察以及相应的解释是得到概念的本质方法。我们已经知道,溯因方法在这里起到了作用,特别是在数据探索和特征描述中,这两个被用来解释或提出对表示方法的修改或对已有识别器的适配。亚里士多德方法经常用到Occam剃刀法则:在多种等价物和多种实现方法中提倡选择最简单的模型或假设。
In summary, the Platonic scenario is dominantly inductive-deductive, while the Aristotelian scenario is dominantly inductive-abductive. Both frameworks have different merits and shortcomings. The strength of the Platonic approach lies in the proper formulation and use of subjective beliefs, expert knowledge and the possibility to encode the internal structural organization of objects. It is model-driven. In this way, however, the inductive generalization becomes limited, as there may be little freedom in the description for the exploration and discovery of new knowledge. The strength of the Aristotelian approach lies in a numerical induction and a well-developed mathematical theory of vector spaces in which the actual learning takes place. It is data-driven. The weakness, however, lies in the difficulty of incorporating the expert or background knowledge about the problem. Moreover, in many practical applications, it is known that the implicit assumptions of representative training sets, independent and identically distributed (iid) samples as well as stationary distributions do not hold.
总结一下,柏拉图研究用的主要方法是归纳和演译方法,亚里士多德主要是用归纳和溯因方法。两个方法体系有不同的优点和缺点。柏拉图方法的优势在于概念形式化和主观信心、专家知识的使用以及对对象内在结构性组织的构建,它是属于建模驱动方法。然而,这个方法中束缚了归纳性推广,在探索和发现新知识方面受到了约束。亚里士多德方法的优势在于数字化归纳和丰富的向量空间数学理论,学习过程实际上是在向量空间中进行,这是属于数据驱动方法。然而,缺点是难以把专家和背景知识应用到解决问题中。此外,在许多实际应用中,从典型训练集、具同一性和同分布(即固定分布)的样本中得到的隐假设没有加入专家和背景知识。
4.3 Two New Pattern Recognition Paradigms
4.3 两个新的模式识别模式
Two far-reaching novel paradigms have been proposed that deal with the drawbacks of the Platonic and Aristotelian approaches. In the Aristotelian scenario, Vapnik has introduced transductive learning [73], while in the Platonic scenario, Goldfarb has advocated a new structural learning paradigm [31, 32]. We think these are two major perspectives of the science of pattern recognition.
有两个新颖的且超前的模式识别思想被提了出来,它们分别用以解决柏拉图和亚里士多德方法的缺陷。对于亚里士多德研究方法,Vapnik提出转化推理学习方法,对于柏拉图研究方法,Goldfarb提出一种新的结构学习模式。我们认为这是模式识别科学领域中两个主要的观点。
Vapnik [73] formulated the main learning principle as: ‘If you possess a restricted amount of information for solving some problem, try to solve the problem directly and never solve a more general problem as an intermediate step.’ In the traditional Aristotelian scenario, the learning task is often transformed to the problem of function estimation, in which a decision function is determined globally for the entire domain (e.g. for all possible examples in a feature vector space). This is, however, a solution to a more general problem than necessary to arrive at a conclusion (output) for specific input data. Consequently, the application of this common-sense principle requires a reformulation of the learning problem such that novel (unlabeled) examples are considered in the context of the given training set. This leads to the transductive principle, which aims at estimating the output for a given input only when required, and which may differ from instance to instance. The training sample, considered either globally or in the local neighborhoods of test examples, is actively used to determine the output. As a result, this leads to confidence measures of single predictions instead of globally estimated classifiers. It provides ways to overcome the difficulty of iid samples and stationary distributions. More formally, in transductive reasoning, given an entailment A |= (B ∪ C), if the consequence B is observed as the result of A, then the consequence C becomes more likely.
Vapnik提出主要的学习法则是:如果用于解决问题的信息有限,则应当试着寻找直接解决问题的方法,不要去解决更为通用的问题,如中间问题。在传统的亚里士多德式研究过程中,学习的任务经常转为函数估计问题,其中的决策函数用于全局地决定整个问题域(如为了解决特征向量空间中所有的可能用例),然而,这是一个为了解决更为通用问题的方法,不是为特定输入数据而达到的解决方法(输出)。结果,运用这种普通法则的应用需要对学习问题重新进行形式化,这样新的(未标识)用例要被考虑进已有训练集的上下文中。这导致了转化推理的产生,这种方法是只有在需要的时候才从输入数据来估计输出数据,可能会在实例与实例之间进行比较,对于训练样本,可以是较为全面的测试用例,也可以是局部的相邻部分,决定了决策结果。所以,这里用的是对每个决策的信心度量,而不是对分类器进行全局性估量。这样可以克服样本要具有同一同分布和固定分布的困难。转化推理更为形式化的表示可以是这样的关系:A |= (B ∪ C),如果B被观察出来是A,则C因和B相似也被认为是A。
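The idea of estimating an output only for the given input, together with a confidence for that single prediction, can be loosely illustrated by a local neighbourhood rule. The sketch below is not Vapnik's transductive inference procedure; it is a plain k-nearest-neighbour vote, used here only because it builds no global decision function and reports a per-instance confidence. The training set, the function name and the choice of k are hypothetical.

```python
from collections import Counter

def predict_for(query, training_set, k=3):
    """Label one query from its k nearest training examples.

    Returns (label, confidence), where confidence is the fraction of the
    k neighbours that carry the winning label.
    """
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    neighbours = sorted(training_set,
                        key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    return label, count / k

# Hypothetical labelled examples in a 2-D feature space.
training_set = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
                ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]

print(predict_for((0.2, 0.2), training_set))        # all neighbours agree
print(predict_for((4.0, 4.0), training_set, k=5))   # neighbours disagree
```

Each call uses the training sample afresh for one specific input, which also shows where the computational cost mentioned in the text comes from.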
The truly transductive principle requires an active synergy of inductive, deductive and abductive principles in a conscious decision process. We believe it is practised by people who analyze complex situations, deduce and validate possible solutions and make decisions in novel ways. Examples are medical doctors, financial advisers, strategy planners or leaders of large organizations. In the context of automatic learning, transduction has applications to learning from partially labeled sets and otherwise missing information, information retrieval, active learning and all types of diagnostics. Some proposals can be found e.g. in [34, 46, 47, 73]. Although many researchers recognize the importance of this principle, many also remain reluctant. This may be caused by unfamiliarity with this idea, by the small number of existing procedures, or by the accompanying computational costs, as a complete decision process has to be constantly inferred anew.
在意识决策过程中真正的转化推理法则需要是对归纳、演译和溯因法则进行互补和综合,我们相信人类分析复杂事物、推理和验证可能性结论及用新奇方法进行决策是这样进行的,例如象那些医生、金融顾问、战略规划者和大型组织的领导者。在自动化学习过程中,转化推理拥有从局部标识的数据集中进行学习的程序,还有分别从丢失信息、不完整信息、已学知识和各种诊断学中进行学习的程序。这里可以发现一些学习方案,例如可以见文献[34, 46, 47, 73]。虽然有许多研究者认识到了这个转化推理法则的重要性,也有很多人对此表示怀疑,这也许是因为对这个思想不熟悉、相应的程序很少,或者是因为这样一个完整的决策每次都要重新被推断需要较多的计算开销。
In the Platonic scenario, Goldfarb and his colleagues have developed structural inductive learning, realized by the so-called evolving transformation systems (ETS) [31, 32]. Goldfarb first noticed the intrinsic and insurmountable inadequacy of vector spaces to truly learn from examples [30]. The reason is that such quantitative representations lose all information on object structure; there is no way an object can be generated given its numeric encoding. The second crucial observation was that all objects in the universe have a formative history. This led Goldfarb to the conclusion that an object representation should capture the object’s formative evolution, i.e. the way the object is created through a sequence of suitable transformations in time. The creation process is only possible through structural operations. So, ‘the resulting representation embodies temporal structural information in the form of a formative, or generative, history’ [31]. Consequently, objects are treated as evolving structural processes and a class is defined by structural processes which are ‘similar’. This is an inductive structural/symbolic class representation, the central concept in ETS. This representation is learnable from a (small) set of examples and has the capability to generate objects from the class.
在柏拉图科学研究方法中,Goldfarb和他的同事研发出基于结构归纳学习方法,运用到所谓的演变转化系统(ETS)中[31,32]。Goldfarb首先注意到一个固有的不可克服的困难,即因向量空间信息不充备而无法真正从用例中学会识别方法,这个原因是定量的表示方法无法确切地表示对象结构的所有信息,没有方法可以量化地表示一个对象。其次还有一个重要的观察是,发现宇宙中所有事物都有一个形式不断演变的历程。于是Goldfarb得到这样的结论,一个对象的表示方法应当抓住对象的形式演变过程,例如,这个方法要能够表示通过一系列适当的不断变化的事物创建过程,这个创建过程只有通过结构化操作才有可能实现。所以,“由此而产生的表示方法嵌入了带时间的结构信息,反映了一个演变或创建历程”[31]。因此,用演变式地结构处理来看待对象,用结构式处理来定义某个类别,这两个方面是“相似”的。这是一个采用归纳方法、结构化/符号化的类别表示方法,是ETS中的中心概念。这个表示方法可以在一个(小的)用例集中进行学习,拥有从某个类别中生成对象的能力。
The generative history of a class starts from a single progenitor and is encoded as a multi-level hierarchical system. On a given level, the basic structural elements are defined together with their structural transformations, such that both are used to constitute a new structural element on a higher level. This new element becomes meaningful on that level. Similarity plays an important role, as it is used as a basic quality for a class representation as a set of similar structural processes. The similarity measure is learned in training to induce the optimal finite set of weighted structural transformations that are necessary on the given level, such that the similarity of an object to the class representation is large. ‘This mathematical structure allows one to capture dynamically, during the learning process, the compositional structure of objects/events within a given inductive, or evolutionary, environment’ [31].
一个种类的生成历程起始于一个单一的源点,这个历程被描述成一个多层次的分等级的系统。在某个层次上,基本的结构元素和它们的结构转变被定义在一起,这样它们就可以一起被用来建立更高层次上新的结构元素,新元素在所在层次上就变成有意义起来。相似性起着一个重要的角色,是一个类别表示方法的基本性质,一个类别表示方法是通过对结构上相似性判断分出来的。相似性衡量方法通过训练被学习出来,通过归纲方法进行优化在某个层上必须要有的权重化结构转变信息的有限集,这样在某个种类表示方法中的某个对象的相似性是很大的。“在归纳推理中,或在演进式环境中,这个数学结构在学习过程中允许动态地获取对象或事件的合成结构。”[31]
Goldfarb’s ideas bear some similarity to those of Wolfram, presented in his book on ‘a new kind of science’ [80]. Wolfram considers computation as the primary concept in nature; all processes are the results of computational processes of the cellular-automata type, and thereby inherently numerical. (Cellular automata are discrete dynamical systems that operate on a regular lattice in space and time, and are characterized by ‘local’ interactions.) He observes that repetitive use of simple computational transformations can cause very complex phenomena, especially if computational mechanisms are used at different levels. Goldfarb also discusses dynamical systems, in which complexity is built from simpler structures, through hierarchical folding up (or enrichment). The major difference is that he considers structure to be of primary interest, which leads to evolving temporal structural processes instead of computational ones.
Goldfarb的思想与Wolfram有些相似,在他的书‘a new kind of science’[80]中有这方面的介绍。Woldfram认为计算是自然中首要的概念,所有的处理都是实现在胞元自动机类型上的计算过程(胞元自动机是离散的动态系统,在时间和空间中操作在有规则的格子上,具有局部交互的特点),显然是数字化的。他观察到反复使用简单的计算转换方法可以产生复杂的现象,特别是如果在不同层次进行计算。Goldfarb也论述类似的动态系统,认为复杂的事物可以通过分等级的折叠(或富集)方法,从更简单的结构来建立。主要不同的是他考虑主要有用的结构,且这种结构可以在结构上随时间的变化而进行演变,而不是计算出来。
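How repeated local updates on a regular lattice can produce an intricate pattern is easy to try out with an elementary (one-dimensional, two-state) cellular automaton. The sketch below uses Rule 30 in Wolfram's numbering as an example; the lattice width, the periodic boundary and the single-cell start are assumptions made for the illustration.

```python
def step(cells, rule=30):
    """One synchronous update of an elementary cellular automaton.

    Each new cell depends only on its left neighbour, itself and its right
    neighbour (a 'local' interaction); the lattice is treated as periodic.
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        index = 4 * left + 2 * centre + right    # neighbourhood as a 3-bit number
        new_cells.append((rule >> index) & 1)    # look up that bit of the rule
    return new_cells

def run(cells, n_steps, rule=30):
    """Return the list of lattice states, starting with the initial one."""
    history = [cells]
    for _ in range(n_steps):
        history.append(step(history[-1], rule))
    return history

width = 31
start = [0] * width
start[width // 2] = 1                            # a single active cell
for row in run(start, 12):
    print("".join("#" if c else "." for c in row))
```

The update rule fits in one line, yet the printed rows show no obvious regularity, which is the kind of behaviour the text refers to.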
In summary, Goldfarb proposes a revolutionary paradigm: an ontological model of a class representation in an epistemological context, as it is learnable from examples. This is a truly unique unification. We think it is the most complete and challenging approach to pattern recognition to date, a breakthrough. By including the formative history of objects in their representation, Goldfarb attributes to them some aspects of human consciousness. The far-reaching consequence of his ideas is a generalized measurement process that will one day be present in sensors. Such sensors will be able to measure ‘in structural units’ instead of numerical units (say, meters) as is currently done. The inductive process over a set of structural units lies at the foundation of new inductive informatics. The difficulty, however, is that the current formalism in mathematics and related fields is not yet prepared for adopting these far-reaching ideas. We believe, however, that they will pave the road and be found anew or rediscovered in the next decennia.
总之,Goldfarb提出的是一个具有创新性的识别模式:以认识论为背景,建立类别表示方法的存在论模型,且可以从用例中进行学习。这是一个真正的终极方法。我们认为这是迄今为止最为完整且极具有挑战性的方法。通过把对象的演变历史融入到表示方法中,Goldfarb加入了人类意识形为。他的思想遥不可及的是一般化的类似于生物的测量方法,这个方法有一天将会被用在传感器中,这样的传感器可以‘在结构单元’上进行测量,而不是现在所实现的数值单元(如米)。归纳推理在一组结构单元上进行处理,这些结构单元是归纳推理产生新信息的基础。然而,困难的是当前数学和相关领域上的形式体系并未能为采用这些遥不可及的思想做好准备,不过,我们相信这些思想是通往成功之路,且在几十年后会被再次找到或重新发现。
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
5 Challenges
5 挑战
A lot of research effort is needed before the two novel and far-reaching paradigms are ready for practical applications. So, this section focuses on several challenges that naturally arise in the current context and will be summarized for the design of automatic pattern recognition procedures. A number of fundamental problems, related to the various approaches, have already been identified in the previous sections and some will return here on a more technical level. Many of the points raised in this section have been more extensively discussed in [17]. We will emphasize those which have only been touched upon or are not treated in the standard books [15, 71, 76] or in the review by Jain et al. [45]. The issues to be described are just a selection of the many which are not yet entirely understood. Some of them may be solved in the future by the development of novel procedures or by gaining an additional understanding. Others may remain an issue of concern to be dealt with in each application separately. We will systematically describe them, following the line of advancement of a pattern recognition system; see also Fig. 1:
• Representation and background knowledge. This is the way in which individual real world objects and phenomena are numerically described or encoded such that they can be related to each other in some meaningful mathematical framework. This framework has to allow the generalization to take place.
在上面两个新奇且遥不可及的识别模式被实际应用之前,还需要做许多的研究努力。所以这节我们承接上文的讨论,重点讨论相关的几个挑战性问题,然后总结一下自动模式识别方法的设计问题。跟各种识别方法相关的一些基础问题在前面的章节中已被提了出来,这里将就这些问题在技术层次上做更深的讨论。在引文[17]中对本节所列的观点有更广泛地论述。我们将着重描述已解决的问题或一般书本上没提到的问题或Jain等其他人提出的观点,所描述的这些问题只是许多未被完全理解的一部分,其中一些问题可能在将来因新技术的发现或其它理论知识的产生而被解决掉,其它问题仍然要结合每个具体应用分开来考虑。我们将根据模式识别系统中的各个阶段系统地对这些问题进行描述,参考图1:表示方法和识别背景,这是对现实世界对象或现象的每个个体进行数据化描述或编码的方法,这样它们就可被应用到那些复杂的数学处理系统中,这个系统具有较好的推广性。
• Design set. This is the set of objects available or selected to develop the recognition system.
设计样本集:这是用于开发识别系统所要用到的或被选择出来的识别对象集。
• Adaptation. This is usually a transformation of the representation such that it becomes more suitable for the generalization step.
适配:这通常是指表示方法的转化方法,以此可以更适合用于推广步骤。
• Generalization. This is the step in which objects of the design set are related such that classes of objects can be distinguished and new objects can be accurately classified.
推广:这个步骤跟设计样本集中的对象有关,这里对象的类别可以被区分出来,新的对象能够被准确地分类。
• Evaluation. This is an estimate of the performance of a developed recognition system.
评估:这是对已开发出来的识别系统进行评估。
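The five stages listed above can be traced in a deliberately small example. Everything in this sketch is hypothetical: the raw objects are short signals, the representation is the pair (mean, range), the adaptation is a rescaling of each feature to [0, 1], the generalization step is a nearest-mean classifier, and the evaluation is the error rate on a separate test set. It is meant to show how the stages connect, not how any of them should be done in practice.

```python
def represent(signal):
    """Representation: encode a raw signal as the vector (mean, range)."""
    return [sum(signal) / len(signal), max(signal) - min(signal)]

def fit_scaler(vectors):
    """Adaptation: learn a per-feature rescaling to [0, 1] from the design set."""
    lows = [min(col) for col in zip(*vectors)]
    highs = [max(col) for col in zip(*vectors)]
    def scale(v):
        return [(x - lo) / (hi - lo) if hi > lo else 0.0
                for x, lo, hi in zip(v, lows, highs)]
    return scale

def train_nearest_mean(vectors, labels):
    """Generalization: summarize each class by the mean of its vectors."""
    groups = {}
    for v, lab in zip(vectors, labels):
        groups.setdefault(lab, []).append(v)
    return {lab: [sum(col) / len(col) for col in zip(*vs)]
            for lab, vs in groups.items()}

def classify(v, means):
    """Assign the class with the closest mean (squared Euclidean distance)."""
    return min(means,
               key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, means[lab])))

# Design set: the objects available to develop the system (toy data).
design_set = [([1, 2, 1, 2], "flat"), ([2, 1, 2, 2], "flat"),
              ([0, 9, 1, 8], "spiky"), ([9, 0, 8, 1], "spiky")]
# Separate objects reserved for the evaluation (toy data).
test_set = [([2, 2, 1, 1], "flat"), ([1, 9, 0, 9], "spiky")]

raw = [represent(s) for s, _ in design_set]
scale = fit_scaler(raw)
means = train_nearest_mean([scale(v) for v in raw],
                           [lab for _, lab in design_set])

# Evaluation: estimate the error rate on objects not used in the design.
errors = sum(classify(scale(represent(s)), means) != lab for s, lab in test_set)
error_rate = errors / len(test_set)
print(error_rate)
```

Note that the scaler and the class means are derived from the design set only, so the test objects play no role before the evaluation stage.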
5.1 Representation and Background Knowledge
5.1 表示方法和知识背景
The problem of representation is a core issue for pattern recognition [18, 20]. Representation encodes the real world objects by some numerical description, handled by computers in such a way that the individual object representations can be interrelated. Based on that, later a generalization is achieved, establishing descriptions or discriminations between classes of objects. Originally, the issue of representation was almost neglected, as it was reduced to the demand of having discriminative features provided by some expert. Statistical learning is often believed to start in a given feature vector space. Indeed, many books on pattern recognition disregard the topic of representation, simply by assuming that objects are somehow already represented [4, 62]. A systematic study on representation [20, 56] is not easy, as it is application or domain-dependent (where the word domain refers to the nature or character of problems and the resulting type of data). For instance, the representations of a time signal, an image of an isolated 2D object, an image of a set of objects on some background, a 3D object reconstruction or the collected set of outcomes of a medical examination are entirely different observations that need individual approaches to find good representations. Anyway, if the starting point of a pattern recognition problem is not well defined, this cannot be improved later in the process of learning. It is, therefore, of crucial importance to study the representation issues seriously. Some of them are phrased in the subsequent sections.
表示方法是模式识别中的核心问题[18,20]。表示方法通过数字化方法对现实世界中的对象进行转换描述,运用计算机进行处理,这样每个独立的对象表示方法被相互关联了起来。以此为基础,推广方法在后面得到应用,在识别对象的种类间建立描述或区分方法。最初,表示方法问题几乎被忽视了,只是考虑怎么减少由专家所提供的具有区分能力的特征。统计学习经常被相信可以用在一个已有特征空间中。实际上,许多模式识别书本忽视表示方法问题,只是简单地假设对象以某种方法但又不确切的方法来表示[4,62]。系统化地研究表示方法[20,56]是不容易的,表示方法依赖于具体的应用及相关领域(这里的领域是指自然界或问题特征或数据表达类型)。例如一个时序信号、一个独立的表示2D对象的图像、在某背景下表示某个对象集的图像、一个3D对象重构或所收集到的医生诊断报告,这些都是完全不同的观察数据,需要运用不同的方法才能找到较好的表示方法。总之,如果模式识别问题的出发点没有被很好定义,在以后的学习处理中难以提高识别性能。所以,认真地研究表示方法问题是十分重要的。后面的几节会描述其中一些表示方法问题。
The use of vector spaces. Traditionally, objects are represented by vectors in a feature vector space. This representation makes it feasible to perform some generalization (with respect to this linear space), e.g. by estimating density functions for classes of objects. The object structure is, however, lost in such a description. If objects contain an inherent, identifiable structure or organization, then relations between their elements, like relations between neighboring pixels in an image, are entirely neglected. This also holds for spatial properties encoded by Fourier coefficients or wavelet weights. These original structures may be partially rediscovered by deriving statistics over a set of vectors representing objects, but these are not included in the representation itself. One may wonder whether the representation of objects as vectors in a space is not oversimplified to be able to reflect the nature of objects in a proper way. Perhaps objects might be better represented by convex bodies, curves or by other structures in a metric vector space. The generalization over sets of vectors, however, is heavily studied and mathematically well developed. How to generalize over a set of other structures is still an open question.
运用向量空间:传统地,对象被用特征向量空间中的向量来表示。这种表示方法比较适合执行推广操作(如线性空间),例如可以通过估计用于进行对象分类的密度函数。然而,这样的描述会丢失对象的结构信息。如果对象含有一个固有的、可以确认的结构或组织,则其中各元素之间的关系,例如图像中相邻象素间的关系,会完全被忽视,这对于在通过傅立叶系数或小波权重变换得到的空间特征中也会被忽视结构特征。通过在表示对象的向量集中进行统计可能会部份地重新发现原来的结构特征,但这些都没有包含在表示方法中。可能有人会怀疑用某空间中的向量来表示对象是否不会过分单纯地在某种程度上影响对象的自然性。或许对象通过凸集、凸曲线或其它可度量向量空间中的结构来表示会更好。然而,在向量集中进行推广研究是很复杂的,需要许多数学知识。怎样在其它结构表示体中进行推广仍然是一个未解决的问题。
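That neighbour relations between pixels play no role in a plain vector representation can be checked directly: if the same arbitrary permutation is applied to the pixels of every image, all pairwise Euclidean distances stay exactly the same, so any procedure that depends only on these distances cannot tell the original images from the scrambled ones. The three small binary images below are hypothetical and serve only as an illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two pixel vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def permute(image, order):
    """Rearrange the pixels of a flattened image in the given order."""
    return [image[i] for i in order]

# Three hypothetical 3x3 binary images, stored row by row as flat vectors.
bar_top    = [1, 1, 1, 0, 0, 0, 0, 0, 0]
bar_middle = [0, 0, 0, 1, 1, 1, 0, 0, 0]
diagonal   = [1, 0, 0, 0, 1, 0, 0, 0, 1]
images = [bar_top, bar_middle, diagonal]

order = list(range(9))
random.Random(0).shuffle(order)      # one fixed scrambling for all images
scrambled = [permute(img, order) for img in images]

for i in range(3):
    for j in range(i + 1, 3):
        # The two printed numbers are equal for every pair of images.
        print(sq_dist(images[i], images[j]),
              sq_dist(scrambled[i], scrambled[j]))
```

A human observer would no longer recognize the bars and the diagonal after scrambling, while the vector-space description of the set has not changed at all.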
The essential problem of the use of vector spaces for object representation was originally pointed out by Goldfarb [30, 33]. He prefers a structural representation in which the original object organization (connectedness of building structural elements) is preserved. However, as a generalization procedure for structural representations does not exist yet, Goldfarb starts from the evolving transformation systems [29] to develop a novel system [31]. As already indicated in Sec. 4.3, we see this as a possible direction for a future breakthrough.
运用向量空间来表示对象产生的基本问题最初是由Goldfarb指出来的[30,33]。他提倡用结构表示方法,这样对象原来的结构(组成对象的结构元素的连通性)可以被保留下来。然而针对结构化表示的推广方法还没有出现,于是Goldfarb以演进转化系统[29]为原理开发一个新奇的识别系统[31]。诸如在4.3节中所指明的那样,我们看到这是在未来进行突破性研究的可能方向。
Compactness. An important, but seldom explicitly identified property of representations is compactness [1]. In order to consider classes, which are bounded in their domains, the representation should be constrained: objects that are similar in reality should be close in their representations (where the closeness is captured by an appropriate relation, possibly a proximity measure). If this demand is not satisfied, objects may be described capriciously and, as a result, no generalization is possible. This compactness assumption puts some restriction on the possible probability density functions used to describe classes in a representation vector space. This, thereby, also narrows the set of possible classification problems. A formal description of the probability distribution of this set may be of interest to estimate the expected performance of classification procedures for an arbitrary problem.
紧性:这是很重要的,但很少有被明确地指明表示方法的性质是要紧性的[1]。为了能够识别可以进行区别的对象,表示方法应当这样被约束:现实中相似的对象在表示上也应当是相近的(这里的相近可以通过某种关系或估计方法来衡量)。如果这个条件不被满足,则对象的描述具有不稳定性,由此不可能进行推广。在向量空间表示方法中,紧性假设在对描述分类方法的未确定的概率密度函数上做了些限制,这样,由此也把所可能的分类问题集缩小了。对于一个特定的识别问题,对这个问题的概率密度分布的有效描述也许有助于对分类方法的性能估计。
In Sec. 3, we pointed out that the lack of a formal restriction of pattern recognition problems to those with a compact representation was the basis of pessimistic results like the No-Free-Lunch Theorem [81] and the classification error bounds resulting from the VC complexity measure [72, 73]. One of the main challenges for pattern recognition is to find a formal description of compactness that can be used in error estimators that average over the set of possible pattern recognition problems.
在第三节中,我们曾指出,对于带有那些紧性表示方法的模式识别问题,缺乏有效限制是导致悲观结果的基本原因,就如没有免费的午餐理论所描述的那样,识别错误的反复产生源于VC维的复杂性[72,73]。寻找一个有效的紧性描述方法是模式识别中的一个主要的挑战问题之一,紧性描述可以被用在对所可能存在的模式识别问题的平均错误估计上。
Representation types. There exist numerous ways in which representations can be derived. The basic ‘numerical’ types are now distinguished as:
表示表示方法种类:已存在几种方法来用于表示方法中。基本的“数字化”描述类型区分如下:
• Features. Objects are described by characteristic attributes. If these attributes are continuous, the representation is usually compact in the corresponding feature vector space. Nominal, categorical or ordinal attributes may cause problems. Since a description by features is a reduction of objects to vectors, different objects may have identical representations, which may lead to class overlap.
特征:对象被描述成特征属性。如果这些属性是连续性的,则在相应的特征向量空间中的表示方法通常是紧密的。不重要的、绝对的或有序的属性可能会产生问题。既然一个特征描述是一个通过向量来对对象的约简,不同的对象可能会具有相同的表示,所以会导致种类交迭在一起。
• Pixels or other samples. A complete representation of an object may be approximated by its sampling. For images, these are pixels, for time signals, these are time samples and for spectra, these are wavelengths. A pixel representation is a specific, boundary case of a feature representation, as it describes the object properties in each point of observation.
象素或其它样本:对一个对象的完整表示方法可能是要通过取样来进行估值。对于图像,就是对象素点进行取样,对于时序信号,则是进行时间取样,对于光谱,则是对波长进行取样。象素表示方法是一种精细地带界线的特征表示方法,它描述了所观察到的对象的每一点的性质。
• Probability models. Object characteristics may be reflected by some probabilistic model. Such models may be based on expert knowledge or trained from examples. Mixtures of knowledge and probability estimates are difficult, especially for large models.
概率模型:对于某些建立在概率上的识别模型,对象特征的提取会有些困难。概率模型可能是基于专家知识或从样例中训练出来,把知识和概率估计综合起来是有困难的,特别是对于大模型。
• Dissimilarities, similarities or proximities. Instead of an absolute description by features, objects are relatively described by their dissimilarities to a collection of specified objects. These may be carefully optimized prototypes or representatives for the problem, but also random subsets may work well [56]. The dissimilarities may be derived from raw data, such as images, spectra or time samples, from original feature representations or from structural representations such as strings or relational graphs. If the dissimilarity measure is nonnegative and zero only for two identical objects, always belonging to the same class, the class overlap may be avoided by dissimilarity representations.
不同点,相似点或相近点:不采用取特征的绝对描述,可以通过对收集到的对象比较出相异点来描述对象。这些可能是经过严密最优化出来的可以解决问题的典型或代表数据,但其中的任意子集也可能很好地解决问题[56]。相异点可以从原始数据中得到,诸如图像、光谱或时序信号样本,也可以从原来的特征表示方法中得到,也可以从结构表示方法中得到,如字符串或相关联的图表。如果两个要区分的对象的相异性的值大于或等于零,则它们属于同一个种类,运用相异性表示方法要避免种类描述交迭在一起。
• Conceptual representations. Objects may be related to classes in various ways, e.g. by a set of classifiers, each based on a different representation, training set or model. The combined set of these initial classifications or clusterings constitutes a new representation [56]. This is used in the area of combining clusterings [24, 25] or combining classifiers [49].
概念形式表示方法:对象可以通过各种形式与类别相关联,例如可以通过分类器集,其中每个分类器运用不同表示方法、不同的训练集或不同的识别模型。通过对这些最初的分类或聚类方法的组合来建立一个新的表示方法[56],这是运用在组合聚类和组合分类领域中。
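Of the types listed above, the dissimilarity representation is easy to sketch. The example below is a minimal illustration, assuming strings as structural objects and the Levenshtein edit distance as the dissimilarity measure. The prototype set and the function names are hypothetical choices made for this illustration and are not prescribed by the text. Each object is described by the vector of its dissimilarities to the prototypes.

```python
def edit_distance(s, t):
    """Levenshtein distance: minimal number of insertions, deletions, substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def dissimilarity_representation(objects, prototypes, dissimilarity):
    """Describe every object by its dissimilarities to a fixed set of prototypes."""
    return [[dissimilarity(o, p) for p in prototypes] for o in objects]


# Hypothetical prototypes (one per class) and objects to be represented.
prototypes = ["abab", "cccc"]
objects = ["abab", "abba", "cccd"]

D = dissimilarity_representation(objects, prototypes, edit_distance)
print(D)  # [[0, 4], [2, 4], [4, 1]]
```

The rows of `D` are ordinary vectors, so the classifiers developed for feature spaces can be trained on them, while the measure itself is nonnegative and zero only for identical strings.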
In the structural approaches, objects are represented in qualitative ways. The most important are strings or sequences, graphs and their collections and hierarchical representations in the form of ontological trees or semantic nets.
在结构表示方法中,对象通过定性的方法被表示。这种方法最多表示在字符串或时序信息中,也最多表示在图表和图表集以及分等级的图表表示方法中,这类表示方法运用了本体树或语义网络的形式。
Vectorial object descriptions and proximity representations provide a good way for generalization in some appropriately determined spaces. It is, however, difficult to integrate them with the detailed prior or background knowledge that one has on the problem. On the other hand, probabilistic models and, especially, structural models are well suited for such an integration. The latter, however, constitute a weak basis for training general classification schemes. Usually, they are limited to assigning objects to the class model that fits best, e.g. by the nearest neighbor rule. Other statistical learning techniques can be applied to these if an appropriate proximity measure is given or a vectorial representation space is found by graph embeddings [79].
向量形式的对象描述和相似性的表示方法在某些适宜的决策空间中提供了较好的推广方法。然而,困难的是如何把跟问题有关的先验或背景知识结合起来。另一方面,概率模型,特别是结构模型能够非常好地把先验或背景知识结合起来,然而,相结合后,建立的是一个弱识别器,用于进行普通分类的训练。通常,在这个分类模型中可以达到最好分类(例如采用最邻近法则)的对象不多。如果有一个适当的相似性度量方法或一个结合图表形式的向量表示空间,统计学习技术则可以被用上。
It is a challenge to find representations that constitute a good basis for modeling object structure and which can also be used for generalizing from examples. The next step is to find representations not only based on background knowledge or given by the expert, but to learn or optimize them from examples.
在对象结构建模中建立一个较好的分类器,寻找一个相应的对象表示方法,且这个表示方法能够用于从用例中进行推广,这是一个挑战性问题。下面就来介绍如何找到不仅可以基于背景知识(或能由专家给出),也可以从用例中进行学习或最优化的表示方法。
5.2 Design Set
5.2 设计样本集
A pattern recognition problem is not only specified by a representation, but also by the set of examples given for training and evaluating a classifier in various stages. The selection of this set and its usage strongly influence the overall performance of the final system. We will discuss some related issues.
模式识别问题不仅与表示方法有关,也跟用于在分类器设计各个阶段进行训练和测试的用例集有关。用例集的选择和使用大大地影响了最后识别系统的整体性能。我们来讨论与此相关的一些问题。
Multiple use of the training set. The entire design set or its parts are used in several stages during the development of a recognition system. Usually, one starts from some exploration, which may lead to the removal of wrongly represented or erroneously labeled objects. After gaining some insights into the problem, the analyst may select a classification procedure based on the observations. Next, the set of objects may go through some preprocessing and normalization. Additionally, the representation has to be optimized, e.g. by a feature/object selection or extraction. Then, a series of classifiers has to be trained and the best ones need to be selected or combined. An overall evaluation may result in a re-iteration of some steps and different choices.
训练集的多方面运用:在开发一个识别系统的过程中,整个样本集或其中的一部分要被用在几个设计阶段。通常,在还是探索的开始阶段,可以排除那些被错误表示或不正确标识的对象。在对问题进行深入研究后,分析研究人员会基于观察数据选择某个分类方法。下一步,样本对象集就被用于一些预处理或归一化过程中。另外,表示方法在这过程中需要被优化,如进行特征/对象选择或提取。然后,一系列分类器得进行训练,训练后,其中最好的被选择出来或进行组合。最后反复在几个步骤和不同选择的方法中进行全面测试评估。
In this complete process the same objects may be used a number of times for estimation, training, validation, selection and evaluation. Usually, an expected error estimate is obtained by a cross-validation or hold-out method [32, 77]. It is well known that the multiple use of objects should be avoided as it biases the results and decisions. Re-using objects, however, is almost unavoidable in practice. A general theory that predicts how much a training set is ‘worn-out’ by its repetitive use, and that suggests corrections that can diminish such effects, does not exist yet.
在整个处理过程中,相同的对象可能要被使用好几次,被用于进行(参数)估计、训练、检验、选择和评估。通常,通过交叉验证和留取方法,可预见的误差可以被估计出来[32,77]。大家都知道应当避免样本对象的多次使用,因为这样会使识别结果和决策出现偏差。然而,在实践中对象的重复使用几乎是无法避免的。还没有这方面的通用理论,即预测训练集被重复使用多少次就不能用了,以及要做怎样的修正以减少这样的影响。
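A cross-validation error estimate of the kind mentioned above can be sketched as follows. This is a minimal illustration and not the procedure of [32, 77]: the nearest mean classifier, the two Gaussian toy classes and all function names are assumptions made for the example. Every object is used for testing exactly once and is never part of the training set of the fold in which it is tested.

```python
import random


def nearest_mean_fit(X, y):
    """Estimate one mean vector per class."""
    means = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        means[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return means


def nearest_mean_predict(means, x):
    """Assign x to the class with the closest mean (squared Euclidean distance)."""
    return min(means, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, means[lab])))


def cross_val_error(X, y, k, seed=0):
    """k-fold cross-validation estimate of the classification error."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        means = nearest_mean_fit([X[i] for i in train], [y[i] for i in train])
        errors += sum(nearest_mean_predict(means, X[i]) != y[i] for i in fold)
    return errors / len(X)


# Assumed toy problem: two well separated 2D Gaussian classes of 50 objects each.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

err = cross_val_error(X, y, k=5)
print(err)
```

If `err` is subsequently used to choose between several classifiers or representations, the same objects have been used for both selection and evaluation, which is exactly the kind of re-use that biases the final estimate.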
Representativeness of the training set. Training sets should be representative of the objects to be classified by the final system. It is common to take a randomly selected subset of the latter for training. Intuitively, it seems to be useless to collect many objects represented in the regions where classes do not overlap. On the contrary, in the proximity of the decision boundary, a higher sampling rate seems to be advantageous. This depends on the complexity of the decision function and the expected class overlap, and is, of course, inherently related to the chosen procedure.
训练集的典型性:训练集应当是具有代表性的对象集,以能够被最终识别系统识别。通常是随机地选取最新的样本子集进行训练。凭直觉,似乎收集许多表示在分类交迭处的对象是没有用的。相反地,在决策边界附近进行更高的取样率是有用的。这个依赖于决策问题的复杂度和分类的交迭程度,当然也跟方法的选择有关。
Another problem is posed by unstable, unknown or undetermined class distributions. Examples are the impossibility to characterize the class of non-faces in the face detection problem, or in machine diagnostics, the probability distribution of all casual events if the machine is used for undetermined production purposes. A training set that is representative for the class distributions cannot be found in such cases. An alternative may be to sample the domain of the classes such that all possible object occurrences are approximately covered. This means that for any object that could be encountered in practice there exists a sufficiently similar object in the training set, defined in relation to the specified class differences. Moreover, as class density estimates cannot be derived for such a training set, class posterior probabilities cannot be estimated. For this reason such a type of domain based sampling is only appropriate for non-overlapping classes. In particular, this problem is of interest for non-overlapping (dis)similarity based representations [18].
另一个问题是不稳定、未知或无法确定的种类分布问题。在人脸检测问题中,对于不是人脸的种类用例是无法进行描述它的特征的,或者在机器诊断中,如果诊断机器是被用在无法诊断的情况下,所有的偶然事件的概率分布是无法估计的,能够代表性地表示种类分布的训练集不可以出现在这些情况中。一个替代的方法是在所在类别范围内进行取样,这样对象所有可能出现的情况就可以近似地被覆盖到,意思是训练集取自种类分布的不同部分,对于在实践中任何可能被碰到的对象,在训练集中都相应存在一个非常相似的对象。更进一步地,因种类分布密度无法从那样的训练集中进行估计,种类的后验概率也无法被估计出来。因为这个原因,对于进行这样取样的分类方法只适合用于没有发生交迭的种类,更确切地说是表示方法上不存在相似性(或相异性)的交迭。
Consequently, we wonder whether it is possible to use a more general type of sampling than the classical iid sampling, namely the domain sampling. If so, the open questions refer to the verification of dense samplings and types of new classifiers that are explicitly built on such domains.
因此,我们想知道是否可能使用比独立同分布原则更为通用的取样方法,即域取样。如果可能,则还需要解决的问题是密集取样的验证问题,及显式地建立在域取样方法上的新分类方法选择问题。
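One conceivable way to check whether a domain sampling is dense enough is to measure how far a probe object can be from its nearest training object. The sketch below is only an illustration of this idea, not an established verification procedure: the regular grid over the unit square, the probe objects and the name `covering_radius` are assumptions introduced here.

```python
import math


def covering_radius(train, probes):
    """Largest distance from any probe object to its nearest training object."""
    return max(min(math.dist(p, t) for t in train) for p in probes)


# Assumed domain: the unit square, sampled by a regular 5x5 grid (spacing 0.25).
train = [(i / 4, j / 4) for i in range(5) for j in range(5)]

# Hypothetical objects that might be encountered in practice.
probes = [(0.1, 0.1), (0.5, 0.62), (0.875, 0.875)]

radius = covering_radius(train, probes)
print(radius)
```

A small radius, relative to the distances that separate the classes, indicates that every probe has a sufficiently similar training object. No class densities are involved, in line with the observation that posterior probabilities cannot be estimated from such a sample.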
5.3 Adaptation
5.3 适配
Once a recognition problem has been formulated by a set of example objects in a convenient representation, the generalization over this set may be considered, finally leading to a recognition system. The selection of a proper generalization procedure may not be evident, or several disagreements may exist between the realized and preferred procedures. This occurs e.g. when the chosen representation needs a non-linear classifier and only linear decision functions are computationally feasible, or when the space dimensionality is high with respect to the size of the training set, or the representation cannot be perfectly embedded in a Euclidean space, while most classifiers demand that. For reasons like these, various adaptations of the representation may be considered. When class differences are explicitly preserved or emphasized, such an adaptation may be considered as a part of the generalization procedure. Some adaptation issues that are less connected to classification are discussed below.
一旦识别问题通过一组用适宜的表示方法表示的用例对象集来形式化后,于是可能就要考虑在这个用例集上的推广问题,最后才产生一个识别系统。选择一个合适的推广方法可能不容易,或者在现实和理想之间存在一些冲突。例如会出现这样的情况,已选择的表示方法需要应用在非线性分类器上但计算上只有线性判断函数可行,或者如特征空间的维数很高导致了训练集数据很大,或者表示方法不能很好地结合到欧拉空间中,但大部分分类器需要在欧拉空间中进行计算。因为这些原因,对表示方法的各种适应性修改就要被考虑进来。当种类之间的区别需要被明确地保留或强调出来,这样适配方法可能要被考虑作为推广过程的一部份。下面介绍一些跟分类联系不太紧密的适配问题。
Problem complexity. In order to determine which classification procedures might be beneficial for a given problem, Ho and Basu [43] proposed to investigate its complexity. This is an ill-defined concept. Some of its aspects include data organization, sampling, irreducibility (or redundancy) and the interplay between the local and global character of the representation and/or of the classifier. Perhaps several other attributes are needed to define complexity such that it can be used to indicate a suitable pattern recognition solution to a given problem; see also [2].
问题复杂度:为了确定哪个分类方法可用于解决问题,Ho和Basu[43]建议考察问题的复杂度。这是一个不确切的概念。复杂度的问题关系到数据的组织方法、取样方法、还原性(或冗余性),还有表示方法和分类器中的局部和全局特征之间的相互影响。也许还有其它一些属性还需要被用来定义复杂度,这样才能够被用来确定针对某一问题的合适的模式识别解决方案,这方面问题可以查看文献[2]。
Selection or combining. Representations may be complex, e.g. if objects are represented by a large number of features or if they are related to a large set of prototypes. A collection of classifiers can be designed to make use of this fact and later combined. Additionally, a number of representations may be considered simultaneously. In all these situations, the question arises which should be preferred: a selection from the various sources of information or some type of combination. A selection may be random or based on a systematic search for which many strategies and criteria are possible [49]. Combinations may sometimes be fixed, e.g. by taking an average, or they may be parameterized, like a weighted linear combination such as the one obtained by a principal component analysis; see also [12, 56, 59].
选择或合并:表示方法可能是复杂的,例如,如果对象被一个很大的特征空间来表示,或者跟一个很大的原型集有关。分类器的选择是根据这个因素来进行选择,然后再合并选择出来的分类器。另外,多种表示方法也可能被同时考虑进来。在所有这些情形中,要优先考虑这个问题:要从多种途径来选择。选择方法可能是随机的,也可能是通过系统地寻找,这里有很多的选择策略和准则[49]。组合有时可能是用固定的方法,如通过取平均,或者参数化的组合,象用于主成份分析的带权值的线性组合,参见[12,56,59]。
The choice favoring either a selection or a combining procedure may also be dictated by economic arguments, or by minimizing the amount of necessary measurements or computation. If this is unimportant, the decision has to be made according to accuracy arguments. Selection neglects some information, while combination tries to use everything. The latter, however, may suffer from overtraining as weights or other parameters have to be estimated and may be adapted to the noise or irrelevant details in the data. The sparse solutions offered by support vector machines [67] and sparse linear programming approaches [28, 35] constitute a way of compromise. How to optimize them efficiently is still a question.
不管是选择还是组合过程,选择方法的依据是实用与否,或是否能够最小化设计所需要的资源,或者跟计算复杂度有关。如果这些是不重要的,那就以识别准确性为依据。选择会忽略一些信息,而组合则试图避免丢失一些信息。然而,当权值或其它参数在被估计时为了适应数据中的噪音或无关信息则会产生过学习的问题。支持向量机[67]提出了稀疏解决办法,用稀疏线性规划方法来建立一个折衷的解决方法。如何有效地优化仍然还是一个问题。
Nonlinear transformations and kernels. If a representation demands or allows for a complicated, nonlinear solution, a way to proceed is to transform the representation appropriately such that linear aspects are emphasized. A simple (e.g. linear) classifier may then perform well. The use of kernels, see Sec. 3, is a general possibility. In some applications, indefinite kernels are proposed as being consistent with the background knowledge. They may result in non-Euclidean dissimilarity representations, which are challenging to handle; see [57] for a discussion.
非线性转化和核:如果一个表示方法需要或允许被复杂且非线性的方法来表示,则下一次需要对表示方法进行转化以可以用上线性的方法,于是一个简单(如线性)的分类器可以发挥很好作用。一般的方法是使用核(见第三节的描述)。在一些应用中,模糊核能够与背景知识相一致,但可能需要在非欧拉空间中的相异性表示方法,这是个待解决的挑战性问题,文献[57]中有这方面讨论。
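The effect of such a transformation can be sketched with a standard textbook case, chosen here as an assumption and not taken from the paper: the homogeneous polynomial kernel of degree two in two dimensions. The kernel value equals an inner product after the explicit map `phi`, and two concentric rings, which no straight line separates in the original space, are separated by a linear function in the transformed space. The weights `w` and offset `b` are set by hand for the illustration.

```python
import math


def poly2_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2 for 2D vectors: (x . y)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def phi(x):
    """Explicit feature map with phi(x) . phi(y) == poly2_kernel(x, y)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


# The kernel and the inner product in the transformed space agree.
x, y = (1.0, 2.0), (3.0, -1.0)
print(poly2_kernel(x, y), dot(phi(x), phi(y)))

# Two concentric rings: radius 0.5 (class -1) and radius 2 (class +1).
angles = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
inner = [(0.5 * math.cos(t), 0.5 * math.sin(t)) for t in angles]
outer = [(2.0 * math.cos(t), 2.0 * math.sin(t)) for t in angles]

# Hand-picked linear function in the transformed space: x1^2 + x2^2 - 1.
w, b = (1.0, 0.0, 1.0), -1.0


def linear_decision(point):
    return dot(w, phi(point)) + b


inner_ok = all(linear_decision(p) < 0 for p in inner)
outer_ok = all(linear_decision(p) > 0 for p in outer)
print(inner_ok, outer_ok)
```

This kernel is positive semidefinite, so the transformed space is Euclidean. The difficulty mentioned for indefinite kernels is precisely that no such real-valued map `phi` exists for them.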
5.4 Generalization
5.4 推广
The generalization over sets of vectors leading to class descriptions or discriminants was extensively studied in pattern recognition in the 1960s and 1970s. Many classifiers were designed, based on the assumption of normal distributions, kernels or potential functions, nearest neighbor rules, multi-layer perceptrons, and so on [15, 45, 62, 76]. These types of studies were later extended by the fields of multivariate statistics, artificial neural networks and machine learning. However, in the pattern recognition community, there is still a high interest in the classification problem, especially in relation to practical questions concerning issues of combining classifiers, novelty detection or the handling of ill-sampled classes.
在上个世纪六七十年代,以种类描述或判定为目的的、在向量集上的推广方法被做了充分研究。许多分类器被设计了出来,这些分类器都是基于正态分布假设,运用核或势函数、最邻近原则、多层感知器等等方法[15,45,62,76]。这些在以后的多元统计、人工神经网络和机器学习中被更深入地研究了。然而,在模式识别科学研究中,分类问题仍然是一个吸引人去研究的问题,特别是在组合分类器、新奇对象检测或取样不全的分类中。
Handling multiple solutions. Classifier selection or classifier combination. Almost any more complicated pattern recognition problem can be solved in multiple ways. Various choices can be made for the representation, the adaptation and the classification. Such solutions usually do not only differ in the total classification performance, they may also make different errors. Some type of combining classifiers will thereby be advantageous [49]. It is to be expected that in the future most pattern recognition systems for real world problems will consist of a set of classifiers. In spite of the fact that this area is heavily studied, a general approach on how to select, train and combine solutions is still not available. As training sets have to be used for optimizing several subsystems, the problem of how to design complex systems is strongly related to the above issue of multiple use of the training set.
解决包含多识别器的方法,分类器选择或合并:几乎任何较为复杂的模式识别问题都可以通过多种方法来解决。因表示方法、适配方法和分类方法的不同会有多种选择方案。不同的选择方案不仅会产生不同整体分类性能,也可能产生不同的错误。对其中一些分类方法进行组合会产生较好的效果[49]。可以被预见在将来为解决现实世界问题的几乎所有的模式识别系统都是由一组识别器构建起来的。尽管在这方面已被研究了很多,但仍没有一个用于选择、训练和合并的通用方法。因为训练集要被用在优化几个子系统中,所以怎么设计综合性的系统跟前面提到的如何多次应用训练集的问题有很大的关系。
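Why classifiers that make different errors profit from combining can be seen in a small constructed case. The sketch below uses a fixed majority vote, which is only one of many combining rules. The three label sequences are hypothetical classifier outputs invented for this illustration, each wrong on a different object.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine label sequences of several classifiers by a per-object majority vote."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


truth = [0, 0, 0, 1, 1, 1]
clf_a = [1, 0, 0, 1, 1, 1]  # wrong on object 0
clf_b = [0, 1, 0, 1, 1, 1]  # wrong on object 1
clf_c = [0, 0, 0, 1, 1, 0]  # wrong on object 5

combined = majority_vote([clf_a, clf_b, clf_c])
print([accuracy(c, truth) for c in (clf_a, clf_b, clf_c)], accuracy(combined, truth))
```

Each base classifier reaches 5/6 while the vote is correct on all six objects, because no two classifiers fail on the same object. If the errors coincide, the vote gains nothing, which is why the diversity of the solutions matters as much as their individual performance.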
Classifier typology. Any classification procedure has its own explicit or built-in assumptions with respect to data inherent characteristics and the class distributions. This implies that a procedure will lead to relatively good performance if a problem fulfils its exact assumptions. Consequently, any classification approach has its problem for which it is the best. In some cases such a problem might be far from practical application. The construction of such problems may reveal what the typical characteristics of a particular procedure are. Moreover, when new proposals are to be evaluated, it may be demanded that some examples of their corresponding typical classification problems are published, making clear what the area of application may be; see [19].
分类器种类研究:任何分类程序都它自己的明确的或内建的假设,这个是跟数据固有特性和种类分布有关的假设。这意味着如果能够完全满足这些严密的假设,分类程序可以具有相当好的性能。因此,任何分类方法都有如何确定哪个是最好的问题。在一些情况中,这样的问题可能跟实际应用关系不大。这种问题也可能是在一个特定识别过程选择哪个是典型特征的问题。还有,当评估一个新方法时,可能需要公开其中一些用于该识别问题的用例,搞清楚应用范围,详见文献[19]。
Generalization principles. The two basic generalization principles, see Section 4, are probabilistic inference, using the Bayes-rule [63], and the minimum description length principle that determines the most simple model in agreement with the observations (based on Occam’s razor) [37]. These two principles are essentially different. The first one is sensitive to multiple copies of an existing object in the training set, while the second one is not. Consequently, the latter is not based on densities, but just on object differences or distances. An important issue is to find in which situations each of these principles should be recommended and whether the choice should be made in the beginning, in the selection of the design set and the way of building a representation, or whether it should be postponed until a later stage.
推广原则:从第4节中可以看到有两个推广基本原则,一个是基于概率推导,运用贝叶斯法则[63],另一个最小化描述法则,选择与观察一致的最简单模型(Occam剃刀法则)[37]。这两个原则有本质的区别。第一种原则对于训练集中多次运用一个相同对象影响很大,但第二种原则则不会。由此,后面一种原则不是基于概率密度,只是根据对象的不同点或距离。一个重要的问题是如何发现哪种情况下应该用哪种原则,以及相应地怎么建立表示方法,或者到后面的步骤来做。
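The different sensitivity to copies can be made concrete. In the sketch below the Bayes rule is applied with Parzen (Gaussian kernel) density estimates and class priors estimated from class frequencies. The nearest neighbor rule serves as a stand-in for a procedure that depends only on object distances; it is not an implementation of the minimum description length principle. The one-dimensional data and the kernel width `h` are assumptions chosen for the illustration.

```python
import math


def parzen_score(x, samples, h=1.0):
    """Sum of Gaussian kernels; proportional to (class prior) * (Parzen density)."""
    return sum(math.exp(-((x - s) ** 2) / (2 * h * h)) for s in samples)


def bayes_label(x, classes):
    """Bayes rule with Parzen densities and frequency-based priors."""
    return max(classes, key=lambda c: parzen_score(x, classes[c]))


def nn_label(x, classes):
    """Nearest neighbor rule: depends only on distances between objects."""
    return min(classes, key=lambda c: min(abs(x - s) for s in classes[c]))


x = 1.0
classes = {"A": [0.0], "B": [2.2]}
# The same training set, with the single object of class B copied three times.
duplicated = {"A": [0.0], "B": [2.2, 2.2, 2.2]}

print(bayes_label(x, classes), bayes_label(x, duplicated))  # A B
print(nn_label(x, classes), nn_label(x, duplicated))        # A A
```

Copying an object adds no new information about where the classes lie, yet it changes the density-based decision. Whether that is desirable depends on whether the copies reflect true frequencies of occurrence or are artifacts of the data collection.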
The use of unlabeled objects and active learning. The above mentioned principles are examples of statistical inductive learning, where a classifier is induced based on the design set and it is later applied to unknown objects. The disadvantage of such an approach is that a decision function is in fact designed for all possible representations, whether valid or not. Transductive learning, see Section 4.3, is an appealing alternative as it determines the class membership only for the objects in question, while relying on the collected design set or its suitable subset [73]. The use of unlabeled objects, not just the one to be classified, is a general principle that may be applied in many situations. It may improve a classifier based on just a labeled training set. If this is understood properly, the classification of an entire test set may yield better results than the classification of individuals.
未标识对象和主动学习的运用:上面提到的原则是统计推理学习的模式,这样的分类器基于样本集进行推理,然后应用于未知对象中。这样方法的缺点是决策函数要从所有可能的对象表示进行设计,对每个可能对象进行判断。4.3节中的转化推理学习是一个吸引人的替代方法,它通过质询的方法来判断对象的类别归属,而不是依赖于所收集到的样本集或其中合适的子集[73]。
Classification or class detection. Two-class problems constitute the traditional basic line in pattern recognition, which reduces to finding a discriminant or a binary decision function. Multi-class problems can be formulated as a series of two-class problems. This can be done in various ways, none of which is entirely satisfactory. An entirely different approach is the description of individual classes by so-called one-class classifiers [69, 70]. In this way the focus is on class description instead of class separation. This brings us to the issue of the structure of a class.
分类或种类甄别:二分类问题是模式识别传统的基本问题,它简化了寻找判别或二分决策函数方法。多分类问题可以用一系列的二分类问题来实现解决,关于这个相应地有多种方法可以用,但是没有一种方法可以完全让人满意的。有一种完全不同的方法则是用所谓的单类别分类器分别针对某个类别进行描述[69,70]。这个方法用类别描述来代替类别分离。这样就带来了种类结构的问题。
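A one-class classifier can be as simple as a description of the region occupied by the target class. The following sketch is one elementary variant, assumed here for illustration and not the specific methods of [69, 70]: the class is described by its mean and a distance threshold chosen such that a given fraction of the training objects is accepted. The data and the names `fit_one_class` and `accepts` are hypothetical.

```python
import math


def fit_one_class(train, accept_fraction=0.9):
    """Describe a class by its mean and a radius that accepts the given fraction."""
    dim = len(train[0])
    mean = tuple(sum(x[d] for x in train) / len(train) for d in range(dim))
    dists = sorted(math.dist(x, mean) for x in train)
    k = max(0, math.ceil(accept_fraction * len(dists)) - 1)
    return mean, dists[k]


def accepts(model, x):
    """True if x falls inside the class description."""
    mean, threshold = model
    return math.dist(x, mean) <= threshold


# Assumed target class: objects around (0.5, 0.5) plus one remote training object.
target = [(0, 0), (1, 0), (0, 1), (1, 1), (0.5, 0.5),
          (0.4, 0.6), (0.6, 0.4), (0.5, 0.4), (0.5, 0.6), (3, 3)]

model = fit_one_class(target, accept_fraction=0.9)
print(accepts(model, (0.5, 0.5)), accepts(model, (4, 4)))  # True False
```

No objects of any other class are needed for training. The price is that the shape of the description (here a sphere) and the accepted fraction have to be chosen without information about what the class must be separated from.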
Traditionally, classes are defined by a distribution in the representation space. However, the better such a representation, the higher its dimensionality and the more difficult it is to estimate a probability density function. Moreover, as we have seen above, it is for some applications questionable whether such a distribution exists. A class is then a part of a possibly non-linear manifold in a high-dimensional space. It has a structure instead of a density distribution. It is a challenge to use this approach for building entire pattern recognition systems.
传统的种类被定义成为一个在表示空间中的分布。然而,这种表示方法表示得越全面,所需要的维数就越高,估计概率密度函数也就越困难。还有,正如上面我们所明白的,在某些应用中这样的(可分离的)分布是否存在也是让人怀疑的。于是我们把一个种类表示成一个在高维空间中可能为非线性拓扑空间的一部分,用结构的方法来表示,而不是用概率密度分布,用这个方法来建立一个完整的模式识别系统是具有挑战性的。
5.5 Evaluation
5.5 评估
Two questions are always apparent in the development of recognition systems. The first refers to the overall performance of a particular system once it is trained, and sometimes has a definite answer. The second question is more open and asks which recognition procedures are good in general.
在开发识别系统中明显存在两个问题。第一个问题跟一个特定系统整体性能有关,这个系统一旦经过训练后,就相应需要准确知道其性能。第二个问题更是还未被解决,即哪种识别方法在通用性上是好的。
Recognition system performance. Suitable criteria should be used to evaluate the overall performance of the entire system. Different measures with different characteristics can be applied; usually, however, only a single criterion is used. The basic ones are the average accuracy computed over all validation objects or the accuracy determined by the worst-case scenario. In the first case, we again assume that the set of objects to be recognized is well defined (in terms of distributions). Then, it can be sampled and the accuracy of the entire system is estimated based on the evaluation set. In this case, however, we neglect the issue that after having used this evaluation set together with the training set, a better system could have been found. A more interesting point is how to judge the performance of a system if the distribution of objects is ill-defined or if a domain based classification system is used as discussed above. Now, the largest mistake that is made becomes a crucial factor for this type of judgement. One needs to be careful, however, as this may refer to an unimportant outlier (resulting e.g. from invalid measurements).
识别系统性能:评估整个系统的整体性能需要相应合适的标准。可以采用具有不同特点的不同评估方法,然而,总是只有一个标准被用上。基本方法是计算所有被验证对象上的平均正确率,或者在较差环境下的准确率。在第一种方法中,我们又是假设被识别的对象集是经过明确定义的(在分布上),然后,依此进行取样,整个系统的识别率在这个评估集上被估计出来。然而,这个方法中,我们忽略了这样一个问题,即(误以为)把评估集和训练集一起用上后,就能够发现该识别系统是否更好。一个更为有意思的一点是如果对象的分布是不清楚的或者采用上面我们所讨论的分类系统,那么该怎么去判断这个系统的性能,于是用这种判断方法是最大的错误。然而,要注意的一点是,我们可能会被不重要的表面数据所误导(源于如不合理的评估方法)。
Practice shows that a single criterion, like the final accuracy, is insufficient to judge the overall performance of the whole system. As a result, multiple performance measures should be taken into account, possibly at each stage. These measures should not only reflect the correctness of the system, but also its flexibility to cope with unusual situations in which e.g. specific examples should be rejected or misclassification costs incorporated.
实践表明单一的评估标准,如以最终准确率为依据,对于判断整个系统的性能是不足够的。由此,应当采用多种性能评估方法,尽可能地运用在每个识别阶段。这些评估方法不只反映系统的正确性,也要反映非常情况下的适应性,例如对于特殊用例应当会拒识,或加入错识代价。
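How much a single criterion can hide is easily shown. The sketch below computes three measures for the same set of decisions: the average accuracy, the accuracy on the worst class, and the mean misclassification cost. The labels, the decisions and the cost values are hypothetical and were chosen only to make the contrast visible.

```python
def evaluate(truth, pred, cost):
    """Return average accuracy, worst per-class accuracy and mean cost."""
    pairs = list(zip(truth, pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    class_acc = {}
    for c in set(truth):
        members = [p for t, p in pairs if t == c]
        class_acc[c] = sum(p == c for p in members) / len(members)
    # Pairs that are not listed in `cost` (e.g. correct decisions) cost nothing.
    mean_cost = sum(cost.get((t, p), 0.0) for t, p in pairs) / len(pairs)
    return accuracy, min(class_acc.values()), mean_cost


# Assumed diagnostic problem: faults are rare and expensive to miss.
truth = ["ok"] * 8 + ["fault"] * 2
pred = ["ok"] * 8 + ["ok", "fault"]
cost = {("fault", "ok"): 10.0, ("ok", "fault"): 1.0}

accuracy, worst_class_accuracy, mean_cost = evaluate(truth, pred, cost)
print(accuracy, worst_class_accuracy, mean_cost)  # 0.9 0.5 1.0
```

An average accuracy of 0.9 looks acceptable, while half of the faults are missed. A reject option fits in the same scheme by allowing an extra decision label, say `"reject"`, with its own cost entries.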
Prior probability of problems. As argued above, any procedure has a problem for which it performs well. So, we may wonder how large the class of such problems is. We cannot state that any classifier is better than any other classifier, unless the distribution of problems to which these classifiers will be applied is defined. Such distributions are hardly studied. What is done at most is that classifiers are compared over a collection of benchmark problems. Such sets are usually defined ad hoc and just serve as an illustration. The collection of problems to which a classification procedure will be applied is not defined. As argued in Section 3, it may be as large as all problems with a compact representation, but preferably not larger.
问题的先验概率:就如上面所讨论的,任何识别方法都有一个问题,即对于哪个种类会识别得很好。所以,我们可能会很想知道识别很好的种类有多少。除非分类的问题域被定义好,否则我们无法断定某个分类器一定会比其它分类性能要好。做得最多的是把分类器在一些基准问题集上进行比较。这样的问题集通常经过特别定义和只用于分析说明,但并不定义要用哪个分类方法。正如第三节中所讨论的那样,对于紧性表示方法这种问题跟其它所有问题一样是个大问题,但最好不是个更大的问题。
6 Discussion and Conclusions
6 讨论和结论
Recognition of patterns and inference skills lie at the core of human learning. It is a human activity that we try to imitate by mechanical means. There are no physical laws that assign observations to classes. It is the human consciousness that groups observations together. Although their connections and interrelations are often hidden, some understanding may be gained in the attempt of imitating this process. The human process of learning patterns from examples may follow along the lines of trial and error. By freeing our minds of fixed beliefs and petty details we may not only understand single observations but also induce principles and formulate concepts that lie behind the observed facts. New ideas can be born then. These processes of abstraction and concept formation are necessary for development and survival. In practice, (semi-)automatic learning systems are built by imitating such abilities in order to gain understanding of the problem, explain the underlying phenomena and develop good predictive models.
模式识别和推理技能是人类学习能力的核心所在。我们试图通过机器工具来模仿这样的人类活动,但是还没找到把观察数据对应到相关种类的物理规律。人类的知觉认识能够把观察数据划分一起,虽然机器识别和人类识别的联系和相互关系是不明显的,但是通过模仿人类识别过程可以让我们得到一些在识别方法上的理解。人类从用例中学习模式的过程是经过尝试和纠正错误的过程,通过充分发挥我们的智力,依靠已掌握的真理及细节,我们不仅能够理解观察到的数据,还能够推理和形式化隐藏在观察数据后面的法则及概念,然后能够产生新的想法,这些抽象和概念形成的过程是人类发展和生存的需要。实际上,(半)自动化学习系统正是通过模仿这样的能力建立起来的,通过模仿来获取对问题的理解、解释潜在的现象和开发出具有很好预测能力的模型。
It has, however, to be strongly doubted whether statistics play an important role in the human learning process. Estimation of probabilities, especially in multivariate situations, is not very intuitive for the majority of people. Moreover, the number of examples needed to build a reliable classifier by statistical means is much larger than is available to humans. In human recognition, proximity based on relations between objects seems to come before features are searched and may be, thereby, more fundamental. For this reason and the above observation, we think that the study of proximities, distances and domain based classifiers is of great interest. This is further encouraged by the fact that such representations offer a bridge between the possibilities of learning in vector spaces and the structural description of objects that preserves the relations within the objects’ inherent structure. We think that the use of proximities for representation, generalization and evaluation constitutes the most intriguing issue in pattern recognition.
然而,让人强烈怀疑的是统计方法在人类学习过程中是否扮演一个重要的角色。概率估计,特别是在多元状态下完全不是成年人的本能。况且,通过统计手段来建立一个可靠的分类器,对于人类来说需要非常巨大的用例数目。因为这个原因和上面我们所观察到的,对于相似性、距离及有关分类器的其它研究是非常有意义的。如果能够找到都可以把向量空间的概率及对象的结构描述(对象内部结构间存在着相互关系)联系起来的表示方法,是更令人鼓舞的。我们认为有关表示方法、推广及评估方法的应用构成了模式识别中最引人兴趣的问题。
The existing gap between structural and statistical pattern recognition partially coincides with the gap between knowledge and observations. Prior knowledge and observations are both needed in a subtle interplay to gain new knowledge. The existing knowledge is needed to guide the deduction process and to generate the models and possible hypotheses needed by induction, transduction and abduction. But, above all, it is needed to select relevant examples and a proper representation. If and only if the prior knowledge is made sufficiently explicit to set this environment, new observations can be processed to gain new knowledge. If this is not properly done, some results may be obtained in purely statistical terms, but these cannot be integrated with what was already known and have thereby to stay in the domain of observations. The study of automatic pattern recognition systems makes perfectly clear that learning is possible, only if the Platonic and Aristotelian scientific approaches cooperate closely. This is what we aim for.
结构模式识别和统计模式识别间存在的差别部分地反映在知识和观察数据之间的差别上。先验知识和观察数据在识别中都需要,且互相影响。已知的知识被用来进行推理和建立生成模型,归纳推理、转化推理和溯因推理过程中还需要一些可能性的假设。但是,除此之外,还需要选择相关的用例和合适的表示方法。如果且只是如果先验知识在解决问题中是充分且明确的,则新的观察能够被用来产生新的发现。如果无法做到这样,则可以在纯粹的统计方法中得到结果,但是无法结合已经知道的知识,导致只局限于在观察数据上。自动模式识别系统的研究中已完全清楚:只有把柏拉图和亚里士多德科学研究方法紧密结合起来,(模仿人类的)学习才有可能实现。这是我们要达到的目标。