6 Discussion and Conclusions
Recognition of patterns and inference skills lie at the core of human learning. It is a human activity that we try to imitate by mechanical means. There are no physical laws that assign observations to classes; it is the human consciousness that groups observations together. Although their connections and interrelations are often hidden, some understanding may be gained in the attempt to imitate this process. The human process of learning patterns from examples may follow the lines of trial and error. By freeing our minds of fixed beliefs and petty details we may not only understand single observations, but also induce principles and formulate concepts that lie behind the observed facts. New ideas can then be born. These processes of abstraction and concept formation are necessary for development and survival. In practice, (semi-)automatic learning systems are built by imitating such abilities in order to gain understanding of the problem, explain the underlying phenomena and develop good predictive models.
It has, however, to be strongly doubted whether statistics play an important role in the human learning process. Estimation of probabilities, especially in multivariate situations, is not very intuitive for the majority of people. Moreover, the amount of examples needed to build a reliable classifier by statistical means is much larger than what is available to humans. In human recognition, proximity based on relations between objects seems to come before features are searched for, and may thereby be more fundamental. For this reason and the above observation, we think that the study of proximities, distances and domain-based classifiers is of great interest. This is further encouraged by the fact that such representations offer a bridge between the possibilities of learning in vector spaces and the structural descriptions of objects that preserve the relations inherent in their structure. We think that the uses of proximities for representation, generalization and evaluation constitute the most intriguing issues in pattern recognition.
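To make this bridging claim concrete, the following minimal sketch embeds structural objects (strings, compared by a relational dissimilarity) as vectors of distances to a few prototypes, after which an ordinary vector-space rule can generalize. The data, the string dissimilarity, the prototype choice and the nearest-neighbour rule are illustrative assumptions, not a method taken from the text.

```python
# A minimal sketch of the dissimilarity-representation idea: structural
# objects (here, strings) are embedded as vectors of dissimilarities to a
# set of prototypes, after which any standard vector-space classifier
# applies. All data and choices below are illustrative assumptions.

from difflib import SequenceMatcher

def edit_dissimilarity(a: str, b: str) -> float:
    """A simple string dissimilarity in [0, 1] (1 = maximally different)."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

# Labelled structural objects: two artificial "classes" of strings.
objects = ["aabba", "ababa", "aabaa", "bbabb", "babbb", "bbbab"]
labels  = [0, 0, 0, 1, 1, 1]

# Prototypes chosen from the data; each object becomes a vector of
# dissimilarities to the prototypes, i.e. a point in a vector space.
prototypes = ["ababa", "babbb"]
embed = lambda s: [edit_dissimilarity(s, p) for p in prototypes]
X = [embed(s) for s in objects]

# A 1-nearest-neighbour rule on the embedded vectors keeps the example
# dependency-free; any vector-space classifier could be used instead.
def classify(s: str) -> int:
    v = embed(s)
    dists = [sum((vi - xi) ** 2 for vi, xi in zip(v, x)) for x in X]
    return labels[min(range(len(X)), key=dists.__getitem__)]

print(classify("aabab"))  # expected: class 0 (close to the 'a'-heavy strings)
```

The point of the sketch is only that relational comparisons between objects, rather than measured features, define the space in which generalization takes place.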
The existing gap between structural and statistical pattern recognition partially coincides with the gap between knowledge and observations. Prior knowledge and observations are both needed, in a subtle interplay, to gain new knowledge. The existing knowledge is needed to guide the deduction process and to generate the models and possible hypotheses needed by induction, transduction and abduction. But, above all, it is needed to select relevant examples and a proper representation. If and only if the prior knowledge is made sufficiently explicit to set this environment can new observations be processed to gain new knowledge. If this is not properly done, some results may be obtained in purely statistical terms, but these cannot be integrated with what was already known and thereby have to stay in the domain of observations. The study of automatic pattern recognition systems makes it perfectly clear that learning is possible only if the Platonic and Aristotelian scientific approaches cooperate closely. This is what we aim for.