

This is a master's thesis in computer science from the Eberhard Karls University of Tübingen, Germany (author: Alina Kloss), 101 pages in total.



Detecting and identifying the different objects in an image quickly and reliably is an important skill for interacting with one's environment. The main problem is that, in theory, all parts of an image have to be searched for objects at many different scales to make sure that no object instance is missed. However, actually classifying the content of a given image region takes considerable time and effort, and both the time and the computational capacity that an agent can spend on classification are limited.

Humans use a process called visual attention to quickly decide which locations of an image need to be processed in detail and which can be ignored. This allows us to cope with the huge amount of visual information and to use the capacities of our visual system efficiently. Computer vision researchers face exactly the same problems, so learning from human behaviour offers a promising way to improve existing algorithms. In this master's thesis, a model is trained with eye-tracking data recorded from 15 participants who were asked to search images for objects from three different categories. It uses a deep convolutional neural network to extract features from the input image, which are then combined to form a saliency map. This map indicates which image regions are of interest when searching for the given target object and can thus be used to reduce the parts of the image that have to be processed in detail. The method is based on a recent publication by Kümmerer et al., but in contrast to the original method, which computes general, task-independent saliency, the presented model is designed to respond differently when searching for different target categories.
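To make the idea of a category-dependent saliency map more concrete, here is a minimal sketch in plain Python. It assumes (as an illustration, not as the thesis's actual implementation) that the network's output is a stack of feature maps, that each target category has its own hypothetical weight vector for linearly combining those maps, and that a softmax over all pixels turns the result into a probability map. Published saliency models of this kind typically add further steps such as smoothing or a centre bias; those are left out here. The names `saliency_map`, `top_locations`, `FEATURES`, `WEIGHTS`, `category_a` and `category_b` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical layout: features[channel][row][col], as produced by some CNN layer.
FeatureMaps = List[List[List[float]]]


def saliency_map(features: FeatureMaps, weights: List[float]) -> List[List[float]]:
    """Linearly combine feature channels with per-category weights,
    then apply a softmax over all pixels so the map sums to 1."""
    n_ch = len(features)
    h, w = len(features[0]), len(features[0][0])
    logits = [
        [sum(weights[c] * features[c][i][j] for c in range(n_ch)) for j in range(w)]
        for i in range(h)
    ]
    m = max(max(row) for row in logits)  # subtract max for numerical stability
    exps = [[math.exp(v - m) for v in row] for row in logits]
    z = sum(sum(row) for row in exps)
    return [[v / z for v in row] for row in exps]


def top_locations(sal: List[List[float]], k: int) -> List[Tuple[int, int]]:
    """Return the k (row, col) positions with the highest saliency,
    i.e. the regions that would be passed on for detailed classification."""
    cells = [(sal[i][j], (i, j)) for i in range(len(sal)) for j in range(len(sal[0]))]
    cells.sort(key=lambda t: t[0], reverse=True)
    return [pos for _, pos in cells[:k]]


# Toy 2x2 feature maps: channel 0 fires top-left, channel 1 fires bottom-right.
FEATURES: FeatureMaps = [
    [[1.0, 0.0], [0.0, 0.0]],
    [[0.0, 0.0], [0.0, 1.0]],
]

# Hypothetical per-category weights: the same features, combined differently per target.
WEIGHTS: Dict[str, List[float]] = {
    "category_a": [2.0, 0.0],
    "category_b": [0.0, 2.0],
}

if __name__ == "__main__":
    for cat, wts in WEIGHTS.items():
        print(cat, top_locations(saliency_map(FEATURES, wts), 1))
```

With identical input features, the two weight vectors point the search to different image locations, which is the task-dependent behaviour described above; a task-independent model would correspond to a single shared weight vector.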

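1 Introduction

2 Fundamentals and Review of Related Work

3 Methods

4 Tuning the Trained Network

5 Evaluation and Results

6 Discussion

7 Outlook on Future Work

8 Appendix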



