What is zero-shot learning?

The following is quoted from Ian Goodfellow (the man who proposed GANs, no less) on Quora:

Zero-shot learning is being able to solve a task despite not having received any training examples of that task. For a concrete example, imagine recognizing a category of object in photos without ever having seen a photo of that kind of object before. If you’ve read a very detailed description of a cat, you might be able to tell what a cat is in a photograph the first time you see it. You can learn more about zero-shot learning in Sec. 15.2 of the Deep Learning textbook: http://www.deeplearningbook.org/contents/representation.html


Below is the detailed explanation of one-shot and zero-shot learning from the representation-learning chapter at the link above:

Two extreme forms of transfer learning are one-shot learning and zero-shot learning, sometimes also called zero-data learning. Only one labeled example of the transfer task is given for one-shot learning, while no labeled examples are given at all for the zero-shot learning task.

One-shot learning (Fei-Fei et al., 2006) is possible because the representation learns to cleanly separate the underlying classes during the first stage. During the transfer learning stage, only one labeled example is needed to infer the label of many possible test examples that all cluster around the same point in representation space. This works to the extent that the factors of variation corresponding to these invariances have been cleanly separated from the other factors, in the learned representation space, and we have somehow learned which factors do and do not matter when discriminating objects of certain categories.

As an example of a zero-shot learning setting, consider the problem of having a learner read a large collection of text and then solve object recognition problems. It may be possible to recognize a specific object class even without having seen an image of that object, if the text describes the object well enough. For example, having read that a cat has four legs and pointy ears, the learner might be able to guess that an image is a cat, without having seen a cat before.

Zero-data learning (Larochelle et al., 2008) and zero-shot learning (Palatucci et al., 2009; Socher et al., 2013b) are only possible because additional information has been exploited during training. We can think of the zero-data learning scenario as including three random variables: the traditional inputs x, the traditional outputs or targets y, and an additional random variable describing the task, T. The model is trained to estimate the conditional distribution p(y | x, T), where T is a description of the task we wish the model to perform. In our example of recognizing cats after having read about cats, the output is a binary variable y with y = 1 indicating “yes” and y = 0 indicating “no.” The task variable T then represents questions to be answered such as “Is there a cat in this image?” If we have a training set containing unsupervised examples of objects that live in the same space as T, we may be able to infer the meaning of unseen instances of T. In our example of recognizing cats without having seen an image of the cat, it is important that we have had unlabeled text data containing sentences such as “cats have four legs” or “cats have pointy ears.”

Zero-shot learning requires T to be represented in a way that allows some sort of generalization. For example, T cannot be just a one-hot code indicating an object category. Socher et al. (2013b) provide instead a distributed representation of object categories by using a learned word embedding for the word associated with each category.

A similar phenomenon happens in machine translation (Klementiev et al., 2012; Mikolov et al., 2013b; Gouws et al., 2014): we have words in one language, and the relationships between words can be learned from unilingual corpora; on the other hand, we have translated sentences which relate words in one language with words in the other. Even though we may not have labeled examples translating word A in language X to word B in language Y, we can generalize and guess a translation for word A because we have learned a distributed representation for words in language X, a distributed representation for words in language Y, and created a link (possibly two-way) relating the two spaces, via training examples consisting of matched pairs of sentences in both languages. This transfer will be most successful if all three ingredients (the two representations and the relations between them) are learned jointly.

Zero-shot learning is a particular form of transfer learning. The same principle explains how one can perform multi-modal learning, capturing a representation in one modality, a representation in the other, and the relationship (in general a joint distribution) between pairs (x, y) consisting of one observation x in one modality and another observation y in the other modality (Srivastava and Salakhutdinov, 2012). By learning all three sets of parameters (from x to its representation, from y to its representation, and the relationship between the two representations), concepts in one representation are anchored in the other, and vice-versa, allowing one to meaningfully generalize to new pairs. The procedure is illustrated in figure 15.3.
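To make the distributed-representation idea concrete, here is a minimal NumPy sketch of a Socher-style zero-shot classifier. Everything in it (the embedding dimension, the random `word_embeddings` table, the fixed linear encoder `f_x`) is a hypothetical stand-in rather than the models from the paper: a real system would learn both the image encoder and the word embeddings from data.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, image_dim = 50, 3072  # e.g. a 32x32x3 image, flattened

# Stand-in word embeddings f_y for class names, including one class
# ("cat") for which we have no labeled images at all.
word_embeddings = {
    "dog": rng.normal(size=embed_dim),
    "car": rng.normal(size=embed_dim),
    "cat": rng.normal(size=embed_dim),  # the zero-shot class
}

# Stand-in image encoder f_x: a fixed linear map into embedding space.
# A real system would learn this map from labeled (image, word) pairs.
W = rng.normal(size=(embed_dim, image_dim)) / np.sqrt(image_dim)

def f_x(image):
    return W @ np.asarray(image).ravel()

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def zero_shot_classify(image):
    """Pick the class whose word embedding is most similar to the image
    representation; this works even for classes never seen as images."""
    h = f_x(image)
    return max(word_embeddings, key=lambda c: cosine(h, word_embeddings[c]))

# Demo: synthesize an image whose representation lands near the "cat"
# embedding, then classify it without any labeled cat images.
image = np.linalg.pinv(W) @ word_embeddings["cat"]
print(zero_shot_classify(image.reshape(32, 32, 3)))  # -> "cat"
```

The point is only the mechanism: because the label lives in a shared embedding space rather than a one-hot code, “cat” can be predicted without a single labeled cat image.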


Figure 15.3: Transfer learning between two domains x and y enables zero-shot learning. Labeled or unlabeled examples of x allow one to learn a representation function f_x, and similarly with examples of y to learn f_y. Each application of the f_x and f_y functions appears as an upward arrow, with the style of the arrows indicating which function is applied. Distance in h_x space provides a similarity metric between any pair of points in x space that may be more meaningful than distance in x space. Likewise, distance in h_y space provides a similarity metric between any pair of points in y space. Both of these similarity functions are indicated with dotted bidirectional arrows. Labeled examples (dashed horizontal lines) are pairs (x, y) which allow one to learn a one-way or two-way map (solid bidirectional arrow) between the representations f_x(x) and the representations f_y(y) and anchor these representations to each other. Zero-data learning is then enabled as follows. One can associate an image x_test to a word y_test, even if no image of that word was ever presented, simply because word representations f_y(y_test) and image representations f_x(x_test) can be related to each other via the maps between representation spaces. It works because, although that image and that word were never paired, their respective feature vectors f_x(x_test) and f_y(y_test) have been related to each other. Figure inspired from a suggestion by Hrant Khachatrian.
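As a rough illustration of the solid bidirectional arrow in the caption, the sketch below fits a linear map between the two representation spaces from matched pairs, then uses it to relate a never-paired test image to candidate words. All names and the simulated data (`M_true`, `Hx`, `Hy`) are hypothetical placeholders that only mirror the structure the figure describes, not any real dataset or model.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pairs, dx, dy = 200, 64, 50  # number of matched pairs; representation sizes

# Pretend representations f_x(x) and f_y(y) for the *paired* training
# examples (the dashed horizontal lines in the figure).
Hx = rng.normal(size=(n_pairs, dx))   # image-side representations h_x
M_true = rng.normal(size=(dx, dy))    # hidden relation, used only to simulate data
Hy = Hx @ M_true + 0.01 * rng.normal(size=(n_pairs, dy))  # word-side reps h_y

# Learn the map between representation spaces by least squares
# (the solid bidirectional arrow; one direction shown for brevity).
M, *_ = np.linalg.lstsq(Hx, Hy, rcond=None)

# Zero-data step: relate a never-paired image representation to word
# representations of candidate labels and pick the nearest one.
hx_test = rng.normal(size=dx)
candidate_words = {
    "dog": rng.normal(size=dy),
    # Simulate the correct word: its representation relates to the test
    # image through the same underlying map, though the pair was never seen.
    "cat": hx_test @ M_true + 0.01 * rng.normal(size=dy),
}

projected = hx_test @ M
best = min(candidate_words,
           key=lambda w: np.linalg.norm(projected - candidate_words[w]))
print(best)  # -> "cat", although this image and this word were never paired
```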
