Emotion Recognition in Context

Abstract

Understanding what a person is experiencing from her frame of reference is essential in our everyday life. For this reason, one can think that machines with this type of ability would interact better with people. However, there are no current systems capable of understanding people's emotional states in detail. Previous research on computer vision to recognize emotions has mainly focused on analyzing the facial expression, usually classifying it into the 6 basic emotions [11]. However, context plays an important role in emotion perception, and when the context is incorporated, we can infer more emotional states. In this paper we present the "Emotions in Context Database" (EMOTIC), a dataset of images containing people in context in non-controlled environments. In these images, people are annotated with 26 emotional categories and also with the continuous dimensions valence, arousal, and dominance [21]. With the EMOTIC dataset, we trained a Convolutional Neural Network model that jointly analyzes the person and the whole scene to recognize rich information about emotional states. With this, we show the importance of considering the context for recognizing people's emotions in images, and provide a benchmark for the task of emotion recognition in visual context.

1. Introduction

Understanding how people feel plays a crucial role in social interaction. This capacity is necessary to perceive, anticipate, and respond with care to people's reactions. We are remarkably skilled at this task, and we regularly make guesses about people's emotions in our everyday life. In particular, when we observe someone, we can estimate a lot of information about that person's emotional state, even without any additional knowledge about them. As an example, take a look at the images of Figure 1. Let us put ourselves in these people's situations and try to estimate what they feel. In Figure 1.a we can recognize an emotional state of anticipation, since this person is constantly looking at the road to correctly adapt his trajectory. We can also recognize that this person feels excitement and that he is engaged or absorbed in the activity he is performing. We can also say that the overall emotion he is feeling is positive, that he is active, and that he seems confident with the activity he is performing, so he is in control of the situation. Similar detailed estimations can be made about the people marked with a red rectangle in the other images of Figure 1.

Recognizing people's emotional states from images is an active area of research in the computer vision community. Section 2 describes some of the recent works on this topic. Overall, in the last years we have observed impressive progress in recognizing the 6 basic emotions (anger, disgust, fear, happiness, sadness, and surprise) from facial expression. Some interesting efforts have also been made in understanding body language and in using body-pose features to recognize some specific emotional states. However, in this previous research on emotion recognition, the context of the subject is usually ignored.

Some works in psychology show evidence of the importance of context in the perception of emotions [3]. In most cases, when we analyze a wider view instead of focusing on the person, we can recognize additional affective information that cannot be recognized if the context is not taken into account.
For example, in Figure 1.c, we can see that the boy feels annoyance because he has to eat an apple while the girl next to him has chocolate, something he feels yearning (strong desire) for. The presence of the girl, the apple, and the chocolate are necessary cues to understand well what he indicates with his facial expression.

In fact, if we consider the context, we can make reasonable guesses about emotional states even when the face of the person is not visible, as illustrated in Figures 1.b and 1.d. The person in the red rectangle of Figure 1.b is picking a doughnut, and he probably feels yearning to eat it. He is participating in a social event with his colleagues, showing engagement. He is feeling pleasure eating the doughnuts and happiness for the relaxed break along with other people. In Figure 1.d, the person is admiring the beautiful landscape with esteem. She seems to be enjoying the moment (happiness), and she seems calm and relaxed (peace). We do not know exactly what is on these people's minds, but we are able to reasonably extract relevant affective information just by looking at them in their situations.

This paper addresses the problem of recognizing emotional states of people in context. The first contribution of our work is the Emotions in Context Database (EMOTIC), which is described in Section 3. The EMOTIC database is a collection of images of people in their context, annotated according to the emotional states that an observer can estimate from the whole situation. Specifically, images are annotated with two complementary systems: (1) an extended list of 26 affective categories that we collected, and (2) the common continuous dimensions Valence, Arousal, and Dominance [21]. The combination of these two emotion representation approaches produces a detailed model that gets closer to the richness of emotional states that humans can perceive [2].

Using the EMOTIC database, we test a Convolutional Neural Network (CNN) model for recognizing emotions in context. Section 4 describes the model, while Section 5 presents our experiments. From our results, we draw two interesting conclusions. First, we see that the context contributes relevant information for recognizing emotional states. Second, we observe that combining categories and continuous dimensions during training results in a more robust system for recognizing emotional states.
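Although the model itself is detailed in Section 4, a rough sketch can make the idea concrete: one CNN branch encodes the person crop, another encodes the whole image as context, and the fused features jointly predict the 26 discrete categories and the three continuous dimensions. The following minimal sketch is written in PyTorch; the backbone choice (ResNet-18), the layer sizes, and the loss weighting are our own illustrative assumptions, not the architecture used in the paper.

    # A minimal sketch (assumed PyTorch, hypothetical sizes) of a two-branch
    # model: one branch encodes the person crop, the other the whole scene;
    # fused features feed two heads, one for the 26 emotion categories
    # (multi-label) and one for Valence/Arousal/Dominance regression.
    import torch
    import torch.nn as nn
    from torchvision import models

    class TwoBranchEmotionNet(nn.Module):
        def __init__(self, num_categories=26, num_dims=3):
            super().__init__()
            # Backbones are an assumption, not the paper's exact networks.
            self.person_branch = models.resnet18(weights=None)
            self.scene_branch = models.resnet18(weights=None)
            self.person_branch.fc = nn.Identity()  # expose 512-d features
            self.scene_branch.fc = nn.Identity()
            self.fusion = nn.Sequential(nn.Linear(2 * 512, 256), nn.ReLU())
            self.category_head = nn.Linear(256, num_categories)  # logits
            self.dimension_head = nn.Linear(256, num_dims)       # V, A, D

        def forward(self, person_crop, whole_image):
            feats = torch.cat([self.person_branch(person_crop),
                               self.scene_branch(whole_image)], dim=1)
            feats = self.fusion(feats)
            return self.category_head(feats), self.dimension_head(feats)

    # Joint training: a multi-label classification loss on the categories
    # combined with a regression loss on the dimensions (weight w assumed).
    def joint_loss(cat_logits, cat_targets, dim_preds, dim_targets, w=1.0):
        cat_loss = nn.functional.binary_cross_entropy_with_logits(
            cat_logits, cat_targets)
        dim_loss = nn.functional.mse_loss(dim_preds, dim_targets)
        return cat_loss + w * dim_loss

In this sketch, training both heads against a single combined loss is what "combining categories and continuous dimensions during training" amounts to.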
2. Related work

Most of the research in computer vision to recognize emotional states in people focuses on facial expression analysis (e.g., [4, 13]). We find a large variety of methods developed to recognize the 6 basic emotions defined by the psychologists Ekman and Friesen [11]. Some of these methods are based on the Facial Action Coding System [15, 29]. This system uses a set of specific localized movements of the face, called Action Units, to encode the facial expression. These Action Units can be recognized from geometric-based and/or appearance features extracted from face images [23, 19, 12]. Recent works on emotion recognition based on facial expression use CNNs to recognize the emotions and/or the Action Units [4].

Instead of recognizing emotion categories, some recent works on facial expression [28] use the continuous dimensions of the VAD Emotional State Model [21] to represent emotions. The VAD model describes emotions using 3 numerical dimensions: Valence (V), which measures how positive or pleasant an emotion is, ranging from negative to positive; Arousal (A), which measures the agitation level of the person, ranging from non-active / calm to agitated / ready to act; and Dominance (D), which measures the level of control the person has over the situation, ranging from submissive / non-control to dominant / in-control. (An illustrative encoding that combines these dimensions with categorical annotations is sketched at the end of this section.) On the other hand, Du et al. [10] proposed a set of 21 facial emotion categories, defined as different combinations of the basic emotions, like 'happily surprised' or 'happily disgusted'. This categorization gives more detail about the expressed emotion.

Although most of the works on recognizing emotions are focused on face analysis, there are a few works in computer vision that address emotion recognition using visual cues other than the face. For instance, some works [22] consider the location of the shoulders as additional information to the face features to recognize basic emotions. More generally, Schindler et al. [27] used the body pose to recognize the 6 basic emotions, performing experiments on a small dataset of non-spontaneous poses acquired under controlled conditions.

In recent years we have also observed a significant emergence of affective datasets for recognizing people's emotions. The studies [17, 18] establish the relationship between affect and body posture, using as ground truth the base-rate of human observers. The data consist of a spontaneous subset acquired under a restrictive setting (people playing Wii games). In the EMOTIW challenge [7], the AFEW database [8] focuses on emotion recognition in video frames taken from movies and TV shows, while the HAPPEI database [9] addresses the problem of group-level emotion estimation. This work represents a first attempt to use context for the problem of predicting happiness in groups of people. Finally, the MSCOCO dataset has been recently annotated with object attributes [24], including some feelings.
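To make the two annotation systems concrete, the sketch below encodes a single person's annotation as a multi-hot vector over the affective categories together with the three continuous dimensions. The category names are those mentioned in the introduction; the abbreviated list and the 1-10 numeric scale are illustrative assumptions (the full EMOTIC taxonomy has 26 categories).

    # A minimal sketch of an EMOTIC-style annotation: a subset of the 26
    # affective categories plus the three VAD dimensions. The abbreviated
    # category list and the numeric scale are assumptions for illustration.
    from dataclasses import dataclass
    from typing import List

    CATEGORIES = ["Anticipation", "Engagement", "Excitement", "Happiness",
                  "Peace", "Pleasure", "Yearning", "Annoyance", "Esteem"]
    # ... the full taxonomy has 26 categories; only a few are listed here.

    @dataclass
    class EmotionAnnotation:
        categories: List[str]  # categories perceived for this person
        valence: float         # negative (low) to positive (high)
        arousal: float         # calm (low) to ready to act (high)
        dominance: float       # submissive (low) to in control (high)

        def category_vector(self) -> List[int]:
            # Multi-hot encoding over the (abbreviated) category list.
            return [1 if c in self.categories else 0 for c in CATEGORIES]

    # E.g., the driver of Figure 1.a as described in the introduction,
    # on an assumed 1-10 scale for the continuous dimensions:
    driver = EmotionAnnotation(
        categories=["Anticipation", "Engagement", "Excitement"],
        valence=7.0, arousal=8.0, dominance=7.5)
    print(driver.category_vector())  # [1, 1, 1, 0, 0, 0, 0, 0, 0]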