Our emotions influence every aspect of our lives -- how we learn, how we communicate, how we make decisions. Yet they’re absent from our digital lives; the devices and apps we interact with have no way of knowing how we feel. Scientist Rana el Kaliouby aims to change that. She demos a powerful new technology that reads your facial expressions and matches them to corresponding emotions. This “emotion engine” has big implications, she says, and could change not just how we interact with machines -- but with each other.
This talk was presented at an official TED conference, and was featured by our editors on the home page.
00:12
Our emotions influence every aspect of our lives, from our health and how we learn, to how we do business and make decisions, big ones and small. Our emotions also influence how we connect with one another. We've evolved to live in a world like this, but instead, we're living more and more of our lives like this -- this is the text message from my daughter last night -- in a world that's devoid of emotion. So I'm on a mission to change that. I want to bring emotions back into our digital experiences.
00:48
I started on this path 15 years ago. I was a computer scientist in Egypt, and I had just gotten accepted to a Ph.D. program at Cambridge University. So I did something quite unusual for a young newlywed Muslim Egyptian wife: With the support of my husband, who had to stay in Egypt, I packed my bags and I moved to England. At Cambridge, thousands of miles away from home, I realized I was spending more hours with my laptop than I did with any other human. Yet despite this intimacy, my laptop had absolutely no idea how I was feeling. It had no idea if I was happy, having a bad day, or stressed, confused, and so that got frustrating. Even worse, as I communicated online with my family back home, I felt that all my emotions disappeared in cyberspace. I was homesick, I was lonely, and on some days I was actually crying, but all I had to communicate these emotions was this. (Laughter) Today's technology has lots of I.Q., but no E.Q.; lots of cognitive intelligence, but no emotional intelligence. So that got me thinking, what if our technology could sense our emotions? What if our devices could sense how we felt and reacted accordingly, just the way an emotionally intelligent friend would? Those questions led me and my team to create technologies that can read and respond to our emotions, and our starting point was the human face.
02:30
So our human face happens to be one of the most powerful channels that we all use to communicate social and emotional states, everything from enjoyment and surprise to empathy and curiosity. In emotion science, we call each facial muscle movement an action unit. So for example, action unit 12, it's not a Hollywood blockbuster, it is actually a lip corner pull, which is the main component of a smile. Try it everybody. Let's get some smiles going on. Another example is action unit 4. It's the brow furrow. It's when you draw your eyebrows together and you create all these textures and wrinkles. We don't like them, but it's a strong indicator of a negative emotion. So we have about 45 of these action units, and they combine to express hundreds of emotions.
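To make the action-unit idea concrete, here is a minimal Python sketch of how one video frame might be represented as a set of action-unit intensities. Only the two units named in the talk (4 and 12) are used; the `ActionUnit` class, the `active_units` helper, the 0-to-1 intensity scale and the 0.5 threshold are all assumptions made for illustration, not the speaker's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionUnit:
    """One facial muscle movement, identified by its action unit number."""
    number: int
    name: str


# The two action units named in the talk.
AU4 = ActionUnit(4, "brow furrow")
AU12 = ActionUnit(12, "lip corner pull")


def active_units(intensities, threshold=0.5):
    """Return the action unit numbers whose intensity meets the threshold.

    `intensities` maps an action unit number to a score in [0, 1].
    Both the scale and the threshold are hypothetical.
    """
    return sorted(n for n, value in intensities.items() if value >= threshold)


# A hypothetical frame: a clear smile with almost no brow furrow.
frame = {AU4.number: 0.1, AU12.number: 0.9}
print(active_units(frame))  # [12]
```

A real system would track many more units per frame, and, as the talk notes, the difficulty lies in estimating these intensities from pixels rather than in storing them.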
03:18
Teaching a computer to read these facial emotions is hard, because these action units, they can be fast, they're subtle, and they combine in many different ways. So take, for example, the smile and the smirk. They look somewhat similar, but they mean very different things. (Laughter) So the smile is positive, a smirk is often negative. Sometimes a smirk can make you become famous. But seriously, it's important for a computer to be able to tell the difference between the two expressions.
03:50
So how do we do that? We give our algorithms tens of thousands of examples of people we know to be smiling, from different ethnicities, ages, genders, and we do the same for smirks. And then, using deep learning, the algorithm looks for all these textures and wrinkles and shape changes on our face, and basically learns that all smiles have common characteristics, all smirks have subtly different characteristics. And the next time it sees a new face, it essentially recognizes that this face has the same characteristics as a smile, and it says, "Aha, I recognize this. This is a smile expression."
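The learn-from-labeled-examples loop described above can be sketched in a few lines of Python. This is not the deep learning system from the talk: it swaps in a plain logistic regression trained by gradient descent, and it assumes each face has already been reduced to two hypothetical features, how far the left and the right lip corner are pulled (0 to 1). The training data, feature choice and hyperparameters are all invented for illustration.

```python
import math

# Hypothetical labeled examples: (left lip-corner pull, right lip-corner pull).
# Label 1 = smile (both corners up), label 0 = smirk (only one corner up).
EXAMPLES = [
    ((0.9, 0.9), 1), ((0.8, 0.85), 1), ((0.7, 0.75), 1), ((1.0, 0.9), 1),
    ((0.9, 0.1), 0), ((0.1, 0.8), 0), ((0.8, 0.2), 0), ((0.2, 0.9), 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=5000, lr=0.5):
    """Fit logistic regression weights with batch gradient descent."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for (x1, x2), label in examples:
            # Prediction error for this example under the current weights.
            err = sigmoid(w[0] * x1 + w[1] * x2 + b) - label
            grad_w[0] += err * x1
            grad_w[1] += err * x2
            grad_b += err
        w[0] -= lr * grad_w[0] / n
        w[1] -= lr * grad_w[1] / n
        b -= lr * grad_b / n
    return w, b


def predict(model, face):
    """Label a new, unseen face as 'smile' or 'smirk'."""
    w, b = model
    probability = sigmoid(w[0] * face[0] + w[1] * face[1] + b)
    return "smile" if probability >= 0.5 else "smirk"


model = train(EXAMPLES)
print(predict(model, (0.95, 0.85)))  # a new symmetric expression
print(predict(model, (0.85, 0.05)))  # a new one-sided expression
```

The point of the sketch is the workflow: labeled examples go in, the model finds what the smiles have in common, and a new face is classified by those learned characteristics. A deep network does the same thing directly from pixels instead of from two hand-picked numbers.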
04:30
So the best way to demonstrate how this technology works is to try a live demo, so I need a volunteer, preferably somebody with a face. (Laughter) Cloe's going to be our volunteer today.
04:45
So over the past five years, we've moved from being a research project at MIT to a company, where my team has worked really hard to make this technology work, as we like to say, in the wild. And we've also shrunk it so that the core emotion engine works on any mobile device with a camera, like this iPad. So let's give this a try.
05:06
As you can see, the algorithm has essentially found Cloe's face, so it's this white bounding box, and it's tracking the main feature points on her face, so her eyebrows, her eyes, her mouth and her nose. The question is, can it recognize her expression? So we're going to test the machine. So first of all, give me your poker face. Yep, awesome. (Laughter) And then as she smiles, this is a genuine smile, it's great. So you can see the green bar go up as she smiles. Now that was a big smile. Can you try a subtle smile to see if the computer can recognize? It does recognize subtle smiles as well. We've worked really hard to make that happen. And then eyebrow raised, indicator of surprise. Brow furrow, which is an indicator of confusion. Frown. Yes, perfect. So these are all the different action units. There's many more of them. This is just a slimmed-down demo. But we call each reading an emotion data point, and then they can fire together to portray different emotions. So on the right side of the demo -- look like you're happy. So that's joy. Joy fires up. And then give me a disgust face. Try to remember what it was like when Zayn left One Direction. (Laughter) Yeah, wrinkle your nose. Awesome. And the valence is actually quite negative, so you must have been a big fan. So valence is how positive or negative an experience is, and engagement is how expressive she is as well. So imagine if Cloe had access to this real-time emotion stream, and she could share it with anybody she wanted to. Thank you. (Applause)
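The demo shows individual expression readings combining into higher-level measures such as valence and engagement. The Python sketch below shows one way such a combination could work. The talk does not give the formulas, so the expression names, the 0-to-1 scale, and both combining rules are assumptions chosen only to mirror the behavior seen in the demo (a smile pushes valence up, a nose wrinkle or brow furrow pushes it down).

```python
def valence(readings):
    """Hypothetical valence in [-1, 1]: positive signal minus negative signal.

    `readings` maps an expression name to an intensity in [0, 1].
    """
    positive = readings.get("smile", 0.0)
    negative = max(
        readings.get("brow_furrow", 0.0),
        readings.get("nose_wrinkle", 0.0),
    )
    return positive - negative


def engagement(readings):
    """Hypothetical engagement: the strongest expression currently showing."""
    return max(readings.values(), default=0.0)


# Hypothetical readings for two moments like those in the demo.
happy_face = {"smile": 0.9, "brow_furrow": 0.0, "nose_wrinkle": 0.0}
disgust_face = {"smile": 0.1, "brow_furrow": 0.3, "nose_wrinkle": 0.8}

print(valence(happy_face) > 0)    # True: the experience reads as positive
print(valence(disgust_face) < 0)  # True: the experience reads as negative
print(engagement(disgust_face))   # 0.8: she is being quite expressive
```

Keeping valence and engagement separate matters: a strongly negative face and a strongly positive face are both highly engaged, while a poker face scores low on both.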
06:45
So, so far, we have amassed 12 billion of these emotion data points. It's the largest emotion database in the world. We've collected it from 2.9 million face videos, people who have agreed to share their emotions with us, and from 75 countries around the world. It's growing every day. It blows my mind away that we can now quantify something as personal as our emotions, and we can do it at this scale.
07:11
So what have we learned to date? Gender. Our data confirms something that you might suspect. Women are more expressive than men. Not only do they smile more, their smiles last longer, and we can now really quantify what it is that men and women respond to differently. Let's do culture: So in the United States, women are 40 percent more expressive than men, but curiously, we don't see any difference in the U.K. between men and women. (Laughter) Age: People who are 50 years and older are 25 percent more emotive than younger people. Women in their 20s smile a lot more than men the same age, perhaps a necessity for dating. But perhaps what surprised us the most about this data is that we happen to be expressive all the time, even when we are sitting in front of our devices alone, and it's not just when we're watching cat videos on Facebook. We are expressive when we're emailing, texting, shopping online, or even doing our taxes.
08:17
Where is this data used today? In understanding how we engage with media, so understanding virality and voting behavior; and also empowering or emotion-enabling technology, and I want to share some examples that are especially close to my heart. Emotion-enabled wearable glasses can help individuals who are visually impaired read the faces of others, and they can help individuals on the autism spectrum interpret emotion, something that they really struggle with. In education, imagine if your learning apps sensed that you're confused and slowed down, or that you're bored, so they sped up, just like a great teacher would in a classroom. What if your wristwatch tracked your mood, or your car sensed that you're tired, or perhaps your fridge knows that you're stressed, so it auto-locks to prevent you from binge eating. (Laughter) I would like that, yeah. What if, when I was in Cambridge, I had access to my real-time emotion stream, and I could share that with my family back home in a very natural way, just like I would've if we were all in the same room together?
09:27
I think five years down the line, all our devices are going to have an emotion chip, and we won't remember what it was like when we couldn't just frown at our device and our device would say, "Hmm, you didn't like that, did you?" Our biggest challenge is that there are so many applications of this technology, my team and I realize that we can't build them all ourselves, so we've made this technology available so that other developers can get building and get creative. We recognize that there are potential risks and potential for abuse, but personally, having spent many years doing this, I believe that the benefits to humanity from having emotionally intelligent technology far outweigh the potential for misuse. And I invite you all to be part of the conversation. The more people who know about this technology, the more we can all have a voice in how it's being used. So as more and more of our lives become digital, we are fighting a losing battle trying to curb our usage of devices in order to reclaim our emotions. So what I'm trying to do instead is to bring emotions into our technology and make our technologies more responsive. So I want those devices that have separated us to bring us back together. And by humanizing technology, we have this golden opportunity to reimagine how we connect with machines, and therefore, how we, as human beings, connect with one another.
10:57
Thank you.
11:00
(Applause)