Machine Learning & Hearing Loss
Machine learning (ML) has spread into many different fields and disciplines. Dipping your toes into a new field is the best way to grow and learn new things. The following is a summary of how researchers have applied machine learning to improve the lives of those who are deaf and hard of hearing.
Papers (In order)
All of these papers are accessible without any university sponsorship or payment.
- Why aren’t better assistive technologies available for those communicating using ASL?
- Grammatical Facial Expressions Recognition with Machine Learning
- A Machine Learning Approach to Fitting Prescription for Hearing Aids
- AudioVision: Sound Detection for the Deaf and Hard-of-hearing
Gloves that talk
This article by Keith Kirkpatrick introduces the problems deaf and hard of hearing communities face when talking with people who do not know sign language.
Robotics, NLP, ASL, Wearables
Interpreting issues
Hard of hearing individuals rely on interpreting services, either in person or online, to interact with the hearing world at the doctor's office, the courtroom, or the coffee shop. However, these services are not always available, and online interpreting is plagued by the problems of mobile internet: slow, inconsistent, or nonexistent.
A possible solution
One solution to the lack of interpreters is a glove that can translate American Sign Language (ASL) into English. With embedded motion sensors, the glove records the user's motions and translates them into the correct sign. Several different ML algorithms can be used to find the correct sign: K-means, Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).
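Since glove data is a time series of sensor readings, a recurrent model is a natural fit. Below is a minimal sketch of what such a classifier might look like in Keras; the window length, sensor count, and sign vocabulary size are illustrative assumptions, not details from the article.

```python
import numpy as np
import tensorflow as tf

# Illustrative assumptions: each gesture is a 50-step window of
# readings from 10 glove sensors, mapped to one of 26 signs.
TIMESTEPS, CHANNELS, NUM_SIGNS = 50, 10, 26

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(TIMESTEPS, CHANNELS)),
    tf.keras.layers.LSTM(64),  # summarizes the motion sequence
    tf.keras.layers.Dense(NUM_SIGNS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random placeholder data standing in for labeled glove recordings.
X = np.random.randn(200, TIMESTEPS, CHANNELS).astype("float32")
y = np.random.randint(0, NUM_SIGNS, size=200)
model.fit(X, y, epochs=3, batch_size=32)
```

An RNN (here an LSTM) handles the sequential nature of a gesture directly; a K-means or CNN approach would instead operate on fixed-size feature vectors or image-like representations of the motion.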
A long way to go
SignAloud and BrightSign are two companies highlighted in the article. BrightSign is recognized as superior to SignAloud because users can record their own versions of signs for better translation. However, both products fall short of real interpretation because they do not consider facial expressions. Facial expressions are a huge part of ASL, and much of the meaning is lost when they are ignored. This is why you see ASL interpreters taking off their masks while interpreting for officials.
Additionally, gloves like these have been received negatively in the deaf community, which would like to see more deaf or hard of hearing individuals involved in the development of these products, not just the testing.
Detecting Grammatical Facial Expressions
Researchers from Brazil developed an ML model to detect Grammatical Facial Expressions (GFEs) in Libras, the Brazilian sign language.
Computer Vision, MLP, Neural Networks
The problem
Facial expressions are a huge part of every signed language. The same signs with a different facial expression can change a sentence from a statement into a question, or negate it entirely. Unfortunately, most research focuses on detecting specific signs or predefined sentences.
This is a huge barrier to real-time automated translation.
Possible solution
In this paper, researchers and deaf individuals defined nine different GFEs for the model to recognize. They recorded deaf individuals using these identified GFEs in a variety of different sentences and used this data to train a Multilayer Perceptron (MLP).
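As a rough idea of what training such a model looks like, here is a minimal sketch of an MLP trained on flattened facial-landmark coordinates; the landmark count, labels, and random placeholder data are illustrative assumptions, not the paper's actual dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Illustrative assumption: each video frame is summarized as 100
# tracked facial landmarks with (x, y, z) coordinates, flattened
# into 300 features and labeled 1 when a given GFE is present.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 300))
y = rng.integers(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300, random_state=0)
mlp.fit(X_train, y_train)
print("held-out accuracy:", mlp.score(X_test, y_test))
```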
Conclusions
The results of their model were mixed. The model was able to recognize a few facial expressions very well, but it was unable to recognize 4 of the 9 GFEs. The authors made two observations.
- Temporal (time series) data is needed to recognize GFEs, favoring RNNs over CNNs.
- Not all GFEs are the same. Some require depth information, while for others it only adds noise.
This research is another stepping stone to achieving real-time translation from signed to spoken languages.
Hearing Aid Optimization
Researchers from Korea use National Acoustic Laboratories (NAL) data to create models that optimize the hearing aid fitting process.
Deep learning, ANN, Transfer learning
Fitting hearing aids
The process of getting a hearing aid is similar to getting glasses: an audiologist plays a series of sounds for the patient, asks how they sound, and then adjusts the hearing aids accordingly.
This process is aided by hearing aid fitting software from the NAL that can help with finding the right values. However, it is still an imperfect guessing game.
Having a better starting point means fewer adjustments and could lead to a better quality of life for hard of hearing individuals.
Inputs and Outputs
The patient's hearing loss information is used as the input to the Neural Network (NN), and the insertion gains for the hearing aids are the outputs.
Hearing aids are configured across six different frequency bands. The authors encode the hearing loss information (inputs) into 252 bits and use the insertion gains at the six frequency bands as the outputs.
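A minimal sketch of such a network as a plain feed-forward regression model follows; the hidden layer sizes and placeholder training data are illustrative assumptions, not the authors' architecture.

```python
import numpy as np
import tensorflow as tf

# 252-bit hearing-loss encoding in, six insertion gains out, as
# described above. Hidden layer sizes are illustrative assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(252,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(6),  # one predicted gain per frequency band
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Random placeholder data standing in for NAL fitting records.
X = np.random.randint(0, 2, size=(500, 252)).astype("float32")
y = np.random.uniform(0.0, 60.0, size=(500, 6)).astype("float32")  # dB gains
model.fit(X, y, epochs=3, batch_size=32)
```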
Results
The authors found that a neural network model was able to predict the correct gain level for each frequency to within 1% error.
Customized Sound Detection
Students from the University of Washington created an app that uses ML to identify sounds specific to user environments.
In this short paper, the students outline problems with the generic sound detection apps that exist today. Most of the time these apps are not tailored to the individual or the environment, which reduces the accuracy of sound detection.
To resolve this issue, the students built an app that allows users to record sounds unique to their environment and have custom models trained on those sounds.
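A minimal sketch of this personalization idea, assuming clips are summarized with MFCC features and matched against the user's own recordings with a nearest-neighbor classifier; the file names, labels, and feature choice are illustrative assumptions, not the students' actual pipeline.

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def clip_features(path):
    """Summarize a recorded clip as its mean MFCC vector."""
    audio, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

# Hypothetical user recordings: a few examples per household sound.
clips = [("doorbell_1.wav", "doorbell"), ("doorbell_2.wav", "doorbell"),
         ("kettle_1.wav", "kettle"), ("kettle_2.wav", "kettle")]
X = np.stack([clip_features(path) for path, _ in clips])
labels = [label for _, label in clips]

clf = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
print(clf.predict([clip_features("unknown_sound.wav")]))  # e.g. "doorbell"
```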
The result was a sound detection system personalized to each deaf and hard-of-hearing person.
Machine Learning & Everything else
Translated from: https://towardsdatascience.com/machine-learning-hearing-loss-d60dab084e3f