Synced with the WeChat official account (arXiv Daily Academic Digest)
【1】 Semi-supervised Neural Chord Estimation Based on a Variational Autoencoder with Discrete Labels and Continuous Textures of Chords
Authors: Yiming Wu, Kazuyoshi Yoshii
Link: https://arxiv.org/abs/2005.07091
【2】 FaceFilter: Audio-visual speech separation using still images
Authors: Soo-Whan Chung, Hong-Goo Kang
Link: https://arxiv.org/abs/2005.07074
【3】 Converting Anyone’s Emotion: Towards Speaker-Independent Emotional Voice Conversion
Authors: Kun Zhou, Haizhou Li
Comments: Submitted to Interspeech 2020
Link: https://arxiv.org/abs/2005.07025
【4】 You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
Authors: Aleksandr Laptev, Sergey Rybin
Comments: Submitted to Interspeech 2020
Link: https://arxiv.org/abs/2005.07157
【5】 ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification
Authors: Brecht Desplanques, Kris Demuynck
Comments: Submitted to INTERSPEECH 2020
Link: https://arxiv.org/abs/2005.07143
【6】 Vibration Analysis in Bearings for Failure Prevention using CNN
Authors: Luis A. Pinedo-Sanchez, Carlos A. Carballo-Monsivais
Link: https://arxiv.org/abs/2005.07057
【7】 Classification of Infant Crying in Real-World Home Environments Using Deep Learning
Authors: Xuewen Yao, Kaya de Barbaro
Link: https://arxiv.org/abs/2005.07036
【8】 DARTS-ASR: Differentiable Architecture Search for Multilingual Speech Recognition and Adaptation
Authors: Yi-Chen Chen, Hung-yi Lee
Link: https://arxiv.org/abs/2005.07029
【9】 Foreground-Background Ambient Sound Scene Separation
Authors: Michel Olvera (UL), Gilles Gasso (LITIS)
Comments: Submitted to EUSIPCO 2020
Link: https://arxiv.org/abs/2005.07006
【10】 deepSELF: An Open Source Deep Self End-to-End Learning Framework
Authors: Tomoya Koike, Yoshiharu Yamamoto
Link: https://arxiv.org/abs/2005.06993
【11】 The universality of skipping behaviours on music streaming platforms
Author: Jonathan Donier
Link: https://arxiv.org/abs/2005.06987
【12】 Consonant gemination in Italian: the nasal and liquid case
Authors: Maria-Gabriella Di Benedetto, Luca De Nardis
Link: https://arxiv.org/abs/2005.06960
【13】 Consonant gemination in Italian: the affricate and fricative case
Authors: Maria Gabriella Di Benedetto, Luca De Nardis
Link: https://arxiv.org/abs/2005.06959
【14】 Streaming keyword spotting on mobile devices
Authors: Oleg Rybakov, Stella Laurenzo
Link: https://arxiv.org/abs/2005.06720
【15】 Memory Controlled Sequential Self Attention for Sound Recognition
Authors: Arjun Pankajakshan, Emmanouil Benetos
Comments: Submitted to INTERSPEECH 2020
Link: https://arxiv.org/abs/2005.0665
【16】 Automatic Estimation of Inteligibility Measure for Consonants in Speech
Authors: Ali Abavisani, Mark Hasegawa-Johnson
Comments: 5 pages, 1 figure, 7 tables, submitted to Inter Speech 2020 Conference
Link: https://arxiv.org/abs/2005.06065
【17】 Generalized Multi-view Shared Subspace Learning using View Bootstrapping
Authors: Krishna Somandepalli, Shrikanth Narayanan
Link: https://arxiv.org/abs/2005.0603
【18】 Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Authors: Rafael Valle, Bryan Catanzaro
Link: https://arxiv.org/abs/2005.05957
【19】 The IOA System for Deep Noise Suppression Challenge using a Framework Combining Dynamic Attention and Recursive Learning
Authors: Andong Li, Xiaodong Li
Link: https://arxiv.org/abs/2005.05855
【20】 Creative Quantum Computing: Inverse FFT, Sound Synthesis, Adaptive Sequencing and Musical Composition
Author: Eduardo R. Miranda
Link: https://arxiv.org/abs/2005.05832
【21】 AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN
Authors: Zewang Zhang, Shan Liu
Comments: Submitted to InterSpeech 2020
Link: https://arxiv.org/abs/2005.05642
【22】 FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction
Authors: Qiao Tian, Shan Liu
Comments: Submitted to Interspeech 2020
Link: https://arxiv.org/abs/2005.05551
【23】 Discriminative Multi-modality Speech Recognition
Authors: Bo Xu, Jacob Wang
Comments: Accepted to CVPR 2020
Link: https://arxiv.org/abs/2005.05592
【24】 DiscreTalk: Text-to-Speech as a Machine Translation Problem
Authors: Tomoki Hayashi, Shinji Watanabe
Comments: Submitted to INTERSPEECH 2020. The demo is available on this https URL
Link: https://arxiv.org/abs/2005.05525
【25】 Exploring TTS without T Using Biologically/Psychologically Motivated Neural Network Modules (ZeroSpeech 2020)
Authors: Takashi Morita, Hiroki Koda
Comments: Submitted to INTERSPEECH 2020
Link: https://arxiv.org/abs/2005.05487
【26】 Audio and Contact Microphones for Cough Detection
Authors: Thomas Drugman, Thierry Dutoit
Comments: arXiv admin note: substantial text overlap with arXiv:2001.00537
Link: https://arxiv.org/abs/2005.0531
【27】 Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
Authors: Geng Yang, Lei Xie
Comments: Submitted to Interspeech 2020
Link: https://arxiv.org/abs/2005.05106
【28】 Online Monaural Speech Enhancement Using Delayed Subband LSTM
Authors: Xiaofei Li, Radu Horaud
Comments: Paper submitted to Interspeech 2020
Link: https://arxiv.org/abs/2005.05037
【29】 GACELA – A generative adversarial context encoder for long audio inpainting
Authors: Andres Marafioti, Nathanaël Perraudin
Link: https://arxiv.org/abs/2005.05032
【30】 Chirp Complex Cepstrum-based Decomposition for Asynchronous Glottal Analysis
Authors: Thomas Drugman, Thierry Dutoit
Link: https://arxiv.org/abs/2005.04724
【31】 Cognitive-driven convolutional beamforming using EEG-based auditory attention decoding
Authors: Ali Aroudi, Simon Doclo
Link: https://arxiv.org/abs/2005.04669
【32】 Dual-track Music Generation using Deep Learning
Authors: Sudi Lyu, Rong Song
Link: https://arxiv.org/abs/2005.04353
【33】 SpEx+: A Complete Time Domain Speaker Extraction Network
Authors: Meng Ge, Haizhou Li
Comments: Submitted to INTERSPEECH 2020
Link: https://arxiv.org/abs/2005.04686
【34】 From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint
Authors: Zexin Cai, Ming Li
Comments: Submitted to INTERSPEECH 2020
Link: https://arxiv.org/abs/2005.04587
【35】 Temporal-Framing Adaptive Network for Heart Sound Segmentation without Prior Knowledge of State Duration
Authors: Xingyao Wang, Gari D. Clifford
Link: https://arxiv.org/abs/2005.04426
【36】 Incremental Learning for End-to-End Automatic Speech Recognition
Authors: Li Fu, Libo Zi
Link: https://arxiv.org/abs/2005.0428
【37】 Asteroid: the PyTorch-based audio source separation toolkit for researchers
Authors: Manuel Pariente, Emmanuel Vincent
Comments: Submitted to Interspeech 2020
Link: https://arxiv.org/abs/2005.04132
【38】 Neural Spatio-Temporal Beamformer for Target Speech Separation
Authors: Yong Xu, Dong Yu
Comments: Submitted to Interspeech 2020
Link: https://arxiv.org/abs/2005.03889
【39】 Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention
Authors: Myunghun Jung, Hoirin Kim
Comments: Submitted to Interspeech 2020
Link: https://arxiv.org/abs/2005.03867
【40】 Distilling Knowledge from Pre-trained Language Models via Text Smoothing
Authors: Xing Wu, Dianhai Yu
Link: https://arxiv.org/abs/2005.0384