\quad Text matching is widely used in information retrieval, question answering, and dialogue systems; all of these tasks can be abstracted as matching a query against candidate docs. I picked up the relevant models and methods piecemeal at work, but I still feel it is worth going through them systematically.
\quad In the overall pipeline of a web search engine, text matching is the stage that scores the relevance between the query and each candidate doc, and it is arguably the core of the entire engine.
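To make that concrete, here is a minimal sketch of a recall-then-match pipeline; `recall_candidates` and `match_score` are hypothetical placeholders for the recall stage and the matching model, and the word-overlap score is only a stand-in for a real matcher.

```python
# Sketch of where text matching sits in a retrieval pipeline.
# recall_candidates() and match_score() are hypothetical placeholders.

def recall_candidates(query, index, top_n=1000):
    """Cheap recall stage: fetch candidate docs, e.g. from an inverted index."""
    return index.get(query, [])[:top_n]

def match_score(query, doc):
    """Matching model: score the relevance of a (query, doc) pair.
    A toy word-overlap score stands in for a learned matcher here."""
    q_terms, d_terms = set(query.split()), set(doc.split())
    return len(q_terms & d_terms) / (len(q_terms) or 1)

def search(query, index, top_k=10):
    candidates = recall_candidates(query, index)
    ranked = sorted(candidates, key=lambda d: match_score(query, d), reverse=True)
    return ranked[:top_k]
```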
\quad Traditional approaches rely mainly on hand-crafted features, so the focus of the problem is how to design a suitable learning algorithm that learns the best matching model on top of those features.
\quad Commonly used methods include BM25, TF-IDF, Partial Least Squares (PLS), Regularized Mapping to Latent Space (RMLS), Supervised Semantic Indexing (SSI), the Bilingual Topic Model (BLTM), and Statistical Machine Translation (SMT) models.
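As an illustration of the traditional, feature-based side, here is a minimal BM25 scorer with the usual default parameters (k1 = 1.5, b = 0.75); it is a sketch for intuition, not a production implementation.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Score one tokenized doc against a tokenized query with BM25."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)          # document frequency
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)   # smoothed IDF
        denom = tf[term] + k1 * (1 - b + b * len(doc_terms) / avgdl)
        score += idf * tf[term] * (k1 + 1) / denom
    return score

corpus = [["deep", "text", "matching"], ["web", "search", "engine"], ["bm25", "ranking"]]
print(bm25_score(["text", "matching"], corpus[0], corpus))
```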
\quad Compared with traditional machine-learning methods, deep-learning approaches bring improvements in four main respects. The representative deep matching models I went through are listed below (a minimal sketch of the representation-based, Siamese-style family follows the list):
Learning Deep Structured Semantic Models for Web Search using Clickthrough Data
A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval
Convolutional Neural Network Architectures for Matching Natural Language Sentences
Convolutional Neural Tensor Network Architecture for Community-Based Question Answering
Deep Sentence Embedding Using the Long Short Term Memory Network: Analysis and Application to Information Retrieval
A Deep Architecture for Semantic Matching with Multiple Positional Sentence Representations
Text Matching as Image Recognition
Match-SRNN: Modeling the Recursive Matching Structure with Spatial RNN
End-to-End Neural Ad-hoc Ranking with Kernel Pooling
Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search
A Deep Relevance Matching Model for Ad-hoc Retrieval
Siamese Recurrent Architectures for Learning Sentence Similarity
A Decomposable Attention Model for Natural Language Inference
Enhanced LSTM for Natural Language Inference
Learning to Match using Local and Distributed Representations of Text for Web Search
Bilateral Multi-Perspective Matching for Natural Language Sentences
Natural Language Inference over Interaction Space (DIIN)
Semantic Sentence Matching with Densely-connected Recurrent and Co-attentive Information
Simple and Effective Text Matching with Richer Alignment Features
Modeling Multi-turn Conversation with Deep Utterance Aggregation
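As a concrete example of the representation-based family above (DSSM/Siamese style, e.g. the first few papers in the list), here is a minimal PyTorch sketch that encodes the query and the doc separately and compares the two vectors with cosine similarity; the bag-of-words input and the layer sizes are illustrative assumptions, not the exact setup of any single paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseMatcher(nn.Module):
    """Representation-based matcher: encode each side independently,
    then compare the two representations with cosine similarity."""
    def __init__(self, vocab_size, hidden=256, dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(vocab_size, hidden), nn.Tanh(),
            nn.Linear(hidden, dim), nn.Tanh(),
        )

    def forward(self, query_bow, doc_bow):
        q = self.encoder(query_bow)   # (batch, dim)
        d = self.encoder(doc_bow)     # (batch, dim)
        return F.cosine_similarity(q, d, dim=-1)

# Toy usage: bag-of-words vectors over a 1000-word vocabulary.
model = SiameseMatcher(vocab_size=1000)
q = torch.zeros(2, 1000); q[:, [3, 7, 42]] = 1.0
d = torch.zeros(2, 1000); d[0, [3, 42, 99]] = 1.0; d[1, [500, 600]] = 1.0
print(model(q, d))   # one similarity score per (query, doc) pair in the batch
```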
Text matching as one of BERT's basic tasks: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
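BERT casts matching as sentence-pair classification: the query and the candidate are packed as `[CLS] query [SEP] doc [SEP]` and the pooled representation is classified. Below is a minimal sketch with the Hugging Face transformers library; note that the classification head of a freshly loaded bert-base-uncased is untrained and would need fine-tuning on labeled query-doc pairs before its scores mean anything.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Sentence-pair (cross-encoder) matching with BERT.
# NOTE: the classification head below is randomly initialized; it must be
# fine-tuned on matching data (e.g. query-doc relevance labels) before use.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

query = "what is text matching"
doc = "Text matching scores the relevance between a query and a candidate document."
inputs = tokenizer(query, doc, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    logits = model(**inputs).logits                   # shape (1, 2)
    relevance = torch.softmax(logits, dim=-1)[0, 1]   # probability of "relevant"
print(float(relevance))
```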
An approach to long-text matching: Simple Applications of BERT for Ad Hoc Document Retrieval
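Because BERT's input length is limited, a common way to handle long documents (roughly the idea explored in the paper above) is to split the document into sentences or passages, score each piece against the query, and aggregate the best scores. The sketch below assumes a `score_pair(query, passage)` function such as the cross-encoder above; the fixed-size word windows and the top-k averaging are illustrative choices, not the paper's exact recipe.

```python
def split_into_passages(doc, max_words=100):
    """Naively split a long document into fixed-size word windows."""
    words = doc.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def score_long_document(query, doc, score_pair, top_k=3):
    """Score each passage with a BERT-style pair scorer, then aggregate
    the top-k passage scores (one of several common aggregation choices)."""
    passages = split_into_passages(doc)
    scores = sorted((score_pair(query, p) for p in passages), reverse=True)
    top = scores[:top_k] or [0.0]
    return sum(top) / len(top)
```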