At Embibe (an AI platform for learning outcomes), we are leveraging modern NLP to solve problems like content ingestion, knowledge graph completion, smart meta tagging, question generation, question answering, concept summarisation, conversational assistants for students, vernacular academic translation, evaluation of descriptive answers, etc. Applying modern NLP to real-world applications demands interpretability, to make the system more transparent, explainable, and robust. Let’s look into the rise of modern NLP and the need for interpretability!
Modern NLP is at the forefront of computational linguistics, which is concerned with computational modelling of natural language.
Chomsky’s apprehension about the potential of computational linguistics during the 1950s, specifically about the theoretical foundation of its statistical models, was analogous to Einstein’s reaction to quantum physics: “God does not play dice”. These were pivotal moments when the world witnessed the rise of alternative theories. Even so, the foundation Chomsky laid for linguistic theory remains relevant and aids the progress, analysis, and understanding of computational linguistics.
“It’s true there’s been a lot of work on trying to apply statistical models to various linguistic problems. I think there have been some successes, but a lot of failures. There is a notion of success … which I think is novel in the history of science. It interprets success as approximating unanalyzed data.” — Noam Chomsky
In his view, this notion of success is not success at all. The lacuna may lie in the theoretical foundations, but empirically it can be thought of as “interpretability”, which accounts for the analysability, transparency, accountability, and explainability of these computational models.
The major advances in computational linguistics are attributed to three successive phases: statistical modeling, classical machine learning, and deep learning. Each phase is harder to interpret than the one before it.
Statistical modeling dealt with statistical analysis and inference from data, and it gained predictive power in the form of machine learning. There are three important aspects of solving problems using machine learning:
- Designing Input Features.
- Deriving Features Representation.
- Architecting Model Internals.
Classic ML techniques have always given a sense of control, as features were explicitly specified and mostly driven by human intuition. Feature representations, which used to be aggregative and statistical in nature (e.g. TF-IDF based vector representations), were also within the realm of interpretability. ML models like decision trees, logistic regression, support vector machines, and other parametric models were also easy to reason about. Extensions of these models became more complex with techniques like non-linear kernels, ensembles, and boosting, used to further improve performance. However, it was still possible to understand model internals.
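To make the point about interpretable representations concrete, here is a minimal sketch of a TF-IDF vector computed by hand. The function name `tf_idf`, the toy documents, and the specific weighting (length-normalised term frequency times `log(N / df)`) are assumptions chosen for illustration; real toolkits use several variants of this formula.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: length-normalised term frequency times log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Toy corpus, invented for this example.
docs = [
    "the cell divides by mitosis",
    "the force equals mass times acceleration",
    "the cell membrane protects the cell",
]
vectors = tf_idf(docs)
for term, weight in sorted(vectors[2].items(), key=lambda kv: -kv[1]):
    print(f"{term:10s} {weight:.3f}")
```

Every dimension of the vector is a word, and every weight can be traced back to two counts. A word such as "the", which occurs in every document, gets a weight of zero, which is easy for a human to check and explain.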
Continuous efforts to improve performance on classic NLP tasks like named entity recognition, sentiment analysis, and classification, together with the constant push to add increasingly complex tasks such as question answering, summarization, and machine translation, have attracted increasing attention from the research community.
The rise of modern NLP is attributed to the evolution of a simple model, the perceptron. With the advent of deep neural networks, the perceptron was not extended by just a second-order step, as with techniques like ensembles or boosting; the growth in model complexity was closer to exponential.
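For readers who have not seen one, a single perceptron can be sketched in a few lines. This is an illustrative version of the classic mistake-driven update rule, not code from the article; the function names, the learning rate, and the AND-gate data are assumptions.

```python
def train_perceptron(samples, epochs=10, lr=1.0):
    """Classic perceptron rule: weights change only when a prediction is wrong."""
    n_features = len(samples[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        for features, label in samples:  # label is 0 or 1
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            prediction = 1 if activation > 0 else 0
            error = label - prediction  # -1, 0, or +1
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias

def predict(weights, bias, features):
    """Threshold the weighted sum of the inputs."""
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0

# Logical AND is linearly separable, so one perceptron can learn it.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
print(weights, bias)
```

With two weights and one bias the whole model can be read off directly. Deep networks stack very large numbers of such units, which is where that readability is lost.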
“I am convinced that machines can and will think in our lifetime.” — Oliver Selfridge (The Thinking Machines — 1961).
Looking back at the journey of the tiny perceptron turning into the deep learning tsunami, a few important milestones stand out: the birth of the perceptron in 1958, the research foresight of “Thinking Machines” in the 1960s, the invention of backpropagation in the 1980s, and the proliferation of data coupled with massive compute capabilities in the early 2010s. Together, these compounded the chemistry of millions of perceptrons interacting with each other, and led to the rise of deep learning and modern NLP.
Naturally, deep learning has revitalized computational linguistics; latent statistical patterns learned using neural mechanisms delivered remarkable performance. To reinforce the point, deep learning models now outperform human baselines on certain well-defined NLP tasks, and the complexity of those tasks increases year after year. The compositional nature of images made Convolutional Neural Networks a huge success. Natural language differs from images in that it has not only compositional dependencies but also sequential state. Recurrent Neural Networks and Long Short-Term Memory (LSTM) networks advanced the state of the art, and more recently the attention mechanism brought unprecedented success with Transformers.
The success of modern NLP is also attributed to self-supervised pre-training objectives for learning contextual embeddings, and to the ability to transfer that learning to downstream task-specific models. Self-supervised pre-training objectives have removed the need for massive labeled datasets, while transfer learning has removed the need to pay the huge computational cost of training from scratch for every task. As a result, we can see the exponential growth of complex models.
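The idea that self-supervision needs no labeled data can be illustrated with masked-token prediction, one well-known pre-training objective. The article does not name a specific objective, so this choice, along with the `[MASK]` token name, the masking probability, and the function name, is an assumption made for the sketch.

```python
import random

MASK = "[MASK]"  # placeholder token; the name is an assumption for this sketch

def make_masked_examples(sentence, mask_prob=0.15, seed=0):
    """Turn raw text into (masked_tokens, targets) with no human labels.

    targets maps each masked position to the original token, which is
    what a model would be trained to predict.
    """
    rng = random.Random(seed)
    tokens = sentence.split()
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[position] = token
        else:
            masked.append(token)
    return masked, targets

sentence = "plants convert light energy into chemical energy"
masked, targets = make_masked_examples(sentence, mask_prob=0.3)
print(masked)
print(targets)
```

The training signal comes entirely from the text itself: the "label" for each masked position is simply the word that was hidden.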
So what?
- Deep Learning has made feature engineering redundant and hence extinct!
- Underlying representations of tokens became dense and complex.
- Internals of complex architectures of deep neural networks became difficult to understand.
As a result, we cannot directly tell how a decision is made, which features are important, or where the causation comes from. The success of modern NLP amplifies the challenges of interpretability.
Interpretability plays a key role in domain adoption, and it builds confidence for real-world applications. We can cluster the ongoing research effort to interpret neural NLP models around the following questions:
- Is linguistic knowledge learned or ignored?
- Why does the model work the way it works?
- Can we explain model predictions?
- What makes NLP models vulnerable?
- How can Knowledge Graph advance modern NLP and its interpretability?
Let’s dive deeper into what we mean by each of these questions.
Linguistic Knowledge: Ignored or Learnt?
Linguistics is the study of language and its structure, including grammar, syntax, and phonetics. It is intuitive to humans that a system cannot understand, reason about, or generate natural language unless it is able to learn linguistic components. In classical NLP, linguistic features like part-of-speech (POS) tags, named entities, dependency trees, subject-verb agreement, and coreference resolution were derived using rule-driven or statistical learning approaches. Deep neural network models like RNNs, LSTMs, and Transformers do not need these hand-crafted features, yet they still perform strongly on certain well-defined real-world tasks like classification, semantic analysis, question answering, summarization, and text generation. So the question to be answered is “What (If at all) Linguistic Knowledge is Learned by Modern NLP Models” (coming soon).
Why Does the Model Work the Way It Works?
Black-box systems are good for modularity and integration, but a system needs to be transparent to be analyzed and improved. Transparency is a key pillar of interpretability. “Model understanding” is a niche area that deals with the internals of models. It requires a detailed analysis of what each layer or block in a given DNN learns, how these components interact with each other, and how they contribute to the model’s decision.
Basically, how can a model’s learning be attributed to its building blocks or underlying mechanisms? A deeper understanding of how the model works would facilitate interpretability and open up opportunities to improve the system further. For instance, the attention mechanism is a key idea behind the success of state-of-the-art LSTM and Transformer models. “How Attention Enables Learning in NLP Models?” (coming soon) would be interesting to study in more depth.
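As a rough illustration of the mechanism, the sketch below implements scaled dot-product attention for a single query, the form used in Transformers. The article does not specify a variant, so this choice and all the vectors are assumptions; real models learn the queries, keys, and values rather than hard-coding them.

```python
import math

def softmax(scores):
    """Convert raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # The output is a weighted average of the value vectors.
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights

# Toy vectors, invented for this example.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
output, weights = attention([2.0, 0.0], keys, values)
print(weights)
print(output)
```

The attention weights form a distribution over input positions, which is why they are a natural first place to look when asking what a model attended to. Whether those weights amount to a faithful explanation is a separate question.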
Prediction may be Okay, Can we Explain It?
Knowing what linguistic knowledge a model learns, and how the underlying mechanisms enable that learning, are building blocks towards NLP interpretability. It is of utmost importance to move “Towards Plausible and Faithful Explanations for NLP Models” (coming soon). This requires an in-depth study of how input tokens impact model decisions, so that predictions can be attributed back to tokens and token importance can be derived. How can we generate explanations from these important tokens? Are the generated explanations faithful? What is the best way to generate a faithful explanation? Can these explanations play an active role in understanding the underlying robustness of a model? This is an active line of research in which a lot of progress has been made recently.
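One simple way to attribute a prediction back to tokens is occlusion: remove each token in turn and measure how much the score changes. The sketch below assumes this technique for illustration; the article does not prescribe a method. `toy_sentiment_score` is a hypothetical stand-in for a real model, and its word weights are made up.

```python
def toy_sentiment_score(tokens):
    """Hypothetical stand-in for a real model: returns a score in [0, 1]."""
    positive = {"great": 0.3, "clear": 0.1}
    negative = {"confusing": 0.3}
    score = 0.5
    for token in tokens:
        score += positive.get(token, 0.0) - negative.get(token, 0.0)
    return min(1.0, max(0.0, score))

def occlusion_importance(tokens, score_fn):
    """Importance of a token = drop in score when that token is removed."""
    baseline = score_fn(tokens)
    importances = []
    for i, token in enumerate(tokens):
        occluded = tokens[:i] + tokens[i + 1:]
        importances.append((token, baseline - score_fn(occluded)))
    return importances

tokens = "the explanation was great and clear".split()
importances = occlusion_importance(tokens, toy_sentiment_score)
for token, importance in importances:
    print(f"{token:12s} {importance:+.2f}")
```

Because `occlusion_importance` only calls the scoring function, it treats the model as a black box. That makes it easy to apply, but the resulting importances are only as faithful as the assumption that deleting a token is a meaningful perturbation.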
On the Backdrop of Success, What makes Modern NLP models Vulnerable?
Modern NLP has made modest progress into real-world applications, e.g. conversational chatbots, real-time translation, automated question answering, and hate speech or fake news detection. Is it possible to hack these models with malicious intent, for example to legitimize fake news, or to steal a model without access to its training data?
A transparent, interpretable, and explainable system would be better prepared to understand “The Challenges and Mitigations of Modern NLP Vulnerabilities” (coming soon). With such a system, the risks of adversarial attacks, underlying bias, unreliable evaluation criteria, and extraction of a model’s learned state can be understood, and steps can be taken to mitigate them.
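The flavour of an adversarial attack can be shown with a deliberately simple example: a small character-level edit that a human still reads the same way but that changes the system’s decision. The keyword filter below is a hypothetical toy, not a neural model, and the blocklist and substitution are invented. Attacks on real NLP models are more sophisticated but exploit the same gap between what the model sees and what a reader understands.

```python
BLOCKLIST = {"scam", "fraud"}  # hypothetical keyword list for a toy filter

def toy_filter_flags(text):
    """Stand-in for a text classifier: flags text containing blocked words."""
    return any(token in BLOCKLIST for token in text.lower().split())

def perturb(text, substitutions):
    """Apply character-level substitutions, a simple adversarial edit."""
    return "".join(substitutions.get(ch, ch) for ch in text)

original = "this offer is a scam"
adversarial = perturb(original, {"a": "@"})

print(original, "->", toy_filter_flags(original))
print(adversarial, "->", toy_filter_flags(adversarial))
```

Knowing which tokens a model relies on, as interpretability methods aim to reveal, is one way to anticipate which perturbations it will be sensitive to.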
What about the Knowledge Graph? Can It Advance Modern NLP and Interpretability Further?
Traditionally, the knowledge graph (structured information represented in the form of a graph) has been at the heart of information-retrieval systems for domain-specific use cases. This is mainly because knowledge graphs can be built deterministically by experts and are easy to understand, seamless to integrate, effective for specific use cases, and straightforward to interpret. Hence, systems relying on knowledge graphs are easily adopted in different domains. Retrieval systems before the dawn of modern NLP were mainly developed on top of knowledge graphs.
Self-supervised learning enables modern NLP to learn statistical patterns without depending on experts’ intervention. These systems are scalable and powerful across varied and complex use cases, but they may fail on very simple tasks, because simple facts are ignored when they lack statistical significance in the data. That is where integrating knowledge graphs with modern NLP systems would bring the best of both worlds and make systems more comprehensive. Knowledge graphs can also align the internal representations of features to make them more meaningful. “Knowledge Inception for Advanced and Interpretable NLP” (coming soon) will be an active area of research in the coming times.
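A minimal sketch of the idea, assuming the simplest possible integration: store facts as (subject, relation, object) triples and consult them before falling back to a statistical model’s guess. The class, the `answer` helper, and the example triples are all hypothetical; research systems integrate the graph much more deeply, for example inside the model’s representations.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Tiny triple store: (subject, relation) -> set of objects."""

    def __init__(self, triples):
        self._index = defaultdict(set)
        for subject, relation, obj in triples:
            self._index[(subject, relation)].add(obj)

    def lookup(self, subject, relation):
        """Return a copy of the known objects, or an empty set."""
        return set(self._index.get((subject, relation), set()))

def answer(subject, relation, graph, statistical_guess):
    """Prefer an explicit fact; fall back to the model's guess otherwise."""
    facts = graph.lookup(subject, relation)
    if facts:
        return sorted(facts)[0], "knowledge graph"
    return statistical_guess, "statistical model"

# Example triples, invented for this sketch.
graph = KnowledgeGraph([
    ("mitosis", "is_a", "cell division"),
    ("photosynthesis", "occurs_in", "chloroplast"),
])

print(answer("mitosis", "is_a", graph, statistical_guess="unknown"))
print(answer("osmosis", "is_a", graph, statistical_guess="diffusion"))
```

Returning the source alongside the answer is the interpretability benefit in miniature: when the graph supplies the fact, the system can point to the exact triple it used.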
Exploring the limits of modern NLP along the above dimensions gives a good understanding of why interpretability matters, what the challenges are, what progress has been made on them, and which questions remain open. Although we have tried to be as broad as possible, this is by no means an exhaustive survey of the current state of NLP. It is intriguing to see how modern NLP will become analyzable, transparent, robust, faithful, explainable, and secure in the coming times. It is equally fascinating to integrate KG and NLP, which would not only help make NLP interpretable but also improve its adoption in domains such as education, healthcare, and agriculture.
I would like to acknowledge the efforts of all collaborators for publishing this article, specifically, reviews and feedback given by Prof. Amit Sheth, and the support of Aditi Avasthi.
[1] Manning CD. Computational Linguistics and Deep Learning, MIT Press 2015
[2] Norvig P. On Chomsky and the two cultures of statistical learning, Springer 2017
[3] Belinkov and Glass. Analysis Methods in Neural Language Processing: A Survey, MIT Press 2019
[4] Manning and Schütze. Foundations of Statistical Natural Language Processing, 1999
[5] Kurşuncu, Gaur, Sheth, Wickramarachchi and Yadav. Knowledge-infused Deep Learning, ACM 2020
[6] Arrieta et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI, Elsevier 2020
[7] Rumelhart, Hinton and Williams. Learning representations by back-propagating errors, Nature 1986
[8] Turing-NLG: A 17-billion-parameter language model by Microsoft, Microsoft Research Blog 2020
[9] Zhang, Sheng, Alhazmi and Li. Adversarial Attacks on Deep-learning Models in Natural Language Processing: A Survey, ACM 2020
[10] Clark, Khandelwal, Levy and Manning. What Does BERT Look At? An Analysis of BERT’s Attention, ACL Workshop BlackboxNLP 2019
[11] Ribeiro, Singh and Guestrin. “Why should I trust you?” Explaining the predictions of any classifier, ACM 2016
[12] Bender and Koller. Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data, ACL 2020
Translated from: https://towardsdatascience.com/rise-of-modern-nlp-and-the-need-of-interpretability-97dd4a655ac3