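About #今日arXiv精选 (Today's arXiv Picks)
This is a column from 「AI 学术前沿」 (AI Academic Frontier). Each day the editors select high-quality papers from arXiv and share them with readers.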
Effective Sequence-to-Sequence Dialogue State Tracking
Comment: Accepted at EMNLP 2021
Link: http://arxiv.org/abs/2108.13990
Abstract
Sequence-to-sequence models have been applied to a wide variety of NLP tasks, but how to properly use them for dialogue state tracking has not been systematically investigated. In this paper, we study this problem from the perspectives of pre-training objectives as well as the formats of context representations. We demonstrate that the choice of pre-training objective makes a significant difference to the state tracking quality. In particular, we find that masked span prediction is more effective than auto-regressive language modeling. We also explore using Pegasus, a span prediction-based pre-training objective for text summarization, for the state tracking model. We found that pre-training for the seemingly distant summarization task works surprisingly well for dialogue state tracking. In addition, we found that while recurrent state context representation works also reasonably well, the model may have a hard time recovering from earlier mistakes. We conducted experiments on the MultiWOZ 2.1-2.4 data sets with consistent observations.
Thermostat: A Large Collection of NLP Model Explanations and Analysis Tools
Comment: Accepted to EMNLP 2021 System Demonstrations
Link: http://arxiv.org/abs/2108.13961
Abstract
In the language domain, as in other domains, neural explainability takes an ever more important role, with feature attribution methods on the forefront. Many such methods require considerable computational resources and expert knowledge about implementation details and parameter choices. To facilitate research, we present Thermostat which consists of a large collection of model explanations and accompanying analysis tools. Thermostat allows easy access to over 200k explanations for the decisions of prominent state-of-the-art models spanning across different NLP tasks, generated with multiple explainers. The dataset took over 10k GPU hours (>one year) to compile; compute time that the community now saves. The accompanying software tools allow to analyse explanations instance-wise but also accumulatively on corpus level. Users can investigate and compare models, datasets and explainers without the need to orchestrate implementation details. Thermostat is fully open source, democratizes explainability research in the language domain, circumvents redundant computations and increases comparability and replicability.
Robust Retrieval Augmented Generation for Zero-shot Slot Filling
Comment: Accepted at EMNLP 2021. arXiv admin note: substantial text overlap with arXiv:2104.08610
Link: http://arxiv.org/abs/2108.13934
Abstract
Automatically inducing high quality knowledge graphs from a given collection of documents still remains a challenging problem in AI. One way to make headway for this problem is through advancements in a related task known as slot filling. In this task, given an entity query in form of [Entity, Slot, ?], a system is asked to fill the slot by generating or extracting the missing value exploiting evidence extracted from relevant passage(s) in the given document collection. The recent works in the field try to solve this task in an end-to-end fashion using retrieval-based language models. In this paper, we present a novel approach to zero-shot slot filling that extends dense passage retrieval with hard negatives and robust training procedures for retrieval augmented generation models. Our model reports large improvements on both T-REx and zsRE slot filling datasets, improving both passage retrieval and slot value generation, and ranking at the top-1 position in the KILT leaderboard. Moreover, we demonstrate the robustness of our system showing its domain adaptation capability on a new variant of the TACRED dataset for slot filling, through a combination of zero/few-shot learning. We release the source code and pre-trained models.
Backdoor Attacks on Pre-trained Models by Layerwise Weight Poisoning
Comment: Accepted by EMNLP2021 main conference
Link: http://arxiv.org/abs/2108.13888
Abstract
Pre-Trained Models (PTMs) have been widely applied and recently proved vulnerable under backdoor attacks: the released pre-trained weights can be maliciously poisoned with certain triggers. When the triggers are activated, even the fine-tuned model will predict pre-defined labels, causing a security threat. These backdoors generated by the poisoning methods can be erased by changing hyper-parameters during fine-tuning or detected by finding the triggers. In this paper, we propose a stronger weight-poisoning attack method that introduces a layerwise weight poisoning strategy to plant deeper backdoors; we also introduce a combinatorial trigger that cannot be easily detected. The experiments on text classification tasks show that previous defense methods cannot resist our weight-poisoning method, which indicates that our method can be widely applied and may provide hints for future model robustness studies.
When Retriever-Reader Meets Scenario-Based Multiple-Choice Questions
Comment: 10 pages, accepted to Findings of EMNLP 2021
Link: http://arxiv.org/abs/2108.13875
Abstract
Scenario-based question answering (SQA) requires retrieving and reading paragraphs from a large corpus to answer a question which is contextualized by a long scenario description. Since a scenario contains both keyphrases for retrieval and much noise, retrieval for SQA is extremely difficult. Moreover, it can hardly be supervised due to the lack of relevance labels of paragraphs for SQA. To meet the challenge, in this paper we propose a joint retriever-reader model called JEEVES where the retriever is implicitly supervised only using QA labels via a novel word weighting mechanism. JEEVES significantly outperforms a variety of strong baselines on multiple-choice questions in three SQA datasets.
Contrastive Domain Adaptation for Question Answering using Limited Text Corpora
Comment: Accepted to EMNLP 2021
Link: http://arxiv.org/abs/2108.13854
Abstract
Question generation has recently shown impressive results in customizing question answering (QA) systems to new domains. These approaches circumvent the need for manually annotated training data from the new domain and, instead, generate synthetic question-answer pairs that are used for training. However, existing methods for question generation rely on large amounts of synthetically generated datasets and costly computational resources, which render these techniques widely inaccessible when the text corpora is of limited size. This is problematic as many niche domains rely on small text corpora, which naturally restricts the amount of synthetic data that can be generated. In this paper, we propose a novel framework for domain adaptation called contrastive domain adaptation for QA (CAQA). Specifically, CAQA combines techniques from question generation and domain-invariant learning to answer out-of-domain questions in settings with limited text corpora. Here, we train a QA system on both source data and generated data from the target domain with a contrastive adaptation loss that is incorporated in the training objective. By combining techniques from question generation and domain-invariant learning, our model achieved considerable improvements compared to state-of-the-art baselines.
Enjoy the Salience: Towards Better Transformer-based Faithful Explanations with Word Salience
Comment: EMNLP 2021 Pre-print
Link: http://arxiv.org/abs/2108.13759
Abstract
Pretrained transformer-based models such as BERT have demonstrated state-of-the-art predictive performance when adapted into a range of natural language processing tasks. An open problem is how to improve the faithfulness of explanations (rationales) for the predictions of these models. In this paper, we hypothesize that salient information extracted a priori from the training data can complement the task-specific information learned by the model during fine-tuning on a downstream task. In this way, we aim to help BERT not to forget assigning importance to informative input tokens when making predictions by proposing SaLoss; an auxiliary loss function for guiding the multi-head attention mechanism during training to be close to salient information extracted a priori using TextRank. Experiments for explanation faithfulness across five datasets, show that models trained with SaLoss consistently provide more faithful explanations across four different feature attribution methods compared to vanilla BERT. Using the rationales extracted from vanilla BERT and SaLoss models to train inherently faithful classifiers, we further show that the latter result in higher predictive performance in downstream tasks.
Plan-then-Generate: Controlled Data-to-Text Generation via Planning
Comment: Accepted to Findings of EMNLP 2021
Link: http://arxiv.org/abs/2108.13740
Abstract
Recent developments in neural networks have led to the advance in data-to-text generation. However, the lack of ability of neural models to control the structure of generated output can be limiting in certain real-world applications. In this study, we propose a novel Plan-then-Generate (PlanGen) framework to improve the controllability of neural data-to-text models. Extensive experiments and analyses are conducted on two benchmark datasets, ToTTo and WebNLG. The results show that our model is able to control both the intra-sentence and inter-sentence structure of the generated output. Furthermore, empirical comparisons against previous state-of-the-art methods show that our model improves the generation quality as well as the output diversity as judged by human and automatic evaluations.
Automatic Rule Generation for Time Expression Normalization
Comment: Accepted to Findings of EMNLP 2021
Link: http://arxiv.org/abs/2108.13658
Abstract
The understanding of time expressions includes two sub-tasks: recognition and normalization. In recent years, significant progress has been made in the recognition of time expressions while research on normalization has lagged behind. Existing SOTA normalization methods highly rely on rules or grammars designed by experts, which limits their performance on emerging corpora, such as social media texts. In this paper, we model time expression normalization as a sequence of operations to construct the normalized temporal value, and we present a novel method called ARTime, which can automatically generate normalization rules from training data without expert interventions. Specifically, ARTime automatically captures possible operation sequences from annotated data and generates normalization rules on time expressions with common surface forms. The experimental results show that ARTime can significantly surpass SOTA methods on the Tweets benchmark, and achieves competitive results with existing expert-engineered rule methods on the TempEval-3 benchmark.
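To make the idea of "normalization as a sequence of operations" concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption: the operation names (`shift_days`, `next_weekday`), the `RULES` table, and the `normalize` helper are hypothetical, and the table is written by hand here, whereas the abstract describes ARTime as generating such rules automatically from annotated data.

```python
from datetime import date, timedelta
from typing import Callable, Dict, List

# An operation maps one date to another date.
Operation = Callable[[date], date]

def shift_days(n: int) -> Operation:
    """Hypothetical operation: move the date by n days."""
    return lambda d: d + timedelta(days=n)

def next_weekday(target: int) -> Operation:
    """Hypothetical operation: first given weekday (0 = Monday) strictly after d."""
    def op(d: date) -> date:
        delta = (target - d.weekday()) % 7 or 7
        return d + timedelta(days=delta)
    return op

# Hand-written rule table (an assumption for illustration only):
# surface form -> sequence of operations applied to the reference date.
RULES: Dict[str, List[Operation]] = {
    "tomorrow": [shift_days(1)],
    "yesterday": [shift_days(-1)],
    "the day after tomorrow": [shift_days(1), shift_days(1)],
    "next friday": [next_weekday(4)],
}

def normalize(expression: str, reference: date) -> str:
    """Apply the rule's operations in order and return an ISO date string."""
    value = reference
    for op in RULES[expression.lower()]:
        value = op(value)
    return value.isoformat()

reference_date = date(2021, 9, 1)  # a Wednesday
normalized_tomorrow = normalize("tomorrow", reference_date)
normalized_friday = normalize("next Friday", reference_date)
```

Representing a rule as a list of small operations lets new surface forms reuse existing operations, which is the property a rule-generation method can exploit.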
Discretized Integrated Gradients for Explaining Language Models
Comment: Accepted in EMNLP 2021
Link: http://arxiv.org/abs/2108.13654
Abstract
As a prominent attribution-based explanation algorithm, Integrated Gradients (IG) is widely adopted due to its desirable explanation axioms and the ease of gradient computation. It measures feature importance by averaging the model's output gradient interpolated along a straight-line path in the input data space. However, such straight-line interpolated points are not representative of text data due to the inherent discreteness of the word embedding space. This questions the faithfulness of the gradients computed at the interpolated points and consequently, the quality of the generated explanations. Here we propose Discretized Integrated Gradients (DIG), which allows effective attribution along non-linear interpolation paths. We develop two interpolation strategies for the discrete word embedding space that generates interpolation points that lie close to actual words in the embedding space, yielding more faithful gradient computation. We demonstrate the effectiveness of DIG over IG through experimental and human evaluations on multiple sentiment classification datasets. We provide the source code of DIG to encourage reproducible research.
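For readers unfamiliar with the baseline method, the following is a minimal sketch of standard straight-line Integrated Gradients, not of DIG itself. The toy model, its analytic gradient, and all function names are assumptions chosen so the example runs with the standard library only; a real use would take gradients from a neural network.

```python
from typing import Callable, List

def integrated_gradients(
    grad_fn: Callable[[List[float]], List[float]],
    x: List[float],
    baseline: List[float],
    steps: int = 50,
) -> List[float]:
    """Approximate IG with a midpoint Riemann sum along the straight line
    from `baseline` to `x`."""
    diffs = [xi - bi for xi, bi in zip(x, baseline)]
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * d for bi, d in zip(baseline, diffs)]
        for i, g in enumerate(grad_fn(point)):
            avg_grad[i] += g / steps
    # Attribution = (input - baseline) * average gradient along the path.
    return [d * g for d, g in zip(diffs, avg_grad)]

def toy_model(x: List[float]) -> float:
    """Toy stand-in for a model output: f(x) = sum of squares."""
    return sum(v * v for v in x)

def toy_grad(x: List[float]) -> List[float]:
    """Analytic gradient of toy_model."""
    return [2.0 * v for v in x]

inputs = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_grad, inputs, baseline)
```

For this toy model the attributions sum to `toy_model(inputs) - toy_model(baseline)`, which is the completeness axiom. The abstract's concern is that the intermediate `point` values on the straight line need not lie near any real word embedding.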
T3-Vis: a visual analytic framework for Training and fine-Tuning Transformers in NLP
Comment: 10 pages, 4 figures, accepted to EMNLP 2021 System Demonstration
Link: http://arxiv.org/abs/2108.13587
Abstract
Transformers are the dominant architecture in NLP, but their training and fine-tuning is still very challenging. In this paper, we present the design and implementation of a visual analytic framework for assisting researchers in such process, by providing them with valuable insights about the model's intrinsic properties and behaviours. Our framework offers an intuitive overview that allows the user to explore different facets of the model (e.g., hidden states, attention) through interactive visualization, and allows a suite of built-in algorithms that compute the importance of model components and different parts of the input sequence. Case studies and feedback from a user focus group indicate that the framework is useful, and suggest several improvements.
Scheduled Sampling Based on Decoding Steps for Neural Machine Translation
Comment: Code is at https://github.com/Adaxry/ss_on_decoding_steps. To appear in EMNLP-2021 main conference. arXiv admin note: text overlap with arXiv:2107.10427
Link: http://arxiv.org/abs/2108.12963
Abstract
Scheduled sampling is widely used to mitigate the exposure bias problem for neural machine translation. Its core motivation is to simulate the inference scene during training by replacing ground-truth tokens with predicted tokens, thus bridging the gap between training and inference. However, vanilla scheduled sampling is merely based on training steps and equally treats all decoding steps. Namely, it simulates an inference scene with uniform error rates, which disobeys the real inference scene, where larger decoding steps usually have higher error rates due to error accumulations. To alleviate the above discrepancy, we propose scheduled sampling methods based on decoding steps, increasing the selection chance of predicted tokens with the growth of decoding steps. Consequently, we can more realistically simulate the inference scene during training, thus better bridging the gap between training and inference. Moreover, we investigate scheduled sampling based on both training steps and decoding steps for further improvements. Experimentally, our approaches significantly outperform the Transformer baseline and vanilla scheduled sampling on three large-scale WMT tasks. Additionally, our approaches also generalize well to the text summarization task on two popular benchmarks.
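The core idea, a chance of feeding the model its own prediction that grows with the decoding step, can be sketched in a few lines of Python. The exponential schedule `1 - k**t` and the names `predicted_token_prob` and `mix_decoder_inputs` are assumptions for illustration; the abstract does not state the schedules the authors use, so this should not be read as their exact method.

```python
import random
from typing import List, Optional, Sequence

def predicted_token_prob(decoding_step: int, k: float = 0.9) -> float:
    """Hypothetical schedule: probability of using the predicted token.
    It is 0 at step 0 and rises toward 1 as the decoding step grows."""
    return 1.0 - k ** decoding_step

def mix_decoder_inputs(
    gold: Sequence[str],
    predicted: Sequence[str],
    k: float = 0.9,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Build decoder inputs by choosing, per position, between the
    ground-truth token and the model's own prediction."""
    rng = rng or random.Random()
    mixed = []
    for step, (gold_tok, pred_tok) in enumerate(zip(gold, predicted)):
        use_predicted = rng.random() < predicted_token_prob(step, k)
        mixed.append(pred_tok if use_predicted else gold_tok)
    return mixed

gold_tokens = ["the", "cat", "sat", "on", "the", "mat"]
pred_tokens = ["a", "cat", "sits", "in", "a", "hat"]
mixed_tokens = mix_decoder_inputs(gold_tokens, pred_tokens, rng=random.Random(0))
```

Vanilla scheduled sampling would instead use one probability for every position in the sentence, changing it only as training progresses.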
Distilling the Knowledge of Large-scale Generative Models into Retrieval Models for Efficient Open-domain Conversation
Comment: EMNLP21-Findings
Link: http://arxiv.org/abs/2108.12582
Abstract
Despite the remarkable performance of large-scale generative models in open-domain conversation, they are known to be less practical for building real-time conversation systems due to high latency. On the other hand, retrieval models could return responses with much lower latency but show inferior performance to the large-scale generative models since the conversation quality is bounded by the pre-defined response set. To take advantage of both approaches, we propose a new training method called G2R (Generative-to-Retrieval distillation) that preserves the efficiency of a retrieval model while leveraging the conversational ability of a large-scale generative model by infusing the knowledge of the generative model into the retrieval model. G2R consists of two distinct techniques of distillation: the data-level G2R augments the dialogue dataset with additional responses generated by the large-scale generative model, and the model-level G2R transfers the response quality score assessed by the generative model to the score of the retrieval model by the knowledge distillation loss. Through extensive experiments including human evaluation, we demonstrate that our retrieval-based conversation system trained with G2R shows a substantially improved performance compared to the baseline retrieval model while showing significantly lower inference latency than the large-scale generative models.
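As a rough illustration of the model-level step, the sketch below computes a generic knowledge distillation loss between a teacher's and a student's scores over the same candidate responses. The KL-divergence form, the temperature parameter, and the function names are assumptions; the abstract only says "the knowledge distillation loss" and does not specify its exact formulation.

```python
import math
from typing import List, Sequence

def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over candidate-response scores."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(
    teacher_scores: Sequence[float],
    student_scores: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """KL(teacher || student) over the candidate responses (assumed form)."""
    p = softmax(teacher_scores, temperature)
    q = softmax(student_scores, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

# Hypothetical scores for three candidate responses to one dialogue context.
teacher = [2.0, 0.5, -1.0]   # e.g. quality scores from the generative model
student = [0.3, 0.2, 0.1]    # e.g. matching scores from the retrieval model
loss = distillation_loss(teacher, student)
```

Minimizing this quantity pushes the retrieval model's ranking of candidates toward the generative model's ranking, while inference still uses only the faster retrieval model.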
Few-Shot Table-to-Text Generation with Prototype Memory
Comment: Accepted to Findings of EMNLP 2021
Link: http://arxiv.org/abs/2108.12516
Abstract
Neural table-to-text generation models have achieved remarkable progress on an array of tasks. However, due to the data-hungry nature of neural models, their performances strongly rely on large-scale training examples, limiting their applicability in real-world applications. To address this, we propose a new framework: Prototype-to-Generate (P2G), for table-to-text generation under the few-shot scenario. The proposed framework utilizes the retrieved prototypes, which are jointly selected by an IR system and a novel prototype selector to help the model bridging the structural gap between tables and texts. Experimental results on three benchmark datasets with three state-of-the-art models demonstrate that the proposed framework significantly improves the model performance across various evaluation metrics.