2019.10.8 note

Commonsense Knowledge Base Completion

This work proposes the task of commonsense knowledge base completion (generating new plausible triplets for the knowledge base) and introduces a novel dataset.

Commonsense Knowledge Mining from Pretrained Models

This work proposes converting a triplet into a sentence (according to a hand-made template) and using a pretrained model to evaluate whether the triplet is plausible (according to PMI, etc.).

  1. To generate a sentence from a triplet, this work proposes a coherency-ranking method: candidate sentences are produced by hand-made templates and by grammar + template based generation, and the sentence with the highest log-likelihood under a pretrained language model is selected. Once a fluent sentence has been generated, unsupervised scoring algorithms work well (a sketch follows this list).
  2. This work also considers supervised algorithms, namely the DNN, Factorized, and Prototypical models described in the paper "Commonsense mining as knowledge base completion? A study on the impact of novelty".
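A minimal sketch of the scoring idea, using GPT-2 from HuggingFace Transformers as a stand-in for the paper's pretrained model; the templates and the pure log-likelihood scorer (rather than PMI) are illustrative assumptions:

```python
# Sketch: score a commonsense triple by rendering it as a sentence
# with a hand-made template and taking its log-likelihood under a
# pretrained LM (GPT-2 here as a stand-in).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Illustrative templates for a few ConceptNet-style relations.
TEMPLATES = {
    "UsedFor":   "{h} is used for {t}.",
    "IsA":       "{h} is a {t}.",
    "CapableOf": "{h} can {t}.",
}

def sentence_log_likelihood(sentence: str) -> float:
    """Summed log-probability of the sentence under the LM."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model returns the mean cross-entropy
        # over the ids.size(1)-1 predicted tokens; undo the mean.
        loss = model(ids, labels=ids).loss
    return -loss.item() * (ids.size(1) - 1)

def score_triple(head: str, relation: str, tail: str) -> float:
    # Coherency ranking: enumerate candidate surface forms and keep
    # the sentence with the highest log-likelihood under the LM.
    tpl = TEMPLATES[relation]
    candidates = [tpl.format(h=head, t=tail),
                  tpl.format(h="the " + head, t=tail)]
    return max(sentence_log_likelihood(s) for s in candidates)

print(score_triple("fork", "UsedFor", "eating food"))
```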

Commonsense mining as knowledge base completion? A study on the impact of novelty

This work analyses whether knowledge base completion models can be used to mine commonsense knowledge from raw text. It identifies the novelty of predicted triples with respect to the training set as an important factor in interpreting results, and proposes the DNN, Factorized, and Prototypical models.
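A minimal sketch of one simple notion of novelty, checking whether a predicted triple (or its entities) already appears in the training set; the paper's actual novelty measures are more graded, so treat this as an illustration only:

```python
# Sketch: classify a predicted triple by how novel it is relative to
# the training triples. (Simplified; the paper studies finer-grained
# novelty measures.)
train_triples = {
    ("fork", "UsedFor", "eating"),
    ("fork", "IsA", "utensil"),
}
train_entities = {e for h, _, t in train_triples for e in (h, t)}

def novelty(triple):
    h, r, t = triple
    if triple in train_triples:
        return "seen in training"
    if h in train_entities and t in train_entities:
        return "novel triple, known entities"
    return "novel entities"

print(novelty(("spoon", "UsedFor", "eating")))  # novel entities
```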

Identity Mappings in Deep Residual Networks

This work analyzes the propagation formulations behind the residual building blocks, which suggest that the forward and backward signals can be directly propagated from one block to any other block, when using identity mappings as the skip connections and after-addition activation.
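As a concrete illustration, a minimal PyTorch sketch of the paper's pre-activation residual block (BN → ReLU → conv, identity skip, no post-addition activation); channel sizes are illustrative:

```python
# Sketch of a pre-activation residual block: with an identity skip
# and no activation after the addition, x_{l+1} = x_l + F(x_l), so
# signals propagate directly between any two blocks.
import torch
import torch.nn as nn

class PreActBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return x + out  # identity mapping: nothing applied after the addition

x = torch.randn(2, 16, 8, 8)
print(PreActBlock(16)(x).shape)  # torch.Size([2, 16, 8, 8])
```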

Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention

  1. This work discusses gradient vanishing in deep transformer layers and proposes a depth-scaled initialization method to address it (see the sketch after this list). (However, the gradient calculation in the paper is wrong.)
  2. This work also proposes a merged attention mechanism.
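A minimal sketch of the depth-scaled initialization idea: a layer's initial weights are shrunk by a factor that grows with its depth (1/sqrt(l) for the l-th layer here); the paper's exact scaling constant may differ:

```python
# Sketch: standard Xavier init, then shrink the l-th layer's weights
# by 1/sqrt(l) so deeper layers start smaller and gradients are less
# prone to vanishing/exploding in deep stacks.
import math
import torch
import torch.nn as nn

def depth_scaled_init(linear: nn.Linear, layer_index: int) -> None:
    nn.init.xavier_uniform_(linear.weight)
    with torch.no_grad():
        linear.weight.mul_(1.0 / math.sqrt(layer_index))
    if linear.bias is not None:
        nn.init.zeros_(linear.bias)

layers = [nn.Linear(512, 512) for _ in range(12)]
for depth, layer in enumerate(layers, start=1):
    depth_scaled_init(layer, depth)
```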

Inter-sentence Relation Extraction with Document-level Graph Convolutional Neural Network

Inter-sentence relation extraction deals with complex semantic relationships in documents, which require local, non-local, syntactic and semantic dependencies; existing methods do not fully exploit such dependencies. This work builds a labelled-edge GCN model on a document-level graph whose edge types include syntactic dependency edges, coreference edges, adjacent-sentence edges, adjacent-word edges and self-node edges.
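A minimal sketch of how such a document-level graph with the five labelled edge types might be assembled; the dependency and coreference inputs are assumed to come from external tools:

```python
# Sketch: assemble a document-level graph as (src, dst, label) triples
# covering the five edge types. Parse/coref inputs are assumed.
doc = ["The drug was tested .".split(), "It reduced symptoms .".split()]
dep_edges = [(0, 1, 0, 2)]         # (sent, head_tok, sent, dep_tok) - assumed parse
coref_chains = [[(0, 1), (1, 0)]]  # "drug" <-> "It" - assumed coreference

def flat(sent_id, tok_id):
    # Map (sentence, token) to a document-level node index.
    return sum(len(s) for s in doc[:sent_id]) + tok_id

edges = []
for s1, h, s2, d in dep_edges:                      # syntactic dependency edges
    edges.append((flat(s1, h), flat(s2, d), "dep"))
for chain in coref_chains:                          # coreference edges
    for (s1, t1), (s2, t2) in zip(chain, chain[1:]):
        edges.append((flat(s1, t1), flat(s2, t2), "coref"))
for s in range(len(doc) - 1):                       # adjacent-sentence edges
    edges.append((flat(s, 0), flat(s + 1, 0), "adj-sent"))
for s, sent in enumerate(doc):                      # adjacent-word edges
    for t in range(len(sent) - 1):
        edges.append((flat(s, t), flat(s, t + 1), "adj-word"))
n_tokens = sum(len(s) for s in doc)
edges += [(i, i, "self") for i in range(n_tokens)]  # self-node edges

print(len(edges), edges[:4])
```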

Joint Type Inference on Entities and Relations via Graph Convolutional Networks

This work develops a new paradigm for the task of joint entity and relation extraction: it first identifies entity spans, then performs joint inference on entity types and relation types. To tackle the joint type inference task, this work proposes a novel GCN running on an entity-relation bipartite graph. Code is available at https://github.com/changzhisun/AntNRE.
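A minimal sketch of constructing the entity-relation bipartite graph such a GCN could run on (graph construction only, no GCN); the entity spans are assumed outputs of the span-identification stage:

```python
# Sketch: entity-span nodes on one side, one relation node per ordered
# candidate entity pair on the other; edges connect each relation node
# to its two argument entities.
from itertools import permutations

entity_spans = [(0, 2), (5, 6), (9, 11)]           # assumed detected spans
ent_nodes = list(range(len(entity_spans)))

rel_nodes, edges = [], []
for r_id, (i, j) in enumerate(permutations(ent_nodes, 2)):
    rel_nodes.append(r_id)
    edges.append((("ent", i), ("rel", r_id)))      # head argument
    edges.append((("ent", j), ("rel", r_id)))      # tail argument

print(len(rel_nodes), len(edges))  # 6 relation nodes, 12 edges
```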

K-BERT: Enabling Language Representation with Knowledge Graph

To enable language representation with knowledge graphs, this work injects triplets from knowledge graphs as domain knowledge into sentences. It also proposes soft position embeddings and a visible matrix mechanism so that the augmented sentence can be treated as a tree.
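A minimal sketch of a visible matrix in the spirit of K-BERT: tokens of an injected triple branch are visible only to each other and to the entity token they attach to; the example sentence, indices, and soft positions are illustrative:

```python
# Sketch: sentence tokens attend to each other; branch tokens attend
# only within their branch plus the anchor entity, so the injected
# knowledge does not disturb the rest of the sentence.
import numpy as np

# "Tim Cook is visiting Beijing" with injected branch "Cook -> CEO -> Apple"
tokens = ["Tim", "Cook", "is", "visiting", "Beijing", "CEO", "Apple"]
sent_idx = [0, 1, 2, 3, 4]      # original sentence tokens
branch_idx = [5, 6]             # injected triple tokens
anchor = 1                      # entity token the branch hangs from
soft_pos = [0, 1, 2, 3, 4, 2, 3]  # branch positions continue from the anchor

n = len(tokens)
visible = np.zeros((n, n), dtype=bool)
for i in sent_idx:              # sentence tokens see each other
    for j in sent_idx:
        visible[i, j] = True
group = branch_idx + [anchor]   # branch tokens + their anchor entity
for i in group:
    for j in group:
        visible[i, j] = True

print(visible.astype(int))
```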

On Layer Normalization in the Transformer Architecture

This work discusses Pre-Norm and Post-Norm at the initialization of the transformer. It proves that at initialization the gradient norm of the last FFN layer's weight matrix is $O(d\sqrt{\ln d})$ for Post-LN, while for Pre-LN it is $O(d\sqrt{\ln d / L})$. Further theoretical analysis may be conducted following this work.
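A minimal PyTorch sketch contrasting the two orderings for the FFN sublayer (Post-LN normalizes after the residual addition, Pre-LN before the sublayer); dimensions are illustrative:

```python
# Sketch: Post-LN vs Pre-LN. In Pre-LN the residual path is a clean
# identity, which is why gradients behave better in deep stacks.
import torch
import torch.nn as nn

class PostLNLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))
        self.norm = nn.LayerNorm(d)

    def forward(self, x):
        return self.norm(x + self.ffn(x))  # LN after the addition

class PreLNLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(), nn.Linear(4 * d, d))
        self.norm = nn.LayerNorm(d)

    def forward(self, x):
        return x + self.ffn(self.norm(x))  # LN before the sublayer

x = torch.randn(2, 8, 64)
print(PostLNLayer(64)(x).shape, PreLNLayer(64)(x).shape)
```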
