Commonly Used Open-Source NLP Projects in Python

fastNLP
https://github.com/fastnlp/fastNLP
Overview: fastNLP is a lightweight natural language processing (NLP) toolkit that aims to make it quick to implement NLP tasks and build complex models.

fairseq
https://github.com/pytorch/fairseq
Overview: Fairseq(-py) is a sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks.

OpenNMT
https://opennmt.net/
Overview: OpenNMT is an open-source ecosystem for neural machine translation and neural sequence learning.

tensor2tensor
https://github.com/tensorflow/tensor2tensor
Overview: Tensor2Tensor, or T2T for short, is a library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

Trax
https://github.com/google/trax
Overview: Trax is a library for deep learning that focuses on sequence models and reinforcement learning. It combines performance with clear code and with maintained documentation and tests.
Trax includes basic models (like ResNet, LSTM, Transformer) and RL algorithms (like REINFORCE, A2C, PPO). It is also actively used for research and includes new models like the Reformer and RL algorithms like AWR.

transformers
https://github.com/huggingface/transformers
Overview: Transformers (formerly known as pytorch-transformers and pytorch-pretrained-bert) provides state-of-the-art general-purpose architectures (BERT, GPT-2, RoBERTa, XLM, DistilBert, XLNet, T5, CTRL…) for Natural Language Understanding (NLU) and Natural Language Generation (NLG), with thousands of pretrained models in 100+ languages and deep interoperability between PyTorch and TensorFlow 2.0.

tokenizers
https://github.com/huggingface/tokenizers
Overview: Provides an implementation of today’s most used tokenizers, with a focus on performance and versatility.
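
To give a feel for what such tokenizers do, here is a minimal pure-Python sketch of byte-pair encoding (BPE), one widely used subword scheme. It does not use the tokenizers library; the function names (`learn_bpe_merges`, `apply_merge`, `encode`) are hypothetical and chosen for illustration, and the sketch skips details such as end-of-word markers and byte-level handling.

```python
from collections import Counter


def apply_merge(symbols, pair):
    """Fuse every adjacent occurrence of `pair` in a tuple of symbols."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe_merges(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of training words."""
    # Each distinct word starts as a tuple of characters, with its frequency.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Rewrite the vocabulary with the most frequent pair fused.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[apply_merge(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split a word into subwords by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = apply_merge(symbols, pair)
    return list(symbols)


merges = learn_bpe_merges(["low", "low", "lower", "lowest"], num_merges=2)
subwords = encode("slow", merges)
```

With this toy corpus, two merges are enough for "low" to become a single symbol, so an unseen word such as "slow" is split into a leftover character plus the learned subword.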

tensorflow_NLP
https://github.com/tensorflow/models/tree/master/official/nlp/
Overview: tensorflow/models/official/nlp provides a modeling library for constructing NLP model architectures, as well as TF2 reference implementations for state-of-the-art models.

torchtext
https://github.com/pytorch/text
Overview: This repository consists of:
torchtext.data: Generic data loaders, abstractions, and iterators for text (including vocabulary and word vectors)
torchtext.datasets: Pre-built loaders for common NLP datasets
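
As a rough illustration of what "vocabulary" and "iterators" mean in this context, the following library-independent sketch builds a token-to-index mapping, converts tokens to ids, and yields padded batches. It is not the torchtext API; `build_vocab`, `numericalize`, and `batches` are hypothetical names, and the `<unk>`/`<pad>` special tokens are an assumption made for the example.

```python
from collections import Counter


def build_vocab(tokenized_texts, min_freq=1, specials=("<unk>", "<pad>")):
    """Map each token to an integer id; special tokens get the lowest ids."""
    counts = Counter(tok for text in tokenized_texts for tok in text)
    itos = list(specials)
    # Most frequent tokens first; ties broken alphabetically for determinism.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in itos:
            itos.append(tok)
    return {tok: idx for idx, tok in enumerate(itos)}


def numericalize(tokens, stoi):
    """Convert tokens to ids, falling back to <unk> for unseen tokens."""
    unk = stoi["<unk>"]
    return [stoi.get(tok, unk) for tok in tokens]


def batches(sequences, batch_size, pad_id):
    """Yield lists of id sequences, right-padded to the longest in each batch."""
    for start in range(0, len(sequences), batch_size):
        chunk = sequences[start:start + batch_size]
        width = max(len(seq) for seq in chunk)
        yield [seq + [pad_id] * (width - len(seq)) for seq in chunk]


texts = [["the", "cat", "sat"], ["the", "dog"]]
stoi = build_vocab(texts)
ids = [numericalize(text, stoi) for text in texts]
padded = list(batches(ids, batch_size=2, pad_id=stoi["<pad>"]))
```

Here the shorter sentence is padded with the `<pad>` id so that both rows in the batch have the same length, which is the kind of bookkeeping a data-loading library takes care of.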

TensorFlow Text
https://github.com/tensorflow/text
Overview: TensorFlow Text provides a collection of text-related classes and ops ready to use with TensorFlow 2.0. The library can perform the preprocessing regularly required by text-based models, and includes other features useful for sequence modeling that core TensorFlow does not provide.
The benefit of using these ops in your text preprocessing is that they run inside the TensorFlow graph. You do not need to worry about tokenization at training time differing from tokenization at inference, or about managing separate preprocessing scripts.

Embedding Projector
http://projector.tensorflow.org/
Overview: Interactive visualization of embeddings, useful for inspecting their quality.
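
The projector can load your own data as tab-separated files: one file of vectors (one embedding per line) and an optional metadata file with one label per line. The sketch below produces both as strings using only the standard library; `to_projector_tsv` is a hypothetical helper name, and the single-column, header-less metadata layout is an assumption that fits the simplest case.

```python
import csv
import io


def to_projector_tsv(labels, vectors):
    """Return (vectors_tsv, metadata_tsv) strings for the given embeddings."""
    if len(labels) != len(vectors):
        raise ValueError("labels and vectors must have the same length")
    vec_buf = io.StringIO()
    writer = csv.writer(vec_buf, delimiter="\t", lineterminator="\n")
    for vec in vectors:
        writer.writerow(vec)  # one embedding per line, values tab-separated
    # Single-column metadata: one label per line, no header row.
    metadata = "".join(f"{label}\n" for label in labels)
    return vec_buf.getvalue(), metadata


vectors_tsv, metadata_tsv = to_projector_tsv(
    ["cat", "dog"], [[0.1, 0.2], [0.3, 0.4]]
)
```

Writing the two strings to files such as `vectors.tsv` and `metadata.tsv` (names are arbitrary) gives something that can be uploaded through the projector's load dialog.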

To be continued …
