This document is the Hugging Face user manual.
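State-of-the-art machine learning for PyTorch, TensorFlow, and JAX.
Transformers provides APIs and tools to easily download and train state-of-the-art pretrained models. Using pretrained models can reduce your compute costs and carbon footprint, and save you the time and resources required to train a model from scratch. These models support common tasks in different modalities, such as: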
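Natural language processing: text classification, named entity recognition, question answering, language modeling, summarization, translation, multiple choice, and text generation.
Computer vision: image classification, object detection, and segmentation.
Audio: automatic speech recognition and audio classification.
Multimodal: table question answering, optical character recognition, information extraction from scanned documents, video classification, and visual question answering.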
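As a minimal sketch of how the tasks above map onto the library's `pipeline` API, the snippet below pairs some of them with pipeline task identifiers. `TASK_IDS` and `build_pipeline` are hypothetical names introduced for illustration, and the selection is not exhaustive. The library is imported lazily inside the helper, so the mapping itself can be inspected without Transformers installed.

```python
# Pipeline task identifiers for some of the tasks listed above.
# TASK_IDS is an illustrative, non-exhaustive mapping (a hypothetical name).
TASK_IDS = {
    "text classification": "text-classification",
    "named entity recognition": "token-classification",
    "question answering": "question-answering",
    "summarization": "summarization",
    "text generation": "text-generation",
    "image classification": "image-classification",
    "object detection": "object-detection",
    "image segmentation": "image-segmentation",
    "automatic speech recognition": "automatic-speech-recognition",
    "audio classification": "audio-classification",
    "table question answering": "table-question-answering",
    "video classification": "video-classification",
    "visual question answering": "visual-question-answering",
}


def build_pipeline(task_name, model=None):
    """Return a pipeline for one of the task names in TASK_IDS.

    Hypothetical helper for this sketch. With model=None the library
    picks a default checkpoint for the task and downloads it on first use.
    """
    # Imported lazily: needs `pip install transformers` plus a backend such as PyTorch.
    from transformers import pipeline

    return pipeline(TASK_IDS[task_name], model=model)
```

A call such as `build_pipeline("text classification")("I love this library.")` would then run inference, assuming network access for the first download.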
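Transformers supports framework interoperability between PyTorch, TensorFlow, and JAX. This provides the flexibility to use a different framework at each stage of a model's life: train a model in three lines of code in one framework, and load it for inference in another. Models can also be exported to formats such as ONNX and TorchScript for deployment in production environments.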
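The cross-framework workflow described above can be sketched as follows: save a checkpoint with the PyTorch classes, then reload it with the TensorFlow classes. This assumes a Transformers release that still ships the TensorFlow classes and that both PyTorch and TensorFlow are installed. `save_pytorch_checkpoint` and `load_in_tensorflow` are hypothetical helper names, not part of the library.

```python
def save_pytorch_checkpoint(model_name, save_dir):
    """Download a checkpoint with the PyTorch classes and save it to disk."""
    # Lazy import: requires transformers and torch to be installed.
    from transformers import AutoModel, AutoTokenizer

    AutoTokenizer.from_pretrained(model_name).save_pretrained(save_dir)
    AutoModel.from_pretrained(model_name).save_pretrained(save_dir)
    return save_dir


def load_in_tensorflow(save_dir):
    """Reload the saved PyTorch weights with the TensorFlow classes."""
    # Lazy import: requires transformers and tensorflow to be installed.
    from transformers import TFAutoModel

    # from_pt=True asks the loader to convert PyTorch weights while loading.
    return TFAutoModel.from_pretrained(save_dir, from_pt=True)
```

Whether a given architecture can be reloaded this way depends on it having an implementation in the target framework.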
Join the growing community on the Hub, forum, or Discord today!
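The documentation is organized into five sections:
GET STARTED provides a quick tour of the library and installation instructions to get up and running.
TUTORIALS are a great place to start if you are a beginner. This section helps you gain the basic skills you need to start using the library.
HOW-TO GUIDES show you how to achieve a specific goal, such as fine-tuning a pretrained model for language modeling or writing and sharing a custom model.
CONCEPTUAL GUIDES offer more discussion and explanation of the underlying concepts and ideas behind the models, tasks, and design philosophy of Transformers.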
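API describes all classes and functions:
MAIN CLASSES details the most important classes, such as configuration, model, tokenizer, and pipeline.
MODELS details the classes and functions related to each model implemented in the library.
INTERNAL HELPERS details utility classes and functions used internally.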
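To make the main classes concrete, here is a minimal sketch of how a configuration, a tokenizer, and a model relate to each other. `TINY_BERT_KWARGS`, `build_tiny_model`, and `load_pretrained` are hypothetical names for illustration, and the tiny hyperparameters are arbitrary values chosen so the model builds quickly. They do not correspond to any published checkpoint.

```python
# Deliberately tiny, arbitrary hyperparameters (illustrative assumption).
TINY_BERT_KWARGS = {
    "vocab_size": 100,
    "hidden_size": 32,
    "num_hidden_layers": 2,
    "num_attention_heads": 2,
    "intermediate_size": 64,
}


def build_tiny_model(**overrides):
    """Build a randomly initialised BERT model from a configuration object."""
    # Lazy import: requires transformers and torch; no download is needed.
    from transformers import BertConfig, BertModel

    config = BertConfig(**{**TINY_BERT_KWARGS, **overrides})
    model = BertModel(config)  # the architecture is defined entirely by the config
    return config, model


def load_pretrained(checkpoint):
    """Load the configuration, tokenizer, and model of a pretrained checkpoint."""
    # Lazy import: requires transformers and torch, plus network access
    # the first time a checkpoint is fetched.
    from transformers import AutoConfig, AutoModel, AutoTokenizer

    config = AutoConfig.from_pretrained(checkpoint)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    return config, tokenizer, model
```

The first helper shows that a configuration alone is enough to instantiate an untrained model, while the second loads trained weights together with the matching tokenizer.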
The table below shows the current support in the library for each model: whether it is implemented in PyTorch, TensorFlow, and/or JAX (via Flax).
Model | PyTorch support | TensorFlow support | Flax support |
---|---|---|---|
ALBERT | ✅ | ✅ | ✅ |
ALIGN | ✅ | ❌ | ❌ |
AltCLIP | ✅ | ❌ | ❌ |
Audio Spectrogram Transformer | ✅ | ❌ | ❌ |
Autoformer | ✅ | ❌ | ❌ |
Bark | ✅ | ❌ | ❌ |
BART | ✅ | ✅ | ✅ |
BARThez | ✅ | ✅ | ✅ |
BARTpho | ✅ | ✅ | ✅ |
BEiT | ✅ | ❌ | ✅ |
BERT | ✅ | ✅ | ✅ |
Bert Generation | ✅ | ❌ | ❌ |
BertJapanese | ✅ | ✅ | ✅ |
BERTweet | ✅ | ✅ | ✅ |
BigBird | ✅ | ❌ | ✅ |
BigBird-Pegasus | ✅ | ❌ | ❌ |
BioGpt | ✅ | ❌ | ❌ |
BiT | ✅ | ❌ | ❌ |
Blenderbot | ✅ | ✅ | ✅ |
BlenderbotSmall | ✅ | ✅ | ✅ |
BLIP | ✅ | ✅ | ❌ |
BLIP-2 | ✅ | ❌ | ❌ |
BLOOM | ✅ | ❌ | ✅ |
BORT | ✅ | ✅ | ✅ |
BridgeTower | ✅ | ❌ | ❌ |
BROS | ✅ | ❌ | ❌ |
ByT5 | ✅ | ✅ | ✅ |
CamemBERT | ✅ | ✅ | ❌ |
CANINE | ✅ | ❌ | ❌ |
Chinese-CLIP | ✅ | ❌ | ❌ |
CLAP | ✅ | ❌ | ❌ |
CLIP | ✅ | ✅ | ✅ |
CLIPSeg | ✅ | ❌ | ❌ |
CLVP | ✅ | ❌ | ❌ |
CodeGen | ✅ | ❌ | ❌ |
CodeLlama | ✅ | ❌ | ✅ |
Conditional DETR | ✅ | ❌ | ❌ |
ConvBERT | ✅ | ✅ | ❌ |
ConvNeXT | ✅ | ✅ | ❌ |
ConvNeXTV2 | ✅ | ✅ | ❌ |
CPM | ✅ | ✅ | ✅ |
CPM-Ant | ✅ | ❌ | ❌ |
CTRL | ✅ | ✅ | ❌ |
CvT | ✅ | ✅ | ❌ |
Data2VecAudio | ✅ | ❌ | ❌ |
Data2VecText | ✅ | ❌ | ❌ |
Data2VecVision | ✅ | ✅ | ❌ |
DeBERTa | ✅ | ✅ | ❌ |
DeBERTa-v2 | ✅ | ✅ | ❌ |
Decision Transformer | ✅ | ❌ | ❌ |
Deformable DETR | ✅ | ❌ | ❌ |
DeiT | ✅ | ✅ | ❌ |
DePlot | ✅ | ❌ | ❌ |
DETA | ✅ | ❌ | ❌ |
DETR | ✅ | ❌ | ❌ |
DialoGPT | ✅ | ✅ | ✅ |
DiNAT | ✅ | ❌ | ❌ |
DINOv2 | ✅ | ❌ | ❌ |
DistilBERT | ✅ | ✅ | ✅ |
DiT | ✅ | ❌ | ✅ |
DonutSwin | ✅ | ❌ | ❌ |
DPR | ✅ | ✅ | ❌ |
DPT | ✅ | ❌ | ❌ |
EfficientFormer | ✅ | ✅ | ❌ |
EfficientNet | ✅ | ❌ | ❌ |
ELECTRA | ✅ | ✅ | ✅ |
EnCodec | ✅ | ❌ | ❌ |
Encoder decoder | ✅ | ✅ | ✅ |
ERNIE | ✅ | ❌ | ❌ |
ErnieM | ✅ | ❌ | ❌ |
ESM | ✅ | ✅ | ❌ |
FairSeq Machine-Translation | ✅ | ❌ | ❌ |
Falcon | ✅ | ❌ | ❌ |
FastSpeech2Conformer | ✅ | ❌ | ❌ |
FLAN-T5 | ✅ | ✅ | ✅ |
FLAN-UL2 | ✅ | ✅ | ✅ |
FlauBERT | ✅ | ✅ | ❌ |
FLAVA | ✅ | ❌ | ❌ |
FNet | ✅ | ❌ | ❌ |
FocalNet | ✅ | ❌ | ❌ |
Funnel Transformer | ✅ | ✅ | ❌ |
Fuyu | ✅ | ❌ | ❌ |
GIT | ✅ | ❌ | ❌ |
GLPN | ✅ | ❌ | ❌ |
GPT Neo | ✅ | ❌ | ✅ |
GPT NeoX | ✅ | ❌ | ❌ |
GPT NeoX Japanese | ✅ | ❌ | ❌ |
GPT-J | ✅ | ✅ | ✅ |
GPT-Sw3 | ✅ | ✅ | ✅ |
GPTBigCode | ✅ | ❌ | ❌ |
GPTSAN-japanese | ✅ | ❌ | ❌ |
Graphormer | ✅ | ❌ | ❌ |
GroupViT | ✅ | ✅ | ❌ |
HerBERT | ✅ | ✅ | ✅ |
Hubert | ✅ | ✅ | ❌ |
I-BERT | ✅ | ❌ | ❌ |
IDEFICS | ✅ | ❌ | ❌ |
ImageGPT | ✅ | ❌ | ❌ |
Informer | ✅ | ❌ | ❌ |
InstructBLIP | ✅ | ❌ | ❌ |
Jukebox | ✅ | ❌ | ❌ |
KOSMOS-2 | ✅ | ❌ | ❌ |
LayoutLM | ✅ | ✅ | ❌ |
LayoutLMv2 | ✅ | ❌ | ❌ |
LayoutLMv3 | ✅ | ✅ | ❌ |
LayoutXLM | ✅ | ❌ | ❌ |
LED | ✅ | ✅ | ❌ |
LeViT | ✅ | ❌ | ❌ |
LiLT | ✅ | ❌ | ❌ |
LLaMA | ✅ | ❌ | ✅ |
Llama2 | ✅ | ❌ | ✅ |
LLaVa | ✅ | ❌ | ❌ |
Longformer | ✅ | ✅ | ❌ |
LongT5 | ✅ | ❌ | ✅ |
LUKE | ✅ | ❌ | ❌ |
LXMERT | ✅ | ✅ | ❌ |
M-CTC-T | ✅ | ❌ | ❌ |
M2M100 | ✅ | ❌ | ❌ |
MADLAD-400 | ✅ | ✅ | ✅ |
Marian | ✅ | ✅ | ✅ |
MarkupLM | ✅ | ❌ | ❌ |
Mask2Former | ✅ | ❌ | ❌ |
MaskFormer | ✅ | ❌ | ❌ |
MatCha | ✅ | ❌ | ❌ |
mBART | ✅ | ✅ | ✅ |
mBART-50 | ✅ | ✅ | ✅ |
MEGA | ✅ | ❌ | ❌ |
Megatron-BERT | ✅ | ❌ | ❌ |
Megatron-GPT2 | ✅ | ✅ | ✅ |
MGP-STR | ✅ | ❌ | ❌ |
Mistral | ✅ | ❌ | ❌ |
Mixtral | ✅ | ❌ | ❌ |
mLUKE | ✅ | ❌ | ❌ |
MMS | ✅ | ✅ | ✅ |
MobileBERT | ✅ | ✅ | ❌ |
MobileNetV1 | ✅ | ❌ | ❌ |
MobileNetV2 | ✅ | ❌ | ❌ |
MobileViT | ✅ | ✅ | ❌ |
MobileViTV2 | ✅ | ❌ | ❌ |
MPNet | ✅ | ✅ | ❌ |
MPT | ✅ | ❌ | ❌ |
MRA | ✅ | ❌ | ❌ |
MT5 | ✅ | ✅ | ✅ |
MusicGen | ✅ | ❌ | ❌ |
MVP | ✅ | ❌ | ❌ |
NAT | ✅ | ❌ | ❌ |
Nezha | ✅ | ❌ | ❌ |
NLLB | ✅ | ❌ | ❌ |
NLLB-MOE | ✅ | ❌ | ❌ |
Nougat | ✅ | ✅ | ✅ |
Nyströmformer | ✅ | ❌ | ❌ |
OneFormer | ✅ | ❌ | ❌ |
OpenAI GPT | ✅ | ✅ | ❌ |
OpenAI GPT-2 | ✅ | ✅ | ✅ |
OpenLlama | ✅ | ❌ | ❌ |
OPT | ✅ | ✅ | ✅ |
OWL-ViT | ✅ | ❌ | ❌ |
OWLv2 | ✅ | ❌ | ❌ |
PatchTSMixer | ✅ | ❌ | ❌ |
PatchTST | ✅ | ❌ | ❌ |
Pegasus | ✅ | ✅ | ✅ |
PEGASUS-X | ✅ | ❌ | ❌ |
Perceiver | ✅ | ❌ | ❌ |
Persimmon | ✅ | ❌ | ❌ |
Phi | ✅ | ❌ | ❌ |
PhoBERT | ✅ | ✅ | ✅ |
Pix2Struct | ✅ | ❌ | ❌ |
PLBart | ✅ | ❌ | ❌ |
PoolFormer | ✅ | ❌ | ❌ |
Pop2Piano | ✅ | ❌ | ❌ |
ProphetNet | ✅ | ❌ | ❌ |
PVT | ✅ | ❌ | ❌ |
QDQBert | ✅ | ❌ | ❌ |
Qwen2 | ✅ | ❌ | ❌ |
RAG | ✅ | ❌ | ❌ |
REALM | ✅ | ❌ | ❌ |
Reformer | ✅ | ❌ | ❌ |
RegNet | ✅ | ✅ | ✅ |
RemBERT | ✅ | ✅ | ❌ |
ResNet | ✅ | ✅ | ✅ |
RetriBERT | ✅ | ❌ | ❌ |
RoBERTa | ✅ | ✅ | ✅ |
RoBERTa-PreLayerNorm | ✅ | ✅ | ✅ |
RoCBert | ✅ | ❌ | ❌ |
RoFormer | ✅ | ✅ | ✅ |
RWKV | ✅ | ❌ | ❌ |
SAM | ✅ | ✅ | ❌ |
SeamlessM4T | ✅ | ❌ | ❌ |
SeamlessM4Tv2 | ✅ | ❌ | ❌ |
SegFormer | ✅ | ✅ | ❌ |
SEW | ✅ | ❌ | ❌ |
SEW-D | ✅ | ❌ | ❌ |
SigLIP | ✅ | ❌ | ❌ |
Speech Encoder decoder | ✅ | ❌ | ✅ |
Speech2Text | ✅ | ✅ | ❌ |
SpeechT5 | ✅ | ❌ | ❌ |
Splinter | ✅ | ❌ | ❌ |
SqueezeBERT | ✅ | ❌ | ❌ |
SwiftFormer | ✅ | ❌ | ❌ |
Swin Transformer | ✅ | ✅ | ❌ |
Swin Transformer V2 | ✅ | ❌ | ❌ |
Swin2SR | ✅ | ❌ | ❌ |
SwitchTransformers | ✅ | ❌ | ❌ |
T5 | ✅ | ✅ | ✅ |
T5v1.1 | ✅ | ✅ | ✅ |
Table Transformer | ✅ | ❌ | ❌ |
TAPAS | ✅ | ✅ | ❌ |
TAPEX | ✅ | ✅ | ✅ |
Time Series Transformer | ✅ | ❌ | ❌ |
TimeSformer | ✅ | ❌ | ❌ |
Trajectory Transformer | ✅ | ❌ | ❌ |
Transformer-XL | ✅ | ✅ | ❌ |
TrOCR | ✅ | ❌ | ❌ |
TVLT | ✅ | ❌ | ❌ |
TVP | ✅ | ❌ | ❌ |
UL2 | ✅ | ✅ | ✅ |
UMT5 | ✅ | ❌ | ❌ |
UniSpeech | ✅ | ❌ | ❌ |
UniSpeechSat | ✅ | ❌ | ❌ |
UnivNet | ✅ | ❌ | ❌ |
UPerNet | ✅ | ❌ | ❌ |
VAN | ✅ | ❌ | ❌ |
VideoMAE | ✅ | ❌ | ❌ |
ViLT | ✅ | ❌ | ❌ |
VipLlava | ✅ | ❌ | ❌ |
Vision Encoder decoder | ✅ | ✅ | ✅ |
VisionTextDualEncoder | ✅ | ✅ | ✅ |
VisualBERT | ✅ | ❌ | ❌ |
ViT | ✅ | ✅ | ✅ |
ViT Hybrid | ✅ | ❌ | ❌ |
VitDet | ✅ | ❌ | ❌ |
ViTMAE | ✅ | ✅ | ❌ |
ViTMatte | ✅ | ❌ | ❌ |
ViTMSN | ✅ | ❌ | ❌ |
VITS | ✅ | ❌ | ❌ |
ViViT | ✅ | ❌ | ❌ |
Wav2Vec2 | ✅ | ✅ | ✅ |
Wav2Vec2-BERT | ✅ | ❌ | ❌ |
Wav2Vec2-Conformer | ✅ | ❌ | ❌ |
Wav2Vec2Phoneme | ✅ | ✅ | ✅ |
WavLM | ✅ | ❌ | ❌ |
Whisper | ✅ | ✅ | ✅ |
X-CLIP | ✅ | ❌ | ❌ |
X-MOD | ✅ | ❌ | ❌ |
XGLM | ✅ | ✅ | ✅ |
XLM | ✅ | ✅ | ❌ |
XLM-ProphetNet | ✅ | ❌ | ❌ |
XLM-RoBERTa | ✅ | ✅ | ✅ |
XLM-RoBERTa-XL | ✅ | ❌ | ❌ |
XLM-V | ✅ | ✅ | ✅ |
XLNet | ✅ | ✅ | ❌ |
XLS-R | ✅ | ✅ | ✅ |
XLSR-Wav2Vec2 | ✅ | ✅ | ✅ |
YOLOS | ✅ | ❌ | ❌ |
YOSO | ✅ | ❌ | ❌ |
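
For readers who want to query the table programmatically, the following sketch parses a few rows copied from it and filters models by framework. `SUPPORT_ROWS`, `parse_support`, and `models_supporting` are hypothetical names, and only a small excerpt of the table is included.

```python
# A few rows copied verbatim from the table above (excerpt only).
SUPPORT_ROWS = """\
ALBERT | ✅ | ✅ | ✅ |
ALIGN | ✅ | ❌ | ❌ |
BEiT | ✅ | ❌ | ✅ |
BLIP | ✅ | ✅ | ❌ |
Whisper | ✅ | ✅ | ✅ |
"""

# Column order matches the table header.
FRAMEWORKS = ("pytorch", "tensorflow", "flax")


def parse_support(table_text):
    """Turn pipe-separated rows into {model: {framework: bool}}."""
    support = {}
    for line in table_text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        name, marks = cells[0], cells[1:4]
        support[name] = {fw: mark == "✅" for fw, mark in zip(FRAMEWORKS, marks)}
    return support


def models_supporting(support, *frameworks):
    """List models that are marked as supported in every given framework."""
    return [
        name
        for name, flags in support.items()
        if all(flags[fw] for fw in frameworks)
    ]


support = parse_support(SUPPORT_ROWS)
print(models_supporting(support, "tensorflow"))       # ['ALBERT', 'BLIP', 'Whisper']
print(models_supporting(support, "pytorch", "flax"))  # ['ALBERT', 'BEiT', 'Whisper']
```

Pasting the full table into `SUPPORT_ROWS` would extend the lookup to every model listed.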