Transformers from Zero to Mastery Tutorial: Pipeline

I. Pipeline

1. Viewing the supported task types

from transformers.pipelines import SUPPORTED_TASKS, get_supported_tasks

# SUPPORTED_TASKS maps each task name to its pipeline class and default model;
# get_supported_tasks() returns just the task names.
print(SUPPORTED_TASKS.items(), get_supported_tasks())
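
The raw dict is hard to read. A minimal sketch that lists each task next to its default checkpoint entry (assuming the usual SUPPORTED_TASKS layout, where each value is a dict with an optional "default" key; the exact structure varies across transformers versions):

from transformers.pipelines import SUPPORTED_TASKS

for task, info in SUPPORTED_TASKS.items():
    default = info.get("default", {})  # missing for some tasks
    print(task, "->", default.get("model", "no default model"))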

2. Ways to create and use a Pipeline

1. Create a Pipeline directly from the task type; the default models are English-language

from transformers import pipeline

pipe = pipeline("text-classification")
pipe("very good!")
# [{'label': 'POSITIVE', 'score': 0.9998525381088257}]
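
Which checkpoint did that load? A quick check (the exact default depends on your transformers version):

pipe.model.name_or_path
# e.g. 'distilbert-base-uncased-finetuned-sst-2-english'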

2. Specify the task type plus a model, and create a Pipeline backed by that model

from transformers import pipeline

# https://huggingface.co/models
pipe = pipeline("text-classification", 
                model="uer/roberta-base-finetuned-dianping-chinese")
pipe("我觉得不太行!")
# [{'label': 'negative (stars 1, 2 and 3)', 'score': 0.9735506772994995}]
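
Pipelines also accept a list of inputs and return one result per item; a minimal sketch reusing the strings from this tutorial:

results = pipe(["我觉得不太行!", "你真是个人才!"])
for r in results:
    print(r["label"], round(r["score"], 4))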

3. Load the model and tokenizer first, then build the Pipeline from them

from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model = AutoModelForSequenceClassification.from_pretrained("uer/roberta-base-finetuned-dianping-chinese")
tokenizer = AutoTokenizer.from_pretrained("uer/roberta-base-finetuned-dianping-chinese")

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)
pipe("你真是个人才!")
# [{'label': 'positive (stars 4 and 5)', 'score': 0.8717765808105469}]

3. GPU inference

  • Inference runs on the CPU by default:

pipe.model.device
# device(type='cpu')

%%time
# %%time is a Jupyter cell magic that reports the cell's execution time

for i in range(100):
    pipe("你真是个人才!")

'''
CPU times: total: 19.4 s
Wall time: 4.94 s
'''

A more careful measurement averages many runs and synchronizes the GPU around each call (synchronization is skipped when no GPU is present):

import torch
import time

times = []
for i in range(100):
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for pending GPU work before starting the clock
    start = time.time()
    pipe("我觉得不太行!")
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure the GPU has finished before stopping the clock
    end = time.time()
    times.append(end - start)
print(sum(times) / 100)
# 0.05427998542785645 on CPU
# 0.012370436191558839 on GPU

pipe = pipeline("text-classification", model="uer/roberta-base-finetuned-dianping-chinese", device=0)
pipe.model.device
# device(type='cuda', index=0)
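
A device-agnostic variant, sketched under the assumption that you want the same code to run with or without a GPU (for pipelines, device=-1 means CPU and device=0 the first GPU), combined with call-time batching to amortize per-call overhead:

import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1  # -1 = CPU, 0 = first GPU
pipe = pipeline("text-classification",
                model="uer/roberta-base-finetuned-dianping-chinese",
                device=device)

# batch_size groups inputs into larger forward passes, which pays off on GPU
results = pipe(["你真是个人才!"] * 8, batch_size=8)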

4. Determining a Pipeline's parameters

qa_pipeline = pipeline("question-answering", model="uer/roberta-base-chinese-extractive-qa")
qa_pipeline(question="王训志是谁?", context="王训志是帅哥!")
# {'score': 0.004711466375738382, 'start': 4, 'end': 6, 'answer': '帅哥'}
  • To find out which parameters a pipeline accepts: look up the class of qa_pipeline (QuestionAnsweringPipeline), then Ctrl+left-click to open the source of its __call__ method, as in the screenshots and the sketch below.

[Screenshots: the source of QuestionAnsweringPipeline.__call__ and its documented parameters]
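
The same information can be pulled up programmatically; a minimal sketch using Python's standard inspect module:

import inspect

print(type(qa_pipeline))                        # the concrete pipeline class
print(inspect.signature(qa_pipeline.__call__))  # call-time parameters
# print(inspect.getsource(qa_pipeline.__call__))  # full source, including the parameter docs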


5. Other Pipeline examples

  • Zero-shot object detection
# OWL-ViT detects objects described by arbitrary text labels, with no task-specific training
checkpoint = "google/owlvit-base-patch32"
detector = pipeline(model=checkpoint, task="zero-shot-object-detection")

import requests
from PIL import Image

url = "https://unsplash.com/photos/oj0zeY2Ltk4/download?ixid=MnwxMjA3fDB8MXxzZWFyY2h8MTR8fHBpY25pY3xlbnwwfHx8fDE2Nzc0OTE1NDk&force=true&w=640"
im = Image.open(requests.get(url, stream=True).raw)
im

[Image: the downloaded photo]

predictions = detector(im, candidate_labels=["hat", "book"])
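
If weak detections clutter the output, the pipeline also takes a confidence cutoff; a hedged sketch (the threshold argument defaults to a low value such as 0.1 in recent transformers releases):

# predictions = detector(im, candidate_labels=["hat", "book"], threshold=0.3)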

from PIL import ImageDraw

draw = ImageDraw.Draw(im)

# Draw each predicted box with its label and confidence onto the image
for prediction in predictions:
    box = prediction["box"]
    label = prediction["label"]
    score = prediction["score"]
    xmin, ymin, xmax, ymax = box.values()
    draw.rectangle((xmin, ymin, xmax, ymax), outline="red", width=1)
    draw.text((xmin, ymin), f"{label}: {round(score, 2)}", fill="red")

im

[Image: the photo with predicted boxes and labels drawn on it]


6. What the Pipeline does behind the scenes

'''
The pipeline hides three steps:
1. Preprocess the input (tokenization)
2. Run the model forward pass
3. Postprocess: map the predicted id to a label via model.config.id2label
'''
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("uer/roberta-base-finetuned-dianping-chinese")
tokenizer = AutoTokenizer.from_pretrained("uer/roberta-base-finetuned-dianping-chinese")

text = "我觉得不行!"
inputs = tokenizer(text, return_tensors="pt")  # return_tensors="pt" gives PyTorch tensors
output = model(**inputs)
probs = torch.softmax(output.logits, dim=-1)

pred_id = torch.argmax(probs).item()

# The fine-tuned checkpoint already ships an id2label mapping in its config,
# e.g. {0: 'negative (stars 1, 2 and 3)', 1: 'positive (stars 4 and 5)'},
# so there is no need to assign one by hand.
print(text, "\n", model.config.id2label.get(pred_id))
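
Putting the three steps together: a small classify() helper (an illustrative sketch, reusing the model, tokenizer, and torch import loaded above) that mirrors what the text-classification pipeline does internally:

def classify(text: str) -> dict:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    pred_id = int(probs.argmax())
    return {"label": model.config.id2label[pred_id], "score": float(probs[pred_id])}

print(classify("我觉得不行!"))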
