Installing the CLIP multimodal (language-image) model with PyTorch

GitHub repository

openai/CLIP: Contrastive Language-Image Pretraining (https://github.com/openai/CLIP)

Create and activate a Python environment

conda create -n CLIP python=3.8
conda activate CLIP

Install PyTorch and torchvision

conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=11.0

Install the packages ftfy, regex, tqdm, and CLIP

pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
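
Before moving on, it can help to confirm that everything installed above is visible to the active environment. The following is a minimal sketch using only the standard library; the helper name missing_packages and the REQUIRED list are illustrative and not part of CLIP, and the import names are assumed to match the packages installed above (the CLIP repository installs a module named clip).

```python
import importlib.util

# Import names of the packages installed in the steps above (assumed names).
REQUIRED = ["torch", "torchvision", "ftfy", "regex", "tqdm", "clip"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", missing if missing else "none")
```

If the list is not empty, the most likely cause is that the CLIP conda environment is not the active one.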

Usage example

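import torch
import clip
from PIL import Image

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Load the pretrained model together with its image preprocessing transform.
model, preprocess = clip.load("ViT-B/32", device=device)

# Preprocess the image and add a batch dimension; tokenize the candidate descriptions.
image = preprocess(Image.open("clip.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["two dogs", "this is a dog", "two dogs on grass", "there are two dogs"]).to(device)

with torch.no_grad():
    # Embeddings for the image and for each description (not used below; shown for reference).
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Similarity logits between the image and every description, turned into probabilities.
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print("Label probs:", probs)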
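
To make the last step of the script less opaque, here is a rough, dependency-free sketch of how the probabilities are obtained: the image and text embeddings are L2-normalized, their cosine similarities are multiplied by a scale factor, and a softmax is taken over the candidate texts. The toy vectors and the function names are hypothetical, and the default scale of 100.0 is an assumption for illustration; the real model uses a learned value and much larger embeddings.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def match_probabilities(image_vec, text_vecs, logit_scale=100.0):
    """Scaled cosine similarity between one image and several texts, then softmax.

    logit_scale=100.0 is an assumed value for illustration only.
    """
    img = l2_normalize(image_vec)
    logits = [
        logit_scale * sum(a * b for a, b in zip(img, l2_normalize(txt)))
        for txt in text_vecs
    ]
    return softmax(logits)


# Toy 2-dimensional embeddings (made up); the first text points the same way as the image.
probs = match_probabilities([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], logit_scale=1.0)
print(probs)
```

Because the softmax runs over the candidate texts only, the result says which description fits best relative to the others, not how well any of them fits in absolute terms.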

For example, given a photo of dogs as input:

[Image 1: the input photo]

Output:

Label probs: [[0.2998 0.102  0.4163 0.1819]]

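In other words, these are the probabilities that the image matches each of the four descriptions. They come from a softmax over the candidates, so they sum to 1; the highest value, 0.4163, belongs to "two dogs on grass".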
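
The printed array is easier to read when each value is paired with its description. A small sketch, with the probabilities copied from the output above and a hypothetical helper named rank_labels:

```python
labels = ["two dogs", "this is a dog", "two dogs on grass", "there are two dogs"]
probs = [0.2998, 0.102, 0.4163, 0.1819]  # values copied from the output above


def rank_labels(labels, probs):
    """Pair each description with its probability, highest probability first."""
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


ranked = rank_labels(labels, probs)
for label, p in ranked:
    print(f"{p:.4f}  {label}")

best_label = ranked[0][0]
```

In the script itself, probs is a NumPy array of shape (1, 4), so probs[0] would be passed in place of the hard-coded list.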
