Welcome to Day 28 of our PyTorch learning journey! Today we step into one of the most exciting areas of AI: multimodal learning. Imagine a model that can both "see" and "read", and that understands the connection between images and text. What possibilities would that open up?

Today we focus on building an image-text matching system and learn how to use the CLIP (Contrastive Language-Image Pre-training) architecture to align features from the two modalities in a shared embedding space. It is like teaching a model to be fluent in two "languages", the language of images and the language of text, and to translate between them.

Let's get started!

Multimodal learning means processing and understanding information from several perceptual channels (modalities) at once, such as vision, language, and audio. Just as humans can simultaneously make sense of what they see and what they hear, multimodal learning lets AI integrate information from different sources.

CLIP (Contrastive Language-Image Pre-training) is a groundbreaking architecture developed by OpenAI that uses contrastive learning to project text and images into the same feature space. Its core idea is to pull matched image-text pairs together in that space while pushing mismatched pairs apart, which gives it the following advantages:
| Advantage | Description |
|---|---|
| Zero-shot transfer | After pre-training, the model can be applied to many downstream tasks without additional fine-tuning (see the sketch after this table) |
| Strong generalization | Generalizes to visual concepts never seen during training |
| Cross-modal retrieval | Text can be used to find images, and images to find related text |
| Open-vocabulary recognition | Not limited to predefined classes; can recognize anything that can be described in text |
| Multilingual potential | Can be extended to multilingual settings for cross-language image understanding |
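As a quick illustration of the zero-shot transfer row above, the sketch below classifies an image by comparing it against a few free-form text prompts. It is a minimal example, assuming the Hugging Face `transformers` library and the public `openai/clip-vit-base-patch32` checkpoint are available; the image path and prompt texts are placeholders, not part of the tutorial's own code.

```python
# Minimal zero-shot classification sketch with a pretrained CLIP checkpoint.
# Assumes `pip install transformers pillow torch` and access to the
# "openai/clip-vit-base-patch32" weights; the image path is a placeholder.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg").convert("RGB")  # placeholder image
prompts = ["a photo of a cat", "a photo of a dog", "a photo of a car"]

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)  # image-to-text probabilities

for prompt, p in zip(prompts, probs[0].tolist()):
    print(f"{p:.3f}  {prompt}")
```

No label set is fixed in advance: changing the prompt list is all it takes to "retarget" the classifier, which is exactly what the table means by open-vocabulary, zero-shot use.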
The CLIP architecture is built from the following core components:
| Component | Function |
|---|---|
| Image encoder | Converts an image into a feature vector (typically a Vision Transformer or a ResNet) |
| Text encoder | Converts text into a feature vector (typically a Transformer) |
| Projection layers | Map the features of both modalities into a shared multimodal space |
| Contrastive loss | Optimizes the model so that matched image-text pairs end up close in the feature space |
| Temperature parameter | Controls how "sharp" the similarity distribution is, which affects the difficulty of contrastive learning (illustrated right after this table) |
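To make the temperature row concrete, here is a tiny sketch with made-up similarity scores. Dividing the logits by a small temperature such as 0.07 sharpens the softmax over candidate captions, while a temperature of 1.0 leaves it much flatter:

```python
import torch
import torch.nn.functional as F

# Made-up cosine similarities between one image and four candidate captions.
sims = torch.tensor([0.32, 0.28, 0.25, 0.15])

for temperature in (1.0, 0.07):
    probs = F.softmax(sims / temperature, dim=0)
    print(f"T={temperature}: {probs.tolist()}")
# With T=1.0 the distribution is nearly uniform; with T=0.07 the best match
# dominates, which makes the contrastive objective much more discriminative.
```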
┌────────────────┐            ┌────────────────┐
│   Image data   │            │   Text data    │
└───────┬────────┘            └────────┬───────┘
        │                              │
        ▼                              ▼
┌────────────────┐            ┌────────────────┐
│ Image encoder  │            │  Text encoder  │
│ (ResNet / ViT) │            │ (Transformer)  │
└───────┬────────┘            └────────┬───────┘
        │                              │
        ▼                              ▼
┌────────────────┐            ┌────────────────┐
│ Image features │            │ Text features  │
└───────┬────────┘            └────────┬───────┘
        │                              │
        ▼                              ▼
┌─────────────────────────────────────────────┐
│    Feature space alignment (projection)     │
└─────────────────────┬───────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────┐
│              Contrastive loss               │
└─────────────────────┬───────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────┐
│             Model optimization              │
└─────────────────────────────────────────────┘
Now let's implement a simplified CLIP model in PyTorch for image-text matching. We will use a pretrained ResNet as the image encoder and a pretrained BERT as the text encoder.

First, install the required libraries:
# Install the required libraries
# pip install torch torchvision transformers pillow matplotlib tqdm
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader
from torchvision import models, transforms
from transformers import BertModel, BertTokenizer
from PIL import Image
import matplotlib.pyplot as plt
import os
import json
import random
from tqdm import tqdm
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from torchvision import models
from transformers import BertModel
class ImageEncoder(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        # Use a pretrained ResNet50 as the image encoder
        self.model = models.resnet50(pretrained=True)
        # Remove the final classification layer
        self.model.fc = nn.Identity()
        # Projection layer mapping the 2048-d ResNet features to embed_dim
        self.projection = nn.Linear(2048, embed_dim)

    def forward(self, x):
        features = self.model(x)
        projected_features = self.projection(features)
        # L2-normalize the feature vectors
        return F.normalize(projected_features, p=2, dim=1)
class TextEncoder(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        # Use a pretrained BERT model as the text encoder
        self.model = BertModel.from_pretrained('bert-base-uncased')
        # Projection layer mapping the 768-d BERT features to embed_dim
        self.projection = nn.Linear(768, embed_dim)

    def forward(self, input_ids, attention_mask):
        # Use BERT's [CLS] token output as the sentence representation
        outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
        cls_features = outputs.last_hidden_state[:, 0, :]  # features of the [CLS] token
        projected_features = self.projection(cls_features)
        # L2-normalize the feature vectors
        return F.normalize(projected_features, p=2, dim=1)
class CLIP(nn.Module):
    def __init__(self, embed_dim=512, temperature=0.07):
        super().__init__()
        self.image_encoder = ImageEncoder(embed_dim)
        self.text_encoder = TextEncoder(embed_dim)
        self.temperature = temperature  # controls the sharpness of the softmax
        # Learnable logit scale, initialized to log(1 / temperature)
        self.logit_scale = nn.Parameter(torch.ones([]) * np.log(1 / temperature))

    def forward(self, images, input_ids, attention_mask):
        # Encode images and text
        image_features = self.image_encoder(images)
        text_features = self.text_encoder(input_ids, attention_mask)
        # Compute the image-text similarity matrix
        logit_scale = self.logit_scale.exp()
        logits_per_image = logit_scale * image_features @ text_features.t()
        logits_per_text = logits_per_image.t()
        return logits_per_image, logits_per_text

    def encode_image(self, images):
        """Encode images only."""
        return self.image_encoder(images)

    def encode_text(self, input_ids, attention_mask):
        """Encode text only."""
        return self.text_encoder(input_ids, attention_mask)
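Before wiring the model into a dataset and a trainer, a quick smoke test with dummy inputs helps confirm that the tensor shapes line up. The following sketch assumes the ImageEncoder, TextEncoder, and CLIP classes above are in scope and that the `bert-base-uncased` and ResNet50 weights can be downloaded:

```python
import torch
from transformers import BertTokenizer

# Smoke test: one forward pass of the simplified CLIP model on dummy data.
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = CLIP(embed_dim=512)

images = torch.randn(4, 3, 224, 224)  # a fake batch of 4 images
texts = ["a dog running", "a red car", "a cat sleeping", "a tree in a park"]
enc = tokenizer(texts, padding=True, truncation=True, return_tensors='pt')

logits_per_image, logits_per_text = model(images, enc['input_ids'], enc['attention_mask'])
print(logits_per_image.shape)  # expected: torch.Size([4, 4]) similarity matrix
```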
import torch
from torch.utils.data import Dataset
from PIL import Image
import os
import random
from torchvision import transforms
class ImageTextDataset(Dataset):
    def __init__(self, image_paths, captions, tokenizer, transform=None, max_length=64):
        """
        Image-text pair dataset.
        Args:
            image_paths (list): list of image file paths
            captions (list): list of corresponding text descriptions
            tokenizer: BERT tokenizer
            transform: image transforms
            max_length: maximum text length
        """
        self.image_paths = image_paths
        self.captions = captions
        self.tokenizer = tokenizer
        self.max_length = max_length
        # Default image transforms
        if transform is None:
            self.transform = transforms.Compose([
                transforms.Resize((224, 224)),
                transforms.ToTensor(),
                transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                     std=[0.229, 0.224, 0.225])
            ])
        else:
            self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        # Load and transform the image
        image_path = self.image_paths[idx]
        try:
            image = Image.open(image_path).convert('RGB')
            image = self.transform(image)
        except Exception as e:
            print(f"Error loading image {image_path}: {e}")
            # Fall back to a random tensor so training can continue
            image = torch.randn(3, 224, 224)
        # Tokenize the caption
        caption = self.captions[idx]
        encoding = self.tokenizer(
            caption,
            padding='max_length',
            truncation=True,
            max_length=self.max_length,
            return_tensors='pt'
        )
        # Remove the batch dimension added by the tokenizer
        input_ids = encoding['input_ids'].squeeze()
        attention_mask = encoding['attention_mask'].squeeze()
        return {
            'image': image,
            'input_ids': input_ids,
            'attention_mask': attention_mask,
            'caption': caption  # keep the raw text for later analysis
        }
class ImageTextContrastiveDataset(ImageTextDataset):
    """
    Extended dataset that also provides explicit negative text samples
    for contrastive learning.
    """
    def __init__(self, image_paths, captions, tokenizer, transform=None, max_length=64, negative_samples=1):
        super().__init__(image_paths, captions, tokenizer, transform, max_length)
        self.negative_samples = negative_samples

    def __getitem__(self, idx):
        # Positive image-text pair
        pos_sample = super().__getitem__(idx)
        # Sample negative captions for this image
        neg_indices = []
        for _ in range(self.negative_samples):
            neg_idx = random.randint(0, len(self) - 1)
            while neg_idx == idx:  # never pick the positive caption as a negative
                neg_idx = random.randint(0, len(self) - 1)
            neg_indices.append(neg_idx)
        # Tokenize the negative captions
        neg_captions = [self.captions[i] for i in neg_indices]
        neg_encodings = self.tokenizer(
            neg_captions,
            padding='max_length',
            truncation=True,
            max_length=self.max_length,
            return_tensors='pt'
        )
        # Attach the negatives to the positive sample
        pos_sample['neg_input_ids'] = neg_encodings['input_ids']
        pos_sample['neg_attention_mask'] = neg_encodings['attention_mask']
        pos_sample['neg_captions'] = neg_captions
        return pos_sample
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR
from tqdm import tqdm
import numpy as np
class CLIPLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.cross_entropy = nn.CrossEntropyLoss()

    def forward(self, logits_per_image, logits_per_text):
        # Targets: the i-th image matches the i-th text, i.e. the diagonal
        batch_size = logits_per_image.shape[0]
        labels = torch.arange(batch_size).to(logits_per_image.device)
        # Image-to-text and text-to-image cross-entropy losses
        loss_i2t = self.cross_entropy(logits_per_image, labels)
        loss_t2i = self.cross_entropy(logits_per_text, labels)
        # The total loss is the average of the two directions
        total_loss = (loss_i2t + loss_t2i) / 2
        return total_loss
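A handy sanity check, sketched below with random features, is that at initialization the symmetric loss should sit near ln(batch_size), since every caption is then roughly equally likely for each image. The snippet assumes the CLIPLoss class above is in scope:

```python
import torch
import torch.nn.functional as F

# Sanity check: with random (uninformative) features the contrastive loss
# should be close to ln(batch_size).
batch_size, embed_dim = 32, 512
image_features = F.normalize(torch.randn(batch_size, embed_dim), dim=1)
text_features = F.normalize(torch.randn(batch_size, embed_dim), dim=1)

logits_per_image = image_features @ text_features.t() / 0.07
logits_per_text = logits_per_image.t()

criterion = CLIPLoss()
# Roughly ln(32) ≈ 3.47 (usually a bit above, since random similarities
# are not exactly uniform after the temperature scaling).
print(criterion(logits_per_image, logits_per_text).item())
```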
class CLIPTrainer:
    def __init__(self, model, train_dataloader, val_dataloader=None,
                 device='cuda' if torch.cuda.is_available() else 'cpu',
                 lr=1e-4, weight_decay=0.01, epochs=10):
        self.model = model.to(device)
        self.train_dataloader = train_dataloader
        self.val_dataloader = val_dataloader
        self.device = device
        # Optimizer and cosine learning-rate schedule
        self.optimizer = Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
        self.scheduler = CosineAnnealingLR(self.optimizer, T_max=epochs)
        self.criterion = CLIPLoss()
        self.epochs = epochs
        # Bookkeeping
        self.train_losses = []
        self.val_losses = []
        self.best_val_loss = float('inf')

    def train_epoch(self):
        self.model.train()
        total_loss = 0
        for batch in tqdm(self.train_dataloader, desc='Training'):
            # Move the batch to the target device
            images = batch['image'].to(self.device)
            input_ids = batch['input_ids'].to(self.device)
            attention_mask = batch['attention_mask'].to(self.device)
            # Forward pass
            logits_per_image, logits_per_text = self.model(images, input_ids, attention_mask)
            # Compute the contrastive loss
            loss = self.criterion(logits_per_image, logits_per_text)
            # Backward pass and optimizer step
            self.optimizer.zero_grad()
            loss.backward()
            self.optimizer.step()
            total_loss += loss.item()
        avg_loss = total_loss / len(self.train_dataloader)
        self.train_losses.append(avg_loss)
        return avg_loss

    def validate(self):
        if self.val_dataloader is None:
            return None
        self.model.eval()
        total_loss = 0
        with torch.no_grad():
            for batch in tqdm(self.val_dataloader, desc='Validating'):
                # Move the batch to the target device
                images = batch['image'].to(self.device)
                input_ids = batch['input_ids'].to(self.device)
                attention_mask = batch['attention_mask'].to(self.device)
                # Forward pass
                logits_per_image, logits_per_text = self.model(images, input_ids, attention_mask)
                # Compute the contrastive loss
                loss = self.criterion(logits_per_image, logits_per_text)
                total_loss += loss.item()
        avg_loss = total_loss / len(self.val_dataloader)
        self.val_losses.append(avg_loss)
        # Keep the best checkpoint
        if avg_loss < self.best_val_loss:
            self.best_val_loss = avg_loss
            torch.save(self.model.state_dict(), 'best_clip_model.pth')
        return avg_loss

    def train(self):
        print(f"Training on {self.device}")
        for epoch in range(self.epochs):
            print(f"\nEpoch {epoch+1}/{self.epochs}")
            # Train for one epoch
            train_loss = self.train_epoch()
            print(f"Training Loss: {train_loss:.4f}")
            # Validate
            if self.val_dataloader is not None:
                val_loss = self.validate()
                print(f"Validation Loss: {val_loss:.4f}")
            # Update the learning rate
            self.scheduler.step()
            current_lr = self.scheduler.get_last_lr()[0]
            print(f"Learning Rate: {current_lr:.6f}")
        # Save the final model
        torch.save(self.model.state_dict(), 'final_clip_model.pth')
        print("Training completed!")
        return self.train_losses, self.val_losses
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import transforms
from transformers import BertTokenizer
import matplotlib.pyplot as plt
import os
import json
import random
import numpy as np
from PIL import Image
from tqdm import tqdm
import argparse
# Import the modules we defined above (saved as clip_model.py, clip_dataset.py, clip_trainer.py)
from clip_model import CLIP
from clip_dataset import ImageTextDataset
from clip_trainer import CLIPTrainer
# Set random seeds for reproducibility
def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
def load_flickr8k_data(root_dir):
    """
    Load the Flickr8k dataset, a commonly used image-captioning dataset.
    Args:
        root_dir: dataset root directory
    Returns:
        image_paths: list of image paths
        captions: list of corresponding captions
    """
    images_dir = os.path.join(root_dir, 'Images')
    captions_file = os.path.join(root_dir, 'captions.txt')
    # Collect the image files (a set makes the membership check below fast)
    image_files = {f for f in os.listdir(images_dir) if f.endswith(('.jpg', '.jpeg', '.png'))}
    # Read the captions file
    with open(captions_file, 'r', encoding='utf-8') as f:
        captions_data = f.readlines()
    # Parse the captions and match them to images
    image_paths = []
    captions = []
    for line in captions_data[1:]:  # skip the header line
        parts = line.strip().split(',')
        if len(parts) >= 2:
            image_name = parts[0]
            caption = ','.join(parts[1:])  # captions may themselves contain commas
            if image_name in image_files:
                image_path = os.path.join(images_dir, image_name)
                image_paths.append(image_path)
                captions.append(caption)
    return image_paths, captions
def create_dummy_data(num_samples=100):
    """
    Create synthetic data for testing.
    """
    # Simple caption templates
    objects = ["cat", "dog", "bird", "car", "flower", "tree", "house", "person", "book", "computer"]
    colors = ["red", "blue", "green", "yellow", "black", "white", "purple", "orange"]
    actions = ["running", "sleeping", "playing", "standing", "sitting", "flying", "driving", "reading"]
    locations = ["in the park", "on the beach", "in the house", "on the street", "in the garden"]
    # Generate random solid-color images (testing only; use real images in practice)
    os.makedirs("dummy_images", exist_ok=True)
    image_paths = []
    captions = []
    for i in range(num_samples):
        # Create an image filled with a random color
        img = Image.new('RGB', (224, 224), color=(
            random.randint(0, 255),
            random.randint(0, 255),
            random.randint(0, 255)
        ))
        # Save the image
        image_path = f"dummy_images/image_{i}.jpg"
        img.save(image_path)
        image_paths.append(image_path)
        # Generate a random caption
        obj = random.choice(objects)
        color = random.choice(colors)
        action = random.choice(actions)
        location = random.choice(locations)
        caption = f"A {color} {obj} {action} {location}"
        captions.append(caption)
    return image_paths, captions
def plot_training_curves(train_losses, val_losses=None):
    """Plot training curves."""
    plt.figure(figsize=(10, 5))
    plt.plot(train_losses, label='Training Loss')
    if val_losses:
        plt.plot(val_losses, label='Validation Loss')
    plt.xlabel('Epochs')
    plt.ylabel('Loss')
    plt.title('CLIP Training Curves')
    plt.legend()
    plt.grid(True)
    plt.savefig('training_curves.png')
    plt.show()
def visualize_image_text_matches(model, dataloader, device, num_examples=5):
    """Visualize image-text matching results."""
    model.eval()
    # Take one batch of data
    batch = next(iter(dataloader))
    images = batch['image'].to(device)
    input_ids = batch['input_ids'].to(device)
    attention_mask = batch['attention_mask'].to(device)
    captions = batch['caption']
    # Compute similarities
    with torch.no_grad():
        logits_per_image, _ = model(images, input_ids, attention_mask)
        similarities = logits_per_image.cpu().numpy()
    # Visualize the first num_examples samples
    plt.figure(figsize=(15, num_examples * 3))
    for i in range(min(num_examples, len(images))):
        # Current image
        img = images[i].cpu().permute(1, 2, 0).numpy()
        # Undo the normalization
        mean = np.array([0.485, 0.456, 0.406])
        std = np.array([0.229, 0.224, 0.225])
        img = img * std + mean
        img = np.clip(img, 0, 1)
        # Similarity of this image to every caption in the batch
        similarity_scores = similarities[i]
        # Plot the image
        plt.subplot(num_examples, 2, i*2+1)
        plt.imshow(img)
        plt.title(f"Image {i}")
        plt.axis('off')
        # Plot a bar chart of the top caption similarities
        plt.subplot(num_examples, 2, i*2+2)
        top_k = min(5, len(similarity_scores))
        indices = np.argsort(similarity_scores)[::-1][:top_k]
        plt.barh(range(top_k), [similarity_scores[idx] for idx in indices])
        plt.yticks(range(top_k), [captions[idx][:50] + '...' for idx in indices])
        plt.xlabel('Similarity Score')
        plt.title(f"Top {top_k} Caption Matches")
    plt.tight_layout()
    plt.savefig('image_text_matches.png')
    plt.show()
def main(args):
    # Set the random seed
    set_seed(args.seed)
    # Select the device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")
    # Load the data
    if args.use_dummy_data:
        print("Using dummy data for testing...")
        image_paths, captions = create_dummy_data(args.num_samples)
    else:
        print(f"Loading data from {args.data_dir}...")
        image_paths, captions = load_flickr8k_data(args.data_dir)
    print(f"Loaded {len(image_paths)} image-caption pairs")
    # Initialize the BERT tokenizer
    tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
    # Build the dataset
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    dataset = ImageTextDataset(
        image_paths=image_paths,
        captions=captions,
        tokenizer=tokenizer,
        transform=transform,
        max_length=args.max_length
    )
    # Split into training and validation sets
    train_size = int(0.8 * len(dataset))
    val_size = len(dataset) - train_size
    train_dataset, val_dataset = random_split(dataset, [train_size, val_size])
    # Data loaders
    train_dataloader = DataLoader(
        train_dataset,
        batch_size=args.batch_size,
        shuffle=True,
        num_workers=args.num_workers
    )
    val_dataloader = DataLoader(
        val_dataset,
        batch_size=args.batch_size,
        shuffle=False,
        num_workers=args.num_workers
    )
    # Initialize the model
    model = CLIP(embed_dim=args.embed_dim, temperature=args.temperature)
    # Initialize the trainer
    trainer = CLIPTrainer(
        model=model,
        train_dataloader=train_dataloader,
        val_dataloader=val_dataloader,
        device=device,
        lr=args.learning_rate,
        weight_decay=args.weight_decay,
        epochs=args.epochs
    )
    # Train the model
    print("Starting training...")
    train_losses, val_losses = trainer.train()
    # Plot the training curves
    plot_training_curves(train_losses, val_losses)
    # Visualize image-text matches
    if args.visualize:
        print("Visualizing image-text matches...")
        visualize_image_text_matches(model, val_dataloader, device, num_examples=5)
    print("Done!")
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="CLIP Training")
    # Data arguments
    parser.add_argument("--data_dir", type=str, default="./flickr8k", help="Directory with images and captions")
    parser.add_argument("--use_dummy_data", action="store_true", help="Use dummy data for testing")
    parser.add_argument("--num_samples", type=int, default=100, help="Number of dummy samples to generate")
    parser.add_argument("--max_length", type=int, default=64, help="Maximum length of text")
    # Model arguments
    parser.add_argument("--embed_dim", type=int, default=512, help="Embedding dimension")
    parser.add_argument("--temperature", type=float, default=0.07, help="Temperature parameter")
    # Training arguments
    parser.add_argument("--batch_size", type=int, default=32, help="Batch size")
    parser.add_argument("--learning_rate", type=float, default=1e-4, help="Learning rate")
    parser.add_argument("--weight_decay", type=float, default=0.01, help="Weight decay")
    parser.add_argument("--epochs", type=int, default=10, help="Number of epochs")
    parser.add_argument("--num_workers", type=int, default=4, help="Number of workers for data loading")
    parser.add_argument("--seed", type=int, default=42, help="Random seed")
    # Misc
    parser.add_argument("--visualize", action="store_true", help="Visualize results")
    args = parser.parse_args()
    main(args)
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import transforms
from transformers import BertTokenizer
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import os
from tqdm import tqdm
# Import the custom CLIP model defined earlier (clip_model.py)
from clip_model import CLIP
class CLIPRetrieval:
    def __init__(self, model_path, device='cuda' if torch.cuda.is_available() else 'cpu'):
        """
        Initialize the CLIP retrieval system.
        Args:
            model_path: path to the pretrained model weights
            device: device to run on
        """
        self.device = device
        self.model = CLIP(embed_dim=512).to(device)
        # Load the pretrained weights
        self.model.load_state_dict(torch.load(model_path, map_location=device))
        self.model.eval()
        # BERT tokenizer
        self.tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
        # Image transforms
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        ])
        # Image feature index
        self.image_features = None
        self.image_paths = None

    def build_image_index(self, image_dir):
        """
        Build a feature index for a directory of images.
        Args:
            image_dir: path to the image directory
        """
        # Collect all image files
        candidate_paths = [
            os.path.join(image_dir, f) for f in os.listdir(image_dir)
            if f.lower().endswith(('.jpg', '.jpeg', '.png'))
        ]
        # Encode every image; keep paths only for images that were encoded
        # successfully so features and paths stay aligned
        all_features = []
        valid_paths = []
        print(f"Indexing {len(candidate_paths)} images...")
        for img_path in tqdm(candidate_paths):
            try:
                # Load and transform the image
                img = Image.open(img_path).convert('RGB')
                img_tensor = self.transform(img).unsqueeze(0).to(self.device)
                # Compute its feature vector
                with torch.no_grad():
                    features = self.model.encode_image(img_tensor)
                all_features.append(features.cpu())
                valid_paths.append(img_path)
            except Exception as e:
                print(f"Error processing {img_path}: {e}")
        # Concatenate all features into a single tensor
        self.image_features = torch.cat(all_features, dim=0)
        self.image_paths = valid_paths
        print(f"Indexed {len(self.image_features)} images successfully")
    def text_to_image_search(self, query_text, top_k=5):
        """
        Retrieve images with a text query.
        Args:
            query_text: the query text
            top_k: number of results to return
        Returns:
            top_image_paths: paths of the most similar images
            similarities: similarity scores
        """
        if self.image_features is None or self.image_paths is None:
            raise ValueError("Image index not built. Call build_image_index first.")
        # Tokenize the query
        encoding = self.tokenizer(
            query_text,
            padding='max_length',
            truncation=True,
            max_length=64,
            return_tensors='pt'
        )
        input_ids = encoding['input_ids'].to(self.device)
        attention_mask = encoding['attention_mask'].to(self.device)
        # Encode the query text
        with torch.no_grad():
            text_features = self.model.encode_text(input_ids, attention_mask)
        # Cosine similarity against every indexed image
        text_features = text_features.cpu()
        similarities = F.cosine_similarity(text_features, self.image_features)
        # Pick the most similar images
        top_indices = similarities.argsort(descending=True)[:top_k]
        top_image_paths = [self.image_paths[idx] for idx in top_indices]
        top_similarities = similarities[top_indices].tolist()
        return top_image_paths, top_similarities
    def image_to_text_search(self, query_image_path, text_candidates, top_k=5):
        """
        Retrieve texts with an image query.
        Args:
            query_image_path: path to the query image
            text_candidates: list of candidate texts
            top_k: number of results to return
        Returns:
            top_texts: the most similar texts
            similarities: similarity scores
        """
        # Load and transform the query image
        img = Image.open(query_image_path).convert('RGB')
        img_tensor = self.transform(img).unsqueeze(0).to(self.device)
        # Encode the image
        with torch.no_grad():
            image_features = self.model.encode_image(img_tensor)
        # Encode every candidate text
        text_features = []
        for text in text_candidates:
            encoding = self.tokenizer(
                text,
                padding='max_length',
                truncation=True,
                max_length=64,
                return_tensors='pt'
            )
            input_ids = encoding['input_ids'].to(self.device)
            attention_mask = encoding['attention_mask'].to(self.device)
            with torch.no_grad():
                features = self.model.encode_text(input_ids, attention_mask)
            text_features.append(features)
        # Concatenate the text features
        text_features = torch.cat(text_features, dim=0)
        # Cosine similarity between the image and each candidate text
        image_features = image_features.cpu()
        text_features = text_features.cpu()
        similarities = F.cosine_similarity(image_features, text_features)
        # Pick the most similar texts
        top_indices = similarities.argsort(descending=True)[:top_k]
        top_texts = [text_candidates[idx] for idx in top_indices]
        top_similarities = similarities[top_indices].tolist()
        return top_texts, top_similarities
    def visualize_text_search_results(self, query_text, top_k=5):
        """
        Visualize text-to-image search results.
        Args:
            query_text: the query text
            top_k: number of results
        """
        # Run the search
        top_image_paths, similarities = self.text_to_image_search(query_text, top_k)
        # Plot the results
        plt.figure(figsize=(15, 10))
        for i, (img_path, similarity) in enumerate(zip(top_image_paths, similarities)):
            # Load the image
            img = Image.open(img_path).convert('RGB')
            # Show it with its similarity score
            plt.subplot(1, top_k, i + 1)
            plt.imshow(img)
            plt.title(f"Similarity: {similarity:.3f}")
            plt.axis('off')
        plt.suptitle(f'Images matching: "{query_text}"', fontsize=16)
        plt.tight_layout()
        plt.subplots_adjust(top=0.9)
        plt.savefig('text_search_results.png')
        plt.show()
# Usage example
def demo_clip_retrieval():
    """Demonstrate the CLIP retrieval system."""
    # Initialize the retrieval system
    retrieval = CLIPRetrieval(model_path='best_clip_model.pth')
    # Build the image index
    image_dir = 'dummy_images'  # replace with your own image directory
    retrieval.build_image_index(image_dir)
    # Text-to-image search
    query_text = "A red cat sitting in the garden"
    retrieval.visualize_text_search_results(query_text, top_k=5)
    # Image-to-text search
    query_image_path = 'dummy_images/image_0.jpg'  # replace with your own query image
    text_candidates = [
        "A red cat sitting in the garden",
        "A blue car driving on the street",
        "A person reading a book in the park",
        "A yellow dog playing in the house",
        "A white flower in the garden"
    ]
    top_texts, similarities = retrieval.image_to_text_search(
        query_image_path, text_candidates
    )
    print(f"Image: {query_image_path}")
    print("Top matching texts:")
    for text, sim in zip(top_texts, similarities):
        print(f"{sim:.3f}: {text}")

if __name__ == "__main__":
    demo_clip_retrieval()
Contrastive learning is the core training method behind CLIP. Its basic idea is simple: within each batch, pull the feature vectors of matched image-text pairs close together while pushing mismatched pairs apart. This learning paradigm lets the model discover semantic alignment across modalities, which is exactly what image-text matching requires; the symmetric loss it optimizes is written out below.
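For a batch of $N$ image-text pairs with L2-normalized image features $v_i$, text features $t_i$, and temperature $\tau$, the symmetric contrastive loss that the CLIPLoss class above implements is:

$$
\mathcal{L}_{i2t} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{\exp(v_i^\top t_i/\tau)}{\sum_{j=1}^{N}\exp(v_i^\top t_j/\tau)},\qquad
\mathcal{L}_{t2i} = -\frac{1}{N}\sum_{i=1}^{N} \log \frac{\exp(t_i^\top v_i/\tau)}{\sum_{j=1}^{N}\exp(t_i^\top v_j/\tau)},\qquad
\mathcal{L} = \frac{\mathcal{L}_{i2t} + \mathcal{L}_{t2i}}{2}.
$$

Each row of the similarity matrix is treated as a classification over the captions in the batch (and each column as a classification over the images), with the matching pair as the correct class.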
The following training tips help this objective converge well in practice:

| Technique | Description | Effect |
|---|---|---|
| Large batch size | Train with large batches | More in-batch negatives, which strengthens the contrastive signal |
| Temperature | Tune the temperature that scales the logits | Controls the sharpness of the softmax and thus the difficulty of learning |
| Data augmentation | Apply augmentations to the images | Improves robustness and generalization |
| Symmetric loss | Use both the image-to-text and text-to-image losses | Keeps the two modalities' feature spaces aligned |
| Learning-rate warmup | Start with a small learning rate and increase it gradually (a sketch follows this table) | Stabilizes early training and avoids divergence |
| Weight decay | Apply moderate weight regularization | Reduces overfitting and improves generalization |
| Feature normalization | L2-normalize the feature vectors | Lets cosine similarity measure the distance between features |
| Encoder choice | Pick encoders suited to each modality | Balances accuracy against compute cost |
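The trainer above only uses cosine annealing; the warmup row is easy to add on top. Below is a minimal sketch, assuming a hypothetical `warmup_epochs` setting, that combines a linear warmup with cosine decay via `torch.optim.lr_scheduler.LambdaLR`; it could replace the `CosineAnnealingLR` line in CLIPTrainer.

```python
import math
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

def warmup_cosine_schedule(optimizer, warmup_epochs, total_epochs):
    """Linear warmup for `warmup_epochs`, then cosine decay to zero."""
    def lr_lambda(epoch):
        if epoch < warmup_epochs:
            return (epoch + 1) / warmup_epochs              # linear ramp-up
        progress = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * progress))   # cosine decay
    return LambdaLR(optimizer, lr_lambda)

# Usage sketch with a dummy parameter; step the scheduler once per epoch.
optimizer = Adam([torch.nn.Parameter(torch.zeros(1))], lr=1e-4)
scheduler = warmup_cosine_schedule(optimizer, warmup_epochs=2, total_epochs=10)
for epoch in range(10):
    optimizer.step()
    scheduler.step()
    print(epoch, scheduler.get_last_lr()[0])
```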
Standard contrastive learning draws its negatives at random from within the batch, but as training progresses these negatives can become "too easy", and the model separates them from the positives with little effort. To keep the model improving, we can use hard negative mining.

Below is a code example that implements hard negative mining:
import torch
import torch.nn as nn
import torch.nn.functional as F
class HardNegativeMiningLoss(nn.Module):
    def __init__(self, temperature=0.07, hard_negative_ratio=0.5):
        super().__init__()
        self.temperature = temperature
        self.hard_negative_ratio = hard_negative_ratio
        self.cross_entropy = nn.CrossEntropyLoss()

    def forward(self, image_features, text_features):
        """
        Contrastive loss with hard negative mining.
        Args:
            image_features: image features [batch_size, embed_dim]
            text_features: text features [batch_size, embed_dim]
        Returns:
            loss: total loss
        """
        # Make sure the features are L2-normalized
        image_features = F.normalize(image_features, p=2, dim=1)
        text_features = F.normalize(text_features, p=2, dim=1)
        # Cosine similarity matrix scaled by the temperature
        batch_size = image_features.shape[0]
        sim_matrix = torch.matmul(image_features, text_features.t()) / self.temperature
        # Targets: the diagonal entries are the positive pairs
        labels = torch.arange(batch_size).to(sim_matrix.device)
        # Mine hard negatives: mask out the diagonal (positives) first
        mask = torch.ones_like(sim_matrix, dtype=torch.bool)
        mask.fill_diagonal_(False)
        # For each anchor, sort the remaining entries by similarity
        hard_negative_indices = []
        for i in range(batch_size):
            # Off-diagonal similarities in row i (image i vs. other texts)
            # and in column i (text i vs. other images)
            neg_sim_i2t = sim_matrix[i][mask[i]]
            neg_sim_t2i = sim_matrix[:, i][mask[:, i]]
            # Order the negatives from most to least similar
            hard_neg_i2t = torch.argsort(neg_sim_i2t, descending=True)
            hard_neg_t2i = torch.argsort(neg_sim_t2i, descending=True)
            # Map back to the original batch indices
            hard_neg_i2t_indices = torch.nonzero(mask[i]).squeeze(1)[hard_neg_i2t]
            hard_neg_t2i_indices = torch.nonzero(mask[:, i]).squeeze(1)[hard_neg_t2i]
            hard_negative_indices.append((hard_neg_i2t_indices, hard_neg_t2i_indices))
        # Standard symmetric contrastive loss
        i2t_loss = self.cross_entropy(sim_matrix, labels)
        t2i_loss = self.cross_entropy(sim_matrix.t(), labels)
        regular_loss = (i2t_loss + t2i_loss) / 2
        # If hard negatives are disabled, return the standard loss
        if self.hard_negative_ratio <= 0:
            return regular_loss
        # Loss restricted to the positive plus the hardest negatives
        hard_loss = 0
        # Use only the hardest k negatives per anchor
        k = max(1, int(batch_size * self.hard_negative_ratio))
        for i in range(batch_size):
            # Indices of the hardest negatives for this anchor
            hard_i2t_indices = hard_negative_indices[i][0][:k]
            hard_t2i_indices = hard_negative_indices[i][1][:k]
            # Similarity vectors containing the positive (first) and the hard negatives
            i2t_indices = torch.cat([torch.tensor([i], device=sim_matrix.device), hard_i2t_indices])
            t2i_indices = torch.cat([torch.tensor([i], device=sim_matrix.device), hard_t2i_indices])
            i2t_sim = sim_matrix[i, i2t_indices]
            t2i_sim = sim_matrix[t2i_indices, i]
            # The positive sits at position 0, so the target class is 0
            target = torch.zeros(1, dtype=torch.long, device=sim_matrix.device)
            # Hard-negative contrastive loss for this anchor
            i2t_hard_loss = self.cross_entropy(i2t_sim.unsqueeze(0), target)
            t2i_hard_loss = self.cross_entropy(t2i_sim.unsqueeze(0), target)
            hard_loss += (i2t_hard_loss + t2i_hard_loss) / 2
        hard_loss /= batch_size
        # Blend the standard loss with the hard-negative loss
        total_loss = (1 - self.hard_negative_ratio) * regular_loss + self.hard_negative_ratio * hard_loss
        return total_loss
# Usage example
def demo_hard_negative_mining():
    # Random features for a quick comparison
    batch_size = 8
    embed_dim = 512
    image_features = torch.randn(batch_size, embed_dim)
    text_features = torch.randn(batch_size, embed_dim)
    # Standard contrastive loss (hard negatives disabled)
    standard_loss = HardNegativeMiningLoss(hard_negative_ratio=0)
    std_loss_value = standard_loss(image_features, text_features)
    # Contrastive loss with hard negative mining
    hard_negative_loss = HardNegativeMiningLoss(hard_negative_ratio=0.5)
    hard_loss_value = hard_negative_loss(image_features, text_features)
    print(f"Standard Contrastive Loss: {std_loss_value.item():.4f}")
    print(f"Hard Negative Mining Loss: {hard_loss_value.item():.4f}")
    return std_loss_value, hard_loss_value
How did you find today's content? Thanks again for reading, and see you in the next installment!