Action recognition and analysis is an important research direction in computer vision. By analyzing behaviors and motions in video, we can provide intelligent solutions for many practical applications, such as intelligent surveillance, security, medical rehabilitation, and sports analytics.
Hands-on project: action recognition based on a 3D convolutional neural network (3D-CNN)
1. Dataset download: we will fine-tune on the UCF101 dataset, which contains 13,320 video clips covering 101 action classes. The archive is distributed as a .rar file, so we extract it with the rarfile library (which needs an unrar backend installed on the system) rather than zipfile:
import urllib.request
import rarfile  # pip install rarfile; requires an unrar backend

url = "https://www.crcv.ucf.edu/data/UCF101/UCF101.rar"
file_name = "UCF101.rar"

# Download the archive and extract all videos into the UCF101/ directory
urllib.request.urlretrieve(url, file_name)
with rarfile.RarFile(file_name) as rar_ref:
    rar_ref.extractall("UCF101")
2. Data preprocessing: extract the video frames as image sequences, then resize, crop, and normalize them (resizing is done here; normalization is applied later by the dataset transforms). We use the OpenCV library to process the video files:
import cv2
import os

def video_to_frames(video_file, target_folder, frame_size=(224, 224)):
    """Decode a video file into resized PNG frames stored in target_folder."""
    os.makedirs(target_folder, exist_ok=True)
    video = cv2.VideoCapture(video_file)
    count = 0
    while video.isOpened():
        ret, frame = video.read()
        if not ret:
            break
        # Resize each frame to the input resolution expected by the network
        resized_frame = cv2.resize(frame, frame_size)
        frame_path = os.path.join(target_folder, f"frame_{count:04d}.png")
        cv2.imwrite(frame_path, resized_frame)
        count += 1
    video.release()
    return count
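The following is a minimal sketch of how the extracted videos could be organized into the UCF101/train and UCF101/val frame folders used later. It assumes the archive extracts to a UCF-101/ directory of per-class subfolders of .avi files, and it uses a simple random 80/20 split rather than the official UCF101 train/test split files; the split_and_extract name and the 0.8 ratio are illustrative choices, not part of the original project:
import glob
import random

def split_and_extract(video_root, output_root, train_ratio=0.8, seed=0):
    # Walk each class folder, split its videos into train/val,
    # and dump every video's frames into its own subfolder.
    random.seed(seed)
    for class_dir in sorted(glob.glob(os.path.join(video_root, "*"))):
        class_name = os.path.basename(class_dir)
        videos = sorted(glob.glob(os.path.join(class_dir, "*.avi")))
        random.shuffle(videos)
        n_train = int(len(videos) * train_ratio)
        for i, video_path in enumerate(videos):
            subset = "train" if i < n_train else "val"
            video_name = os.path.splitext(os.path.basename(video_path))[0]
            target = os.path.join(output_root, subset, class_name, video_name)
            video_to_frames(video_path, target)

split_and_extract("UCF101/UCF-101", "UCF101")
The resulting layout (output_root/subset/class/video/frame_xxxx.png) is the one the custom dataset class below expects.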
3. Model construction: we will use a pre-trained 3D-CNN as the backbone and fine-tune it on UCF101. Architectures such as I3D or C3D are common choices; the code below uses torchvision's R3D-18, a 3D ResNet-18 pre-trained on Kinetics-400, built with the PyTorch framework:
import torch
import torch.nn as nn
from torchvision.models import video

class ActionRecognitionModel(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # R3D-18 backbone pre-trained on Kinetics-400; newer torchvision versions
        # use weights=video.R3D_18_Weights.KINETICS400_V1 instead of pretrained=True.
        self.base_model = video.r3d_18(pretrained=True)
        # Replace the classifier head with a new layer for the 101 UCF101 classes
        self.base_model.fc = nn.Linear(self.base_model.fc.in_features, num_classes)

    def forward(self, x):
        # x has shape (batch, channels, frames, height, width)
        return self.base_model(x)
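As a quick sanity check (a minimal sketch, assuming a clip length of 16 frames at 224x224 resolution), the model expects input of shape (batch, channels, frames, height, width) and returns one logit per class:
model = ActionRecognitionModel(num_classes=101)
dummy_clip = torch.randn(2, 3, 16, 224, 224)  # (batch, C, T, H, W)
logits = model(dummy_clip)
print(logits.shape)  # torch.Size([2, 101])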
4. Training and evaluation: we train and evaluate the action recognition model with PyTorch. First we define the loss function, optimizer, and learning-rate schedule; then we train on the UCF101 dataset and evaluate the model's performance at the end of every epoch.
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision.transforms import Compose, ToTensor, Normalize
from custom_dataset import UCF101Dataset
# Hyperparameters
num_classes = 101
num_epochs = 20
batch_size = 16
learning_rate = 0.001
momentum = 0.9
weight_decay = 0.0005
# Dataset and DataLoader
train_transforms = Compose([ToTensor(), Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
val_transforms = Compose([ToTensor(), Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
train_dataset = UCF101Dataset("UCF101/train", train_transforms)
val_dataset = UCF101Dataset("UCF101/val", val_transforms)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=4)
# Model, Loss, Optimizer, and Scheduler
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ActionRecognitionModel(num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)
# Training and Evaluation Loop
for epoch in range(num_epochs):
    # Training
    model.train()
    running_loss = 0.0
    for i, (inputs, labels) in enumerate(train_loader):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    # Evaluation
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print(f"Epoch {epoch + 1}, Loss: {running_loss / len(train_loader):.4f}, Accuracy: {correct / total:.4f}")

    # Update learning rate
    scheduler.step()
This code first sets the training hyperparameters, then creates the training and validation datasets along with their DataLoaders. Next, we define the model, loss function, optimizer, and learning-rate scheduler. In the training and evaluation loop, we compute the loss and accuracy at the end of each epoch and update the learning rate. Note that the `UCF101Dataset` class must be implemented by you; you can use the following example code as a starting point for a custom dataset class:
import os
import glob
import torch
from torch.utils.data import Dataset
from PIL import Image

class UCF101Dataset(Dataset):
    def __init__(self, data_folder, transform=None, clip_len=16):
        self.data_folder = data_folder
        self.transform = transform
        self.clip_len = clip_len  # fixed number of frames per clip so clips can be batched
        self.samples = self._load_samples()
        # Map class-folder names to integer indices, as required by CrossEntropyLoss
        self.classes = sorted({label for _, label in self.samples})
        self.class_to_idx = {name: idx for idx, name in enumerate(self.classes)}

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        video_folder, label = self.samples[index]
        frames = self._load_frames(video_folder)
        if self.transform is not None:
            frames = [self.transform(frame) for frame in frames]
        # (T, C, H, W) -> (C, T, H, W), the layout expected by torchvision video models
        video_tensor = torch.stack(frames).permute(1, 0, 2, 3)
        return video_tensor, self.class_to_idx[label]

    def _load_samples(self):
        samples = []
        for class_folder in sorted(glob.glob(os.path.join(self.data_folder, "*"))):
            label = os.path.basename(class_folder)
            for video_folder in sorted(glob.glob(os.path.join(class_folder, "*"))):
                samples.append((video_folder, label))
        return samples

    def _load_frames(self, video_folder):
        frame_paths = sorted(glob.glob(os.path.join(video_folder, "*.png")))
        # Uniformly sample clip_len frames; repeat the last frame if the video is too short
        if len(frame_paths) >= self.clip_len:
            step = len(frame_paths) / self.clip_len
            frame_paths = [frame_paths[int(i * step)] for i in range(self.clip_len)]
        else:
            frame_paths = frame_paths + [frame_paths[-1]] * (self.clip_len - len(frame_paths))
        return [Image.open(frame_path).convert("RGB") for frame_path in frame_paths]
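As a quick check (a minimal sketch, assuming the frame folders produced by the preprocessing step above already exist), a single sample should come back as a fixed-length clip tensor plus an integer class index:
from torchvision.transforms import Compose, ToTensor, Normalize

transform = Compose([ToTensor(), Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
dataset = UCF101Dataset("UCF101/train", transform=transform)
clip, label = dataset[0]
print(clip.shape, label)  # expected: torch.Size([3, 16, 224, 224]) and an integer in [0, 100]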
After training and evaluation, you can use the trained model to recognize actions in new videos, as sketched below. To improve performance, you can try more complex 3D-CNN architectures, or combine them with other techniques such as recurrent neural networks (RNN) or long short-term memory networks (LSTM).
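Below is a minimal inference sketch under the same assumptions as above. It reuses the `video_to_frames` helper, the trained `model`, `val_transforms`, `train_dataset`, and `device` defined earlier; the `predict_video` helper and the "new_video.avi" path are illustrative names, not part of an official API:
import glob
import os
import tempfile
from PIL import Image

def predict_video(model, video_path, transform, class_names, device):
    # Decode the video into frames, sample a clip of up to 16 frames,
    # and return the name of the highest-scoring class.
    tmp_folder = tempfile.mkdtemp()
    video_to_frames(video_path, tmp_folder)
    frame_paths = sorted(glob.glob(os.path.join(tmp_folder, "*.png")))
    step = max(len(frame_paths) // 16, 1)
    frames = [transform(Image.open(p).convert("RGB")) for p in frame_paths[::step][:16]]
    clip = torch.stack(frames).permute(1, 0, 2, 3).unsqueeze(0).to(device)  # (1, C, T, H, W)
    model.eval()
    with torch.no_grad():
        logits = model(clip)
    return class_names[logits.argmax(dim=1).item()]

predicted_action = predict_video(model, "new_video.avi", val_transforms, train_dataset.classes, device)
print(predicted_action)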
In summary, action recognition and analysis is an important technique in computer vision. In this article, we walked through a 3D-CNN-based action recognition project and explained the project steps and the accompanying code in detail. This technique can be applied in a wide range of real-world scenarios, such as intelligent surveillance, security, and medical rehabilitation.