How To Build a Neural Network to Translate Sign Language into English

The author selected Code Org to receive a donation as part of the Write for DOnations program.

Introduction

Computer vision is a subfield of computer science that aims to extract a higher-order understanding from images and videos. This powers technologies such as fun video chat filters, your mobile device’s face authenticator, and self-driving cars.

In this tutorial, you’ll use computer vision to build an American Sign Language translator for your webcam. As you work through the tutorial, you’ll use OpenCV, a computer-vision library, PyTorch to build a deep neural network, and onnx to export your neural network. You’ll also apply the following concepts as you build a computer-vision application:

  • You’ll use the same three-step method as used in How To Apply Computer Vision to Build an Emotion-Based Dog Filter tutorial: preprocess a dataset, train a model, and evaluate the model.

  • You’ll also expand each of these steps: employ data augmentation to address rotated or non-centered hands, change learning rate schedules to improve model accuracy, and export models for faster inference speed.

  • Along the way, you’ll also explore related concepts in machine learning.

By the end of this tutorial, you’ll have both an American Sign Language translator and foundational deep learning know-how. You can also access the complete source code for this project.

Prerequisites

To complete this tutorial, you will need the following:

  • A local development environment for Python 3 with at least 1GB of RAM. You can follow How to Install and Set Up a Local Programming Environment for Python 3 to configure everything you need.

  • A working webcam to do real-time image detection.

  • (Recommended) Build an Emotion-Based Dog Filter; this tutorial is not explicitly used but the same ideas are reinforced and built upon.

Step 1 — Creating the Project and Installing Dependencies

Let’s create a workspace for this project and install the dependencies we’ll need.

On Linux distributions, start by preparing your system package manager and install the Python3 virtualenv package. Use:

  • apt-get update
  • apt-get upgrade
  • apt-get install python3-venv

We’ll call our workspace SignLanguage:

  • mkdir ~/SignLanguage

Navigate to the SignLanguage directory:

  • cd ~/SignLanguage

Then create a new virtual environment for the project:

  • python3 -m venv signlanguage

Activate your environment:

  • source signlanguage/bin/activate

Then install PyTorch, a deep-learning framework for Python that we’ll use in this tutorial.

On macOS, install PyTorch with the following command:

  • python -m pip install torch==1.2.0 torchvision==0.4.0

On Linux and Windows, use the following commands for a CPU-only build:

  • pip install torch==1.2.0+cpu torchvision==0.4.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
  • pip install torchvision

Now install prepackaged binaries for OpenCV, numpy, onnx, and onnxruntime, which provide computer vision, linear algebra, AI model exporting, and AI model execution, respectively. OpenCV offers utilities such as image rotations, and numpy offers linear algebra utilities such as matrix inversion:

  • python -m pip install opencv-python==3.4.3.18 numpy==1.14.5 onnx==1.6.0 onnxruntime==1.0.0

On Linux distributions, you will need to install libSM.so:

  • apt-get install libsm6 libxext6 libxrender-dev
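
Optionally, you can confirm that all of the libraries import cleanly by running a quick check inside the activated environment; the version numbers printed should match the ones you just installed:

  • python -c "import cv2, numpy, onnx, onnxruntime, torch; print(cv2.__version__, numpy.__version__, onnx.__version__, onnxruntime.__version__, torch.__version__)"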

With the dependencies installed, let’s build the first version of our sign language translator: a sign language classifier.

Step 2 — Preparing the Sign Language Classification Dataset

In these next three sections, you’ll build a sign language classifier using a neural network. Your goal is to produce a model that accepts a picture of a hand as input and outputs a letter.

The following three steps are required to build a machine learning classification model:

  1. Preprocess the data: Apply one-hot encoding to your labels and wrap your data in PyTorch Tensors. Train your model on augmented data to prepare it for “unusual” input, like an off-center or rotated hand.

  2. Specify and train the model: Set up a neural network using PyTorch. Define training hyper-parameters—such as how long to train for—and run stochastic gradient descent. You’ll also vary a specific training hyper-parameter, which is learning rate schedule. These will boost model accuracy.

  3. Run a prediction using the model: Evaluate the neural network on your validation data to understand its accuracy. Then, export the model to a format called ONNX for faster inference speeds.

In this section of the tutorial, you will accomplish step 1 of 3. You will download the data, create a Dataset object to iterate over your data, and finally apply data augmentation. At the end of this step, you will have a programmatic way of accessing images and labels in your dataset to feed to your model.

First, download the dataset to your current working directory:

Note: On macOS, wget is not available by default. To install it, first install Homebrew by following this DigitalOcean tutorial. Then, run brew install wget.

  • wget https://assets.digitalocean.com/articles/signlanguage_data/sign-language-mnist.tar.gz

Extract the downloaded tarball, which contains a data/ directory:

  • tar -xzf sign-language-mnist.tar.gz

Create a new file, named step_2_dataset.py:

  • nano step_2_dataset.py

As before, import the necessary utilities and create the class that will hold your data. For data processing here, you will create the train and test datasets. You’ll implement PyTorch’s Dataset interface, allowing you to load and use PyTorch’s built-in data pipeline for your sign language classification dataset:

step_2_dataset.py
from torch.utils.data import Dataset
from torch.autograd import Variable
import torchvision.transforms as transforms
import torch.nn as nn
import numpy as np
import torch

from typing import List

import csv


class SignLanguageMNIST(Dataset):
    """Sign Language classification dataset.

    Utility for loading Sign Language dataset into PyTorch. Dataset posted on
    Kaggle in 2017, by an unnamed author with username `tecperson`:
    https://www.kaggle.com/datamunge/sign-language-mnist

    Each sample is 1 x 1 x 28 x 28, and each label is a scalar.
    """
    pass

Delete the pass placeholder in the SignLanguageMNIST class. In its place, add a method to generate a label mapping:

step_2_dataset.py
    @staticmethod
    def get_label_mapping():
        """
        We map all labels to [0, 23]. This mapping from dataset labels [0, 23]
        to letter indices [0, 25] is returned below.
        """
        mapping = list(range(25))
        mapping.pop(9)
        return mapping

Labels range from 0 to 25. However, letters J (9) and Z (25) are excluded. This means there are only 24 valid label values. So that the set of all label values starting from 0 is contiguous, we map all labels to [0, 23]. This mapping from dataset labels [0, 23] to letter indices [0, 25] is provided by this get_label_mapping method.

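For example, the letter K has dataset label 10 (J, at letter index 9, never appears in the data), and mapping.index(10) returns 9, so K becomes class 9 in the contiguous label space. A quick sketch of this behavior:

mapping = list(range(25))
mapping.pop(9)
print(mapping[9])         # 10: contiguous class 9 corresponds to letter index 10 (K)
print(mapping.index(10))  # 9: dataset label 10 maps to contiguous class 9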

Next, add a method to extract labels and samples from a CSV file. The following assumes that each line starts with the label and is then followed by 784 pixel values. These 784 pixel values represent a 28x28 image:

step_2_dataset.py
    @staticmethod
    def read_label_samples_from_csv(path: str):
        """
        Assumes first column in CSV is the label and subsequent 28^2 values
        are image pixel values 0-255.
        """
        mapping = SignLanguageMNIST.get_label_mapping()
        labels, samples = [], []
        with open(path) as f:
            _ = next(f)  # skip header
            for line in csv.reader(f):
                label = int(line[0])
                labels.append(mapping.index(label))
                samples.append(list(map(int, line[1:])))
        return labels, samples

For an explanation of how these 784 values represent an image, see Build an Emotion-Based Dog Filter, Step 4.

Note that each line in the csv.reader iterable is a list of strings; the int and map(int, ...) invocations cast all strings to integers. Directly beneath our static method, add a function that will initialize our data holder:

step_2_dataset.py
    def __init__(self,
            path: str="data/sign_mnist_train.csv",
            mean: List[float]=[0.485],
            std: List[float]=[0.229]):
        """
        Args:
            path: Path to `.csv` file containing `label`, `pixel0`, `pixel1`...
        """
        labels, samples = SignLanguageMNIST.read_label_samples_from_csv(path)
        self._samples = np.array(samples, dtype=np.uint8).reshape((-1, 28, 28, 1))
        self._labels = np.array(labels, dtype=np.uint8).reshape((-1, 1))

        self._mean = mean
        self._std = std

This function starts by loading the samples and labels. Then it wraps the data in NumPy arrays. The mean and standard deviation information will be explained shortly, in the __getitem__ section following.

Directly after the __init__ function, add a __len__ function. The Dataset requires this method to determine when to stop iterating over data:

__init__函数之后,直接添加__len__函数。 Dataset需要此方法来确定何时停止遍历数据:

step_2_dataset.py
...
    def __len__(self):
        return len(self._labels)

Finally, add a __getitem__ method, which returns a dictionary containing the sample and the label:

step_2_dataset.py
    def __getitem__(self, idx):
        transform = transforms.Compose([
            transforms.ToPILImage(),
            transforms.RandomResizedCrop(28, scale=(0.8, 1.2)),
            transforms.ToTensor(),
            transforms.Normalize(mean=self._mean, std=self._std)])

        return {
            'image': transform(self._samples[idx]).float(),
            'label': torch.from_numpy(self._labels[idx]).float()
        }

You use a technique called data augmentation, where samples are perturbed during training, to increase the model’s robustness to these perturbations. In particular, randomly zoom in on the image by varying amounts and on different locations, via RandomResizedCrop. Note that zooming in should not affect the final sign language class; thus, the label is not transformed. You additionally normalize the inputs so that image values are rescaled to the [0, 1] range in expectation, instead of [0, 255]; to accomplish this, use the dataset _mean and _std when normalizing.

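As a rough sanity check on these numbers: ToTensor first scales a raw pixel value of 255 down to 1.0, and Normalize then maps it to roughly (1.0 - 0.485) / 0.229 ≈ 2.25, while a raw value of 0 maps to about (0 - 0.485) / 0.229 ≈ -2.12. That is why the tensor values printed at the end of this step fall within roughly [-2.1, 2.3] instead of [0, 255].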

Your completed SignLanguageMNIST class will look like the following:

step_2_dataset.py
from torch.utils.data import Dataset
from torch.autograd import Variable
import torchvision.transforms as transforms
import torch.nn as nn
import numpy as np
import torch

from typing import List

import csv


class SignLanguageMNIST(Dataset):
    """Sign Language classification dataset.

    Utility for loading Sign Language dataset into PyTorch. Dataset posted on
    Kaggle in 2017, by an unnamed author with username `tecperson`:
    https://www.kaggle.com/datamunge/sign-language-mnist

    Each sample is 1 x 1 x 28 x 28, and each label is a scalar.
    """

    @staticmethod
    def get_label_mapping():
        """
        We map all labels to [0, 23]. This mapping from dataset labels [0, 23]
        to letter indices [0, 25] is returned below.
        """
        mapping = list(range(25))
        mapping.pop(9)
        return mapping

    @staticmethod
    def read_label_samples_from_csv(path: str):
        """
        Assumes first column in CSV is the label and subsequent 28^2 values
        are image pixel values 0-255.
        """
        mapping = SignLanguageMNIST.get_label_mapping()
        labels, samples = [], []
        with open(path) as f:
            _ = next(f)  # skip header
            for line in csv.reader(f):
                label = int(line[0])
                labels.append(mapping.index(label))
                samples.append(list(map(int, line[1:])))
        return labels, samples

    def __init__(self,
            path: str="data/sign_mnist_train.csv",
            mean: List[float]=[0.485],
            std: List[float]=[0.229]):
        """
        Args:
            path: Path to `.csv` file containing `label`, `pixel0`, `pixel1`...
        """
        labels, samples = SignLanguageMNIST.read_label_samples_from_csv(path)
        self._samples = np.array(samples, dtype=np.uint8).reshape((-1, 28, 28, 1))
        self._labels = np.array(labels, dtype=np.uint8).reshape((-1, 1))

        self._mean = mean
        self._std = std

    def __len__(self):
        return len(self._labels)

    def __getitem__(self, idx):
        transform = transforms.Compose([
            transforms.ToPILImage(),
            transforms.RandomResizedCrop(28, scale=(0.8, 1.2)),
            transforms.ToTensor(),
            transforms.Normalize(mean=self._mean, std=self._std)])

        return {
            'image': transform(self._samples[idx]).float(),
            'label': torch.from_numpy(self._labels[idx]).float()
        }

As before, you will now verify the dataset utility functions by loading the SignLanguageMNIST dataset. Add the following code to the end of your file, after the SignLanguageMNIST class:

step_2_dataset.py
def get_train_test_loaders(batch_size=32):
    trainset = SignLanguageMNIST('data/sign_mnist_train.csv')
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size, shuffle=True)

    testset = SignLanguageMNIST('data/sign_mnist_test.csv')
    testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size, shuffle=False)
    return trainloader, testloader

This code initializes the dataset using the SignLanguageMNIST class. Then for the train and validation sets, it wraps the dataset in a DataLoader. This will translate the dataset into an iterable to use later.

Now you’ll verify that the dataset utilities are functioning. Create a sample dataset loader using DataLoader and print the first element of that loader. Add the following to the end of your file:

step_2_dataset.py
if __name__ == '__main__':
    loader, _ = get_train_test_loaders(2)
    print(next(iter(loader)))

You can check that your file matches the step_2_dataset file in this repository. Exit your editor and run the script with the following:

  • python step_2_dataset.py

This outputs the following pair of tensors. Our data pipeline outputs two samples and two labels. This indicates that our data pipeline is up and ready to go:


   
     
     
     
     
Output
{'image': tensor([[[[ 0.4337, 0.5022, 0.5707, ..., 0.9988, 0.9646, 0.9646], [ 0.4851, 0.5536, 0.6049, ..., 1.0502, 1.0159, 0.9988], [ 0.5364, 0.6049, 0.6392, ..., 1.0844, 1.0844, 1.0673], ..., [-0.5253, -0.4739, -0.4054, ..., 0.9474, 1.2557, 1.2385], [-0.3369, -0.3369, -0.3369, ..., 0.0569, 1.3584, 1.3242], [-0.3712, -0.3369, -0.3198, ..., 0.5364, 0.5364, 1.4783]]], [[[ 0.2111, 0.2796, 0.3481, ..., 0.2453, -0.1314, -0.2342], [ 0.2624, 0.3309, 0.3652, ..., -0.3883, -0.0629, -0.4568], [ 0.3309, 0.3823, 0.4337, ..., -0.4054, -0.0458, -1.0048], ..., [ 1.3242, 1.3584, 1.3927, ..., -0.4054, -0.4568, 0.0227], [ 1.3242, 1.3927, 1.4612, ..., -0.1657, -0.6281, -0.0287], [ 1.3242, 1.3927, 1.4440, ..., -0.4397, -0.6452, -0.2856]]]]), 'label': tensor([[24.], [11.]])}

You’ve now verified that your data pipeline works. This concludes the first step—preprocessing your data—which now includes data augmentation for increased model robustness. Next you will define the neural network and optimizer.

Step 3 — Building and Training the Sign Language Classifier Using Deep Learning

With a functioning data pipeline, you will now define a model and train it on the data. In particular, you will build a neural network with six layers, define a loss, an optimizer, and finally, optimize the loss function for your neural network predictions. At the end of this step, you will have a working sign language classifier.

Create a new file called step_3_train.py:

  • nano step_3_train.py

Import the necessary utilities:

step_3_train.py
from torch.utils.data import Dataset
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch

from step_2_dataset import get_train_test_loaders

Define a PyTorch neural network that includes three convolutional layers, followed by three fully connected layers. Add this to the end of your existing script:

step_3_train.py
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 6, 3)
        self.conv3 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 48)
        self.fc3 = nn.Linear(48, 24)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
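
The 16 * 5 * 5 input size of fc1 comes from tracing a 1 x 28 x 28 image through these layers: conv1 (3 x 3 kernel, no padding) yields 26 x 26, conv2 yields 24 x 24 and the 2 x 2 max pool halves that to 12 x 12, then conv3 yields 10 x 10 and the second pool halves it to 5 x 5, with 16 channels coming out of conv3. Flattening 16 feature maps of size 5 x 5 gives the 400 features that x.view(-1, 16 * 5 * 5) passes to the first fully connected layer.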

Now initialize the neural network, define a loss function, and define optimization hyperparameters by adding the following code to the end of the script:

step_3_train.py
def main():
    net = Net().float()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

Finally, you’ll train for two epochs:

step_3_train.py
def main():
    net = Net().float()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

    trainloader, _ = get_train_test_loaders()
    for epoch in range(2):  # loop over the dataset multiple times
        train(net, criterion, optimizer, trainloader, epoch)
    torch.save(net.state_dict(), "checkpoint.pth")

You define an epoch to be an iteration of training where every training sample has been used exactly once. At the end of the main function, the model parameters will be saved to a file called "checkpoint.pth".

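Because main saves only the state_dict, the checkpoint holds just the learned weights; to reuse them later you instantiate the same Net class and load the weights back in, which is exactly what Step 4 will do. A minimal sketch of reloading, assuming checkpoint.pth exists in your working directory:

net = Net().float()
net.load_state_dict(torch.load("checkpoint.pth"))
net.eval()  # switch to evaluation mode before running inference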

Add the following code to the end of your script to extract image and label from the dataset loader and then wrap each in a PyTorch Variable:

step_3_train.py
def train(net, criterion, optimizer, trainloader, epoch):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs = Variable(data['image'].float())
        labels = Variable(data['label'].long())
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels[:, 0])
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 100 == 0:
            print('[%d, %5d] loss: %.6f' % (epoch, i, running_loss / (i + 1)))

This code will also run the forward pass and then backpropagate through the loss and neural network.

At the end of your file, add the following to invoke the main function:

step_3_train.py
if __name__ == '__main__':
    main()

Double-check that your file matches the following:

step_3_train.py
from torch.utils.data import Dataset
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch

from step_2_dataset import get_train_test_loaders


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 6, 3)
        self.conv3 = nn.Conv2d(6, 16, 3)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 48)
        self.fc3 = nn.Linear(48, 24)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(F.relu(self.conv2(x)))
        x = self.pool(F.relu(self.conv3(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x


def main():
    net = Net().float()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)

    trainloader, _ = get_train_test_loaders()
    for epoch in range(2):  # loop over the dataset multiple times
        train(net, criterion, optimizer, trainloader, epoch)
    torch.save(net.state_dict(), "checkpoint.pth")


def train(net, criterion, optimizer, trainloader, epoch):
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs = Variable(data['image'].float())
        labels = Variable(data['label'].long())
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels[:, 0])
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 100 == 0:
            print('[%d, %5d] loss: %.6f' % (epoch, i, running_loss / (i + 1)))


if __name__ == '__main__':
    main()

Save and exit. Then, launch our proof-of-concept training by running:

  • python step_3_train.py

You’ll see output akin to the following as the neural network trains:


   
     
     
     
     
Output
[0, 0] loss: 3.208171
[0, 100] loss: 3.211070
[0, 200] loss: 3.192235
[0, 300] loss: 2.943867
[0, 400] loss: 2.569440
[0, 500] loss: 2.243283
[0, 600] loss: 1.986425
[0, 700] loss: 1.768090
[0, 800] loss: 1.587308
[1, 0] loss: 0.254097
[1, 100] loss: 0.208116
[1, 200] loss: 0.196270
[1, 300] loss: 0.183676
[1, 400] loss: 0.169824
[1, 500] loss: 0.157704
[1, 600] loss: 0.151408
[1, 700] loss: 0.136470
[1, 800] loss: 0.123326

To obtain lower loss, you could increase the number of epochs to 5, 10, or even 20. However, after a certain period of training time, the network loss will cease to decrease with increased training time. To sidestep this issue, as training time increases, you will introduce a learning rate schedule, which decreases learning rate over time. To understand why this works, see Distill’s visualization at “Why Momentum Really Works”.

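To see what this schedule does before wiring it in, note that StepLR(optimizer, step_size=10, gamma=0.1) multiplies the learning rate by 0.1 every 10 epochs. A small standalone sketch, using a dummy parameter purely for illustration, prints a learning rate of 0.01 for epochs 0 through 9 and 0.001 afterwards:

params = [torch.zeros(1, requires_grad=True)]  # dummy parameter, for illustration only
optimizer = optim.SGD(params, lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)
for epoch in range(12):
    print(epoch, optimizer.param_groups[0]['lr'])  # 0.01 for epochs 0-9, then 0.001
    scheduler.step()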

Amend your main function with the following two lines, defining a scheduler and invoking scheduler.step. Furthermore, change the number of epochs to 12:

step_3_train.py
def main():
    net = Net().float()
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.01, momentum=0.9)
    scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1)

    trainloader, _ = get_train_test_loaders()
    for epoch in range(12):  # loop over the dataset multiple times
        train(net, criterion, optimizer, trainloader, epoch)
        scheduler.step()
    torch.save(net.state_dict(), "checkpoint.pth")

Check that your file matches the step 3 file in this repository. Training will run for around 5 minutes. Your output will resemble the following:


   
     
     
     
     
Output
[0, 0] loss: 3.208171
[0, 100] loss: 3.211070
[0, 200] loss: 3.192235
[0, 300] loss: 2.943867
[0, 400] loss: 2.569440
[0, 500] loss: 2.243283
[0, 600] loss: 1.986425
[0, 700] loss: 1.768090
[0, 800] loss: 1.587308
...
[11, 0] loss: 0.000302
[11, 100] loss: 0.007548
[11, 200] loss: 0.009005
[11, 300] loss: 0.008193
[11, 400] loss: 0.007694
[11, 500] loss: 0.008509
[11, 600] loss: 0.008039
[11, 700] loss: 0.007524
[11, 800] loss: 0.007608

The final loss obtained is 0.007608, which is 3 orders of magnitude smaller than the starting loss 3.20. This concludes the second step of our workflow, where we set up and train the neural network. With that said, as small as this loss value is, it has little meaning. To put the model’s performance in perspective, we will compute its accuracy—the percentage of images the model correctly classified.

Step 4 — Evaluating the Sign Language Classifier

You will now evaluate your sign language classifier by computing its accuracy on the validation set, a set of images the model did not see during training. This will provide a better sense of model performance than the final loss value did. Furthermore, you will add utilities to save our trained model at the end of training and load our pre-trained model when performing inference.

Create a new file, called step_4_evaluate.py.

  • nano step_4_evaluate.py

Import the necessary utilities:

step_4_evaluate.py
from torch.utils.data import Dataset
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch
import numpy as np

import onnx
import onnxruntime as ort

from step_2_dataset import get_train_test_loaders
from step_3_train import Net

Next, define a utility to evaluate the neural network’s performance. The following function compares the neural network’s predicted letter to the true letter, for a single image:

step_4_evaluate.py
def evaluate(outputs: Variable, labels: Variable) -> float:
    """Evaluate neural network outputs against non-one-hotted labels."""
    Y = labels.numpy()
    Yhat = np.argmax(outputs, axis=1)
    return float(np.sum(Yhat == Y))

outputs is a list of class probabilities for each sample. For example, outputs for a single sample may be [0.1, 0.3, 0.4, 0.2]. labels is a list of label classes. For example, the label class may be 3.

Y = ... converts the labels into a NumPy array. Next, Yhat = np.argmax(...) converts the outputs class probabilities into predicted classes. For example, the list of class probabilities [0.1, 0.3, 0.4, 0.2] would yield the predicted class 2, because the index 2 value of 0.4 is the largest value.

Since both Y and Yhat are now classes, you can compare them. Yhat == Y checks if the predicted class matches the label class, and np.sum(...) is a trick that computes the number of truth-y values. In other words, np.sum will output the number of samples that were classified correctly.

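As a concrete example, if outputs for two samples is [[0.1, 0.3, 0.4, 0.2], [0.6, 0.2, 0.1, 0.1]] and labels is [2, 1], then np.argmax(outputs, axis=1) yields the predicted classes [2, 0]; only the first prediction matches its label, so evaluate returns 1.0.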

Add the second function batch_evaluate, which applies the first function evaluate to all images:

step_4_evaluate.py
def batch_evaluate(
        net: Net,
        dataloader: torch.utils.data.DataLoader) -> float:
    """Evaluate neural network in batches, if dataset is too large."""
    score = n = 0.0
    for batch in dataloader:
        n += len(batch['image'])
        outputs = net(batch['image'])
        if isinstance(outputs, torch.Tensor):
            outputs = outputs.detach().numpy()
        score += evaluate(outputs, batch['label'][:, 0])
    return score / n

batch is a group of images stored as a single tensor. First, you increment the total number of images you’re evaluating (n) by the number of images in this batch. Next, you run inference on the neural network with this batch of images, outputs = net(...). The type check if isinstance(...) converts the outputs to a NumPy array if needed. Finally, you use evaluate to compute the number of correctly-classified samples. At the conclusion of the function, you compute the percent of samples you correctly classified, score / n.

Finally, add the following script to leverage the preceding utilities:

step_4_evaluate.py
def validate():
    trainloader, testloader = get_train_test_loaders()
    net = Net().float()

    pretrained_model = torch.load("checkpoint.pth")
    net.load_state_dict(pretrained_model)

    print('=' * 10, 'PyTorch', '=' * 10)
    train_acc = batch_evaluate(net, trainloader) * 100.
    print('Training accuracy: %.1f' % train_acc)
    test_acc = batch_evaluate(net, testloader) * 100.
    print('Validation accuracy: %.1f' % test_acc)


if __name__ == '__main__':
    validate()

This loads a pretrained neural network and evaluates its performance on the provided sign language dataset. Specifically, the script here outputs accuracy on the images you used for training and a separate set of images you put aside for testing purposes, called the validation set.

You will next export the PyTorch model to an ONNX binary. This binary file can then be used in production to run inference with your model. Most importantly, the code running this binary does not need a copy of the original network definition. At the end of the validate function, add the following:

step_4_evaluate.py
    trainloader, testloader = get_train_test_loaders(1)

    # export to onnx
    fname = "signlanguage.onnx"
    dummy = torch.randn(1, 1, 28, 28)
    torch.onnx.export(net, dummy, fname, input_names=['input'])

    # check exported model
    model = onnx.load(fname)
    onnx.checker.check_model(model)  # check model is well-formed

    # create runnable session with exported model
    ort_session = ort.InferenceSession(fname)
    net = lambda inp: ort_session.run(None, {'input': inp.data.numpy()})[0]

    print('=' * 10, 'ONNX', '=' * 10)
    train_acc = batch_evaluate(net, trainloader) * 100.
    print('Training accuracy: %.1f' % train_acc)
    test_acc = batch_evaluate(net, testloader) * 100.
    print('Validation accuracy: %.1f' % test_acc)

This exports the ONNX model, checks the exported model, and then runs inference with the exported model. Double-check that your file matches the step 4 file in this repository:

step_4_evaluate.py
from torch.utils.data import Dataset
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch
import numpy as np

import onnx
import onnxruntime as ort

from step_2_dataset import get_train_test_loaders
from step_3_train import Net


def evaluate(outputs: Variable, labels: Variable) -> float:
    """Evaluate neural network outputs against non-one-hotted labels."""
    Y = labels.numpy()
    Yhat = np.argmax(outputs, axis=1)
    return float(np.sum(Yhat == Y))


def batch_evaluate(
        net: Net,
        dataloader: torch.utils.data.DataLoader) -> float:
    """Evaluate neural network in batches, if dataset is too large."""
    score = n = 0.0
    for batch in dataloader:
        n += len(batch['image'])
        outputs = net(batch['image'])
        if isinstance(outputs, torch.Tensor):
            outputs = outputs.detach().numpy()
        score += evaluate(outputs, batch['label'][:, 0])
    return score / n


def validate():
    trainloader, testloader = get_train_test_loaders()
    net = Net().float().eval()

    pretrained_model = torch.load("checkpoint.pth")
    net.load_state_dict(pretrained_model)

    print('=' * 10, 'PyTorch', '=' * 10)
    train_acc = batch_evaluate(net, trainloader) * 100.
    print('Training accuracy: %.1f' % train_acc)
    test_acc = batch_evaluate(net, testloader) * 100.
    print('Validation accuracy: %.1f' % test_acc)

    trainloader, testloader = get_train_test_loaders(1)

    # export to onnx
    fname = "signlanguage.onnx"
    dummy = torch.randn(1, 1, 28, 28)
    torch.onnx.export(net, dummy, fname, input_names=['input'])

    # check exported model
    model = onnx.load(fname)
    onnx.checker.check_model(model)  # check model is well-formed

    # create runnable session with exported model
    ort_session = ort.InferenceSession(fname)
    net = lambda inp: ort_session.run(None, {'input': inp.data.numpy()})[0]

    print('=' * 10, 'ONNX', '=' * 10)
    train_acc = batch_evaluate(net, trainloader) * 100.
    print('Training accuracy: %.1f' % train_acc)
    test_acc = batch_evaluate(net, testloader) * 100.
    print('Validation accuracy: %.1f' % test_acc)


if __name__ == '__main__':
    validate()

To use and evaluate the checkpoint from the last step, run the following:

  • python step_4_evaluate.py

This will yield output similar to the following, affirming that your exported model not only works, but also agrees with your original PyTorch model:


   
     
     
     
     
Output
========== PyTorch ==========
Training accuracy: 99.9
Validation accuracy: 97.4
========== ONNX ==========
Training accuracy: 99.9
Validation accuracy: 97.4

Your neural network attains a train accuracy of 99.9% and a 97.4% validation accuracy. This gap between train and validation accuracy indicates your model is overfitting. This means that instead of learning generalizable patterns, your model has memorized the training data. To understand the implications and causes of overfitting, see Understanding Bias-Variance Tradeoffs.

At this point, we have completed a sign language classifier. In essence, our model can correctly disambiguate between signs almost all the time. This is a reasonably good model, so we move on to the final stage of our application. We will use this sign language classifier in a real-time webcam application.

Step 5 — Linking the Camera Feed

Your next objective is to link the computer’s camera to your sign language classifier. You will collect camera input, classify the displayed sign language, and then report the classified sign back to the user.

Now create a Python script for the sign language translator. Create the file step_5_camera.py using nano or your favorite text editor:

  • nano step_5_camera.py

Add the following code into the file:

step_5_camera.py
"""Test for sign language classification"""
import cv2
import numpy as np
import onnxruntime as ort

def main():
    pass

if __name__ == '__main__':
    main()

This code imports OpenCV, which contains your image utilities, and the ONNX runtime, which is all you need to run inference with your model. The rest of the code is typical Python program boilerplate.

Now replace pass in the main function with the following code, which initializes a sign language classifier using the parameters you trained previously. Additionally add a mapping from indices to letters and image statistics:

step_5_camera.py
def main():
    # constants
    index_to_letter = list('ABCDEFGHIKLMNOPQRSTUVWXY')
    mean = 0.485 * 255.
    std = 0.229 * 255.

    # create runnable session with exported model
    ort_session = ort.InferenceSession("signlanguage.onnx")

You will use elements of this test script from the official OpenCV documentation. Specifically, you will update the body of the main function. Start by initializing a VideoCapture object that is set to capture live feed from your computer’s camera. Place this at the end of the main function:

step_5_camera.py
def main():
    ...
    # create runnable session with exported model
    ort_session = ort.InferenceSession("signlanguage.onnx")

    cap = cv2.VideoCapture(0)

Then add a while loop, which reads from the camera at every timestep:

step_5_camera.py
def main():
    ...
    cap = cv2.VideoCapture(0)
    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()

Write a utility function that takes the center crop for the camera frame. Place this function before main:

step_5_camera.py
def center_crop(frame):
    h, w, _ = frame.shape
    start = abs(h - w) // 2
    if h > w:
        frame = frame[start: start + w]
    else:
        frame = frame[:, start: start + h]
    return frame

Next, take the center crop for the camera frame, convert to grayscale, normalize, and resize to 28x28. Place this inside the while loop within the main function:

step_5_camera.py
def main():
    ...
    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()

        # preprocess data
        frame = center_crop(frame)
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        x = cv2.resize(frame, (28, 28))
        x = (x - mean) / std

Still within the while loop, run inference with the ONNX runtime. Convert the outputs to a class index, then to a letter:

step_5_camera.py
...
        x = (x - mean) / std

        x = x.reshape(1, 1, 28, 28).astype(np.float32)
        y = ort_session.run(None, {'input': x})[0]

        index = np.argmax(y, axis=1)
        letter = index_to_letter[int(index)]

Display the predicted letter inside the frame, and display the frame back to the user:

step_5_camera.py
...
        letter = index_to_letter[int(index)]

        cv2.putText(frame, letter, (100, 100), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 255, 0), thickness=2)
        cv2.imshow("Sign Language Translator", frame)

At the end of the while loop, add this code to check whether the user hits the q key and, if so, quit the application. The cv2.waitKey(1) call pauses the program for 1 millisecond while it checks for a keypress. Add the following:

while循环的末尾,添加此代码以检查用户是否按了q字符,如果是,请退出应用程序。 此行将程序暂停1毫秒。 添加以下内容:

step_5_camera.py
...
        cv2.imshow("Sign Language Translator", frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

Finally, release the capture and close all windows. Place this outside of the while loop to end the main function.

step_5_camera.py
...

    while True:
        ...
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break


    cap.release()
    cv2.destroyAllWindows()

Double-check your file matches the following or this repository:

step_5_camera.py
import cv2
import numpy as np
import onnxruntime as ort


def center_crop(frame):
    h, w, _ = frame.shape
    start = abs(h - w) // 2
    if h > w:
        return frame[start: start + w]
    return frame[:, start: start + h]


def main():
    # constants
    index_to_letter = list('ABCDEFGHIKLMNOPQRSTUVWXY')
    mean = 0.485 * 255.
    std = 0.229 * 255.

    # create runnable session with exported model
    ort_session = ort.InferenceSession("signlanguage.onnx")

    cap = cv2.VideoCapture(0)
    while True:
        # Capture frame-by-frame
        ret, frame = cap.read()

        # preprocess data
        frame = center_crop(frame)
        frame = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        x = cv2.resize(frame, (28, 28))
        x = (x - mean) / std

        x = x.reshape(1, 1, 28, 28).astype(np.float32)
        y = ort_session.run(None, {'input': x})[0]

        index = np.argmax(y, axis=1)
        letter = index_to_letter[int(index)]

        cv2.putText(frame, letter, (100, 100), cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 255, 0), thickness=2)
        cv2.imshow("Sign Language Translator", frame)

        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    cap.release()
    cv2.destroyAllWindows()

if __name__ == '__main__':
    main()

Exit your file and run the script.

  • python step_5_camera.py

Once the script is run, a window will pop up with your live webcam feed. The predicted sign language letter will be shown in the top left. Hold up your hand and make your favorite sign to see your classifier in action. Here are some sample results showing the letters L and D.

While testing, note that the background needs to be fairly clear for this translator to work. This is an unfortunate consequence of the dataset’s cleanliness. Had the dataset included images of hand signs with miscellaneous backgrounds, the network would be robust to noisy backgrounds. However, the dataset features blank backgrounds and nicely centered hands. As a result, this webcam translator works best when your hand is likewise centered and placed against a blank background.

This concludes the sign language translator application.

Conclusion

In this tutorial, you built an American Sign Language translator using computer vision and a machine learning model. In particular, you saw new aspects of training a machine learning model—specifically, data augmentation for model robustness, learning rate schedules for lower loss, and exporting AI models using ONNX for production use. This then culminated in a real-time computer vision application, which translates sign language into letters using a pipeline you built. It’s worth noting that the brittleness of the final classifier can be tackled with any or all of the following methods. For further exploration, try the following topics to improve your application:

  • Generalization: This isn’t a sub-topic within computer vision, rather, it’s a constant problem throughout all of machine learning. See Understanding Bias-Variance Tradeoffs.

  • Domain Adaptation: Say your model is trained in domain A (for example, sunny environments). Can you adapt the model to domain B (for example, cloudy environments) quickly?

  • Adversarial Examples: Say an adversary is designing images intentionally to fool your model. How can you design such images? How can you combat such images?

Translated from: https://www.digitalocean.com/community/tutorials/how-to-build-a-neural-network-to-translate-sign-language-into-english
