Hand Segmentation Application (Ego-Hands Dataset) using Monk AI

Computer Vision

Making computer vision easy with Monk, a low-code deep learning tool and a unified wrapper for computer vision.

Introduction

In this tutorial, we will be making an application for the segmentation of hand images from the EgoHands dataset. Such a hand segmentation system can be used for gesture recognition applications. These gesture recognition applications help disabled and elderly people carry out daily activities, such as conveying simple information or controlling machines with a simple gesture. The Monk toolkit helps us create real-world computer vision applications by allowing us to deploy our models using Monk’s low-code syntax. Monk’s one-line installation of different deep learning pipelines makes our work error-free.

Create real-world Image Segmentation applications using Monk

Ultrasound nerve segmentation

About the Dataset

The EgoHands dataset contains 48 Google Glass videos of complex, first-person interactions between two people. Download the labeled data.zip folder from the link provided above. This zip folder contains all labeled frames as JPEG files. There are 100 labeled frames for each of the 48 videos, for a total of 4800 frames. The folder contains images of different pairs of the four participants facing each other, engaged in different activities: playing cards, playing chess, solving a 24- or 48-piece jigsaw puzzle, and playing Jenga.

Table of Contents

1. Installation Instructions

2. Use an already trained model

3. Train a custom segmenter

— Steps to create the dataset
— Splitting the dataset into train and validation datasets
— Training

4. Inference model

Installation Instructions

Before getting to the segmentation part, we will set up the Monk AI toolkit and its dependencies on the platform we are working on. I am using Google Colab as my environment.

! git clone https://github.com/Tessellate-Imaging/Monk_Object_Detection.git

# For colab use the command below
! cd Monk_Object_Detection/9_segmentation_models/installation && cat requirements_colab.txt | xargs -n 1 -L 1 pip install

# For Local systems and cloud select the right CUDA version
#! cd Monk_Object_Detection/9_segmentation_models/installation && cat requirements_cuda10.0.txt | xargs -n 1 -L 1 pip install

Use an already trained model

Monk allows us to use pre-trained models to demonstrate our applications. We can use a pre-trained model and infer the segmentation of some of the EgoHands images.

import os
import sys
sys.path.append("Monk_Object_Detection/9_segmentation_models/lib/");
from infer_segmentation import Infer

gtf = Infer();

In each image, we have to differentiate between hands and other background details. Therefore, the dictionary classes_dict will have two key-value pairs.

classes_dict = {'background': 0,'hand': 1};

The model will be trained for only the hand class.

classes_to_train = ['hand'];

The next step will be to specify the data parameters.

gtf.Data_Params(classes_dict, classes_to_train, image_shape=[716,1024])

Now we will download the trained model.

! wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1c97ms04BVQKS3KH6TZ87KSEJkXCl-Qma' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1c97ms04BVQKS3KH6TZ87KSEJkXCl-Qma" -O seg_hand_trained.zip && rm -rf /tmp/cookies.txt

The seg_hand_trained.zip folder will have the trained model file and a test image folder containing the images to be tested. The next step is to unzip this folder.

! unzip -qq seg_hand_trained.zip
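
To confirm what the zip contains, we can simply list the unzipped folder. The folder name seg_hand_trained and the model file best_model.h5 come from the next step; the exact layout of the test image folder may differ in your download.

! ls seg_hand_trained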

In the next step, we specify the model parameters.

gtf.Model_Params(model="Unet", backbone="efficientnetb3", path_to_model='seg_hand_trained/best_model.h5')

Now, we will set up the model and predict the segmentation.

gtf.Setup();
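
The prediction call itself is not shown above; a minimal sketch is given below, using the same gtf.Predict() call that appears in the inference section later in this tutorial. The test image path is a hypothetical example, so replace it with any image from the test folder unzipped from seg_hand_trained.

# Hypothetical path -- point this at any image from the unzipped test folder
gtf.Predict("seg_hand_trained/test/img1.png", vis=True);
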
Predicted Segmentation

Train a custom segmenter

Now, we will create a custom segmenter and for that, the first and most important step is dataset creation. The dataset downloaded from the link given above will have EgoHands images and the MATLAB code files. Using these files, we will generate the masked image for each EgoHands image.

Steps to create the dataset

— We will obtain the EgoHand images from the link given above.
— The folder downloaded from the above link will also have the Matlab code files.
— The masked images will be obtained by running the Matlab code files as instructed in the README file present in the downloaded folder.
— All the EgoHand images, which are initially saved in different category folders, will be saved in a separate folder named Hand_Img.
— The masked images obtained after running the Matlab code will be stored in a separate folder named Hand_Annot.
— We should ensure that the lists of images and masked labels are in the same order, else the label mapping will not be correct (a quick check is sketched below).
— After obtaining the separate folders Hand_Img and Hand_Annot, we can upload them onto our notebook and use them accordingly.

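A quick sanity check for the ordering requirement above is sketched below. It only compares the file counts and prints a few image/mask pairs for visual inspection, and it assumes the two folders sit at the same paths used in the splitting code that follows.

import os

img_list = sorted(os.listdir("vision.soic.indiana.edu/projects/egohands/Hand_Img"));
mask_list = sorted(os.listdir("vision.soic.indiana.edu/projects/egohands/Hand_Annot"));

# Both folders should contain the same number of files (4800 frames in total)
assert len(img_list) == len(mask_list), "image/mask counts do not match"

# Print a few pairs so the ordering can be checked by eye
for img_name, mask_name in list(zip(img_list, mask_list))[:5]:
    print(img_name, "<->", mask_name)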

The EgoHands image and its corresponding masked image

Splitting the dataset into training and validation data

We will now split the image dataset into training and validation data. The first step is to create a sorted list of both original images and the masked images.

# The path given below is the path of the two folders Hand_Annot and Hand_Img obtained after following the steps to create the dataset.
import os

img_list = sorted(os.listdir("vision.soic.indiana.edu/projects/egohands/Hand_Img"));
mask_list = sorted(os.listdir("vision.soic.indiana.edu/projects/egohands/Hand_Annot"));

The next step is to create separate folders for train and validation and, inside each of them, two subfolders for the EgoHand images and the corresponding masked images.

import os

os.mkdir("/content/train");
os.mkdir("/content/train/img31");
os.mkdir("/content/train/mask31");

os.mkdir("/content/val");
os.mkdir("/content/val/img31");
os.mkdir("/content/val/mask31");

Now, splitting the EgoHands image data into training and validation data.

import cv2
import numpy as np
from tqdm.notebook import tqdm
# Copy every EgoHands image into the training folder
for i in tqdm(range(len(img_list))):
    img_path = "vision.soic.indiana.edu/projects/egohands/Hand_Img/" + img_list[i];
    img = cv2.imread(img_path, 1);
    cv2.imwrite("/content/train/img31/img" + str(i+1) + ".png", img);

# The first 100 images are used as the validation set
for i in tqdm(range(100)):
    img_path = "vision.soic.indiana.edu/projects/egohands/Hand_Img/" + img_list[i];
    img = cv2.imread(img_path, 1);
    cv2.imwrite("/content/val/img31/img" + str(i+1) + ".png", img);

Now, we will save the masked images in the mask folders of the training and validation directories. The masked images are modified before being split into training and validation: in each masked image, pixel values greater than 0 are changed to 1, where a pixel value of 1 represents the hand and 0 represents the background.

import cv2
import numpy as np
from tqdm.notebook import tqdm

# Threshold the annotation masks to {0, 1} and save them for training
for i in tqdm(range(len(mask_list))):
    img_path = "vision.soic.indiana.edu/projects/egohands/Hand_Annot/" + mask_list[i];
    img = cv2.imread(img_path, 0)
    img[img > 0] = 1;
    cv2.imwrite("/content/train/mask31/img" + str(i+1) + ".png", img);

# Validation masks: the first 100, matching the validation images
for i in tqdm(range(100)):
    img_path = "vision.soic.indiana.edu/projects/egohands/Hand_Annot/" + mask_list[i];
    img = cv2.imread(img_path, 0)
    img[img > 0] = 1;
    cv2.imwrite("/content/val/mask31/img" + str(i+1) + ".png", img);
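
As a quick check that the thresholding worked, we can look at the unique pixel values of one of the saved masks. This is an optional snippet; the filename is simply the first mask written by the loop above.

# A saved mask should now contain only the values 0 (background) and 1 (hand)
sample_mask = cv2.imread("/content/train/mask31/img1.png", 0)
print(np.unique(sample_mask))   # expected: [0 1], or just [0] if the frame contains no hands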

Final Dataset Directory Structure

root_dir
  |
  |----train
  |       |----img31
  |       |       |----img1.png
  |       |       |----img2.png
  |       |       |----.........(and so on)
  |       |
  |       |----mask31
  |               |----img1.png
  |               |----img2.png
  |               |----.........(and so on)
  |
  |----val (optional)
          |----img31
          |       |----img1.png
          |       |----img2.png
          |       |----.........(and so on)
          |
          |----mask31
                  |----img1.png
                  |----img2.png
                  |----.........(and so on)
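
Before training, it helps to verify that the folders match this layout and that every image has a corresponding mask. A minimal check using only the paths created earlier is sketched below.

import os

for split in ["train", "val"]:
    n_imgs = len(os.listdir("/content/" + split + "/img31"))
    n_masks = len(os.listdir("/content/" + split + "/mask31"))
    print(split, ":", n_imgs, "images,", n_masks, "masks")
    assert n_imgs == n_masks, "every image needs a corresponding mask"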

Training

This is the final step in creating a custom segmenter.

import os
import sys
sys.path.append("Monk_Object_Detection/9_segmentation_models/lib/");
from train_segmentation import Segmenter

gtf = Segmenter();

Saving the paths to the EgoHands and masked image directories.

img_dir = "/content/train/img31";
mask_dir = "/content/train/mask31";

Two classes, ‘hand’ and ‘background’, are represented by pixel values 1 and 0 respectively, and we will train the segmentation model for the ‘hand’ class only.

classes_dict = {'background': 0, 'hand': 1};
classes_to_train = ['hand'];

Specifying the dataset parameters for training and validation.

gtf.Train_Dataset(img_dir, mask_dir, classes_dict, classes_to_train)

img_dir = "/content/val/img31";
mask_dir = "/content/val/mask31";
gtf.Val_Dataset(img_dir, mask_dir)

Getting the list of backbones available for the segmentation network and choosing appropriate data parameters.

gtf.List_Backbones();

gtf.Data_Params(batch_size=10, backbone="efficientnetb3", image_shape=[720,1280])

Getting the list of available models and choosing appropriate model parameters.

gtf.List_Models();

gtf.Model_Params(model="Unet")

Set up the segmentation model.

gtf.Setup();

Start the training.

gtf.Train(num_epochs=5);

Visualize the training result.

gtf.Visualize_Training_History();
IoU score curve and loss curve for the trained model

Inference model

The inference steps are similar to those used for the pre-trained model earlier.

import os
import sys
sys.path.append("Monk_Object_Detection/9_segmentation_models/lib/");
from infer_segmentation import Infer

gtf = Infer();
classes_dict = {
    'background': 0,
    'hand': 1
};
classes_to_train = ['hand'];

Defining the data and model parameters, and giving the path to the best model obtained during training.

gtf.Data_Params(classes_dict, classes_to_train, image_shape=[716,1024])
gtf.Model_Params(model="Unet", backbone="efficientnetb3", path_to_model='best_model.h5')

Now, set up the model.

gtf.Setup();

Infer the images.

from PIL import Image
img = Image.open("/content/train/img31/img100.png")
img = img.resize((1024,1024))
img.save('tmp_img.png')

gtf.Predict("/content/tmp_img.png", vis=True);
Predicted Segmentation

Conclusion

We successfully created the hand segmentation application using Monk’s low-code syntax. This segmentation system can be used to create a gesture recognition application; for more such applications, refer to the Application Model Zoo.

Tutorial available on GitHub.

Translated from: https://medium.com/towards-artificial-intelligence/hand-segmentation-application-ego-hands-dataset-using-monk-ai-d0f67c0b776c
