Getting Started With Object Detection Using TensorFlow.js

In deep learning, one of the most widely-used technologies is TensorFlow, an end-to-end open-source platform for building models. It has a vast, flexible ecosystem of tools, libraries, and community resources. With the power of TensorFlow, researchers and developers can develop and deploy ML-powered applications.

In this tutorial, we’re going to work with TensorFlow.js, TensorFlow’s JavaScript library. We’ll use this library to learn to perform object detection—and specifically, detect instances of people—using our device’s webcam. The idea is fairly simple. We launch the camera in observation mode. Then, when it detects a human, it starts to record the movement of the person until the person is no longer in the view of the camera.

For our object detection model, we are going to use the COCO-SSD, one of TensorFlow’s pre-built models. More on that next.

What is COCO-SSD?

COCO-SSD is an object detection model powered by the TensorFlow object detection API. SSD is an acronym for Single-Shot MultiBox Detection. The model can detect 90 classes of objects from the COCO dataset, a large-scale object detection, segmentation, and captioning dataset, which is what it was trained on to recognize its target objects.

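To get a feel for what the model produces, here's a small standalone sketch, separate from the app we'll build below (the describeImage function and the image element ID are purely illustrative), that loads COCO-SSD and logs its predictions for an image. Each prediction contains a class label, a confidence score, and a bounding box:

import * as cocoSsd from "@tensorflow-models/coco-ssd";
import "@tensorflow/tfjs";

async function describeImage(imgElement) {
  // Load the pre-trained model (the weights are downloaded on the first call).
  const model = await cocoSsd.load();
  // detect() accepts an image, video, or canvas element and returns an array of predictions.
  const predictions = await model.detect(imgElement);
  // Each prediction looks like: { class: "person", score: 0.92, bbox: [x, y, width, height] }
  predictions.forEach(p => {
    console.log(`${p.class} (${(p.score * 100).toFixed(1)}%)`, p.bbox);
  });
}

// Illustrative usage: assumes an <img id="my-image"> element exists on the page.
describeImage(document.getElementById("my-image"));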

Now that we have some context for our project, let’s get started!

Imports and Configurations

First, we need to import the libraries required for this tutorial, namely React, TensorFlow.js, and Moment.js (used later to format recording timestamps):

import React, { useRef, useEffect, useState } from "react";
import ReactDOM from "react-dom";
import moment from "moment";
import * as cocoSsd from "@tensorflow-models/coco-ssd";
import "@tensorflow/tfjs";

Next, we need to create a functional component and declare the state and refs it needs, as shown in the code snippet below:

const App = () => {
  const [records, setRecords] = useState([]);
  // DOM refs for the video element and the two control buttons.
  const videoElement = useRef(null);
  const startButtonElement = useRef(null);
  const stopButtonElement = useRef(null);
  // Mutable flags and instances that must stay up to date across re-renders.
  const shouldRecordRef = useRef(false);
  const modelRef = useRef(null);
  const recordingRef = useRef(false);
  const recorderRef = useRef(null);

Here, the component name is App, which is the main component itself, and the state variable is named records and is initialized using the useState hook. Several other refs are defined with the useRef hook; these are needed to manually wire up the video element, the buttons, the model, and the recorder.

Now, on initial load, we need to request the webcam stream, plug it into the video element, and load the COCO-SSD model, as shown in the code snippet below:

async function prepare() {
  // Disable both buttons until the camera and the model are ready.
  startButtonElement.current.setAttribute("disabled", true);
  stopButtonElement.current.setAttribute("disabled", true);
  if (navigator.mediaDevices && navigator.mediaDevices.getUserMedia) {
    try {
      // Ask for webcam (and microphone) access and show the live stream in the video element.
      const stream = await navigator.mediaDevices.getUserMedia({
        audio: true,
        video: true
      });
      window.stream = stream;
      videoElement.current.srcObject = stream;
      // Load COCO-SSD and keep it in a ref for later predictions.
      modelRef.current = await cocoSsd.load();
      startButtonElement.current.removeAttribute("disabled");
    } catch (error) {
      console.error(error);
    }
  }
}

Here, we have a prepare function that performs the following operations:

  1. First, it disables the start and stop buttons.

  2. Second, it requests access to the webcam and attaches the stream to the video element.

  3. Third, it loads the COCO-SSD model and stores it in modelRef.
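
One detail worth noting is that prepare has to be called somewhere. A minimal way to wire it up, assuming it should simply run once when the component mounts (the original wiring isn't shown in the snippets above), is the useEffect hook we imported earlier:

useEffect(() => {
  // Run the setup once on mount: grab the webcam stream and load COCO-SSD.
  prepare();
}, []);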

The future of machine learning is on the edge. Subscribe to the Fritz AI Newsletter to discover the possibilities and benefits of embedding ML models inside mobile apps.

Start and Stop Recording

In order to start recording, we first check whether a recording is already in progress. If not, we create a new MediaRecorder instance from the webcam stream. Whenever the recorder emits its data, we turn it into an object URL and add it, together with a timestamp, to the records state variable:

function startRecording() {
  // Don't start a second recorder if one is already running.
  if (recordingRef.current) {
    return;
  }
  recordingRef.current = true;
  console.log("start recording");
  recorderRef.current = new MediaRecorder(window.stream);
  recorderRef.current.ondataavailable = function (e) {
    // Fired when the recorder is stopped: store the clip as an object URL with a timestamp.
    const title = new Date() + "";
    const href = URL.createObjectURL(e.data);
    setRecords(previousRecords => {
      return [...previousRecords, { href, title }];
    });
  };
  recorderRef.current.start();
}

In order to stop recording, we need to stop the MediaRecorder instance that was created when the recording started. For that, we simply call the stop function provided by the instance itself; stopping the recorder is also what fires its dataavailable event, which adds the finished clip to our records list:

function stopRecording() {
  if (!recordingRef.current) {
    return;
  }
  recordingRef.current = false;
  recorderRef.current.stop();
  console.log("stopped recording");
}

Detecting a Human

For the main function that triggers the start and end of the recording, we need to handle the following points:

  1. We need to check the shouldRecordRef variable, which toggles between start and stop.

  2. If recording should be on, we plug the video source into the COCO-SSD instance to get its predictions for the current frame.

  3. If any of the predictions has the class "person", it means the model found a human on camera. Hence, we set foundPerson to true.

  4. Then, we use foundPerson to decide whether to start or stop recording.

  5. But as you may notice, this method would still only run once. So to keep it going, we call requestAnimationFrame to grab the next frame from our video source, and then call the function recursively so that detection continues frame after frame until we exit.

Point #5 is exactly the reason why we're using refs instead of just state. When the function keeps calling itself recursively, a state variable would only give us the stale copy captured by the closure, whereas a ref always exposes the up-to-date value through .current. This is what lets us keep detecting the person for as long as they're visible on the webcam.

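To make the difference concrete, here's a tiny sketch that isn't part of the app (the ClosureDemo component and its logLater helper are purely illustrative) showing what a callback scheduled with requestAnimationFrame sees when it reads a state variable versus a ref:

import React, { useRef, useState } from "react";

function ClosureDemo() {
  // A state value is captured by the closure at render time, so a callback
  // created during this render keeps seeing this render's value.
  const [shouldRecord, setShouldRecord] = useState(false);

  // A ref is a mutable box: reading .current at call time always gives the latest value.
  const shouldRecordRef = useRef(false);

  const logLater = () => {
    requestAnimationFrame(() => {
      console.log("state:", shouldRecord);          // the value from the render that created this callback (stale if it changed since)
      console.log("ref:", shouldRecordRef.current); // whatever was most recently assigned (always current)
    });
  };

  return (
    <button
      onClick={() => {
        setShouldRecord(true);
        shouldRecordRef.current = true;
        logLater();
      }}
    >
      Toggle and log
    </button>
  );
}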

We’ll implement this functionality with the following code snippet:

async function detectFrame() {
  // If the user pressed stop, make sure recording is off and exit the loop.
  if (!shouldRecordRef.current) {
    stopRecording();
    return;
  }
  // Run COCO-SSD on the current video frame.
  const predictions = await modelRef.current.detect(videoElement.current);
  let foundPerson = false;
  for (let i = 0; i < predictions.length; i++) {
    if (predictions[i].class === "person") {
      foundPerson = true;
    }
  }
  if (foundPerson) {
    startRecording();
  } else {
    stopRecording();
  }
  // Schedule the next check on the following animation frame.
  requestAnimationFrame(() => {
    detectFrame();
  });
}

For the UI, we're using Bootstrap to create a simple two-column interface: the webcam preview with the two buttons to start and stop recording on one side, and a table that lists the recorded clips on the other. The markup looks something like the snippet below:

return (
  <div className="container">
    <div className="row">
      <div className="col-6">
        <video autoPlay playsInline muted ref={videoElement} width="100%" />
        <div>
          <button
            className="btn btn-primary"
            ref={startButtonElement}
            onClick={() => {
              // Enable the stop button, raise the flag, and kick off the detection loop.
              stopButtonElement.current.removeAttribute("disabled");
              shouldRecordRef.current = true;
              detectFrame();
            }}
          >
            Start
          </button>
          <button
            className="btn btn-danger"
            ref={stopButtonElement}
            onClick={() => {
              // Lower the flag; the detection loop will stop the recorder and exit.
              shouldRecordRef.current = false;
              stopRecording();
            }}
          >
            Stop
          </button>
        </div>
      </div>
      <div className="col-6">
        <table className="table">
          <thead>
            <tr>
              <th>Records Time</th>
            </tr>
          </thead>
          <tbody>
            {!records.length ? (
              <tr>
                <td>No record yet</td>
              </tr>
            ) : (
              records.map(record => (
                <tr key={record.title}>
                  <td>
                    <a href={record.href} download>
                      {moment(record.title).format("LLLL")}
                    </a>
                  </td>
                </tr>
              ))
            )}
          </tbody>
        </table>
      </div>
    </div>
  </div>
);

This JSX is what our functional component App returns.

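Since we imported ReactDOM at the top, the component can be mounted in the usual way (this assumes the typical index.html with a root element; on newer versions of React you'd use createRoot instead):

ReactDOM.render(<App />, document.getElementById("root"));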

And that’s it! We’re finished with the implementation. Now we just need to test it to make sure it works properly. In order to run this React project, we can simply run the following command in the project terminal:

yarn run dev

If everything is wired up correctly, the app will start recording as soon as a person appears in front of the webcam and stop once they leave the frame.

Wrapping Up

In this tutorial, we learned how to use COCO-SSD to detect an instance of a human with our webcam. The functions that were used, along with the mix of libraries, are pretty interesting. There is limitless potential to what we can do using this TensorFlow object detection technology.

You could easily extend an application like this using other JS frameworks and runtimes like Vue, Electron, etc. Sign-language detection, eye-movement detection, virtual web games: these are some of the specific possibilities when it comes to working with TensorFlow.js.

For convenience, the entire code for this tutorial is available on GitHub.

Editor’s Note: Heartbeat is a contributor-driven online publication and community dedicated to exploring the emerging intersection of mobile app development and machine learning. We’re committed to supporting and inspiring developers and engineers from all walks of life.

Editorially independent, Heartbeat is sponsored and published by Fritz AI, the machine learning platform that helps developers teach devices to see, hear, sense, and think. We pay our contributors, and we don’t sell ads.

If you’d like to contribute, head on over to our call for contributors. You can also sign up to receive our weekly newsletters (Deep Learning Weekly and the Fritz AI Newsletter), join us on Slack, and follow Fritz AI on Twitter for all the latest in mobile machine learning.

Translated from: https://heartbeat.fritz.ai/getting-started-with-object-detection-using-tensorflow-js-757d21658e2d
