Implementing 3D Object Detection (OpenCV + TensorFlow)

Unlike traditional 2D object detection, I recently had the chance to work with 3D printing equipment and wanted to run real-time object detection on the printed parts. Below I share some of what I learned.
Hardware: IP camera
System: Windows
Software: OpenCV

Data Acquisition Module

Capturing frames from the camera requires the opencv-python package (installable with pip, or from the prebuilt wheels below).
Download link: https://www.lfd.uci.edu/~gohlke/pythonlibs/#opencv

First, identify the camera you will be using.
Change the camera's IP address so that it is on the same network segment as your PC.
URL (Uniform Resource Locator): the standard address of a resource on the Internet, used to locate and access that resource.
To reach the camera through a browser, enter the camera's IP address in the address bar and log in to the camera's web interface.
Then capture and save each frame from the camera. For a video stream, configure RTSP (Real Time Streaming Protocol); after changing the setting, it is best to reboot the camera.

# Then run the following code:
import cv2

# RTSP stream URL in the form rtsp://<user>:<password>@<camera-ip>:<port>/<channel>
url = 'rtsp://admin:[email protected]:554/11'
cap = cv2.VideoCapture(url)

while cap.isOpened():
    # Grab one frame
    ret, frame = cap.read()
    if not ret:
        # Stream dropped or frame failed to decode
        break
    # Show the frame
    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When done, release the capture
cap.release()
cv2.destroyAllWindows()
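
The text above also mentions saving each frame, which the display loop does not do. Here is a minimal sketch of saving; the frames output directory and the every-10th-frame sampling rate are my own assumptions, not from the original:

import os
import cv2

# Same RTSP URL as above
url = 'rtsp://admin:[email protected]:554/11'
save_dir = 'frames'  # assumed output directory
os.makedirs(save_dir, exist_ok=True)

cap = cv2.VideoCapture(url)
frame_idx = 0
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # Save every 10th frame to limit disk usage (sampling rate is an assumption)
    if frame_idx % 10 == 0:
        cv2.imwrite(os.path.join(save_dir, 'frame_%06d.jpg' % frame_idx), frame)
    frame_idx += 1
cap.release()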

Dataset Preparation

The data types used for object detection include 2D RGB images, 2.5D RGB-D images, and 3D point clouds.
RGB images have high resolution and capture fine detail, but they lack 3D information; they can be handled with mature CNN-based algorithms.
RGB-D images carry 3D information and are relatively dense, but their quality depends heavily on the sensor. Using the camera intrinsics, an RGB-D image can be converted into a 3D point cloud, so it can be processed either with CNNs or with point-cloud DNNs (see the conversion sketch after this list).
Point clouds carry precise 3D information but are very sparse. The main point-cloud representations are:
- voxelized (voxelize), used to train 3D CNNs;
- raw point clouds (raw), fed to point-cloud-specific DNNs such as PointNet and PointCNN;
- front view (Front View), obtained by partitioning the vertical space into layers;
- bird's-eye view (Bird Eye View, BEV), processed with a conventional CNN.
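
To make the intrinsics-based conversion and the voxelized representation concrete, here is a minimal sketch using NumPy. The pinhole intrinsics (fx, fy, cx, cy), the synthetic depth image, and the 0.1 m voxel size are illustrative assumptions, not values from the original setup:

import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    # Back-project a depth image (meters) to an N x 3 point cloud using the
    # pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, Z = depth
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth reading

def voxelize(points, voxel_size=0.1):
    # Quantize points into a sparse set of occupied voxel indices
    idx = np.floor(points / voxel_size).astype(np.int32)
    return np.unique(idx, axis=0)  # one entry per occupied voxel

# Example with synthetic data and placeholder intrinsics:
depth = np.random.uniform(0.5, 3.0, size=(480, 640))
pts = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
occupied = voxelize(pts, voxel_size=0.1)
print(pts.shape, occupied.shape)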

Model Training

The demo below assumes a detection model has already been trained and exported as a frozen inference graph with the TensorFlow Object Detection API; PATH_TO_FROZEN_GRAPH and PATH_TO_LABELS must point to the exported .pb graph and the label-map .pbtxt file.

Detection Demo

from object_detection.utils import visualization_utils as vis_util
from object_detection.utils import label_map_util
from distutils.version import StrictVersion
import tensorflow as tf
import numpy as np
import cv2

if StrictVersion(tf.__version__) < StrictVersion('1.9.0'):
    raise ImportError('Please upgrade your TensorFlow installation to v1.9.* or later!')    
# Open the camera (device 0 is the default local webcam; pass the RTSP URL instead for the IP camera)
cap = cv2.VideoCapture(0)
# Paths to the frozen inference graph and the label map (fill in before running)
PATH_TO_FROZEN_GRAPH = ''
PATH_TO_LABELS = ''

# Load the frozen model into a Graph
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS, use_display_name=True)

with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        while True:
            ret, image_np = cap.read()
            if not ret:
                # Camera disconnected or frame failed to decode
                break
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represent how level of confidence for each of the objects.
            # Score is shown on the result image, together with the class label.
            scores = detection_graph.get_tensor_by_name('detection_scores:0')
            classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            # Actual detection.
            (boxes, scores, classes, num_detections) = sess.run(
                [boxes, scores, classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            # Visualization of the results of a detection.
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np, np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores), category_index,
                use_normalized_coordinates=True,
                line_thickness=8)

            cv2.imshow('object detection', image_np)
            if cv2.waitKey(25) & 0xFF == ord('q'):
                cv2.destroyAllWindows()
                break
cap.release()
cv2.destroyAllWindows()
