本章是对udacity 机器人课程第三个项目3D perception项目的学习总结。
本章的主要任务是以3D相机的图片作为输入, 经过一系列处理,识别图片中的目标物,标记出其质心位置, 然后控制机械手臂将目标物放入指定的筐中。
在项目二中介绍了3D相机, 3D相机可以生成一个三维立体的图片, 本章则将这种三维立体图片进行处理,经过采样, 滤波,分割处理图片,再经过聚类和svm对图片中的目标分类达到感知的目的。
本章分为三个部分
第一部分介绍使用pcl库对3D图形的预处理, 改阶段的主要目的是去掉3D图形中冗余的信息;
第二部分使用聚类的方法分割目标物;
第三部分使用svm对分割以后的目标物进行分类。
假设我们要操纵机械手臂拾取锤子, 则上图中的大部分信息都为冗余信息,为了减少计算复杂度首先要对图像进行处理,在保留必要信息的基础上去掉冗余信息。
# Create a VoxelGrid filter object for our input point cloud
vox = cloud.make_voxel_grid_filter()
# Choose a voxel (also known as leaf) size
LEAF_SIZE = 0.01
# Set the voxel (or leaf) size
vox.set_leaf_size(LEAF_SIZE, LEAF_SIZE, LEAF_SIZE)
# Call the filter function to obtain the resultant downsampled point cloud
cloud_filtered = vox.filter()
filename = 'voxel_downsampled.pcd'
pcl.save(cloud_filtered, filename)
# PassThrough filter
# Create a PassThrough filter object.
passthrough = cloud_filtered.make_passthrough_filter()
# Assign axis and range to the passthrough filter object.
filter_axis = 'z'
passthrough.set_filter_field_name(filter_axis)
axis_min = 0.6
axis_max = 1.1
passthrough.set_filter_limits(axis_min, axis_max)
# Finally use the filter function to obtain the resultant point cloud.
cloud_filtered = passthrough.filter()
filename = 'pass_through_filtered.pcd'
pcl.save(cloud_filtered, filename)
# Create the segmentation object
seg = cloud_filtered.make_segmenter()
# Set the model you wish to fit
seg.set_model_type(pcl.SACMODEL_PLANE)
seg.set_method_type(pcl.SAC_RANSAC)
# Max distance for a point to be considered fitting the model
# Experiment with different values for max_distance
# for segmenting the table
max_distance = 0.01
seg.set_distance_threshold(max_distance)
# Call the segment function to obtain set of inlier indices and model coefficients
inliers, coefficients = seg.segment()
# Extract inliers
extracted_inliers = cloud_filtered.extract(inliers, negative=False)
filename = 'extracted_inliers.pcd'
pcl.save(extracted_inliers, filename)
桌面并不是我们想要得到的结果, 这是只需要将negative参数设置为True, 则会得到和上一步相反的数据, 即桌面上的物体。
extracted_outliers = cloud_filtered.extract(inliers, negative=True)
filename = 'extracted_outliers.pcd'
pcl.save(extracted_outliers, filename)
以上就是第一部分的内容, 我们首先通过采样降低了数据量, 然后有通过滤波保留了桌面范围的数据, 最后使用随机采样算法又将桌面去掉,只保留了桌面上的物品。 虽然我们肉眼可以根据直觉轻松判断剩余物体的形状,但是对于计算机来说,上图依旧只是一个数字矩阵, 下一步我们需要做的就是将这个数据矩阵分割成一个个的物体。
机器学习按照训练集有无标签可以分为监督学习和无监督学习,其中无监督学习就是给定一组没有标签的数据, 算法自动将数据集按照某种特征分类, 本项目使用DBSCAN Algorithm 完成对上图分类的任务。
DBSCAN 算法将数据集分为三类分别为:核心点,边界点和噪声点。 算法首先需要定义一组参数eps和 minPts, 其中eps代表最大距离阈值,minPts为样本个数阈值。首先随机选中某个点为圆心, 以eps为半径计算半径内非零值的个数, 如果个数超过minPts, 则该点是核心点, 如果个数少于minPts但大于0个,则属于边界点,否则属于噪声点;联通的核心点和边界点会被划分为一类, 噪声点会被去掉。
可以直接使用PCL库的make_EuclideanClusterExtraction函数将图片中的每个物品分类, 然后将每一类标记为不同颜色, 如下图所示。
white_cloud = XYZRGB_to_XYZ(cloud_objects)# Apply function to convert XYZRGB to XYZ
tree = white_cloud.make_kdtree()
# Create Cluster-Mask Point Cloud to visualize each cluster separately
# Create a cluster extraction object
ec = white_cloud.make_EuclideanClusterExtraction()
# Set tolerances for distance threshold
# as well as minimum and maximum cluster size (in points)
ec.set_ClusterTolerance(0.05)
ec.set_MinClusterSize(10)
ec.set_MaxClusterSize(1000)
# Search the k-d tree for clusters
ec.set_SearchMethod(tree)
# Extract indices for each of the discovered clusters
cluster_indices = ec.Extract()
print "cluster_indices=", len(cluster_indices)
#Assign a color corresponding to each segmented object in scene
cluster_color = get_color_list(len(cluster_indices))
color_cluster_point_list = []
for j, indices in enumerate(cluster_indices):
for i, indice in enumerate(indices):
color_cluster_point_list.append([white_cloud[indice][0],
white_cloud[indice][1],
white_cloud[indice][2],
rgb_to_float(cluster_color[j])])
#Create new cloud containing all clusters, each with unique color
cluster_cloud = pcl.PointCloud_PointXYZRGB()
cluster_cloud.from_list(color_cluster_point_list)
通过上一步无监督学习的聚类方法, 已经成功将待识别的物品分段, 接下来就是使用有监督学习的支持向量机方法对这些物体分类, 在我的另一片博客《使用支持向量机对车辆检测追踪》中介绍了使用支持向量机识别汽车的原理和方法, 本节使用的方法和这篇播客基本相同, 区别是本章使用的待分类物体是有3D的点云组成, 而分类汽车使用的是2D的图片, 所以本章对3D点云提取的特征是颜色直方图和曲面法向量。
# Define a function to compute color histogram features
def color_hist(img, nbins=32, bins_range=(0, 256)):
# Convert from RGB to HSV using cv2.cvtColor()
hsv_img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
# Compute the histogram of the HSV channels separately
h_hist = np.histogram(hsv_img[:,:,0], bins=nbins, range=bins_range)
s_hist = np.histogram(hsv_img[:,:,1], bins=nbins, range=bins_range)
v_hist = np.histogram(hsv_img[:,:,2], bins=nbins, range=bins_range)
# Concatenate the histograms into a single feature vector
hist_features = np.concatenate((h_hist[0], s_hist[0], v_hist[0])).astype(np.float64)
# Normalize the result
norm_features = hist_features / np.sum(hist_features)
# Return the feature vector
return norm_features
提取曲面法向量顾名思义就是提取3D点云曲面的法向量。如下图所示, 一个球形的法向量特征各方向法向量均匀分布。
def compute_normal_histograms(normal_cloud):
norm_x_vals = []
norm_y_vals = []
norm_z_vals = []
for norm_component in pc2.read_points(normal_cloud,
field_names = ('normal_x', 'normal_y', 'normal_z'),
skip_nans=True):
norm_x_vals.append(norm_component[0])
norm_y_vals.append(norm_component[1])
norm_z_vals.append(norm_component[2])
# TODO: Compute histograms of normal values (just like with color)
norm_x_hist = np.histogram(norm_x_vals, bins=32, range=(0, 256))
norm_y_hist = np.histogram(norm_y_vals, bins=32, range=(0, 256))
norm_z_hist = np.histogram(norm_z_vals, bins=32, range=(0, 256))
# TODO: Concatenate and normalize the histograms
norm_features = np.concatenate((norm_x_hist[0], norm_y_hist[0], norm_z_hist[0])).astype(np.float64)
normed_features = norm_features / np.sum(norm_features)
# Generate random features for demo mode.
# Replace normed_features with your feature vector
# normed_features = np.random.random(96)
return normed_features
有了特征向量, 下一步就是使用支持向量机训练数据, 同样的, 支持向量机已经在另一片博客中有过较详尽的描述, 而且网上关于支持向量机的介绍也很多, 这里就不在重复, 直接给出源码和对预测结果可视化。
def plot_confusion_matrix(cm, classes,
normalize=False,
title='Confusion matrix',
cmap=plt.cm.Blues):
"""
This function prints and plots the confusion matrix.
Normalization can be applied by setting `normalize=True`.
"""
if normalize:
cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.imshow(cm, interpolation='nearest', cmap=cmap)
plt.title(title)
plt.colorbar()
tick_marks = np.arange(len(classes))
plt.xticks(tick_marks, classes, rotation=45)
plt.yticks(tick_marks, classes)
thresh = cm.max() / 2.
for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
plt.text(j, i, '{0:.2f}'.format(cm[i, j]),
horizontalalignment="center",
color="white" if cm[i, j] > thresh else "black")
plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label')
# Load training data from disk
training_set = pickle.load(open('training_set.sav', 'rb'))
# Format the features and labels for use with scikit learn
feature_list = []
label_list = []
for item in training_set:
if np.isnan(item[0]).sum() < 1:
feature_list.append(item[0])
label_list.append(item[1])
print('Features in Training Set: {}'.format(len(training_set)))
print('Invalid Features in Training set: {}'.format(len(training_set)-len(feature_list)))
X = np.array(feature_list)
# Fit a per-column scaler
X_scaler = StandardScaler().fit(X)
# Apply the scaler to X
X_train = X_scaler.transform(X)
y_train = np.array(label_list)
# Convert label strings to numerical encoding
encoder = LabelEncoder()
y_train = encoder.fit_transform(y_train)
# Create classifier
clf = svm.SVC(kernel='linear')
# Set up 5-fold cross-validation
kf = cross_validation.KFold(len(X_train),
n_folds=5,
shuffle=True,
random_state=1)
# Perform cross-validation
scores = cross_validation.cross_val_score(cv=kf,
estimator=clf,
X=X_train,
y=y_train,
scoring='accuracy'
)
print('Scores: ' + str(scores))
print('Accuracy: %0.2f (+/- %0.2f)' % (scores.mean(), 2*scores.std()))
# Gather predictions
predictions = cross_validation.cross_val_predict(cv=kf,
estimator=clf,
X=X_train,
y=y_train
)
accuracy_score = metrics.accuracy_score(y_train, predictions)
print('accuracy score: '+str(accuracy_score))
confusion_matrix = metrics.confusion_matrix(y_train, predictions)
class_names = encoder.classes_.tolist()
#Train the classifier
clf.fit(X=X_train, y=y_train)
model = {'classifier': clf, 'classes': encoder.classes_, 'scaler': X_scaler}
# Save classifier to disk
pickle.dump(model, open('model.sav', 'wb'))
# Plot non-normalized confusion matrix
plt.figure()
plot_confusion_matrix(confusion_matrix, classes=encoder.classes_,
title='Confusion matrix, without normalization')
# Plot normalized confusion matrix
plt.figure()
plot_confusion_matrix(confusion_matrix, classes=encoder.classes_, normalize=True,
title='Normalized confusion matrix')
plt.show()
如果支持向量机训练得当, 则应该能够较准确的对通过聚类方法分段的物体进行分类。
如下图所示, 项目首先在机器人面前给出一组不同种类的物体, 这些物体可以分为两组, 一组要放到红色的筐中, 一组要放入蓝色的筐中,机器人的摄像头会记录这组物体的点云作为输入, 通过前边介绍的一系列步骤, 机器人能够识别出这些物体,换句话说能够给出这些物体的标签。通过标签就能区分其所属的组, 再通过点云计算其质心坐标,最后控制机器人的手臂在质心坐标出抓取, 在所属筐的坐标位置放下物体。