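Study Notes (Udacity): Cloning Driving Behavior

(From Project 3 of the Udacity Self-Driving Car Nanodegree)

Final goal of the project: implement end-to-end autonomous driving with Keras.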

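Introduction

The term behavioral cloning means copying a behavior. Since the subject here is a self-driving car, the behavior being cloned is driving.

In other words, cloning driving behavior means getting the self-driving car to drive the way we do. Based on the data we train it with, the car should respond in a way similar to that data: how to turn when it reaches a curve, what to do on a straight, whether to speed up or slow down in a given situation, and so on.

So how do we teach a self-driving car to drive like a person? This is really a part of artificial intelligence, because we have to teach the "car" to drive the way a human does.

To get a better feel for what cloning driving behavior means, let's compare it with everyday life.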

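Suppose we have bought a car, so we need a driver's license to drive it. We sign up at a driving school, pay the tuition, slip the instructor a red envelope, do everything we are supposed to, and finally the lessons begin. The instructor gets into the driver's seat and gently (roughly) starts teaching us to drive. He drives up to an S-curve and says: there is a trick to getting through an S-shaped curve! You just go by feel...

Just kidding.

Joking aside, the real question is where that feel for driving comes from. If you think about it, it is a feel that the instructor, a veteran driver, built up through day after day of practice. When he first learned to drive, he probably was not driving by feel.

The feel for driving is the series of judgments a person makes in a lane based on what is happening around them. If the feel is right, meaning the judgments are right, the car is driven well. If the feel goes wrong, the chance of a serious crash is high.

The instructor's feel is exactly what we need the self-driving car to learn. Just as the instructor accumulated practice over time, we need to collect a large amount of the car's input and output data, together with the relevant data about its surroundings.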

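Control module inputs and outputs

In this project we train the control module with supervised learning. That is, we give the self-driving car's AI control module an input together with the label for that input. The label is the steering angle that corresponds to each image. The model we train is a neural network.

So during training, the data fed to the neural network is as follows:

Input

Camera images. (Either RGB or BGR is fine at this point, because the images are converted later with OpenCV.)

The steering angle of the steered (front) wheels for each camera frame.

Once training is complete, the neural network outputs the following:

Output

The steering angle of the steered (front) wheels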
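
To make the supervised setup concrete, here is a minimal sketch of how recorded steering angles (the labels) can be compared with a model's predictions. It computes the mean squared error, which is the loss the training code later in this post passes to Keras as loss='mse'. The function name and the numbers are made up for illustration.

```python
def mean_squared_error(recorded, predicted):
    """Average squared difference between label angles and model outputs."""
    if len(recorded) != len(predicted):
        raise ValueError("each prediction needs exactly one label")
    total = sum((r - p) ** 2 for r, p in zip(recorded, predicted))
    return total / len(recorded)


# Hypothetical steering angles for three frames.
recorded_angles = [0.0, 0.5, -0.5]
predicted_angles = [0.0, 0.25, -0.25]

loss = mean_squared_error(recorded_angles, predicted_angles)
print(loss)
```

The smaller this value, the closer the predicted angles are to what the human driver actually did on the same frames.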

For the details of the approach, see NVIDIA's paper "End to End Learning for Self-Driving Cars".

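Data collection and control module training

Now let's look at how to implement it.

First, the system configuration (library versions) is as follows: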

tensorflow 1.x

tensorflow's python 3.5.x

keras 1.x

anaconda 4.x

anaconda python 3.5 (this must match the Python version that tensorflow uses)

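Speaking of versions, the initial installation was full of pitfalls, and so was installing Python's cv2... I will write up installation notes for Anaconda libraries later.

Enough chatter, on to the main topic.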

First, here is a link to my GitHub. Not a bad-looking guy, eh~

Fred159/My-Udacity-Project3-Clone-Driving-Behavior (github.com)

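Data collection

To train, we need to prepare training data in advance. Here we use the Udacity simulator to obtain the image data and the steering-angle data. The interface looks like this.

Udacity simulator

Entering training mode brings up the screen below. See angle and record? angle is the steering angle, and record starts data collection. Once recording starts, the simulator stores the image data and the steering data separately under a chosen path on the local disk. The images come from virtual cameras mounted on the car's front windshield. (The simulator provides camera data from three angles. I only used the center one here, because the center one is clearly enough, so why use three... it would only add a lot of training time.)

Simulator platform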

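This is the log file opened as a spreadsheet in Excel. The csv library imported in the training code is there to read this file. The CSV file records the steering angle for each image, that is, the input data and the label data.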
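
As a small illustration of reading such a log, the sketch below parses two made-up rows with Python's csv module. It assumes the column order that the training code in this post relies on (three image paths first, then the steering angle in column 3); the remaining columns and all file names are placeholders, not values taken from a real recording.

```python
import csv
import io

# Two made-up rows: center image, left image, right image, steering angle,
# followed by placeholder values for the remaining columns.
SAMPLE_LOG = (
    "IMG\\center_001.jpg,IMG\\left_001.jpg,IMG\\right_001.jpg,0.0,0.9,0,30.1\n"
    "IMG\\center_002.jpg,IMG\\left_002.jpg,IMG\\right_002.jpg,-0.25,0.9,0,30.2\n"
)


def read_center_samples(text):
    """Return (center image file name, steering angle) for every row."""
    rows = csv.reader(io.StringIO(text))
    return [(row[0].split("\\")[-1], float(row[3])) for row in rows]


center_samples = read_center_samples(SAMPLE_LOG)
for file_name, angle in center_samples:
    print(file_name, angle)
```

Every value comes back from csv.reader as a string, which is why the angle has to be converted with float() before it can be used as a label.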

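Steering angle for each image

The image below shows the camera view. In the end, the trained car has to output sensible steering angles from this viewpoint. That feels quite a bit harder than driving ourselves...

Center camera view

Training code

With data collection done, we start training.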

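First we import the libraries we need. os is for specifying paths when reading a large number of images. csv reads the simulator's log, a plain-text comma-separated file that Excel can also open. cv2 is OpenCV, the computer vision library. numpy is the basic numerical library bundled with Anaconda and handles array math. sklearn is used all the time in machine learning, often for data preprocessing. keras is a higher-level API that runs on top of tensorflow, so with simpler code it can drive tensorflow to train a model. matplotlib is the most commonly used plotting library in Python.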

import os
import csv
import cv2
import numpy as np
import sklearn
from keras.models import Sequential, Model
from keras.layers import Flatten, Dense, Lambda, Cropping2D, Dropout, Input
from keras.layers.convolutional import Convolution2D
from keras.layers.pooling import MaxPooling2D
from keras.backend import tf as ktf
from sklearn.utils import shuffle
import matplotlib.image as mpimg

Next we read the image paths and the steering-angle data. Here sklearn is used to randomly split the samples into a training set and a validation set.

samples = []
with open('.\\logfile3\\driving_log.csv') as csvfile:
    reader = csv.reader(csvfile)
    for line in reader:
        samples.append(line)

# split the data set into training data and validation data
from sklearn.model_selection import train_test_split
train_samples, validation_samples = train_test_split(samples, test_size=0.2)
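
For intuition about what test_size=0.2 does, here is a standard-library sketch of a random 80/20 split. It is not how sklearn implements train_test_split; the function name, the seed parameter, and the sample rows are assumptions made for illustration.

```python
import random


def split_samples(samples, test_size=0.2, seed=0):
    """Shuffle a copy of the samples, then hold out a fraction for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * test_size))
    return shuffled[n_validation:], shuffled[:n_validation]


# Ten made-up log rows: [image name, steering angle as text].
rows = [["img_%03d.jpg" % i, str(i / 100.0)] for i in range(10)]
train_rows, validation_rows = split_samples(rows)
print(len(train_rows), len(validation_rows))
```

The important property is that every row ends up in exactly one of the two sets, so the validation loss is measured on frames the network never trained on.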

Specify the input (image) size

IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS = 160, 320, 3
INPUT_SHAPE = (IMAGE_HEIGHT, IMAGE_WIDTH, IMAGE_CHANNELS)

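To train better we need more data, but we do not want to record more. In this situation we usually enlarge the training set by augmenting the data we already have, so we use OpenCV functions to augment the existing data set. The functions are defined as follows.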

# define image augmentation functions
def random_flip(image, steering_angle):
    """
    Randomly flip the image left <-> right, and adjust the steering angle.
    """
    if np.random.rand() < 0.5:
        image = cv2.flip(image, 1)
        steering_angle = -steering_angle
    return image, steering_angle

def random_brightness(image):
    """
    Randomly adjust brightness of the image.
    """
    # HSV (Hue, Saturation, Value) is also called HSB ('B' for Brightness).
    hsv = cv2.cvtColor(image, cv2.COLOR_RGB2HSV)
    ratio = 1.0 + 0.4 * (np.random.rand() - 0.5)
    # clip so that brightened pixels cannot overflow the uint8 range
    hsv[:, :, 2] = np.clip(hsv[:, :, 2] * ratio, 0, 255)
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2RGB)

def random_shadow(image):
    """
    Generates and adds random shadow
    """
    # (x1, y1) and (x2, y2) forms a line
    # xm, ym gives all the locations of the image
    x1, y1 = IMAGE_WIDTH * np.random.rand(), 0
    x2, y2 = IMAGE_WIDTH * np.random.rand(), IMAGE_HEIGHT
    # np.mgrid returns the row indices (y) first, then the column indices (x)
    ym, xm = np.mgrid[0:IMAGE_HEIGHT, 0:IMAGE_WIDTH]
    # mathematically speaking, we want to set 1 below the line and zero otherwise
    # Our coordinate is up side down. So, the above the line:
    # (ym-y1)/(xm-x1) > (y2-y1)/(x2-x1)
    # as x2 == x1 causes zero-division problem, we'll write it in the below form:
    # (ym-y1)*(x2-x1) - (y2-y1)*(xm-x1) > 0
    mask = np.zeros_like(image[:, :, 1])
    mask[(ym - y1) * (x2 - x1) - (y2 - y1) * (xm - x1) > 0] = 1
    # choose which side should have shadow and adjust lightness
    cond = mask == np.random.randint(2)
    s_ratio = np.random.uniform(low=0.2, high=0.5)
    # adjust Lightness (channel 1) in HLS (Hue, Lightness, Saturation)
    hls = cv2.cvtColor(image, cv2.COLOR_RGB2HLS)
    hls[:, :, 1][cond] = hls[:, :, 1][cond] * s_ratio
    return cv2.cvtColor(hls, cv2.COLOR_HLS2RGB)
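
Of the three augmentations, the flip is the only one that also changes the label: mirroring the road turns a left curve into a right curve, so the steering angle changes sign. The sketch below shows the same idea on a tiny array, using NumPy slicing in place of cv2.flip(image, 1); the function name and the sample values are made up for illustration.

```python
import numpy as np


def flip_left_right(image, steering_angle):
    """Mirror an H x W x C image horizontally and negate the steering angle."""
    return image[:, ::-1, :], -steering_angle


# A tiny 1 x 3 "image" with a single channel, so the effect is easy to see.
tiny = np.array([[[10], [20], [30]]], dtype=np.uint8)
flipped, angle = flip_left_right(tiny, 0.3)

print(flipped[0, :, 0])  # pixel columns in reversed order
print(angle)
```

The shadow and brightness functions only change how the frame looks, which is why they reuse the recorded angle unchanged.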

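With the cv functions defined, we need to process the existing data. However, training a neural network uses a lot of memory, so we cannot load the whole data set at once. That is why training is usually done in batches. Udacity provides a function called generator for this, which feeds data to the neural network batch by batch as needed. (Without the generator, training kept failing with a memory error.) The generator code is below. The default batch size here is 32, meaning each batch is built from 32 rows of the log.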

# define the generator, since the computer does not have enough memory for all the data
def generator(samples, batch_size=32):
    num_samples = len(samples)
    while 1: # Loop forever so the generator never terminates
        # sklearn.utils.shuffle returns a shuffled copy, so keep the result
        samples = shuffle(samples)
        for offset in range(0, num_samples, batch_size):
            batch_samples = samples[offset:offset+batch_size]
            images = []
            angles = []
            for batch_sample in batch_samples:
                for i in range(3):
                    # columns 0-2 hold the three camera image paths;
                    # all three are labelled with the angle in column 3
                    name = '.\\logfile3\\IMG\\'+batch_sample[i].split('\\')[-1]
                    # cv2.imread returns a BGR image, so convert it to RGB
                    center_image = cv2.imread(name)
                    center_image = cv2.cvtColor(center_image, cv2.COLOR_BGR2RGB)
                    center_angle = float(batch_sample[3])
                    images.append(center_image)
                    angles.append(center_angle)
                    # add random_shadow data into training samples
                    center_image1 = random_shadow(center_image)
                    center_angle1 = float(batch_sample[3])
                    images.append(center_image1)
                    angles.append(center_angle1)
                    # add random_brightness data into training samples
                    center_image2 = random_brightness(center_image)
                    center_angle2 = float(batch_sample[3])
                    images.append(center_image2)
                    angles.append(center_angle2)
                    # add random_flip data into training samples
                    center_image3, center_angle3 = random_flip(center_image, center_angle)
                    images.append(center_image3)
                    angles.append(center_angle3)
            # convert the samples into arrays
            X_train = np.array(images)
            y_train = np.array(angles)
            yield sklearn.utils.shuffle(X_train, y_train)
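
Stripped of the image handling, the batching pattern is small enough to try on plain numbers. The following standard-library sketch loops forever, reshuffles on every pass, and yields one slice at a time. The names and the seed parameter are made up for illustration, and itertools.islice is used only to stop the endless generator after a few batches.

```python
import itertools
import random


def batch_generator(samples, batch_size, seed=0):
    """Yield batches forever, reshuffling the samples before every pass."""
    rng = random.Random(seed)
    samples = list(samples)
    while True:
        rng.shuffle(samples)
        for offset in range(0, len(samples), batch_size):
            # The last slice of a pass may be shorter than batch_size.
            yield samples[offset:offset + batch_size]


# Take four batches from five samples: one full pass (2 + 2 + 1 items)
# and the first batch of the next pass.
batches = list(itertools.islice(batch_generator(range(5), batch_size=2), 4))
batch_sizes = [len(batch) for batch in batches]
print(batch_sizes)
```

Only one batch exists in memory at a time, which is what keeps the full image set from having to be loaded at once.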

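With the generator defined, we create the training and validation generators. Here the batch size is set to 50. The exact number is not critical; I chose 50 because of my computer's memory. A larger batch generally speeds up training, but of course it also demands more of the computer.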

# compile and train the model using the generator function
train_generator = generator(train_samples, batch_size=50)
validation_generator = generator(validation_samples, batch_size=50)
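
Because the generator expands every log row into several images, the number of images in one yielded batch is larger than batch_size. Here is a rough calculation, assuming each of the three camera images gets all four variants (original, shadow, brightness, flip) and that frames are 160 x 320 RGB stored as uint8. The helper names are made up for illustration.

```python
CAMERAS_PER_ROW = 3      # the generator loops over columns 0-2 of each row
VARIANTS_PER_IMAGE = 4   # original, shadow, brightness, random flip


def images_per_batch(rows_in_batch):
    """Number of images the generator yields for a batch of log rows."""
    return rows_in_batch * CAMERAS_PER_ROW * VARIANTS_PER_IMAGE


def batch_mebibytes(rows_in_batch, height=160, width=320, channels=3):
    """Approximate size of one yielded uint8 image batch in MiB."""
    n_bytes = images_per_batch(rows_in_batch) * height * width * channels
    return n_bytes / (1024 ** 2)


print(images_per_batch(50))
print(batch_mebibytes(50))
```

Under these assumptions a batch size of 50 means several hundred images per step, which is worth keeping in mind when choosing the number for a machine with limited memory.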

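One more function is defined here: resize. It shrinks the input image to the size we want. For example, if we want to train the network on (32, 100) images while the simulator outputs (160, 320) images, resize turns the large images into small ones. Image size matters a great deal when training a neural network, because it directly determines the number of neurons in the input layer. So the input size the network is configured for must match the size of the images fed to it.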

def resize_img(input):
    from keras.backend import tf as ktf
    return ktf.image.resize_images(input, (32, 100))
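
To see how the image size changes before it reaches the first convolution, the sketch below works through the arithmetic. It uses the crop margins from the Cropping2D layer in the model code of this post ((65, 20) rows and (2, 2) columns) and the (32, 100) resize target; the helper name is made up for illustration.

```python
def cropped_shape(height, width, crop_rows, crop_cols):
    """Height and width left after removing margins from each side."""
    top, bottom = crop_rows
    left, right = crop_cols
    return height - top - bottom, width - left - right


# Simulator frames are 160 x 320; the crop removes sky and car hood.
after_crop = cropped_shape(160, 320, (65, 20), (2, 2))

# The resize step then shrinks the cropped strip to 32 x 100.
after_resize = (32, 100)
values_per_image = after_resize[0] * after_resize[1] * 3  # 3 color channels

print(after_crop, after_resize, values_per_image)
```

Each frame goes from 153,600 values (160 x 320 x 3) down to 9,600, which is where most of the training speed-up from resizing comes from.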

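Now we train the whole network with Keras running on top of tensorflow. There are many ways to set up training; see the official Keras site. I used only the most conventional approach, shown below.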

# Keras model based on the NVIDIA model, with Cropping2D added and some other changes.
model = Sequential()
# crop the images to remove noise; Keras only needs the input shape in the first layer.
model.add(Cropping2D(cropping=((65, 20), (2, 2)), input_shape=(160, 320, 3)))
# resize the image data to speed up training
model.add(Lambda(resize_img))
# normalize the data to the range [-1, 1]
model.add(Lambda(lambda x: x/127.5 - 1))
# convolution layer with 12 filters (size (5,5)), stride = (2,2)
model.add(Convolution2D(12, 5, 5, activation='relu', subsample=(2,2)))
# add a dropout layer to prevent overfitting
model.add(Dropout(0.2))
# convolution layer with 32 filters (size (5,5)), stride = (2,2)
model.add(Convolution2D(32, 5, 5, activation='relu', subsample=(2,2)))
# add a dropout layer to prevent overfitting
model.add(Dropout(0.2))
# convolution layer with 48 filters (size (5,5)), stride = (2,2)
model.add(Convolution2D(48, 5, 5, activation='relu', subsample=(2,2)))
# flatten the feature maps for the fully connected layers
model.add(Flatten())
# dense layer with 100 nodes
model.add(Dense(100, activation='relu'))
# add a dropout layer to prevent overfitting
model.add(Dropout(0.2))
# dense layer with 50 nodes
model.add(Dense(50, activation='relu'))
# add a dropout layer to prevent overfitting
model.add(Dropout(0.2))
# dense layer with 10 nodes
model.add(Dense(10, activation='relu'))
# output layer with 1 node: the steering angle
model.add(Dense(1))
model.summary()

# train the model with the generators
model.compile(loss='mse', optimizer='adam')
model.fit_generator(train_generator,
                    samples_per_epoch=len(train_samples),
                    validation_data=validation_generator,
                    nb_val_samples=len(validation_samples),
                    nb_epoch=5)
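
As a sanity check on the architecture, the sketch below walks a 32 x 100 input through the three convolution layers and computes the size of the flattened vector. It assumes unpadded ('valid') convolutions, which I believe is the Keras 1.x default for Convolution2D when border_mode is not given; the helper names are made up for illustration.

```python
def conv_output_size(size, kernel=5, stride=2):
    """Output length along one axis for an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def feature_map_shapes(height, width, filters=(12, 32, 48)):
    """Shapes (height, width, channels) after each 5x5, stride-2 convolution."""
    shapes = []
    for n_filters in filters:
        height = conv_output_size(height)
        width = conv_output_size(width)
        shapes.append((height, width, n_filters))
    return shapes


shapes = feature_map_shapes(32, 100)
flattened = shapes[-1][0] * shapes[-1][1] * shapes[-1][2]

for shape in shapes:
    print(shape)
print(flattened)
```

If the assumption holds, these numbers should match the output shapes printed by model.summary(), which is a quick way to confirm that the resize step and the convolution settings fit together.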

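Finally we save the trained model in h5 format. Why h5? Because the simulator (Unity) provided by Udacity needs an h5 model to drive the car. I am not sure about the details here either~~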

# save model as model_my.h5
model.save('model_my.h5')

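Finally, run the simulator together with the h5 file from cmd using the command Udacity provides, and you can watch how our driving behavior cloning model performs.

Summary

That wraps up the introduction to cloning driving behavior. There is still a lot in this project that could be improved and made more interesting. For example, the input data could include more than the steering angle: accelerator pedal position, brake pedal position, acceleration and deceleration, the radius of curvature of the lane lines, and so on. The map could also have more obstacles and more moving objects. All in all, it is a very interesting project. That is also why I like Udacity.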
