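Code Walkthrough: Monocular Depth Estimation with Deep Learning (2)
Let's pick up where we left off and keep going through depth.py.
First, the _image_montage() function: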
def _image_montage(imgs, min, max):
    imgs = imgutil.bxyc_from_bcxy(imgs)
    return imgutil.montage(
        imgutil.scale_values(imgs, min=min, max=max),
        border=1)
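It is easy to see what this does to the RGB images: it reorders the axes from (batch, channel, x, y) to (batch, x, y, channel), rescales the pixel values using the given min and max, and tiles the images into a single montage with a 1-pixel border. So it is value scaling for display rather than quantization.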
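The imgutil helpers are not shown in this post, so here is a rough sketch of what the axis reordering and value scaling probably amount to. Both functions below are hypothetical stand-ins written from the function names alone (the parameters vmin and vmax are mine, and the clipping to [0, 1] is an assumption); they are not the actual imgutil code.

```python
import numpy as np

def bxyc_from_bcxy(imgs):
    # Hypothetical stand-in: move the channel axis from position 1 to the end,
    # (batch, channel, x, y) -> (batch, x, y, channel).
    return imgs.transpose((0, 2, 3, 1))

def scale_values(x, vmin, vmax):
    # Hypothetical stand-in: map [vmin, vmax] linearly onto [0, 1].
    # Clipping out-of-range values is an assumption.
    scaled = (x - vmin) / float(vmax - vmin)
    return np.clip(scaled, 0.0, 1.0)

imgs = np.random.rand(2, 3, 4, 5) * 255      # two tiny 3-channel "images"
out = scale_values(bxyc_from_bcxy(imgs), vmin=0, vmax=255)
print(out.shape)                             # channel axis is now last
```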
Next, the _depth_montage() function:
def _depth_montage(depths):
    if depths.ndim == 4:
        assert depths.shape[1] == 1
        depths = depths[:,0,:,:]
    #depths = imgutil.scale_values(depths, min=-2.5, max=2.5)
    #depths = map(imgutil.scale_values, depths)
    masks = []
    for i in xrange(len(depths)):
        x = depths[i]
        mask = x != x.min()
        masks.append(mask)
        x = x[mask]
        if len(x) == 0:
            d = np.zeros_like(depths[i])
        else:
            d = imgutil.scale_values(depths[i], min=x.min(), max=x.max())
        depths[i] = d
    depths = plt.cm.jet(depths)[...,:3]
    for i in xrange(len(depths)):
        for c in xrange(3):
            depths[i, :, :, c][masks[i] == 0] = 0.2
    return imgutil.montage(depths, border=1)
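Here is what this code tells us:
1. The first for loop rescales each depth map separately. It builds a mask of the pixels that are not equal to the image minimum (the minimum is presumably used as a marker for missing depth), then scales the map to the range of the remaining pixels. If no pixel survives the mask, the map is replaced with zeros.
2. After plt.cm.jet turns the depth values into RGB colors, the second for loop sets all three color channels of the masked-out pixels to 0.2, so pixels without valid depth show up as dark gray in the montage.
3. Note that depths may be 4-dimensional. In that case the second axis must have size 1 and is dropped.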
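The masking and rescaling described above can be tried out on a tiny array. This is only a sketch: normalize_depth is a name I made up, a plain min-max scaling with clipping stands in for imgutil.scale_values, and a grayscale stack stands in for the jet colormap so that the example needs nothing beyond NumPy.

```python
import numpy as np

def normalize_depth(depth):
    """Rescale one depth map to [0, 1], ignoring pixels equal to the minimum."""
    mask = depth != depth.min()          # True where the pixel counts as valid
    valid = depth[mask]
    if valid.size == 0 or valid.min() == valid.max():
        # Nothing to scale (the second condition is my own guard against 0/0).
        return np.zeros_like(depth), mask
    lo, hi = valid.min(), valid.max()
    scaled = np.clip((depth - lo) / (hi - lo), 0.0, 1.0)
    return scaled, mask

depth = np.array([[0.0, 1.0, 3.0],
                  [0.0, 2.0, 5.0]])      # 0.0 plays the role of "missing"
scaled, mask = normalize_depth(depth)

# Grayscale stand-in for plt.cm.jet(...)[..., :3]: shape (h, w, 3).
rgb = np.stack([scaled, scaled, scaled], axis=-1)
rgb[~mask] = 0.2                         # paint invalid pixels dark gray
print(rgb[:, :, 0])
```

The pixels that held the minimum value end up at 0.2 in every channel, while the valid pixels are spread over the full [0, 1] range.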
Now a simple one, the _zero_pad_batch() function:
def _zero_pad_batch(batch, bsize):
    assert len(batch) <= bsize
    if len(batch) == bsize:
        return batch
    n = batch.shape[0]
    shp = batch.shape[1:]
    return np.concatenate((batch, np.zeros((bsize - n,) + shp,
                                           dtype=batch.dtype)))
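This one is easy: it is zero padding. If the batch holds fewer than bsize items, all-zero entries of the same shape and dtype are appended along the first axis until it holds exactly bsize.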
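A quick way to convince yourself of the padding behavior is to run the same logic on a small array. The function below repeats the logic of _zero_pad_batch (without the leading underscore) so that the snippet runs on its own; the batch shape is made up for the demo.

```python
import numpy as np

def zero_pad_batch(batch, bsize):
    n = batch.shape[0]
    assert n <= bsize
    if n == bsize:
        return batch
    # Append (bsize - n) all-zero items with the same per-item shape and dtype.
    pad = np.zeros((bsize - n,) + batch.shape[1:], dtype=batch.dtype)
    return np.concatenate((batch, pad))

batch = np.ones((3, 3, 4, 5), dtype=np.uint8)   # 3 items, but bsize is 8
padded = zero_pad_batch(batch, 8)
print(padded.shape, padded.dtype)
```

The first three items are untouched and the remaining five are zeros, which is why infer_depth() later keeps only the first n results.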
With that warm-up done, let's move on to the more important code: the methods of class machine.
class machine(Machine):
    def __init__(self, conf):
        Machine.__init__(self, conf)
The first method of the class is infer_depth():
def infer_depth(self, images):
    '''
    Infers depth maps for a list of 320x240 images.
    images is a nimgs x 240 x 320 x 3 numpy uint8 array.
    returns depths (nimgs x 55 x 74) corresponding to the center box
    in the original rgb image.
    '''
    images = images.transpose((0,3,1,2))
    (nimgs, nc, nh, nw) = images.shape
    assert (nc, nh, nw) == (3, 240, 320)  # input images are (nimgs, 3, 240, 320)
    (input_h, input_w) = self.input_size  # image size the network takes as input
    (output_h, output_w) = self.output_size  # size of the network's output map
    bsize = self.bsize
    b = 0
    # pred_depth is the output, a symbolic Tensor variable
    v = self.vars
    pred_depth = self.inverse_depth_transform(self.fine.pred_mean)
    infer_f = theano.function([v.images], pred_depth)
    depths = np.zeros((nimgs, output_h, output_w), dtype=np.float32)
    # center box of an image: rows i0:i1 and columns j0:j1
    # (Python 2 code, so dh/2 and dw/2 are integer divisions)
    dh = nh - input_h
    dw = nw - input_w
    (i0, i1) = (dh/2, nh - dh/2)
    (j0, j1) = (dw/2, nw - dw/2)
    # infer depth for images in batches
    b = 0
    while b < nimgs:
        batch = images[b:b+bsize]
        n = len(batch)
        if n < bsize:
            batch = _zero_pad_batch(batch, bsize)
        # crop to network input size
        batch = batch[:, :, i0:i1, j0:j1]
        # infer depth with nnet
        depths[b:b+n] = infer_f(batch)[:n]
        b += n
    return depths
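Here is what this code tells us:
1. The purpose of infer_depth() is to estimate the depth of the input images.
2. It reads the network's input and output sizes from self.input_size and self.output_size, and crops the center box of each 240 x 320 image down to the input size.
3. The while loop is straightforward: it processes the images one batch at a time and calls infer_f() to estimate the depth of each batch. A short last batch is zero-padded to bsize, and only the first n results are kept. As for infer_f() itself, it is defined just before the loop:
v = self.vars
pred_depth = self.inverse_depth_transform(self.fine.pred_mean)
infer_f = theano.function([v.images], pred_depth)
My first guess was that infer_f is some kind of handle. More precisely, theano.function compiles the symbolic graph that computes pred_depth from v.images into a callable Python function, so infer_f(batch) runs the network forward on a batch.
4. The inverse_depth_transform() function is analyzed next.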
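The center-crop and batching logic of infer_depth() can be sketched without Theano. In the snippet below, fake_infer is a made-up stand-in for the compiled infer_f, and the sizes (228 x 304 input, 55 x 74 output, batch size 4) are assumptions chosen for the demo; only the 55 x 74 output size appears in the docstring above. The code uses // so that it also runs on Python 3.

```python
import numpy as np

def center_crop_bounds(full, crop):
    # Same arithmetic as (dh/2, nh - dh/2) in infer_depth(), with explicit
    # integer division. Assumes the difference full - crop is even.
    d = full - crop
    return d // 2, full - d // 2

def infer_in_batches(images, infer_f, bsize, input_size, output_size):
    nimgs, nc, nh, nw = images.shape
    i0, i1 = center_crop_bounds(nh, input_size[0])
    j0, j1 = center_crop_bounds(nw, input_size[1])
    out = np.zeros((nimgs,) + tuple(output_size), dtype=np.float32)
    b = 0
    while b < nimgs:
        batch = images[b:b + bsize]
        n = len(batch)
        if n < bsize:                      # zero-pad a short last batch
            pad = np.zeros((bsize - n,) + batch.shape[1:], dtype=batch.dtype)
            batch = np.concatenate((batch, pad))
        batch = batch[:, :, i0:i1, j0:j1]  # crop to the network input size
        out[b:b + n] = infer_f(batch)[:n]  # drop results for the padding
        b += n
    return out

def fake_infer(batch):
    # Hypothetical stand-in for the compiled Theano function: returns each
    # image's mean intensity, broadcast to the output size.
    assert batch.shape == (4, 3, 228, 304)
    means = batch.reshape(len(batch), -1).mean(axis=1)
    return np.ones((len(batch), 55, 74), dtype=np.float32) * means[:, None, None]

images = np.zeros((6, 3, 240, 320), dtype=np.uint8)
for k in range(6):
    images[k] = k                          # image k is filled with the value k
depths = infer_in_batches(images, fake_infer, bsize=4,
                          input_size=(228, 304), output_size=(55, 74))
print(depths.shape)
```

With 6 images and a batch size of 4, the second batch holds only 2 real images. The stand-in still sees a full batch of 4, and the two padded results are thrown away.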
OK, on to inverse_depth_transform():
def inverse_depth_transform(self, logdepths):
    # map network output log depths back to depth
    # output bias is init'd with the mean, and output is logdepth / stdev
    return T.exp(logdepths * self.meta.logdepths_std)
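From this code we can infer:
1. The network outputs log depth, and according to the comment it is log depth divided by its standard deviation. That is why the output is first multiplied by self.meta.logdepths_std.
2. Taking the exponential then undoes the logarithm and gives back the depth itself.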
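To check that reading, here is a NumPy round trip. The forward transform below is my assumption based on the comment "output is logdepth / stdev", and the value of logdepths_std is made up; the real one comes from self.meta.

```python
import numpy as np

logdepths_std = 0.45   # hypothetical value, for illustration only

def depth_transform(depth):
    # Assumed forward mapping: log depth scaled by its standard deviation.
    return np.log(depth) / logdepths_std

def inverse_depth_transform(logdepths):
    # NumPy version of T.exp(logdepths * self.meta.logdepths_std).
    return np.exp(logdepths * logdepths_std)

depth = np.array([0.5, 1.0, 2.0, 8.0])     # depths in arbitrary units
net_out = depth_transform(depth)           # what the network would output
recovered = inverse_depth_transform(net_out)
print(np.allclose(recovered, depth))
```

A depth of 1.0 maps to a network output of 0, and the inverse transform recovers the original depths up to floating-point error.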
Next time we'll go through the remaining functions!