MXNet study notes (1): image functions

Reference: https://mxnet.incubator.apache.org/api/python/image/image.html#mxnet.image.imread

Note that these are functions of mxnet.image, not of Gluon.

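1. Difference between mxnet.image.imdecode and mxnet.image.imread

Both use the C++ OpenCV backend to process images. imdecode decodes an image that is already in memory into an NDArray, so the file's bytes must be read beforehand; imread reads the file and decodes it in one step. Both accept flag=0 to load the image as grayscale and to_rgb=0 to keep OpenCV's native BGR channel order.

Images loaded this way have pixel values in 0~255 and shape (H, W, C). A Gluon network, however, expects floating-point input scaled to 0~1 (and, for ImageNet-pretrained models, normalized with the mean and std) with shape (n, C, H, W), so the image must be converted before it is fed to the network: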

import mxnet
from mxnet import nd

img = mxnet.image.imdecode(open("dog.jpg", "rb").read())  # decode bytes that were read beforehand
img = mxnet.image.imread("dog.jpg")  # read and decode in one step
def transform(data):  # for ImageNet-pretrained models
    # (H, W, C) -> (C, H, W) -> (1, C, H, W)
    data = data.transpose((2, 0, 1)).expand_dims(axis=0)
    rgb_mean = nd.array([0.485, 0.456, 0.406]).reshape(1, 3, 1, 1)
    rgb_std = nd.array([0.229, 0.224, 0.225]).reshape(1, 3, 1, 1)
    return (data.astype('float32') / 255 - rgb_mean) / rgb_std
input_image = transform(img)  # can now be fed directly to a Gluon network

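2. Difference between cv2.imread and mxnet.image.imread

The former goes through the cv2 Python bindings, while the latter calls the C++ version of OpenCV from MXNet. The former returns a NumPy array, the latter an NDArray. The former returns channels in BGR order, the latter in RGB order by default.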
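
To make the channel-order difference concrete, here is a minimal NumPy-only sketch (it does not need OpenCV or MXNet). The tiny array `bgr` is a made-up stand-in for what cv2.imread would return; reversing the last axis gives the RGB order that mxnet.image.imread produces by default.

```python
import numpy as np

# Hypothetical 2x2 image in BGR order, as cv2.imread would return it:
# channel 0 is blue, so this image is pure blue.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[..., 0] = 255

# Reversing the channel axis converts BGR to RGB (and back again).
rgb = bgr[..., ::-1]

print(bgr[0, 0].tolist())  # blue is first in BGR
print(rgb[0, 0].tolist())  # blue is last in RGB
```

If cv2 is already in use, cv2.cvtColor(img, cv2.COLOR_BGR2RGB) performs the same conversion.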

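3. mxnet.image.imresize, mxnet.image.resize_short

The former resizes the image to exactly the given width and height, ignoring the aspect ratio. The latter keeps the aspect ratio and resizes the image so that its shorter edge equals the given size.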
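
The short-edge rule can be sketched in plain Python. The helper name `short_side_target` is hypothetical, and the use of floor division for the longer edge is my assumption about the rounding; only the size computation is shown, not the actual pixel resampling.

```python
def short_side_target(h, w, size):
    """Return (new_h, new_w) so that the shorter edge becomes `size`
    and the aspect ratio is kept (longer edge rounded down)."""
    if h > w:
        # Width is the shorter edge.
        return size * h // w, size
    # Height is the shorter edge (or the image is square).
    return size, size * w // h


print(short_side_target(480, 640, 224))  # landscape image
print(short_side_target(640, 480, 224))  # portrait image
```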

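4. mxnet.image.scale_down

If the requested crop size (w, h) is larger than the image's (w, h), the crop size is scaled down proportionally so that it fits inside the image.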
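
A minimal pure-Python sketch of this rule, assuming both sizes are given as (width, height) and the result is truncated to integers. The function name `scale_down_sketch` is hypothetical; it is a reimplementation for illustration, not the library code.

```python
def scale_down_sketch(src_size, size):
    """Shrink the crop `size` (w, h) proportionally until it fits
    inside the image `src_size` (w, h)."""
    w, h = size
    sw, sh = src_size
    if sh < h:
        # Crop is too tall: clamp the height, scale the width by the same ratio.
        w, h = float(w * sh) / h, sh
    if sw < w:
        # Crop is too wide: clamp the width, scale the height by the same ratio.
        w, h = sw, float(h * sw) / w
    return int(w), int(h)


print(scale_down_sketch((640, 480), (720, 120)))  # crop wider than the image
print(scale_down_sketch((640, 480), (224, 224)))  # crop already fits
```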

5. mxnet.image.color_normalize(src, mean, std=None)

Normalizes the image with the given mean and std. The input is an NDArray in RGB channel order.
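
The arithmetic behind this is a per-channel (src - mean) / std. The sketch below uses NumPy as a stand-in for NDArray, and the function name and the numbers are illustrative only. With an (H, W, C) image, a length-3 mean and std broadcast over the last (channel) axis.

```python
import numpy as np


def color_normalize_sketch(src, mean, std=None):
    """Subtract the per-channel mean and, if given, divide by the per-channel std."""
    out = src - mean
    if std is not None:
        out = out / std
    return out


# Hypothetical 2x2 RGB image with every pixel value equal to 50.
img = np.full((2, 2, 3), 50.0, dtype=np.float32)
mean = np.array([10.0, 20.0, 30.0], dtype=np.float32)  # illustrative values
std = np.array([2.0, 4.0, 5.0], dtype=np.float32)      # illustrative values

out = color_normalize_sketch(img, mean, std)
print(out[0, 0])
```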

6. class mxnet.image.ImageIter()

class mxnet.image.ImageIter(
                            batch_size,
                            data_shape,  # only 3-channel RGB is supported
                            label_width=1, 
                            path_imgrec=None,
                            path_imglist=None, 
                            path_root=None, 
                            path_imgidx=None, 
                            shuffle=False, 
                            part_index=0, 
                            num_parts=1, 
                            aug_list=None, 
                            imglist=None, 
                            data_name ='data', 
                            label_name ='softmax_label', 
                            dtype='float32', 
                            last_batch_handle='pad', 
                            **kwargs
                            )

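This is a data iterator with a large number of built-in augmentation operations. It can read data either from .rec files or from raw image files.

Use the path_imgrec parameter to load a .rec file; use the path_imglist parameter (together with path_root) to load raw image files.

To use data partitioning (for distributed training) or shuffling with a .rec file, specify the path_imgidx parameter.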
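
For the raw-image route, the list file passed as path_imglist is, as far as I know, a tab-separated text file with one image per line: an integer index, one or more labels, and a path relative to path_root. The sketch below only formats and parses such lines in plain Python; the helper names and the file paths are hypothetical.

```python
def format_lst_line(index, labels, rel_path):
    """Build one list-file line: index <TAB> label(s) <TAB> relative path."""
    fields = [str(int(index))]
    fields += ["{:f}".format(float(x)) for x in labels]
    fields.append(rel_path)
    return "\t".join(fields) + "\n"


def parse_lst_line(line):
    """Split one list-file line back into (index, labels, relative path)."""
    parts = line.strip().split("\t")
    return int(parts[0]), [float(x) for x in parts[1:-1]], parts[-1]


# Hypothetical two-image dataset with a single class label per image.
lines = [
    format_lst_line(0, [0], "cat/001.jpg"),
    format_lst_line(1, [1], "dog/001.jpg"),
]
for line in lines:
    print(parse_lst_line(line))
```

With label_width=1, as in the signature above, each line carries exactly one label.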

References

http://mxnet.incubator.apache.org/versions/master/api/python/image/image.html#mxnet.image.ImageIter
https://blog.csdn.net/u014380165/article/details/74906061

A usage example

import mxnet as mx
import numpy as np
import matplotlib.pyplot as plt

data_iter = mx.image.ImageIter(batch_size=4, data_shape=(3, 227, 227),
                               path_imgrec="./data/caltech.rec",
                               path_imgidx="./data/caltech.idx")

# data_iter is of type mxnet.image.ImageIter
# reset() resets the iterator to the beginning of the data
data_iter.reset()

# batch is of type mxnet.io.DataBatch, because that is what next() returns
batch = data_iter.next()

# data is an NDArray holding the first batch; since batch_size is 4, its shape is 4*3*227*227
data = batch.data[0]

# This loop reads each image in the batch and displays it
for i in range(4):
    plt.subplot(1, 4, i + 1)
    # (C, H, W) -> (H, W, C), as expected by imshow
    plt.imshow(data[i].asnumpy().astype(np.uint8).transpose((1, 2, 0)))
plt.show()

Using mx.image.CreateAugmenter() for image augmentation

# args and rank are assumed to come from the surrounding training script (not shown here)
train = mx.image.ImageIter(
        batch_size   = args.batch_size,
        data_shape   = (3, 224, 224),
        label_width  = 1,
        path_imglist = args.data_train,
        path_root    = args.image_train,
        part_index   = rank,
        shuffle      = True,
        data_name    = 'data',
        label_name   = 'softmax_label',
        aug_list     = mx.image.CreateAugmenter((3, 224, 224), resize=224, rand_crop=True,
                                                rand_mirror=True, mean=True))
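
The part_index argument used above selects which shard of the data a worker reads. Below is a rough pure-Python sketch of the idea, assuming the sample list is split into num_parts contiguous chunks of equal size and any remainder is dropped. The helper name `shard` and that exact policy are my assumptions, not confirmed library behavior.

```python
def shard(seq, part_index, num_parts):
    """Return the contiguous chunk of `seq` that belongs to worker `part_index`."""
    assert 0 <= part_index < num_parts
    chunk = len(seq) // num_parts  # equal-size chunks; leftover samples are dropped
    return seq[part_index * chunk:(part_index + 1) * chunk]


samples = list(range(10))  # stand-in for 10 image indices
for rank in range(3):
    print(rank, shard(samples, rank, num_parts=3))
```

Note that num_parts defaults to 1 in the ImageIter signature, so a distributed setup would presumably pass num_parts (the number of workers) alongside part_index.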

Settings and parameters of image.CreateAugmenter

image.CreateAugmenter(
                data_shape,
                resize=0,
                rand_crop=False,
                rand_resize=False,
                rand_mirror=False,
                mean=None,  # if set to True, the default ImageNet mean is used
                std=None,  # likewise: True selects the default ImageNet std
                brightness=0,
                contrast=0,
                saturation=0,
                hue=0,
                pca_noise=0,
                rand_gray=0,
                inter_method=2
                )
#Creates an augmenter list.

Parameters:

  • data_shape (tuple of int) – Shape for output data
  • resize (int) – Resize shorter edge if larger than 0 at the beginning
  • rand_crop (bool) – Whether to enable random cropping other than center crop
  • rand_resize (bool) – Whether to enable random sized cropping, require rand_crop to be enabled
  • rand_gray (float) – [0, 1], probability to convert to grayscale for all channels, the number of channels will not be reduced to 1
  • rand_mirror (bool) – Whether to apply horizontal flip to image with probability 0.5
  • mean (np.ndarray or None) – Mean pixel values for [r, g, b]
  • std (np.ndarray or None) – Standard deviations for [r, g, b]
  • brightness (float) – Brightness jittering range (percent)
  • contrast (float) – Contrast jittering range (percent)
  • saturation (float) – Saturation jittering range (percent)
  • hue (float) – Hue jittering range (percent)
  • pca_noise (float) – Pca noise level (percent)
  • inter_method (int, default=2 (Area-based)) – Interpolation method for all resizing operations. Possible values:
      0: Nearest-neighbor interpolation.
      1: Bilinear interpolation.
      2: Area-based (resampling using pixel area relation). It may be a preferred method for image decimation, as it gives moire-free results, but when the image is zoomed it is similar to the nearest-neighbor method. (Used by default.)
      3: Bicubic interpolation over a 4x4 pixel neighborhood.
      4: Lanczos interpolation over an 8x8 pixel neighborhood.
      9: Cubic for enlarging, area for shrinking, bilinear for others.
      10: Randomly select from the interpolation methods mentioned above.
    Note: When shrinking an image, it will generally look best with area-based interpolation, whereas when enlarging an image, it will generally look best with bicubic (slow) or bilinear (faster but still looks OK) interpolation.
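
CreateAugmenter returns a list of augmenters, and the iterator applies them to each image one after another. The NumPy-only sketch below imitates that pattern with two simplified stand-ins, a center crop and a horizontal flip with probability p. The function names are hypothetical and are not the library's augmenter classes.

```python
import random

import numpy as np


def make_center_crop(w, h):
    """Return an augmenter that crops the central w x h window of an (H, W, C) image."""
    def aug(img):
        img_h, img_w = img.shape[:2]
        x0 = (img_w - w) // 2
        y0 = (img_h - h) // 2
        return img[y0:y0 + h, x0:x0 + w, :]
    return aug


def make_horizontal_flip(p):
    """Return an augmenter that mirrors the image left-right with probability p."""
    def aug(img):
        if random.random() < p:
            return img[:, ::-1, :]
        return img
    return aug


def apply_augs(img, aug_list):
    """Apply every augmenter in order, feeding each output into the next."""
    for aug in aug_list:
        img = aug(img)
    return img


img = np.arange(4 * 6 * 3).reshape(4, 6, 3)  # made-up 4x6 "image"
out = apply_augs(img, [make_center_crop(4, 2), make_horizontal_flip(1.0)])
print(out.shape)
```

Because the augmenters run in list order, cropping before flipping (as here) means the flip only has to touch the cropped window.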
