Write Your Own PRISMA: "Blend Any Two Images Together"

Original post: http://blog.askfermi.me/2016/09/27/diy-prisma/

About two months ago, an app called PRISMA suddenly went viral on WeChat Moments, soon followed by richer variants such as Ostagram; it turns a photo into the style of a famous painting. Honestly, I was surprised it blew up, because I had just come across the underlying paper and an open-source implementation, and a classmate had even built one in the Internet Programming course back in June. Today we will put together an upgraded PRISMA of our own: not just famous paintings, but any two images can be blended together.

Since this is not meant to be a rigorous academic write-up, this post introduces the relevant paper and a few open-source projects, then uses one of those projects to build a simple PRISMA-like application. Think of it as a field report of the pitfalls along the way. Only the backend part is covered here.

How It Works

PRISMA is built on convolutional neural networks; the paper behind it is A Neural Algorithm of Artistic Style. Our project is based on a Torch implementation of that paper, which its author has open-sourced on GitHub as Neural Style. Once it is installed on our system, a few simple steps are all it takes to do what PRISMA does.

Hardware Requirements

The convolutional neural network (CNN) behind PRISMA is demanding on hardware. In research and industry settings, a reasonably high-end graphics card running CUDA is usually needed to finish in acceptable time. The author of Neural Style provides CUDA support as well, so a decent GPU is the recommended setup.

In my tests, about 6–8 GB of RAM is needed to run Neural Style comfortably in CPU mode.

Running this program on a low-memory machine is strongly discouraged.
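A quick way to check whether your machine clears that bar before committing to a long run:

# show total and available memory; in CPU mode you want several free gigabytes
free -h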

Installation

The author of Neural Style provides installation docs, but you can still run into plenty of problems. The installation flow I recommend is as follows (using Ubuntu as an example):

Upgrading GCC

GCC 5 is a required component. I initially failed with both gcc 4.8 and gcc 4.9, which was a particularly nasty pitfall; only gcc 5 or newer compiles everything correctly.

sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get update
sudo apt-get install gcc-5 g++-5
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-5 60 --slave /usr/bin/g++ g++ /usr/bin/g++-5

Afterwards, gcc -v shows the current version; if it reports version 5, you can move on to the next steps.

Installing Torch and Dependencies

cd ~/
curl -s https://raw.githubusercontent.com/torch/ezinstall/master/install-deps | bash
git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch
./install.sh

Running the last command starts the automatic Torch installation. When it finishes, the environment variables are appended to .bashrc; run source ~/.bashrc to make them take effect. Then type th on the command line; if the Torch banner appears, the installation succeeded.
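Beyond eyeballing the banner, a one-line smoke test can be run from the shell; this assumes the trepl launcher's -e flag, which executes a Lua snippet:

# should print a random 2x2 tensor and exit; an error here means Torch is not on the PATH
th -e "print(torch.rand(2, 2))"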

Installing loadcaffe

loadcaffe lets Torch load Caffe networks and is another commonly used library. It depends on Google's Protocol Buffer library, so install that first:

sudo apt-get install libprotobuf-dev protobuf-compiler

Then install loadcaffe through luarocks (a Lua package manager):

luarocks install loadcaffe
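To confirm the rock actually installed (and that the protobuf dependency was picked up), a quick check is:

# loadcaffe should show up in the installed rocks, and requiring it should not error
luarocks list | grep loadcaffe
th -e "require 'loadcaffe'; print('loadcaffe OK')"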

Installing Neural-Style

First, clone the repository from GitHub:

cd ~/
git clone https://github.com/jcjohnson/neural-style.git
cd neural-style

Then download the pre-trained network weights, which are fairly large:

sh models/download_models.sh

Once the download finishes, you are basically ready to go.
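As a quick sanity check, just list the models directory; the pre-trained VGG weights run to several hundred megabytes, so a tiny or missing .caffemodel file means the download did not complete:

ls -lh models/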

Usage

The most basic usage:

th neural_style.lua -style_image <image.jpg> -content_image <image.jpg>

This outputs a blended image using the default parameters. We can also pass options to change the behavior and cover essentially all of PRISMA's features; a fuller example invocation is sketched after the option list below.

Options:

  • image_size: Maximum side length (in pixels) of the generated image. Default is 512.
  • style_blend_weights: The weight for blending the style of multiple style images, as a comma-separated list, such as -style_blend_weights 3,7. By default all style images are equally weighted.
  • gpu: Zero-indexed ID of the GPU to use; for CPU mode set -gpu to -1.

Optimization options:

  • content_weight: How much to weight the content reconstruction term. Default is 5e0.
  • style_weight: How much to weight the style reconstruction term. Default is 1e2.
  • tv_weight: Weight of total-variation (TV) regularization; this helps to smooth the image. Default is 1e-3. Set to 0 to disable TV regularization.
  • num_iterations: Default is 1000.
  • init: Method used to initialize the generated image; one of random or image. Default is random, which uses a noise initialization as in the paper; image initializes with the content image.
  • optimizer: The optimization algorithm to use; either lbfgs or adam; default is lbfgs. L-BFGS tends to give better results, but uses more memory. Switching to ADAM will reduce memory usage; when using ADAM you will probably need to play with other parameters to get good results, especially the style weight, content weight, and learning rate; you may also want to normalize gradients when using ADAM.
  • learning_rate: Learning rate to use with the ADAM optimizer. Default is 1e1.
  • normalize_gradients: If this flag is present, style and content gradients from each layer will be L1 normalized. Idea from andersbll/neural_artistic_style.

Output options:

  • output_image: Name of the output image. Default is out.png.
  • print_iter: Print progress every print_iter iterations. Set to 0 to disable printing.
  • save_iter: Save the image every save_iter iterations. Set to 0 to disable saving intermediate results.

Layer options:

  • content_layers: Comma-separated list of layer names to use for content reconstruction. Default is relu4_2.
  • style_layers: Comma-separated list of layer names to use for style reconstruction. Default is relu1_1,relu2_1,relu3_1,relu4_1,relu5_1.

Other options:

  • style_scale: Scale at which to extract features from the style image. Default is 1.0.
  • original_colors: If you set this to 1, then the output image will keep the colors of the content image.
  • proto_file: Path to the deploy.txt file for the VGG Caffe model.
  • model_file: Path to the .caffemodel file for the VGG Caffe model. Default is the original VGG-19 model; you can also try the normalized VGG-19 model used in the paper.
  • pooling: The type of pooling layers to use; one of max or avg. Default is max. The VGG-19 model uses max pooling layers, but the paper mentions that replacing these layers with average pooling layers can improve the results. I haven't been able to get good results using average pooling, but the option is here.
  • backend: nn, cudnn, or clnn. Default is nn. cudnn requires cudnn.torch and may reduce memory usage. clnn requires cltorch and clnn.
  • cudnn_autotune: When using the cuDNN backend, pass this flag to use the built-in cuDNN autotuner to select the best convolution algorithms for your architecture. This will make the first iteration a bit slower and can take a bit more memory, but may significantly speed up the cuDNN backend.
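To make these options concrete, here is a minimal sketch of a fuller invocation in CPU mode. The image file names are placeholders of my own (use any content/style images you like); the flags are exactly the ones documented above:

# CPU mode (-gpu -1); a smaller image size keeps memory usage and runtime down
th neural_style.lua \
  -content_image my_photo.jpg \
  -style_image starry_night.jpg \
  -output_image blended.png \
  -image_size 384 \
  -num_iterations 500 \
  -gpu -1

On a machine with a CUDA GPU and cuDNN installed, swapping -gpu -1 for -gpu 0 -backend cudnn (optionally with -cudnn_autotune) is the usual way to speed this up.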

Results

I tested the two sample images below, running 50, 100, and 200 iterations to compare the effect. The results are as follows:
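Rather than launching three separate runs, one way to reproduce this kind of comparison in a single run is the -save_iter option documented above; this is just a sketch with placeholder file names:

# stop at 200 iterations and write an intermediate snapshot every 50 iterations,
# so the earlier-iteration images appear alongside the final output
th neural_style.lua \
  -content_image content.jpg \
  -style_image style.jpg \
  -num_iterations 200 \
  -save_iter 50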

Content Image: [image]

Style Image: [image]

After 50 iterations: [image]

After 100 iterations: [image]

After 200 iterations: [image]
