Vincent Dumoulin, Jonathon Shlens, Manjunath Kudlur
ICLR 2017
construct a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings by reducing a painting to a point in an embedding space
pastiche: an artistic work that imitates the style of another one
automate pastiche/style transfer: render an image in the style of another one
traditional methods: “grow” textures one pixel at a time using non-parametric sampling of pixels in an exemplar image → “growing” textures one patch at a time → …
machine learning methods: neural style (expensive) → feedforward style transfer network (the style transfer network is tied to a single style)
solution: conditional instance normalization (reduces each style image to a point in an embedding space)
style transfer: finding a pastiche image $p$ whose content is similar to that of a content image $c$ but whose style is similar to that of a style image $s$ (high-level features in classifiers tend to correspond to higher levels of abstraction for visualizations)
content similarity: distance between high-level features extracted by a trained classifier
style similarity: distance between Gram matrices $G$ of low-level features as extracted by a trained classifier (the artistic style of a painting may be interpreted as a visual texture)
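The two similarity notions above can be sketched in a few lines. This is a minimal illustration in pure Python, not the paper's implementation: the feature layout (a list of channels, each a flat list of spatial activations), the averaging over positions in the Gram matrix, and the function names `gram_matrix`, `content_distance`, and `style_distance` are all assumptions made for the example. A real pipeline would take the features from a trained classifier and use an array library.

```python
# Minimal sketch in pure Python, for illustration only.
# A feature map is assumed to be a list of C channels, each a flat list of
# H*W activations taken from one layer of a trained classifier.

def gram_matrix(features):
    """Return the C x C matrix of channel-wise inner products,
    averaged over spatial positions (the averaging is an assumption)."""
    n_positions = len(features[0])
    return [
        [sum(a * b for a, b in zip(ch_i, ch_j)) / n_positions
         for ch_j in features]
        for ch_i in features
    ]

def squared_distance(a, b):
    """Sum of squared differences between two equally shaped nested lists."""
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )

def content_distance(pastiche_feats, content_feats):
    """Compare the features themselves, position by position."""
    return squared_distance(pastiche_feats, content_feats)

def style_distance(pastiche_feats, style_feats):
    """Compare Gram matrices, which ignore where activations occur."""
    return squared_distance(gram_matrix(pastiche_feats),
                            gram_matrix(style_feats))

feats = [[1.0, 2.0], [3.0, 4.0]]
shuffled = [[2.0, 1.0], [4.0, 3.0]]  # same activations, positions swapped

print(gram_matrix(feats))                 # [[2.5, 5.5], [5.5, 12.5]]
print(content_distance(shuffled, feats))  # 4.0
print(style_distance(shuffled, feats))    # 0.0
```

The toy example shows why the Gram matrix suits the texture interpretation of style: swapping spatial positions changes the content distance but leaves the style distance at zero.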
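neural style: optimize the pixels of the pastiche $p$ directly to minimize a weighted sum of the content and style distances (expensive: every pastiche needs its own optimization run)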
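Written out, the neural style objective combines the content and style distances defined above. The notation here is assumed for this sketch: $\phi_l(x)$ denotes the classifier activations at layer $l$, $U_l$ the number of units at that layer, $\mathcal{C}$ and $\mathcal{S}$ the sets of content and style layers, and $\lambda_c$, $\lambda_s$ scalar weights.

```latex
\min_{p} \; \mathcal{L}(p) = \lambda_c \, \mathcal{L}_c(p) + \lambda_s \, \mathcal{L}_s(p)

\mathcal{L}_c(p) = \sum_{j \in \mathcal{C}} \frac{1}{U_j}
    \left\| \phi_j(p) - \phi_j(c) \right\|_2^2

\mathcal{L}_s(p) = \sum_{i \in \mathcal{S}} \frac{1}{U_i}
    \left\| G(\phi_i(p)) - G(\phi_i(s)) \right\|_F^2
```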
feed-forward method: style transfer network $T: c \mapsto p$
the network T is tied to one specific painting style
intuition: many styles probably share some degree of computation
train a single conditional style transfer network $T(c, s)$ for $N$ styles
to model a style, it is sufficient to specialize scaling and shifting parameters after normalization to each specific style
all convolutional weights of a style transfer network can be shared across many styles
it is sufficient to tune parameters for an affine transformation after normalization for each style
conditional instance normalization: transform a layer’s activations $x$ into a normalized activation $z$ specific to painting style $s$
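As a rough sketch of the idea: each channel of $x$ is normalized with its own spatial mean and standard deviation, then scaled and shifted with parameters selected by the style, $z = \gamma_s \frac{x - \mu}{\sigma} + \beta_s$. The code below is a pure-Python illustration under stated assumptions, not the authors' code: the function name `conditional_instance_norm`, the list-of-channels layout, the integer style index, and the `eps` stabilizer are all hypothetical choices for the example. Here `gamma` and `beta` are $N \times C$ tables, one row per style, and they are the only per-style parameters.

```python
import math

def conditional_instance_norm(x, gamma, beta, style, eps=1e-5):
    """Normalize each channel of x, then apply a style-specific affine map.

    x:      list of C channels, each a flat list of spatial activations
            (layout assumed for this sketch).
    gamma:  N x C table of per-style scale parameters.
    beta:   N x C table of per-style shift parameters.
    style:  integer index selecting one of the N styles.
    eps:    small constant for numerical stability (an assumption).
    """
    out = []
    for c, channel in enumerate(x):
        # Statistics are computed per channel, over spatial positions only.
        mean = sum(channel) / len(channel)
        var = sum((v - mean) ** 2 for v in channel) / len(channel)
        std = math.sqrt(var + eps)
        scale, shift = gamma[style][c], beta[style][c]
        out.append([scale * (v - mean) / std + shift for v in channel])
    return out

# Two styles, two channels: the activations are shared, only the
# (gamma, beta) row changes with the style.
x = [[1.0, 3.0], [0.0, 4.0]]
gamma = [[1.0, 1.0], [2.0, 0.5]]
beta = [[0.0, 0.0], [1.0, -1.0]]

z_style0 = conditional_instance_norm(x, gamma, beta, style=0)
z_style1 = conditional_instance_norm(x, gamma, beta, style=1)
```

Because the convolutional weights never appear in this function, adding a style in this sketch only means appending one row to `gamma` and one to `beta`.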
the same network architecture as in “Perceptual losses for real-time style transfer and super-resolution”
train the $N$-style network with stochastic gradient descent using the Adam optimizer
for art stylization posed as a feedforward network, the specific network architecture may be unable to take full advantage of its capacity: pruning the architecture leads to qualitatively similar results
the convolutional weights of the style transfer network encode transformations that represent “elements of style”