Exact Feature Distribution Matching for Arbitrary Style Transfer and Domain Generalization


An exact feature distribution matching algorithm for arbitrary style transfer (AST) and domain generalization (DG)


Arbitrary style transfer (AST) and domain generalization (DG) are important yet challenging visual learning tasks, which can be cast as a feature distribution matching problem.

Both AST and DG can be cast as feature distribution matching problems.

With the assumption of Gaussian feature distribution, conventional feature distribution matching methods usually match the mean and standard deviation of features.

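Under the Gaussian feature distribution assumption, conventional feature distribution matching methods usually match the mean and standard deviation of features.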

In this work, we, for the first time to our best knowledge, propose to perform Exact Feature Distribution Matching (EFDM) by exactly matching the empirical Cumulative Distribution Functions (eCDFs) of image features, which could be implemented by applying the Exact Histogram Matching (EHM) in the image feature space.


Particularly, a fast EHM algorithm, named Sort-Matching, is employed to perform EFDM in a plug-and-play manner with minimal cost.


Code: https://github.com/YBZh/EFDM



In arbitrary style transfer (AST) [12, 21], image styles can be interpreted as feature distributions, and style transfer can be achieved by cross-distribution feature matching [25, 34].


By using style transfer techniques to augment training data, one can also address domain generalization (DG) tasks [13, 72].


The most popular method of feature distribution matching is to match the feature mean and standard deviation by assuming that features follow a Gaussian distribution [21, 32, 37, 41, 72].


However, feature distribution matching using only the mean and standard deviation is less accurate.


Motivated by the Glivenko–Cantelli theorem [54], which states that the empirical Cumulative Distribution Function (eCDF) asymptotically converges to the Cumulative Distribution Function when the number of samples approaches infinity, Risser et al. [46] introduce the classical Histogram Matching (HM) [16, 58] method as an auxiliary measurement to minimize the feature distribution divergence.


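In addition, the classical histogram matching (HM) method has been used as an auxiliary measure for minimizing the feature distribution divergence.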
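To make the eCDF concrete: for samples x_1, ..., x_n it is F_n(t) = (1/n) · #{i : x_i ≤ t}. The NumPy sketch below (the helper names `ecdf` and `normal_cdf` are ours, for illustration only) evaluates it with a sorted copy and `np.searchsorted`, and measures its largest gap to the standard normal CDF on a grid for growing n, which is the kind of convergence the Glivenko–Cantelli theorem describes.

```python
import math

import numpy as np


def ecdf(samples, t):
    """Empirical CDF F_n(t) = (1/n) * #{i : x_i <= t}, evaluated at every point in t."""
    sorted_x = np.sort(np.ravel(np.asarray(samples, dtype=float)))
    # side="right" counts the samples that are <= each query point (ties included)
    counts = np.searchsorted(sorted_x, np.asarray(t, dtype=float), side="right")
    return counts / sorted_x.size


def normal_cdf(t):
    """CDF of the standard normal distribution, evaluated element-wise."""
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in np.ravel(t)])


rng = np.random.default_rng(0)
grid = np.linspace(-3.0, 3.0, 121)
for n in (100, 10_000):
    # largest gap between eCDF and true CDF on the grid; it typically shrinks as n grows
    gap = np.max(np.abs(ecdf(rng.standard_normal(n), grid) - normal_cdf(grid)))
    print(n, round(float(gap), 4))
```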

For features generated by deep models, equivalent feature values are also unavoidable, due to their dependency on discrete image pixels and the use of activation functions, e.g., ReLU [42] and ReLU6 [26] (please refer to Fig. 3 for more details). All these facts impede the effectiveness of EFDM via HM.



Figure 1. Histograms of feature values in a randomly selected channel, where features are computed from the first residual block of a ResNet-18 [20] trained on a dataset with four domains [28]. We first normalize the mean and standard deviation of each channel to 0 and 1, respectively, and then collect feature values over all test samples in each domain for visualization. One can clearly see that the feature distributions of real-world data are usually too complicated to be modeled by a Gaussian.


Following [72], we extend EFDM to generate feature augmentations with mixed styles, leading to the Exact Feature Distribution Mixing (EFDMix) (cf. Eq. (10)), which can provide more diverse feature augmentations for DG applications.
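As a rough illustration of the mixing idea, the sketch below (hypothetical function `efdmix`, NumPy, forward value only) pairs the i-th smallest content value with the i-th smallest style value and blends them with a weight λ, so λ = 1 returns x unchanged and λ = 0 reduces to pure sort-based EFDM. It assumes 1-D vectors of equal length and a given λ; how λ is sampled and how gradients are handled in Eq. (10) are not reproduced here.

```python
import numpy as np


def efdmix(x, y, lam):
    """Forward value of exact feature distribution mixing for two equal-length 1-D vectors.

    The i-th smallest entry of x is blended with the i-th smallest entry of y:
    o[tau_i] = lam * x[tau_i] + (1 - lam) * y[kappa_i].
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("expects two 1-D vectors of the same length")
    tau = np.argsort(x)        # x[tau[0]] <= x[tau[1]] <= ...
    y_sorted = np.sort(y)      # y[kappa_0] <= y[kappa_1] <= ...
    out = np.empty_like(x)
    out[tau] = lam * x[tau] + (1.0 - lam) * y_sorted
    return out


x = np.array([3.0, 1.0, 2.0])
y = np.array([10.0, 30.0, 20.0])
print(efdmix(x, y, 0.5))  # 16.5, 5.5, 11.0
print(efdmix(x, y, 0.0))  # pure EFDM: 30.0, 10.0, 20.0
```

For augmentation, λ could for example be drawn per instance with `np.random.default_rng().beta(alpha, alpha)`; treating that as the sampling scheme is our assumption, not a detail stated here.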


Related work

Arbitrary style transfer (AST)

AST methods fall into two categories: iterative optimization-based methods and feed-forward methods.

Our method belongs to the latter category, which is generally faster and suitable for real-time applications.


Based on the Gaussian prior assumption, adaptive instance normalization (AdaIN) conducts feature distribution matching by matching the feature mean and standard deviation.

Compared to AdaIN, WCT [33] additionally considers the covariance of feature channels via a pair of feature transforms, whitening and coloring.
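For comparison with this second-order approach, here is a minimal NumPy sketch of a whitening-coloring transform on features arranged as (channels, positions). It follows the general eigen-decomposition recipe; the function name `wct`, the `eps` regularizer and the absence of any eigenvalue truncation are our simplifying assumptions, not details taken from [33].

```python
import numpy as np


def wct(content, style, eps=1e-5):
    """Whitening-coloring transform for (C, N) features (channels x spatial positions)."""
    def centered_cov(f):
        mean = f.mean(axis=1, keepdims=True)
        fc = f - mean
        cov = fc @ fc.T / (f.shape[1] - 1) + eps * np.eye(f.shape[0])
        return fc, mean, cov

    fc, _, cov_c = centered_cov(content)
    _, mean_s, cov_s = centered_cov(style)

    # Whitening: cov_c^{-1/2} decorrelates the content channels (covariance ~ identity)
    d_c, e_c = np.linalg.eigh(cov_c)
    whitened = e_c @ np.diag(d_c ** -0.5) @ e_c.T @ fc

    # Coloring: cov_s^{1/2} imposes the style covariance, then the style mean is added back
    d_s, e_s = np.linalg.eigh(cov_s)
    return e_s @ np.diag(d_s ** 0.5) @ e_s.T @ whitened + mean_s
```

Note that, like AdaIN, this only transfers first- and second-order statistics (mean and channel covariance).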


To this end, we, for the first time to our best knowledge, propose an accurate and efficient way for EFDM by exactly matching the eCDFs of image features, leading to more faithful AST results (please refer to Fig. 5 for visual examples).


Domain generalization (DG)

Typical DG methods include learning domain-invariant feature representations [5, 15, 31, 40, 65–67], meta-learning-based learning strategies [4, 9, 29], data augmentation [13, 43, 56, 61, 71, 72] and so on [57, 69].


  1. Learning domain-invariant feature representations

  2. Meta-learning-based learning strategies

  3. Data augmentation

  4. Others

Among the above methods, the recent state-of-the-art method [72] augments cross-distribution features based on the feature distribution matching technique [21] introduced in the AST part above.


By utilizing high-order statistics implicitly via the proposed EFDM method, more diverse feature augmentations can be achieved and significant performance improvements have been observed (please refer to Tabs. 1 and 2 for details).


Exact histogram matching (EHM)

Compared to classical HM, EHM algorithms distinguish equivalent pixel values either randomly [47, 48] or according to their local mean [7, 18], leading to more accurate matching of histograms.
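A minimal sketch of the random tie-breaking idea, assuming equal-length 1-D inputs: equal values of x are put in a random order before the i-th smallest x is paired with the i-th smallest y. The function name and the use of `np.lexsort` with a random secondary key are our choices; the local-mean strategies of [7, 18] would replace the random key with a neighbourhood average and are not shown.

```python
import numpy as np


def ehm_random_ties(x, y, rng):
    """Exact histogram matching of x onto y; ties in x are ordered at random."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("expects two 1-D vectors of the same length")
    # np.lexsort uses the LAST key as the primary key: sort by x, break ties randomly
    order = np.lexsort((rng.random(x.size), x))
    out = np.empty_like(x)
    out[order] = np.sort(y)  # every element receives its own target value
    return out


rng = np.random.default_rng(0)
x = np.array([0.0, 0.0, 0.0, 1.0])  # three tied values, e.g. after a ReLU
y = np.array([4.0, 1.0, 3.0, 2.0])
print(ehm_random_ties(x, y, rng))  # last entry is 4.0; the tied entries get 1, 2, 3 in random order
```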


The difference between the outputs of EHM and HM in image pixel space is typically small and hardly perceptible to human eyes.


However, this small difference can be amplified in the feature space of deep models, leading to clear divergence in feature distribution matching. We hence propose to perform EFDM by exactly matching the eCDFs of image features via EHM. While EHM can be conducted with different strategies, we empirically find that they yield similar results in our applications, and thus we adopt the fast Sort-Matching [47] algorithm for EHM.



Adaptive instance normalization (AdaIN)



By assuming that X and Y follow Gaussian distributions and n and m approach infinity, AdaIN can achieve EFDM by matching feature mean and standard deviation [32, 37, 41].
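For reference, with x ∈ R^n and y ∈ R^m the content and style features of one channel (lowercase notation assumed here), AdaIN computes AdaIN(x, y) = σ(y) · (x − μ(x)) / σ(x) + μ(y), where μ and σ denote the mean and standard deviation. A NumPy sketch of this formula, with a small `eps` added by us for numerical safety:

```python
import numpy as np


def adain(x, y, eps=1e-5):
    """Match the mean and standard deviation of x to those of y (single channel)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    normalized = (x - x.mean()) / (x.std() + eps)  # zero mean, ~unit std
    return y.std() * normalized + y.mean()


x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 40.0, 20.0, 30.0])
out = adain(x, y)
print(out.mean(), out.std())  # ~25.0 and ~y.std(): only the first two moments are transferred
```

Since x and y may have different lengths (n ≠ m), AdaIN needs no size matching; the price is that everything beyond mean and standard deviation is ignored.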


However, feature distributions of real-world data usually deviate substantially from Gaussian, as can be seen from Fig. 1. Therefore, matching feature distributions via AdaIN is less accurate.


Histogram matching (HM)



It is worth mentioning that matching eCDFs is equivalent to matching histograms with bins of infinitesimal width, which is however hard to achieve due to the finite number of bits to represent features.


Ideally, HM could exactly match eCDFs of image features in the continuous case.


Unfortunately, HM can only approximately match eCDFs when there exist equivalent feature values in the inputs, since HM merges equivalent values into a single point and applies a point-wise transformation.
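A small example of this failure mode, assuming the common eCDF form of classical HM, o_i = F_y^{-1}(F_x(x_i)), where F_x and F_y are the eCDFs of x and y. After a ReLU, several entries of x are exactly zero, so HM sends all of them to the same output value, while an element-wise (exact) matching gives each entry its own target value. The function names are illustrative; integer arithmetic is used for the inverse eCDF to avoid floating-point rounding.

```python
import numpy as np


def classical_hm(x, y):
    """Point-wise HM: o_i = F_y^{-1}(F_x(x_i)); equal inputs always get equal outputs."""
    x = np.asarray(x, dtype=float)
    y_sorted = np.sort(np.asarray(y, dtype=float))
    n, m = x.size, y_sorted.size
    # n * F_x(x_i): number of entries of x that are <= x_i (ties counted together)
    counts = np.searchsorted(np.sort(x), x, side="right")
    # F_y^{-1}(p) = smallest y value whose eCDF is >= p, i.e. index ceil(p * m) - 1
    idx = (counts * m + n - 1) // n - 1
    return y_sorted[idx]


def exact_match(x, y):
    """Element-wise matching (equal lengths): the i-th smallest x gets the i-th smallest y."""
    out = np.empty(len(x))
    out[np.argsort(x)] = np.sort(y)
    return out


x = np.maximum(np.array([-2.0, -1.0, -0.5, 5.0]), 0.0)  # ReLU -> [0, 0, 0, 5]
y = np.array([1.0, 2.0, 3.0, 4.0])
print(classical_hm(x, y))  # all three zeros collapse onto 3.0: values 3, 3, 3, 4
print(exact_match(x, y))   # values 1, 2, 3, 4 in some order: y's histogram is reproduced
```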


Exact Histogram Matching (EHM)

Different from HM, EHM algorithms distinguish equivalent pixel values and apply an element-wise transformation so that a more accurate histogram matching can be achieved.


While EHM can be conducted with different strategies, we adopt the Sort-Matching algorithm [47] for its speed. Sort-Matching is based on the quicksort strategy [49], which is generally regarded as one of the fastest sorting algorithms, with an average complexity of O(n log n). As its name suggests, Sort-Matching is implemented by matching two sorted vectors, whose indexes are illustrated in a one-line notation [2] as:

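$$\tau = (\tau_1\ \tau_2\ \cdots\ \tau_n),\quad x_{\tau_1} \le x_{\tau_2} \le \cdots \le x_{\tau_n}; \qquad \kappa = (\kappa_1\ \kappa_2\ \cdots\ \kappa_n),\quad y_{\kappa_1} \le y_{\kappa_2} \le \cdots \le y_{\kappa_n},$$

and the output o is obtained by assigning $o_{\tau_i} = y_{\kappa_i}$ for $i = 1, \ldots, n$.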
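A minimal NumPy sketch of Sort-Matching for two 1-D vectors of equal length: sort x ascending with the permutation τ, sort y with the permutation κ, then set o_{τ_i} = y_{κ_i}. It is written for clarity and is not taken from the official repository; NumPy's default `argsort` kind is an introsort-based quicksort, so the cost is dominated by the O(n log n) sorts.

```python
import numpy as np


def sort_matching(x, y):
    """Sort-Matching: o[tau_i] = y[kappa_i] for two 1-D vectors of equal length."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("Sort-Matching expects two 1-D vectors of the same length (m = n)")
    tau = np.argsort(x)     # tau: indexes that sort x in ascending order
    out = np.empty_like(x)
    out[tau] = np.sort(y)   # the i-th smallest x receives the i-th smallest y
    return out


x = np.array([0.3, -1.2, 0.8, 0.0])
y = np.array([5.0, 1.0, 3.0, 2.0])
print(sort_matching(x, y))  # 3.0, 1.0, 5.0, 2.0: x's ranking, y's values
```

The output is a permutation of y, so its eCDF equals that of y exactly, while the rank order of x (the "content") is kept.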


Compared to AdaIN, HM and other EHM algorithms [7, 18], Sort-Matching additionally assumes that the two vectors to be matched are of the same size, i.e. m = n, which is satisfied in our focused applications of AST and DG. In other applications where the two vectors are of different sizes, interpolation or dropping elements can be conducted to make y and x the same size.
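One way to handle m ≠ n, sketched under our own assumptions: linearly interpolate the sorted style values onto n evenly spaced relative ranks, then Sort-Match as usual. The helper names are hypothetical, and linear interpolation of sorted values is only one of the options the text leaves open (dropping elements is another).

```python
import numpy as np


def resample_sorted(y, n):
    """Resample the sorted values of y to length n by linear interpolation."""
    y_sorted = np.sort(np.asarray(y, dtype=float))
    src = np.linspace(0.0, 1.0, y_sorted.size)  # relative rank of each sorted y value
    dst = np.linspace(0.0, 1.0, n)              # n evenly spaced target ranks
    return np.interp(dst, src, y_sorted)


def sort_matching_any_size(x, y):
    """Sort-Matching after resizing the sorted y to len(x)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    out[np.argsort(x)] = resample_sorted(y, x.size)
    return out


x = np.array([0.2, -0.4, 0.9])           # n = 3
y = np.array([5.0, 1.0, 4.0, 2.0, 3.0])  # m = 5
print(sort_matching_any_size(x, y))      # 3.0, 1.0, 5.0
```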


EFDM for AST and DG



For AST, a simple encoder-decoder architecture is adopted.


We fix the encoder f as the first few layers (up to relu4_1) of a pre-trained VGG-19.


Given the content images X and style images Y, we first encode them into the feature space and apply EFDM to obtain the style-transferred features as

$$S = \mathrm{EFDM}(f(X), f(Y)).$$

Then, we train a randomly initialized decoder g to map S to the image space, resulting in the stylized images g(S).
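In practice EFDM is applied per sample and per channel, treating the H×W responses of one channel as the vectors being matched. The NumPy sketch below assumes content and style feature maps of identical shape (B, C, H, W), so that m = n holds; the function name is hypothetical and the gradient handling needed for training is out of scope.

```python
import numpy as np


def efdm_feature_maps(content, style):
    """Channel-wise Sort-Matching of (B, C, H, W) content features to style features."""
    if content.shape != style.shape:
        raise ValueError("content and style features must have the same shape")
    b, c = content.shape[:2]
    x = content.reshape(b, c, -1)  # one vector of H*W values per (sample, channel)
    y = style.reshape(b, c, -1)
    order = np.argsort(x, axis=-1)  # tau for every (sample, channel)
    out = np.empty_like(x)
    # out[..., order[..., i]] = i-th smallest style value, along the last axis
    np.put_along_axis(out, order, np.sort(y, axis=-1), axis=-1)
    return out.reshape(content.shape)


rng = np.random.default_rng(0)
f_x = rng.standard_normal((2, 3, 4, 4))   # stand-in for f(X)
f_y = rng.exponential(size=(2, 3, 4, 4))  # stand-in for f(Y)
s = efdm_feature_maps(f_x, f_y)
# each channel of s holds exactly the style values, arranged in the content's rank order
```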


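Following [10, 21], we train the decoder with a weighted combination of a content loss $\mathcal{L}_c$ and a style loss $\mathcal{L}_s$, leading to an objective of the form

$$\mathcal{L} = \mathcal{L}_c + \omega \mathcal{L}_s,$$

where ω is a weighting hyper-parameter.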




Inspired by the studies that style information can be represented by the mean and standard deviation of image features [21, 33, 37], Zhou et al. [72] proposed to generate style-transferred and content-preserved feature augmentations for DG problems.
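A minimal sketch of the mean/std mixing idea attributed to [72] above (which we take to be MixStyle), assuming per-channel statistics over (C, N) features of two instances and a given mixing weight λ. In the original setting λ is sampled randomly and instances are paired within a mini-batch; this sketch omits both. Only the first two moments of the style are mixed, and the function name is ours.

```python
import numpy as np


def mix_mean_std(x, y, lam, eps=1e-6):
    """Re-style x with a convex mix of the per-channel mean/std of x and y."""
    mu_x, sig_x = x.mean(axis=1, keepdims=True), x.std(axis=1, keepdims=True) + eps
    mu_y, sig_y = y.mean(axis=1, keepdims=True), y.std(axis=1, keepdims=True) + eps
    mu_mix = lam * mu_x + (1.0 - lam) * mu_y
    sig_mix = lam * sig_x + (1.0 - lam) * sig_y
    # normalize x per channel, then apply the mixed statistics
    return sig_mix * (x - mu_x) / sig_x + mu_mix


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 100))              # instance 1: 3 channels x 100 positions
y = 2.0 * rng.standard_normal((3, 100)) + 1.0  # instance 2 with different statistics
out = mix_mean_std(x, y, lam=0.3)
```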


As discussed above, distributions beyond Gaussian have high-order statistics other than the mean and standard deviation, and hence style information can be represented more accurately by using high-order feature statistics.
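To see what high-order statistics add, the sketch below compares the skewness (third standardized moment) of AdaIN and sort-based EFDM outputs on synthetic data, assuming a roughly symmetric content vector and a right-skewed style vector. Because AdaIN applies a positive affine map to x, its output keeps the skewness of x; the Sort-Matching output is a permutation of y and therefore has the skewness of y. The data and helper names are illustrative.

```python
import numpy as np


def skewness(v):
    """Third standardized moment of a 1-D sample."""
    v = np.asarray(v, dtype=float)
    z = (v - v.mean()) / v.std()
    return float(np.mean(z ** 3))


rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)   # roughly symmetric "content" features
y = rng.exponential(size=10_000)  # right-skewed "style" features

adain_out = y.std() * (x - x.mean()) / x.std() + y.mean()  # matches mean and std only
efdm_out = np.empty_like(x)
efdm_out[np.argsort(x)] = np.sort(y)                       # Sort-Matching

print(skewness(x), skewness(y))
print(skewness(adain_out), skewness(efdm_out))  # AdaIN keeps x's skewness; EFDM takes y's
```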




The advantage of utilizing high-order feature statistics can be intuitively understood in terms of augmentation diversity.



Experiments on DG

Generalization on category classification.

We adopt the popular DG benchmark dataset PACS.

Generalization on instance retrieval.

We adopt person re-identification (re-ID) datasets.
