[pdf] [github]
Figure 1: Image translation results on the Flickr Mountains dataset. From left to right: source images, reference images, images generated by Swapping Autoencoder [48], and FDIT (ours). SwapAE over-adapts to the reference image; FDIT better preserves the composition and identity of the source image.
Contents
Abstract
1. Introduction
3. Frequency Domain Image Translation
3.1. Pixel Space Loss
3.2. Fourier Frequency Space Loss
3.3. Overall Loss
4. Experiments
4.1. Autoencoder
4.2. Ablation Study
4.3. GAN Inversion
4.4. StarGAN v2
Image-to-image translation has been revolutionized with GAN-based methods. However, existing methods lack the ability to preserve the identity of the source domain. As a result, synthesized images can often over-adapt to the reference domain, losing important structural characteristics and suffering from suboptimal visual quality.
To solve these challenges, we propose a novel frequency domain image translation (FDIT) framework, exploiting frequency information for enhancing the image generation process. Our key idea is to decompose the image into low-frequency and high-frequency components, where the high-frequency feature captures object structure akin to the identity. Our training objective facilitates the preservation of frequency information in both pixel space and Fourier spectral space.
We broadly evaluate FDIT across five large-scale datasets and multiple tasks including image translation and GAN inversion. Extensive experiments and ablations show that FDIT effectively preserves the identity of the source image, and produces photo-realistic images. FDIT establishes state-of-the-art performance, reducing the average FID score by 5.6% compared to the previous best method.
Image-to-image translation [70, 9, 4, 59, 56], the task of synthesizing new images based on source and reference images (see Figure 1), has attracted great research attention in computer vision. This task has been revolutionized since the introduction of GAN-based methods [30, 69]. In particular, a plethora of literature attempts to decompose the image representation into a content space and a style space [11, 48, 40, 27]. To translate a source image, its content representation is combined with a different style representation from the reference domain.
Despite exciting progress, existing solutions suffer from two notable challenges.
First, there is no explicit mechanism for preserving the identity; as a result, the synthesized image can over-adapt to the reference domain and lose the original identity characteristics. This can be observed in Figure 1, where Swapping Autoencoder [48] generates images whose identity and structure are closer to the reference than to the source image. For example, in the second row, a tree that is absent from the source image appears in the translation result.
Second, the generation process may lose important fine-grained details, leading to suboptimal visual quality. This can be prohibitive for generating photo-realistic high-resolution images.
The challenges above raise the following important question: how can we enable photo-realistic image translation while better preserving the identity?
[A good paper starts by asking good questions. This paragraph raises two well-motivated problems that readers grasp immediately, leaving them eager to see how the authors solve them.]
Motivated by this, we propose a novel framework, Frequency Domain Image Translation (FDIT), which exploits frequency information to enhance the image generation process.
Our key idea is to decompose the image into low- and high-frequency components, and regulate the frequency consistency during image translation.
Our framework is inspired by and grounded in signal processing [15, 5, 22]. Intuitively, the low-frequency component captures information such as color and illumination, whereas the high-frequency component corresponds to sharp edges and important details of objects. For example, Figure 2 shows the images resulting from applying Gaussian blur to decompose the original image into low- vs. high-frequency counterparts (top vs. bottom). The building's identity is distinguishable from the high-frequency components alone.
Formally, FDIT introduces novel frequency-based training objectives, which facilitate the preservation of frequency information during training. The frequency information is reflected in the visual space as identity characteristics and important fine details.
Specifically, we impose restrictions in both the pixel space and the Fourier spectral space.
In the pixel space, we transform each image into its high-frequency and low-frequency components by applying the Gaussian kernel (i.e., a low-pass filter). A loss term regulates the high-frequency components to be similar between the source image and the generated image.
Furthermore, FDIT directly regulates consistency in the frequency domain by applying the Fast Fourier Transform (FFT) to each image. This additionally ensures that the original and translated images share a similar high-frequency spectrum.
Extensive experiments demonstrate that FDIT is highly effective, establishing state-of-the-art performance on image translation tasks. Below we summarize our key results and contributions:
• We propose a novel frequency-based image translation framework, FDIT, which substantially improves identity-preserving generation while enhancing the realism of image hybrids. FDIT outperforms competitive baselines by a large margin across all datasets considered. Compared to the vanilla Swapping Autoencoder (SwapAE) [48], FDIT decreases the FID score by 5.6%.
• We conduct extensive ablations and a user study to evaluate (1) identity-preserving capability and (2) image quality, where FDIT consistently surpasses previous methods. For example, the user study shows an average preference of 75.40% and 64.39% for FDIT over SwapAE in the above two aspects. We also conduct an ablation study to understand the efficacy of different loss terms and frequency supervision modules.
• We broadly evaluate our approach across five large-scale datasets (including two newly collected ones). Quantitative and qualitative evaluations on image translation and GAN-inversion tasks demonstrate the superiority of our method.
Our novel frequency-based image translation framework is illustrated in Figure 3. In what follows, we first provide an overview and then describe the training objective. Our training objective facilitates the preservation of frequency information during the image translation process. Specifically, we impose restrictions in both pixel space (Section 3.1) as well as the Fourier spectral space (Section 3.2).
High- and low-frequency images
We transform each input x into two images xL ∈ X and xH ∈ X, which correspond to the low-frequency and high-frequency images respectively. Note that both xL and xH have the same spatial dimensions as x. Specifically, we employ the Gaussian kernel, which filters out the high-frequency features and keeps the low-frequency information:

G[i, j] = 1/(2πσ^2) · exp(−(i^2 + j^2)/(2σ^2)),   (1)

where [i, j] denotes the spatial location within the image, and σ^2 denotes the variance of the Gaussian function. Following [22], the variance is increased proportionally with the Gaussian kernel size k. Convolving the Gaussian kernel with the input x, we obtain the low-frequency (blurred) image xL:

xL[i, j] = Σ_{m,n} G[m, n] · x[i + m, j + n],   (2)

where m, n index the 2D Gaussian kernel, i.e., m, n ∈ [−(k−1)/2, (k−1)/2].

To obtain xH, we first convert the color image into grayscale, and then subtract the low-frequency information:

xH = rgb2gray(x) − G ∗ rgb2gray(x),   (3)

where the rgb2gray function converts the color image to grayscale and ∗ denotes convolution. This removes color and illumination information that is unrelated to identity and structure. The resulting high-frequency image xH contains the sharp edges, i.e., a sketch of the original image.
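To make the decomposition concrete, below is a minimal PyTorch sketch of Equations 1-3. It assumes image batches of shape (N, 3, H, W) in [0, 1]; the helper names (gaussian_kernel, decompose) and the default σ are our own illustrative choices, not the paper's code.

```python
import torch
import torch.nn.functional as F

def gaussian_kernel(k: int = 21, sigma: float = 3.0) -> torch.Tensor:
    # 2D Gaussian kernel G[i, j] (Equation 1), normalized to sum to 1.
    ax = torch.arange(k, dtype=torch.float32) - (k - 1) / 2
    yy, xx = torch.meshgrid(ax, ax, indexing="ij")
    g = torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def decompose(x: torch.Tensor, k: int = 21, sigma: float = 3.0):
    # x: (N, 3, H, W) batch in [0, 1] -> (x_L, x_H), same spatial size as x.
    g = gaussian_kernel(k, sigma).to(x).view(1, 1, k, k)
    # Low-frequency image x_L: depthwise Gaussian blur per channel (Equation 2).
    x_l = F.conv2d(x, g.repeat(3, 1, 1, 1), padding=k // 2, groups=3)
    # High-frequency image x_H: grayscale minus its blurred version (Equation 3).
    gray = 0.299 * x[:, 0:1] + 0.587 * x[:, 1:2] + 0.114 * x[:, 2:3]
    x_h = gray - F.conv2d(gray, g, padding=k // 2)
    return x_l, x_h
```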
Reconstruction loss in the pixel space
We now employ a reconstruction loss term, Equation 4, which enforces the similarity between the input and the generator's output for both the low-frequency and high-frequency components.
Translation matching loss in the pixel space
In addition to the reconstruction loss, we also employ a translation matching loss, Equation 5, in which the generator combines the content code of the source image with the style code of the reference image. Intuitively, the translated image should adhere to the identity of the original image. We achieve this by regulating the high-frequency components, enforcing the generated image to have the same high-frequency image as the original source image.
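A sketch of how the two pixel-space losses could be wired up, reusing decompose from above. The encoder/generator interface E(x) -> (content, style), G(content, style) and the L1 distance are assumptions on our part, since the paper's Equations 4-5 are not reproduced here.

```python
def pixel_space_losses(x_src, x_ref, E, G):
    # Reconstruction loss (Equation 4): G(E(x)) should match x in both bands.
    c_src, s_src = E(x_src)
    rec_l, rec_h = decompose(G(c_src, s_src))
    src_l, src_h = decompose(x_src)
    loss_rec = (rec_l - src_l).abs().mean() + (rec_h - src_h).abs().mean()

    # Translation matching loss (Equation 5): the hybrid built from the source
    # content and the reference style keeps the source's high-frequency image.
    _, s_ref = E(x_ref)
    _, swap_h = decompose(G(c_src, s_ref))
    loss_trans = (swap_h - src_h).abs().mean()
    return loss_rec, loss_trans
```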
Transformation from pixel space to the Fourier spectral space
In addition to the pixel-space constraints, we introduce loss terms that operate directly in the Fourier domain. In particular, we use the Fast Fourier Transform (FFT) to map x from the pixel space to the Fourier spectral space. We apply the Discrete Fourier Transform F to a real 2D image I of size H × W:

F(I)(a, b) = Σ_{h=0}^{H−1} Σ_{w=0}^{W−1} I(h, w) · e^{−i2π(ah/H + bw/W)}.   (6)

For ease of post-processing, we then transform F from the complex domain to the real domain. Additionally, we take the logarithm to stabilize training:

F^R(I)(a, b) = log( sqrt( Re(F(I)(a, b))^2 + Im(F(I)(a, b))^2 ) + ε ),   (7)

where ε is a small term added for numerical stability, and Re and Im denote the real and imaginary parts of F(I)(a, b) respectively. Each point in the Fourier spectrum utilizes information from all pixels at the corresponding discrete spatial frequency, thus representing frequency features at the global level.
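Continuing the sketch above, Equations 6-7 map directly onto torch.fft; eps plays the role of the stability term ε.

```python
def log_spectrum(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # F^R: log-magnitude of the 2D DFT, computed per channel (Equations 6-7).
    f = torch.fft.fft2(x)
    magnitude = torch.sqrt(f.real ** 2 + f.imag ** 2)
    return torch.log(magnitude + eps)
```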
Reconstruction loss in the Fourier space
We then regulate the reconstruction loss in the frequency spectrum (Equation 8), matching the spectra F^R of the input and the reconstructed output.
Translation matching loss in the Fourier space
In a similar spirit to Equation 5, we devise a translation matching loss in the Fourier frequency domain (Equation 9), where M_H is the frequency mask, explained in detail below. The loss constrains the high-frequency components of the generated images for better identity preservation.
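Combining the pieces, here is a sketch of the two Fourier-space losses under the same assumed L1 distance; the spectra are fftshift-ed so that the radial frequency mask M_H (described next) can be applied directly.

```python
def fourier_space_losses(x_src, x_rec, x_swap, m_h):
    # Center the zero frequency so radial masks apply directly.
    f_src = torch.fft.fftshift(log_spectrum(x_src), dim=(-2, -1))
    f_rec = torch.fft.fftshift(log_spectrum(x_rec), dim=(-2, -1))
    f_swap = torch.fft.fftshift(log_spectrum(x_swap), dim=(-2, -1))
    # Reconstruction loss (Equation 8): full-spectrum match.
    loss_rec = (f_rec - f_src).abs().mean()
    # Translation matching loss (Equation 9): only the high-frequency band
    # of the hybrid is constrained to match the source spectrum.
    loss_trans = ((f_swap - f_src) * m_h).abs().mean()
    return loss_rec, loss_trans
```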
Frequency mask
As illustrated in Figure 3, the low-frequency mask is a circle with radius r, whereas the high-frequency mask is the complement region. The frequency masks M_H and M_L can be estimated empirically from the distribution of F^R on the entire training dataset. We choose the radius to be 21 for images with resolution 256×256. The energy within the low-frequency mask accounts for 97.8% of the total energy in the spectrum.
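A sketch of how such masks might be built, assuming a centered (fftshift-ed) spectrum as in the loss sketch above; r = 21 matches the radius quoted above for 256×256 images.

```python
def frequency_masks(h: int = 256, w: int = 256, r: float = 21.0):
    # M_L: a circle of radius r around the centered DC component; M_H: complement.
    yy, xx = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    dist = torch.sqrt((yy - h // 2).float() ** 2 + (xx - w // 2).float() ** 2)
    m_l = (dist <= r).float()
    return m_l, 1.0 - m_l

m_l, m_h = frequency_masks(256, 256, 21.0)
```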
Considering all the aforementioned losses, the overall loss is formalized as

L = L_org + λ1·L_rec^pix + λ2·L_trans^pix + λ3·L_rec^FFT + λ4·L_trans^FFT,   (10)

where L_org is the original loss function of the underlying image translation model. For simplicity, we use λ1 = λ2 = λ3 = λ4 = 1 in this paper.
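Gluing the sketches together under Equation 10, with all λ set to 1 as in the paper; loss_org stands for the base translation model's original objective, whatever it is.

```python
def fdit_total_loss(loss_org, x_src, x_ref, E, G, m_h):
    loss_rec_pix, loss_trans_pix = pixel_space_losses(x_src, x_ref, E, G)
    c_src, s_src = E(x_src)
    _, s_ref = E(x_ref)
    loss_rec_fft, loss_trans_fft = fourier_space_losses(
        x_src, G(c_src, s_src), G(c_src, s_ref), m_h)
    # lambda_1 = ... = lambda_4 = 1 (Equation 10).
    return loss_org + loss_rec_pix + loss_trans_pix + loss_rec_fft + loss_trans_fft
```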
Gaussian kernel vs. FFT
The Gaussian kernel and the FFT are complementary in preserving frequency information.
On one hand, the Gaussian kernel extracts frequency information via convolution, representing frequency features in a local manner.
On the other hand, the Fast Fourier Transform utilizes information from all pixels to obtain the value at each spatial frequency, characterizing the frequency distribution globally.
We show an ablation study on this in Section 4.2, where both are effective in enhancing the identity-preserving capability for image translation tasks.
Gaussian kernel size
When the images in Figure 2 are transformed into the spectrum space, the effect of the Gaussian kernel size can be clearly seen in Figure 4. Specifically, a large kernel causes severe distortion in the low-frequency band, while a small kernel does not preserve much of the high-frequency information. In this work, we choose kernel size k = 21 for images with resolution 256×256, which appropriately separates the high- and low-frequency information, as demonstrated in both the image space and the spectral space distributions. Our experiments also show that FDIT is not sensitive to the choice of k as long as it falls within a mild range.
Figure 4: The high- and low-frequency images from Figure 2 transformed into the frequency power spectrum. A Gaussian kernel with size k = 21 avoids distortion in both the high-frequency and low-frequency regions. The power spectrum represents the energy distribution at each spatial frequency.
In this section, we evaluate our proposed method on two state-of-the-art image translation architectures, Swapping Autoencoder [48] and StarGAN v2 [11], and one GAN inversion model, Image2StyleGAN [1]. Extensive experimental results show that FDIT not only better preserves the identity but also enhances image quality.
[48] Swapping Autoencoder for Deep Image Manipulation. NeurIPS 2020. [PyTorch]
[11] StarGAN v2: Diverse Image Synthesis for Multiple Domains. CVPR 2020. [PyTorch]
[1] Image2StyleGAN: How to Embed Images Into the StyleGAN Latent Space? ICCV 2019. [PyTorch]
See also: Image2StyleGAN++: How to Edit the Embedded Images? CVPR 2020.
Datasets
We evaluate FDIT on the following datasets: (1) LSUN Church [65], (2) CelebA-HQ [34], (3) LSUN Bedroom [65], (4) Flickr Mountains (100k self-collected images), (5) Flickr Waterfalls (100k self-collected images), and (6) Flickr Faces HQ (FFHQ) [35]. All images are trained and tested at 256 × 256 resolution except FFHQ, which is trained at 512 × 512 and fine-tuned at 1024 × 1024 resolution. For evaluation, we use a validation set separate from the training data.
[35] A Style-Based Generator Architecture for Generative Adversarial Networks. CVPR 2019.
[65] LSUN: Construction of a Large-Scale Image Dataset Using Deep Learning with Humans in the Loop. arXiv 2015.
Autoencoders are widely used as the backbone of deep image translation [1, 27]. We use the state-of-the-art Swapping Autoencoder (SwapAE) [48], which is built on the StyleGAN2 [36] backbone. SwapAE also uses the technique from PatchGAN [31] to further improve texture transfer performance. We incorporate our proposed FDIT training objectives into the vanilla SwapAE.
4.1.1 Reference-guided Image Synthesis
FDIT better preserves the identity with respect to the source image
We contrast the image translation performance of FDIT vs. vanilla SwapAE in Figure 1 and Figure 5. The vanilla SwapAE is unable to preserve the important identity of the source images and over-adapts to the reference image. For example, the face identity is completely switched after translation, as seen in row 4 of Figure 5. SwapAE also fails to preserve the outline and the local sharp edges of the source image. As shown in Figure 1, the outlines of the mountains are severely distorted. Moreover, the overall image composition shifts substantially from the original source image.
In contrast, with FDIT, the identity and structure of the swapped hybrid images are highly preserved. As shown in Figure 1 and Figure 5, the overall sketch and local fine details are well preserved, while the coloring, illumination, and even the weather are transferred from the reference image (top rows of Figure 1).
Lastly, we compare FDIT with the state-of-the-art image stylization methods STROTSS [38] and WCT2 [63]. Image stylization is a strong baseline as it emphasizes strict adherence to the source image. However, as shown in Figure 5, WCT2 exhibits poor transferability in image generation tasks. Despite strong identity preservation, STROTSS and WCT2 are less flexible and generate images that highly resemble the source image. In contrast, FDIT both preserves the identity of the source image and maintains high transfer capability. This further demonstrates the superiority of FDIT in image translation.
FDIT enhances the image generation quality
We show in Table 1 that FDIT substantially improves image quality while preserving image content. We adopt the Frechet Inception Distance (FID) [23] as the measure of image quality; smaller values indicate better quality. Details about Im2StyleGAN [1] and StyleGAN2 [36] are given in the supplementary material. FDIT achieves the lowest FID across all datasets. On average, FDIT reduces the FID score by 5.6% compared to the previous state-of-the-art method.
4.1.2 Image Attribute Editing
FDIT enables continuous interpolation between different domains
We show that FDIT enables the image attribute editing task, which creates a series of smoothly changing images between two sets of distinct images [48, 51]. Vector arithmetic is one commonly used way to achieve this [51]. For example, we can sample n images from each of two target domains, and then compute the average difference of the latent vectors between these two sets of images:

v = (1/n) Σ_{i=1}^{n} z_i^A − (1/n) Σ_{i=1}^{n} z_i^B,   (11)

where z^A and z^B denote the latent codes from the two domains.
We perform interpolation on the style code while keeping the content code unchanged. The generated images can be formalized as G(c, s + θ·v), where c and s are the content and style codes and θ is the interpolation parameter. We show results on the CelebA-HQ dataset in the supplementary material. FDIT performs image editing towards the target domain while strictly adhering to the content of the source image. Compared to the vanilla Swapping Autoencoder and StarGAN v2, our results demonstrate better disentanglement of individual image attributes from identity characteristics. We also verify the disentangled semantic latent vectors using Principal Component Analysis (PCA). Implementation details and identity-preserving results are shown in the supplementary material.
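A sketch of this vector-arithmetic edit, assuming style codes stacked as (n, d) tensors and the G(content, style) interface used earlier; both helper names are hypothetical.

```python
def attribute_direction(z_a: torch.Tensor, z_b: torch.Tensor) -> torch.Tensor:
    # Average difference between the style codes of two domains (Equation 11).
    return z_a.mean(dim=0) - z_b.mean(dim=0)

def edit(G, content, style, direction, theta: float):
    # Move the style code along the attribute direction; content stays fixed.
    return G(content, style + theta * direction)
```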
Pixel and Fourier space losses are complementary
To better understand our method, we isolate the effects of the pixel-space loss and the Fourier spectral-space loss. The results on the LSUN Church dataset are summarized in Table 2. The vanilla SwapAE is equivalent to having neither loss term, yielding an FID score of 52.34. Using the pixel-space frequency loss alone reduces the FID to 49.47. Our method is most effective when combining both the pixel-space and Fourier-space loss terms, achieving an FID of 48.21. This ablation signifies the importance of frequency-based training objectives.
FDIT improves reconstruction quality in GAN inversion
We evaluate the efficacy of FDIT on the GAN inversion task, which maps real images into latent vectors. In particular, Image2StyleGAN [1] serves as a strong baseline, reconstructing a real image through iterative optimization over the latent vector.
We adopt the same architecture but impose our frequency-based reconstruction loss. The inversion results are shown in Figure 6. On high-resolution (1024 × 1024) images, the quality of the inverted images improves across all scenes: FDIT better preserves the overall structure, fine details, and color distribution. We further measure performance quantitatively in Table 3; under all metrics (MSE, MAE, PSNR, SSIM), FDIT outperforms Image2StyleGAN.
StarGAN v2 is another state-of-the-art image translation model, which can generate image hybrids guided by either reference images or latent noise. As with the autoencoder-based network, we can optimize the StarGAN v2 framework with our frequency-based losses. To validate FDIT under a stricter condition, we construct a CelebA-HQ-Smile dataset based on the smiling attribute of the CelebA-HQ dataset. The style refers to whether the person smiles, and the content refers to the identity. Several salient observations can be drawn from Figure 7.
First, FDIT highly preserves gender identity, whereas the vanilla StarGAN v2 model changes the resulting gender according to the reference image (e.g., first and second rows).
Second, the image quality of FDIT is better, with the FID improving from 17.32 to 16.86.
Third, our model can change the smiling attribute while strictly maintaining the other facial features. For example, as shown in the third row, StarGAN v2 undesirably changes the hairstyle from straight (source) to curly (reference), whereas FDIT maintains the same hairstyle.