
Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks: Reading Notes

Key excerpts from the paper

Abstract

In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.

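In brief: supervised learning with CNNs is now widely used in computer vision, while unsupervised learning with CNNs has received comparatively little attention. The paper aims to bridge that gap. It introduces DCGANs, a class of CNN-based GANs with specific architectural constraints, and argues that they are a strong candidate for unsupervised learning. Trained on several image datasets, both the generator and the discriminator learn a hierarchy of representations that runs from object parts to whole scenes. The learned features are then applied to new tasks, which shows they work as general-purpose image representations.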

1 Introduction

Learning reusable feature representations from large unlabeled datasets has been an area of active research. In the context of computer vision, one can leverage the practically unlimited amount of unlabeled images and videos to learn good intermediate representations, which can then be used on a variety of supervised learning tasks such as image classification.

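Learning reusable features from large unlabeled datasets is an active research area. In vision, the practically unlimited supply of unlabeled images and video can be used to learn good intermediate representations. Those representations can then serve many supervised tasks, such as image classification.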

We propose that one way to build good image representations is by training Generative Adversarial Networks (GANs) (Goodfellow et al., 2014), and later reusing parts of the generator and discriminator networks as feature extractors for supervised tasks.

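The paper proposes building good image representations by training GANs, then reusing parts of the generator and discriminator as feature extractors for supervised tasks.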

GANs provide an attractive alternative to maximum likelihood techniques. One can additionally argue that their learning process and the lack of a heuristic cost function (such as pixel-wise independent mean-square error) are attractive to representation learning. GANs have been known to be unstable to train, often resulting in generators that produce nonsensical outputs. There has been very limited published research in trying to understand and visualize what GANs learn, and the intermediate representations of multi-layer GANs.

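GANs are an attractive alternative to maximum-likelihood techniques. Their learning process, which does not rely on a heuristic cost such as pixel-wise mean-squared error, is appealing for representation learning. However, GANs are known to be unstable to train and often yield generators that produce nonsensical outputs. Very little published work tries to understand and visualize what GANs learn, or what the intermediate layers of multi-layer GANs represent.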

In this paper, we make the following contributions:

We propose and evaluate a set of constraints on the architectural topology of Convolutional GANs that make them stable to train in most settings. We name this class of architectures Deep Convolutional GANs (DCGAN).
We use the trained discriminators for image classification tasks, showing competitive performance with other unsupervised algorithms.
We visualize the filters learnt by GANs and empirically show that specific filters have learned to draw specific objects.
We show that the generators have interesting vector arithmetic properties allowing for easy manipulation of many semantic qualities of generated samples.

Contributions of the paper

A set of architectural constraints on convolutional GANs that make them stable to train in most settings. This family of architectures is named Deep Convolutional GANs (DCGAN).

Trained discriminators are used for image classification and perform competitively with other unsupervised algorithms.

Visualizations of the filters GANs learn show empirically that specific filters have learned to draw specific objects.

The generators have vector-arithmetic properties that make it easy to manipulate many semantic qualities of the generated samples.

2 Related work

2.1 Representation Learning from Unlabeled Data

Unsupervised representation learning is a fairly well studied problem in general computer vision research, as well as in the context of images. A classic approach to unsupervised representation learning is to do clustering on the data (for example using K-means), and leverage the clusters for improved classification scores. In the context of images, one can do hierarchical clustering of image patches (Coates & Ng, 2012) to learn powerful image representations. Another popular method is to train auto-encoders (convolutionally, stacked (Vincent et al., 2010), separating the what and where components of the code (Zhao et al., 2015), ladder structures (Rasmus et al., 2015)) that encode an image into a compact code, and decode the code to reconstruct the image as accurately as possible. These methods have also been shown to learn good feature representations from image pixels. Deep belief networks (Lee et al., 2009) have also been shown to work well in learning hierarchical representations.

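Unsupervised representation learning is a well-studied problem, both in general computer vision and for images specifically. A classic approach clusters the data (for example with K-means) and uses the clusters to improve classification scores. For images, hierarchical clustering of image patches can yield powerful representations. Another popular approach trains auto-encoders (convolutional, stacked, separating the "what" and "where" components of the code, or ladder-structured). These encode an image into a compact code and decode it to reconstruct the image as accurately as possible, and they also learn good features from raw pixels. Deep belief networks likewise learn hierarchical representations well.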

2.2 Generating Natural Images

Generative image models are well studied and fall into two categories: parametric and non-parametric.

Generative image models fall into two categories: parametric and non-parametric.

The non-parametric models often do matching from a database of existing images, often matching patches of images, and have been used in texture synthesis (Efros et al., 1999), super-resolution (Freeman et al., 2002) and in-painting (Hays & Efros, 2007).

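Non-parametric models usually match against a database of existing images, often at the level of image patches. They have been used for texture synthesis, super-resolution and in-painting.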

Parametric models for generating images have been explored extensively (for example on MNIST digits or for texture synthesis (Portilla & Simoncelli, 2000)). However, generating natural images of the real world had not had much success until recently. A variational sampling approach to generating images (Kingma & Welling, 2013) has had some success, but the samples often suffer from being blurry. Another approach generates images using an iterative forward diffusion process (Sohl-Dickstein et al., 2015). Generative Adversarial Networks (Goodfellow et al., 2014) generated images that suffered from being noisy and incomprehensible. A Laplacian pyramid extension to this approach (Denton et al., 2015) showed higher quality images, but they still suffered from the objects looking wobbly because of noise introduced in chaining multiple models. A recurrent network approach (Gregor et al., 2015) and a deconvolution network approach (Dosovitskiy et al., 2014) have also recently had some success with generating natural images. However, they have not leveraged the generators for supervised tasks.

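Parametric image models have been explored extensively, but generating realistic natural images only began to succeed recently. Variational sampling has had some success, but its samples are often blurry. Another approach generates images with an iterative forward diffusion process. The original GANs produced noisy, incomprehensible images. The Laplacian-pyramid extension (LAPGAN) improved image quality, but objects still looked wobbly because chaining multiple models introduces noise. Recurrent and deconvolution networks have also had some recent success at generating natural images. None of these methods, however, reuse the generator for supervised tasks.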

2.3 Visualizing the Internals of CNNs

One constant criticism of using neural networks has been that they are black-box methods, with little understanding of what the networks do in the form of a simple human-consumable algorithm. In the context of CNNs, Zeiler et al. (Zeiler & Fergus, 2014) showed that by using deconvolutions and filtering the maximal activations, one can find the approximate purpose of each convolution filter in the network. Similarly, using a gradient descent on the inputs lets us inspect the ideal image that activates certain subsets of filters (Mordvintsev et al.).

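A common criticism of neural networks is that they are black boxes: what they do cannot be expressed as a simple algorithm a human can follow. For CNNs, Zeiler and Fergus showed that deconvolutions combined with filtering of the maximal activations reveal the approximate purpose of each convolution filter. Similarly, running gradient descent on the input reveals the ideal image that activates a given subset of filters.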

3 Approach and Model Architecture

Historical attempts to scale up GANs using CNNs to model images have been unsuccessful. This motivated the authors of LAPGAN (Denton et al., 2015) to develop an alternative approach to iteratively upscale low resolution generated images which can be modeled more reliably. We also encountered difficulties attempting to scale GANs using CNN architectures commonly used in the supervised literature. However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models.

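Earlier attempts to scale up GANs with CNNs to model images were unsuccessful. This motivated the LAPGAN authors to develop an alternative: iteratively upscaling low-resolution generated images, which can be modeled more reliably. The DCGAN authors also ran into difficulties when scaling GANs with CNN architectures common in the supervised literature. After extensive model exploration, however, they identified a family of architectures that trains stably across a range of datasets and supports higher-resolution, deeper generative models.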

Core to our approach is adopting and modifying three recently demonstrated changes to CNN architectures.

The core of the approach is adopting and modifying three recently demonstrated changes to CNN architectures.

The first is the all convolutional net (Springenberg et al., 2014) which replaces deterministic spatial pooling functions (such as maxpooling) with strided convolutions, allowing the network to learn its own spatial downsampling. We use this approach in both our generator, allowing it to learn its own spatial upsampling, and our discriminator.

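First, the all-convolutional net replaces deterministic spatial pooling functions (such as max pooling) with strided convolutions, so the network learns its own spatial downsampling. DCGAN uses this in both networks: in the discriminator, and in the generator, where it lets the network learn its own spatial upsampling.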
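The contrast can be made concrete with a one-dimensional sketch. The toy inputs and the helper names (`max_pool1d`, `strided_conv1d`, `conv_out_size`) are hypothetical. Max pooling applies a fixed rule, while a strided convolution downsamples with weights that training can change. The kernel 4 / stride 2 / padding 1 setting in the size calculation is an assumption (a common choice in reimplementations), not a value given in this excerpt.

```python
def max_pool1d(x, size=2, stride=2):
    """Fixed, deterministic downsampling: take the max of each window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


def strided_conv1d(x, kernel, stride=2):
    """Learned downsampling: a convolution whose stride skips positions.
    `kernel` would hold trainable weights; here they are fixed for illustration."""
    k = len(kernel)
    return [sum(w * x[i + j] for j, w in enumerate(kernel))
            for i in range(0, len(x) - k + 1, stride)]


def conv_out_size(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a strided convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


x = [1, 3, 2, 4]
print(max_pool1d(x))                  # [3, 4]
print(strided_conv1d(x, [0.5, 0.5]))  # [2.0, 3.0]: this kernel happens to average

# A hypothetical 64x64 discriminator input halved by four strided convolutions.
sizes = [64]
for _ in range(4):
    sizes.append(conv_out_size(sizes[-1]))
print(sizes)                          # [64, 32, 16, 8, 4]
```

Because the kernel is trained, the network can settle on whatever linear downsampling filter suits the task, such as the averaging shown above, instead of a fixed max.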

Second is the trend towards eliminating fully connected layers on top of convolutional features. The strongest example of this is global average pooling which has been utilized in state of the art image classification models (Mordvintsev et al.). We found global average pooling increased model stability but hurt convergence speed. A middle ground of directly connecting the highest convolutional features to the input and output respectively of the generator and discriminator worked well. The first layer of the GAN, which takes a uniform noise distribution $Z$ as input, could be called fully connected as it is just a matrix multiplication, but the result is reshaped into a 4-dimensional tensor and used as the start of the convolution stack. For the discriminator, the last convolution layer is flattened and then fed into a single sigmoid output. See Fig. 1 for a visualization of an example model architecture.

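Second, fully connected layers on top of convolutional features are eliminated. Global average pooling, the strongest example of this trend, is used in state-of-the-art image classification models. The authors found that it improved model stability but slowed convergence. A middle ground worked well: connecting the highest convolutional features directly to the generator's input and the discriminator's output. The generator's first layer takes uniform noise $Z$ as input. It could be called fully connected, since it is just a matrix multiplication, but its result is reshaped into a 4-dimensional tensor that starts the convolution stack. In the discriminator, the last convolution layer is flattened and fed into a single sigmoid output. Figure 1 shows an example architecture.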
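Here is a minimal NumPy sketch of the two ends of the network described above. It assumes a batch of 8, noise drawn from [-1, 1], and a 1024 × 4 × 4 starting tensor. The channel count and the noise range are assumptions; the excerpt only says the projection is reshaped into a 4-dimensional tensor. The discriminator head uses hypothetical 512 × 4 × 4 final features.

```python
import numpy as np

rng = np.random.default_rng(0)

Z_DIM = 100          # from the text: 100-dimensional uniform Z
C0, S0 = 1024, 4     # assumption: start with 1024 feature maps of size 4x4


def project_and_reshape(z, W, b):
    """Generator's first layer: a plain matrix multiplication, then a reshape
    into a 4-D tensor (batch, channels, height, width) that starts the conv stack."""
    h = z @ W + b
    return h.reshape(z.shape[0], C0, S0, S0)


def discriminator_head(features, w, b):
    """Discriminator's last step: flatten the final conv layer and map it
    to a single sigmoid probability per image."""
    flat = features.reshape(features.shape[0], -1)
    logits = flat @ w + b
    return 1.0 / (1.0 + np.exp(-logits))


batch = 8
z = rng.uniform(-1.0, 1.0, size=(batch, Z_DIM))      # assumption: noise range [-1, 1]
W = rng.normal(0.0, 0.02, size=(Z_DIM, C0 * S0 * S0))
x0 = project_and_reshape(z, W, np.zeros(C0 * S0 * S0))
print(x0.shape)     # (8, 1024, 4, 4)

feats = rng.normal(size=(batch, 512, 4, 4))          # hypothetical last-layer features
probs = discriminator_head(feats, rng.normal(0.0, 0.02, size=512 * 16), 0.0)
print(probs.shape)  # (8,)
```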

Figure 1: DCGAN generator used for LSUN scene modeling. A 100 dimensional uniform distribution $Z$ is projected to a small spatial extent convolutional representation with many feature maps. A series of four fractionally-strided convolutions (in some recent papers, these are wrongly called deconvolutions) then convert this high level representation into a 64 × 64 pixel image. Notably, no fully connected or pooling layers are used.

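Figure 1: a 100-dimensional uniform $Z$ is projected to a convolutional representation with a small spatial extent and many feature maps. Four fractionally-strided convolutions (some recent papers wrongly call them deconvolutions) then convert this high-level representation into a 64 × 64 pixel image. Notably, no fully connected or pooling layers are used.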
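The caption's 64 × 64 output can be checked with the usual output-size formula for a transposed (fractionally-strided) convolution, $(n - 1)s - 2p + k$. This sketch assumes kernel 4, stride 2 and padding 1, plus channel widths 1024, 512, 256, 128, 3. These values are assumptions for illustration; the caption above does not state them. The hypothetical `insert_zeros` helper shows where the name "fractionally strided" comes from. A stride-2 transposed convolution can be viewed as an ordinary convolution applied after inserting zeros between input pixels, which amounts to an effective stride of 1/2.

```python
def conv_transpose_out_size(size, kernel=4, stride=2, padding=1):
    """Output size of a fractionally-strided (transposed) convolution."""
    return (size - 1) * stride - 2 * padding + kernel


def insert_zeros(row, stride=2):
    """1-D view of a stride-2 transposed conv: put (stride - 1) zeros between
    inputs; an ordinary stride-1 convolution then fills them in."""
    out = []
    for i, v in enumerate(row):
        out.append(v)
        if i < len(row) - 1:
            out.extend([0] * (stride - 1))
    return out


def generator_plan(channels=(1024, 512, 256, 128, 3), start=4):
    """(channels, height, width) after the projection and each of the
    four fractionally-strided convolutions."""
    size = start
    plan = [(channels[0], size, size)]
    for c in channels[1:]:
        size = conv_transpose_out_size(size)
        plan.append((c, size, size))
    return plan


for c, h, w in generator_plan():
    print(c, h, w)
print(insert_zeros([1, 2, 3]))   # [1, 0, 2, 0, 3]
```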

Third is Batch Normalization (Ioffe & Szegedy, 2015) which stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps deal with training problems that arise due to poor initialization and helps gradient flow in deeper models. This proved critical to get deep generators to begin learning, preventing the generator from collapsing all samples to a single point which is a common failure mode observed in GANs. Directly applying batchnorm to all layers however, resulted in sample oscillation and model instability. This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer.

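Third, batch normalization stabilizes learning by normalizing the input to each unit to zero mean and unit variance. It helps with training problems caused by poor initialization and improves gradient flow in deeper models. It proved critical for getting deep generators to start learning, because it prevents the generator from collapsing all samples to a single point, a common GAN failure mode. Applying batchnorm to every layer, however, caused sample oscillation and model instability. The fix was to leave batchnorm off the generator output layer and the discriminator input layer.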
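Below is a minimal NumPy sketch of training-mode batch normalization. It computes per-channel statistics over the batch and spatial axes, then applies a learnable scale `gamma` and shift `beta`. It also includes per-layer flags that record where the text says to skip batchnorm. The layer counts in the flag lists are hypothetical, and the running statistics used at inference time are omitted.

```python
import numpy as np


def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization.
    x is (batch, features) or (batch, channels, height, width); statistics are
    taken over the batch (and spatial) axes, separately for each feature/channel."""
    axes = (0,) if x.ndim == 2 else (0, 2, 3)
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)                    # zero mean, unit variance
    shape = (1, -1) + (1,) * (x.ndim - 2)                       # broadcast per channel
    return gamma.reshape(shape) * x_hat + beta.reshape(shape)  # learnable scale/shift


# Hypothetical 5-layer generator and 4-layer discriminator: True = apply batchnorm.
GENERATOR_USE_BN = [True, True, True, True, False]   # no BN on the output layer
DISCRIMINATOR_USE_BN = [False, True, True, True]     # no BN on the input layer

rng = np.random.default_rng(0)
x = rng.normal(3.0, 5.0, size=(16, 8, 4, 4))
y = batch_norm(x, np.ones(8), np.zeros(8))
print(y.mean(axis=(0, 2, 3)))   # approximately 0 for every channel
print(y.var(axis=(0, 2, 3)))    # approximately 1 for every channel
```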

The ReLU activation (Nair & Hinton, 2010) is used in the generator with the exception of the output layer which uses the Tanh function. We observed that using a bounded activation allowed the model to learn more quickly to saturate and cover the color space of the training distribution. Within the discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling. This is in contrast to the original GAN paper, which used the maxout activation (Goodfellow et al., 2013).

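The generator uses ReLU in every layer except the output layer, which uses Tanh. A bounded output activation let the model learn more quickly to saturate and cover the color space of the training distribution. In the discriminator, leaky ReLU worked well, especially for higher-resolution modeling. The original GAN paper used the maxout activation instead.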
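These are scalar sketches of the three activations named above; the helper names are hypothetical. The 0.2 leak slope comes from the training details in Section 4. The derivative helper illustrates why leaky units keep a gradient flowing for negative inputs.

```python
import math


def relu(x):
    """Generator hidden layers."""
    return max(0.0, x)


def tanh(x):
    """Generator output layer: bounded to (-1, 1), the range the images are scaled to."""
    return math.tanh(x)


def leaky_relu(x, slope=0.2):
    """Discriminator layers; slope 0.2 is the value given in Section 4."""
    return x if x > 0 else slope * x


def leaky_relu_grad(x, slope=0.2):
    """Unlike ReLU, the gradient stays non-zero for negative inputs."""
    return 1.0 if x > 0 else slope


for v in (-2.0, -0.5, 0.0, 1.5):
    print(v, relu(v), leaky_relu(v), round(tanh(v), 3))
```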

Architecture guidelines for stable Deep Convolutional GANs

Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
Use batchnorm in both the generator and the discriminator.
Remove fully connected hidden layers for deeper architectures.
Use ReLU activation in generator for all layers except for the output, which uses Tanh.
Use LeakyReLU activation in the discriminator for all layers.

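Guidelines in brief: strided convolutions (discriminator) and fractionally-strided convolutions (generator) instead of pooling; batchnorm in both networks; no fully connected hidden layers in deeper architectures; ReLU in every generator layer except the Tanh output; LeakyReLU in every discriminator layer.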

4 Details of Adversarial Training

We trained DCGANs on three datasets, Large-scale Scene Understanding (LSUN) (Yu et al., 2015), Imagenet-1k and a newly assembled Faces dataset. Details on the usage of each of these datasets are given below.

DCGANs were trained on three datasets: LSUN, Imagenet-1k, and a newly assembled Faces dataset. Details for each follow.

No pre-processing was applied to training images besides scaling to the range of the tanh activation function [-1, 1]. All models were trained with mini-batch stochastic gradient descent (SGD) with a mini-batch size of 128. All weights were initialized from a zero-centered Normal distribution with standard deviation 0.02. In the LeakyReLU, the slope of the leak was set to 0.2 in all models. While previous GAN work has used momentum to accelerate training, we used the Adam optimizer (Kingma & Ba, 2014) with tuned hyperparameters. We found the suggested learning rate of 0.001 to be too high, using 0.0002 instead. Additionally, we found leaving the momentum term $\beta_1$ at the suggested value of 0.9 resulted in training oscillation and instability while reducing it to 0.5 helped stabilize training.

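The only preprocessing is scaling images to the tanh range [-1, 1]. All models use mini-batch SGD with a batch size of 128. All weights are initialized from a zero-centered normal distribution with standard deviation 0.02, and the LeakyReLU slope is 0.2 in every model. Previous GAN work used momentum to accelerate training; the authors instead use Adam with tuned hyperparameters. The suggested learning rate of 0.001 was too high, so 0.0002 was used. Leaving the momentum term $\beta_1$ at the suggested 0.9 caused oscillation and instability, and reducing it to 0.5 stabilized training.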
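The stated settings can be collected into a small NumPy sketch: pixel scaling to [-1, 1], N(0, 0.02) initialization, and a single Adam update with learning rate 0.0002 and $\beta_1 = 0.5$. The values $\beta_2 = 0.999$ and $\epsilon = 10^{-8}$ are assumptions (Adam's usual defaults); the excerpt does not give them. The functions are hypothetical helpers, not the authors' code.

```python
import numpy as np

# Values stated in the text.
BATCH_SIZE = 128
INIT_STD = 0.02
LEAK_SLOPE = 0.2
LR, BETA1 = 2e-4, 0.5
# Assumption: Adam's usual defaults; the excerpt does not give these.
BETA2, EPS = 0.999, 1e-8


def to_tanh_range(pixels):
    """The only preprocessing: map uint8 pixels in [0, 255] to [-1, 1]."""
    return pixels.astype(np.float32) / 127.5 - 1.0


def init_weights(shape, rng):
    """Zero-centered normal initialization with standard deviation 0.02."""
    return rng.normal(0.0, INIT_STD, size=shape)


def adam_step(param, grad, m, v, t, lr=LR, beta1=BETA1, beta2=BETA2, eps=EPS):
    """One Adam update (t starts at 1). Returns the new param and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)        # bias correction
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (np.sqrt(v_hat) + eps), m, v


rng = np.random.default_rng(0)
w = init_weights((3,), rng)
g = np.array([1.0, -2.0, 0.5])          # hypothetical gradient
w_new, m, v = adam_step(w, g, np.zeros(3), np.zeros(3), t=1)
print(w_new - w)                        # about -LR * sign(g) on the first step
print(to_tanh_range(np.array([0, 255], dtype=np.uint8)))
```

On the first step, Adam's bias correction makes the update roughly `-lr * sign(grad)` for every parameter, so the learning rate directly caps the early step size.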

4.1 LSUN

As visual quality of samples from generative image models has improved, concerns of over-fitting and memorization of training samples have risen. To demonstrate how our model scales with more data and higher resolution generation, we train a model on the LSUN bedrooms dataset containing a little over 3 million training examples. Recent analysis has shown that there is a direct link between how fast models learn and their generalization performance (Hardt et al., 2015). We show samples from one epoch of training (Fig.2), mimicking online learning, in addition to samples after convergence (Fig.3), as an opportunity to demonstrate that our model is not producing high quality samples via simply overfitting/memorizing training examples. No data augmentation was applied to the images.

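As sample quality from generative image models has improved, concerns about overfitting and memorization of training samples have grown. To show how the model scales to more data and higher-resolution generation, a model is trained on LSUN bedrooms, which contains a little over 3 million training examples. Recent analysis shows a direct link between how fast a model learns and how well it generalizes. The authors therefore show samples after one epoch of training (Fig.2, mimicking online learning) as well as after convergence (Fig.3). The aim is to demonstrate that the model does not produce high-quality samples simply by overfitting or memorizing training examples. No data augmentation is applied.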

Figure 2: Generated bedrooms after one training pass through the dataset. Theoretically, the model could learn to memorize training examples, but this is experimentally unlikely as we train with a small learning rate and minibatch SGD. We are aware of no prior empirical evidence demonstrating memorization with SGD and a small learning rate.

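Figure 2: bedrooms generated after one pass over the dataset. In theory the model could memorize training examples, but this is experimentally unlikely with a small learning rate and minibatch SGD. The authors know of no prior empirical evidence of memorization under SGD with a small learning rate.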

Figure 3: Generated bedrooms after five epochs of training. There appears to be evidence of visual under-fitting via repeated noise textures across multiple samples such as the base boards of some of the beds.

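Figure 3: bedrooms generated after five epochs of training. Noise textures repeated across multiple samples, such as on the base boards of some beds, appear to be evidence of visual under-fitting.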

4.1.1 Deduplication

To further decrease the likelihood of the generator memorizing input examples (Fig.2) we perform a simple image de-duplication process. We fit a 3072-128-3072 de-noising dropout regularized ReLU autoencoder on 32x32 downsampled center-crops of training examples. The resulting code layer activations are then binarized via thresholding the ReLU activation, which has been shown to be an effective information preserving technique (Srivastava et al., 2014) and provides a convenient form of semantic-hashing, allowing for linear time de-duplication. Visual inspection of hash collisions showed high precision with an estimated false positive rate of less than 1 in 100. Additionally, the technique detected and removed approximately 275,000 near duplicates, suggesting a high recall.

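To further reduce the chance that the generator memorizes inputs, a simple image de-duplication step is applied. A 3072-128-3072 denoising, dropout-regularized ReLU autoencoder is fit on 32x32 downsampled center crops of the training images. Its code-layer activations are binarized by thresholding the ReLU. This has been shown to preserve information effectively, and it gives a convenient form of semantic hashing that allows linear-time de-duplication. Visual inspection of hash collisions showed high precision, with an estimated false-positive rate below 1 in 100. The technique also detected and removed about 275,000 near-duplicates, suggesting high recall.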
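This sketch shows the hashing-based de-duplication idea, with one large substitution. The trained 3072-128-3072 denoising, dropout-regularized autoencoder is replaced by a random, untrained linear layer plus ReLU that produces a 128-unit code. Binarizing with a threshold of 0 (active vs. inactive ReLU) is also an assumption. Each image costs one dictionary lookup, which is what makes the pass linear in the number of images.

```python
import numpy as np

CODE_DIM = 128   # from the text: 3072-128-3072 autoencoder


def encode(images, W, b):
    """Stand-in for the trained encoder: flatten 32x32x3 crops (3072 values)
    and apply one linear layer + ReLU to get a 128-unit code."""
    x = images.reshape(len(images), -1)
    return np.maximum(0.0, x @ W + b)


def semantic_hash(codes, threshold=0.0):
    """Binarize the ReLU code (active / inactive) and pack the bits into a hashable key."""
    return [np.packbits(row > threshold).tobytes() for row in codes]


def deduplicate(images, W, b):
    """Keep the first image for each hash key: one dictionary lookup per image."""
    first_seen = {}
    for i, key in enumerate(semantic_hash(encode(images, W, b))):
        first_seen.setdefault(key, i)
    return sorted(first_seen.values())


rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.02, size=(32 * 32 * 3, CODE_DIM))   # random, untrained weights
b = np.zeros(CODE_DIM)
images = rng.uniform(-1.0, 1.0, size=(5, 32, 32, 3))
images[3] = images[1]                                        # plant an exact duplicate
print(deduplicate(images, W, b))                             # [0, 1, 2, 4]
```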

4.2 Faces

We scraped images containing human faces from random web image queries of people's names. The names were acquired from dbpedia, with the criterion that the people were born in the modern era. This dataset has 3M images from 10K people. We run an OpenCV face detector on these images, keeping the detections that are sufficiently high resolution, which gives us approximately 350,000 face boxes. We use these face boxes for training. No data augmentation was applied to the images.

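Face images were scraped from random web image queries for people's names. The names came from dbpedia, restricted to people born in the modern era. The dataset has 3M images of 10K people. An OpenCV face detector was run on these images, and only sufficiently high-resolution detections were kept. This yielded about 350,000 face boxes, which were used for training. No data augmentation is applied.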

4.3 ImageNet-1K

We use Imagenet-1k (Deng et al., 2009) as a source of natural images for unsupervised training. We train on 32 × 32 min-resized center crops. No data augmentation was applied to the images.

Imagenet-1k serves as a source of natural images for unsupervised training. Training uses 32 × 32 min-resized center crops, with no data augmentation.
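This sketch covers the geometry of a min-resized center crop. It assumes "min-resized" means scaling the shorter side to 32 pixels while keeping the aspect ratio. The hypothetical helpers only compute sizes and the crop box; an image library would do the pixel resampling itself.

```python
def min_resize_dims(width, height, target=32):
    """Scale so the shorter side becomes `target`, keeping the aspect ratio."""
    scale = target / min(width, height)
    return max(target, round(width * scale)), max(target, round(height * scale))


def center_crop_box(width, height, size=32):
    """(left, top, right, bottom) of a centered size x size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


w, h = min_resize_dims(500, 375)     # a hypothetical 500x375 photo
print((w, h))                        # (43, 32)
print(center_crop_box(w, h))         # (5, 0, 37, 32)
```

In Pillow, for example, these values map onto `Image.resize((w, h))` followed by `Image.crop(box)`, whose box is also `(left, upper, right, lower)`.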

5 Empirical Validation of DCGANS Capabilities

5.1 Classifying CIFAR-10 Using GANs as a Feature Extractor

One common technique for evaluating the quality of unsupervised representation learning algorithms is to apply them as a feature extractor on supervised datasets and evaluate the performance of linear models fitted on top of these features.

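A common way to evaluate unsupervised representation learning is to use the learned model as a feature extractor on a supervised dataset, then measure the performance of linear models fit on top of those features.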

On the CIFAR-10 dataset, a very strong baseline performance has been demonstrated from a well tuned single layer feature extraction pipeline utilizing K-means as a feature learning algorithm. When using a very large amount of feature maps (4800) this technique achieves 80.6% accuracy. An unsupervised multi-layered extension of the base algorithm reaches 82.0% accuracy (Coates & Ng, 2011). To evaluate the quality of the representations learned by DCGANs for supervised tasks, we train on Imagenet-1k and then use the discriminator's convolutional features from all layers, maxpooling each layer's representation to produce a 4 × 4 spatial grid. These features are then flattened and concatenated to form a 28672 dimensional vector and a regularized linear L2-SVM classifier is trained on top of them. This achieves 82.8% accuracy, outperforming all K-means based approaches.

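On CIFAR-10, a well-tuned single-layer feature-extraction pipeline that uses K-means as its feature learner is a very strong baseline. With a very large number of feature maps (4800) it reaches 80.6% accuracy, and its unsupervised multi-layer extension reaches 82.0%. To evaluate DCGAN representations, the model is trained on Imagenet-1k. The discriminator's convolutional features from all layers are then taken, and each layer is max-pooled to a 4 × 4 spatial grid. These are flattened and concatenated into a 28672-dimensional vector, and a regularized linear L2-SVM classifier is trained on top. This reaches 82.8% accuracy, outperforming all K-means-based approaches.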
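This NumPy sketch follows the feature pipeline described above: each layer's map is max-pooled to a 4 × 4 grid, flattened, and concatenated. The layer shapes below are hypothetical. In the paper's setting, 28672 = 16 × 1792, so the pooled layers would have 1792 channels in total; the per-layer split is not given in this excerpt.

```python
import numpy as np


def adaptive_max_pool(fmap, grid=4):
    """Max-pool a (C, H, W) feature map down to (C, grid, grid) using
    roughly equal bins; assumes H and W are at least `grid`."""
    c, h, w = fmap.shape
    row_bins = np.array_split(np.arange(h), grid)
    col_bins = np.array_split(np.arange(w), grid)
    out = np.empty((c, grid, grid), dtype=fmap.dtype)
    for i, rows in enumerate(row_bins):
        for j, cols in enumerate(col_bins):
            block = fmap[:, rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
            out[:, i, j] = block.max(axis=(1, 2))
    return out


def feature_vector(layer_maps, grid=4):
    """Pool every layer to grid x grid, flatten, and concatenate."""
    return np.concatenate([adaptive_max_pool(f, grid).ravel() for f in layer_maps])


rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 16, 16)),    # hypothetical channel counts and sizes
          rng.normal(size=(128, 8, 8)),
          rng.normal(size=(256, 4, 4))]
vec = feature_vector(layers)
print(vec.shape)   # (7168,) = 16 * (64 + 128 + 256)
```

An L2-SVM is a linear SVM trained with the squared hinge loss. One way to fit it on these vectors is scikit-learn's `LinearSVC`, whose defaults are `loss="squared_hinge"` with an L2 penalty.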

Notably, the discriminator has far fewer feature maps (512 in the highest layer) compared to K-means based techniques, but does result in a larger total feature vector size due to the many layers of 4 × 4 spatial locations. The performance of DCGANs is still less than that of Exemplar CNNs (Dosovitskiy et al., 2015), a technique which trains normal discriminative CNNs in an unsupervised fashion to differentiate between specifically chosen, aggressively augmented, exemplar samples from the source dataset. Further improvements could be made by finetuning the discriminator's representations, but we leave this for future work. Additionally, since our DCGAN was never trained on CIFAR-10 this experiment also demonstrates the domain robustness of the learned features.

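Notably, the discriminator has far fewer feature maps than the K-means-based techniques (512 in its highest layer). Its total feature vector is still larger, because many layers each contribute 4 × 4 spatial locations. DCGAN still trails Exemplar CNNs. Those train ordinary discriminative CNNs without labels to tell apart specifically chosen, aggressively augmented exemplar samples from the source dataset. Fine-tuning the discriminator's representations could improve results further and is left for future work. Because the DCGAN was never trained on CIFAR-10, the experiment also demonstrates that the learned features are robust across domains.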

5.2 Classifying SVHN Digits Using GANs as a Feature Extractor

On the StreetView House Numbers dataset (SVHN) (Netzer et al., 2011), we use the features of the discriminator of a DCGAN for supervised purposes when labeled data is scarce. Following similar dataset preparation rules as in the CIFAR-10 experiments, we split off a validation set of 10,000 examples from the non-extra set and use it for all hyperparameter and model selection. 1000 uniformly class distributed training examples are randomly selected and used to train a regularized linear L2-SVM classifier on top of the same feature extraction pipeline used for CIFAR-10. This achieves state of the art (for classification using 1000 labels) at 22.48% test error, improving upon another modification of CNNs designed to leverage unlabeled data (Zhao et al., 2015).

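On SVHN, discriminator features from a DCGAN are used for supervised classification when labeled data is scarce. Following the CIFAR-10 preparation rules, a validation set of 10,000 examples is split off from the non-extra set and used for all hyperparameter and model selection. Then 1000 class-balanced training examples are randomly selected and used to train a regularized linear L2-SVM on the same feature-extraction pipeline used for CIFAR-10. This reaches a state-of-the-art 22.48% test error for classification with 1000 labels. It improves on another CNN modification designed to leverage unlabeled data.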
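This sketch covers the data preparation described above. It assumes a plain list of integer labels 0-9, and the function name is hypothetical. The code holds out 10,000 examples for hyperparameter and model selection, then draws 1000 training examples, 100 per class, from the remainder.

```python
import random
from collections import Counter, defaultdict


def split_and_subsample(labels, n_val=10_000, n_train=1000, n_classes=10, seed=0):
    """Hold out n_val examples for model selection, then draw a class-balanced
    labeled training set (n_train // n_classes per class) from the rest."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    val, rest = idx[:n_val], idx[n_val:]
    by_class = defaultdict(list)
    for i in rest:
        by_class[labels[i]].append(i)
    per_class = n_train // n_classes
    train = [i for y in sorted(by_class) for i in rng.sample(by_class[y], per_class)]
    return train, val


labels = [i % 10 for i in range(30_000)]     # hypothetical label list, digits 0-9
train, val = split_and_subsample(labels)
print(len(train), len(val))                  # 1000 10000
print(Counter(labels[i] for i in train))     # 100 per class
```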

Additionally, we validate that the CNN architecture used in DCGAN is not the key contributing factor of the model's performance by training a purely supervised CNN with the same architecture on the same data and optimizing this model via random search over 64 hyperparameter trials (Bergstra & Bengio, 2012). It achieves a significantly higher 28.87% validation error.

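The authors also check that the CNN architecture itself is not the key factor behind the result. They train a purely supervised CNN with the same architecture on the same data and tune it by random search over 64 hyperparameter trials. It reaches a significantly higher 28.87% validation error.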
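Here is a sketch of random hyperparameter search with 64 trials. The search space (`lr`, `weight_decay`, `dropout`), its ranges, and `toy_validation_error` are hypothetical stand-ins. In the experiment, each evaluation would train the supervised CNN and report its validation error.

```python
import math
import random


def sample_config(rng):
    """Hypothetical search space; rates are sampled log-uniformly."""
    return {
        "lr": 10 ** rng.uniform(-5, -2),
        "weight_decay": 10 ** rng.uniform(-6, -3),
        "dropout": rng.uniform(0.0, 0.5),
    }


def random_search(evaluate, n_trials=64, seed=0):
    """Evaluate n_trials independent random configs; return (best_error, config)."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        cfg = sample_config(rng)
        err = evaluate(cfg)
        if best is None or err < best[0]:
            best = (err, cfg)
    return best


def toy_validation_error(cfg):
    """Stand-in for 'train the CNN and measure validation error'."""
    return (math.log10(cfg["lr"]) + 3.5) ** 2 + cfg["dropout"]


err, cfg = random_search(toy_validation_error)
print(round(err, 4), cfg)
```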

6 Investigating and Visualizing the Internals of the Networks

We investigate the trained generators and discriminators in a variety of ways. We do not do any kind of nearest neighbor search on the training set. Nearest neighbors in pixel or feature space are trivially fooled (Theis et al., 2015) by small image transforms. We also do not use log-likelihood metrics to quantitatively assess the model, as it is a poor (Theis et al., 2015) metric.

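The trained generators and discriminators are examined in several ways. No nearest-neighbor search is done on the training set, because nearest neighbors in pixel or feature space are trivially fooled by small image transforms. Log-likelihood is not used to assess the model quantitatively either, because it is a poor metric.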

6.1 Walking in the Latent Space

The first experiment we did was to understand the landscape of the latent space. Walking on the manifold that is learnt can usually tell us about signs of memorization (if there are sharp transitions) and about the way in which the space is hierarchically collapsed. If walking in this latent space results in semantic changes to the image generations (such as objects being added and removed), we can reason that the model has learned relevant and interesting representations. The results are shown in Fig.4.

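The first experiment explores the landscape of the latent space. Walking along the learned manifold can reveal signs of memorization (sharp transitions) and show how the space is hierarchically collapsed. If walking in the latent space produces semantic changes in the generated images, such as objects being added or removed, we can infer that the model has learned relevant and interesting representations. Figure 4 is large, so only part of it was included in these notes.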

Figure 4: Top rows: Interpolation between a series of 9 random points in $Z$ show that the space learned has smooth transitions, with every image in the space plausibly looking like a bedroom. In the 6th row, you see a room without a window slowly transforming into a room with a giant window. In the 10th row, you see what appears to be a TV slowly being transformed into a window.

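Top rows: interpolating between a series of 9 random points in $Z$ shows that the learned space has smooth transitions, and every image along the way plausibly looks like a bedroom. In row 6, a room without a window gradually turns into a room with a giant window. In row 10, what appears to be a TV gradually turns into a window.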
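This sketch of the latent walk assumes straight-line interpolation between consecutive anchor points; the excerpt does not say whether the interpolation is linear or spherical. The 8 steps per segment are a hypothetical choice. Each returned row would be passed to the generator to draw one image.

```python
import numpy as np


def latent_walk(anchors, steps_between=8):
    """Linearly interpolate between consecutive anchor points in Z.
    anchors: (k, z_dim) -> ((k - 1) * steps_between + 1, z_dim)."""
    path = []
    for a, b in zip(anchors[:-1], anchors[1:]):
        for t in np.linspace(0.0, 1.0, steps_between, endpoint=False):
            path.append((1.0 - t) * a + t * b)
    path.append(anchors[-1])
    return np.stack(path)


rng = np.random.default_rng(0)
anchors = rng.uniform(-1.0, 1.0, size=(9, 100))   # 9 random points, as in Figure 4
walk = latent_walk(anchors)
print(walk.shape)   # (65, 100): each row is one generator input
```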

6.2 Visualizing the Discriminator Features

Previous work has demonstrated that supervised training of CNNs on large image datasets results in very powerful learned features (Zeiler & Fergus, 2014). Additionally, supervised CNNs trained on scene classification learn object detectors (Oquab et al., 2014). We demonstrate that an unsupervised DCGAN trained on a large image dataset can also learn a hierarchy of features that are interesting. Using guided backpropagation as proposed by (Springenberg et al., 2014), we show in Fig.5 that the features learnt by the discriminator activate on typical parts of a bedroom, like beds and windows. For comparison, in the same figure, we give a baseline for randomly initialized features that are not activated on anything that is semantically relevant or interesting.

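Previous work showed that supervised CNNs trained on large image datasets learn very powerful features, and that supervised CNNs trained on scene classification learn object detectors. The authors show that an unsupervised DCGAN trained on a large image dataset also learns an interesting hierarchy of features. Using guided backpropagation, Fig.5 shows discriminator features that activate on typical parts of a bedroom, such as beds and windows. For comparison, the same figure gives a baseline of randomly initialized features, which do not activate on anything semantically relevant or interesting.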

Figure 5: On the right, guided backpropagation visualizations of maximal axis-aligned responses for the first 6 learned convolutional features from the last convolution layer in the discriminator. Notice a significant minority of features respond to beds - the central object in the LSUN bedrooms dataset. On the left is a random filter baseline. Comparing to the previous responses there is little to no discrimination and random structure.

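Right: guided-backpropagation visualizations of the maximal axis-aligned responses of the first 6 learned convolutional features in the discriminator's last convolution layer. A significant minority of the features respond to beds, the central object in the LSUN bedrooms dataset. Left: a random-filter baseline. Compared with the trained responses, it shows little to no discrimination and only random structure.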
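Guided backpropagation differs from ordinary backpropagation in one rule, applied at each ReLU on the backward pass. This NumPy sketch shows only that rule, not the full visualization pipeline. Ordinary backprop keeps the gradient where the forward input was positive. Guided backprop also discards negative incoming gradients.

```python
import numpy as np


def relu_backward(x, grad_out):
    """Ordinary backprop through ReLU: pass gradient where the forward input was positive."""
    return np.where(x > 0, grad_out, 0.0)


def relu_backward_guided(x, grad_out):
    """Guided backprop: additionally drop negative incoming gradients, so only
    signals that increase the chosen activation flow back to the image."""
    return np.where((x > 0) & (grad_out > 0), grad_out, 0.0)


x = np.array([-1.0, 2.0, 3.0])      # forward inputs to a ReLU
g = np.array([5.0, -1.0, 4.0])      # gradient arriving from the layer above
print(relu_backward(x, g))          # values 0, -1, 4
print(relu_backward_guided(x, g))   # values 0, 0, 4
```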

6.3 Manipulating the Generator Representation

6.3.1 Forgetting to Draw Certain Objects

In addition to the representations learnt by a discriminator, there is the question of what representations the generator learns. The quality of samples suggest that the generator learns specific object representations for major scene components such as beds, windows, lamps, doors, and miscellaneous furniture. In order to explore the form that these representations take, we conducted an experiment to attempt to remove windows from the generator completely.

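Beyond what the discriminator learns, there is the question of what the generator learns. Sample quality suggests the generator learns specific object representations for major scene components such as beds, windows, lamps, doors, and miscellaneous furniture. To explore the form these representations take, the authors try to remove windows from the generator completely.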

On 150 samples, 52 window bounding boxes were drawn manually. On the second highest convolution layer features, logistic regression was fit to predict whether a feature activation was on a window (or not), by using the criterion that activations inside the drawn bounding boxes are positives and random samples from the same images are negatives. Using this simple model, all feature maps with weights greater than zero (200 in total) were dropped from all spatial locations. Then, random new samples were generated with and without the feature map removal.

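On 150 samples, 52 window bounding boxes were drawn by hand. Logistic regression was fit on the second-highest convolution layer's features to predict whether a feature activation lies on a window. Activations inside the drawn boxes are positives, and random samples from the same images are negatives. Using this simple model, all feature maps with weights greater than zero (200 in total) were dropped at every spatial location. New random samples were then generated with and without the feature-map removal.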
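This sketch runs the window-removal procedure on toy data. It assumes per-location activation vectors (one value per feature map) as the logistic-regression inputs, and fits them with a hand-written gradient-descent loop. The data, step size and helper names are hypothetical. Channels whose fitted weight is positive are then zeroed at every spatial location, mirroring the description above.

```python
import numpy as np


def fit_logistic_regression(X, y, lr=0.5, steps=2000):
    """Plain batch gradient descent on the logistic loss. X: (n, C), y in {0, 1}."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        err = p - y
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b


def drop_feature_maps(activations, weights):
    """Zero every channel of a (C, H, W) activation whose weight is > 0,
    at all spatial locations."""
    keep = (weights <= 0).astype(activations.dtype)
    return activations * keep[:, None, None]


# Toy data with 2 channels: channel 0 fires inside "window" boxes, channel 1 elsewhere.
X = np.array([[1.0, 0.0], [0.9, 0.1], [1.1, 0.0],    # positives (inside a box)
              [0.0, 1.0], [0.1, 0.9], [0.0, 1.1]])   # negatives (random locations)
y = np.array([1, 1, 1, 0, 0, 0], dtype=float)
w, b = fit_logistic_regression(X, y)
out = drop_feature_maps(np.ones((2, 3, 3)), w)
print(w)                       # channel 0 positive, channel 1 negative
print(out.sum(axis=(1, 2)))    # channel 0 zeroed, channel 1 kept
```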

The generated images with and without the window dropout are shown in Fig.6, and interestingly, the network mostly forgets to draw windows in the bedrooms, replacing them with other objects.

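Fig.6 shows the images generated with and without the window dropout. Interestingly, the network mostly forgets to draw windows in the bedrooms and replaces them with other objects.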

Figure 6: Top row: un-modified samples from model. Bottom row: the same samples generated with dropping out "window" filters. Some windows are removed, others are transformed into objects with similar visual appearance such as doors and mirrors. Although visual quality decreased, overall scene composition stayed similar, suggesting the generator has done a good job disentangling scene representation from object representation. Extended experiments could be done to remove other objects from the image and modify the objects the generator draws.

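Top row: unmodified samples from the model. Bottom row: the same samples generated with the "window" filters dropped. Some windows are removed, and others become objects with a similar look, such as doors and mirrors. Visual quality drops, but the overall scene composition stays similar. This suggests the generator does a good job of disentangling scene representation from object representation. The experiment could be extended to remove other objects from the image, or to modify the objects the generator draws.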

6.3.2 Vector Arithmetic on Face Samples

In the context of evaluating learned representations of words, Mikolov et al. (2013) demonstrated that simple arithmetic operations revealed rich linear structure in representation space. One canonical example demonstrated that the vector("King") - vector("Man") + vector("Woman") resulted in a vector whose nearest neighbor was the vector for Queen. We investigated whether similar structure emerges in the $Z$ representation of our generators. We performed similar arithmetic on the $Z$ vectors of sets of exemplar samples for visual concepts. Experiments working on only single samples per concept were unstable, but averaging the $Z$ vector for three exemplars showed consistent and stable generations that semantically obeyed the arithmetic. In addition to the object manipulation shown in (Fig.7), we demonstrate that face pose is also modeled linearly in $Z$ space (Fig. 8).

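When evaluating learned word representations, Mikolov et al. showed that simple arithmetic reveals rich linear structure in representation space. In the canonical example, vector("King") - vector("Man") + vector("Woman") yields a vector whose nearest neighbor is the vector for "Queen". The authors test whether similar structure appears in their generators' $Z$ representation by doing the same kind of arithmetic on the $Z$ vectors of exemplar samples for visual concepts. Using a single sample per concept was unstable. Averaging the $Z$ vectors of three exemplars, however, gave consistent, stable generations that semantically obey the arithmetic. Besides the object manipulation in Fig.7, face pose is also modeled linearly in $Z$ space (Fig.8).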

Figure 7: Vector arithmetic for visual concepts. For each column, the $Z$ vectors of samples are averaged. Arithmetic was then performed on the mean vectors creating a new vector $Y$. The center sample on the right hand side is produced by feeding $Y$ as input to the generator. To demonstrate the interpolation capabilities of the generator, uniform noise sampled with scale ±0.25 was added to $Y$ to produce the 8 other samples. Applying arithmetic in the input space (bottom two examples) results in noisy overlap due to misalignment.

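Vector arithmetic on visual concepts: the $Z$ vectors in each column are averaged, and arithmetic on these means produces a new vector $Y$. Feeding $Y$ to the generator yields the center sample on the right. To demonstrate the generator's interpolation ability, uniform noise with scale ±0.25 is added to $Y$ to produce the other 8 samples. Doing the arithmetic in pixel space (bottom two examples) produces noisy overlaps because the images are misaligned.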
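This NumPy sketch follows the Figure 7 procedure. It assumes 100-dimensional $Z$ vectors and three exemplars per concept, with random placeholders standing in for the $Z$ vectors of hand-picked samples. The hypothetical `latent_arithmetic` forms $Y$ from the three column means. `jittered_samples` adds uniform noise of scale ±0.25 to produce the 8 surrounding samples.

```python
import numpy as np


def concept_vector(zs):
    """Average the Z vectors of several exemplars of one visual concept."""
    return np.mean(zs, axis=0)


def latent_arithmetic(z_a, z_b, z_c):
    """Y = mean(A) - mean(B) + mean(C)."""
    return concept_vector(z_a) - concept_vector(z_b) + concept_vector(z_c)


def jittered_samples(y, n=8, scale=0.25, seed=0):
    """Add uniform noise in [-scale, scale] to Y to get n neighboring inputs."""
    rng = np.random.default_rng(seed)
    return y + rng.uniform(-scale, scale, size=(n, y.shape[0]))


rng = np.random.default_rng(1)
# Hypothetical exemplar sets: three Z vectors per concept.
z_a, z_b, z_c = (rng.uniform(-1.0, 1.0, size=(3, 100)) for _ in range(3))
Y = latent_arithmetic(z_a, z_b, z_c)
batch = np.vstack([Y, jittered_samples(Y)])   # 9 generator inputs: Y plus 8 neighbors
print(batch.shape)   # (9, 100)
```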

Figure 8: A "turn" vector was created from four averaged samples of faces looking left vs looking right. By adding interpolations along this axis to random samples we were able to reliably transform their pose.

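A "turn" vector is built from four averaged samples of faces looking left versus looking right. Adding interpolations along this axis to random samples reliably transforms their pose.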
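This sketch builds the Figure 8 turn vector. It uses four exemplars per direction, as stated, and a hypothetical set of interpolation coefficients. Moving one sample along the averaged left-to-right direction produces the generator inputs for a pose sweep.

```python
import numpy as np


def turn_vector(z_left, z_right):
    """Direction from the mean 'looking left' Z to the mean 'looking right' Z."""
    return np.mean(z_right, axis=0) - np.mean(z_left, axis=0)


def pose_sweep(z, direction, alphas):
    """Move one sample along the turn axis; each row is a generator input."""
    return np.stack([z + a * direction for a in alphas])


rng = np.random.default_rng(2)
z_left = rng.uniform(-1.0, 1.0, size=(4, 100))    # hypothetical: 4 left-facing exemplars
z_right = rng.uniform(-1.0, 1.0, size=(4, 100))   # and 4 right-facing exemplars
d = turn_vector(z_left, z_right)
z0 = rng.uniform(-1.0, 1.0, size=100)             # a random sample to rotate
sweep = pose_sweep(z0, d, np.linspace(-1.0, 1.0, 5))
print(sweep.shape)   # (5, 100)
```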

These demonstrations suggest interesting applications can be developed using $Z$ representations learned by our models. It has been previously demonstrated that conditional generative models can learn to convincingly model object attributes like scale, rotation, and position (Dosovitskiy et al., 2014). This is to our knowledge the first demonstration of this occurring in purely unsupervised models. Further exploring and developing the above mentioned vector arithmetic could dramatically reduce the amount of data needed for conditional generative modeling of complex image distributions.

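These demonstrations suggest that interesting applications can be built on the $Z$ representations the models learn. Conditional generative models were previously shown to model object attributes such as scale, rotation and position convincingly. To the authors' knowledge, this is the first demonstration of this in a purely unsupervised model. Exploring and developing the vector arithmetic further could dramatically reduce the amount of data needed for conditional generative modeling of complex image distributions.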

7 Conclusion and Future Work

We propose a more stable set of architectures for training generative adversarial networks and we give evidence that adversarial networks learn good representations of images for supervised learning and generative modeling. There are still some forms of model instability remaining - we noticed as models are trained longer they sometimes collapse a subset of filters to a single oscillating mode.

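The paper proposes a more stable family of architectures for training GANs. It also gives evidence that adversarial networks learn good image representations for supervised learning and generative modeling. Some forms of instability remain: as models are trained longer, they sometimes collapse a subset of filters into a single oscillating mode.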

Further work is needed to tackle this form of instability. We think that extending this framework to other domains such as video (for frame prediction) and audio (pre-trained features for speech synthesis) should be very interesting. Further investigations into the properties of the learnt latent space would be interesting as well.

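Future work should tackle this instability. Extending the framework to other domains, such as video (frame prediction) and audio (pre-trained features for speech synthesis), would be interesting, as would further study of the learned latent space's properties.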

Research background

1. What problem does the paper solve? (Can it be illustrated with an example?)

本文解决了 GANs 存在的训练不稳定问题。

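2. What related work exists on this problem, and what are its strengths and weaknesses? (A survey of related work)
Related work:

Unsupervised representation learning from unlabeled data

Clustering the data first (the paper gives K-means as an example), then using the resulting clusters to improve classification scores.
Auto-encoders.
Deep belief networks.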
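To make the clustering approach concrete, here is a toy sketch on 2-D points, assuming plain K-means (Lloyd's algorithm) and a one-hot "nearest cluster" feature that a classifier could consume. The function names are hypothetical; this only illustrates the idea, not any specific system cited by the paper.

```python
import random

def nearest(point, centroids):
    """Index of the closest centroid under squared Euclidean distance."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def kmeans(points, k, iters=20, seed=0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(v) / len(members) for v in zip(*members)]
    return centroids

def cluster_features(point, centroids):
    """One-hot cluster membership, usable as an extra input to a classifier."""
    feats = [0.0] * len(centroids)
    feats[nearest(point, centroids)] = 1.0
    return feats
```

The clustering step never sees labels; only the downstream classifier that consumes `cluster_features` would.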

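Generating natural images

Non-parametric models, which mostly match patches against a database of existing images; they have been used for texture synthesis, super-resolution, and in-painting.
Parametric models.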

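Visualizing the internals of CNNs

Zeiler et al. used deconvolutions and filtered the maximal activations to find the approximate purpose of each convolution filter in the network.
Mordvintsev et al. ran gradient descent on the input image to inspect the ideal image that activates a given subset of filters.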
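The gradient-on-the-input idea can be shown on a toy unit whose activation is linear, a(x) = w · x, with an L2 penalty that keeps the input bounded. This is a simplified sketch under that assumption: a real visualization backpropagates through a trained CNN, whereas here the gradient w - 2·l2·x is written by hand. Descending on the negated objective is the same as ascending on the activation.

```python
def activation(w, x):
    """Toy linear unit: a(x) = w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def maximize_input(w, l2=0.5, lr=0.1, steps=200):
    """Gradient ascent on the input x for a(x) - l2 * ||x||^2, starting from zeros."""
    x = [0.0] * len(w)
    for _ in range(steps):
        grad = [wi - 2.0 * l2 * xi for wi, xi in zip(w, x)]  # gradient of the objective
        x = [xi + lr * gi for xi, gi in zip(x, grad)]
    return x
```

For this toy unit the optimum is x = w / (2 · l2), so the "ideal input" is the unit's own weight pattern, which is the intuition behind reading filters as templates.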

The paper does not systematically compare the strengths and weaknesses of these approaches.

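Main Research Content

1. What method does the paper propose to address the shortcomings of existing work? (Why is it effective?) What is its basic idea, and where is the main innovation?

The proposed method is DCGAN. Its basic idea is a set of architectural constraints that make convolutional GANs stable to train; the paper also visualizes what GANs and their intermediate layers learn, and reuses parts of the trained networks as feature extractors for supervised tasks.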

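2. Describe the technical details of the proposed method.

First, an all-convolutional network: deterministic spatial pooling functions (such as max pooling) are replaced with strided convolutions, allowing the network to learn its own spatial downsampling. The authors use this approach in the generator, allowing it to learn its own spatial upsampling, and in the discriminator.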
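To see how strided convolutions take over the role of pooling, the sketch below tracks feature-map sizes with the standard output-size formulas for a strided convolution and a fractionally-strided (transposed) convolution, both with dilation 1. The kernel size 4, stride 2, padding 1 and the 64×64 image size are assumptions chosen for illustration, not values taken from these notes.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output height/width of a strided convolution (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel=4, stride=2, padding=1, output_padding=0):
    """Output height/width of a fractionally-strided (transposed) convolution (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

def trace_sizes(start, layer, n_layers):
    """Apply `layer` n_layers times and record every spatial size."""
    sizes = [start]
    for _ in range(n_layers):
        sizes.append(layer(sizes[-1]))
    return sizes

if __name__ == "__main__":
    # Discriminator: learned downsampling instead of max pooling.
    print("D:", trace_sizes(64, conv_out, 4))           # D: [64, 32, 16, 8, 4]
    # Generator: learned upsampling starting from a 4 x 4 map.
    print("G:", trace_sizes(4, conv_transpose_out, 4))  # G: [4, 8, 16, 32, 64]
```

With stride 2 each layer halves (or doubles) the map, doing the job of a pooling (or unpooling) step, but with learnable kernels.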

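Second, the trend toward eliminating fully connected layers on top of convolutional features. The strongest example is global average pooling, which has been used in state-of-the-art image classification models. The authors found that global average pooling increased model stability but hurt convergence speed. A middle ground, directly connecting the highest convolutional features to the input of the generator and the output of the discriminator respectively, worked well. The first layer of the generator, which takes a uniform noise distribution $Z$ as input, could be called fully connected since it is just a matrix multiplication, but its result is reshaped into a 4-dimensional tensor and used as the start of the convolution stack. For the discriminator, the last convolution layer is flattened and then fed into a single sigmoid output.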
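The two ends of the network described above can be sketched without any deep learning library: the generator's first layer is a matrix multiply followed by a reshape to C × H × W, and the discriminator's last step flattens the final feature maps into one logit passed through a sigmoid. This is a minimal sketch under stated assumptions: a single sample (no batch axis), noise drawn from U(-1, 1), weights from N(0, 0.02), and a tiny channel count for the demo; the function names are hypothetical.

```python
import math
import random

def project_and_reshape(z, weights, channels, height, width):
    """First generator layer: z (length D) times a D x (C*H*W) matrix, viewed as C x H x W."""
    size = channels * height * width
    flat = [sum(zi * weights[i][j] for i, zi in enumerate(z)) for j in range(size)]
    # Row-major reshape: index (c, h, w) -> (c * height + h) * width + w
    return [[[flat[(c * height + h) * width + w] for w in range(width)]
             for h in range(height)]
            for c in range(channels)]

def discriminator_head(feature_maps, weights, bias=0.0):
    """Flatten the last conv layer's C x H x W output and map it to one sigmoid probability."""
    flat = [v for channel in feature_maps for row in channel for v in row]
    logit = sum(f * w for f, w in zip(flat, weights)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

if __name__ == "__main__":
    rng = random.Random(0)
    z_dim, c, h, w = 100, 8, 4, 4  # tiny channel count for the demo
    z = [rng.uniform(-1.0, 1.0) for _ in range(z_dim)]  # assumed U(-1, 1) noise
    proj = [[rng.gauss(0.0, 0.02) for _ in range(c * h * w)] for _ in range(z_dim)]
    x = project_and_reshape(z, proj, c, h, w)
    print(len(x), len(x[0]), len(x[0][0]))  # 8 4 4
    head = [rng.gauss(0.0, 0.02) for _ in range(c * h * w)]
    print(discriminator_head(x, head))  # a probability strictly between 0 and 1
```

In a real model the projected map would then pass through the transposed convolutions, and the discriminator head would sit after the strided convolutions; both are omitted here.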

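Third, batch normalization, which stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps deal with training problems that arise from poor initialization and helps gradients flow in deeper models. It proved critical for getting deep generators to begin learning, preventing the generator from collapsing all samples to a single point, which is a common failure mode observed in GANs. However, directly applying batch normalization to all layers resulted in sample oscillation and model instability; this was avoided by not applying batch normalization to the generator output layer and the discriminator input layer.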
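A minimal sketch of the two points above: the training-time batch normalization of one unit over a mini-batch, and a hypothetical helper that marks which layers get batch norm under the rule "everything except the generator output and the discriminator input". The learnable scale and shift are shown as fixed arguments `gamma` and `beta`, and the running statistics used at inference time are omitted.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one unit's activations over a mini-batch, then scale by gamma and shift by beta."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased batch variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

def batchnorm_plan(n_layers, network):
    """True where a layer gets batch norm: every layer except G's output and D's input."""
    if network == "generator":
        skipped = n_layers - 1  # output layer
    elif network == "discriminator":
        skipped = 0  # input layer
    else:
        raise ValueError("network must be 'generator' or 'discriminator'")
    return [i != skipped for i in range(n_layers)]

if __name__ == "__main__":
    print(batch_norm([1.0, 2.0, 3.0, 4.0]))
    print(batchnorm_plan(5, "generator"))      # [True, True, True, True, False]
    print(batchnorm_plan(5, "discriminator"))  # [False, True, True, True, True]
```

Leaving the generator output unnormalized lets it produce the image's actual pixel range, and leaving the discriminator input unnormalized means it sees images as they are.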

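Experimental Results

1. Describe the experiments in the paper.

There are too many experiments to cover; they are omitted here.

2. How do the experiments validate the effectiveness of the method?

Through several probes, among others: removing certain learned feature representations and observing the outputs shows that the GAN has learned real objects (removing those representations makes the corresponding objects disappear from the generated images); and vector arithmetic on $Z$ shows whether the results are interpretable (adding and subtracting vectors produces outputs that follow human intuition, e.g. smiling woman - neutral woman + neutral man gives a smiling man).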
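Both probes reduce to simple operations on vectors and feature maps. The sketch below is an illustration under assumptions, not the authors' code: concepts are represented by averaging the Z vectors of three exemplars each (single samples were reported to be unstable), and "removing a representation" is modeled as zeroing chosen channels of a C × H × W feature map. Which channels correspond to an object, and the generator that consumes the results, are not shown; all names are hypothetical.

```python
def average(vectors):
    """Element-wise mean of the exemplar Z vectors for one concept."""
    n = len(vectors)
    return [sum(v) / n for v in zip(*vectors)]

def concept_arithmetic(a, b, c):
    """mean(a) - mean(b) + mean(c), e.g. smiling woman - neutral woman + neutral man."""
    ma, mb, mc = average(a), average(b), average(c)
    return [x - y + w for x, y, w in zip(ma, mb, mc)]

def drop_feature_maps(feature_maps, indices):
    """Return a copy of C x H x W feature maps with the selected channels zeroed out."""
    drop = set(indices)
    return [[[0.0] * len(row) for row in ch] if i in drop else [list(row) for row in ch]
            for i, ch in enumerate(feature_maps)]
```

The resulting Z from `concept_arithmetic` would be decoded by the generator, and the zeroed feature maps would be passed on through the remaining generator layers, to check whether the output changes in the expected, human-interpretable way.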

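Limitations of the Paper

1. After reading the paper, can you identify any shortcomings in the work?

None found. (The authors themselves note that some instability remains: with longer training, a subset of filters sometimes collapses into a single oscillating mode.)

2. Outline basic ideas for addressing these shortcomings.

None. (The authors leave tackling this form of instability to future work.)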
