希念

论文翻译——无监督DCGAN做表征学习

UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS

DCGAN下的非监督表征学习

Abstract

摘要

In Recent year, supervised learning with convolution networks(CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has recieved less attention. In this work we hope to help beidge between the success of CNNs for supervised learning and unsupervised learning. We intoduce a class os CNNs called deep convolutional generative adversarial networks(DCGAN), that have centain architectural constrains, and demonstrate that they are a strong candidate for unsupervised learning. Trainning on various datasets, we show convincing evidence that our deep convolutional adversarial pairs learns a hierarchy of representions from objects parts to senses in both generator and discriminator. Addtionally, we use the learned feature for novel tasks —- demonstrate their appicability as general image representions.

在这些年里，机器学习中已经广泛应用CNN做监督学习。相形之下，CNN的非监督学习却并没有得到多少关注。在这份工作中，我们希望能建立起CNN中成功的监督学习和非监督学习之间的桥梁。我们介绍了一种名为DCGAN（深度卷积对抗式生成网络）的结构，为它确定了一些必须的架构约束，并揭示了它们是非监督学习下颇具竞争力的候选方案。通过在不同训练集上训练，我们可以相信，不论是判别器还是生成器，不论是单个对象还是图像全局场景，DCGAN都能学习到一系列特征。除此之外，我们使用这些学到的特征完成了一些新奇的应用——揭示了它们在普遍意义上能做出图像表征。

1. Introduction

1. 介绍

Learning reusable feature represention from large unlable datasets has been an area of active research. In the context of computer vision， In the context of conputer vision, one can leverage the practically unlimited amount of unlabeled images and videos to learn good intermidiate representions, which can then be used on a varity of supervied taska such as image classification. We propose that one way to build good image represention is by training Generative Adversarial Networks(GAN), and later reused parts of generator and discriminator networks as feature extrators for supervised tasks. GANs provide an attrative alternative to maximum likehoood techniques. One can addtional argue that their learning process and the lack of a heuristic cost function (such as pixel-wise independent mean-square error) are attrative to represention learning. GANs have been known to be unstable to train, often resulting in generator that produce nonsensical output. There has been very limit published research in trying to understand and visualize what GANs learn, and the intermidiate represention of GANs.

从大规模无标记数据集中学习到可重复使用的特征，这已经是一个活跃的研究领域了。在计算机视觉环境下，如果能从大批量无标记图像和视频中学习到良好的中间特征，就可以将它用于诸如图像分类这样的监督学习任务。我们提出，要建立图像的良好特征，训练GAN是一种方法，之后，我们会把判别器和生成器都作为可再用的特征提取器，用到监督学习的任务中。GAN实际上为最大似然估计的相关技术提供了一种颇具吸引力的替代方案。更进一步，GAN的学习过程，和对启发式损失函数要求不高，这两点对表征学习具有吸引力。GAN出了名的训练不稳定，这经常导致生成器会产出很多荒谬的结果。对于GAN中到底学习到了什么样的中间表征，在当前出版的研究里也并不多见。

In this paper, we make the following contributions：

在这篇文章里，我们做出如下贡献：

We propose and evaluate a set of constraints on the architectural topology of Convolutional GANs that make them stable to train in most settings. We name this class of architectures Deep Convolutional GANs (DCGAN)
我们提出并评价了这样一件事情：什么样的GAN体系拓扑能让训练更加稳定。我们将这一类架构称作是DCGAN
We use the trained discriminators for image classification tasks, showing competitive performance with other unsupervised algorithms.
我们使用图像分类任务上训练出来的判别器和其他的非监督算法做了比较。
We visualize the filters learnt by GANs and empirically show that specific filters have learned to draw specific objects.
我们对GAN学习到的特征做出了可视化，并经验性的证明了特殊的特征表征了特殊的对象。
We show that the generators have interesting vector arithmetic properties allowing for easy manipulation of many semantic qualities of generated samples.
针对生成器，我们提出了一个很有趣的算法向量，这个向量能很简单的在语义层面上操作生成样例的质量。

2 相关工作

2.1 REPRESENTATION LEARNING FROM UNLABELED DATA

2.1 无监督表征学习

Unsupervised representation learning is a fairly well studied problem in general computer vision research, as well as in the context of images. A classic approach to unsupervised representation learning is to do clustering on the data (for example using K-means), and leverage the clusters for improved classification scores. In the context of images, one can do hierarchical clustering of image patches (Coates & Ng, 2012) to learn powerful image representations. Another popular method is to train auto-encoders (convolutionally, stacked (Vincent et al., 2010), separating the what and where components of the code (Zhao et al., 2015), ladder structures (Rasmus et al., 2015)) that encode an image into a compact code, and decode the code to reconstruct the image as accurately as possible. These methods have also been shown to learn good feature representations from image pixels. Deep belief networks (Lee et al., 2009) have also been shown to work well in learning hierarchical representations.

公平的说，非监督表征学习在机器视觉中的研究做的已经很好了，例如对图像上下文的表征。一个经典的非监督表征学习手段是做出数据聚类（例如使用k-means），之后利用聚类结果来改善分类结果。在图像这一类场景下，可以对图像进行批处理，利用多个图像的聚类来学习到更有效的图像表征，另一个很流行的理论是训练自编码器（卷积式的自编码器），主流存在两种方式：一种是分离编码中向量的意义和位置，另一种是分析编码的梯度结构，这两种方式都能将图像作成紧编码，并且尽可能的通过解码器还原图像。这些方法已经被证明能很好的通过图像像素来学习表征。深度置信网络同样也能学习到表征的连续表达。

2.2 GENERATING NATURAL IMAGES

2.2 生成自然图像

Generative image models are well studied and fall into two categories: parametric and nonparametric.

图像的生成模型已经充分研究过，并划分为两个领域：参数化领域和非参数化领域

The non-parametric models often do matching from a database of existing images, often matching patches of images, and have been used in texture synthesis (Efros et al., 1999), super-resolution (Freeman et al., 2002) and in-painting (Hays & Efros, 2007).

非参数领域通常是在图像数据库下做匹配，经常对成批的图像做匹配，它在纹理合成，超分辨率重建和in-paiting中用的较多。

Parametric models for generating images has been explored extensively (for example on MNIST digits or for texture synthesis (Portilla & Simoncelli, 2000)). However, generating natural images of the real world have had not much success until recently. A variational sampling approach to generating images (Kingma & Welling, 2013) has had some success, but the samples often suffer from being blurry. Another approach generates images using an iterative forward diffusion process (Sohl-Dickstein et al., 2015). Generative Adversarial Networks (Goodfellow et al., 2014) generated images suffering from being noisy and incomprehensible. A laplacian pyramid extension to this approach (Denton et al., 2015) showed higher quality images, but they still suffered from the objects looking wobbly because of noise introduced in chaining multiple models. A recurrent network approach (Gregor et al., 2015) and a deconvolution network approach (Dosovitskiy et al., 2014) have also recently had some success with generating natural images. However, they have not leveraged the generators for supervised tasks.

参数模型已经被广泛研究过（例如，MNIST中手写数字的纹理合成）。尽管生成真实世界的自然图像这一点在当前并没有很大成功。尽管其中的一些变种已经取得了一定成功，但是这种采样通常十分模糊。其他的，页游诸如面向扩散过程的生成方法。GAN在图像生成方面，具有不可思议的抗噪特性。这种方法中，一种添加拉普拉斯金字塔的方法展示出了较高质量的图像，但是由于在链式乘法模型中引入了噪声，导致生成的对象看上去是摇摆的。循环网络和反卷积网络似乎在自然图像的生成上取得了一定成功，但它们并没有应用到监督任务上。

2.3 VISUALIZING THE INTERNALS OF CNNS

2.3 CNNs内部的可视化

One constant criticism of using neural networks has been that they are black-box methods, with little understanding of what the networks do in the form of a simple human-consumable algorithm. In the context of CNNs, Zeiler et. al. (Zeiler & Fergus, 2014) showed that by using deconvolutions and filtering the maximal activations, one can find the approximate purpose of each convolution filter in the network. Similarly, using a gradient descent on the inputs lets us inspect the ideal image that activates certain subsets of filters (Mordvintsev et al.).

神经网络最受人诟病的地方就是它的黑箱子属性，即使只是用它来模仿很简单的人类行为也一样，我们也对“网络内部干了什么？”这个问题理解不透。在CNNs这个领域内，Zeiler et. al.证明了使用反卷积，过滤最大激活，可以逼近网络中每一个卷积滤波器的结果。相似的，对输入图像使用梯度下降，我们可以看到滤波器子集上所激活的理想图像。

3 APPROACH AND MODEL ARCHITECTURE

3 方法和模型架构

Historical attempts to scale up GANs using CNNs to model images have been unsuccessful. This motivated the authors of LAPGAN (Denton et al., 2015) to develop an alternative approach to iteratively upscale low resolution generated images which can be modeled more reliably. We also encountered difficulties attempting to scale GANs using CNN architectures commonly used in the supervised literature. However, after extensive model exploration we identified a family of architectures that resulted in stable training across a range of datasets and allowed for training higher resolution and deeper generative models.

在过去，使用CNN构造GAN的做法并不是很成功。这驱使LAPGAN的作者开始尝试一种能更加稳定建模，并能在迭代方法下提高分辨率层次生成图像的替代方案。对于在监督学习文献里常见的使用CNN来尺度化GAN的文章，我们在重现的时候也遇到过很多苦难。不论如何，在扩大了对模型的研究之后，我们定义了一些列网络架构，这些架构能在一系列数据集上稳定训练，也能获得尺度更优也更深的生成模型。

Core to our approach is adopting and modifying three recently demonstrated changes to CNN architectures.

我们研究的核心，是对现行CNN架构三个改进的学习和纠正。

The first is the all convolutional net (Springenberg et al., 2014) which replaces deterministic spatial pooling functions (such as maxpooling) with strided convolutions, allowing the network to learn its own spatial downsampling. We use this approach in our generator, allowing it to learn its own spatial upsampling, and discriminator.

第一是，所有的卷积网络，凡是自卷积开始，使用了确定性pooling函数的，都能学习到自己空间上的降采样。我们在生成器中使用了这种方法，允许生成器学习到它自己的空间降采样，也包括判别器。

Second is the trend towards eliminating fully connected layers on top of convolutional features. The strongest example of this is global average pooling which has been utilized in state of the art image classification models (Mordvintsev et al.). We found global average pooling increased model stability but hurt convergence speed. A middle ground of directly connecting the highest convolutional features to the input and output respectively of the generator and discriminator worked well. The first layer of the GAN, which takes a uniform noise distribution Z as input, could be called fully connected as it is just a matrix multiplication, but the result is reshaped into a 4-dimensional tensor and used as the start of the convolution stack. For the discriminator, the last convolution layer is flattened and then fed into a single sigmoid output. See Fig. 1 for a visualization of an example model architecture.

第二是，消除卷积特征顶部的全连接层已经成为一种趋势。最有力的证据是在顶级表现的分类模型中使用了全局平均pooling.我们发现，全局平均pooling增强了模型稳定性，但减缓了收敛速度。一个折中的方法，是把最高层的卷积特征和输入连接起来，生成器和判别其各自做自己的输出，实测效果不错。GAN的第一层，输入了一个均匀分布的噪声Z，仅仅只是一个矩阵乘法，可以视作一个全连接层，但是结果被转化为一个4维张量并被用作卷积栈的开始。对于判别器而言，最后一层被平展开，然后输入了一个sigmoid函数,图1中对这种模型架构做了一个可视化。

Third is Batch Normalization (Ioffe & Szegedy, 2015) which stabilizes learning by normalizing the input to each unit to have zero mean and unit variance. This helps deal with training problems that arise due to poor initialization and helps gradient flow in deeper models. This proved critical to get deep generators to begin learning, preventing the generator from collapsing all samples to a single point which is a common failure mode observed in GANs. Directly applying batchnorm to all layers however, resulted in sample oscillation and model instability. This was avoided by not applying batchnorm to the generator output layer and the discriminator input layer.

第三个是BN，这种方法将每一层的输入都正则化为期望0方差1。它改进了训练问题，也缓解了深层网络中的梯度溢出问题。但实际上，这种方法在深层的生成器中被证明是不适用的，它会导致生成器反复震荡生成单点数据，这在GANs中往往是失败的。对于直接将BN使用在所有层上的方法，同样会引起震荡并导致模型不稳定，所以，不要再生成器的输入层上使用BN，也不要在判别器的输出层上使用BN。

The ReLU activation (Nair & Hinton, 2010) is used in the generator with the exception of the output layer which uses the Tanh function. We observed that using a bounded activation allowed the model to learn more quickly to saturate and cover the color space of the training distribution. Within the discriminator we found the leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling. This is in contrast to the original GAN paper, which used the maxout activation (Goodfellow et al., 2013).

在生成器中，使用Relu来做激活函数，并在输出层上使用Tanh。我们发现，使用有界的函数更有助于模型迅速在训练分布中覆盖颜色空间。在判别器中，尤其是对高分辨率模型的时候，我们发现Weak Relu更好用。这是与原始GAN论文的对比，在那篇论文中，它使用Maxout来激活。

Architecture guidelines for stable Deep Convolutional GANs
稳定DCGAN的架构指导：
* Replace any pooling layers with strided convolutions (discriminator) and fractional-strided convolutions (generator).
* 判别器中，使用带步长的卷基层来替换所有pooling层，生成器中使用小步长卷积来代替pooling层。
* Use batchnorm in both the generator and the discriminator.
* 在生成器和判别器中使用BN。
* Remove fully connected hidden layers for deeper architectures.
* 去除深度架构中的全连接隐藏层。
* Use ReLU activation in generator for all layers except for the output, which uses Tanh.
* 生成器中，除去最后一层使用Tanh之外，每一层都使用ReLU来激活。
* Use LeakyReLU activation in the discriminator for all layers.
* 判别器中，每一层都使用LeakReLU来激活。

4 DETAILS OF ADVERSARIAL TRAINING

4 对抗式训练的细节

We trained DCGANs on three datasets, Large-scale Scene Understanding (LSUN) (Yu et al., 2015),Imagenet-1k and a newly assembled Faces dataset. Details on the usage of each of these datasets are given below.

我们在三个训练集上训练DCGAN，LSUN，Imagenet-1K和一个新的脸谱数据集。这三大数据集上的使用细则会在接下来给出。

No pre-processing was applied to training images besides scaling to the range of the tanh activation function [-1, 1]. All models were trained with mini-batch stochastic gradient descent (SGD) with a mini-batch size of 128. All weights were initialized from a zero-centered Normal distribution with standard deviation 0.02. In the LeakyReLU, the slope of the leak was set to 0.2 in all models. While previous GAN work has used momentum to accelerate training, we used the Adam optimizer (Kingma & Ba, 2014) with tuned hyperparameters. We found the suggested learning rate of 0.001, to be too high, using 0.0002 instead. Additionally, we found leaving the momentum term β1 at the suggested value of 0.9 resulted in training oscillation and instability while reducing it to 0.5 helped stabilize training

所有的图像，都缩放到Tanh激活函数的定义域[-1,1]之内，除此之外没有做任何预处理。所有的模型都使用小批量SGD，一批是128张图。所有的权重都使用正态分布初始化，期望为1，方差为0.02。在LeakReLU中，负向权重全部设置为0.1。有鉴于之前的GAN中使用了动量，我们使用Adam来优化超参数。学习率中发现0.001太大了，使用0.0002。此外，我们发现动量参数 β1=0.9 的时候，训练波动大也不稳定，所以设置为0.5使训练稳定。

Figure 1: DCGAN generator used for LSUN scene modeling. A 100 dimensional uniform distribution Z is projected to a small spatial extent convolutional representation with many feature maps.A series of four fractionally-strided convolutions (in some recent papers, these are wrongly called deconvolutions) then convert this high level representation into a 64 × 64 pixel image. Notably, no fully connected or pooling layers are used.

图1：用于LSUN场景的DCGAN的生成器。一个100维，服从均匀分布的噪声Z在一个小范围卷积表征空间中投影为许多特征映射，接着连续4个小步长卷积，将这个噪声作成一个64*64的像素图像，毫无疑问，没有全连接层或者pooling层。

4.1 LSUN

As visual quality of samples from generative image models has improved, concerns of over-fitting and memorization of training samples have risen. To demonstrate how our model scales with more data and higher resolution generation, we train a model on the LSUN bedrooms dataset containing a little over 3 million training examples. Recent analysis has shown that there is a direct link between how fast models learn and their generalization performance (Hardt et al., 2015). We show samples from one epoch of training (Fig.2), mimicking online learning, in addition to samples after convergence (Fig.3), as an opportunity to demonstrate that our model is not producing high quality samples via simply overfitting/memorizing training examples. No data augmentation was applied to the images.

生成模型中采样的视觉质量改善了对过拟合的关注和对训练样本的记忆。为了证明我们的模型在不同分辨率下都能良好的度量数据，我们在LSUN卧室数据集上训练了超过3000000张图。当前的分析证明，生成表现和模型的学习速度之间有显著关系。我们在图2中展示了一个epoch的样例，就像在线学习那样，在收敛之后又继续添加了样例，以此证明，我们的模型并不是因为过拟合或者记住了训练样本才表现好的。图像中的数据量并没有任何增加。

Figure 2: Generated bedrooms after one training pass through the dataset. Theoretically, the model could learn to memorize training examples, but this is experimentally unlikely as we train with a small learning rate and minibatch SGD. We are aware of no prior empirical evidence demonstrating memorization with SGD and a small learning rate.

图2：这是一次训练后生成的卧室。理论上而言，模型是可以记住训练样本的，但是在小学习率和小批量SGD下这并不是一个经验事实，实际上我们意识到SGD和小学习率下并没有先验的经验证据。

Figure 3: Generated bedrooms after five epochs of training. There appears to be evidence of visual under-fitting via repeated noise textures across multiple samples such as the base boards of some of the beds.

图3:5个epoch训练后生成的卧室。有证据表明多个样本上的噪声导致欠拟合。

4.1.1 DEDUPLICATION

To further decrease the likelihood of the generator memorizing input examples (Fig.2) we perform a simple image de-duplication process. We fit a 3072-128-3072 de-noising dropout regularized RELU autoencoder on 32×32 downsampled center-crops of training examples. The resulting code layer activations are then binarized via thresholding the ReLU activation which has been shown to be an effective information preserving technique (Srivastava et al., 2014) and provides a convenient form of semantic-hashing, allowing for linear time de-duplication . Visual inspection of hash collisions showed high precision with an estimated false positive rate of less than 1 in 100. Additionally, the technique detected and removed approximately 275,000 near duplicates, suggesting a high recall.

为了降低生成物和所记忆的输入样本的相似程度，我们执行了一个简单的图像二分过程。我们在训练样本中 32×32 降采样中心切片（就是在训练样本正中间切了32*32出来）上施加了一个3072-128-3072的去噪 dropout+ReLU的自编码器。编码结果层使用ReLU激活阈值进行二值化（这已经被证明是一种有效的信息保存手段），它提供了一个语义hash的简单形式，允许在线性时间内进行二分。这个hash编码可视化结果的错误率不超过1%。此外，此技术的召回率也很好，在复制品的选择和删除上逼近275000。

4.2 FACES

4.2 脸

We scraped images containing human faces from random web image queries of peoples names. The people names were acquired from dbpedia, with a criterion that they were born in the modern era. This dataset has 3M images from 10K people. We run an OpenCV face detector on these images, keeping the detections that are sufficiently high resolution, which gives us approximately 350,000 face boxes. We use these face boxes for training. No data augmentation was applied to the images.

我们从网上的图像里面按名字挖了人脸，人名字是从dbpedia中查询出来的，这些人的共同特点是都出生在现代。这个数据集包括1万人，3百万图片，我们在这些图像上用OpenCV跑了一个人脸detection，保证detection具有足够的分辨率，这给了我们35万张脸，我们拿这些脸来训练。图像没有做数据扩张。

4.3 IMAGENET-1K

We use Imagenet-1k (Deng et al., 2009) as a source of natural images for unsupervised training. We train on 32 × 32 min-resized center crops. No data augmentation was applied to the images.

imagenet-1k作为非监督学习下自然图像的来源。我们训练了图像正中间 32×32 的截图。无数据扩张。

5 EMPIRICAL VALIDATION OF DCGANS CAPABILITIES

5 DCGAN能力的经验确认

5.1 CLASSIFYING CIFAR-10 USING GANS AS A FEATURE EXTRACTOR

5.1 CIFAR-10 上使用GAN作为特征提取器进行分类

One common technique for evaluating the quality of unsupervised representation learning algorithms is to apply them as a feature extractor on supervised datasets and evaluate the performance of linear models fitted on top of these features.

用于非监督学习算法评估的一般技术是，将这些算法作为特征提取器应用到一些有监督的数据集上，计算线性模型在这些特征上的拟合表现

On the CIFAR-10 dataset, a very strong baseline performance has been demonstrated from a well tuned single layer feature extraction pipeline utilizing K-means as a feature learning algorithm. When using a very large amount of feature maps (4800) this technique achieves 80.6% accuracy. An unsupervised multi-layered extension of the base algorithm reaches 82.0% accuracy (Coates & Ng, 2011). To evaluate the quality of the representations learned by DCGANs for supervised tasks, we train on Imagenet-1k and then use the discriminator’s convolutional features from all layers, maxpooling each layers representation to produce a 4 × 4 spatial grid. These features are then flattened and concatenated to form a 28672 dimensional vector and a regularized linear L2-SVM classifier is trained on top of them. This achieves 82.8% accuracy, out performing all K-means based approaches. Notably, the discriminator has many less feature maps (512 in the highest layer) compared to K-means based techniques, but does result in a larger total feature vector size due to the many layers of 4 × 4 spatial locations. The performance of DCGANs is still less than that of Exemplar CNNs (Dosovitskiy et al., 2015), a technique which trains normal discriminative CNNs in an unsupervised fashion to differentiate between specifically chosen, aggressively augmented, exemplar samples from the source dataset. Further improvements could be made by finetuning the discriminator’s representations, but we leave this for future work. Additionally, since our DCGAN was never trained on CIFAR-10 this experiment also demonstrates the domain robustness of the learned features.

在CIFAR-10这个数据集上，单层特征提取pipline，使用k-means作为特征学习算法的时候，具备了极佳表现。在大规模特征映射下，这项技术达到了80.6%。基础算法的一个非监督多层扩展达到了82.0%.为了评价DCGAN在有监督任务下的表现，我们在Image-1K上训练，之后使用判别器所有层上的卷积特征，使用maxpooling将每一层的表征表示为一个 4×4 的方块，这些特征平展后串联，用于表示28672空间向量，然后使用一个正则 L2−SVM 来做分类器，这样做达到了82.8%，比所有的k-means都强。注意到，判别器拥有许多小的特征映射，但是最后产生了很大尺寸的总特征，这是因为每一层的特征加在一起后很多。DCGAN比起经典的CNN模型还是有差距，如果将经典CNN的判别器fineturing过来，预计会表现得更好。此外，由于我们的DCGAN从来没有在CIFAR-10上训练过，所以这个结果也表现出算法自身的强大鲁棒性

Table 1: CIFAR-10 classification results using our pre-trained model. Our DCGAN is not pretrained on CIFAR-10, but on Imagenet-1k, and the features are used to classify CIFAR-10 images.
表1：CIFAR-10使用我们预训练模型的结果，我们的DCGAN只在Imagenet-1k上训练过，没有在CIFAR-10上训练过，特征提取器是直接搬过来的。

Model	Accuracy	Accuracy (400 per class)	max # of features units
1 Layer K-means	80.6%	63.7% (±0.7%)	4800
3 Layer K-means Learned RF	82.0%	70.7% (±0.7%)	3200
View Invariant K-means	81.9%	72.6% (±0.7%)	6400
Exemplar CNN	84.3%	77.4% (±0.2%)	1024
DCGAN (ours) + L2-SVM	82.8%	73.8% (±0.4%)	512

5.2 CLASSIFYING SVHN DIGITS USING GANS AS A FEATURE EXTRACTOR

On the StreetView House Numbers dataset (SVHN)(Netzer et al., 2011), we use the features of the discriminator of a DCGAN for supervised purposes when labeled data is scarce. Following similar dataset preparation rules as in the CIFAR-10 experiments, we split off a validation set of 10,000 examples from the non-extra set and use it for all hyperparameter and model selection. 1000 uniformly class distributed training examples are randomly selected and used to train a regularized linear L2-SVM classifier on top of the same feature extraction pipeline used for CIFAR-10. This achieves state of the art (for classification using 1000 labels) at 22.48% test error, improving upon another modifcation of CNNs designed to leverage unlabled data (Zhao et al., 2015). Additionally, we validate that the CNN architecture used in DCGAN is not the key contributing factor of the model’s performance by training a purely supervised CNN with the same architecture on the same data and optimizing this model via random search over 64 hyperparameter trials (Bergstra & Bengio, 2012). It achieves a signficantly higher 28.87% validation error.

跟CIFAR-10一样处理模型之后，这模型在StreetView这个数据集上有22.48%的错误。CNN的架构在DCGAN里面并不是模型表现的关键因素：在使用同样的架构训练一个监督的CNN，反而变差了。

6 INVESTIGATING AND VISUALIZING THE INTERNALS OF THE NETWORKS

We investigate the trained generators and discriminators in a variety of ways. We do not do any kind of nearest neighbor search on the training set. Nearest neighbors in pixel or feature space are trivially fooled (Theis et al., 2015) by small image transforms. We also do not use log-likelihood metrics to quantitatively assess the model, as it is a poor (Theis et al., 2015) metric

我们用了很多方式来评价训练处的判别器和生成器。这其中并不包括愚蠢的最近邻搜索和对数似然估计度量。

Table 2: SVHN classification with 1000 labels

Model	error rate
KNN	77.93%
TSVM	66.55%
M1+KNN	65.63%
M1+TSVM	54.33%
M1+M2	36.02%
SWWAE without dropout	27.83%
SWWAE with dropout	23.56%
DCGAN (ours) + L2-SVM	22.48%
Supervised CNN with the same architecture	28.87% (validation)

6.1 WALKING IN THE LATENT SPACE

6.1 潜在空间行走，以下视作概率空间行走

The first experiment we did was to understand the landscape of the latent space. Walking on the manifold that is learnt can usually tell us about signs of memorization (if there are sharp transitions) and about the way in which the space is hierarchically collapsed. If walking in this latent space results in semantic changes to the image generations (such as objects being added and removed), we can reason that the model has learned relevant and interesting representations. The results are shown in Fig.4.

我们做的第一个实验就是针对概率空间的全貌描述。流形上的行走既可以告诉我们一些关于记忆的信号，也可以告诉我们一些空间层次的问题。如果在概率空间内的行走导致了图像语义的改变，我们可以推断模型的学习到了切题而有趣的表征，结果如图4所示。

Figure 4: Top rows: Interpolation between a series of 9 random points in Z show that the space learned has smooth transitions, with every image in the space plausibly looking like a bedroom. In the 6th row, you see a room without a window slowly transforming into a room with a giant window. In the 10th row, you see what appears to be a TV slowly being transformed into a window.

图4：Z流形上随机插值9点的图像。

6.2 VISUALIZING THE DISCRIMINATOR FEATURES

6.2 判别器特征可视化

Previous work has demonstrated that supervised training of CNNs on large image datasets results in very powerful learned features (Zeiler & Fergus, 2014). Additionally, supervised CNNs trained on scene classification learn object detectors (Oquab et al., 2014). We demonstrate that an unsupervised DCGAN trained on a large image dataset can also learn a hierarchy of features that are interesting.

前人的工作已经说明了监督学习下的CNN可以在大规模数据集上学习到很有用的特征。此外，学习classification的监督CNN也可以学习detection。我们揭示了大图像集上无监督DCGAN同样能学习到一系列有趣的特征。

Using guided backpropagation as proposed by (Springenberg et al., 2014), we show in Fig.5 that the features learnt by the discriminator activate on typical parts of a bedroom, like beds and windows. For comparison, in the same figure, we give a baseline for randomly initialized features that are not activated on anything that is semantically relevant or interesting.

通过反馈，在图5中我们展示了判别器在卧室中典型部分上激活之后学习到的特征，例如床或者窗户。相比之下，在同样一张图上，我们给出了无激活的随机初始化特征，它更加语义化，也更加有趣。

Figure 5: On the right, guided backpropagation visualizations of maximal axis-aligned responses for the first 6 learned convolutional features from the last convolution layer in the discriminator. Notice a significant minority of features respond to beds - the central object in the LSUN bedrooms dataset. On the left is a random filter baseline. Comparing to the previous responses there is little to no discrimination and random structure.

图5：右边是前6个卷积特征子的最大轴对称响应，左边是随机滤波器基准。

6.3 MANIPULATING THE GENERATOR REPRESENTATION

6.3 操作生成器表征

6.3.1 FORGETTING TO DRAW CERTAIN OBJECTS

6.3.1 特定对象的遗忘

In addition to the representations learnt by a discriminator, there is the question of what representations the generator learns. The quality of samples suggest that the generator learns specific object representations for major scene components such as beds, windows, lamps, doors, and miscellaneous furniture. In order to explore the form that these representations take, we conducted an experiment to attempt to remove windows from the generator completely.

除了判别器的表征学习之外，生成器的表征学习也是一个问题。样本质量上可以看到，生成器学习到了场景中的主要构建，如床，窗户，灯，门和一些家具。为了探索表征的形式，我们构造一个实验从生成器中完全去除掉“窗户”这一对象。

On 150 samples, 52 window bounding boxes were drawn manually. On the second highest convolution layer features, logistic regression was fit to predict whether a feature activation was on a window (or not), by using the criterion that activations inside the drawn bounding boxes are positives and random samples from the same images are negatives. Using this simple model, all feature maps with weights greater than zero ( 200 in total) were dropped from all spatial locations. Then, random new samples were generated with and without the feature map removal.

在150个样例上，手动选择了52个窗户。在自高往低第二个卷积层上，使用logistics来判断特征响应是不是一个窗户，画了窗户的响应是正的，否则是负的。在这个简单的模型下，在所有空间位置上，所有大于0的特征映射全部dropout。随后，分别使用dropout后和dropout之前的特征来生成随机的新样例。

The generated images with and without the window dropout are shown in Fig.6, and interestingly, the network mostly forgets to draw windows in the bedrooms, replacing them with other objects.
图6使两种方式下的生成结果，有趣的是，网络忘记了画卧室里的窗户，它使用其他物体来替换掉了。

Figure 6: Top row: un-modified samples from model. Bottom row: the same samples generated with dropping out ”window” filters. Some windows are removed, others are transformed into objects with similar visual appearance such as doors and mirrors. Although visual quality decreased, overall scene composition stayed similar, suggesting the generator has done a good job disentangling scene representation from object representation. Extended experiments could be done to remove other objects from the image and modify the objects the generator draws.
图6：上面一排是没有dropout窗户的，下面是dropout之后的。

6.3.2 VECTOR ARITHMETIC ON FACE SAMPLES

6.3.2 脸抽样的向量算法

In the context of evaluating learned representations of words (Mikolov et al., 2013) demonstrated that simple arithmetic operations revealed rich linear structure in representation space. One canonical example demonstrated that the vector(”King”) - vector(”Man”) + vector(”Woman”) resulted in a vector whose nearest neighbor was the vector for Queen. We investigated whether similar structure emerges in the Z representation of our generators. We performed similar arithmetic on the Z vectors of sets of exemplar samples for visual concepts. Experiments working on only single samples per concept were unstable, but averaging the Z vector for three examplars showed consistent and stable generations that semantically obeyed the arithmetic. In addition to the object manipulation shown in (Fig. 7), we demonstrate that face pose is also modeled linearly in Z space (Fig. 8). These demonstrations suggest interesting applications can be developed using Z representations learned by our models. It has been previously demonstrated that conditional generative models can learn to convincingly model object attributes like scale, rotation, and position (Dosovitskiy et al., 2014). This is to our knowledge the first demonstration of this occurring in purely unsupervised models. Further exploring and developing the above mentioned vector arithmetic could dramatically reduce the amount of data needed for conditional generative modeling of complex image distributions.

word2vector算法揭示了简单的算法操作也可以在特征空间中揭示丰富的线性结构。一个权威的例子是 vector(“国王”) - vector(“男人”) + vector(“女人”) 所得的结果最近邻居是vector(“皇后”)，我们的z流形上也呈现相似的结构。我们展示了Z向量集合上相似的例子。单例上的实验结果并不稳定，但是三样例的结果明显具备算术特性。如图7所示，我们揭示了脸的姿态在Z空间上是线性的。前人的工作已经揭示出，条件生成模型可以学习到诸如伸缩，旋转和平移这样的操作。我们第一次揭示出无监督得到的模型也能具备这样的特性，在复杂图像分布下的生成模型中，这个基于向量的算术特性可以显著地做出降维。

Figure 7: Vector arithmetic for visual concepts. For each column, the Z vectors of samples are averaged. Arithmetic was then performed on the mean vectors creating a new vector Y . The center sample on the right hand side is produce by feeding Y as input to the generator. To demonstrate the interpolation capabilities of the generator, uniform noise sampled with scale +-0.25 was added to Y to produce the 8 other samples. Applying arithmetic in the input space (bottom two examples) results in noisy overlap due to misalignment.
图7：生成向量的算术特性在人脸上的表现。

Figure 8: A ”turn” vector was created from four averaged samples of faces looking left vs looking right. By adding interpolations along this axis to random samples we were able to reliably transform their pose.
图8 算术向量一个从左到右的连续变换性质。

7 CONCLUSION AND FUTURE WORK

We propose a more stable set of architectures for training generative adversarial networks and we give evidence that adversarial networks learn good representations of images for supervised learning and generative modeling. There are still some forms of model instability remaining - we noticed as models are trained longer they sometimes collapse a subset of filters to a single oscillating mode.Further work is needed to tackle this from of instability. We think that extending this framework to other domains such as video (for frame prediction) and audio (pre-trained features for speech synthesis) should be very interesting. Further investigations into the properties of the learnt latent space would be interesting as well.

有价值的是：随着训练时间变长，会有一部分特征子坍塌为一个单震荡的模式。

你可能感兴趣的:(深度学习)

【机器学习】探索未来科技的前沿：人工智能、机器学习与大模型 AIGC零基础入门小白 AI大模型大模型教程人工智能机器学习科技 AI大模型 AIGC AI教程大模型教程
文章目录引言一、人工智能：从概念到现实1.1人工智能的定义1.2人工智能的发展历史1.3人工智能的分类1.4人工智能的应用二、机器学习：人工智能的核心技术2.1机器学习的定义2.2机器学习的分类2.3机器学习的实现原理2.4机器学习的应用2.5机器学习的示例代码2.6解释代码三、大模型：推动AI前沿发展的关键技术3.1大模型的定义3.2大模型的发展历程3.3深度学习与神经网络3.4大模型的优势与挑
基于YOLOv8的火灾智能检测系统设计与实现斟的是酒中桃深度学习人工智能 pyqt yolo
在各类安全事故中，火灾因其突发性强、破坏力大，一直是威胁人们生命财产安全的重大隐患。传统的火灾检测方式多依赖烟雾传感器、温度传感器等，存在响应滞后、易受环境干扰等问题。随着深度学习技术的飞速发展，基于计算机视觉的火灾检测方法凭借其实时性强、检测范围广等优势，逐渐成为研究热点。本文将简单介绍一款基于深度学习的火灾智能检测系统的设计与实现过程。一、系统整体设计本火灾智能检测系统旨在通过深度学习技术实现
人工智能入门指南：从基础概念到实际应用
前些天发现了一个巨牛的人工智能学习网站，通俗易懂，风趣幽默，忍不住分享一下给大家。点击跳转到网站。https://www.captainbed.cn/north文章目录1.**人工智能的基本概念**1.1什么是人工智能？1.2人工智能的分类2.**人工智能的核心技术**2.1机器学习（MachineLearning）2.1.1机器学习的类型2.1.2机器学习流程2.2深度学习（DeepLearni
Datawhale X 魔塔 Ai夏令营 --深度学习基础
一、局部极小值与全局极小值全局极小值：在损失函数的整个定义域内，损失值最小的点。这是我们在训练深度学习模型时希望找到的点，因为它代表着模型的最佳性能。局部极小值：在损失函数的一个局部区域内，损失值达到最小，但在整个函数定义域内可能不是最小的。当优化算法陷入局部极小值时，它可能会误以为已经找到了全局最优解，从而停止搜索。局部极小值的检测两种直观的方法来检测局部极小值：可视化方法：对于低维问题，我们可
深度学习模块实践手册（第十二期）加油吧zkf 目标检测目标检测模块解析与实践深度学习人工智能计算机视觉目标检测 python
56、Ghost模块论文《GhostNet:MoreFeaturesfromCheapOperations》1、作用：Ghost模块是一种轻量级的特征提取模块，旨在通过廉价操作生成更多特征图，减少计算量的同时保持模型性能。传统卷积神经网络在生成特征图时存在大量冗余计算，Ghost模块通过将特征图生成过程分解为两个步骤，有效减少了计算复杂度，特别适合移动端和嵌入式设备部署。2、机制Ghost模块的机
DETR革命：目标检测的Transformer时代加油吧zkf 目标检测 YOLO python 开发语言人工智能图像处理
《DETR从0到1：目标检测Transformer的崛起》为什么会有DETR？在深度学习目标检测发展史上，2014~2019年几乎被基于卷积神经网络（CNN）的检测器统治：两阶段：FasterR-CNN、MaskR-CNN单阶段：YOLO、SSD、RetinaNet这些检测器虽然效果强大，但背后依赖：✅Anchor（先验框）✅NMS（非极大值抑制）✅特征金字塔、手工设计问题：结构复杂、调参困难、不
深度学习模块实践手册（第十一期）加油吧zkf 目标检测目标检测模块解析与实践深度学习人工智能计算机视觉目标检测 python
46、缩放点积注意力模块论文《AttentionIsAllYouNeed》1、作用：缩放点积注意力（ScaledDot-ProductAttention）是Transformer模型的核心组件，旨在解决序列建模中长距离依赖关系捕捉的问题。传统的循环神经网络（RNN）在处理长序列时存在梯度消失或爆炸的问题，且并行性较差。该模块通过计算查询（Query）、键（Key）和值（Value）之间的相似度，实
基于NanoDet的健身姿势纠正系统开发 YOLO实战营人工智能 NanoDet 深度学习计算机视觉 ui
1.引言在现代健身行业中，正确的运动姿势至关重要，不仅能提升训练效果，还能预防运动损伤。尤其是在进行一些高强度的力量训练时，如深蹲、俯卧撑等，错误的姿势可能导致肌肉不平衡或关节损伤。传统的健身姿势纠正方式依赖教练的人工指导，但随着人工智能技术的发展，使用计算机视觉和深度学习技术来进行姿势纠正，逐渐成为一种高效且可扩展的解决方案。本文将详细介绍如何基于NanoDet（一个轻量化目标检测模型）开发一个
大模型算法工程师技术路线全解析：从基础到资深的能力跃迁 Mr.小海大模型算法数据挖掘人工智能机器学习深度学习机器翻译 web3
文章目录大模型算法工程师技术路线全解析：从基础到资深的能力跃迁一、基础阶段（0-2年经验）：构建核心知识体系与工程入门数学与机器学习基础编程与深度学习框架NLP与Transformer入门二、进阶阶段（2-4年经验）：深化模型技术与工程落地能力大模型预训练与微调技术预训练原理：数据与任务的协同设计微调工具：参数高效适配与工程优化对齐实践：价值观优化与实证效果分布式训练与框架工具并行策略：多维度协同
【深度学习-Day 36】CNN的开山鼻祖：从LeNet-5到AlexNet的架构演进之路吴师兄大模型深度学习入门到精通 python pytorch 开发语言人工智能 CNN 深度学习大模型
Langchain系列文章目录01-玩转LangChain：从模型调用到Prompt模板与输出解析的完整指南02-玩转LangChainMemory模块：四种记忆类型详解及应用场景全覆盖03-全面掌握LangChain：从核心链条构建到动态任务分配的实战指南04-玩转LangChain：从文档加载到高效问答系统构建的全程实战05-玩转LangChain：深度评估问答系统的三种高效方法（示例生成、手
人脸识别实战：使用Python OpenCV 和深度学习进行人脸识别(2)
先自我介绍一下，小编浙江大学毕业，去过华为、字节跳动等大厂，目前阿里P7深知大多数程序员，想要提升技能，往往是自己摸索成长，但自己不成体系的自学效果低效又漫长，而且极易碰到天花板技术停滞不前！因此收集整理了一份《2024年最新Python全套学习资料》，初衷也很简单，就是希望能够帮助到想自学提升又不知道该从何学起的朋友。既有适合小白学习的零基础资料，也有适合3年以上经验的小伙伴深入学习提升的进阶课
TensorFlow深度学习实战——DCGAN详解与实现盼小辉丶深度学习 tensorflow 生成对抗网络
TensorFlow深度学习实战——DCGAN详解与实现0.前言1.DCGAN架构2.构建DCGAN生成手写数字图像2.1生成器与判别器架构2.2构建DCGAN相关链接0.前言深度卷积生成对抗网络(DeepConvolutionalGenerativeAdversarialNetwork,DCGAN)是一种基于生成对抗网络(GenerativeAdversarialNetwork,GAN)的深度学
基于cnn和resnet和mobilenet对比实现驾驶员分心检测深度学习乐园 cnn 人工智能神经网络
演示效果及获取项目源码点击文末名片本项目旨在通过深度学习技术，结合卷积神经网络（CNN）模型、ResNet模型和MobileNet模型，实现对驾驶员分心行为的自动检测。我们通过训练这些模型来识别不同的驾驶员分心行为，包括如发短信、通话、喝水等行为。使用的数据集包含驾驶员行为的图片，并且针对每个行为标注了相应的标签（例如"正常驾驶"、"右手发短信"等）。MobileNetV2是Google于2018
opencv 4.12.0版本发布详解：核心优化与新特性全解析 Risehuxyc #opencv opencv 人工智能计算机视觉
OpenCV4.12.0夏季更新带来核心模块优化、图像处理增强、深度学习支持扩展及新兴硬件适配，全面提升计算机视觉开发效率与性能。引言OpenCV（开源计算机视觉库）作为计算机视觉领域最受欢迎的开源库之一，在2025年7月发布了4.12.0版本。这个夏季更新带来了大量性能优化、新功能和错误修复，覆盖了核心模块、图像处理、3D校准、深度学习等多个领域。本文将详细介绍OpenCV4.12.0的主要更新
如何用深度学习实现图像风格迁移
最近研学过程中发现了一个巨牛的人工智能学习网站，通俗易懂，风趣幽默，忍不住分享一下给大家。点击链接跳转到网站人工智能及编程语言学习教程。读者们可以通过里面的文章详细了解一下人工智能及其编程等教程和学习方法。下面开始对正文内容的介绍。前言图像风格迁移是人工智能领域中一个非常有趣且富有创意的应用。它能够让一张普通的照片瞬间变成梵高笔下的《星月夜》风格，或者像莫奈的《睡莲》一样充满艺术感。这种技术不仅在
AI人工智能领域TensorFlow的模型训练策略 AIGC应用创新大全人工智能 tensorflow python ai
AI人工智能领域TensorFlow的模型训练策略关键词：TensorFlow、模型训练、深度学习、神经网络、优化策略、分布式训练、迁移学习摘要：本文将深入探讨TensorFlow框架下的模型训练策略，从基础概念到高级技巧，全面解析如何高效训练深度学习模型。我们将从数据准备、模型构建、训练优化到部署应用，一步步揭示TensorFlow模型训练的核心技术，并通过实际代码示例展示最佳实践。背景介绍目的
ROS2 通过相机确定物品坐标位置
要实现通过相机确定物品坐标位置，通常需要相机标定、物体检测和坐标转换几个步骤。下面我将提供一个完整的解决方案，包括相机标定、物体检测和3D坐标估计。1.系统架构相机标定-获取相机内参和畸变系数物体检测-使用OpenCV或深度学习模型检测物品坐标转换-将2D图像坐标转换为3D世界坐标ROS2集成-将上述功能集成到ROS2节点中2.实现步骤2.1创建功能包bashros2pkgcreateobject
【机器学习&深度学习】什么是量化？一叶千舟深度学习【理论】机器学习深度学习人工智能
目录前言一、量化的基本概念1.1量化对比示例1.2量化是如何实现的？二、为什么要进行量化？2.1解决模型体积过大问题2.2降低对算力的依赖2.3加速模型训练和推理2.4优化训练过程2.5降低部署成本小结：量化的应用场景三、量化的类型与实现3.1权重量化（WeightQuantization）3.2激活量化（ActivationQuantization）3.3梯度量化（GradientQuantiz
基于AutoCut实现在文档中按照片段剪辑视频 Mr数据杨 Python 音频技术音视频
本项目致力于通过构建一个具备深度学习支持的多功能视频处理环境，为用户提供高效、智能的视频编辑和字幕生成工具。依托Anaconda环境管理工具和PyTorch的GPU加速能力，用户能够迅速搭建一个符合项目需求的Python环境。结合FunClip的源代码以及相关插件的安装和配置，用户可充分利用项目所支持的图像、音频识别功能，并以极少的配置便获得理想的视频裁剪效果。项目的核心在于简化深度学习项目的环境
基于深度学习的和平精英（吃鸡）内置锁头训练摆烂仙君深度学习人工智能
前言本教程以和平精英为例，主要讲解如何构建深度学习模型对游戏中角色进行头部标注，并控制鼠标对其进行锁定射击，同时围绕其游戏防作弊系统进行算法攻防讲解，该方案对于csgo,cf等游戏也同样适用。请注意，该教程仅供娱乐教学，若本教程评论超过100，将会开源相关代码并对实际的代码部署进行进一步分析。一、和平精英伤害机制分析在《刺激战场》（现为《和平精英》）中，击中头部的伤害远高于身体其他部位，这是由游戏
迁移学习让深度学习更容易城市中迷途小书童
摘要：一文读懂迁移学习及其对深度学习发展的影响！深度学习在一些传统方法难以处理的领域有了很大的进展。这种成功是由于改变了传统机器学习的几个出发点，使其在应用于非结构化数据时性能很好。如今深度学习模型可以玩游戏，检测癌症，和人类交谈，自动驾驶。深度学习变得强大的同时也需要很大的代价。进行深度学习需要大量的数据、昂贵的硬件、甚至更昂贵的精英工程人才。在ClouderaFastForward实验室，我们
股票基金量化开源平台对比 Mr.小海开源开源金融
股票基金量化开源平台对比分析报告引言研究背景与意义在金融科技快速发展的背景下，量化交易已成为现代金融市场中投资者追求高效与精准交易的核心工具。通过程序化方式，投资者能够迅速处理海量市场数据，制定并执行复杂交易策略，其高效性、低情绪干扰及策略多样性等优势显著[1]。特别是随着人工智能技术的深化，2025年基于深度学习与机器学习的开源量化工具持续涌现，推动行业向数据驱动转型——量化交易将决策逻辑从经验
开源基金/股票量化平台调研报告 Mr.小海金融
开源基金/股票量化平台调研报告引言调研背景与目的近年来，随着人工智能技术的持续深化，量化交易领域迎来了深刻变革。2025年，基于深度学习和机器学习的开源工具不断涌现，不仅在技术层面实现突破，更在实际应用中展现出强大竞争优势，推动行业创新与升级[1].作为融合数学、统计与计算机技术的科技驱动型金融策略，量化交易通过自动化与数据驱动方法提升投资决策效率与准确性，已成为金融机构与投资者追求超额收益的重要
Python Gradio：快速搭建人脸识别应用 Python编程之道 Python人工智能与大数据 Python编程之道 python 开发语言 ai
PythonGradio：快速搭建人脸识别应用关键词：Python,Gradio,人脸识别,深度学习,计算机视觉,交互式应用,模型部署摘要：本文详细介绍了如何使用Python的Gradio库快速搭建一个交互式的人脸识别应用。我们将从基础概念出发，逐步讲解人脸识别的核心算法原理、Gradio的界面设计方法，并通过完整的项目实战演示如何将深度学习模型部署为可交互的Web应用。文章包含详细的代码实现、数
DataWhale 二月组队学习-深入浅出pytorch-Task04 －273.15K DataWhale组队学习学习 pytorch 人工智能
一、自定义损失函数1.损失函数的作用与自定义意义在深度学习中，损失函数（LossFunction）用于衡量模型预测结果与真实标签之间的差异，是模型优化的目标。PyTorch内置了多种常用损失函数（如交叉熵损失nn.CrossEntropyLoss、均方误差nn.MSELoss等）。但在实际任务中，可能需要针对特定问题设计自定义损失函数，例如：处理类别不平衡问题（如加权交叉熵）实现特殊业务需求（如对
大模型核心概念 | 嵌入模型（Embedding）、向量模型（Vector Model）
一、核心概念解析1.1嵌入模型（Embedding）作为AI领域的核心基础技术，嵌入模型通过将非结构化数据映射为低维稠密向量，实现语义特征的深度捕捉：文本嵌入：如将语句转换为1536维向量，使"机器学习"与"深度学习"的向量余弦相似度达0.92跨模态嵌入：支持图像与文本的联合向量空间映射，如CLIP模型实现文图互搜1.2向量模型（VectorModel）作为嵌入技术的下游应用体系，主要包含两大方向
Python实现神经网络算法指南代码编织匠人 python 神经网络算法
Python实现神经网络算法指南神经网络是一种模拟人脑神经元结构进行信息处理的机器学习算法。在深度学习领域中，神经网络是最为强大的算法之一。Python作为一门简单易学的编程语言，也成为了许多人选择实现神经网络算法的首选语言。在本篇文章中，我们将通过Python代码来实现神经网络算法。导入必要的库为了实现神经网络算法，我们需要导入一些必要的Python库，包括numpy和matplotlib。其中
基于DTLC-AEC与DTLN的轻量级实时语音增强系统设计与实现神经网络15044 仿真模型神经网络机器学习图像处理 cnn 人工智能机器人
基于DTLC-AEC与DTLN的轻量级实时语音增强系统设计与实现前些天发现了一个巨牛的人工智能学习网站，通俗易懂，风趣幽默，忍不住分享一下给大家。点击跳转到网站。1.引言在当今的互联网通信时代，实时语音通信已成为人们日常生活中不可或缺的一部分。然而，语音通信质量常常受到回声、背景噪声等因素的严重影响。为了解决这些问题，我们需要高效的语音增强技术。本文将详细介绍如何将DTLC-AEC（深度学习回声消
目标检测-YOLOv5 wydxry 深度学习目标检测 YOLO 人工智能深度学习
YOLOv5介绍YOLOv5是YOLO系列的第五个版本，由Ultralytics团队发布。虽然YOLOv5并非JosephRedmon原团队发布，但它在YOLOv4的基础上进行了重要的优化和改进，成为了深度学习目标检测领域中的热门模型之一。YOLOv5的优势不仅体现在其性能上，还包括其简洁易用、部署便捷的特点。相较于YOLOv4，YOLOv5对于代码框架的重构、推理速度的提升，以及模型的轻量化等方
【DW11月-深度学习】Task03前馈神经网络沫2021
参考链接：https://datawhalechina.github.io/unusual-deep-learning/#/4.%E5%89%8D%E9%A6%88%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C一、神经元模型2.1神经元1943年，美国神经生理学家沃伦·麦卡洛克(WarrenMcCulloch)和数学家沃尔特·皮茨(WalterPitts)对生物神经元进行
Spring的注解积累 yijiesuifeng spring 注解
用注解来向Spring容器注册Bean。需要在applicationContext.xml中注册： <context:component-scan base-package=”pagkage1[,pagkage2,…,pagkageN]”/>。如：在base-package指明一个包 <context:component-sc
传感器百合不是茶 android 传感器
android传感器的作用主要就是来获取数据,根据得到的数据来触发某种事件下面就以重力传感器为例; 1,在onCreate中获得传感器服务 private SensorManager sm;// 获得系统的服务 private Sensor sensor;// 创建传感器实例 @Override protected void
[光磁与探测]金吕玉衣的意义 comsci
这是一个古代人的秘密:现在告诉大家信不信由你们: 穿上金律玉衣的人,如果处于灵魂出窍的状态,可以飞到宇宙中去看星星这就是为什么古代
精简的反序打印某个数沐刃青蛟打印
以前看到一些让求反序打印某个数的程序。比如：输入123，输出321。记得以前是告诉你是几位数的，当时就抓耳挠腮，完全没有思路。似乎最后是用到%和/方法解决的。而今突然想到一个简短的方法，就可以实现任意位数的反序打印（但是如果是首位数或者尾位数为0时就没有打印出来了）代码如下： long num, num1=0;
PHP：6种方法获取文件的扩展名 IT独行者 PHP 扩展名
PHP：6种方法获取文件的扩展名 1、字符串查找和截取的方法 1 $extension = substr ( strrchr ( $file , '.' ), 1); 2、字符串查找和截取的方法二 1 $extension = substr
面试111 文强chu 面试
1事务隔离级别有那些，事务特性是什么（问到一次） 2 spring aop 如何管理事务的，如何实现的。动态代理如何实现，jdk怎么实现动态代理的，ioc是怎么实现的，spring是单例还是多例，有那些初始化bean的方式，各有什么区别（经常问） 3 struts默认提供了那些拦截器（一次） 4 过滤器和拦截器的区别（频率也挺高） 5 final，finally final
XML的四种解析方式小桔子 dom jdom dom4j sax
在平时工作中，难免会遇到把 XML 作为数据存储格式。面对目前种类繁多的解决方案，哪个最适合我们呢？在这篇文章中，我对这四种主流方案做一个不完全评测，仅仅针对遍历 XML 这块来测试，因为遍历 XML 是工作中使用最多的（至少我认为）。　　预备　　测试环境：　　AMD 毒龙1.4G OC 1.5G、256M DDR333、Windows2000 Server
wordpress中常见的操作 aichenglong 中文注册 wordpress 移除菜单
1 wordpress中使用中文名注册解决办法 1)使用插件 2)修改wp源代码进入到wp-include/formatting.php文件中找到 function sanitize_user( $username, $strict = false
小飞飞学管理-1 alafqq 管理
项目管理的下午题，其实就在提出问题（挑刺），分析问题，解决问题。今天我随意看下10年上半年的第一题。主要就是项目经理的提拨和培养。结合我自己经历写下心得对于公司选拔和培养项目经理的制度有什么毛病呢？ 1，公司考察，选拔项目经理，只关注技术能力，而很少或没有关注管理方面的经验，能力。 2，公司对项目经理缺乏必要的项目管理知识和技能方面的培训。 3，公司对项目经理的工作缺乏进行指
IO输入输出部分探讨百合不是茶 IO
//文件处理在处理文件输入输出时要引入java.IO这个包； /* 1，运用File类对文件目录和属性进行操作 2，理解流，理解输入输出流的概念 3，使用字节/符流对文件进行读/写操作 4，了解标准的I/O 5，了解对象序列化 */ //1，运用File类对文件目录和属性进行操作 //在工程中线创建一个text.txt
getElementById的用法 bijian1013 element
getElementById是通过Id来设置/返回HTML标签的属性及调用其事件与方法。用这个方法基本上可以控制页面所有标签，条件很简单，就是给每个标签分配一个ID号。返回具有指定ID属性值的第一个对象的一个引用。语法： &n
励志经典语录 bijian1013 励志人生
经典语录1: 哈佛有一个著名的理论：人的差别在于业余时间，而一个人的命运决定于晚上8点到10点之间。每晚抽出2个小时的时间用来阅读、进修、思考或参加有意的演讲、讨论，你会发现，你的人生正在发生改变，坚持数年之后，成功会向你招手。不要每天抱着QQ/MSN/游戏/电影/肥皂剧……奋斗到12点都舍不得休息，看就看一些励志的影视或者文章，不要当作消遣；学会思考人生，学会感悟人生
[MongoDB学习笔记三]MongoDB分片 bit1129 mongodb
MongoDB的副本集(Replica Set)一方面解决了数据的备份和数据的可靠性问题，另一方面也提升了数据的读写性能。MongoDB分片(Sharding)则解决了数据的扩容问题，MongoDB作为云计算时代的分布式数据库，大容量数据存储，高效并发的数据存取，自动容错等是MongoDB的关键指标。本篇介绍MongoDB的切片(Sharding) 1.何时需要分片 &nbs
【Spark八十三】BlockManager在Spark中的使用场景 bit1129 manager
1. Broadcast变量的存储，在HttpBroadcast类中可以知道 2. RDD通过CacheManager存储RDD中的数据，CacheManager也是通过BlockManager进行存储的 3. ShuffleMapTask得到的结果数据，是通过FileShuffleBlockManager进行管理的，而FileShuffleBlockManager最终也是使用BlockMan
yum方式部署zabbix ronin47 yum方式部署zabbix
安装网络yum库#rpm -ivh http://repo.zabbix.com/zabbix/2.4/rhel/6/x86_64/zabbix-release-2.4-1.el6.noarch.rpm 通过yum装mysql和zabbix调用的插件还有agent代理#yum install zabbix-server-mysql zabbix-web-mysql mysql-
Hibernate4和MySQL5.5自动创建表失败问题解决方法 byalias J2EE Hibernate4
今天初学Hibernate4，了解了使用Hibernate的过程。大体分为4个步骤： ①创建hibernate.cfg.xml文件 ②创建持久化对象 ③创建*.hbm.xml映射文件 ④编写hibernate相应代码在第四步中，进行了单元测试，测试预期结果是hibernate自动帮助在数据库中创建数据表，结果JUnit单元测试没有问题，在控制台打印了创建数据表的SQL语句，但在数据库中
Netty源码学习-FrameDecoder bylijinnan java netty
Netty 3.x的user guide里FrameDecoder的例子，有几个疑问： 1.文档说：FrameDecoder calls decode method with an internally maintained cumulative buffer whenever new data is received. 为什么每次有新数据到达时，都会调用decode方法？ 2.Dec
SQL行列转换方法 chicony 行列转换
create table tb(终端名称 varchar(10) , CEI分值 varchar(10) , 终端数量 int) insert into tb values('三星' , '0-5' , 74) insert into tb values('三星' , '10-15' , 83) insert into tb values('苹果' , '0-5' , 93)
中文编码测试 ctrain 编码
循环打印转换编码 String[] codes = { "iso-8859-1", "utf-8", "gbk", "unicode" }; for (int i = 0; i < codes.length; i++) { for (int j
hive 客户端查询报堆内存溢出解决方法 daizj hive 堆内存溢出
hive> select * from t_test where ds=20150323 limit 2; OK Exception in thread "main" java.lang.OutOfMemoryError: Java heap space 问题原因： hive堆内存默认为256M 这个问题的解决方法为：修改/us
人有多大懒，才有多大闲 (评论『卓有成效的程序员』) dcj3sjt126com 程序员
卓有成效的程序员给我的震撼很大，程序员作为特殊的群体，有的人可以这么懒，懒到事情都交给机器去做，而有的人又可以那么勤奋，每天都孜孜不倦得做着重复单调的工作。在看这本书之前，我属于勤奋的人，而看完这本书以后，我要努力变成懒惰的人。不要在去庞大的开始菜单里面一项一项搜索自己的应用程序，也不要在自己的桌面上放置眼花缭乱的快捷图标
Eclipse简单有用的配置 dcj3sjt126com eclipse
1、显示行号 Window -- Prefences -- General -- Editors -- Text Editors -- show line numbers 2、代码提示字符 Window ->Perferences，并依次展开 Java -> Editor -> Content Assist，最下面一栏 auto-Activation
在tomcat上面安装solr4.8.0全过程 eksliang Solr solr4.0后的版本安装 solr4.8.0安装
转载请出自出处： http://eksliang.iteye.com/blog/2096478 首先solr是一个基于java的web的应用，所以安装solr之前必须先安装JDK和tomcat，我这里就先省略安装tomcat和jdk了第一步：当然是下载去官网上下载最新的solr版本，下载地址
Android APP通用型拒绝服务、漏洞分析报告 gg163 漏洞 android APP 分析
点评：记得曾经有段时间很多SRC平台被刷了大量APP本地拒绝服务漏洞，移动安全团队爱内测（ineice.com）发现了一个安卓客户端的通用型拒绝服务漏洞，来看看他们的详细分析吧。 0xr0ot和Xbalien交流所有可能导致应用拒绝服务的异常类型时，发现了一处通用的本地拒绝服务漏洞。该通用型本地拒绝服务可以造成大面积的app拒绝服务。针对序列化对象而出现的拒绝服务主要
HoverTree项目已经实现分层 hvt 编程 .net Web C#ASP.ENT
HoverTree项目已经初步实现分层，源代码已经上传到 http://hovertree.codeplex.com请到SOURCE CODE查看。在本地用SQL Server 2008 数据库测试成功。数据库和表请参考：http://keleyi.com/a/bjae/ue6stb42.htmHoverTree是一个ASP.NET 开源项目，希望对你学习ASP.NET或者C#语言有帮助，如果你对
Google Maps API v3: Remove Markers 移除标记天梯梦 google maps api
Simply do the following: I. Declare a global variable: var markersArray = []; II. Define a function: function clearOverlays() { for (var i = 0; i < markersArray.length; i++ )
jQuery选择器总结 lq38366 jquery 选择器
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
基础数据结构和算法六：Quick sort sunwinner Algorithm Quicksort
Quick sort is probably used more widely than any other. It is popular because it is not difficult to implement, works well for a variety of different kinds of input data, and is substantially faster t
如何让Flash不遮挡HTML div元素的技巧_HTML/Xhtml_网页制作刘星宇 html Web
今天在写一个flash广告代码的时候，因为flash自带的链接，容易被当成弹出广告，所以做了一个div层放到flash上面，这样链接都是a触发的不会被拦截，但发现flash一直处于div层上面，原来flash需要加个参数才可以。让flash置于DIV层之下的方法，让flash不挡住飘浮层或下拉菜单，让Flash不档住浮动对象或层的关键参数：wmode=opaque。方法如下：
Mybatis实用Mapper SQL汇总示例 wdmcygah sql mysql mybatis 实用
Mybatis作为一个非常好用的持久层框架，相关资料真的是少得可怜，所幸的是官方文档还算详细。本博文主要列举一些个人感觉比较常用的场景及相应的Mapper SQL写法，希望能够对大家有所帮助。不少持久层框架对动态SQL的支持不足，在SQL需要动态拼接时非常苦恼，而Mybatis很好地解决了这个问题，算是框架的一大亮点。对于常见的场景，例如：批量插入/更新/删除，模糊查询，多条件查询，联表查询，