LIQING LIN

17_2Representation Learning and Generative Learning Deep Convolutional_Progressive Growing Style GAN

17_Representation Tying权重 CNN RNN denoising Sparse Autoencoder_潜在loss_accuracy_TSNE_KL Divergence_L1_hasing Autoencoders : https://blog.csdn.net/Linli522362242/article/details/116576478

cp17_GAN for Synthesizing Data_fully connected layer 2 convolutional_colab_ax.transAxes_twiny_spine : https://blog.csdn.net/Linli522362242/article/details/116565829

cp17_2GAN for Synthesizing_upsample_Transposed_Batch normalization_DCGAN(transposed convolution in GAN)_KL_JS divergence_双轴_EM_tape : https://blog.csdn.net/Linli522362242/article/details/117370337

Generative Adversarial Networks

Generative adversarial networks were proposed in a 2014 paper(Ian Goodfellow et al., “Generative Adversarial Nets,” Proceedings of the 27th International Conference on Neural
Information Processing Systems 2 (2014): 2672–2680.) by Ian Goodfellow et al., and although the idea got researchers excited almost instantly, it took a few years to overcome some of the difficulties of training GANs. Like many great ideas, it seems simple in hindsight事后诸葛亮: make neural networks compete against each other in the hope that this competition will push them to excel. As shown in Figure 17-15, a GAN is composed of two neural networks:

Generator

Takes a random distribution as input (typically Gaussian) and outputs some data—typically, an image. You can think of the random inputs as the latent representations (i.e., codings) of the image to be generated. So, as you can see, the generator offers the same functionality as a decoder in a variational autoencoder, and it can be used in the same way to generate new images (just feed it some Gaussian noise, and it outputs a brand-new image). However, it is trained very differently, as we will soon see.
Discriminator

Takes either a fake image from the generator or a real image from the training set as input, and must guess whether the input image is fake or real.

Figure 17-15. A generative adversarial network

During training, the generator and the discriminator have opposite goals: the discriminator tries to tell fake images from real images, while the generator tries to produce images that look real enough to trick the discriminator. Because the GAN is composed of two networks with different objectives, it cannot be trained like a regular neural network. Each training iteration is divided into two phases:

• In the first phase, we train the discriminator. A batch of real images is sampled from the training set and is completed with an equal number of fake images produced by the generator. The labels are set to 0 for fake images and 1 for real images, and the discriminator is trained on this labeled batch for one step, using the binary cross-entropy loss. Importantly, backpropagation only optimizes the weights of the discriminator during this phase.
• In the second phase, we train the generator. We first use it to produce another batch of fake images, and once again the discriminator is used to tell whether the images are fake or real. This time we do not add real images in the batch, and all the labels are set to 1 (real): in other words, we want the generator to produce images that the discriminator will (wrongly) believe to be real! Crucially, the weights of the discriminator are frozen during this step, so backpropagation only affects the weights of the generator.

The generator never actually sees any real images, yet it gradually learns to produce convincing fake images! All it gets is the gradients flowing back through the discriminator. Fortunately, the better the discriminator gets, the more information about the real images is contained in these secondhand gradients, so the generator can make significant progress.

Let’s go ahead and build a simple GAN for Fashion MNIST.

First, we need to build the generator and the discriminator. The generator is similar to an autoencoder’s decoder, and the discriminator is a regular binary classifier (it takes an image as input and ends with a Dense layer containing a single unit and using the sigmoid activation function). For the second phase of each training iteration, we also need the full GAN model containing the generator followed by the discriminator:

import numpy as np
import tensorflow as tf

np.random.seed(42)
tf.random.set_seed(42)

codings_size=30

generator = keras.models.Sequential([
  keras.layers.Dense( 100, activation="selu", input_shape=[codings_size] ),
  keras.layers.Dense( 150, activation="selu" ),
  keras.layers.Dense( 28*28, activation="sigmoid"),
  keras.layers.Reshape([28,28])                                   
])

discriminator = keras.models.Sequential([
  keras.layers.Flatten( input_shape=[28,28] ),
  keras.layers.Dense( 150, activation="selu" ),
  keras.layers.Dense( 100, activation="selu" ),
  keras.layers.Dense( 1, activation="sigmoid" )                                       
])

gan = keras.models.Sequential([ generator, discriminator ])

Next, we need to compile these models. As the discriminator is a binary classifier, we can naturally use the binary cross-entropy loss. The generator will only be trained through the gan model, so we do not need to compile it at all. The gan model is also a binary classifier, so it can use the binary cross-entropy loss. Importantly, the discriminator should not be trained during the second phase, so we make it non-trainable before compiling the gan model:

discriminator.compile( loss="binary_crossentropy", optimizer="rmsprop" )

discriminator.trainable = False
gan.compile( loss="binary_crossentropy", optimizer="rmsprop" )

The trainable attribute is taken into account by Keras only when compiling a model, so after running this code, the discriminator is trainable if we call its fit() method or its train_on_batch() method (which we will be using), while it is not trainable when we call these methods on the gan model.

Since the training loop is unusual, we cannot use the regular fit() method. Instead, we will write a custom training loop. For this, we first need to create a Dataset to iterate through the images:

from tensorflow import keras

(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()

X_train_full = X_train_full.astype( np.float32 )/255
X_test = X_test.astype( np.float32 )/255
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

batch_size = 32
dataset = tf.data.Dataset.from_tensor_slices( X_train ).shuffle( 1000 )
dataset = dataset.batch( batch_size, drop_remainder=True ).prefetch(1)

import matplotlib.pyplot as plt

def plot_multiple_images( images, n_cols=None ):
  n_cols = n_cols or len(images)
  n_rows = ( len(images)-1 )//n_cols + 1
  if images.shape[-1] == 1:
    images = np.squeeze( images, axis=-1 )
    
  plt.figure( figsize=(n_cols, n_rows) )
  for index, image in enumerate(images):
    plt.subplot( n_rows, n_cols, index+1 )
    plt.imshow( image, cmap="binary" )
    plt.axis("off")

We are now ready to write the training loop. Let’s wrap it in a train_gan() function.
https://blog.csdn.net/Linli522362242/article/details/116565829

As discussed earlier, you can see the two phases at each iteration:

• In 1st phase, we feed Gaussian noise to the generator to produce fake images( ̂ = () ),
and we complete this batch by concatenating an equal number of real images. The targets y1 are set to 0 for fake images and 1 for real images.
Then we train the discriminator on this batch. Note that we set the discriminator’s trainable attribute to True: this is only to get rid of a warning that Keras displays when it notices that trainable is now False but was True when the model was compiled (or vice versa).
• In 2nd phase, we feed the GAN some Gaussian noise. Its generator will start by producing fake images( ̂ = () ), then the discriminator will try to guess whether these images are fake or real. We want the discriminator to believe that the fake images are real, so the targets y2 are set to 1. Note that we set the trainable attribute to False, once again to avoid a warning.

def train_gan( gan, dataset, batch_size, codings_size, n_epochs=50 ):
  # gan = keras.models.Sequential([ generator, discriminator ])
  generator, discriminator = gan.layers
  for epoch in range( n_epochs ):
    print( "Epoch {}/{}".format( epoch+1, n_epochs ) )
    for X_batch in dataset:

      # phase 1 - training the discriminator
      ######################## ̂ = () ######################## 
      noise = tf.random.normal( shape=[batch_size, codings_size] ) # mean=0.0, stddev=1.0
      generated_images = generator( noise ) # training=True

      X_fake_and_real = tf.concat( [generated_images, X_batch], 
                                   axis=0 
                                 )
                # label generated_images and X_batch
                # OR
                # tf.zeros_like(generated_images) + tf.ones_like(X_batch) 
      y1 = tf.constant( [[0.]]*batch_size + [[1.]]*batch_size ) # + is tf.concat
      discriminator.trainable=True ########
      discriminator.train_on_batch( X_fake_and_real, 
                                    y1 ) # Runs a single gradient update on a single batch of data.

      # phase 2 - training the generator
      noise = tf.random.normal( shape=[batch_size, codings_size ] )
      # for training the generator, we swap the labels of real and fake examples
      # by assigning label 1 to the outputs of the generator
      y2 = tf.constant( [[1.]] * batch_size )
      discriminator.trainable = False ########
      gan.train_on_batch( noise, y2 )

    plot_multiple_images( generated_images, 8 )
    plt.show()

train_gan(gan, dataset, batch_size, codings_size, n_epochs=1)

Figure 17-16. Images generated by the GAN after one epoch of training

That’s it! If you display the generated images (see Figure 17-16), you will see that at the end of the first epoch, they already start to look like (very noisy) Fashion MNIST images.

tf.random.set_seed(42)
np.random.seed(42)

noise = tf.random.normal( shape=[batch_size, codings_size] )
generated_images = generator( noise )
plot_multiple_images( generated_images, 8 )

Unfortunately, the images never really get much better than that, and you may even find epochs where the GAN seems to be forgetting what it learned. Why is that? Well, it turns out that training a GAN can be challenging. Let’s see why.

The Difficulties of Training GANs

During training, the generator and the discriminator constantly try to outsmart each other, in a zero-sum game. As training advances, the game may end up in a state that game theorists call a Nash equilibrium纳什均衡, named after the mathematician John Nash: this is when no player would be better off changing their own strategy, assuming the other players do not change theirs. For example, a Nash equilibrium is reached when everyone drives on the left side of the road: no driver would be better off being the only one to switch sides. Of course, there is a second possible Nash equilibrium: when everyone drives on the right side of the road. Different initial states and dynamics may lead to one equilibrium or the other. In this example, there is a single optimal strategy once an equilibrium is reached (i.e., driving on the same side as everyone else), but a Nash equilibrium can involve multiple competing strategies (e.g., a predator[ˈpredətər]捕食者 chases its prey[preɪ]猎物, the prey tries to escape, and neither would be better off changing their strategy).

So how does this apply to GANs? Well, the authors of the paper demonstrated that a GAN can only reach a single Nash equilibrium: that’s when the generator produces perfectly realistic images, and the discriminator is forced to guess (50% real, 50% fake). This fact is very encouraging: it would seem that you just need to train the GAN for long enough, and it will eventually reach this equilibrium, giving you a perfect generator. Unfortunately, it’s not that simple: nothing guarantees that the equilibrium will ever be reached.
https://blog.csdn.net/Linli522362242/article/details/117370337

The biggest difficulty is called mode collapse: this is when the generator’s outputs gradually become less diverse(OR One common cause of failure in training GANs is when the generator gets stuck in a small subspace and learns to generate similar samples. This is called mode collapse, and an example is shown in the previous figure.). How can this happen? Suppose that the generator gets better at producing convincing shoes than any other class. It will fool the discriminator a bit more with shoes, and this will encourage it to produce even more images of shoes. Gradually, it will forget how to produce anything else. Meanwhile, the only fake images that the discriminator will see will be shoes, so it will also forget how to discriminate fake images of other classes. Eventually, when the discriminator manages to discriminate the fake shoes from the real ones, the generator will be forced to move to another class. It may then become good at shirts, forgetting about shoes, and the discriminator will follow. The GAN may gradually cycle across a few classes, never really becoming very good at any of them.

Moreover, because the generator and the discriminator are constantly pushing against each other, their parameters may end up oscillating振荡的 and becoming unstable. Training may begin properly, then suddenly diverge for no apparent reason, due to these instabilities. And since many factors affect these complex dynamics, GANs are very sensitive to the hyperparameters: you may have to spend a lot of effort fine-tuning them.

These problems have kept researchers very busy since 2014: many papers were published on this topic, some proposing new cost functions ### For a nice comparison of the main GAN losses, check out this great GitHub project by Hwalsuk Lee. ### (though a 2018 paper ### Mario Lucic et al., “Are GANs Created Equal? A Large-Scale Study,” Proceedings of the 32nd International Conference on Neural Information Processing Systems (2018): 698–707. ### by Google researchers questions their efficiency) or techniques to stabilize training or to avoid the mode collapse issue. For example, a popular technique called experience replay consists in

storing the images produced by the generator at each iteration in a replay buffer (gradually dropping older generated images) and
training the discriminator using real images plus fake images drawn from this buffer (rather than just fake images produced by the current generator). This reduces the chances that the discriminator will overfit the latest generator’s outputs.

Another common technique is called mini-batch discrimination: it measures how similar images are across the batch and provides this statistic to the discriminator, so it can easily reject a whole batch of fake images that lack diversity. This encourages the generator to produce a greater variety of images, reducing the chance of mode collapse. Other papers simply propose specific architectures that happen to perform well.

In short, this is still a very active field of research, and the dynamics of GANs are still not perfectly understood. But the good news is that great progress has been made, and some of the results are truly astounding! So let’s look at some of the most successful architectures, starting with deep convolutional GANs, which were the state of the art just a few years ago. Then we will look at two more recent (and more complex) architectures.

Deep Convolutional GANs

The original GAN paper in 2014 experimented with convolutional layers, but only tried to generate small images. Soon after, many researchers tried to build GANs based on deeper convolutional nets for larger images. This proved to be tricky, as training was very unstable, but Alec Radford et al. finally succeeded in late 2015, after
experimenting with many different architectures and hyperparameters. They called their architecture deep convolutional GANs (DCGANs).(Alec Radford et al., “Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks,” arXiv preprint arXiv:1511.06434 (2015).) Here are the main guidelines they proposed for building stable convolutional GANs:

• Replace any pooling layers with strided convolutions (in the discriminator) and transposed convolutions (in the generator).
• Use Batch Normalization in both the generator and the discriminator, except in the generator’s output layer and the discriminator’s input layer.
• Remove fully connected hidden layers for deeper architectures.
• Use ReLU activation in the generator for all layers except the output layer, which should use tanh.
• Use leaky ReLU activation in the discriminator for all layers.

These guidelines will work in many cases, but not always, so you may still need to experiment with different hyperparameters (in fact, just changing the random seed and training the same model again will sometimes work). For example, here is a small DCGAN that works reasonably well with Fashion MNIST:

The generator

takes codings of size 100, and it projects them to 6272 dimensions (7 *7 * 128), and
reshapes the result to get a 7 × 7 × 128 tensor.
This tensor is batch normalized
and fed to a transposed convolutional layer with a stride of 2, which upsamples it from 7 × 7 to 14 × 14 and reduces its depth from 128 to 64.This layer uses the selu activation function.
The result is batch normalized again
and fed to another transposed convolutional layer with a stride of 2, which upsamples it from 14 × 14 to 28 × 28 and reduces the depth from 64 to 1. This layer uses the tanh activation function, so the outputs will range from –1 to 1. For this reason, before training the GAN, we need to rescale the training set to that same range. We also need to reshape it to add the channel dimension:

tf.random.set_seed(42)
np.random.seed(42)

codings_size = 100

generator = keras.models.Sequential([
  keras.layers.Dense( units=7*7*128, input_shape=[codings_size,]), # ==> # 128*7*7=6272
  keras.layers.Reshape([7,7,128]),
  keras.layers.BatchNormalization(),
  # tf.keras.layers.selu(),

  keras.layers.Conv2DTranspose( 64, kernel_size=5, 
                                strides=2, padding="SAME", # (14,14,filters=64)
                                activation="selu"
                              ),
  keras.layers.BatchNormalization(),

  keras.layers.Conv2DTranspose( 1, kernel_size=5, 
                                strides=2, padding="SAME", # (28,28,filters=1)
                                activation="tanh"
                              ),                                                                 
])

generator.build()
generator.summary()

The discriminator looks much like a regular CNN for binary classification, except instead of using max pooling layers to downsample the image, we use strided convolutions (strides=2). Also note that we use the leaky ReLU activation function.

Overall, we respected the DCGAN guidelines, except we replaced the BatchNormalization layers in the discriminator with Dropout layers (otherwise training was unstable in this case) and we replaced ReLU with SELU in the generator. Feel free to tweak this architecture: you will see how sensitive it is to the hyperparameters (especially the relative learning rates of the two networks).

discriminator = keras.models.Sequential([
  keras.layers.Conv2D( 64, kernel_size=5, 
                       strides=2, padding="SAME",          # (14,14,filters=64)
                       activation=keras.layers.LeakyReLU(0.2),
                       input_shape=[28,28,1],
                     ),
  keras.layers.Dropout(0.4),

  keras.layers.Conv2D( 128, kernel_size=5, 
                       strides=2, padding="SAME",          # (7,7,filters=128)
                       activation=keras.layers.LeakyReLU(0.2),
                     ),
  keras.layers.Dropout(0.4),

  keras.layers.Flatten(),
  keras.layers.Dense(1, activation="sigmoid"),                                                                              
])

discriminator.build()
discriminator.summary()

gan = keras.models.Sequential([generator, discriminator])

gan.build()
gan.summary()

The generator's last layer( transposed convolutional layer) uses the tanh activation function, so the outputs will range from –1 to 1. For this reason, before training the GAN, we need to rescale the training set to that same range. We also need to reshape it to add the channel dimension:

# scale them by a factor of 2 and shift them by –1 such that 
# the pixel intensities will be rescaled to be in the range [–1, 1]
X_train_dcgan = X_train.reshape(-1, 28,28,1)*2. -1. # reshape and rescale

Lastly, to build the dataset, then compile and train this model, we use the exact same code as earlier.

batch_size = 32
dataset = tf.data.Dataset.from_tensor_slices( X_train_dcgan )
dataset = dataset.shuffle(1000)
dataset = dataset.batch( batch_size, drop_remainder=True ).prefetch(1)

train_gan(gan, dataset, batch_size, codings_size)

After 50 epochs of training, the generator produces images like those shown in Figure 17-17. It’s still not perfect, but many of these images are pretty convincing.
==>Figure 17-17. Images generated by the DCGAN after 50 epochs of training

Generating Fashion MNIST Images

tf.random.set_seed(42)
np.random.seed(42)

noise = tf.random.normal( shape=[batch_size, codings_size] )
generated_images = generator(noise)

plot_multiple_images( generated_images, 8)

#############################
Figure 17-18. Vector arithmetic for visual concepts (part of figure 7 from the DCGAN paper)(Reproduced with the kind authorization of the authors.)

If you scale up this architecture and train it on a large dataset of faces, you can get fairly realistic images. In fact, DCGANs can learn quite meaningful latent representations, as you can see in Figure 17-18: many images were generated, and nine of them were picked manually (top left), including

3 representing men with glasses,
3 men without glasses, and
3 women without glasses.

For each of these categories, the codings that were used to generate the images were averaged, and an image was generated based on the resulting mean codings (lower left). In short, each of the three lower-left images represents the mean of the three images located above it. But this is not a simple mean computed at the pixel level (this would result in three overlapping faces), it is a mean computed in the latent space, so the images still look like normal faces. Amazingly, if you compute men with glasses, minus men without glasses, plus women without glasses—where each term corresponds to one of the mean codings—and you generate the image that corresponds to this coding, you get the image at the center of the 3 × 3 grid of faces on the right: a woman with glasses! The eight other images around it were generated based on the same vector plus a bit of noise, to illustrate the semantic interpolation (https://blog.csdn.net/Linli522362242/article/details/117370337) capabilities of DCGANs. Being able to do arithmetic on faces feels like science fiction!

If you add each image’s class as an extra input to both the generator and the discriminator, they will both learn what each class looks like, and thus you will be able to control the class of each image produced by the generator. This is called a conditional GAN (CGAN)(Mehdi Mirza and Simon Osindero, “Conditional Generative Adversarial Nets,” arXiv preprint arXiv:1411.1784 (2014).). ###Conditional Generative Adversarial Nets (https://arxiv.org/pdf/1411.1784.pdf) uses the class label information and learns to synthesize new images conditioned on the provided label, that is, ̃ = (|). Furthermore, conditional GANs allows us to do image-to-image translation, which is to learn how to convert a given image from a specific domain to another. In this context, one interesting work is the Pix2Pix algorithm, published in the paper Image-to-Image Translation with Conditional Adversarial Networks by PhilipIsola et al.(https://arxiv.org/pdf/1611.07004.pdf). It is worth mentioning that in the Pix2Pix algorithm, the discriminator provides the real/fake predictions for multiple patches across the image as opposed to a single prediction for an entire image.

DCGANs aren’t perfect, though. For example, when you try to generate very large images using DCGANs, you often end up with locally convincing features but overall inconsistencies (such as shirts with one sleeve much longer than the other). How can you fix this?

Progressive Growing of GANs (PGGAN)

An important technique was proposed in a 2018 paper(Tero Karras et al., “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” Proceedings of the International Conference on Learning Representations (2018).) by Nvidia researchers Tero Karras et al.: they suggested

generating small images at the beginning of training, 先训一个小分辨率的图像生成，训好了之后再逐步过渡到更高分辨率的图像。然后稳定训练当前分辨率，再逐步过渡到下一个更高的分辨率。
then gradually adding convolutional layers to both the generator and the discriminator to produce larger and larger images (4 × 4, 8 × 8, 16 × 16, …, 512 × 512, 1,024 × 1,024).
This approach resembles greedy layer-wise training of stacked autoencoders(https://blog.csdn.net/Linli522362242/article/details/116576478).
The extra layers get added at the end of the generator and at the beginning of the discriminator, and previously trained layers remain trainable.

Figure 17-19. Progressively growing GAN: a GAN generator outputs 4 × 4 color images (left); we extend it to output 8 × 8 images (right)

For example, when growing the generator’s outputs from 4 × 4 to 8 × 8 (see Figure 17-19),

an upsampling layer (using nearest neighbor filtering) is added to the existing convolutional layer, so it outputs 8 × 8 feature maps,
which are then fed to the new convolutional layer (which uses "same" padding and strides of 1, so its outputs are also 8 × 8).
This new layer is followed by a new output convolutional layer: this is a regular convolutional layer with kernel size 1 that projects the outputs down to the desired number of color channels (e.g., 3=rgb).
To avoid breaking the trained weights of the 1st convolutional layer when the new convolutional layer(Conv 2) is added, the final output is a weighted sum of the original output layer (which now outputs 8 × 8 feature maps) and the new output layer. The weight of the new outputs is α, while the weight of the original outputs is 1 – α, and α is slowly increased from 0 to 1. In other words, the new convolutional layers (represented with dashed lines in Figure 17-19) are gradually faded in逐渐淡入, while the original output layer is gradually faded out.
A similar fade-in/fade-out technique is used when a new convolutional layer is added to the discriminator (followed by an average pooling layer for downsampling).

The paper also introduced several other techniques aimed at increasing the diversity of the outputs (to avoid mode collapse) and making training more stable:

Minibatch standard deviation layer

Added near the end of the discriminator. For each spatial position in the inputs, it computes the standard deviation across all channels and all instances in the batch ( S = tf.math.reduce_std( inputs, axis=[0, -1] ) ).

These standard deviations are then averaged across all points to get a single value (v = tf.reduce_mean(S)).

Finally, an extra feature map is added to each instance in the batch and filled with the computed value (
```
tf.concat(  [ inputs, 
              tf.fill( [batch_size, height, width, 1], 
                       v 
                     ) 
            ],
            axis=-1
         )
```
). How does this help? Well, if the generator produces images with little variety, then there will be a small standard deviation across feature maps in the discriminator. Thanks to this layer, the discriminator will have easy access to this statistic, making it less likely to be fooled by a generator that produces too little diversity. This will encourage the generator to produce more diverse outputs, reducing the risk of mode collapse.
我们有N个样本的feature maps(为了画图方便，不妨假设每个样本只有一个feature map OR channels=1)，我们对每个空间位置求标准差standard deviation，用numpy的std函数来说就是沿着样本的维度(feature)求std。这样就得到一张新的feature map(如果样本的feature map不止一个，那么这样构造得到的feature map数量应该是一致的)，接着feature map求平均average得到一个数。这个过程简单来说就是求mean std，作者把这个数复制成一张feature map的大小，跟原来的input feature maps 拼在一起送给Discriminator。

从作者放出来的代码来看，这对应averaging=“all”的情况。作者还尝试了其他的统计量：“spatial”，“gpool”，“flat”等。它们的主要差别在于沿着哪些维度求标准差。至于它们的作用，等我的代码复现完成了会做一个测试。估计作者调参发现“all”的效果最好。
Equalized learning rate均衡学习率

Initializes all weights using a simple Gaussian distribution with mean 0 and standard deviation 1 rather than using He initialization. However,
the weights are scaled down at runtime (i.e., every time the layer is executed) by the same factor as in He initialization: they are divided by , where is the number of inputs to the layer ~~.
The paper demonstrated that this technique significantly improved the GAN’s performance when using RMSProp, Adam, or other adaptive gradient optimizers. Indeed, these optimizers normalize the gradient updates by their estimated standard deviation (see Chapter 11 https://blog.csdn.net/Linli522362242/article/details/106982127), so parameters(OR weights) that have a larger dynamic range(The dynamic range of a variable is the ratio between the highest and the lowest value it may take.) will take longer to train, while parameters with a small dynamic range may be updated too quickly, leading to instabilities. By rescaling the weights as part of the model itself rather than just rescaling them upon initialization, this approach ensures that the dynamic range is the same for all parameters, throughout training, so they all learn at the same speed. This both speeds up and stabilizes training.

第二种normalization方法跟凯明大神的初始化方法[4]挂钩。He的初始化方法能够确保网络初始化的时候，随机初始化的参数不会大幅度地改变输入信号的强度。

根据这个式子，我们可以推导出网络每一层的参数应该怎样初始化。可以参考pytorch提供的接口https://link.zhihu.com/?target=http%3A//pytorch.org/docs/master/nn.html%23torch-nn-init。

不只是初始化的时候对参数做了调整，而是动态调整。初始化采用标准高斯分布，但是每次迭代都会对weights按照上面的式子做归一化。作者argue这样的归一化的好处在于它不用再担心参数的scale问题，起到均衡学习率的作用(Equalized learning rate)。
Pixelwise normalization layer

Added after each convolutional layer in the generator. It normalizes each activation based on all the activations in the same image and at the same location, but across all channels (dividing by the square root of the mean squared activation). In TensorFlow code, this is
```
inputs / tf.sqrt( tf.reduce_mean( tf.square(X), 
                                  axis=-1, # across all channels
                                  keepdims=True
                                ) + 1e-8
                )
```
(the smoothing term 1e-8 is needed to avoid division by zero). This technique avoids explosions in the activations due to excessive competition between the generator and the discriminator.

从DCGAN[3]开始，GAN的网络使用batch(or instance) normalization几乎成为惯例。使用batch norm可以增加训练的稳定性，大大减少了中途崩掉的情况。作者采用了两种新的normalization方法，不引入新的参数(不引入新的参数似乎是PG-GAN各种tricks的一个卖点)。

pixel norm，是local response normalization的变种。Pixel norm沿着channel维度做归一化，这样归一化的一个好处在于，activation-feature map的每个位置都具有单位长度。这个归一化策略与作者设计的Generator输出有较大关系，注意到Generator的输出层并没有Tanh或者Sigmoid激活函数，后面我们针对这个问题进行探讨。
有针对性地给样本加噪声
通过给真实样本加噪声能够起到均衡Generator和Discriminator的作用，起到缓解mode collapse的作用，这一点在WGAN(Wasserstein GAN)的前传中就已经提到[5]。尽管使用LSGAN会比原始的GAN更容易训练，然而它在Discriminator的输出接近1的适合，梯度就消失，不能给Generator起到引导作用。针对D趋近1的这种特性，作者提出了下面这种添加噪声的方式

其中，分别为第t次迭代判别器输出的修正值、第t-1次迭代真样本的判别器输出。
从式子可以看出，当真样本的判别器输出的修正值越接近1的时候，噪声强度就越大，而输出太小(<=0.5)的时候，不引入噪声，这是因为0.5是LSGAN收敛时，D的合理输出(无法判断真假样本)，而小于0.5意味着D的能力太弱。
官方Lasagna代码 : https://github.com/tkarras/progressive_growing_of_gans

The combination of all these techniques allowed the authors to generate extremely convincing high-definition images of faces. But what exactly do we call “convincing”? Evaluation is one of the big challenges when working with GANs: although it is possible to automatically evaluate the diversity of the generated images, judging their quality is a much trickier and subjective task. One technique is to use human raters, but this is costly and time-consuming. So the authors proposed to measure the similarity between the local image structure of the generated images and the training images, considering every scale. This idea led them to another groundbreaking innovation: StyleGANs.

StyleGANs

The state of the art in high-resolution image generation was advanced once again by the same Nvidia team in a 2018 paper(Tero Karras et al., “A Style-Based Generator Architecture for Generative Adversarial Networks,” arXiv preprint arXiv:1812.04948 (2018).) that introduced the popular StyleGAN architecture. The authors used style transfer techniques in the generator to ensure that the generated images have the same local structure as the training images, at every scale, greatly improving the quality of the generated images. The discriminator and the loss function were not modified, only the generator. Let’s take a look at the StyleGAN. It is composed of two networks (see Figure 17-20):
Figure 17-20. StyleGAN’s generator architecture (part of figure 1 from the StyleGAN paper)(Reproduced with the kind authorization of the authors.)

Mapping network

An 8-layer MLP that maps the latent representations z (i.e., the codings) to a vector w.

This vector is then sent through multiple affine transformations (i.e., Dense layers with no activation functions, represented by the “A” boxes in Figure 17-20), which produces multiple vectors. These vectors control the style of the generated image at different levels, from fine-grained texture (e.g., hair color) to high-level features (e.g., adult or child).

In short, the mapping network maps the codings to multiple style vectors.
Synthesis network

Responsible for generating the images. It has a constant learned input (to be clear, this input will be constant after training, but during training it keeps getting tweaked by backpropagation). It processes this input through multiple convolutional and upsampling layers, as earlier, but there are two twists:

first, some noise is added to the input and to all the outputs of the convolutional layers (before the activation function).

Second, each noise layer is followed by an Adaptive Instance Normalization (AdaIN) layer: it standardizes each feature map independently (by subtracting the feature map’s mean and dividing by its standard deviation),
then it uses the style vector to determine the scale and offset of each feature map (the style vector contains one scale and one bias term for each feature map).

The idea of adding noise independently from the codings is very important. Some parts of an image are quite random, such as the exact position of each freckle[ˈfrekl]雀斑 or hair. In earlier GANs, this randomness had to either come from the codings or be some pseudorandom noise produced by the generator itself.

If it came from the codings, it meant that the generator had to dedicate a significant portion of the codings’ representational power to store noise: this is quite wasteful. Moreover, the noise had to be able to flow through the network and reach the final layers of the generator: this seems like an unnecessary constraint that probably slowed down training. And finally, some visual artifacts视觉伪影 may appear because the same noise was used at different levels.
If instead the generator tried to produce its own pseudorandom noise, this noise might not look very convincing, leading to more visual artifacts. Plus, part of the generator’s weights would be dedicated to generating pseudorandom noise, which again seems wasteful.

By adding extra noise inputs, all these issues are avoided; the GAN is able to use the provided noise to add the right amount of stochasticity to each part of the image.

The added noise is different for each level. Each noise input consists of a single feature map full of Gaussian noise, which is broadcast to all feature maps (of the given level) and scaled using learned per-feature scaling factors (this is represented by the “B” boxes in Figure 17-20) before it is added.

Finally, StyleGAN uses a technique called mixing regularization (or style mixing), where a percentage of the generated images are produced using two different codings. Specifically, the codings c1 and c2 are sent through the mapping network, giving two style vectors w1 and w2. Then the synthesis network generates an image based on the styles w1 for the first levels and the styles w2 for the remaining levels.
The cutoff level is picked randomly. This prevents the network from assuming that styles at adjacent levels are correlated, which in turn encourages locality in the GAN, meaning that each style vector only affects a limited number of traits in the generated image.

There is such a wide variety of GANs out there that it would require a whole book to cover them all. Hopefully this introduction has given you the main ideas, and most importantly the desire to learn more. If you’re struggling with a mathematical concept, there are probably blog posts out there that will help you understand it better. Then go ahead and implement your own GAN, and do not get discouraged if it has trouble learning at first: unfortunately, this is normal, and it will require quite a bit of patience before it works, but the result is worth it. If you’re struggling with an implementation detail, there are plenty of Keras or TensorFlow implementations that you can look at. In fact, if all you want is to get some amazing results quickly, then you can just use a pretrained model (e.g., there are pretrained StyleGAN models available for Keras).

In the next chapter we will move to an entirely different branch of Deep Learning: Deep Reinforcement Learning.

Exercises

1. What are the main tasks that autoencoders are used for?

• Feature extraction : (stacked Convolutional Auto-Encoders, Denoising Autoencoders(Gaussian noise OR dropout), Sparse Autoencoders)
• Unsupervised pretraining (Unsupervised Pretraining Using Stacked Autoencoders, Convolutional Autoencoders)
• Dimensionality reduction(Convolutional Autoencoders, Recurrent Autoencoders)
• Generative models (they are capable of randomly generating new data that looks very similar to the training data. For example, you could train an autoencoder on pictures of faces, and it would then be able to generate new faces. However, the generated images are usually fuzzy and not entirely realistic.)
• Anomaly detection (an autoencoder is generally bad at reconstructing outliers)

2. Suppose you want to train a classifier, and you have plenty of unlabeled training data but only a few thousand labeled instances. How can autoencoders help? How would you proceed?

If you want to train a classifier and you have plenty of unlabeled training data but only a few thousand labeled instances, then you could first train a deep autoencoder on the full dataset (labeled + unlabeled), then reuse its lower half for the classifier (i.e., reuse the layers up to the codings layer, included) and train the classifier using the labeled data. If you have little labeled data, you probably want to freeze the reused layers when training the classifier.
Figure 17-6. Unsupervised pretraining using autoencoders
Semi-supervised Learning : https://blog.csdn.net/Linli522362242/article/details/105973507

3. If an autoencoder perfectly reconstructs the inputs, is it necessarily a good autoencoder? How can you evaluate the performance of an autoencoder?

The fact that an autoencoder perfectly reconstructs its inputs does not necessarily mean that it is a good autoencoder; perhaps it is simply an overcomplete autoencoder( overcomplete, where the dimensionality of the latent vector, z, is, in fact, greater than the dimensionality of the input examples (p > d). ) that learned to copy its inputs to the codings layer and then to the outputs. In fact, even if the codings layer contained a single neuron, it would be possible for a very deep autoencoder to learn to map each training instance to a different coding (e.g., the first instance could be mapped to 0.001, the second to 0.002, the third to 0.003, and so on), and it could learn “by heart” to reconstruct the right training instance for each coding. It would perfectly reconstruct its inputs without really learning any useful pattern in the data(OR without learning any useful features). In practice such a mapping is unlikely to happen, but it illustrates the fact that perfect reconstructions are not a guarantee that the autoencoder learned anything useful. However, if it produces very bad reconstructions, then it is almost guaranteed to be a bad autoencoder. To evaluate the performance of an autoencoder, one option is to measure the reconstruction loss (e.g., compute the MSE, or the mean square of the outputs minus the inputs). Again, a high reconstruction loss is a good sign that the autoencoder is bad, but a low reconstruction loss is not a guarantee that it is good. You should also evaluate the autoencoder according to what it will be used for. For example, if you are using it for unsupervised pretraining of a classifier, then you should also evaluate the classifier’s performance.

4. What are undercomplete and overcomplete autoencoders? What is the main risk of an excessively undercomplete autoencoder? What about the main risk of an overcomplete autoencoder?

An undercomplete autoencoder is one whose codings layer is smaller than the input and output layers. If it is larger, then it is an overcomplete autoencoder.
The main risk of an excessively undercomplete autoencoder is that it may fail to reconstruct the inputs.
The main risk of an overcomplete autoencoder is that it may just copy the inputs to the outputs, without learning any useful features.

5. How do you tie weights in a stacked autoencoder? What is the point of doing so?

To tie the weights of an encoder layer and its corresponding decoder layer, you simply make the decoder weights equal to the transpose of the encoder weights. This reduces the number of parameters in the model by half, often making training converge faster with less training data and reducing the risk of overfitting the training set.

Specifically, if the autoencoder has a total of N layers (not counting the input layer), and represents the connection weights of the layer (e.g., layer 1 is the first hidden layer, layer N/2 is the coding layer, and layer N is the output layer), then the decoder layer weights can be defined simply as: = (with L = 1, 2, …, N/2).

To tie weights between layers using Keras, let’s define a custom layer:

class DenseTranspose( keras.layers.Layer ):
  def __init__( self, dense, activation=None, **kwargs ):
    self.dense = dense
    self.activation = keras.activations.get( activation )
    super().__init__( **kwargs )
 
  def build( self, batch_input_shape ):
    # for using its own bias vector
    self.biases = self.add_weight( name="bias",
                                   shape=[ self.dense.input_shape[-1] ], # uses 100 for the DenseTranspose( dense_2, activation="selu" )
                                   initializer='zeros' )
    super().build( batch_input_shape ) #batch_input_shape= (batch_size, input_dimensions)
 
  def call( self, inputs ):
    # for the DenseTranspose( dense_2, activation="selu" )
    # inputs:  Tensor("Placeholder:0", shape=(None, 30), dtype=float32)
    # self.dense.weights:  
    #           [, 
    #             # can't be used #############################################
    #           ]
    z = tf.matmul( inputs, self.dense.weights[0], 
                   transpose_b = True # for the second argument is transposed 
                 )                    # before multiplication # self.dense.weights[0] ==> (30~input features,100) # x_1*w_1 + ... + x_n*w_n + b       
    return self.activation( z+self.biases )

https://blog.csdn.net/Linli522362242/article/details/116576478

keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)
 
dense_1 = keras.layers.Dense( 100, activation="selu" ) 
dense_2 = keras.layers.Dense( 30, activation="selu" )
 
tied_encoder = keras.models.Sequential([
                                keras.layers.Flatten( input_shape=[28,28] ), # 784=28*28
                                dense_1, # weight shape: (784, 100) # input_shape(?,784 neurons)
                                dense_2, # weight shape: (100, 30)  # input_shape(?,100 neurons)                                      
               ]) # output==> (batch_size, 30)
tied_decoder = keras.models.Sequential([
                                DenseTranspose( dense_2, activation="selu" ),
                                DenseTranspose( dense_1, activation="sigmoid" ),
                                keras.layers.Reshape([28,28])
               ])
 
tied_ae = keras.models.Sequential([ tied_encoder, tied_decoder ])
 
tied_ae.compile( loss="binary_crossentropy",
                 optimizer = keras.optimizers.SGD(lr=1.5),
                 metrics=[rounded_accuracy] )
history = tied_ae.fit( X_train, X_train, epochs=10, 
                       validation_data=(X_valid, X_valid)

6. What is a generative model? Can you name a type of generative autoencoder?

A generative model is a model capable of randomly generating outputs that resemble the training instances. For example, once trained successfully on the MNIST dataset, a generative model can be used to randomly generate realistic images of digits. The output distribution is typically similar to the training data. For example, since MNIST contains many images of each digit, the generative model would output roughly the same number of images of each digit. Some generative models can be parametrized—for example, to generate only some kinds of outputs. An example of a generative autoencoder is the variational autoencoder. https://blog.csdn.net/Linli522362242/article/details/116576478

7. What is a GAN? Can you name a few tasks where GANs can shine?

A generative adversarial network is a neural network architecture composed of two parts, the generator and the discriminator, which have opposing objectives. The generator’s goal is to generate instances similar to those in the training set, to fool the discriminator. The discriminator must distinguish the real instances from the generated ones.

At each training iteration, the discriminator is trained like a normal binary classifier, then the generator is trained to maximize the discriminator’s error.
( As discussed earlier, you can see the two phases at each iteration:

• In 1st phase, we feed Gaussian noise to the generator to produce fake images( ̂ = () ),
and we complete this batch by concatenating an equal number of real images. The targets y1 are set to 0 for fake images and 1 for real images.
Then we train the discriminator on this batch. Note that we set the discriminator’s trainable attribute to True: this is only to get rid of a warning that Keras displays when it notices that trainable is now False but was True when the model was compiled (or vice versa).
• In 2nd phase, we feed the GAN some Gaussian noise. Its generator will start by producing fake images( ̂ = () ), then the discriminator will try to guess whether these images are fake or real. We want the discriminator to believe that the fake images are real, so the targets-fake are set to 1. Note that we set the trainable attribute to False, once again to avoid a warning.

Minimize its value with respect to the generator (G), that is,
D is fixed, Minimize==>log(1 − (())) , suffers from vanishing gradients消失的梯度 in the early training stages( right-bottom figure,缓), The reason for this is that the outputs, G(z), early in the learning process, look nothing like real examples, and therefore D(G(z)) will be close to zero with high confidence.
so do swap the labels of real and fake examples==>==>minimizefor using binary cross-entropy loss https://blog.csdn.net/Linli522362242/article/details/116565829

)

GANs are used for advanced image processing tasks such as super resolution, colorization, image editing (replacing objects with realistic background), turning a simple sketch into a photorealistic image, or predicting the next frames in a video. They are also used to augment(https://blog.csdn.net/Linli522362242/article/details/108396485) a dataset (to train other models), to generate other types of data (such as text, audio, and time series), and to identify the weaknesses in other models and strengthen them.

8. What are the main difficulties when training GANs?

Training GANs is notoriously臭名昭著 difficult, because of the complex dynamics between the generator and the discriminator. The biggest difficulty is mode collapse, where the generator produces outputs with very little diversity. Moreover, training can be terribly unstable: it may start out fine and then suddenly start oscillating摆动的 or diverging, without any apparent reason. GANs are also very sensitive to the choice of hyperparameters.

9. Exercise: Try using a denoising autoencoder to pretrain an image classifier. You can use MNIST (the simplest option), or a more complex image dataset such as CIFAR10 if you want a bigger challenge. Regardless of the dataset you're using, follow these steps:

Split the dataset into a training set and a test set. Train a deep denoising autoencoder on the full training set.

from tensorflow import keras

[X_train, y_train], [X_test, y_test] = keras.datasets.cifar10.load_data()
X_train = X_train /255.
X_test = X_test / 255.

X_train.shape

y_train.shape

import tensorflow as tf
import numpy as np

denoising_encoder = keras.models.Sequential([
  keras.layers.GaussianNoise(0.1, input_shape=[32, 32, 3]),
  keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
  keras.layers.MaxPool2D(),
  keras.layers.Flatten(), # since Dense
  keras.layers.Dense(512, activation="relu"),                                             
])

denoising_encoder.summary()

denoising_decoder = keras.models.Sequential([
  keras.layers.Dense( 16*16*32, activation="relu", input_shape=[512]),
  keras.layers.Reshape([16,16,32]),
  keras.layers.Conv2DTranspose( filters=3, kernel_size=3, 
                                strides=2, padding="same", 
                                activation="sigmoid"
                              )                                             
])
denoising_decoder.summary()

When compiling the stacked autoencoder, we use the binary cross-entropy loss instead of the mean squared error. We are treating the reconstruction task as a multilabel binary classification problem(https://blog.csdn.net/Linli522362242/article/details/103866244): each pixel(or each label) intensity represents the probability that the pixel should be R/G/B. Framing it this way (rather than as a regression problem) tends to make the model converge faster.

denoising_ae = keras.models.Sequential([denoising_encoder, denoising_decoder])

denoising_ae.compile( loss="binary_crossentropy", 
                      optimizer=keras.optimizers.Nadam(),
                      metrics=["mse"]
                    )
history = denoising_ae.fit( X_train, X_train, epochs=10,
                            validation_data=(X_test, X_test)
                          )

Check that the images are fairly well reconstructed. Visualize the images that most activate each neuron in the coding layer.

import matplotlib.pyplot as plt
%matplotlib inline

n_images = 5
new_images = X_test[:n_images]
new_images_noisy = new_images + np.random.randn( n_images, 32, 32, 3 )*0.1
new_images_denoised = denoising_ae.predict( new_images_noisy )

plt.figure( figsize=(6, n_images*2) )
for index in range( n_images ):
  plt.subplot( n_images, 3, index*3 +1 )
  plt.imshow( new_images[index] )
  plt.axis('off')
  if index ==0:
    plt.title("Original")
  
  plt.subplot( n_images, 3, index*3 +2 )
  # if not clip: warning Clipping input data to the valid range for imshow with RGB data ([0..1] for floats or [0..255] for integers).
  plt.imshow( np.clip(new_images_noisy[index], 0., 1.) )
  plt.axis('off')
  if index ==0:
    plt.title("Noisy")

  plt.subplot( n_images, 3, index*3 +3 )
  plt.imshow( new_images_denoised[index] )
  plt.axis('off')
  if index ==0:
    plt.title('Denoised')

plt.show()

Build a classification DNN, reusing the lower layers of the autoencoder. Train it using only 500 images from the training set. Does it perform better with or without pretraining?

你可能感兴趣的:(17_2Representation Learning and Generative Learning Deep Convolutional_Progressive Growing Style GAN)

机器学习：让计算机学会思考的艺术平凡而伟大. 机器学习机器学习人工智能
目录什么是机器学习？机器学习的基本步骤常见的机器学习算法机器学习的实际应用如何入门机器学习？结语在当今数字化时代，机器学习（MachineLearning,ML）已经成为一个炙手可热的话题。从推荐系统到自动驾驶汽车，再到语音助手，机器学习的应用无处不在。然而，对于许多人来说，机器学习仍然是一个神秘而复杂的领域。本文将用通俗易懂的语言，带你走进机器学习的世界，了解它的基本原理和应用。什么是机器学习？
一文讲清楚深度学习和机器学习平凡而伟大. 机器学习人工智能深度学习机器学习人工智能
目录1.定义机器学习（MachineLearning,ML）深度学习（DeepLearning,DL）2.工作原理机器学习深度学习3.应用场景机器学习深度学习4.主要区别5.为什么选择深度学习？6.总结深度学习和机器学习是人工智能（AI）领域中两个密切相关但有所区别的概念。要清楚地解释它们之间的关系，我们可以从定义、工作原理、应用场景以及两者的主要区别等方面进行探讨。1.定义机器学习（Machin
AI大模型编程能力对比：Deepseek&Claude&Gemini 黑夜路人（heiyeluren） AI人工智能人工智能 ai AIGC 语言模型
在当今快速发展的技术领域，人工智能（AI）模型在编程和数据处理方面的应用越来越广泛。不同的AI模型因其独特的设计理念和技术优势，适用于不同的编程任务和场景。本文将对三种主流的AI模型——DeepSeekv3、GeminiFlash2.0和Claude3.5Sonnet的编程能力进行详细对比，帮助读者根据具体需求选择最合适的工具。同时对DeepSeekv3、GeminiFlash2.0和Claude
DeepSeek：智能搜索与分析的新纪元 XRC2231 学习
在人工智能浪潮席卷全球的今天，DeepSeek如同一颗璀璨的新星，以其独特的魅力和强大的功能，在AI领域脱颖而出。DeepSeek，这一基于深度学习和数据挖掘技术的智能搜索与分析系统，不仅重新定义了搜索引擎的边界，更以其卓越的性能和广泛的应用场景，为全球用户带来了前所未有的智能体验。本文将从DeepSeek的定义、特点、应用场景、优势等方面进行全面而深入的介绍，带您领略这一新兴技术的独特魅力。一、
哈尔滨工业大学DeepSeek公开课人工智能：大模型原理技术与应用-从GPT到DeepSeek｜附视频下载方法你觉得205 人工智能机器学习大数据 ai 知识图谱 python 运维
导读INTRODUCTION今天继续哈尔滨工业大学车万翔教授带来了一场主题为“DeepSeek技术前沿与应用”的报告。本报告深入探讨了大语言模型在自然语言处理（NLP）领域的核心地位及其发展历程，从基础概念出发，延伸至语言模型在机器翻译、拼音输入法、语音识别等任务中的关键作用。强调了语言模型不仅辅助其他NLP任务，本身也蕴含大量知识，如地理信息、语义理解和推理能力。随着技术的发展，尤其是trans
TicTacToe Module W_X_99515681 python 开发语言
Homework2Releasev5.016/02/2025CONTENTS1TicTacToe12Contents32.1TicTacToeModule............................................32.2ReinforcementLearningPlayer.....................................62.3Require
46-886 Machine Learning Fundamentals W_X_99515681 机器学习人工智能
46-886MachineLearningFundamentalsHW1Homework1Due:Sunday,March23,11:59pm•UploadyourassignmenttoCanvas(onlyonepersonperteamneedstosubmit)•Includeawriteupcontainingyouranswerstothequestionsbelow(andyourt
小白学AI量化：DeepSeek+Python构建强大的金融数据挖掘与多维分析机器人老余捞鱼 AI顾投高级策略 AI探讨与学习人工智能 python 金融 deepseek
作者：老余捞鱼原创不易，转载请标明出处及原作者。写在前面的话：在机构主导的量化交易时代，普通投资者如何用一杯奶茶的钱（15元/天）打造专业级智能量化产品？本文将为您揭秘一个革命性的解决方案——基于国产大模型DeepSeek和Python构建的智能数据挖掘分析机器人。它不仅适用于通用网页数据抓取，更能深度应用于金融领域，精准捕捉市场信号。本文“干货”很多，请务必耐心读完。一、颠覆认知的性价比革命1.
HTML实现酷炫3D相册算法与编程之美编程之美 css html js css3 javascript
欢迎点击「算法与编程之美」↑关注我们！本文首发于微信公众号："算法与编程之美"，欢迎关注，及时了解更多此系列文章。欢迎加入团队圈子！与作者面对面！直接点击！目录1、创建文件目录2、调背景色3、制作3D相册4、将图片散开，围成一圈。5、绘制透明底盘6、最终效果1、创建文件目录在Hbuilder在新建一个目录，创建css和js文件。图12、调背景色在style块里面给整个页面渲染成黑色调。*{padd
COMP3411/9814 Artificial Intelligence W_X_99515681 人工智能
COMP3411/9814ArtificialIntelligenceTerm1,2025Assignment1–Search,PruningandTreasureHuntingDue:Friday21March,10pmMarks:25%offinalassessmentInthisassignmentyouwillbeexaminingsearchstrategiesforthe15-puzz
COMP 315: Cloud Computing for E-Commerce W_X_99515681 开发语言
Assignment1:JavascriptCOMP315:CloudComputingforE-CommerceFebruary20251IntroductionAcommontaskwhenbackendprogrammingisdatacleaning,whichistheprocessoftakinganinitialdatasetthatmaycontainerroneousorinco
DeepSeek 如何处理多模态数据（如文本、图像、视频）？借雨醉东风人工智能
关注我，持续分享逻辑思维&管理思维&面试题；可提供大厂面试辅导、及定制化求职/在职/管理/架构辅导；推荐专栏《10天学会使用asp.net编程AI大模型》，目前已完成所有内容。一顿烧烤不到的费用，让人能紧跟时代的浪潮。从普通网站，到公众号、小程序，再到AI大模型网站。干货满满。学成后可接项目赚外快，绝对划算。不仅学会如何编程，还将学会如何将AI技术应用到实际问题中，为您的职业生涯增添一笔宝贵的财富
SassScript：Sass中的编程特性详解 jiajia651304 sass 前端 css
Sass（SyntacticallyAwesomeStylesheets）是一种强大的CSS预处理器，它允许开发者使用类似于编程语言的语法来编写CSS，然后通过编译生成标准的CSS代码。SassScript是Sass中的编程特性集合，它包含了变量、嵌套规则、混合、函数以及控制指令等，极大地提高了CSS的开发效率和可维护性。1.变量SassScript中的变量允许开发者在样式表中存储和重复使用值。变
探秘知乎数据抓取神器 —— zhihu-spider 丁慧湘Gwynne
探秘知乎数据抓取神器——zhihu-spider项目地址:https://gitcode.com/gh_mirrors/zh/zhihu-spider在知识的海洋中畅游，每一份数据都可能成为智慧的火花。今天，我们来一起探索一个专为知乎设计的数据爬虫工具——zhihu-spider，它是由计算机科学研究生MorganZhang精心打造的开源宝藏。项目介绍zhihu-spider，正如其名，是一个针对
【机器学习】机器学习四大分类藓类少女机器学习机器学习分类人工智能
机器学习的方法主要可以分为四大类，根据学习方式和数据标注情况进行分类：1.监督学习（SupervisedLearning）特点：有标注数据（即训练数据有明确的输入(X)和输出(Y)）。学习目标是找到一个映射(f(X)\approxY)。适用于分类和回归问题。主要算法：分类（Classification）：逻辑回归（LogisticRegression）支持向量机（SVM）朴素贝叶斯（NaïveBa
国产模型能否挑战 GPT-4？一文拆解 DeepSeek-V3 架构与实战应用 AI筑梦师人工智能学习框架架构深度学习 python agi 人工智能 tensorflow
✳️一、引言✅1.1DeepSeek-V3发布背景与定位随着大模型技术的快速演进，从GPT-3到GPT-4，全球在通用人工智能方向取得了长足进展。但与此同时，开源社区始终缺乏一个真正兼顾性能、效率、中文能力和实用性的高质量大模型。DeepSeek-V3的推出正是在这个背景下的一次关键突破。DeepSeek-V3是由中国团队DeepSeek开发的第三代大语言模型，它具备以下几个核心特性：开源可商用：
DeepSeek多语言AI高效应用实践智能计算研究中心其他
内容概要在人工智能技术快速迭代的背景下，DeepSeek系列模型凭借混合专家架构（MoE）与670亿参数规模，在多语言处理、视觉语言理解及复杂任务生成领域实现了突破性进展。本文系统性拆解其技术架构设计逻辑，聚焦论文写作、代码生成、SEO关键词拓展三大核心场景，分析模型在高生成质量、低使用成本维度的差异化优势。技术维度DeepSeekProver传统单模态模型多语言支持97种语言动态切换单一语种优化
DeepSeek R1 本地部署指南 (3) - 更换本地部署模型 Windows/macOS 通用 Eric Woo X 人工智能 AI DeepSeek macos windows deepseek ai
0.准备完成Windows或macOS安装：DeepSeekR1本地部署指南(1)-Windows本地部署-CSDN博客DeepSeekR1本地部署指南(2)-macOS本地部署-CSDN博客以下内容Windows和macOS命令执行相同：Windows管理员启动：命令提示符CMDmacOS启动：Terminal1.查看已安装模型ollamalist如图，已安装1.5b版本：ollamarunde
华山论剑，大模型(deepseek qwq gemini)辩论人生意义 Lifeng66666666 语言模型语言模型人工智能
借助DeepDiscussion程序，通过让大模型(deepseekqwqgemini)讨论“人生意义是什么”这一挑战问题，我们得以客观观察目前这几种大模型的价值观，能力，不足。部分讨论过程：问题:人生的意义是什么？deepseek/deepseek-r1:free初始方案:针对“人生的意义是什么”这一终极问题，我的解决方案分为以下五个层次，融合东西方哲学智慧与实践心理学，并提供具体行动方向：一、
DeepSeek关键RL算法GRPO，有人从头跑通了，贡献完整代码强化学习曾小健2 大语言模型LLM 算法
DeepSeek关键RL算法GRPO，有人从头跑通了，贡献完整代码机器之心2025年03月02日11:54北京选自GitHub作者：AndriyBurkov机器之心编译GRPO（GroupRelativePolicyOptimization）是DeepSeek-R1成功的基础技术之一，我们之前也多次报道过该技术，比如《DeepSeek用的GRPO占用大量内存？有人给出了些破解方法》。简单来说，GR
深度学习 Deep Learning 第8章深度学习优化 odoo中国 AI编程人工智能深度学习人工智能优化
深度学习第8章深度学习的优化章节概述本章深入探讨了深度学习中的优化技术，旨在解决模型训练过程中面临的各种挑战。优化是深度学习的核心环节，直接关系到模型的训练效率和最终性能。本章首先介绍了优化在深度学习中的特殊性，然后详细讨论了多种优化算法，包括随机梯度下降（SGD）、动量法、Nesterov动量法、AdaGrad、RMSProp和Adam等。此外，还探讨了参数初始化策略、自适应学习率方法以及二阶优
echarts map3D区域颜色单独设置浪漫不敌风月 echarts echarts 前端 3d
效果图：实现：用的是map3D，之前试了下geo3d因为版本问题不好控制（地图上字体颜色都没法设置）只需要在series的data中加上你要标色的区域名称和颜色即可。此效果实现的是无图例着色。series:[{type:"map3D",//系列类型name:"map3D",//系列名称map:"yuhang",//地图类型。data:[{name:"鸬鸟镇",itemStyle:{color:"#
清华DeepSeek教程1至7版，解锁前沿技术 2501_91206263 pdf
清华DeepSeek教程1至7版，解锁前沿技术「DeepSeek清华资料」共7册链接：https://pan.quark.cn/s/b8d8760976ca「DeepSeek使用手册大全」链接：https://pan.quark.cn/s/52c234062a2e「DeepSeek资料合集」链接：https://pan.quark.cn/s/71c8604f0e8a「DeepSeep使用手册」链接
清华出品DeepSeek教程7版合集，一站式掌握前沿技术 2501_91206263 pdf
亲爱的读者们，今天要给大家介绍一套由清华大学出品的超硬核教程——DeepSeek教程7版合集！「DeepSeek清华资料」共7册链接：https://pan.quark.cn/s/b8d8760976ca「DeepSeek使用手册大全」链接：https://pan.quark.cn/s/52c234062a2e「DeepSeek资料合集」链接：https://pan.quark.cn/s/71c8
Java中卫语句的设计思想而为. java 服务器开发语言
卫语句（GuardClauses）是一种通过提前返回简化条件嵌套、提升代码可读性的编程技巧。其核心思想是优先处理异常或边界情况，让主逻辑保持扁平化。以下是deepseek做出的设计思想详解：核心设计原则FailFast（快速失败）在函数入口处立即检查非法参数或无效状态，若不符合条件则提前终止（如返回、抛异常），避免后续无效操作。减少嵌套层级用卫语句替换多层if-else嵌套，将代码从“箭头型”结构
SpringAI集成DeepSeek 一诚学编程 java 人工智能 spring boot
1、利用spring-ai-openai集成DeepSeek1.1、在DeepSeek开放平台创建APIKEY1.2、创建SpringBoot工程，引入依赖4.0.0org.springframework.bootspring-boot-starter-parent3.3.8org.examplespringai-deepseek1.0-SNAPSHOT17171.0.0-M5org.spring
生成对抗网络（GAN）的高级变体及在图像生成领域的创新实践算法探索者生成对抗网络计算机视觉人工智能
摘要生成对抗网络（GAN）自提出以来，在诸多领域取得了显著进展，尤其是在图像生成方面展现出强大的潜力。本文深入探讨了GAN的多种高级变体，如CycleGAN、StyleGAN等，详细分析它们在结构设计、训练机制上的创新之处，阐述其在生成高分辨率、多样化图像时具备的独特优势，并结合丰富的实际案例，展示这些变体在图像生成领域的卓越应用成果，为相关研究与应用提供全面且深入的参考。一、引言生成对抗网络（G
DeepSeek、Grok 与 ChatGPT 三巨头：技术架构与应用场景的全方位解析云策量化 Deepseek chatgpt deepseek grok
前言在当今人工智能领域，DeepSeek、Grok和ChatGPT作为语言模型的三巨头，各自凭借独特的技术架构和广泛的应用场景，在自然语言处理领域占据着重要地位。本文将对这三款模型的技术架构和应用场景进行全方位解析，以期为读者提供深入的了解和有价值的参考。一、技术架构（一）DeepSeekDeepSeek是由DeepSeek团队开发的一款大型语言模型，其技术架构基于深度学习中的Transforme
探索AI模型的巅峰之战：ChatGPT、DeepSeek与Grok 3，谁才是最强？温暖阳光阿斌人工智能 chatgpt
近年来，人工智能领域正处于一场高速迭代的革命中。大型语言模型（LLMs）如ChatGPT、DeepSeek和Grok3纷纷亮相，各展所长，为人们带来了前所未有的体验。在这场"谁是最强"的竞争中，每一方都展现出了令人惊叹的能力和独特的优势。然而，这些模型之间的差异和特点，究竟是什么？它们各自的优势在哪里？又有哪些隐藏的短板？本文将带您深入了解这三位AI巨头的亮点与争议，共同探讨它们在AI领域的位置，
强化学习中策略网络模型设计与优化技巧数字扫地僧计算机视觉深度学习
I.引言强化学习（ReinforcementLearning,RL）是一种通过与环境交互，学习如何采取行动以最大化累积奖励的机器学习方法。策略网络（PolicyNetwork）是强化学习中一种重要的模型，它直接输出动作的概率分布或具体的动作。本篇博客将深入探讨策略网络的设计原则、优化技巧，并结合具体实例展示其应用。II.策略网络的基本概念A.策略网络的定义策略网络是一种神经网络，它接受当前状态作为
Spring的注解积累 yijiesuifeng spring 注解
用注解来向Spring容器注册Bean。需要在applicationContext.xml中注册： <context:component-scan base-package=”pagkage1[,pagkage2,…,pagkageN]”/>。如：在base-package指明一个包 <context:component-sc
传感器百合不是茶 android 传感器
android传感器的作用主要就是来获取数据,根据得到的数据来触发某种事件下面就以重力传感器为例; 1,在onCreate中获得传感器服务 private SensorManager sm;// 获得系统的服务 private Sensor sensor;// 创建传感器实例 @Override protected void
[光磁与探测]金吕玉衣的意义 comsci
这是一个古代人的秘密:现在告诉大家信不信由你们: 穿上金律玉衣的人,如果处于灵魂出窍的状态,可以飞到宇宙中去看星星这就是为什么古代
精简的反序打印某个数沐刃青蛟打印
以前看到一些让求反序打印某个数的程序。比如：输入123，输出321。记得以前是告诉你是几位数的，当时就抓耳挠腮，完全没有思路。似乎最后是用到%和/方法解决的。而今突然想到一个简短的方法，就可以实现任意位数的反序打印（但是如果是首位数或者尾位数为0时就没有打印出来了）代码如下： long num, num1=0;
PHP：6种方法获取文件的扩展名 IT独行者 PHP 扩展名
PHP：6种方法获取文件的扩展名 1、字符串查找和截取的方法 1 $extension = substr ( strrchr ( $file , '.' ), 1); 2、字符串查找和截取的方法二 1 $extension = substr
面试111 文强chu 面试
1事务隔离级别有那些，事务特性是什么（问到一次） 2 spring aop 如何管理事务的，如何实现的。动态代理如何实现，jdk怎么实现动态代理的，ioc是怎么实现的，spring是单例还是多例，有那些初始化bean的方式，各有什么区别（经常问） 3 struts默认提供了那些拦截器（一次） 4 过滤器和拦截器的区别（频率也挺高） 5 final，finally final
XML的四种解析方式小桔子 dom jdom dom4j sax
在平时工作中，难免会遇到把 XML 作为数据存储格式。面对目前种类繁多的解决方案，哪个最适合我们呢？在这篇文章中，我对这四种主流方案做一个不完全评测，仅仅针对遍历 XML 这块来测试，因为遍历 XML 是工作中使用最多的（至少我认为）。　　预备　　测试环境：　　AMD 毒龙1.4G OC 1.5G、256M DDR333、Windows2000 Server
wordpress中常见的操作 aichenglong 中文注册 wordpress 移除菜单
1 wordpress中使用中文名注册解决办法 1)使用插件 2)修改wp源代码进入到wp-include/formatting.php文件中找到 function sanitize_user( $username, $strict = false
小飞飞学管理-1 alafqq 管理
项目管理的下午题，其实就在提出问题（挑刺），分析问题，解决问题。今天我随意看下10年上半年的第一题。主要就是项目经理的提拨和培养。结合我自己经历写下心得对于公司选拔和培养项目经理的制度有什么毛病呢？ 1，公司考察，选拔项目经理，只关注技术能力，而很少或没有关注管理方面的经验，能力。 2，公司对项目经理缺乏必要的项目管理知识和技能方面的培训。 3，公司对项目经理的工作缺乏进行指
IO输入输出部分探讨百合不是茶 IO
//文件处理在处理文件输入输出时要引入java.IO这个包； /* 1，运用File类对文件目录和属性进行操作 2，理解流，理解输入输出流的概念 3，使用字节/符流对文件进行读/写操作 4，了解标准的I/O 5，了解对象序列化 */ //1，运用File类对文件目录和属性进行操作 //在工程中线创建一个text.txt
getElementById的用法 bijian1013 element
getElementById是通过Id来设置/返回HTML标签的属性及调用其事件与方法。用这个方法基本上可以控制页面所有标签，条件很简单，就是给每个标签分配一个ID号。返回具有指定ID属性值的第一个对象的一个引用。语法： &n
励志经典语录 bijian1013 励志人生
经典语录1: 哈佛有一个著名的理论：人的差别在于业余时间，而一个人的命运决定于晚上8点到10点之间。每晚抽出2个小时的时间用来阅读、进修、思考或参加有意的演讲、讨论，你会发现，你的人生正在发生改变，坚持数年之后，成功会向你招手。不要每天抱着QQ/MSN/游戏/电影/肥皂剧……奋斗到12点都舍不得休息，看就看一些励志的影视或者文章，不要当作消遣；学会思考人生，学会感悟人生
[MongoDB学习笔记三]MongoDB分片 bit1129 mongodb
MongoDB的副本集(Replica Set)一方面解决了数据的备份和数据的可靠性问题，另一方面也提升了数据的读写性能。MongoDB分片(Sharding)则解决了数据的扩容问题，MongoDB作为云计算时代的分布式数据库，大容量数据存储，高效并发的数据存取，自动容错等是MongoDB的关键指标。本篇介绍MongoDB的切片(Sharding) 1.何时需要分片 &nbs
【Spark八十三】BlockManager在Spark中的使用场景 bit1129 manager
1. Broadcast变量的存储，在HttpBroadcast类中可以知道 2. RDD通过CacheManager存储RDD中的数据，CacheManager也是通过BlockManager进行存储的 3. ShuffleMapTask得到的结果数据，是通过FileShuffleBlockManager进行管理的，而FileShuffleBlockManager最终也是使用BlockMan
yum方式部署zabbix ronin47 yum方式部署zabbix
安装网络yum库#rpm -ivh http://repo.zabbix.com/zabbix/2.4/rhel/6/x86_64/zabbix-release-2.4-1.el6.noarch.rpm 通过yum装mysql和zabbix调用的插件还有agent代理#yum install zabbix-server-mysql zabbix-web-mysql mysql-
Hibernate4和MySQL5.5自动创建表失败问题解决方法 byalias J2EE Hibernate4
今天初学Hibernate4，了解了使用Hibernate的过程。大体分为4个步骤： ①创建hibernate.cfg.xml文件 ②创建持久化对象 ③创建*.hbm.xml映射文件 ④编写hibernate相应代码在第四步中，进行了单元测试，测试预期结果是hibernate自动帮助在数据库中创建数据表，结果JUnit单元测试没有问题，在控制台打印了创建数据表的SQL语句，但在数据库中
Netty源码学习-FrameDecoder bylijinnan java netty
Netty 3.x的user guide里FrameDecoder的例子，有几个疑问： 1.文档说：FrameDecoder calls decode method with an internally maintained cumulative buffer whenever new data is received. 为什么每次有新数据到达时，都会调用decode方法？ 2.Dec
SQL行列转换方法 chicony 行列转换
create table tb(终端名称 varchar(10) , CEI分值 varchar(10) , 终端数量 int) insert into tb values('三星' , '0-5' , 74) insert into tb values('三星' , '10-15' , 83) insert into tb values('苹果' , '0-5' , 93)
中文编码测试 ctrain 编码
循环打印转换编码 String[] codes = { "iso-8859-1", "utf-8", "gbk", "unicode" }; for (int i = 0; i < codes.length; i++) { for (int j
hive 客户端查询报堆内存溢出解决方法 daizj hive 堆内存溢出
hive> select * from t_test where ds=20150323 limit 2; OK Exception in thread "main" java.lang.OutOfMemoryError: Java heap space 问题原因： hive堆内存默认为256M 这个问题的解决方法为：修改/us
人有多大懒，才有多大闲 (评论『卓有成效的程序员』) dcj3sjt126com 程序员
卓有成效的程序员给我的震撼很大，程序员作为特殊的群体，有的人可以这么懒，懒到事情都交给机器去做，而有的人又可以那么勤奋，每天都孜孜不倦得做着重复单调的工作。在看这本书之前，我属于勤奋的人，而看完这本书以后，我要努力变成懒惰的人。不要在去庞大的开始菜单里面一项一项搜索自己的应用程序，也不要在自己的桌面上放置眼花缭乱的快捷图标
Eclipse简单有用的配置 dcj3sjt126com eclipse
1、显示行号 Window -- Prefences -- General -- Editors -- Text Editors -- show line numbers 2、代码提示字符 Window ->Perferences，并依次展开 Java -> Editor -> Content Assist，最下面一栏 auto-Activation
在tomcat上面安装solr4.8.0全过程 eksliang Solr solr4.0后的版本安装 solr4.8.0安装
转载请出自出处： http://eksliang.iteye.com/blog/2096478 首先solr是一个基于java的web的应用，所以安装solr之前必须先安装JDK和tomcat，我这里就先省略安装tomcat和jdk了第一步：当然是下载去官网上下载最新的solr版本，下载地址
Android APP通用型拒绝服务、漏洞分析报告 gg163 漏洞 android APP 分析
点评：记得曾经有段时间很多SRC平台被刷了大量APP本地拒绝服务漏洞，移动安全团队爱内测（ineice.com）发现了一个安卓客户端的通用型拒绝服务漏洞，来看看他们的详细分析吧。 0xr0ot和Xbalien交流所有可能导致应用拒绝服务的异常类型时，发现了一处通用的本地拒绝服务漏洞。该通用型本地拒绝服务可以造成大面积的app拒绝服务。针对序列化对象而出现的拒绝服务主要
HoverTree项目已经实现分层 hvt 编程 .net Web C#ASP.ENT
HoverTree项目已经初步实现分层，源代码已经上传到 http://hovertree.codeplex.com请到SOURCE CODE查看。在本地用SQL Server 2008 数据库测试成功。数据库和表请参考：http://keleyi.com/a/bjae/ue6stb42.htmHoverTree是一个ASP.NET 开源项目，希望对你学习ASP.NET或者C#语言有帮助，如果你对
Google Maps API v3: Remove Markers 移除标记天梯梦 google maps api
Simply do the following: I. Declare a global variable: var markersArray = []; II. Define a function: function clearOverlays() { for (var i = 0; i < markersArray.length; i++ )
jQuery选择器总结 lq38366 jquery 选择器
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
基础数据结构和算法六：Quick sort sunwinner Algorithm Quicksort
Quick sort is probably used more widely than any other. It is popular because it is not difficult to implement, works well for a variety of different kinds of input data, and is substantially faster t
如何让Flash不遮挡HTML div元素的技巧_HTML/Xhtml_网页制作刘星宇 html Web
今天在写一个flash广告代码的时候，因为flash自带的链接，容易被当成弹出广告，所以做了一个div层放到flash上面，这样链接都是a触发的不会被拦截，但发现flash一直处于div层上面，原来flash需要加个参数才可以。让flash置于DIV层之下的方法，让flash不挡住飘浮层或下拉菜单，让Flash不档住浮动对象或层的关键参数：wmode=opaque。方法如下：
Mybatis实用Mapper SQL汇总示例 wdmcygah sql mysql mybatis 实用
Mybatis作为一个非常好用的持久层框架，相关资料真的是少得可怜，所幸的是官方文档还算详细。本博文主要列举一些个人感觉比较常用的场景及相应的Mapper SQL写法，希望能够对大家有所帮助。不少持久层框架对动态SQL的支持不足，在SQL需要动态拼接时非常苦恼，而Mybatis很好地解决了这个问题，算是框架的一大亮点。对于常见的场景，例如：批量插入/更新/删除，模糊查询，多条件查询，联表查询，