FullyConnectedNets.ipynb
In this assignment we will implement fully connected neural networks in a modular way: for every layer we write a forward() pass and a backward() pass.
forward() receives the input, the weights, and any other required parameters, and returns an output together with a cache of the values we will need during backpropagation, roughly like this:
def layer_forward(x, w):
""" Receive inputs x and weights w """
# Do some computations ...
z = # ... some intermediate value
# Do some more computations ...
out = # the output
cache = (x, w, z, out) # Values we need to compute gradients
return out, cache
backward() receives the upstream gradient and the cached values, and returns gradients with respect to the inputs and the weights:
def layer_backward(dout, cache):
"""
Receive dout (derivative of loss with respect to outputs) and cache,
and compute derivative with respect to inputs.
"""
# Unpack cache values
x, w, z, out = cache
# Use values in cache to compute derivatives
dx = # Derivative of loss with respect to x
dw = # Derivative of loss with respect to w
return dx, dw
Note: gnureadline==6.3.3 is not supported on Windows; simply skip it. Not every package in the list is strictly required either, so install selectively, but make sure NumPy, Cython, Future, etc. are present.
cd assignment2
pip install -r requirements.txt
The assignment also uses im2col.py; all we need to do is compile this file first, i.e. run setup.py in the cs231n directory:
python setup.py build_ext --inplace
Set up the Jupyter notebook environment:
# As usual, a bit of setup
from __future__ import print_function
import time
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.fc_net import *
from cs231n.data_utils import get_CIFAR10_data
from cs231n.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array
from cs231n.solver import Solver
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
def rel_error(x, y):
""" returns relative error """
return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
# Load the (preprocessed) CIFAR10 data.
data = get_CIFAR10_data()
for k, v in list(data.items()):
print(('%s: ' % k, v.shape))
('X_val: ', (1000, 3, 32, 32))
('y_test: ', (1000,))
('y_train: ', (49000,))
('X_test: ', (1000, 3, 32, 32))
('X_train: ', (49000, 3, 32, 32))
('y_val: ', (1000,))
Open the file cs231n/layers.py and implement the affine_forward function.
def affine_forward(x, w, b):
"""
Computes the forward pass for an affine (fully-connected) layer.
The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N
examples, where each example x[i] has shape (d_1, ..., d_k). We will
reshape each input into a vector of dimension D = d_1 * ... * d_k, and
then transform it to an output vector of dimension M.
Inputs:
- x: A numpy array containing input data, of shape (N, d_1, ..., d_k)
- w: A numpy array of weights, of shape (D, M)
- b: A numpy array of biases, of shape (M,)
Returns a tuple of:
- out: output, of shape (N, M)
- cache: (x, w, b)
"""
out = None
# Reshape each input into a row vector, but do not overwrite x:
# the backward pass recovers the original input shape from the cache.
x_rows = x.reshape(x.shape[0], -1)  # [N, D]
out = np.dot(x_rows, w) + b         # [N, M]
cache = (x, w, b)
return out, cache
Now implement the affine_backward function and test your implementation using numeric gradient checking.
def affine_backward(dout, cache):
"""
Computes the backward pass for an affine layer.
Inputs:
- dout: Upstream derivative, of shape (N, M)
- cache: Tuple of:
- x: Input data, of shape (N, d_1, ... d_k)
- w: Weights, of shape (D, M)
- b: Biases, of shape (M,)
Returns a tuple of:
- dx: Gradient with respect to x, of shape (N, d1, ..., d_k)
- dw: Gradient with respect to w, of shape (D, M)
- db: Gradient with respect to b, of shape (M,)
"""
x, w, b = cache
x_rows = x.reshape(x.shape[0],-1)
d_xrows = np.dot(dout,w.T)
dx = d_xrows.reshape(x.shape)
dw = np.dot(x_rows.T, dout)
# Note: db sums over the rows of dout; it does not average them
db = np.sum(dout, axis=0)
return dx, dw, db
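A quick numeric gradient check in the notebook's style: the shapes below are just illustrative, and it reuses eval_numerical_gradient_array and rel_error from the setup cell above.
np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)

dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

# All three relative errors should be around 1e-10 or smaller
print('dx error:', rel_error(dx_num, dx))
print('dw error:', rel_error(dw_num, dw))
print('db error:', rel_error(db_num, db))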
def relu_forward(x):
"""
Computes the forward pass for a layer of rectified linear units (ReLUs).
Input:
- x: Inputs, of any shape
Returns a tuple of:
- out: Output, of the same shape as x
- cache: x
"""
out = np.maximum(0,x)
cache = x
return out, cache
def relu_backward(dout, cache):
"""
Computes the backward pass for a layer of rectified linear units (ReLUs).
Input:
- dout: Upstream derivatives, of any shape
- cache: Input x, of same shape as dout
Returns:
- dx: Gradient with respect to x
"""
dx, x = None, cache
dx = (x > 0) * dout
return dx
We frequently follow an FC layer with a ReLU activation, so implement this combination in cs231n/layer_utils.py. Think of it as a small exercise in composing layers.
def affine_relu_forward(x, w, b):
"""
Convenience layer that performs an affine transform followed by a ReLU
Inputs:
- x: Input to the affine layer
- w, b: Weights for the affine layer
Returns a tuple of:
- out: Output from the ReLU
- cache: Object to give to the backward pass
"""
a, fc_cache = affine_forward(x, w, b)
out, relu_cache = relu_forward(a)
cache = (fc_cache, relu_cache)
return out, cache
def affine_relu_backward(dout, cache):
"""
Backward pass for the affine-relu convenience layer
"""
fc_cache, relu_cache = cache
da = relu_backward(dout, relu_cache)
dx, dw, db = affine_backward(da, fc_cache)
return dx, dw, db
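The same numeric check carries over to the composed layer; a shortened sketch (again with made-up shapes) that only checks dx:
np.random.seed(231)
x = np.random.randn(2, 3, 4)
w = np.random.randn(12, 10)
b = np.random.randn(10)
dout = np.random.randn(2, 10)

out, cache = affine_relu_forward(x, w, b)
dx, dw, db = affine_relu_backward(dout, cache)
dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)
print('dx error:', rel_error(dx_num, dx))  # should be around 1e-10 or smaller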
Conveniently, the loss layers are provided for us, but it is worth comparing them with the versions you implemented yourself earlier.
def svm_loss(x, y):
"""
Computes the loss and gradient using for multiclass SVM classification.
Inputs:
- x: Input data, of shape (N, C) where x[i, j] is the score for the jth
class for the ith input.
- y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
0 <= y[i] < C
Returns a tuple of:
- loss: Scalar giving the loss
- dx: Gradient of the loss with respect to x
"""
N = x.shape[0]
correct_class_scores = x[np.arange(N), y]
margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
margins[np.arange(N), y] = 0
loss = np.sum(margins) / N
num_pos = np.sum(margins > 0, axis=1)
dx = np.zeros_like(x)
dx[margins > 0] = 1
dx[np.arange(N), y] -= num_pos
dx /= N
return loss, dx
def softmax_loss(x, y):
"""
Computes the loss and gradient for softmax classification.
Inputs:
- x: Input data, of shape (N, C) where x[i, j] is the score for the jth
class for the ith input.
- y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
0 <= y[i] < C
Returns a tuple of:
- loss: Scalar giving the loss
- dx: Gradient of the loss with respect to x
"""
shifted_logits = x - np.max(x, axis=1, keepdims=True)
Z = np.sum(np.exp(shifted_logits), axis=1, keepdims=True)
log_probs = shifted_logits - np.log(Z)
probs = np.exp(log_probs)
N = x.shape[0]
loss = -np.sum(log_probs[np.arange(N), y]) / N
dx = probs.copy()
dx[np.arange(N), y] -= 1
dx /= N
return loss, dx
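As a sanity check on both loss layers (tiny made-up batch): with near-zero random scores the softmax loss should be close to log(C), about 2.3 for C = 10 classes, and the SVM loss should be close to C - 1 = 9.
np.random.seed(231)
num_classes, num_inputs = 10, 50
x = 0.001 * np.random.randn(num_inputs, num_classes)
y = np.random.randint(num_classes, size=num_inputs)

loss, dx = softmax_loss(x, y)
print('softmax loss:', loss)  # about 2.3
loss, dx = svm_loss(x, y)
print('svm loss:', loss)      # about 9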
Now use the layers we have implemented to build a two-layer fully connected network.
Open the file cs231n/classifiers/fc_net.py
and complete the implementation of the TwoLayerNet class. This class will serve as a model for the other networks you will implement in this assignment, so read through it to make sure you understand the API. You can run the cell below to test your implementation.
AFFINE->RELU->AFFINE->Softmax
Note that this module does not perform gradient descent itself; it will later interact with a Solver object that runs the optimization. Its learnable parameters are stored in the dictionary self.params.
class TwoLayerNet(object):
"""
A two-layer fully-connected neural network with ReLU nonlinearity and
softmax loss that uses a modular layer design. We assume an input dimension
of D, a hidden dimension of H, and perform classification over C classes.
The architecure should be affine - relu - affine - softmax.
Note that this class does not implement gradient descent; instead, it
will interact with a separate Solver object that is responsible for running
optimization.
The learnable parameters of the model are stored in the dictionary
self.params that maps parameter names to numpy arrays.
"""
def __init__(self, input_dim=3*32*32, hidden_dim=100, num_classes=10,
weight_scale=1e-3, reg=0.0):
"""
Initialize a new network.
Inputs:
- input_dim: An integer giving the size of the input
- hidden_dim: An integer giving the size of the hidden layer
- num_classes: An integer giving the number of classes to classify
- weight_scale: Scalar giving the standard deviation for random
initialization of the weights.
- reg: Scalar giving L2 regularization strength.
"""
self.params = {}
self.reg = reg
# TODO: Initialize the weights and biases of the two-layer net. Weights #
# should be initialized from a Gaussian centered at 0.0 with #
# standard deviation equal to weight_scale, and biases should be #
# initialized to zero.
self.params["W1"] = np.random.randn(input_dim,hidden_dim) * weight_scale
self.params["b1"] = np.zeros_like(hidden_dim)
self.params["W2"] = np.random.randn(hidden_dim,num_classes) * weight_scale
self.params["b2"] = np.zeros_like(num_classes)
def loss(self, X, y=None):
"""
Compute loss and gradient for a minibatch of data.
Inputs:
- X: Array of input data of shape (N, d_1, ..., d_k)
- y: Array of labels, of shape (N,). y[i] gives the label for X[i].
Returns:
If y is None, then run a test-time forward pass of the model and return:
- scores: Array of shape (N, C) giving classification scores, where
scores[i, c] is the classification score for X[i] and class c.
If y is not None, then run a training-time forward and backward pass and
return a tuple of:
- loss: Scalar value giving the loss
- grads: Dictionary with the same keys as self.params, mapping parameter
names to gradients of the loss with respect to those parameters.
"""
############################################################################
# TODO: Implement the forward pass for the two-layer net, computing the#
# class scores for X and storing them in the scores variable. #
############################################################################
H , cache_layer1 = affine_relu_forward(X,self.params["W1"],self.params["b1"])
scores , cache_layer2 = affine_forward(H, self.params["W2"], self.params["b2"])
# If y is None then we are in test mode so just return scores
if y is None:
return scores
loss, grads = 0, {}
############################################################################
# TODO: Implement the backward pass for the two-layer net. Store the loss #
# in the loss variable and gradients in the grads dictionary. Compute data #
# loss using softmax, and make sure that grads[k] holds the gradients for #
# self.params[k]. Don't forget to add L2 regularization! #
# #
# NOTE: To ensure that your implementation matches ours and you pass the #
# automated tests, make sure that your L2 regularization includes a factor #
# of 0.5 to simplify the expression for the gradient. #
############################################################################
loss, dS = softmax_loss(scores,y)
loss += 0.5 * self.reg * np.sum(self.params["W1"] * self.params["W1"])
loss += 0.5 * self.reg * np.sum(self.params["W2"] * self.params["W2"]) # 添加正则项
dH , dW2 , grads["b2"] = affine_backward(dS,cache_layer2)
dx, dW1,grads["b1"] = affine_relu_backward(dH , cache_layer1)
grads["W1"] = dW1 + self.reg * self.params["W1"]
grads["W2"] = dW2 + self.reg * self.params["W2"] # 正则项损失
return loss, grads
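A rough way to sanity-check the class (all dimensions below are made up): with reg = 0 the initial loss of a freshly initialized net should again be about log(10), roughly 2.3, and it should increase once regularization is turned on.
np.random.seed(231)
N, D, H, C = 3, 5 * 4, 7, 10
model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=1e-3)
X = np.random.randn(N, 5, 4)
y = np.array([0, 5, 1])

loss, grads = model.loss(X, y)
print('loss with reg = 0:', loss)    # about 2.3

model.reg = 1.0
loss, grads = model.loss(X, y)
print('loss with reg = 1.0:', loss)  # somewhat larger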
Open the file cs231n/solver.py and read through it to familiarize yourself with the API. After doing so, use a Solver instance to train a TwoLayerNet that achieves at least 50% accuracy on the validation set.
This is a clean separation of concerns: the model API above takes data and computes the loss and gradients, while the Solver is responsible for feeding the data in minibatches, collecting the gradients, and updating the weights with a chosen optimization method. The same Solver abstraction can therefore be used to optimize different models; it factors out the part that is common to all of them.
__init__(self, model, data, **kwargs)
Parameter | Meaning |
---|---|
update_rule (string) | name of the update rule, passed to optim.py |
optim_config (dict) | hyperparameters for the optimizer, e.g. the learning rate |
lr_decay | factor by which the learning rate is decayed after each epoch |
batch_size | number of samples per minibatch |
num_epochs | total number of epochs to train for |
print_every | interval, in iterations, at which the training loss is printed |
verbose | whether to print training progress |
num_train_samples | number of training samples used to measure accuracy after each epoch |
num_val_samples | number of validation samples used to measure accuracy after each epoch |
checkpoint_name | base name for saved checkpoints; a checkpoint stores the model and the solver configuration so training can be resumed later |
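Put together, a Solver is constructed roughly like this (every value below is illustrative, not tuned):
model = TwoLayerNet()
solver = Solver(model, data,
                update_rule='sgd',
                optim_config={'learning_rate': 1e-3},
                lr_decay=0.95,
                batch_size=100,
                num_epochs=10,
                print_every=100,
                verbose=True,
                num_train_samples=1000,
                num_val_samples=None,
                checkpoint_name='two_layer_net')
solver.train()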
The __init__() method simply stashes these arguments:
def __init__(self, model, data, **kwargs):
"""
Construct a new Solver instance.
"""
self.model = model
self.X_train = data['X_train']
self.y_train = data['y_train']
self.X_val = data['X_val']
self.y_val = data['y_val']
# Unpack keyword arguments
self.update_rule = kwargs.pop('update_rule', 'sgd')
self.optim_config = kwargs.pop('optim_config', {})
self.lr_decay = kwargs.pop('lr_decay', 1.0)
self.batch_size = kwargs.pop('batch_size', 100)
self.num_epochs = kwargs.pop('num_epochs', 10)
self.num_train_samples = kwargs.pop('num_train_samples', 1000)
self.num_val_samples = kwargs.pop('num_val_samples', None)
self.checkpoint_name = kwargs.pop('checkpoint_name', None)
self.print_every = kwargs.pop('print_every', 10)
self.verbose = kwargs.pop('verbose', True)
# Throw an error if there are extra keyword arguments
if len(kwargs) > 0:
extra = ', '.join('"%s"' % k for k in list(kwargs.keys()))
raise ValueError('Unrecognized arguments %s' % extra)
# Make sure the update rule exists, then replace the string
# name with the actual function
if not hasattr(optim, self.update_rule):
raise ValueError('Invalid update_rule "%s"' % self.update_rule)
# now, it's a function
self.update_rule = getattr(optim, self.update_rule)
self._reset()
__init__() then calls _reset(), which sets up bookkeeping variables for training and gives every model parameter its own copy of the optimizer configuration, so that later the parameter and its optimizer state can be passed together to optim for the weight update.
def _reset(self):
"""
Set up some book-keeping variables for optimization. Don't call this
manually.
"""
# Set up some variables for book-keeping
self.epoch = 0
self.best_val_acc = 0
self.best_params = {}
self.loss_history = []
self.train_acc_history = []
self.val_acc_history = []
# Make a deep copy of the optim_config for each parameter
self.optim_configs = {}
for p in self.model.params:
d = {k: v for k, v in self.optim_config.items()}
self.optim_configs[p] = d
The _step() method performs a single weight update using one minibatch of data:
def _step(self):
"""
Make a single gradient update. This is called by train() and should not
be called manually.
"""
# Make a minibatch of training data
num_train = self.X_train.shape[0]
batch_mask = np.random.choice(num_train, self.batch_size)
X_batch = self.X_train[batch_mask]
y_batch = self.y_train[batch_mask]
# Compute loss and gradient
loss, grads = self.model.loss(X_batch, y_batch)
self.loss_history.append(loss)
# Perform a parameter update
for p, w in self.model.params.items():
dw = grads[p]
config = self.optim_configs[p]
next_w, next_config = self.update_rule(w, dw, config)
self.model.params[p] = next_w
self.optim_configs[p] = next_config
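Every update rule in optim.py has the signature f(w, dw, config) -> (next_w, next_config). As a standalone sketch with made-up shapes, the plain sgd rule behaves like this:
import numpy as np
from cs231n import optim

w = np.ones((3, 4))
dw = 0.1 * np.ones((3, 4))
next_w, config = optim.sgd(w, dw, {'learning_rate': 1e-2})
print(next_w[0, 0])  # 1 - 1e-2 * 0.1 = 0.999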
The _save_checkpoint() method saves the current model together with the solver configuration:
def _save_checkpoint(self):
if self.checkpoint_name is None: return
checkpoint = {
'model': self.model,
'update_rule': self.update_rule,
'lr_decay': self.lr_decay,
'optim_config': self.optim_config,
'batch_size': self.batch_size,
'num_train_samples': self.num_train_samples,
'num_val_samples': self.num_val_samples,
'epoch': self.epoch,
'loss_history': self.loss_history,
'train_acc_history': self.train_acc_history,
'val_acc_history': self.val_acc_history,
}
filename = '%s_epoch_%d.pkl' % (self.checkpoint_name, self.epoch)
if self.verbose:
print('Saving checkpoint to "%s"' % filename)
with open(filename, 'wb') as f:
pickle.dump(checkpoint, f)
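A checkpoint can be loaded back with pickle; the filename below is hypothetical:
import pickle

with open('my_net_epoch_5.pkl', 'rb') as f:
    checkpoint = pickle.load(f)

model = checkpoint['model']  # the pickled model object
print('resuming from epoch', checkpoint['epoch'])
print('val acc history so far:', checkpoint['val_acc_history'])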
check_accuracy(X, y, num_samples=None, batch_size=100) is called after each epoch: it draws num_samples points from X to measure accuracy, and to avoid holding too much data in memory it computes predictions one batch_size chunk at a time.
def check_accuracy(self, X, y, num_samples=None, batch_size=100):
"""
Check accuracy of the model on the provided data.
Inputs:
- X: Array of data, of shape (N, d_1, ..., d_k)
- y: Array of labels, of shape (N,)
- num_samples: If not None, subsample the data and only test the model
on num_samples datapoints.
- batch_size: Split X and y into batches of this size to avoid using
too much memory.
Returns:
- acc: Scalar giving the fraction of instances that were correctly
classified by the model.
"""
# Maybe subsample the data
N = X.shape[0]
if num_samples is not None and N > num_samples:
mask = np.random.choice(N, num_samples)
N = num_samples
X = X[mask]
y = y[mask]
# Compute predictions in batches
num_batches = N // batch_size
if N % batch_size != 0:
num_batches += 1
y_pred = []
for i in range(num_batches):
start = i * batch_size
end = (i + 1) * batch_size
scores = self.model.loss(X[start:end])
y_pred.append(np.argmax(scores, axis=1))
y_pred = np.hstack(y_pred)
acc = np.mean(y_pred == y)
return acc
Finally, the train() method: it first computes how many iterations make up one epoch from the batch size, and from that the total number of iterations. At the end of every epoch it decays the learning rate, and it also measures accuracy on the training and validation sets.
def train(self):
"""
Run optimization to train the model.
"""
num_train = self.X_train.shape[0]
iterations_per_epoch = max(num_train // self.batch_size, 1)
num_iterations = self.num_epochs * iterations_per_epoch
for t in range(num_iterations):
self._step()
# Maybe print training loss
if self.verbose and t % self.print_every == 0:
print('(Iteration %d / %d) loss: %f' % (
t + 1, num_iterations, self.loss_history[-1]))
# At the end of every epoch, increment the epoch counter and decay
# the learning rate.
epoch_end = (t + 1) % iterations_per_epoch == 0
if epoch_end:
self.epoch += 1
for k in self.optim_configs:
self.optim_configs[k]['learning_rate'] *= self.lr_decay
# Check train and val accuracy on the first iteration, the last
# iteration, and at the end of each epoch.
first_it = (t == 0)
last_it = (t == num_iterations - 1)
if first_it or last_it or epoch_end:
train_acc = self.check_accuracy(self.X_train, self.y_train,
num_samples=self.num_train_samples)
val_acc = self.check_accuracy(self.X_val, self.y_val,
num_samples=self.num_val_samples)
self.train_acc_history.append(train_acc)
self.val_acc_history.append(val_acc)
self._save_checkpoint()
if self.verbose:
print('(Epoch %d / %d) train acc: %f; val_acc: %f' % (
self.epoch, self.num_epochs, train_acc, val_acc))
# Keep track of the best model
if val_acc > self.best_val_acc:
self.best_val_acc = val_acc
self.best_params = {}
for k, v in self.model.params.items():
self.best_params[k] = v.copy()
# At the end of training swap the best params into the model
self.model.params = self.best_params
With a learning rate of 1e-2 the weight updates were too large and the loss quickly blew up to nan, so a smaller learning rate is used below.
model = TwoLayerNet()
##############################################################################
# TODO: Use a Solver instance to train a TwoLayerNet that achieves at least #
# 50% accuracy on the validation set. #
##############################################################################
solver = Solver(model,data,
num_epochs = 30,
update_rule = 'sgd',
optim_config = {
'learning_rate': 1e-3
},
lr_decay = 0.95,
batch_size = 128,
print_every = 100
)
solver.train()
Visualize the training process:
plt.subplot(2, 1, 1)
plt.title('Training loss')
plt.plot(solver.loss_history, 'o')
plt.xlabel('Iteration')
plt.subplot(2, 1, 2)
plt.title('Accuracy')
plt.plot(solver.train_acc_history, '-o', label='train')
plt.plot(solver.val_acc_history, '-o', label='val')
plt.plot([0.5] * len(solver.val_acc_history), 'k--')
plt.xlabel('Epoch')
plt.legend(loc='lower right')
plt.gcf().set_size_inches(15, 12)
plt.show()
Next, complete the FullyConnectedNet class in the file cs231n/classifiers/fc_net.py.
# As a first step you can leave out batch normalization and dropout
class FullyConnectedNet(object):
"""
A fully-connected neural network with an arbitrary number of hidden layers,
ReLU nonlinearities, and a softmax loss function. This will also implement
dropout and batch/layer normalization as options. For a network with L layers,
the architecture will be
{affine - [batch/layer norm] - relu - [dropout]} x (L - 1) - affine - softmax
where batch/layer normalization and dropout are optional, and the {...} block is
repeated L - 1 times.
Similar to the TwoLayerNet above, learnable parameters are stored in the
self.params dictionary and will be learned using the Solver class.
"""
def __init__(self, hidden_dims, input_dim=3*32*32, num_classes=10,
dropout=1, normalization=None, reg=0.0,
weight_scale=1e-2, dtype=np.float32, seed=None):
self.normalization = normalization
self.use_dropout = dropout != 1
self.reg = reg
self.num_layers = 1 + len(hidden_dims)
self.dtype = dtype
self.params = {}
input_size = input_dim
for i in range(len(hidden_dims)):
output_size = hidden_dims[i]
self.params['W' + str(i+1)] = np.random.randn(input_size,output_size) * weight_scale
self.params['b' + str(i+1)] = np.zeros(output_size)
if self.normalization is not None:
self.params['gamma' + str(i+1)] = np.ones(output_size)
self.params['beta' + str(i+1)] = np.zeros(output_size)
input_size = output_size  # input size of the next layer
# Output layer: no batch norm parameters here
self.params['W' + str(self.num_layers)] = np.random.randn(input_size,num_classes) * weight_scale
self.params['b' + str(self.num_layers)] = np.zeros(num_classes)
# When using dropout we need to pass a dropout_param dictionary to each
# dropout layer so that the layer knows the dropout probability and the mode
# (train / test). You can pass the same dropout_param to each dropout layer.
self.dropout_param = {}
if self.use_dropout:
self.dropout_param = {'mode': 'train', 'p': dropout}
if seed is not None:
self.dropout_param['seed'] = seed
self.bn_params = []
if self.normalization=='batchnorm':
self.bn_params = [{'mode': 'train'} for i in range(self.num_layers - 1)]
if self.normalization=='layernorm':
self.bn_params = [{} for i in range(self.num_layers - 1)]
# Cast all parameters to the correct datatype
for k, v in self.params.items():
self.params[k] = v.astype(dtype)
def loss(self, X, y=None):
"""
Compute loss and gradient for the fully-connected net.
Input / output: Same as TwoLayerNet above.
"""
X = X.astype(self.dtype)
mode = 'test' if y is None else 'train'
# Set train/test mode for batchnorm params and dropout param since they
# behave differently during training and testing.
if self.use_dropout:
self.dropout_param['mode'] = mode
if self.normalization=='batchnorm':
for bn_param in self.bn_params:
bn_param['mode'] = mode
cache = {}  # store everything the backward pass will need
hidden = X
for i in range(self.num_layers - 1):
if self.normalization :
pass
else:
hidden , cache[i+1] = affine_relu_forward(hidden,self.params['W' + str(i+1)],
self.params['b' + str(i+1)])
if self.use_dropout:
pass
# No activation after the last layer
scores, cache[self.num_layers] = affine_forward(hidden , self.params['W' + str(self.num_layers)],
self.params['b' + str(self.num_layers)])
# If test mode return early
if mode == 'test':
return scores
loss, grads = 0.0, {}
loss, dS = softmax_loss(scores , y)
# The last layer has no ReLU, so backprop through plain affine_backward
dhidden, grads['W' + str(self.num_layers)], grads['b' + str(self.num_layers)] \
= affine_backward(dS,cache[self.num_layers])
loss += 0.5 * self.reg * np.sum(self.params['W' + str(self.num_layers)] * self.params['W' + str(self.num_layers)])
grads['W' + str(self.num_layers)] += self.reg * self.params['W' + str(self.num_layers)]
for i in range(self.num_layers - 1, 0, -1):
loss += 0.5 * self.reg * np.sum(self.params["W" + str(i)] * self.params["W" + str(i)])
# Walk backwards through the hidden layers
if self.use_dropout:
pass
if self.normalization:
pass
else:
dhidden, dw, db = affine_relu_backward(dhidden, cache[i])
grads['W' + str(i)] = dw + self.reg * self.params['W' + str(i)]
grads['b' + str(i)] = db
return loss, grads
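As with the two-layer net, it is worth sanity-checking the initial loss with and without regularization (the sizes below are made up):
np.random.seed(231)
N, D, H1, H2, C = 2, 15, 20, 30, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=(N,))

for reg in [0, 3.14]:
    model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,
                              reg=reg, weight_scale=5e-2, dtype=np.float64)
    loss, grads = model.loss(X, y)
    print('reg = %s, initial loss: %f' % (reg, loss))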
To find a workable learning rate and weight scale for the deeper nets, sample them log-uniformly at random:
from random import uniform
for i in range(10):
lr = 10**uniform(-1, -3)
weight_scale = 10**uniform(-1, -4)
test_it()
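A hedged sketch of how such a random search could drive the Solver in practice (the trial count, layer sizes, and epoch budget are made-up values):
best_val, best_cfg = -1, None
for _ in range(10):
    lr = 10 ** uniform(-1, -3)
    weight_scale = 10 ** uniform(-1, -4)
    model = FullyConnectedNet([100, 100, 100, 100], weight_scale=weight_scale)
    solver = Solver(model, data, num_epochs=2, batch_size=50,
                    update_rule='sgd',
                    optim_config={'learning_rate': lr},
                    verbose=False)
    solver.train()
    val_acc = max(solver.val_acc_history)
    if val_acc > best_val:
        best_val, best_cfg = val_acc, (lr, weight_scale)
print('best val acc %.3f with lr=%.2e, weight_scale=%.2e' % ((best_val,) + best_cfg))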
So far we have trained our networks with plain sgd. In this section we implement a few other update rules for the parameters; they are defined in optim.py.
def sgd_momentum(w, dw, config=None):
"""
Performs stochastic gradient descent with momentum.
config format:
- learning_rate: Scalar learning rate.
- momentum: Scalar between 0 and 1 giving the momentum value.
Setting momentum = 0 reduces to sgd.
- velocity: A numpy array of the same shape as w and dw used to store a
moving average of the gradients.
"""
if config is None: config = {}
config.setdefault('learning_rate', 1e-2)
config.setdefault('momentum', 0.9)
v = config.get('velocity', np.zeros_like(w))
# The gradient changes the velocity, and the velocity moves the position
v = config['momentum'] * v - config['learning_rate'] * dw
next_w = w + v
config['velocity'] = v
return next_w, config
def rmsprop(w, dw, config=None):
"""
Uses the RMSProp update rule, which uses a moving average of squared
gradient values to set adaptive per-parameter learning rates.
config format:
- learning_rate: Scalar learning rate.
- decay_rate: Scalar between 0 and 1 giving the decay rate for the squared
gradient cache.
- epsilon: Small scalar used for smoothing to avoid dividing by zero.
- cache: Moving average of second moments of gradients.
"""
if config is None: config = {}
config.setdefault('learning_rate', 1e-2)
config.setdefault('decay_rate', 0.99)
config.setdefault('epsilon', 1e-8)
config.setdefault('cache', np.zeros_like(w))
config['cache'] = config['decay_rate'] * config['cache'] + (1 - config['decay_rate']) * dw * dw
next_w = w - config['learning_rate'] * dw / (np.sqrt(config['cache']) + config['epsilon'])
return next_w, config
def adam(w, dw, config=None):
"""
Uses the Adam update rule, which incorporates moving averages of both the
gradient and its square and a bias correction term.
config format:
- learning_rate: Scalar learning rate.
- beta1: Decay rate for moving average of first moment of gradient.
- beta2: Decay rate for moving average of second moment of gradient.
- epsilon: Small scalar used for smoothing to avoid dividing by zero.
- m: Moving average of gradient.
- v: Moving average of squared gradient.
- t: Iteration number.
"""
if config is None: config = {}
config.setdefault('learning_rate', 1e-3)
config.setdefault('beta1', 0.9)
config.setdefault('beta2', 0.999)
config.setdefault('epsilon', 1e-8)
config.setdefault('m', np.zeros_like(w))
config.setdefault('v', np.zeros_like(w))
config.setdefault('t', 0)
# First moment: moving average of the gradient
config['m'] = config['beta1'] * config['m'] + (1 - config['beta1']) * dw
# Second moment: moving average of the squared gradient
config['v'] = config['beta2'] * config['v'] + (1 - config['beta2']) * dw * dw
# Bias correction
config['t'] += 1
m_unbias = config['m'] / (1 - config['beta1'] ** config['t'])
v_unbias = config['v'] / (1 - config['beta2'] ** config['t'])
# Parameter update
next_w = w - m_unbias * config['learning_rate'] / (np.sqrt(v_unbias) + config['epsilon'])
return next_w, config
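The new rules plug straight into the Solver via the update_rule string; the hyperparameters below are plausible starting points, not tuned values:
model = FullyConnectedNet([100, 100, 100, 100, 100], weight_scale=5e-2)
solver = Solver(model, data,
                num_epochs=5, batch_size=100,
                update_rule='adam',
                optim_config={'learning_rate': 1e-3},
                print_every=100, verbose=True)
solver.train()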
After working through BatchNormalization.ipynb and Dropout.ipynb, come back and build a more complex network that reaches at least 50% accuracy on the validation set.
Q1: We've only asked you to implement ReLU, but there are a number of different activation functions that one could use in neural networks, each with its pros and cons. In particular, an issue commonly seen with activation functions is getting zero (or close to zero) gradient flow during backpropagation. Which of the following activation functions have this problem? If you consider these functions in the one dimensional case, what types of input would lead to this behaviour?
A1: Sigmoid saturates at both ends: when the input is very large or very small the gradient is essentially zero. ReLU saturates on the negative half-axis, where inputs below zero give zero gradient.
Q2: Did you notice anything about the comparative difficulty of training the three-layer net vs training the five layer net? In particular, based on your experience, which network seemed more sensitive to the initialization scale? Why do you think that is the case?
A2: When training the five-layer net with sgd earlier, the weight initialization scale turned out to matter a lot: (1) if the scale is too small, some layer's neurons may receive essentially no gradient; (2) if it is too large, gradients in some layer can explode. Because the weights keep participating in matrix multiplications layer after layer, the deeper the network, the more sensitive it is to the initialization scale.
Q3: AdaGrad, like Adam, is a per-parameter optimization method that uses the following update rule:
cache += dw**2
w += - learning_rate * dw / (np.sqrt(cache) + eps)
John notices that when he was training a network with AdaGrad that the updates became very small, and that his network was learning slowly. Using your knowledge of the AdaGrad update rule, why do you think the updates would become very small? Would Adam have the same issue?
A3: AdaGrad's squared-gradient cache never decays; as it keeps accumulating, the effective step size shrinks more and more, which is why a decay factor is needed. Adam does not have this problem, because it uses exponentially decaying averages together with bias correction.
A few Python takeaways from this notebook:
A .py file is a module, and the functions inside it can be treated as its attributes: hasattr(object, name) checks whether the module has a function of that name, and getattr(object, name[, default]) returns the function called name.
raise throws an exception.
pickle.dump() serializes a dictionary (here, the checkpoint) to disk.
np.hstack() concatenates the per-batch predictions into a single array.