CS231n Course Assignment Two: Fully Connected Neural Networks (0820)

Assignment Two: Fully Connected Neural Networks

Main topics: modular layer design, and several optimization update rules.

1. Modular Design

In A1 we implemented a fully connected two-layer neural network. That implementation was not very modular, because the loss and gradients were computed in a single monolithic function. This is manageable for a simple two-layer network, but it becomes impractical as we move to larger models.

Ideally we want a more modular design, so that different layer types can be implemented in isolation and then composed into models with different architectures.

In this exercise, fully connected networks are built in a more modular way: for each layer we implement a forward and a backward function.

1) The forward function receives inputs, weights, and other parameters, and returns both an output and a cache object storing the data needed for the backward pass.

def layer_forward(x, w):
  """ Receive inputs x and weights w """
  # Do some computations ...
  z = # ... some intermediate value
  # Do some more computations ...
  out = # the output
  cache = (x, w, z, out) # Values we need to compute gradients
  return out, cache

2) The backward function receives the upstream derivative and the cache object, and returns gradients with respect to the inputs and the weights.

def layer_backward(dout, cache):
  """
  Receive dout (derivative of loss with respect to outputs) and cache,
  and compute derivative with respect to inputs.
  """
  # Unpack cache values
  x, w, z, out = cache
  # Use values in cache to compute derivatives
  dx = # Derivative of loss with respect to x
  dw = # Derivative of loss with respect to w
  return dx, dw

1.1 Affine layer: forward

def affine_forward(x, w, b):
    out = None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    N, D = x.shape[0], x.size // x.shape[0]
    out = np.dot(x.reshape(N, D), w) + b    
    pass
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    cache = (x, w, b)
    return out, cache

Code analysis:

Computes the forward pass for an affine (fully connected) layer.
The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N examples, where each x[i] has shape (d_1, ..., d_k).
The input is reshaped into a vector of dimension D = d_1 * ... * d_k
and then transformed into an output vector of dimension M.

Inputs:
- x: numpy array of input data, of shape (N, d_1, ..., d_k)
- w: numpy array of weights, of shape (D, M)
- b: numpy array of biases, of shape (M,)

Returns a tuple of:
- out: output, of shape (N, M)
- cache: (x, w, b)
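
Written out, the forward computation in the code above is simply a matrix multiply plus a broadcast bias:

$$\text{out} = \hat{x}\,W + b,\qquad \hat{x} = \text{reshape}(x,\,(N, D)),\qquad \text{out}_{n,m} = \sum_{d=1}^{D} \hat{x}_{n,d}\,W_{d,m} + b_m$$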

Test_1.1 affine_forward
The difference should be around 1e-9 or less.

# Test the affine_forward function
num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3

input_size = num_inputs * np.prod(input_shape)
weight_size = output_dim * np.prod(input_shape)

x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)

out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967,  1.70660132,  1.91485297],
                        [ 3.25553199,  3.5141327,   3.77273342]])

# Compare your output with ours. The error should be around e-9 or less.
print('Testing affine_forward function:')
print('difference: ', rel_error(out, correct_out))

Output:

Testing affine_forward function: 
difference:  9.769849468192957e-10
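
The rel_error helper used by all of these tests is defined in the assignment notebook rather than in the layer files; a common definition (shown here as a reference sketch, check your own notebook) is:

import numpy as np

def rel_error(x, y):
    # Maximum elementwise relative error, with a small floor to avoid division by zero
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))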

1.2 Affine layer: backward

def affine_backward(dout, cache):
    x, w, b = cache
    dx, dw, db = None, None, None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    N, D = x.shape[0], w.shape[0]
    dx = np.dot(dout, w.T).reshape(x.shape)
    dw = np.dot(x.reshape(N, D).T, dout)
    db = np.sum(dout, axis = 0)
    pass
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return dx, dw, db

Code analysis:

Computes the backward pass for an affine layer.

Inputs:
- dout: upstream derivative, of shape (N, M)
- cache: tuple of:
  - x: input data, of shape (N, d_1, ..., d_k)
  - w: weights, of shape (D, M)
  - b: biases, of shape (M,)

Returns a tuple of:
- dx, dw, db
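
Since out = x̂ W + b, with x̂ the input flattened to shape (N, D), the gradients used in the code follow directly from the chain rule:

$$dx = \text{dout}\,W^{\top}\ \text{(reshaped back to the shape of } x\text{)},\qquad dw = \hat{x}^{\top}\,\text{dout},\qquad db = \sum_{n=1}^{N}\text{dout}_{n,:}$$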

Test_1.2 affine_backward

# Test the affine_backward function
np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)

dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

# The error should be around e-10 or less
print('Testing affine_backward function:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))

Output:

Testing affine_backward function:
dx error:  5.399100368651805e-11
dw error:  9.904211865398145e-11
db error:  2.4122867568119087e-11
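
The numeric gradients in these tests come from eval_numerical_gradient_array in cs231n/gradient_check.py. The idea is a centered finite-difference approximation of the gradient of sum(f(x) * dout) with respect to x; a sketch of how it works (not necessarily line-for-line identical to the course file) is:

import numpy as np

def eval_numerical_gradient_array(f, x, df, h=1e-5):
    # Numerically approximate the gradient of sum(f(x) * df) with respect to x
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        oldval = x[ix]
        x[ix] = oldval + h
        pos = f(x).copy()
        x[ix] = oldval - h
        neg = f(x).copy()
        x[ix] = oldval
        grad[ix] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad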

1.3 ReLU activation: forward

Computes the forward pass for a ReLU layer.

def relu_forward(x):
    out = None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    out = np.maximum(0, x)    
    pass
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    cache = x
    return out, cache

Test_1.3 relu_forward

# Test the relu_forward function

x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)

out, _ = relu_forward(x)
correct_out = np.array([[ 0.,          0.,          0.,          0.,        ],
                        [ 0.,          0.,          0.04545455,  0.13636364,],
                        [ 0.22727273,  0.31818182,  0.40909091,  0.5,       ]])

# Compare your output with ours. The error should be on the order of e-8
print('Testing relu_forward function:')
print('difference: ', rel_error(out, correct_out))

Output:

Testing relu_forward function:
difference:  4.999999798022158e-08

1.4 ReLU activation: backward

def relu_backward(dout, cache):
    dx, x = None, cache
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    dx = dout * (x > 0)    
    pass
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return dx

Code analysis:

Inputs:
- dout: upstream derivative, of any shape
- cache: the input x, of the same shape as dout

Returns:
- dx

Test_1.4 relu_backward

np.random.seed(231)
x = np.random.randn(10, 10)
dout = np.random.randn(*x.shape)

dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)

_, cache = relu_forward(x)
dx = relu_backward(dout, cache)

# The error should be on the order of e-12
print('Testing relu_backward function:')
print('dx error: ', rel_error(dx_num, dx))

Output:

Testing relu_backward function:
dx error:  3.2756349136310288e-12

1.5 “Sandwich” layers

A simple composite layer that chains an affine layer and a ReLU.

def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    cache = (fc_cache, relu_cache)
    return out, cache

def affine_relu_backward(dout, cache):
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    dx, dw, db = affine_backward(da, fc_cache)
    return dx, dw, db

Test_1.5 affine_relu

from cs231n.layer_utils import affine_relu_forward, affine_relu_backward
np.random.seed(231)
x = np.random.randn(2, 3, 4)
w = np.random.randn(12, 10)
b = np.random.randn(10)
dout = np.random.randn(2, 10)

out, cache = affine_relu_forward(x, w, b)
dx, dw, db = affine_relu_backward(dout, cache)

dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)

# Relative error should be around e-10 or less
print('Testing affine_relu_forward and affine_relu_backward:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))

Output:

Testing affine_relu_forward and affine_relu_backward:
dx error:  2.299579177309368e-11
dw error:  8.162011105764925e-11
db error:  7.826724021458994e-12

1.6 Loss layers: Softmax and SVM

Computing svm_loss

def svm_loss(x, y):
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx

Code analysis:

Inputs:
- x: input data, of shape (N, C), where x[i, j] is the score for the j-th class on the i-th input.
- y: vector of labels, of shape (N,), where y[i] is the label for x[i] and 0 <= y[i] < C.
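
For one example, the multiclass SVM (hinge) loss and its gradient, which the code vectorizes over the batch and averages over N, are:

$$L_i = \sum_{j\neq y_i}\max\!\big(0,\;x_{ij}-x_{iy_i}+1\big),\qquad \frac{\partial L_i}{\partial x_{ij}} = \mathbf{1}\!\left[x_{ij}-x_{iy_i}+1>0\right]\ (j\neq y_i),\qquad \frac{\partial L_i}{\partial x_{iy_i}} = -\sum_{j\neq y_i}\mathbf{1}\!\left[x_{ij}-x_{iy_i}+1>0\right]$$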

Computing softmax_loss

def softmax_loss(x, y):
    shifted_logits = x - np.max(x, axis=1, keepdims=True)
    Z = np.sum(np.exp(shifted_logits), axis=1, keepdims=True)
    log_probs = shifted_logits - np.log(Z)
    probs = np.exp(log_probs)
    N = x.shape[0]
    loss = -np.sum(log_probs[np.arange(N), y]) / N
    dx = probs.copy()
    dx[np.arange(N), y] -= 1
    dx /= N
    return loss, dx
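
Per example, the softmax loss and its gradient are given below; the code subtracts the row maximum before exponentiation for numerical stability and averages everything over N:

$$p_{ij} = \frac{e^{x_{ij}}}{\sum_k e^{x_{ik}}},\qquad L_i = -\log p_{iy_i},\qquad \frac{\partial L_i}{\partial x_{ij}} = p_{ij} - \mathbf{1}[j = y_i]$$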

Test_1.6 svm&softmax_loss

np.random.seed(231)
num_classes, num_inputs = 10, 50
x = 0.001 * np.random.randn(num_inputs, num_classes)
y = np.random.randint(num_classes, size=num_inputs)

dx_num = eval_numerical_gradient(lambda x: svm_loss(x, y)[0], x, verbose=False)
loss, dx = svm_loss(x, y)

# Test svm_loss function. Loss should be around 9 and dx error should be around the order of e-9
print('Testing svm_loss:')
print('loss: ', loss)
print('dx error: ', rel_error(dx_num, dx))

dx_num = eval_numerical_gradient(lambda x: softmax_loss(x, y)[0], x, verbose=False)
loss, dx = softmax_loss(x, y)

# Test softmax_loss function. Loss should be close to 2.3 and dx error should be around e-8
print('\nTesting softmax_loss:')
print('loss: ', loss)
print('dx error: ', rel_error(dx_num, dx))

Output:

Testing svm_loss:
loss:  8.999602749096233
dx error:  1.4021566006651672e-09

Testing softmax_loss:
loss:  2.302545844500738
dx error:  9.384673161989355e-09

2. Integration

2.1 Two-layer network

In the previous assignment, the two-layer neural network was implemented in a single monolithic class. With modular versions of the necessary layers now in place, they are assembled to re-implement the two-layer network.
This is the TwoLayerNet class in fc_net.py.

class TwoLayerNet(object):
#2.1.1
    def __init__(
        self,
        input_dim=3 * 32 * 32,
        hidden_dim=100,
        num_classes=10,
        weight_scale=1e-3,
        reg=0.0,
    ):
#2.1.2
        self.params = {}
        self.reg = reg
#2.1.3
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        self.params['W1'] = np.random.randn(input_dim, hidden_dim) * weight_scale
        self.params['b1'] = np.zeros(hidden_dim)
        self.params['W2'] = np.random.randn(hidden_dim, num_classes) * weight_scale
        self.params['b2'] = np.zeros(num_classes)        
        pass
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    def loss(self, X, y=None):
#2.1.4
        scores = None
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        W1, b1 = self.params['W1'], self.params['b1']
        W2, b2 = self.params['W2'], self.params['b2']
        hidden_layer, first_cache = affine_relu_forward(X, W1, b1)
        scores, second_cache = affine_forward(hidden_layer, W2, b2)        
        pass  # forward pass through the two layers
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        # If y is None then we are in test mode so just return scores
        if y is None:
            return scores

        loss, grads = 0, {}
#2.1.5
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        loss, dscores = softmax_loss(scores, y)
        dhidden, grads['W2'], grads['b2'] = affine_backward(dscores, second_cache)
        dX, grads['W1'], grads['b1'] = affine_relu_backward(dhidden, first_cache)

        loss += 0.5 * self.reg * (np.sum(W1 * W1) + np.sum(W2 * W2))
        grads['W1'] += self.reg * W1
        grads['W2'] += self.reg * W2        
        pass

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        return loss, grads

Code analysis 2.1.1

A two-layer fully connected neural network with a ReLU nonlinearity and a softmax loss, built from the modular layer design.
The architecture is affine - relu - affine - softmax.
Optimization is handled by a separate Solver object that is responsible for running training.
The learnable parameters of the model are stored in the dictionary
self.params, which maps parameter names to numpy arrays.

Code analysis 2.1.2

Initialize a new network.
     Inputs:
     - input_dim: size of the input
     - hidden_dim: size of the hidden layer
     - num_classes: number of classes to classify
     - weight_scale: standard deviation for the random weight initialization
     - reg: scalar giving the L2 regularization strength

Code analysis 2.1.3

Initialize the weights and biases of the two-layer network.
Weights are drawn from a Gaussian centered at 0.0
with standard deviation equal to weight_scale; biases are initialized to zero.
All weights and biases are stored in the dictionary self.params.

Code analysis 2.1.4

     Compute the loss and gradients.

     Inputs:
     - X: array of input data of shape (N, d_1, ..., d_k)
     - y: array of labels of shape (N,)

     Returns:
     If y is None, run a test-time forward pass of the model and return: scores, an array of shape (N, C), where scores[i, c] is the classification score for X[i] and class c.
     If y is not None, run a training-time forward and backward pass and return a tuple:
     		- loss: scalar value giving the loss
     		- grads: dictionary with the same keys as self.params, mapping parameter names to gradients of the loss with respect to those parameters.

Code analysis 2.1.5

Implement the backward pass for the two-layer network. Store the loss in the loss variable and the gradients in the grads dictionary.
Compute the data loss with softmax, and make sure grads[k] holds the gradient for self.params[k].
The L2 regularization term uses a coefficient of 0.5.
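
Concretely, with λ = self.reg, the regularized loss and weight gradients computed above are:

$$L = L_{\text{softmax}} + \tfrac{1}{2}\lambda\big(\lVert W_1\rVert^2 + \lVert W_2\rVert^2\big),\qquad \nabla_{W_i} L = \nabla_{W_i} L_{\text{softmax}} + \lambda W_i$$

The factor of 0.5 cancels the 2 that appears when differentiating the squared norm, which is why the gradient term is simply reg * W.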

Test_2.1 Two-layer network

np.random.seed(231)
N, D, H, C = 3, 5, 50, 7
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)

std = 1e-3
model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=std)

print('Testing initialization ... ')
W1_std = abs(model.params['W1'].std() - std)
b1 = model.params['b1']
W2_std = abs(model.params['W2'].std() - std)
b2 = model.params['b2']
assert W1_std < std / 10, 'First layer weights do not seem right'
assert np.all(b1 == 0), 'First layer biases do not seem right'
assert W2_std < std / 10, 'Second layer weights do not seem right'
assert np.all(b2 == 0), 'Second layer biases do not seem right'

print('Testing test-time forward pass ... ')
model.params['W1'] = np.linspace(-0.7, 0.3, num=D*H).reshape(D, H)
model.params['b1'] = np.linspace(-0.1, 0.9, num=H)
model.params['W2'] = np.linspace(-0.3, 0.4, num=H*C).reshape(H, C)
model.params['b2'] = np.linspace(-0.9, 0.1, num=C)
X = np.linspace(-5.5, 4.5, num=N*D).reshape(D, N).T
scores = model.loss(X)
correct_scores = np.asarray(
  [[11.53165108,  12.2917344,   13.05181771,  13.81190102,  14.57198434, 15.33206765,  16.09215096],
   [12.05769098,  12.74614105,  13.43459113,  14.1230412,   14.81149128, 15.49994135,  16.18839143],
   [12.58373087,  13.20054771,  13.81736455,  14.43418138,  15.05099822, 15.66781506,  16.2846319 ]])
scores_diff = np.abs(scores - correct_scores).sum()
assert scores_diff < 1e-6, 'Problem with test-time forward pass'

print('Testing training loss (no regularization)')
y = np.asarray([0, 5, 1])
loss, grads = model.loss(X, y)
correct_loss = 3.4702243556
assert abs(loss - correct_loss) < 1e-10, 'Problem with training-time loss'

model.reg = 1.0
loss, grads = model.loss(X, y)
correct_loss = 26.5948426952
assert abs(loss - correct_loss) < 1e-10, 'Problem with regularization loss'

# Errors should be around e-7 or less
for reg in [0.0, 0.7]:
  print('Running numeric gradient check with reg = ', reg)
  model.reg = reg
  loss, grads = model.loss(X, y)

  for name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    grad_num = eval_numerical_gradient(f, model.params[name], verbose=False)
    print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))

Output:

Testing initialization ... 
Testing test-time forward pass ... 
Testing training loss (no regularization)

Running numeric gradient check with reg =  0.0
W1 relative error: 1.83e-08
W2 relative error: 3.12e-10
b1 relative error: 9.83e-09
b2 relative error: 4.33e-10

Running numeric gradient check with reg =  0.7
W1 relative error: 2.53e-07
W2 relative error: 2.85e-08
b1 relative error: 1.56e-08
b2 relative error: 7.76e-10

2.2 Solver

In the previous assignment, the logic for training a model was coupled to the model itself. Following the more modular design, the training logic is moved into a separate class in solver.py, and a Solver instance is used to train the TwoLayerNet.

Read through solver.py (link to be added).

model = TwoLayerNet()
solver = None
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
solver = Solver(model, data,
                    update_rule='sgd',
                    optim_config={
                    'learning_rate': 1e-3,
                    },
                    lr_decay=0.90,
                    num_epochs=10, batch_size=100,
                    print_every=100)
solver.train()
pass
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

Output:

(Iteration 1 / 4900) loss: 2.303606
(Epoch 0 / 10) train acc: 0.154000; val_acc: 0.167000
(Iteration 101 / 4900) loss: 1.730715
(Iteration 201 / 4900) loss: 1.819376
(Iteration 301 / 4900) loss: 1.556955
(Iteration 401 / 4900) loss: 1.642453
(Epoch 1 / 10) train acc: 0.418000; val_acc: 0.441000
(Iteration 501 / 4900) loss: 1.670489
(Iteration 601 / 4900) loss: 1.580205
(Iteration 701 / 4900) loss: 1.634635
(Iteration 801 / 4900) loss: 1.317712
(Iteration 901 / 4900) loss: 1.369236
(Epoch 2 / 10) train acc: 0.509000; val_acc: 0.454000
(Iteration 1001 / 4900) loss: 1.501024
(Iteration 1101 / 4900) loss: 1.399012
(Iteration 1201 / 4900) loss: 1.605955
(Iteration 1301 / 4900) loss: 1.385793
(Iteration 1401 / 4900) loss: 1.489241
(Epoch 3 / 10) train acc: 0.514000; val_acc: 0.496000
(Iteration 1501 / 4900) loss: 1.129046
(Iteration 1601 / 4900) loss: 1.399407
(Iteration 1701 / 4900) loss: 1.409527
(Iteration 1801 / 4900) loss: 1.406504
(Iteration 1901 / 4900) loss: 1.246673
(Epoch 4 / 10) train acc: 0.537000; val_acc: 0.498000
(Iteration 2001 / 4900) loss: 1.310008
(Iteration 2101 / 4900) loss: 1.120653
(Iteration 2201 / 4900) loss: 1.402432
(Iteration 2301 / 4900) loss: 1.246794
(Iteration 2401 / 4900) loss: 1.243183
(Epoch 5 / 10) train acc: 0.533000; val_acc: 0.504000
(Iteration 2501 / 4900) loss: 1.216694
(Iteration 2601 / 4900) loss: 1.193732
(Iteration 2701 / 4900) loss: 1.268923
(Iteration 2801 / 4900) loss: 1.053803
(Iteration 2901 / 4900) loss: 1.239267
(Epoch 6 / 10) train acc: 0.551000; val_acc: 0.524000
(Iteration 3001 / 4900) loss: 1.149079
(Iteration 3101 / 4900) loss: 1.116423
(Iteration 3201 / 4900) loss: 1.179117
(Iteration 3301 / 4900) loss: 1.257789
(Iteration 3401 / 4900) loss: 1.270317
(Epoch 7 / 10) train acc: 0.572000; val_acc: 0.503000
(Iteration 3501 / 4900) loss: 0.959642
(Iteration 3601 / 4900) loss: 1.263943
(Iteration 3701 / 4900) loss: 1.123168
(Iteration 3801 / 4900) loss: 1.154868
(Iteration 3901 / 4900) loss: 1.151303
(Epoch 8 / 10) train acc: 0.607000; val_acc: 0.512000
(Iteration 4001 / 4900) loss: 1.146022
(Iteration 4101 / 4900) loss: 0.979731
(Iteration 4201 / 4900) loss: 1.056077
(Iteration 4301 / 4900) loss: 1.026696
(Iteration 4401 / 4900) loss: 0.992628
(Epoch 9 / 10) train acc: 0.624000; val_acc: 0.537000
(Iteration 4501 / 4900) loss: 1.074059
(Iteration 4601 / 4900) loss: 1.246842
(Iteration 4701 / 4900) loss: 1.368044
(Iteration 4801 / 4900) loss: 1.125482
(Epoch 10 / 10) train acc: 0.615000; val_acc: 0.537000

Visualization (plotting code omitted):
[Figure: training loss and train/validation accuracy curves for the two-layer network]
The network clearly meets the 50% accuracy requirement and holds it steadily from epoch 3 onward.

2.3 Multilayer network

Implement a fully connected network with an arbitrary number of hidden layers.

class FullyConnectedNet(object):
#2.3.1
    def __init__(
        self,
        hidden_dims,
        input_dim=3 * 32 * 32,
        num_classes=10,
        dropout=1,
        use_batchnorm=False,
        reg=0.0,
        weight_scale=1e-2,
        dtype=np.float32,
        seed=None,
    ):
#use_batchnorm=False
#2.3.2
        self.use_batchnorm = use_batchnorm
        self.use_dropout = dropout != 1
        self.reg = reg
        self.num_layers = 1 + len(hidden_dims)
        self.dtype = dtype
        self.params = {}
#2.3.3
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        # Initialize parameters
        for i in range(len(hidden_dims)):
            D = hidden_dims[i]
            self.params['W'+str(i+1)] = weight_scale*np.random.randn(input_dim,D)
            self.params['b'+str(i+1)] = np.zeros(D)
            # With batch norm, add two extra parameters per layer: beta initialized to 0 (shift), gamma initialized to 1 (scale)
            if use_batchnorm:
                self.params['gamma'+str(i+1)] = np.ones(D)
                self.params['beta'+str(i+1)] = np.zeros(D)
            input_dim = D
        # The output layer is handled separately
        self.params['W'+str(self.num_layers)] = weight_scale*np.random.randn(hidden_dims[-1],num_classes)
        self.params['b'+str(self.num_layers)] = np.zeros((num_classes))
#use_batchnorm  
        pass
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

# Pass the dropout_param dictionary to each dropout layer so it knows the dropout probability and the mode (train/test)
        self.dropout_param = {}
        if self.use_dropout:
            self.dropout_param = {"mode": "train", "p": dropout}
            if seed is not None:
                self.dropout_param["seed"] = seed
#2.3.4
        self.bn_params = []
#        if self.normalization == "batchnorm":
#            self.bn_params = [{"mode": "train"} for i in range(self.num_layers - 1)]
#        if self.normalization == "layernorm":
#            self.bn_params = [{} for i in range(self.num_layers - 1)]
        if self.use_batchnorm:
            self.bn_params = [{'mode': 'train'} for i in range(self.num_layers - 1)]
        # Cast all parameters to the correct datatype
        for k, v in self.params.items():
            self.params[k] = v.astype(dtype)

    def loss(self, X, y=None):
 #       Input / output: Same as TwoLayerNet above.
        X = X.astype(self.dtype)
        mode = "test" if y is None else "train"
# Set train/test mode for the batchnorm and dropout parameters
#        if self.use_dropout:
#            self.dropout_param["mode"] = mode
        if self.dropout_param is not None:
            self.dropout_param['mode'] = mode         
        if self.use_batchnorm:
#        if self.normalization == "batchnorm":
            for bn_param in self.bn_params:
                bn_param["mode"] = mode
        scores = None
#2.3.5
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
#        hidden_layers, caches = list(range(self.num_layers + 1)), list(range(self.num_layers))
        '''Note on the commented-out line above: using range() to pre-allocate such lists originally caused "TypeError: 'range' object does not support item assignment". range() looks like a convenient way to build an ordered list of integers, but it returns a range object, not an actual list, so it cannot be assigned into by index.'''
        cache = {}
        cache_dropout= {}
        input_X = X
        for i in range(self.num_layers-1):            
            W,b = self.params['W'+str(i+1)],self.params['b'+str(i+1)]
            # If using batch norm, apply it before the ReLU
            if self.use_batchnorm:
                gamma,beta = self.params['gamma'+str(i+1)],self.params['beta'+str(i+1)]
                bn_param = self.bn_params[i]
                input_X,cache[i] = affine_bn_relu_forward(input_X,W,b,gamma,beta,bn_param)
            else:
                input_X,cache[i] = affine_relu_forward(input_X,W,b)
            # If using dropout, apply it after the ReLU
            if self.use_dropout:
                input_X,cache_dropout[i] = dropout_forward(input_X,self.dropout_param)
        # The output layer is handled separately
        W,b = self.params['W'+str(self.num_layers)],self.params['b'+str(self.num_layers)]
        scores,cache[self.num_layers-1] = affine_forward(input_X,W,b)
        pass
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        # If test mode return early
        if mode == "test":
            return scores

        loss, grads = 0.0, {}
     #2.3.6
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        # The output layer is handled separately
        loss,dscores = softmax_loss(scores,y)
        W = self.params['W'+str(self.num_layers)]
        loss += self.reg*0.5*np.sum(W**2)        
        dout_relu,dW,db = affine_backward(dscores,cache[self.num_layers-1])
        grads['W'+str(self.num_layers)] = dW+self.reg*W
        grads['b'+str(self.num_layers)] = db        
        
        for i in range(self.num_layers-1,0,-1):
            W = self.params['W'+str(i)]
            loss += self.reg*0.5*np.sum(W**2)
            # Dropout comes after the ReLU, so backprop through it first
            if self.use_dropout:
                dout_relu = dropout_backward(dout_relu,cache_dropout[i-1])
            # gamma and beta are not regularized when using batch norm
            if self.use_batchnorm:
                dout_relu,dW,db,dgamma,dbeta = affine_bn_relu_backward(dout_relu,cache[i-1])
                grads['gamma'+str(i)] = dgamma
                grads['beta'+str(i)] = dbeta
            else:
                dout_relu,dW,db = affine_relu_backward(dout_relu,cache[i-1])            
            grads['W'+str(i)] = dW+self.reg*W
            grads['b'+str(i)] = db
   
        pass
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        return loss, grads
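
The loss() method above calls affine_bn_relu_forward / affine_bn_relu_backward, which are not shown in this post (they belong in layer_utils.py next to the affine_relu pair). A minimal sketch, assuming the batchnorm_forward and batchnorm_backward functions from cs231n/layers.py are available, could look like this:

def affine_bn_relu_forward(x, w, b, gamma, beta, bn_param):
    # affine -> batch norm -> ReLU; cache everything needed for the backward pass
    a, fc_cache = affine_forward(x, w, b)
    bn, bn_cache = batchnorm_forward(a, gamma, beta, bn_param)
    out, relu_cache = relu_forward(bn)
    cache = (fc_cache, bn_cache, relu_cache)
    return out, cache

def affine_bn_relu_backward(dout, cache):
    # Backprop through the ReLU, then batch norm, then the affine layer
    fc_cache, bn_cache, relu_cache = cache
    dbn = relu_backward(dout, relu_cache)
    da, dgamma, dbeta = batchnorm_backward(dbn, bn_cache)
    dx, dw, db = affine_backward(da, fc_cache)
    return dx, dw, db, dgamma, dbeta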

Code analysis 2.3.1

A fully connected neural network with an arbitrary number of hidden layers,
ReLU nonlinearities, a softmax loss, optional batch/layer normalization, and optional dropout.
The architecture is {affine - [batch/layer norm] - relu - [dropout]} x (L - 1) - affine - softmax.
Learnable parameters are stored in the self.params dictionary and are learned with the Solver class.

Code analysis 2.3.2

    Initialize a new FullyConnectedNet.

    Inputs:
    - hidden_dims: list of integers giving the size of each hidden layer.
    - input_dim: integer giving the size of the input.
    - num_classes: integer giving the number of classes to classify.
    - dropout: scalar between 0 and 1 giving the dropout strength. If dropout=1 the network does not use dropout at all.
    - normalization: type of normalization; valid values are "batchnorm", "layernorm", or None (no normalization, the default). This version of the code uses the older use_batchnorm flag instead.
    - reg: L2 regularization strength.
    - weight_scale: scalar giving the standard deviation for random weight initialization.
    - dtype: a numpy datatype object; all computations are performed using this datatype. Use float64 for numeric gradient checking.
    - seed: if not None, pass this random seed to the dropout layers; this makes the dropout layers deterministic so the model can be gradient checked.
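
The dropout_forward / dropout_backward functions called in loss() are implemented elsewhere in layers.py (in a later part of the assignment). A minimal inverted-dropout sketch, under the assumption that p is the keep probability (consistent with dropout=1 meaning "no dropout" in the constructor above), could be:

import numpy as np

def dropout_forward(x, dropout_param):
    p, mode = dropout_param['p'], dropout_param['mode']
    if 'seed' in dropout_param:
        np.random.seed(dropout_param['seed'])
    if mode == 'train':
        mask = (np.random.rand(*x.shape) < p) / p   # inverted dropout: rescale at train time
        out = x * mask
    else:                                           # test mode: identity
        mask = None
        out = x
    return out, (dropout_param, mask)

def dropout_backward(dout, cache):
    dropout_param, mask = cache
    if dropout_param['mode'] == 'train':
        return dout * mask
    return dout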

Code analysis 2.3.3

Initialize the network parameters, storing all values in the self.params dictionary. Store the first layer's weights and biases in W1 and b1, the second layer's in W2 and b2, and so on. Weights are initialized from a normal distribution centered at 0 with standard deviation equal to weight_scale; biases are initialized to zero.
When using batch normalization, store the scale and shift parameters of the first layer in gamma1 and beta1, the second layer's in gamma2 and beta2, and so on. Scale parameters should be initialized to one and shift parameters to zero.

Code analysis 2.3.4

With batch normalization we need to keep track of running means and variances, so a special bn_param object is passed to each batchnorm layer.
self.bn_params[0] is passed to the forward pass of the first batch normalization layer, self.bn_params[1] to the second, and so on.

Code analysis 2.3.5

Implement the forward pass of the fully connected net: compute the class scores for X and store them in the scores variable.
When using dropout, pass self.dropout_param to each dropout forward pass.
When using batch normalization, pass self.bn_params[0] to the forward pass of the first BN layer, self.bn_params[1] to the second BN layer, and so on.

Code analysis 2.3.6

Implement the backward pass of the fully connected net. Store the loss in the loss variable and the gradients in the grads dictionary. Compute the data loss with softmax, and make sure grads[k] holds the gradient for self.params[k]. Add L2 regularization.
When using batch/layer normalization, the scale and shift parameters do not need to be regularized.

Test_2.3.1
Loss and gradient checks

np.random.seed(231)
N, D, H1, H2, C = 2, 15, 20, 30, 10
X = np.random.randn(N, D)
y = np.random.randint(C, size=(N,))

for reg in [0, 3.14]:
  print('Running check with reg = ', reg)
  model = FullyConnectedNet([H1, H2], input_dim=D, num_classes=C,
                            reg=reg, weight_scale=5e-2, dtype=np.float64)

  loss, grads = model.loss(X, y)
  print('Initial loss: ', loss)
  
  # Most of the errors should be on the order of e-7 or smaller.   
  # NOTE: It is fine however to see an error for W2 on the order of e-5
  # for the check when reg = 0.0
  for name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    grad_num = eval_numerical_gradient(f, model.params[name], verbose=False, h=1e-5)
    print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))

Output:

Running check with reg =  0
Initial loss:  2.638793811090764
W1 relative error: 1.57e-08
W2 relative error: 4.68e-08
W3 relative error: 1.45e-06
b1 relative error: 3.94e-10
b2 relative error: 1.23e-10
b3 relative error: 3.68e-10
Running check with reg =  3.14
Initial loss:  451.3464813074225
W1 relative error: 3.12e-07
W2 relative error: 1.39e-06
W3 relative error: 4.72e-08
b1 relative error: 2.08e-08
b2 relative error: 3.45e-08
b3 relative error: 9.95e-08

Test_2.3.2
Overfit a small dataset of 50 images using a three-layer network with 100 units in each hidden layer.

# TODO: Use a three-layer Net to overfit 50 training examples by 
# tweaking just the learning rate and initialization scale.

num_train = 50
small_data = {
  'X_train': data['X_train'][:num_train],
  'y_train': data['y_train'][:num_train],
  'X_val': data['X_val'],
  'y_val': data['y_val'],
}

weight_scale = 1e-2   # Experiment with this!
learning_rate = 1e-4  # Experiment with this!
model = FullyConnectedNet([100, 100],
              weight_scale=weight_scale, dtype=np.float64)
solver = Solver(model, small_data,
                print_every=10, num_epochs=20, batch_size=25,
                update_rule='sgd',
                optim_config={
                  'learning_rate': learning_rate,
                }
         )
solver.train()

plt.plot(solver.loss_history, 'o')
plt.title('Training loss history')
plt.xlabel('Iteration')
plt.ylabel('Training loss')
plt.show()

Output:

(Iteration 1 / 40) loss: 27.948607
(Epoch 0 / 20) train acc: 0.300000; val_acc: 0.160000
(Epoch 1 / 20) train acc: 0.420000; val_acc: 0.145000
(Epoch 2 / 20) train acc: 0.700000; val_acc: 0.143000
(Epoch 3 / 20) train acc: 0.760000; val_acc: 0.168000
(Epoch 4 / 20) train acc: 0.880000; val_acc: 0.158000
(Epoch 5 / 20) train acc: 0.920000; val_acc: 0.164000
(Iteration 11 / 40) loss: 1.056135
(Epoch 6 / 20) train acc: 0.900000; val_acc: 0.145000
(Epoch 7 / 20) train acc: 0.980000; val_acc: 0.166000
(Epoch 8 / 20) train acc: 0.980000; val_acc: 0.162000
(Epoch 9 / 20) train acc: 1.000000; val_acc: 0.159000
(Epoch 10 / 20) train acc: 1.000000; val_acc: 0.159000
(Iteration 21 / 40) loss: 0.003744
(Epoch 11 / 20) train acc: 1.000000; val_acc: 0.157000
(Epoch 12 / 20) train acc: 1.000000; val_acc: 0.157000
(Epoch 13 / 20) train acc: 1.000000; val_acc: 0.157000
(Epoch 14 / 20) train acc: 1.000000; val_acc: 0.157000
(Epoch 15 / 20) train acc: 1.000000; val_acc: 0.157000
(Iteration 31 / 40) loss: 0.002232
(Epoch 16 / 20) train acc: 1.000000; val_acc: 0.157000
(Epoch 17 / 20) train acc: 1.000000; val_acc: 0.157000
(Epoch 18 / 20) train acc: 1.000000; val_acc: 0.157000
(Epoch 19 / 20) train acc: 1.000000; val_acc: 0.157000
(Epoch 20 / 20) train acc: 1.000000; val_acc: 0.157000

[Figure: training loss history for the three-layer network]

Test_2.3.3
Overfit 50 training examples with a five-layer network using 100 units per layer.
Tune the learning rate and the weight initialization scale.

# TODO: Use a five-layer Net to overfit 50 training examples by 
# tweaking just the learning rate and initialization scale.

num_train = 50
small_data = {
  'X_train': data['X_train'][:num_train],
  'y_train': data['y_train'][:num_train],
  'X_val': data['X_val'],
  'y_val': data['y_val'],
}

learning_rate = 1e-3  # Experiment with this!
weight_scale = 3e-2   # Experiment with this!
model = FullyConnectedNet([100, 100, 100, 100],
                weight_scale=weight_scale, dtype=np.float64)
solver = Solver(model, small_data,
                print_every=10, num_epochs=20, batch_size=25,
                update_rule='sgd',
                optim_config={
                  'learning_rate': learning_rate,
                }
         )
solver.train()

plt.plot(solver.loss_history, 'o')
plt.title('Training loss history')
plt.xlabel('Iteration')
plt.ylabel('Training loss')
plt.show()

Output:

(Iteration 1 / 40) loss: 7.981513
(Epoch 0 / 20) train acc: 0.280000; val_acc: 0.133000
(Epoch 1 / 20) train acc: 0.360000; val_acc: 0.111000
(Epoch 2 / 20) train acc: 0.660000; val_acc: 0.158000
(Epoch 3 / 20) train acc: 0.800000; val_acc: 0.148000
(Epoch 4 / 20) train acc: 0.940000; val_acc: 0.146000
(Epoch 5 / 20) train acc: 0.960000; val_acc: 0.138000
(Iteration 11 / 40) loss: 0.189626
(Epoch 6 / 20) train acc: 0.980000; val_acc: 0.138000
(Epoch 7 / 20) train acc: 1.000000; val_acc: 0.141000
(Epoch 8 / 20) train acc: 1.000000; val_acc: 0.140000
(Epoch 9 / 20) train acc: 1.000000; val_acc: 0.148000
(Epoch 10 / 20) train acc: 1.000000; val_acc: 0.149000
(Iteration 21 / 40) loss: 0.088653
(Epoch 11 / 20) train acc: 1.000000; val_acc: 0.146000
(Epoch 12 / 20) train acc: 1.000000; val_acc: 0.150000
(Epoch 13 / 20) train acc: 1.000000; val_acc: 0.150000
(Epoch 14 / 20) train acc: 1.000000; val_acc: 0.153000
(Epoch 15 / 20) train acc: 1.000000; val_acc: 0.152000
(Iteration 31 / 40) loss: 0.033492
(Epoch 16 / 20) train acc: 1.000000; val_acc: 0.150000
(Epoch 17 / 20) train acc: 1.000000; val_acc: 0.151000
(Epoch 18 / 20) train acc: 1.000000; val_acc: 0.152000
(Epoch 19 / 20) train acc: 1.000000; val_acc: 0.153000
(Epoch 20 / 20) train acc: 1.000000; val_acc: 0.153000

[Figure: training loss history for the five-layer network]

3. Optimization

Implement several additional update rules (optimizers) and compare them against plain SGD.
For the underlying theory, see the separate notes on optimization methods.
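
For reference, the baseline sgd rule that the others are compared against is already provided in optim.py; it is essentially just the plain gradient step (shown here as a sketch):

def sgd(w, dw, config=None):
    # Vanilla stochastic gradient descent: w <- w - learning_rate * dw
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    w -= config['learning_rate'] * dw
    return w, config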

3.1 SGD+Momentum

First, complete sgd_momentum in optim.py.

def sgd_momentum(w, dw, config=None):
    if config is None:
        config = {}
    config.setdefault("learning_rate", 1e-2)
    config.setdefault("momentum", 0.9)
    v = config.get("velocity", np.zeros_like(w))
    next_w = None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    v = config['momentum'] * v - config['learning_rate'] * dw
    next_w = w + v
    pass
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    config["velocity"] = v
    return next_w, config

Code analysis:

- learning_rate: scalar learning rate.
- momentum: scalar between 0 and 1 giving the momentum value.
- velocity: numpy array of the same shape as w and dw, used to store a moving average of the gradients.
Implement the momentum update rule and store the updated value in the next_w variable.

Test_3.1 sgd_momentum

from cs231n.optim import sgd_momentum

N, D = 4, 5
w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)
dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)
v = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)

config = {'learning_rate': 1e-3, 'velocity': v}
next_w, _ = sgd_momentum(w, dw, config=config)

expected_next_w = np.asarray([
  [ 0.1406,      0.20738947,  0.27417895,  0.34096842,  0.40775789],
  [ 0.47454737,  0.54133684,  0.60812632,  0.67491579,  0.74170526],
  [ 0.80849474,  0.87528421,  0.94207368,  1.00886316,  1.07565263],
  [ 1.14244211,  1.20923158,  1.27602105,  1.34281053,  1.4096    ]])
expected_velocity = np.asarray([
  [ 0.5406,      0.55475789,  0.56891579, 0.58307368,  0.59723158],
  [ 0.61138947,  0.62554737,  0.63970526,  0.65386316,  0.66802105],
  [ 0.68217895,  0.69633684,  0.71049474,  0.72465263,  0.73881053],
  [ 0.75296842,  0.76712632,  0.78128421,  0.79544211,  0.8096    ]])

# Should see relative errors around e-8 or less
print('next_w error: ', rel_error(next_w, expected_next_w))
print('velocity error: ', rel_error(expected_velocity, config['velocity']))

Output:

next_w error:  8.882347033505819e-09
velocity error:  4.269287743278663e-09

The plot comparing SGD with SGD+Momentum is omitted here; SGD+Momentum clearly has the advantage, converging faster and reaching better accuracy.

3.2 RMSProp

def rmsprop(w, dw, config=None):
    if config is None:
        config = {}
    config.setdefault("learning_rate", 1e-2)
    config.setdefault("decay_rate", 0.99)
    config.setdefault("epsilon", 1e-8)
    config.setdefault("cache", np.zeros_like(w))
    next_w = None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    config['cache'] = config['decay_rate'] * config['cache'] + (1 - config['decay_rate']) * dw ** 2
    next_w = w - config['learning_rate'] * dw / (np.sqrt(config['cache']) + config['epsilon'])    
    pass
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return next_w, config

Code analysis:

RMSProp uses a moving average of squared gradients to set per-parameter learning rates.

Config format:
- learning_rate: scalar learning rate.
- decay_rate: scalar between 0 and 1 giving the decay rate for the squared gradient cache.
- epsilon: small scalar used for smoothing to avoid dividing by zero.
- cache: moving average of the squared gradients.

Test_3.2 rmsprop

# Test RMSProp implementation
from cs231n.optim import rmsprop

N, D = 4, 5
w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)
dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)
cache = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)

config = {'learning_rate': 1e-2, 'cache': cache}
next_w, _ = rmsprop(w, dw, config=config)

expected_next_w = np.asarray([
  [-0.39223849, -0.34037513, -0.28849239, -0.23659121, -0.18467247],
  [-0.132737,   -0.08078555, -0.02881884,  0.02316247,  0.07515774],
  [ 0.12716641,  0.17918792,  0.23122175,  0.28326742,  0.33532447],
  [ 0.38739248,  0.43947102,  0.49155973,  0.54365823,  0.59576619]])
expected_cache = np.asarray([
  [ 0.5976,      0.6126277,   0.6277108,   0.64284931,  0.65804321],
  [ 0.67329252,  0.68859723,  0.70395734,  0.71937285,  0.73484377],
  [ 0.75037008,  0.7659518,   0.78158892,  0.79728144,  0.81302936],
  [ 0.82883269,  0.84469141,  0.86060554,  0.87657507,  0.8926    ]])

# You should see relative errors around e-7 or less
print('next_w error: ', rel_error(expected_next_w, next_w))
print('cache error: ', rel_error(expected_cache, config['cache']))

Output:

next_w error:  9.524687511038133e-08
cache error:  2.6477955807156126e-09

3.3 Adam

def adam(w, dw, config=None):
    if config is None:
        config = {}
    config.setdefault("learning_rate", 1e-3)
    config.setdefault("beta1", 0.9)
    config.setdefault("beta2", 0.999)
    config.setdefault("epsilon", 1e-8)
    config.setdefault("m", np.zeros_like(w))
    config.setdefault("v", np.zeros_like(w))
    config.setdefault("t", 1)

    next_w = None  
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    bt1 = config['beta1']
    bt2 = config['beta2']
    t = config['t']
    lr = config['learning_rate']    
    
    config['m'] = bt1*config['m'] + (1-bt1)*dw
    mt = config['m']/(1-bt1**t)
    config['v'] = bt2*config['v'] + (1-bt2)*dw**2
    vt = config['v']/(1-bt2**t)
    next_w = w - lr*mt/(np.sqrt(vt)+config['epsilon'])
    config['t'] += 1 
    pass
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return next_w, config

Code analysis:

The Adam update rule incorporates moving averages of both the gradient and its square, together with a bias-correction term.

Config format:
- learning_rate: scalar learning rate.
- beta1: decay rate for the moving average of the first moment of the gradient.
- beta2: decay rate for the moving average of the second moment of the gradient.
- epsilon: small scalar used for smoothing to avoid dividing by zero.
- m: moving average of the gradient.
- v: moving average of the squared gradient.
- t: iteration number.

Note: handle the iteration counter t carefully and keep m, v, and t updated on every call. The reference solution increments t before using it in the bias correction, while the code above increments it afterwards; this is why the next_w error in the test below comes out around 1e-3 rather than the expected 1e-7.
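
Written out, the Adam update with bias correction is:

$$m \leftarrow \beta_1 m + (1-\beta_1)\,dw,\quad v \leftarrow \beta_2 v + (1-\beta_2)\,dw^2,\quad \hat{m} = \frac{m}{1-\beta_1^{\,t}},\quad \hat{v} = \frac{v}{1-\beta_2^{\,t}},\quad w \leftarrow w - \frac{\alpha\,\hat{m}}{\sqrt{\hat{v}}+\epsilon}$$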

Test_3.3 Adam

# Test Adam implementation
from cs231n.optim import adam

N, D = 4, 5
w = np.linspace(-0.4, 0.6, num=N*D).reshape(N, D)
dw = np.linspace(-0.6, 0.4, num=N*D).reshape(N, D)
m = np.linspace(0.6, 0.9, num=N*D).reshape(N, D)
v = np.linspace(0.7, 0.5, num=N*D).reshape(N, D)

config = {'learning_rate': 1e-2, 'm': m, 'v': v, 't': 5}
next_w, _ = adam(w, dw, config=config)

expected_next_w = np.asarray([
  [-0.40094747, -0.34836187, -0.29577703, -0.24319299, -0.19060977],
  [-0.1380274,  -0.08544591, -0.03286534,  0.01971428,  0.0722929],
  [ 0.1248705,   0.17744702,  0.23002243,  0.28259667,  0.33516969],
  [ 0.38774145,  0.44031188,  0.49288093,  0.54544852,  0.59801459]])
expected_v = np.asarray([
  [ 0.69966,     0.68908382,  0.67851319,  0.66794809,  0.65738853,],
  [ 0.64683452,  0.63628604,  0.6257431,   0.61520571,  0.60467385,],
  [ 0.59414753,  0.58362676,  0.57311152,  0.56260183,  0.55209767,],
  [ 0.54159906,  0.53110598,  0.52061845,  0.51013645,  0.49966,   ]])
expected_m = np.asarray([
  [ 0.48,        0.49947368,  0.51894737,  0.53842105,  0.55789474],
  [ 0.57736842,  0.59684211,  0.61631579,  0.63578947,  0.65526316],
  [ 0.67473684,  0.69421053,  0.71368421,  0.73315789,  0.75263158],
  [ 0.77210526,  0.79157895,  0.81105263,  0.83052632,  0.85      ]])

# You should see relative errors around e-7 or less
print('next_w error: ', rel_error(expected_next_w, next_w))
print('v error: ', rel_error(expected_v, config['v']))
print('m error: ', rel_error(expected_m, config['m']))

Output:

next_w error:  0.0015218451757856217
v error:  4.208314038113071e-09
m error:  4.214963193114416e-09

3.4 Comparing the four update rules

learning_rates = {'rmsprop': 1e-4, 'adam': 1e-3}
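# Note: solvers (a dict) and small_data are defined in the earlier SGD vs. SGD+Momentum comparison cell, which is omitted above.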
for update_rule in ['adam', 'rmsprop']:
  print('running with ', update_rule)
  model = FullyConnectedNet([100, 100, 100, 100, 100], weight_scale=5e-2)

  solver = Solver(model, small_data,
                  num_epochs=5, batch_size=100,
                  update_rule=update_rule,
                  optim_config={
                    'learning_rate': learning_rates[update_rule]
                  },
                  verbose=True)
  solvers[update_rule] = solver
  solver.train()
  print()

plt.subplot(3, 1, 1)
plt.title('Training loss')
plt.xlabel('Iteration')

plt.subplot(3, 1, 2)
plt.title('Training accuracy')
plt.xlabel('Epoch')

plt.subplot(3, 1, 3)
plt.title('Validation accuracy')
plt.xlabel('Epoch')

for update_rule, solver in list(solvers.items()):
  plt.subplot(3, 1, 1)
  plt.plot(solver.loss_history, 'o', label=update_rule)
  
  plt.subplot(3, 1, 2)
  plt.plot(solver.train_acc_history, '-o', label=update_rule)

  plt.subplot(3, 1, 3)
  plt.plot(solver.val_acc_history, '-o', label=update_rule)
  
for i in [1, 2, 3]:
  plt.subplot(3, 1, i)
  plt.legend(loc='upper center', ncol=4)
plt.gcf().set_size_inches(15, 15)
plt.show()

Output:

[Figure: training loss, training accuracy, and validation accuracy curves for the four update rules]
The detailed numbers are omitted, but Adam clearly performs well (and again: pay very close attention to t).

4. Training and Prediction

4.1 Train

Train a fully connected model. Here a simple five-layer model is used, without batch norm or dropout.
After tuning the best hyperparameters, the search loop was commented out.

best_model = None
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
best_val = -1
best_solver = None

from copy import deepcopy
learning_rates = 3.1e-4
lr = learning_rates   # best value found; the grid-search loop over learning rates below is commented out
weight_scales = [2.5e-2]
for ws in weight_scales:
#    for lr in learning_rates:
        model = FullyConnectedNet([500, 400, 300, 200, 100], weight_scale=ws)
        print('weight_scales:%e learning_rate:%e'%(ws,lr))
        
        solver = Solver(model, data,
                  print_every=500,num_epochs=10, batch_size=100,
                  update_rule='adam',
                  optim_config={
                    'learning_rate': lr
                  },
                  verbose=True)
        
        solver.train()
        if solver.best_val_acc>best_val:
            best_val = solver.best_val_acc
            best_solver = deepcopy(solver)
            best_model = model
            print(lr,ws)
        print()
# Best hyperparameters found: weight_scale 2.5e-02, learning_rate 3.1e-04
pass
# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

Output:

weight_scales:2.500000e-02 learning_rate:3.100000e-04
(Iteration 1 / 4900) loss: 9.108568
(Epoch 0 / 10) train acc: 0.108000; val_acc: 0.114000
(Epoch 1 / 10) train acc: 0.460000; val_acc: 0.440000
(Iteration 501 / 4900) loss: 1.504805
(Epoch 2 / 10) train acc: 0.511000; val_acc: 0.457000
(Iteration 1001 / 4900) loss: 1.436040
(Epoch 3 / 10) train acc: 0.550000; val_acc: 0.469000
(Iteration 1501 / 4900) loss: 1.325850
(Epoch 4 / 10) train acc: 0.557000; val_acc: 0.488000
(Iteration 2001 / 4900) loss: 1.141446
(Epoch 5 / 10) train acc: 0.574000; val_acc: 0.489000
(Iteration 2501 / 4900) loss: 1.084415
(Epoch 6 / 10) train acc: 0.595000; val_acc: 0.505000
(Iteration 3001 / 4900) loss: 1.319046
(Epoch 7 / 10) train acc: 0.637000; val_acc: 0.516000
(Iteration 3501 / 4900) loss: 1.013175
(Epoch 8 / 10) train acc: 0.697000; val_acc: 0.516000
(Iteration 4001 / 4900) loss: 1.133661
(Epoch 9 / 10) train acc: 0.654000; val_acc: 0.537000
(Iteration 4501 / 4900) loss: 1.045279
(Epoch 10 / 10) train acc: 0.719000; val_acc: 0.522000
0.00031 0.025

4.2 Test_final

y_test_pred = np.argmax(best_model.loss(data['X_test']), axis=1)
y_val_pred = np.argmax(best_model.loss(data['X_val']), axis=1)
print('Validation set accuracy: ', (y_val_pred == data['y_val']).mean())
print('Test set accuracy: ', (y_test_pred == data['y_test']).mean())

Output:

Validation set accuracy:  0.537
Test set accuracy:  0.518

Although it does not reach 55%, this is already a decent result.

5. Inline Questions

Inline Question 1:

Question: We've only asked you to implement ReLU, but there are a number of different activation functions that one could use in neural networks, each with its pros and cons. In particular, an issue commonly seen with activation functions is getting zero (or close to zero) gradient flow during backpropagation. Which of the following activation functions have this problem? If you consider these functions in the one dimensional case, what types of input would lead to this behaviour?
Sigmoid, ReLU, Leaky ReLU


Answer:

Inline Question 2:

Question: Did you notice anything about the comparative difficulty of training the three-layer net vs training the five layer net? In particular, based on your experience, which network seemed more sensitive to the initialization scale? Why do you think that is the case?


Answer:

Inline Question 3:

AdaGrad, like Adam, is a per-parameter optimization method that uses the following update rule:

cache += dw**2
w += - learning_rate * dw / (np.sqrt(cache) + eps)

John notices that when he was training a network with AdaGrad that the updates became very small, and that his network was learning slowly. Using your knowledge of the AdaGrad update rule, why do you think the updates would become very small? Would Adam have the same issue?
