Project URL: https://github.com/horcham/numpy-net/

Using only basic matrix operations (via numpy) and suitable data structures, this project implements a deep learning framework from scratch, named numpy-net.

It supports fully connected, convolution, pooling, Batch Normalization, dropout, and Residual Block operations, as well as optimizers such as Momentum, AdaDelta, and Adam. Classic network architectures including LeNet, AlexNet (without LRN), VGG16, and ResNet18 are implemented on top of the framework and tested on MNIST; LeNet reaches 97.11% accuracy. The project is characterized by a Keras-like programming style, custom operations, and automatic differentiation.

Design of numpy-net: a Variable class stores a node's data, gradient, learning rate, and related state, and serves as the framework's data unit. A generic Op class and its subclasses (e.g. Conv2d, BatchNorm), a generic Layer activation class and its subclasses (e.g. ReluActivator), a generic Block class and its residual subclass (e.g. ResBlock), and a Loss class and its subclasses (e.g. Softmax) each define their initialization, forward computation, and backward differentiation rules. A Graph class is initialized by adding these components, which builds the network sequence and the optimizer. Graph.forward walks the network from front to back, calls each computation node's (e.g. each Op's) forward pass, wraps the result in a new Variable, and feeds that Variable to the next Op, until the loss function at the end computes the loss and, via its differentiation rule, the initial derivative. Graph.backward then walks the network from back to front, calls each node's differentiation rule, and stores the gradients on the nodes feeding into it. Finally, Graph.update applies the optimizer to every Variable whose parameters need updating, which completes one training step of numpy-net.
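The training step described above can be pictured with a short control-flow sketch. This is only an illustration under assumed names (`self.nodes`, `node.backward`, `optimizer.update`); it is not the actual numpy-net source.

```python
# Hedged sketch of the Graph training step described above; attribute
# and method names are assumptions for illustration only.
class GraphSketch(object):
    def __init__(self):
        self.nodes = []       # Ops / Layers / Blocks in forward order
        self.vars = []        # trainable Variables
        self.loss = None
        self.optimizer = None

    def forward(self, X):
        out = X
        for node in self.nodes:               # front to back
            out = node.forward(out)           # each result becomes the next node's input
        self.out = out
        return out

    def calc_loss(self, Y):
        return self.loss.forward(self.out, Y)

    def backward(self):
        D = self.loss.backward()              # the loss provides the first gradient
        for node in reversed(self.nodes):     # back to front
            D = node.backward(D)              # each node applies its differentiation rule

    def update(self):
        for var in self.vars:                 # only trainable Variables are updated
            self.optimizer.update(var)
```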
Numpynet is a neural network framework implemented with numpy.

Its coding style is similar to Gluon of mxnet. It performs differentiation automatically via the definition of each operation's backward function.

The framework currently supports BP (fully connected) networks and CNNs.
| dataset | model | learning rate | epochs | accuracy |
|---------|-------|---------------|--------|----------|
| MNIST   | LeNet | 1e-3          | 30     | 0.9711   |
Overview of the main components and their constructor/forward signatures:

```
Variable
    Variable(np.array, lr=0)
    Placeholder()

Initializers
    Uniform(shape)
    Normal(shape)

Operation
    __init__(),                                forward(X1, X2)
    __init__(),                                forward(X1)
    __init__(),                                forward(X1, X2)
    __init__(),                                forward(X1)
    __init__(padding, stride, pad),            forward(X1, X2)
    __init__(filter_h, filter_w, stride, pad), forward(X1)
    __init__(p),                               forward(X1)
    __init__(gamma, beta),                     forward(X1)

Block
    __init__(X1, X2, scps), forward()

Activator
    __init__(), forward(X1)        (four activators)

Loss Function
    __init__()                     (two loss functions)

Optimizer
    __init__()
    __init__(beta1=0.9)
    __init__()
    __init__(beta2=0.999)
    __init__(beta1=0.9, beta2=0.999)

Models

Examples
```
Get numpynet

```
git clone https://github.com/horcham/numpy-net.git
python setup.py install
```

and you can import numpynet and play:

```python
import numpynet as nn
```

There are some demos. `examples/mnist.py` is a demo that solves digit classification with LeNet. You can run it and have fun:

```
cd examples
python mnist.py
```
`Graph` is a network definition. During a `Graph`'s definition, `Variable`, `Operation`, `Layer`, `Loss Function`, and `Optimizer` are just symbols; we can add them into the graph.
```python
import numpy as np
import numpynet as nn

input = nn.Variable(nn.UniformInit([1000, 50]), lr=0)   # input data
output = nn.Variable(nn.onehot(np.random.choice(['a', 'b'], [1000, 1])), lr=0)  # output data, containing two labels

graph = nn.Graph()                         # initialize the Graph
X = nn.Placeholder()                       # add Placeholder

W0 = nn.Variable(nn.UniformInit([50, 30]), lr=0.01)
graph.add_var(W0)                          # add Variable into graph
W1 = nn.Variable(nn.UniformInit([30, 2]), lr=0.01)
graph.add_var(W1)

FC0 = nn.Op(nn.Dot(), X, W0)
graph.add_op(FC0)                          # add Operation into graph
act0 = nn.Layer(nn.SigmoidActivator(), FC0)
graph.add_layer(act0)                      # add Layer into graph
FC1 = nn.Op(nn.Dot(), act0, W1)
graph.add_op(FC1)

graph.add_loss(nn.Loss(nn.Softmax()))      # add Loss function
graph.add_optimizer(nn.SGDOptimizer())     # add Optimizer
```
After the definition, we can train the net:

```python
graph.forward(input)     # network forward
graph.calc_loss(output)  # use label Y and calculate loss
graph.backward()         # network backward, calculate gradients
graph.update()           # update the Variables added via graph.add_var using the optimizer
```
`Variable` is similar to `numpy.array`, but it is a class that also carries other attributes such as `lr` (learning rate) and `D` (gradient). Any input and any variables should be converted to `Variable` before being fed into the network.

```python
graph = Graph()
X = Variable(X, lr=0)  # if X is not trainable, lr=0
w = Variable(w, lr=1)  # if w is trainable, lr=1, and add it into the graph
graph.add_var(w)
```
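For intuition, a `Variable` can be imagined roughly as follows. This is a hedged sketch: only `lr` and `D` are documented above, and the `data` attribute name is an assumption.

```python
import numpy as np

# Hedged sketch of what a Variable carries; `data` is an assumed name.
class VariableSketch(object):
    def __init__(self, data, lr=0):
        self.data = np.asarray(data)         # the underlying numpy array
        self.lr = lr                         # learning rate; lr=0 means not trainable
        self.D = np.zeros_like(self.data)    # gradient, filled in during backward
```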
When defining the graph, we use `Placeholder` to represent the input data.

```python
X = Placeholder()  # add Placeholder
W0 = Variable(UniformInit([50, 30]), lr=0.01)
FC0 = Op(Dot(), X, W0)
graph.add_op(FC0)  # add Operation into graph
```

After the definition, when the graph begins to train, the input data (`Variable`) and output data (`Variable`) are fed into the graph by

```python
graph.forward(input)     # network forward
graph.calc_loss(output)  # use label Y and calculate loss
```

and the `Placeholder` is replaced by the input `Variable`.
`Operation` is similar to the operations of numpy. For example, the class `Dot` is similar to `numpy.dot`, but it also contains a backward function, which is used to calculate gradients.

During the `graph`'s definition, an `Operation` is defined as a symbol.

How to define an `Operation`?

```python
op_handler = Op(operation(), X, W)   # operation which needs two inputs
op_handler2 = Op(operation(), X)     # operation which needs one input
```

- `operation()`: `Dot()`, `Add()`, `Conv2d()` and so on
- `X`: the first input, a `Variable` which is not trainable
- `W`: the second input, a `Variable` which is trainable

For example:

```python
FC0 = Op(Dot(), X, W0)  # define operation: Dot
graph.add_op(FC0)       # add Operation into graph
```

`Operation` calculates when the `graph` begins to forward and backward:

```python
graph.forward()   # operations begin forward
graph.backward()  # operations begin backward
```
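To illustrate how an operation pairs a forward pass with a hand-written backward rule, here is a hedged sketch of a Dot-like operation. The `backward(D)` interface and where the weight gradient is kept are assumptions; the real `Dot` in numpy-net may differ.

```python
import numpy as np

# Hedged sketch of a Dot-like operation: forward(X1, X2) matches the
# two-input signature above; the backward interface is assumed.
class DotSketch(object):
    def forward(self, X1, X2):
        self.X1, self.X2 = X1, X2
        return np.dot(X1, X2)

    def backward(self, D):
        # D is the gradient arriving from the next node in the graph.
        self.dW = np.dot(self.X1.T, D)   # gradient for the trainable input, kept for the update step
        return np.dot(D, self.X2.T)      # gradient passed back to the previous node
```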
`Dropout` is an `Op`; we usually add it before a fully connected operation. We only need to define `Dropout` in the graph:

```python
add0 = Op(Dot(), X, W0)
graph.add_op(add0)
dp0 = Op(Dropout(0.3), add0)
graph.add_op(dp0)
act0 = Layer(SigmoidActivator(), dp0)
graph.add_layer(act0)
```
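Internally, dropout amounts to sampling a random mask in the forward pass and reusing it in the backward pass. The sketch below is a hedged illustration of that idea: the constructor argument `p` matches the `Dropout(p)` signature above, while the inverted-dropout scaling and the `backward` interface are assumptions.

```python
import numpy as np

# Hedged sketch of a Dropout-style op: drop units with probability p
# and reuse the same mask when propagating gradients back.
class DropoutSketch(object):
    def __init__(self, p):
        self.p = p

    def forward(self, X1):
        self.mask = (np.random.rand(*X1.shape) >= self.p) / (1.0 - self.p)
        return X1 * self.mask

    def backward(self, D):
        return D * self.mask   # gradients only flow through the kept units
```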
`BatchNorm` is an `Op`; we usually add it before a `Layer`. Before defining `BatchNorm`, we should first define `gamma` and `beta` as trainable `Variable`s and add them into the graph with `graph.add_var`:

```python
g0, bt0 = Variable(np.random.random(), lr=0.01), Variable(np.random.random(), lr=0.01)
graph.add_var(g0)
graph.add_var(bt0)
```

and then define `BatchNorm` in the graph:

```python
conv0 = Op(Conv2d(), X, W0)
graph.add_op(conv0)
bn0 = Op(BatchNorm(g0, bt0), conv0)
graph.add_op(bn0)
act0 = Layer(ReluActivator(), bn0)
graph.add_layer(act0)
```

After the definition of the graph, when `graph.backward` is called, `gamma` and `beta` will be trained.
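For reference, the forward computation of a BatchNorm-style op looks roughly like the following. This is a hedged, training-mode-only sketch built on the documented `__init__(gamma, beta)` signature; it omits running statistics and the backward pass, and is not numpy-net's actual implementation.

```python
import numpy as np

# Hedged sketch of BatchNorm's forward pass (training mode only):
# normalize over the batch, then scale by gamma and shift by beta.
class BatchNormSketch(object):
    def __init__(self, gamma, beta, eps=1e-5):
        self.gamma, self.beta, self.eps = gamma, beta, eps

    def forward(self, X1):
        mean = X1.mean(axis=0, keepdims=True)
        var = X1.var(axis=0, keepdims=True)
        X_hat = (X1 - mean) / np.sqrt(var + self.eps)
        return self.gamma * X_hat + self.beta
```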
Add a `Block` to the graph. For now, `Block` only supports `ResBlock` (the block in ResNet-18); the following shows its construction:

```
        |------------> X or (conv --> BN) -------------|
        |                                              v
------> X --> conv --> BN --> Relu --> conv --> BN --> + -------->
        |                                              |
        |                     Block                    |
```

When the dimension of the BN's output is different from that of `X`, the shortcut switches to `conv --> BN`; otherwise it is `X`.

To define a `ResBlock`, you call

```python
block = nn.ResBlock(X1, X2, sc)
```

`X1` is the output of the previous `op`/`block`/`layer`. `X2` is a dict containing the parameters of the `conv` and `BN` ops in the main branch. `sc` is also a dict; it contains the parameters of the shortcut.
For example, when the shortcut is `X`:

```python
# Block 1
W1_1 = nn.Variable(nn.UniformInit([3, 3, 64, 64]), lr=lr)
b1_1 = nn.Variable(nn.UniformInit([64, 1]), lr=lr)
gamma1_1 = nn.Variable(nn.UniformInit([1, 64, self.W, self.H]), lr=lr)
beta1_1 = nn.Variable(nn.UniformInit([1, 64, self.W, self.H]), lr=lr)
W1_2 = nn.Variable(nn.UniformInit([3, 3, 64, 64]), lr=lr)
b1_2 = nn.Variable(nn.UniformInit([64, 1]), lr=lr)
gamma1_2 = nn.Variable(nn.UniformInit([1, 64, self.W, self.H]), lr=lr)
beta1_2 = nn.Variable(nn.UniformInit([1, 64, self.W, self.H]), lr=lr)
self.graph.add_vars([W1_1, b1_1, gamma1_1, beta1_1,
                     W1_2, b1_2, gamma1_2, beta1_2])
pamas1 = {'w1': W1_1, 'b1': b1_1,                 # the first conv in the main branch
          'gamma1': gamma1_1, 'beta1': beta1_1,   # the first BN in the main branch
          'w2': W1_2, 'b2': b1_2,                 # the second conv in the main branch
          'gamma2': gamma1_2, 'beta2': beta1_2}   # the second BN in the main branch
B1 = nn.ResBlock(act0, pamas1)
self.graph.add_block(B1)
```
When the shortcut is `conv --> BN`:

```python
# Block 3
W3_1 = nn.Variable(nn.UniformInit([3, 3, 64, 128]), lr=lr)
b3_1 = nn.Variable(nn.UniformInit([128, 1]), lr=lr)
gamma3_1 = nn.Variable(nn.UniformInit([1, 128, self.W/2, self.H/2]), lr=lr)
beta3_1 = nn.Variable(nn.UniformInit([1, 128, self.W/2, self.H/2]), lr=lr)
W3_2 = nn.Variable(nn.UniformInit([3, 3, 128, 128]), lr=lr)
b3_2 = nn.Variable(nn.UniformInit([128, 1]), lr=lr)
gamma3_2 = nn.Variable(nn.UniformInit([1, 128, self.W/2, self.H/2]), lr=lr)
beta3_2 = nn.Variable(nn.UniformInit([1, 128, self.W/2, self.H/2]), lr=lr)
w_sc3 = nn.Variable(nn.UniformInit([3, 3, 64, 128]), lr=lr)
b_sc3 = nn.Variable(nn.UniformInit([128, 1]), lr=lr)
gamma_sc3 = nn.Variable(nn.UniformInit([1, 128, self.W/2, self.H/2]), lr=lr)
beta_sc3 = nn.Variable(nn.UniformInit([1, 128, self.W/2, self.H/2]), lr=lr)
self.graph.add_vars([W3_1, b3_1, gamma3_1, beta3_1,
                     W3_2, b3_2, gamma3_2, beta3_2,
                     w_sc3, b_sc3, gamma_sc3, beta_sc3])
pamas3 = {'w1': W3_1, 'b1': b3_1,                 # the first conv in the main branch
          'gamma1': gamma3_1, 'beta1': beta3_1,   # the first BN in the main branch
          'w2': W3_2, 'b2': b3_2,                 # the second conv in the main branch
          'gamma2': gamma3_2, 'beta2': beta3_2}   # the second BN in the main branch
sc3 = {'w': w_sc3, 'b': b_sc3,                    # conv in the shortcut
       'gamma': gamma_sc3, 'beta': beta_sc3}      # BN in the shortcut
B3 = nn.ResBlock(pool2, pamas3, sc3)
self.graph.add_block(B3)
```
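To make the data flow in the diagram concrete, a ResBlock-style forward pass composes the two branches roughly as below. This is only a hedged sketch of the computation: `conv`, `bn` and `relu` are placeholder callables standing in for the framework's `Conv2d`, `BatchNorm` and `ReluActivator`, and the function is not numpy-net's `ResBlock` implementation.

```python
# Hedged sketch of the ResBlock data flow shown in the diagram above.
# conv, bn and relu are placeholder callables, passed in by the caller.
def resblock_forward(x, params, sc, conv, bn, relu):
    # main branch: conv -> BN -> ReLU -> conv -> BN
    out = conv(x, params['w1'], params['b1'])
    out = bn(out, params['gamma1'], params['beta1'])
    out = relu(out)
    out = conv(out, params['w2'], params['b2'])
    out = bn(out, params['gamma2'], params['beta2'])
    # shortcut: the identity X, or conv -> BN when the shapes differ
    if sc is None:
        shortcut = x
    else:
        shortcut = bn(conv(x, sc['w'], sc['b']), sc['gamma'], sc['beta'])
    return out + shortcut   # elementwise addition at the merge point
```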
Add activations to the graph.

During a `Layer`'s definition, the `Layer` is defined as a symbol.

How to define a `Layer`?

```python
layer_handler = Layer(activator(), X)
```

- `X`: a `Variable`, which is not trainable

For example:

```python
act0 = Layer(SigmoidActivator(), add0)  # define Layer: Sigmoid activation
graph.add_layer(act0)                   # add Layer into graph
```

`Layer` calculates when the `graph` begins to forward and backward:

```python
graph.forward()   # Layers begin forward
graph.backward()  # Layers begin backward
```
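An activator only needs an elementwise forward function and its derivative for the backward pass. The following is a hedged sketch of a Sigmoid-style activator; the `backward(D)` interface is an assumption, not numpy-net's actual one.

```python
import numpy as np

# Hedged sketch of a Sigmoid-style activator: elementwise forward plus
# the matching derivative for backpropagation.
class SigmoidActivatorSketch(object):
    def forward(self, X1):
        self.out = 1.0 / (1.0 + np.exp(-X1))
        return self.out

    def backward(self, D):
        return D * self.out * (1.0 - self.out)   # sigmoid'(x) = s(x) * (1 - s(x))
```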
Add a Loss Function to the graph; it will calculate the loss and start the gradient calculation.

During the `Loss`'s definition, the `Loss` is defined as a symbol.

How to define a `Loss`?

```python
loss_handler = Loss(loss_function())
```

`loss_function()` can be `MSE()` or `Softmax()`.

For example:

```python
graph.add_loss(Loss(Softmax()))
```

`Loss` calculates after the graph's forward pass; call

```python
graph.calc_loss(Y)
```

to calculate the loss. `Y` is the labels of the data, a `Variable`. After calculating the loss, call

```python
graph.backward()
```

to do the backward pass.
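Like an operation, a loss function pairs a scalar forward value with the gradient it sends back into the graph. Here is a hedged MSE sketch; the `forward(pred, Y)`/`backward()` interface is assumed for illustration.

```python
import numpy as np

# Hedged sketch of an MSE-style loss: forward returns the scalar loss,
# backward returns the gradient that starts backpropagation.
class MSESketch(object):
    def forward(self, pred, Y):
        self.diff = pred - Y
        return np.mean(self.diff ** 2)

    def backward(self):
        return 2.0 * self.diff / self.diff.size   # d(loss)/d(pred)
```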
Add an Optimizer to the graph; it will update the trainable `Variable`s after the backward pass.

How to define an Optimizer?

```python
optimizer_handler = Optimizer()
```

`Optimizer()` can be `SGDOptimizer`, `MomentumOptimizer` and so on.

For example:

```python
graph.add_optimizer(AdamOptimizer())
```

After the backward pass, each trainable `Variable` needs to be updated according to its gradient:

```python
graph.update()
```
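The update an optimizer performs on each trainable `Variable` is a small in-place step based on its `lr` and `D` attributes. Below is a hedged sketch of SGD with momentum; `beta1=0.9` comes from the constructor signature listed above, while the `update(var)` interface and the `data` attribute are assumptions.

```python
import numpy as np

# Hedged sketch of SGD with momentum acting on a trainable Variable
# that carries its data, learning rate (lr) and gradient (D).
class MomentumOptimizerSketch(object):
    def __init__(self, beta1=0.9):
        self.beta1 = beta1
        self.velocity = {}   # one velocity buffer per Variable

    def update(self, var):
        v = self.velocity.get(id(var), np.zeros_like(var.data))
        v = self.beta1 * v - var.lr * var.D   # decaying average of past gradients
        self.velocity[id(var)] = v
        var.data = var.data + v               # step the parameters
```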
If you want to define an `Operation` or a `Layer`, you only need to define how that `Operation` or `Layer` performs its forward and backward passes. Be careful that the first input of an `Operation` must be an untrainable `Variable` and the second input must be trainable.

Meanwhile, it is just as easy to define your own `Optimizer` or `Loss`.
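As a worked example of this extension point, here is a hedged sketch of a hypothetical custom one-input operation. The class `Square` and its `backward(D)` interface are illustrative only; follow the existing operations in the repository for the exact method signatures numpy-net expects.

```python
import numpy as np

# Hypothetical custom one-input operation (illustrative only):
# squares its input in forward, applies the chain rule in backward.
class Square(object):
    def forward(self, X1):
        self.X1 = X1
        return X1 ** 2

    def backward(self, D):
        return 2.0 * self.X1 * D   # d(x^2)/dx = 2x, times the incoming gradient
```

It would then be wrapped like any built-in operation, e.g. `Op(Square(), X)`, matching the one-input form `Op(operation(), X)` shown earlier.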