Paper Notes: Network in Network

1. Motivation: conv layers use a GLM (WX+b) to extract features, so the level of abstraction is low (it implicitly assumes the latent features are linearly separable) ==> replace the GLM with an MLP (a universal function approximator, whereas maxout can only approximate convex functions; an MLP is also trainable with BP)

2. Architecture
Mlpconv layer: slides a micro MLP over the input instead of a plain linear filter; this is equivalent to cascaded cross-channel parametric pooling, i.e. an ordinary conv followed by 1x1 convs (see the sketch below)
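A minimal PyTorch sketch of one mlpconv block, assuming the conv-plus-1x1-convs formulation; the class name MLPConv is my own, and the channel sizes (192, 160, 96) and 5x5 kernel follow the commonly cited NIN CIFAR-10 configuration, used here only as an illustration:

```python
import torch
import torch.nn as nn

class MLPConv(nn.Module):
    """One mlpconv block: a normal conv followed by two 1x1 convs (the micro MLP)."""
    def __init__(self, in_ch, mid_ch, out_ch, kernel_size, stride=1, padding=0):
        super().__init__()
        self.block = nn.Sequential(
            # ordinary conv extracts features from each local patch
            nn.Conv2d(in_ch, mid_ch, kernel_size, stride, padding),
            nn.ReLU(inplace=True),
            # two 1x1 convs = an MLP shared across all spatial locations,
            # mixing information across channels (cross-channel parametric pooling)
            nn.Conv2d(mid_ch, mid_ch, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

# illustrative sizes roughly matching NIN's first CIFAR-10 block
block = MLPConv(in_ch=3, mid_ch=192, out_ch=96, kernel_size=5, padding=2)
x = torch.randn(2, 3, 32, 32)
print(block(x).shape)  # torch.Size([2, 96, 32, 32])
```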



Global average pooling: average each feature map of the last mlpconv layer into a single value and feed the resulting vector directly into softmax, instead of using two FC layers ==> greatly reduces the number of parameters and is less prone to overfitting (sketch below)
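A small sketch of the global-average-pooling head, again in PyTorch; the channel counts here (96 input maps, 10 classes) are placeholders, not taken from the paper:

```python
import torch
import torch.nn as nn

# The last mlpconv outputs one feature map per class; global average pooling
# collapses each HxW map to a single score, replacing the FC layers entirely.
head = nn.Sequential(
    nn.Conv2d(96, 10, kernel_size=1),  # 1x1 conv: one confidence map per class
    nn.AdaptiveAvgPool2d(1),           # global average pooling: HxW -> 1x1
    nn.Flatten(),                      # (N, 10, 1, 1) -> (N, 10) logits for softmax
)

x = torch.randn(2, 96, 8, 8)           # dummy feature maps from the last mlpconv
print(head(x).shape)                   # torch.Size([2, 10])
```

Note the head itself has only the 1x1 conv's weights (96*10 + 10 parameters here); the pooling is parameter-free, which is where the savings over two FC layers come from.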

