本文主要参考反向传播之一：softmax函数，添加相应的pytorch的实现

softmax函数

定义及简单实现

记输入为，标签为, 通过softmax之后的输出为，则：

其中，
pytorch库函数和手动实现：

def seed_torch(seed=1234):
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # if you are using multi-GPU.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.enabled = False

def softmax(x, dim=1):
    x=x-torch.max(x, dim=dim)[0].unsqueeze(dim=dim) # 防止溢出
    res = torch.exp(x) / torch.sum(torch.exp(x), dim=dim).unsqueeze(dim=dim)
    return res
if __name__ == '__main__':
    seed_torch(1234)
    x=torch.rand(4,7, requires_grad=True)

    print(torch.softmax(x,dim=-1))
    print(softmax(x, dim=-1))

求导推导

当i=j

当i≠j

综上：

其中，为以元素为对角线元素构造的矩阵
公式推导来自：https://zhuanlan.zhihu.com/p/37740860。图中的应为

交叉熵

计算公式

说明：

表示分类任务中的类别数
表示的该样本对应的标签，是一个维的向量，一般使用one-hot编码，表示其第维的值，为0或者1
表示模型计算出的分类概率向量，维，每一维表示预测得到该类的概率，即，由softmax函数计算得到

pytorch 实现：

    seed_torch(1234)
    x=torch.rand(4,7, requires_grad=True)  # 4个样本，共7类
    y=torch.LongTensor([1,3,5,0]) # 对应的标签
    criterion = torch.nn.CrossEntropyLoss()  # pytroch库
    out = criterion(x,y)
    print(out)
   # 自己实现
    gt = torch.zeros(4,7).scatter(1, y.view(4,1),1)  # 生成one-hot标签，scatter的用法可参考jianshu.com/p/b4e9fd4048f4
    loss = -(torch.log(softmax(x, dim=-1)) * gt).sum() / 4  # 对样本求平均
    print(loss)

输出：

反向传播

$\begin{align} \frac{\partial L}{\partial x_k}&=\sum_{i=1}^{C}\frac{\partial L}{\partial \hat{y}_i} \frac{\partial \hat{y}_i}{\partial x_k}& \\ &=-\sum_{i=1}^C \frac{\partial}{\partial \hat{y}_i} (y_iln\hat{y}_i)\cdot\frac{\partial \hat{y}_i}{\partial x_k}& \\ &=-\sum_{i=1}^C \frac{y_i}{\hat{y}_i}\cdot \frac{\partial \hat{y}_i}{\partial x_k}&\\ &=-\frac{y_k}{\hat y_k} \hat y_k(1-\hat y_k) + \sum_{i=1, i\ne k}^C\frac{y_i}{\hat y_i}\hat y_i \hat y_k& \\ &=-y_k + y_k\hat y_k +\sum_{i=1, i\ne k}^Cy_i\hat y_k &\\ &=-y_k+\sum_{i=1}^Cy_i\hat y_k &\\ &=-y_k+\hat y_k\sum_{i=1}^Cy_i &\\ &=\hat y_k - y_k \end{align}$
所以：

pytorch实现

    seed_torch(1234)
    x=torch.rand(4,7)  # 4个样本，共7类
    x1=x.clone()  # 4个样本，共7类
    x2=x.clone()  # 4个样本，共7类
    x1.requires_grad=True
    x2.requires_grad=True
    y=torch.LongTensor([1,3,5,0]) # 对应的标签
    criterion = torch.nn.CrossEntropyLoss()
    out = criterion(x1,y)
    out.backward()
    print('pytorch库 loss:', out, 'grad:', x1.grad)

    gt = torch.zeros(4,7).scatter(1, y.view(4,1),1)
    loss = -(torch.log(softmax(x2, dim=-1)) * gt).sum() / 4  # 对样本求平均
    loss.backward()
    print('手动实现 loss:', loss, 'grad:', x2.grad)

    eta = (torch.softmax(x, dim=-1) - gt) / 4
    print('直接计算 grad:', eta)

输出：

交叉熵反向传播推导及pytorch实现