Hand-Writing a Neural Network (No PyTorch Modules) for the MNIST Handwritten Digit Dataset

Table of Contents

  • Overview
  • Approach
    • Softmax
    • Softmax's Derivative
    • CrossEntropyLoss
    • CrossEntropyLoss's Derivative
    • CEwithLogitLoss
    • CEwithLogitLoss's Derivative
  • Full Code
  • Results

Overview

This was a small assignment from one of our classes, which I'm sharing here for the record. It was originally written with Paddle, and since Paddle is essentially equivalent to PyTorch, anyone who wants to run it with torch can simply swap the Paddle parts for torch. The slightly awkward part is that torch already ships the loss functions and the like (BCELoss, Softmax, and so on), but our teacher required us to write them ourselves, and the same goes for the backward pass.

Approach

Softmax

For Softmax, the main thing to get right is its backward pass.
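As a quick reference for the snippet below (my own notation, not from the original assignment), the forward pass computes, for each input row x_i,

$$\hat{y}_{ij}=\frac{e^{\,x_{ij}-\beta_i}}{\sum_{l=1}^{k} e^{\,x_{il}-\beta_i}},\qquad \beta_i=\max_{l} x_{il},$$

where subtracting the row maximum beta_i does not change the result but keeps the exponentials from overflowing.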

def value(self, x: np.ndarray) -> np.ndarray:
        n, k = x.shape
        beta = x.max(axis = 1).reshape((n, 1))
        tmp = np.exp(x - beta)
        numer = np.sum(tmp, axis = 1, keepdims = True)
        val = tmp / numer
        return val

Softmax’s Derivative

def derivative(self, x: np.ndarray) -> np.ndarray:
        n, k = x.shape
        D = np.zeros((k, k, n))
        for i in range(n):
            tmp = x[i:i+1, :]          # the i-th sample, kept as a (1, k) row
            val = self.value(tmp)      # softmax of this sample only
            D[:,:,i] = np.diag(val.reshape(-1)) - val.T.dot(val)
        return D
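What the loop builds, one sample at a time, is the softmax Jacobian (again my notation):

$$\frac{\partial \hat{y}_j}{\partial x_l}=\hat{y}_j(\delta_{jl}-\hat{y}_l),\qquad\text{i.e.}\qquad D=\operatorname{diag}(\hat{y})-\hat{y}\,\hat{y}^{\mathsf T},$$

which is exactly np.diag(val.reshape(-1)) - val.T.dot(val) when val is the (1, k) softmax row of the current sample; note that value() must be applied to tmp (the single row), not to the whole batch x.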

CrossEntropyLoss

    def value(self, yhat: np.ndarray, y: np.ndarray) -> float:
        yhat = np.clip(yhat, 0.0001, 0.9999) # clip, as in logistic regression, to avoid log(0)
        los = -np.mean(np.multiply(np.log(yhat), y) + np.multiply(np.log(1 - yhat), (1 - y)))
        return los

CrossEntropyLoss’s Derivative

    def derivative(self, yhat: np.ndarray, y: np.ndarray) -> np.ndarray:
        der = (yhat - y) / (yhat * (1 - yhat))
        return der
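For reference, this is the element-wise (binary) cross entropy averaged over all n×k entries, and its gradient with respect to each prediction:

$$L=-\frac{1}{nk}\sum_{i,j}\Big[y_{ij}\log\hat{y}_{ij}+(1-y_{ij})\log(1-\hat{y}_{ij})\Big],\qquad \frac{\partial L}{\partial\hat{y}_{ij}}\;\propto\;\frac{\hat{y}_{ij}-y_{ij}}{\hat{y}_{ij}(1-\hat{y}_{ij})}.$$

The code drops the constant 1/(nk) factor in the derivative, which only rescales the gradient (it can be absorbed into the learning rate).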

CEwithLogitLoss

Unlike the plain cross entropy above, this version takes the raw logits (the input of the softmax, not its output) together with the labels, and folds the softmax into the loss itself.
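Written out, the value() below computes the multi-class cross entropy in its numerically stable log-sum-exp form (with z the logits and beta_i the row maximum):

$$L=\frac{1}{n}\sum_{i=1}^{n}\Big[-\sum_{j} y_{ij}\,z_{ij}+\beta_i+\log\sum_{j} e^{\,z_{ij}-\beta_i}\Big],$$

which equals the usual $-\frac{1}{n}\sum_i\sum_j y_{ij}\log \mathrm{softmax}(z_i)_j$ because each one-hot row of y sums to 1.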

    def value(self, logits: np.ndarray, y: np.ndarray) -> float:
        n, k = y.shape
        beta = logits.max(axis = 1).reshape((n, 1))
        tmp = logits - beta
        tmp = np.exp(tmp)
        tmp = np.sum(tmp, axis = 1)
        tmp = np.log(tmp+1.0e-40)
        los = -np.sum(y*logits) + np.sum(beta) + np.sum(tmp)
        los = los / n
        return los

CEwithLogitLoss’s Derivative

    def derivative(self, logits: np.ndarray, y: np.ndarray) -> np.ndarray:
        n, k = y.shape
        beta = logits.max(axis = 1).reshape((n, 1))
        tmp = logits - beta
        tmp = np.exp(tmp)
        numer = np.sum(tmp, axis = 1, keepdims = True)
        yhat = tmp/numer
        der = (yhat - y) / n
        return der
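The payoff of combining softmax and cross entropy is that the gradient with respect to the logits collapses to a very simple form:

$$\frac{\partial L}{\partial z_{ij}}=\frac{1}{n}\big(\mathrm{softmax}(z_i)_j-y_{ij}\big),$$

which is why derivative() just recomputes the (stabilized) softmax and returns (yhat - y) / n.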

Once these pieces are clear, the rest is mostly bookkeeping.
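Before wiring these into the network, a cheap sanity check is to compare the analytic gradient of CEwithLogit against a central-difference estimate on random data. This is only a sketch of my own, assuming the CEwithLogit class from the full code below has already been defined:

import numpy as np

def numeric_grad(f, x, eps=1e-6):
    # central-difference estimate of df/dx for a scalar-valued f
    g = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        idx = it.multi_index
        old = x[idx]
        x[idx] = old + eps; fp = f(x)
        x[idx] = old - eps; fm = f(x)
        x[idx] = old
        g[idx] = (fp - fm) / (2 * eps)
        it.iternext()
    return g

loss = CEwithLogit()
rng = np.random.RandomState(0)
logits = rng.randn(4, 10)
y = np.eye(10)[rng.randint(0, 10, size=4)]    # 4 random one-hot labels
analytic = loss.derivative(logits, y)
numeric = numeric_grad(lambda z: loss.value(z, y), logits)
print(np.max(np.abs(analytic - numeric)))     # should be tiny, around 1e-8 or smaller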

Full Code

import numpy as np
import struct
import matplotlib.pyplot as plt
import os
from PIL import Image
from sklearn.utils import gen_batches
from sklearn.metrics import classification_report, confusion_matrix
from typing import *
from numpy.linalg import *


train_image_file = './data/data136845/train-images-idx3-ubyte'
train_label_file = './data/data136845/train-labels-idx1-ubyte'
test_image_file = './data/data136845/t10k-images-idx3-ubyte'
test_label_file = './data/data136845/t10k-labels-idx1-ubyte'


def decode_image(path):
    with open(path, 'rb') as f:
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
        images = np.fromfile(f, dtype=np.uint8).reshape(-1, 784)
        images = np.array(images, dtype = float)
    return images

def decode_label(path):
    with open(path, 'rb') as f:
        magic, n = struct.unpack('>II',f.read(8))
        labels = np.fromfile(f, dtype=np.uint8)
        labels = np.array(labels, dtype = float)
    return labels

def load_data():
    train_X = decode_image(train_image_file)
    train_Y = decode_label(train_label_file)
    test_X = decode_image(test_image_file)
    test_Y = decode_label(test_label_file)
    return (train_X, train_Y, test_X, test_Y)
trainX, trainY, testX, testY = load_data()

num_train, num_feature = trainX.shape
plt.figure(1, figsize=(20,10))
for i in range(8):
    idx = np.random.choice(range(num_train))
    plt.subplot(int('24'+str(i+1)))
    plt.imshow(trainX[idx,:].reshape((28,28)))
    plt.title('label is %d'%trainY[idx])
plt.show()

# normalize input value between 0 and 1.
trainX, testX = trainX/255, testX/255

# convert all scalar labels to one-hot vectors.
def to_onehot(y):
    y = y.astype(int)
    num_class = len(set(y))
    Y = np.eye((num_class))
    return Y[y]

trainY = to_onehot(trainY)
testY = to_onehot(testY)
num_train, num_feature = trainX.shape
num_test, _ = testX.shape
_, num_class = trainY.shape
print('number of features is %d'%num_feature)
print('number of classes is %d'%num_class)
print('number of training samples is %d'%num_train)
print('number of testing samples is %d'%num_test)
from abc import ABC, abstractmethod

class Activation(ABC):
    '''
    An abstract class that implements an activation function
    '''
    @abstractmethod
    def value(self, x: np.ndarray) -> np.ndarray:
        '''
        Value of the activation function when input is x.
        Parameters:
          x is an input to the activation function.
        Returns: 
          Value of the activation function. The shape of the return is the same as that of x.
        '''
        return x
    @abstractmethod
    def derivative(self, x: np.ndarray) -> np.ndarray:
        '''
        Derivative of the activation function with input x.
        Parameters:
          x is the input to activation function
        Returns: 
          Derivative of the activation function w.r.t x.
        '''
        return x

class Identity(Activation):
    '''
    Identity activation function. Input and output are identical. 
    '''

    def __init__(self):
        super(Identity, self).__init__()

    def value(self, x: np.ndarray) -> np.ndarray:
        return x
    
    def derivative(self, x: np.ndarray) -> np.ndarray:
        n, m = x.shape
        return np.ones((n, m))
    

class Sigmoid(Activation):
    '''
    Sigmoid activation function y = 1/(1 + e^(-k*x)), where k is the parameter of the sigmoid function 
    '''

    def __init__(self, k: float = 1.):
        '''
        Parameters:
          k is the parameter of the sigmoid function.
        '''
        self.k = k
        super(Sigmoid, self).__init__()

    def value(self, x: np.ndarray) -> np.ndarray:
        '''
        Parameters:
          x is a two dimensional numpy array.
        Returns:
          element-wise sigmoid value of the two dimensional array.
        '''
        '''
        #### YOUR CODE BELOW ####
        '''
        val = 1 / (1 + np.exp(np.negative(x * self.k)))
        return val

    def derivative(self, x: np.ndarray) -> np.ndarray:
        '''
        Parameters:
          x is a two dimensional array.
        Returns:
          a two dimensional array whose shape is the same as that of x. The returned value is the elementwise 
          derivative of the sigmoid function w.r.t. x.
        '''
        '''
        #### YOUR CODE BELOW ####
        '''
        val = 1 / (1 + np.exp(np.negative(x * self.k)))
        der = val * (1 - val)
        return der
    
class ReLU(Activation):
    '''
    Rectified linear unit activation function
    '''

    def __init__(self):
        super(ReLU, self).__init__()

    def value(self, x: np.ndarray) -> np.ndarray:
        '''
        #### YOUR CODE BELOW ####
        '''
        val = x*(x>=0)
        return val

    def derivative(self, x: np.ndarray) -> np.ndarray:
        '''
        The derivative of the ReLU function w.r.t. x. Set the derivative to 0 at x=0.
        Parameters:
          x is the input to ReLU function
        Returns:
          elementwise derivative of ReLU. The shape of the returned value is the same as that of x.
        '''
        '''
        #### YOUR CODE BELOW ####
        '''
        der = np.ones(x.shape)*(x>=0)
        return der


class Softmax(Activation):
    '''
    softmax nonlinear function.
    '''

    def __init__(self):
        '''
        There are no parameters in softmax function.
        '''
        super(Softmax, self).__init__()

    def value(self, x: np.ndarray) -> np.ndarray:
        '''
        Parameters:
          x is the input to the softmax function. x is a two dimensional numpy array. Each row is the input to the softmax function
        Returns:
          output of the softmax function. The returned value is with the same shape as that of x.
        '''
        '''
        #### YOUR CODE BELOW ####
        '''
        n, k = x.shape
        beta = x.max(axis = 1).reshape((n, 1))
        tmp = np.exp(x - beta)
        numer = np.sum(tmp, axis = 1, keepdims = True)
        val = tmp / numer
        return val

    def derivative(self, x: np.ndarray) -> np.ndarray:
        '''
        Parameters:
          x is the input to the softmax function. x is a two dimensional numpy array.
        Returns:
          a two dimensional array representing the derivative of softmax function w.r.t. x.
        '''
        n, k = x.shape
        D = np.zeros((k, k, n))
        for i in range(n):
            tmp = x[i:i+1, :]          # the i-th sample, kept as a (1, k) row
            val = self.value(tmp)      # softmax of this sample only
            D[:,:,i] = np.diag(val.reshape(-1)) - val.T.dot(val)
        return D

##################################################################################################################
# LOSS FUNCTIONS
##################################################################################################################

class Loss(ABC):
    '''
    Abstract class for a loss function
    '''
    @abstractmethod
    def value(self, yhat: np.ndarray, y: np.ndarray) -> float:
        '''
        Value of the empirical loss function.
        Parameters:
          y_hat is the output of a neural network. The shape of y_hat is (n, k).
          y contains true labels with shape (n, k).
        Returns:
          value of the empirical loss function.
        '''
        return 0

    @abstractmethod
    def derivative(self, yhat: np.ndarray, y: np.ndarray) -> np.ndarray:
        '''
        Derivative of the empirical loss function with respect to the predictions.
        Parameters:
          
        Returns:
          The derivative of the loss function w.r.t. y_hat. The returned value is a two dimensional array with 
          shape (n, k)
        '''
        return yhat

class CrossEntropy(Loss):
    '''
    Cross entropy loss function
    '''

    def value(self, yhat: np.ndarray, y: np.ndarray) -> float:
        '''
        #### YOUR CODE BELOW ####
        '''
        yhat = np.clip(yhat, 0.0001, 0.9999)  # clip, as in logistic regression, to avoid log(0)
        los = -np.mean(np.multiply(np.log(yhat), y) + np.multiply(np.log(1 - yhat), (1 - y)))
        return los

    def derivative(self, yhat: np.ndarray, y: np.ndarray) -> np.ndarray:
        '''
        #### YOUR CODE HERE ####
        '''

        der = (yhat - y) / (yhat * (1 - yhat))
        return der

class CEwithLogit(Loss):
    '''
    Cross entropy loss function with logits (input of softmax activation function) and true labels as inputs.
    '''
    def value(self, logits: np.ndarray, y: np.ndarray) -> float:
        '''
        #### YOUR CODE BELOW ####
        '''
        n, k = y.shape
        beta = logits.max(axis = 1).reshape((n, 1))
        tmp = logits - beta
        tmp = np.exp(tmp)
        tmp = np.sum(tmp, axis = 1)
        tmp = np.log(tmp+1.0e-40)
        los = -np.sum(y*logits) + np.sum(beta) + np.sum(tmp)
        los = los / n
        return los


    def derivative(self, logits: np.ndarray, y: np.ndarray) -> np.ndarray:
        '''
        #### YOUR CODE BELOW ####
        '''
        n, k = y.shape
        beta = logits.max(axis = 1).reshape((n, 1))
        tmp = logits - beta
        tmp = np.exp(tmp)
        numer = np.sum(tmp, axis = 1, keepdims = True)
        yhat = tmp/numer
        der = (yhat - y) / n
        return der
##################################################################################################################
# METRICS
##################################################################################################################

def accuracy(y_hat: np.ndarray, y: np.ndarray) -> float:
    '''
    Accuracy of predictions, given the true labels.
    Parameters:
      y_hat is a two dimensional array. Each row is a softmax output.
      y is a two dimensional array. Each row is a one-hot vector.
    Returns:
      accuracy which is a float number
    '''
    '''
    #### YOUR CODE HERE ####
    '''
    n = y.shape[0]
    acc = np.sum(np.argmax(y_hat, axis = 1) == np.argmax(y, axis = 1)) / n
    return acc

# the following code implements a three layer neural network, namely input layer, hidden layer and output layer
digits = 10 # number of classes
_, n_x = trainX.shape
n_h = 64 # number of nodes in the hidden layer
learning_rate = 0.0001

sigmoid = ReLU() # activation function in the hidden layer (a ReLU, despite the variable name)
softmax = Softmax() # nonlinear function in the output layer
loss = CEwithLogit() # loss function
epoches = 2000 

# initialization of W1, b1, W2, b2
W1 = np.random.randn(n_x, n_h)
b1 = np.random.randn(1, n_h)
W2 = np.random.randn(n_h, digits)
b2 = np.random.randn(1, digits)

# training procedure
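# A quick summary of the backward pass below (my own annotation, matching the code):
#   dZ2 = dL/dZ2 = (softmax(Z2) - Y) / n       -> loss.derivative(Z2, trainY)
#   dW2 = A1^T · dZ2,   db2 = column-wise sum of dZ2
#   dA1 = dZ2 · W2^T
#   dZ1 = dA1 * ReLU'(Z1)                      -> sigmoid.derivative(Z1) is the ReLU derivative here
#   dW1 = X^T · dZ1,    db1 = column-wise sum of dZ1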
for epoch in range(epoches):
    Z1 = np.dot(trainX, W1) + b1
    A1 = sigmoid.value(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = softmax.value(Z2)
    cost = loss.value(Z2, trainY)

    dZ2 = loss.derivative(Z2, trainY)
    dW2 = np.dot(A1.T, dZ2)
    db2 = np.sum(dZ2, axis = 0, keepdims=True)

    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * sigmoid.derivative(Z1) 
    dW1 = np.dot(trainX.T, dZ1)
    db1 = np.sum(dZ1, axis = 0, keepdims=True)

    W2 = W2 - learning_rate * dW2
    b2 = b2 - learning_rate * db2
    W1 = W1 - learning_rate * dW1
    b1 = b1 - learning_rate * db1


    if (epoch % 100 == 0):
        print("Epoch", epoch, "cost: ", cost)

print("Final cost:", cost)

# testing procedure
Z1 = np.dot(testX, W1) + b1
A1 = sigmoid.value(Z1)
Z2 = np.dot(A1, W2) + b2
A2 = softmax.value(Z2)

predictions = np.argmax(A2, axis = 1)
labels = np.argmax(testY, axis = 1)

print(confusion_matrix(predictions, labels))
print(classification_report(predictions, labels))

# design a neural network class

class NeuralNetwork():
    '''
    Fully connected neural network.
    Attributes:
      n_layers is the number of layers.
      activation is a list of Activation objects corresponding to each layer's activation function.
      loss is a Loss object corresponding to the loss function used to train the network.
      learning_rate is the learning rate.
      W is a list of weight matrix used in each layer.
      b is a list of biases used in each layer.
    '''

    def __init__(self, layer_size: List[int], activation: List[Activation], loss: Loss, learning_rate: float = 0.01) -> None:
        '''
        Initializes a NeuralNetwork object
        '''
        assert len(activation) == len(layer_size), \
        "Number of layer sizes provided does not equal the number of activations"
        self.layer_size = layer_size
        self.num_layer = len(layer_size)
        self.activation = activation
        self.loss = loss
        self.learning_rate = learning_rate
        self.W = []
        self.b = []
        for i in range(self.num_layer-1):
            W = np.random.randn(layer_size[i], layer_size[i+1]) #/ np.sqrt(layer_size[i])
            b = np.random.randn(1, layer_size[i+1])
            self.W.append(W)
            self.b.append(b)
        self.A = []
        self.Z = []

    def forward(self, X: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
        '''
        Forward pass of the network on a dataset of n examples with m features. Except for the first layer, each
        layer computes a linear transformation plus a bias, followed by a nonlinear transformation.
        Parameters:
          X is the training data with shape (n, m).
        Returns:
          Z[-1], A[-1]: the pre-activation input and the output of the last layer, each with shape
            (n, self.layer_size[-1]). The full per-layer lists Z and A (self.num_layer arrays each, the i-th of
            shape (n, self.layer_size[i])) are cached in self.Z and self.A for the backward pass.
        '''
        num_sample = X.shape[0]
        A, Z = [], []
        for i in range(self.num_layer):
            if i == 0:
                a = X.copy()
                z = X.copy()
            else:
                a = A[-1]
                z = a.dot(self.W[i-1]) + self.b[i-1]
                a = self.activation[i].value(z)
            Z.append(z)
            A.append(a)
        self.A = A
        self.Z = Z
        return Z[-1], A[-1]

    def backward(self, dLdyhat: np.ndarray) -> List[np.ndarray]:
        '''
        Backward pass of the network on a dataset of n examples with m features. The derivatives are computed from
          the end of the network to the front, using the per-layer inputs self.Z cached by the forward pass.
        Parameters:
          dLdyhat is the derivative of the empirical loss w.r.t. yhat, the output of the neural network.
            dLdyhat has shape (n, self.layer_size[-1]).
        Returns:
          dZ is a list of numpy arrays. Each array in dZ is the derivative of the empirical loss function
            w.r.t. the input of that specific layer. There are self.num_layer arrays in the list and the i-th
            array has shape (n, self.layer_size[i]). The list is also cached in self.dLdZ.
        '''
        dZ = []
        for i in range(self.num_layer-1, -1, -1):
            if i == self.num_layer - 1:
                dLdz = dLdyhat
            else:
                dLda = np.dot(dLdz, self.W[i].T)
                dLdz = self.activation[i].derivative(self.Z[i])*dLda# derivative w.r.t. net input for layer i
            dZ.append(dLdz)
        dZ = list(reversed(dZ))
        self.dLdZ = dZ
        return dZ

    def update_weights(self) -> List[np.ndarray]:
        '''
        Having computed the delta values (self.dLdZ) in the backward pass, update each weight with the sum over
        the training examples of the gradient of the loss with respect to that weight, scaled by the learning rate.
        Uses the layer inputs self.A cached by forward() and the deltas self.dLdZ cached by backward().
        :return: The newly updated weights (i.e. self.W)
        '''
        for i in range(self.num_layer-1):
            a = self.A[i]
            dW = np.dot(a.T, self.dLdZ[i+1])
            db = np.sum(self.dLdZ[i+1], axis = 0, keepdims = True)
            self.W[i] -= self.learning_rate*dW
            self.b[i] -= self.learning_rate*db
        return self.W
    
    def one_epoch(self, X: np.ndarray,  Y: np.ndarray, batch_size: int, train: bool = True)-> (float, float):
        '''
        One epoch of either training or testing procedure.
        Parameters:
          X is the data input. X is a two dimensional numpy array.
          Y is the data label. Y is a one dimensional numpy array.
          batch_size is the number of samples in each batch.
          train is a boolean value indicating training or testing procedure.
        Returns:
          loss_value is the average loss function value.
          acc_value is the prediction accuracy. 
        '''
        n = X.shape[0]
        slices = list(gen_batches(n, batch_size))
        num_batch = len(slices)
        idx = list(range(n))
        np.random.shuffle(idx)
        loss_value, acc_value = 0, 0
        for i, index in enumerate(slices):
            index = idx[slices[i]]
            x, y = X[index,:], Y[index]
            z, yhat = self.forward(x)   # Execute forward pass
            if train:
                dLdz = self.loss.derivative(z, y)         # Calculate derivative of the loss with respect to out
                self.backward(dLdz)     # Execute the backward pass to compute the deltas
                self.update_weights()  # Calculate the gradients and update the weights
            loss_value += self.loss.value(z, y)*x.shape[0]
            acc_value += accuracy(yhat, y)*x.shape[0]
        loss_value = loss_value/n
        acc_value = acc_value/n
        return loss_value, acc_value
def train(model : NeuralNetwork, X: np.ndarray, Y: np.ndarray, batch_size: int, epoches: int) -> (List[np.ndarray], List[float]):
    '''
    trains the neural network.
    Parameters:
      model is a NeuralNetwork object.
      X is the data input. X is a two dimensional numpy array.
      Y is the data label. Y is a one dimensional numpy array.
      batch_size is the number of samples in each batch.
      epoches is an integer, representing the number of epoches.
    Returns:
      epoch_loss is a list of float numbers, representing loss function value in all epoches.
      epoch_acc is a list of float numbers, representing the accuracies in all epoches.
    '''
    loss_value, acc = model.one_epoch(X, Y, batch_size, train = False)
    epoch_loss, epoch_acc = [loss_value], [acc]
    print('Initialization: ', 'loss %.4f  '%loss_value, 'accuracy %.2f'%acc)
    for epoch in range(epoches):
        if epoch%100 == 0 and epoch > 0: # decrease the learning rate
            model.learning_rate = min(model.learning_rate/10, 1.0e-5)
        loss_value, acc = model.one_epoch(X, Y, batch_size, train = True)
        if epoch%10 == 0:
            print("Epoch {}/{}: Loss={}, Accuracy={}".format(epoch, epoches, loss_value, acc))
        epoch_loss.append(loss_value)
        epoch_acc.append(acc)
    return epoch_loss, epoch_acc

# training procedure
num_sample, num_feature = trainX.shape
epoches = 200
batch_size = 512
Loss = []
Acc = []
learning_rate = batch_size / num_sample
np.random.seed(2022)
model = NeuralNetwork([784, 256, 64, 10], [Identity(), ReLU(), ReLU(), Softmax()], CEwithLogit(), learning_rate = learning_rate)
epoch_loss, epoch_acc = train(model, trainX, trainY, batch_size, epoches)

# testing procedure
test_loss, test_acc = model.one_epoch(testX, testY, batch_size, train = False)
z, yhat = model.forward(testX)
yhat = np.argmax(yhat, axis = 1)
y = np.argmax(testY, axis = 1)
print(yhat.shape, y.shape)
print(confusion_matrix(yhat, y))
print(classification_report(yhat, y))

The full code has two parts: one is the purely hand-rolled version, and the other rewrites the same pieces with the NeuralNetwork class.

Results

Initialization:  loss 786.6578   accuracy 0.09
Epoch 0/200: Loss=67.51685728447026, Accuracy=0.6034333333333334
Epoch 10/200: Loss=1.5569211739656414, Accuracy=0.6363833333333333
Epoch 20/200: Loss=1.195973422899852, Accuracy=0.6734166666666667
Epoch 30/200: Loss=1.0514650120496702, Accuracy=0.7037666666666667
Epoch 40/200: Loss=0.9529855833743788, Accuracy=0.7264666666666667
Epoch 50/200: Loss=0.9016336932334414, Accuracy=0.7411833333333333
Epoch 60/200: Loss=0.8326483342395156, Accuracy=0.7559
Epoch 70/200: Loss=0.78866026237516, Accuracy=0.7687
Epoch 80/200: Loss=0.7709660201411853, Accuracy=0.77445
Epoch 90/200: Loss=0.726955942954085, Accuracy=0.7852333333333333
Epoch 100/200: Loss=0.8591143145912691, Accuracy=0.7367
Epoch 110/200: Loss=0.5961169773508302, Accuracy=0.8235333333333333
Epoch 120/200: Loss=0.5877743327835988, Accuracy=0.8259833333333333
Epoch 130/200: Loss=0.5841449699894233, Accuracy=0.8258833333333333
Epoch 140/200: Loss=0.5818215242196157, Accuracy=0.8264
Epoch 150/200: Loss=0.5801588790387723, Accuracy=0.8270833333333333
Epoch 160/200: Loss=0.5788907581218291, Accuracy=0.8273833333333334
Epoch 170/200: Loss=0.5778682345964669, Accuracy=0.8274666666666667
Epoch 180/200: Loss=0.577021751676371, Accuracy=0.82755
Epoch 190/200: Loss=0.5762952337956709, Accuracy=0.8273833333333334
(10000,) (10000,)
[[ 891    0   23   10    0   20   25    7   28   10]
 [   0 1062   11    3    4    2    3    7    5    7]
 [   7   12  848   35   20    9   21   27   27    6]
 [   2   18   16  810    7   70    1    8   31   22]
 [   3    1   19    6  757   14   18   11    8  101]
 [  22    6   13   75   10  616   19    2   53   19]
 [  23    2   20    1   35   46  849    0   19    8]
 [   7    3   21   19    6   10    1  872   14   42]
 [  19   22   45   24    6   62   14    9  756   17]
 [   6    9   16   27  137   43    7   85   33  777]]
              precision    recall  f1-score   support

           0       0.91      0.88      0.89      1014
           1       0.94      0.96      0.95      1104
           2       0.82      0.84      0.83      1012
           3       0.80      0.82      0.81       985
           4       0.77      0.81      0.79       938
           5       0.69      0.74      0.71       835
           6       0.89      0.85      0.87      1003
           7       0.85      0.88      0.86       995
           8       0.78      0.78      0.78       974
           9       0.77      0.68      0.72      1140

    accuracy                           0.82     10000
   macro avg       0.82      0.82      0.82     10000
weighted avg       0.82      0.82      0.82     10000
