For a multi-class problem, the class label $y\in\{1,2,\cdots,C\}$ can take $C$ values. Given a sample $\mathbf{x}$, softmax regression predicts the conditional probability of class $c$ as
$$p(y=c\mid\mathbf{x})=\mathrm{softmax}(\mathbf{w}_c^{\mathrm T}\mathbf{x})=\frac{e^{\mathbf{w}_c^{\mathrm T}\mathbf{x}}}{\sum_{c'=1}^{C}e^{\mathbf{w}_{c'}^{\mathrm T}\mathbf{x}}}$$
where $\mathbf{w}_c$ is the weight vector of class $c$.
The decision function of softmax regression can be written as
$$\hat{y}=\overset{C}{\underset{c=1}{\mathrm{argmax}}}\ p(y=c\mid\mathbf{x})=\overset{C}{\underset{c=1}{\mathrm{argmax}}}\ \mathbf{w}_c^{\mathrm T}\mathbf{x}$$
The argmax can be taken over the raw scores $\mathbf{w}_c^{\mathrm T}\mathbf{x}$ because softmax is monotonically increasing in each score.
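To make these two formulas concrete, here is a tiny numeric sketch (the weight matrix and input below are made-up values for illustration): it evaluates the softmax posterior $p(y=c\mid\mathbf{x})$ for every class and then takes the argmax decision.

import numpy as np
# Made-up weights (one column w_c per class) and a single input x
W = np.array([[ 1.0, -0.5,  0.2],
              [-0.3,  0.8,  0.1]])   # shape (2 features, 3 classes)
x = np.array([2.0, -1.0])
scores = W.T @ x                     # w_c^T x for each class c
e = np.exp(scores - scores.max())    # subtract the max for numerical stability
probs = e / e.sum()                  # softmax posterior over the 3 classes
print(probs, probs.sum())            # probabilities, summing to 1
print(scores.argmax())               # decision: argmax_c w_c^T x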
It is easy to see that when $C=2$ this reduces to the logistic regression discussed earlier.
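A quick numeric check of this reduction (my own sketch with made-up vectors, not part of the original text): for $C=2$, the softmax probability of class 1 equals the logistic sigmoid applied to the score difference $(\mathbf{w}_1-\mathbf{w}_2)^{\mathrm T}\mathbf{x}$.

import numpy as np
w1 = np.array([0.5, -1.0])                         # made-up class-1 weights
w2 = np.array([-0.2, 0.3])                         # made-up class-2 weights
x  = np.array([1.5, 2.0])
s  = np.array([w1 @ x, w2 @ x])                    # the two class scores
p1_softmax = np.exp(s[0]) / np.exp(s).sum()        # 2-class softmax for class 1
p1_sigmoid = 1.0 / (1.0 + np.exp(-(w1 - w2) @ x))  # sigma((w1 - w2)^T x)
print(np.isclose(p1_softmax, p1_sigmoid))          # True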
Gradient descent can still be used to optimize the parameters.
Given $N$ training samples $\{(\mathbf{x}^{(n)},y^{(n)})\}_{n=1}^{N}$, we use the cross-entropy loss to optimize the parameter matrix $\mathbf{W}$. For convenience, the class label is represented by a $C$-dimensional one-hot vector $\mathbf{y}\in\{0,1\}^{C}$.
The risk function is
$$\mathcal{R}(\mathbf{W})=-\frac{1}{N}\sum_{n=1}^{N}(\mathbf{y}^{(n)})^{\mathrm T}\log\hat{\mathbf{y}}^{(n)}$$
where $\hat{\mathbf{y}}^{(n)}$ is the vector of posterior probabilities of sample $n$ over the classes.
The gradient of the risk function with respect to $\mathbf{W}$ is
$$\frac{\partial\mathcal{R}(\mathbf{W})}{\partial\mathbf{W}}=-\frac{1}{N}\sum_{n=1}^{N}\mathbf{x}^{(n)}\left(\mathbf{y}^{(n)}-\hat{\mathbf{y}}^{(n)}\right)^{\mathrm T}$$
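Before turning to the full implementation, this analytic gradient can be verified against a central-difference estimate of $\mathcal{R}(\mathbf{W})$. The check below is a sketch of my own on small random data, not part of the original walkthrough.

import numpy as np
def risk(W, X, Y):
    # Cross-entropy risk R(W): X is (N, D), Y is (N, C) one-hot, W is (D, C)
    S = X @ W
    P = np.exp(S - S.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    return -np.mean(np.sum(Y * np.log(P), axis=1))
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
Y = np.eye(3)[rng.integers(0, 3, size=5)]   # random one-hot labels
W = rng.normal(size=(2, 3))
# Analytic gradient: -(1/N) sum_n x^(n) (y^(n) - yhat^(n))^T
S = X @ W
P = np.exp(S - S.max(axis=1, keepdims=True)); P /= P.sum(axis=1, keepdims=True)
grad = -(X.T @ (Y - P)) / X.shape[0]
# Central-difference estimate, one entry of W at a time
eps, num = 1e-6, np.zeros_like(W)
for i in range(W.shape[0]):
    for j in range(W.shape[1]):
        E = np.zeros_like(W); E[i, j] = eps
        num[i, j] = (risk(W + E, X, Y) - risk(W - E, X, Y)) / (2 * eps)
print(np.allclose(grad, num, atol=1e-6))    # True: the formulas agree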
Training can therefore be carried out by gradient descent, updating $\mathbf{W}\leftarrow\mathbf{W}-\alpha\,\frac{\partial\mathcal{R}(\mathbf{W})}{\partial\mathbf{W}}$ with learning rate $\alpha$.
Below we implement multi-class classification with softmax regression. The program generates a three-class, two-feature dataset, shown in the figure, with class means $(2.5,-2.5)$, $(0,5)$, and $(-5,-5)$.
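The MakeData helper is only referenced through the link above and its code is not reproduced here. As a hypothetical stand-in, Gaussian clusters around the stated means could be generated as below (the linked generator may differ in details such as the cluster spread, or whether 500 means samples per class or in total); saved as makedata.py, it lets the script that follows run end to end.

import numpy as np
class MakeData:
    # Hypothetical stand-in for the linked data generator
    def __init__(self, num_classes, num_features, num_samples, means):
        self.num_classes = num_classes
        self.num_features = num_features
        self.num_samples = num_samples          # assumed: samples per class
        self.means = np.asarray(means, dtype=float)
    def produce_data(self):
        rng = np.random.default_rng(0)
        # Unit-variance Gaussian cluster around each class mean (assumed spread)
        X = np.vstack([rng.normal(m, 1.0, size=(self.num_samples, self.num_features))
                       for m in self.means])
        y = np.repeat(np.arange(self.num_classes), self.num_samples)
        return X, y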
import numpy as np
from makedata import MakeData  # data generator, see the "generate dataset" link above
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
def softmax(z):
    # Row-wise softmax; subtracting the row max avoids overflow in exp
    e_z = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e_z / np.sum(e_z, axis=-1, keepdims=True)
def one_hot_encode(y, num_classes):
    # Convert integer labels to one-hot encoding
    num_samples = y.shape[0]
    one_hot = np.zeros((num_samples, num_classes))
    one_hot[np.arange(num_samples), y] = 1
    return one_hot
class SoftmaxRegression:
    def __init__(self, num_classes, num_features):
        self.num_classes = num_classes
        self.num_features = num_features
        self.w = np.zeros((num_features, num_classes))

    def train(self, X, y, learning_rate=0.01, num_iterations=100):
        num_samples = X.shape[0]
        y_enc = one_hot_encode(y, self.num_classes)
        for i in range(num_iterations):
            # Forward pass: class scores and posterior probabilities
            scores = np.dot(X, self.w)
            prob = softmax(scores)
            # Backward pass: gradient of the cross-entropy risk
            gradient = (1 / num_samples) * np.dot(X.T, (prob - y_enc))
            # Weight update
            self.w -= learning_rate * gradient

    def predict(self, X):
        scores = np.dot(X, self.w)
        prob = softmax(scores)
        return np.argmax(prob, axis=1)
if __name__ == '__main__':
    # Create a softmax regression model with 3 classes and 2 features
    model = SoftmaxRegression(num_classes=3, num_features=2)
    M = [[2.5, -2.5], [0, 5], [-5, -5]]
    data = MakeData(3, 2, 500, M)
    X, y = data.produce_data()
    y = y.astype(int)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
    # Train the model
    model.train(X_train, y_train)
    # Predict on the held-out test set with the trained model
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(accuracy)  # print the test-set accuracy
Running the script prints the test-set accuracy. As the output shows, the classification performance is quite good.