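Logistic regression is also called "log-odds regression". Despite the word "regression" in its name, it is a classification method. It can be extended from binary classification to multiclass classification. That extension is multinomial logistic regression, also known as softmax regression.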
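Here is a minimal sketch of the multiclass case. It assumes scikit-learn is installed and uses the bundled iris dataset (three classes) purely for illustration. In recent scikit-learn releases, `LogisticRegression` with the default `lbfgs` solver fits a multinomial (softmax) model when the target has more than two classes. Each row of `predict_proba` is then a probability distribution over all classes.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Iris: 150 samples, 4 features, 3 classes (labels 0, 1, 2)
X, y = load_iris(return_X_y=True)

# With the default lbfgs solver and more than two classes, recent
# scikit-learn versions fit a multinomial (softmax) logistic regression
clf = LogisticRegression(max_iter=1000).fit(X, y)

proba = clf.predict_proba(X[:5])  # shape (5, 3): one column per class
row_sums = proba.sum(axis=1)      # each row is a probability distribution

print(clf.classes_)
print(np.round(proba, 3))
print(row_sums)
```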
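1. Construct the prediction function h(x)
The log-odds function is a sigmoid function, $\mathrm{sig}(t)=\frac{1}{1+e^{-t}}$. The prediction function is $h_\theta(x)=\mathrm{sig}(\theta^T x)$. Equivalently, $\ln\frac{h_\theta(x)}{1-h_\theta(x)}=\theta^T x$: the log-odds are linear in x, which is where the name "log-odds regression" comes from. The sigmoid output lies in the open interval (0, 1), with midpoint sig(0) = 0.5. If sig(t) < 0.5, the sample is assigned to the negative class (class 0). If sig(t) > 0.5, it is assigned to the positive class (class 1). The sigmoid output can therefore be read as the probability that the sample belongs to the positive class, $P(y=1\mid x;\theta)$.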
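The decision rule above can be sketched in NumPy as follows. The parameters `theta` and the two samples in `X` are hypothetical values chosen for illustration. The text does not say what happens at exactly 0.5, so sending ties to class 1 here is an arbitrary choice.

```python
import numpy as np

def sig(t):
    # Logistic (sigmoid) function: maps any real t into (0, 1)
    return 1.0 / (1.0 + np.exp(-t))

def predict(theta, X, threshold=0.5):
    # h(x) = sig(theta^T x), computed for every row of X
    h = sig(X @ theta)
    # Label 1 when h >= threshold; ties at exactly 0.5 go to class 1 here
    return (h >= threshold).astype(int), h

theta = np.array([1.0, -2.0])            # hypothetical parameters
X = np.array([[3.0, 1.0], [1.0, 1.0]])   # two hypothetical samples
labels, h = predict(theta, X)

print(sig(0.0))   # 0.5, the midpoint
print(h)          # theta^T x is 1 and -1, so h is about 0.731 and 0.269
print(labels)     # [1 0]
```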
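2. Construct the loss function
The parameters are estimated by maximum likelihood. Its advantage is behavior on large samples: the maximum-likelihood estimate is stable, with small bias and small variance, because it is consistent and asymptotically efficient.
Probability of a single sample, where $h_\theta(x)=\mathrm{sig}(\theta^T x)$ is the prediction function from step 1 and $y\in\{0,1\}$:
$$P(y=1\mid x;\theta)=h_\theta(x),\qquad P(y=0\mid x;\theta)=1-h_\theta(x)$$
or, in one expression,
$$P(y\mid x;\theta)=h_\theta(x)^{y}\,\bigl(1-h_\theta(x)\bigr)^{1-y}$$
Because the m samples are independent, the likelihood function is:
$$L(\theta)=\prod_{i=1}^{m}h_\theta(x^{(i)})^{y^{(i)}}\bigl(1-h_\theta(x^{(i)})\bigr)^{1-y^{(i)}}$$
Taking logarithms gives the log-likelihood:
$$\ell(\theta)=\sum_{i=1}^{m}\Bigl[y^{(i)}\log h_\theta(x^{(i)})+\bigl(1-y^{(i)}\bigr)\log\bigl(1-h_\theta(x^{(i)})\bigr)\Bigr]$$
Maximum likelihood then yields the per-sample Cost function and the objective J:
$$\mathrm{Cost}(h_\theta(x),y)=\begin{cases}-\log h_\theta(x), & y=1\\-\log\bigl(1-h_\theta(x)\bigr), & y=0\end{cases}$$
$$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}\mathrm{Cost}\bigl(h_\theta(x^{(i)}),y^{(i)}\bigr)=-\frac{1}{m}\ell(\theta)$$
Maximizing $\ell(\theta)$ is therefore equivalent to minimizing $J(\theta)$.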
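The log-likelihood and J(θ) can be computed directly, as in the sketch below. The four samples are hypothetical, and the first column of `X` is the intercept term x_0 = 1. To stay short, the sketch does not clip h, so extreme θ values could hit log(0). When θ = 0, every h equals 0.5 and J reduces to log 2.

```python
import numpy as np

def sig(t):
    return 1.0 / (1.0 + np.exp(-t))

def log_likelihood(theta, X, y):
    # l(theta) = sum_i [ y_i log h(x_i) + (1 - y_i) log(1 - h(x_i)) ]
    h = sig(X @ theta)
    return np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

def J(theta, X, y):
    # J(theta) = -(1/m) l(theta), the average of the per-sample Cost
    return -log_likelihood(theta, X, y) / len(y)

# Hypothetical data: first column is the intercept term x_0 = 1
X = np.array([[1.0, 0.5], [1.0, -1.0], [1.0, 2.0], [1.0, -0.5]])
y = np.array([1, 0, 1, 0])

print(J(np.zeros(2), X, y))            # every h = 0.5, so J = log 2 (about 0.693)
print(J(np.array([0.0, 2.0]), X, y))   # parameters that fit better give a lower J
```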
3. Minimize J(θ) with gradient descent
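Differentiating J(θ) gives the batch update $\theta_j := \theta_j - \alpha\frac{1}{m}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)})-y^{(i)}\bigr)x_j^{(i)}$, where α is the learning rate. The sketch below implements this update in vectorized form. The learning rate `alpha=0.1`, the iteration count, and the 1-D data are all hypothetical choices. The classes overlap, so the minimizer is finite.

```python
import numpy as np

def sig(t):
    return 1.0 / (1.0 + np.exp(-t))

def cost(theta, X, y):
    # J(theta) = -(1/m) * sum_i [ y_i log h_i + (1 - y_i) log(1 - h_i) ]
    h = sig(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def gradient_descent(X, y, alpha=0.1, n_iters=5000):
    """Batch gradient descent on J(theta).

    X: (m, n) design matrix whose first column is all ones (intercept).
    y: (m,) array of 0/1 labels.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(n_iters):
        h = sig(X @ theta)
        grad = X.T @ (h - y) / m   # component j: (1/m) sum_i (h_i - y_i) x_ij
        theta = theta - alpha * grad
    return theta

# Hypothetical 1-D data; the classes overlap, so the minimizer is finite
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 1, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones_like(x), x])

theta = gradient_descent(X, y)
print("theta:", theta)
print("J at start:", cost(np.zeros(2), X, y), "J at end:", cost(theta, X, y))
```

J(θ) is convex, so with a small enough α the iterates approach the global minimum. The printed cost should fall from log 2 at θ = 0.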
Improving the model
Avoiding overfitting: regularization
Maximizing the classification margin: SVM
Kernel Logistic Regression
For details, see this article by another author: https://blog.csdn.net/qq_34993631/article/details/79345889
Accompanying video:
https://www.youtube.com/watch?v=AbaIkcQUQuo
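As far as I know, scikit-learn has no dedicated kernel logistic regression estimator. The sketch below therefore substitutes a common approximation: it maps the inputs through an approximate RBF kernel feature map (`Nystroem`) and fits an ordinary linear `LogisticRegression` in that feature space. The concentric-circles data and the `gamma` and `n_components` values are illustrative assumptions. This is not exact kernel logistic regression.

```python
from sklearn.datasets import make_circles
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Two concentric circles: no straight line separates the classes
X, y = make_circles(n_samples=400, noise=0.05, factor=0.4, random_state=0)

# Plain (linear) logistic regression for comparison
linear = LogisticRegression().fit(X, y)

# Approximate kernel LR: approximate RBF feature map, then linear LR on top
kernel_lr = make_pipeline(
    Nystroem(kernel="rbf", gamma=2.0, n_components=100, random_state=0),
    LogisticRegression(max_iter=1000),
).fit(X, y)

print("linear LR training accuracy:", linear.score(X, y))
print("approx. kernel LR training accuracy:", kernel_lr.score(X, y))
```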
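Comparison of L1 and L2 regularization
L2 shrinks the weights toward zero; L1 produces sparse weights, with many coefficients exactly zero.
L2 is usually faster to optimize than L1, because the L2 penalty is differentiable everywhere while the L1 penalty is not differentiable at zero.
When using logistic regression, apply at least one form of regularization and standardize the features. The penalty treats all coefficients alike, so the features should be on comparable scales.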
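The sketch below puts this advice into practice: it standardizes the features and then compares the two penalties by counting coefficients that are exactly zero. It uses scikit-learn's bundled breast-cancer dataset and `C=0.1` (the inverse regularization strength) as illustrative choices. The `liblinear` solver is used because it supports both `"l1"` and `"l2"`.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # 30 features

def fit(penalty):
    # Standardize first, then fit regularized logistic regression
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty=penalty, C=0.1, solver="liblinear"),
    )
    return model.fit(X, y)

zeros = {}
for penalty in ("l1", "l2"):
    coef = fit(penalty)[-1].coef_.ravel()  # coefficients of the final step
    zeros[penalty] = int(np.sum(coef == 0))
    print(penalty, "zero coefficients:", zeros[penalty], "of", coef.size)
```

With L1 some coefficients come out exactly zero, which amounts to feature selection. With L2 the coefficients are only shrunk.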
Comparison of linear regression and logistic regression
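One concrete difference shows up when linear regression is fit directly to 0/1 labels. Its outputs are unbounded real numbers and can fall below 0 or above 1. Logistic regression passes θ^T x through the sigmoid, so it returns probabilities. The data points in the sketch below are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical 1-D data with 0/1 labels
x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0], [20.0]])
y = np.array([0, 0, 0, 1, 1, 1, 1])

lin_reg = LinearRegression().fit(x, y)
log_reg = LogisticRegression().fit(x, y)

x_new = np.array([[-20.0], [30.0]])
lin_out = lin_reg.predict(x_new)              # real-valued, unbounded
log_out = log_reg.predict_proba(x_new)[:, 1]  # P(y = 1 | x), bounded to [0, 1]

print("linear regression:  ", lin_out)
print("logistic regression:", log_out)
```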
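Comparison of logistic regression and naive Bayes
Naive Bayes makes a stronger assumption about the data: the features are conditionally independent given the class. In exchange, it needs fewer training examples to estimate its parameters. Logistic regression makes no such independence assumption.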
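To check the sample-size trade-off on a real dataset, you can train both models on growing training sets and score them on a fixed test set, as sketched below. The sketch assumes scikit-learn's `GaussianNB` (naive Bayes with Gaussian features) and its bundled breast-cancer dataset. The training sizes are illustrative. Which model wins at each size depends on the data and the split, so no particular outcome is claimed here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
# Hold out a fixed test set; the rest is a pool to draw training sets from
X_pool, X_test, y_pool, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

results = []
for n in (20, 50, 100, 200):
    # Stratified subsample of n training examples (both classes present)
    X_n, _, y_n, _ = train_test_split(
        X_pool, y_pool, train_size=n, stratify=y_pool, random_state=0)
    nb = GaussianNB().fit(X_n, y_n)
    lr = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_n, y_n)
    results.append((n, nb.score(X_test, y_test), lr.score(X_test, y_test)))

for n, nb_acc, lr_acc in results:
    print(f"n={n:3d}  GaussianNB={nb_acc:.3f}  LogisticRegression={lr_acc:.3f}")
```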
Example
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve,roc_auc_score,accuracy_score,confusion_matrix
from sklearn.linear_model import LogisticRegression
candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,710,680,770,610,580,650,540,590,620,600,550,550,570,670,660,580,650,660,640,620,660,660,680,650,670,580,590,690],
'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,3.3,3.3,3,2.7,3.7,2.7,2.3,3.3,2,2.3,2.7,3,3.3,3.7,2.3,3.7,3.3,3,2.7,4,3.3,3.3,2.3,2.7,3.3,1.7,3.7],
'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,3,2,1,4,1,2,6,4,2,6,5,1,2,4,6,5,1,2,1,4,5],
'admitted': [1,1,1,1,1,1,0,1,1,0,0,1,1,1,1,0,0,1,0,0,0,0,0,0,0,1,1,0,1,1,0,0,1,1,1,0,0,0,0,1]
}
df = pd.DataFrame(candidates,columns= ['gmat', 'gpa','work_experience','admitted'])
X = df[['gmat', 'gpa','work_experience']]
y = df['admitted']
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.25,random_state=0) #train is based on 75% of the dataset, test is based on 25% of dataset
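# Logistic regression; max_iter is raised from the default (100) in case
# lbfgs needs more iterations on these unscaled features
logistic_regression = LogisticRegression(max_iter=1000)
logistic_regression.fit(X_train, y_train)  # train
y_pred = logistic_regression.predict(X_test)  # predict
print(X_test)  # test dataset
print(y_pred)  # predicted values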
print(confusion_matrix(y_test, y_pred))
print("Accuracy:",accuracy_score(y_test, y_pred))
# Plot the ROC curve
y_pred_proba = logistic_regression.predict_proba(X_test)[:, 1]  # P(admitted = 1)
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
auc = roc_auc_score(y_test, y_pred_proba)
plt.plot(fpr, tpr, label=f"data 1, auc={auc:.3f}")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend(loc=4)
plt.show()
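The vertical axis of the ROC curve is the true positive rate, TPR = TP / (TP + FN). The horizontal axis is the false positive rate, FPR = FP / (FP + TN).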
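Each point on the ROC curve is one (FPR, TPR) pair, computed by predicting class 1 whenever the score is at or above some threshold. The sketch below recomputes those pairs by hand from the two formulas and checks them against scikit-learn's `roc_curve`. The labels and scores are hypothetical. `roc_curve` may prepend a sentinel threshold (infinity in recent versions), which is skipped here.

```python
import numpy as np
from sklearn.metrics import roc_curve

def tpr_fpr(y_true, scores, threshold):
    # Predict 1 when score >= threshold, then apply TPR = TP/(TP+FN), FPR = FP/(FP+TN)
    y_true = np.asarray(y_true)
    pred = (np.asarray(scores) >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y_true == 1))
    fn = np.sum((pred == 0) & (y_true == 1))
    fp = np.sum((pred == 1) & (y_true == 0))
    tn = np.sum((pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

# Hypothetical labels and predicted probabilities of class 1
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.6, 0.9]

fpr, tpr, thresholds = roc_curve(y_true, scores)
for f, t, thr in zip(fpr, tpr, thresholds):
    if np.isfinite(thr):
        print(f"threshold={thr:.2f}  by hand (TPR, FPR)={tpr_fpr(y_true, scores, thr)}  roc_curve=({t:.3f}, {f:.3f})")
```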