机器学习学习记录1(环境安装,书中代码测试,基于mnist数据集的逻辑回归模型构造训练评估(acc,rec,pre))

1.环境安装

安装anaconda

wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
chmod +x Anaconda3-2022.05-Linux-x86_64.sh 
./Anaconda3-2022.05-Linux-x86_64.sh

点击回车

yes

继续点击 Enter

输入 yes(这里我写错了不是yex是yes)

机器学习学习记录1(环境安装,书中代码测试,基于mnist数据集的逻辑回归模型构造训练评估(acc,rec,pre))_第1张图片

 重新加载

source ~/.bashrc
conda create -n jqxx python=3.7

完成后

sudo su

进入bash环境

再输入

conda activate jqxx

 进入

 更换pip镜像源

pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

然后下载绘图库

pip install matplotlib

 下载pytorch开源机器学习框架

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch

yes

测试安装成功

首先输入: python 然后在输入:import torch

机器学习学习记录1(环境安装,书中代码测试,基于mnist数据集的逻辑回归模型构造训练评估(acc,rec,pre))_第2张图片

 ctrl+D退出

下载数据集

conda install -c anaconda scikit-learn

更新

conda update scikit-learn

 ctrl+D退出

环境回到bash

这里我直接下载pycharm的免费版在ubuntu的应用商店里下载

下载完成后我试用《机器学习原理及应用》机器学习学习记录1(环境安装,书中代码测试,基于mnist数据集的逻辑回归模型构造训练评估(acc,rec,pre))_第3张图片

 书上的p18页代码

原代码如下

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
cancer=load_breast_cancer()
X_train,X_test,y_train,y_test = train_test_split(cancer.data,cancer.target,test_size=0.2)
model=LogisticRegression()
model.fit(X_train,y_train)
train.score=model.score(X_train,y_train)
test_score=model.score(X_test,y_test)
print('train score:{train_score:.6f};test score:{test_score:.6f}'.format(train_score=train_score,test_score=test_score))

用python在jqxx环境运行后报错

机器学习学习记录1(环境安装,书中代码测试,基于mnist数据集的逻辑回归模型构造训练评估(acc,rec,pre))_第4张图片

 查找资料发现

ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
这个警告是使用sklearn中的LogisticRegression出现的LogisticRegression里有一个max_iter(最大迭代次数)可以设置
————————————————
版权声明:本文为CSDN博主「龙晨天」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/qq_30759585/article/details/106119180

所以这里我们修改原代码第6行

model=LogisticRegression()

model=LogisticRegression(max_iter=3000)

再次运行代码

错误减少了说明修改正确

仔细查看代码发现写错了错把下划线写成点号修改第8行 为

train_score=model.score(X_train,y_train)

得出结果

接下来我用mnist数据集来测试、

1.按照资料的老方法

找到你的项目所在目录,创建一个mldata文件夹

然后下载mnist数据集的mat文件放在该目录

用度盘下载(安装度盘时可以用deb的安装包用代码

 sudo dpkg -i 文件名

安装)到文件夹再移动到你的py文件所在目录

再用代码测试

from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')
mnist

 报错

 查找资料发现(原网址(1条消息) fetch_mldata报错ImportError: cannot import name ‘fetch_mldata‘ from ‘sklearn.datasets‘_鸾镜朱颜暗换的博客-CSDN博客

代码中不再适用fetch_mldata()将之替换为fetch_openml()

创建新文件写入代码

from sklearn.datasets import fetch_openml
dataset = fetch_openml("mnist_784")

(如果有

报错就

pip install pandas

 再测试)

没有报错

所以修改代码为

from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
cancer=fetch_openml("mnist_784")
X_train,X_test,y_train,y_test = train_test_split(cancer.data,cancer.target,test_size=0.2)
model=LogisticRegression(max_iter=3000)
model.fit(X_train,y_train)
train_score=model.score(X_train,y_train)
test_score=model.score(X_test,y_test)
print('train score:{train_score:.6f};test score:{test_score:.6f}'.format(train_score=train_score,test_score=test_score))

进行测试

需要等一会不要急

得出结果

机器学习学习记录1(环境安装,书中代码测试,基于mnist数据集的逻辑回归模型构造训练评估(acc,rec,pre))_第5张图片

 去掉代码里的max限制

代码为

from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
cancer=fetch_openml("mnist_784")
X_train,X_test,y_train,y_test = train_test_split(cancer.data,cancer.target,test_size=0.2)
model=LogisticRegression()
model.fit(X_train,y_train)
train_score=model.score(X_train,y_train)
test_score=model.score(X_test,y_test)
print('train score:{train_score:.6f};test score:{test_score:.6f}'.format(train_score=train_score,test_score=test_score))

得出结果

 加入模型评估算法测试并修改测试

from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
cancer=fetch_openml("mnist_784")
X_train,X_test,y_train,y_test = train_test_split(cancer.data,cancer.target,test_size=0.2)
model=LogisticRegression()
model.fit(X_train,y_train)
y_pred=model.predict(X_test)
accury_score_value=accuracy_score(y_test,y_pred,average='macro')
recall_score_value=recall_score(y_test,y_pred,average='macro')
precision_score_value=precision_score(y_test,y_pred,average='macro')
classification_report_value=classification_report(y_test,y_pred,average='macro')
print("acc:",accuracy_score_value)
print("rec:",recall_score_value)
print("pre:",precision_score_value)
print(classification_report_value)

报错

查找资料我发现需要添加一个average的值(资料链接(1条消息) 【ValueError: Target is multiclass but average=‘binary‘. Please choose another average setting, one 】_有情怀的机械男的博客-CSDN博客

)于是更改recall为

recall_score_value=recall_score(y_test,y_pred,average='macro')

 再次测试

 发现问题

再用同样的方法修改测试后发现报错

 拼写错误

修改最终成品代码为

from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
cancer=fetch_openml("mnist_784")
X_train,X_test,y_train,y_test = train_test_split(cancer.data,cancer.target,test_size=0.2)
model=LogisticRegression()
model.fit(X_train,y_train)
y_pred=model.predict(X_test)
accuracy_score_value=accuracy_score(y_test,y_pred)
recall_score_value=recall_score(y_test,y_pred,average='macro')
precision_score_value=precision_score(y_test,y_pred,average='macro')
classification_report_value=classification_report(y_test,y_pred)
print("acc:",accuracy_score_value)
print("rec:",recall_score_value)
print("pre:",precision_score_value)
print(classification_report_value)

运算结果为

机器学习学习记录1(环境安装,书中代码测试,基于mnist数据集的逻辑回归模型构造训练评估(acc,rec,pre))_第6张图片

你可能感兴趣的:(机器学习,学习,逻辑回归)