安装anaconda
wget https://repo.anaconda.com/archive/Anaconda3-2022.05-Linux-x86_64.sh
chmod +x Anaconda3-2022.05-Linux-x86_64.sh
./Anaconda3-2022.05-Linux-x86_64.sh
yes
重新加载
source ~/.bashrc
conda create -n jqxx python=3.7
完成后
sudo su
进入bash环境
再输入
conda activate jqxx
进入
更换pip镜像源
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
然后下载绘图库
pip install matplotlib
下载pytorch开源机器学习框架
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
yes
首先输入: python 然后在输入:import torch
ctrl+D退出
下载数据集
conda install -c anaconda scikit-learn
更新
conda update scikit-learn
ctrl+D退出
环境回到bash
这里我直接下载pycharm的免费版在ubuntu的应用商店里下载
书上的p18页代码
原代码如下
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
cancer=load_breast_cancer()
X_train,X_test,y_train,y_test = train_test_split(cancer.data,cancer.target,test_size=0.2)
model=LogisticRegression()
model.fit(X_train,y_train)
train.score=model.score(X_train,y_train)
test_score=model.score(X_test,y_test)
print('train score:{train_score:.6f};test score:{test_score:.6f}'.format(train_score=train_score,test_score=test_score))
用python在jqxx环境运行后报错
查找资料发现
ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)
这个警告是使用sklearn中的LogisticRegression出现的LogisticRegression里有一个max_iter(最大迭代次数)可以设置
————————————————
版权声明:本文为CSDN博主「龙晨天」的原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接及本声明。
原文链接:https://blog.csdn.net/qq_30759585/article/details/106119180
所以这里我们修改原代码第6行
model=LogisticRegression()
为
model=LogisticRegression(max_iter=3000)
再次运行代码
错误减少了说明修改正确
仔细查看代码发现写错了错把下划线写成点号修改第8行 为
train_score=model.score(X_train,y_train)
得出结果
找到你的项目所在目录,创建一个mldata文件夹
然后下载mnist数据集的mat文件放在该目录
用度盘下载(安装度盘时可以用deb的安装包用代码
sudo dpkg -i 文件名
安装)到文件夹再移动到你的py文件所在目录
再用代码测试
from sklearn.datasets import fetch_mldata
mnist = fetch_mldata('MNIST original')
mnist
报错
查找资料发现(原网址(1条消息) fetch_mldata报错ImportError: cannot import name ‘fetch_mldata‘ from ‘sklearn.datasets‘_鸾镜朱颜暗换的博客-CSDN博客
)
代码中不再适用fetch_mldata()
,将之替换为fetch_openml()
。
创建新文件写入代码
from sklearn.datasets import fetch_openml
dataset = fetch_openml("mnist_784")
(如果有
报错就
pip install pandas
再测试)
没有报错
所以修改代码为
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
cancer=fetch_openml("mnist_784")
X_train,X_test,y_train,y_test = train_test_split(cancer.data,cancer.target,test_size=0.2)
model=LogisticRegression(max_iter=3000)
model.fit(X_train,y_train)
train_score=model.score(X_train,y_train)
test_score=model.score(X_test,y_test)
print('train score:{train_score:.6f};test score:{test_score:.6f}'.format(train_score=train_score,test_score=test_score))
进行测试
需要等一会不要急
得出结果
去掉代码里的max限制
代码为
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
cancer=fetch_openml("mnist_784")
X_train,X_test,y_train,y_test = train_test_split(cancer.data,cancer.target,test_size=0.2)
model=LogisticRegression()
model.fit(X_train,y_train)
train_score=model.score(X_train,y_train)
test_score=model.score(X_test,y_test)
print('train score:{train_score:.6f};test score:{test_score:.6f}'.format(train_score=train_score,test_score=test_score))
得出结果
加入模型评估算法测试并修改测试
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
cancer=fetch_openml("mnist_784")
X_train,X_test,y_train,y_test = train_test_split(cancer.data,cancer.target,test_size=0.2)
model=LogisticRegression()
model.fit(X_train,y_train)
y_pred=model.predict(X_test)
accury_score_value=accuracy_score(y_test,y_pred,average='macro')
recall_score_value=recall_score(y_test,y_pred,average='macro')
precision_score_value=precision_score(y_test,y_pred,average='macro')
classification_report_value=classification_report(y_test,y_pred,average='macro')
print("acc:",accuracy_score_value)
print("rec:",recall_score_value)
print("pre:",precision_score_value)
print(classification_report_value)
报错
查找资料我发现需要添加一个average的值(资料链接(1条消息) 【ValueError: Target is multiclass but average=‘binary‘. Please choose another average setting, one 】_有情怀的机械男的博客-CSDN博客
)于是更改recall为
recall_score_value=recall_score(y_test,y_pred,average='macro')
再次测试
发现问题
再用同样的方法修改测试后发现报错
拼写错误
修改最终成品代码为
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
cancer=fetch_openml("mnist_784")
X_train,X_test,y_train,y_test = train_test_split(cancer.data,cancer.target,test_size=0.2)
model=LogisticRegression()
model.fit(X_train,y_train)
y_pred=model.predict(X_test)
accuracy_score_value=accuracy_score(y_test,y_pred)
recall_score_value=recall_score(y_test,y_pred,average='macro')
precision_score_value=precision_score(y_test,y_pred,average='macro')
classification_report_value=classification_report(y_test,y_pred)
print("acc:",accuracy_score_value)
print("rec:",recall_score_value)
print("pre:",precision_score_value)
print(classification_report_value)
运算结果为