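Download
As of this writing, the latest CUDA version is 11.7. Download and install it, then check that the system PATH environment variable contains the following paths.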
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\libnvvp
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\extras\CUPTI\lib64
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include
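Checking four long paths by eye is error-prone, so the check can be scripted. The following is a minimal sketch in Python, assuming the default CUDA 11.7 install location listed above; the names CUDA_ROOT, REQUIRED and missing_entries are made up for this example and are not part of any CUDA tooling.

```python
import os

# Assumption: default install location for CUDA 11.7, as listed above.
# Adjust the version folder if yours differs.
CUDA_ROOT = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7"
REQUIRED = [
    CUDA_ROOT + r"\bin",
    CUDA_ROOT + r"\libnvvp",
    CUDA_ROOT + r"\extras\CUPTI\lib64",
    CUDA_ROOT + r"\include",
]


def _norm(entry):
    # Windows paths are case-insensitive and may carry a trailing backslash.
    return entry.strip().rstrip("\\/").lower()


def missing_entries(path_value, required=REQUIRED, sep=";"):
    """Return the required directories that do not appear in path_value."""
    present = {_norm(p) for p in path_value.split(sep) if p.strip()}
    return [r for r in required if _norm(r) not in present]


if __name__ == "__main__":
    for entry in missing_entries(os.environ.get("PATH", "")):
        print("missing from PATH:", entry)
```

If the script prints nothing, all four directories are already on PATH.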
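Download
For convenience, my install directory is E:\Anaconda3. That way, after reinstalling the operating system I only need to reset the system PATH environment variable instead of installing Anaconda again.
During installation, check "Add Anaconda3 to the system PATH environment variable". If you missed it or the option cannot be checked, finish the installation first and then add the following paths to the PATH environment variable: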
E:\Anaconda3
E:\Anaconda3\Scripts
E:\Anaconda3\Library\bin
Test python
python -V
Test conda
conda --version
If both commands print version information, Anaconda is installed correctly.
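The same two checks can be run from one script, which also makes the failure case explicit (a tool missing from PATH). This is only a sketch: tool_version is a helper name invented here, and it relies on nothing beyond the Python standard library.

```python
import shutil
import subprocess
import sys


def tool_version(command):
    """Run e.g. ["conda", "--version"]; return its output, or None if not found."""
    exe = shutil.which(command[0])
    if exe is None:
        return None  # the executable is not on PATH
    result = subprocess.run([exe] + list(command[1:]),
                            capture_output=True, text=True)
    # Some tools print their version to stderr, so check both streams.
    return (result.stdout or result.stderr).strip()


if __name__ == "__main__":
    print("python:", tool_version([sys.executable, "-V"]))
    print("conda: ", tool_version(["conda", "--version"]))
```

A result of None for conda usually means the PATH entries listed above are missing.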
# Run as administrator
conda update conda
conda update anaconda
pip install --upgrade pip
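# Windows: Control Panel -> Programs -> Uninstall a program
# or run C:\ProgramData\Anaconda3\Uninstall-Anaconda3.exe to uninstall
rm -rf anaconda # Ubuntu: delete the Anaconda install directory
Finally, it is a good idea to remove the Anaconda path entries from .bashrc.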
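conda env list # list all virtual environments
conda create -n kmcb python=3.8 # create a Python 3.8 virtual environment named kmcb
conda activate kmcb # activate the kmcb environment
conda deactivate # deactivate the current environment
conda remove -n kmcb --all # delete the kmcb virtual environment
conda list # list the packages installed in the current environment
conda list -n xxx # list the packages installed in environment xxx
conda update xxx # update package xxx
conda uninstall xxx # uninstall package xxx
conda clean -p # remove unused packages from the package cache
conda clean -a # remove everything that can be cleaned
conda clean -t # remove cached package tarballs
conda clean -y --all # remove all cached packages and caches without asking for confirmation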
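When these commands are driven from a script, the JSON form of the environment list is easier to handle than the table conda prints. The sketch below parses the output of conda env list --json, which contains an "envs" list of environment paths. The helper name env_names is invented here, and it keys each environment by its folder name, so the base install shows up under its directory name (for example Anaconda3) rather than as "base".

```python
import json
import os


def env_names(conda_json):
    """Map environment folder name -> path from `conda env list --json` output."""
    paths = json.loads(conda_json).get("envs", [])
    names = {}
    for path in paths:
        # Normalise separators so Windows-style paths split correctly everywhere.
        folder = os.path.basename(path.replace("\\", "/").rstrip("/"))
        names[folder] = path
    return names


# Example input shaped like the real command output (paths are illustrative).
sample = '{"envs": ["E:\\\\Anaconda3", "E:\\\\Anaconda3\\\\envs\\\\kmcb"]}'
print("kmcb" in env_names(sample))
```

To use it on a real system, capture the output of conda env list --json (for example with subprocess.run) and pass the text to env_names.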
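PyTorch is installed into the kmcb environment, which uses Python 3.8. Replace kmcb with your own environment name.
Get the install command from https://pytorch.org/get-started/locally/. As of this writing, the options selected to generate it were PyTorch 1.11.0, Windows, Conda, Python, CUDA 11.3.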
conda create -n kmcb python=3.8
conda activate kmcb
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
Close Anaconda Prompt (Anaconda3), then open it again.
conda activate kmcb
python
Then enter the following Python script:
import torch
# show whether CUDA is available
print(torch.cuda.is_available())
# show the number of CUDA-capable devices
print( torch.cuda.device_count())
If the output is True followed by a number >= 1, CUDA is working.
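When the result is False, it helps to know which PyTorch build is installed and which CUDA version it was built for. Here is a slightly fuller sketch of the same check; cuda_report is a name chosen for this example, and it returns a short note instead of failing when PyTorch is not installed in the active environment.

```python
def cuda_report():
    """Collect basic CUDA facts from PyTorch without raising if it is absent."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    count = torch.cuda.device_count()
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "built_for_cuda": torch.version.cuda,  # None on CPU-only builds
        "cuda_available": torch.cuda.is_available(),
        "device_count": count,
        "devices": [torch.cuda.get_device_name(i) for i in range(count)],
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

A built_for_cuda value of None suggests a CPU-only build was installed, which is one common reason for is_available() returning False.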
Run the following in Anaconda Prompt (Anaconda3):
chcp 65001
conda activate kmcb
cd D:\MyWork\2022\cnn
git clone https://github.com/ultralytics/yolov5
cd yolov5
pip install -r requirements.txt
chcp 65001
conda activate kmcb
cd D:\MyWork\2022\cnn\yolov5
python train.py --img 640 --batch 16 --epochs 5 --data ./data/coco128.yaml --cfg ./models/yolov5s.yaml --weights yolov5s.pt --device 0
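Common parameters
--img: input image size, in pixels, for the training and validation sets
--batch: number of images loaded per batch; -1 selects it automatically. Setting it too high can cause out-of-memory (OOM) errors
--epochs: number of training epochs
--data: configuration file of the dataset to train on. These files live in the yolov5/data directory. The coco128 dataset is downloaded automatically, by default to ../datasets
--cfg: model configuration file, located in the yolov5/models directory
--weights: initial weights file for training; it is downloaded automatically
--device: GPU(s) to use; CUDA must be installed and configured. Several GPUs can train together with --device 0,1,2,3. To train on the CPU, use --device cpu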
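When the same training run is repeated with different settings, it can be handy to assemble the command line in Python. The sketch below only builds the argument list for the train.py command shown earlier, using the same flags and default values. build_train_command is a name made up for this example and is not part of yolov5.

```python
import sys


def build_train_command(img=640, batch=16, epochs=5,
                        data="./data/coco128.yaml",
                        cfg="./models/yolov5s.yaml",
                        weights="yolov5s.pt",
                        device="0"):
    """Return the argument list for yolov5's train.py with the given settings."""
    return [
        sys.executable, "train.py",
        "--img", str(img),
        "--batch", str(batch),
        "--epochs", str(epochs),
        "--data", data,
        "--cfg", cfg,
        "--weights", weights,
        "--device", str(device),
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(..., cwd=<your yolov5 directory>) to start training.
    print(" ".join(build_train_command(device="cpu")))
```

Passing a list rather than a single string avoids quoting problems with paths that contain spaces.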
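As of this writing, training with the default configuration ends with a series of exception messages. I captured the first one; the rest are much the same: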
Exception ignored in: <function StorageWeakRef.__del__ at 0x0000026C942FDA60>
Traceback (most recent call last):
File "E:\Anaconda3\envs\kmcb\lib\site-packages\torch\multiprocessing\reductions.py", line 36, in __del__
File "E:\Anaconda3\envs\kmcb\lib\site-packages\torch\storage.py", line 520, in _free_weak_ref
AttributeError: 'NoneType' object has no attribute '_free_weak_ref'
...
Key error
Exception ignored in: <function StorageWeakRef.__del__ at 0x0000026C942FDA60>
AttributeError: 'NoneType' object has no attribute '_free_weak_ref'
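For details, see kaplansinan's reply in the yolov5 issues.
The error occurs because the installed torch and torchvision versions are too new for yolov5. Downgrading them resolves it; for available versions see https://download.pytorch.org/whl/cu113/torch_stable.html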
chcp 65001
conda activate kmcb
cd D:\MyWork\2022\cnn\yolov5
# check the installed package versions
conda list
# as of this writing pytorch=1.11.0 and torchvision=0.12.0; now downgrade
pip install torch==1.10.0+cu113 torchaudio==0.10.0 torchvision==0.11.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
Now close Anaconda Prompt (Anaconda3), reopen it, and run the coco128 dataset test again.
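Before rerunning, it may be worth confirming that the downgrade actually took effect in the active environment. The sketch below compares the installed versions with the ones from the pip command above, using importlib.metadata from the standard library (Python 3.8+). EXPECTED, split_version and check_installed are names invented for this example.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions taken from the pip install command above.
EXPECTED = {"torch": "1.10.0+cu113", "torchvision": "0.11.1+cu113"}


def split_version(text):
    """'1.10.0+cu113' -> ((1, 10, 0), 'cu113'); the local tag may be ''."""
    public, _, local = text.partition("+")
    # Non-numeric parts (e.g. pre-release suffixes) are skipped in this sketch.
    numbers = tuple(int(part) for part in public.split(".") if part.isdigit())
    return numbers, local


def check_installed(expected=EXPECTED):
    """Return {package: (installed version or None, matches expected)}."""
    results = {}
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            results[name] = (None, False)
            continue
        results[name] = (found, split_version(found) == split_version(wanted))
    return results


if __name__ == "__main__":
    for name, (found, ok) in check_installed().items():
        print(f"{name}: installed={found} expected={EXPECTED[name]} ok={ok}")
```

Run it inside the kmcb environment; a package reported as None is not installed there at all.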
Error message
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
Solution
del E:\Anaconda3\envs\kmcb\Library\bin\libiomp5md.dll
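The OMP message says that more than one copy of the OpenMP runtime was loaded, so before deleting anything it can help to see where the copies of libiomp5md.dll actually are. The following sketch only lists them and deletes nothing; find_copies is a helper name made up for this example.

```python
from pathlib import Path


def find_copies(root, name="libiomp5md.dll"):
    """Return every file called `name` (case-insensitive) below `root`, sorted."""
    wanted = name.lower()
    return sorted(
        path for path in Path(root).rglob("*")
        if path.name.lower() == wanted and path.is_file()
    )
```

For example, find_copies(r"E:\Anaconda3\envs\kmcb") would typically show one copy under Library\bin and another inside the torch package folder. Which copy to remove is a judgment call; the command above removes the one in Library\bin.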
You can view information about completed or in-progress training at any time. If training is still running, open another Anaconda Prompt (Anaconda3) window and enter:
chcp 65001
conda activate kmcb
cd D:\MyWork\2022\cnn\yolov5
tensorboard --logdir=runs/train
The output is:
TensorFlow installation not found - running with reduced feature set.
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.9.0 at http://localhost:6006/ (Press CTRL+C to quit)
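All training results are stored in the yolov5/runs/train directory. Each run is saved to its own exp folder, and two models (best.pt and last.pt, under weights) are saved when training finishes.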
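Because every run gets a new exp folder, the path to the newest weights changes each time. Here is a small sketch for locating the latest run, assuming the folders are named exp, exp2, exp3 and so on, with the highest number being the most recent. latest_run is a name invented for this example.

```python
import re
from pathlib import Path


def latest_run(train_dir="runs/train"):
    """Return the exp folder with the highest numeric suffix, or None."""
    train_dir = Path(train_dir)
    if not train_dir.is_dir():
        return None
    best, best_number = None, -1
    for entry in train_dir.iterdir():
        match = re.fullmatch(r"exp(\d*)", entry.name)
        if match and entry.is_dir():
            # Assumption: a bare "exp" is the first run, so treat it as 1.
            number = int(match.group(1) or 1)
            if number > best_number:
                best, best_number = entry, number
    return best


if __name__ == "__main__":
    run = latest_run()
    if run is None:
        print("no training runs found")
    else:
        print("latest weights:", run / "weights" / "best.pt")
```

The printed path can be passed to the --weights option of detect.py.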
chcp 65001
conda activate kmcb
cd D:\MyWork\2022\cnn\yolov5
python detect.py --weights "runs/train/exp5/weights/best.pt" --source "01.tif"
Recommended installs
chcp 65001
conda activate kmcb
pip install coremltools onnx scikit-learn labelimg protobuf
pip uninstall wandb
rd /S /Q D:\MyWork\2022\cnn\yolov5\runs