PyTorch: Using nn.DataParallel in VSCode to automatically move model inputs to the GPU and to train or debug on a chosen set of GPUs

1. train.py

Add the following code to the training, validation, and testing stages:

# Whether to use multiple GPUs:
if self.args.multi_gpu:
    # DataParallel expects the model's parameters on the first visible GPU
    model = nn.DataParallel(model.cuda())

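Once the model is wrapped in nn.DataParallel, input tensors passed to the model's forward call are split along the batch dimension and copied to the GPUs automatically, even if they start on the CPU, so they do not need to be moved by hand with .cuda() or .to('cuda'). The model itself must still be on the first visible GPU before the forward pass, and tensors used outside the model, such as labels in the loss computation, still need to be moved explicitly.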
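To make the batch splitting concrete, here is a minimal plain-Python sketch of the idea. It is not PyTorch's implementation: the helper name split_batch is hypothetical, and the ceiling-sized contiguous chunks are an assumption about how the batch is divided across devices.

```python
import math


def split_batch(batch, num_devices):
    """Split a batch into contiguous chunks, one per device (illustration only)."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    if not batch:
        return []
    # Assumed rule: each chunk holds ceil(len(batch) / num_devices) samples,
    # so the last chunk may be smaller than the others.
    chunk_size = math.ceil(len(batch) / num_devices)
    return [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]


if __name__ == "__main__":
    samples = list(range(8))
    for device_index, chunk in enumerate(split_batch(samples, 2)):
        print(f"cuda:{device_index} would receive {chunk}")
```

With a batch of 8 samples and 2 devices, each device would receive 4 samples. This is why the batch size passed to the wrapped model is the total across all GPUs, not the per-GPU size.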

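The overall structure of train.py then looks like this:

import argparse

import torch
import torch.nn as nn

import config


...


model = Mymodel()

...

# Whether to use multiple GPUs
# (args is assumed to come from the argparse code elided above):
if args.multi_gpu:
    # DataParallel expects the model's parameters on the first visible GPU
    model = nn.DataParallel(model.cuda())

train()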
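The post does not show how the multi_gpu flag is defined. One plausible way, sketched here under the assumption that it is a plain boolean command-line switch, is an argparse store_true option. The function name parse_args and the help text are illustrative, not taken from the original script.

```python
import argparse


def parse_args(argv=None):
    """Parse training options; only the multi-GPU switch is sketched here."""
    parser = argparse.ArgumentParser(description="Training options (illustrative)")
    # store_true makes the flag default to False; pass --multi_gpu to enable it
    parser.add_argument(
        "--multi_gpu",
        action="store_true",
        help="wrap the model in nn.DataParallel",
    )
    return parser.parse_args(argv)
```

With this in place, running python3 train.py --multi_gpu would enable the DataParallel branch, and omitting the flag would leave the model unwrapped.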

2. Modify launch.json (VSCode)

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "justMyCode": true
        }
    ]
}

Comment out these lines:

// "program": "${file}",
// "console": "integratedTerminal",

Change "request" from "launch" to "attach", set "justMyCode" to false, and add:

"connect": {
	"host": "localhost", 
	"port": 50678 
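}

The complete launch.json after these changes:

{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387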
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Current File",
            "type": "python",
            "request": "attach",
            // "program": "${file}",
            // "console": "integratedTerminal",
            "justMyCode": false,
            "connect": {
                "host": "localhost", 
                "port": 50678 
            }
        }
    ]
}
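The same edit can be expressed as a small transformation on the configuration entry, which may help when several launch entries need converting. This is only a sketch: the helper name to_attach_config is hypothetical, and it works on an already-parsed dictionary because launch.json contains comments that Python's json module does not accept.

```python
import json


def to_attach_config(launch_config, host="localhost", port=50678):
    """Return a copy of a launch-style entry rewritten as an attach-style entry."""
    # Drop the keys that only make sense when VSCode starts the program itself
    attach = {k: v for k, v in launch_config.items() if k not in ("program", "console")}
    attach["request"] = "attach"
    attach["justMyCode"] = False
    # The port must match the one passed to debugpy --listen
    attach["connect"] = {"host": host, "port": port}
    return attach


if __name__ == "__main__":
    original = {
        "name": "Python: Current File",
        "type": "python",
        "request": "launch",
        "program": "${file}",
        "console": "integratedTerminal",
        "justMyCode": True,
    }
    print(json.dumps(to_attach_config(original), indent=4))
```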

3. train.sh

Debug with debugpy, restricting the run to GPUs 2 and 3:

#!/usr/bin/env bash

# Change the GPU IDs as needed
export CUDA_VISIBLE_DEVICES=2,3
python3 -m debugpy --listen 50678 --wait-for-client train.py
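One detail worth keeping in mind: inside the process, the GPUs listed in CUDA_VISIBLE_DEVICES are renumbered from 0, so physical GPUs 2 and 3 appear as cuda:0 and cuda:1. The sketch below illustrates that mapping in plain Python. The helper name logical_to_physical is hypothetical, and the fallback value "2,3" simply mirrors the script above.

```python
import os


def logical_to_physical(visible_devices):
    """Map in-process device names to the physical GPU IDs listed in CUDA_VISIBLE_DEVICES."""
    ids = [part.strip() for part in visible_devices.split(",") if part.strip()]
    # The order of the list defines the in-process numbering, starting at 0
    return {f"cuda:{logical}": physical for logical, physical in enumerate(ids)}


if __name__ == "__main__":
    value = os.environ.get("CUDA_VISIBLE_DEVICES", "2,3")
    print(logical_to_physical(value))
```

Any device index used in train.py therefore refers to the renumbered devices, not to the physical IDs reported by tools such as nvidia-smi.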

4. Run in the VSCode terminal:

sh -x train.sh

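Once the shell script has started, execution pauses at the entry point of the Python script while debugpy waits for a client. Press F5 in VSCode to attach and begin debugging.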
