Environment: the mindspore/mindspore-gpu-cuda10.1:1.7.0 image.
The code is as follows:
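# Imports are assumed here; the original post shows only the main block.
from mindspore import nn, Model
from mindspore.profiler import Profiler
from mindspore.train.callback import LossMonitor, SummaryCollector

# LinearNet and ds_train are defined elsewhere in profile.py (not shown in the post).
if __name__ == "__main__":
    # Start profiling; results are written under ./prof_result
    ms_profiler = Profiler(output_path="./prof_result")
    # Init a SummaryCollector callback instance, and use it in model.train or model.eval
    specified = {"collect_metric": True, "histogram_regular": "^conv1.*|^conv2.*", "collect_graph": True,
                 "collect_dataset_graph": True}
    summary_collector = SummaryCollector(summary_dir="./summary_dir/summary_01", collect_specified_data=specified,
                                         collect_freq=1, keep_default_action=False, collect_tensor_freq=200)
    net = LinearNet()
    net_loss = nn.loss.MSELoss()
    optim = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.6)
    model = Model(net, net_loss, optim)
    epoch = 1
    model.train(epoch, ds_train, callbacks=[LossMonitor(100), summary_collector], dataset_sink_mode=False)
    # Stop profiling and analyse the collected data
    ms_profiler.analyse()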
Running it directly produces a permission error:
[ERROR] PROFILER(5125,7f104abf0740,python):2022-06-23-02:27:25.980.770 [mindspore/ccsrc/profiler/device/gpu/gpu_profiling.cc:487] StopCUPTI] CUPTI Error:CUPTI_ERROR_INSUFFICIENT_PRIVILEGES function:CuptiUnsubscribe. You may not have access to the NVIDIA GPU performance counters on the target device. Please use the root account to run profiling or configure permissions. If there is still the problem, please refer to the GPU performance tuning document on the official website of mindinsight.
[ERROR] PROFILER(5125,7f104abf0740,python):2022-06-23-02:27:25.980.893 [mindspore/ccsrc/profiler/device/gpu/gpu_profiling.cc:488] StopCUPTI] CUPTI Error:CUPTI_ERROR_NOT_INITIALIZED function:CuptiActivityFlushAll. You may not have access to the NVIDIA GPU performance counters on the target device. Please use the root account to run profiling or configure permissions. If there is still the problem, please refer to the GPU performance tuning document on the official website of mindinsight.
[ERROR] PROFILER(5125,7f104abf0740,python):2022-06-23-02:27:25.980.946 [mindspore/ccsrc/profiler/device/gpu/gpu_profiling.cc:491] StopCUPTI] CUPTI Error:CUPTI_ERROR_NOT_INITIALIZED function:CuptiActivityDisable. You may not have access to the NVIDIA GPU performance counters on the target device. Please use the root account to run profiling or configure permissions. If there is still the problem, please refer to the GPU performance tuning document on the official website of mindinsight.
[ERROR] PROFILER(5125,7f104abf0740,python):2022-06-23-02:27:25.980.981 [mindspore/ccsrc/profiler/device/gpu/gpu_profiling.cc:491] StopCUPTI] CUPTI Error:CUPTI_ERROR_NOT_INITIALIZED function:CuptiActivityDisable. You may not have access to the NVIDIA GPU performance counters on the target device. Please use the root account to run profiling or configure permissions. If there is still the problem, please refer to the GPU performance tuning document on the official website of mindinsight.
[ERROR] PROFILER(5125,7f104abf0740,python):2022-06-23-02:27:25.981.012 [mindspore/ccsrc/profiler/device/gpu/gpu_profiling.cc:491] StopCUPTI] CUPTI Error:CUPTI_ERROR_NOT_INITIALIZED function:CuptiActivityDisable. You may not have access to the NVIDIA GPU performance counters on the target device. Please use the root account to run profiling or configure permissions. If there is still the problem, please refer to the GPU performance tuning document on the official website of mindinsight.
[WARNING] PROFILER(5125,7f104abf0740,python):2022-06-23-02:27:25.982.601 [mindspore/ccsrc/profiler/device/gpu/gpu_data_saver.cc:138] WriteFile] No operation detail infos to write.
Traceback (most recent call last):
  File "profile.py", line 69, in <module>
    ms_profiler.analyse()
  File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/profiler/profiling.py", line 334, in analyse
    self._gpu_analyse()
  File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/profiler/profiling.py", line 710, in _gpu_analyse
    reduce_op_type = self._get_step_reduce_op_type()
  File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/profiler/profiling.py", line 759, in _get_step_reduce_op_type
    with open(step_trace_file_path, 'r') as f_obj:
FileNotFoundError: [Errno 2] No such file or directory: '/home/prof_result/profiler/step_trace_profiling_0.txt'
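The final FileNotFoundError appears to be a consequence of the CUPTI errors above it: no profiling data was collected, so the step trace file that analyse() tries to open was never written. As a minimal sketch, the check below looks for that file before analyse() is called. The helper names step_trace_path and step_trace_exists are hypothetical, and the file layout is taken only from the path in the traceback, so it may differ in other MindSpore versions.

```python
import os


def step_trace_path(output_path, device_id=0):
    """Build the step trace path seen in the traceback:
    <output_path>/profiler/step_trace_profiling_<device_id>.txt
    (layout assumed from the error message, not from MindSpore documentation)."""
    return os.path.join(os.path.abspath(output_path), "profiler",
                        f"step_trace_profiling_{device_id}.txt")


def step_trace_exists(output_path, device_id=0):
    """Return True if the step trace file was actually written."""
    return os.path.isfile(step_trace_path(output_path, device_id))


if __name__ == "__main__":
    if not step_trace_exists("./prof_result"):
        print("Step trace file is missing; CUPTI probably collected no data. "
              "Check access to the GPU performance counters.")
```

Calling step_trace_exists("./prof_result") right before ms_profiler.analyse() would turn the confusing FileNotFoundError into a message that points at the real cause.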
However, running it with sudo to get administrator privileges produces a different error: nvcc cannot be found.
Traceback (most recent call last):
  File "profile.py", line 2, in <module>
    from mindspore import context
  File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/__init__.py", line 17, in <module>
    from .run_check import run_check
  File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/run_check/__init__.py", line 17, in <module>
    from ._check_version import check_version_and_env_config
  File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/run_check/_check_version.py", line 454, in <module>
    check_version_and_env_config()
  File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/run_check/_check_version.py", line 434, in check_version_and_env_config
    env_checker.check_version()
  File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/run_check/_check_version.py", line 143, in check_version
    nvcc_version = self._get_nvcc_version(False)
  File "/usr/local/python-3.7.5/lib/python3.7/site-packages/mindspore/run_check/_check_version.py", line 85, in _get_nvcc_version
    timeout=3, text=True, capture_output=True, check=False)
  File "/usr/local/python-3.7.5/lib/python3.7/subprocess.py", line 488, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/local/python-3.7.5/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/usr/local/python-3.7.5/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'nvcc': 'nvcc'
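The traceback shows MindSpore's import-time version check launching nvcc through subprocess, which fails when nvcc is not on PATH. One plausible cause, which the post does not confirm, is that sudo replaces the user's PATH with a restricted one that lacks the CUDA bin directory. The sketch below is a quick way to test that guess: run it once normally and once under sudo and compare the output. The helper name find_nvcc is hypothetical.

```python
import os
import shutil


def find_nvcc(search_path=None):
    """Return the full path of nvcc on the given PATH string, or None if absent.

    With search_path=None, shutil.which falls back to the current PATH.
    """
    return shutil.which("nvcc", path=search_path)


if __name__ == "__main__":
    # Compare these two lines between a normal run and a run under sudo.
    print("PATH =", os.environ.get("PATH", ""))
    print("nvcc =", find_nvcc())
```

If nvcc is found only without sudo, preserving PATH when invoking sudo would likely get past the import error, although it would not by itself grant access to the GPU performance counters.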
Finally, I used sudo su to switch to the root user and ran the Python code from there, but the original permission error still appeared.
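Solution: start the Docker container with the --privileged=true option. Being root inside an unprivileged container is not enough to access the GPU performance counters.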
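As a minimal sketch, the snippet below assembles a docker run command that includes --privileged=true for the image used in this post. Only the image name and the --privileged=true option come from the post; the -it flag, the --runtime=nvidia argument, and the /bin/bash entry command are assumptions about a typical GPU container setup and may need adjusting. The helper name build_docker_run is hypothetical.

```python
import shlex


def build_docker_run(image, extra_args=()):
    """Return a `docker run` argument list that starts a privileged container.

    --privileged=true is the option from the post; everything else in the
    command is an assumption for illustration.
    """
    return ["docker", "run", "-it", "--privileged=true", *extra_args, image, "/bin/bash"]


IMAGE = "mindspore/mindspore-gpu-cuda10.1:1.7.0"

# --runtime=nvidia is an assumed way of exposing the GPU to the container.
cmd = build_docker_run(IMAGE, extra_args=["--runtime=nvidia"])

# Print a shell-ready command line; pass `cmd` to subprocess.run to execute it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Inside a container started this way, the profiling script can be run as before without sudo.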