Out of memory error even though GPU memory is sufficient

I am training on my own dataset with `cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml`.

There is plenty of free GPU memory, yet training fails with an out of memory error. How can I solve this?

```bash
python3 -u tools/train.py \
    -c configs/dcn/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml \
    -o pretrain_weights=models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms \
    --use_tb=True --tb_log_dir=tb_log_caltech/scalar --eval
```
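Note that the log below reports 4 cards being used even for this single-process launch, and the crash happens on GPU 2. A single-card run I could try next looks like this sketch (`CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable; the card index is chosen arbitrarily):

```bash
# Sketch: restrict the run to a single idle card before launching
export CUDA_VISIBLE_DEVICES=1
python3 -u tools/train.py \
    -c configs/dcn/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.yml \
    -o pretrain_weights=models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms \
    --use_tb=True --tb_log_dir=tb_log_caltech/scalar --eval
```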

P.S. The batch_size is already set to 1.

Launching in multi-process mode with `python -m paddle.distributed.launch --selected_gpus 0,1,2,3 tools/train.py ...` runs into the same problem.
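To rule out another process occupying the cards (which the error message itself suggests checking), I also watch per-GPU memory while the job starts up; this only uses standard `nvidia-smi` query options:

```bash
# Poll per-card memory usage once per second while training starts
watch -n 1 'nvidia-smi --query-gpu=index,memory.used,memory.free --format=csv'
```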

I really don't know what else to try. Any advice would be appreciated. The full log and `nvidia-smi` output follow:

```
2020-04-06 17:36:49,707-INFO: 6707 samples in file dataset/coco/annotations/instances_val2007.json

2020-04-06 17:36:49,712-INFO: places would be ommited when DataLoader is not iterable

W0406 17:36:50.419684 27808 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.0, Runtime API Version: 10.0

W0406 17:36:50.422271 27808 device_context.cc:245] device: 0, cuDNN Version: 7.6.

2020-04-06 17:36:51,523-INFO: Loading parameters from models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms...

2020-04-06 17:36:51,524-WARNING: models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]

2020-04-06 17:36:51,524-WARNING: models/cascade_rcnn_cbr200_vd_fpn_dcnv2_nonlocal_softnms.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]

loading annotations into memory...

Done (t=0.20s)

creating index...

index created!

2020-04-06 17:36:53,468-WARNING: Found an invalid bbox in annotations: im_id: 5387, area: -10.0 x1: 348, y1: 176, x2: 348, y2: 196.

2020-04-06 17:36:53,481-WARNING: Found an invalid bbox in annotations: im_id: 5765, area: -10.0 x1: 71, y1: 174, x2: 71, y2: 197.

2020-04-06 17:36:53,686-INFO: 15649 samples in file dataset/coco/annotations/instances_train2007.json

2020-04-06 17:36:53,699-INFO: places would be ommited when DataLoader is not iterable

I0406 17:36:54.286912 27808 parallel_executor.cc:440] The Program will be executed on CUDA using ParallelExecutor, 4 cards are used, so 4 programs are executed in parallel.

W0406 17:37:00.890586 27808 fuse_all_reduce_op_pass.cc:74] Find all_reduce operators: 730. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 604.

I0406 17:37:01.025799 27808 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1

I0406 17:37:28.992170 27808 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True

I0406 17:37:29.978662 27808 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0

W0406 17:37:42.174067 508 operator.cc:181] deformable_conv raises an exception paddle::memory::allocation::BadAlloc,

C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)

1 paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)

2 paddle::memory::allocation::AlignedAllocator::AllocateImpl(unsigned long)

3 paddle::memory::allocation::AutoGrowthBestFitAllocator::AllocateImpl(unsigned long)

4 paddle::memory::allocation::Allocator::Allocate(unsigned long)

5 paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)

6 paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)

7 paddle::memory::Alloc(paddle::platform::Place const&, unsigned long)

8 paddle::memory::Alloc(paddle::platform::DeviceContext const&, unsigned long)

9 paddle::framework::Tensor paddle::framework::ExecutionContext::AllocateTmpTensor(paddle::framework::DDim const&, paddle::platform::CUDADeviceContext const&) const

10 paddle::operators::DeformableConvCUDAKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const

11 std::_Function_handler >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)

12 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const

13 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const

14 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)

15 paddle::framework::details::ComputationOpHandle::RunImpl()

16 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)

17 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<:framework::blockingqueue> const&, unsigned long*)

18 std::_Function_handler<:unique_ptr std::__future_base::_result_base::_deleter> (), std::__future_base::_Task_setter<:unique_ptr std::__future_base::_result_base::_deleter>, void> >::_M_invoke(std::_Any_data const&)

19 std::__future_base::_State_base::_M_do_set(std::function<:unique_ptr std::__future_base::_result_base::_deleter> ()>&, bool&)

20 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 2. Cannot allocate 66.797119MB memory on GPU 2, available memory is only 31.062500MB.

Please check whether there is any other process using GPU 2.

If yes, please stop them, or start PaddlePaddle on another GPU.

If no, please decrease the batch size of your model.

at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)

F0406 17:37:42.174147 508 exception_holder.h:37] std::exception caught,

C++ Call Stacks (More useful to developers):

0 std::string paddle::platform::GetTraceBackString<std::string>(std::string&&, char const*, int)

1 paddle::memory::allocation::CUDAAllocator::AllocateImpl(unsigned long)

2 paddle::memory::allocation::AlignedAllocator::AllocateImpl(unsigned long)

3 paddle::memory::allocation::AutoGrowthBestFitAllocator::AllocateImpl(unsigned long)

4 paddle::memory::allocation::Allocator::Allocate(unsigned long)

5 paddle::memory::allocation::RetryAllocator::AllocateImpl(unsigned long)

6 paddle::memory::allocation::AllocatorFacade::Alloc(paddle::platform::Place const&, unsigned long)

7 paddle::memory::Alloc(paddle::platform::Place const&, unsigned long)

8 paddle::memory::Alloc(paddle::platform::DeviceContext const&, unsigned long)

9 paddle::framework::Tensor paddle::framework::ExecutionContext::AllocateTmpTensor(paddle::framework::DDim const&, paddle::platform::CUDADeviceContext const&) const

10 paddle::operators::DeformableConvCUDAKernel<paddle::platform::CUDADeviceContext, float>::Compute(paddle::framework::ExecutionContext const&) const

11 std::_Function_handler >::operator()(char const*, char const*, int) const::{lambda(paddle::framework::ExecutionContext const&)#1}>::_M_invoke(std::_Any_data const&, paddle::framework::ExecutionContext const&)

12 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&, paddle::framework::RuntimeContext*) const

13 paddle::framework::OperatorWithKernel::RunImpl(paddle::framework::Scope const&, paddle::platform::Place const&) const

14 paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, paddle::platform::Place const&)

15 paddle::framework::details::ComputationOpHandle::RunImpl()

16 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync(paddle::framework::details::OpHandleBase*)

17 paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp(paddle::framework::details::OpHandleBase*, std::shared_ptr<:framework::blockingqueue> const&, unsigned long*)

18 std::_Function_handler<:unique_ptr std::__future_base::_result_base::_deleter> (), std::__future_base::_Task_setter<:unique_ptr std::__future_base::_result_base::_deleter>, void> >::_M_invoke(std::_Any_data const&)

19 std::__future_base::_State_base::_M_do_set(std::function<:unique_ptr std::__future_base::_result_base::_deleter> ()>&, bool&)

20 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const

Error Message Summary:

ResourceExhaustedError:

Out of memory error on GPU 2. Cannot allocate 66.797119MB memory on GPU 2, available memory is only 31.062500MB.

Please check whether there is any other process using GPU 2.

If yes, please stop them, or start PaddlePaddle on another GPU.

If no, please decrease the batch size of your model.

at (/paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:69)

*** Check failure stack trace: ***

@ 0x7fdb53276c2d google::LogMessage::Fail()

@ 0x7fdb5327a6dc google::LogMessage::SendToLog()

@ 0x7fdb53276753 google::LogMessage::Flush()

@ 0x7fdb5327bbee google::LogMessageFatal::~LogMessageFatal()

@ 0x7fdb558509b8 paddle::framework::details::ExceptionHolder::Catch()

@ 0x7fdb558fc68e paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()

@ 0x7fdb558fb29f paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()

@ 0x7fdb558fb564 _ZNSt17_Function_handlerIFvvESt17reference_wrapperISt12_Bind_simpleIFS1_ISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS6_12OpHandleBaseESt6atomicIiESt4hashISA_ESt8equal_toISA_ESaISt4pairIKSA_SC_EEESA_RKSt10shared_ptrINS5_13BlockingQueueImEEEEUlvE_vEEEvEEEE9_M_invokeERKSt9_Any_data

@ 0x7fdb532cf983 std::_Function_handler<>::_M_invoke()

@ 0x7fdb5305dc37 std::__future_base::_State_base::_M_do_set()

@ 0x7fdb8dfe5a99 __pthread_once_slow

@ 0x7fdb558f6a52 _ZNSt13__future_base11_Task_stateISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS4_12OpHandleBaseESt6atomicIiESt4hashIS8_ESt8equal_toIS8_ESaISt4pairIKS8_SA_EEES8_RKSt10shared_ptrINS3_13BlockingQueueImEEEEUlvE_vEESaIiEFvvEE6_M_runEv

@ 0x7fdb5305fe64 _ZZN10ThreadPoolC1EmENKUlvE_clEv

@ 0x7fdb7f27d3e7 execute_native_thread_routine_compat

@ 0x7fdb8dfde6ba start_thread

@ 0x7fdb8dd1441d clone

@ (nil) (unknown)

Aborted (core dumped)

nvidia-smi

Mon Apr 6 17:51:08 2020

+-----------------------------------------------------------------------------+

| NVIDIA-SMI 410.78 Driver Version: 410.78 CUDA Version: 10.0 |

|-------------------------------+----------------------+----------------------+

| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |

| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |

|===============================+======================+======================|

| 0 TITAN X (Pascal) Off | 00000000:05:00.0 On | N/A |

| 26% 46C P8 19W / 250W | 380MiB / 12188MiB | 0% Default |

+-------------------------------+----------------------+----------------------+

| 1 TITAN X (Pascal) Off | 00000000:06:00.0 Off | N/A |

| 26% 46C P8 18W / 250W | 2MiB / 12196MiB | 0% Default |

+-------------------------------+----------------------+----------------------+

| 2 TITAN X (Pascal) Off | 00000000:09:00.0 Off | N/A |

| 25% 44C P8 19W / 250W | 2MiB / 12196MiB | 0% Default |

+-------------------------------+----------------------+----------------------+

| 3 TITAN X (Pascal) Off | 00000000:0A:00.0 Off | N/A |

| 23% 40C P8 17W / 250W | 2MiB / 12196MiB | 0% Default |

+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+

| Processes: GPU Memory |

| GPU PID Type Process name Usage |

|=============================================================================|

| 0 1523 G /usr/lib/xorg/Xorg 215MiB |

| 0 3783 G /opt/teamviewer/tv_bin/TeamViewer 41MiB |

| 0 12214 G /usr/bin/nvidia-settings 1MiB |

| 0 20941 G compiz 109MiB |

| 0 26423 G .../local/MATLAB/R2018a/bin/glnxa64/MATLAB 6MiB |

+-----------------------------------------------------------------------------+

```
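For reference, these are the memory-related environment variables I am considering experimenting with next before rerunning the same training command. The FLAGS names are real PaddlePaddle 1.x settings (the log above shows the auto_growth allocator and `FLAGS_eager_delete_tensor_gb = 0` in effect), but the specific values below are only guesses on my part:

```bash
# Sketch of memory-related Paddle flags to experiment with; values are illustrative guesses
export FLAGS_fraction_of_gpu_memory_to_use=0.92   # how much of each card Paddle may pre-allocate
export FLAGS_eager_delete_tensor_gb=0.0           # free temporaries immediately (already 0 per the log)
export FLAGS_allocator_strategy=naive_best_fit    # the log shows auto_growth; this is the other strategy
```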
