1. multiprocessing.pool.RemoteTraceback

If you run into the following error, the data is most likely the problem.

multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/home/unaguo/anaconda3/envs/pt1.3-py3.6/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/unaguo/anaconda3/envs/pt1.3-py3.6/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 429, in _worker_fn
    batch = batchify_fn([_worker_dataset[i] for i in samples])
  File "/home/unaguo/anaconda3/envs/pt1.3-py3.6/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 429, in <listcomp>
    batch = batchify_fn([_worker_dataset[i] for i in samples])
  File "/data2/enducation/paper_recog_total/train-paper-recog/line_detect/data/paper_dataset.py", line 375, in __getitem__
    data_dict = self._transforms(data_dict)
  File "/data2/enducation/paper_recog_total/train-paper-recog/line_detect/data/transforms_paper.py", line 13, in __call__
    args = trans(args)
  File "/data2/enducation/paper_recog_total/train-paper-recog/line_detect/data/transforms_paper.py", line 468, in __call__
    dst_points = np.array([[rdw(), rdh()], [w-1-rdw(), rdh()], [w-1-rdw(), h-1-rdh()], [rdw(), h-1-rdh()]])
  File "/data2/enducation/paper_recog_total/train-paper-recog/line_detect/data/transforms_paper.py", line 466, in <lambda>
    rdh = lambda: np.random.randint(0, self.max_affine_xy_ratio * h)
  File "mtrand.pyx", line 746, in numpy.random.mtrand.RandomState.randint
  File "_bounded_integers.pyx", line 1254, in numpy.random._bounded_integers._rand_int64
ValueError: low >= high
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data2/enducation/paper_recog_total/train-paper-recog/line_detect/scripts/train_gluon_testpaper.py", line 233, in <module>
    for batch_cnt, data_batch in enumerate(tqdm.tqdm(train_loader)):
  File "/home/unaguo/anaconda3/envs/pt1.3-py3.6/lib/python3.6/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/unaguo/anaconda3/envs/pt1.3-py3.6/lib/python3.6/site-packages/mxnet/gluon/data/dataloader.py", line 484, in __next__
    batch = pickle.loads(ret.get(self._timeout))
  File "/home/unaguo/anaconda3/envs/pt1.3-py3.6/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
ValueError: low >= high

Workaround:
Pass thread_pool=True to mx.gluon.data.DataLoader. What does that mean? From the documentation:
If True, use threading pool instead of multiprocessing pool. Using threadpool can avoid shared memory usage. If DataLoader is more IO bounded or GIL is not a killing problem, threadpool version may achieve better performance than multiprocessing.

train_loader = mx.gluon.data.DataLoader(train_dataset, batch_size=config.TRAIN.batch_size,
                                            shuffle=True, num_workers=2, thread_pool=True,
                                            last_batch="discard", batchify_fn=batch_fn)
