Debugging Python: Segmentation fault (core dumped)

Symptom:

  • PyTorch code that used to train fine suddenly fails with "Segmentation fault (core dumped)", and nothing runs any more.

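Debugging:
Run the script with Python's fault handler enabled, so that the interpreter prints the Python-level traceback when the crash happens: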

 python -q -X faulthandler train1.py 
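If changing the command line is inconvenient (for example when the script is started by a launcher), the same handler can be switched on from inside the script with the standard-library faulthandler module. The following is a minimal sketch, not part of the original training code: the helper name enable_crash_traceback and the log file name are hypothetical.

```python
import faulthandler


def enable_crash_traceback(log_path="segfault_traceback.log"):
    """Enable faulthandler and write crash tracebacks to log_path.

    The helper name and default path are illustrative assumptions.
    The returned file object must stay open for as long as the
    handler is enabled, because faulthandler writes to its file
    descriptor at crash time.
    """
    log_file = open(log_path, "w")
    # Dump the traceback of every thread on SIGSEGV, SIGFPE, SIGABRT,
    # SIGBUS and SIGILL, as `python -X faulthandler` does.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    crash_log = enable_crash_traceback()
    # Place the heavy imports (torch, timm, ...) after this point so
    # that a crash during import is captured as well.
```

Setting the environment variable PYTHONFAULTHANDLER=1 before starting Python has the same effect as the -X faulthandler option, with the traceback written to stderr.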

Solution:
The traceback printed by the fault handler shows which code triggers the crash:

Current thread 0x00007f555a406080 (most recent call first):
  File "", line 219 in _call_with_frames_removed
  File "", line 1043 in create_module
  File "", line 583 in module_from_spec
  File "", line 670 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 64 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "", line 219 in _call_with_frames_removed
  File "", line 1035 in _handle_fromlist
  File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/tensorflow/python/pywrap_tfe.py", line 28 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "", line 219 in _call_with_frames_removed
  File "", line 1035 in _handle_fromlist
  File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 35 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "", line 219 in _call_with_frames_removed
  File "", line 1035 in _handle_fromlist
  File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/tensorflow/python/__init__.py", line 40 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "", line 219 in _call_with_frames_removed
  File "", line 953 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/tensorflow/__init__.py", line 41 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/huggingface_hub/keras_mixin.py", line 24 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/huggingface_hub/__init__.py", line 63 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/timm/models/hub.py", line 17 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/timm/models/helpers.py", line 18 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/timm/models/beit.py", line 29 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/timm/models/__init__.py", line 1 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/timm/__init__.py", line 2 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "/home/teletraan/baseline/competition/bird2022/src/timm_audio/models/timm_model.py", line 5 in <module>
  File "", line 219 in _call_with_frames_removed
  File "", line 728 in exec_module
  File "", line 677 in _load_unlocked
  File "", line 967 in _find_and_load_unlocked
  File "", line 983 in _find_and_load
  File "train1.py", line 30 in <module>
Segmentation fault (core dumped)

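Reading the traceback from the bottom up: train1.py imports timm_model.py, which imports timm, which imports huggingface_hub, whose keras_mixin.py imports tensorflow. The crash happens while TensorFlow's native extension is being loaded (pywrap_tensorflow.py, line 64). So the problem is that this environment has both torch and tensorflow installed, and the two together caused the crash. Uninstalling tensorflow-gpu from this environment fixed it.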
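One way to cross-check this kind of diagnosis is to import each suspect package in a fresh interpreter and look at the exit status, so that a crash in one import does not take down the checking script itself. This is a rough sketch under assumptions: probe_import is a hypothetical helper, and the list of packages is only taken from the traceback above. Note that importing each package alone will not reproduce a crash that only appears when both are loaded in the same process.

```python
import subprocess
import sys


def probe_import(module_name):
    """Import module_name in a child interpreter.

    Returns (returncode, stderr). On POSIX a negative return code
    means the child was killed by a signal, e.g. -11 for SIGSEGV.
    A return code of 1 usually means an ordinary Python exception
    such as ImportError.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Package names taken from the traceback; adjust as needed.
    for name in ("torch", "tensorflow", "huggingface_hub", "timm"):
        code, err = probe_import(name)
        if code == 0:
            status = "ok"
        elif code < 0:
            status = f"killed by signal {-code}"
        else:
            status = f"failed with exit code {code}"
        print(f"{name}: {status}")
```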

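Another possible fix:

Build TensorFlow from source. The instruction sets supported by some CPUs may not match the ones the pip-installed TensorFlow binaries were compiled for, and this mismatch can also cause a core dump. In that case TensorFlow needs to be recompiled for the local machine.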
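Before going through a full rebuild, it may be worth checking which instruction-set flags the CPU actually reports. The sketch below parses the Linux /proc/cpuinfo format. The flag names checked (avx, avx2) are an assumption chosen for illustration, since AVX is a commonly cited requirement of prebuilt TensorFlow binaries. The exact requirements of a given wheel should be taken from its release notes.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        # On x86 Linux the line looks like "flags\t\t: fpu vme ... avx".
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def missing_flags(cpuinfo_text, required=("avx", "avx2")):
    """List the required flags the CPU does not report.

    The default `required` tuple is an illustrative assumption.
    """
    flags = cpu_flags(cpuinfo_text)
    return [flag for flag in required if flag not in flags]


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux only
    if os.path.exists(path):
        with open(path) as handle:
            print("missing flags:", missing_flags(handle.read()))
    else:
        print("no /proc/cpuinfo on this platform")
```

An empty list does not prove the binary is compatible. It only rules out the listed flags as the cause.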
