Symptom: running train1.py crashes with "Segmentation fault (core dumped)".
Debugging:
Run the script with Python's faulthandler enabled:
python -q -X faulthandler train1.py
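If changing the command line is inconvenient, the standard-library faulthandler module can also be turned on from inside the script (setting the PYTHONFAULTHANDLER environment variable has the same effect). A minimal sketch, assuming these lines go at the very top of train1.py, before the imports that crash:

```python
import faulthandler

# Dump the Python traceback of all threads to stderr when the process
# receives a fatal signal such as SIGSEGV. This must run before the
# import that crashes, so keep it at the top of the script.
faulthandler.enable()

# ... the rest of the script's imports would follow here ...
```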
Solution:
The faulthandler dump shows which code triggers the crash:
Current thread 0x00007f555a406080 (most recent call first):
File "" , line 219 in _call_with_frames_removed
File "" , line 1043 in create_module
File "" , line 583 in module_from_spec
File "" , line 670 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/tensorflow/python/pywrap_tensorflow.py", line 64 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "" , line 219 in _call_with_frames_removed
File "" , line 1035 in _handle_fromlist
File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/tensorflow/python/pywrap_tfe.py", line 28 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "" , line 219 in _call_with_frames_removed
File "" , line 1035 in _handle_fromlist
File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/tensorflow/python/eager/context.py", line 35 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "" , line 219 in _call_with_frames_removed
File "" , line 1035 in _handle_fromlist
File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/tensorflow/python/__init__.py", line 40 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "" , line 219 in _call_with_frames_removed
File "" , line 953 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/tensorflow/__init__.py", line 41 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/huggingface_hub/keras_mixin.py", line 24 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/huggingface_hub/__init__.py", line 63 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/timm/models/hub.py", line 17 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/timm/models/helpers.py", line 18 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/timm/models/beit.py", line 29 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/timm/models/__init__.py", line 1 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "/home/teletraan/.pyenv/versions/comp/lib/python3.7/site-packages/timm/__init__.py", line 2 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "/home/teletraan/baseline/competition/bird2022/src/timm_audio/models/timm_model.py", line 5 in <module>
File "" , line 219 in _call_with_frames_removed
File "" , line 728 in exec_module
File "" , line 677 in _load_unlocked
File "" , line 967 in _find_and_load_unlocked
File "" , line 983 in _find_and_load
File "train1.py", line 30 in <module>
Segmentation fault (core dumped)
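The crash happens while tensorflow's native extension (pywrap_tensorflow) is being loaded, and tensorflow is only pulled in indirectly: train1.py imports timm, timm imports huggingface_hub, and huggingface_hub's keras_mixin.py imports tensorflow. So the problem comes from this environment having both torch and tensorflow installed, which leads to this odd failure. Uninstalling tensorflow-gpu from the environment solved it.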
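To see whether an environment has both frameworks without triggering the crashing import, one can ask the import system whether each package can be located. This is only a sketch: the helper name installed_frameworks is made up for illustration, and it relies on importlib.util.find_spec, which looks up a top-level package without executing it.

```python
import importlib.util


def installed_frameworks(names=("torch", "tensorflow")):
    """Return {name: bool} telling whether each top-level package can be
    located. find_spec only searches for the package; it does not run it,
    so a package that segfaults on import is still safe to check."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # If both entries are True, the environment mixes torch and tensorflow.
    print(installed_frameworks())
```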
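Another option is to build tensorflow from source. The tensorflow binaries installed by pip are compiled for particular CPU instruction sets; if the local CPU does not support them, a core dump can occur, in which case tensorflow needs to be recompiled for that machine.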
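Before rebuilding, it may help to check which instruction sets the CPU actually reports. The sketch below parses /proc/cpuinfo-style text, so it only applies to Linux on x86. The list of flags to look for (avx, avx2, fma) is an assumption for illustration, not something taken from the tensorflow build in question, and the helper names are hypothetical.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # The first "flags" line is enough; all cores normally match.
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text, wanted=("avx", "avx2", "fma")):
    """List the wanted flags (an assumed set) that the CPU does not report."""
    flags = parse_cpu_flags(cpuinfo_text)
    return [flag for flag in wanted if flag not in flags]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")  # Linux only
    if cpuinfo.exists():
        print("missing:", missing_flags(cpuinfo.read_text()))
```

If the list of missing flags is not empty, a prebuilt binary that assumes those instructions is a plausible cause of the crash, and building from source for the local CPU is worth trying.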