Viewing and Releasing GPU Memory on Google Colab

Reference links

http://thoughtsondl.blogspot.com/2018/06/how-to-release-or-reset-gpu-memory-in.html

# colab
https://colab.research.google.com

1. Check GPU memory usage

  • First, install the supporting packages

    # memory footprint support libraries/code
    !ln -sf /opt/bin/nvidia-smi /usr/bin/nvidia-smi
    !pip install gputil
    !pip install psutil
    !pip install humanize
    
  • Then run the following code

    import os
    
    import GPUtil as GPU
    import humanize
    import psutil
    
    def printm():
        # Host RAM: what is still available, and how much this process holds
        process = psutil.Process(os.getpid())
        print("Gen RAM Free: " + humanize.naturalsize(psutil.virtual_memory().available), " |     Proc size: " + humanize.naturalsize(process.memory_info().rss))
        # Query the GPU on every call so the figures are current, not a stale snapshot
        # XXX: only one GPU on Colab and isn’t guaranteed
        GPUs = GPU.getGPUs()
        if not GPUs:
            print("No GPU found; check that the runtime type is set to GPU")
            return
        gpu = GPUs[0]
        print("GPU RAM Free: {0:.0f}MB | Used: {1:.0f}MB | Util {2:3.0f}% | Total     {3:.0f}MB".format(gpu.memoryFree, gpu.memoryUsed, gpu.memoryUtil*100, gpu.memoryTotal))
    
    printm()
    
  • The output looks roughly like this:

    Gen RAM Free: 12.7 GB  |     Proc size: 117.8 MB
    GPU RAM Free: 11441MB | Used: 0MB | Util   0% | Total     11441MB
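
As a cross-check that does not depend on GPUtil, the same figures can be read from nvidia-smi directly. The sketch below is an illustration, not part of the original recipe: the helper names parse_gpu_memory and query_gpu_memory are made up here, and it assumes nvidia-smi is on the PATH (which is what the symlink in the install step is for).

```python
import subprocess

def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines (values in MiB) into (used, total) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows

def query_gpu_memory():
    """Ask nvidia-smi for used/total memory of every visible GPU, in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)

# In a Colab cell with a GPU runtime (hypothetical usage):
# for used, total in query_gpu_memory():
#     print(f"GPU RAM Used: {used}MB | Total: {total}MB")
```

Keeping the parsing separate from the subprocess call makes it possible to try the parser on a pasted line of nvidia-smi output without a GPU attached.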
    

2. Inspect the running processes

  • Command

    !ps -aux|grep python
    
  • Output

    root          75  0.2  0.0      0     0 ?        Zs   12:18   0:14 [python3] <defunct>
    root          95  0.0  0.0      0     0 ?        Z    12:18   0:01 [python3] <defunct>
    root         645  0.4  0.0      0     0 ?        Zs   12:30   0:22 [python3] <defunct>
    root         665  0.0  0.0      0     0 ?        Z    12:30   0:00 [python3] <defunct>
    root         878  0.1  0.0      0     0 ?        Zs   12:34   0:08 [python3] <defunct>
    root         898  0.0  0.0      0     0 ?        Z    12:34   0:00 [python3] <defunct>
    root        1157  4.4  0.0      0     0 ?        Zs   12:37   3:44 [python3] <defunct>
    root        1177  0.0  0.0      0     0 ?        Z    12:37   0:01 [python3] <defunct>
    root        1540  5.1  0.0      0     0 ?        Zs   12:48   3:42 [python3] <defunct>
    root        1560  0.0  0.0      0     0 ?        Z    12:49   0:00 [python3] <defunct>
    root        1919  5.9  0.0      0     0 ?        Zs   12:57   3:45 [python3] <defunct>
    root        1939  0.0  0.0      0     0 ?        Z    12:57   0:01 [python3] <defunct>
    root        2360 11.5  0.0      0     0 ?        Zs   13:08   6:00 [python3] <defunct>
    root        2380  0.1  0.1 128920 16772 ?        Sl   13:08   0:03 /usr/bin/python3 /usr/local/lib/python3.7/dist-packages/debugpy/adapter --for-server 38435 --host 127.0.0.1 --port 21826 --server-access-token b2414ede3edba389484a9e85b6689d9d30a08c5210fb245f001428483f77d560
    root        3025  0.3  0.4 196464 62136 ?        Sl   13:32   0:06 /usr/bin/python2 /usr/local/bin/jupyter-notebook --ip="172.28.0.2" --port=9000 --FileContentsManager.root_dir="/" --MappingKernelManager.root_dir="/content"
    root        3032 93.3 15.7 41776532 2100560 ?    Ssl  13:32  26:11 /usr/bin/python3 -m ipykernel_launcher -f /root/.local/share/jupyter/runtime/kernel-2145fd1d-b94a-4a65-b375-8ace07dcd021.json
    root        3052  0.2  0.1 128924 16224 ?        Sl   13:32   0:03 /usr/bin/python3 /usr/local/lib/python3.7/dist-packages/debugpy/adapter --for-server 38657 --host 127.0.0.1 --port 19937 --server-access-token c608ee68a5811a689b01f7454ee5a7fbd7ac78c56dc32a1a96a08ec4c768782d
    root        3518  0.0  0.0  18380  3092 ?        S    14:00   0:00 bash -c tail -n +0 -F "/root/.config/Google/DriveFS/Logs/drive_fs.txt" | python3 /opt/google/drive/drive-filter.py > "/root/.config/Google/DriveFS/Logs/timeouts.txt" 
    root        3520  0.4  0.0  31740  9680 ?        S    14:00   0:00 python3 /opt/google/drive/drive-filter.py
    root        3525  0.0  0.0  39196  6548 ?        S    14:00   0:00 /bin/bash -c ps -aux|grep python
    root        3527  0.0  0.0  38576  5612 ?        S    14:00   0:00 grep python
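
Most rows in a listing like this are <defunct> zombies: they show an RSS of 0 and cannot be killed, so the rows worth looking at are the live ones. A minimal sketch for filtering them out of the text is below. The helper name live_python_processes is hypothetical, and the code assumes the usual `ps aux` column order (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND). Note that RSS is host RAM, not GPU memory, so it is only a hint at which process is the heavy one.

```python
def live_python_processes(ps_output):
    """Return (pid, rss_kib, command) for non-zombie rows that mention python."""
    rows = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the command keeps its spaces
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header row or malformed line
        pid, rss, stat, command = int(fields[1]), int(fields[5]), fields[7], fields[10]
        if stat.startswith("Z") or "python" not in command:
            continue  # skip zombies and unrelated processes
        rows.append((pid, rss, command))
    # Largest resident memory first
    return sorted(rows, key=lambda row: row[1], reverse=True)

# Hypothetical usage in a Colab cell:
# import subprocess
# text = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
# for pid, rss, command in live_python_processes(text):
#     print(pid, rss, command[:60])
```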
    

3. Release the memory

  • Command

    !kill -9 2380  # the number is the PID: the second column (right after root) in the previous cell's output
    
  • After killing all of the processes, check GPU usage again; the result is as follows

    Gen RAM Free: 12.7 GB  |     Proc size: 117.6 MB
    GPU RAM Free: 11441MB | Used: 0MB | Util   0% | Total     11441MB
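
The same kill can be issued from Python with os.kill, which is convenient when the PIDs come from code rather than from reading the listing by eye. This is a sketch under assumptions: kill_process is a hypothetical helper, and the guard against the caller's own PID is an addition made here, since killing the kernel process discards everything the notebook holds in memory.

```python
import os
import signal

def kill_process(pid):
    """Send SIGKILL (the equivalent of `kill -9`) to pid.

    Returns True if the signal was sent, False if no such process exists.
    Refuses to target the calling process, because that would take the
    notebook kernel down with it.
    """
    if pid == os.getpid():
        raise ValueError("refusing to kill the current kernel process")
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return False
    return True

# Hypothetical usage, with a PID taken from the process listing:
# kill_process(2380)
```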
    
