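Linux [9] - Process Management 3-2-2 - Manually Releasing GPU Memory
Developers often report that their program has stopped but its GPU memory has not been released. When working with TensorFlow in PyCharm, or with PyTorch, we sometimes terminate a running program from the console. Occasionally the program has clearly ended and nvidia-smi no longer lists any process, yet the GPU memory is still occupied. Why does this happen?
When a PyTorch DataLoader is configured with multiple workers for data loading, the workers are not real threads: it starts N child processes (their PIDs are usually consecutive) to do the work in parallel. If the program finishes abnormally or the main process is killed midway, those child processes can be left behind, and the GPU memory they hold is not released until they are killed by hand, one by one. The procedure is as follows:
1. Observe the symptom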
nvidia-smi
Mon Dec 6 14:26:33 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  TITAN V             Off  | 00000000:04:00.0 Off |                  N/A |
| 34%   42C    P8    26W / 250W |   9575MiB / 12066MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  TITAN V             Off  | 00000000:05:00.0 Off |                  N/A |
| 35%   45C    P8    28W / 250W |   8503MiB / 12066MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  TITAN V             Off  | 00000000:08:00.0 Off |                  N/A |
| 34%   45C    P8    28W / 250W |   8503MiB / 12066MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  TITAN V             Off  | 00000000:09:00.0 Off |                  N/A |
| 36%   46C    P8    28W / 250W |   8503MiB / 12066MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   4  TITAN V             Off  | 00000000:84:00.0 Off |                  N/A |
| 28%   37C    P8    27W / 250W |      4MiB / 12066MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   5  TITAN V             Off  | 00000000:85:00.0 Off |                  N/A |
| 28%   34C    P8    25W / 250W |      4MiB / 12066MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   6  TITAN V             Off  | 00000000:88:00.0 Off |                  N/A |
| 28%   35C    P8    26W / 250W |      4MiB / 12066MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   7  TITAN V             Off  | 00000000:89:00.0 Off |                  N/A |
| 28%   34C    P8    24W / 250W |      4MiB / 12066MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
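The same check can be scripted instead of read off the table. What follows is only a minimal sketch: it assumes an nvidia-smi that supports the --query-gpu and --query-compute-apps options, and the function name gpus_with_memory_in_use and the 100 MiB threshold are made up for illustration. It prints the GPUs whose used memory is above the threshold, so they can be compared with the processes nvidia-smi still reports.

```shell
#!/usr/bin/env bash
# Print "index used_MiB" for every GPU whose used memory exceeds a threshold.
# Reads CSV lines such as "0, 9575, 12066" on stdin.
# The function name and the default threshold are illustrative assumptions.
gpus_with_memory_in_use() {
    local threshold_mib="${1:-100}"
    awk -F', *' -v t="$threshold_mib" '$2 + 0 > t { print $1, $2 }'
}

# Typical use on a machine with NVIDIA GPUs (not run here):
#   nvidia-smi --query-gpu=index,memory.used,memory.total \
#              --format=csv,noheader,nounits | gpus_with_memory_in_use 100
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory \
#              --format=csv,noheader
```

If the first command lists GPUs while the second prints nothing, memory is being held by processes that nvidia-smi does not show, which is the situation described above.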
2. Find the processes
fuser -v /dev/nvidia*
/dev/nvidia5:        i001      11085 F.... nvidia-smi
                     i001      18493 F...m python
                     i001      33238 F...m python
                     i001      33239 F...m python
                     i001      33240 F...m python
                     i001      33251 F...m python
                     i001      33256 F...m python
                     i001      33257 F...m python
                     i001      33258 F...m python
                     i001      33261 F...m python
                     i001      33264 F...m python
                     i001      33265 F...m python
                     i001      33269 F...m python
                     i001      33270 F...m python
                     i001      33271 F...m python
                     i001      33278 F...m python
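To tell leftover workers apart from a job that is still running, it can help to look at the parent PID of each python process. The sketch below is a heuristic only: it assumes orphaned workers have been re-parented to PID 1 (on systems with a subreaper, such as a per-user systemd instance, the parent PID will be different), and the function name orphaned_python_pids is hypothetical.

```shell
#!/usr/bin/env bash
# Print the PIDs of processes that look like orphaned workers:
# parent PID is 1 and the command name starts with "python".
# Reads "PID PPID COMMAND" lines on stdin (no header).
# The function name and the PPID == 1 heuristic are illustrative assumptions.
orphaned_python_pids() {
    awk '$2 == 1 && $3 ~ /^python/ { print $1 }'
}

# Typical use (not run here):
#   ps -eo pid=,ppid=,comm= | orphaned_python_pids
```

PIDs that show up both here and in the fuser listing are good candidates for the leftover DataLoader workers.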
3. Extract the PIDs
fuser /dev/nvidia* 2>/dev/null | tr -s ' ' '\n' | grep -E '^[0-9]+$' | sort -u > /tmp/pid.file
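The step above collects every PID that has an NVIDIA device file open, which on a shared machine may include other users' jobs. A narrower selection can be sketched as follows. It assumes the verbose listing from step 2 was captured with stdout and stderr combined, and the function name gpu_python_pids is hypothetical; it keeps only python processes owned by one user.

```shell
#!/usr/bin/env bash
# Print unique PIDs of python processes owned by USER from a verbose
# fuser listing read on stdin.
# Usage: gpu_python_pids USER
# The function name and the "python" command filter are illustrative assumptions.
gpu_python_pids() {
    local user="$1"
    awk -v user="$user" '
        # Drop a leading "/dev/nvidiaN:" field so every row has the same columns.
        $1 ~ /:$/ { sub(/^[ \t]*[^ \t]+:[ \t]*/, "") }
        # Columns are now USER PID ACCESS COMMAND.
        $1 == user && $2 ~ /^[0-9]+$/ && $4 ~ /^python/ { print $2 }
    ' | sort -un
}

# Typical use (not run here):
#   fuser -v /dev/nvidia* 2>&1 | gpu_python_pids "$USER" > /tmp/pid.file
```

Because the same process usually appears under several device files, the result is de-duplicated before it is written out.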
4. Force-kill the processes
while read pid ; do kill -9 $pid; done </tmp/pid.file
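kill -9 gives a process no chance to clean up, so a gentler variant is to try SIGTERM first and fall back to SIGKILL only for processes that survive. The following is a minimal sketch of that idea, not a replacement for the loop above; the function name terminate_pids and the grace period are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Send SIGTERM to each PID, wait a grace period, then SIGKILL survivors.
# Usage: terminate_pids GRACE_SECONDS PID...
# The function name and the grace period are illustrative assumptions.
terminate_pids() {
    local grace="$1" pid
    shift
    for pid in "$@"; do
        # Skip anything that is not a plain number (headers, blank fields).
        case "$pid" in
            ''|*[!0-9]*) continue ;;
        esac
        kill -TERM "$pid" 2>/dev/null || true
    done
    sleep "$grace"
    for pid in "$@"; do
        case "$pid" in
            ''|*[!0-9]*) continue ;;
        esac
        # kill -0 sends no signal; it only tests whether the process exists.
        if kill -0 "$pid" 2>/dev/null; then
            kill -KILL "$pid" 2>/dev/null || true
        fi
    done
    return 0
}

# Typical use with the file from step 3 (not run here):
#   terminate_pids 5 $(cat /tmp/pid.file)
```

After the processes are gone, running nvidia-smi again should show the memory usage of the affected GPUs dropping back to a few MiB.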
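I also have a personal WeChat official account. I am lazy and rarely update it, but you can ask questions there; if I am slow to reply, email me at tiehan@sina.cn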