MPI Program Debugging: Notes

One inelegant but practical method is to add the following code to the program:

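! tmp is an integer variable declared with the other declarations.
! Each process spins here until tmp is changed from a debugger,
! e.g. after attaching gdb:  set var tmp = 1
      tmp = 0
      do while (tmp .eq. 0)
         call sleep(2)
      enddo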

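This effectively inserts a breakpoint. Every process waits in the loop until you attach a debugger to it and change tmp. In MPI debugging it can also tell you whether the code before this point contains an error that crashes the program: if the processes reach the loop and keep running instead of exiting, nothing before it crashed. Personally I find it very useful.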
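For C programs the same trick can be sketched as follows. This is a minimal sketch and does not come from the original notes. The function wait_for_debugger and the flag debugger_release are hypothetical names, and the code assumes a POSIX system (getpid, gethostname, sleep). Printing the PID and host first lets you attach without having to search for the process with ps.

```c
#define _XOPEN_SOURCE 700  /* expose getpid(), gethostname(), sleep() */
#include <stdio.h>
#include <unistd.h>

/* Hypothetical global flag: from gdb, run "set var debugger_release = 1". */
volatile int debugger_release = 0;

/* C analogue of the Fortran sleep loop: report where this process runs,
   then sleep until someone sets *flag to a non-zero value.
   Returns the number of sleep intervals spent waiting. */
unsigned wait_for_debugger(volatile int *flag, unsigned interval_s)
{
    char host[256] = {0};
    unsigned waited = 0;

    /* Leave room for the terminating NUL in case the name is truncated. */
    if (gethostname(host, sizeof host - 1) != 0)
        snprintf(host, sizeof host, "unknown");

    /* Flush so the PID shows up even when stdout is a pipe (as under mpirun). */
    printf("PID %ld on %s waiting for debugger\n", (long)getpid(), host);
    fflush(stdout);

    while (*flag == 0) {   /* volatile: re-read on every iteration */
        sleep(interval_s);
        waited++;
    }
    return waited;
}
```

In an MPI program you would call wait_for_debugger(&debugger_release, 2) right after MPI_Init. Then attach with gdb -p <pid>, enter set var debugger_release = 1 at the gdb prompt, and continue. A global flag is used here instead of a local variable. That way it can be set without first moving up from the sleep() frame with gdb's up or frame commands.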

Another approach is to start all processes under gdb with mpirun -gdb (supported by MPICH2's MPD process manager). Each output line is prefixed with the rank(s) it comes from:

[root@c0109 zlt]# cat hello.F
      program hello
      implicit none
      include 'mpif.h'
      integer myid,numprocs,ierr
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD,myid,ierr)
      call MPI_COMM_SIZE(MPI_COMM_WORLD,numprocs,ierr)
      print *, 'Hello world, Process',myid
      call MPI_FINALIZE(ierr)
      end
[root@c0109 zlt]# mpif90 -g hello.F -o hello
[root@c0109 zlt]# mpirun -gdb -np 2 hello
0-1:  (gdb) run
0-1:  Continuing.
0:  Hello world, Process 0
1:  Hello world, Process 1
0-1:
0-1:  Program exited normally.
0-1:  (gdb)


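The following command opens 4 xterms, each running gdb on one process, so each process can be debugged as if it were a serial program. Note that it needs an X display, so it does not work from a plain SSH terminal without X11 forwarding.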

[root@c0109 zlt]# mpirun -np 4 xterm -e gdb my_mpi_application 


Open a new terminal window and look for the hello processes:

[root@c0108 test]# ps aux |grep hello
root      4342  0.0  0.0  11108   592 ?        S    13:54   0:00 /root/zmpi/test/hello 4297
root      4359  0.0  0.1 136188  5160 ?        S    13:55   0:00 python2 /usr/local/bin/mpdgdbdrv.py hello
root      4360  0.0  0.1 136188  5160 ?        S    13:55   0:00 python2 /usr/local/bin/mpdgdbdrv.py hello
root      4361  0.0  0.1  81124  7348 ?        S    13:55   0:00 gdb -q hello
root      4362  0.0  0.1  81124  7348 ?        S    13:55   0:00 gdb -q hello
root      4388  0.0  0.0  65328   772 pts/1    R+   13:58   0:00 grep hello


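Once you have found the relevant process IDs, you can use gdb to debug each process separately. To attach gdb to a process that is already running, use gdb -p <pid>, or the attach <pid> command at the gdb prompt.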

0-1:  (gdb) run 4362
0-1:  Continuing. 
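With more than a handful of processes, attaching to every one becomes tedious, and often only one rank misbehaves. A common refinement, sketched here under assumptions, is to let only a chosen rank enter the wait loop. The environment variable DEBUG_RANK and the function rank_should_wait are hypothetical names. How the variable reaches processes on other nodes depends on the launcher. Open MPI's mpirun, for example, forwards it with -x DEBUG_RANK.

```c
#define _XOPEN_SOURCE 700  /* exposes setenv()/unsetenv(), handy for trying it out */
#include <stdlib.h>
#include <string.h>

/* Decide whether this rank should stop and wait for a debugger.
   Assumption: the hypothetical variable DEBUG_RANK holds either "all"
   or a single rank number; nothing is paused when it is unset or empty. */
int rank_should_wait(int rank)
{
    const char *spec = getenv("DEBUG_RANK");
    char *end;
    long target;

    if (spec == NULL || *spec == '\0')
        return 0;
    if (strcmp(spec, "all") == 0)
        return 1;

    target = strtol(spec, &end, 10);
    if (*end != '\0')          /* reject malformed values such as "2x" */
        return 0;
    return target == rank;
}
```

After MPI_Comm_rank, guard the sleep loop with if (rank_should_wait(rank)). Only the selected rank then stops, and only that rank needs to be found with ps and attached. The other ranks run on, although they may block at their next communication with the paused rank.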

 

 

The following command runs the MPI program under Valgrind's Memcheck memory checker (Memcheck is Valgrind's default tool):

[root@c0109 zlt]# mpirun -np 2 valgrind ./hello 

 

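Serial debugging (gdb on an ordinary, non-MPI program). Note that the program returns 1 from main, which is why gdb reports "exited with code 01". The "%s/n" in the listing should be "%s\n" to print a newline.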

[root@c0108 zlt]# gcc -g hello.c -o hello
[root@c0108 zlt]# gdb
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5_5.2)
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
(gdb) file hello
Reading symbols from /root/zmpi/zlt/hello...done.
(gdb) r              # run the program ("r" is short for run); command-line arguments, if any, go after it
Starting program: /root/zmpi/zlt/hello
Hello world!
Program exited with code 01.
(gdb) list           # list the source, starting from line 1
1       #include <stdio.h>
2       #include <stdlib.h>
3
4       int main() {
5       char *buf;
6       buf = "Hello world!";
7       printf("%s/n",buf);
8       return 1;
9       }
(gdb) break 3        # same as "break hello.c:3": set a breakpoint at line 3 of the source
Breakpoint 1 at 0x4004a0: file hello.c, line 3.
(gdb) info break     # show breakpoint information
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x00000000004004a0 in main at hello.c:3
(gdb) run
Starting program: /root/zmpi/zlt/hello
Breakpoint 1, main () at hello.c:6
6       buf = "Hello world!";
(gdb) n              # next: execute one statement; "c" continues running the program
7       printf("%s/n",buf);
(gdb) n
Hello world!
8       return 1;
(gdb) print buf      # print the value of buf
$1 = 0x4005b8 "Hello world!"
(gdb) what is buf
No symbol "is" in current context.
(gdb) whatis buf     # show the type of buf
type = char *
(gdb) delete breakpoint 1
(gdb) info break
No breakpoints or watchpoints.
(gdb) quit           # quit gdb
A debugging session is active.
        Inferior 2 [process 22747] will be killed.
Quit anyway? (y or n) y
[root@c0108 zlt]#

 

Appendix: Debugging programs with GDB

http://dsec.pku.edu.cn/~yuhj/wiki/gdb.html

 

 
