The Linux strace tool: diagnosing and troubleshooting processes by tracing system calls and signals

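strace
strace traces the system calls a process makes and the signals it receives, which makes it a good tool for diagnosis and troubleshooting.
Each line of output is one system call: the call name, its arguments, and its return value.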

Examples

--Print the trace directly to the terminal

[oracle@sean ~]$ strace cat /dev/null
execve("/bin/cat", ["cat", "/dev/null"], [/* 29 vars */]) = 0
brk(0)                                  = 0x124f000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fb33d554000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=63649, ...}) = 0
...... (output omitted)
close(3)                                = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 2), ...}) = 0
open("/dev/null", O_RDONLY)             = 3
fstat(3, {st_mode=S_IFCHR|0666, st_rdev=makedev(1, 3), ...}) = 0
read(3, "", 32768)                      = 0
close(3)                                = 0
close(1)                                = 0
close(2)                                = 0
exit_group(0)                           = ?
+++ exited with 0 +++
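
--A minimal sketch in Python (not part of the original post) of how one line of the plain output above can be split into call name, arguments, and return value. The function name parse_strace_line and the regular expression are illustrative assumptions; they cover the simple line shapes shown here, not every format strace can emit.

```python
import re

# Matches lines such as:  open("/dev/null", O_RDONLY)             = 3
# Groups: call name, raw argument text, return value, optional trailing note.
LINE_RE = re.compile(r'^(\w+)\((.*)\)\s+=\s+(0x[0-9a-f]+|-?\d+|\?)\s*(.*)$')


def parse_strace_line(line):
    """Split one plain strace line into its parts, or return None."""
    m = LINE_RE.match(line.strip())
    if m is None:
        # For example "+++ exited with 0 +++", which is not a system call.
        return None
    name, args, ret, note = m.groups()
    # On failure strace prints "-1 ERRNO (description)"; keep the ERRNO symbol.
    errno_name = note.split()[0] if ret == "-1" and note else None
    return {"syscall": name, "args": args, "ret": ret, "errno": errno_name}


if __name__ == "__main__":
    sample = 'access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)'
    print(parse_strace_line(sample))
```

The return value is kept as a string because it can be a number, an address such as 0x124f000, or "?" for calls like exit_group that never return.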

--Write the trace to a file instead
[oracle@sean ~]$ strace -o /tmp/cat.txt cat /dev/null



--Common options
--Reference blog post: http://www.cnblogs.com/ggjucheng/archive/2012/01/08/2316692.html

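-c Count the time, number of calls, and number of errors for each system call, and print a summary.
-d Show debugging output of strace itself on standard error.
-f Trace child processes as they are created by fork.
-ff If -o filename is given, write each process's trace to filename.pid, where pid is the process ID of that process.
-F Attempt to trace vfork. On older versions, -f alone did not follow vfork; newer strace releases treat -F as an obsolete synonym for -f.
-h Print a brief help message.
-i Print the instruction pointer at the time of the system call.
-q Suppress messages about attaching, detaching, and so on.
-r Print a relative timestamp on entry to each system call, that is, the time elapsed since the previous call began.
-t Prefix each line of output with the time of day.
-tt Prefix each line with the time of day, including microseconds.
-ttt Prefix each line with microsecond precision, with the time printed as seconds since the epoch.
-T Show the time spent in each system call.
-v Verbose mode: print unabbreviated versions of environment, stat, termios, and similar calls. These structures are so common that they are abbreviated by default.
-V Print the strace version.
-x Print non-ASCII strings in hexadecimal.
-xx Print all strings in hexadecimal.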

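-a column
Align return values in the given column. The default is 40.
-e expr
A qualifying expression that controls what is traced and how. The format is:
[qualifier=][!]value1[,value2]...
qualifier is one of trace, abbrev, verbose, raw, signal, read, or write, and value is a qualifier-dependent symbol or number. The default qualifier is trace. An exclamation mark negates the set of values. For example:
-eopen is equivalent to -e trace=open and traces only the open call, while -e trace=!open traces every call except open. Two special values are also accepted: all and none.
Note that some shells use ! for history expansion, so it may need to be escaped as \! or placed inside single quotes.
-e trace=set
Trace only the specified set of system calls. For example, -e trace=open,close,read,write traces only those four calls. The default is trace=all.
-e trace=file
Trace only system calls that take a file name as an argument.
-e trace=process
Trace only system calls related to process management.
-e trace=network
Trace all network-related system calls.
-e trace=signal
Trace all signal-related system calls.
-e trace=ipc
Trace all system calls related to inter-process communication.
-e abbrev=set
Abbreviate the members of large structures in the output for the given set of system calls. -v is equivalent to abbrev=none. The default is abbrev=all.
-e raw=set
Print the arguments of the specified system calls raw and undecoded, in hexadecimal.
-e signal=set
Trace only the specified signals. The default is signal=all. For example, signal=!SIGIO (or signal=!io) means SIGIO is not traced.
-e read=set
Dump all data read from the file descriptors listed in set. For example:
-e read=3,5
dumps everything read from file descriptors 3 and 5.
-e write=set
Dump all data written to the file descriptors listed in set.
-o filename
Write the strace output to the file filename.
-p pid
Attach to the process with the given pid and trace it.
-s strsize
Set the maximum length of strings printed. The default is 32. File names are always printed in full.
-u username
Run the traced command with the UID and GID of username.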
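
--A small sketch in Python (not part of the original post) that combines a few of the options above into an argument list. The helper name build_strace_argv and its parameters are hypothetical; it only uses the flags -f, -tt, -T, -e trace=, -o, and -p described in this list. Passing the list to a process launcher without a shell also sidesteps the ! history-expansion issue mentioned under -e expr.

```python
def build_strace_argv(command=None, pid=None, trace=None, exclude=False,
                      output=None, follow_forks=False, timing=False):
    """Build an strace argument list from a handful of common options."""
    if command is None and pid is None:
        raise ValueError("need either a command to run or a pid to attach to")
    argv = ["strace"]
    if follow_forks:
        argv.append("-f")            # also trace child processes
    if timing:
        argv += ["-tt", "-T"]        # microsecond timestamps + time per call
    if trace:
        prefix = "!" if exclude else ""   # "!" negates the set of calls
        argv += ["-e", "trace=" + prefix + ",".join(trace)]
    if output:
        argv += ["-o", output]       # write the trace to a file
    if pid is not None:
        argv += ["-p", str(pid)]     # attach to a running process
    if command:
        argv += list(command)
    return argv


if __name__ == "__main__":
    # The list could be handed to subprocess.run(argv) on a host with strace.
    print(build_strace_argv(command=["cat", "/dev/null"],
                            trace=["open"], exclude=True,
                            output="/tmp/cat.txt"))
```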



--Example: tracing the Oracle pmon process

[oracle@sean tmp]$ ps -ef|grep ora_
oracle    2355     1  0 10:21 ?        00:00:02 ora_pmon_sean
oracle    2357     1  0 10:21 ?        00:00:02 ora_psp0_sean
oracle    2383     1  1 10:21 ?        00:03:55 ora_vktm_sean
oracle    2387     1  0 10:21 ?        00:00:00 ora_gen0_sean
oracle    2389     1  0 10:21 ?        00:00:00 ora_diag_sean
oracle    2391     1  0 10:21 ?        00:00:00 ora_dbrm_sean
oracle    2393     1  0 10:21 ?        00:00:07 ora_dia0_sean
oracle    2395     1  0 10:21 ?        00:00:00 ora_mman_sean
oracle    2397     1  0 10:21 ?        00:00:01 ora_dbw0_sean
oracle    2399     1  0 10:21 ?        00:00:01 ora_lgwr_sean
oracle    2401     1  0 10:21 ?        00:00:04 ora_ckpt_sean
oracle    2403     1  0 10:21 ?        00:00:01 ora_smon_sean
oracle    2405     1  0 10:21 ?        00:00:00 ora_reco_sean
oracle    2407     1  0 10:21 ?        00:00:03 ora_mmon_sean
oracle    2409     1  0 10:21 ?        00:00:05 ora_mmnl_sean
oracle    2411     1  0 10:21 ?        00:00:00 ora_d000_sean
oracle    2413     1  0 10:21 ?        00:00:00 ora_s000_sean
oracle    2422     1  0 10:21 ?        00:00:01 ora_arc0_sean
oracle    2426     1  0 10:21 ?        00:00:00 ora_arc1_sean
oracle    2430     1  0 10:21 ?        00:00:00 ora_arc2_sean
oracle    2432     1  0 10:21 ?        00:00:00 ora_ctwr_sean
oracle    2434     1  0 10:21 ?        00:00:00 ora_arc3_sean
oracle    2436     1  0 10:21 ?        00:00:00 ora_fbda_sean
oracle    2438     1  0 10:21 ?        00:00:00 ora_qmnc_sean
oracle    2454     1  0 10:21 ?        00:00:02 ora_cjq0_sean
oracle    2471     1  0 10:22 ?        00:00:00 ora_q000_sean
oracle    2473     1  0 10:22 ?        00:00:00 ora_q001_sean
oracle    2513     1  0 10:26 ?        00:00:00 ora_smco_sean
oracle    5609     1  0 14:42 ?        00:00:00 ora_w000_sean

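--Let's look at the calls pmon makes internally. Below is a fragment of the output that repeats every 3 seconds; it shows clearly that pmon checks the status of several important processes every 3 seconds.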
[oracle@sean ~]$ strace -ttT -p 2355

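--getrusage reports the resource usage of the process. The ru_ prefix stands for "resource usage" (the fields belong to struct rusage): ru_utime is the user CPU time and ru_stime is the system CPU time.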
14:41:18.505309 getrusage(RUSAGE_SELF, {ru_utime={0, 574912}, ru_stime={1, 768731}, ...}) = 0 <0.000018>
14:41:18.505394 getrusage(RUSAGE_SELF, {ru_utime={0, 574912}, ru_stime={1, 768731}, ...}) = 0 <0.000014>
--Read the status of psp0 (spawns Oracle background processes after initial instance startup)
14:41:18.505474 open("/proc/2357/stat", O_RDONLY) = 12 <0.000052>
14:41:18.505570 read(12, "2357 (oracle) S 1 2357 2357 0 -1"..., 999) = 247 <0.000037>
14:41:18.505645 close(12)               = 0 <0.000020>
--Read the status of vktm (Virtual Keeper of Time Process), which maintains Oracle's internal clock
14:41:18.505709 open("/proc/2383/stat", O_RDONLY) = 12 <0.000023>
14:41:18.505767 read(12, "2383 (oracle) S 1 2383 2383 0 -1"..., 999) = 247 <0.000028>
14:41:18.505828 close(12)               = 0 <0.000015>
--Read the status of gen0 (General Task Execution Process)
14:41:18.505881 open("/proc/2387/stat", O_RDONLY) = 12 <0.000022>
14:41:18.505936 read(12, "2387 (oracle) S 1 2387 2387 0 -1"..., 999) = 240 <0.000025>
14:41:18.505996 close(12)               = 0 <0.000015>
--Read the status of dbrm (Database Resource Manager Process)
14:41:18.506048 open("/proc/2391/stat", O_RDONLY) = 12 <0.000097>
14:41:18.506181 read(12, "2391 (oracle) S 1 2391 2391 0 -1"..., 999) = 240 <0.000027>
14:41:18.506244 close(12)               = 0 <0.000016>
--Read the status of mman (Memory Manager Process), which handles tasks such as resizing memory components
14:41:18.506299 open("/proc/2395/stat", O_RDONLY) = 12 <0.000024>
14:41:18.506354 read(12, "2395 (oracle) S 1 2395 2395 0 -1"..., 999) = 240 <0.000024>
14:41:18.506413 close(12)               = 0 <0.000015>
--Read the status of dbw0 (Database Writer Process), the familiar one that writes dirty buffers from the buffer cache to disk (DBW0-DBW9 and DBWa-DBWz)
14:41:18.506467 open("/proc/2397/stat", O_RDONLY) = 12 <0.000021>
14:41:18.506522 read(12, "2397 (oracle) S 1 2397 2397 0 -1"..., 999) = 245 <0.000025>
14:41:18.506580 close(12)               = 0 <0.000015>
--Read the status of lgwr (Log Writer Process), which writes redo from the log buffer to the online redo log
14:41:18.506631 open("/proc/2399/stat", O_RDONLY) = 12 <0.000021>
14:41:18.506683 read(12, "2399 (oracle) S 1 2399 2399 0 -1"..., 999) = 242 <0.000023>
14:41:18.506738 close(12)               = 0 <0.000015>
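--Read the status of ckpt (Checkpoint Process), which signals the database writer and writes the latest checkpoint information to the headers of all data files and to the control file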
14:41:18.506789 open("/proc/2401/stat", O_RDONLY) = 12 <0.000021>
14:41:18.506840 read(12, "2401 (oracle) S 1 2401 2401 0 -1"..., 999) = 244 <0.000024>
14:41:18.506896 close(12)               = 0 <0.000015>
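--Read the status of smon (System Monitor Process), which handles instance recovery, recovery of dead transactions, and maintenance tasks (temporary space reclamation, data dictionary cleanup, undo tablespace management)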
14:41:18.506946 open("/proc/2403/stat", O_RDONLY) = 12 <0.000022>
14:41:18.506998 read(12, "2403 (oracle) S 1 2403 2403 0 -1"..., 999) = 244 <0.000024>
14:41:18.507054 close(12)               = 0 <0.000098>
--Read the status of ctwr (Change Tracking Writer Process), which tracks changed data blocks as part of the Recovery Manager block change tracking feature
14:41:18.507201 open("/proc/2432/stat", O_RDONLY) = 12 <0.000026>
14:41:18.507265 read(12, "2432 (oracle) S 1 2432 2432 0 -1"..., 999) = 240 <0.000026>
14:41:18.507323 close(12)               = 0 <0.000017>
14:41:18.507398 times({tms_utime=57, tms_stime=176, tms_cutime=0, tms_cstime=0}) = 430963879 <0.000014>
14:41:18.507451 times({tms_utime=57, tms_stime=176, tms_cutime=0, tms_cstime=0}) = 430963879 <0.000013>
14:41:18.507503 poll([{fd=9, events=POLLIN|POLLRDNORM}], 1, 0) = 0 (Timeout) <0.000017>
14:41:18.507573 times({tms_utime=57, tms_stime=176, tms_cutime=0, tms_cstime=0}) = 430963879 <0.000013>
14:41:18.507620 poll([{fd=9, events=POLLIN|POLLRDNORM}], 1, 3000^CProcess 2355 detached
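
--A rough sketch in Python (not part of the original post) for summarizing a -ttT trace like the one above: it adds up the elapsed time reported in <...> per system call and collects the PIDs whose /proc/<pid>/stat file was opened. The function name summarize and both regular expressions are assumptions made for illustration. Lines without an elapsed-time field, such as the interrupted poll at the end, are skipped.

```python
import re
from collections import defaultdict

# "HH:MM:SS.micros name(args...) = ret <elapsed>" as printed by strace -tt -T.
TIMED_RE = re.compile(r'^\d{2}:\d{2}:\d{2}\.\d{6} (\w+)\((.*)<(\d+\.\d+)>$')
PROC_STAT_RE = re.compile(r'"/proc/(\d+)/stat"')


def summarize(lines):
    """Return (seconds spent per syscall, PIDs whose /proc stat was opened)."""
    per_call = defaultdict(float)
    pids = []
    for line in lines:
        m = TIMED_RE.match(line.strip())
        if m is None:
            continue  # no <elapsed> field: interrupted or non-syscall line
        name, rest, elapsed = m.group(1), m.group(2), float(m.group(3))
        per_call[name] += elapsed
        if name == "open":
            hit = PROC_STAT_RE.search(rest)
            if hit:
                pids.append(int(hit.group(1)))
    return dict(per_call), pids


if __name__ == "__main__":
    trace = [
        '14:41:18.505474 open("/proc/2357/stat", O_RDONLY) = 12 <0.000052>',
        '14:41:18.505645 close(12)               = 0 <0.000020>',
        '14:41:18.505709 open("/proc/2383/stat", O_RDONLY) = 12 <0.000023>',
    ]
    totals, pids = summarize(trace)
    print(totals, pids)
```

Mapping the collected PIDs back to the ps listing shows which background processes pmon checks in each cycle.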
 
 
[oracle@sean proc]$ cat /proc/2357/stat
2357 (oracle) S 1 2357 2357 0 -1 4202496 18182 53541 0 0 70 218 2 20 20 0 1 0 5205 750350336 4198 18446744073709551615 4194304 197998444 140724114573664 140724114563384 269900231018 0 0 100680199 1098923256 18446744071581119782 0 0 17 0 0 0 0 0 0
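
--A minimal sketch in Python (not part of the original post) that decodes a few fields of the /proc/<pid>/stat line above. The function name parse_proc_stat is hypothetical; the field positions follow the proc(5) manual page (state is field 3, ppid 4, utime 14, stime 15, num_threads 20, vsize 23, rss 24). utime and stime are in clock ticks, vsize in bytes, and rss in pages.

```python
def parse_proc_stat(text):
    """Decode selected fields from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ")",
    # so split around the LAST closing parenthesis.
    head, _, tail = text.rpartition(")")
    pid_str, _, comm = head.partition("(")
    fields = tail.split()  # fields[0] is field 3 (state) in proc(5) numbering
    return {
        "pid": int(pid_str),
        "comm": comm,
        "state": fields[0],
        "ppid": int(fields[1]),
        "utime": int(fields[11]),        # user CPU time, clock ticks
        "stime": int(fields[12]),        # system CPU time, clock ticks
        "num_threads": int(fields[17]),
        "vsize": int(fields[20]),        # virtual memory size, bytes
        "rss": int(fields[21]),          # resident set size, pages
    }


if __name__ == "__main__":
    line = ("2357 (oracle) S 1 2357 2357 0 -1 4202496 18182 53541 0 0 "
            "70 218 2 20 20 0 1 0 5205 750350336 4198")
    print(parse_proc_stat(line))
```

The sample string in the usage part is a truncated copy of the line above; only the first 24 fields are needed by this sketch.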


--Below is a fragment of the output that appears once every 60 seconds
[oracle@sean ~]$ strace -ttT -p 2355

15:08:07.638837 socket(PF_NETLINK, SOCK_RAW, 0) = 12 <0.000021>
15:08:07.638880 bind(12, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0 <0.000011>
15:08:07.638915 getsockname(12, {sa_family=AF_NETLINK, pid=2355, groups=00000000}, [12]) = 0 <0.000009>
15:08:07.638949 sendto(12, "\24\0\0\0\26\0\1\3\327\262PY\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20 <0.000024>
15:08:07.638998 recvmsg(12, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0\327\262PY3\t\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 108 <0.000016>
15:08:07.639044 recvmsg(12, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"@\0\0\0\24\0\2\0\327\262PY3\t\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 128 <0.000013>
15:08:07.639084 recvmsg(12, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0\327\262PY3\t\0\0\0\0\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 20 <0.000010>
15:08:07.639119 close(12)               = 0 <0.000073>
15:08:07.639220 socket(PF_NETLINK, SOCK_RAW, 0) = 12 <0.000013>
15:08:07.639255 bind(12, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0 <0.000010>
15:08:07.639287 getsockname(12, {sa_family=AF_NETLINK, pid=2355, groups=00000000}, [12]) = 0 <0.000009>
15:08:07.639318 sendto(12, "\24\0\0\0\26\0\1\3\327\262PY\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20 <0.000016>
15:08:07.639356 recvmsg(12, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0\327\262PY3\t\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 108 <0.000012>
15:08:07.639396 recvmsg(12, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"@\0\0\0\24\0\2\0\327\262PY3\t\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 128 <0.000012>
15:08:07.639433 recvmsg(12, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0\327\262PY3\t\0\0\0\0\0\0", 4096}], msg_controllen=0, msg_flags=0}, 0) = 20 <0.000010>
15:08:07.639467 close(12)               = 0 <0.000012>
15:08:07.639505 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 12 <0.000018>
15:08:07.639549 fstat(12, {st_mode=S_IFREG|0644, st_size=182, ...}) = 0 <0.000010>
15:08:07.639584 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f19df9a9000 <0.000014>
15:08:07.639622 read(12, "127.0.0.1   localhost sean.ora11"..., 4096) = 182 <0.000020>
15:08:07.639673 read(12, "", 4096)      = 0 <0.000010>
15:08:07.639703 close(12)               = 0 <0.000010>
15:08:07.639731 munmap(0x7f19df9a9000, 4096) = 0 <0.000032>
15:08:07.639789 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 12 <0.000023>
15:08:07.639831 connect(12, {sa_family=AF_INET, sin_port=htons(1521), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 <0.000015>
15:08:07.639871 getsockname(12, {sa_family=AF_INET, sin_port=htons(33248), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 <0.000009>
15:08:07.639907 connect(12, {sa_family=AF_UNSPEC, sa_data="\0\0\0\0\0\0\0\0\0\0\0\0\0\0"}, 16) = 0 <0.000009>
15:08:07.639938 connect(12, {sa_family=AF_INET, sin_port=htons(1521), sin_addr=inet_addr("192.0.2.66")}, 16) = 0 <0.000011>
15:08:07.639971 getsockname(12, {sa_family=AF_INET, sin_port=htons(27021), sin_addr=inet_addr("192.0.2.66")}, [16]) = 0 <0.000009>
15:08:07.640003 close(12)               = 0 <0.000014>
15:08:07.640039 socket(PF_NETLINK, SOCK_RAW, 0) = 12 <0.000012>
15:08:07.640069 bind(12, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0 <0.000009>
15:08:07.640099 getsockname(12, {sa_family=AF_NETLINK, pid=2355, groups=00000000}, [12]) = 0 <0.000008>
15:08:07.640130 sendto(12, "\24\0\0\0\22\0\1\3\327\262PY\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20 <0.000126>
15:08:07.640333 recvmsg(12, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\360\3\0\0\20\0\2\0\327\262PY3\t\0\0\0\0\4\3\1\0\0\0I\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 2024 <0.000013>
15:08:07.640474 close(12)               = 0 <0.000894>
15:08:07.641423 getsockopt(0, SOL_SOCKET, SO_SNDBUF, 0x7ffcbe11d13c, 0x7ffcbe11d138) = -1 ENOTSOCK (Socket operation on non-socket) <0.000010>
15:08:07.641465 getsockopt(0, SOL_SOCKET, SO_RCVBUF, 0x7ffcbe11d13c, 0x7ffcbe11d138) = -1 ENOTSOCK (Socket operation on non-socket) <0.000009>
15:08:07.641501 socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 12 <0.000023>
15:08:07.641545 fcntl(12, F_SETFL, O_RDONLY|O_NONBLOCK) = 0 <0.000009>
15:08:07.641581 connect(12, {sa_family=AF_INET, sin_port=htons(1521), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000062>
15:08:07.641681 getsockopt(12, SOL_SOCKET, SO_SNDBUF, [16384], [4]) = 0 <0.000009>
15:08:07.641716 getsockopt(12, SOL_SOCKET, SO_RCVBUF, [87380], [4]) = 0 <0.000008>
15:08:07.641751 times({tms_utime=65, tms_stime=194, tms_cutime=0, tms_cstime=0}) = 431124768 <0.000008>
15:08:07.641783 times({tms_utime=65, tms_stime=194, tms_cutime=0, tms_cstime=0}) = 431124768 <0.000008>
15:08:07.641817 rt_sigaction(SIGPIPE, {SIG_IGN, ~[ILL ABRT BUS FPE SEGV USR2 XCPU XFSZ SYS RTMIN RT_1], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x3ed780f7e0}, {SIG_IGN, ~[ILL ABRT BUS FPE KILL SEGV USR2 STOP XCPU XFSZ SYS RTMIN RT_1], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x3ed780f7e0}, 8) = 0 <0.000009>
15:08:07.641884 times({tms_utime=65, tms_stime=194, tms_cutime=0, tms_cstime=0}) = 431124768 <0.000008>
15:08:07.641917 times({tms_utime=65, tms_stime=194, tms_cutime=0, tms_cstime=0}) = 431124768 <0.000600>
15:08:07.642567 poll([{fd=9, events=POLLIN|POLLRDNORM}, {fd=12, events=POLLOUT}], 2, 3000) = 1 ([{fd=12, revents=POLLOUT|POLLERR|POLLHUP}]) <0.000073>
15:08:07.642693 getsockname(12, {sa_family=AF_INET, sin_port=htons(33518), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 <0.000011>
15:08:07.642774 write(12, "\0h\0\0\1\0\0\0\1:\1,\0\0 \0\377\377\177\10\0\0\1\0\0.\0:\0\0\0\0"..., 104) = -1 ECONNREFUSED (Connection refused) <0.000013>
15:08:07.642830 close(12)               = 0 <0.000021>
15:08:07.642919 times({tms_utime=65, tms_stime=194, tms_cutime=0, tms_cstime=0}) = 431124768 <0.000010>
15:08:07.642959 poll([{fd=9, events=POLLIN|POLLRDNORM}], 1, 3000^CProcess 2355 detached
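
--Near the end of the trace above, pmon sets a TCP socket to non-blocking mode, calls connect to 127.0.0.1 port 1521 (which returns EINPROGRESS), waits in poll, and finally gets ECONNREFUSED. The Python sketch below (not part of the original post) reproduces that call pattern so it can be observed under strace. The function name probe_tcp is hypothetical, and unlike pmon, which discovers the failure on write, this sketch reads the result with SO_ERROR. It relies on select.poll, so it is meant for Linux.

```python
import errno
import select
import socket


def probe_tcp(host, port, timeout_ms=3000):
    """Non-blocking TCP connect; return 0 on success, else an errno value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.setblocking(False)              # fcntl(F_SETFL, O_NONBLOCK) in the trace
        err = sock.connect_ex((host, port))  # usually EINPROGRESS right away
        if err not in (0, errno.EINPROGRESS):
            return err
        poller = select.poll()
        poller.register(sock, select.POLLOUT)
        if not poller.poll(timeout_ms):      # same 3000 ms wait as in the trace
            return errno.ETIMEDOUT
        # The outcome of the pending connect is stored in SO_ERROR.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        sock.close()


if __name__ == "__main__":
    result = probe_tcp("127.0.0.1", 1521)
    print(result, errno.errorcode.get(result, "OK"))
```

If nothing is listening on the port, the result should be ECONNREFUSED, matching the error in the trace.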
 


