Finding the process that causes the most iowait

Reposted from http://blogold.chinaunix.net/u1/43502/index.html

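On a Linux system, commands often become slow to respond, and top shows iowait taking up a large share of CPU time. Yet however you sort the process list in top (by CPU, by memory, or by time), you cannot tell which process (or which partition) is responsible for the high iowait.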

Waiting

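The total time the CPU spends waiting for I/O operations to complete, similar to blocked. A system should not spend much time waiting on I/O; if it does, check the I/O subsystem for a bottleneck.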
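To make the iowait figure concrete, here is a minimal sketch that derives it from two samples of the aggregate `cpu` line in /proc/stat. The function name `iowait_pct` is hypothetical, and the sketch assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); it is an approximation of what top reports, not its exact calculation.

```shell
#!/bin/bash
# iowait_pct (hypothetical helper) takes two "cpu ..." lines from /proc/stat
# and prints the integer percentage of CPU time spent in iowait between them.
# Assumed field order after "cpu": user nice system idle iowait irq softirq steal
iowait_pct() {
    local -a a b
    read -r -a a <<< "$1"
    read -r -a b <<< "$2"
    local total=0 i
    # Sum the deltas of fields 1..8 (fewer if the kernel reports fewer).
    for ((i = 1; i < ${#a[@]} && i <= 8; i++)); do
        total=$(( total + b[i] - a[i] ))
    done
    local iow=$(( b[5] - a[5] ))   # field 5 is iowait
    if (( total <= 0 )); then
        echo 0
        return
    fi
    echo $(( 100 * iow / total ))
}

# Usage on a live system:
#   s1=$(grep '^cpu ' /proc/stat); sleep 1; s2=$(grep '^cpu ' /proc/stat)
#   iowait_pct "$s1" "$s2"
```

A high value here only says that the system as a whole is waiting on I/O; it does not identify the process, which is what the rest of this article is about.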


So how do you find out which process is driving iowait up?

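The system logs do not record this, but the kernel offers a way. The Linux kernel provides a block_dump parameter that dumps block read/write (READ/WRITE) activity to the kernel log, where it can be viewed with the dmesg command.
Here is its description: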

block_dump enables block I/O debugging when set to a nonzero value. If you want to find out which process caused the disk to spin up (see /proc/sys/vm/laptop_mode), you can gather information by setting the flag.

When this flag is set, Linux reports all disk read and write operations that take place, and all block dirtyings done to files. This makes it possible to debug why a disk needs to spin up, and to increase battery life even more. The output of block_dump is written to the kernel output, and it can be retrieved using "dmesg". When you use block_dump and your kernel logging level also includes kernel debugging messages, you probably want to turn off klogd, otherwise the output of block_dump will be logged, causing disk activity that is not normally there.


First we need a way to push iowait up. dd or cp will do; here is a simple script:

[root@fan3838 tmp]#cat a.sh
#!/bin/bash
# Generate disk load by copying and deleting a directory tree in a loop.
while true
do
    cd /usr/share
    mkdir doc1
    cp -ra doc/* doc1/
    rm -rf doc1
done

[root@fan3838 tmp]#./a.sh &


Before using block_dump, stop syslog. Otherwise klogd writes a flood of entries to the messages log, which loads the system even further.

service syslog stop


Enable block_dump:

echo 1 > /proc/sys/vm/block_dump


Counting the entries:

[root@fan3838 tmp]#dmesg | egrep "READ|WRITE|dirtied" | egrep -o '([a-zA-Z]*)' | sort | uniq -c | sort -rn | head
   1675 kjournald
   1060 cp
      3 bash
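One caveat about the pipeline above: in an extended regular expression, `([a-zA-Z]*)` matches any run of letters, so as written it would also count words such as WRITE or block on each line. The following is a minimal sketch of a stricter counter. It assumes each block_dump message starts with `task(pid): ` followed by READ, WRITE, or dirtied, optionally preceded by a bracketed dmesg timestamp; the function name `count_block_io` is hypothetical.

```shell
#!/bin/bash
# count_block_io (hypothetical helper) reads kernel log lines on stdin and
# prints "<count> <task>" for block_dump messages, highest count first.
# Assumed message shape: "task(pid): READ|WRITE|dirtied ..."
count_block_io() {
    awk '
        # Strip an optional "[  123.456] " timestamp prefix.
        { sub(/^\[[ 0-9.]+\] */, "") }
        # The task name is everything before the first "(pid): ACTION ".
        match($0, /\([0-9]+\): (READ|WRITE|dirtied) /) {
            count[substr($0, 1, RSTART - 1)]++
        }
        END { for (t in count) print count[t], t }
    ' | sort -rn
}

# Usage on a live system (as root, with block_dump enabled):
#   dmesg | count_block_io | head
```

Lines that do not look like block_dump messages are ignored, so unrelated kernel output does not distort the counts.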


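Someone has written a Perl script that processes this output and gives a more readable result:

Reference: http://www.xaprb.com/blog/2009/08/23/how-to-find-per-process-io-statistics-on-linux/

Download: http://aspersa.googlecode.com/svn/trunk/iodump
It is a Perl script. The approach is to clear dmesg and then tally the block information that is dumped to dmesg during each 1-second interval.
while true; do sleep 1; dmesg -c; done | perl iodump
Because this method wants only the summary and not the raw dmesg output, the dmesg call is wrapped in a while true loop and piped to iodump.
As a result, the summary appears only after you stop the command with Ctrl+C.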

[root@fan3838 tmp]#while true; do sleep 1; dmesg -c; done|perl /root/iodump
# Caught SIGINT.
TASK         PID     TOTAL     READ    WRITE    DIRTY DEVICES
kjournald    349      4185        0     4185        0 sda2
cp          4701      1051     1051        0        0 sda2
bash        4762         1        1        0        0 sda2

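The result is clear. Leaving kjournald aside, the heaviest entry is cp, which issued 1051 reads during the sampling window.
So cp is the main culprit behind the iowait.
You should of course run the test several times for a more reliable result, since other processes may be "accomplices".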
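Since repeated runs are recommended, a bounded sampling loop can replace the `while true` loop so that each run ends on its own. This is a minimal sketch: `sample_for` is a hypothetical helper, and it assumes that iodump prints its summary when its input ends, not only on SIGINT.

```shell
#!/bin/bash
# sample_for COUNT INTERVAL CMD... (hypothetical helper)
# Runs CMD once every INTERVAL seconds, COUNT times, passing its output
# through to stdout, then returns so the downstream reader sees end-of-input.
sample_for() {
    local count="$1" interval="$2"
    shift 2
    local i
    for ((i = 0; i < count; i++)); do
        sleep "$interval"
        "$@"
    done
}

# Usage on a live system (as root, with block_dump enabled):
#   sample_for 10 1 dmesg -c | perl /root/iodump
```

A fixed window also makes runs comparable with each other, which an interactively interrupted loop does not guarantee.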

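One more point: kjournald has the highest count, so why not "investigate" it? What does this process do? A quick search shows that it handles journaling for the ext3 filesystem, so its activity is normal.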

When the test is finished, remember to turn block_dump off and start syslog again:
echo 0 > /proc/sys/vm/block_dump
service syslog start
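Forgetting this cleanup step is easy, so the enable and disable steps can be wrapped together. This is a minimal sketch under stated assumptions: `with_block_dump` is a hypothetical helper, the flag file is passed as a parameter (normally /proc/sys/vm/block_dump, which may not exist on every kernel), and the sketch does not handle the case where the shell itself is killed by a signal. Stopping and starting syslog is left to the caller.

```shell
#!/bin/bash
# with_block_dump FLAG_FILE CMD... (hypothetical helper)
# Writes 1 to FLAG_FILE, runs CMD, then restores the previous value,
# whether CMD succeeds or fails. Returns CMD's exit status.
with_block_dump() {
    local flag="$1"
    shift
    # Refuse to continue if the flag file is missing or not writable.
    [ -w "$flag" ] || { echo "cannot write $flag" >&2; return 1; }
    local old rc
    old=$(cat "$flag")
    echo 1 > "$flag"
    "$@"
    rc=$?
    echo "$old" > "$flag"
    return "$rc"
}

# Usage on a live system (as root):
#   with_block_dump /proc/sys/vm/block_dump sleep 10
#   dmesg | egrep "READ|WRITE|dirtied"
```

Taking the path as an argument means the function can be tried on an ordinary file before it is pointed at /proc.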
