Useful Linux System Monitoring Notes

1. CPU

a. Check overall CPU utilization with vmstat and look at the cpu columns. The specific metrics are:

us: Time spent running non-kernel code. (user time, including nice time)

sy: Time spent running kernel code. (system time)

id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.

wa: Time spent waiting for IO. Prior to Linux 2.5.41, included in idle.

st: Time stolen from a virtual machine. Prior to Linux 2.6.11, unknown.

For example:

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----

r b swpd free buff cache si so bi bo in cs us sy id wa

1 0 0 2420532 87324 1977656 0 0 369 102 628 1088 7 6 76 10

1 0 0 2420648 87324 1978072 0 0 0 0 2587 3698 2 12 87 0

0 0 0 2420524 87324 1978008 0 0 0 0 2032 2607 1 11 88 0

1 0 0 2420648 87324 1977848 0 0 0 0 2143 2622 1 10 88 0

1 0 0 2420648 87324 1977848 0 0 0 8 2402 2978 2 11 88 0

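In theory, the goal is for sy to be as small as possible and us as large as possible, so that CPU time goes to application code rather than to the kernel.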
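The us/sy comparison can be automated with a small parser. The following is only a sketch: the function names `parse_vmstat` and `user_to_system_ratio` are made up for illustration, and the sample rows are adapted from the output above, reading the cs value as a single number.

```python
def parse_vmstat(text):
    """Parse vmstat output into a list of dicts keyed by column name."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    # The column-name row is the one that starts with "r b".
    header_idx = next(i for i, cols in enumerate(lines) if cols[:2] == ["r", "b"])
    names = lines[header_idx]
    rows = []
    for cols in lines[header_idx + 1:]:
        # Skip banner lines and repeated headers that vmstat prints periodically.
        if len(cols) < len(names) or not cols[0].isdigit():
            continue
        rows.append({name: int(value) for name, value in zip(names, cols)})
    return rows


def user_to_system_ratio(row):
    """Return us/sy for one sample, or None when sy is 0."""
    return row["us"] / row["sy"] if row["sy"] else None


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b swpd    free  buff   cache si so  bi  bo   in   cs us sy id wa
 1  0    0 2420532 87324 1977656  0  0 369 102  628 1088  7  6 76 10
 1  0    0 2420648 87324 1978072  0  0   0   0 2587 3698  2 12 87  0
"""

if __name__ == "__main__":
    for row in parse_vmstat(SAMPLE):
        print(row["us"], row["sy"], user_to_system_ratio(row))
```

In real use the text would come from running vmstat with an interval, for example via `subprocess.run(["vmstat", "1", "5"], capture_output=True, text=True).stdout`.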

b. For a tool dedicated to CPU statistics, use mpstat

See man mpstat for the specific metrics.

For example:

CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle

all 5.81 0.05 6.92 7.03 0.00 0.04 0.00 0.00 80.15

0 6.75 0.06 5.53 17.43 0.00 0.07 0.00 0.00 70.17

1 5.46 0.00 7.70 2.45 0.00 0.00 0.00 0.00 84.39

2 6.20 0.12 6.38 6.06 0.00 0.00 0.00 0.00 81.24

3 4.85 0.00 8.09 2.18 0.00 0.09 0.00 0.00 84.79

This is considerably more detailed than vmstat.

c. top shows CPU usage from a per-process perspective
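The per-CPU breakdown makes it possible to spot a single core that is waiting on I/O. Here is a rough sketch of reading that out of the table; `parse_mpstat` and `busiest_by` are illustrative names, and the parser assumes rows have the same number of whitespace-separated fields as the header line.

```python
def parse_mpstat(text):
    """Map each CPU label ("all", "0", "1", ...) to a dict of its percentages."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    header = next(cols for cols in lines if "CPU" in cols)
    start = header.index("CPU")  # leaves room for timestamp columns before "CPU"
    names = header[start + 1:]
    stats = {}
    for cols in lines:
        if cols is header or len(cols) != len(header):
            continue
        stats[cols[start]] = {n: float(v) for n, v in zip(names, cols[start + 1:])}
    return stats


def busiest_by(stats, column):
    """Return the label of the individual CPU with the highest value in column."""
    per_cpu = {cpu: vals for cpu, vals in stats.items() if cpu != "all"}
    return max(per_cpu, key=lambda cpu: per_cpu[cpu][column])


SAMPLE = """\
CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
all 5.81 0.05 6.92 7.03 0.00 0.04 0.00 0.00 80.15
0 6.75 0.06 5.53 17.43 0.00 0.07 0.00 0.00 70.17
1 5.46 0.00 7.70 2.45 0.00 0.00 0.00 0.00 84.39
2 6.20 0.12 6.38 6.06 0.00 0.00 0.00 0.00 81.24
3 4.85 0.00 8.09 2.18 0.00 0.09 0.00 0.00 84.79
"""

if __name__ == "__main__":
    stats = parse_mpstat(SAMPLE)
    print("CPU with most iowait:", busiest_by(stats, "%iowait"))
```

With the sample above, CPU 0 stands out: its %iowait is far higher than that of the other three cores, which the all-CPU average hides.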

Tasks: 170 total, 1 running, 168 sleeping, 0 stopped, 1 zombie

Cpu(s): 4.1%us, 7.3%sy, 0.0%ni, 88.3%id, 0.2%wa, 0.0%hi, 0.0%si, 0.0%st

Mem: 8087420k total, 6244888k used, 1842532k free, 181312k buffers

Swap: 4088504k total, 0k used, 4088504k free, 2359388k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

1952 lysu 20 0 2643m 2.1g 2.1g S 25 27.3 8:51.22 VirtualBox

2420 lysu 20 0 1329m 812m 76m S 6 10.3 4:29.38 java

1146 root 20 0 169m 24m 13m S 5 0.3 0:52.43 Xorg

d. CPU run queue depth

Again use vmstat and look at the r column under procs:

Procs

r: The number of processes waiting for run time.

b: The number of processes in uninterruptible sleep.
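To judge whether the run queue is deep, the r values are usually compared against the number of CPUs. The sketch below does that arithmetic; the function names and the default threshold of 1.0 runnable task per CPU are assumptions chosen for illustration, not values taken from vmstat or its documentation.

```python
import os


def run_queue_per_cpu(r_samples, cpu_count=None):
    """Average of the vmstat r column divided by the number of CPUs."""
    if not r_samples:
        raise ValueError("need at least one sample")
    cpus = cpu_count or os.cpu_count() or 1
    return sum(r_samples) / len(r_samples) / cpus


def looks_saturated(r_samples, cpu_count=None, threshold=1.0):
    """True when, on average, more than `threshold` tasks per CPU are runnable.

    The threshold is a hypothetical rule of thumb; tune it for the workload.
    """
    return run_queue_per_cpu(r_samples, cpu_count) > threshold


if __name__ == "__main__":
    samples = [1, 1, 0, 1, 1]  # r column from the vmstat output in section 1.a
    print(run_queue_per_cpu(samples, cpu_count=4))
    print(looks_saturated(samples, cpu_count=4))
```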

2. Memory

a. To check memory usage, use vmstat again

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----

r b swpd free buff cache si so bi bo in cs us sy id wa

0 0 0 1752528 182916 2436380 0 0 176 93 692 1237 5 7 83 5

0 0 0 1747808 182916 2441180 0 0 0 168 2867 3812 2 5 92 0

0 0 0 1747932 182916 2440540 0 0 0 0 2751 3776 2 7 91 0

0 0 0 1747932 182916 2440540 0 0 0 0 3019 4469 2 7 91 0

0 0 0 1747932 182924 2440540 0 0 0 40 2624 3608 3 6 90 2

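Focus on the free column and on the swap-in (si) and swap-out (so) columns; when free memory runs short, both si and so increase.

b. Alternatively, use top, or look at the contents of /proc/meminfo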
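The same swap activity can be derived without vmstat. The sketch below assumes the `pswpin` and `pswpout` counters in /proc/vmstat, which count pages swapped in and out since boot; taking two snapshots and dividing the difference by the interval gives a rate. The function names are illustrative.

```python
def parse_counters(text):
    """Parse "name value" lines, as found in /proc/vmstat, into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rates(before, after, interval_s):
    """Return (pages swapped in per second, pages swapped out per second)."""
    return tuple(
        (after[key] - before[key]) / interval_s for key in ("pswpin", "pswpout")
    )


if __name__ == "__main__":
    # Two made-up snapshots taken 5 seconds apart.
    first = parse_counters("pswpin 100\npswpout 40\n")
    second = parse_counters("pswpin 150\npswpout 240\n")
    print(swap_rates(first, second, interval_s=5))
```

On a live system each snapshot would be `parse_counters(open("/proc/vmstat").read())`, with a `time.sleep` between the two reads.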

lysu@lysu-Latitude-E5420:~$ tail /proc/meminfo

VmallocChunk: 34359318356 kB

HardwareCorrupted: 0 kB

AnonHugePages: 0 kB

HugePages_Total: 0

HugePages_Free: 0

HugePages_Rsvd: 0

HugePages_Surp: 0

Hugepagesize: 2048 kB

DirectMap4k: 57344 kB

DirectMap2M: 8232960 kB
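Since /proc/meminfo is plain "Key: value" text, it is easy to read from a script. This is a minimal sketch with an illustrative function name; it keeps only the number on each line, so the caller needs to remember that most values are in kB while the HugePages_* entries are plain counts.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {key: integer value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or not fields:
            continue
        info[key.strip()] = int(fields[0])  # drop the trailing "kB" unit if present
    return info


SAMPLE = """\
HugePages_Total: 0
HugePages_Free: 0
Hugepagesize: 2048 kB
DirectMap4k: 57344 kB
DirectMap2M: 8232960 kB
"""

if __name__ == "__main__":
    info = parse_meminfo(SAMPLE)
    print(info["Hugepagesize"], info["DirectMap2M"])
```

To read the live values, pass `open("/proc/meminfo").read()` instead of the sample.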

c. To check lock contention, use pidstat -w

lysu@lysu-Latitude-E5420:~$ pidstat -w -I -p 5712 5

Linux 3.0.0-13-generic (lysu-Latitude-E5420) 2011-12-11 _x86_64_ (4 CPU)

20:52:02 PID cswch/s nvcswch/s Command

20:52:07 5712 0.00 0.00 chrome

20:52:12 5712 0.40 0.00 chrome

20:52:17 5712 0.00 0.00 chrome

20:52:22 5712 0.20 0.00 chrome

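cswch/s is the number of voluntary context switches per second, where the process gives up the CPU on its own because it is waiting on a lock or for some other reason.
To estimate the share of clock cycles consumed by voluntary context switches, compute (cswch/s) / (number of virtual processors) * 80000 / (total clock frequency of the machine), with 80000 being the assumed cost of one context switch in clock cycles... A ratio in the 3%~5% range suggests lock contention that is worth investigating.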
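As a worked example of this formula, here is a small sketch. The function name is illustrative, the clock frequency is taken in Hz, and the input numbers in the example (3500 switches per second, 2 virtual processors, a 3.0 GHz clock) are made up to show the arithmetic.

```python
def voluntary_switch_cycle_fraction(
    cswch_per_sec, virtual_processors, clock_hz, cycles_per_switch=80_000
):
    """Fraction of clock cycles spent on voluntary context switches.

    Implements (cswch/s) / (virtual processors) * 80000 / (clock frequency).
    """
    return cswch_per_sec / virtual_processors * cycles_per_switch / clock_hz


if __name__ == "__main__":
    # Hypothetical busy process: 3500 cswch/s on 2 virtual processors at 3.0 GHz.
    busy = voluntary_switch_cycle_fraction(3500, 2, 3.0e9)
    print(f"{busy:.1%}")

    # The chrome sample above peaks at 0.40 cswch/s on a 4 CPU machine;
    # the 2.5 GHz clock here is an assumption, the output does not show it.
    idle = voluntary_switch_cycle_fraction(0.40, 4, 2.5e9)
    print(f"{idle:.6%}")
```

The first case works out to 3500 / 2 * 80000 = 140,000,000 cycles per second, which is a little under 5% of a 3.0 GHz clock; the chrome sample is many orders of magnitude below that.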

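nvcswch/s is the number of involuntary context switches per second, where the process is forced off the CPU. A high rate suggests that there are more runnable threads than the CPUs can serve at that moment. The suggested remedy is to pin the process to a fixed CPU? I don't understand why.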

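3. Network I/O

netstat is the usual way to look at the network; it shows how many packets were sent and received, but not utilization.

Use nicstat to see network utilization.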
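Utilization can also be approximated by hand from byte counters. The sketch below assumes the /proc/net/dev layout, where each interface line carries 16 counters with received bytes first and transmitted bytes ninth, and it assumes a full-duplex link, so the busier direction is compared to the link speed. The function names and this definition of utilization are choices made for this example; nicstat may compute its figure differently.

```python
def parse_net_dev(text):
    """Map interface name to (rx_bytes, tx_bytes) from /proc/net/dev-style text."""
    counters = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if not sep or len(fields) < 16 or not fields[0].isdigit():
            continue  # header lines have no "iface:" prefix
        counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def utilization(before, after, interval_s, link_bits_per_s):
    """Fraction of link capacity used by the busier direction over the interval."""
    rx_bps = (after[0] - before[0]) * 8 / interval_s
    tx_bps = (after[1] - before[1]) * 8 / interval_s
    return max(rx_bps, tx_bps) / link_bits_per_s


if __name__ == "__main__":
    # Made-up counters: 125,000,000 bytes received in 10 s on a 1 Gbit/s link.
    print(utilization((0, 0), (125_000_000, 1_000_000), 10, 1_000_000_000))
```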

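4. Disk I/O

a. Use iostat; the utilization column (%util in the Linux iostat -x output, %b in Solaris iostat) shows how busy the disk is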

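To see the system versus user split alongside disk activity, add -xm: it prints extended per-device statistics in megabytes per second, and the avg-cpu line of the report shows %user and %system separately.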

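If disk I/O is too high, the options are to switch to faster disks, to spread the I/O across multiple disks, or to change the program to use caching.

b. Use lsof to see the files a process has open

lsof -c <process name>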
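Before choosing one of these remedies it helps to know how busy the disk really is. A rough sketch follows, assuming the /proc/diskstats layout in which the tenth statistic after the device name is the time spent doing I/O in milliseconds; the share of an interval covered by that time approximates disk utilization. The function names and the sample lines are made up for illustration.

```python
def io_ticks_ms(diskstats_text, device):
    """Return the 'time spent doing I/Os (ms)' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields: major, minor, name, then the statistics; index 12 is the tenth.
        if len(fields) >= 13 and fields[2] == device:
            return int(fields[12])
    raise KeyError(device)


def disk_utilization(ticks_before_ms, ticks_after_ms, interval_s):
    """Fraction of the interval during which the device had I/O in flight."""
    return (ticks_after_ms - ticks_before_ms) / (interval_s * 1000)


if __name__ == "__main__":
    # Two made-up snapshots of one /proc/diskstats line, 5 seconds apart.
    first = "   8       0 sda 100 0 200 50 10 0 80 20 0 1500 70"
    second = "   8       0 sda 180 0 360 90 30 0 240 60 0 4000 150"
    busy = disk_utilization(io_ticks_ms(first, "sda"), io_ticks_ms(second, "sda"), 5)
    print(f"{busy:.0%}")
```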

lysu@lysu-Latitude-E5420:~$ sudo lsof -c thunderbird

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME

thunderbi 6674 lysu cwd DIR 8,1 4096 17825794 /home/lysu

thunderbi 6674 lysu rtd DIR 8,1 4096 2 /

thunderbi 6674 lysu txt REG 8,1 39512 5386054 /usr/lib/thunderbird-8.0/thunderbird-bin

thunderbi 6674 lysu mem REG 8,1 27032 528062 /lib/x86_64-linux-gnu/libnss_dns-2.13.so

thunderbi 6674 lysu mem REG 8,1 10368 524353 /lib/libnss_mdns4_minimal.so.2

lsof can also list which processes are accessing a given directory or file; see man lsof for details.
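For scripting, the tabular output can be split into fields. The following is only a sketch with illustrative function names: it assumes every row has a value in every column, which holds for the sample above but not always in practice (lsof also has a -F option intended for machine-readable output).

```python
def parse_lsof(text):
    """Turn lsof's default table into a list of dicts keyed by header name."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # NAME is the last column and may contain spaces, so cap the split count.
        cols = line.split(None, len(header) - 1)
        if len(cols) == len(header):
            rows.append(dict(zip(header, cols)))
    return rows


def names_of_type(rows, file_type):
    """Return the NAME of every entry whose TYPE matches, e.g. "REG" or "DIR"."""
    return [row["NAME"] for row in rows if row["TYPE"] == file_type]


SAMPLE = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
thunderbi 6674 lysu cwd DIR 8,1 4096 17825794 /home/lysu
thunderbi 6674 lysu rtd DIR 8,1 4096 2 /
thunderbi 6674 lysu txt REG 8,1 39512 5386054 /usr/lib/thunderbird-8.0/thunderbird-bin
"""

if __name__ == "__main__":
    for name in names_of_type(parse_lsof(SAMPLE), "REG"):
        print(name)
```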

c. inotify

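( http://www.infoq.com/cn/articles/inotify-linux-file-system-event-monitoring)
With lsof we have to poll actively, whereas inotify notifies us when something changes. Its inotifywatch and inotifywait tools can also be used to see how a directory is being changed by other programs.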

lysu@lysu-Latitude-E5420:~$ inotifywait -r -m ~/.thunderbird/

Setting up watches. Beware: since -r was given, this may take a while!

Watches established.

/home/lysu/.thunderbird/lxbmhh1d.default/ImapMail/mail.qunar-1.com/ OPEN INBOX

/home/lysu/.thunderbird/lxbmhh1d.default/ImapMail/mail.qunar-1.com/ ACCESS INBOX

/home/lysu/.thunderbird/lxbmhh1d.default/ImapMail/mail.qunar-1.com/ CLOSE_NOWRITE,CLOSE INBOX

Even the slightest activity shows up, which is very nice.

PS: Finally, there is also sar, which I have not managed to get working... ( http://blog.sina.com.cn/s/blog_46018a590100jxo8.html)
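When the event stream gets long, a short script can summarize it. This sketch assumes the default inotifywait line layout seen above (watched directory, comma-separated event names, then the file name) and paths without spaces; `parse_event` and `count_events` are names invented for this example.

```python
from collections import Counter


def parse_event(line):
    """Split one inotifywait line into (directory, [events], filename) or None."""
    parts = line.split(None, 2)
    if len(parts) < 2 or not parts[0].startswith("/"):
        return None  # status messages such as "Watches established."
    filename = parts[2].strip() if len(parts) == 3 else ""
    return parts[0], parts[1].split(","), filename


def count_events(lines):
    """Count how often each event name occurs across all lines."""
    counts = Counter()
    for line in lines:
        event = parse_event(line)
        if event:
            counts.update(event[1])
    return counts


SAMPLE = """\
Setting up watches. Beware: since -r was given, this may take a while!
Watches established.
/home/lysu/.thunderbird/lxbmhh1d.default/ImapMail/mail.qunar-1.com/ OPEN INBOX
/home/lysu/.thunderbird/lxbmhh1d.default/ImapMail/mail.qunar-1.com/ ACCESS INBOX
/home/lysu/.thunderbird/lxbmhh1d.default/ImapMail/mail.qunar-1.com/ CLOSE_NOWRITE,CLOSE INBOX
"""

if __name__ == "__main__":
    for name, count in sorted(count_events(SAMPLE.splitlines()).items()):
        print(name, count)
```

To process live events, the lines could be read from the standard output of an `inotifywait -r -m` process started with `subprocess.Popen`.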
