[Repost] Summary of common Linux commands for troubleshooting
1: Check CPU load--mpstat
mpstat -P ALL [interval [count]]
The parameters mean:
-P ALL monitor all CPUs
interval the time between two consecutive samples
count the number of samples
mpstat obtains its data from /proc/stat.
The output fields mean:
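CPU processor ID
user percentage of CPU time spent in user mode during the interval, excluding processes with a positive nice value: Δuser/Δtotal*100
nice percentage of CPU time spent in user mode by processes with a positive nice value (lowered priority) during the interval: Δnice/Δtotal*100
system percentage of CPU time spent in the kernel during the interval: Δsystem/Δtotal*100
iowait percentage of time spent waiting for disk I/O during the interval: Δiowait/Δtotal*100
irq percentage of time spent servicing hardware interrupts during the interval: Δirq/Δtotal*100
soft percentage of time spent servicing softirqs (software interrupts) during the interval: Δsoftirq/Δtotal*100
idle percentage of time the CPU was idle for any reason other than waiting for disk I/O during the interval: Δidle/Δtotal*100
intr/s number of interrupts the CPU received per second during the interval: Δintr divided by the interval length in seconds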
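Total CPU time: total_cur = user + system + nice + idle + iowait + irq + softirq
total_pre = user_pre + system_pre + nice_pre + idle_pre + iowait_pre + irq_pre + softirq_pre
Δuser = user_cur - user_pre
Δtotal = total_cur - total_pre
Here _cur denotes the current value and _pre the value one interval earlier. All values in the table above are shown to two decimal places.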
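The formulas above can be sketched in Python. Take two snapshots of /proc/stat one interval apart, then divide each counter's delta by the delta of the total. This is a minimal sketch that assumes each "cpu" line starts with the seven counters in kernel order (user, nice, system, idle, iowait, irq, softirq). Later kernels append steal and guest columns, which the sketch ignores, just as the formula above does. It is not mpstat's actual code, and the function names are hypothetical.

```python
# Order of the first seven counters on each "cpu" line of /proc/stat (assumed layout).
CPU_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_proc_stat(text):
    """Return {"cpu": {...}, "cpu0": {...}, ...} with jiffy counters from /proc/stat text."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        # Only the aggregate "cpu" line and per-CPU "cpuN" lines are of interest.
        if parts and parts[0].startswith("cpu"):
            counters = map(int, parts[1:1 + len(CPU_FIELDS)])
            cpus[parts[0]] = dict(zip(CPU_FIELDS, counters))
    return cpus


def cpu_percentages(prev, cur):
    """Per-CPU percentages: delta(field) / delta(total) * 100, rounded to 2 places."""
    result = {}
    for cpu, cur_counters in cur.items():
        delta = {f: cur_counters[f] - prev[cpu][f] for f in CPU_FIELDS}
        total = sum(delta.values())  # delta(total) = total_cur - total_pre
        result[cpu] = {
            f: round(d * 100 / total, 2) if total else 0.0
            for f, d in delta.items()
        }
    return result
```

On a live system, read /proc/stat twice, one interval apart, and pass both parsed snapshots to cpu_percentages.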
2: Check disk I/O and CPU load--vmstat
usage: vmstat [-V] [-n] [delay [count]]
-V prints version.
-n causes the headers not to be reprinted regularly.
-a print inactive/active page stats.
-d prints disk statistics
-D prints disk table
-p prints disk partition statistics
-s prints vm table
-m prints slabinfo
-S unit size
delay is the delay between updates in seconds.
unit size k:1000 K:1024 m:1000000 M:1048576 (default is K)
count is the number of updates.
vmstat obtains its data from files under /proc, such as /proc/stat and /proc/meminfo.
The output fields mean:
FIELD DESCRIPTION FOR VM MODE
Procs
r: The number of processes waiting for run time.
b: The number of processes in uninterruptible sleep.
Memory
swpd: the amount of virtual memory used.
free: the amount of idle memory.
buff: the amount of memory used as buffers.
cache: the amount of memory used as cache.
inact: the amount of inactive memory. (-a option)
active: the amount of active memory. (-a option)
Swap
si: Amount of memory swapped in from disk (/s).
so: Amount of memory swapped to disk (/s).
IO
bi: Blocks received from a block device (blocks/s).
bo: Blocks sent to a block device (blocks/s).
System
in: The number of interrupts per second, including the clock.
cs: The number of context switches per second.
CPU
These are percentages of total CPU time.
us: Time spent running non-kernel code. (user time, including nice time)
sy: Time spent running kernel code. (system time)
id: Time spent idle. Prior to Linux 2.5.41, this includes IO-wait time.
wa: Time spent waiting for IO. Prior to Linux 2.5.41, shown as zero.
st: Time spent in involuntary wait. Prior to Linux 2.6.11, shown as zero.
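The following minimal Python sketch maps each data row of vmstat's output to the column names in its header, so the fields above can be checked in a script. Rows whose r value exceeds the number of CPUs are flagged. That threshold is a common rule of thumb for CPU saturation, not something vmstat reports itself, and both function names are hypothetical.

```python
def parse_vmstat(output):
    """Turn vmstat's text output into a list of {column_name: int} dicts."""
    header = None
    rows = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # column-name line: r b swpd free buff cache ...
            header = fields
        elif header and fields[0].isdigit():
            rows.append(dict(zip(header, map(int, fields))))
        # Group-title lines such as "procs -----memory-----" are skipped.
    return rows


def cpu_saturated(rows, ncpu):
    """Rows where more processes wait for run time than there are CPUs (rule of thumb)."""
    return [row for row in rows if row["r"] > ncpu]
```

To use it, capture the output with subprocess.run(["vmstat", "1", "5"], capture_output=True, text=True).stdout and pass os.cpu_count() as ncpu. Headers reprinted without -n are handled, because each new header line simply replaces the previous one.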
3: Check memory usage--free
usage: free [-b|-k|-m|-g] [-l] [-o] [-t] [-s delay] [-c count] [-V]
-b,-k,-m,-g show output in bytes, KB, MB, or GB
-l show detailed low and high memory statistics
-o use old format (no -/+buffers/cache line)
-t display total for RAM + swap
-s update every [delay] seconds
-c update [count] times
-V display version information and exit
[root@Linux /tmp]# free
total used free shared buffers cached
Mem: 255268 238332 16936 0 85540 126384
-/+ buffers/cache: 26408 228860
Swap: 265000 0 265000
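Mem: physical memory statistics
-/+ buffers/cache: physical memory statistics adjusted for buffers and cache
Swap: usage of the swap partition on disk; we won't concern ourselves with it here.
The system's total physical memory is 255268 KB (nominally 256 MB). However, the memory actually available right now is not the 16936 KB shown under free on the first line, which counts only memory that has not been allocated at all.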
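Line 1, Mem:
total: total amount of physical memory.
used: amount of memory allocated, including memory given to buffers and cache, part of which may not actually be in use.
free: memory that has not been allocated at all.
shared: shared memory; it is generally not significant and is not discussed here.
buffers: memory allocated as buffers that applications are not using.
cached: memory allocated as cache that applications are not using. The difference between buffer and cache is explained below.
total = used + free
Line 2, -/+ buffers/cache:
used: the first line's used minus buffers and cached; this is the memory actually in use.
free: unused buffers and cache plus unallocated memory; this is the memory actually available right now.
used2 = used1 - buffers1 - cached1
free2 = buffers1 + cached1 + free1 (free2 and used2 are from line 2; buffers1, cached1, free1 and used1 are from line 1)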
Difference between buffer and cache:
A buffer is something that has yet to be "written" to disk.
A cache is something that has been "read" from the disk and stored for later use.
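Line 1 versus line 2:
From the operating system's point of view, the Mem line is what counts: buffers and cached are both in use, so it considers only 16936 KB free.
From an application's point of view, the -/+ buffers/cache line is what counts. Buffers and cache are effectively available, because they exist to speed up file access and are reclaimed quickly when an application needs memory.
So, from an application's point of view, available memory = free + buffers + cached.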
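The line-2 arithmetic can be reproduced from /proc/meminfo, whose MemTotal, MemFree, Buffers and Cached fields correspond to the total, free, buffers and cached columns. This is a minimal Python sketch that assumes those four fields and the old free formula used = total - free. The function names are hypothetical. Newer versions of free drop the -/+ buffers/cache line and instead show an available column based on the kernel's MemAvailable estimate.

```python
def parse_meminfo(text):
    """Return {field: value_in_kB} from /proc/meminfo-style text."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])  # drop the trailing "kB"
    return info


def buffers_cache_line(info):
    """Rebuild free's '-/+ buffers/cache' line as (used2, free2), in kB."""
    used1 = info["MemTotal"] - info["MemFree"]                   # line 1: used = total - free
    used2 = used1 - info["Buffers"] - info["Cached"]             # memory held by applications
    free2 = info["MemFree"] + info["Buffers"] + info["Cached"]   # memory applications can get
    return used2, free2
```

With the sample figures from the free output above, buffers_cache_line returns (26408, 228860), matching line 2. On a live system, pass it parse_meminfo(open("/proc/meminfo").read()).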
Swap
Swap is Linux's virtual-memory partition. Once physical memory runs out, disk space (the swap partition) is used as if it were memory.
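4: Check network interfaces--sar
See the man page for details.
4.1: Check NIC traffic: sar -n DEV delay count
The maximum traffic a server NIC can carry is determined by the card itself: there are 10M, 10/100M auto-negotiating, 100M+, and 1G cards. Ordinary servers usually use 100M cards, and some use gigabit cards.
Output fields: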
IFACE
Name of the network interface for which statistics are reported.
rxpck/s
Total number of packets received per second.
txpck/s
Total number of packets transmitted per second.
rxbyt/s
Total number of bytes received per second.
txbyt/s
Total number of bytes transmitted per second.
rxcmp/s
Number of compressed packets received per second (for cslip etc.).
txcmp/s
Number of compressed packets transmitted per second.
rxmcst/s
Number of multicast packets received per second.
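rxbyt/s and txbyt/s are in bytes per second, while NIC capacity is rated in bits per second (10M, 100M, 1G). Comparing the two therefore needs a factor of 8. The minimal Python sketch below assumes the link speed is given in Mbit/s with 1 Mbit = 10^6 bits; the function name is hypothetical.

```python
def link_utilization(bytes_per_s, link_mbit):
    """Percent of a link's capacity used, given sar's rxbyt/s or txbyt/s value."""
    bits_per_s = bytes_per_s * 8  # sar reports bytes; NIC speeds are quoted in bits
    return bits_per_s * 100 / (link_mbit * 1_000_000)
```

For example, a steady rxbyt/s of 12500000 saturates a 100M card (100%), but uses only 10% of a gigabit card.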
4.2: Check NIC errors: sar -n EDEV delay count
Output fields:
IFACE
Name of the network interface for which statistics are reported.
rxerr/s
Total number of bad packets received per second.
txerr/s
Total number of errors that happened per second while transmitting packets.
coll/s
Number of collisions that happened per second while transmitting packets.
rxdrop/s
Number of received packets dropped per second because of a lack of space in linux buffers.
txdrop/s
Number of transmitted packets dropped per second because of a lack of space in linux buffers.
txcarr/s
Number of carrier-errors that happened per second while transmitting packets.
rxfram/s
Number of frame alignment errors that happened per second on received packets.
rxfifo/s
Number of FIFO overrun errors that happened per second on received packets.
txfifo/s
Number of FIFO overrun errors that happened per second on transmitted packets.
5: Locate the problem process--top, ps
top -d delay (see the man page for details)
ps aux: show detailed information for all processes
ps axf: show the process tree
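6: See which files a process has open--lsof
Root privileges are required to see everything; otherwise you only see what the logged-in user is permitted to see.
lsof -p 77 // list the files opened by the process with PID 77
lsof -d 4 // list processes that have file descriptor 4 open
lsof abc.txt // list processes that have abc.txt open
lsof -i :22 // list processes using port 22
lsof -i tcp // list processes using TCP
lsof -i tcp:22 // list processes using TCP port 22
lsof +d /tmp // list files in /tmp that are open by processes
lsof +D /tmp // same, but also searches subdirectories, which takes longer
lsof -u username // list files opened by processes owned by username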
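On Linux, each entry in /proc/<pid>/fd is a symlink to the file that descriptor refers to. lsof draws on this kind of /proc information. The minimal Python sketch below approximates the file-descriptor part of `lsof -p 77` and assumes a Linux /proc filesystem. It does not report the cwd, program text, or memory-mapped files that lsof also lists, and the function name is hypothetical. As with lsof, reading another user's process requires root.

```python
import os


def open_files(pid):
    """Return {fd: target} for process `pid`, a minimal stand-in for `lsof -p pid`."""
    fd_dir = f"/proc/{pid}/fd"
    result = {}
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink such as "/tmp/abc.txt" or "socket:[12345]".
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The fd was closed between listdir() and readlink(), or access was denied.
            continue
    return result
```

For example, open_files(os.getpid()) lists the current Python process's own descriptors, including stdin, stdout and stderr.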
7: Trace what a program is doing--strace
usage: strace [-dffhiqrtttTvVxx] [-a column] [-e expr] ... [-o file]
[-p pid] ... [-s strsize] [-u username] [-E var=val] ...
[command [arg ...]]
or: strace -c [-e expr] ... [-O overhead] [-S sortby] [-E var=val] ...
[command [arg ...]]
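Common options:
-f: trace child processes in addition to the current process.
-c: count the time, number of calls, and number of errors for each system call, and print a summary.
-o file: write the output to file instead of standard error (stderr).
-p pid: attach to the running process with the given pid. This is often used to debug background processes.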
8: Check disk usage--df
test@wolf:~$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda1 3945128 1810428 1934292 49% /
udev 745568 80 745488 1% /dev
/dev/sda3 12649960 1169412 10837948 10% /usr/local
/dev/sda4 63991676 23179912 37561180 39% /data
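In the sample above, Used + Available is smaller than 1K-blocks: for /dev/sda1, 1810428 + 1934292 = 3744720, against 3945128. This is typically because ext file systems reserve some blocks for root. To my knowledge, GNU df therefore computes Use% against Used + Available rather than the total, and rounds up. The function name in this minimal Python sketch is hypothetical.

```python
def df_use_percent(used_kb, available_kb):
    """Use% the way GNU df computes it: used / (used + available), rounded up."""
    nonroot_total = used_kb + available_kb  # excludes blocks reserved for root
    if nonroot_total == 0:
        return 0
    return -(-used_kb * 100 // nonroot_total)  # ceiling division on integers
```

Given the Used and Available columns of the sample, it returns 49, 1, 10 and 39, matching the Use% column.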
9: Check network connections--netstat
Commonly used: netstat -lpn
Options:
-p, --programs display PID/Program name for sockets
-l, --listening display listening server sockets
-n, --numeric don't resolve names
-a, --all, --listening display all sockets (default: connected)
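On Linux, netstat reads IPv4 TCP sockets from /proc/net/tcp. In that file, the st column value 0A means LISTEN, and addresses are hexadecimal ip:port pairs. With -p, netstat maps each socket's inode to a PID by scanning /proc/<pid>/fd for socket:[inode] links. The minimal Python sketch below covers only the IPv4 listening-socket part; function names are hypothetical. The kernel prints the address as a 32-bit number in host byte order, so the sketch repacks it with native byte order.

```python
import socket
import struct

TCP_LISTEN = "0A"  # st column value for LISTEN in /proc/net/tcp


def decode_addr(hex_addr):
    """Decode e.g. '0100007F:0016' into ('127.0.0.1', 22) on a little-endian host."""
    ip_hex, port_hex = hex_addr.split(":")
    # The address is printed as a host-order 32-bit integer; pack it natively ("=I")
    # to get back the network-order bytes that inet_ntoa expects.
    ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def listening_tcp(text):
    """Return [(ip, port, inode)] for LISTEN sockets in /proc/net/tcp text."""
    sockets = []
    for line in text.splitlines()[1:]:  # skip the column-header line
        fields = line.split()
        if len(fields) > 9 and fields[3] == TCP_LISTEN:
            ip, port = decode_addr(fields[1])
            sockets.append((ip, port, int(fields[9])))  # fields[9] is the socket inode
    return sockets
```

On a live system, pass it the contents of /proc/net/tcp. Matching the returned inodes against socket:[inode] links under /proc/<pid>/fd gives the PID column that -p shows.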