(1). iostat:
syntax:iostat <option> interval count
default display:terminal,disk,cpu===>【tdc】
iostat -xtdc 5 3
explanation:
【x】====>extended statistics
【t】====>terminal
【d】=====>disk
【c】=====>cpu
Results and Solutions:
The values to look at in the iostat output are:
Reads/writes per second (r/s, w/s)
Percentage busy (%b)
Service time (svc_t)
If a disk shows consistently high reads/writes, its percentage busy (%b) is greater than 5 percent, and its average service time (svc_t) is greater than 30 milliseconds, then one of the following actions needs to be taken:
1.) Tune the application to use disk I/O more efficiently by modifying the disk queries and using the cache facilities of the application servers.
2.) Spread the file system across two or more disks using the disk striping feature of a volume manager, DiskSuite, etc.
3.) Increase the inode cache system parameter, ufs_ninode, which is the number of inodes held in memory. Inodes are cached globally (for UFS), not on a per-file-system basis.
4.) Move the file system to a faster disk/controller, or replace the existing disk/controller with a faster one.
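The %b and svc_t thresholds above can be checked mechanically. The following is a minimal sketch, assuming the usual `iostat -x` column header (device, r/s, w/s, ..., svc_t, %w, %b). The function name `slow_disks` and the sample figures are made up for illustration, and the check looks at a single report only, so it does not capture "consistently".

```python
# Thresholds quoted in the notes above.
BUSY_PCT_LIMIT = 5.0      # %b, percent
SERVICE_MS_LIMIT = 30.0   # svc_t, milliseconds


def slow_disks(iostat_text):
    """Return device names whose %b and svc_t both exceed the limits."""
    flagged = []
    columns = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "device":
            columns = fields          # the header row names the columns
            continue
        if columns is None or len(fields) != len(columns):
            continue                  # skip banner lines
        row = dict(zip(columns, fields))
        busy = float(row["%b"])
        service = float(row["svc_t"])
        if busy > BUSY_PCT_LIMIT and service > SERVICE_MS_LIMIT:
            flagged.append(row["device"])
    return flagged


# Hypothetical sample, not measured data.
SAMPLE = """\
                  extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
sd0      45.0   30.2  360.0  241.6  0.0  2.1   41.7   0  38
sd1       0.4    0.1    3.2    0.8  0.0  0.0    6.3   0   1
"""

print(slow_disks(SAMPLE))
```

To approximate "consistently", the same check could be run over each report of an `iostat -x 5 3` run and only the disks flagged every time kept.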
(2). vmstat:
syntax:vmstat <option> interval count
default display:process,memory,paging,disk,interrupts,cpu
vmstat -pci 5 3
【p】====>paging
【c】=====>cache
【i】====>interrupt
Results and Solutions:
A. CPU issues:
The following columns should be watched to determine whether there is a CPU issue:
Processes in the run queue (procs r)
User time (cpu us)
System time (cpu sy)
Idle time (cpu id)
procs    cpu
r b w    us sy id
0 0 0     4 14 82
0 0 1     3 35 62
0 0 1     3 33 64
0 0 1     1 21 78
Problem symptoms:
1.) If the number of processes in the run queue (procs r) is consistently greater than the number of CPUs on the system, the system slows down because there are more runnable processes than available CPUs.
2.) If this number is more than four times the number of available CPUs, the system is short of CPU power and the processes on it slow down greatly.
3.) If the idle time (cpu id) is consistently 0 and the system time (cpu sy) is double the user time (cpu us), the system is short of CPU resources.
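The three symptoms above can be written as a small check on one vmstat sample. This is only a sketch: the function name `cpu_findings` is hypothetical, the CPU count is an assumption because the notes do not state it, and "consistently" would need the same result over many samples.

```python
def cpu_findings(run_queue, ncpus, us, sy, idle):
    """Apply the run-queue and idle/system-time rules to one sample."""
    findings = []
    if run_queue > 4 * ncpus:
        findings.append("run queue is more than 4x the CPU count")
    elif run_queue > ncpus:
        findings.append("run queue exceeds the CPU count")
    if idle == 0 and sy >= 2 * us:
        findings.append("no idle time and system time at least double user time")
    return findings


# (r, us, sy, id) taken from the sample rows shown earlier in this section.
ROWS = [(0, 4, 14, 82), (0, 3, 35, 62), (0, 3, 33, 64), (0, 1, 21, 78)]
NCPUS = 1  # assumption: the CPU count is not given in the notes

for r, us, sy, idle in ROWS:
    print(cpu_findings(r, NCPUS, us, sy, idle))
```

None of the sample rows triggers a finding, since the run queue is empty and the CPU is mostly idle.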
Resolution :
Resolving these kinds of issues involves tuning the application procedures to make efficient use of the CPU and, as a last resort, increasing the CPU power or adding more CPUs to the system.
B. Memory Issues:
Memory bottlenecks are identified by the scan rate (sr), the number of pages scanned by the clock algorithm per second. If the scan rate (sr) is continuously over 200 pages per second, there is a memory shortage.
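A minimal sketch of the scan-rate rule, assuming that "continuously" means every sample in the observed window is over the limit. That interpretation, the function name `memory_shortage`, and the sample values are assumptions made for illustration.

```python
SCAN_RATE_LIMIT = 200  # pages per second, as stated in the notes


def memory_shortage(sr_samples, limit=SCAN_RATE_LIMIT):
    """True when every sr sample in the window exceeds the limit."""
    return bool(sr_samples) and all(sr > limit for sr in sr_samples)


# Made-up sr values for illustration.
print(memory_shortage([1, 0, 0, 0, 0]))
print(memory_shortage([250, 310, 275]))
print(memory_shortage([250, 10, 275]))
```

A single spike, as in the last call, is not treated as a shortage under this interpretation.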
Resolution :
1. Tune the applications and servers to make efficient use of memory and cache.
2. Increase system memory.
3. On versions before Solaris 8, implement priority paging by adding the line "set priority_paging=1" to /etc/system. Remove this line if you upgrade from Solaris 7 to 8 and retain the old /etc/system file.
Note:
po ===> page out, from RAM to virtual memory (swap disk)
pi ===> page in, from virtual memory (swap disk) to RAM
See the following example:
/////////////begin//////////
vmstat 1 5
kthr  memory         page                  disk        faults          cpu
r b w    swap   free re  mf pi po fr de sr 1m 1m 1m 1m   in    sy   cs us sy id
0 0 0 1463016 493928 25 119  7  2  2  0  1  0  0  0  0  270   191  148  2  1 97
0 0 0  999136 304200  0   6  0  0  0  0  0  0  0  0  0 6152 10880 3799 13  5 82
0 0 0  999136 304200  0   0  0  0  0  0  0  0  0  0  0 6154 10431 3855 19  5 76
0 0 0  999136 304200  0   0  0  0  0  0  0  0  0  0  0 6444 11306 3942 10  8 82
0 0 0  999000 304096 30 244  0  8  8  0  0  0  0  0  0 5237  9895 3307 16  7 76
faults: per-second rates of interrupts and related events
in --> interrupts
sy --> system calls
cs --> CPU context switches
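There are two main things to measure for each resource: what is in use and what is not in use.
cpu: if r is often greater than 4 (the total number of installed CPUs) and id is often below 40, the CPU load is heavy. (Two indicators: 1. idle CPU; 2. number of processes in the run queue.)
memory: if pi and po stay non-zero for a long time, there is not enough memory (RAM). (Two indicators: pi and po; a pi that stays persistently non-zero is the main sign of a memory shortage.)
io: if the disk columns are often non-zero and the queue in b is greater than 3, I/O performance is poor. (Two indicators: 1. disk; 2. number of blocked processes.)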
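When r exceeds the number of CPUs, a CPU bottleneck appears. The general solutions are:
1. The simplest is to add more CPUs.
2. Reschedule tasks, for example run large jobs when the system is not busy, to balance the system load.
3. Adjust the priority of existing tasks.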
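When the demand for memory exceeds the amount of RAM, the server uses its virtual memory mechanism, which moves RAM segments to a special disk area, the swap disk.
This produces page-outs and page-ins. A page-out alone does not indicate a RAM bottleneck, because the virtual memory system pages memory segments out routinely.
A page-in, however, shows that the server needs more memory: the memory segment has to be copied back from the swap disk into RAM, which slows the server down.
pi: virtual memory ---> RAM
po: RAM -----> virtual memory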
/////////////end///////////
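The cpu, memory, and io rules from the example above can be applied to the captured rows with a short script. This is a sketch under stated assumptions: "often" is taken to mean more than half of the samples, "persistently" to mean every sample, and the names `FIELDS`, `parse`, `mostly`, and `assess` are hypothetical. Columns are read by position because the header repeats names (sy, 1m).

```python
# Positional names for the 22 columns of the vmstat output above.
FIELDS = ["r", "b", "w", "swap", "free", "re", "mf", "pi", "po", "fr", "de",
          "sr", "d1", "d2", "d3", "d4", "in", "sy_calls", "cs", "us", "sy", "id"]

# Data rows copied from the example above.
SAMPLE = """\
0 0 0 1463016 493928 25 119 7 2 2 0 1 0 0 0 0 270 191 148 2 1 97
0 0 0 999136 304200 0 6 0 0 0 0 0 0 0 0 0 6152 10880 3799 13 5 82
0 0 0 999136 304200 0 0 0 0 0 0 0 0 0 0 0 6154 10431 3855 19 5 76
0 0 0 999136 304200 0 0 0 0 0 0 0 0 0 0 0 6444 11306 3942 10 8 82
0 0 0 999000 304096 30 244 0 8 8 0 0 0 0 0 0 5237 9895 3307 16 7 76
"""


def parse(text):
    """Turn purely numeric 22-column lines into dicts; skip header lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == len(FIELDS) and all(p.isdigit() for p in parts):
            rows.append(dict(zip(FIELDS, map(int, parts))))
    return rows


def mostly(flags):
    """True when more than half of the flags are set (assumed meaning of 'often')."""
    flags = list(flags)
    return sum(flags) > len(flags) / 2


def assess(rows, ncpus):
    disks = ("d1", "d2", "d3", "d4")
    return {
        "cpu_heavy": mostly(r["r"] > ncpus and r["id"] < 40 for r in rows),
        "memory_short": bool(rows) and all(r["pi"] > 0 for r in rows),
        "io_poor": mostly(any(r[d] > 0 for d in disks) and r["b"] > 3
                          for r in rows),
    }


rows = parse(SAMPLE)
print(assess(rows, ncpus=4))  # 4 CPUs, as in the cpu rule above
```

For this capture all three flags come out false: the run queue is empty, pi is non-zero only in the first row, and the disk columns are all zero.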
(3). netstat:
syntax:netstat <option/s>
netstat -an
explanation:
【a】====>all sockets
【r】====>system routing tables
【i】===>interface
【m】====>network memory buffers
【p】====>statistics of specified protocol
【s】===>statistics of per-protocol
【D】====>status of DHCP
【n】===>show addresses and ports as numbers (no name lookup)
【d】====>dropped packets per interface
【I】====>information of specified interface
【v】====>verbose
If you see a lot of connections in the FIN_WAIT state, the TCP/IP parameters have to be tuned, because the
connections are not being closed and keep accumulating. After some time the system may run out of
resources. A TCP parameter can be tuned to define a timeout so that these connections are released and
can be used by new connections.
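To see whether FIN_WAIT connections are piling up, the states in saved `netstat -an` output can be counted. The sketch below assumes the TCP state is the last field on each connection line. The state list, the function names, and the sample lines are illustrative assumptions, not real captured output.

```python
from collections import Counter

# TCP state names as commonly printed by netstat (assumed spelling).
TCP_STATES = {
    "LISTEN", "SYN_SENT", "SYN_RCVD", "ESTABLISHED", "CLOSE_WAIT",
    "FIN_WAIT_1", "FIN_WAIT_2", "CLOSING", "LAST_ACK", "TIME_WAIT",
    "IDLE", "BOUND", "CLOSED",
}


def count_states(netstat_text):
    """Count connection lines by the TCP state in their last field."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts


def fin_wait_total(counts):
    """Sum of all FIN_WAIT_* states."""
    return sum(n for state, n in counts.items() if state.startswith("FIN_WAIT"))


# Hypothetical sample lines for illustration only.
SAMPLE = """\
192.168.1.10.80   10.0.0.5.51234   49640 0 49640 0 ESTABLISHED
192.168.1.10.80   10.0.0.6.40022   49640 0 49640 0 FIN_WAIT_2
192.168.1.10.80   10.0.0.7.40100   49640 0 49640 0 FIN_WAIT_2
      *.22              *.*            0 0 49152 0 LISTEN
"""

counts = count_states(SAMPLE)
print(dict(counts))
print(fin_wait_total(counts))
```

Running this periodically and comparing the totals shows whether the number of FIN_WAIT connections keeps growing.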
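Note: a few useful commands:
1. prtconf | grep 'Memory' ===> shows the amount of memory
2. psrinfo -v ===> shows the number of CPUs and their status
3. prtdiag ====> shows memory, CPU, and other hardware information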