DBA Diary: How Does Disk Busy Percentage Relate to Response Time on AIX?

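Background:
Yesterday, while running a database performance test and monitoring the database's disk busy percentage, a question occurred to me: what does this busy metric actually represent? Does it mean that once the busy percentage reaches a certain value, the disk's response time starts to rise?
Follow-up question: my environment is Oracle 11g + ASM, and one ASM diskgroup consists of three ASM disks with external redundancy. Can I lower the busy percentage of these ASM disks by adding more disks? If the answer to the first question is yes, then lowering the busy percentage lowers IO response time, so this would also be a tuning opportunity.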
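The reasoning behind the follow-up question can be sketched with the utilization law. Assuming ASM spreads IO evenly across the disks in the diskgroup and that the average service time per request does not change, each disk's busy fraction is roughly total IOPS × average service time ÷ number of disks. The function name and the load figures below are hypothetical illustrations, not measurements from this test.

```python
def per_disk_busy_pct(total_iops: float, avserv_ms: float, n_disks: int) -> float:
    """Estimate %busy of each disk in an evenly striped diskgroup.

    Assumptions (illustrative only): ASM distributes IO evenly across
    n_disks, and the average service time per request (avserv_ms) stays
    the same when disks are added.
    """
    if n_disks <= 0:
        raise ValueError("n_disks must be positive")
    per_disk_iops = total_iops / n_disks
    # Utilization law: busy fraction = throughput x service time (seconds)
    return per_disk_iops * (avserv_ms / 1000.0) * 100.0


if __name__ == "__main__":
    # Hypothetical load: 1,200 IOPS at 2 ms per request
    for disks in (3, 4, 6):
        print(disks, round(per_disk_busy_pct(1200, 2.0, disks), 1))
```

Under these assumptions, doubling the number of disks halves the per-disk busy percentage; whether response time falls as a result is exactly the first question.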
Analysis:
1. Gathering information
1.1 %busy can never be used as an indicator individually for an IO problem. This is because IO
throughput can scale dramatically depending on whether it is large block, small block,
sequential or random IO and the hdisk driver does not know for every adapter/storage subsystem
what their real throughput is for each of those conditions. At best, it is a sorting mechanism
for hdisks.
Good/bad IO is a function of service times. iostat -Dl reports read and write service
times. A physical spindle can do ~300 IOs/second (IOPS) and reads < 10 msec and writes < 2
msec are considered good. Depending on your RAID level, configuration, and storage subsystem, you can work out
from these values what is reasonable for your environment.
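To apply the rule of thumb quoted above, a small helper can flag measured average service times. The thresholds (reads under 10 ms, writes under 2 ms) come from the quoted text; the function name and sample values are hypothetical, and the right limits for a particular RAID/storage configuration may differ.

```python
# Rule-of-thumb limits from the quoted guidance; adjust for your storage.
READ_GOOD_MS = 10.0
WRITE_GOOD_MS = 2.0


def assess_service_times(read_ms: float, write_ms: float) -> dict:
    """Report whether average read/write service times look healthy."""
    return {
        "read_ok": read_ms < READ_GOOD_MS,
        "write_ok": write_ms < WRITE_GOOD_MS,
    }


if __name__ == "__main__":
    # Hypothetical averages, e.g. read by hand from iostat -D output
    print(assess_service_times(read_ms=6.5, write_ms=3.1))
    # {'read_ok': True, 'write_ok': False}
```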
1.2 Collecting data with sar
Listing 4. Using sar
# sar -d 1 2
AIX l488pp065_pub 1 7 00F604884C00 08/11/10
System configuration: lcpu=4 drives=1 ent=0.25 mode=Uncapped
11:38:44 device %busy avque r+w/s Kbs/s avwait avserv
11:38:45 hdisk0 1 0.0 6 24 0.0 1.9
11:38:46 hdisk0 0 0.0 3 15 0.0 2.3
Average hdisk0 0 0.0 4 19 0.0 2.1
Let's break down the column headings from Listing 4.
%busy: This command reports back the portion of time that the device was busy servicing transfer requests.
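avque: In AIX Version 5.3, this command reports back the number of requests waiting to be sent to disk.<<<============ When this value stays above 0, requests are arriving faster than the device can service them and are queuing. [Key metric]
r+w/s: This command reports back the number of read or write transfers per second to or from a device.
Kbs/s: This command reports back the amount of data transferred to or from the device per second, in kilobytes.
avwait: This command reports the average wait time per request (milliseconds).<<<<=========== Time a request spends waiting to be processed; it appears to rise together with avque.
avserv: This command reports the average service time per request (milliseconds). <<<<===== Service time: the average time the device takes to complete a request once it has been sent to the disk. It does not include the queue wait reported by avwait, so the response time seen by the application is roughly avwait + avserv.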
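As a sanity check on these column definitions, %busy should be roughly r+w/s multiplied by avserv (the utilization law: throughput × service time). The sketch below applies this to the three rows of Listing 4; the helper name is hypothetical, and because sar prints %busy as a whole number, small mismatches are expected.

```python
def estimated_busy_pct(rw_per_s: float, avserv_ms: float) -> float:
    """Utilization law: busy fraction = transfers/s x service time (s)."""
    return rw_per_s * avserv_ms / 1000.0 * 100.0


# (label, %busy reported, r+w/s, avserv in ms) taken from Listing 4
rows = [
    ("11:38:45", 1, 6, 1.9),
    ("11:38:46", 0, 3, 2.3),
    ("Average", 0, 4, 2.1),
]

for label, busy, rw, avserv in rows:
    est = estimated_busy_pct(rw, avserv)
    print(f"{label}: reported {busy}%, estimated {est:.2f}%")
```

All three estimates come out below about 1.2%, which matches the near-idle hdisk0 in the listing.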
# crontab -l | grep sa1
0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3 &
0 * * * 0,6 /usr/lib/sa/sa1 &
0 18-7 * * 1-5 /usr/lib/sa/sa1 &
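The first crontab entry drives sa1 during weekday business hours. Reading `sa1 1200 3` as three samples taken 1200 seconds apart, each hourly run roughly covers a full hour, so collection is continuous from 08:00 to 18:00. The arithmetic below is a hedged sketch of that reading; the helper names are hypothetical.

```python
def sa1_coverage_seconds(interval_s: int, count: int) -> int:
    """Approximate time spanned by one sa1 run taking `count` samples `interval_s` apart."""
    return interval_s * count


def hours_in_range(start: int, end: int) -> int:
    """Number of hours matched by a non-wrapping cron hour range such as 8-17."""
    return end - start + 1


if __name__ == "__main__":
    # Entry: 0 8-17 * * 1-5 /usr/lib/sa/sa1 1200 3 &
    per_run = sa1_coverage_seconds(1200, 3)  # 3600 s, i.e. one hour per run
    runs = hours_in_range(8, 17)             # 10 hourly runs per weekday
    print(per_run, runs, runs * 3)           # 3600 10 30
```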
1.3 The topas tool
Pay close attention to "Wait" (in the CPU section at the top), which also helps determine whether the system is I/O bound. If you see high numbers here, you can then use other tools (such as filemon, fileplace, lsof, or lslv) to help you figure out which processes, adapters, or file systems are causing your bottlenecks.
Also useful is the topas physical hard disk output (-D). It shows disk statistics and can show you whether a single hardware disk is being hammered and would benefit from having file systems or data spread across other disks.
You should check ART/AWT and MRT/MWT, which show the average and maximum wait times for reads and writes to the disk. High values indicate a very busy disk. The AQW shows the average number of queues waiting per request to the I/O device. Again, high values may indicate a disk that is unable to keep up with the demands being requested of it. <<<<<<<<<============== ART/AWT: average disk read/write wait time; MRT/MWT: maximum disk read/write wait time.
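The advice above, spotting a single disk that is being hammered, can be expressed as a simple filter and ranking over per-disk figures like those topas -D displays. The record layout, field names, sample values, and thresholds below are hypothetical; the sketch illustrates the idea and does not parse real topas output.

```python
from dataclasses import dataclass


@dataclass
class DiskStat:
    # Hypothetical per-disk snapshot, loosely modeled on topas -D columns
    name: str
    busy_pct: float
    art_ms: float  # average read time per request
    awt_ms: float  # average write time per request


def hot_disks(stats, busy_limit=80.0, read_limit_ms=10.0, write_limit_ms=2.0):
    """Return names of disks that look overloaded, busiest first."""
    flagged = [
        s for s in stats
        if s.busy_pct >= busy_limit
        or s.art_ms >= read_limit_ms
        or s.awt_ms >= write_limit_ms
    ]
    return [s.name for s in sorted(flagged, key=lambda s: s.busy_pct, reverse=True)]


if __name__ == "__main__":
    sample = [
        DiskStat("hdisk0", 12.0, 3.1, 0.8),
        DiskStat("hdisk1", 91.0, 14.2, 1.5),
        DiskStat("hdisk2", 85.0, 7.9, 2.6),
    ]
    print(hot_disks(sample))  # ['hdisk1', 'hdisk2']
```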
1.4 The nmon tool
Here is one small example of where nmon really shines. Say you want to know which processes are taking most of the disk I/O and you want to be able to correlate it with the actual disk to clearly illustrate I/O per process. nmon usage helps you more than any other tool. To do this with nmon, use the -t option; set your timing and then sort by I/O channel.
How do you use nmon to capture data and import it into the analyzer? Use the sudo command and run nmon for three hours, taking a snapshot every 30 seconds: # sudo nmon -f -t -r test1 -s 30 -c 360 . Then sort the output file that gets created: # sort -A testsystem_yymmdd.nmon > testsystem_yymmdd.csv.
When this is completed, ftp the .csv file to your PC, start the nmon analyzer spreadsheet (enable macros), and click on analyze nmon data. The nmon analyzer is downloaded separately from nmon itself. <<<<<<<<<<<<<<================== nmon can list processes sorted by IO.
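The -s and -c options together determine how long nmon records: total duration = snapshot interval × number of snapshots. The sketch below, with hypothetical helper names, does the arithmetic for a three-hour capture at 30-second snapshots.

```python
def nmon_duration_s(interval_s: int, count: int) -> int:
    """Total recording time for nmon -s <interval_s> -c <count>."""
    return interval_s * count


def nmon_count_for(hours: float, interval_s: int) -> int:
    """Snapshots (-c) needed to cover `hours` at `interval_s` seconds each."""
    return int(hours * 3600 // interval_s)


if __name__ == "__main__":
    print(nmon_count_for(3, 30))             # 360 snapshots for three hours
    print(nmon_duration_s(30, 360) / 3600)   # 3.0 hours
```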
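2. Analyzing the information
2.1 For IO monitoring, AIX provides the following tools:
sar -d interval count: shows the disk busy percentage, the length of the disk wait queue, the average wait time per request, the average service time per request, and the amount of data transferred per second.
topas: an all-in-one tool that gives a quick, comprehensive view of system resource usage. Pressing "d" shows the busiest disks with their average and maximum wait times, along with their transfer data.
nmon: can sort processes by their IO resource usage.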
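2.2 Overall analysis: when disk busy reaches 100%, a wait queue inevitably forms, wait time (and often service time) rises, IO requests are processed more slowly, and response time goes up.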
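The argument in 2.2 can be made concrete with a textbook queueing model. This is an assumption layered on top of the article, not something sar measures: treating a disk as an M/M/1 queue with constant average service time S, the mean response time is R = S / (1 − U), where U is the utilization (%busy / 100). Real disks and storage arrays with caches, multiple spindles, and command queuing deviate from this model, so read the numbers as a trend rather than a prediction. The 2 ms service time is a hypothetical value.

```python
def mm1_response_ms(service_ms: float, utilization: float) -> float:
    """Mean response time of an M/M/1 queue: R = S / (1 - U).

    utilization must be in [0, 1); as U approaches 1 the queue grows without bound.
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


if __name__ == "__main__":
    service_ms = 2.0  # hypothetical average service time per request
    for busy in (10, 50, 80, 90, 95):
        r = mm1_response_ms(service_ms, busy / 100.0)
        print(f"busy {busy:>2}% -> response {r:.1f} ms")
```

Under this model, response time at 80% busy is already five times the bare service time and doubles again between 80% and 90%, which is consistent with the 80% threshold in the conclusions.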
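2.3 Busy percentage: the total time the IO device spent servicing IO requests divided by the sampling interval, expressed as a percentage. From this metric you can therefore roughly estimate whether there are enough resources to absorb growth in IO requests from additional processes (for example, higher concurrency); it also shows whether the current IO capacity meets the current IO load.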
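Following the definition in 2.3, %busy is busy time divided by the sampling interval. A rough headroom estimate then scales the current IO rate up to a target busy level. The linear scaling is an assumption that holds only while service time stays roughly constant, and the function names, the 80% target, and the sample figures are hypothetical.

```python
def busy_pct(busy_time_s: float, interval_s: float) -> float:
    """%busy = time the device spent servicing requests / sampling interval."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return busy_time_s / interval_s * 100.0


def iops_at_target(current_iops: float, current_busy_pct: float,
                   target_busy_pct: float = 80.0) -> float:
    """IOPS the disk could sustain at target_busy_pct, assuming linear scaling."""
    if current_busy_pct <= 0:
        raise ValueError("current_busy_pct must be positive")
    return current_iops * target_busy_pct / current_busy_pct


if __name__ == "__main__":
    # Hypothetical sample: 12 s busy in a 60 s interval at 500 IOPS
    pct = busy_pct(12, 60)                 # 20.0 %
    print(pct, iops_at_target(500, pct))   # about 2000 IOPS at 80 % busy
```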
3. Experiment

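Conclusions:
1) Once the busy percentage reaches a certain level (above 80%), the disk's response time rises.
2) Once the busy percentage reaches a certain level (above 80%), a disk wait queue forms and response time rises.
3) The follow-up question will be pursued in a separate article.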
