NAME
collectl - Collects data that describes the current system status.
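In other words, it collects data about the current state of the system and displays it.
Collectl is a tool for collecting system metrics. It can run either as a daemon or interactively, and it can collect data from a range of subsystems. It also includes a Graphite interface, so the data can easily be passed to Graphite for storage.
Here is the official description: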
There are a number of times in which you find yourself needing performance data. These can include benchmarking, monitoring a system's general health or trying to determine what your system was doing at some time in the past. Sometimes you just want to know what the system is doing right now. Depending on what you're doing, you often end up using different tools, each designed for that specific situation. Unlike most monitoring tools that either focus on a small set of statistics, format their output in only one way, run either interactively or as a daemon but not both, collectl tries to do it all. You can choose to monitor any of a broad set of subsystems which currently include buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp.
Download: http://sourceforge.net/projects/collectl/files/
Installation is straightforward, so I won't go into it: install from the rpm package or from source.
Usage
collectl has three run modes:
1. Interactive Mode: this is the default. Data is read from /proc, analyzed, and displayed on the terminal.
2. Record Mode: reads data from the live system and writes it to a file or displays it on the terminal.
Syntax: collectl [-f file] [options]
3. Playback Mode: reads data from one or more raw data files and displays it on the terminal.
Syntax: collectl -p file1 [file2 ...] [options]
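The record and playback syntax above can be tried together. The following is only a sketch: OUTDIR and DRY_RUN are names invented for this example, and it assumes that -f pointed at a directory leaves its recorded files there for -p to pick up. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch: record a few samples to a directory, then play them back.
# OUTDIR and DRY_RUN are names invented for this example.
OUTDIR="${OUTDIR:-/tmp/collectl-demo}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands

# -f names the directory/file to write to; -c stops after N samples.
record_cmd() {
    printf "collectl -f '%s' -c %s\n" "$1" "$2"
}

# -p plays back recorded files. The wildcard is quoted so that the
# shell passes it through unexpanded, as the collectl help text advises.
playback_cmd() {
    printf "collectl -p '%s/*'\n" "$1"
}

# Create the output directory only when the commands will really run.
[ "$DRY_RUN" = 1 ] || mkdir -p "$OUTDIR"

for cmd in "$(record_cmd "$OUTDIR" 5)" "$(playback_cmd "$OUTDIR")"; do
    if [ "$DRY_RUN" = 1 ]; then
        echo "$cmd"
    else
        eval "$cmd"
    fi
done
```

Recording to a directory rather than a single file keeps each run's files in one place, which makes the wildcard playback convenient.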
Among the many monitoring tools available, collectl probably covers the widest range of performance data. The subsystems it can monitor are:
SUMMARY SUBSYSTEMS: brief output.
b - buddy info (memory fragmentation)
c - CPU
d - Disk
f - NFS V3 Data
i - Inode and File System
j - Interrupts
l - Lustre
m - Memory
n - Networks
s - Sockets
t - TCP
x - Interconnect
y - Slabs (system object caches)
DETAIL SUBSYSTEMS: more detailed output.
C - CPU
D - Disk
E - Environmental data (fan, power, temp), via ipmitool
F - NFS Data
J - Interrupts
L - Lustre OST detail OR client Filesystem detail
M - Memory node data, which is also known as numa data
N - Networks
T - 65 TCP counters only available in plot format
X - Interconnect
Y - Slabs (system object caches)
Z - Processes
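The subsystems above are selected with the -s option, for example: collectl -ss. The option is not limited to playback mode; every example in this post uses it against the live system.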
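Because each subsystem is a single letter, a mistyped -s argument is easy to miss. The sketch below checks an argument against the letters listed above before building the command. valid_subsys is a helper made up for this example, not part of collectl, and the letter list is only as complete as the list in this post; your collectl version may accept others.

```shell
#!/bin/sh
# Sketch: check a -s argument against the subsystem letters listed above.
# valid_subsys is a helper invented for this example, not part of collectl.
valid_subsys() {
    # An empty argument is not a valid selection.
    [ -n "$1" ] || return 1
    # Summary letters: bcdfijlmnstxy; detail letters: CDEFJLMNTXYZ.
    # Reject the argument if it contains any character outside both sets.
    case $1 in
        *[!bcdfijlmnstxyCDEFJLMNTXYZ]*) return 1 ;;
    esac
    return 0
}

for s in cdn sD sq; do
    if valid_subsys "$s"; then
        echo "collectl -s$s   # ok"
    else
        echo "collectl -s$s   # contains an unknown subsystem letter"
    fi
done
```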
Common options and what they do:
With no arguments, collectl displays the following:
[root@twexdb1 qzhijun]# collectl
waiting for 1 second sample...
#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter ctxsw KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
0 0 1032 439 0 0 0 0 2 23 6 21
0 0 1049 345 8 16 265 10 0 3 1 6
0 0 1074 229 0 0 0 0 3 25 6 23
0 0 1091 226 0 0 0 0 2 19 3 16
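As you can see, the default display covers CPU, Disks and Network, in brief form.
-s selects the subsystems to display
1. Display selected summary subsystems:
Examples:
1). Show only the CPU summary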
[root@twexdb1 qzhijun]# collectl -sc
waiting for 1 second sample...
#<----CPU[HYPER]----->
#cpu sys inter ctxsw
0 0 1099 342
0 0 1060 355
0 0 1115 266
0 0 1032 147
Ouch!
2). Show the memory and disk summaries together
[root@twexdb1 qzhijun]# collectl -sdm
waiting for 1 second sample...
#<-----------Memory-----------><----------Disks----------->
#Free Buff Cach Inac Slab Map KBRead Reads KBWrit Writes
118M 270M 5G 5G 223M 1G 0 0 264 8
118M 270M 5G 5G 223M 1G 0 0 0 0
118M 270M 5G 5G 223M 1G 0 0 52 10
119M 270M 5G 5G 223M 1G 8 16 1157 52
119M 270M 5G 5G 223M 1G 0 0 0 0
Ouch!
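The brief output is plain whitespace-separated text, so it can be post-processed with standard tools. As a rough sketch, the following totals the KBWrit column of the -sdm run above. sum_kbwrit is a name made up for this example, the sample rows are copied from that run, and the code assumes the disk columns hold plain numbers (values with a unit suffix, like those in the memory columns, would need extra handling).

```shell
#!/bin/sh
# Sketch: total the KBWrit column from brief "collectl -sdm" output.
# sum_kbwrit is a helper name made up for this example.
sum_kbwrit() {
    # Skip header lines (starting with #) and anything that does not
    # have the 10 columns of a -sdm data row; KBWrit is column 9.
    awk '!/^#/ && NF == 10 { total += $9 } END { print total + 0 }'
}

sample='waiting for 1 second sample...
#<-----------Memory-----------><----------Disks----------->
#Free Buff Cach Inac Slab Map KBRead Reads KBWrit Writes
118M 270M 5G 5G 223M 1G 0 0 264 8
118M 270M 5G 5G 223M 1G 0 0 0 0
118M 270M 5G 5G 223M 1G 0 0 52 10
119M 270M 5G 5G 223M 1G 8 16 1157 52
119M 270M 5G 5G 223M 1G 0 0 0 0
Ouch!'

printf '%s\n' "$sample" | sum_kbwrit   # prints 1473
```

On a live system the same function could be fed directly, for instance with collectl -sdm -c 5 | sum_kbwrit, where -c limits the run to five samples.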
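Subsystems can also be added to, or removed from, what collectl shows by default when run with no arguments, using + and -.
3). Add memory to the default display: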
[root@twexdb1 qzhijun]# collectl -s+m
waiting for 1 second sample...
#<----CPU[HYPER]-----><-----------Memory-----------><----------Disks-----------><----------Network---------->
#cpu sys inter ctxsw Free Buff Cach Inac Slab Map KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
0 0 2348 1851 116M 270M 5G 5G 223M 1G 0 0 0 0 2 22 4 19
1 0 3513 3354 116M 270M 5G 5G 223M 1G 0 0 316 18 78 777 120 701
0 0 1108 304 116M 270M 5G 5G 223M 1G 8 16 1 1 142 1605 184 1368
0 0 1151 683 115M 270M 5G 5G 223M 1G 0 0 28 4 9 65 31 60
Ouch!
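4). Add both memory and network to the default display (network is already part of the default, so the columns match the previous example):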
[root@twexdb1 qzhijun]# collectl -s+mn
waiting for 1 second sample...
#<----CPU[HYPER]-----><-----------Memory-----------><----------Disks-----------><----------Network---------->
#cpu sys inter ctxsw Free Buff Cach Inac Slab Map KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
0 0 1032 554 116M 270M 5G 5G 224M 1G 0 0 352 9 4 40 11 35
0 0 1032 180 116M 270M 5G 5G 224M 1G 0 0 0 0 1 11 2 12
0 0 1026 174 116M 270M 5G 5G 224M 1G 8 16 1 1 1 4 1 6
0 0 1032 177 116M 270M 5G 5G 224M 1G 0 0 0 0 1 4 1 7
Ouch!
5). Remove CPU from the default display:
[root@twexdb1 qzhijun]# collectl -s-c
waiting for 1 second sample...
#<----------Disks-----------><----------Network---------->
#KBRead Reads KBWrit Writes KBIn PktIn KBOut PktOut
8 16 1 1 29 278 52 230
0 0 0 0 50 556 69 463
0 0 20 3 6 49 14 46
0 0 1516 81 74 675 235 603
8 16 337 8 2 18 8 21
0 0 0 0 1 4 1 6
Ouch!
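The + and - examples above come down to simple bookkeeping on the default subsystem letters. The sketch below models that bookkeeping only. resolve_subsys is a helper invented for this example, the default of cdn is taken from the collectl --help text, and the order of letters in the result says nothing about the order in which collectl lays out its columns.

```shell
#!/bin/sh
# Sketch of how a -s argument changes the selected subsystems.
# DEFAULT_SUBSYS comes from the help text (default=cdn); resolve_subsys
# is a helper invented here and only models the letter bookkeeping.
DEFAULT_SUBSYS='cdn'

resolve_subsys() {
    arg=$1
    case $arg in
        +*) # add letters to the defaults, skipping ones already present
            result=$DEFAULT_SUBSYS
            rest=${arg#+}
            while [ -n "$rest" ]; do
                letter=${rest%"${rest#?}"}   # first character of rest
                rest=${rest#?}               # drop that character
                case $result in
                    *"$letter"*) ;;
                    *) result=$result$letter ;;
                esac
            done ;;
        -*) # remove letters from the defaults
            result=$(printf '%s' "$DEFAULT_SUBSYS" | tr -d "${arg#-}") ;;
        *)  # no modifier: the argument replaces the defaults
            result=$arg ;;
    esac
    printf '%s\n' "$result"
}

resolve_subsys +m    # cdnm
resolve_subsys +mn   # cdnm (n is already in the defaults)
resolve_subsys -c    # dn
resolve_subsys dm    # dm
```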
2. Display selected detail subsystems:
[root@twexdb1 qzhijun]# collectl -sD
waiting for 1 second sample...
# DISK STATISTICS (/sec)
# <---------reads---------><---------writes---------><--------averages--------> Pct
#Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
c0d0 0 0 0 0 0 0 0 0 0 0 0 0 0
sda 8 0 16 1 0 0 1 1 0 1 0 0 0
sdb 0 0 0 0 44 5 6 7 7 2 0 0 0
sdc 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-0 8 0 16 1 0 0 1 1 0 1 0 0 0
dm-1 0 0 0 0 44 0 11 4 4 4 0 0 0
dm-2 0 0 0 0 0 0 0 0 0 0 0 0 0
dm-3 0 0 0 0 0 0 0 0 0 0 0 0 0
c0d0 0 0 0 0 0 0 0 0 0 0 0 0 0
You can also restrict the output to a specific disk with --dskfilt:
[root@twexdb1 qzhijun]# collectl -sD --dskfilt sdb
waiting for 1 second sample...
# DISK STATISTICS (/sec)
# <---------reads---------><---------writes---------><--------averages--------> Pct
#Name KBytes Merged IOs Size KBytes Merged IOs Size RWSize QLen Wait SvcTim Util
sdb 0 0 0 0 0 0 0 0 0 0 0 0 0
sdb 0 0 0 0 0 0 0 0 0 0 0 0 0
sdb 0 0 0 0 0 0 0 0 0 0 0 0 0
sdb 0 0 0 0 0 0 0 0 0 0 0 0 0
sdb 0 0 0 0 0 0 0 0 0 0 0 0 0
Ouch!
To monitor a specific process:
[root@twexdb1 qzhijun]# collectl -sZ --procfilt Cmysql --procopts c
waiting for 60 second sample...
# PROCESS SUMMARY (counters are /sec)
# PID User PR PPID THRD S VSZ RSS CP SysT UsrT Pct AccuTime MajF MinF Command
6839 root 18 1 0 S 10M 1M 3 0.00 0.00 0 00:00.09 0 0 /bin/sh
7002 mysql 14 6839 300 S 2G 1G 15 0.18 3.96 6 728:25:39 0 0 /usr/local/mysql/bin/mysqld
Ouch!
--procfilt Process Filters
c - substring of the command being executed as explicitly read from /proc/pid/stat. Note that this can actually be a perl expression, so if you want a command that ends in a particular string, all you need to do is append a $ to the end of the string. Otherwise it would match any command containing that string.
C - any command that starts with the specified string
f - full path of the command, including arguments, as read from /proc/pid/cmdline. Like the c modifier, this too can be a perl expression.
p - pid
P - parent pid
u - any process owned by this user's UID or in the range specified by uxxx-yyy
U - any process owned by this username
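The difference between the c and C filters is easiest to see on a few command names. The sketch below approximates the matching rules with grep -E and a shell pattern; collectl itself uses perl expressions, so this only holds for simple patterns. The helper names and the sample command names are made up for this example.

```shell
#!/bin/sh
# Sketch of the matching rules behind the c and C process filters,
# approximated with grep -E and shell patterns. collectl itself uses
# perl expressions; the helper names here are made up.

# c: the pattern may match anywhere in the command (regex allowed).
match_c() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# C: the command must start with the given string.
match_C() {
    case $2 in
        "$1"*) return 0 ;;
        *)     return 1 ;;
    esac
}

for cmd in mysqld mysqld_safe libmysqld-test; do
    if match_c 'mysqld' "$cmd"; then
        printf 'c %-8s matches %s\n' 'mysqld' "$cmd"
    fi
    if match_c 'mysqld$' "$cmd"; then
        printf 'c %-8s matches %s\n' 'mysqld$' "$cmd"
    fi
    if match_C 'mysql' "$cmd"; then
        printf 'C %-8s matches %s\n' 'mysql' "$cmd"
    fi
done
```

The plain substring matches all three names, the pattern anchored with $ matches only the name that ends in mysqld, and the starts-with filter skips the name that merely contains mysql.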
--top gives a real-time display similar to the Linux top utility.
For example:
collectl -sCj --top
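--iosize: show the average I/O size (adds a Size column)
Timestamps:
-oT show the time
-oD show the date and time
-oDm show the date, time and milliseconds
-i sets the sampling interval (in seconds)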
[root@twexdb1 qzhijun]# collectl -sm -i 2
waiting for 2 second sample...
#<-----------Memory----------->
#Free Buff Cach Inac Slab Map
120M 276M 5G 5G 224M 1G
120M 276M 5G 5G 224M 1G
120M 276M 5G 5G 224M 1G
120M 276M 5G 5G 224M 1G
121M 276M 5G 5G 224M 1G
121M 276M 5G 5G 223M 1G
Example:
Sample system data every 1/4 second and save it to a log file:
collectl -i.25 -oDm --iosize > testPerf.log
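A run like this keeps going until it is interrupted. To bound it, the interval can be combined with -c, which the help text describes as "collect this number of samples and exit". The sketch below works out the sample count for a target duration; sample_count is a helper invented for this example, and the 10-second duration is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: work out how many samples (-c) cover a given duration at a
# given interval (-i). sample_count is a helper invented for this example.
sample_count() {
    # $1 = duration in seconds, $2 = interval in seconds (may be fractional).
    # Round to the nearest whole number of samples.
    awk -v d="$1" -v i="$2" 'BEGIN { printf "%d\n", int(d / i + 0.5) }'
}

count=$(sample_count 10 .25)

# Print the bounded version of the logging command rather than running it.
echo "collectl -i.25 -c $count -oDm --iosize > testPerf.log"
```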
collectl can also send data to a remote host; see the man page for details: man collectl
[root@twexdb1 qzhijun]# collectl --help
This is a subset of the most common switches and even the descriptions are
abbreviated. To see all type 'collectl -x', to get started just type 'collectl'
usage: collectl [switches]
-c, --count count collect this number of samples and exit
-f, --filename file name of directory/file to write to
-i, --interval int collection interval in seconds [default=1]
-o, --options options misc formatting options, --showoptions for all
d|D - include date in output
T - include time in output
z - turn off compression of plot files
-p, --playback file playback results from 'file' (be sure to quote
if wild carded) or the shell might mess it up
-P, --plot generate output in 'plot' format
-s, --subsys subsys specify one or more subsystems [default=cdn]
--verbose display output in verbose format (automatically
selected when brief doesn't make sense)
Various types of help
-h, --help print this text
-v, --version print version
-V, --showdefs print operational defaults
-x, --helpextend extended help, more details descriptions too
-X, --helpall shows all help concatenated together
--showoptions show all the options
--showsubsys show all the subsystems
--showsubopts show all subsystem specific options
--showtopopts show --top options
--showheader show file header that 'would be' generated
--showcolheaders show column headers that 'would be' generated
--showslabaliases for SLUB allocator, show non-root aliases
--showrootslabs same as --showslabaliases but use 'root' names