1. Basic commands: sysdig

curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | bash

Running sysdig -cl | less lists the available chisels:

Category: Application


httplog         HTTP requests log

httptop         Top HTTP requests

memcachelog     memcached requests log

Category: CPU Usage


spectrogram     Visualize OS latency in real time.

subsecoffset    Visualize subsecond offset execution time.


                Top containers by CPU usage

topprocs_cpu    Top processes by CPU usage

Category: Errors



                Top containers by number of errors

topfiles_errors Top files by number of errors

topprocs_errors top processes by number of errors

Category: I/O


echo_fds        Print the data read and written by processes.

fdbytes_by      I/O bytes, aggregated by an arbitrary filter field

fdcount_by      FD count, aggregated by an arbitrary filter field

fdtime_by       FD time group by

iobytes         Sum of I/O bytes on any type of FD

iobytes_file    Sum of file I/O bytes

spy_file        Echo any read/write made by any process to all files. Optionally,
                you can provide the name of one file to only intercept reads/writes
                to that file.

stderr          Print stderr of processes

stdin           Print stdin of processes

stdout          Print stdout of processes


                Top containers by R+W disk bytes

topfiles_bytes  Top files by R+W bytes

topfiles_time   Top files by time

topprocs_file   Top processes by R+W disk bytes

Category: Logs


spy_logs        Echo any write made by any process to a log file. Optionally,
                export the events around each log message to file.

spy_syslog      Print every message written to syslog. Optionally, export the
                events around each syslog message to file.

Category: Misc


around          Export to file the events around the time range where the given
                filter matches.

Category: Net


iobytes_net     Show total network I/O bytes

spy_ip          Show the data exchanged with the given IP address

spy_port        Show the data exchanged using the given IP port number

topconns        Top network connections by total bytes


                Top containers by network I/O

topports_server Top TCP/UDP server ports by R+W bytes

topprocs_net    Top processes by network I/O

Category: Performance


bottlenecks     Slowest system calls

fileslower      Trace slow file I/O

netlower        Trace slow network I/O

proc_exec_time  Show process execution time

scallslower     Trace slow syscalls

topscalls       Top system calls by number of calls

topscalls_time  Top system calls by time

Category: Security



                List the login shell IDs


                print shellshock attacks

spy_users       Display interactive user activity

Category: System State


lscontainers    List the running containers

lsof            List (and optionally filter) the open file descriptors.

netstat         List (and optionally filter) network connections.

ps              List (and optionally filter) the machine processes.

Category: Tracers



                Export spans duration as statds metrics.

Use the -i flag to get detailed information about a specific chisel
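The listing is grouped by category, so it is easy to pull out the chisel names for one category and feed them to -i. Below is a minimal sketch. It assumes the listing layout shown above: a "Category: ..." header, entry lines starting with the chisel name, and indented continuation lines. chisels_in_category is a hypothetical helper name, not part of sysdig.

```shell
# Hypothetical helper: print the chisel names listed under one category
# of `sysdig -cl` output, which is read from stdin.
# Assumes entry lines start with the name and wrapped lines are indented.
chisels_in_category() {
  awk -v want="$1" '
    /^Category: / {
      cat = substr($0, 11)             # text after "Category: "
      sub(/[ \t\r]+$/, "", cat)        # drop trailing whitespace
      next
    }
    # Entry lines begin with the chisel name; indented description
    # lines and the final hint line do not match.
    cat == want && /^[a-z_]/ { print $1 }
  '
}
```

For example, sysdig -cl | chisels_in_category "I/O" | while read -r c; do sysdig -i "$c"; done prints the detailed help for every I/O chisel.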

2. Sysdig case study - analyzing disk I/O activity with the fdbytes_by chisel


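Today I'd like to share how to use fdbytes_by. This chisel shows which file on the system has the most I/O (and not only files: network I/O too). It also shows which process is reading or writing that file, and the kernel-level details of that I/O activity. Typical uses are checking whether your file system is working efficiently, or investigating a disk I/O latency problem. Combining it with dstat --top-io makes it easier to pin down the process name. This post, however, focuses on sysdig's fdbytes_by chisel, so imagine a situation where dstat is not available.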


# sysdig -i fdbytes_by

Category: I/O

fdbytes_by      I/O bytes, aggregated by an arbitrary filter field

Groups FD activity based on the given filter field, and returns the key that
generated the most input+output bytes. For example, this script can be used to
list the processes or TCP ports that generated most traffic.

[string] key - The filter field used for grouping


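2.1 First, capture about 30 MB of sysdig events to analyze. With -C 30, sysdig splits the trace into files of roughly 30 MB each and appends a numeric suffix to the name given to -w. That is why the first file is fdbytes_by.scap0. Stop the capture with Ctrl+C.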

sysdig -w fdbytes_by.scap -C 30
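Because -C can produce several numbered files (fdbytes_by.scap0, fdbytes_by.scap1, ...), a small loop can run the same breakdown over each of them. This is a sketch under that naming assumption. fdbytes_all_files is a hypothetical helper name.

```shell
# Hypothetical helper: run the same fdbytes_by breakdown over every file
# written by `sysdig -w fdbytes_by.scap -C 30` (fdbytes_by.scap0, .scap1, ...).
# Note: the glob sorts lexically, so scap10 comes before scap2.
fdbytes_all_files() {
  local prefix="$1" key="$2" f
  for f in "$prefix"[0-9]*; do
    [ -e "$f" ] || continue            # the glob matched nothing
    echo "== $f"
    sysdig -r "$f" -c fdbytes_by "$key"
  done
}
```

Usage: fdbytes_all_files fdbytes_by.scap fd.type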

2.2 Next, look at how the I/O in this capture breaks down by file descriptor type (fd.type):

# sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.type

Bytes               fd.type
45.16M              file
9.30M               ipv4
87.55KB             unix
60B                 pipe
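The Bytes column is human-readable, which makes the rows awkward to compare or total. The sketch below converts it back to raw byte counts and prints each row's share of the total. It assumes the size suffixes are 1024-based (KB = 1024 B, M = 1024^2 B, and so on), which is an assumption rather than something confirmed by the output above. fdbytes_share is a hypothetical helper name.

```shell
# Hypothetical helper: turn the human-readable Bytes column of a fdbytes_by
# table into raw byte counts plus each row's share of the total.
# Assumption: suffixes are 1024-based (KB = 1024 B, M = 1024^2 B, ...).
fdbytes_share() {
  awk '
    function to_bytes(s,   n, u) {
      n = s + 0                        # numeric prefix, e.g. 45.16M -> 45.16
      u = s; sub(/^[0-9.]+/, "", u)    # unit suffix, e.g. M
      if (u == "KB") return n * 1024
      if (u == "M")  return n * 1024 ^ 2
      if (u == "G")  return n * 1024 ^ 3
      if (u == "T")  return n * 1024 ^ 4
      if (u == "P")  return n * 1024 ^ 5
      return n                         # plain bytes, e.g. 60B
    }
    # Data rows start with a size such as 60B, 87.55KB or 45.16M;
    # the header and any separator lines are skipped.
    $1 ~ /^[0-9.]+(B|KB|M|G|T|P)$/ {
      rows++
      bytes[rows] = to_bytes($1)
      key[rows] = $2
      total += bytes[rows]
    }
    END {
      if (total == 0) exit             # nothing to report
      for (i = 1; i <= rows; i++)
        printf "%.0f\t%.1f%%\t%s\n", bytes[i], 100 * bytes[i] / total, key[i]
    }
  '
}
```

Usage: sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.type | fdbytes_share. On the numbers above, file I/O comes out at roughly 83% of all bytes.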
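
2.3 File I/O accounts for most of the bytes, so next group the activity by directory (fd.directory):

# sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.directory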

Bytes               fd.directory
38.42M              /etc
7.59M               /
5.04M               /var/www/html
1.38M               /var/log/nginx
304.73KB            /root/.zsh_history/root
7.31KB              /lib/x86_64-linux-gnu
2.82KB              /dev
2.76KB              /dev/pts
1.62KB              /usr/lib/x86_64-linux-gnu


2.4 So which file under /etc is actually being accessed?

# sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.name fd.directory=/etc

Bytes               fd.name
38.42M              /etc/services

2.5 Bingo! /etc/services is the most accessed file by far. services is a system file that is normally only read, so the 38.42M are almost certainly reads. Which process was accessing this file?

# sysdig -r fdbytes_by.scap0 -c fdbytes_by proc.name "fd.filename=services and fd.directory=/etc"

Bytes               proc.name
38.42M              nscd
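The claim in 2.5 that the bytes are reads is an inference, and it can be checked. One option is to run the same per-process breakdown twice, once for read events and once for write events. The sketch below assumes sysdig's evt.is_io_read and evt.is_io_write filter fields behave as their names suggest. rw_by_proc is a hypothetical helper name.

```shell
# Hypothetical helper: split the bytes moved on one file into reads and
# writes, per process, using the evt.is_io_read / evt.is_io_write fields.
rw_by_proc() {
  local cap="$1" path="$2"
  echo "== reads"
  sysdig -r "$cap" -c fdbytes_by proc.name "fd.name=$path and evt.is_io_read=true"
  echo "== writes"
  sysdig -r "$cap" -c fdbytes_by proc.name "fd.name=$path and evt.is_io_write=true"
}
```

Usage: rw_by_proc fdbytes_by.scap0 /etc/services. If the inference holds, the writes section should be empty or close to it.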

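2.6 There's the culprit: nscd, the name service cache daemon. But why does it read the services file so much? Let's dig further with echo_fds, which prints the data nscd read and wrote. Here -A prints only the printable text of each buffer, and -s 4096 captures up to 4096 bytes of each I/O buffer instead of the default 80: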

# sysdig -r fdbytes_by.scap0 -A -s 4096 -c echo_fds proc.name=nscd 


ab -k -c 2000 -n 300000 
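Another way to quantify "so many reads" is to count how many times nscd opened /etc/services during the capture. This is a sketch with stated assumptions. The opens are assumed to appear as open or openat syscalls, since newer glibc versions use openat. The evt.dir=< filter is assumed to keep only syscall exits, so each call is counted once. count_opens is a hypothetical helper name.

```shell
# Hypothetical helper: count how many times a process opened a file in a
# capture. Assumptions: opens show up as open or openat syscalls, and
# evt.dir=< keeps only syscall exits so each call is counted once.
count_opens() {
  local cap="$1" proc="$2" path="$3"
  sysdig -r "$cap" \
    "proc.name=$proc and evt.type in (open, openat) and evt.dir=< and fd.name=$path" \
    | wc -l
}
```

Usage: count_opens fdbytes_by.scap0 nscd /etc/services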


# sysdig -r fdbytes_by.scap0 -c topprocs_file

Bytes               Process             PID
38.42M              nscd                1343
6.43M               nginx               4804
304.89KB            zsh                 32402
9.20KB              ab                  20774
2.79KB              screen              18338
2.37KB              sshd                12812
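Rows in this table are sorted by bytes, so the PID of the first data row can feed straight into a follow-up command. The sketch below assumes the three-column Bytes/Process/PID layout shown above and process names without spaces. top_pid is a hypothetical helper name.

```shell
# Hypothetical helper: print the PID of the busiest process from
# topprocs_file output (columns: Bytes, Process, PID). Rows are sorted by
# bytes, so the first data row is the top one.
top_pid() {
  awk '$1 ~ /^[0-9.]+(B|KB|M|G|T|P)$/ && $3 ~ /^[0-9]+$/ { print $3; exit }'
}
```

For example: pid=$(sysdig -r fdbytes_by.scap0 -c topprocs_file | top_pid), then sysdig -r fdbytes_by.scap0 -A -s 4096 -c echo_fds "proc.pid=$pid".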


ab -k -c 2000 -n 300000 http://shanker.heyoa.com/index.html  0.94s user 2.77s system 9% cpu 38.561 total

ab -k -c 2000 -n 300000 http://shanker.heyoa.com/index.html  0.93s user 2.79s system 10% cpu 34.632 total




3. Performance tuning overview - Sysdig, a powerful tool for Linux performance monitoring and troubleshooting


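The latest Sysdig release is also published as a Docker image, which you can simply pull. It also provides container-level data collection (sysdig -pc container.name=your_container_name). With it you can query, for example, the network traffic between specific containers or the CPU usage of a given container.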
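Combining -pc with the chisels listed in section 1 gives per-container views. This is a minimal sketch. The container name is a placeholder, and container_top_cpu and container_top_conns are hypothetical wrapper names around the topprocs_cpu and topconns chisels.

```shell
# Hypothetical wrappers for per-container views; pass the container name
# as reported by `sysdig -c lscontainers` (or docker ps).
container_top_cpu() {
  # Top processes by CPU usage inside one container, container-friendly output.
  sysdig -pc -c topprocs_cpu "container.name=$1"
}
container_top_conns() {
  # Top network connections by total bytes for the same container.
  sysdig -pc -c topconns "container.name=$1"
}
```

Usage: container_top_cpu your_container_name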

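The company's commercial product, Sysdig Cloud, provides container-level system information, network traffic monitoring and debugging. It was presented at CoreOS Fest and supports Real-Time Dashboard, Historical Replay, Dynamic Topology and Intelligent Alert. You can think of it as Nagios-style system monitoring.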

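For installation, see the official documentation: http://www.sysdig.org/install/ . Sysdig is easier to install than SystemTap, and since this article is already long I won't spend time on installation. If you are familiar with Ansible, you can use the sysdig role on Galaxy directly: https://galaxy.ansible.com/detail#/role/692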

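For recording and replaying system traces, Sysdig's syntax closely resembles tcpdump and perf. For performance analysis, its chisels resemble SystemTap scripts and dstat's --top-* options. The difference is that with SystemTap you write the scripts yourself (once written, they can be more powerful than Sysdig), whereas Sysdig ships them ready-made. For interactive use it resembles htop.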

The simplest way to use it is to just type sysdig: it captures every event on the system and prints it straight to the screen.
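Printing every event on the system quickly floods the terminal. A common way to narrow it down is to add a filter and stop after a fixed number of events with -n. This is a sketch, and sysdig_peek, "nginx" and 100 are only example names and defaults. Like plain sysdig, it needs root.

```shell
# Hypothetical wrapper: watch live events of a single process and stop
# after a fixed number of events (-n) instead of streaming everything.
# "nginx" and 100 are only example defaults. Needs root.
sysdig_peek() {
  local proc="${1:-nginx}" count="${2:-100}"
  sysdig -n "$count" "proc.name=$proc"
}
```

Usage: sysdig_peek zsh 50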