1. A handy little command: sysdig

curl -s https://s3.amazonaws.com/download.draios.com/stable/install-sysdig | bash
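Piping a script straight from curl into bash runs it without a chance to read it first. A minimal alternative sketch, assuming the same URL as above still serves the installer: save it to a file, review it, then run it as root. The function name fetch_sysdig_installer is hypothetical.

```shell
# Hypothetical helper: download the sysdig installer (same URL as above) to a local file
# so it can be reviewed before it is run as root.
fetch_sysdig_installer() {
  local out="${1:-install-sysdig}"   # local file name is arbitrary
  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
  curl -fsSL -o "$out" https://s3.amazonaws.com/download.draios.com/stable/install-sysdig &&
    echo "Saved to $out; review it, then run: sudo bash $out"
}
```

Typical use: `fetch_sysdig_installer && less install-sysdig && sudo bash install-sysdig`.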

Output of sysdig -cl | less, which lists all available chisels:

Category: Application
---------------------
httplog         HTTP requests log
httptop         Top HTTP requests
memcachelog     memcached requests log


Category: CPU Usage
-------------------
spectrogram     Visualize OS latency in real time.
subsecoffset    Visualize subsecond offset execution time.
topcontainers_cpu
                Top containers by CPU usage
topprocs_cpu    Top processes by CPU usage


Category: Errors
----------------
topcontainers_error
                Top containers by number of errors
topfiles_errors Top files by number of errors
topprocs_errors top processes by number of errors


Category: I/O
-------------
echo_fds        Print the data read and written by processes.
fdbytes_by      I/O bytes, aggregated by an arbitrary filter field
fdcount_by      FD count, aggregated by an arbitrary filter field
fdtime_by       FD time group by
iobytes         Sum of I/O bytes on any type of FD
iobytes_file    Sum of file I/O bytes
spy_file        Echo any read/write made by any process to all files. Optionally,
                you can provide the name of one file to only intercept reads/writes
                to that file.
stderr          Print stderr of processes
stdin           Print stdin of processes
stdout          Print stdout of processes
topcontainers_file
                Top containers by R+W disk bytes
topfiles_bytes  Top files by R+W bytes
topfiles_time   Top files by time
topprocs_file   Top processes by R+W disk bytes


Category: Logs
--------------
spy_logs        Echo any write made by any process to a log file. Optionally,
                export the events around each log message to file.
spy_syslog      Print every message written to syslog. Optionally, export the
                events around each syslog message to file.


Category: Misc
--------------
around          Export to file the events around the time range where the given
                filter matches.


Category: Net
-------------
iobytes_net     Show total network I/O bytes
spy_ip          Show the data exchanged with the given IP address
spy_port        Show the data exchanged using the given IP port number
topconns        Top network connections by total bytes
topcontainers_net
                Top containers by network I/O
topports_server Top TCP/UDP server ports by R+W bytes
topprocs_net    Top processes by network I/O


Category: Performance
---------------------
bottlenecks     Slowest system calls
fileslower      Trace slow file I/O
netlower        Trace slow network I/0
proc_exec_time  Show process execution time
scallslower     Trace slow syscalls
topscalls       Top system calls by number of calls
topscalls_time  Top system calls by time


Category: Security
------------------
list_login_shells
                List the login shell IDs
shellshock_detect
                print shellshock attacks
spy_users       Display interactive user activity


Category: System State
----------------------
lscontainers    List the running containers
lsof            List (and optionally filter) the open file descriptors.
netstat         List (and optionally filter) network connections.
ps              List (and optionally filter) the machine processes.


Category: Tracers
-----------------
tracers_2_statsd
                Export spans duration as statds metrics.


Use the -i flag to get detailed information about a specific chisel
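The list is long, so it helps to search it by keyword before asking for details with -i. A minimal sketch; both function names are hypothetical, and a live sysdig install is assumed.

```shell
# Hypothetical helper: show chisel lines from `sysdig -cl` that mention a keyword (case-insensitive).
find_chisel() { sysdig -cl | grep -i -- "$1"; }
# Hypothetical helper: print the full description and arguments of one chisel.
chisel_help() { sysdig -i "$1"; }
```

For example, `find_chisel network` followed by `chisel_help topconns`. Descriptions that sysdig wraps onto their own line (such as the one under topcontainers_net) match without the chisel name next to them.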



2. sysdig case study: analyzing disk I/O activity with the fdbytes_by chisel

http://shanker.blog.51cto.com/1189689/1771418


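Today I'd like to share how to use fdbytes_by. This case shows how to find which file on the system has the most I/O (not only files, network I/O too), which process is reading or writing it, and the kernel-level details of that I/O. Typical uses are checking whether your file system is working efficiently or investigating a disk I/O latency problem. Combining it with dstat --top-io makes it easier to pin down the process name, but this post focuses on sysdig's fdbytes_by chisel, so imagine a situation where dstat isn't available.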


First, let's look at the usage details of today's main character, fdbytes_by:


# sysdig -i fdbytes_by 

Category: I/O
-------------
fdbytes_by      I/O bytes, aggregated by an arbitrary filter field
Groups FD activity based on the given filter field, and returns the key that
generated the most input+output bytes. For example, this script can be used to
list the processes or TCP ports that generated most traffic.
Args:
[string] key - The filter field used for grouping

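In short, it ranks file descriptor activity by the number of I/O bytes it generated, grouped by whatever filter field you pass as the key.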
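To make "aggregated by an arbitrary filter field" concrete, here is a toy model of the computation, not the chisel's actual Lua code: treat each I/O event as a (key, bytes) pair, sum the bytes per key, and sort descending. The one-"key bytes"-pair-per-line input format and the name group_bytes are assumptions for illustration.

```shell
# Toy model of fdbytes_by (not the real chisel): each input line is "key bytes" for one I/O event.
# Sum the bytes per key and print "total key", largest total first.
group_bytes() {
  awk '{ sum[$1] += $2 } END { for (k in sum) print sum[k], k }' | sort -k1,1nr
}
```

For example, `printf 'file 100\nipv4 30\nfile 50\n' | group_bytes` prints `150 file` followed by `30 ipv4`, the same shape as the fd.type table in 2.2.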


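2.1 First, capture a trace to analyze. -w writes the raw events to a file, and -C 30 starts a new file every 30 MB, appending a sequence number to the name, which is why the file read below is fdbytes_by.scap0. The capture keeps running until you stop it with Ctrl+C.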

sysdig -w fdbytes_by.scap -C 30
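For repeatable runs the capture can be time-boxed instead of stopped by hand. A sketch assuming GNU coreutils timeout is available and that 60 seconds covers the load you want to observe; timeout -s INT delivers the same signal as Ctrl+C. Run it from a root shell; the function name capture_for is hypothetical.

```shell
# Hypothetical helper: record for a fixed number of seconds, then list the files written.
capture_for() {
  local secs="${1:-60}"
  # -w: write raw events; -C 30: start a new file every 30 MB (fdbytes_by.scap0, .scap1, ...)
  timeout -s INT "$secs" sysdig -w fdbytes_by.scap -C 30
  ls fdbytes_by.scap*
}
```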

2.2 Next, break down the I/O in the capture by file descriptor type:

sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.type

Bytes               fd.type
--------------------------------------------------------------------------------
45.16M              file
9.30M               ipv4
87.55KB             unix
316B
60B                 pipe

Descriptors of type file account for 45.16M, the largest FD type.
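The 45.16M is easier to judge as a share of the total. A minimal sketch that reads an fdbytes_by table on stdin and prints each row's percentage; it assumes the suffixes are 1024-based (B, KB, M, G) and the name fd_share is hypothetical.

```shell
# Hypothetical helper: convert fdbytes_by sizes back to bytes and print each row's share.
# Assumption: 1024-based suffixes (B, KB, M, G), as in the tables in this post.
fd_share() {
  awk '
    function to_bytes(s,   n) {
      n = s + 0                              # numeric prefix: "45.16M" -> 45.16
      if (s ~ /KB$/) return n * 1024
      if (s ~ /M$/)  return n * 1024 * 1024
      if (s ~ /G$/)  return n * 1024 * 1024 * 1024
      return n                               # plain bytes, e.g. "316B"
    }
    $1 ~ /^[0-9.]+(B|KB|M|G)$/ {             # data rows only; skips header and dashes
      rows++; bytes[rows] = to_bytes($1); key[rows] = $2; total += bytes[rows]
    }
    END { for (i = 1; i <= rows; i++) printf "%5.1f%% %s\n", 100 * bytes[i] / total, key[i] }
  '
}
```

Piping `sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.type` into fd_share gives ` 82.8% file` for the table above, with ipv4 at about 17%.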

2.3 Next, rank I/O activity by directory:

# sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.directory

Bytes               fd.directory
--------------------------------------------------------------------------------
38.42M              /etc
7.59M               /
5.04M               /var/www/html
1.38M               /var/log/nginx
304.73KB            /root/.zsh_history/root
7.31KB              /lib/x86_64-linux-gnu
2.82KB              /dev
2.76KB              /dev/pts
1.62KB              /usr/lib/x86_64-linux-gnu

The most heavily accessed directory is /etc.


2.4 So which file in /etc is being accessed?

# sysdig -r fdbytes_by.scap0 -c fdbytes_by fd.name fd.directory=/etc

Bytes               fd.name
--------------------------------------------------------------------------------
38.42M              /etc/services

2.5 Bingo! /etc/services is the most accessed file. Since services is a system file that is normally only read, the 38.42M is almost certainly reads. Which process is accessing it?

# sysdig -r fdbytes_by.scap0 -c fdbytes_by proc.name "fd.filename=services and fd.directory=/etc"

Bytes               proc.name
--------------------------------------------------------------------------------
38.42M              nscd
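Sections 2.3 to 2.5 can be chained so that each step feeds the top key of the previous one into the next filter. A minimal sketch, assuming the output layout shown above (a header, a line of dashes, then "size key" rows) and keys without spaces; the helper names are hypothetical. The last step groups the file's I/O by evt.type, which should show directly whether the bytes on /etc/services come from reads instead of inferring it; using evt.type as the fdbytes_by key is an assumption based on the chisel accepting an arbitrary filter field.

```shell
# Hypothetical helper: print the key (2nd column) of the first data row after the dashes line.
top_key() { awk 'seen && NF >= 2 { print $2; exit } /^-+[[:space:]]*$/ { seen = 1 }'; }

# Hypothetical helper: busiest directory -> busiest file in it -> who touched it and how.
drill_down() {
  local cap="${1:-fdbytes_by.scap0}" dir file
  dir=$(sysdig -r "$cap" -c fdbytes_by fd.directory | top_key)
  file=$(sysdig -r "$cap" -c fdbytes_by fd.name "fd.directory=$dir" | top_key)
  echo "busiest directory: $dir"
  echo "busiest file: $file"
  sysdig -r "$cap" -c fdbytes_by proc.name "fd.name=$file"   # which processes (as in 2.5)
  sysdig -r "$cap" -c fdbytes_by evt.type "fd.name=$file"    # read vs write event types
}
```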


2.6 There's the culprit: nscd, the name service cache daemon. Why would it read the services file so many times? Let's keep digging:


# sysdig -r fdbytes_by.scap0 -A -s 4096 -c echo_fds proc.name=nscd 
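In the command above, -A prints only the ASCII part of each data buffer and -s 4096 sets the snapshot length to 4096 bytes. As far as I know, the snapshot length takes effect when events are captured: the capture in 2.1 was taken with the default of 80 bytes per buffer, so replaying it with -s 4096 cannot show more than was stored. A sketch of re-capturing with full buffers and then replaying only nscd's I/O on /etc/services; the file name full.scap and both function names are hypothetical, and timeout is GNU coreutils.

```shell
# Hypothetical helper: capture with a 4096-byte snaplen for a fixed time (run as root).
capture_full_buffers() { timeout -s INT "${1:-30}" sysdig -s 4096 -w full.scap; }

# Hypothetical helper: replay only nscd's I/O on /etc/services, printing the ASCII data.
replay_services_io() {
  sysdig -r full.scap -A -c echo_fds "proc.name=nscd and fd.name=/etc/services"
}
```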


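The output shows that nscd was reading the port-to-service-name mappings defined in services. During the capture I was running ab to stress-test nginx with a static page, and I had expected nginx's reads and writes to dominate; I didn't expect nscd to show up and get in the way: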


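ab -k -c 2000 -n 300000 http://shanker.heyoa.com/index.html

(-k enables HTTP keep-alive, -c 2000 keeps 2000 requests in flight at once, and -n 300000 sends 300,000 requests in total.) topprocs_file confirms that nscd, not nginx, did the most file I/O in the capture: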

# sysdig -r fdbytes_by.scap0 -c topprocs_file 

Bytes               Process             PID
--------------------------------------------------------------------------------
38.42M              nscd                1343
6.43M               nginx               4804
304.89KB            zsh                 32402
9.20KB              ab                  20774
2.79KB              screen              18338
2.37KB              sshd                12812



I then timed the ab run with and without nscd caching: with nscd caching services locally, the total time dropped from 38.561 s to 34.632 s, about 10.189% faster.


ab -k -c 2000 -n 300000 http://shanker.heyoa.com/index.html  0.94s user 2.77s system 9% cpu 38.561 total

ab -k -c 2000 -n 300000 http://shanker.heyoa.com/index.html  0.93s user 2.79s system 10% cpu 34.632 total
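The 10.189% figure is the relative drop in the total wall-clock time reported by time. A one-line awk check; the function name pct_faster is hypothetical.

```shell
# Relative reduction in wall-clock time from run a to run b, in percent.
pct_faster() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", (a - b) / a * 100 }'; }

pct_faster 38.561 34.632   # prints 10.189
```

Since the request count was fixed, the same numbers expressed as throughput give 38.561 / 34.632 ≈ 1.113, i.e. roughly 11% more requests per second.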



For speeding things up with nscd caching, see my earlier post:


http://shanker.blog.51cto.com/1189689/1735058


That concludes the analysis. This post is only one example of using the fdbytes_by chisel; sysdig ships many more chisels for analyzing a system.


3. Performance tuning overview: Sysdig, a powerful tool for Linux performance monitoring and troubleshooting

http://shanker.blog.51cto.com/1189689/1768735

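The latest Sysdig release is also available as a Docker image, so you can simply pull it. It also collects container-level information (sysdig -pc container.name=your_container_name) and can, for example, show the network traffic between specific containers or the CPU usage of a given container.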
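A sketch of both points. The docker run volume list follows sysdig's container install instructions from around that time and should be checked against the current documentation; the wrapper names and the container name web used below are hypothetical. -pc switches to the container-aware output format, and the container.name filter restricts events to one container.

```shell
# Hypothetical wrapper: run sysdig from its Docker image with the host mounts it needs.
sysdig_container() {
  docker run -i -t --name sysdig --privileged \
    -v /var/run/docker.sock:/host/var/run/docker.sock \
    -v /dev:/host/dev -v /proc:/host/proc:ro -v /boot:/host/boot:ro \
    -v /lib/modules:/host/lib/modules:ro -v /usr:/host/usr:ro \
    sysdig/sysdig
}

# Hypothetical per-container views; $1 is the container name (run as root).
container_cpu()   { sysdig -pc -c topprocs_cpu "container.name=$1"; }   # CPU by process
container_net()   { sysdig -pc -c topprocs_net "container.name=$1"; }   # network I/O by process
container_conns() { sysdig -pc -c topconns "container.name=$1"; }       # top connections
```

For example, `container_cpu web` shows which processes inside the container named web use the most CPU.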

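The company's commercial product, Sysdig Cloud, monitors and debugs system information and network traffic at the container level; it was presented at CoreOS Fest. It supports Real-Time Dashboard, Historical Replay, Dynamic Topology and Intelligent Alert; think of it as something like Nagios for system monitoring.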

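For installation, see the official documentation: http://www.sysdig.org/install/ Sysdig is easier to install than SystemTap. This post is already long, so I won't spend time on installation; if you are familiar with Ansible, you can use the sysdig role on Galaxy directly: https://galaxy.ansible.com/detail#/role/692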


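For recording and replaying system traces, Sysdig's syntax resembles tcpdump and perf. For performance analysis, its chisels resemble SystemTap and dstat's --top* options, except that with SystemTap you have to write the tap scripts yourself (once written, they are more powerful than Sysdig), whereas Sysdig's chisels come ready-made. For interactive use, it resembles htop.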


The simplest way to use it is to run sysdig with no arguments: it captures every system event and prints it to the screen.
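The tcpdump-like record/replay workflow and the htop-like interactive mode look roughly like this in practice. A minimal sketch with hypothetical wrapper names, to be run as root for live capture; csysdig is the curses interface that ships with sysdig.

```shell
# Hypothetical wrappers around everyday sysdig invocations (run as root for live capture).
watch_all()    { sysdig; }                     # every event, printed to the screen
watch_proc()   { sysdig "proc.name=$1"; }      # only events from processes named $1
record_trace() { sysdig -w "$1"; }             # save raw events to a file, like tcpdump -w
replay_trace() { sysdig -r "$1" "${@:2}"; }    # read them back, optionally filtered, like tcpdump -r
interactive()  { csysdig; }                    # htop-like curses interface
```

For example, `record_trace nginx.scap`, stop it with Ctrl+C, then `replay_trace nginx.scap proc.name=nginx`.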