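In day-to-day operations, anomaly analysis, performance tuning, and capacity (watermark) analysis all depend on the analysis tools available on Linux. This post collects the tools most commonly used in operations as a quick reference.
Anomaly analysis and performance tuning come down to the same routine: take the load figures these tools report for each Linux kernel subsystem, combine them with an understanding of how the application in question works, narrow the problem down step by step, and then fix it. Capacity analysis stretches the time dimension instead: it looks at how changes in business volume relate to changes in Linux resource utilization.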
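As a rough illustration of capacity analysis, the sketch below fits a straight line to paired samples of business volume and CPU utilization, then estimates the volume at which utilization would reach a chosen ceiling. The sample numbers, the 80% ceiling, and the function names are all hypothetical, and a linear relationship is an assumption that real workloads may not satisfy.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def volume_at_utilization(slope, intercept, limit_pct):
    """Volume at which the fitted line reaches limit_pct utilization."""
    return (limit_pct - intercept) / slope


# Hypothetical samples: requests per second and average CPU utilization (%).
requests_per_sec = [100, 200, 300, 400, 500]
cpu_util_pct = [12.0, 22.0, 32.0, 42.0, 52.0]

slope, intercept = fit_line(requests_per_sec, cpu_util_pct)
ceiling = volume_at_utilization(slope, intercept, 80.0)
print(f"about {slope:.3f}% CPU per request/s; 80% reached near {ceiling:.0f} requests/s")
```

In practice the utilization series could come from `sar -u` history and the volume series from application metrics.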
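Optimizing application performance is really a process of moving closer to (customizing for) both the underlying system and the business.
Note: the diagram is quoted from another source; of the tools it shows, I have not yet used dtrace, blktrace, or stap.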
$ dstat
--total-cpu-usage-- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai stl| read writ| recv send| in out | int csw
1 1 99 0 0| 253k 18M| 0 0 | 0 0 | 170 277
1 1 98 0 0| 0 0 | 534B 908B| 0 0 | 139 184
2 2 97 0 0| 0 0 | 60B 516B| 0 0 | 137 193
2 1 98 0 0| 0 0 | 60B 516B| 0 0 | 94 152
$ sar -d # block devices
$ sar -r # memory
$ sar -u # CPU
$ sar -n DEV # network
$ iostat -dxk 1 10
Linux 4.15.0-72-generic (iZ8vbbr6ewt5krcaodkr28Z) 01/29/2020 _x86_64_ (2 CPU)
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vda 3.84 0.93 120.07 19.85 0.05 0.34 1.28 26.65 1.62 3.34 0.01 31.27 21.44 0.21 0.10
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
# -d show only disk (device) statistics
# -k display values in kilobytes
# -x show extended disk I/O statistics
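When many devices or many intervals are involved, it can help to turn the extended report into structured records. The following is a minimal sketch, assuming the column layout shown in the sample above (column sets differ between sysstat versions); `parse_iostat` and the 80% threshold are illustrative names and values, not part of iostat.

```python
IOSTAT_SAMPLE = """\
Linux 4.15.0-72-generic (iZ8vbbr6ewt5krcaodkr28Z) 01/29/2020 _x86_64_ (2 CPU)
Device r/s w/s rkB/s wkB/s rrqm/s wrqm/s %rrqm %wrqm r_await w_await aqu-sz rareq-sz wareq-sz svctm %util
vda 3.84 0.93 120.07 19.85 0.05 0.34 1.28 26.65 1.62 3.34 0.01 31.27 21.44 0.21 0.10
"""


def parse_iostat(text):
    """Parse `iostat -dxk` output into one dict per device row."""
    rows = []
    header = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].rstrip(":") == "Device":
            header = fields[1:]  # metric names for the rows that follow
            continue
        if header is None or len(fields) != len(header) + 1:
            continue  # banner line or anything that is not a data row
        row = {"device": fields[0]}
        row.update((name, float(value)) for name, value in zip(header, fields[1:]))
        rows.append(row)
    return rows


rows = parse_iostat(IOSTAT_SAMPLE)
# Hypothetical threshold: treat a device as busy at 80% utilization or more.
busy = [r["device"] for r in rows if r["%util"] >= 80.0]
for r in rows:
    print(r["device"], "r_await", r["r_await"], "w_await", r["w_await"], "%util", r["%util"])
print("busy devices:", busy)
```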
$ iotop
Total DISK READ : 27.45 K/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ: 27.45 K/s | Actual DISK WRITE: 19.61 K/s
TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND
192 be/3 root 0.00 B/s 0.00 B/s 0.00 % 0.14 % [jbd2/vda1-8]
632 be/2 root 27.45 K/s 0.00 B/s 0.00 % 0.12 % AliYunDun
1 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % init noibrs splash
2 be/4 root 0.00 B/s 0.00 B/s 0.00 % 0.00 % [kthreadd]
$ lsof
COMMAND PID TID USER FD TYPE DEVICE SIZE/OFF NODE NAME
systemd 1 root cwd DIR 252,1 4096 2 /
systemd 1 root rtd DIR 252,1 4096 2 /
systemd 1 root txt REG 252,1 1595792 142354 /lib/systemd/systemd
systemd 1 root mem REG 252,1 1700792 131544 /lib/x86_64-linux-gnu/libm-2.27.so
systemd 1 root mem REG 252,1 121016 131519 /lib/x86_64-linux-gnu/libudev.so.1.6.9
$ tcpdump tcp port 3306
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
18:59:44.818840 IP 39.149.12.221.25741 > iZ8vbbr6ewt5krcaodkr28Z.mysql: Flags [S], seq 2126500721, win 64240, options [mss 1452,nop,wscale 8,nop,nop,sackOK], length 0
18:59:44.818880 IP iZ8vbbr6ewt5krcaodkr28Z.mysql > 39.149.12.221.25741: Flags [R.], seq 0, ack 2126500722, win 0, length 0
# Show network interface information
$ netstat -i
Kernel Interface table
Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 73915 0 0 0 27295 0 0 0 BMRU
lo 65536 416 0 0 0 416 0 0 0 LRU
# Find the sockets a program is using
$ netstat -ap | grep mysql
Proto RefCnt Flags Type State I-Node PID/Program name Path
unix 2 [ ACC ] STREAM LISTENING 65746 18196/mysqld /var/run/mysqld/mysqld.sock
unix 3 [ ] STREAM CONNECTED 373447 18196/mysqld /var/run/mysqld/mysqld.sock
# Processes and ports in the LISTEN state (note that mysqld is bound to 127.0.0.1, so it cannot accept connections from other hosts)
$ netstat -lntup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 662/sshd
tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN 18196/mysqld
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 315/systemd-resolve
udp 0 0 127.0.0.53:53 0.0.0.0:* 315/systemd-resolve
udp 0 0 172.26.95.106:68 0.0.0.0:* 266/systemd-network
udp 0 0 127.0.0.1:323 0.0.0.0:* 450/chronyd
udp6 0 0 ::1:323 :::* 450/chronyd
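The loopback-only binding noted above can also be checked mechanically. This is a small sketch, assuming the `netstat -lntup` column layout from the sample; `listeners` is a hypothetical helper name, and only TCP rows in the LISTEN state are considered.

```python
import ipaddress

NETSTAT_SAMPLE = """\
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 662/sshd
tcp 0 0 127.0.0.1:3306 0.0.0.0:* LISTEN 18196/mysqld
tcp 0 0 127.0.0.53:53 0.0.0.0:* LISTEN 315/systemd-resolve
"""


def listeners(text):
    """Return (program, port, loopback_only) for each LISTEN row."""
    result = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue  # header lines and rows without a LISTEN state
        host, _, port = fields[3].rpartition(":")
        program = fields[6].split("/", 1)[-1]
        loopback_only = ipaddress.ip_address(host).is_loopback
        result.append((program, int(port), loopback_only))
    return result


found = listeners(NETSTAT_SAMPLE)
for program, port, loopback_only in found:
    scope = "loopback only" if loopback_only else "reachable from other hosts"
    print(f"{program}:{port} {scope}")
```

A service that shows up as loopback only will reject or never see connections arriving on the external interface, regardless of firewall rules.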
# Send and receive a file
$ nc -l 9999 > zabbix.file # receiver: listen on local TCP port 9999 and write what arrives to a file
$ nc 192.168.21.248 9999 < test # sender: send the file to TCP port 9999 on 192.168.21.248
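# Measure network throughput
$ nc -l 9999 > /dev/null # machine A receives the data and discards it
$ nc 10.0.1.161 9999 < /dev/zero # machine B sends an endless stream of zeros to machine A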
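The same idea as the nc throughput test (one side discards, the other side sends zeros) can be sketched in a few lines. This version runs both ends on the loopback interface in one process, so it measures the local stack rather than a real network link; the function name and the default sizes are illustrative assumptions.

```python
import socket
import threading
import time


def measure_loopback_throughput(total_bytes=8 * 1024 * 1024, chunk=64 * 1024):
    """Send zeros over a loopback TCP connection; return (bytes received, bytes/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    received = 0

    def sink():
        # Receiver: read and discard, like `nc -l 9999 > /dev/null`.
        nonlocal received
        conn, _ = server.accept()
        with conn:
            while True:
                data = conn.recv(chunk)
                if not data:
                    break
                received += len(data)

    receiver = threading.Thread(target=sink)
    receiver.start()

    payload = b"\0" * chunk
    start = time.perf_counter()
    # Sender: push zeros, like `nc host 9999 < /dev/zero`, but with a fixed size.
    with socket.create_connection(("127.0.0.1", port)) as client:
        sent = 0
        while sent < total_bytes:
            client.sendall(payload)
            sent += chunk
    receiver.join()
    elapsed = time.perf_counter() - start
    server.close()
    return received, received / elapsed


received_bytes, rate = measure_loopback_throughput()
print(f"received {received_bytes} bytes at {rate / 1e6:.1f} MB/s")
```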
$ perf stat ./t1
Performance counter stats for './t1':
262.738415 task-clock-msecs # 0.991 CPUs
2 context-switches # 0.000 M/sec
1 CPU-migrations # 0.000 M/sec
81 page-faults # 0.000 M/sec
9478851 cycles # 36.077 M/sec (scaled from 98.24%)
6771 instructions # 0.001 IPC (scaled from 98.99%)
111114049 branches # 422.908 M/sec (scaled from 99.37%)
8495 branch-misses # 0.008 % (scaled from 95.91%)
12152161 cache-references # 46.252 M/sec (scaled from 96.16%)
7245338 cache-misses # 27.576 M/sec (scaled from 95.49%)
0.265238069 seconds time elapsed
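The ratios in the comment column of the perf stat output follow directly from the raw counters. The sketch below recomputes them from the numbers in the sample above; `derived_metrics` is a hypothetical helper, and since perf marks these counts as scaled, the results are estimates.

```python
# Raw counters copied from the perf stat sample above.
counters = {
    "cycles": 9478851,
    "instructions": 6771,
    "branches": 111114049,
    "branch-misses": 8495,
    "cache-references": 12152161,
    "cache-misses": 7245338,
}


def derived_metrics(c):
    """Compute instructions per cycle and miss ratios from raw counters."""
    return {
        "ipc": c["instructions"] / c["cycles"],
        "branch_miss_pct": 100.0 * c["branch-misses"] / c["branches"],
        "cache_miss_pct": 100.0 * c["cache-misses"] / c["cache-references"],
    }


metrics = derived_metrics(counters)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The cache miss ratio is not printed by this version of perf stat as a percentage, so computing it separately is a quick way to see that well over half of the cache references in this run missed.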
$ perf top
Samples: 14K of event 'cpu-clock', Event count (approx.): 850319680
Overhead Shared Object Symbol
3.45% [kernel] [k] _raw_spin_unlock_irqrestore
3.14% [kernel] [k] __do_page_fault
2.76% libcrypto.so.1.1 [.] OPENSSL_LH_insert
2.07% [kernel] [k] finish_task_switch
$ top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
615 root 10 -10 138012 16748 13868 S 1.7 0.4 4:54.60 AliYunDun
1 root 20 0 77768 8904 6664 S 0.0 0.2 0:01.92 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
$ free -m
total used free shared buff/cache available
Mem: 3944 247 2636 2 1061 3460
Swap: 947 0 947
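The columns of `free -m` are related by simple arithmetic, which the sketch below checks against the sample above. In this sample `used` equals `total - free - buff/cache`; other procps versions may define `used` differently, so treat that identity as an observation about this output only. `parse_free` is a hypothetical helper name.

```python
FREE_SAMPLE = """\
total used free shared buff/cache available
Mem: 3944 247 2636 2 1061 3460
Swap: 947 0 947
"""


def parse_free(text):
    """Return the Mem: row of `free -m` output as a dict of integers (MiB)."""
    lines = text.splitlines()
    header = lines[0].split()
    values = [int(v) for v in lines[1].split()[1:]]  # drop the "Mem:" label
    return dict(zip(header, values))


mem = parse_free(FREE_SAMPLE)
recomputed_used = mem["total"] - mem["free"] - mem["buff/cache"]
available_pct = 100.0 * mem["available"] / mem["total"]
print(f"used (recomputed): {recomputed_used} MiB, reported: {mem['used']} MiB")
print(f"available: {available_pct:.1f}% of total")
```

For capacity checks, `available` is usually the more useful figure than `free`, because it estimates how much memory new workloads could claim once reclaimable cache is taken into account.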
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 0 2699512 157704 929332 0 0 5 43 307 492 1 0 99 0 0
0 0 0 2699536 157704 929332 0 0 0 0 573 901 0 0 100 0 0
0 0 0 2699536 157704 929332 0 0 0 0 569 902 0 0 100 0 0
$ pidstat
Average: UID PID %usr %system %guest %wait %CPU CPU Command
Average: 0 8 0.00 0.10 0.00 0.00 0.10 - rcu_sched
Average: 0 395 0.10 0.00 0.00 0.00 0.10 - aliyun-service
Average: 0 615 0.40 1.09 0.00 0.00 1.49 - AliYunDun
Average: 0 12735 0.00 0.40 0.00 0.00 0.40 - pidstat
Average: 109 18196 0.10 0.00 0.00 0.00 0.10 - mysqld
$ mpstat
06:30:02 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
06:30:03 PM all 24.88 0.00 15.42 12.44 0.00 0.00 0.00 0.00 0.00 47.26
06:30:04 PM all 25.37 0.00 16.92 10.95 0.00 0.00 0.00 0.00 0.00 46.77
06:30:05 PM all 22.89 0.00 15.92 12.94 0.00 0.00 0.00 0.00 0.00 48.26