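My home WiFi has been a bit unstable lately, so I used ping to check whether packets were really being dropped. On Windows, ping occasionally freezes when packets are lost: the output stops updating and only resumes once you press Ctrl+C. So I tried ping on Linux instead. I used a Fedora machine plugged directly into the upstream Ethernet port that the AP normally connects to, to first rule out a problem with the Ethernet port itself.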
$ uname -a
Linux Fedora 5.3.7-301.fc31.x86_64 #1 SMP Mon Oct 21 19:18:58 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
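Run ping in the shell with the -O option, which reports lost packets (by default ping does not report them, and you can only spot a loss from a gap in the icmp_seq numbers, which is inconvenient), and the -D option, which prefixes each line with a timestamp. Piping through tee shows ping's output in the shell while also saving it to a text file for later analysis: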
$ ping 192.168.2.1 -OD | tee ping.log
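To see why -O is convenient, here is a minimal Python sketch of what you would otherwise do by hand: scan the icmp_seq values of the replies that did arrive and list the gaps. The function name missing_seqs is hypothetical, and the sketch assumes the sequence numbers are increasing and do not wrap around.

```python
def missing_seqs(received):
    """Return sequence numbers absent between the first and last received reply."""
    if not received:
        return []
    seen = set(received)
    return [seq for seq in range(received[0], received[-1] + 1) if seq not in seen]


if __name__ == "__main__":
    # Made-up sample: replies 4 and 7 never arrived.
    print(missing_seqs([1, 2, 3, 5, 6, 8]))  # prints [4, 7]
```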
ping.log looks roughly like this:
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
[1582125725.624008] 64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=0.497 ms
[1582125726.627403] 64 bytes from 192.168.2.1: icmp_seq=2 ttl=64 time=0.551 ms
[1582125727.651623] 64 bytes from 192.168.2.1: icmp_seq=3 ttl=64 time=0.668 ms
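Before the full analysis, it may help to see how one reply line breaks down. The sketch below parses a single line of the format shown above into a timestamp, host, sequence number and rtt. The names REPLY_RE and parse_reply are my own, and the pattern assumes exactly this reply layout (IPv4 address, no reverse DNS name).

```python
import re
from datetime import datetime, timezone

REPLY_RE = re.compile(
    r'\[(\d+\.\d+)\] \d+ bytes from ([\d.]+): icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms'
)


def parse_reply(line):
    """Return (timestamp, host, icmp_seq, rtt_ms) for a reply line, else None."""
    m = REPLY_RE.match(line)
    if m is None:
        return None
    ts, host, seq, rtt = m.groups()
    return float(ts), host, int(seq), float(rtt)


if __name__ == "__main__":
    sample = "[1582125725.624008] 64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=0.497 ms"
    ts, host, seq, rtt = parse_reply(sample)
    # -D prints Unix epoch seconds; convert them to a readable UTC time.
    print(datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(), host, seq, rtt)
```

The header line starting with PING does not match the pattern, so parse_reply returns None for it.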
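We will analyze this log with a python3 script or a GNU awk script (the two give equivalent results) to compute the packet loss rate, the round-trip time, and other metrics. Here is the final result first: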
--- 192.168.2.1 ping statistics ---
34706 packets transmitted, 34702 received, 0.01% packet loss, time 9 hours 3129 seconds.
rtt min/avg/max=0.307/0.645/1024.000 ms
1. python3
Get the destination host address:
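import re
from statistics import mean

# Excerpt from a method of the analyzer class; log_file_name is the path to ping.log.
with open(log_file_name, 'r') as f:
    line_list = f.readlines()
lines = ''.join(line_list)
# The first line of the log is the "PING <host> (<host>) ..." header.
host_match = re.search(r'PING\s+(\d+\.\d+\.\d+\.\d+)', line_list[0])
self._dst_host = host_match.group(1) if host_match else None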
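Get the first and the last icmp_seq number. Their difference gives the total number of packets sent, which is the denominator when computing the loss rate: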
seq_re = r'\[(\d+\.\d+)\][\w\.\s:]+\s+icmp_seq=(\d+)'
# First timestamped line: start time and first sequence number.
ts_match = re.search(seq_re, lines)
self._ts_begin = float(ts_match.group(1))
self._icmp_seq_begin = int(ts_match.group(2))
# Last line of the log: end time and last sequence number.
ts_match = re.search(seq_re, line_list[-1])
self._ts_end = float(ts_match.group(1))
self._icmp_seq_end = int(ts_match.group(2))
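A normal ICMP reply is recognizable by its rtt (time=) field, whereas a lost packet is reported as "no answer yet". We rely on the format of the normal reply line to extract all normal replies and count them; the count is needed for the loss rate below.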
# The rtt group accepts both "0.497" and an integer value such as "1024".
self._match_list = re.findall(
    r'\[(\d+\.\d+)\].*?:\s+icmp_seq=(\d+).*?time=(\d+(?:\.\d+)?)', lines)
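To check that this kind of pattern really skips lost packets, here is a small sketch run against a three-line sample. The "no answer yet" line and its timestamp are made up for illustration, and REPLY_RE and sample_log are hypothetical names.

```python
import re

REPLY_RE = re.compile(r'\[(\d+\.\d+)\].*?:\s+icmp_seq=(\d+).*?time=(\d+(?:\.\d+)?)')

# Illustrative sample: the second timestamped line is an invented lost packet.
sample_log = """\
PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.
[1582125725.624008] 64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=0.497 ms
[1582125726.640000] no answer yet for icmp_seq=2
[1582125727.651623] 64 bytes from 192.168.2.1: icmp_seq=3 ttl=64 time=0.668 ms
"""

# "." does not match a newline by default, so a match cannot span two lines,
# and the "no answer yet" line has neither a colon nor a time= field.
matches = REPLY_RE.findall(sample_log)
print(matches)
```

Only the replies for icmp_seq 1 and 3 come back, each as a tuple of three strings.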
re.findall returns a list of tuples, because the regular expression defines three groups. Convert it into three lists:
# zip(*rows) transposes the list of (ts, seq, rtt) tuples into three columns.
raw_ts_tuple, raw_icmp_seq_tuple, raw_rtt_tuple = zip(*self._match_list)
self._ts_list = list(map(float, raw_ts_tuple))
self._icmp_seq_list = list(map(int, raw_icmp_seq_tuple))
self._rtt_list = list(map(float, raw_rtt_tuple))
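One thing to watch with the zip(*...) idiom: if the log contains no replies at all, the three-way unpacking fails. A minimal sketch of a guarded version follows; split_columns is a hypothetical helper name, not part of the script above.

```python
def split_columns(match_list):
    """Turn [(ts, seq, rtt), ...] string tuples into three typed lists."""
    if not match_list:
        # zip(*[]) yields nothing, so unpacking into three names would raise ValueError.
        return [], [], []
    raw_ts, raw_seq, raw_rtt = zip(*match_list)
    return list(map(float, raw_ts)), list(map(int, raw_seq)), list(map(float, raw_rtt))


if __name__ == "__main__":
    rows = [("1582125725.624008", "1", "0.497"), ("1582125726.627403", "2", "0.551")]
    ts_list, seq_list, rtt_list = split_columns(rows)
    print(seq_list, rtt_list)  # prints [1, 2] [0.497, 0.551]
```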
Compute the loss rate and the other statistics:
# mean is imported with: from statistics import mean
duration = self._ts_end - self._ts_begin
total_packets = self._icmp_seq_end - self._icmp_seq_begin + 1
received_packets = len(self._icmp_seq_list)
print('--- %s ping statistics ---' % self._dst_host)
print('%d packets transmitted, %d received, %.2f%% packet loss, time %d hours %d seconds.' % (
    total_packets,
    received_packets,
    100 - received_packets / total_packets * 100,
    duration // 3600,
    duration % 3600))
print('rtt min/avg/max=%.3f/%.3f/%.3f ms' % (
    min(self._rtt_list), mean(self._rtt_list), max(self._rtt_list)))
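The steps above can also be put together in one self-contained sketch. Note that this is a line-by-line variant rather than the whole-file regex approach used above, and that the function name summarize and the dictionary keys are my own illustrative choices. It assumes icmp_seq does not wrap around and that there are no duplicate replies.

```python
import re
from statistics import mean

SEQ_RE = re.compile(r'^\[(\d+\.\d+)\].*\bicmp_seq=(\d+)')
RTT_RE = re.compile(r'\btime=(\d+(?:\.\d+)?)')


def summarize(lines):
    """Compute loss and rtt statistics from `ping -O -D` output lines."""
    first = last = None
    rtts = []
    for line in lines:
        seq_match = SEQ_RE.search(line)
        if seq_match is None:
            continue  # header, blank line, or anything without a timestamp
        stamp = (float(seq_match.group(1)), int(seq_match.group(2)))
        if first is None:
            first = stamp
        last = stamp
        rtt_match = RTT_RE.search(line)
        if rtt_match:  # only real replies carry a time= field
            rtts.append(float(rtt_match.group(1)))
    if first is None or not rtts:
        raise ValueError("no ICMP replies found")
    transmitted = last[1] - first[1] + 1
    return {
        "transmitted": transmitted,
        "received": len(rtts),
        "loss_pct": 100 - len(rtts) / transmitted * 100,
        "duration_s": last[0] - first[0],
        "rtt_min": min(rtts),
        "rtt_avg": mean(rtts),
        "rtt_max": max(rtts),
    }


if __name__ == "__main__":
    # Illustrative sample; the "no answer yet" line is made up.
    sample = [
        "PING 192.168.2.1 (192.168.2.1) 56(84) bytes of data.",
        "[1582125725.624008] 64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=0.497 ms",
        "[1582125726.640000] no answer yet for icmp_seq=2",
        "[1582125727.651623] 64 bytes from 192.168.2.1: icmp_seq=3 ttl=64 time=0.668 ms",
    ]
    print(summarize(sample))
```

For a real log you could pass the open file object directly, for example summarize(open("ping.log")), since iterating a file yields its lines.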
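I left ping running overnight. The loss rate is about 1 in 10,000, which I would call quite stable. Next time I will connect the WLAN AP and ping again to see whether any loss shows up.
awk is also well suited to analyzing formatted text, so I tried it with awk as well.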
2. awk
$ awk -f ping.awk ping.log
This produces the following result:
--- 192.168.2.1 ping statistics ---
34706 packets transmitted, 34702 received, 0.01% packet loss, time 9 hours 3129 seconds.
rtt min/avg/max=0.307/0.645/1024.000 ms
The contents of ping.awk are explained below. First, initialize the variables:
BEGIN {
    lineno = 0;
    rtt_sum = 0.0;
    pkt_count_recv = 0;
    is_init = "False";
}
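In awk, $n is the n-th field. Fields are separated by whitespace by default, and the separator can be changed with the -F option. $1 is the first field, [1582125725.624008]; $6 is icmp_seq=1; $8 is time=0.497.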
ts = $1;
icmp_seq = $6;
rtt = $8;
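For readers more at home in Python, the same field lookup can be sketched with str.split(). This is only an analogy to awk's default splitting, not part of ping.awk.

```python
line = "[1582125725.624008] 64 bytes from 192.168.2.1: icmp_seq=1 ttl=64 time=0.497 ms"

# str.split() with no argument splits on runs of whitespace, like awk's default.
fields = line.split()

# awk fields are 1-based and Python lists are 0-based: $n corresponds to fields[n - 1].
ts, icmp_seq, rtt = fields[0], fields[5], fields[7]
print(ts, icmp_seq, rtt)  # prints [1582125725.624008] icmp_seq=1 time=0.497
```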
These fields carry some extra characters: in [1582125725.624008] the square brackets are redundant, and in icmp_seq=1 the icmp_seq= prefix is redundant. Use gsub to strip them:
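gsub(/^\[/, "", ts);                # drop the leading "["
gsub(/\]$/, "", ts);                # drop the trailing "]" ($ must follow the bracket)
gsub(/[a-z_]+=/, "", icmp_seq);     # drop the "icmp_seq=" prefix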
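As mentioned above, a normal ICMP reply is marked by the "time=" string. sub returns the number of substitutions it made, so if it finds and removes "time=", the line is a normal ICMP reply.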
is_answer = sub(/time=/,"", rtt);
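The "strip the prefix and tell me whether anything was replaced" trick has a close Python counterpart in re.subn, which returns both the new string and the substitution count. A minimal sketch, with strip_time_prefix as a hypothetical helper name:

```python
import re


def strip_time_prefix(field):
    """Mimic awk's sub(/time=/, "", field): return (cleaned text, substitution count)."""
    return re.subn(r'time=', '', field, count=1)


if __name__ == "__main__":
    print(strip_time_prefix("time=0.497"))  # a reply field: ('0.497', 1)
    print(strip_time_prefix(""))            # $8 is empty on a "no answer yet" line: ('', 0)
```

A count of 1 plays the role of is_answer > 0 in the awk script.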
We tally these normal ICMP replies: average rtt = sum of rtt / number of ICMP replies.
if (is_answer > 0) {
    rtt += 0.0;    # force a numeric value before any assignment or comparison
    if (is_init == "False") {
        rtt_min = rtt;
        rtt_max = rtt;
        is_init = "True";
    }
    rtt_sum += rtt;
    pkt_count_recv++;
    if (rtt_min > rtt) rtt_min = rtt;
    if (rtt_max < rtt) rtt_max = rtt;
}
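The same running min/max/sum bookkeeping can be sketched in Python. The class name RunningRtt is hypothetical. The point it illustrates is converting the text to a number before the first comparison, since comparing "12.3" and "5.12" as strings would order them wrongly.

```python
class RunningRtt:
    """Track min, max and mean rtt without storing every sample."""

    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.minimum = None
        self.maximum = None

    def add(self, rtt):
        rtt = float(rtt)  # convert first so comparisons are numeric, not textual
        if self.count == 0:
            self.minimum = self.maximum = rtt
        else:
            self.minimum = min(self.minimum, rtt)
            self.maximum = max(self.maximum, rtt)
        self.count += 1
        self.total += rtt

    @property
    def average(self):
        return self.total / self.count if self.count else None


if __name__ == "__main__":
    stats = RunningRtt()
    for value in ("12.3", "5.12", "1024"):  # made-up rtt fields
        stats.add(value)
    print(stats.minimum, stats.maximum, stats.average)
```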
The END block processes the statistics we just collected:
END {
    # the_icmp_seq[], the_ts[] and dest_host must already be set by the rules that run before END.
    pkt_count_trans = the_icmp_seq["end"] - the_icmp_seq["begin"] + 1;
    loss_ratio = (pkt_count_trans - pkt_count_recv) / pkt_count_trans * 100;
    duration = (the_ts["end"] - the_ts["begin"]);
    time_seconds = duration % 3600;
    time_hours = (duration - time_seconds) / 3600;
    printf("--- %s ping statistics ---\n", dest_host);
    printf("%d packets transmitted, %d received, %.2f%% packet loss, time %d hours %d seconds.\n",
           pkt_count_trans, pkt_count_recv, loss_ratio, time_hours, time_seconds);
    printf("rtt min/avg/max=%.3f/%.3f/%.3f ms\n", rtt_min, (rtt_sum / pkt_count_recv), rtt_max);
}
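As a cross-check of the arithmetic in the END block, here is a minimal Python sketch that splits a duration into hours and leftover seconds with divmod and formats the packet line. The function name format_summary is hypothetical, and truncating the duration to whole seconds with int() is my assumption.

```python
def format_summary(transmitted, received, duration_s):
    """Format the packet line the way the scripts in this post print it."""
    loss_pct = (transmitted - received) / transmitted * 100
    # divmod returns (whole hours, remaining seconds) in one step.
    hours, seconds = divmod(int(duration_s), 3600)
    return ("%d packets transmitted, %d received, %.2f%% packet loss, "
            "time %d hours %d seconds." % (transmitted, received, loss_pct, hours, seconds))


if __name__ == "__main__":
    # 35529 s is the duration implied by "9 hours 3129 seconds" in the result above.
    print(format_summary(34706, 34702, 35529))
```

With those inputs the function reproduces the packet line of the result shown earlier.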