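The other day Snow and Xfan were looking into this problem, and I took a look along with them. I collected a few commands that should save a fair amount of time the next time a similar problem comes up.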
1. I want to check which system calls from mongo cost the most time
[jianxu1@phx7b02c-1d65 ~]$ sudo strace -f -c -p 16931    <-- make sure to add -f so that child processes and threads (LWPs) created by fork/vfork/clone are traced as well
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 79.27   90.786037        6783     13385           recvfrom
  8.24    9.439600        8684      1087           nanosleep
  7.87    9.014995         240     37634      6914 futex
  3.00    3.440658        2870      1199           select
  1.47    1.681699      210212         8           restart_syscall
  0.09    0.107694          20      5282           sendto
  0.03    0.034485         454        76           mmap
  0.01    0.009579         195        49           write
  0.00    0.004785          10       492       164 stat
  0.00    0.001218          25        49           fdatasync
  0.00    0.000556          37        15           read
  0.00    0.000342           7        49           lseek
  0.00    0.000153           8        19           close
  0.00    0.000000           0        19           open
  0.00    0.000000           0        15           fstat
  0.00    0.000000           0        15           munmap
  0.00    0.000000           0         8           getdents
------ ----------- ----------- --------- --------- ----------------
100.00  114.521801                 59401      7078 total
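When the summary is long, it can help to rank it with a script instead of reading it by eye. The following is only a minimal sketch in Python: parse_summary is a hypothetical helper written for this note, and SAMPLE is a shortened copy of the table above. It assumes the errors column is simply left blank for syscalls that never failed, as in the output shown.

```python
# Sketch: rank syscalls from an `strace -c` summary by time spent.
# SAMPLE is a shortened copy of the table above; parse_summary is a hypothetical helper.
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 79.27   90.786037        6783     13385           recvfrom
  8.24    9.439600        8684      1087           nanosleep
  7.87    9.014995         240     37634      6914 futex
  3.00    3.440658        2870      1199           select
------ ----------- ----------- --------- --------- ----------------
100.00  114.521801                 59401      7078 total
"""

def parse_summary(text):
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a number; skip the header, the separators and the total line.
        if not fields or not fields[0][0].isdigit() or fields[-1] == "total":
            continue
        # The errors column is blank when a syscall never failed, so a row has 5 or 6 fields.
        errors = int(fields[4]) if len(fields) == 6 else 0
        rows.append({
            "syscall": fields[-1],
            "seconds": float(fields[1]),
            "calls": int(fields[3]),
            "errors": errors,
        })
    # Largest time first.
    return sorted(rows, key=lambda r: r["seconds"], reverse=True)

rows = parse_summary(SAMPLE)
for r in rows:
    print(f'{r["syscall"]:<12}{r["seconds"]:>12.3f}s{r["calls"]:>8} calls{r["errors"]:>6} errors')
```

In this sample recvfrom comes out on top, which matches the 79.27% in the first column of the table.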
2. How to find my network interface speed
[jianxu1@phx7b02c-1d65 ~]$ sudo ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: 1000Mb/s    <-- the NIC is 1G. A reminder: Duplex is Full, so 1G is the speed in one direction only; the combined speed of both directions is 2G
Duplex: Full
Port: Twisted Pair
PHYAD: 1
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: pumbg
Wake-on: g
Current message level: 0x00000003 (3)
Link detected: yes
3. How to check if my network is saturated
[jianxu1@phx7b02c-1d65 ~]$ sar -n DEV | head -10
Linux 2.6.32-220.el6.x86_64 (phx7b02c-1d65.stratus.phx.ebay.com) 11/08/2012 _x86_64_ (24 CPU)
12:00:01AM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s
12:10:01AM lo 7.52 7.52 18.32 18.32 0.00 0.00 0.00
12:10:01AM eth0 33121.49 77322.77 27613.92 110934.60 0.00 0.00 0.02
12:10:01AM eth1 0.00 0.00 0.00 0.00 0.00 0.00 0.00
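My understanding at the time was: receive is about 27 MB/s, send is about 110 MB/s, and our NIC speed is 128 MB/s, so this is a sign that the network is saturated.
Actually, my explanation above was wrong. For a 1G NIC with duplex=Full, the theoretical peak for inbound and outbound traffic combined is 2G, not 1G. The theoretical value is not the same as what you get in practice, though.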
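Since the link is full-duplex, each direction is better judged against its own 1 Gb/s limit rather than against a shared total. Here is a rough sketch of that arithmetic in Python, using the eth0 figures from the sar sample above. The utilization function is a hypothetical helper, and the sketch assumes that sar's kB means 1024 bytes and that the link carries 1000 Mb/s in each direction, as ethtool reported.

```python
# Sketch: per-direction utilization of a full-duplex link from the sar figures above.
# Assumptions: sar's kB is 1024 bytes; the link is 1000 Mb/s each way (from ethtool).
LINK_MBIT_PER_S = 1000  # "Speed: 1000Mb/s", counted separately for rx and tx

def utilization(kb_per_s, link_mbit_per_s=LINK_MBIT_PER_S):
    """Return the fraction of one direction's capacity that kb_per_s represents."""
    bits_per_s = kb_per_s * 1024 * 8
    return bits_per_s / (link_mbit_per_s * 1_000_000)

rx_kb, tx_kb = 27613.92, 110934.60  # eth0 rxkB/s and txkB/s from the sample above
print(f"rx: {utilization(rx_kb):.1%} of inbound capacity")
print(f"tx: {utilization(tx_kb):.1%} of outbound capacity")
```

Under these assumptions the receive side sits at roughly 23% and the transmit side at roughly 91%, so the outbound direction is the one to watch, without adding the two together.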
4. How to check what process is eating my network resources.
When we use mongostat:
[jianxu1@xxxxxxx ~]$ mongostat
connected to: 127.0.0.1
insert query update delete getmore command flushes mapped vsize res faults locked% idx miss % qr|qw ar|aw netIn netOut conn set repl time
0 500 100 0 142 108 0 32.2g 77.2g 26g 0 6.2 0 0|0 1|0 12m 62m 400 aaaaaaa M 17:47:18
0 976 181 0 170 107 0 32.2g 77.2g 26g 0 26.2 0 0|0 0|0 16m 93m 400 aaaaaaa M 17:47:19
0 1077 219 0 203 133 0 32.2g 77.2g 26g 0 11.8 0 0|0 2|0 17m 100m 400 aaaaaaa M 17:47:20
The unit of netIn/netOut is bits, so the total net in/out is around 120 Mbit/s, much less than 1 Gbit/s, meaning mongostat does not cover all of the network traffic related to mongo.
Note: my conclusion above that the unit is bits was also wrong; this is explained later.
Use iftop! http://www.ex-parrot.com/~pdw/iftop/
You can download it from http://pkgs.repoforge.org/iftop/. I downloaded http://pkgs.repoforge.org/iftop/iftop-0.17-1.el6.rf.x86_64.rpm because our Red Hat server is 6.2.
[jianxu1@xxxxxx ~]$ sudo iftop -i eth0
Press 1 to sort the list and press p to display port information; press p again to go back to displaying only machine information.
Here, take yyyyyyy:43892 on the other side as an example. Log into that machine and use:
[jianxu1@yyyyyyy ~]$ sudo netstat -tup | grep 43892
tcp 0 0 yyyyyyy:43892 xxxxxxx:27017 ESTABLISHED 21811/mongod // now we know that most of the network bandwidth is consumed by sync traffic among the mongo nodes within the cluster.
I’m wondering whether we should split the NICs and use a dedicated NIC just for sync.
After I sent out the summary above, a senior DBA named John corrected my mistakes:
>>NetIn/Out’s unit is bits, so total net in/out is around 120 M bits/second,
>> much less than 1G bits/second, meaning mongostat does not cover all network related with mongo.
I don't think that's correct. I think mongostat gets its network data from db.serverStatus().network, which names the fields "bytesIn" and "bytesOut". Anyway, the numbers make a lot more sense if you assume bytes. 100M is about 80% of the theoretical max for a GigE interface, and is likely to be the practical limit in this case. Note that the interface is full-duplex so you should not add in and out traffic; the limit is 1 Gb/sec in and 1 Gb/sec out.
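The figure in the reply above can be checked with a few lines of Python. This is only a sketch of the arithmetic: link_share is a hypothetical helper, and "100m" is taken to mean 100,000,000 units per second, which is an assumption about how mongostat scales its output.

```python
# Sketch: compare the two readings of mongostat's "100m" against a 1 Gb/s link.
LINK_BITS_PER_S = 1_000_000_000  # GigE, per direction

def link_share(value, unit):
    """Fraction of the link used if `value` per second is in 'bits' or 'bytes'."""
    bits = value * 8 if unit == "bytes" else value
    return bits / LINK_BITS_PER_S

net_out = 100_000_000  # the "100m" netOut sample (assumed to mean 100 * 1000 * 1000)
print(f"read as bits:  {link_share(net_out, 'bits'):.0%}")
print(f"read as bytes: {link_share(net_out, 'bytes'):.0%}")
```

Read as bits, the sample would be about 10% of the link; read as bytes, it is about 80%, which is the number John quotes.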
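I looked into it myself afterwards, especially the question of the unit of netIn/netOut. John was right, and he later filed a bug with 10gen asking them to correct the official documentation:
For the unit of netIn and netOut: I said it’s bits because I was referring to http://docs.mongodb.org/manual/reference/mongostat/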
netIn
The amount of network traffic, in bits, received by the MongoDB. This includes traffic from mongostat itself.
But after digging into the source code, it seems the official documentation is wrong! So you are right again :)
Here’s why I say the official documentation is wrong:
When I check https://github.com/mongodb/mongo/blob/master/src/mongo/tools/stat.cpp --> the help text in the source code also says it’s bits.
out << " netIn \t- network traffic in - bits\n";
out << " netOut \t- network traffic out - bits\n";
Then I dug further into the source code:
You are right, mongostat actually relies on the output of serverStatus(). What mongostat does is diff the “network” output of serverStatus() between two consecutive samples.
Output of serverStatus:
"network" : {
"bytesIn" : 62159758,
"bytesOut" : 216745737,
"numRequests" : 591863
},
https://github.com/mongodb/mongo/blob/master/src/mongo/tools/stat_util.cpp
if ( a["network"].isABSONObj() && b["network"].isABSONObj() ) {
BSONObj ax = a["network"].embeddedObject();
BSONObj bx = b["network"].embeddedObject();
_appendNet( result , "netIn" , diff( "bytesIn" , ax , bx ) ); //jianxu1: ax and bx represent two consecutive samples from serverStatus()
_appendNet( result , "netOut", diff( "bytesOut", ax , bx ) );
}
double StatUtil::diff( const string& name , const BSONObj& a , const BSONObj& b ) {
BSONElement x = a.getFieldDotted( name.c_str() );
BSONElement y = b.getFieldDotted( name.c_str() );
if ( ! x.isNumber() || ! y.isNumber() )
return -1;
return ( y.number() - x.number() ) / _seconds; //jianxu1: the result of diff is still in bytes (per second) at this point
}
void StatUtil::_appendNet( BSONObjBuilder& result , const string& name , double diff ) { //jianxu1: this only formats the output for display; it does not convert bytes to bits
// I think 1000 is correct for megabit, but I've seen conflicting things (ERH 11/2010)
const double div = 1000;
string unit = "b";
if ( diff >= div ) {
unit = "k";
diff /= div;
}
if ( diff >= div ) {
unit = "m";
diff /= div;
}
if ( diff >= div ) {
unit = "g";
diff /= div;
}
string out = str::stream() << (int)diff << unit;
_append( result , name , 6 , out );
}
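To see the same logic without the BSON types, the two functions above can be rendered roughly as follows in Python. This is a sketch for illustration, not the mongostat implementation: format_net is a hypothetical name standing in for _appendNet, and the two sample dicts are made-up values shaped like the "network" section of serverStatus(), taken one second apart.

```python
# Sketch: a Python rendering of StatUtil::diff and StatUtil::_appendNet.
# The sample dicts are made-up values shaped like serverStatus().network.

def diff(name, a, b, seconds=1.0):
    """Per-second change of field `name` between samples a and b (still bytes)."""
    x, y = a.get(name), b.get(name)
    if not isinstance(x, (int, float)) or not isinstance(y, (int, float)):
        return -1
    return (y - x) / seconds

def format_net(value):
    """Scale by 1000 and append b/k/m/g; no bytes-to-bits conversion happens."""
    unit = "b"
    for next_unit in ("k", "m", "g"):
        if value >= 1000:
            unit = next_unit
            value /= 1000
    return f"{int(value)}{unit}"

# Two hypothetical samples, one second apart.
first = {"bytesIn": 62159758, "bytesOut": 216745737}
second = {"bytesIn": 62159758 + 17_000_000, "bytesOut": 216745737 + 100_000_000}

print("netIn ", format_net(diff("bytesIn", first, second)))
print("netOut", format_net(diff("bytesOut", first, second)))
```

With these samples the output columns read 17m and 100m. Nothing between the byte counters and the printed string multiplies by 8, so the trailing "m" stands for megabytes per second.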