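Overview
QoS in Linux has an ingress part and an egress part. The ingress part is mainly used to rate-limit incoming traffic (policing); the egress part is mainly used for queue scheduling (queuing and scheduling).
Most queuing disciplines (qdiscs) work in the output direction. The input direction has only one qdisc, the ingress qdisc. The ingress qdisc itself is very limited, but it can be used to redirect incoming packets. If the ingress qdisc redirects incoming packets to the virtual device ifb, whose output direction can carry many kinds of qdisc, queue scheduling can be applied to incoming traffic as well.
Q: Why is most traffic control done in the output direction?
A: It is easiest to create traffic control rules for traffic flowing out of an interface, since we can control when the system sends data, but controlling when we receive data requires an additional intermediate queue to be created to buffer incoming data.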
[Figure: schematic diagram]
Ingress qdisc
The ingress qdisc itself does not require any parameters. It differs from other qdiscs in that it does not occupy the root of a device. Attach it like this:
# tc qdisc add dev eth0 ingress
This allows you to have other, sending qdiscs on your device besides the ingress qdisc.
About the ingress qdisc
The ingress qdisc (known as ffff:) can't have any child classes (hence the existence of IMQ).
The only thing you can do with the ingress qdisc is attach filters.
About filtering on the ingress qdisc
Since there are no classes to which to direct the packets, the only reasonable option is to drop the packets.
With clever use of filtering, you can limit particular traffic signatures to particular uses of your bandwidth.
Rate-limiting ingress traffic
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 parent ffff: protocol ip prio 10 u32 match ip src 0.0.0.0/0 police rate 2048kbps burst 1m drop flowid :1
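One thing worth checking in the policing command is the unit suffix. As documented in tc(8), kbps means kilobytes per second while kbit means kilobits per second, so rate 2048kbps is about 16 Mbit/s, not 2 Mbit/s. The sketch below makes the difference visible. It assumes SI multipliers (k = 1000) and only handles lowercase suffixes; tc_rate_to_bits is a hypothetical helper for illustration, not part of iproute2.

```shell
#!/bin/sh
# Convert a tc-style rate string to bits per second.
# Assumptions: SI multipliers (k = 1000, m = 1000000);
# "...bps" suffixes are BYTES per second, "...bit" suffixes are bits per second.
tc_rate_to_bits() {
    rate=$1
    num=${rate%%[!0-9]*}     # leading digits
    unit=${rate#"$num"}      # whatever follows the digits
    [ -n "$num" ] || { echo "no number in: $rate" >&2; return 1; }
    case $unit in
        ''|bit) mult=1 ;;
        kbit)   mult=1000 ;;
        mbit)   mult=1000000 ;;
        bps)    mult=8 ;;
        kbps)   mult=8000 ;;
        mbps)   mult=8000000 ;;
        *) echo "unknown unit: $unit" >&2; return 1 ;;
    esac
    echo $((num * mult))
}

tc_rate_to_bits 2048kbps   # prints 16384000
tc_rate_to_bits 2048kbit   # prints 2048000
```

If the intent is a limit of roughly 2 Mbit/s, the kbit suffix is probably the one to use.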
ifb
IFB: Intermediate Functional Block device.
Q: How can we use a qdisc (e.g., netem) on incoming traffic?
A: You need to use IFB. This network device allows attaching queueing disciplines to incoming packets.
To use an IFB, you must have IFB support in your kernel (configuration option CONFIG_IFB). Assuming that you have a modular kernel, the name of the IFB module is ifb and it may be loaded using the command modprobe ifb (if you have modprobe installed) or insmod /path/to/module/ifb.
ip link set ifb0 up
ip link set ifb1 up
By default, two IFB devices (ifb0 and ifb1) are created.
IFB allows for queueing incoming traffic for shaping instead of dropping.
The ifb module must be loaded manually.
# modprobe ifb
Bring up the virtual device ifb0.
# ip link set dev ifb0 up
Use ifb0 to redirect traffic in the input direction.
# tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb0
Use ifb0 to redirect traffic in the output direction.
# tc filter add dev eth0 parent 1: protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb0
Example
Use the ingress qdisc and ifb to do queue scheduling in the ingress direction.
# modprobe ifb
# ip link set dev ifb0 up txqueuelen 1000
# tc qdisc add dev eth1 ingress
# tc filter add dev eth1 parent ffff: protocol ip u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb0
# tc qdisc add dev ifb0 root netem delay 50ms loss 1%
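The same steps can be wrapped in a small script with a matching teardown. This is a minimal sketch, not a definitive setup: the helper names (run, ingress_shape_up, ingress_shape_down) and the RUN variable are hypothetical, and the script only prints the commands unless RUN=1 is set, in which case it needs root. The interface names and netem parameters are taken from the example.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
# Set RUN=1 (as root) to apply the commands for real.
run() {
    if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

# Redirect everything arriving on $1 to the ifb device $2 and apply netem there.
ingress_shape_up() {
    dev=$1; ifb=$2
    run modprobe ifb
    run ip link set dev "$ifb" up
    run tc qdisc add dev "$dev" ingress
    run tc filter add dev "$dev" parent ffff: protocol ip u32 match u32 0 0 \
        flowid 1:1 action mirred egress redirect dev "$ifb"
    run tc qdisc add dev "$ifb" root netem delay 50ms loss 1%
}

# Undo the setup. Deleting the ingress qdisc also removes the filter attached to it.
ingress_shape_down() {
    dev=$1; ifb=$2
    run tc qdisc del dev "$ifb" root
    run tc qdisc del dev "$dev" ingress
    run ip link set dev "$ifb" down
}

ingress_shape_up eth1 ifb0
ingress_shape_down eth1 ifb0
```

Printing the commands first makes it easier to review them before touching a live interface.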
Author
zhangskd@ csdn
Reference
[1] http://lartc.org/howto/index.html
[2] http://www.linuxfoundation.org/collaborate/workgroups/networking/ifb
The Intermediate Functional Block device is the successor to the IMQ iptables module that was never integrated. Advantages over current IMQ: cleaner, in particular in SMP, with a _lot_ less code. Old Dummy device functionality is preserved while the new one only kicks in if you use actions.
To use an IFB, you must have IFB support in your kernel (configuration option CONFIG_IFB). Assuming that you have a modular kernel, the name of the IFB module is 'ifb' and it may be loaded using the command modprobe ifb (if you have modprobe installed) or insmod /path/to/module/ifb.
ip link set ifb0 up
ip link set ifb1 up
By default, two IFB devices (ifb0 and ifb1) are created.
As far as I know, the reasons listed below are why people use IMQ. It would be nice to know of anything else that I missed.
- qdiscs/policies that are per device as opposed to system wide. IMQ allows for sharing.
- Allows for queueing incoming traffic for shaping instead of dropping. I am not aware of any study that shows policing is worse than shaping in achieving the end goal of rate control. I would be interested if anyone is experimenting. (Re shaping vs policing: the desire for shaping comes more from the need to have complex rules like with htb.)
- Very interesting use: if you are serving p2p you may want to give preference to your own locally originated traffic (when responses come back) vs someone using your system to do BitTorrent. So QoSing based on state comes in as the solution. What people did to achieve this was stick the IMQ somewhere at the prelocal hook. I think this is a pretty neat feature to have in Linux in general (i.e. not just for IMQ).
But I won't go back to putting netfilter hooks in the device to satisfy this. I also don't think it's worth hacking ifb some more to be aware of, say, L3 info and play ip rule tricks to achieve this.
Instead the plan is to have a conntrack related action. This action will selectively either query or create conntrack state on incoming packets. Packets could then be redirected to ifb based on what happens (e.g. on incoming packets); if we find they are of known state we could send them to a different queue than one which didn't have existing state. This all however is dependent on whatever rules the admin enters.
At the moment this function does not exist yet. I have decided, instead of sitting on the patch, to release it, and then if there's pressure I will add this feature.
What you can do with ifb currently with actions
Let's say you are policing packets from alias 192.168.200.200/32 and you don't want those to exceed 100 kbit/s going out.
tc filter add dev eth0 parent 1: protocol ip prio 10 u32 match ip src 192.168.200.200/32 flowid 1:2 action police rate 100kbit burst 90k drop
If you run tcpdump on eth0 you will see all packets going out with src 192.168.200.200/32, dropped or not.
Extend the rule a little to see only the ones that made it out:
tc filter add dev eth0 parent 1: protocol ip prio 10 u32 match ip src 192.168.200.200/32 flowid 1:2 action police rate 10kbit burst 90k drop action mirred egress mirror dev ifb0
Now fire tcpdump on ifb0 to see only those packets:
tcpdump -n -i ifb0 -x -e -t
Essentially a good debugging/logging interface.
If you replace mirror with redirect, those packets will be blackholed and will never make it out. This redirect behavior changes with the new patch (but not the mirror).
IFB Example
Many readers have found this page to be unhelpful in terms of expressing how IFB is useful and how it should be used.
These examples are taken from a posting by Jamal at http://www.mail-archive.com/[email protected]/msg04900.html
What this script will demonstrate is the following sequence:
1) any packet going out on eth0 to 10.0.0.229 is classified as class 1:10 and redirected to ifb0.
2) a) on reaching ifb0 the packet is classified as class 1:2
   b) subjected to token bucket (tbf) shaping at a rate of 20kbit/s
   c) sent back to eth0
3) on coming back to eth0, the classification 1:10 is still valid and this packet is put through an HTB class which limits the rate to 256kbit/s
export TC="/sbin/tc"
$TC qdisc del dev ifb0 root handle 1: prio
$TC qdisc add dev ifb0 root handle 1: prio
$TC qdisc add dev ifb0 parent 1:1 handle 10: sfq
$TC qdisc add dev ifb0 parent 1:2 handle 20: tbf \
    rate 20kbit buffer 1600 limit 3000
$TC qdisc add dev ifb0 parent 1:3 handle 30: sfq
$TC filter add dev ifb0 parent 1: protocol ip prio 1 u32 \
    match ip dst 11.0.0.0/24 flowid 1:1
$TC filter add dev ifb0 parent 1: protocol ip prio 2 u32 \
    match ip dst 10.0.0.0/24 flowid 1:2
ifconfig ifb0 up
$TC qdisc del dev eth0 root handle 1: htb default 2
$TC qdisc add dev eth0 root handle 1: htb default 2
$TC class add dev eth0 parent 1: classid 1:1 htb rate 800Kbit
$TC class add dev eth0 parent 1: classid 1:2 htb rate 800Kbit
$TC class add dev eth0 parent 1:1 classid 1:10 htb rate 256kbit ceil 384kbit
$TC class add dev eth0 parent 1:1 classid 1:20 htb rate 512kbit ceil 648kbit
$TC filter add dev eth0 parent 1: protocol ip prio 1 u32 \
    match ip dst 10.0.0.229/32 flowid 1:10 \
    action mirred egress redirect dev ifb0
A little test (be careful if you are sshed in and are classifying on that IP, counters may be not easy to follow)
-----
A ping ...
mambo:~# ping -c2 10.0.0.229
// first at ifb0
// observe that second filter twice being successful
mambo:~# $TC -s filter show dev ifb0 parent 1:
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:1 (rule hit 2 success 0)
  match 0b000000/ffffff00 at 16 (success 0 )
filter protocol ip pref 2 u32
filter protocol ip pref 2 u32 fh 801: ht divisor 1
filter protocol ip pref 2 u32 fh 801::800 order 2048 key ht 801 bkt 0 flowid 1:2 (rule hit 2 success 2)
  match 0a000000/ffffff00 at 16 (success 2 )
// next the qdisc numbers ..
// Observe that 1:2 has 2 packets
mambo:~# $TC -s qdisc show dev ifb0
qdisc prio 1: bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 196 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 10: parent 1:1 limit 128p quantum 1514b
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc tbf 20: parent 1:2 rate 20000bit burst 1599b lat 546.9ms
 Sent 196 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 30: parent 1:3 limit 128p quantum 1514b
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
// Next look at eth0
// observe class 1:10 which is where the pings went through after
// they came back from the ifb0 device.
mambo:~# $TC -s class show dev eth0
class htb 1:1 root rate 800000bit ceil 800000bit burst 1699b cburst 1699b
 Sent 196 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 lended: 0 borrowed: 0 giants: 0
 tokens: 16425 ctokens: 16425
class htb 1:10 parent 1:1 prio 0 rate 256000bit ceil 384000bit burst 1631b cburst 1647b
 Sent 196 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 lended: 2 borrowed: 0 giants: 0
 tokens: 49152 ctokens: 33110
class htb 1:2 root prio 0 rate 800000bit ceil 800000bit burst 1699b cburst 1699b
 Sent 47714 bytes 321 pkt (dropped 0, overlimits 0 requeues 0)
 rate 3920bit 3pps backlog 0b 0p requeues 0
 lended: 321 borrowed: 0 giants: 0
 tokens: 16262 ctokens: 16262
class htb 1:20 parent 1:1 prio 0 rate 512000bit ceil 648000bit burst 1663b cburst 1680b
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
 lended: 0 borrowed: 0 giants: 0
 tokens: 26624 ctokens: 21251
-----
mambo:~# $TC -s filter show dev eth0 parent 1:
filter protocol ip pref 1 u32
filter protocol ip pref 1 u32 fh 800: ht divisor 1
filter protocol ip pref 1 u32 fh 800::800 order 2048 key ht 800 bkt 0 flowid 1:10 (rule hit 235 success 4)
  match 0a0000e5/ffffffff at 16 (success 4 )
        action order 1: mirred (Egress Redirect to device ifb0) stolen
        index 2 ref 1 bind 1 installed 114 sec used 100 sec
        Action statistics:
        Sent 196 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
        rate 0bit 0pps backlog 0b 0p requeues 0
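When checking counters like the ones in these listings, it can help to reduce the tc -s output to one line per qdisc or class. The following is a rough sketch using awk; tc_pkt_counts is a hypothetical helper name, and the parsing assumes the "Sent N bytes M pkt" layout shown in the listings, which may differ between iproute2 versions.

```shell
#!/bin/sh
# Read `tc -s qdisc show` or `tc -s class show` output on stdin and print
# "<kind> <id> <packets>" for each qdisc or class.
tc_pkt_counts() {
    awk '
        # Header line, e.g. "qdisc tbf 20: parent 1:2 ..." or "class htb 1:10 ..."
        $1 == "qdisc" || $1 == "class" { kind = $2; id = $3 }
        # First "Sent N bytes M pkt" line after a header: field 4 is the packet count
        $1 == "Sent" && kind != "" { print kind, id, $4; kind = "" }
    '
}

# Sample input copied from the ifb0 qdisc listing; on a live system use:
#   tc -s qdisc show dev ifb0 | tc_pkt_counts
tc_pkt_counts <<'EOF'
qdisc prio 1: bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
 Sent 196 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc sfq 10: parent 1:1 limit 128p quantum 1514b
 Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
qdisc tbf 20: parent 1:2 rate 20000bit burst 1599b lat 546.9ms
 Sent 196 bytes 2 pkt (dropped 0, overlimits 0 requeues 0)
 rate 0bit 0pps backlog 0b 0p requeues 0
EOF
```

For the sample input this prints three lines (prio 1: 2, sfq 10: 0, tbf 20: 2), which matches the observation that only the tbf band attached to 1:2 carried the two ping packets.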
IFB requirements
In order to use ifb you need:
- Support for ifb in the kernel (2.6.20 works OK)
  Menu option: Device drivers -> Network device support -> Intermediate Functional Block support
  Module name: ifb
- tc from iproute2 with support for "actions" (2.6.20 - 20070313 works OK; the package from Debian etch is outdated). You can download it from here: http://developer.osdl.org/dev/iproute2/download/
Ingress qdisc
All qdiscs discussed so far are egress qdiscs. Each interface however can also have an ingress qdisc which is not used to send packets out to the network adaptor. Instead, it allows you to apply tc filters to packets coming in over the interface, regardless of whether they have a local destination or are to be forwarded.
As the tc filters contain a full Token Bucket Filter implementation, and are also able to match on the kernel flow estimator, there is a lot of functionality available. This effectively allows you to police incoming traffic, before it even enters the IP stack.
14.4.1. Parameters & usage
The ingress qdisc itself does not require any parameters. It differs from other qdiscs in that it does not occupy the root of a device. Attach it like this:
# delete original
tc qdisc del dev eth0 ingress
tc qdisc del dev eth0 root
# add new qdisc and filter
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 50 u32 match ip src 0.0.0.0/0 police rate 2048kbps burst 1m drop flowid :1
tc qdisc add dev eth0 root tbf rate 2048kbps latency 50ms burst 1m
I played a bit with the ingress qdisc after seeing Patrick and Stef talking about it and came up with a few notes and a few questions.
: The ingress qdisc itself has no parameters. The only thing you can do
: is using the policers. I have a link with a patch to extend this:
: http://www.cyberus.ca/~hadi/patches/action/ Maybe this can help.
:
: I have some more info about ingress in my mail files, but I have to
: sort it out and put it somewhere on docum.org. But I still didn't
: find the time to do so.
Regarding policers and the ingress qdisc. I have never used them before today, but have the following understanding.
About the ingress qdisc:
- ingress qdisc (known as "ffff:") can't have any children classes (hence the existence of IMQ)
- the only thing you can do with the ingress qdisc is attach filters
About filtering on the ingress qdisc:
- since there are no classes to which to direct the packets, the only reasonable option (reasonable, indeed!) is to drop the packets
- with clever use of filtering, you can limit particular traffic signatures to particular uses of your bandwidth
QoS Using ifb and ingress qdisc
Add some qdisc/class/filter to eth0/ifb0/ifb1
tc qdisc add dev eth0 ingress 2>/dev/null
# ingress filter
tc filter add dev eth0 parent ffff: protocol ip prio 10 u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb0
# egress filter
tc filter add dev eth0 parent 1: protocol ip prio 10 u32 match u32 0 0 flowid 1:1 action mirred egress redirect dev ifb1
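The two filters only work once the objects they refer to exist: the egress filter uses parent 1:, so eth0 needs a classful root qdisc with handle 1:, and ifb0 and ifb1 need to be up with their own qdiscs. One possible set of prerequisites is sketched below. The choice of prio on eth0, htb on the ifb devices, and the 2mbit and 1mbit rates are assumptions for illustration and do not come from the original; run and qos_prereqs are hypothetical helper names. The script prints the commands unless RUN=1 is set (which requires root).

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
run() { if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi; }

# Assumed example rates, not taken from the original text.
IN_RATE=${IN_RATE:-2mbit}
OUT_RATE=${OUT_RATE:-1mbit}

qos_prereqs() {
    run modprobe ifb
    for ifb in ifb0 ifb1; do
        run ip link set dev "$ifb" up
    done
    # Classful root qdisc on eth0 so that a filter with "parent 1:" has something to attach to.
    run tc qdisc add dev eth0 root handle 1: prio
    # ifb0 receives the redirected inbound traffic, ifb1 the redirected outbound traffic.
    run tc qdisc add dev ifb0 root handle 1: htb default 10
    run tc class add dev ifb0 parent 1: classid 1:10 htb rate "$IN_RATE"
    run tc qdisc add dev ifb1 root handle 1: htb default 10
    run tc class add dev ifb1 parent 1: classid 1:10 htb rate "$OUT_RATE"
}

qos_prereqs
```

These commands would be run before the ingress and egress filters are added, since tc rejects a filter whose parent qdisc does not exist.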