Notes on unified traffic control with IMQ + HTB + iptables

 
  IMQ, short for Intermediate Queueing device, is a virtual network interface. Unlike a physical NIC, it lets you shape traffic globally instead of rate-limiting interface by interface, which is especially convenient when you have uplinks to several ISPs. Combined with iptables, it makes both upload and download limiting straightforward.

IMQ (Intermediate Queueing Device)

The intermediate queueing device is not a qdisc, but its use is tightly bound to qdiscs. In Linux, a qdisc is attached to a network interface, and everything queued on that interface goes through that qdisc. Two limitations follow from this design:

1. Only egress shaping is really possible (an ingress qdisc does exist, but the possibilities for building classful qdiscs on top of it are very limited).
2. A qdisc only sees the traffic of one interface; no global limit can be set.

IMQ exists to solve these two limitations. In short, you can put anything you choose into a qdisc: packets carrying a particular mark are intercepted at netfilter's NF_IP_PRE_ROUTING and NF_IP_POST_ROUTING hooks and passed through a qdisc attached to an IMQ device. Marking the packets is done with an iptables target.

This lets you mark packets as soon as they enter an interface and shape ingress traffic, or treat the physical interfaces as classes and set global limits. You can do many other things as well, such as putting your HTTP traffic into one qdisc, new connection requests into another, and so on.
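As a small illustration of that last point, here is a sketch of singling out HTTP on the IMQ device (it assumes imq0 already carries an HTB root "1:" with a class 1:10 reserved for HTTP; the port match and class numbers are assumptions, not part of the original text):

# everything arriving on eth0 is diverted through imq0
iptables -t mangle -A PREROUTING -i eth0 -j IMQ --todev 0
# on imq0, put HTTP replies (source port 80) into their own class
tc filter add dev imq0 parent 1:0 protocol ip prio 1 u32 \
   match ip sport 80 0xffff flowid 1:10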


1. Network environment
Destination site (Internet)
    IP: 218.x.x.x (referred to below as TARGET_IP)
Router
    eth1, WAN IP: 221.x.x.x (referred to as INET_IP)
    eth0, LAN IP: 192.168.0.1 (referred to as GW_IP)
LAN client (PC)
    IP: 192.168.0.2 (referred to as LAN_IP)

2. Packet flow analysis
To understand how and where traffic can be controlled, you first need to know what the kernel does to a packet on its way from the ingress NIC to the egress NIC. A diagram of the whole path is here:
http://www.docum.org/docum.org/kptd/
Below is a walk-through of the packet flow when the client downloads and uploads.

Download flow
========
    The PC sends a download request to the Internet
    The Internet host replies with the requested data
    The reply packets enter the Router through eth1 [src: TARGET_IP, dst: INET_IP]
    The Router rewrites the destination address (DNAT) [src: TARGET_IP, dst: LAN_IP]
    The packets are forwarded to the LAN interface eth0
    They leave the Router through eth0 and enter the LAN
    The PC receives the data

    Clearly, the PC's download rate can be controlled after the DNAT step.
    Summary: download control = limiting the rate at which data is sent from the outside towards the client
             (after DNAT, i.e. in iptables' POSTROUTING chain)

Upload flow
========
    The Internet host asks the PC for data
    The PC replies with the requested data
    The packets enter the Router through eth0 [src: LAN_IP, dst: TARGET_IP]
    The Router rewrites the source address (SNAT) [src: INET_IP, dst: TARGET_IP]
    The packets are forwarded to the WAN interface eth1
    They leave the Router through eth1 and enter the Internet
    The Internet host receives the data

    Clearly, the PC's upload rate can be controlled before the SNAT step.
    Summary: upload control = limiting the rate at which the client sends data towards the outside
             (before SNAT, i.e. in iptables' PREROUTING chain)
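The flows above assume an ordinary NAT setup on the Router. A minimal sketch of what such rules might look like (INET_IP and LAN_IP stand for the addresses from section 1; for replies to connections the PC opened itself, the reverse rewrite labelled DNAT in the download flow is performed automatically by connection tracking, so an explicit DNAT rule is only needed for unsolicited inbound connections):

    # rewrite the source of everything leaving through the WAN interface
    iptables -t nat -A POSTROUTING -o eth1 -s 192.168.0.0/24 -j SNAT --to-source $INET_IP
    # optional: forward unsolicited inbound connections (here: port 80) to the LAN client
    iptables -t nat -A PREROUTING -i eth1 -d $INET_IP -p tcp --dport 80 -j DNAT --to-destination $LAN_IP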

3. Making Linux support IMQ

    The stock Linux kernel and iptables do not support IMQ; both need to be patched.
    I used linux-2.6.18 and iptables-1.3.6; the patches can be downloaded from http://www.linuximq.net/
    or http://www.digriz.org.uk/jdg-qos-script/
    I won't go into the patching procedure here....
    Once the kernel supports IMQ, "ip link show" will list devices such as imq0
    (how many there are depends on your kernel configuration; the default is 2).
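    If IMQ was built as a module, it has to be loaded and the devices brought up before they will accept traffic. A sketch (numdevs is the module parameter used by the IMQ patch; adjust the count to your build):

    modprobe imq numdevs=2          # skip if IMQ is compiled into the kernel
    ip link set imq0 up
    ip link set imq1 up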

4. Rate limiting with iptables
    Assume imq0 is used for download limiting and imq1 for upload limiting. First set up the qdiscs, classes and filters on the IMQ devices, exactly as you would on a real NIC (a sketch of this follows below).
    Once the IMQ side is defined, only two rules in iptables' mangle table are needed:
        #### download limiting, egress via eth0
        iptables -t mangle -A POSTROUTING -o eth0 -j IMQ --todev 0
        #### upload limiting, ingress via eth0
        iptables -t mangle -A PREROUTING -i eth0 -j IMQ --todev 1
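    What "set up the qdiscs and filters on the IMQ devices first" might look like, as a sketch (the 2mbit/512kbit totals are assumptions standing in for your real link speeds; the per-host classes in the next section attach under class 1:1):

        ## imq0 carries the download tree
        tc qdisc add dev imq0 root handle 1: htb default 20
        tc class add dev imq0 parent 1:  classid 1:1  htb rate 2mbit burst 15k
        tc class add dev imq0 parent 1:1 classid 1:20 htb rate 2mbit            # default class for unmatched hosts
        ## imq1 carries the upload tree
        tc qdisc add dev imq1 root handle 1: htb default 20
        tc class add dev imq1 parent 1:  classid 1:1  htb rate 512kbit burst 15k
        tc class add dev imq1 parent 1:1 classid 1:20 htb rate 512kbit          # default class for unmatched hosts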

5. Per-host rate limiting
    Download limiting: match on the packet's destination IP
    Upload limiting:   match on the packet's source IP
    Note: because the upload control point sits before SNAT, the packets still carry the LAN source IP at that point, so they can be matched on the source address directly; there is no need to MARK them with iptables first.
    tc examples:
        ### limit downloads for 192.168.0.2 to 100 KB/s, with a ceiling of 120 KB/s
        tc class add dev imq0 parent 1:1 classid 1:10 htb \
           rate 100kbps ceil 120kbps burst 10kb prio 2
        tc qdisc add dev imq0 parent 1:10 handle 10 sfq perturb 10
        tc filter add dev imq0 protocol ip parent 1:0 prio 100 u32 \
           match ip dst 192.168.0.2 classid 1:10
        ### limit uploads for 192.168.0.2 to 40 KB/s, with a ceiling of 50 KB/s
        tc class add dev imq1 parent 1:1 classid 1:10 htb \
           rate 40kbps ceil 50kbps burst 10kb prio 2
        tc qdisc add dev imq1 parent 1:10 handle 10 sfq perturb 10
        tc filter add dev imq1 protocol ip parent 1:0 prio 100 u32 \
           match ip src 192.168.0.2 classid 1:10
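    To check that packets are really being diverted and classified, the usual counters can be consulted:

        tc -s class show dev imq0          # per-class byte/packet counters, download side
        tc -s class show dev imq1          # per-class byte/packet counters, upload side
        iptables -t mangle -L -v -n        # hit counters of the IMQ rules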



9.7.1. Configuration example
The first thing that comes to mind is ingress shaping, so that you can give yourself a guaranteed amount of bandwidth. Configuration is just like for any other interface:
tc qdisc add dev imq0 root handle 1: htb default 20
tc class add dev imq0 parent 1: classid 1:1 htb rate 2mbit burst 15k
tc class add dev imq0 parent 1:1 classid 1:10 htb rate 1mbit
tc class add dev imq0 parent 1:1 classid 1:20 htb rate 1mbit
tc qdisc add dev imq0 parent 1:10 handle 10: pfifo
tc qdisc add dev imq0 parent 1:20 handle 20: sfq
tc filter add dev imq0 parent 1:0 protocol ip prio 1 u32 match ip dst 10.0.0.230/32 flowid 1:10

In this example, u32 is used for classification; other classifiers should work as well. Next, the marked packets are sent to imq0 and queued there:
iptables -t mangle -A PREROUTING -i eth0 -j IMQ --todev 0
ip link set imq0 up
The IMQ target of iptables can only be used in the PREROUTING and POSTROUTING chains of the mangle table. Its syntax is:
IMQ [ --todev n ]
n: number of the imq device
Note: ip6tables provides this target as well.
Note that traffic is not enqueued if it only hits the IMQ target after the point where imq hooks into the stack; the exact point at which traffic enters the imq device depends on whether it is flowing in or out. These are the netfilter (i.e. iptables) hook priorities predefined in the kernel:
enum nf_ip_hook_priorities {
        NF_IP_PRI_FIRST = INT_MIN,
        NF_IP_PRI_CONNTRACK = -200,
        NF_IP_PRI_MANGLE = -150,
        NF_IP_PRI_NAT_DST = -100,
        NF_IP_PRI_FILTER = 0,
        NF_IP_PRI_NAT_SRC = 100,
        NF_IP_PRI_LAST = INT_MAX,
};
For incoming packets, imq registers itself with priority NF_IP_PRI_MANGLE + 1, which means packets enter the imq device right after the mangle PREROUTING chain has been traversed.
For outgoing packets, imq uses NF_IP_PRI_LAST, so no effort is wasted on packets that the filter table is going to drop anyway.
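Since the egress hook sits at NF_IP_PRI_LAST in POSTROUTING, treating several physical NICs as one logical pipe (the global shaping mentioned earlier) amounts to pointing them all at the same IMQ device. A minimal sketch, assuming imq1 is up and carries the shared HTB tree:

iptables -t mangle -A POSTROUTING -o eth0 -j IMQ --todev 1
iptables -t mangle -A POSTROUTING -o eth1 -j IMQ --todev 1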
The patches and more documentation can be found at the imq site.




Ingress qdisc

All qdiscs discussed so far are egress qdiscs. Each interface however can also have an ingress qdisc which is not used to send packets out to the network adaptor. Instead, it allows you to apply tc filters to packets coming in over the interface, regardless of whether they have a local destination or are to be forwarded.

As the tc filters contain a full Token Bucket Filter implementation, and are also able to match on the kernel flow estimator, there is a lot of functionality available. This effectively allows you to police incoming traffic, before it even enters the IP stack.

14.4.1. Parameters & usage

The ingress qdisc itself does not require any parameters. It differs from other qdiscs in that it does not occupy the root of a device. Attach it like this:

# delete original
tc qdisc del dev eth0 ingress
tc qdisc del dev eth0 root

# add new qdisc and filter
tc qdisc add dev eth0 ingress
tc filter add dev eth0 parent ffff: protocol ip prio 50  u32 match ip src 0.0.0.0/0 police rate 2048kbps burst 1m drop flowid :1
tc qdisc add dev eth0 root tbf rate 2048kbps latency 50ms burst 1m
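Whether the policer is actually dropping anything can then be checked with the usual statistics commands:

tc -s qdisc show dev eth0                  # ingress qdisc counters, including drops
tc -s filter show dev eth0 parent ffff:    # the policing filters attached to ingress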


I played a bit with the ingress qdisc after seeing Patrick and Stef
talking about it and came up with a few notes and a few questions.

: The ingress qdisc itself has no parameters.  The only thing you can do
: is use the policers.  I have a link with a patch to extend this:
: http://www.cyberus.ca/~hadi/patches/action/  Maybe this can help.
:
: I have some more info about ingress in my mail files, but I have to
: sort it out and put it somewhere on docum.org.  But I still haven't
: found the time to do so.

Regarding policers and the ingress qdisc.  I have never used them before
today, but have the following understanding.

About the ingress qdisc:

  - the ingress qdisc (known as "ffff:") can't have any child classes (hence the existence of IMQ)
  - the only thing you can do with the ingress qdisc is attach filters


About filtering on the ingress qdisc:

  - since there are no classes to which to direct the packets, the only reasonable option (reasonable, indeed!) is to drop the packets
  - with clever use of filtering, you can limit particular traffic signatures to particular uses of your bandwidth


Here's an example of using an ingress policer to limit inbound traffic
from a particular set of IPs on a per IP basis.  In this case, traffic
from each of these source IPs is limited to a T1's worth of bandwidth.
Note that this means that this host can receive up to 4608kbit (3 x 1536kbit)
worth of bandwidth from these three source IPs alone.

# -- start of script
#! /bin/ash
#
# -- simulate a much smaller amount of bandwidth than the 100MBit interface
#
RATE=1536kbit
DEV=eth0
SOURCES="10.168.53.2/32 10.168.73.10/32 10.168.28.20/32"

# -- attach our ingress qdisc
#
tc qdisc add dev $DEV ingress

# -- cap bandwidth from particular source IPs
#

for SOURCE in $SOURCES ; do

  tc filter add dev $DEV parent ffff: protocol ip   \
    u32 match ip src $SOURCE flowid :1              \
    police rate $RATE mtu 12k burst 10k drop

done

# -- end of script

Now, if you are using multiple public IPs on your masquerading/SNAT host,
you can use "u32 match ip dst $PER_IP" with a drop action to force a
particular rate on inbound traffic to that IP.
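For example, capping what one particular public IP may receive could be sketched like this (PER_IP and the 512kbit figure are placeholders, not values from the original post):

PER_IP=203.0.113.80        # hypothetical public IP on the SNAT box
tc filter add dev eth0 parent ffff: protocol ip prio 51 \
   u32 match ip dst $PER_IP \
   police rate 512kbit burst 64k drop flowid :1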

My entirely unquantified impression is that latency suffers as a result,
but traffic is indeed bandwidth limited.

Just a few notes of dissection:

  tc filter add dev $DEV   # -- the usual beginnings
    parent ffff:           # -- the ingress qdisc itself
    protocol ip            # -- more preamble  | make sure to visit
    u32 match ip           # -- u32 classifier | http://lartc.org/howto/
    src $SOURCE            # -- could also be "dst $SOME_LOCAL_IP"
    flowid :1              # -- ??? (but it doesn't work without this)
    police rate $RATE      # -- put a policer here
    mtu 12k burst 10k      # -- ???
    drop                   # -- drop packets exceeding our police params

Maybe a guru or two out there (Stef?, Bert?, Jamal?, Werner?) can explain
why mtu needs to be larger than 1k (didn't work for me anyway) and also
how these other parameters should be used.




