Performance of rdt3.0
- rdt3.0 is correct, but performance is very bad
[Worked example below: 1 Gbps link, 15 ms end-to-end delay, 8000-bit packet]
RTT = round-trip-time = end-to-end delay * 2
[RTT = 15ms * 2 = 30ms]
[D_trans = L/R = 8000bits / 10^9 bits/sec = 0.008 ms]
utilization
- fraction of time sender busy sending
- U_sender = (L/R) / (RTT + L/R)
[U_sender = 0.008 / (30 + 0.008) = 0.00027]
- if RTT=30 msec, 1KB pkt every 30 msec: 33kB/sec throughput over 1 Gbps link
How this throughput is computed: one 1 KB packet takes about 0.03 s, so about 33 such packets get through per second, which gives 33 kB/sec.
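The arithmetic above can be checked with a few lines of Python. This is only a sketch of the formula U_sender = (L/R) / (RTT + L/R); the function names are made up for illustration.

```python
def stop_and_wait_utilization(packet_bits, link_bps, rtt_s):
    """Fraction of time the sender is busy: (L/R) / (RTT + L/R)."""
    d_trans = packet_bits / link_bps  # transmission delay L/R, in seconds
    return d_trans / (rtt_s + d_trans)


def stop_and_wait_throughput_bytes(packet_bits, link_bps, rtt_s):
    """One packet gets through every RTT + L/R seconds."""
    d_trans = packet_bits / link_bps
    return packet_bits / 8 / (rtt_s + d_trans)


# Example from the notes: 8000-bit packet, 1 Gbps link, RTT = 30 ms
u = stop_and_wait_utilization(8000, 1e9, 0.030)
t = stop_and_wait_throughput_bytes(8000, 1e9, 0.030)
print(f"U_sender = {u:.5f}, throughput = {t / 1000:.1f} kB/sec")
```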
rdt3.0: stop-and-wait operation
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第1张图片](http://img.e-com-net.com/image/info8/fc83025c403a43e0af781f48fd998ca7.jpg)
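- Be clear about what RTT covers here: it runs from the moment the sender finishes transmitting the packet until the sender receives the ACK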
Pipelined protocols
- pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged packets (packets with no ACK as yet)
- range of sequence numbers must be increased
- buffering at sender and/or receiver
- two generic forms of pipelined protocols: go-Back-N, selective repeat
pipelined protocols’ utilization
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第2张图片](http://img.e-com-net.com/image/info8/17ab142bd1e64db6a01da6e785fd8d1e.jpg)
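- The definition of RTT is the same as for rdt3.0 stop-and-wait: it runs from the end of the first packet's transmission until the first ACK is received
- So the denominator of U_sender is still RTT + L/R, regardless of how many packets are in flight
- The numerator does depend on the number of packets: with N packets in flight it is N * L/R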
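As a rough check of the pipelined case, the sketch below multiplies the numerator by the number of in-flight packets. The function name is hypothetical, and capping the result at 1.0 is an added assumption (a sender cannot be busy more than 100% of the time).

```python
def pipelined_utilization(n_packets, packet_bits, link_bps, rtt_s):
    """U_sender = N * (L/R) / (RTT + L/R), capped at 1.0."""
    d_trans = packet_bits / link_bps  # L/R in seconds
    return min(1.0, n_packets * d_trans / (rtt_s + d_trans))


# Three packets in flight on the same 1 Gbps / 30 ms example link
print(f"{pipelined_utilization(3, 8000, 1e9, 0.030):.5f}")
```

With three packets the utilization is about three times the stop-and-wait value.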
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第3张图片](http://img.e-com-net.com/image/info8/62ba740937cb4cd5bde1f9fe52d7111b.jpg)
- “window” of up to N, consecutive packets with no ACK
- ACK(n): ACKs all packets up to and including sequence # n (“cumulative ACK”)
- may receive duplicate ACKs (see receiver)
- a timer covers the in-flight pkts (conceptually one per pkt; the textbook GBN sender runs a single timer for the oldest unACKed pkt)
- timeout(n): retransmit packet n and all higher seq # pkts in window
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第4张图片](http://img.e-com-net.com/image/info8/fff742dc640c4aeb808ba7d415965e68.jpg)
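- Start at the far left: N=4, so the window size is 4 and the window initially covers pkts 0 1 2 3
- The sender sends pkt0; the receiver receives pkt0, delivers it, and sends ack0 to confirm it arrived. When the sender receives ack0 it knows pkt0 got through, so it sends pkt4 and slides the window by one to 1 2 3 4 (pkt0 is done). This is the successful case
- pkt2 is lost on the way. The receiver gets pkt3 without having received pkt2, realizes something is missing, immediately discards pkt3, and re-sends ack1 so that the sender's window does not advance (sending ack2 or ack3 would move the window). pkt4 and pkt5 are handled the same way
- When the pkt2 timer expires, the sender concludes pkt2 was lost and resends starting from pkt2, i.e. from the first pkt in the window, which is now 2 3 4 5. It sends pkt2, pkt3, pkt4, pkt5, and the receiver accepts them as usual; if anything goes wrong again, the same process repeats
- "deliver" means handing the data up to the application layer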
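The receiver side of this walkthrough can be replayed with a small sketch. It is a simplified model, not a full protocol: the function name is made up, sequence numbers are assumed to be unbounded integers starting at 0, and only the receiver's decisions are modelled.

```python
def gbn_receiver(arrivals):
    """Replay a Go-Back-N receiver over a list of arriving seq numbers.

    Returns (acks, delivered). Out-of-order packets are discarded and the
    last in-order packet is ACKed again (cumulative ACK).
    """
    expected = 0  # next in-order sequence number
    acks, delivered = [], []
    for seq in arrivals:
        if seq == expected:
            delivered.append(seq)  # hand up to the application layer
            expected += 1
        # ACK the highest in-order packet so far (-1 if none yet)
        acks.append(expected - 1)
    return acks, delivered


# pkt2 is lost, so pkt3..5 arrive out of order; then pkt2..5 are resent
acks, delivered = gbn_receiver([0, 1, 3, 4, 5, 2, 3, 4, 5])
print(acks)       # duplicate ack1 repeated for pkt3, pkt4, pkt5
print(delivered)
```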
selective repeat
- receiver individually acknowledges all correctly received packets
- buffers packets, as needed, for eventual in-order delivery to upper layer
- sender only resends packets for which ACK not received (sender timer for each unACKed packet)
- sender window
- N consecutive seq #’s
- limits seq #s of sent, unACKed packets
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第5张图片](http://img.e-com-net.com/image/info8/f89302dea4e741e2a924820ad12dfad8.jpg)
- If a pkt is lost, the pkts after it are either lost too or get buffered at the receiver [see the later figure for details]
- data from above : if next available seq # in window, send pkt
- timeout(n) : resend pkt n, restart timer
- ACK(n) in [sendbase,sendbase+N-1]:
- mark pkt n as received
- if n smallest unACKed pkt, advance window base to next unACKed seq #
- [Note: I am not sure about the point above; to be filled in later]
- pkt n in [rcvbase, rcvbase+N-1]
- send ACK(n)
- out-of-order: buffer
- in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
- pkt n in [rcvbase-N,rcvbase-1]
- ACK(n)
- otherwise : ignore
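The receiver rules above can be sketched as follows. This is an illustrative model under simplifying assumptions: the function name is hypothetical, sequence numbers are unbounded integers (no wrap-around), and packets from the previous window are simply re-ACKed.

```python
def sr_receiver(arrivals, window=4):
    """Replay a selective-repeat receiver; returns (acks, delivered)."""
    rcv_base = 0
    buffered = set()
    acks, delivered = [], []
    for seq in arrivals:
        if rcv_base <= seq < rcv_base + window:
            acks.append(seq)      # individually ACK every correct packet
            buffered.add(seq)     # buffer (covers the out-of-order case)
            # deliver this packet and any buffered in-order successors
            while rcv_base in buffered:
                buffered.remove(rcv_base)
                delivered.append(rcv_base)
                rcv_base += 1
        elif rcv_base - window <= seq < rcv_base:
            acks.append(seq)      # already delivered: re-ACK only
        # otherwise: ignore
    return acks, delivered


# pkt2 is lost at first and arrives last, after pkt3..5 were buffered
acks, delivered = sr_receiver([0, 1, 3, 4, 5, 2])
print(acks, delivered)
```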
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第6张图片](http://img.e-com-net.com/image/info8/04dcc080ed4a498aae69caa9cb2da278.jpg)
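- The sender window moves the same way as before
- When pkt2 is lost, pkt3, pkt4, pkt5 are still sent as usual, but the receiver buffers them and sends ack3, ack4, ack5 just as in the normal case. When the sender receives ack3 without having received ack2, it records ack3; the same goes for ack4 and ack5
- When pkt2 times out, the sender resends pkt2. The receiver sees that this is the pkt2 it was missing, delivers pkt2 together with the buffered pkt3, pkt4, pkt5 in order, and sends ack2
- When the sender receives ack2, pkts 2 to 5 are all ACKed, so it moves the window to 6 7 8 9 (the same result as if pkts 2 to 5 had completed normally)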
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第7张图片](http://img.e-com-net.com/image/info8/1d96cbb3f6a946afbd1269bc54d208e3.jpg)
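- The point is simple: in cases (a) and (b) the receiver sees exactly the same thing. Case (a) is harmless, but in (b) the retransmitted pkt0 is accepted as new data, so the transfer is corrupted. To avoid this, the window size has to be small relative to the sequence number space (in the figure the sequence numbers are 0 1 2 3); the standard rule for selective repeat is window size ≤ half the size of the sequence number space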
TCP seq. numbers,ACKs
TCP header
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第8张图片](http://img.e-com-net.com/image/info8/f9b62945ecc84ef2975c2de96bc34d52.jpg)
- Focus mainly on the sequence number and the acknowledgement number
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第9张图片](http://img.e-com-net.com/image/info8/1f100d30414c46c5a874682b43b030d2.jpg)
- To be honest, I did not really understand this figure either; the definitions and the figure below are more helpful
sequence number
- byte stream “number” of first byte in segment’s data
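[There is an initial value, plus 1: e.g. if the initial value is 1000, then the seq num is 1001. In other words, the sender tells the receiver "the data I am sending starts at sequence number 1001"]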
acknowledgement number
- seq # of next byte expected from other side (cumulative ACK)
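[One side has received the other side's data and reports that it arrived, so the ACK is the seq of the last byte received + 1, i.e. the sequence number it expects from the other side next]
- Let's walk through the figure using the definitions above
- First segment, A to B: seq=42 means A tells B "the data I am sending starts at byte 42"; ACK=79 means A tells B "the data I want next starts at byte 79"
- Second segment, B to A: ACK=43 means B tells A "I received byte 42, send from 43 next time"; seq=79 means B tells A "the data I am sending starts at byte 79, as you asked"
- Third segment, A to B: seq=43 means A tells B "the data I am sending starts at byte 43"; ACK=80 means A tells B "I received byte 79, I want byte 80 next"
- Shortcut if you do not want to reason it through (valid here because each segment carries 1 byte): previous seq + 1 = next ACK, previous ACK = next seq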
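The exchange can be reproduced with a tiny helper. It is a sketch under the assumption that the reply carries new data starting at the byte the peer asked for; `reply_numbers` is a made-up name.

```python
def reply_numbers(seq, ack, payload_len):
    """Given a received segment, return (seq, ack) of the reply.

    The reply's seq is the byte the peer asked for (its ACK field); the
    reply's ack is the next byte expected from the peer.
    """
    return ack, seq + payload_len


first = (42, 79)                       # A -> B, carries 1 byte
second = reply_numbers(*first, 1)      # B -> A
third = reply_numbers(*second, 1)      # A -> B
print(first, second, third)
```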
TCP RTT,timeout
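How to set TCP timeout value?
- First, it must be longer than the RTT (obviously: otherwise the timer would fire before the ACK could possibly come back), but the RTT varies from one measurement to the next
- Both a too-short and a too-long timeout have drawbacks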
timeout value too short
- early timeout
- unnecessary retransmission (wastes time and resources)
timeout value too long
- slow reaction to segment loss
- So the key question becomes how to estimate the RTT
- First important concept: SampleRTT
- SampleRTT: measured time from segment transmission until ACK receipt (ignore retransmissions)
- This is essentially the RTT we talked about earlier for rdt3.0 stop-and-wait and for pipelined protocols
- SampleRTT will vary, want estimated RTT “smoother”
- average several recent measurements, not just current SampleRTT
- EstimatedRTT = (1- a)*EstimatedRTT + a*SampleRTT
- The EstimatedRTT on the left-hand side is the new value and the one on the right-hand side is the old value, i.e. the old estimate is used to compute the new one
- exponential weighted moving average
- influence of past sample decreases exponentially fast
- typical value: a = 0.125
- EstimatedRTT_new = 0.875 · EstimatedRTT + 0.125 · SampleRTT
Jacobson/Karels Algorithm
The core idea is timeout interval = EstimatedRTT plus “safety margin”
large variation in EstimatedRTT -> larger safety margin
So what is this safety margin?
DevRTT = (1-b)*DevRTT + b*| SampleRTT-EstimatedRTT |
- | SampleRTT-EstimatedRTT | is the absolute value of the difference
- (typically, b = 0.25)
- the safety margin above is 4 * DevRTT
TimeoutInterval = EstimatedRTT + 4 * DevRTT
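The three formulas can be combined into a small estimator. This is a sketch with some assumptions that the notes do not specify: the class name is made up, the first sample initializes EstimatedRTT, DevRTT starts at 0, and DevRTT is updated before EstimatedRTT so that it uses the old estimate.

```python
class RttEstimator:
    """EWMA-based RTT estimate and timeout, following the formulas above."""

    def __init__(self, first_sample, a=0.125, b=0.25):
        self.a, self.b = a, b
        self.estimated_rtt = first_sample
        self.dev_rtt = 0.0  # assumption: no deviation known yet

    def update(self, sample_rtt):
        # Assumption: deviation is measured against the old estimate
        self.dev_rtt = ((1 - self.b) * self.dev_rtt
                        + self.b * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.a) * self.estimated_rtt
                              + self.a * sample_rtt)

    @property
    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)  # milliseconds
est.update(120.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval)
```

A single larger sample moves the estimate only slightly (a = 0.125), while the deviation term widens the timeout.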
TCP fast retransmit
- time-out period often relatively long: long delay before resending lost packet
- detect lost segments via duplicate ACKs.
- sender often sends many segments back-to-back
- if segment is lost, there will likely be many duplicate ACKs.
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第10张图片](http://img.e-com-net.com/image/info8/176daa168a0c4997847f198b6f7151e7.jpg)
TCP flow control
- Put simply, the receiver controls how fast the sender transmits
- receiver controls sender, so sender will not overflow receiver buffer by transmitting too much, too fast
- receiver “advertises” free buffer space by including rwnd value in TCP header of receiver-to-sender segments
- rwnd = receive window, i.e. the receiver tells the sender how much data it can currently accept
- RcvBuffer size set via socket options (typical default is 4096 bytes)
- many operating systems autoadjust RcvBuffer
- sender limits amount of unACKed (“in-flight”) data to receiver’s rwnd value
- guarantees receive buffer will not overflow
- MSS (mentioned in Nagle algorithm) is a parameter specifying the largest amount of data in a single IP datagram that should be sent by a remote host.
- MTU is a parameter specifying the largest amount of data that a communication protocol or system can pass onwards. For example, standards (e.g. Ethernet) can fix the size of an MTU, or systems (such as point-to-point serial links) may set MTU at connect time.
- MSS size is set according to MTU:
MSS = MTU – IP header size – TCP header size
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第11张图片](http://img.e-com-net.com/image/info8/6b34ca0ece1d4fbbadf66cc430636bec.jpg)
Nagle’s algorithm
- A problem can occur when an application generates data very slowly.
- Consider, ssh that generates data only when a user types.
- TCP will send the data as it arrives at the send buffer if there is space left in the send buffer.
- This means (for ssh) one packet sent every time user hits key.
- Overhead of this is huge (TCP header + IP header + frame header to send one byte).
- Cure is known as “Nagle’s algorithm”.
- The sending TCP sends the first piece of data it receives, no matter how small or large
- Sending TCP accumulates data in the buffer and waits until one of the following before sending the segment:
- The receiving TCP sends an acknowledgement
- Data has accumulated to fill a maximum size segment
- Repeat step 2
- Note: Sometimes Nagle’s algorithm should be switched off – e.g. when fast interaction is vital and you want small packet sizes to be sent.
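The send decision described in these steps can be sketched as a single predicate. This is a simplified model, not an implementation of any real TCP stack; the function and parameter names are made up, and "an acknowledgement has arrived" is modelled as "no unACKed bytes are in flight".

```python
def nagle_can_send(buffered_bytes, mss, unacked_bytes):
    """Return True if the sending TCP may transmit a segment now."""
    if buffered_bytes == 0:
        return False              # nothing to send
    if unacked_bytes == 0:
        return True               # first data, or everything sent was ACKed
    return buffered_bytes >= mss  # otherwise wait for a full-size segment


print(nagle_can_send(1, 1460, 0))     # first keystroke goes out at once
print(nagle_can_send(1, 1460, 1))     # next keystroke waits for the ACK
print(nagle_can_send(1460, 1460, 1))  # a full segment may go immediately
```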
The problem the algorithm addresses: silly window syndrome
- Silly Window Syndrome occurs when the TCP system is forced to send very small packets. Named because window size is “silly”.
- This can happen in two separate ways:
Sender produces data very slowly.
- Same problem as Nagle’s algorithm.
Receiver processes data very slowly.
- Single byte or small number removed from full receive buffer.
- Sender is informed of opportunity to send small number of bytes and immediately sends, filling buffer.
- Process repeats.
- Cure – receiver does not advertise windows that would cause sender to send small amounts of data.
TCP connection management
- before exchanging data, sender/receiver “handshake”
- agree to establish connection (each knowing the other willing to establish connection)
- agree on connection parameters
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第12张图片](http://img.e-com-net.com/image/info8/370523d23ce24be5aaafcff1135dccf5.jpg)
2-way handshake
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第13张图片](http://img.e-com-net.com/image/info8/91ce9c5443664859803657c4690ddc16.jpg)
When a 2-way handshake fails
- retransmitted messages (e.g. req_conn(x)) due to message loss
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第14张图片](http://img.e-com-net.com/image/info8/432ed3c3b7044cd1ba5f10c58e1e771e.jpg)
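[In this case the client times out before acc_conn(x) comes back, so it retransmits req_conn(x). By the time the retransmitted request reaches the server, the client has already finished and closed the connection. The server accepts it anyway, which leaves a half-open connection: the server holds a connection to a client that is no longer there]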
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第15张图片](http://img.e-com-net.com/image/info8/0f8ff0e38ddb4d2a8e0881efae2da0b8.jpg)
3-way handshake
The three steps are SYN, SYN-ACK, ACK
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第16张图片](http://img.e-com-net.com/image/info8/4e2a32703e6c4ea38829718280e87417.jpg)
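- First handshake (SYN): SYNbit=1 tells the server the client wants to open a new connection; seq=x is the client's initial sequence number
- Second handshake (SYN-ACK): SYNbit=1 and ACKbit=1 tell the client that the server received the SYN; seq=y is the server's initial sequence number; ACKnum=x+1 means the server next expects sequence number x+1 from the client (same meaning of seq and ACK as before)
- Third handshake (ACK): ACKbit=1 tells the server that the client received the SYN-ACK; ACKnum=y+1 tells the server that the client next expects data starting at y+1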
The handshake as a finite state machine
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第17张图片](http://img.e-com-net.com/image/info8/a04a20d0a60c4a15ad46aefac651efc2.jpg)
[Not sure how this is used; it just shows the TCP three-way handshake as a finite state machine]
TCP:closing a connection
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第18张图片](http://img.e-com-net.com/image/info8/7f4e3e3061f44f31b723c0578a5d17a2.jpg)
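[Much the same as connection setup. The two middle server → client segments can be combined into one, called FIN-ACK; if they are sent separately, the side that has not yet sent its FIN (the server) can still send data to the client in between]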
Principles of congestion control
- What is congestion?
- informally: “too many sources sending too much data too fast for network to handle”
- how does this look?
- lost packets (buffer overflow at routers)
- long delays (queueing in router buffers)
causes and costs of congestion
- Too much traffic enters router – buffer fills up, this increases delay (and hence reduces throughput).
- Much too much traffic enters router – buffer overfills and causes loss. Packet needs to be retransmitted.
- If packet is lost after several “hops” then many resources are wasted. (e.g. Packet travels from A to B to C to D then lost at D – it has taken up space at A, B and C unnecessarily).
- Useful concept: goodput – this is the rate at which data reaches the application layer. Different from throughput because of:
- loss
- retransmission
- corrupted packets
additive increase & multiplicative decrease
- approach:sender increases transmission rate (window size), probing for usable bandwidth, until loss occurs
- Set cwnd – congestion window to initial value
- additive increase: increase cwnd by 1 MSS every RTT until loss detected
- multiplicative decrease: cut cwnd in half after loss
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第19张图片](http://img.e-com-net.com/image/info8/93ae499d25b44c608e8276fe3f4d3f3a.jpg)
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第20张图片](http://img.e-com-net.com/image/info8/1e2c06cf48034d8bb5c89be87624b182.jpg)
- sender limits transmission: LastByteSent - LastByteAcked ≤ cwnd
[In the figure: (yellow + green) - green, i.e. the yellow part must not exceed cwnd]
- cwnd is dynamic, function of perceived network congestion
- TCP sending rate:
- roughly: send cwnd bytes, wait RTT for ACKS, then send more bytes
- rate ≈ cwnd / RTT (bytes/sec)
slow start
- when connection begins, increase rate exponentially until first loss event:
- initially cwnd = 1 MSS
- double cwnd every RTT
- done by incrementing cwnd for every ACK received
- summary: initial rate is slow but ramps up exponentially fast
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第21张图片](http://img.e-com-net.com/image/info8/70995171371942869d920544282c056f.jpg)
TCP flavours
- There are lots of implementations of TCP.
- The protocol specifies certain things but leaves others free.
- For example TCP protocols can choose how they want to react to duplicate ACKs or what their initial window sizes are.
- TCP protocols are sometimes named after places with casinos (gambling): Reno, Tahoe, New Reno
TCP: detecting, reacting to loss
How TCP Reno handles it
- loss indicated by timeout:
- cwnd set to 1 MSS;
- window then grows exponentially (as in slow start) to threshold, then grows linearly
How TCP Tahoe handles it
- TCP Tahoe always sets cwnd to 1 MSS (timeout or 3 duplicate acks)
- Q: when should the exponential increase switch to linear?
- A: when cwnd gets to 1/2 of its value before timeout.
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第22张图片](http://img.e-com-net.com/image/info8/40191fa221a54f6783fc8573d3ab44b4.jpg)
- Implementation:
- variable ssthresh
- on loss event, ssthresh is set to 1/2 of cwnd just before loss event
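The interplay of slow start, linear growth and ssthresh can be traced round by round. The sketch below models only the Tahoe-style reaction described in these notes (on loss: ssthresh = cwnd/2, cwnd = 1 MSS); the function name, the initial ssthresh of 8 MSS, and counting cwnd in whole MSS per RTT are assumptions made for illustration.

```python
def simulate_cwnd(rounds, loss_rounds, ssthresh=8):
    """Return cwnd (in MSS) at the start of each RTT round."""
    cwnd = 1
    history = []
    for r in range(rounds):
        history.append(cwnd)
        if r in loss_rounds:
            ssthresh = max(cwnd // 2, 1)     # half of cwnd before the loss
            cwnd = 1                         # restart from 1 MSS
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double per RTT
        else:
            cwnd += 1                        # additive increase: +1 MSS
    return history


print(simulate_cwnd(rounds=12, loss_rounds={7}))
```

In this trace the window doubles up to 8, grows linearly to 12, drops to 1 after the loss, and then switches to linear growth at the new threshold of 6.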
TCP throughput
- W: window size (measured in bytes) where loss occurs
- avg. window size (# in-flight bytes) is ¾ W
- avg. throughput is 3/4 W per RTT
- avg TCP thruput = 3/4 W/RTT (bytes/sec)
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第23张图片](http://img.e-com-net.com/image/info8/62c8b44690594db6b54177e6b05a0908.jpg)
TCP Fairness
- fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
Fairness and UDP
- multimedia apps often do not use TCP
- do not want rate throttled by congestion control
- instead use UDP: send audio/video at constant rate, tolerate packet loss
Fairness, parallel TCP connections
- application can open multiple parallel connections between two hosts
- web browsers do this
- e.g., link of rate R with 9 existing connections:
- new app asks for 1 TCP, gets rate R/10 [1/(9+1)]
- new app asks for 11 TCPs, gets just over R/2 [11/(9+11)]
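The two fractions follow directly from equal per-connection sharing, which the sketch below assumes (every TCP connection gets the same share of R). The function name is hypothetical.

```python
from fractions import Fraction


def app_share(existing_connections, new_connections):
    """Fraction of link rate R obtained by an app opening new connections."""
    return Fraction(new_connections, existing_connections + new_connections)


print(app_share(9, 1))    # 1/10
print(app_share(9, 11))   # 11/20, just over 1/2
```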
Start of the Network Layer part
Two key network-layer functions
- network-layer functions:
- forwarding: move packets from router’s input to appropriate router output
- routing: determine route taken by packets from source to destination
- forwarding: process of getting through one road junction.
- routing: process of planning trip from source to destination
Network layer
- transport segment from sending to receiving host
- on sending side encapsulates segments into datagrams
- on receiving side, delivers segments to transport layer
- network layer protocols in every host, router
- router examines header fields in all IP datagrams passing through it
IPv4 address notation
- There are three common notations to show an IPv4 address:
- binary notation
- dotted-decimal notation (most commonly used)
- hexadecimal notation
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第24张图片](http://img.e-com-net.com/image/info8/39a9e5eb6e814eaeb6553c05aaff6931.jpg)
data plane
- local, per-router function
- determines how datagram arriving on router input port is forwarded to router output port
- forwarding function
- [That is, it decides how a datagram gets from an input port to the right output port inside the router; this is the router's forwarding function]
Control plane
- network-wide logic
- determines how datagram is routed among routers along end-end path from source host to destination host
- two control-plane approaches:
- traditional routing algorithms: implemented in routers
- software-defined networking (SDN): implemented in (remote) servers
Per-router control plane
- Individual routing algorithm components in each and every router interact in the control plane
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第25张图片](http://img.e-com-net.com/image/info8/b1d1ef07e28f4a1eb42f9e096c048388.jpg)
[The upper part is the control plane and the lower part is the data plane; together they make up the router]
Logically centralized control plane
- A distinct (typically remote) controller interacts with local control agents (CAs)
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第26张图片](http://img.e-com-net.com/image/info8/e36701ffad714ca89e8162ceaa564369.jpg)
Network layer service models
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第27张图片](http://img.e-com-net.com/image/info8/99cbd2e93b1942aabc8640c7d2630d51.jpg)
[The first row is the most commonly used architecture, the Internet's. “Best effort” means the Internet guarantees nothing but does the best it can]
Router architecture overview
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第28张图片](http://img.e-com-net.com/image/info8/6893d908ec3440ccb8cbb0dc21b1f541.jpg)
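[Data comes in through the input ports and leaves through the output ports. The high-speed switching fabric in the middle connects each input to the right output (for example, data arriving on a lower port can leave through an upper port)]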
Input port function
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第29张图片](http://img.e-com-net.com/image/info8/ee5517d01bdf4e06a571eeaa831dcbdd.jpg)
- line termination: physical layer, bit-level reception
- link layer protocol (receive): data link layer, e.g., Ethernet
- lookup, forwarding: decentralized switching
- using header field values, lookup output port using forwarding table in input port memory (“match plus action”)
- goal: complete input port processing at ‘line speed’
- queuing: if datagrams arrive faster than forwarding rate into switch fabric
- destination-based forwarding: forward based only on destination IP address (traditional)
- generalized forwarding: forward based on any set of header field values
Longest prefix matching
- when looking for forwarding table entry for given destination address, use longest address prefix that matches destination address.
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第30张图片](http://img.e-com-net.com/image/info8/18fa3bb0ae1e453e820a058ef6505cee.jpg)
- longest prefix matching: often performed using ternary content addressable memories (TCAMs), a specialised very high speed memory
- content addressable: present address to TCAM: retrieve address in one clock cycle, regardless of table size
- Cisco Catalyst: can hold up to ~1M routing table entries in TCAM
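In software, longest prefix matching can be sketched with Python's standard `ipaddress` module. This linear scan only illustrates the rule (a TCAM does the comparison in hardware), and the table entries and port numbers below are hypothetical values chosen for the example.

```python
import ipaddress

# Hypothetical forwarding table: (prefix, output port)
TABLE = [
    ("200.23.16.0/21", 0),
    ("200.23.24.0/24", 1),
    ("200.23.24.0/21", 2),
    ("0.0.0.0/0", 3),      # default route: matches everything
]


def longest_prefix_match(table, dest):
    """Return the port of the matching entry with the longest prefix."""
    addr = ipaddress.ip_address(dest)
    best_len, best_port = -1, None
    for prefix, port in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_port = net.prefixlen, port
    return best_port


# 200.23.24.170 matches both the /21 and the /24; the /24 wins
print(longest_prefix_match(TABLE, "200.23.24.170"))
```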
switching fabrics
transfer packet from input buffer to appropriate output buffer
switching rate: rate at which packets can be transferred from inputs to outputs
- often measured as multiple of input/output line rate
- N inputs: switching rate N times line rate desirable
three types of switching fabrics
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第31张图片](http://img.e-com-net.com/image/info8/f2a909f9a5ff4d37a2966656437d39e1.jpg)
Switching via memory
first generation routers:
- traditional computers with switching under direct control of CPU
- packet copied to system’s memory
- speed limited by memory bandwidth (2 bus crossings per datagram)
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第32张图片](http://img.e-com-net.com/image/info8/ec3cc3a76c4441b58ce1660779df44ab.jpg)
[Each packet goes from the input port over the system bus into memory, then from memory over the system bus to the output port]
Switching via a bus
- datagram from input port memory to output port memory via a shared bus
- bus contention: switching speed limited by bus bandwidth
- 32 Gbps bus, Cisco 5600: sufficient speed for access and enterprise routers
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第33张图片](http://img.e-com-net.com/image/info8/d8b76744180240dd95549d2e44014129.jpg)
[Take the 32 Gbps bus as an example: if the inputs together deliver 10 × 10 Gbps, the bus becomes the bottleneck and caps the switching rate at 32 Gbps]
Switching via interconnection network
- overcome bus bandwidth limitations
- banyan networks, crossbar, other interconnection nets initially developed to connect processors in multiprocessor
- advanced design: fragmenting datagram into fixed length cells, switch cells through the fabric.
- Cisco 12000: switches 60 Gbps through the interconnection network
Input port queuing
- fabric slower than input ports combined → queueing may occur at input queues
- queueing delay and loss due to input buffer overflow!
- Head-of-the-Line (HOL) blocking: queued datagram at front of queue prevents others in queue from moving forward.
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第34张图片](http://img.e-com-net.com/image/info8/c351315953614956b818550e9f5ebc21.jpg)
Output port
- buffering required when datagrams arrive from fabric faster than the transmission rate
- Datagram (packets) can be lost due to congestion, lack of buffers
- scheduling discipline chooses among queued datagrams for transmission
- Priority scheduling: who gets best performance, network neutrality
Output port queueing
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第35张图片](http://img.e-com-net.com/image/info8/d35e36cec3fd421288a0ab5280b45b93.jpg)
- buffering when arrival rate via switch exceeds output line speed
- queueing (delay) and loss due to output port buffer overflow!
How much buffering?
- RFC 3439 rule of thumb: average buffering equal to “typical” RTT (say 250 msec) times link capacity C
- e.g., C = 10 Gbps link: 2.5 Gbit buffer
- recent recommendation: with N flows, buffering equal to (RTT*C) / sqrt(N)
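Both sizing rules are one-line formulas, sketched below with a made-up function name. Setting `n_flows=1` reproduces the rule of thumb; the 10,000-flow figure is only an illustrative input.

```python
import math


def buffer_bits(rtt_s, capacity_bps, n_flows=1):
    """Buffer size in bits: RTT * C / sqrt(N)."""
    return rtt_s * capacity_bps / math.sqrt(n_flows)


print(buffer_bits(0.250, 10e9))          # rule of thumb: 2.5 Gbit
print(buffer_bits(0.250, 10e9, 10_000))  # with 10,000 flows: 25 Mbit
```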
Scheduling mechanisms
- scheduling: choose next packet to send on link.
- FIFO scheduling: first in first out
- Like an orderly queue of people, no pushing in.
- If queue is full last packets are dropped.
- Priority scheduling
- Some packets are more important
- Example: You need live video packets now, email could wait.
- Round robin scheduling
- If your queue is from several inputs ports treat them fairly
- Pick a packet from input port 1, then 2, then 3, then 4
- Port which is sending lots of traffic doesn’t block others.
- Weighted Fair Queue (like this but give some queues a little more priority, e.g. give a little more traffic to port 1) [a queue with a larger weight gets a larger share of the link]
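Round robin is easy to sketch with one queue per input port. The function name and the packet labels are made up; the model sends one packet per non-empty queue per cycle and ignores packet sizes.

```python
from collections import deque


def round_robin(queues):
    """Serve one packet from each non-empty queue in turn."""
    queues = [deque(q) for q in queues]
    order = []
    while any(queues):            # an empty deque is falsy
        for q in queues:
            if q:
                order.append(q.popleft())
    return order


# Port 1 has a lot of traffic but does not block ports 2 and 3
print(round_robin([["a1", "a2", "a3"], ["b1"], ["c1", "c2"]]))
```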
IP datagram format
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第36张图片](http://img.e-com-net.com/image/info8/0878b187674d4eae8344fd74635a525c.jpg)
IP fragmentation, reassembly
- network links have MTU (max. transfer size): largest possible link-level frame
- [In plain terms: the MTU is the most data you can hand to the link layer at once, so when splitting a datagram each piece must be at most MTU in size]
- different link types, different MTUs
- large IP datagram divided (“fragmented”) within net
- one datagram becomes several datagrams
- “reassembled” only at final destination
- IP header bits used to identify, order related fragments
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第37张图片](http://img.e-com-net.com/image/info8/9d3410eeb4534232a742868bc945c211.jpg)
- length <= MTU
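- fragflag: 1 if more fragments follow this one, otherwise 0
- offset: the position of this fragment's data within the original IP datagram, in units of 8 bytes; this means every fragment except the last must carry a multiple of 8 bytes of data
- the data part is 1480 bytes because 20 of the 1500 bytes are the IP header
- every fragment carries its own 20-byte IP header inside the MTU, so the three fragments together carry 3*20 bytes of header on top of the data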
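The fragment fields can be computed with a short function. This is a sketch assuming a fixed 20-byte IP header with no options; the function name and the dictionary keys are chosen here for illustration, and the 4000-byte datagram over a 1500-byte MTU is an assumed example input.

```python
def fragment(total_length, mtu, header_len=20):
    """Split an IP datagram into fragments that fit the MTU.

    Returns a list of dicts with length (bytes, header included),
    fragflag (1 if more fragments follow) and offset (8-byte units).
    """
    payload = total_length - header_len
    max_data = (mtu - header_len) // 8 * 8   # data per fragment, multiple of 8
    fragments, sent = [], 0
    while sent < payload:
        data = min(max_data, payload - sent)
        more = sent + data < payload
        fragments.append({
            "length": data + header_len,
            "fragflag": int(more),
            "offset": sent // 8,
        })
        sent += data
    return fragments


for f in fragment(4000, 1500):
    print(f)
```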
IP addressing
- IP address: 32-bit identifier for host, router interface
- interface: connection between host/router and physical link
- routers typically have multiple interfaces
- host typically has one or two interfaces (e.g., wired Ethernet, wireless 802.11)
- IP addresses associated with each interface
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第38张图片](http://img.e-com-net.com/image/info8/ab3e0407376541219a36c8bb1c29fd47.jpg)
Classful IP addressing
- In classful addressing, the IP address space is divided into five classes: A, B, C, D, E.
- Starting number, n (first byte), shows whether Class A, B or C
- Class A: n<128 (up to 16M hosts)
- Class B: 128 <= n < 192 (up to 65K hosts)
- Class C: 192 <= n <224 (up to 254 hosts)
- The class (A, B or C) determines which bits are the netid and which are the hostid
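The first-byte test can be written as a small lookup. The A, B and C ranges are the ones listed above; the D (224 to 239) and E (240 to 255) ranges are the standard classful boundaries, added here because the notes list five classes. The function name is made up.

```python
def address_class(address):
    """Return the classful class (A-E) of a dotted-decimal IPv4 address."""
    n = int(address.split(".")[0])   # first byte decides the class
    if n < 128:
        return "A"
    if n < 192:
        return "B"
    if n < 224:
        return "C"
    if n < 240:
        return "D"
    return "E"


print(address_class("10.0.0.1"), address_class("141.14.0.0"),
      address_class("223.1.1.1"))
```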
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第39张图片](http://img.e-com-net.com/image/info8/ff9086fa82b8455ea0b4b7a572605d18.jpg)
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第40张图片](http://img.e-com-net.com/image/info8/86c73878c6a94f4686b56d9190ea89b2.jpg)
Addresses for private networks
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第41张图片](http://img.e-com-net.com/image/info8/915971d90912423fb2db68be1657e72a.jpg)
- These addresses are “special” and not used on the general Internet. You use them to set up test networks or networks of machines not accessible from outside.
- Subnet: device interfaces with same subnet part of IP address
can physically reach each other without intervening router
- subnet part - high order bits
- host part - low order bits
- In the example, 223.1.1 is the subnet part and the final 1 is the host part
to determine the subnets, detach each interface from its host or router, creating islands of isolated networks
each isolated network is called a subnet
subnet mask (or slash notation) number of bits taken to identify network
- /8 size of old class A
- /16 size of class B
- /24 size of class C
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第42张图片](http://img.e-com-net.com/image/info8/543a45f48fbb477c8124440af058402f.jpg)
Network mask and subnetwork mask
- Subnetting increases length of netid and decreases length of hostid.
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第43张图片](http://img.e-com-net.com/image/info8/9fa3091c68254555a401d58688de981d.jpg)
- To divide a network into s subnetworks, each with an equal number of hosts, the subnet prefix length for each subnetwork can be calculated as n_sub = n + log2(s), where n is the prefix length of the original network
Example: splitting a class B network into four subnets gives n_sub = 16 + log2(4) = 18, so every subnet has an /18 mask
Three-Level Addressing: Subnetting
- The idea of splitting a block to smaller blocks is referred to as subnetting.
- In subnetting, a network is divided into several smaller subnetworks (subnets) with each subnetwork having its own subnetwork address.
Why subnetting?
- an organization that was granted a large block of IP addresses (a long time ago this would be class A, class B etc)
- wants to divide this into smaller blocks of addresses that are individual networks.
- Or perhaps organization wants to sell some of its IP addresses off.
A subnetting example
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第44张图片](http://img.e-com-net.com/image/info8/98828fc7dceb48c6911a7ed2aa76ea51.jpg)
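- From the figure, this network is class B (the first number is 141, and it ends in /16)
- It is split into four subnets, so n_sub = 16 + log2(4) = 18
- The third octet of the four subnet addresses is 00000000, 01000000, 10000000, 11000000 respectively [the last two of the 18 prefix bits are the first two bits of this octet]
- So the first three numbers of each subnet address are determined, as in the figure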
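The same split can be reproduced with Python's standard `ipaddress` module. The block 141.14.0.0/16 is an assumed example address (the notes only say that the first number is 141), and `split_equal` is a made-up helper name.

```python
import ipaddress


def split_equal(block, s):
    """Split a block into s equal subnets (s rounded up to a power of 2)."""
    net = ipaddress.ip_network(block)
    extra_bits = (s - 1).bit_length()   # log2(s) for powers of two
    return list(net.subnets(prefixlen_diff=extra_bits))


# Assumed class B block; four subnets -> prefix length 16 + 2 = 18
for subnet in split_equal("141.14.0.0/16", 4):
    print(subnet, "usable hosts:", subnet.num_addresses - 2)
```

The third octets come out as 0, 64, 128 and 192, matching the bit patterns 00, 01, 10, 11 in the two extra prefix bits.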
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第45张图片](http://img.e-com-net.com/image/info8/922efd82216b454498e8d900a5f6a400.jpg)
- a private site router is used to divide the network into four subnetworks.
- after subnetting, each subnetwork can now have almost 2^14 hosts.
- /16 and /18 show the length of the netid and subnetids.
Classless addressing
- In classless addressing, variable-length blocks are assigned that belong to no class.
- In this architecture, the entire address space (2^32 addresses) is divided into blocks of different sizes.
- The slash notation is formally referred to as classless interdomain routing or CIDR (Classless InterDomain Routing) notation.
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第46张图片](http://img.e-com-net.com/image/info8/bbce2756a5904c9fb9de4e07ec28bdaf.jpg)
Network address is the address with all the host bits set to zero – address like any other but represents the network.
Network address:
The network address is also the first usable IP address in the block
Broadcast address (sends to everyone) is the address with all the host bits set to one.
E.g. 10000001.01000010.00011011.11111111
Broadcast address:
Splitting IP address by slash notation (method and things to watch for)
- For a /n address, the first n bits are the network address and the last 32-n bits are for the host.
- If the host parts are all 1s then this is the broadcast address – sent to all hosts on the network.
- Host has m = 32-n bits (e.g. /20 has m=12)
- Room for 2^m - 1 hosts (e.g. /20 has 4095) [this subtracts the broadcast address; if the network address is excluded as well it is 2^m - 2]
- Example:
- Network address is
- Broadcast address is
- (95 is 01011111 255 is 11111111)
- /31 only has 1 host address
- /30 is smallest subnet we can have (3 hosts)
- /30 commonly used to connect just two routers (which must be on same subnet).
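Network and broadcast addresses for a /n address can be derived with the standard `ipaddress` module. The address 10.1.20.95/20 below is a hypothetical example, not the one from the slides, and `describe` is a made-up helper; the host count follows the 2^m - 1 convention used in these notes.

```python
import ipaddress


def describe(cidr):
    """Return network address, broadcast address and host-bit count."""
    iface = ipaddress.ip_interface(cidr)   # accepts an address with host bits set
    net = iface.network
    m = 32 - net.prefixlen                 # number of host bits
    return {
        "network": str(net.network_address),      # host bits all 0
        "broadcast": str(net.broadcast_address),  # host bits all 1
        "host_bits": m,
        "hosts_excluding_broadcast": 2**m - 1,
    }


print(describe("10.1.20.95/20"))
```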
VLSM (variable length subnet mask)
![【北邮国院大三上】互联网协议_Internet Protocol_PART B_第47张图片](http://img.e-com-net.com/image/info8/7d5e4d3b9327431a80274cf3725ca84f.jpg)
DHCP: Dynamic Host Configuration Protocol
- goal: allow host to dynamically obtain its IP address from network server when it joins network
- can renew its lease on address in use
- allows reuse of addresses (only hold address while connected/“on”)
- support for mobile users who want to join network (more shortly)