BT Protocol Translation
Source: http://wiki.theory.org/BitTorrentSpecification
Translator: 陈亮 (Chen Liang)
Version: 0.1
Note: only part of the original article is translated here.
Date: 2004-11-8 to 2004-11-9
Peer wire protocol (TCP)
Overview
The peer protocol facilitates the exchange of pieces as described in the metainfo file.
Note here that the original specification also used the term "piece" when describing the peer protocol, but with a different meaning than "piece" in the metainfo file. For that reason, the term "block" will be used in this specification to describe the data that is exchanged between peers over the wire.
A client must maintain state information for each connection that it has with a remote peer:
--choked: Whether or not the remote peer has choked this client. When a peer chokes the client, it is a notification that no requests will be answered until the client is unchoked. The client should not attempt to send requests for blocks, and it should consider all pending (unanswered) requests to be discarded by the remote peer.
--interested: Whether or not the remote peer is interested in something this client has to offer. This is a notification that the remote peer will begin requesting blocks when the client unchokes them.
Note that this also implies that the client needs to keep track of whether or not it is interested in the remote peer, and whether it has the remote peer choked or unchoked. So, the real list looks something like this:
--am_choking: this client is choking the peer
--am_interested: this client is interested in the peer
--peer_choking: peer is choking this client
--peer_interested: peer is interested in this client
Client connections start out as "choked" and "not interested". In other words:
--am_choking = 1
--am_interested = 0
--peer_choking = 1
--peer_interested = 0
A block is downloaded by the client when the client is interested in a peer, and that peer is not choking the client. A block is uploaded by a client when the client is not choking a peer, and that peer is interested in the client.
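As a minimal Python sketch of the four flags, their initial values, and the two transfer conditions just described, consider the following. The class and method names are hypothetical illustrations, not part of the protocol.

```python
from dataclasses import dataclass


@dataclass
class ConnectionState:
    """Per-connection flags; every connection starts choked and not interested."""
    am_choking: bool = True        # this client is choking the peer
    am_interested: bool = False    # this client is interested in the peer
    peer_choking: bool = True      # the peer is choking this client
    peer_interested: bool = False  # the peer is interested in this client

    def can_download(self) -> bool:
        # Download only when we are interested and the peer is not choking us.
        return self.am_interested and not self.peer_choking

    def can_upload(self) -> bool:
        # Upload only when we are not choking the peer and the peer is interested.
        return not self.am_choking and self.peer_interested
```

A fresh `ConnectionState()` allows neither direction; flipping `am_interested` to True and `peer_choking` to False enables downloading.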
It is important for the client to keep its peers informed as to whether or not it is interested in them. This state information should be kept up-to-date with each peer even when the client is choked. This will allow peers to know if the client will begin downloading when it is unchoked (and vice-versa).
Data Types
Unless specified otherwise, all integers in the peer wire protocol are encoded as four-byte big-endian values. This includes the length prefix on all messages that come after the handshake.
Message flow
The peer wire protocol consists of an initial handshake. After that, peers communicate via an exchange of length-prefixed messages. The length-prefix is an integer as described above.
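To make the integer encoding and the length-prefixed framing concrete, here is a minimal Python sketch that splits a post-handshake byte stream into messages. The function names are hypothetical, and the sketch assumes the caller accumulates received bytes and retries with the returned remainder.

```python
import struct


def encode_int(n: int) -> bytes:
    # Four-byte big-endian unsigned integer, the default integer encoding.
    return struct.pack(">I", n)


def split_messages(buf: bytes):
    """Split a post-handshake byte stream into complete messages.

    Returns (messages, remainder). Each message is the bytes that follow its
    4-byte length prefix (an empty message is a keep-alive). A trailing
    partial message is returned as the remainder for the next read.
    """
    messages = []
    pos = 0
    while len(buf) - pos >= 4:
        (length,) = struct.unpack_from(">I", buf, pos)
        if len(buf) - pos - 4 < length:
            break  # body not fully received yet
        messages.append(buf[pos + 4 : pos + 4 + length])
        pos += 4 + length
    return messages, buf[pos:]
```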
Handshake
The handshake is a required message and must be the first message transmitted by the client.
handshake: <pstrlen><pstr><reserved><info_hash><peer_id>
--pstrlen: string length of <pstr>, as a single raw byte
--pstr: string identifier of the protocol
--reserved: eight (8) reserved bytes. Each bit in these bytes can be used to change the behavior of the protocol. An email from Bram suggests that trailing bits should be used first, so that leading bits may be used to change the meaning of trailing bits.
--info_hash: 20-byte SHA1 hash of the info key in the metainfo file. This is the same info_hash that is transmitted in tracker requests.
--peer_id: 20-byte string used as a unique ID for the client. This is the same peer_id that is transmitted in tracker requests.
In version 1.0 of the BitTorrent protocol, pstrlen=19, and pstr="BitTorrent protocol".
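The handshake layout can be sketched in Python as follows, assuming protocol version 1.0 and all-zero reserved bytes. The function names are hypothetical. Note that pstrlen is a single raw byte with value 19, so a complete 1.0 handshake is 1 + 19 + 8 + 20 + 20 = 68 bytes.

```python
PSTR = b"BitTorrent protocol"  # protocol 1.0: pstrlen = 19


def build_handshake(info_hash, peer_id, reserved=bytes(8)):
    """Return <pstrlen><pstr><reserved><info_hash><peer_id>."""
    if len(info_hash) != 20 or len(peer_id) != 20 or len(reserved) != 8:
        raise ValueError("info_hash and peer_id must be 20 bytes, reserved 8 bytes")
    # pstrlen is one raw byte (value 19), not the ASCII text "19".
    return bytes([len(PSTR)]) + PSTR + reserved + info_hash + peer_id


def parse_handshake(data):
    """Split a received handshake into its fields."""
    pstrlen = data[0]
    pstr = data[1:1 + pstrlen]
    rest = data[1 + pstrlen:]
    if len(rest) < 28:
        raise ValueError("handshake too short")
    return {
        "pstr": pstr,
        "reserved": rest[:8],
        "info_hash": rest[8:28],
        # The tracker's NAT check omits peer_id, so it may be missing.
        "peer_id": rest[28:48] if len(rest) >= 48 else None,
    }
```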
The initiator of a connection is expected to transmit their handshake immediately. The recipient may wait for the initiator's handshake, if it is capable of serving multiple torrents simultaneously (torrents are uniquely identified by their info_hash). However, the recipient must respond as soon as it sees the info_hash part of the handshake. The tracker's NAT-checking feature does not send the peer_id field of the handshake.
If a client receives a handshake with an info_hash that it is not currently serving, then the client must drop the connection.
If the initiator of the connection receives a handshake in which the peer_id does not match the expected peer_id, then the initiator is expected to drop the connection. Note that the initiator presumably received the peer information from the tracker, which includes the peer_id that was registered by the peer. The peer_id from the tracker and in the handshake are expected to match.
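The two drop rules above reduce to a small check, sketched here in Python. The function name and the `serving` set are hypothetical. `expected_peer_id` is assumed to be the peer_id the initiator received from the tracker, or None when no expectation exists (for example, on the receiving side).

```python
def keep_connection(info_hash, peer_id, serving, expected_peer_id=None):
    """Return False if the handshake means the connection should be dropped.

    serving: set of info_hashes this client currently serves.
    expected_peer_id: peer_id learned from the tracker, or None if unknown.
    """
    if info_hash not in serving:
        return False  # a torrent this client is not serving
    if expected_peer_id is not None and peer_id != expected_peer_id:
        return False  # initiator: peer_id differs from the tracker's
    return True
```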
peer_id
There are two main conventions for encoding client and client version information into the peer_id: Azureus-style and Shadow's-style.
Azureus-style uses the following encoding: '-', two characters for the client id, four ASCII digits for the version number, '-', followed by random numbers.
For example: '-AZ2060-'...
Known clients that use this encoding style are:
'AZ' - Azureus
'BB' - BitBuddy
'CT' - CTorrent
'MT' - MoonlightTorrent
'LT' - libtorrent
'BX' - Bittorrent X
'TS' - Torrentstorm
'TN' - TorrentDotNET
'SS' - SwarmScope
'XT' - XanTorrent
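A minimal Python sketch of recognizing an Azureus-style peer_id follows. It assumes the two-character client id consists of ASCII letters, as in every example listed; the function and table names are hypothetical. Since the rest of the peer_id is random, a match is a heuristic, not proof of the client's identity.

```python
import re

# '-', two-character client id, four ASCII digits, '-', then random bytes.
# Assumption: the client id is two ASCII letters, as in the examples listed.
AZUREUS_RE = re.compile(rb"-([A-Za-z]{2})(\d{4})-")

AZUREUS_CLIENTS = {
    b"AZ": "Azureus", b"BB": "BitBuddy", b"CT": "CTorrent",
    b"MT": "MoonlightTorrent", b"LT": "libtorrent", b"BX": "Bittorrent X",
    b"TS": "Torrentstorm", b"TN": "TorrentDotNET", b"SS": "SwarmScope",
    b"XT": "XanTorrent",
}


def parse_azureus(peer_id):
    """Return (client name, version digits), or None if not Azureus-style."""
    m = AZUREUS_RE.match(peer_id)  # match() anchors at the start of peer_id
    if m is None:
        return None
    code = m.group(1)
    return AZUREUS_CLIENTS.get(code, code.decode("ascii")), m.group(2).decode("ascii")
```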
Shadow's style uses the following encoding: one ASCII alphanumeric character for client identification, three ASCII digits for the version number, '----', followed by random numbers.
For example: 'S587----'...
Known clients that use this encoding style are:
'S' - Shadow's client
'U' - UPnP NAT Bit Torrent
'T' - BitTornado
'A' - ABC
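The Shadow-style layout can be matched in the same way. This Python sketch uses hypothetical names and returns the three version digits unchanged, since the text does not say how they map to a dotted version. As with the Azureus check, a random peer_id could match by coincidence.

```python
import re

# One ASCII alphanumeric client id, three ASCII digits, '----', then random bytes.
SHADOW_RE = re.compile(rb"([A-Za-z0-9])(\d{3})----")

SHADOW_CLIENTS = {
    b"S": "Shadow's client", b"U": "UPnP NAT Bit Torrent",
    b"T": "BitTornado", b"A": "ABC",
}


def parse_shadow(peer_id):
    """Return (client name, version digits), or None if not Shadow-style."""
    m = SHADOW_RE.match(peer_id)
    if m is None:
        return None
    code = m.group(1)
    return SHADOW_CLIENTS.get(code, code.decode("ascii")), m.group(2).decode("ascii")
```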
Bram's client now uses this style... 'M3-4-2--'.
BitComet does something different still. Its peer_id consists of four ASCII characters 'exbc', followed by a null byte, followed by a single ASCII numeric digit, followed by random characters. The digit seems to denote the version of the software, though it appears to have no connection with the real version number. The digit is incremented with each new BitComet release.
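BitComet's layout is simple enough to check byte by byte. This Python sketch, with a hypothetical function name, returns the single release-counter digit. Per the text above, that digit is not the real version number.

```python
def parse_bitcomet(peer_id):
    """Return BitComet's release-counter digit, or None if not BitComet-style."""
    # 'exbc', a null byte, one ASCII digit, then random characters.
    if peer_id[:4] == b"exbc" and peer_id[4:5] == b"\x00" and peer_id[5:6].isdigit():
        return int(peer_id[5:6])
    return None
```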
Many clients are using all random numbers or 12 zeroes followed by random numbers (like older versions of Bram's client).
Messages
All of the remaining messages in the protocol take the form of <length prefix><message ID><payload>. The length prefix is a four-byte big-endian value. The message ID is a single byte. The payload is message dependent.
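The general message form translates directly into a pair of Python helpers. The names are hypothetical. The length prefix counts the one-byte ID plus the payload, so a message with an n-byte payload has a length prefix of 1 + n.

```python
import struct


def encode_message(msg_id: int, payload: bytes = b"") -> bytes:
    # <length prefix><message ID><payload>; length covers the ID byte and payload.
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload


def decode_message(body: bytes):
    """Decode the bytes that follow the length prefix into (id, payload).

    An empty body is a keep-alive, returned as (None, b"").
    """
    if not body:
        return None, b""
    return body[0], body[1:]
```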
keep-alive: <len=0000>
The keep-alive message is a message with zero bytes, specified with the length prefix set to zero. There is no message ID and no payload.
choke: <len=0001><id=0>
The choke message is fixed-length and has no payload.
unchoke: <len=0001><id=1>
The unchoke message is fixed-length and has no payload.
interested: <len=0001><id=2>
The interested message is fixed-length and has no payload.
not interested: <len=0001><id=3>
The not interested message is fixed-length and has no payload.
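The keep-alive and the four payload-free state messages are constant byte strings. In this Python sketch the constant and function names are hypothetical, while the IDs and lengths come from the definitions above.

```python
import struct

KEEP_ALIVE = struct.pack(">I", 0)  # <len=0000>: no ID, no payload

# IDs of the fixed-length messages that carry no payload.
CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED = 0, 1, 2, 3


def simple_message(msg_id: int) -> bytes:
    # <len=0001><id>: a length prefix of 1 covers just the ID byte.
    return struct.pack(">IB", 1, msg_id)
```

For example, `simple_message(UNCHOKE)` yields the five bytes `00 00 00 01 01`.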
have: <len=0005><id=4><piece index>
The have message is fixed length. The payload is the zero-based index of a piece that has been successfully downloaded.
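A have message is the 4-byte length 5, the ID 4, and a 4-byte piece index. This Python sketch shows the encoding and decoding, using hypothetical helper names.

```python
import struct


def encode_have(piece_index: int) -> bytes:
    # <len=0005><id=4><piece index>, with a zero-based piece index.
    return struct.pack(">IBI", 5, 4, piece_index)


def decode_have_payload(payload: bytes) -> int:
    # The payload is a single four-byte big-endian piece index.
    (index,) = struct.unpack(">I", payload)
    return index
```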
bitfield: <len=0001+X><id=5><bitfield>
The bitfield message may only be sent immediately after the handshaking sequence is completed, and before any other messages are sent. It is optional, and need not be sent if a client has no pieces.
The bitfield message is variable length, where X is the length of the bitfield. The payload is a bitfield representing the pieces that have been successfully downloaded. The high bit in the first byte corresponds to piece index 0. Bits that are cleared indicate a missing piece, and set bits indicate a valid and available piece. Spare bits at the end are set to zero.
A bitfield of the wrong length is considered an error. Clients should drop the connection if they receive bitfields that are not of the correct size, or if the bitfield has any of the spare bits set.
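The bit ordering and validity rules above can be sketched in Python as follows, with hypothetical function names. The correct length is one bit per piece rounded up to whole bytes, that is, ceil(num_pieces / 8). Any set bit beyond the last piece is an error.

```python
def make_bitfield(have, num_pieces):
    # Piece 0 is the high bit of the first byte; spare bits at the end stay zero.
    field = bytearray((num_pieces + 7) // 8)
    for i in have:
        field[i // 8] |= 0x80 >> (i % 8)
    return bytes(field)


def bitfield_is_valid(field, num_pieces):
    # A wrong length or any spare bit set means the peer should be dropped.
    if len(field) != (num_pieces + 7) // 8:
        return False
    spare = len(field) * 8 - num_pieces
    return spare == 0 or (field[-1] & ((1 << spare) - 1)) == 0


def has_piece(field, i):
    return bool(field[i // 8] & (0x80 >> (i % 8)))
```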
request: <len=0013><id=6><index><begin><length>
The request message is fixed length, and is used to request a block. The payload contains the following information
--index: integer specifying the zero-based piece index
--begin: integer specifying the zero-based byte offset within the piece
--length: integer specifying the requested length. This value must not exceed 2^17 bytes, typical values are 2^15 bytes.
The observant reader will note that a block is typically smaller than a piece (which is commonly >= 2^18 bytes). A client should close the connection if it receives a request for more than 2^17 bytes.
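The following Python sketch encodes a request and applies the 2^17-byte limit. It also includes a hypothetical `block_offsets` helper that splits a piece into typical 2^15-byte requests, with the last block possibly shorter. The helper's splitting policy is an illustrative assumption, not something the text mandates.

```python
import struct

MAX_REQUEST = 2 ** 17   # larger requests should cause the connection to close
TYPICAL_BLOCK = 2 ** 15


def encode_request(index: int, begin: int, length: int = TYPICAL_BLOCK) -> bytes:
    # <len=0013><id=6><index><begin><length>: three 4-byte big-endian integers.
    if length > MAX_REQUEST:
        raise ValueError("request length exceeds 2^17 bytes")
    return struct.pack(">IBIII", 13, 6, index, begin, length)


def request_is_acceptable(payload: bytes) -> bool:
    # Receiver side: reject requests for more than 2^17 bytes.
    _index, _begin, length = struct.unpack(">III", payload)
    return length <= MAX_REQUEST


def block_offsets(piece_length: int, block: int = TYPICAL_BLOCK):
    # Hypothetical helper: (begin, length) pairs covering one piece.
    return [(b, min(block, piece_length - b)) for b in range(0, piece_length, block)]
```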
piece: <len=0009+X><id=7><index><begin><block>
The piece message is variable length, where X is the length of the block. The payload contains the following information
--index: integer specifying the zero-based piece index
--begin: integer specifying the zero-based byte offset within the piece
--block: block of data, which is a subset of the piece specified by index.
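For the piece message, the length prefix is 9 + X: one byte of ID, two 4-byte integers, and X bytes of block data. This Python sketch, with hypothetical names, builds and parses that layout.

```python
import struct


def encode_piece(index: int, begin: int, block: bytes) -> bytes:
    # <len=0009+X><id=7><index><begin><block>
    return struct.pack(">IBII", 9 + len(block), 7, index, begin) + block


def parse_piece_payload(payload: bytes):
    # payload = <index><begin><block>; the block is everything after 8 bytes.
    index, begin = struct.unpack_from(">II", payload, 0)
    return index, begin, payload[8:]
```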
cancel: <len=0013><id=8><index><begin><length>
The cancel message is fixed length, and is used to cancel block requests. The payload is identical to that of the "request" message. It is typically used during "End Game" (see the Algorithms section below).
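Because cancel reuses the request payload, only the ID changes. Here is a one-function Python sketch with a hypothetical name.

```python
import struct


def encode_cancel(index: int, begin: int, length: int) -> bytes:
    # Same layout as request (<len=0013>, three 4-byte integers) but ID 8.
    return struct.pack(">IBIII", 13, 8, index, begin, length)
```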
Algorithms
Super Seeding
(This was not part of the original specification)
The super-seed feature in S-5.5 and on is a new seeding algorithm designed to help a torrent initiator with limited bandwidth "pump up" a large torrent, reducing the amount of data it needs to upload in order to spawn new seeds in the torrent.
When a seeding client enters "super-seed mode", it does not act as a standard seed, but masquerades as a normal client with no data. As clients connect, it informs them that it has received a piece: either a piece that has never been sent, or, if all pieces have already been sent, a very rare one. This induces the client to attempt to download only that piece.
When the client has finished downloading the piece, the seed will not inform it of any other pieces until it has seen the piece it had sent previously present on at least one other client. Until then, the client will not have access to any of the other pieces of the seed, and therefore will not waste the seed's bandwidth.
This method has resulted in much higher seeding efficiencies, by inducing peers to take only the rarest data, reducing the amount of redundant data sent, and limiting the amount of data sent to peers which do not contribute to the swarm. Prior to this, a seed might have to upload 150% to 200% of the total size of a torrent before other clients became seeds. However, a large torrent seeded with a single client running in super-seed mode was able to do so after uploading only 105% of the data. This is 150-200% more efficient than when using a standard seed.
Super-seed mode is NOT recommended for general use. While it does assist in the wider distribution of rare data, because it limits the selection of pieces a client can download, it also limits the ability of those clients to download data for pieces they have already partially retrieved. Therefore, super-seed mode is only recommended for initial seeding servers.
Why not rename it to e.g. "Initial Seeding Mode" or "Releaser Mode" then?
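The following is a simplified Python sketch of the super-seed bookkeeping described above, not the S-5.5 implementation. It assumes that "very rare" can be approximated by "offered the fewest times so far". It treats a have message for a piece from a different peer as evidence that the piece has spread. All names are hypothetical.

```python
class SuperSeeder:
    """Hypothetical sketch of super-seed piece advertising."""

    def __init__(self, num_pieces):
        self.offer_count = [0] * num_pieces  # times each piece was advertised
        self.offered = {}                    # peer -> piece last advertised to it
        self.spread = set()                  # (peer, piece) seen on another peer

    def choose_piece(self):
        # Never-offered pieces (count 0) come first; otherwise the least offered.
        return min(range(len(self.offer_count)), key=lambda i: self.offer_count[i])

    def on_connect(self, peer):
        # Advertise exactly one piece (via a have message) to this peer.
        piece = self.choose_piece()
        self.offer_count[piece] += 1
        self.offered[peer] = piece
        return piece

    def on_have(self, peer, piece):
        # A have from someone else shows the piece we sent has spread.
        for other, offered in self.offered.items():
            if offered == piece and other != peer:
                self.spread.add((other, piece))

    def next_offer(self, peer):
        # Reveal nothing new until the previous piece appears on another client.
        piece = self.offered.get(peer)
        if piece is None or (peer, piece) not in self.spread:
            return None
        return self.on_connect(peer)
```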
Piece downloading strategy
Clients may choose to download pieces in random order.
A better strategy is to download pieces in rarest first order. The client can determine this by keeping the initial bitfield from each peer, and updating it with every have message. Then, the client can download the pieces that appear least frequently in these peer bitfields.
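Rarest-first ordering can be sketched in Python by counting, for each piece, how many peers have it. In this sketch, `peer_pieces` maps each peer to the set of piece indices built from its initial bitfield and updated on every have message. The function name and the index tie-break are assumptions.

```python
from collections import Counter


def rarest_first(my_pieces, peer_pieces):
    """Return the pieces we lack, ordered from least to most available.

    peer_pieces: peer -> set of piece indices (bitfield plus have messages).
    Ties are broken by piece index (an arbitrary choice for this sketch).
    """
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces)
    candidates = [i for i in counts if i not in my_pieces]
    return sorted(candidates, key=lambda i: (counts[i], i))
```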
End Game
When a download is almost complete, there's a tendency for the last few blocks to trickle in slowly. To speed this up, the client sends requests for all of its missing blocks to all of its peers. To keep this from becoming horribly inefficient, the client also sends a cancel to everyone else every time a block arrives.
There are no documented thresholds, recommended percentages, or block counts that could be used as a guide or Recommended Best Practice here.
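The two End Game steps can be sketched in Python as follows. Blocks are (index, begin, length) tuples. Two assumptions go beyond the text and are labeled here: missing blocks are requested only from peers that have the block's piece, and cancels go only to peers that still have that block outstanding. When End Game starts is left open, matching the note above. Function names are hypothetical.

```python
def end_game_requests(missing_blocks, peer_pieces):
    """Request every missing block from every peer that has its piece.

    Assumption: peers lacking the piece are skipped.
    """
    return {
        peer: [b for b in missing_blocks if b[0] in pieces]
        for peer, pieces in peer_pieces.items()
    }


def cancels_on_arrival(block, sender, outstanding):
    """Return the peers that should receive a cancel for this block.

    outstanding: peer -> set of blocks currently requested from it; the
    arrived block is removed from every peer's set.
    """
    targets = [p for p, reqs in outstanding.items() if p != sender and block in reqs]
    for reqs in outstanding.values():
        reqs.discard(block)
    return targets
```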
Choking and Optimistic Unchoking
Choking is done for several reasons. TCP congestion control behaves very poorly when sending over many connections at once. Also, choking lets each peer use a tit-for-tat-ish algorithm to ensure that they get a consistent download rate.
The choking algorithm described below is the currently deployed one. It is very important that all new algorithms work well both in a network consisting entirely of themselves and in a network consisting mostly of this one.
There are several criteria a good choking algorithm should meet. It should cap the number of simultaneous uploads for good TCP performance. It should avoid choking and unchoking quickly, known as 'fibrillation'. It should reciprocate to peers who let it download. Finally, it should try out unused connections once in a while to find out if they might be better than the currently used ones, known as optimistic unchoking.
The currently deployed choking algorithm avoids fibrillation by only changing choked peers once every ten seconds.
Reciprocation and number of uploads capping is managed by unchoking the four peers which have the best upload rate and are interested. This maximizes the client's download rate. These four peers are referred to as downloaders, because they are interested in downloading from the client.
Peers which have a better upload rate (as compared to the downloaders) but aren't interested get unchoked. If they become interested, the downloader with the worst upload rate gets choked. If a client has a complete file, it uses its upload rate rather than its download rate to decide which peers to unchoke.
For optimistic unchoking, at any one time there is a single peer which is unchoked regardless of its upload rate (if interested, it counts as one of the four allowed downloaders). Which peer is optimistically unchoked rotates every 30 seconds. Newly connected peers are three times as likely to start as the current optimistic unchoke as anywhere else in the rotation. This gives them a decent chance of getting a complete piece to upload.
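This is a simplified Python sketch of one rechoke round, intended to run every ten seconds, plus the 30-second optimistic pick. The caller supplies `rate`, which is assumed to hold whichever rate the text says to rank by: the peer's rate to us while downloading, or our upload rate to it once we have the complete file. Peers are taken in rate order, and every peer is unchoked until four interested downloaders are reached. As a result, faster uninterested peers are unchoked as well. An interested optimistic unchoke uses one of the four slots. New peers get triple weight via `random.choices`. All names are hypothetical.

```python
import random


def rechoke(peers, rate, interested, optimistic, slots=4):
    """Return the set of peers to unchoke; every other peer is choked."""
    unchoked, downloaders = set(), 0
    if optimistic is not None:
        unchoked.add(optimistic)
        if optimistic in interested:
            downloaders += 1  # an interested optimistic unchoke takes a slot
    for p in sorted(peers, key=lambda q: rate[q], reverse=True):
        if downloaders >= slots:
            break  # slower peers than the worst downloader stay choked
        unchoked.add(p)
        if p in interested and p != optimistic:
            downloaders += 1
    return unchoked


def pick_optimistic(choked, new_peers, rng=random):
    """Choose the next optimistic unchoke; new peers are 3x as likely."""
    choked = list(choked)
    if not choked:
        return None
    weights = [3 if p in new_peers else 1 for p in choked]
    return rng.choices(choked, weights=weights, k=1)[0]
```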
Anti-snubbing
Occasionally a BitTorrent peer will be choked by all peers which it was formerly downloading from. In such cases it will usually continue to get poor download rates until the optimistic unchoke finds better peers. To mitigate this problem, when over a minute goes by without getting a single piece from a particular peer, BitTorrent assumes it is "snubbed" by that peer and doesn't upload to it except as an optimistic unchoke. This frequently results in more than one concurrent optimistic unchoke, (an exception to the exactly one optimistic unchoke rule mentioned above), which causes download rates to recover much more quickly when they falter.
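The snubbing rule is a timeout check. In this Python sketch, snubbed peers are removed from the candidates for regular unchoking, so they can only be served as an optimistic unchoke. The 60-second constant follows "over a minute" in the text, and the function names are hypothetical.

```python
import time
from typing import Optional

SNUB_TIMEOUT = 60.0  # seconds without receiving a piece from a peer


def is_snubbed(last_piece_time: float, now: Optional[float] = None) -> bool:
    now = time.monotonic() if now is None else now
    return now - last_piece_time > SNUB_TIMEOUT


def regular_unchoke_candidates(peers, last_piece_time, now):
    # Snubbed peers are excluded here but remain eligible for optimistic unchoke.
    return [p for p in peers if not is_snubbed(last_piece_time[p], now)]
```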