Parsing BitTorrent Files with the Composite Pattern
Elements of a .torrent file
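BitTorrent is one of the most widely used file-sharing tools today. The protocol it uses is published on www.bittorrent.com, and that protocol document describes the encoding used by .torrent files: bencode. According to that document, a .torrent file is built from the following element types: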
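String: starts with a decimal number giving the length of the string in bytes, followed by ":" and then the string itself. For example, 4:spam represents "spam".
Integer: starts with "i" and ends with "e", with the value in between. For example, i3e represents 3.
List: starts with "l" and ends with "e", with the list elements in between. For example, l4:spam4:eggse represents ['spam', 'eggs'].
Dictionary: starts with "d" and ends with "e", with key/value pairs in between. For example, d3:cow3:moo4:spam4:eggse represents {'cow': 'moo', 'spam': 'eggs'}.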
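Lists and dictionaries can contain elements of any type, including each other: a dictionary value can be a list, and a list item can be a dictionary. This recursive containment is exactly what the Composite pattern models, which is why it was chosen here.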
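To make the four encodings concrete, here is a minimal sketch of the reverse direction: small helpers that produce bencode text. The function names (bencode_string, bencode_int, bencode_list, bencode_dict) are hypothetical and are not part of the parser described in this article. The dictionary helper assumes the caller passes keys in sorted order, which is what bencode expects.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Encode a byte string as "<length>:<bytes>".
std::string bencode_string(const std::string& s) {
    return std::to_string(s.size()) + ":" + s;
}

// Encode an integer as "i<value>e".
std::string bencode_int(long long v) {
    return "i" + std::to_string(v) + "e";
}

// Wrap already-encoded elements as a list: "l<elements>e".
std::string bencode_list(const std::vector<std::string>& encoded_items) {
    std::string out = "l";
    for (const std::string& item : encoded_items) out += item;
    return out + "e";
}

// Wrap (key, already-encoded value) pairs as a dictionary: "d<pairs>e".
// Keys are plain strings; this sketch assumes they arrive in sorted order.
std::string bencode_dict(
    const std::vector<std::pair<std::string, std::string>>& entries) {
    std::string out = "d";
    for (std::size_t i = 0; i < entries.size(); ++i) {
        assert(i == 0 || entries[i - 1].first < entries[i].first);
        out += bencode_string(entries[i].first);  // the key is always a string
        out += entries[i].second;                 // the value is any element
    }
    return out + "e";
}
```

Because list and dictionary helpers accept already-encoded elements, nesting falls out naturally: the result of bencode_list can be passed as a dictionary value, and vice versa.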
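Design based on the Composite pattern
Applying the Composite pattern to the structure of a .torrent file gives the design shown in the class diagram.
DictionaryHandler and ListHandler are the composites; what distinguishes them is that each owns a container of child handlers.
Handler is the interface class (for convenience, a few simple shared functions are implemented in this base class).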
class Handler
{
public:
    Handler(void);
    Handler(ContentType type) : m_type(type) {}
    virtual ~Handler(void);
    virtual int handle(istream& is) = 0;
    virtual void get_result(ostream& os) const = 0;
    ContentType get_type() const;
protected:
    // Verifies that the next character in the file stream is 'c';
    // if so, advances the stream by that one character.
    void check_one_char(istream& is, char c);
    ContentType m_type;
};
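The body of check_one_char is not listed in this article. The following is a minimal sketch of what such a helper could look like, written as a free function so that it compiles on its own. It throws std::runtime_error as a stand-in, because the definition of BitTorrentParserException is not shown here; both choices are assumptions, not the original implementation.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for Handler::check_one_char: verify that the next
// character is `expected` and consume it; otherwise leave the stream
// position untouched and throw.
void check_one_char(std::istream& is, char expected) {
    const int next = is.peek();
    if (next == std::istream::traits_type::eof() ||
        static_cast<char>(next) != expected) {
        throw std::runtime_error(std::string("expected '") + expected + "'");
    }
    is.get();  // consume the verified character
}

// Usage sketch: consume the leading 'd' of an empty dictionary "de".
void check_one_char_demo() {
    std::istringstream is("de");
    check_one_char(is, 'd');
    assert(is.peek() == 'e');  // the stream now sits on the closing 'e'
}
```

Peeking before reading means a mismatch consumes nothing, so the caller can still inspect the offending character.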
IntegerHandler implements this interface and adds a method that returns the parsed value:
class IntegerHandler : public Handler
{
public:
    IntegerHandler() : Handler(NUM) {}
    virtual int handle(istream& is);
    virtual void get_result(ostream& os) const;
    int get_value() const;
private:
    int m_value;
};
StringHandler implements the same interface:
class StringHandler : public Handler
{
public:
    StringHandler() : Handler(STRING) {}
    virtual int handle(istream& is);
    virtual void get_result(ostream& os) const;
    string get_value() const;
private:
    string m_value;
};
The composites differ in one respect: each holds a container.
class ListHandler : public Handler
{
public:
    ListHandler() : Handler(LIST) {}
    virtual ~ListHandler();
    virtual int handle(istream& is);
    virtual void get_result(ostream& os) const;
    //typedef list<Handler*> ItemList
    ItemList get_value() const;
private:
    // container of child handlers; this is what makes the class a composite
    ItemList m_items;
};
class DictionaryHandler : public Handler
{
public:
    DictionaryHandler() : Handler(DICT) {}
    virtual ~DictionaryHandler();
    virtual int handle(istream& is);
    virtual void get_result(ostream& os) const;
    //typedef map<string, Handler*> DictMap;
    DictMap get_value() const;
private:
    // container of child handlers; this is what makes the class a composite
    DictMap m_dicts;
};
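The get_result implementations are not listed in this article. The sketch below shows how a composite can delegate such a call to its children. It uses simplified, hypothetical types (Node, StringNode, ListNode) instead of the real handlers, and std::unique_ptr instead of the raw pointers used above, so that no hand-written destructor is needed. The brace-delimited output format is an assumption inferred from the announce-list line in the test results.

```cpp
#include <cassert>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Component interface: every element can write itself to a stream.
struct Node {
    virtual ~Node() = default;
    virtual void get_result(std::ostream& os) const = 0;
};

// Leaf: holds a single value.
struct StringNode : Node {
    explicit StringNode(std::string v) : value(std::move(v)) {}
    void get_result(std::ostream& os) const override { os << value; }
    std::string value;
};

// Composite: owns child nodes and forwards get_result to each of them.
struct ListNode : Node {
    void add(std::unique_ptr<Node> child) {
        assert(child != nullptr);
        items.push_back(std::move(child));
    }
    void get_result(std::ostream& os) const override {
        os << '{';
        for (const auto& item : items) item->get_result(os);
        os << '}';
    }
    std::vector<std::unique_ptr<Node>> items;
};

// Usage sketch: a list of two one-element lists.
std::string render_nested_list_demo() {
    ListNode outer;
    const char* names[] = {"a", "b"};
    for (const char* name : names) {
        std::unique_ptr<ListNode> inner(new ListNode);
        inner->add(std::unique_ptr<Node>(new StringNode(name)));
        outer.add(std::move(inner));
    }
    std::ostringstream os;
    outer.get_result(os);
    return os.str();
}
```

The caller never needs to know whether it holds a leaf or a composite; a single get_result call on the root walks the whole tree.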
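DictionaryHandler illustrates how a composite holds other elements in the Composite pattern. In a dictionary element the key is always a string (otherwise it would not be a dictionary),
so each iteration proceeds as follows:
1) Try to read the key with a StringHandler. If an exception is thrown, the parser has either met an element of another type or reached the end of the dictionary. The advantage of this approach is that the same code can also handle values.
If an exception was caught, first roll the file stream back, then check the next character:
2) If it is 'e', the dictionary ends.
3) If it is 'l', a list begins: create a ListHandler to read the next element and store the object in the container.
4) If it is 'i', an integer begins: create an IntegerHandler to read the next element and store the object in the container.
5) If it is 'd', a dictionary begins: create a DictionaryHandler to read the next element and store the object in the container.
6) Otherwise treat it as a string: create a StringHandler to read the next element and store the object in the container.
Steps 1) to 6) repeat until an 'e' is encountered.
7) Finally, return the number of characters consumed from the file stream.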
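The try-and-roll-back idea in step 1) can be sketched in isolation. The helper below, try_read_string, is hypothetical: it reports failure with a return value instead of an exception, and it parses the length prefix digit by digit rather than with atoi. It records the stream position, attempts to read a length-prefixed string, and restores the position if the attempt fails. It does not guard against absurdly large length prefixes.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Try to read "<length>:<bytes>" from the stream. On success, store the
// bytes in `out` and return true. On failure, restore the stream position
// and return false, so the caller can inspect the next character itself.
bool try_read_string(std::istream& is, std::string& out) {
    const std::istream::pos_type start = is.tellg();
    std::size_t len = 0;
    bool have_digit = false;
    while (std::isdigit(is.peek())) {
        len = len * 10 + static_cast<std::size_t>(is.get() - '0');
        have_digit = true;
    }
    if (have_digit && is.peek() == ':') {
        is.get();  // consume ':'
        std::string buf(len, '\0');
        is.read(&buf[0], static_cast<std::streamsize>(len));
        if (is.gcount() == static_cast<std::streamsize>(len)) {
            out.swap(buf);
            return true;
        }
    }
    is.clear();       // reset eof/fail flags first, otherwise seekg is a no-op
    is.seekg(start);  // roll back to where the attempt began
    return false;
}

// Usage sketch: read a key, then hit the closing 'e' of the dictionary.
void try_read_string_demo() {
    std::istringstream is("3:cowe");
    std::string key;
    assert(try_read_string(is, key) && key == "cow");
    assert(!try_read_string(is, key));  // 'e' is not a string
    assert(is.peek() == 'e');           // position was restored
}
```

Clearing the error flags before seeking matters: a stream with its failbit set ignores seekg, so the rollback would silently do nothing.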
The figure below shows the call sequence for a dictionary element:
Source code
The implementations of the handlers follow, starting with DictionaryHandler and ListHandler:
int DictionaryHandler::handle(istream& is)
{
    int pos_start = is.tellg();
    check_one_char(is, 'd');
    while(!is.eof())
    {
#ifdef _DEBUG
        cout << "now handle:" << hex << uppercase << is.peek() << endl;
#endif
        // The key of a dictionary is always a string, so try to read a
        // string key from the file stream. If that fails, restore the
        // stream position and check which other content type was met.
        string key = "";
        StringHandler key_handler;
        int pos_cur = is.tellg();
        try
        {
            key_handler.handle(is);
            key = key_handler.get_value();
        }
        catch(BitTorrentParserException& be)
        {
#ifdef _DEBUG
            cout << be.what() << endl;
#endif
            // Restore the position. This normally happens at the end of
            // the dictionary, or when the next element is not a string.
            // Clear the error flags first so that seekg can succeed.
            is.clear();
            is.seekg(pos_cur, ios::beg);
        }
        // from here on the dispatch is the same as in the list handler
        char c = is.peek();
        if(c == 'e')
        {
            // 'e' ends the dictionary
            Handler::check_one_char(is, c);
            break;
        }
        else if(c == 'l')
        {
            ListHandler* handler = new ListHandler();
            handler->handle(is);
            m_dicts.insert(make_pair(key, handler));
        }
        else if(c == 'd')
        {
            DictionaryHandler* handler = new DictionaryHandler();
            handler->handle(is);
            m_dicts.insert(make_pair(key, handler));
        }
        else if(c == 'i')
        {
            IntegerHandler* handler = new IntegerHandler();
            handler->handle(is);
            m_dicts.insert(make_pair(key, handler));
        }
        else
        {
            StringHandler* handler = new StringHandler();
            handler->handle(is);
            m_dicts.insert(make_pair(key, handler));
        }
    }
    return static_cast<int>(is.tellg()) - pos_start;
}
// Parses the content between 'l' and 'e'.
// @param is  input file stream
// @return the number of characters read
int ListHandler::handle(istream& is)
{
    // record the starting position in the file stream
    int pos_start = is.tellg();
    // skip the leading 'l'
    is.ignore(1, 'l');
    while(!is.eof())
    {
        // check the content type
        char c = is.peek();
        if(c == 'e')
        {
            // 'e' ends the list
            Handler::check_one_char(is, c);
            break;
        }
        else if(c == 'd')
        {
            // 'd': create a DictionaryHandler for the following content
            DictionaryHandler* handler = new DictionaryHandler();
            handler->handle(is);
            m_items.push_back(handler);
        }
        else if(c == 'i')
        {
            // 'i': create an IntegerHandler for the following content
            IntegerHandler* handler = new IntegerHandler();
            handler->handle(is);
            m_items.push_back(handler);
        }
        else if(c == 'l')
        {
            // 'l': create a ListHandler for the following content
            ListHandler* handler = new ListHandler();
            handler->handle(is);
            m_items.push_back(handler);
        }
        else
        {
            // in all other cases, use a StringHandler to read the content
            StringHandler* handler = new StringHandler();
            handler->handle(is);
            m_items.push_back(handler);
        }
    }
    return static_cast<int>(is.tellg()) - pos_start;
}
// Reads the length prefix before ':' and then the string itself.
// @param is  input file stream
// @return the number of characters read
int StringHandler::handle(istream& is)
{
    int pos_start = is.tellg();
    char buffer[32];
    memset(buffer, 0, sizeof(buffer));
    // read the length prefix that precedes ':'
    is.get(buffer, sizeof(buffer) - 1, ':');
    int content_length = atoi(buffer);
    // the length is never negative, and a result of zero is only valid
    // if the prefix really was "0"
    if(content_length <= 0 && buffer[0] != '0')
    {
        throw BitTorrentParserException(string(buffer), is.tellg(), "number is expected.");
    }
    if(content_length > 0)
    {
        // read the string content
        char* temp = new char[content_length + 1]; // room for a trailing '\0'
        is.ignore(1, ':'); // skip the ':'
        is.read(temp, content_length);
        m_value.assign(temp, content_length);
        delete[] temp; // the array form must match new[]
    }
    else
    {
        // empty string
        is.ignore(1, ':'); // skip the ':'
        m_value = "";
    }
    return static_cast<int>(is.tellg()) - pos_start;
}
// Parses the characters between 'i' and 'e'.
// @param is  input file stream
// @return the number of characters read
int IntegerHandler::handle(istream& is)
{
    int pos_start = is.tellg();
    char buffer[32];
    memset(buffer, 0, sizeof(buffer));
    is.ignore(1, 'i');
    is.get(buffer, sizeof(buffer) - 1, 'e');
    is.ignore(1, 'e');
    m_value = atoi(buffer);
    // This parser rejects negative values. Note that bencode itself
    // permits negative integers such as i-3e.
    if(m_value < 0)
    {
        throw BitTorrentParserException(string(buffer), pos_start, "value is less than zero");
    }
    return static_cast<int>(is.tellg()) - pos_start;
}
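The integer handler above relies on atoi, which cannot tell "0" apart from malformed input, and it rejects negative values. As a possible alternative, here is a sketch of a stricter reader that accepts negative integers and rejects leading zeros and "-0", which bencode treats as invalid. The function read_bencoded_int is hypothetical, and std::runtime_error again stands in for BitTorrentParserException.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>

// Read "i<digits>e" and return the value.
// Throws std::runtime_error on malformed input; std::stoll throws
// std::out_of_range if the value does not fit in long long.
long long read_bencoded_int(std::istream& is) {
    if (is.get() != 'i') throw std::runtime_error("expected 'i'");
    std::string digits;
    if (is.peek() == '-') digits += static_cast<char>(is.get());
    while (std::isdigit(is.peek())) digits += static_cast<char>(is.get());
    if (is.get() != 'e') throw std::runtime_error("expected 'e'");

    const bool negative = !digits.empty() && digits[0] == '-';
    const std::string magnitude = negative ? digits.substr(1) : digits;
    if (magnitude.empty()) throw std::runtime_error("no digits");
    if (magnitude.size() > 1 && magnitude[0] == '0')
        throw std::runtime_error("leading zero");
    if (negative && magnitude == "0")
        throw std::runtime_error("negative zero");
    return std::stoll(digits);
}

// Usage sketch: positive, negative, and zero values.
void read_bencoded_int_demo() {
    std::istringstream a("i3e"), b("i-3e"), c("i0e");
    assert(read_bencoded_int(a) == 3);
    assert(read_bencoded_int(b) == -3);
    assert(read_bencoded_int(c) == 0);
}
```

Collecting the digits into a std::string first keeps validation separate from conversion, and avoids the fixed 32-byte buffer.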
Test results
The following is the result of parsing "[TBox] Crash Test Dummies.torrent" (a test file downloaded from www.bittorrent.com, found by searching for "test"):
announce
http://tracker.torrentbox.com:2710/announce
announce-list
{{udp://tracker.torrentbox.com:2710/announce}{http://tracker.torrentbox.com:2710/announce}{http://redir.bthub.com/redir.php?hash=50c9e8d5fc98727b4bbc93cf5d64a68db647f04f}}
azureus_properties
dht_backup_enable 1
comment
comment.utf-8
created by
Azureus/2.3.0.4
creation date 1126823734
encoding
UTF-8
info ed2k
F7B6B02FFB7EA6ED8F6E39113492F89
files
{ed2k 8F142AEA7564F040DEC6742EDC6DA4FA
length
2512770
path
{Crash Test Dummies - Androgynous.mp3}
path.utf-8
{Crash Test Dummies - Androgynous.mp3}
sha1
5CAFF7DD717645277F39F41FAA5923D6BCDA1FF
ed2k
E27D8CED529F79E32148FE0A24E4AAC
length
3879079
path
{Crash Test Dummies - At My Funeral.mp3}
path.utf-8
{Crash Test Dummies - At My Funeral.mp3}
sha1
D0B9FFDD923B5E5D63EF0D07E59D6E5CB2FE74A
ed2k
4FB3BA5D10494CD9718E52C362FE9EB
… (remaining file entries omitted)
name
Crash Test Dummies
name.utf-8
Crash Test Dummies
piece length
65536
pieces
8B7F608981BCE9403AB9C8AC4DCE9D6959898264207E455AD320A762D462A9261954BF6A7329797BCA2C98156B234E49C08446A6638A… (remaining hash values omitted)
private
0
sha1
89DE5AD6F359524847BB9EC237812B44569637
modified-by
{TorrentBox.com}
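Summary
Parsing .torrent files is the first step in implementing BitTorrent. This article presented a solution based on the Composite pattern, which should be useful to programmers writing similar parsers. The code also makes heavy use of file-stream operations, so it can serve as a reference for working with streams.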