http://eprints.eemcs.utwente.nl/22286/01/imc140-drago.pdf
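The observations in this post are based on the paper linked above.
Not everyone may be familiar with Dropbox, so a brief introduction first: Dropbox is an online storage service that synchronizes local files to the cloud. It keeps files in sync automatically across multiple computers and operating systems, and it can also be used as a large network drive.
Before going further, one question: why should we care about Dropbox? As cloud computing frameworks increasingly come into the view of developers and users, the demands on file and data synchronization keep growing in both volume and quality. It is worth analyzing and learning from the data synchronization protocols that are popular in the industry.
Because the Dropbox protocol is not public, the paper analyzes it by intercepting SSL traffic. The most important points are noted below, one by one.
Distance has a significant impact on Dropbox performance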
We highlight that Dropbox performance is mainly driven by the distance between clients and storage data-centers.
In addition, short data transfers combined with a per-chunk acknowledgment mechanism badly hurt throughput
In addition, short data transfer sizes coupled with a per-chunk acknowledgment mechanism impair transfer throughput, which is as little as 530 kbits/s on average.
How to analyze TLS/SSL traffic
a Linux PC running the Dropbox client was instructed to use a Squid proxy server under our control. On the latter, the module SSL-bump was used to terminate SSL connections and save decrypted traffic flows. The memory area where the Dropbox application stores trusted certificate authorities was modified at run-time to replace the original Dropbox Inc. certificate by the self-signed one signing the proxy server.
Every uploaded chunk gets its own acknowledgment message
Each chunk store operation is acknowledged by one OK message.
Dropbox has three kinds of control protocols
(i) Notification, (ii) meta-data administration, and (iii) system-log servers.
Notification Protocol
The client keeps a long-lived TCP connection to notifyX.dropbox.com, and this notification connection is not encrypted. Over the long-lived connection it runs HTTP Comet, i.e. long polling.
Meta-data Information Protocol
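A typical synchronization starts with messages sent to the meta-data servers, followed by a batch of store or retrieve operations through Amazon servers. Once the chunks have been exchanged successfully, the client sends messages to the meta-data servers to conclude the transaction.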
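The three phases described above (meta-data message, batch of store operations, closing message) can be sketched with in-memory stand-ins. Everything named here is hypothetical: `MetaServer`, `StorageServer`, `begin`, `finish`, the tiny `CHUNK_SIZE`, and the idea that the meta-data server replies with the list of missing chunks are assumptions made for illustration, not the real Dropbox messages.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical tiny chunk size so the example stays readable


class MetaServer:
    """Hypothetical in-memory stand-in for the meta-data servers."""

    def __init__(self):
        self.known = set()   # chunk hashes the service already stores
        self.files = {}      # committed file name -> ordered chunk hashes
        self.pending = {}    # file name -> chunk hashes awaiting completion

    def begin(self, name, hashes):
        # Phase 1: open a transaction, report which chunks must be uploaded.
        self.pending[name] = list(hashes)
        return [h for h in dict.fromkeys(hashes) if h not in self.known]

    def finish(self, name):
        # Phase 3: conclude the transaction after the chunks were exchanged.
        hashes = self.pending.pop(name)
        self.known.update(hashes)
        self.files[name] = hashes
        return "OK"


class StorageServer:
    """Hypothetical in-memory stand-in for the storage servers."""

    def __init__(self):
        self.blocks = {}

    def store(self, digest, data):
        self.blocks[digest] = data
        return "OK"  # one acknowledgment per stored chunk


def sync_file(name, data, meta, storage):
    """Upload one file and return how many chunks had to be stored."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    hashes = [hashlib.sha256(c).hexdigest() for c in chunks]
    by_hash = dict(zip(hashes, chunks))
    missing = meta.begin(name, hashes)
    for digest in missing:
        # Phase 2: store each missing chunk and wait for its OK.
        if storage.store(digest, by_hash[digest]) != "OK":
            raise RuntimeError("store failed for " + digest)
    meta.finish(name)
    return len(missing)
```

Calling `sync_file("a.txt", b"abcdefgh", meta, storage)` on fresh servers stores two chunks; a later `sync_file("b.txt", b"abcdzzzz", meta, storage)` stores only one, because the first chunk is already known to the sketch's meta-data server.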
The synchronization protocol tends to produce small transfers
(i) the synchronization protocol sending and receiving file deltas as soon as they are detected; (ii) the primary use of Dropbox for synchronization of small files constantly changed, instead of periodic (large) backups.
The analysis finds that TCP slow start and acknowledgments have the largest impact on performance
Moreover, flows achieve lower throughput as the number of chunks increases. TCP start-up times and application-layer sequential acknowledgments are two major factors limiting the throughput, affecting flows with a small amount of data and flows with a large number of chunks, respectively. In both cases, the high RTT between clients and data-centers amplifies the effects.
Flows carrying a small amount of data are limited by TCP slow start-up times.
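To see why small flows are limited by slow start, here is a rough model that counts the round trips needed to deliver a flow. The assumptions are mine, not the paper's: an initial congestion window of 3 segments, a 1460-byte MSS, a window that doubles every RTT, no loss, unlimited link bandwidth, and no handshake cost.

```python
import math


def slow_start_rounds(nbytes, mss=1460, init_cwnd=3):
    """Round trips needed to send nbytes while the window doubles each RTT.

    mss and init_cwnd are illustrative assumptions, not measured values.
    """
    segments = math.ceil(nbytes / mss)
    cwnd, rounds = init_cwnd, 0
    while segments > 0:
        segments -= cwnd   # send one full window in this round trip
        cwnd *= 2          # idealized slow start: double per RTT
        rounds += 1
    return rounds


def effective_throughput_bps(nbytes, rtt_s, mss=1460, init_cwnd=3):
    """Average throughput when only slow start limits the transfer."""
    return nbytes * 8 / (slow_start_rounds(nbytes, mss, init_cwnd) * rtt_s)
```

Under these assumptions a 10 kB flow needs 2 round trips and a 1 MB flow needs 8, so at a 100 ms RTT the small flow averages about 0.4 Mbit/s while the large one reaches about 10 Mbit/s. The higher the RTT, the more each of those round trips costs.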
Flows with more than 1 chunk have the sequential acknowledgment scheme (Fig. 1) as a bottleneck, because the mechanism forces clients to wait one RTT (plus the server reaction time) between two storage operations.
Flows with more than 50 chunks, for instance, always last for more than 30s, regardless of their sizes. Considering the RTT in Campus 2, up to one third of that (5-10s) is wasted while application-layer acknowledgments are transiting the network.
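The cost of sequential acknowledgments can be estimated with a back-of-the-envelope model: each chunk is sent, then the client idles for one RTT (plus server reaction time) before the next chunk. The function name and the numbers in the usage note (100 ms RTT, 10 Mbit/s link, 100 kB chunks) are illustrative assumptions, not measurements from the paper.

```python
def sequential_transfer(chunk_bytes, rtt_s, bandwidth_bps, server_reaction_s=0.0):
    """Return (total_seconds, idle_seconds) for stop-and-wait chunk uploads.

    chunk_bytes is a list of chunk sizes in bytes. Every chunk is followed
    by one acknowledgment wait of rtt_s + server_reaction_s.
    """
    sending = sum(size * 8 / bandwidth_bps for size in chunk_bytes)
    idle = len(chunk_bytes) * (rtt_s + server_reaction_s)
    return sending + idle, idle


def average_throughput_bps(chunk_bytes, rtt_s, bandwidth_bps, server_reaction_s=0.0):
    """Payload bits divided by the total stop-and-wait transfer time."""
    total, _ = sequential_transfer(chunk_bytes, rtt_s, bandwidth_bps, server_reaction_s)
    return sum(chunk_bytes) * 8 / total
```

With 50 chunks of 100 kB, a 100 ms RTT and a 10 Mbit/s link, the model gives about 4 s of sending and about 5 s of waiting for acknowledgments, so more than half of the flow's lifetime is idle and the average throughput falls to under half the link rate.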
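Finally, the authors give their recommendations for optimizing Dropbox transfers
Namely:
1. Set a minimum chunk size, to cut down on synchronizing large numbers of small chunks
2. Use delayed acknowledgments and pipeline the chunks, to reduce the network idle time caused by sequential acknowledgments
3. Place storage close to users, to reduce transfer latency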
Our measurements clearly indicate that the application-layer protocol in combination with large RTT penalizes the system performance. We identify three possible solutions to remove the identified bottlenecks:
1. Bundling smaller chunks, increasing the amount of data sent per storage operation. Dropbox 1.4.0, announced in April 2012, implements a bundling mechanism, which is analyzed in the following;
2. Using a delayed acknowledgment scheme in storage operations, pipelining chunks to remove the effects of sequential acknowledgments;
3. Bringing storage servers closer to customers, thus improving the overall throughput.
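The three recommendations can be compared in a simple timing model where the only difference between strategies is how many acknowledgment waits the client pays. This is a sketch under stated assumptions: `bundle`, `upload_time`, the 1 MB bundle limit, and the link parameters are hypothetical, and pipelining is idealized as paying a single RTT for the whole flow.

```python
def bundle(chunk_bytes, max_bundle_bytes):
    """Greedily group consecutive chunks into bundles up to a size limit."""
    bundles, current, size = [], [], 0
    for chunk in chunk_bytes:
        if current and size + chunk > max_bundle_bytes:
            bundles.append(current)
            current, size = [], 0
        current.append(chunk)
        size += chunk
    if current:
        bundles.append(current)
    return bundles


def upload_time(chunk_bytes, rtt_s, bandwidth_bps, ack_waits):
    """Sending time plus one RTT of idle time per acknowledgment wait."""
    return sum(chunk_bytes) * 8 / bandwidth_bps + ack_waits * rtt_s


def compare(chunk_bytes, rtt_s, bandwidth_bps, max_bundle_bytes):
    """Estimated upload time in seconds for each strategy."""
    bundles = bundle(chunk_bytes, max_bundle_bytes)
    return {
        # One acknowledgment wait per chunk (the behavior being criticized).
        "sequential": upload_time(chunk_bytes, rtt_s, bandwidth_bps, len(chunk_bytes)),
        # Recommendation 1: one acknowledgment wait per bundle.
        "bundled": upload_time(chunk_bytes, rtt_s, bandwidth_bps, len(bundles)),
        # Recommendation 2: idealized pipelining, acknowledgments overlap sending.
        "pipelined": upload_time(chunk_bytes, rtt_s, bandwidth_bps, 1),
    }
```

For 50 chunks of 100 kB on a 10 Mbit/s link with a 100 ms RTT and a 1 MB bundle limit, the model gives roughly 9 s sequential, 4.5 s bundled, and 4.1 s pipelined. Recommendation 3 corresponds to lowering `rtt_s`, which shrinks the acknowledgment term in every strategy.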