1. Overview
Sector/Sphere was created by Dr. Yunhong Gu in 2006 and is now maintained by a group of open source developers. It is available from: http://sector.sourceforge.net/
Sector: distributed file system
Sphere: parallel data processing framework
In some benchmark tests, Sector/Sphere has been reported to be about twice as fast as Hadoop.
2. Sector
Sector system architecture:
Features:
1. Unlike Hadoop, Sector does not split user files into blocks. Instead, every Sector slice is stored as a single file in the native file system
2. Sector runs an independent security server. This design allows different security service providers to be deployed. In addition, multiple Sector masters can use the same security service
3. Topology aware and application aware
4. Uses UDP for message passing and UDT for data transfer
Replication:
1. Provides software-level fault tolerance (no hardware RAID is required)
2. By default, all files are replicated to a specified number of copies
3. By default, a replica is created on the furthest node
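The "furthest node" rule can be illustrated with a small sketch. It assumes each node is identified by a hypothetical topology path such as /site/rack/host, and that distance is the number of hops through the nearest common ancestor. Neither the path format nor the function names come from Sector itself.

```python
def topology_distance(a: str, b: str) -> int:
    """Hops between two nodes via their nearest common ancestor in the topology tree."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def pick_replica_node(source: str, candidates: list[str]) -> str:
    """Return the candidate furthest from the source (ties broken by name)."""
    others = [c for c in candidates if c != source]
    if not others:
        raise ValueError("no candidate node other than the source")
    # sorted() first, so max() returns the alphabetically first of any tied nodes
    return max(sorted(others), key=lambda c: topology_distance(source, c))
```

With nodes in two racks at one site and one node at a second site, a file stored at the first site would be replicated to the second site, so that losing a whole rack or site does not lose every copy.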
UDT:
A high performance data transfer protocol designed for transferring large volumetric datasets over high speed wide area networks. Such settings are typically disadvantageous for the more common TCP protocol.
UDT uses UDP to transfer bulk data and adds its own reliability control and congestion control mechanisms. On such networks it can transfer data at a much higher speed than TCP does.
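The idea of adding reliability on top of an unreliable datagram service can be sketched as below. This is not UDT's actual algorithm: the lossy channel is simulated in memory instead of using real UDP sockets, the receiver's loss reports are assumed to arrive reliably, and congestion control is left out entirely. All class and function names are hypothetical.

```python
import random


class LossyChannel:
    """Stand-in for UDP: delivers each datagram with probability 1 - loss_rate."""

    def __init__(self, loss_rate: float, seed: int = 0):
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)

    def send(self, datagram):
        # Return None to model a dropped datagram
        return None if self.rng.random() < self.loss_rate else datagram


def reliable_transfer(data: bytes, channel: LossyChannel,
                      chunk_size: int = 4, max_rounds: int = 1000):
    """Send data in numbered packets and retransmit whatever the receiver is missing."""
    packets = {seq: data[i:i + chunk_size]
               for seq, i in enumerate(range(0, len(data), chunk_size))}
    received = {}
    missing = set(packets)   # receiver-side loss list
    sent = 0
    rounds = 0
    while missing:
        if rounds == max_rounds:
            raise RuntimeError("transfer did not complete")
        rounds += 1
        for seq in sorted(missing):
            sent += 1
            delivered = channel.send((seq, packets[seq]))
            if delivered is not None:
                received[delivered[0]] = delivered[1]
        # Simplification: the loss report reaches the sender without loss
        missing = set(packets) - set(received)
    payload = b"".join(received[seq] for seq in sorted(received))
    return payload, sent
```

The sequence numbers let the receiver reassemble the data in order and tell the sender exactly which packets to resend, which is the part of the job that plain UDP leaves to the application.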
Limitations:
1. File size is limited by the available space on individual storage nodes
2. Users may need to split their datasets into files of suitable sizes
3. Sector is designed to provide high throughput on large datasets, rather than extremely low latency on small files
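Because Sector stores each file whole, a large dataset may have to be split before upload. The following is a minimal sketch of one way to do that for line-oriented data, cutting only at line ends so that no record is broken across pieces. The function name and the .partNNNN naming scheme are assumptions for illustration and are not part of the Sector tools.

```python
import os


def split_by_lines(path: str, max_bytes: int, out_dir: str) -> list[str]:
    """Split a file into pieces of at most max_bytes each, cutting only at line ends."""
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(path)
    pieces, out, written = [], None, 0
    try:
        with open(path, "rb") as src:
            for line in src:
                if len(line) > max_bytes:
                    raise ValueError("a single record is larger than max_bytes")
                # Start a new piece when the current one cannot hold this line
                if out is None or written + len(line) > max_bytes:
                    if out is not None:
                        out.close()
                    piece = os.path.join(out_dir, f"{base}.part{len(pieces):04d}")
                    out = open(piece, "wb")
                    pieces.append(piece)
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return pieces
```

Choosing max_bytes well below the free space of the smallest storage node keeps every piece storable, while keeping pieces large preserves the high-throughput behaviour Sector is designed for.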
3. Sphere
Sphere is a parallel data processing engine integrated in Sector. It can be used to process data stored in Sector in parallel.
Sphere uses a stream processing computing paradigm. A stream is an abstraction in Sphere that represents either a dataset or a part of a dataset. (A Sector dataset consists of one or more physical files.)
This figure illustrates how sphere processes the segments in a stream.
SPE: Sphere Processing Engine
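The stream model can be sketched in a few lines: a dataset is cut into segments, and a user-defined function is applied to each segment by a pool of workers. This is only an analogy, not the Sphere API. Threads stand in for SPEs, an in-memory list stands in for a Sector dataset, and every name below is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_segments(records, segment_size):
    """Cut a dataset (a list of records) into fixed-size segments."""
    return [records[i:i + segment_size]
            for i in range(0, len(records), segment_size)]


def run_stream(records, udf, num_spes=4, segment_size=3):
    """Apply udf to every segment using a pool of workers that stand in for SPEs."""
    segments = make_segments(records, segment_size)
    with ThreadPoolExecutor(max_workers=num_spes) as spes:
        # map() hands segments to idle workers and returns results in segment order
        return list(spes.map(udf, segments))


def count_words(segment):
    """Example user-defined function: number of words in one segment of text lines."""
    return sum(len(line.split()) for line in segment)
```

For example, run_stream(lines, count_words, num_spes=2, segment_size=2) yields one word count per segment, and summing those partial counts gives the total for the whole dataset.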
4. References
Sector and Sphere: The Design and Implementation of a High Performance Data Cloud
http://sector.sourceforge.net/
http://en.wikipedia.org/wiki/Sector/Sphere
http://dongxicheng.org/mapreduce/streaming-mapreduce-sphere/
http://en.wikipedia.org/wiki/UDP-based_Data_Transfer_Protocol
http://udt.sourceforge.net/