1. MongoDB (Consistency + Partition Tolerance):
MongoDB is a distributed document database that provides strong overall performance, high availability, and easy scalability.
64-bit machine.
Enough RAM (the most important resource for MongoDB).
MongoDB has a general-purpose design, making it appropriate for a large number of use cases. Examples include content management systems, mobile applications, gaming, e-commerce, analytics, archiving, and logging.
1. Performance (Generally Best among NoSQL Stores): Because of its memory-mapped design, its C++ implementation, its own BSON protocol, and its update-in-place handling of deletes and updates, MongoDB performs very well compared with other NoSQL stores. It is designed largely with performance in mind. However, reads are sometimes blocked while a write is in progress, which is a real limitation.
2. Availability (Good): Replica sets provide high availability for MongoDB. MongoDB can replicate over a “noisy” connection, but if the network connections among the nodes in the replica set are very slow, the members of the set might not be able to keep up with replication.
3. Database-level lock (Important Limitation): Only one writer may modify a given database at a time. A write operation blocks any later read or write operation on that database until it finishes.
4. Space pre-allocation (Important Limitation): MongoDB’s data files always look large because of its pre-allocation strategy. When adding or updating a field in a document, the entire document must be rewritten. If you pre-allocate space for each document, you can avoid the associated fragmentation, but even with pre-allocation, updating a document gets slower as it grows.
5. Durability (Always Dogged by Data-Loss Rumors): After enabling journaling, you can make writes durable. It is only possible on a 64-bit machine, but its durability always seems weaker than CouchDB’s. In fact, MongoDB is persistently dogged by rumors of data loss.
6. Scalability (Good): Sharding lets you spread the workload over more machines, while replica sets provide high availability. Data is partitioned across shards by ranges. Sharding a well-designed database multiplies the benefits of having good queries, good drives, and good working sets.
7. Consistency (Very Good): A MongoDB auto-sharding + replication cluster has one master server for each shard at any given point in time. It sacrifices availability for consistency. Because of its master-slave architecture, clients can write data only to the master, while reads can be performed on the master or on any slave. Since each shard has only one master, write conflicts do not occur.
8. Large memory consumption (Important Limitation): Since MongoDB keeps a memory map and retrieves data directly from memory, many people worry about its huge memory usage. However, the official documentation disputes this: it is certainly possible to run MongoDB on a machine with a small amount of free RAM. MongoDB automatically uses all free memory on the machine as its cache. System resource monitors show that MongoDB uses a lot of memory, but its usage is dynamic. If another process suddenly needs half the server’s RAM, MongoDB will yield cached memory to the other process.
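Item 6 above says that data is partitioned across shards by ranges. The following is a minimal Python sketch of that idea only, not MongoDB's actual router: the split points, the shard names, and the function `shard_for` are all hypothetical and chosen for illustration.

```python
import bisect

# Hypothetical split points (assumption, not from MongoDB):
# keys < "g" go to shard0, "g" <= key < "p" to shard1, the rest to shard2.
SPLIT_POINTS = ["g", "p"]
SHARDS = ["shard0", "shard1", "shard2"]


def shard_for(shard_key: str) -> str:
    """Return the shard whose key range contains shard_key."""
    # bisect_right counts how many split points are <= shard_key,
    # which is the index of the range that owns the key.
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, shard_key)]


if __name__ == "__main__":
    for key in ["alice", "george", "zoe"]:
        print(key, "->", shard_for(key))
```

Because neighbouring keys land on the same shard, a range query touches few shards, but a monotonically increasing shard key would send every new write to the last range.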
2. CouchDB (Availability + Partition Tolerance):
Apache CouchDB is a scalable, fault-tolerant, and schema-free document-oriented database written in Erlang. It is used in large and small organizations for a variety of applications where a traditional SQL database isn't the best solution for the problem at hand.
More RAM gives better performance, but CouchDB is quite memory-efficient compared with MongoDB.
1. Performance (Very Good): CouchDB implements a form of Multi-Version Concurrency Control (MVCC) in order to avoid the need to lock the database file during writes. In MongoDB, by contrast, a delete or update operation directly overwrites the previous data. For this reason, MongoDB can get better read/write performance than CouchDB.
2. Consistency (Important Limitation): CouchDB cannot provide strict consistency, only eventual consistency. It accepts this trade-off in order to provide both better availability and partition tolerance. Eventual consistency means that what you read is not necessarily your latest write.
3. Durability (Always Better than MongoDB): CouchDB uses a crash-only design to provide generally better reliability and durability than MongoDB. “Crash-only” means that whenever the server crashes and is restarted, the data on disk is still consistent. This is a big benefit when recovering from a crash.
4. Scalability (Good): CouchDB uses replication to get horizontal scalability. MongoDB uses sharding to get horizontal scalability and uses replica sets to get availability. For a distributed database, the practical difference is small.
5. Availability (Big Highlight): In short, CouchDB is all about availability. CouchDB falls into the AP camp (Availability and Partition Tolerance). By comparison, MongoDB falls into the CP camp (Consistency and Partition Tolerance).
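Item 2 above describes eventual consistency: a read may not reflect the latest write. The toy Python simulation below illustrates that behaviour under simplifying assumptions. The `Replica` class and the `replicate` function are hypothetical names for this sketch; it does not model CouchDB's real replication protocol or its revision-based conflict handling.

```python
class Replica:
    """A toy node holding its own copy of the data."""

    def __init__(self):
        self.data = {}
        self.pending = []  # writes accepted locally but not yet sent to peers

    def write(self, key, value):
        # The write succeeds locally right away (availability) ...
        self.data[key] = value
        # ... and is queued for later replication.
        self.pending.append((key, value))

    def read(self, key):
        return self.data.get(key)


def replicate(source, target):
    """Push the queued writes from source to target, in order."""
    for key, value in source.pending:
        target.data[key] = value
    source.pending.clear()


a, b = Replica(), Replica()
a.write("color", "blue")
stale = b.read("color")   # replication has not run yet, so b knows nothing
replicate(a, b)
fresh = b.read("color")   # after replication the two replicas agree
print(stale, fresh)
```

The window between the write on `a` and the call to `replicate` is exactly the period in which a reader on `b` sees old data.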
3. Redis (Not a Ready-to-Use Distributed System):
Redis is an open-source, BSD-licensed, advanced key-value store. It is often referred to as a data structure server, since values can be strings, hashes, lists, sets, and sorted sets.
Not a Ready-to-Use System (Not Suitable for Quick Deployment): Redis is not a ready-to-use distributed datastore like MongoDB and CouchDB. In general, it provides only quite basic components for building a distributed system, leaving many things for the client to implement, including sharding and master-slave replication. There is an ongoing project called Redis Cluster, whose purpose is to provide a minimalistic ready-to-use distributed system, but it still lacks many things and is not yet usable in production.
All Data in Memory (Important Limitation): Redis loads all data from disk into memory. So, for a single Redis node, the data on it should not be larger than its memory size. If our data size is 100 GB and each server has 10 GB of memory, then we will need at least 10 servers, and in practice more.
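The sizing arithmetic above can be written as a short Python sketch. The function `min_redis_nodes` and its `usable_fraction` parameter are assumptions introduced here for illustration; the 80% figure in the second call is an arbitrary example, not a Redis recommendation.

```python
import math


def min_redis_nodes(data_gb: float, ram_per_node_gb: float,
                    usable_fraction: float = 1.0) -> int:
    """Smallest node count whose combined usable RAM can hold the data set."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    # Hypothetical headroom factor: the share of RAM actually available for data.
    usable_per_node = ram_per_node_gb * usable_fraction
    return math.ceil(data_gb / usable_per_node)


lower_bound = min_redis_nodes(100, 10)          # every byte of RAM holds data
with_headroom = min_redis_nodes(100, 10, 0.8)   # assume only 80% is usable
print(lower_bound, with_headroom)
```

With 100 GB of data and 10 GB per server, the strict lower bound is 10 nodes; leaving 20% of each server free raises it to 13.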
4. Cassandra (Availability + Partition Tolerance):
Cassandra is a highly scalable, eventually consistent, versioned, distributed, structured key-value store.
The most stable version of Java 1.6.
4 GB of RAM is the minimum per node according to the official documentation.
For raw hardware, 8-core boxes are the current price/performance sweet spot. If you're running on virtualized machines, consider using a provider such as Rackspace Cloud Servers that allows CPU bursting.
At least 2 disks: one to keep CommitLogDirectory on, the other to use for DataFileDirectories. The CommitLogDirectory disk should be fast enough, and the DataFileDirectories disk should be both large and fast enough.
1. Performance (Reads Slower than Writes, but Reads Are Still Good): The most distinctive trait of Cassandra is that its reads are slower than its writes, so it is not the best fit for a read-intensive system. Like CouchDB, Cassandra’s storage engine only appends updated data; it never has to rewrite or re-read existing data. Thus, updates to a Cassandra row or partition stay fast as your dataset grows. It has very good support for writes: it is always writable, even in a failure scenario.
2. Availability (Great Highlight): Compared with HBase, which is a CP (Consistency + Partition Tolerance) system, Cassandra values AP (Availability and Partition Tolerance). Cassandra always highlights its high availability. Data is automatically replicated to multiple nodes for fault tolerance. Replication across multiple data centers is supported. Failed nodes can be replaced with no downtime.
3. Consistency (Same as CouchDB: Eventually Consistent): Like CouchDB, Cassandra provides only eventual consistency. That means Cassandra cannot guarantee that a reader will immediately see what a writer has written.
4. Durability (Append-Only Updates Give Good Durability and Latency): Cassandra provides durability by appending writes to a commit log first; this commit log is flushed to disk asynchronously. This is a good strategy for balancing durability and latency.
5. Scalability (Big Highlight): Cassandra highlights its incremental scalability. Cassandra meets the requirements of an ideal horizontally scalable system by allowing for seamless addition of nodes. As you need more capacity, you add nodes to the cluster and the cluster will utilize the new resources automatically.
6. Fully Distributed (Easy Management): Every Cassandra machine handles a proportionate share of every activity in the system. There are no special cases like the HDFS namenode or the MongoDB mongos that require special treatment or special hardware to avoid becoming a bottleneck.
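Items 1 and 4 above describe an append-only write path: a write is appended to a commit log first and existing data is never rewritten. The Python sketch below illustrates that pattern in a single process. `TinyStore` and its methods are hypothetical names; this is not Cassandra's implementation, and it does not model the asynchronous flush to disk, SSTables, or compaction.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy write path: append to a commit log, then update an in-memory table."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}
        self._replay()

    def _replay(self):
        # After a restart, rebuild the in-memory state from the log.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.memtable[record["key"]] = record["value"]

    def put(self, key, value):
        # 1. Append the mutation to the log; existing bytes are never rewritten.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
        # 2. Only then apply it in memory.
        self.memtable[key] = value

    def get(self, key):
        return self.memtable.get(key)


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
store = TinyStore(log_path)
store.put("user:1", "alice")
store.put("user:1", "alicia")      # an update is just another appended record

recovered = TinyStore(log_path)    # simulate a restart by replaying the log
with open(log_path, encoding="utf-8") as log:
    log_lines = len(log.readlines())
print(recovered.get("user:1"), log_lines)
```

The cost of a write does not depend on how much data already exists, which matches the claim that updates stay fast as the dataset grows; the price is that the latest value has to be reconstructed from several records at read time.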
5. HBase (Consistency + Partition Tolerance):
Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable. It is designed specifically for very big data.
Well suited to very large-scale data and to high rates of row-level updates, e.g., a messaging system.
At least 5 datanodes for HDFS
HDFS
64-bit system
Java 6
1. Consistency (Very Strong, Its Highlight): HBase is not an "eventually consistent" datastore. On the contrary, it is nearly fully consistent: after you write something, the modification is immediately visible to all clients. This makes it very suitable for tasks such as high-speed counter aggregation. This strong consistency comes from its master-slave architecture, in contrast to Cassandra’s P2P architecture.
2. HDFS-Based (Replication, End-to-End Checksums, and Automatic Rebalancing): HBase is built on top of HDFS.
3. Complexity (Yes, It Is Somewhat Complex): HBase is more complex than the other systems; you need Hadoop, ZooKeeper, RegionServers, primary/secondary HBase Masters, and so on.
4. Scalability (Good): It has linear and modular scalability, with nearly no difference from Cassandra. It is very easy to scale out.
5. Other Features: automatic load balancing and failover, compression support, multiple shards per server, etc.
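Item 1 above mentions high-speed counter aggregation as a task that benefits from strong consistency. The Python sketch below shows why a single owner per row makes counters exact: every increment is serialized in one place, so no update is lost. `RowOwner` is a hypothetical class that uses threads to stand in for concurrent clients; it does not use or imitate the HBase client API.

```python
import threading


class RowOwner:
    """Toy stand-in for the single server that owns a set of rows."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}

    def increment(self, row, amount=1):
        # All increments for a row pass through this one lock, so each
        # read-modify-write is atomic with respect to the others.
        with self._lock:
            self._counters[row] = self._counters.get(row, 0) + amount
            return self._counters[row]

    def get(self, row):
        with self._lock:
            return self._counters.get(row, 0)


owner = RowOwner()


def client(n):
    for _ in range(n):
        owner.increment("page:home")


threads = [threading.Thread(target=client, args=(1000,)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

total = owner.get("page:home")
print(total)
```

In an eventually consistent store, two replicas could each accept an increment against the same old value, and the counter would then need extra machinery to merge them correctly.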