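NoSQL Non-Relational Databases: Study Notes (Part 1)
Back in 2008, the only databases I knew were the mainstream commercial and open-source ones: DB2, Oracle, MS SQL Server, Sybase, MySQL, PostgreSQL, and Firebird. While picking up knowledge online, I suddenly ran into a wave of unfamiliar names: SQLite, Memcached, FastDB, MongoDB, Solr, Redis, HBase, Cassandra, Teradata, Hive, CouchDB, and so on. It left me quite confused.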
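Let me try to sort things out step by step. First, databases divide into relational databases (Relational DBMS) and non-relational (NoSQL) databases.
>> Relational DBMS include:
DB2, Oracle, MS SQL Server, Sybase, MySQL, PostgreSQL, Firebird, SQLite, Teradata.
>> Non-relational (NoSQL) databases include:
Memcached, FastDB, MongoDB, Solr, Redis, HBase, Cassandra,
Hive, CouchDB, Neo4j, Riak
(SQLite and Teradata were new names to me, but both are SQL-based relational systems, so they belong in the first list. Solr is a search platform and Hive is a SQL-like data warehouse layer on Hadoop; I group them with NoSQL only loosely.)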
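A simple comparison of use cases:
Strengths of relational databases: structured data, normalized models, ACID transactions.
Strengths of NoSQL: performance, scalability, schema flexibility, and analytics capabilities. NoSQL is the better fit when:
a. the data stored is semi-structured or unstructured in nature;
b. the application requires a certain level of performance and scalability;
c. the applications that access the data can tolerate eventual consistency.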
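To make "semi-structured data" concrete, here is a minimal sketch in plain Python. The product records, the `find` helper, and the `field_names` helper are all hypothetical names invented for illustration; they do not correspond to any particular database's API. The point is only that each record carries its own set of fields, with no shared table schema.

```python
import json

# Hypothetical product records: each document carries only the fields it needs.
documents = [
    {"_id": 1, "name": "laptop", "specs": {"ram_gb": 16, "cpu": "x86-64"}},
    {"_id": 2, "name": "t-shirt", "sizes": ["S", "M", "L"]},
    {"_id": 3, "name": "e-book", "download_url": "https://example.com/book"},
]


def find(docs, **criteria):
    """Return documents whose top-level fields equal every given criterion."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


def field_names(docs):
    """Collect the sorted union of top-level field names across all documents."""
    names = set()
    for d in docs:
        names.update(d)
    return sorted(names)


matches = find(documents, name="t-shirt")
all_fields = field_names(documents)
serialized = json.dumps(documents[0])  # documents map naturally onto JSON
```

In a relational table, `sizes`, `specs`, and `download_url` would each need a nullable column or a separate table; here a missing field is simply absent.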
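Capabilities that NoSQL databases typically support:
a. schema flexibility
b. shared-nothing architecture
c. sharding as part of the data storage model
d. asynchronous replication
e. BASE instead of ACID transactions.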
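The sharding, asynchronous replication, and BASE items above can be sketched together in a few lines. This is a toy model under stated assumptions, not how any real product works: `Shard`, `Cluster`, and the explicit `replicate()` step are hypothetical, and a real system would replicate in the background over the network.

```python
import hashlib
from collections import deque


class Shard:
    """One primary copy plus one replica that is updated asynchronously."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = deque()  # writes acknowledged but not yet replicated

    def put(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))  # the replica has not seen this yet

    def replicate(self):
        """Apply queued writes to the replica and return how many were applied."""
        applied = 0
        while self.pending:
            key, value = self.pending.popleft()
            self.replica[key] = value
            applied += 1
        return applied


class Cluster:
    """Routes each key to a shard by hashing it (shared-nothing: no shard sees another's data)."""

    def __init__(self, n_shards):
        self.shards = [Shard() for _ in range(n_shards)]

    def shard_for(self, key):
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        self.shard_for(key).put(key, value)

    def read_primary(self, key):
        return self.shard_for(key).primary.get(key)

    def read_replica(self, key):
        return self.shard_for(key).replica.get(key)


cluster = Cluster(n_shards=3)
cluster.put("user:42", "alice")
stale = cluster.read_replica("user:42")      # replica lags behind the primary
cluster.shard_for("user:42").replicate()
fresh = cluster.read_replica("user:42")      # replica has caught up
```

The gap between `stale` and `fresh` is what "eventual consistency" means in practice: a read from a replica may miss a recent write, but it converges once replication runs.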
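Further classification:
Second, databases can also be divided into disk-based and in-memory databases.
Disk-based databases include
>> Relational DBMS: all of those listed above
>> NoSQL: MongoDB
In-memory databases include
>> NoSQL: Memcached, FastDB, Redis
(SQLite stores its data in a disk file by default; it runs purely in memory only when opened in its in-memory mode.)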
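The disk versus memory distinction is easy to observe with SQLite through Python's standard `sqlite3` module. The sketch below is only an illustration; the table name `kv`, the file name `demo.db`, and the helper `write_and_reopen` are my own choices. It writes one row, closes the connection, reopens the same target, and counts the rows that survived.

```python
import os
import sqlite3
import tempfile


def write_and_reopen(path):
    """Write one row, close the connection, reopen the path, and count rows."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    conn.execute("INSERT INTO kv VALUES ('greeting', 'hello')")
    conn.commit()
    conn.close()

    conn = sqlite3.connect(path)
    # Recreate the table if it is gone, so the count query always succeeds.
    conn.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    count = conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]
    conn.close()
    return count


with tempfile.TemporaryDirectory() as tmp:
    disk_rows = write_and_reopen(os.path.join(tmp, "demo.db"))  # file on disk

memory_rows = write_and_reopen(":memory:")  # private in-memory database
```

With a file path the row is still there after reopening; with `":memory:"` every new connection starts from an empty database, so the data is gone once the first connection closes.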
Further classification:
By implementation, NoSQL databases fall into the following categories:
NoSQL Database Adoption Trends
by Srini Penchikala on Jul 23, 2013
UPDATE Aug 08 2013: The following new NoSQL database options were added today, after user request and feedback: GridGain, GigaSpaces, Tibco, and MarkLogic.
UPDATE Jul 25 2013: The following options were added today, after user request: Oracle Coherence, Terracotta BigMemory, Couchbase, and Oracle NoSQL Database.
NoSQL databases have been getting a lot of attention over the last few years for their performance, scalability, schema flexibility and analytics capabilities. While relational databases are still a good choice for certain use cases - like structured data and applications that require ACID transactions - NoSQL databases are better suited for use cases where:
· The data stored is semi-structured or unstructured in nature
· The applications that access this data require a certain level of performance and scalability
· The applications that access this data are ok with eventual consistency
Non-relational databases typically support the following capabilities:
· Schema flexibility
· Shared nothing architecture
· Sharding as part of the data storage model
· Asynchronous replication
· BASE instead of ACID Transactions
InfoQ would like to learn what NoSQL databases you are currently using or planning on using in your applications.
Document Databases
· MongoDB: MongoDB is an open-source document oriented database.
· CouchDB: Apache CouchDB is a database that uses JSON for documents, JavaScript for MapReduce queries, and HTTP for an API.
· Couchbase: NoSQL document database based on JSON model.
· RavenDB: RavenDB is a document-oriented database based on .NET language.
· MarkLogic: MarkLogic NoSQL database is used to store XML-based, document-centric information. It supports schema flexibility.
· Other Document Database
Graph Databases
· Neo4j: Neo4j is a property graph database; supports ACID transactions.
· InfiniteGraph: Graph database used to persist and traverse relationships between objects; supports distributed data stores.
· AllegroGraph: AllegroGraph is a graph database that uses memory utilization in combination with disk-based storage for scalability, supports SPARQL, RDFS++, and Prolog reasoning.
· Other Graph Database
Key Value Data Stores
· Riak: Riak is an open source, distributed key value database, supports data replication and fault-tolerance.
· Redis: Redis is an open source key-value store. Supports master-slave replication, transactions, Pub/Sub, Lua scripting, Keys with a limited time-to-live.
· Dynamo: Dynamo is a key-value distributed data store. It is directly implemented as Amazon DynamoDB; used in Amazon S3 product.
· Oracle NoSQL Database: Key-value NoSQL database from Oracle. It supports ACID transactions and JSON.
· Voldemort: Distributed key-value storage system with the data replication and partitioning.
· Aerospike: Aerospike database is a key-value store; supports hybrid memory architecture and data integrity with strong or tunable consistency.
· Other Key Value Data Store
Columnar Databases
· Cassandra: Cassandra is a column database that supports data replication across multiple data centers. Its data model offers column indexes, log-structured updates, support for denormalization, materialized views, and built-in caching.
· HBase: Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable. It provides Bigtable-like capabilities on top of Hadoop and HDFS.
· Amazon SimpleDB: Amazon SimpleDB is a non-relational data store that offloads the work of database administration. Developers store and query data items using web services requests.
· Apache Accumulo: Apache Accumulo is a sorted, distributed key/value data store based on Google's BigTable design and built on top of Apache Hadoop, Zookeeper, and Thrift technologies.
· Hypertable: Hypertable is an open source, scalable database, also modeled after Bigtable; supports sharding.
· Azure Tables: Windows Azure Table Storage Service offers NoSQL capabilities for applications that require storage of large amounts of unstructured data. Tables can auto-scale to store up to several terabytes of data. They are accessible via REST and managed APIs.
· Other Columnar Database
In-Memory Data Grids
· Hazelcast: Hazelcast CE is an open source data distribution platform. It allows the developers to share and partition the data across the database cluster.
· Oracle Coherence: Oracle's in-memory data grid solution that provides fast access to frequently used data. Coherence supports event capabilities and dynamic partitioning of data.
· Terracotta BigMemory: Distributed in-memory management solution from Terracotta. The product includes an Ehcache interface, Terracotta Management Console and BigMemory-Hadoop Connector (early access).
· GemFire: VMware vFabric GemFire is a distributed data management platform and provides elastic in-memory data management, replication, partitioning, data-aware routing, and continuous querying.
· Infinispan: Infinispan is a Java based open source key/value NoSQL datastore and distributed data grid platform. It supports transactions and peer-to-peer as well as client/server architecture.
· GridGain: Distributed, object-based, in-memory, SQL+NoSQL key-value database. Supports ACID transactions.
· GigaSpaces: GigaSpaces in-memory data grid (the Space) serves as the system of record for the applications and supports a variety of caching scenarios.
· Tibco: ActiveSpaces product from Tibco provides an infrastructure to create virtual data caches from the aggregate memory of participating nodes in the cluster and to scale as nodes join and leave.
· Other In-Memory Data Grid
(Chinese translation of the article above: "一网打尽2013最常用的NoSQL数据库", http://blog.chedushi.com/archives/7306)
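>> Document databases
a. MongoDB: open-source and document-oriented; also the most popular NoSQL database at the moment.
b. CouchDB: Apache CouchDB is a document database that uses JSON for documents, JavaScript for MapReduce queries, and HTTP for its API.
c. Couchbase: a NoSQL document database based on the JSON model.
d. RavenDB: RavenDB is a document-oriented database built on .NET.
e. MarkLogic: the MarkLogic NoSQL database stores XML-based, document-centric information and supports schema flexibility.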
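>> Graph databases
a. Neo4j: Neo4j is a property graph database; it supports ACID transactions (atomicity, consistency, isolation, durability).
b. InfiniteGraph: a graph database used to persist and traverse relationships between objects; supports distributed data stores.
c. AllegroGraph: AllegroGraph is a graph database that combines memory with disk-based storage for scalability; it supports SPARQL, RDFS++, and Prolog reasoning.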
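"Persisting and traversing relationships between objects" is the core of the graph model. The sketch below shows the idea with a tiny in-memory property graph and a breadth-first traversal. The node names, edge types, and the `reachable` function are hypothetical; real graph databases expose this through their own query languages rather than through code like this.

```python
from collections import deque

# Hypothetical property graph: nodes carry properties, edges carry a type.
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob": {"label": "Person", "age": 25},
    "carol": {"label": "Person", "age": 41},
    "acme": {"label": "Company"},
}
edges = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "WORKS_AT", "acme"),
]


def neighbors(node, edge_type):
    """Nodes reached from `node` by one outgoing edge of the given type."""
    return [dst for src, typ, dst in edges if src == node and typ == edge_type]


def reachable(start, edge_type, max_depth):
    """Breadth-first traversal following one edge type for up to max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(node, edge_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, depth + 1))
    return found


friends = reachable("alice", "KNOWS", max_depth=1)
friends_of_friends = reachable("alice", "KNOWS", max_depth=2)
```

The equivalent "friends of friends" question in a relational schema needs one self-join per hop, which is why relationship-heavy workloads are the usual argument for a graph store.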
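>> Key-value data stores
a. Riak: Riak is an open-source, distributed key-value database that supports data replication and fault tolerance.
b. Redis: Redis is an open-source key-value store. It supports master-slave replication, transactions, Pub/Sub, Lua scripting, and keys with a limited time-to-live.
c. Dynamo: Dynamo is a distributed key-value data store. It is directly implemented as Amazon DynamoDB and is used in the Amazon S3 product.
d. Oracle NoSQL Database: a key-value NoSQL database from Oracle. It supports ACID transactions (atomicity, consistency, isolation, durability) and JSON.
e. Voldemort: a distributed key-value storage system with data replication and partitioning.
f. Aerospike: the Aerospike database is a key-value store; it supports a hybrid memory architecture and protects data integrity with strong or tunable consistency.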
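The key-value model with "keys with a limited time-to-live" can be illustrated with a small in-process store. This is a sketch of the concept only: `TTLStore` and its methods are hypothetical and are not the Redis API, and expiry here happens lazily on read, which is just one possible strategy. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time


class TTLStore:
    """Toy key-value store whose keys can expire after a time-to-live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expiry time or None)

    def set(self, key, value, ttl=None):
        """Store a value; if ttl (seconds) is given, the key expires after it."""
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        """Return the value, or default if the key is missing or expired."""
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # lazy expiry: drop the key when it is read
            return default
        return value


# Usage with a fake clock (a one-element list we can advance by hand).
now = [0.0]
store = TTLStore(clock=lambda: now[0])
store.set("session:1", "token-abc", ttl=10)
store.set("config", "permanent")
before = store.get("session:1")
now[0] = 10.0                      # ten seconds later
after = store.get("session:1")
```

Everything is addressed by key alone; there are no joins and no secondary queries, which is what keeps this model simple to partition and replicate.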
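>> Columnar databases
a. Cassandra: Cassandra is a column database that supports data replication across multiple data centers. Its data model offers column indexes, log-structured updates, support for denormalization, materialized views, and built-in caching.
b. HBase: Apache HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable. It provides Bigtable-like capabilities on top of Hadoop and HDFS.
c. Amazon SimpleDB: Amazon SimpleDB is a non-relational data store that offloads the work of database administration. Developers store and query data items using web service requests.
d. Apache Accumulo: Apache Accumulo is a sorted, distributed key/value data store based on Google's BigTable design and built on top of Apache Hadoop, Zookeeper, and Thrift.
e. Hypertable: Hypertable is an open-source, scalable database, also modeled after Bigtable; it supports sharding.
f. Azure Tables: Windows Azure Table Storage Service offers NoSQL capabilities for applications that need to store large amounts of unstructured data. Tables can auto-scale to several terabytes and are accessible via REST and managed APIs.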
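The Bigtable-style stores above share a data model that is easier to see in code than in prose: rows sorted by key, each cell addressed by a column name and holding several timestamped versions. The class below is a rough sketch of that model under my own assumptions; `WideColumnTable`, its methods, and the `family:qualifier` column naming are illustrative and are not the HBase or Accumulo client API.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy sorted, versioned table: row key -> column -> timestamped values."""

    def __init__(self, max_versions=3):
        self.max_versions = max_versions
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, value, timestamp):
        """Add a new version of a cell, keeping only the newest max_versions."""
        cells = self._rows[row][column]
        cells.append((timestamp, value))
        cells.sort(key=lambda cell: cell[0], reverse=True)
        del cells[self.max_versions:]

    def get(self, row, column, versions=1):
        """Return up to `versions` values of a cell, newest first."""
        if row not in self._rows:
            return []
        return [value for _, value in self._rows[row].get(column, [])[:versions]]

    def scan(self, start_row, stop_row):
        """Return row keys in sorted order within [start_row, stop_row)."""
        return [r for r in sorted(self._rows) if start_row <= r < stop_row]


table = WideColumnTable(max_versions=2)
table.put("user1", "info:email", "a@example.com", timestamp=1)
table.put("user1", "info:email", "b@example.com", timestamp=2)
table.put("user1", "info:email", "c@example.com", timestamp=3)
table.put("user2", "info:email", "d@example.com", timestamp=1)

latest = table.get("user1", "info:email")
history = table.get("user1", "info:email", versions=5)
in_range = table.scan("user1", "user2")
```

Because rows are kept in key order, range scans are cheap, and because different rows may hold entirely different columns, the table stays sparse.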
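>> In-memory data grids
a. Hazelcast: Hazelcast CE is an open-source data distribution platform. It lets developers share and partition data across the database cluster.
b. Oracle Coherence: Oracle's in-memory data grid solution, which provides fast access to frequently used data. Coherence supports event capabilities and dynamic partitioning of data.
c. Terracotta BigMemory: a distributed in-memory management solution from Terracotta. The product includes an Ehcache interface, the Terracotta Management Console, and a BigMemory-Hadoop Connector.
d. GemFire: VMware vFabric GemFire is a distributed data management platform and data grid that provides elastic in-memory data management, replication, partitioning, data-aware routing, and continuous querying.
e. Infinispan: Infinispan is a Java-based, open-source key/value NoSQL data store and distributed data grid platform. It supports transactions and both peer-to-peer and client/server architectures.
f. GridGain: a distributed, object-based, in-memory, SQL+NoSQL key-value database. It supports ACID transactions.
g. GigaSpaces: the GigaSpaces in-memory data grid can serve as the system of record for applications and supports a variety of caching scenarios.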