CouchDB:
Quote:
Apache CouchDB is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API. Among other features, it provides robust, incremental replication with bi-directional conflict detection and resolution, and is queryable and indexable using a table-oriented view engine with JavaScript acting as the default view definition language.
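CouchDB is a distributed document database operated in a REST style: every operation on CouchDB has to be performed through an HTTP request. A traditional database passes messages through a DB adapter instead. So it is fair to say that the relative efficiency of HTTP requests and of traditional DB adapter requests affects, to some extent, how CouchDB performs compared with a traditional database.
Take inserts as an example: for every record inserted into CouchDB, the client has to issue one PUT request, which CouchDB then accepts and processes. Below I run a test of exactly that, comparing the insert speed of CouchDB and MySQL. Inserts into both MySQL and CouchDB are issued from the client, and MySQL and CouchDB are installed on the same server. Elapsed time is recorded with benchmark.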
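To make the "one PUT per record" point concrete, here is a minimal sketch of what a single insert looks like at the HTTP level, using Ruby's standard Net::HTTP. The helper name build_put is hypothetical and is not part of the test script; it only constructs the request, and the commented line at the end shows how it would be sent.

```ruby
require 'net/http'
require 'json'

# Hypothetical helper: build the PUT request that stores one document.
# CouchDB addresses a document as /<database>/<document id>.
def build_put(db, doc_id, doc)
  req = Net::HTTP::Put.new("/#{db}/#{doc_id}")
  req['Content-Type'] = 'application/json'
  req.body = JSON.generate(doc)
  req
end

req = build_put('couchdbvsmysql', 1, 'user_id' => '1', 'user_name' => 'user42')
puts "#{req.method} #{req.path}"
puts req.body

# Sending it costs one full HTTP round trip per document, for example:
# Net::HTTP.start('10.2.226.133', 5984) { |http| http.request(req) }
```

Each record therefore pays for request construction, a network round trip and response parsing, which is the overhead this test is trying to measure.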
Test environment: one server, one client.
server: a virtual machine carved out of a DELL 1950, cpu: 2, mem: 2GB
mysql: Server version: 5.0.67-0ubuntu6 (Ubuntu)
couchdb: Apache CouchDB 0.8.0-incubating (LogLevel=info)
File system:
Filesystem  Type   Size   Used  Avail  Use%  Mounted on
/dev/sda1   ext3   85G    1.5G  80G    2%    /
tmpfs       tmpfs  1013M  0     1013M  0%    /lib/init/rw
varrun      tmpfs  1013M  356K  1012M  1%    /var/run
varlock     tmpfs  1013M  0     1013M  0%    /var/lock
udev        tmpfs  1013M  2.6M  1010M  1%    /dev
tmpfs       tmpfs  1013M  0     1013M  0%    /dev/shm
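client: an ordinary desktop PC
Test scenarios:
1. Insert 1000 records into MySQL and into CouchDB and compare the elapsed time.
2. Insert 10000 records into MySQL and into CouchDB and compare the elapsed time.
The test code is as follows: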
require 'rubygems'
require 'benchmark'
require 'couchdb'                      # provides the Couch::Server class used below
require 'active_record'
require 'active_record/vendor/mysql'
require 'pp'

# MySQL connection
ActiveRecord::Base.establish_connection(
  :adapter  => "mysql",
  :encoding => "utf8",
  :database => "couchdbvsmysql",
  :username => "root",
  :password => "******",
  :host     => "1.2.3.4")

#A database must be named with all lowercase characters (a-z), digits (0-9), or any of the _$()+-/ characters
#and must end with a slash in the URL. The name has to start with characters.
server = Couch::Server.new("10.2.226.133", "5984")
server.put("/couchdbvsmysql/", "")     # create the CouchDB database

seq1 = 1
seq2 = 1
print "-----------------------------------------------------------------\n"
print " Insert Records \n"
print "-----------------------------------------------------------------\n"

# Builds one INSERT statement per call; seq1 is incremented on every call.
sql = proc {<<SQL
insert into udb_user(user_id,user_name,creator)
values("#{(seq1+=1).round}","user#{(rand*10**10).round}","CuiZheng#{seq1}")
SQL
}

# Builds one JSON document per call; seq2 is incremented on every call.
json = proc {<<JSON
{"user_id":"#{seq2+=1}","user_name":"user#{(rand*10**10).round}","creator":"CuiZheng#{seq2}"}
JSON
}

# Returns a proc that runs x single-row INSERTs against MySQL.
InsertIntoMysql = proc do |x|
  proc do
    x.times { ActiveRecord::Base.connection.execute(sql.call.gsub("\n", " ")) }
  end
end

# Returns a proc that issues x HTTP PUT requests, one document each.
# The URL is built before json.call increments seq2, so document N holds user_id N+1.
InsertIntoCouchdb = proc do |x|
  proc do
    x.times { server.put("/couchdbvsmysql/#{seq2}", json.call.gsub("\n", " ")) }
  end
end

# 10**4 is scenario 2; use 10**3 for scenario 1.
Benchmark.bm(25) do |x|
  x.report("InsertIntoMysql", &(InsertIntoMysql.call(10**4)))
  x.report("InsertIntoCouchdb", &(InsertIntoCouchdb.call(10**4)))
end
Scenario 1 (results for inserting 1000 records):
-----------------------------------------------------------------
Insert Records
-----------------------------------------------------------------
                               user     system      total        real
InsertIntoMysql            0.171000   0.062000   0.233000 (  7.297000)
InsertIntoCouchdb          1.922000   1.219000   3.141000 ( 27.969000)
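As you can see, CouchDB's inserts really are slower than MySQL's, and by a wide margin: 27.969 s versus 7.297 s of real time, roughly a fourfold (about 3.8x) difference.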
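The gap can be worked out directly from the real-time column. The snippet below is only an arithmetic sketch over the two numbers reported for scenario 1; the constant and method names are mine.

```ruby
# Real (wall-clock) seconds copied from the scenario 1 report.
RECORDS = 1000
REAL_SECONDS = { 'mysql' => 7.297, 'couchdb' => 27.969 }

# Inserts per second for a given record count and elapsed time.
def throughput(records, seconds)
  records / seconds
end

ratio = REAL_SECONDS['couchdb'] / REAL_SECONDS['mysql']
puts format('CouchDB / MySQL real time: %.2fx', ratio)

REAL_SECONDS.each do |name, secs|
  puts format('%-8s %.1f inserts/s, %.2f ms per insert',
              name, throughput(RECORDS, secs), secs * 1000 / RECORDS)
end
```

That works out to about 137 inserts per second for MySQL against about 36 for CouchDB in this setup.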
Scenario 2 (results for inserting 10000 records):
-----------------------------------------------------------------
Insert Records
-----------------------------------------------------------------
                               user     system      total        real
InsertIntoMysql            2.187000   0.547000   2.734000 ( 12.953000)
InsertIntoCouchdb D:/ruby/lib/ruby/1.8/net/http.rb:560:in `initialize': Only one usage of each socket address (protocol/network address/port) is normally permitted. - connect(2) (Errno::EADDRINUSE)
from D:/ruby/lib/ruby/1.8/net/http.rb:560:in `open'
from D:/ruby/lib/ruby/1.8/net/http.rb:560:in `connect'
from D:/ruby/lib/ruby/1.8/timeout.rb:53:in `timeout'
from D:/ruby/lib/ruby/1.8/timeout.rb:93:in `timeout'
from D:/ruby/lib/ruby/1.8/net/http.rb:560:in `connect'
from D:/ruby/lib/ruby/1.8/net/http.rb:553:in `do_start'
from D:/ruby/lib/ruby/1.8/net/http.rb:542:in `start'
from D:/ruby/lib/ruby/1.8/net/http.rb:440:in `start'
... 7 levels...
from F:/MySummary/MyRails/RubyApplication1/lib/CouchdbVsMysqlPerf.rb:54
from D:/ruby/lib/ruby/1.8/benchmark.rb:177:in `benchmark'
from D:/ruby/lib/ruby/1.8/benchmark.rb:207:in `bm'
from F:/MySummary/MyRails/RubyApplication1/lib/CouchdbVsMysqlPerf.rb:52
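I ran it many times, and every run failed with the "Only one usage of each socket address (protocol/network address/port) is normally permitted." exception. My guess is that some request took a long time and did not release the client's socket, perhaps because CouchDB could not keep up for a moment. Note, though, that the error is raised in connect on the client side, and the trace shows each put going through Net::HTTP start and connect, that is, opening a new connection per request. So it may equally be the Windows client running out of local ports while closed sockets linger in TIME_WAIT. Neither explanation has been verified here.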
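If the failure does come from opening a new TCP connection for every request, one thing worth trying is to send all PUTs over a single persistent connection. The following is a minimal sketch, assuming the server keeps HTTP/1.1 connections alive; put_docs is a hypothetical helper and not part of Couch::Server.

```ruby
require 'net/http'
require 'json'

# Hypothetical helper: PUT every document over one already-open connection.
# `http` is anything that responds to #request, normally a started Net::HTTP.
# Returns the list of HTTP status codes, one per document.
def put_docs(http, db, docs)
  docs.each_with_index.map do |doc, i|
    req = Net::HTTP::Put.new("/#{db}/#{i + 1}")
    req['Content-Type'] = 'application/json'
    req.body = JSON.generate(doc)
    http.request(req).code
  end
end

# Usage against a real server (not executed here):
# docs = (1..10_000).map { |n| { 'user_id' => n.to_s, 'creator' => "CuiZheng#{n}" } }
# Net::HTTP.start('10.2.226.133', 5984) do |http|
#   put_docs(http, 'couchdbvsmysql', docs)   # the block form reuses one socket
# end
```

With the block form of Net::HTTP.start the socket is opened once and closed when the block ends, so the client should need one local port instead of one per record.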
A note on benchmark:
lgn21st wrote:
The information shown in the benchmark report is:
This report shows the user CPU time, system CPU time, the sum of the user and system CPU times, and the elapsed real time. The unit of time is seconds.
In other words, the four columns are user, system, total and real, all in seconds.
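The same four numbers can be read programmatically. Here is a small sketch using Benchmark.measure from Ruby's standard library; the workload and the method name measure_loop are arbitrary choices for illustration.

```ruby
require 'benchmark'

# Time a small CPU-bound loop and return the Benchmark::Tms result.
def measure_loop(iterations)
  Benchmark.measure { iterations.times { |i| i * i } }
end

tms = measure_loop(200_000)
puts format('user:   %.6f', tms.utime)   # CPU time spent in user mode
puts format('system: %.6f', tms.stime)   # CPU time spent in the kernel
puts format('total:  %.6f', tms.total)   # sum of the CPU times
puts format('real:   %.6f', tms.real)    # elapsed wall-clock time
```

In the results above, real is far larger than total for both databases, which means the client spent most of the elapsed time waiting on the server rather than computing.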
After the run you can check CouchDB's state from a browser:
List all CouchDB databases:
http://10.2.226.133:5984/_all_dbs
The browser shows:
["couchdbvsmysql"]
Show the status of the couchdbvsmysql database:
http://10.2.226.133:5984/couchdbvsmysql
The browser shows:
{"db_name":"couchdbvsmysql","doc_count":1000,"doc_del_count":0,"update_seq":1000,"compact_running":false,"disk_size":2376834}
This is the result after the 1000-record run.
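The same status check can be scripted instead of done in a browser. This is a rough sketch: db_info and summarize are hypothetical helper names, and the sample string is the response quoted above, pasted in so the example runs without a server.

```ruby
require 'json'
require 'net/http'

# Hypothetical helper: fetch the database status document over HTTP.
def db_info(host, port, db)
  JSON.parse(Net::HTTP.get(host, "/#{db}", port))
end

# Reduce a status hash to the document count and average bytes on disk per document.
def summarize(info)
  { 'docs' => info['doc_count'],
    'bytes_per_doc' => info['disk_size'].to_f / info['doc_count'] }
end

sample = '{"db_name":"couchdbvsmysql","doc_count":1000,"doc_del_count":0,' \
         '"update_seq":1000,"compact_running":false,"disk_size":2376834}'
p summarize(JSON.parse(sample))

# Against the live server (not executed here):
# p summarize(db_info('10.2.226.133', 5984, 'couchdbvsmysql'))
```

For the 1000-record run this comes to roughly 2.3 KB on disk per document.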
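From the results above, my casual take is that CouchDB's insert speed really is slower than MySQL's. As a distributed database, the first things CouchDB should address are: tuning the speed of its Erlang HTTP server so that it does not fail to keep up the way it did when inserting 10000 records (my guess is that responses on the server side took too long), and then finding a file system that suits CouchDB (I would expect the disk read/write speed of a document database to depend heavily on the type of file system). This test ran against a single CouchDB instance, so it cannot be compared with a database that is already deployed in distributed form; I will test again once a later CouchDB version supports distributed deployment.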