To compare the small-file upload performance of Swift against that of FastDFS, we ran the following performance tests.
1.1 Test environment:
For how the FastDFS cluster was built, see [FastDFS Distributed File System, Part 1]: Setup, Deployment, and Configuration. The storage node's data disk is a 6.4 TB device (sdb) mounted at /mnt/xfsd.
Test files were generated in two ways: (1) files with sizes drawn at random between 80 KB and 180 KB (see the generator below); (2) files that are all exactly 133 KB.
File generation script:
#!/usr/bin/python
from random import randint
import os
import sys

# Usage: <script> <data_dir> <file_count>
data_dir = sys.argv[1]
n = int(sys.argv[2])

if not os.path.exists(data_dir):
    os.makedirs(data_dir)

# Write n files, each filled with 80-180 KB of random bytes.
for x in range(0, n):
    with open("%s/file_%d" % (data_dir, x), 'wb') as fout:
        fout.write(os.urandom(1024 * randint(80, 180)))
In Python, os.urandom(n) returns a string of n random bytes.
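The script above covers only the random-size case. The fixed-size generator is not shown in the original; a minimal sketch for the all-133-KB variant, assuming the same naming scheme:

#!/usr/bin/python
import os
import sys

# Hypothetical fixed-size variant: every file is exactly 133 KB.
data_dir = sys.argv[1]
n = int(sys.argv[2])

if not os.path.exists(data_dir):
    os.makedirs(data_dir)

for x in range(0, n):
    with open("%s/file_%d" % (data_dir, x), 'wb') as fout:
        fout.write(os.urandom(1024 * 133))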
Upload test scripts were written with fastdfs-python-sdk (https://github.com/hay86/fdfs_client-py). Files are uploaded in two modes, serial and parallel:
Serial upload: call the upload API on the files one after another until every file has been uploaded.
Parallel upload: start multiple processes that upload files simultaneously, each process uploading a batch of files.
Serial test script:
#!/usr/local/bin/python2.7
import os
import time
import sys

try:
    from fdfs_client.client import *
    from fdfs_client.exceptions import *
except ImportError:
    import_path = os.path.abspath('../')
    sys.path.append(import_path)
    from fdfs_client.client import *
    from fdfs_client.exceptions import *

if __name__ == '__main__':
    starttime = time.time()
    filenumbers = 100000  # number of files to upload
    size_total = 0        # accumulated bytes uploaded
    client = Fdfs_client('/opt/fdfs_client-py/fdfs_client/client.conf')
    try:
        for i in range(filenumbers):
            filename = '/data/files/small/smallfile' + str(i)
            size_total += os.path.getsize(filename)
            client.upload_by_filename(filename)
    except Exception, e:
        print "error" + str(e)
    endtime = time.time()
    print "%d bytes have been stored into the fdfs." % size_total
    print "%f seconds for sequential processing." % (endtime - starttime)
    print "speed is %f KB/s" % (size_total / 1024.0 / (endtime - starttime))
Parallel test script:
#!/usr/local/bin/python2.7
import os
import time
import sys
import multiprocessing
from multiprocessing import Process

try:
    from fdfs_client.client import *
    from fdfs_client.exceptions import *
except ImportError:
    import_path = os.path.abspath('../')
    sys.path.append(import_path)
    from fdfs_client.client import *
    from fdfs_client.exceptions import *

client = Fdfs_client('/opt/fastdfs/fdfs_client-py/fdfs_client/client.conf')

def uploadfile(begin, end, t_time, t_count, t_size, lock):
    try:
        for idx in range(begin, end):
            filename = '/data/files/small-10w/smallfile' + str(idx)
            # Upload each file 5 times: 100000 files x 5 = 500000 uploads.
            for y in range(5):
                starttime = time.time()
                ret = client.upload_by_filename(filename)
                endtime = time.time()
                # 'Upload successed.' is the literal status string
                # returned by fdfs_client-py.
                if ret['Status'] != 'Upload successed.':
                    os.system('echo upload fail >> log')
                else:
                    os.system('echo upload success >> log')
                # The counters are shared across processes, so
                # updates must hold the lock.
                with lock:
                    t_count.value += 1
                    t_time.value += endtime - starttime
                    t_size.value += os.path.getsize(filename)
    except Exception, e:
        print "error" + str(e)

if __name__ == '__main__':
    process = []
    nprocess = int(sys.argv[1])
    file_per_process = 100000 / nprocess
    lock = multiprocessing.Lock()
    total_time = multiprocessing.Value('f', 0.0)
    total_count = multiprocessing.Value('i', 0)
    total_size = multiprocessing.Value('f', 0.0)
    for i in range(nprocess):
        process.append(Process(target=uploadfile,
                               args=(i * file_per_process,
                                     (i + 1) * file_per_process,
                                     total_time, total_count,
                                     total_size, lock)))
    for p in process:
        p.start()
    for p in process:
        p.join()
    # total_time is the sum of per-upload durations, not wall-clock time.
    print "%f seconds for multiprocessing computation." % total_time.value
    print "%d total count." % total_count.value
    print "%f total size." % total_size.value
    os.system("wc -l log")
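The script takes the desired concurrency as its only argument, e.g. `python upload_parallel.py 100` to start 100 worker processes (the script name is hypothetical). Since total_time accumulates per-upload durations across all processes rather than wall-clock time, the per-file averages in the tables below presumably come from dividing the aggregate counters; a minimal sketch of that arithmetic, appended at the end of __main__ above:

# Hedged sketch: deriving the tables' per-file metrics from the counters.
avg_ms = total_time.value * 1000.0 / total_count.value
avg_kb = total_size.value / 1024.0 / total_count.value
print "average upload time: %.2f ms per file" % avg_ms
print "average file size: %.2f KB" % avg_kb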
FastDFS serial upload performance:

Total files uploaded | Total size uploaded (KB) | Average speed (MB/s) | Average time per file (ms) | Failed uploads
1000   | 130530   | 21.28 | 5.97 | 0
1000   | 130530   | 22.60 | 5.62 | 0
10000  | 1294566  | 22.94 | 5.53 | 0
10000  | 1294566  | 23.11 | 5.49 | 0
100000 | 13018299 | 21.05 | 6.03 | 0
100000 | 13018299 | 22.06 | 5.75 | 0
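As a sanity check, the speed and per-file time columns are mutually consistent; a small sketch of the arithmetic for the first row, with all inputs taken from the table:

# Row 1 of the serial table: 1000 files, 130530 KB, 5.97 ms/file.
total_kb = 130530.0
count = 1000
ms_per_file = 5.97

total_seconds = count * ms_per_file / 1000.0    # ~5.97 s for the batch
speed = total_kb / 1024.0 / total_seconds       # ~21.35 MB/s
print "derived speed: %.2f MB/s" % speed        # table reports 21.28

The small discrepancy comes from rounding in the reported averages.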
FastDFS parallel upload performance (500000 uploads per run):

Concurrency | Total files uploaded | Average time per file (ms) | Failed uploads
100  | 500000 | 14.62 | 0
200  | 500000 | 17.18 | 0
250  | 500000 | 22.19 | 0
400  | 500000 | 30.62 | 0
500  | 500000 | 28.55 | 0
800  | 500000 | 27.17 | 0
1000 | 500000 | 42.64 | 0
Swift upload performance:
500000 objects were uploaded to Swift:
Concurrency | Total files uploaded | Average time per file (ms) | Upload failure rate
100 | 500000 | 78.91  | 0
200 | 500000 | 144.27 | 0
250 | 500000 | 157.63 | 5.69%
400 | 195610 | 171.22 | 60.88%
500 | 193629 | 136.09 | 61.27%
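The original post does not show how the objects were pushed into Swift; a minimal sketch of one plausible client using python-swiftclient (the library choice, auth endpoint, credentials, and container name are all assumptions, not from the original):

#!/usr/local/bin/python2.7
import os
from swiftclient.client import Connection

# Hypothetical uploader; authurl/user/key/container are placeholders.
conn = Connection(authurl='http://swift-proxy:8080/auth/v1.0',
                  user='test:tester', key='testing')
conn.put_container('smallfiles')

for i in range(1000):
    filename = '/data/files/small/smallfile' + str(i)
    with open(filename, 'rb') as f:
        conn.put_object('smallfiles', 'smallfile' + str(i),
                        contents=f,
                        content_length=os.path.getsize(filename))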
Finally, a note on the multiprocessing.Value shared counters used in the parallel script. Consider this example first, which updates a shared value without a lock:

import time
from multiprocessing import Process, Value

def func(val):
    # Each of the 10 processes increments the shared value 50 times.
    for i in range(50):
        time.sleep(0.01)
        val.value += 1

if __name__ == '__main__':
    v = Value('i', 0)
    procs = [Process(target=func, args=(v,)) for i in range(10)]
    for p in procs: p.start()
    for p in procs: p.join()
    print v.value
The multiprocessing implementation is straightforward: create a Process with a target function and its arguments, call start() to launch it, and call join() so the main process waits for all children to finish. Here v is defined with multiprocessing.Value and is shared between processes. We would expect the final v.value to be 500, but the printed result is usually less than 500: `val.value += 1` is a read-modify-write that is not atomic, so without a lock the competing processes lose updates to the shared variable. So how do we add a lock?
Method 1: guard the update with a `with lock:` block.

import time
from multiprocessing import Process, Value, Lock

def func(val, lock):
    for i in range(50):
        time.sleep(0.01)
        # The with-statement acquires the lock and releases it on exit.
        with lock:
            val.value += 1

if __name__ == '__main__':
    v = Value('i', 0)
    lock = Lock()
    procs = [Process(target=func, args=(v, lock)) for i in range(10)]
    for p in procs: p.start()
    for p in procs: p.join()
    print v.value
Method 2: use explicit acquire()/release() calls.
import time
from multiprocessing import Process, Value, Lock

def func(val, lock):
    for i in range(50):
        time.sleep(0.01)
        # Explicitly acquire and release the lock around the update.
        lock.acquire()
        val.value += 1
        lock.release()

if __name__ == '__main__':
    v = Value('i', 0)
    lock = Lock()
    procs = [Process(target=func, args=(v, lock)) for i in range(10)]
    for p in procs: p.start()
    for p in procs: p.join()
    print v.value
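Both methods reliably print 500. As a side note not in the original post: a Value created this way already owns an internal lock, exposed via get_lock(), so the counter can also be protected without passing a separate Lock:

import time
from multiprocessing import Process, Value

def func(val):
    for i in range(50):
        time.sleep(0.01)
        # Reuse the lock that multiprocessing.Value carries internally.
        with val.get_lock():
            val.value += 1

if __name__ == '__main__':
    v = Value('i', 0)
    procs = [Process(target=func, args=(v,)) for i in range(10)]
    for p in procs: p.start()
    for p in procs: p.join()
    print v.value  # reliably prints 500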