SSD 的对齐和非对齐分区性能的比较
测试的SSD型号为OCZ RevoDrive3X2 960GB.
测试软件为PostgreSQL带的pg_test_fsync.
操作系统版本 :
[root@db-xx ~]# cat /etc/redhat-release
CentOS release 5.8 (Final)
[root@db-xx ~]# uname -a
Linux db-xx.sky-mobi.com 2.6.18-308.el5 #1 SMP Tue Feb 21 20:06:06 EST 2012 x86_64 x86_64 x86_64 GNU/Linux
SSD信息 :
[root@db-xx ~]# lspci -k -v|less
04:00.0 SCSI storage controller: OCZ Technology Group, Inc. RevoDrive 3 X2 PCI-Express SSD 240 GB (Marvell Controller) (rev 02)
Subsystem: OCZ Technology Group, Inc. RevoDrive 3 X2 PCI-Express SSD 240 GB (Marvell Controller)
Flags: bus master, fast devsel, latency 0, IRQ 122
Memory at df1a0000 (64-bit, non-prefetchable) [size=128K]
Memory at df1c0000 (64-bit, non-prefetchable) [size=256K]
Expansion ROM at df100000 [disabled] [size=64K]
Capabilities: [40] Power Management version 3
Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [70] Express Endpoint, MSI 00
Capabilities: [100] Advanced Error Reporting
Capabilities: [140] Virtual Channel
Kernel driver in use: ocz10xx
Kernel modules: ocz10xx
驱动信息 :
[root@db-xx ~]# modinfo ocz10xx
filename: /lib/modules/2.6.18-308.el5/extra/ocz10xx.ko
version: 3.7.6.3912
license: Proprietary
description: OCZ Linux driver
author: OCZ Technology Group, Inc.
alias: pci:v00001B85d00001084sv*sd*bc*sc*i*
alias: pci:v00001B85d00001083sv*sd*bc*sc*i*
alias: pci:v00001B85d00001044sv*sd*bc*sc*i*
alias: pci:v00001B85d00001043sv*sd*bc*sc*i*
alias: pci:v00001B85d00001042sv*sd*bc*sc*i*
alias: pci:v00001B85d00001041sv*sd*bc*sc*i*
alias: pci:v00001B85d00001022sv*sd*bc*sc*i*
alias: pci:v00001B85d00001021sv*sd*bc*sc*i*
alias: pci:v00001B85d00001080sv*sd*bc*sc*i*
alias: pci:v00001B85d00001184sv*sd*bc*sc*i*
alias: pci:v00001B85d00001144sv*sd*bc*sc*i*
depends: scsi_mod
vermagic: 2.6.18-308.el5 SMP mod_unload gcc-4.1
parm: ocz_msi_enable: Enable MSI Support for OCZ VCA controllers (default=0) (int)
parm: ocz_debug_enable: Enable Debug (default=0) (int)
非对齐分区 :
[root@db-xx ~]# fdisk /dev/sde
The number of cylinders for this disk is set to 116741.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-116741, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-116741, default 116741):
Using default value 116741
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.
查看起始和结束扇区位置.
[root@db-xx ~]# fdisk -u -l /dev/sde
Disk /dev/sde: 960.2 GB, 960229538304 bytes
255 heads, 63 sectors/track, 116741 cylinders, total 1875448317 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sde1 63 1875444164 937722051 83 Linux
格式化 :
[root@db-xx ~]# partprobe
[root@db-xx ~]# mkfs.ext4 -b 4096 /dev/sde1
对齐前的fsync测试结果 :
[postgres@db-xx pgdata]$ pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 16kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync n/a
fdatasync 9699.678 ops/sec 103 usecs/op
fsync 7953.484 ops/sec 126 usecs/op
fsync_writethrough n/a
open_sync 11947.938 ops/sec 84 usecs/op
Compare file sync methods using two 16kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync n/a
fdatasync 6623.739 ops/sec 151 usecs/op
fsync 5428.723 ops/sec 184 usecs/op
fsync_writethrough n/a
open_sync 5984.738 ops/sec 167 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
1 * 16kB open_sync write 11944.803 ops/sec 84 usecs/op
2 * 8kB open_sync writes 7001.228 ops/sec 143 usecs/op
4 * 4kB open_sync writes 3991.088 ops/sec 251 usecs/op
8 * 2kB open_sync writes 2241.544 ops/sec 446 usecs/op
16 * 1kB open_sync writes 1183.734 ops/sec 845 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
write, fsync, close 7125.895 ops/sec 140 usecs/op
write, close, fsync 7030.647 ops/sec 142 usecs/op
Non-Sync'ed 16kB writes:
write 112601.709 ops/sec 9 usecs/op
对齐分区 :
[root@db-xx ~]# fdisk -u /dev/sde
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First sector (63-1875448316, default 63): 2048
Last sector or +size or +sizeM or +sizeK (2048-1875448316, default 1875448316): 1867775999
对齐方式, 因为扇区编号从0开始. 开始扇区号取1MB的整数, 结束扇区号取1MB的整数-1.
这样得到的容量为1MB的倍数, 同时也是32k, 64k, 128k, 512k等的倍数.
postgres=# select 2048*512/1024/1024.0;
?column?
------------------------
1.00000000000000000000
(1 row)
postgres=# select (1867775999::int8+1)*512/1024/1024.0;
?column?
---------------------
912000.000000000000
(1 row)
查看起始和结束扇区 :
[root@db-xx ~]# fdisk -u -l /dev/sde
Disk /dev/sde: 960.2 GB, 960229538304 bytes
255 heads, 63 sectors/track, 116741 cylinders, total 1875448317 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sde1 2048 1867775999 933886976 83 Linux
格式化 :
[root@db-xx ~]# mkfs.ext4 -b 4096 /dev/sde1
对齐后的fsync测试结果 :
[postgres@db-xx pgdata]$ pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.
Compare file sync methods using one 16kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync n/a
fdatasync 10478.848 ops/sec 95 usecs/op
fsync 9008.415 ops/sec 111 usecs/op
fsync_writethrough n/a
open_sync 13100.546 ops/sec 76 usecs/op
Compare file sync methods using two 16kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync n/a
fdatasync 6995.627 ops/sec 143 usecs/op
fsync 5997.840 ops/sec 167 usecs/op
fsync_writethrough n/a
open_sync 6672.413 ops/sec 150 usecs/op
Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
1 * 16kB open_sync write 13157.005 ops/sec 76 usecs/op
2 * 8kB open_sync writes 8713.243 ops/sec 115 usecs/op
4 * 4kB open_sync writes 4989.391 ops/sec 200 usecs/op
8 * 2kB open_sync writes 2383.124 ops/sec 420 usecs/op
16 * 1kB open_sync writes 1240.352 ops/sec 806 usecs/op
Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
write, fsync, close 8978.282 ops/sec 111 usecs/op
write, close, fsync 8947.028 ops/sec 112 usecs/op
Non-Sync'ed 16kB writes:
write 107073.134 ops/sec 9 usecs/op
[小结]
1. fdatasync对齐后的IO性能有8%左右的提升. 其他同步函数也有相应的提升.
2. 另外需要注意的影响是, OCZ SSD 在使用容量超过整块磁盘的一半后性能会急剧下降.
所以建议将磁盘分成2个区, 每个区不要超过整块磁盘的一半容量. 例如 :
[root@db-xx ~]# fdisk -u -l /dev/sde
Disk /dev/sde: 960.2 GB, 960229538304 bytes
255 heads, 63 sectors/track, 116741 cylinders, total 1875448317 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sde1 2048 933887999 466942976 83 Linux
/dev/sde2 933888000 1867775999 466944000 83 Linux
[参考]
1. http://wiki.centos.org/HowTos/Disk_Optimization
2. http://apcmag.com/how-to-maximise-ssd-performance-with-linux.htm
3. http://lifehacker.com/5837769/make-sure-your-partitions-are-correctly-aligned-for-optimal-solid-state-drive-performance
4. http://pof.eslack.org/2013/01/12/ssd-alignment-on-linux-with-ext4-and-lvm/
5. http://forums.linuxmint.com/viewtopic.php?f=90&t=113399
6. http://blog.nuclex-games.com/2009/12/aligning-an-ssd-on-linux/
7. http://www.ocztechnologyforum.com/forum/showthread.php?48309-Partition-alignment-importance-under-Windows-XP-(32-bit-and-64-bit)-why-it-helps-with-stuttering-and-increases-drive-working-life
8. http://en.wikipedia.org/wiki/Advanced_format