ssd alignment & unalignment partition performan...

Performance comparison of aligned and unaligned SSD partitions
The SSD tested is an OCZ RevoDrive3X2 960GB.
The test tool is pg_test_fsync, which ships with PostgreSQL.

Operating system version:
[root@db-xx ~]# cat /etc/redhat-release 
CentOS release 5.8 (Final)
[root@db-xx ~]# uname -a
Linux db-xx.sky-mobi.com 2.6.18-308.el5 #1 SMP Tue Feb 21 20:06:06 EST 2012 x86_64 x86_64 x86_64 GNU/Linux

SSD information:
[root@db-xx ~]# lspci -k -v|less
04:00.0 SCSI storage controller: OCZ Technology Group, Inc. RevoDrive 3 X2 PCI-Express SSD 240 GB (Marvell Controller) (rev 02)
        Subsystem: OCZ Technology Group, Inc. RevoDrive 3 X2 PCI-Express SSD 240 GB (Marvell Controller)
        Flags: bus master, fast devsel, latency 0, IRQ 122
        Memory at df1a0000 (64-bit, non-prefetchable) [size=128K]
        Memory at df1c0000 (64-bit, non-prefetchable) [size=256K]
        Expansion ROM at df100000 [disabled] [size=64K]
        Capabilities: [40] Power Management version 3
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [70] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Virtual Channel
        Kernel driver in use: ocz10xx
        Kernel modules: ocz10xx

Driver information:
[root@db-xx ~]# modinfo ocz10xx
filename:       /lib/modules/2.6.18-308.el5/extra/ocz10xx.ko
version:        3.7.6.3912
license:        Proprietary
description:    OCZ Linux driver
author:         OCZ Technology Group, Inc.
alias:          pci:v00001B85d00001084sv*sd*bc*sc*i*
alias:          pci:v00001B85d00001083sv*sd*bc*sc*i*
alias:          pci:v00001B85d00001044sv*sd*bc*sc*i*
alias:          pci:v00001B85d00001043sv*sd*bc*sc*i*
alias:          pci:v00001B85d00001042sv*sd*bc*sc*i*
alias:          pci:v00001B85d00001041sv*sd*bc*sc*i*
alias:          pci:v00001B85d00001022sv*sd*bc*sc*i*
alias:          pci:v00001B85d00001021sv*sd*bc*sc*i*
alias:          pci:v00001B85d00001080sv*sd*bc*sc*i*
alias:          pci:v00001B85d00001184sv*sd*bc*sc*i*
alias:          pci:v00001B85d00001144sv*sd*bc*sc*i*
depends:        scsi_mod
vermagic:       2.6.18-308.el5 SMP mod_unload gcc-4.1
parm:           ocz_msi_enable: Enable MSI Support for OCZ VCA controllers (default=0) (int)
parm:           ocz_debug_enable: Enable Debug (default=0) (int)

Unaligned partition (fdisk defaults, in cylinder units):
[root@db-xx ~]# fdisk /dev/sde
The number of cylinders for this disk is set to 116741.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-116741, default 1): 
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-116741, default 116741): 
Using default value 116741

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Check the start and end sectors. The partition starts at sector 63, the legacy default.
[root@db-xx ~]# fdisk -u -l /dev/sde

Disk /dev/sde: 960.2 GB, 960229538304 bytes
255 heads, 63 sectors/track, 116741 cylinders, total 1875448317 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1              63  1875444164   937722051   83  Linux
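
Why a start sector of 63 counts as unaligned can be checked with a little arithmetic. The following is a minimal sketch, not part of the original test: the helper names are hypothetical, and the 4 KiB boundary is an assumption chosen to match the ext4 block size passed to mkfs.ext4 -b 4096 (the SSD's real page or erase-block size is not given in this post).

```python
SECTOR_SIZE = 512  # bytes, from "Units = sectors of 1 * 512 = 512 bytes"


def start_offset_bytes(start_sector, sector_size=SECTOR_SIZE):
    """Byte offset of a partition's first sector from the start of the disk."""
    return start_sector * sector_size


def is_aligned(start_sector, boundary_bytes, sector_size=SECTOR_SIZE):
    """True if the partition starts on a multiple of boundary_bytes."""
    return start_offset_bytes(start_sector, sector_size) % boundary_bytes == 0


if __name__ == "__main__":
    # 4 KiB is an assumed boundary (the ext4 block size); 1 MiB is the
    # boundary this post uses for the aligned partition.
    for boundary in (4096, 1024 * 1024):
        for sector in (63, 2048):
            print(sector, boundary, is_aligned(sector, boundary))
```

Sector 63 sits at byte offset 32256, which is not a multiple of 4096, so every 4 KiB filesystem block straddles two 4 KiB units of the underlying device under that assumption.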

Format:
[root@db-xx ~]# partprobe 
[root@db-xx ~]# mkfs.ext4 -b 4096 /dev/sde1

fsync test results before alignment:
[postgres@db-xx pgdata]$ pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 16kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                                 n/a
        fdatasync                        9699.678 ops/sec     103 usecs/op
        fsync                            7953.484 ops/sec     126 usecs/op
        fsync_writethrough                            n/a
        open_sync                       11947.938 ops/sec      84 usecs/op

Compare file sync methods using two 16kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                                 n/a
        fdatasync                        6623.739 ops/sec     151 usecs/op
        fsync                            5428.723 ops/sec     184 usecs/op
        fsync_writethrough                            n/a
        open_sync                        5984.738 ops/sec     167 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write       11944.803 ops/sec      84 usecs/op
         2 *  8kB open_sync writes       7001.228 ops/sec     143 usecs/op
         4 *  4kB open_sync writes       3991.088 ops/sec     251 usecs/op
         8 *  2kB open_sync writes       2241.544 ops/sec     446 usecs/op
        16 *  1kB open_sync writes       1183.734 ops/sec     845 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close              7125.895 ops/sec     140 usecs/op
        write, close, fsync              7030.647 ops/sec     142 usecs/op

Non-Sync'ed 16kB writes:
        write                           112601.709 ops/sec       9 usecs/op

Aligned partition (fdisk -u, in sector units):
[root@db-xx ~]# fdisk -u /dev/sde
Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First sector (63-1875448316, default 63): 2048
Last sector or +size or +sizeM or +sizeK (2048-1875448316, default 1875448316): 1867775999
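Alignment method: sector numbers start at 0, so the start sector is chosen as a whole number of MB (2048 sectors = 1MB) and the end sector as a whole number of MB minus 1.
The end of the partition then also falls on a 1MB boundary, so the partition starts and ends on 32k, 64k, 128k, 512k (and similar) boundaries as well.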

postgres=# select 2048*512/1024/1024.0;
        ?column?        
------------------------
 1.00000000000000000000
(1 row)
postgres=# select (1867775999::int8+1)*512/1024/1024.0;
      ?column?       
---------------------
 912000.000000000000
(1 row)
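
The same calculation can be run in the other direction, from MB boundaries to the sector numbers typed into fdisk. This is a minimal sketch under the 512-byte sector size reported by fdisk; `aligned_range` is a hypothetical helper, not a tool used in the original test.

```python
SECTOR_SIZE = 512                      # bytes per sector, as reported by fdisk
MIB = 1024 * 1024
SECTORS_PER_MIB = MIB // SECTOR_SIZE   # 2048 sectors per MiB


def aligned_range(start_mib, end_mib):
    """Return (first_sector, last_sector) for a partition covering
    [start_mib, end_mib) MiB of the disk, so both edges sit on 1 MiB boundaries."""
    if end_mib <= start_mib:
        raise ValueError("end_mib must be greater than start_mib")
    first_sector = start_mib * SECTORS_PER_MIB
    last_sector = end_mib * SECTORS_PER_MIB - 1  # last sector is inclusive
    return first_sector, last_sector


if __name__ == "__main__":
    first, last = aligned_range(1, 912000)
    size_kib = (last - first + 1) * SECTOR_SIZE // 1024
    print(first, last, size_kib)
```

With a start of 1 MiB and an end of 912000 MiB this gives the two sector numbers entered above, and the size in 1 KiB blocks matches the Blocks column fdisk prints for the aligned partition.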

Check the start and end sectors:
[root@db-xx ~]# fdisk -u -l /dev/sde
Disk /dev/sde: 960.2 GB, 960229538304 bytes
255 heads, 63 sectors/track, 116741 cylinders, total 1875448317 sectors
Units = sectors of 1 * 512 = 512 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            2048  1867775999   933886976   83  Linux

Format:
[root@db-xx ~]# mkfs.ext4 -b 4096 /dev/sde1

fsync test results after alignment:
[postgres@db-xx pgdata]$ pg_test_fsync
5 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 16kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                                 n/a
        fdatasync                       10478.848 ops/sec      95 usecs/op
        fsync                            9008.415 ops/sec     111 usecs/op
        fsync_writethrough                            n/a
        open_sync                       13100.546 ops/sec      76 usecs/op

Compare file sync methods using two 16kB writes:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
        open_datasync                                 n/a
        fdatasync                        6995.627 ops/sec     143 usecs/op
        fsync                            5997.840 ops/sec     167 usecs/op
        fsync_writethrough                            n/a
        open_sync                        6672.413 ops/sec     150 usecs/op

Compare open_sync with different write sizes:
(This is designed to compare the cost of writing 16kB
in different write open_sync sizes.)
         1 * 16kB open_sync write       13157.005 ops/sec      76 usecs/op
         2 *  8kB open_sync writes       8713.243 ops/sec     115 usecs/op
         4 *  4kB open_sync writes       4989.391 ops/sec     200 usecs/op
         8 *  2kB open_sync writes       2383.124 ops/sec     420 usecs/op
        16 *  1kB open_sync writes       1240.352 ops/sec     806 usecs/op

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written
on a different descriptor.)
        write, fsync, close              8978.282 ops/sec     111 usecs/op
        write, close, fsync              8947.028 ops/sec     112 usecs/op

Non-Sync'ed 16kB writes:
        write                           107073.134 ops/sec       9 usecs/op
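
To put the two runs side by side, the relative change in ops/sec can be computed from the "one 16kB write" figures printed above. This is a minimal sketch; the numbers are copied from the two pg_test_fsync outputs and `gain_percent` is a hypothetical helper. Each figure comes from a single 5-second run, so small differences should be read with caution.

```python
# ops/sec for "Compare file sync methods using one 16kB write"
before = {"fdatasync": 9699.678, "fsync": 7953.484, "open_sync": 11947.938}
after = {"fdatasync": 10478.848, "fsync": 9008.415, "open_sync": 13100.546}


def gain_percent(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


gains = {name: gain_percent(before[name], after[name]) for name in before}

if __name__ == "__main__":
    for name, gain in gains.items():
        print(f"{name}: {gain:+.1f}%")
```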

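[Summary]
1. After alignment, fdatasync I/O performance improves by about 8% (9699.678 to 10478.848 ops/sec for one 16kB write). The other sync methods improve as well.
2. Another effect to be aware of: performance of the OCZ SSD drops sharply once more than half of the whole disk's capacity is in use.
It is therefore recommended to split the disk into 2 partitions, each no larger than half of the whole disk. For example: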
[root@db-xx ~]# fdisk -u -l /dev/sde
Disk /dev/sde: 960.2 GB, 960229538304 bytes
255 heads, 63 sectors/track, 116741 cylinders, total 1875448317 sectors
Units = sectors of 1 * 512 = 512 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sde1            2048   933887999   466942976   83  Linux
/dev/sde2       933888000  1867775999   466944000   83  Linux
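
The two-partition layout above can be sanity-checked the same way. This is a minimal sketch using the sector numbers and disk size from the fdisk listing; `describe` is a hypothetical helper. It only checks partition geometry (1 MiB alignment of each start, and partition size against half the disk) and says nothing about how much of each partition is actually in use.

```python
SECTOR_SIZE = 512
MIB = 1024 * 1024
DISK_BYTES = 960229538304  # from "Disk /dev/sde: 960.2 GB, 960229538304 bytes"

# (first_sector, last_sector) for each partition, from fdisk -u -l
partitions = {
    "sde1": (2048, 933887999),
    "sde2": (933888000, 1867775999),
}


def describe(first_sector, last_sector):
    """Summarise alignment and size of one partition."""
    size_bytes = (last_sector - first_sector + 1) * SECTOR_SIZE
    return {
        "start_aligned": (first_sector * SECTOR_SIZE) % MIB == 0,
        "size_bytes": size_bytes,
        "within_half": size_bytes <= DISK_BYTES // 2,
    }


report = {name: describe(*bounds) for name, bounds in partitions.items()}

if __name__ == "__main__":
    for name, info in report.items():
        print(name, info)
```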

[References]
1. http://wiki.centos.org/HowTos/Disk_Optimization
2. http://apcmag.com/how-to-maximise-ssd-performance-with-linux.htm
3. http://lifehacker.com/5837769/make-sure-your-partitions-are-correctly-aligned-for-optimal-solid-state-drive-performance
4. http://pof.eslack.org/2013/01/12/ssd-alignment-on-linux-with-ext4-and-lvm/
5. http://forums.linuxmint.com/viewtopic.php?f=90&t=113399
6. http://blog.nuclex-games.com/2009/12/aligning-an-ssd-on-linux/
7. http://www.ocztechnologyforum.com/forum/showthread.php?48309-Partition-alignment-importance-under-Windows-XP-(32-bit-and-64-bit)-why-it-helps-with-stuttering-and-increases-drive-working-life
8. http://en.wikipedia.org/wiki/Advanced_format
