dm-cache and bcache
Article source: http://blog.csdn.net/cybertan/article/details/9475767
bcache: a block-device acceleration cache for Linux
4K random writes with fio; the disk statistics at the end show the I/O going to the raw device, sdb:
root@utumno:~# fio ~/rw4k
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio 1.59
Starting 1 process
Jobs: 1 (f=1): [w] [100.0% done] [0K/49885K /s] [0 /12.2K iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=1770
write: io=8192.3MB, bw=47666KB/s, iops=11916 , runt=175991msec
cpu : usr=4.33%, sys=14.28%, ctx=2071968, majf=0, minf=19
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued r/w/d: total=0/2097215/0, short=0/0/0
Run status group 0 (all jobs):
WRITE: io=8192.3MB, aggrb=47666KB/s, minb=48810KB/s, maxb=48810KB/s, mint=175991msec, maxt=175991msec
Disk stats (read/write):
sdb: ios=69/2097888, merge=0/3569, ticks=0/11243992, in_queue=11245600, util=99.99%
The same job again, this time with the disk statistics showing bcache0 as the target:
root@utumno:~# fio ~/rw4k
randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
fio 1.59
Starting 1 process
Jobs: 1 (f=1): [w] [100.0% done] [0K/75776K /s] [0 /18.5K iops] [eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=1914
write: io=8192.3MB, bw=83069KB/s, iops=20767 , runt=100987msec
cpu : usr=3.17%, sys=13.27%, ctx=456026, majf=0, minf=19
IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=100.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
issued r/w/d: total=0/2097215/0, short=0/0/0
Run status group 0 (all jobs):
WRITE: io=8192.3MB, aggrb=83068KB/s, minb=85062KB/s, maxb=85062KB/s, mint=100987msec, maxt=100987msec
Disk stats (read/write):
bcache0: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
IO depth of 32: bcache 20.3k iops, raw ssd 19.8k iops
IO depth of 16: bcache 16.7k iops, raw ssd 23.5k iops
IO depth of 8: bcache 8.7k iops, raw ssd 14.9k iops
IO depth of 4: bcache 8.9k iops, raw ssd 19.7k iops
IO depth of 64: bcache 29.5k iops, raw ssd 25.4k iops
IO depth of 16: bcache 28.2k iops, raw ssd 27.6k iops
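The contents of the ~/rw4k job file are not shown above, but a rough command-line equivalent can be reconstructed from the parameters visible in the output (the target device, direct=1 and the 8 GB size are guesses on my part):
# fio --name=randwrite --filename=/dev/sdb --rw=randwrite --bs=4k \
      --ioengine=libaio --iodepth=64 --direct=1 --size=8g
For the bcache run, the same job is simply pointed at /dev/bcache0 instead.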
# cat /usr/lib/udev/rules.d/61-bcache.rules
....
# Backing devices: scan, symlink, register
IMPORT{program}="/sbin/blkid -o udev $tempnode"
# blkid and probe-bcache can disagree, in which case don't register
ENV{ID_FS_TYPE}=="?*", ENV{ID_FS_TYPE}!="bcache", GOTO="bcache_backing_end"
...
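These udev rules probe and register bcache devices automatically as they appear. If they are not installed (or when experimenting by hand), a formatted device can also be registered manually through the bcache sysfs interface; a minimal sketch, using /dev/sdc from the example below as the device name:
# modprobe bcache
# echo /dev/sdc > /sys/fs/bcache/register
Registering a backing device creates its /dev/bcacheN node; registering a cache device makes it available for backing devices to attach to.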
# lsblk -o NAME,MAJ:MIN,RM,SIZE,TYPE,FSTYPE,MOUNTPOINT,UUID,PARTUUID
NAME MAJ:MIN RM SIZE TYPE FSTYPE MOUNTPOINT UUID PARTUUID
sda 8:0 0 111.8G disk
├─sda1 8:1 0 3G part vfat /esp 7E67-C0BB d39828e8-4880-4c85-9ec0-4255777aa35b
└─sda2 8:2 0 108.8G part ext2 93d22899-cd86-4815-b6d0-d72006201e75 baf812f4-9b80-42c4-b7ac-5ed0ed19be65
sdb 8:16 0 931.5G disk
└─sdb1 8:17 0 931.5G part ntfs FAD2B75FD2B71EB7 90c80e9d-f31a-41b4-9d4d-9b02029402b2
sdc 8:32 0 2.7T disk bcache 4bd63488-e1d7-4858-8c70-a35a5ba2c452
└─bcache1 254:1 0 2.7T disk btrfs 2ff19aaf-852e-4c58-9eee-3daecbc6a5a1
sdd 8:48 0 2.7T disk bcache ce6de517-7538-45d6-b8c4-8546f13f76c1
└─bcache0 254:0 0 2.7T disk btrfs 2ff19aaf-852e-4c58-9eee-3daecbc6a5a1
sde 8:64 1 14.9G disk
└─sde1 8:65 1 14.9G part ext4 / d07321b2-b67d-4daf-8022-f3307b605430 5d0a4d76-115f-4081-91ed-fb09aa2318d
# make-bcache -B /dev/sdc /dev/sdd -C /dev/sda2
# dd if=/dev/zero count=1 bs=1024 seek=1 of=/dev/sda2
# lsblk -o NAME,MAJ:MIN,RM,SIZE,TYPE,FSTYPE,MOUNTPOINT,UUID,PARTUUID
NAME MAJ:MIN RM SIZE TYPE FSTYPE MOUNTPOINT UUID PARTUUID
sda 8:0 0 111.8G disk
├─sda1 8:1 0 3G part vfat /esp 7E67-C0BB d39828e8-4880-4c85-9ec0-4255777aa35b
└─sda2 8:2 0 108.8G part bcache 93d22899-cd86-4815-b6d0-d72006201e75 baf812f4-9b80-42c4-b7ac-5ed0ed19be65
├─bcache0 254:0 0 2.7T disk btrfs 2ff19aaf-852e-4c58-9eee-3daecbc6a5a1
└─bcache1 254:1 0 2.7T disk btrfs 2ff19aaf-852e-4c58-9eee-3daecbc6a5a1
sdb 8:16 0 931.5G disk
└─sdb1 8:17 0 931.5G part ntfs FAD2B75FD2B71EB7 90c80e9d-f31a-41b4-9d4d-9b02029402b2
sdc 8:32 0 2.7T disk bcache 4bd63488-e1d7-4858-8c70-a35a5ba2c452
└─bcache1 254:1 0 2.7T disk btrfs 2ff19aaf-852e-4c58-9eee-3daecbc6a5a1
sdd 8:48 0 2.7T disk bcache ce6de517-7538-45d6-b8c4-8546f13f76c1
└─bcache0 254:0 0 2.7T disk btrfs 2ff19aaf-852e-4c58-9eee-3daecbc6a5a1
sde 8:64 1 14.9G disk
└─sde1 8:65 1 14.9G part ext4 / d07321b2-b67d-4daf-8022-f3307b605430 5d0a4d76-115f-4081-91ed-fb09aa2318dd
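Once the caches are attached, the per-device bcache sysfs nodes can be used to confirm the state and, if desired, switch the caching mode; a small sketch (device names follow the lsblk output above, and writeback is only an example, the default being writethrough):
# cat /sys/block/bcache0/bcache/state
# cat /sys/block/bcache1/bcache/state
# echo writeback > /sys/block/bcache0/bcache/cache_mode
Here state reports one of "no cache", "clean", "dirty" or "inconsistent", and cache_mode lists the available modes (writethrough, writeback, writearound, none) with the active one in brackets.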
Article source: http://blog.csdn.net/liumangxiong/article/details/17839797
A brief introduction to bcache in the Linux kernel
Article source: http://blog.csdn.net/liumangxiong/article/details/18090043
Suppose you have a machine with slow hard disks and a fast SSD, and you want to use the SSD as a fast, persistent cache to speed up access to the hard disks. Until recently you had three choices: bcache and dm-cache, which are both upstream, or Flashcache/EnhanceIO, which is not upstream. dm-cache required you to first sit down with a calculator and work out block offsets by hand. bcache was the most complete of the three.
Recently, however, LVM has added caching support (built on top of dm-cache), so in theory you can take an existing logical volume and convert it into a cached device.
Installation
To see how this works in practice, I added three drives to my previously diskless virtualization cluster.
There are two 2 TB WD hard drives in a mirrored configuration, connected with the blue ("cold") cables. On the left is a Samsung EVO 250 GB SSD, the red ("hot") drive that acts as the cache.
In other news: brand-name SSDs are really cheap these days!
The lsblk output is shown below; sda and sdb are the WD hard drives, and sdc is the Samsung SSD:
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.8T 0 disk
└─sda1 8:1 0 1.8T 0 part
└─md127 9:127 0 1.8T 0 raid1
sdb 8:16 0 1.8T 0 disk
└─sdb1 8:17 0 1.8T 0 part
└─md127 9:127 0 1.8T 0 raid1
sdc 8:32 0 232.9G 0 disk
└─sdc1 8:33 0 232.9G 0 part
Performance
Before creating the cache, let's see how fast the raw drives are. These figures include ext4 and LVM overhead (i.e. they were measured through files on the filesystem, not against the raw block devices), and I used O_DIRECT. (A sketch of the kind of dd command used is shown after the numbers below.)
HDD writes: 114 MBytes/sec
HDD reads: 138 MBytes/sec
SSD writes: 157 MBytes/sec
SSD reads: 197 MBytes/sec
These numbers don't show the real benefit of SSDs, namely that their performance does not collapse the moment you start accessing the disk randomly.
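For reference, the measurements were of roughly the following kind (the mount point, file name, block size and count are made up; oflag=direct/iflag=direct provide the O_DIRECT behaviour mentioned above):
# dd if=/dev/zero of=/mnt/test/zero.img bs=1M count=4096 oflag=direct
# dd if=/mnt/test/zero.img of=/dev/null bs=1M iflag=direct
The first command measures sequential writes through the filesystem, the second sequential reads of the same file.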
Terminology
The lvmcache(7) documentation [so new that there is no copy of it online yet] defines the terms used below:
origin LV          OriginLV      large slow LV
cache data LV      CacheDataLV   small fast LV for cache pool data
cache metadata LV  CacheMetaLV   small fast LV for cache pool metadata
cache pool LV      CachePoolLV   CacheDataLV + CacheMetaLV
cache LV           CacheLV       OriginLV + CachePoolLV
Creating the LVs
The documentation contains dire warnings that removing the wrong LV will completely destroy your OriginLV, so to test this behaviour I first created a throwaway OriginLV on the slow HDDs, holding a few disk image files:
# lvcreate -L 100G -n testoriginlv vg_guests
  Logical volume "testoriginlv" created
# mkfs -t ext4 /dev/vg_guests/testoriginlv
Note that resizing cached LVs is not currently supported (it may come later; for now you have to remove the cache, resize the LV, and then recreate the cache).
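A rough sketch of that remove/resize/recreate cycle, assuming a recent lvm2 that provides lvconvert --uncache (older releases detach the cache by removing the cache pool LV instead), and with the +50G purely as an example:
# lvconvert --uncache vg_guests/testoriginlv
# lvresize -L +50G --resizefs vg_guests/testoriginlv
After that, the cache pool is created and attached again exactly as in the steps below.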
Creating the cache layer
This is not made obvious by the documentation, but everything has to live in a single volume group. That is, you must create a volume group that contains both the slow and the fast disks; otherwise it simply doesn't work.
So my first step was to extend my existing VG to include the fast disk:
# vgextend vg_guests /dev/sdc1
  Volume group "vg_guests" successfully extended
I created two LVs on the fast SSD. One is the CacheDataLV, which is where the caching actually happens. The other is the CacheMetaLV, which stores the index of the data blocks cached in the CacheDataLV. The documentation says the CacheMetaLV should be 1/1000th the size of the CacheDataLV, with a minimum of 8 MB. Since my total available fast space is 232 GB and I wanted roughly a 1000:1 split, I generously chose a 1 GB CacheMetaLV and a 229 GB CacheDataLV, leaving a little space over (my eventual split is therefore 229:1).
# lvcreate -L 1G -n lv_cache_meta vg_guests /dev/sdc1
  Logical volume "lv_cache_meta" created
# lvcreate -L 229G -n lv_cache vg_guests /dev/sdc1
  Logical volume "lv_cache" created
# lvs
  LV            VG        Attr       LSize
  lv_cache      vg_guests -wi-a----- 229.00g
  lv_cache_meta vg_guests -wi-a-----   1.00g
  testoriginlv  vg_guests -wi-a----- 100.00g
# pvs
  PV         VG        Fmt  Attr PSize   PFree
  /dev/md127 vg_guests lvm2 a--    1.82t 932.89g
  /dev/sdc1  vg_guests lvm2 a--  232.88g   2.88g
(You will notice that my cache is bigger than my test OriginLV, but that's fine: once I have worked out all the gotchas, my real OriginLV will be over 1 TB.)
Why leave 2.88 GB of free space on the PV? I'm not entirely sure. The first time I did this I left no free space, and the lvconvert command [below] complained that it needed 1 GB of free space for the conversion.
Put the CacheDataLV and CacheMetaLV into a "cache pool":
# lvconvert --type cache-pool --poolmetadata vg_guests/lv_cache_meta vg_guests/lv_cache
  Logical volume "lvol0" created
  Converted vg_guests/lv_cache to cache pool.
Then attach the cache pool to the OriginLV to create the final cached LV:
# lvconvert --type cache --cachepool vg_guests/lv_cache vg_guests/testoriginlv
  vg_guests/testoriginlv is now cached.
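At this point it is worth double-checking what was created; a minimal sketch (lvs -a also lists the hidden cache-pool sub-LVs, and dmsetup status on the cached LV's device-mapper name prints the dm-cache target's statistics):
# lvs -a vg_guests
# dmsetup status vg_guests-testoriginlv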
Results
Looking good so far? Running the same tests against the cached LV gives:
LV-cache writes: 114 MBytes/sec
LV-cache reads: 138 MBytes/sec
Exactly the same as the raw hard disk results.
Not that anything actually went wrong: Mike Snitzer gave me an explanation of why my test using dd isn't a useful test of dm-cache.
Next I'll generate some more realistic request patterns and see how they perform (which addresses the issue just mentioned).
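As a concrete example of such a request pattern, a 4K random-read fio run against the cached LV would look something like this (every parameter here is a guess, chosen to mirror the 4K random tests earlier on this page):
# fio --name=randread --filename=/dev/vg_guests/testoriginlv --rw=randread --bs=4k \
      --ioengine=libaio --iodepth=32 --direct=1 --runtime=60 --time_based
Repeated runs should speed up as the working set gets promoted into the cache.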
Article source: http://www.oschina.net/translate/using-lvms-new-cache-feature