In one of our PCIe flash products, we organized many different chips in a RAID0 structure, and the results were quite good.
http://www.fujitsu.com/global/about/resources/news/press-releases/2015/1119-01.html
From that, I realized that building a RAID layer on top of the MTD layer looks promising, so I wrote an mtd_raid module. Functionally it is fully usable.
Hi guys,
This is an email about MTD RAID.
Code:
kernel: https://github.com/yangdongsheng/linux/tree/mtd_raid_v2-for-4.7
mtd-utils: https://github.com/yangdongsheng/mtd-utils/tree/mtd_raid_v2
Inspired by mtd_concat:
I found that several drivers use mtd_concat to build one mtd device from multiple chips.
$ grep -r mtd_concat_create drivers/mtd/
drivers/mtd/mtdconcat.c:struct mtd_info *mtd_concat_create(struct mtd_info *subdev[], /* subdevices to concatenate */
drivers/mtd/mtdconcat.c:EXPORT_SYMBOL(mtd_concat_create);
drivers/mtd/Module.symvers:0xa18d149f mtd_concat_create drivers/mtd//mtd EXPORT_SYMBOL
drivers/mtd/maps/sc520cdp.c: merged_mtd = mtd_concat_create(mymtd, 2, "SC520CDP Flash Banks #0 and #1");
drivers/mtd/maps/sa1100-flash.c: info->mtd = mtd_concat_create(cdev, info->num_subdev,
drivers/mtd/maps/physmap_of.c: info->cmtd = mtd_concat_create(mtd_list, info->list_size,
drivers/mtd/maps/physmap.c: info->cmtd = mtd_concat_create(info->mtd, devices_found, dev_name(&dev->dev));
At the same time, more and more PCIe devices have lots of flash chips attached, so we need
a module to handle this case.
Design:
--------- -------------
| ubi | | mtdchar |
--------- -------------
| |
v v
----------------------------
| MTD RAID | <----- a new layer here (drivers/mtd/mtd_raid/*)
----------------------------
|
v
----------------
| mtd driver |
----------------
The MTD RAID layer is not necessary for every case; it's an optional choice, as you wish.
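To make the layering concrete, here is a minimal sketch (assumed naming,
not the module's actual code) of how the stacking works: the raid layer
fills in a struct mtd_info of its own and registers it, and its callbacks
forward I/O to the member devices. UBI and mtdchar only ever see the top
struct mtd_info, which is why the layer is transparent and optional.

#include <linux/mtd/mtd.h>

/* Assumed, simplified: the raid device wraps its members. */
struct mtd_raid {
        struct mtd_info mtd;        /* exposed upward, e.g. via mtd_device_register() */
        struct mtd_info **members;  /* consumed downward via mtd_read()/mtd_write() */
        int ndevs;                  /* number of member devices */
};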
How to use this module:
(1). For multi-chip drivers:
/dev/mtd0
|--------------------------------|
| ---------------------------- |
| | MTD RAID | |
| ---------------------------- |
| | | |
| v v |
| ----------- ----------- |
| | mtd_info| | mtd_info| |
| ----------- ----------- |
|--------------------------------|
When a driver has multiple chips, we can use the mtd raid framework to build
them into a raid array and expose a single mtd device. Some drivers are
using mtd_concat; we can do the same thing with mtd_single. At the same
time, mtd raid provides them more choices, such as raid0 or raid1. That
means we can replace mtd_concat_create() with mtd_raid_create(), as the
sketch below shows.
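For example, the physmap_of.c hunk from the grep output above could be
converted with a pseudo-diff like this (the exact mtd_raid_create()
signature and the level constant are assumptions on my side; the real
interface is in the kernel tree linked above):

-       info->cmtd = mtd_concat_create(mtd_list, info->list_size,
-                                      dev_name(&dev->dev));
+       info->cmtd = mtd_raid_create(MTD_RAID_SINGLE, mtd_list,
+                                    info->list_size, dev_name(&dev->dev));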
(2). For users:
/dev/mtd2
|--------------------------------|
| ---------------------------- |
| | MTD RAID | |
| ---------------------------- |
| | | |
| v v |
| ----------- ----------- |
| |/dev/mtd0| |/dev/mtd1| |
| ----------- ----------- |
|--------------------------------|
We have a userspace tool for mtd raid; users can use it to build a raid array from existing mtd devices.
TEST:
# modprobe nandsim dev_count=4 access_delay=1 do_delays=1
# modprobe mtd_raid
(1) Single
# mtd_raid create --level=single /dev/mtd0 /dev/mtd1 /dev/mtd2 /dev/mtd3
# mtdinfo /dev/mtd4
mtd4
Name: mtdsingle-1
Type: nand
Eraseblock size: 16384 bytes, 16.0 KiB
Amount of eraseblocks: 32768 (536870912 bytes, 512.0 MiB)
Minimum input/output unit size: 512 bytes
Sub-page size: 256 bytes
OOB size: 16 bytes
Character device major/minor: 90:8
Bad blocks are allowed: true
Device is writable: true
It's similar to mtd_concat: the four 128 MiB devices are concatenated into
one 512 MiB device (4 x 8192 eraseblocks of 16 KiB each).
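For comparison with the striping case below, a concat-style ("single")
mapping just lays the members out back to back. A minimal user-space sketch
of the idea (illustrative only, not the module's code):

#include <stdint.h>

/* Map an offset in the concatenated device to a member device and an
 * offset inside it; members appear one after another. */
static int single_remap(uint64_t virt_off, const uint64_t *sizes,
                        int ndevs, uint64_t *member_off)
{
        int i;

        for (i = 0; i < ndevs; i++) {
                if (virt_off < sizes[i]) {
                        *member_off = virt_off;
                        return i;       /* member index */
                }
                virt_off -= sizes[i];
        }
        return -1;                      /* offset out of range */
}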
(2) RAID0
# mtd_raid create --level=0 /dev/mtd0 /dev/mtd1 /dev/mtd2 /dev/mtd3
# mtdinfo /dev/mtd4
mtd4
Name: mtd0-1
Type: nand
Eraseblock size: 65536 bytes, 64.0 KiB
Amount of eraseblocks: 8192 (536870912 bytes, 512.0 MiB)
Minimum input/output unit size: 512 bytes
Sub-page size: 256 bytes
OOB size: 16 bytes
Character device major/minor: 90:8
Bad blocks are allowed: true
Device is writable: true
There are 4 mtd devices simulated by nandsim, and we create a raid0 array
from them. Note that the array's eraseblock is 4 times larger (65536 = 4 x
16384): each virtual eraseblock stripes across one eraseblock of every
member. As the table below shows, read throughput scales almost linearly:
51.1 MB/s is roughly 4 x 14.0 MB/s.
---------------------------------------------------------------------------
| device| size | throughput (dd if=/dev/mtdX of=/dev/null bs=1M count=10)|
|--------------------------------------------------------------------------
| mtd0 | 128M | 14.0 MB/s |
|--------------------------------------------------------------------------
| mtd1 | 128M | 14.0 MB/s |
|--------------------------------------------------------------------------
| mtd2 | 128M | 14.0 MB/s |
|--------------------------------------------------------------------------
| mtd3 | 128M | 14.0 MB/s |
|--------------------------------------------------------------------------
| mtd4 | 512M | 51.1 MB/s |
---------------------------------------------------------------------------
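The address math behind the striping can be sketched as follows (a minimal
user-space illustration under the assumption above, not the module's actual
code):

#include <stdint.h>

struct raid0_map {
        int dev;                /* member device index */
        uint64_t offset;        /* offset within that member */
};

/* Assume 16 KiB chunks striped round-robin across 4 members, matching
 * the mtdinfo output above (virtual eraseblock = 4 x 16384 bytes). */
static struct raid0_map raid0_remap(uint64_t virt_off,
                                    uint64_t chunk,     /* 16384 */
                                    int ndevs)          /* 4 */
{
        uint64_t stripe = virt_off / chunk;
        struct raid0_map m = {
                .dev    = (int)(stripe % ndevs),
                .offset = (stripe / ndevs) * chunk + virt_off % chunk,
        };
        return m;
}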
# mtd_raid destroy /dev/mtd4
(3) RAID1
# mtd_raid create --level=1 /dev/mtd0 /dev/mtd1 /dev/mtd2 /dev/mtd3
# mtdinfo /dev/mtd4
mtd4
Name: mtd1-1
Type: nand
Eraseblock size: 16384 bytes, 16.0 KiB
Amount of eraseblocks: 8192 (134217728 bytes, 128.0 MiB)
Minimum input/output unit size: 512 bytes
Sub-page size: 256 bytes
OOB size: 16 bytes
Character device major/minor: 90:8
Bad blocks are allowed: true
Device is writable: true
# modprobe ubi
# ubiattach -p /dev/mtd4
UBI device number 0, total 8192 LEBs (130023424 bytes, 124.0 MiB), available 8014 LEBs (127198208 bytes, 121.3 MiB), LEB size 15872 bytes (15.5 KiB)
# ubimkvol /dev/ubi0 -m -N "test"
Set volume size to 127198208
Volume ID 0, size 8014 LEBs (127198208 bytes, 121.3 MiB), LEB size 15872 bytes (15.5 KiB), dynamic, name "test", alignment 1
# mkfs.ubifs /dev/ubi0_0
# mount /dev/ubi0_0 /mnt/test
# echo "mtd raid testing" > /mnt/test/test
# cat /mnt/test/test
mtd raid testing
# umount /mnt/test/
# ubidetach -p /dev/mtd4
# ubiattach -p /dev/mtd0
UBI device number 0, total 8192 LEBs (130023424 bytes, 124.0 MiB), available 0 LEBs (0 bytes), LEB size 15872 bytes (15.5 KiB)
# mount /dev/ubi0_0 /mnt/test
# cat /mnt/test/test
mtd raid testing <--------------------- attaching a single member of the raid1 array yields the same data.
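This works because raid1 mirrors every write to all members, which is also
why four 128 MiB members yield a 128 MiB array. A minimal sketch of the
write path under that assumption (illustrative only, not the module's
actual code):

#include <linux/mtd/mtd.h>

/* Fan a write out to every member; any failure fails the whole write,
 * so the mirrors stay identical. A read can then be served by any
 * single member, as the ubiattach of /dev/mtd0 above shows. */
static int raid1_write(struct mtd_info **members, int ndevs,
                       loff_t to, size_t len, size_t *retlen,
                       const u_char *buf)
{
        size_t written;
        int i, err;

        for (i = 0; i < ndevs; i++) {
                err = mtd_write(members[i], to, len, &written, buf);
                if (err)
                        return err;
        }
        *retlen = len;
        return 0;
}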
Reference:
https://en.wikipedia.org/wiki/Standard_RAID_levels
Any suggestions are welcome.
Thanks, guys.
So next I plan to study the structure of UBI and then refactor my RAID onto UBI.