ubi patch for MLC nand power loss (1)

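I have recently been preparing a patch for power loss on MLC NAND. As we know, UBIFS cannot really be used on MLC NAND, because a power loss will inevitably corrupt data that was already written. As the UBI website puts it: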


UBIFS authors never tested UBI/UBIFS on MLC flash devices. Let's consider some specific aspects of MLC NAND flashes:

  • [OK] MLC NAND flashes are more "faulty" than SLC, so they use stronger ECC codes; these ECC codes often occupy whole OOB area (as do the ECC codes on some newer SLC flashes, which are more error-prone than previous generations of flash); this is not a problem for UBI/UBIFS, because neither UBIFS nor UBI use OOB area;
  • [OK] when the data are written to an eraseblock, they have to be written sequentially, from the beginning of the eraseblock to the end of it; this is also not a problem because it is exactly what UBI and UBIFS do (see also this section);
  • [OK] MLC flashes have rather short eraseblock life-cycles of just a few thousand erase cycles; this is not a problem because UBI uses a deterministic wear-leveling algorithm. However, the default 4096 erases wear-levelling threshold may need to be lessened for MLC.
  • [NEED WORK] MLC flashes exhibit bit-flips as a result of "program disturb" and "read disturb" errors (see here). These errors are sometimes referred to as "reversible" errors in NAND datasheets, meaning that they disappear once the block in which they are located is erased; as opposed to "irreversible" errors which are due to cell wear and cause permanent bit failures. Note that SLC flashes have these same errors, but they are much more common on MLC:
    • NAND flashes have a so called "read-disturb" property, which means that a NAND page read operation may introduce a persistent bit change, not necessarily located in the page being read; the ECC code would fix it, but more read operations may introduce more bit changes and correctable ECC errors may turn into uncorrectable ECC errors; however, when these errors occur on the same page that is being read, this should not be a problem because UBI is doing scrubbing; in other words, once UBI notices that there is a correctable bit-flip in an eraseblock, it moves the contents of this physical eraseblock to a different physical eraseblock, and re-maps the corresponding logical eraseblock to the new physical eraseblock; so UBI refreshes the data and gets rid of bit-flips, thus improving data integrity.
    • "Read-disturb" errors can also occur on a page other than the one being read, but which is within the same eraseblock. This is not a problem if page read operations are spread around somewhat evenly within the eraseblock, since the bit-flip will soon be detected and corrected through the "scrubbing" process described above. However, if a particular page within a block is rarely read, scrubbing will not have a chance to fix errors, and they may accumulate over time until they become unfixable. This is very similar to the next problem.
    • NAND flashes also have a "program-disturb" property, which means that if you program a NAND page, you may introduce a bit-flip in a different NAND page. The bit change can be fixed by ECC, but with time the changes may accumulate and become unfixable. Current UBI bit-flip handling only partially helps here, because it is passive, which means that UBI notices bit-flips only when performing users' read requests. So if you never read the NAND page which accumulates bit-flips, UBI will never notice this.

    The read and program disturb issues should be possible to handle by implementing a kind of "flash crawler" which would read all of the NAND pages in the background from time to time (at UBI level) making UBI notice and fix bit-flips. This is not implemented though, and this can probably be done from user-space.

  • [NEED WORK] There is another aspect of MLC flashes which may need closer attention: the "paired pages" problem (e.g., see this Power Point presentation). Namely, MLC NAND pages are coupled in a sense that if you cut power while writing to a page, you corrupt not only this page, but also one of the previous pages which is paired with the current one. For example, pages 0 and 3, 1 and 4, 2 and 5, 3 and 6, and so on (in the same eraseblock) may be paired (page distance is 4, but there may be other distances). So if you write data to, say, page 3 and cut the power, you may end up with corrupted data in page 0. UBIFS is not ready to handle this problem at the moment and this needs some work.

    UBIFS can handle this problem by avoiding using the rest of free space in LEBs after a sync or commit operation. E.g., if we start writing to a new journal LEB, and then have a sync or commit, we should "waste" some amount of free space in this LEB to make sure that the previous paired page does not contain synced data. This way we guarantee that a power cut will not corrupt the synced or committed data. And the "wasted" free space can be re-used after that LEB has been garbage-collected. The same applies to all the other LEBs we write to (LPT, log, orphan, etc). This would require some work and would make UBIFS slower, so this should probably be optional. The way to attack this issue is to improve UBIFS power cut emulation and implement "paired-pages" emulation, then use the integck test for testing. After all the issues are fixed, real power-cut tests could be carried out.

  • [NEED WORK] The "unstable bits issue", which is not MLC-specific, described here.
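
The "flash crawler" mentioned in the list above is described as something that can probably be done from user-space. A minimal sketch of that idea, assuming that reading a UBI volume character device from start to end is enough for UBI to see correctable bit-flips and schedule scrubbing. The device path in the usage note and the function names `crawl` and `crawl_all` are illustrative, not part of any existing tool:

```python
def crawl(path, chunk_size=128 * 1024):
    """Read `path` from start to end, discard the data, return bytes read.

    Assumption: `path` is a UBI volume device, and each read reaches UBI,
    which then gets a chance to notice bit-flips in the pages it reads.
    """
    total = 0
    # buffering=0: every read() is passed to the device, not a Python buffer
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(chunk_size)
            if not chunk:          # empty read means end of volume
                break
            total += len(chunk)
    return total


def crawl_all(paths):
    """Crawl several volumes; record I/O errors instead of stopping."""
    results = {}
    for path in paths:
        try:
            results[path] = crawl(path)
        except OSError as err:     # e.g. an unreadable page or missing device
            results[path] = err
    return results
```

It could be run periodically (from cron, for example) as `crawl_all(["/dev/ubi0_0"])`. This only covers pages that belong to mapped LEBs of the volumes it is given, so it is a partial answer to the disturb problem, not a replacement for a crawler at UBI level.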

One way to deal with this problem is to back up the lower page every time an upper page is written, and to run a recovery pass at every power-up.
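
The backup-and-recover idea can be illustrated with a small simulation. This is a toy model under stated assumptions, not the patch itself: the pairing distance of 3 follows the 0/3, 1/4 example in the quoted text (real chips differ), the backup area is assumed to be written safely, and all class names are hypothetical.

```python
PAIR_DISTANCE = 3  # hypothetical pairing: page p is paired with page p - 3


class PowerCut(Exception):
    """Raised when a simulated program operation is interrupted."""


class Eraseblock:
    def __init__(self, pages=8):
        self.pages = [None] * pages   # None means the page is still erased
        self.corrupt = set()          # pages that now fail ECC

    def paired_lower(self, page):
        lower = page - PAIR_DISTANCE
        return lower if lower >= 0 else None

    def program(self, page, data, power_cut=False):
        if power_cut:
            # An interrupted program damages this page and its paired page.
            self.corrupt.add(page)
            lower = self.paired_lower(page)
            if lower is not None:
                self.corrupt.add(lower)
            raise PowerCut(page)
        self.pages[page] = data

    def read(self, page):
        if page in self.corrupt:
            raise OSError("uncorrectable ECC error on page %d" % page)
        return self.pages[page]


class BackedUpWriter:
    def __init__(self, block):
        self.block = block
        self.backup = {}   # stands in for a separate backup area: page -> data

    def write(self, page, data, power_cut=False):
        lower = self.block.paired_lower(page)
        # Back up the paired lower page before touching the upper page.
        if lower is not None and self.block.pages[lower] is not None:
            self.backup[lower] = self.block.read(lower)
        self.block.program(page, data, power_cut=power_cut)
        # The write completed, so the backup is no longer needed.
        self.backup.pop(lower, None)

    def recover(self):
        """Power-up pass: return {page: data} for pages that need restoring."""
        restored = {}
        for page, data in self.backup.items():
            try:
                self.block.read(page)
            except OSError:
                restored[page] = data
        return restored
```

If page 0 holds synced data and the power is cut while page 3 is being programmed, `recover()` returns the saved contents of page 0. On real flash the restored data cannot be written back in place; it would have to go to a different eraseblock, which this sketch leaves out.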


On a newer kernel, however, the patch I wrote earlier for this produces the report below when it acquires the rwlock, and only the first time. Why?


=============================================
[ INFO: possible recursive locking detected ]
3.14.0-xilinx-00012-gfba9419-dirty #111 Not tainted
---------------------------------------------
ubirmvol/504 is trying to acquire lock:
 (&le->mutex){+.+...}, at: [<c02edbe4>] leb_write_lock+0x18/0x20

but task is already holding lock:
 (&le->mutex){+.+...}, at: [<c02edbe4>] leb_write_lock+0x18/0x20

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&le->mutex);
  lock(&le->mutex);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by ubirmvol/504:
 #0:  (&ubi->device_mutex){+.+...}, at: [<c02ebd60>] ubi_cdev_ioctl+0x2d4/0x90c
 #1:  (&le->mutex){+.+...}, at: [<c02edbe4>] leb_write_lock+0x18/0x20

stack backtrace:
CPU: 1 PID: 504 Comm: ubirmvol Not tainted 3.14.0-xilinx-00012-gfba9419-dirty #111
[<c001e2a4>] (unwind_backtrace) from [<c0019d38>] (show_stack+0x10/0x14)
[<c0019d38>] (show_stack) from [<c041b294>] (dump_stack+0x84/0xd4)
[<c041b294>] (dump_stack) from [<c0061d6c>] (__lock_acquire+0x1cc0/0x1d58)
[<c0061d6c>] (__lock_acquire) from [<c00622b8>] (lock_acquire+0x60/0x74)
[<c00622b8>] (lock_acquire) from [<c041fde8>] (down_write+0x40/0x54)
[<c041fde8>] (down_write) from [<c02edbe4>] (leb_write_lock+0x18/0x20)
[<c02edbe4>] (leb_write_lock) from [<c02f7198>] (ubi_backup_data_to_backup_volume+0xf4/0x47c)
[<c02f7198>] (ubi_backup_data_to_backup_volume) from [<c02f1180>] (ubi_io_write+0x340/0x6c4)
[<c02f1180>] (ubi_io_write) from [<c02ee668>] (ubi_eba_write_leb+0x540/0x6b0)
[<c02ee668>] (ubi_eba_write_leb) from [<c02e7560>] (ubi_change_vtbl_record+0xc8/0x12c)
[<c02e7560>] (ubi_change_vtbl_record) from [<c02e8de8>] (ubi_remove_volume+0x100/0x1f0)
[<c02e8de8>] (ubi_remove_volume) from [<c02ebd6c>] (ubi_cdev_ioctl+0x2e0/0x90c)
[<c02ebd6c>] (ubi_cdev_ioctl) from [<c00de480>] (vfs_ioctl+0x18/0x34)
[<c00de480>] (vfs_ioctl) from [<c00df0b8>] (do_vfs_ioctl+0x5b8/0x600)
[<c00df0b8>] (do_vfs_ioctl) from [<c00df138>] (SyS_ioctl+0x38/0x54)
[<c00df138>] (SyS_ioctl) from [<c00163c0>] (ret_fast_syscall+0x0/0x48)

As for this problem .....................................................
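
The trace itself hints at "missing lock nesting notation". lockdep checks lock classes, not individual lock instances, and every `le->mutex` belongs to the same class, so taking a second one while the first is held is reported even when the two locks protect different LEBs. lockdep also turns itself off after its first report, which is consistent with the message appearing only once. The toy model below illustrates just that behaviour; it is not kernel code. Whether the two locks in this trace really belong to different LEBs cannot be told from the trace, and if they are the same lock, this is a real deadlock and not a false positive.

```python
class ToyLockdep:
    """Toy model of lockdep's same-class check (illustration only)."""

    def __init__(self):
        self.held = []        # (class_name, subclass) pairs currently held
        self.enabled = True   # cleared after the first report
        self.reports = []

    def acquire(self, class_name, subclass=0):
        key = (class_name, subclass)
        # Same class and subclass already held: report once, then go quiet.
        if self.enabled and key in self.held:
            self.reports.append(
                "possible recursive locking detected: %s" % class_name)
            self.enabled = False
        self.held.append(key)

    def release(self, class_name, subclass=0):
        self.held.remove((class_name, subclass))


ld = ToyLockdep()
ld.acquire("&le->mutex")               # lock of the LEB being written
ld.acquire("&le->mutex")               # lock taken inside the backup path
print(ld.reports)

annotated = ToyLockdep()
annotated.acquire("&le->mutex")
annotated.acquire("&le->mutex", subclass=1)   # nested acquisition, annotated
print(annotated.reports)
```

In the kernel the corresponding annotation for an rw_semaphore is `down_write_nested()`, which takes a subclass argument. It only silences the report; it is appropriate only when the inner lock is guaranteed to be a different LEB's lock and the two are always taken in the same order.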
