Reposted from: http://wiki.laptop.org/go/NAND_Flash_Bad_Block_Table
This document describes the on-FLASH data structures that OLPC uses to maintain NAND FLASH bad-block information. It is a specific subcase of the general NAND FLASH bad-block scheme in the Linux "mtd" (memory technology device) subsystem as of Linux version 2.6.22.
This document focuses on the specific choices that are relevant for OLPC, omitting inapplicable variations of the general scheme. Those choices include:
NAND FLASH chips typically have some bad blocks as shipped from the factory, and additional blocks may go bad over time due to "wear" from repeated erasure. A factory-fresh NAND chip is shipped in the "erased" state with every good page and its OOB containing all 0xFF bytes. Factory-detected bad blocks have 0 in the first OOB byte of the first or second page in that block.
The factory bad block information must be copied elsewhere before the device is written for the first time, because ordinary writes to good blocks may store data in that first OOB byte and thereby make those blocks appear bad. (In particular, the CaFe chip uses the first OOB byte for its hardware-generated CRC, so it is not feasible to reserve that byte to mean "this block is bad".) "Bad block table" means the storage for the bad block information after it has been copied from the factory locations and later amended to include blocks that have gone bad from wear.
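As a rough illustration of the one-time factory scan, the sketch below marks a block bad when the first OOB byte of page 0 or page 1 is not 0xFF. It assumes a hypothetical flat array `oob0` holding, for each block, OOB byte 0 of its first two pages, as a driver would read them before the first write; `is_factory_bad` and `scan_factory_bad_blocks` are illustrative names, not OLPC or Linux mtd functions. Testing for "not 0xFF" rather than exactly 0 is a conservative choice: a factory-fresh good page reads back all 0xFF, so any other value there cannot come from a good, unwritten block.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* oob0[2*b + p] holds OOB byte 0 of page p (0 or 1) of block b.
 * In a real driver these bytes would come from NAND OOB reads performed
 * before anything is written to the device (hypothetical layout). */
static bool is_factory_bad(const uint8_t *oob0, uint32_t block)
{
    /* Good, unwritten pages read back 0xFF; anything else marks the block bad. */
    return oob0[2u * block] != 0xFF || oob0[2u * block + 1] != 0xFF;
}

/* Record one bool per block in bad[]; return the number of bad blocks found. */
static size_t scan_factory_bad_blocks(const uint8_t *oob0, uint32_t total_blocks,
                                      bool bad[])
{
    size_t count = 0;
    for (uint32_t b = 0; b < total_blocks; b++) {
        bad[b] = is_factory_bad(oob0, b);
        if (bad[b])
            count++;
    }
    return count;
}
```

The resulting per-block list is what would then be copied into the bad block table before normal use of the device begins.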
The various software components that access the NAND FLASH, including the boot firmware and OS drivers, must understand the bad block table, using it to avoid preexisting bad blocks and updating it with newly-detected bad blocks.
The bad block table is a critical resource whose integrity is essential for the system to function properly, so updates must be done carefully to prevent loss of its information if an update is interrupted (perhaps due to power loss at a bad time). In general, it's not possible to fully recover the bad block information by re-scanning the device. The bad block storage format is redundant, and there is a prescribed procedure for updating it safely.
The totality of onboard NAND FLASH storage on an OLPC system can be arranged in several ways. There might be a single silicon chip in a single package, two chips in one package, or multiple packages each with one or two chips inside. The driver software "hides" this internal organization, so the totality of onboard NAND FLASH appears as a single contiguous array.
The implication for this bad block table specification is that a single bad block table (with its redundant mirror copy) encompasses all of the chips and packages, instead of the bad block information being "per chip".
The bad block table consists of two subtables, a primary copy and a mirror copy. Each subtable is in a separate block, beginning in the first page of its block. The bad block table blocks are located somewhere within the last four blocks of NAND FLASH. Only two blocks would be necessary in the absence of bad blocks within the last four; the range of four provides a little slack in case one or two of those blocks are bad.
When software scans to find the bad block tables, it starts at the end of NAND FLASH and scans backwards until either both tables (primary and mirror) are found, or until 4 blocks have been scanned.
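The backward search can be sketched as follows. To keep it self-contained, the hypothetical array `kinds[]` stands in for reading each block's first-page OOB and classifying its signature as primary, mirror, or neither; a real driver would perform that NAND read instead. `find_bbt_blocks`, `enum bbt_kind`, and `BBT_SEARCH_BLOCKS` are illustrative names, not OLPC or Linux mtd identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

enum bbt_kind { BBT_NONE, BBT_PRIMARY, BBT_MIRROR };

#define BBT_SEARCH_BLOCKS 4  /* the tables live somewhere in the last four blocks */

/* kinds[b] stands in for "which signature is in block b's first-page OOB".
 * On return, *primary and *mirror hold the block numbers found, or -1.
 * Returns true only if both subtables were found. */
static bool find_bbt_blocks(const enum bbt_kind kinds[], uint32_t total_blocks,
                            int64_t *primary, int64_t *mirror)
{
    *primary = -1;
    *mirror = -1;
    for (uint32_t i = 0; i < BBT_SEARCH_BLOCKS && i < total_blocks; i++) {
        uint32_t block = total_blocks - 1 - i;  /* walk backwards from the end */
        if (kinds[block] == BBT_PRIMARY && *primary < 0)
            *primary = block;
        else if (kinds[block] == BBT_MIRROR && *mirror < 0)
            *mirror = block;
        if (*primary >= 0 && *mirror >= 0)
            return true;                        /* both found: stop scanning */
    }
    return false;                               /* search region exhausted */
}
```

A false return means the search region was exhausted without finding both copies.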
A NAND FLASH device with 3 or more factory bad blocks in the last 4 cannot be used, because the primary and mirror subtables need two good blocks in that region. This is a rare occurrence that reduces factory yield insignificantly. Furthermore, blocks in that region should wear out quite infrequently, since the bad block table is rewritten only rarely, with an insignificant effect on field failure rate.
A bad block subtable consists of an identifying header and a bitmap describing the disposition of all the device's blocks.
The header is present in the OOB of each occupied page in the block that contains the bad block subtable. The bitmap is stored in the page data area, spanning as many pages as necessary to map the entire NAND FLASH.
The bitmap has two bits for each block, so each 2 KB page can map 2 K * 4 = 8K blocks, which at 128 KB per block is 1 GB of FLASH. A 4 GB FLASH thus requires 4 pages. Because a subtable must fit in a single block, and a 128 KB block holds 64 pages of 2 KB, the format works for NAND FLASH sizes up to 64 * 1 GB = 64 GB, assuming that the block size remains 128 KB. (That assumption is not necessarily a good one, as larger FLASH devices might have larger block sizes.)
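The sizing arithmetic can be written out as a small sketch. `bbt_bitmap_bytes`, `bbt_bitmap_pages`, and `bbt_fits_in_block` are hypothetical helper names; page size and pages-per-block are parameters rather than fixed OLPC constants, so the same code covers the 2 KB page / 128 KB block geometry (64 pages per block) as well as devices with larger blocks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes of bitmap needed: 2 bits per block, so 4 blocks per byte (rounded up). */
static uint64_t bbt_bitmap_bytes(uint64_t total_blocks)
{
    return (total_blocks + 3) / 4;
}

/* Pages of bitmap needed, rounded up to whole pages. */
static uint64_t bbt_bitmap_pages(uint64_t total_blocks, uint32_t page_size)
{
    return (bbt_bitmap_bytes(total_blocks) + page_size - 1) / page_size;
}

/* A subtable lives in a single block, so its bitmap must fit in that
 * block's pages. */
static bool bbt_fits_in_block(uint64_t total_blocks, uint32_t page_size,
                              uint32_t pages_per_block)
{
    return bbt_bitmap_pages(total_blocks, page_size) <= pages_per_block;
}
```

For a 4 GB device with 128 KB blocks, total_blocks is 32768 and `bbt_bitmap_pages(32768, 2048)` is 4, matching the figure above.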
The header includes a signature that identifies the block as either a primary or a mirror bad block subtable and an incrementing version number that is used to detect and recover from interrupted updates.
The bad block subtable header is in the OOB of each page containing bad block information.
The signature is at OOB offset 0x0e-0x11 (bytes 0-0xd are used by the controller chip's hardware ECC).
Primary signature: 0x0e: 'B', 0x0f: 'b', 0x10: 't', 0x11: '0' (the ASCII string "Bbt0")
Mirror signature: 0x0e: '1', 0x0f: 't', 0x10: 'b', 0x11: 'B' (the ASCII string "1tbB", i.e. "Bbt0" reversed)
The version number is one byte at OOB offset 0x12. The version number increments on each bad block table update, wrapping from 0xff back to 0. (Updates are typically very infrequent.) Version numbers must be compared with circular arithmetic modulo 256, as described in the Update Procedure section.
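A minimal sketch of recognizing a subtable header in one page's OOB area, assuming the OOB bytes have already been read into a buffer. The offsets and signature bytes come from the description above; `bbt_parse_header`, `enum bbt_kind`, and the macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum bbt_kind { BBT_NONE, BBT_PRIMARY, BBT_MIRROR };

#define BBT_SIG_OFFSET     0x0e  /* signature occupies OOB bytes 0x0e-0x11 */
#define BBT_SIG_LEN        4
#define BBT_VERSION_OFFSET 0x12  /* one-byte version number */

static const uint8_t bbt_primary_sig[BBT_SIG_LEN] = { 'B', 'b', 't', '0' };
static const uint8_t bbt_mirror_sig[BBT_SIG_LEN]  = { '1', 't', 'b', 'B' };

/* Classify one page's OOB area; if it carries a subtable header, store the
 * version number in *version.  oob must hold at least 0x13 bytes. */
static enum bbt_kind bbt_parse_header(const uint8_t *oob, uint8_t *version)
{
    enum bbt_kind kind = BBT_NONE;
    if (memcmp(oob + BBT_SIG_OFFSET, bbt_primary_sig, BBT_SIG_LEN) == 0)
        kind = BBT_PRIMARY;
    else if (memcmp(oob + BBT_SIG_OFFSET, bbt_mirror_sig, BBT_SIG_LEN) == 0)
        kind = BBT_MIRROR;
    if (kind != BBT_NONE)
        *version = oob[BBT_VERSION_OFFSET];
    return kind;
}
```

Since the header is repeated in every occupied page of the subtable block, a reader could apply this check to each bitmap page, not only the first.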
The bad block bitmap begins at the start of the first page in the block and spans as many pages as necessary.
The bitmap has two bits for each block on the NAND FLASH device, encoded as:
Bitmap entries are stored in little-endian order - the two least-significant bits of a byte are for block numbers equal to (0 mod 4), the next two bits are for blocks (1 mod 4), etc. In C:
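#include <stdint.h>
const uint8_t *bitmap; // bitmap bytes read from the subtable's data pages
uint32_t block_number; // block to look up
uint8_t byte;
unsigned int bitshift, bits;
byte = bitmap[block_number / 4]; // 4 entries per byte
bitshift = (block_number % 4) * 2; // 2 bits per entry
bits = (byte >> bitshift) & 3; // shift and extract the 2-bit entry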
The total number of bytes in the bitmap is total_blocks/4 .
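Recording a newly detected bad block means changing one 2-bit entry without disturbing its three neighbors in the same byte. The hypothetical helpers `bbt_get` and `bbt_set` below sketch that on an in-RAM copy of the bitmap, using the little-endian layout just described; they deliberately do not interpret the 2-bit values, whose meanings are defined by the bitmap encoding.

```c
#include <stdint.h>

/* Read the 2-bit entry for block_number (entry 0 in the low bits of each byte). */
static unsigned bbt_get(const uint8_t *bitmap, uint32_t block_number)
{
    unsigned shift = (block_number % 4) * 2;
    return (bitmap[block_number / 4] >> shift) & 3u;
}

/* Overwrite the 2-bit entry for block_number with value (0..3), leaving the
 * other three entries in the same byte untouched. */
static void bbt_set(uint8_t *bitmap, uint32_t block_number, unsigned value)
{
    unsigned shift = (block_number % 4) * 2;
    uint8_t mask = (uint8_t)(3u << shift);
    bitmap[block_number / 4] = (uint8_t)((bitmap[block_number / 4] & ~mask) |
                                         ((value & 3u) << shift));
}
```

Writing the modified bitmap back to FLASH is a separate step governed by the update procedure.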
For the bad block table to be consistent, all of the following must be true:
For a primary or mirror subtable to be good:
This safe update procedure prevents loss of the bad block information if a bad block table update is interrupted, by ensuring that at least one copy of the information is always present.
Assumed starting point: there is a primary bad block table and a mirror table that are consistent, i.e. with the same data and the same version number N.
If all the steps complete without error, a scan at a later time will find a self-consistent bad block table.
If a later scan finds an inconsistent table (see Consistency Checks), the Recovery Procedure should be used to restore the table.
If the update process is interrupted (e.g. by loss of power at a bad time), a subsequent scan will detect an inconsistent table. There are several cases:
A write or erase operation could fail during a bad block table update - not an interrupted write caused by an external event like loss of power, but rather a write failure that is detected by the driver while the system is still running. This could happen if the block used for the bad block table wears out.
To recover from this, that block must be removed from use and the bad block data placed in a different block. To find a suitable block, we use the following:
This algorithm isn't foolproof, but it works almost all the time, and the probability of those blocks wearing out is already low because of infrequent bad block table updates, so the algorithm is good enough.
Version numbers, which are 8 bits wide, must be compared with circular arithmetic modulo 256, to give the correct answer when the version number increments from 255 to 0. This can be done in C with:
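unsigned char va, vb; // 8-bit version numbers from OOB offset 0x12
// va - vb is computed in int; converting it back to signed char reduces it
// modulo 256 into -128..127 on two's-complement compilers such as GCC.
if ((signed char)(va - vb) > 0) {
// va is newer (greater modulo 256) than vb
} else {
// va is older than or equal to vb
}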
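The same comparison as a self-contained function, for reference. `bbt_version_newer` is a hypothetical name; it avoids the signed-char conversion by reducing the difference into an unsigned byte and testing whether it falls in 1..127, which is equivalent. Any modulo-256 comparison has one inherent ambiguity: versions exactly 128 apart are not ordered either way. That should not arise here, since the primary and mirror normally differ by at most one update.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if version a is newer than version b under modulo-256 arithmetic. */
static bool bbt_version_newer(uint8_t a, uint8_t b)
{
    uint8_t diff = (uint8_t)(a - b);   /* well-defined: wraps modulo 256 */
    return diff != 0 && diff < 128;    /* same as (signed char)diff > 0 */
}
```

For example, `bbt_version_newer(0, 255)` is true, because 0 follows 255 after the wrap.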