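I have been busy with a project on a driver for a NAND device, so there has been no follow-up since my first post.
BTW, the patchset from the previous post has been merged into lxc mainline.
The driver details involve company-confidential material, so I cannot describe them in depth. Today I will only
write down some of the NAND-related things in the kernel.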
(1)mtd driver
http://www.linux-mtd.infradead.org/
MTD means Memory Technology Device.
A struct mtd_info represents one MTD device. Each MTD driver fills in a struct mtd_info
for the kernel, and callers then interact with the device through the functions in it.
Example:
int mtd_read(struct mtd_info *mtd, loff_t from, size_t len, size_t *retlen,
	     u_char *buf)
{
	int ret_code;
	*retlen = 0;
	if (from < 0 || from > mtd->size || len > mtd->size - from)
		return -EINVAL;
	if (!len)
		return 0;
	/*
	 * In the absence of an error, drivers return a non-negative integer
	 * representing the maximum number of bitflips that were corrected on
	 * any one ecc region (if applicable; zero otherwise).
	 */
	ret_code = mtd->_read(mtd, from, len, retlen, buf);
	if (unlikely(ret_code < 0))
		return ret_code;
	if (mtd->ecc_strength == 0)
		return 0;	/* device lacks ecc */
	return ret_code >= mtd->bitflip_threshold ? -EUCLEAN : 0;
}
EXPORT_SYMBOL_GPL(mtd_read);
This is an interface that other kernel subsystems can call. @mtd is a pointer to the mtd_info, @from is the offset at which
the read starts, @len is the number of bytes to read, @retlen receives the number of bytes actually read, and @buf is the
buffer that stores the data read from the device. The function calls ->_read in mtd_info.
Similarly, the mtd subsystem provides mtd_write() and mtd_erase(). On a flash device, a range has to be erased before it
can be written.
PS: the offset for mtd_write() must be page aligned, and the offset and length for mtd_erase() must be erasesize aligned,
but mtd_read() has no alignment requirement.
(2)nand
NAND is one type of MTD device. As said above, the driver has to fill in ->_read in mtd_info.
/**
 * nand_scan_tail - [NAND Interface] Scan for the NAND device
 * @mtd: MTD device structure
 *
 * This is the second phase of the normal nand_scan() function. It fills out
 * all the uninitialized function pointers with the defaults and scans for a
 * bad block table if appropriate.
 */
int nand_scan_tail(struct mtd_info *mtd)
{
	..........
	mtd->_read = nand_read;
	..........
}
A normal NAND driver calls nand_scan_tail() from its probe function (pci_probe() in a PCI driver). After that, ->_read
in mtd_info points to nand_read(). The details of nand_read() are below.
/**
 * nand_read - [MTD Interface] MTD compatibility function for nand_do_read_ecc
 * @mtd: MTD device structure
 * @from: offset to read from
 * @len: number of bytes to read
 * @retlen: pointer to variable to store the number of read bytes
 * @buf: the databuffer to put data
 *
 * Get hold of the chip and call nand_do_read.
 */
static int nand_read(struct mtd_info *mtd, loff_t from, size_t len,
		     size_t *retlen, uint8_t *buf)
{
	struct mtd_oob_ops ops;
	int ret;
	nand_get_device(mtd, FL_READING);
	ops.len = len;
	ops.datbuf = buf;
	ops.oobbuf = NULL;
	ops.mode = MTD_OPS_PLACE_OOB;
	ret = nand_do_read_ops(mtd, from, &ops);
	*retlen = ops.retlen;
	nand_release_device(mtd);
	return ret;
}
The strangest thing above is probably oobbuf = NULL, right?
OOB means "out of band". The device has a small extra area for each page that can
hold extra data, and that area is called the OOB. nand_read() does not care
about the OOB, so it sets oobbuf to NULL.
After preparing the ops for the read, it calls nand_do_read_ops() to do the real reading.
(3)nandsim
This is a kernel module which can simulate a nand device.
/*
 * Module initialization function
 */
static int __init ns_init_module(void)
{
	struct nand_chip *chip;
	struct nandsim *nand;
	int retval = -ENOMEM, i;
	if (bus_width != 8 && bus_width != 16) {
		NS_ERR("wrong bus width (%d), use only 8 or 16\n", bus_width);
		return -EINVAL;
	}
	/* Allocate and initialize mtd_info, nand_chip and nandsim structures */
	nsmtd = kzalloc(sizeof(struct mtd_info) + sizeof(struct nand_chip)
			+ sizeof(struct nandsim), GFP_KERNEL);
	if (!nsmtd) {
		NS_ERR("unable to allocate core structures.\n");
		return -ENOMEM;
	}
	............................
	retval = nand_scan_ident(nsmtd, 1, NULL);
	if (retval) {
		NS_ERR("cannot scan NAND Simulator device\n");
		if (retval > 0)
			retval = -ENXIO;
		goto error;
	}
	.............................
	retval = nand_scan_tail(nsmtd);
	if (retval) {
		NS_ERR("can't register NAND Simulator\n");
		if (retval > 0)
			retval = -ENXIO;
		goto error;
	}
	.............................
	/* Register NAND partitions */
	retval = mtd_device_register(nsmtd, &nand->partitions[0],
				     nand->nbparts);
	if (retval != 0)
		goto err_exit;
	return 0;
	..............................
}
module_init(ns_init_module);
OK, for a clearer explanation I removed some distracting details. :)
a). Allocate an mtd_info.
b). Call nand_scan_ident(). This function reads the identity from the device and checks it.
c). Call nand_scan_tail(). This function completes the scan: it fills in the default function pointers,
assigns ->_read as nand_read(), and scans for a bad block table if appropriate.
d). Call mtd_device_register() to register the MTD device.
This module is very convenient for testing MTD. To use it, build the kernel
with CONFIG_MTD_NAND_NANDSIM=m and load the module.
[root@yds-pc linux]# lsmod|grep nand
nandsim 32965 0
nand 68251 1 nandsim
nand_ecc 13098 1 nand
nand_ids 12625 1 nand
mtd 53331 3 nand,nandsim
[root@yds-pc linux]# ls /dev/mtd0*
/dev/mtd0 /dev/mtd0ro
[root@yds-pc linux]# mtdinfo /dev/mtd0
mtd0
Name: NAND simulator partition 0
Type: nand
Eraseblock size: 16384 bytes, 16.0 KiB
Amount of eraseblocks: 8192 (134217728 bytes, 128.0 MiB)
Minimum input/output unit size: 512 bytes
Sub-page size: 256 bytes
OOB size: 16 bytes
Character device major/minor: 90:0
Bad blocks are allowed: true
Device is writable: true
(4)mtdtest
There are some tests for MTD in the kernel repo.
[root@yds-pc linux]# ls drivers/mtd/tests/*.c
drivers/mtd/tests/mtd_nandecctest.c
drivers/mtd/tests/mtd_test.c
drivers/mtd/tests/nandbiterrs.c
drivers/mtd/tests/oobtest.c
drivers/mtd/tests/pagetest.c
drivers/mtd/tests/readtest.c
drivers/mtd/tests/speedtest.c
drivers/mtd/tests/stresstest.c
drivers/mtd/tests/subpagetest.c
drivers/mtd/tests/torturetest.c
We pick readtest as the example.
[root@yds-pc linux]# insmod drivers/mtd/tests/mtd_readtest.ko dev=0
[root@yds-pc linux]# dmesg
[ 7581.563765] =================================================
[ 7581.563766] mtd_readtest: MTD device: 0
[ 7581.563769] mtd_readtest: MTD device size 134217728, eraseblock size 16384, page size 512, count of eraseblocks 8192, pages per eraseblock 32, OOB size 16
[ 7581.563772] mtd_test: scanning for bad eraseblocks
[ 7581.563924] mtd_test: scanned 8192 eraseblocks, 0 are bad
[ 7581.563925] mtd_readtest: testing page read
[ 7581.962781] mtd_readtest: finished
[ 7581.962788] =================================================
static int __init mtd_readtest_init(void)
{
	..................................................
	pr_info("MTD device: %d\n", dev);
	mtd = get_mtd_device(NULL, dev);
	..................................................
	/* Read all eraseblocks 1 page at a time */
	pr_info("testing page read\n");
	for (i = 0; i < ebcnt; ++i) {
		int ret;
		if (bbt[i])
			continue;
		ret = read_eraseblock_by_page(i);
		if (ret) {
			dump_eraseblock(i);
			if (!err)
				err = ret;
		}
		cond_resched();
	}
	..................................................
}
module_init(mtd_readtest_init);
As usual, I removed some distracting details. The test module actually does its
testing in module_init(), which means that if insmod of mtd_readtest.ko succeeds,
the test passed. Sounds strange, right?
In this test, the for loop reads every erase block that is not marked bad by calling
read_eraseblock_by_page(i);
static int read_eraseblock_by_page(int ebnum)
{
	int i, ret, err = 0;
	loff_t addr = ebnum * mtd->erasesize;
	void *buf = iobuf;
	void *oobbuf = iobuf1;
	for (i = 0; i < pgcnt; i++) {
		memset(buf, 0 , pgsize);
		ret = mtdtest_read(mtd, addr, pgsize, buf);
		if (ret) {
			if (!err)
				err = ret;
		}
		if (mtd->oobsize) {
			struct mtd_oob_ops ops;
			ops.mode = MTD_OPS_PLACE_OOB;
			ops.len = 0;
			ops.retlen = 0;
			ops.ooblen = mtd->oobsize;
			ops.oobretlen = 0;
			ops.ooboffs = 0;
			ops.datbuf = NULL;
			ops.oobbuf = oobbuf;
			ret = mtd_read_oob(mtd, addr, &ops);
			if ((ret && !mtd_is_bitflip(ret)) ||
			    ops.oobretlen != mtd->oobsize) {
				pr_err("error: read oob failed at "
				       "%#llx\n", (long long)addr);
				if (!err)
					err = ret;
				if (!err)
					err = -EINVAL;
			}
			oobbuf += mtd->oobsize;
		}
		addr += pgsize;
		buf += pgsize;
	}
	return err;
}
First, it calls mtdtest_read() to read one page from the device. This function is
implemented in mtd_test.c and is a thin wrapper around mtd_read().
Then, if the device has an OOB area, it calls mtd_read_oob() with datbuf = NULL and len = 0 to read that page's OOB.
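That covers the simplest possible mtd/nand code structure. Real-world development is of course far more complex than this,
involving multi-channel devices, read-ahead, RAID 0, and so on.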