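I finally got the 2.6.39 kernel ported. After flashing the UBI filesystem, the boot reached /init for the first time. It oopsed, but it still felt like I was getting closer to a working system.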
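Then, after an accidental power cycle, I noticed that the debug console output was no longer the same as on the first boot. The kernel was already printing errors while attaching the MTD device: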
3 cmdlinepart partitions found on MTD device atmel_nand
Creating 3 MTD partitions on "atmel_nand":
0x000000000000-0x000000500000 : "bstrap/uboot/kernel"
0x000000500000-0x000002500000 : "rootfs"
0x000002500000-0x000008000000 : "usrdata"
UBI: attaching mtd1 to ubi0
UBI: physical eraseblock size: 131072 bytes (128 KiB)
UBI: logical eraseblock size: 126976 bytes
UBI: smallest flash I/O unit: 2048
UBI: VID header offset: 2048 (aligned 2048)
UBI: data offset: 4096
atmel_nand atmel_nand: Too many error.
atmel_nand atmel_nand: Too many error.
atmel_nand atmel_nand: Too many error.
atmel_nand atmel_nand: Too many error.
UBI error: ubi_io_read: error -1 while reading 64 bytes from PEB 128:0, read 0 byte
slab error in kmem_cache_destroy(): cache `ubi_scan_leb_slab': Can't free all objects
Backtrace:
[<c003b898>] (dump_backtrace+0x0/0x104) from [<c0352cf0>] (dump_stack+0x18/0x1c)
 r6:c7875000 r5:c7951ce0 r4:c78a30c0
[<c0352cd8>] (dump_stack+0x0/0x1c) from [<c00a6150>] (kmem_cache_destroy+0x90/0xf8)
[<c00a60c0>] (kmem_cache_destroy+0x0/0xf8) from [<c02264dc>] (ubi_scan+0x6b8/0x9ac)
 r4:00000080
[<c0225e24>] (ubi_scan+0x0/0x9ac) from [<c021e4b4>] (ubi_attach_mtd_dev+0x41c/0xa08)

I tried many times and got the same result every time. On the very first boot after flashing, the kernel had made it all the way to here:
VFS: Mounted root (ubifs filesystem) on device 0:12.
Freeing init memory: 136K
Failed to execute /init. Attempting defaults...
Kernel panic - not syncing: Attempted to kill init!
Backtrace:
[<c003b898>] (dump_backtrace+0x0/0x104) from [<c0352cf0>] (dump_stack+0x18/0x1c)
 r6:c781a00c r5:00000004 r4:c0489878
[<c0352cd8>] (dump_stack+0x0/0x1c) from [<c0352e70>] (panic+0x5c/0x188)
[<c0352e14>] (panic+0x0/0x188) from [<c004b8b4>] (do_exit+0x98/0x644)
 r3:c046c394 r2:c7819cfc r1:c781be5c r0:c0403e44
 r7:c781a00c
[<c004b81c>] (do_exit+0x0/0x644) from [<c004bef8>] (do_group_exit+0x5c/0xc4)
 r7:c781cd60
[<c004be9c>] (do_group_exit+0x0/0xc4) from [<c0055eac>] (get_signal_to_deliver+0x30c/0x348)
 r4:00000004
[<c0055ba0>] (get_signal_to_deliver+0x0/0x348) from [<c003ac00>] (do_signal+0xc8/0x5a0)
[<c003ab38>] (do_signal+0x0/0x5a0) from [<c003b3f8>] (do_notify_resume+0x20/0x74)
[<c003b3d8>] (do_notify_resume+0x0/0x74) from [<c0038b94>] (work_pending+0x24/0x28)
 r4:00000000

After a reboot, however, things already go wrong at the very start of UBI probing. I searched all over the web and found every kind of explanation.
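So I started ruling things out. The main question was: why did the first boot work while every boot after it fails? Did the first run modify the data on flash and corrupt the filesystem structure? If so, who modified it? The kernel? What did it do, and where? Questions like these are very hard to chase down directly.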
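Then I wondered whether the ECC check was the problem. On second thought, though, U-Boot is configured for 4-bit PMECC and so is the kernel. I had modified that code with my own hands!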
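I thought about it all morning and got nowhere, so I went back to reading the code. Sure enough, one variable had been set to the wrong value.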
	}

	if (nand_chip->ecc.mode == NAND_ECC_HW) {
		/* ECC is calculated for the whole page (1 step) */
		nand_chip->ecc.size = mtd->writesize;

		//modified by zhaozx
		/* set ECC page size and oob layout */
		switch (mtd->writesize) {
		case 2048:
			nand_chip->ecc.bytes = 28;
			nand_chip->ecc.steps = 1;
			nand_chip->ecc.layout = &pmecc_oobinfo_2048;
			host->mm = GF_DIMENSION_13;
			host->nn = (1 << host->mm) - 1;
			/* 4-bits correction */
			host->tt = 4;
			host->sector_size = 512;
			host->sector_number = mtd->writesize / host->sector_size;
			/// BUG was here. The vendor default is 4, which is for 2-bit ECC.
			/// With 4-bit ECC it has to be 7. Ouch!
			host->ecc_bytes_per_sector = 7;
			host->alpha_to = pmecc_get_alpha_to(host);
			host->index_of = pmecc_get_index_of(host);
			break;
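As a sanity check on the value 7, here is a minimal sketch of the arithmetic. It assumes the usual BCH property that correcting tt bit errors over GF(2^mm) takes mm * tt parity bits per sector, rounded up to whole bytes. The helper names are hypothetical and are not part of the atmel_nand driver.

```c
#include <assert.h>

/* Hypothetical helper (not in the driver): ECC bytes needed per sector,
 * assuming a BCH code over GF(2^mm) uses mm * tt parity bits. */
static unsigned int pmecc_ecc_bytes_per_sector(unsigned int mm, unsigned int tt)
{
	return (mm * tt + 7) / 8;	/* round bits up to whole bytes */
}

/* Hypothetical helper: total ECC bytes for one page. */
static unsigned int pmecc_ecc_bytes_per_page(unsigned int page_size,
					     unsigned int sector_size,
					     unsigned int mm, unsigned int tt)
{
	return (page_size / sector_size) * pmecc_ecc_bytes_per_sector(mm, tt);
}

/* Self-check against the numbers used in the driver fragment above. */
void pmecc_check_2048_page(void)
{
	/* mm = 13, 2-bit correction: 26 bits, i.e. 4 bytes (the old default) */
	assert(pmecc_ecc_bytes_per_sector(13, 2) == 4);
	/* mm = 13, 4-bit correction: 52 bits, i.e. 7 bytes */
	assert(pmecc_ecc_bytes_per_sector(13, 4) == 7);
	/* 2048-byte page, 512-byte sectors: 4 * 7 = 28 = ecc.bytes */
	assert(pmecc_ecc_bytes_per_page(2048, 512, 13, 4) == 28);
}
```

Under this assumption, leaving the per-sector value at 4 accounts for only 4 * 4 = 16 ECC bytes per page, which does not agree with ecc.bytes = 28 in the same case branch.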
There is only one truth, and it is in the code!
Working back from the log output:
The key message is [UBI error: ubi_io_read: error -1 while reading 64 bytes from PEB 128:0, read 0 byte]
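Taken literally, it says that a read from flash returned not a single byte. Why? The obvious suspect is the ECC check, and the "Too many error." lines that atmel_nand prints just before it point the same way.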
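Because PMECC is in use, the ECC codes are written to flash by the hardware. If the PMECC registers are configured differently at read time than they were when the data was written, reads are bound to fail.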
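To make that failure mode concrete, here is a minimal sketch under a simplifying assumption: the ECC bytes of consecutive sectors are stored back to back in the ECC area of the OOB. The struct and function names are hypothetical, chosen only for illustration, and the real driver has no such check.

```c
#include <stdbool.h>

/* Hypothetical summary of the PMECC settings that shape the on-flash ECC layout. */
struct pmecc_params {
	unsigned int sector_size;		/* bytes of data covered per ECC sector */
	unsigned int tt;			/* correctable bits per sector */
	unsigned int ecc_bytes_per_sector;	/* ECC bytes stored per sector */
};

/* True when the writer and the reader agree on every layout parameter. */
static bool pmecc_params_match(const struct pmecc_params *writer,
			       const struct pmecc_params *reader)
{
	return writer->sector_size == reader->sector_size &&
	       writer->tt == reader->tt &&
	       writer->ecc_bytes_per_sector == reader->ecc_bytes_per_sector;
}

/* Offset of a sector's ECC bytes inside the ECC area, assuming the
 * per-sector ECC chunks are stored back to back. */
static unsigned int pmecc_sector_ecc_offset(const struct pmecc_params *p,
					    unsigned int sector)
{
	return sector * p->ecc_bytes_per_sector;
}
```

With a writer using 7 bytes per sector and a reader using 4, the two sides already disagree about where the ECC for sector 1 starts (offset 7 versus offset 4), so the reader would check the data against the wrong bytes.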
So the inference is that the problem lies in the ECC check, which is exactly the bug in the code described above.