OpenWRT Kernel Panic

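The posts on this blog are notes on problems I run into during development, kept mainly so that I have a record and don't forget them later. If anything here is wrong, corrections are welcome (you can also join QQ group 24849632 to discuss).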

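My previous development board had only 64 MB of RAM, and the kernel killed the SWOOLE process as soon as it started. So I picked up a 128 MB router board online. Once it arrived I rebuilt and reflashed everything, U-Boot and the kernel image alike, and selected LuCI/Lua as well. Full of anticipation, I powered it on: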

## Booting image at bc050000 ...
   Image Name:   MIPS OpenWrt Linux-4.14.171
   Image Type:   MIPS Linux Kernel Image (lzma compressed)
   Data Size:    14125924 Bytes = 13.5 MB
   Load Address: 80000000
   Entry Point:  80000000
   Verifying Checksum ... OK
   Uncompressing Kernel Image ... 

Waiting in anticipation...

No initrd
## Transferring control to Linux (at address 80000000) ...
## Giving linux memsize in MB, 128

Starting kernel ...

[    0.000000] Linux version 4.14.171 (bruce@gitsvr) (gcc version 8.3.0 (OpenWrt GCC 8.3.0 r12380-71f3179fc8)) #0 Fri Feb 28 23:54:41 2020
[    0.000000] Board has DDR2
[    0.000000] Analog PMU set to hw control
[    0.000000] Digital PMU set to hw control
[    0.000000] SoC Type: MediaTek MT7628AN ver:1 eco:2
[    0.000000] bootconsole [early0] enabled
[    0.000000] CPU0 revision is: 00019655 (MIPS 24KEc)
[    0.000000] MIPS: machine is Mediatek MT7628AN evaluation board
[    0.000000] Determined physical RAM map:
[    0.000000]  memory: 08000000 @ 00000000 (usable)
[    0.000000] Initrd not found or empty - disabling initrd
[    0.000000] Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
[    0.000000] Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000000000000-0x0000000007ffffff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000007ffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000007ffffff]
[    0.000000] random: get_random_bytes called from start_kernel+0x98/0x4a0 with crng_init=0
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 32480
[    0.000000] Kernel command line: console=ttyS0,115200 rootfstype=squashfs,jffs2
[    0.000000] PID hash table entries: 512 (order: -1, 2048 bytes)
[    0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
[    0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
[    0.000000] Writing ErrCtl register=000724c4
[    0.000000] Readback ErrCtl register=000724c4
[    0.000000] Memory: 111252K/131072K available (3946K kernel code, 181K rwdata, 900K rodata, 13204K init, 205K bss, 19820K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS: 256
[    0.000000] intc: using register map from devicetree
[    0.000000] CPU Clock: 575MHz
[    0.000000] timer_probe: no matching timers found
[    0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 6647862422 ns
[    0.000012] sched_clock: 32 bits at 287MHz, resolution 3ns, wraps every 7469508094ns
[    0.007540] Calibrating delay loop... 380.92 BogoMIPS (lpj=1904640)
[    0.073455] pid_max: default: 32768 minimum: 301
[    0.078195] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.084546] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes)
[    0.097639] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.107152] futex hash table entries: 256 (order: -1, 3072 bytes)
[    0.113140] pinctrl core: initialized pinctrl subsystem
[    0.119276] NET: Registered protocol family 16
[    0.151031] mt7621_gpio 10000600.gpio: registering 32 gpios
[    0.156629] mt7621_gpio 10000600.gpio: registering 32 gpios
[    0.162239] mt7621_gpio 10000600.gpio: registering 32 gpios
[    0.172745] clocksource: Switched to clocksource MIPS
[    0.178905] NET: Registered protocol family 2
[    0.184049] TCP established hash table entries: 1024 (order: 0, 4096 bytes)
[    0.190741] TCP bind hash table entries: 1024 (order: 0, 4096 bytes)
[    0.196928] TCP: Hash tables configured (established 1024 bind 1024)
[    0.203185] UDP hash table entries: 256 (order: 0, 4096 bytes)
[    0.208785] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
[    0.215179] NET: Registered protocol family 1
[    2.572755] random: fast init done
[   22.350231] Kernel panic - not syncing: LZMA data is corrupt
[   22.355650] Rebooting in 1 seconds..

What is going on??

Kernel panic - not syncing: LZMA data is corrupt

Never mind the kernel: I was the one panicking.

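I went back, removed every package under LuCI/Lua/PHP that I did not need, and rebuilt. With the firmware under 12 MB, the board booted normally. So the problem lies either in U-Boot or in how the kernel image is packaged. Time to dig in.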

The kernel's decompression path is as follows.

First, how the matching decompressor is looked up:

lib\decompress.c
decompress_fn __init decompress_method(const unsigned char *inbuf, long len,
				const char **name)
{
	const struct compress_format *cf;

	if (len < 2) {
		if (name)
			*name = NULL;
		return NULL;	/* Need at least this much... */
	}

	pr_debug("Compressed data magic: %#.2x %#.2x\n", inbuf[0], inbuf[1]);

	for (cf = compressed_formats; cf->name; cf++) {
		if (!memcmp(inbuf, cf->magic, 2))
			break;

	}
	if (name)
		*name = cf->name; // lzma-openwrt
	return cf->decompressor;
}
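
The lookup above compares only the first two bytes of the buffer against a table of known magic values. A minimal userspace sketch of the same idea follows. The names `magic_entry`, `magic_table` and `detect_format` are hypothetical, and the table is deliberately abbreviated to three well-known formats; the magic bytes of the `lzma-openwrt` entry are not shown in this post, so that entry is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, abbreviated mirror of a magic-number table. */
struct magic_entry {
    unsigned char magic[2];
    const char *name;
};

static const struct magic_entry magic_table[] = {
    { {0x1f, 0x8b}, "gzip" },
    { {0x5d, 0x00}, "lzma" },   /* typical first bytes of a .lzma file */
    { {0xfd, 0x37}, "xz"   },
    { {0, 0}, NULL }            /* sentinel: name == NULL ends the table */
};

/* Return the format name for the buffer, or NULL if it is too short
 * or no table entry matches its first two bytes. */
static const char *detect_format(const unsigned char *buf, size_t len)
{
    const struct magic_entry *e;

    assert(buf != NULL);
    if (len < 2)
        return NULL;            /* need at least two bytes to compare */

    for (e = magic_table; e->name; e++) {
        if (memcmp(buf, e->magic, 2) == 0)
            return e->name;
    }
    return NULL;
}
```

Because only two bytes are checked, a match says nothing about whether the rest of the stream is intact; that is only discovered later, while decompressing.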

init\initramfs.c

rootfs_initcall(populate_rootfs)(L665) ->populate_rootfs(L613) ->unpack_to_rootfs(L487) ->unlzma

lib\decompress_unlzma.c : L317
static inline int INIT write_byte(struct writer *wr, uint8_t byte)
{
	wr->buffer[wr->buffer_pos++] = wr->previous_byte = byte;
	if (wr->flush && wr->buffer_pos == wr->header->dict_size) {
		wr->buffer_pos = 0;
		wr->global_pos += wr->header->dict_size;
		if (wr->flush((char *)wr->buffer, wr->header->dict_size)
				!= wr->header->dict_size)
			return -1;
……

lib\decompress_unlzma.c : L331
static inline int INIT copy_byte(struct writer *wr, uint32_t offs)
{
	return write_byte(wr, peek_old_byte(wr, offs));
}

static inline int INIT copy_bytes(struct writer *wr,
					 uint32_t rep0, int len)
{
	int slen = len; 
	do {
		if (copy_byte(wr, rep0))
			return -1;
……
}

lib\decompress_unlzma.c : L398
static inline int INIT process_bit1(struct writer *wr, struct rc *rc,
					    struct cstate *cst, uint16_t *p,
					    int pos_state, uint16_t *prob) {
……
	return copy_bytes(wr, cst->rep0, len);
}


lib\decompress_unlzma.c : L544
STATIC inline int INIT unlzma(unsigned char *buf, long in_len,
			      long (*fill)(void*, unsigned long),
			      long (*flush)(void*, unsigned long),
			      unsigned char *output,
			      long *posp,
			      void(*error)(char *x)
	)
{
……

init\initramfs.c : L451
static char * __init unpack_to_rootfs(char *buf, unsigned long len)
{
	... ...
		decompress = decompress_method(buf, len, &compress_name); // unlzma
		pr_debug("Detected %s compressed data\n", compress_name);
		if (decompress) {
			int res = decompress(buf, len, NULL, flush_buffer, NULL,
				   &my_inptr, error);
			if (res)
				error("decompressor failed");
……

init\initramfs.c : L610 
static int __init populate_rootfs(void)
{
	/* Load the built in initramfs */
	char *err = unpack_to_rootfs(__initramfs_start, __initramfs_size);
	if (err)
		panic("%s", err); /* Failed to decompress INTERNAL initramfs */
	/* If available load the bootloader supplied initrd */
	if (initrd_start && !IS_ENABLED(CONFIG_INITRAMFS_FORCE)) {
#ifdef CONFIG_BLK_DEV_RAM
		int fd;
		printk(KERN_INFO "Trying to unpack rootfs image as initramfs...\n");
		err = unpack_to_rootfs((char *)initrd_start,
			initrd_end - initrd_start);
……

From the call flow above, it follows that:

wr->flush == flush_buffer

init\initramfs.c : 424
static long __init flush_buffer(void *bufv, unsigned long len)
{
	char *buf = (char *) bufv;
	long written;
	long origLen = len;
	if (message)
		return -1;
	while ((written = write_buffer(buf, len)) < len && !message) {
		char c = buf[written];
		if (c == '0') {
			buf += written;
			len -= written;
			state = Start;
		} else if (c == 0) {
			buf += written;
			len -= written;
			state = Reset;
		} else
			error("junk in compressed archive");
	}
	return origLen;
}
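
The relationship traced above, where the decompressor's writer hands each full window to a caller-supplied callback, can be sketched in userspace as follows. This is a simplified illustration and not the kernel code: `window_writer`, `window_write_byte`, `count_flush` and `flushed_total` are hypothetical names, and `window_size` plays the role of `header->dict_size`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef long (*flush_fn)(void *buf, unsigned long len);

/* Hypothetical, simplified stand-in for the decompressor's writer. */
struct window_writer {
    uint8_t *buffer;      /* window of window_size bytes */
    size_t buffer_pos;    /* next free slot inside the window */
    size_t global_pos;    /* bytes already handed to flush */
    size_t window_size;   /* plays the role of header->dict_size */
    flush_fn flush;       /* caller-supplied sink, may be NULL */
};

/* Hypothetical sink: just counts how many bytes it has been given. */
static unsigned long flushed_total;

static long count_flush(void *buf, unsigned long len)
{
    (void)buf;
    flushed_total += len;
    return (long)len;     /* report that everything was consumed */
}

/* Append one byte; when the window fills up, pass it to the sink. */
static int window_write_byte(struct window_writer *wr, uint8_t byte)
{
    assert(wr != NULL && wr->buffer != NULL && wr->window_size > 0);

    wr->buffer[wr->buffer_pos++] = byte;
    if (wr->flush && wr->buffer_pos == wr->window_size) {
        wr->buffer_pos = 0;
        wr->global_pos += wr->window_size;
        if (wr->flush(wr->buffer, wr->window_size) != (long)wr->window_size)
            return -1;    /* sink consumed less than a full window */
    }
    return 0;
}
```

With a 4-byte window, writing 10 bytes triggers two flushes and leaves 2 bytes pending in the window.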

So the decompression failure occurs on the path through flush_buffer. Next, I used printk to trace exactly what goes wrong.

Narrowing it down with the code below, the failure turns out to be at line 532 of build_dir\target-mipsel_24kc_musl\linux-ramips_mt76x8\linux-4.14.171\lib\decompress_unlzma.c:

		if (++(cst->rep0) == 0) {
			printk("cst->rep0===0\n");
			return 0;
		}
		if (cst->rep0 > wr->header->dict_size
				|| cst->rep0 > get_pos(wr)) {
			/* rep0 and dict_size are uint32_t; the positions are size_t */
			printk("cst->rep0(%u) > wr->header->dict_size(%08x): %d\n", cst->rep0, wr->header->dict_size, cst->rep0 > wr->header->dict_size);
			printk("cst->rep0(%u) > get_pos({wr->global_pos(%zu)+wr->buffer_pos(%zu)})(%zu): %d\n", cst->rep0, wr->global_pos, wr->buffer_pos, get_pos(wr), cst->rep0 > get_pos(wr));
			return -1;
		}
	}
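
To make the instrumented check easier to read: after the increment, `rep0` is the distance back into the already-decompressed output from which a match is copied. A value that wraps around to 0 is treated as the end of the stream, and a distance larger than either the dictionary or the number of bytes produced so far cannot be satisfied, so the data is reported as corrupt. A minimal standalone sketch of that rule follows; `match_status` and `classify_match` are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum match_status { MATCH_OK, MATCH_END_MARKER, MATCH_CORRUPT };

/* decoded:   distance value as decoded from the stream, before the +1
 * dict_size: dictionary size announced in the LZMA header
 * bytes_out: number of bytes decompressed so far */
static enum match_status classify_match(uint32_t decoded,
                                        uint32_t dict_size,
                                        size_t bytes_out)
{
    uint32_t rep0;

    assert(dict_size != 0);       /* an empty dictionary holds no match */

    rep0 = decoded + 1;           /* 0xFFFFFFFF wraps around to 0 */
    if (rep0 == 0)
        return MATCH_END_MARKER;  /* same branch as "return 0" above */

    /* The match must start inside the dictionary and inside the
     * data that has actually been produced. */
    if (rep0 > dict_size || rep0 > bytes_out)
        return MATCH_CORRUPT;     /* same branch as "return -1" above */

    return MATCH_OK;
}
```

A single wrong byte anywhere earlier in the compressed stream can desynchronize the decoder and produce an impossible distance here, so this check is often where corruption first becomes visible rather than where it was introduced.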

[Image 1: OpenWRT kernel panic]

The output shows that a data error occurred during decompression.

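So I downloaded Breed from https://breed.hackpascal.net/ and flashed it to the board. The system then booted normally, which puts the problem squarely in U-Boot. To verify this further, I printed the size of the LZMA data about to be decompressed, in uboot-mt76x8/lib_generic\LzmaDecode.c:lzmaBuffToBuffDecompress (L666):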

  memcpy(properties,src,sizeof(properties));
  src += sizeof(properties);
  outSize = 0;
  for (ii = 0; ii < 4; ii++)
  {
    unsigned char b;
    memcpy(&b,src, sizeof(b));
    src += sizeof(b);
    outSize += (unsigned int)(b) << (ii * 8);
  }
  printf("\n outSize: %d\n", outSize);
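
For context, the snippet above reads the start of a `.lzma` ("LZMA alone") header: one properties byte, a 4-byte little-endian dictionary size, and an 8-byte little-endian uncompressed size, 13 bytes in total. The loop takes the low four bytes of the size field. A minimal sketch of a full header parser follows, assuming the payload uses this standard layout; `lzma_alone_header` and `parse_lzma_alone_header` are hypothetical names, not part of U-Boot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LZMA_ALONE_HEADER_SIZE 13

struct lzma_alone_header {
    unsigned lc, lp, pb;          /* literal/position parameters */
    uint32_t dict_size;           /* dictionary size in bytes */
    uint64_t uncompressed_size;   /* all 0xFF bytes means "unknown" */
};

/* Returns 0 on success, -1 if the buffer is too short or the
 * properties byte is out of range. */
static int parse_lzma_alone_header(const unsigned char *buf, size_t len,
                                   struct lzma_alone_header *out)
{
    unsigned props;
    int i;

    assert(buf != NULL && out != NULL);
    if (len < LZMA_ALONE_HEADER_SIZE)
        return -1;

    props = buf[0];               /* encodes (pb * 5 + lp) * 9 + lc */
    if (props >= 9 * 5 * 5)
        return -1;
    out->lc = props % 9;
    out->lp = (props / 9) % 5;
    out->pb = props / 45;

    out->dict_size = 0;
    for (i = 0; i < 4; i++)       /* bytes 1..4, little-endian */
        out->dict_size |= (uint32_t)buf[1 + i] << (8 * i);

    out->uncompressed_size = 0;
    for (i = 0; i < 8; i++)       /* bytes 5..12, little-endian */
        out->uncompressed_size |= (uint64_t)buf[5 + i] << (8 * i);

    return 0;
}
```

Comparing `dict_size` and `uncompressed_size` between a firmware image that boots and one that does not is a cheap first check before instrumenting the decoder itself.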

and in uboot-mt76x8/common\cmd_bootm.c:do_bootm (L473):

		unsigned int destLen = 0;
		i = lzmaBuffToBuffDecompress ((char*)ntohl(hdr->ih_load),
				&destLen, (char *)data, len);
		if (i != LZMA_RESULT_OK) {
			printf ("LZMA ERROR %d - must RESET board to recover\n", i);
			SHOW_BOOT_PROGRESS (-6);
			udelay(100000);
			do_reset (cmdtp, flag, argc, argv);
		}
		/* Debug output: destination address and number of bytes produced */
		printf("\nUncompression address at %08X\n", (unsigned int)ntohl(hdr->ih_load));
		printf("Uncompression length is %u\n", destLen);

This prints the size of the decompressed data.

[Image 2: OpenWRT kernel panic]

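The two sizes match. So does the actual file match what was written to flash? Reading the U-Boot code shows that the first 64 bytes of the built Linux image are a descriptor header and the rest is the LZMA-compressed payload. An LZMA file can be opened with 7-Zip or WinRAR, so I edited the .bin file in UltraEdit to remove the header, saved the result as a .lzma file, and opened it in 7-Zip. The sizes match there as well. So could the problem be in the Linux boot arguments?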
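
The manual header-stripping step can also be done programmatically. The sketch below assumes the 64-byte descriptor is the standard legacy U-Boot image header (big-endian fields, magic 0x27051956), which is consistent with the `ntohl(hdr->ih_load)` calls shown earlier but has not been verified against this vendor's U-Boot tree. `uimage_info`, `be32` and `parse_uimage_header` are hypothetical names, and the header and data CRCs are deliberately not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UIMAGE_HEADER_SIZE 64
#define UIMAGE_MAGIC       0x27051956u

struct uimage_info {
    uint32_t data_size;     /* payload length in bytes (offset 12) */
    uint32_t load_addr;     /* load address (offset 16) */
    uint32_t entry_point;   /* entry point (offset 20) */
    unsigned char comp;     /* compression type (offset 31) */
};

/* Read a 32-bit big-endian value. */
static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* On success returns 0 and points *payload at the compressed data.
 * Returns -1 for a short buffer, a bad magic, or a truncated payload. */
static int parse_uimage_header(const unsigned char *img, size_t len,
                               struct uimage_info *out,
                               const unsigned char **payload)
{
    assert(img != NULL && out != NULL && payload != NULL);

    if (len < UIMAGE_HEADER_SIZE || be32(img) != UIMAGE_MAGIC)
        return -1;

    out->data_size   = be32(img + 12);
    out->load_addr   = be32(img + 16);
    out->entry_point = be32(img + 20);
    out->comp        = img[31];

    if (out->data_size > len - UIMAGE_HEADER_SIZE)
        return -1;          /* file is shorter than the header claims */

    *payload = img + UIMAGE_HEADER_SIZE;
    return 0;
}
```

Writing `data_size` bytes starting at `payload` to a file yields the same `.lzma` file that was produced by hand, and comparing `data_size` with the file length on disk and with the amount read back from flash covers the size comparison described above.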

[Image 3: OpenWRT kernel panic]

[Image 4: OpenWRT kernel panic]


 
