21. Reboot log analysis

Case 1	: kernel reboot - mt6580.dtsi
	Symptom	: 
	Platform	: Android N, MTK6580
	Investigation: 1. Captured the UART (serial) log and found the following:
				[    1.607970] <2>.(2)[1:swapper/0]musb-hdrc musb-hdrc.0.auto: Cannot find usb pinctrl iddig_irq_init
				[    1.609094] <2>.(2)[1:swapper/0]Unable to handle kernel paging request at virtual address fffffff9
				[    1.610245] <2>.(2)[1:swapper/0]pgd = c0004000
				[    1.610794] [fffffff9] *pgd=9fffd821, *pte=00000000, *ppte=00000000
				[    1.611581] <2>-(2)[1:swapper/0]Internal error: Oops: 17 [#1] PREEMPT SMP ARM
				[    2.612481] <2>-(2)[1:swapper/0]Non-crashing CPUs did not react to IPI
				[    2.613303] <2>-(2)[1:swapper/0]CPU: 2 PID: 1 Comm: swapper/0 Tainted: G        W      3.18.35 #2
				[    2.614409] <2>-(2)[1:swapper/0]task: df060000 ti: df04a000 task.ti: df04a000
				[    2.615304] <2>-(2)[1:swapper/0]PC is at pinctrl_select_state+0x84/0x154	【// in a reboot log, the PC line shows where execution stopped】
				[    2.616140] <2>-(2)[1:swapper/0]LR is at otg_int_init+0x64/0x154	
				...
				[    3.000390] <2>-(2)[1:swapper/0][] (pinctrl_select_state) from [] (otg_int_init+0x64/0x154)
																							【// in a reboot log, this backtrace line shows which function it stopped in】
				[    3.001803]  r9:60000113 r8:60000113 r7:c1151d70 r6:c1151e30 r5:c109a408 r4:c1151db0
				[    3.003408] <2>-(2)[1:swapper/0][] (otg_int_init) from [] (mt_usb_otg_init+0x120/0x230)
				[    3.004777]  r6:c1151e30 r5:c1151db0 r4:df182140
				[    3.005816] <2>-(2)[1:swapper/0][] (mt_usb_otg_init) from [] (mt_usb_init+0x1d8/0x6d0)
				[    3.007176]  r6:e1700000 r5:c1151d70 r4:df182140 r3:c12ecaa0
				[    3.008417] <2>-(2)[1:swapper/0][] (mt_usb_init) from [] (musb_probe+0x2d0/0xb24)
				[    3.009720]  r10:00000088 r9:e1700000 r8:de918700 r7:de915000 r6:c1151df8 r5:df182140
				[    3.011218]  r4:df182000
				[    3.011861] <2>-(2)[1:swapper/0][] (musb_probe) from [] (platform_drv_probe+0x38/0x90)
				[    3.013218]  r10:00000000 r9:df212a00 r8:c10501c0 r7:c10501c0 r6:fffffdfb r5:de915010
				[    3.014714]  r4:ffffffed
				[    3.015361] <2>-(2)[1:swapper/0][] (platform_drv_probe) from [] (driver_probe_device+0x1d8/0x43c)
				[    3.016841]  r7:c113cdbc r6:c1094378 r5:de915010 r4:c113cdb0
				[    3.018096] <2>-(2)[1:swapper/0][] (driver_probe_device) from [] (__driver_attach+0x94/0x98)
				[    3.019521]  r10:00000000 r9:df212a00 r8:c0f00600 r7:00000000 r6:de915044 r5:c10501c0
				[    3.021013]  r4:de915010
				[    3.021646] <2>-(2)[1:swapper/0][] (__driver_attach) from [] (bus_for_each_dev+0x68/0x9c)
				[    3.023037]  r6:c03c2f08 r5:c10501c0 r4:00000000 r3:00000000
				[    3.024278] <2>-(2)[1:swapper/0][] (bus_for_each_dev) from [] (driver_attach+0x24/0x28)
				[    3.025649]  r6:c1040f40 r5:df211d00 r4:c10501c0
				[    3.026690] <2>-(2)[1:swapper/0][] (driver_attach) from [] (bus_add_driver+0x15c/0x218)
				[    3.028185] <2>-(2)[1:swapper/0][] (bus_add_driver) from [] (driver_register+0x80/0x100)
				[    3.029567]  r7:df04a030 r6:c0f2efb8 r5:c0f617d8 r4:c10501c0
				[    3.030840] <2>-(2)[1:swapper/0][] (driver_register) from [] (__platform_driver_register+0x5c/0x64)
				[    3.032341]  r5:c0f617d8 r4:00000000
				[    3.033203] <2>-(2)[1:swapper/0][] (__platform_driver_register) from [] (musb_init+0x34/0x48)
				[    3.034790] <2>-(2)[1:swapper/0][] (musb_init) from [] (do_one_initcall+0x140/0x200)
				[    3.036128]  r4:c0f617d8 r3:00000000
				[    3.037016] <2>-(2)[1:swapper/0][] (do_one_initcall) from [] (kernel_init_freeable+0x144/0x1e8)
				[    3.038474]  r10:c0f6200c r9:00000141 r8:c0f00600 r7:c10dc7c0 r6:c10dc7c0 r5:c0f62000
				[    3.039968]  r4:00000006
				[    3.040635] <2>-(2)[1:swapper/0][] (kernel_init_freeable) from [] (kernel_init+0x10/0x100)
				[    3.042036]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c0a80e4c
				[    3.043523]  r4:00000000
				[    3.044170] <2>-(2)[1:swapper/0][] (kernel_init) from [] (ret_from_fork+0x14/0x34)
				[    3.045483]  r4:00000000 r3:df04a000
				...
				[    7.201650] Rebooting in 1 seconds..
	
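			  2. Analysis of the log: the reboot is caused by a fault when otg_int_init() in usb20_host.c calls pinctrl_select_state()
			
				 Looking further back in the log: Cannot find usb pinctrl iddig_irq_init;
				 This makes the problem clear: the reboot happens because mt6580.dtsi is missing a pin property: "iddig_irq_init"
			  
			  3. Checked the change history: this property had been deleted from mt6580.dtsi
				 --> restored it, OK.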
			
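	Resolution	:  Planned to delete the property deliberately the next day to reproduce the reboot --> after deleting it, the build failed with an error! 【unresolved】
	Summary	: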
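
The fault address in the Case 1 log, fffffff9, lies in the top 4095 bytes of the 32-bit address space, which is the range the kernel's ERR_PTR() encoding uses for error codes. That would be consistent with a failed pinctrl lookup result being passed on to pinctrl_select_state() unchecked, although the log alone does not prove it. The following is a minimal userspace sketch of that decoding, not kernel code. `decode_err_ptr32` is a hypothetical helper, the field offset 0xc is an assumption chosen for illustration (it is not read from the log), and MAX_ERRNO mirrors the value in the kernel's include/linux/err.h.

```c
#include <stdint.h>

/* Same limit as MAX_ERRNO in the kernel's include/linux/err.h. */
#define MAX_ERRNO 4095u

/*
 * Hypothetical helper: given a 32-bit fault address and an assumed field
 * offset, recover the base pointer and, if it lies in the ERR_PTR range
 * (the top MAX_ERRNO values), return the negative errno it encodes.
 * Returns 0 when the base pointer is not an encoded error.
 */
static int decode_err_ptr32(uint32_t fault_addr, uint32_t field_offset)
{
	uint32_t base = fault_addr - field_offset;

	if (base >= (uint32_t)-MAX_ERRNO)	/* same idea as IS_ERR_VALUE() */
		return (int)(int32_t)base;
	return 0;
}
```

Under the assumed offset, `decode_err_ptr32(0xfffffff9, 0xc)` gives -19, which is -ENODEV on Linux. In driver code the usual guard is an IS_ERR() check on the looked-up state before it is handed to pinctrl_select_state().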


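Case 2	: reboot at power-on - kernel fails to start - suspected bad blocks in the partition holding the kernel - 【replacing the display panel finally fixed it】
	Symptom	: reboots while still on the boot logo screen; only one unit is affected
	Platform	: Android N, MTK6737
	Investigation: 1. Captured the UART log: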
				[7360] cmdline: console=tty0 console=ttyMT0,921600n1 root=/dev/ram vmalloc=496M androidboot.hardware=mt6735 slub_max_order=0 slub_debug=O androidboot.verifiedbootstate=green bootopt=64S3,32N2,64N2 printk.disable_uart=1 bootprof.pl_t=1809 bootprof.lk_t=3748 boot_reason=0 androidboot.serialno=0123456789ABCDEF androidboot.bootreason=power_key gpt=1
				[7360] lk boot time = 3748 ms
				[7360] lk boot mode = 0
				[7360] lk boot reason = power_key
				[7360] lk finished --> jump to linux kernel 64Bit	【// lk hands off to the kernel】

				[7360] 
				[LK]jump to K64 0x40080000
				[7360] smc jump
				[ATF](0)[0.0]save kernel info
				[ATF](0)[0.0]Kernel_EL2
				[ATF](0)[0.0]Kernel is 64Bit
				[ATF](0)[0.0]pc=0x40080000, r0=0x4e000000, r1=0x0
				INFO:    BL3-1: Preparing for EL3 exit to normal world, Kernel
				INFO:    BL3-1: Next image address = 0x40080000
				INFO:    BL3-1: Next image spsr = 0x3c9
				[ATF](0)[0.0]el3_exit
					【// the lines below are from the reboot onward】
				[ATF](0)[30.162857]aee_wdt_dump: on cpu0
				[ATF](0)[30.163290](0) pc: lr: sp: pstate: 600000c5
				[ATF](0)[30.164452](0) x29: ffffffc000dffea0 x28: 0000004040000000 x27: ffffffc000080270
				[ATF](0)[30.165430](0) x26: ffffffc000e94a00 x25: 0000000000000003 x24: 0000000704c2eee5
				[ATF](0)[30.166406](0) x23: 0000000000000000 x22: ffffffc000eced08 x21: ffffffc03f738d48
				[ATF](0)[30.167382](0) x20: ffffffc000eced08 x19: 0000000000000000 x18: 0000000000000070
				[ATF](0)[30.168359](0) x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
				[ATF](0)[30.169335](0) x14: 0000000000000007 x13: 0000000000000000 x12: 0000000000000078
				[ATF](0)[30.170311](0) x11: 000000000000000f x10: 000000000000000c x09: 00000000ffffffff
				[ATF](0)[30.171287](0) x08: 0000000000000007 x07: 0000000000035f48 x06: 00000000000006de
				[ATF](0)[30.172263](0) x05: 00405f7e0099cf00 x04: 0000000000000019 x03: 0000000099d89d8a
				[ATF](0)[30.173239](0) x02: 0000000000085b0c x01: ffffffc000fb4000 x00: 0000000000000000
				[pmic_init] Preloader Start,MT6328 CHIP Code = 0x2820
			  
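			  2. Analysis of the log: the reboot happens when lk hands off to the kernel (ATF prints aee_wdt_dump at timestamp 30.16, then the preloader starts again). Possible cause: the flash has a bad block and the kernel happens to sit on it
				 -> reformatted and re-flashed several times -> same result
				 
			  3. Swapped the flash chip -> same result -> swapped it again -> result unknown (ask Engineer Cheng on the project)
			  
			  4. In the end, replacing the display panel fixed it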

Case 3	: reboots a while after power-on (about 5 minutes) - deadlock
	Symptom	: 
	Platform	: Android O, MTK6737
	Investigation: 1. Exported mtklog and checked last_kmsg.log
				// check where the PC stopped
				[  121.420261] -(2)[816:system_server]PC is at spin_bug+0x1d8/0x220	
				[  121.420614] -(2)[816:system_server]LR is at spin_bug+0x1cc/0x220
				[  121.420915] -(2)[816:system_server]pc : [] lr : [] pstate: 800001c5
				[  121.421198] -(2)[816:system_server]sp : ffffffc05c0138c0
				// call trace
				[  121.640932] -(2)[816:system_server][] spin_bug+0x1d8/0x220
				[  121.641310] -(2)[816:system_server][] do_raw_spin_lock+0x58/0x338
				[  121.641660] -(2)[816:system_server][] _raw_spin_lock_irqsave+0x5c/0x84
				[  121.642001] -(2)[816:system_server][] down_interruptible+0x18/0x60
				[  121.642368] -(2)[816:system_server][] mc3xxx_mutex_lock+0x18/0x2c		// suspected the lock; removed the g-sensor first to see whether it still reboots -> normal after removal
				[  121.642744] -(2)[816:system_server][] mc3xxx_suspend+0x8c/0xec
				[  121.643096] -(2)[816:system_server][] i2c_legacy_suspend+0x38/0x48
				[  121.643457] -(2)[816:system_server][] i2c_device_pm_suspend+0x34/0x38
	
			  2. This was a temporary Android 8.0 build, so the issue was not investigated further
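
Several cases in these notes start from the same step: find the "PC is at" and "LR is at" lines and read off the function name. When a last_kmsg dump is long, that lookup can be scripted. The following is a minimal sketch, assuming the line format seen in the logs above (`sym+0xOFF/0xSIZE`); `struct oops_loc` and `parse_oops_loc` are hypothetical names introduced here for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Where a register (PC or LR) pointed, as printed in an oops. */
struct oops_loc {
	char symbol[64];	/* function name */
	unsigned long offset;	/* byte offset into the function */
	unsigned long size;	/* total size of the function */
};

/*
 * Hypothetical helper: find "<tag> is at sym+0xOFF/0xSIZE" in one log line,
 * where tag is "PC" or "LR". Returns 1 on success, 0 if the line does not
 * contain such a record.
 */
static int parse_oops_loc(const char *line, const char *tag, struct oops_loc *out)
{
	char needle[16];
	const char *p;

	snprintf(needle, sizeof(needle), "%s is at ", tag);
	p = strstr(line, needle);
	if (!p)
		return 0;
	p += strlen(needle);

	/* %63[^+] reads the symbol up to '+'; %lx accepts the 0x prefix. */
	return sscanf(p, "%63[^+]+%lx/%lx", out->symbol, &out->offset, &out->size) == 3;
}
```

Fed the first log line of Case 3 with the tag "PC", this fills in the symbol `spin_bug`, offset 0x1d8 and size 0x220. The offset/size pair is what a disassembly or addr2line lookup against the matching vmlinux would need next.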

Case 4	: reboot at power-on - faulty NTC thermistor used for temperature sensing
	Symptom	: 
	Platform	: Android L, MTK6580
	Investigation: 1. Captured the UART log:
				[   16.177681] <1>-(1)[345:thermal_manager]PC is at tspa_sysrst_set_cur_state+0x6c/0xd0
				[   16.178644] <1>-(1)[345:thermal_manager]LR is at mtk_cooling_wrapper_set_cur_state+0x190/0x4d0
				// call trace - "thermal_cdev_update" suggests the temperature check is the trigger
				[   16.316470] <1>-(1)[345:thermal_manager][] (tspa_sysrst_set_cur_state) from [] (mtk_cooling_wrapper_set_cur_state+0x190/0x4d0)
				[   16.318097]  r4:dcd28b80 r3:c05a645c
				[   16.318554] <1>-(1)[345:thermal_manager][] (mtk_cooling_wrapper_set_cur_state) from [] (thermal_cdev_update+0xa0/0x18c)
				[   16.320105]  r10:dcdc4a18 r9:dcdc4b28 r8:c0dbcff8 r7:dcdc4b40 r6:dcdc4a00 r5:00000001
				[   16.321081]  r4:dcdc4ae4
				[   16.321404] <1>-(1)[345:thermal_manager][] (thermal_cdev_update) from [] (backward_compatible_throttle+0x94/0xbc)
				[   16.322892]  r10:00000048 r9:dc571a00 r8:00000001 r7:00000000 r6:00000000 r5:dc571b54
				[   16.323868]  r4:dc9e6900
				[   16.324191] <1>-(1)[345:thermal_manager][] (backward_compatible_throttle) from [] (handle_thermal_trip+0x68/0x1e4)
				[   16.325691]  r9:c0d56530 r8:dc571a18 r7:00000000 r6:00000000 r5:00000000 r4:dc571a00
				[   16.326665] <1>-(1)[345:thermal_manager][] (handle_thermal_trip) from [] (thermal_zone_device_update+0x9c/0x158)
			  
			  2. kernel-3.18/drivers/misc/mediatek/thermal/common/thermal_zones/mtk_ts_pa.c
					static int tspa_sysrst_set_cur_state(struct thermal_cooling_device *cdev, unsigned long state)
					{
						cl_dev_sysrst_state = state;
						if (cl_dev_sysrst_state == 1) {
							...
							*(unsigned int *)0x0 = 0xdead;		// suspected cause of the reboot (deliberate write to address 0); commenting out this line -> no reboot
						}
						...
					}
			  
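			  3. After boot, captured kernel_log from mtklog; it shows an abnormal temperature: T_AP=125000, i.e. 125 °C (the real temperature is nowhere near that high)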
				Line 18718: <7>[  306.833625]  (0)[57:kworker/0:1][name:mtk_ts_bts&][Power/BTS_Thermal] T_AP=125000
				Line 18831: <7>[  307.833641]  (0)[57:kworker/0:1][name:mtk_ts_bts&][Power/BTS_Thermal] T_AP=125000
				Line 18969: <7>[  308.833733]  (0)[57:kworker/0:1][name:mtk_ts_bts&][Power/BTS_Thermal] T_AP=125000
				Line 19063: <7>[  309.833632]  (0)[57:kworker/0:1][name:mtk_ts_bts&][Power/BTS_Thermal] T_AP=125000
				Line 19155: <7>[  310.834030]  (0)[57:kworker/0:1][name:mtk_ts_bts&][Power/BTS_Thermal] T_AP=125000
				Line 19218: <7>[  311.833597]  (0)[57:kworker/0:1][name:mtk_ts_bts&][Power/BTS_Thermal] T_AP=125000
				 
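			  4. Suspected a faulty NTC thermistor in the temperature-sensing circuit; inspection found the NTC shorted (excess solder)
	Resolution	: 
	Summary	: The board has several NTC thermistors for temperature sensing (PA, battery, etc.). Temperatures are checked during boot, and the system reboots if a reading is too high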
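
The T_AP=125000 readings in step 3 of Case 4 are what pointed to the hardware fault, so it can help to scan a kernel_log for them automatically. The following is a minimal sketch, assuming the value is in millidegrees Celsius (as the notes read it: 125000 is 125 °C). `parse_t_ap`, `t_ap_is_suspect` and the 100 °C limit are hypothetical and chosen only for illustration; they are not taken from the MTK thermal driver.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the T_AP reading from a BTS_Thermal log line.
 * Returns 1 and stores the value (millidegrees Celsius) on success,
 * 0 if the line has no T_AP field.
 */
static int parse_t_ap(const char *line, long *milli_c)
{
	const char *p = strstr(line, "T_AP=");

	if (!p)
		return 0;
	return sscanf(p, "T_AP=%ld", milli_c) == 1;
}

/*
 * Assumed plausibility limit, for illustration only: a reading at or
 * above 100 C shortly after boot more likely indicates a sensor fault
 * (for example a shorted NTC) than real heat.
 */
#define SUSPECT_MILLI_C 100000L

static int t_ap_is_suspect(long milli_c)
{
	return milli_c >= SUSPECT_MILLI_C;
}
```

A reading that stays pinned at exactly the same extreme value on every sample, as in the log above, is a further hint that the sensor path rather than the temperature is at fault.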


Case 5	: reboot at power-on - temperature check
	Symptom	: 
	Platform	: Android N, MTK6737
	Investigation: 1. Captured the UART log:
				[    4.864498].(4)[68:bat_thread_kthr][Power/BatMeter] [force_get_tbat] 0,108,0,0,0,60  
				[    4.865459].(4)[68:bat_thread_kthr][Power/BatMeter] [oam_run_inf] 4045, 4045, 4010, 2592, 2592, 135, 135, 2, 2, 1782, 60, 16  
				[    4.866864].(4)[68:bat_thread_kthr][Power/BatMeter] [oam_result_inf] 16, 16, 16, 16, 16, 0  
				[    4.867902].(4)[68:bat_thread_kthr][Power/Battery] AvgVbat=(4010),bat_vol=(4010),AvgI=(0),I=(0),VChr=(359),AvgT=(60),T=(60),pre_SOC=(84),SOC=(84),ZCV=(4044)  
				[    4.869657].(4)[68:bat_thread_kthr][Power/Battery] [Battery] Tbat(60)>=60, system need power down.  
				[    4.871358].(4)[68:bat_thread_kthr][Power/Battery] charging_set_power_off=0  
				[    4.872229].(4)[68:bat_thread_kthr]mt_power_off  
				
			  2. Based on the log, disabled the battery over-temperature protection; the device still would not boot
				kernel-3.18/drivers/power/mediatek/battery_common.c
				-   if(BMT_status.temperature >= 60)
				+   if(0)
			  
			  3. Captured the UART log again:
				[    7.837171].(7)[191:thermal_manager]Power/battery_Thermal: reset, reset, reset!!!@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@*****************************************@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  
				[    7.839444]-(7)[191:thermal_manager]------------[ cut here ]------------  
				[    7.840299]-(7)[191:thermal_manager]kernel BUG at kernel/drivers/thermal/mtk_ts_battery.c:394!  
				[    7.842208]-(7)[191:thermal_manager]Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM  
				 
			  4. Temporarily disabled the check; the device now boots
				kernel-3.18/drivers/misc/mediatek/thermal/common/thermal_zones/mtk_ts_battery.c
					int  mtktsbattery_register_cooler(void)  
					{  
						/* cooling devices */  
					-	cl_dev_sysrst = mtk_thermal_cooling_device_register("mtktsbattery-sysrst", NULL,  
					-	&mtktsbattery_cooling_sysrst_ops);  
						return 0;  
					} 
	Resolution	: 
	Summary	: 


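Case 6	: reboot after the device suspends following boot - caused by a deadlock in mxc400x_resume() - not investigated further
	Symptom	: 
	Platform	: Android N, MTK6737
	Investigation: 1. Checked last_kmsg in mtklog; it points to a deadlock in mxc400x_resume()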
				[  194.487798] -(0)[719:system_server]PC is at __list_add+0x40/0xe0
				[  194.488102] -(0)[719:system_server]LR is at mutex_lock_nested+0x1ac/0x678
				[  194.639589] Backtrace: 
				[  194.640103] -(0)[719:system_server][] (__list_add) from [] (mutex_lock_nested+0x1ac/0x678)
				[  194.640325]  r7:00000000 r6:c17b7668 r5:60070013 r4:c17b7664
				[  194.641259] -(0)[719:system_server][] (mutex_lock_nested) from [] (mxc400x_resume+0x28/0x64)
				[  194.641488]  r10:c1126378 r9:c0dda8d7 r8:00000010 r7:c0773000
				[  194.642407] -(0)[719:system_server][] (mxc400x_resume) from [] (i2c_legacy_resume+0x38/0x44)
				[  194.642638]  r5:c11b4164 r4:00000001
				[  194.643273] -(0)[719:system_server][] (i2c_legacy_resume) from [] (i2c_device_pm_resume+0x38/0x3c)
				[  194.643664] -(0)[719:system_server][] (i2c_device_pm_resume) from [] (dpm_run_callback+0x120/0x234)
				[  194.644051] -(0)[719:system_server][] (dpm_run_callback) from [] (device_resume+0xb4/0x198)
				
			  2. Commented out the locking in mxc400x_resume() --> no more reboots
				static int mxc400x_resume(struct i2c_client *client)
				{
					struct mxc400x_i2c_data *obj = i2c_get_clientdata(client);
					int err = 0;
					if(obj == NULL)
					{
						GSE_ERR("null mxc400x!!\n");
						return -EINVAL;
					}
				-	mutex_lock(&mxc400x_mutex);
					err = mxc400x_init_client(client, 0);
					if(err)
					{
						GSE_ERR("initialize client fail!!\n");
				-		mutex_unlock(&mxc400x_mutex);
						return -EINVAL;
					}
					atomic_set(&obj->suspend, 0);
				-	mutex_unlock(&mxc400x_mutex);
					return err;
				}
			  
	Resolution	: 
	Summary	: 


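Case 7	: node missing from the dtsi, reboot in the bootloader (LK) stage - add the node
	Symptom	: screen does not light up
	Platform	: Android O, MTK6737
	Investigation: 1. Printed the UART log: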
				[175] Copy DTB from 0x41f07086 to 0x4e000000(size: 0xb442)
				[176] [LK] fdt setup addr:0x4e000000 status:1!!!
				[176] [partition_get_index]find odmdtbo index 20
				[178] Multiple ODM DTBO.
				[178] ODM mdtbo_index: 0, dtbo_offset: 1024, dtbo_size: 48768
				[179] [partition_get_index]find odmdtbo index 20
				ata start bit at rising edge
				[28] [SD0]e80
				ERROR: ufdt_overlay_do_fixups():Couldn't find 'strobe' symbol in main dtb	// the node "strobe" is missing
				ERROR: ufdt_overlay_apply():failed to perform fixups in overlay
				[189] ufdt_apply_overlay() failed!
				[189]  app/mt_boot/mt_boot.c:line 407 0
			  
				==> added the node to mt6735m.dts
				
			  2. Still rebooting; captured the log again:
				[176] [LK] fdt setup addr:0x4e000000 status:1!!!
				[176] [partition_get_index]find odmdtbo index 20
				[178] Multiple ODM DTBO.
				[179] ODM mdtbo_index: 0, dtbo_offset: 1024, dtbo_size: 48768
				[179] [partition_get_index]find odmdtbo index 20
				[182] blob_len: 0x80000, overlay_len: 0xbe80
				ERROR: ufdt_overlay_do_fixups():Couldn't find 'leds' symbol in main dtb		// the node "leds" is missing
				ERROR: ufdt_overlay_apply():failed to perform fixups in overlay
				[189] ufdt_apply_overlay() failed!
				[190]  app/mt_boot/mt_boot.c:line 407 0
			  
				==> added the node to mt6735m.dts
				
			  3. No more reboots
				 
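			  4. Analysis: the dts references nodes by label (e.g. &strobe) and appends properties to them so that GPIOs can be controlled from the dts. If the referenced label does not exist in the main dtb, applying the overlay during boot fails (the log above comes from LK, not the preloader):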
				  &strobe {
						pinctrl-names = "default", "main_strobe_oh", "main_strobe_ol", "main_strobe_flash_oh", "main_strobe_flash_ol", "sub_strobe_oh", "sub_strobe_ol", "psel_pinctrl_oh", "psel_pinctrl_ol", "charger_enable_pinctrl_oh", "charger_enable_pinctrl_ol";
						pinctrl-0 = <&strobe_intpin_default>;
						pinctrl-1 = <&main_strobe_oh>;
						pinctrl-2 = <&main_strobe_ol>;
						pinctrl-3 = <&main_strobe_flash_oh>;
						...
				  };
						
			  5. Note: a depleted battery (voltage too low) also prevents the device from booting:
				[911] mtk detect 3130 
				[985] [AUXADC] ch=0 raw=18999 data=3130 
				[985] [mt65xx_bat_init] check VBAT=3130 mV with 3450 mV
				[986] [BATTERY] battery voltage(3130mV) <= CLV ! Can not Boot Linux Kernel !! 
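
In steps 1 and 2 of Case 7, each boot attempt reported only one missing label ('strobe' first, then 'leds'), so the fix was found one boot at a time. Pulling the label name out of the error line is easy to script when several UART captures need checking. The following is a minimal sketch, assuming the exact message wording shown in the logs above; `missing_overlay_symbol` is a hypothetical helper, not part of libufdt or LK.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: pull the symbol name out of a line such as
 *   ERROR: ufdt_overlay_do_fixups():Couldn't find 'strobe' symbol in main dtb
 * Returns 1 and copies the name into out (capacity 64) on success,
 * 0 if the line is not a missing-symbol error.
 */
static int missing_overlay_symbol(const char *line, char out[64])
{
	const char *p = strstr(line, "Couldn't find '");

	if (!p)
		return 0;
	/* %63[^'] reads everything up to the closing quote. */
	return sscanf(p, "Couldn't find '%63[^']'", out) == 1;
}
```

Each name this returns is a label that the overlay references and that has to exist in the main dts before the overlay can be applied.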
