Two more complicated deadlock issues in an Android app

Because some Qualcomm (QC) code is involved, I won't paste that code here; I'll only record the line of analysis.

[Analysis of the first memory dump]
From the Android system log:
   WARN [  11903.510399] (598:616) BroadcastQueue  Timeout of broadcast BroadcastRecord{41d635a0 android.intent.action.SCREEN_ON} - receiver=android.app.LoadedApk$ReceiverDispatcher$InnerReceiver@41695c58, started 10007ms ago
PROBABLE CAUSE OF PROBLEM:
Timeout

Receiver: android.app.LoadedApk$ReceiverDispatcher$InnerReceiver
.... Generating Dalvik backtraces. This might take some time ....
Receiver: might be pid 615

***** Dalvik stack for pid 615 *****
#0  android.media.AudioSystem.setParameters (Native Method)
#1  android.media.AudioService$AudioServiceBroadcastReceiver.onReceive (AudioService.java:3779)
#2  android.app.LoadedApk$ReceiverDispatcher$Args.run (LoadedApk.java:765)
#3  android.os.Handler.handleCallback (Handler.java:615)
#4  android.os.Handler.dispatchMessage (Handler.java:94)
#5  android.os.Looper.loop (Looper.java:147)
#6  com.android.server.ServerThread.run (SystemServer.java:219)
-- Break frame --

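The stack shows that a message dispatched by Handler.handleCallback on SystemServer's main looper cannot finish in time. As a result, the MONITOR message that the SoftwareWatchdog (SWWD) posts to the same message queue is never processed, and SWWD calls killSystem.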
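The starvation happens because a Looper handles its queue strictly in order on one thread: a MONITOR message posted behind a blocked handler cannot start until that handler returns. A toy C model makes the arithmetic explicit. It is not the real SoftwareWatchdog; `struct message`, `monitor_latency_ms`, `watchdog_fires`, and all timings are hypothetical.

```c
#include <stddef.h>

/* Hypothetical model of one looper's FIFO queue (sorted by post time):
 * each message has a post time and a handler run time, and the single
 * looper thread handles messages strictly one after another. */
struct message {
    long post_ms;     /* when the message was enqueued */
    long run_ms;      /* how long its handler blocks the looper */
    int  is_monitor;  /* 1 for the watchdog's MONITOR message */
};

/* Returns how long the first MONITOR message waited between being posted
 * and starting to run, or -1 if no MONITOR message is queued. */
long monitor_latency_ms(const struct message *q, size_t n)
{
    long clock_ms = 0;
    for (size_t i = 0; i < n; i++) {
        if (clock_ms < q[i].post_ms)
            clock_ms = q[i].post_ms;      /* looper idles until the message arrives */
        if (q[i].is_monitor)
            return clock_ms - q[i].post_ms;
        clock_ms += q[i].run_ms;          /* a slow handler delays everything behind it */
    }
    return -1;
}

/* 1 if the watchdog would fire: the MONITOR message waited longer than
 * timeout_ms. A queue without a MONITOR message (-1) never fires here. */
int watchdog_fires(const struct message *q, size_t n, long timeout_ms)
{
    return monitor_latency_ms(q, n) > timeout_ms;
}
```

For example, if onReceive() blocks for 120000 ms and the MONITOR message is posted 1000 ms after it starts, the monitor waits 119000 ms, well beyond a hypothetical 60000 ms timeout.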

Examining the thread's user-space stack shows that, on system_server's main message-queue thread (ServerThread), AudioService$AudioServiceBroadcastReceiver.onReceive() calls AudioSystem.setParameters(...) to handle SCREEN_ON/SCREEN_OFF:

            } else if (action.equals(Intent.ACTION_SCREEN_ON)) {
                AudioSystem.setParameters("screen_state=on");
            } else if (action.equals(Intent.ACTION_SCREEN_OFF)) {
                AudioSystem.setParameters("screen_state=off");
            }

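Is it reasonable to dispatch this broadcast onto this thread? That needs to be confirmed: this is system_server's main message pump!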

The user-space stack shows ServerThread in a binder call (talkWithDriver -> ioctl).
The kernel stack shows the thread in binder_thread_read, waiting for the transaction reply.
The binder transaction stack shows that the target binder_thread is 173, i.e. the mediaserver main thread acting as a binder worker (it calls joinThreadPool() from main()).

Threads of mediaserver:
PID: 173    TASK: ee6cc700  CPU: 0   COMMAND: "mediaserver"
  PID: 524    TASK: ed94ea00  CPU: 0   COMMAND: "AudioCommand"
  PID: 525    TASK: ed94dc00  CPU: 0   COMMAND: "ApmCommand"
  PID: 526    TASK: ed94fb80  CPU: 0   COMMAND: "mediaserver"
  PID: 527    TASK: ed94e300  CPU: 0   COMMAND: "FastMixer"
  PID: 638    TASK: edad5880  CPU: 0   COMMAND: "AudioOut_2"
  PID: 640    TASK: edad4e00  CPU: 0   COMMAND: "Binder_1"
  PID: 1026   TASK: d39f8a80  CPU: 0   COMMAND: "Binder_2"
  PID: 2980   TASK: e78b7800  CPU: 0   COMMAND: "Binder_3"
  PID: 3098   TASK: e8b64e00  CPU: 0   COMMAND: "Binder_4"
  PID: 7491   TASK: d259e680  CPU: 0   COMMAND: "Binder_5"

mediaserver is handling "screen_state=off" in AudioFlinger::setParameters(...) and is waiting on the lock taken by Mutex::Autolock _l(mLock);
Call stack of thread 173:
#0  __futex_syscall3 () at bionic/libc/arch-arm/bionic/futex_arm.S:59
#1  0x4021e1d0 in _normal_lock (shared=0, mutex=0x410707b0) at bionic/libc/bionic/pthread.c:1067
#2  pthread_mutex_lock_impl (mutex=0x410707b0) at bionic/libc/bionic/pthread.c:1189
#3  0x401e8816 in android::Mutex::lock (this=<optimized out>) at frameworks/native/include/utils/Mutex.h:112
#4  0x401f78cc in Autolock (mutex=..., this=<synthetic pointer>) at frameworks/native/include/utils/Mutex.h:65
#5  android::AudioFlinger::setParameters (this=0x410707a0, ioHandle=0, keyValuePairs=...) at frameworks/av/services/audioflinger/AudioFlinger.cpp:1118
#6  0x40361858 in android::BnAudioFlinger::onTransact (this=0x410707a0, code=<optimized out>, data=..., reply=0xbed92b94, flags=16) at frameworks/av/media/libmedia/IAudioFlinger.cpp:899
#7  0x402f4392 in android::BBinder::transact (this=0x410707a4, code=19, data=..., reply=0xbed92b94, flags=16) at frameworks/native/libs/binder/Binder.cpp:108
#8  0x402f6f16 in android::IPCThreadState::executeCommand (this=0x41070468, cmd=<optimized out>) at frameworks/native/libs/binder/IPCThreadState.cpp:1034
#9  0x402f7340 in android::IPCThreadState::joinThreadPool (this=0x41070468, isMain=<optimized out>) at frameworks/native/libs/binder/IPCThreadState.cpp:473
#10 0x400ebc3e in main (argc=<optimized out>, argv=<optimized out>) at frameworks/av/media/mediaserver/main_mediaserver.cpp:67

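In native code there is no good way to find which thread holds a pthread_mutex; you have to read the code. Take the class whose method the waiting thread is blocked in, then check the user-space call stacks of all threads in the process for threads that are also executing in that class's methods. If there are too many threads, first use the futex hash table to exclude the threads that are waiting on the same mutex. For example, thread 173 above is waiting in an AudioFlinger method, so look for threads executing in AudioFlinger methods; this shows that thread 1026 holds the mutex.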
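The search procedure just described can be expressed as a small filter over per-thread summaries copied out of the dump. This is a hypothetical helper, not a tool from the original analysis: `struct thread_info`, `candidate_owners`, and matching a class by symbol-name prefix are assumptions, and `wait_futex` stands for the futex address a thread is blocked on (as found via the futex hash table or its `pthread_mutex_lock` frame).

```c
#include <stddef.h>
#include <string.h>

#define MAX_FRAMES 8

/* Hypothetical per-thread summary taken by hand from a core dump:
 * symbol names on the user-space stack (NULL-terminated) and the futex
 * address the thread is blocked on (0 if it is not waiting on a futex). */
struct thread_info {
    int tid;
    const char *frames[MAX_FRAMES];
    unsigned long wait_futex;
};

/* Does any frame of t belong to the class named by class_prefix,
 * e.g. "android::AudioFlinger::"? */
static int runs_in_class(const struct thread_info *t, const char *class_prefix)
{
    size_t len = strlen(class_prefix);
    for (size_t i = 0; i < MAX_FRAMES && t->frames[i] != NULL; i++)
        if (strncmp(t->frames[i], class_prefix, len) == 0)
            return 1;
    return 0;
}

/* Stores the tids of possible owners of mutex_addr in out[] and returns
 * how many were found. A candidate executes in the same class as the
 * waiter but is not itself blocked on the same mutex. */
size_t candidate_owners(const struct thread_info *threads, size_t n,
                        const char *class_prefix, unsigned long mutex_addr,
                        int *out, size_t out_cap)
{
    size_t found = 0;
    for (size_t i = 0; i < n && found < out_cap; i++) {
        const struct thread_info *t = &threads[i];
        if (t->wait_futex == mutex_addr)
            continue;                       /* another waiter, not the owner */
        if (runs_in_class(t, class_prefix))
            out[found++] = t->tid;
    }
    return found;
}
```

With the first dump's data (173, 2980 and 3098 waiting on 0x410707b0, 1026 running in AudioFlinger::createTrack), filtering on "android::AudioFlinger::" keeps threads such as 1026 (and 638, since PlaybackThread is nested in AudioFlinger); the short list is then confirmed by reading the code.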

AudioFlinger::mLock (at 0x410707b0) is held by thread 1026 "Binder_2", which is in AudioFlinger::createTrack(...).
Threads 3098 and 2980 are also waiting on the same lock.
Call stack of thread 1026:
#0  __futex_syscall3 () at bionic/libc/arch-arm/bionic/futex_arm.S:59
#1  0x4021e1d0 in _normal_lock (shared=0, mutex=0x40e4c02c) at bionic/libc/bionic/pthread.c:1067
#2  pthread_mutex_lock_impl (mutex=0x40e4c02c) at bionic/libc/bionic/pthread.c:1189
#3  0x401e8816 in android::Mutex::lock (this=<optimized out>) at frameworks/native/include/utils/Mutex.h:112
#4  0x401f160e in Autolock (mutex=..., this=<synthetic pointer>) at frameworks/native/include/utils/Mutex.h:65
#5  android::AudioFlinger::PlaybackThread::createTrack_l (this=0x40e4c008, client=..., streamType=AUDIO_STREAM_SYSTEM, sampleRate=48000, format=AUDIO_FORMAT_PCM_16_BIT, channelMask=1, frameCount=4096,
    sharedBuffer=..., sessionId=206, flags=2, tid=13284, status=0x424a2c00) at frameworks/av/services/audioflinger/AudioFlinger.cpp:2122
#6  0x401f5a54 in android::AudioFlinger::createTrack (this=0x410707a0, pid=<optimized out>, streamType=AUDIO_STREAM_SYSTEM, sampleRate=48000, format=AUDIO_FORMAT_PCM_16_BIT, channelMask=1, frameCount=4096,
    flags=2, sharedBuffer=..., output=2, tid=13284, sessionId=0x424a2ca4, status=0x424a2ca8) at frameworks/av/services/audioflinger/AudioFlinger.cpp:524
#7  0x403614da in android::BnAudioFlinger::onTransact (this=0x410707a0, code=<optimized out>, data=..., reply=0x424a2e04, flags=16) at frameworks/av/media/libmedia/IAudioFlinger.cpp:764
#8  0x402f4392 in android::BBinder::transact (this=0x410707a4, code=1, data=..., reply=0x424a2e04, flags=16) at frameworks/native/libs/binder/Binder.cpp:108
#9  0x402f6f16 in android::IPCThreadState::executeCommand (this=0x41208618, cmd=<optimized out>) at frameworks/native/libs/binder/IPCThreadState.cpp:1034
#10 0x402f7340 in android::IPCThreadState::joinThreadPool (this=0x41208618, isMain=<optimized out>) at frameworks/native/libs/binder/IPCThreadState.cpp:473
#11 0x402faf58 in android::PoolThread::threadLoop (this=0x4107f088) at frameworks/native/libs/binder/ProcessState.cpp:67
#12 0x40314e38 in android::Thread::_threadLoop (user=0x4107f088) at frameworks/native/libs/utils/Threads.cpp:793
#13 0x4031499e in thread_data_t::trampoline (t=<optimized out>) at frameworks/native/libs/utils/Threads.cpp:132
#14 0x4021ee74 in __thread_entry (func=0x40314905 <thread_data_t::trampoline(thread_data_t const*)>, arg=0x41207a40, tls=0x424a2f00) at bionic/libc/bionic/pthread.c:217
#15 0x4021e5cc in pthread_create (thread_out=0x4107ee88, attr=0xbed92aa8, start_routine=0x40314905 <thread_data_t::trampoline(thread_data_t const*)>, arg=0x41207a40) at bionic/libc/bionic/pthread.c:356
#16 0x00000000 in ?? ()

Thread 1026 is waiting for ThreadBase::mLock (at 0x40e4c02c), which is held by thread 638, running its thread body AudioFlinger::PlaybackThread::threadLoop(...).
Call stack of thread 638, "AudioOut_2":
#0  close () at bionic/libc/arch-arm/syscalls/close.S:10
#1  0x40068d30 in pcm_close (pcm=0x41208860) at hardware/qcom/audio/libalsa-intf/alsa_pcm.c:756
#2  0x40e25a68 in android_audio_legacy::ALSADevice::standby (this=0x41072008, handle=0x4107c0d8) at hardware/qcom/audio/alsa_sound/ALSADevice.cpp:1107
#3  0x40e1c8b4 in android_audio_legacy::AudioStreamOutALSA::standby (this=0x4107c610) at hardware/qcom/audio/alsa_sound/AudioStreamOutALSA.cpp:373
#4  0x40e1f884 in android_audio_legacy::out_standby (stream=<optimized out>) at hardware/qcom/audio/alsa_sound/audio_hw_hal.cpp:110
#5  0x401e7100 in android::AudioFlinger::PlaybackThread::threadLoop_standby (this=<optimized out>) at frameworks/av/services/audioflinger/AudioFlinger.cpp:3182
#6  0x401f1dac in android::AudioFlinger::PlaybackThread::threadLoop (this=0x40e4c008) at frameworks/av/services/audioflinger/AudioFlinger.cpp:2918
#7  0x40314e38 in android::Thread::_threadLoop (user=0x40e4c008) at frameworks/native/libs/utils/Threads.cpp:793
#8  0x4031499e in thread_data_t::trampoline (t=<optimized out>) at frameworks/native/libs/utils/Threads.cpp:132
#9  0x4021ee74 in __thread_entry (func=0x40314905 <thread_data_t::trampoline(thread_data_t const*)>, arg=0x40e547b0, tls=0x41ca2f00) at bionic/libc/bionic/pthread.c:217
#10 0x4021e5cc in pthread_create (thread_out=0x40e547d8, attr=0xbed929b8, start_routine=0x40314905 <thread_data_t::trampoline(thread_data_t const*)>, arg=0x40e547b0) at bionic/libc/bionic/pthread.c:356
#11 0x00000000 in ?? ()
Kernel stack of thread 638:
PID: 638    TASK: edad5880  CPU: 0   COMMAND: "AudioOut_2"
 #0 [<c0774fc8>] (__schedule) from [<c077546c>]
 #1 [<c077546c>] (schedule_preempt_disabled) from [<c0774254>]
 #2 [<c0774254>] (__mutex_lock_slowpath) from [<c07743f4>]
 #3 [<c07743f4>] (mutex_lock) from [<c05b10b0>]
 #4 [<c05b10b0>] (sitar_hph_pa_event) from [<c05a38cc>]        <==******
 #5 [<c05a38cc>] (dapm_seq_check_event) from [<c05a4638>]
 #6 [<c05a4638>] (dapm_seq_run_coalesced) from [<c05a4f4c>]
 #7 [<c05a4f4c>] (dapm_seq_run.isra.7) from [<c05a57d8>]
 #8 [<c05a57d8>] (dapm_power_widgets) from [<c05a5a80>]
 #9 [<c05a5a80>] (soc_dapm_stream_event) from [<c05a759c>]
#10 [<c05a759c>] (snd_soc_dapm_stream_event) from [<c05a8e84>]
#11 [<c05a8e84>] (soc_pcm_close) from [<c05a96b8>]
#12 [<c05a96b8>] (soc_dpcm_be_dai_shutdown) from [<c05aaf0c>]
#13 [<c05aaf0c>] (soc_dpcm_fe_dai_close) from [<c0588330>]
#14 [<c0588330>] (snd_pcm_release_substream) from [<c05883a8>]
#15 [<c05883a8>] (snd_pcm_release) from [<c01293dc>]
#16 [<c01293dc>] (fput) from [<c0125fa4>]
#17 [<c0125fa4>] (filp_close) from [<c0126078>]
#18 [<c0126078>] (sys_close) from [<c000dec0>]

Thread 638 is in sitar_hph_pa_event(...), waiting on the kernel mutex at 0xeeb7054c, and has been scheduled out:
static int sitar_hph_pa_event(struct snd_soc_dapm_widget *w,
                              struct snd_kcontrol *kcontrol, int event)
{
    struct snd_soc_codec *codec = w->codec;
    struct sitar_priv *sitar = snd_soc_codec_get_drvdata(codec);
    u8 mbhc_micb_ctl_val;
    pr_debug("%s: event = %d\n", __func__, event);

    switch (event) {
    ...
        SITAR_ACQUIRE_LOCK(sitar->codec_resource_lock);
    ...
    }
    ...
}

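Kernel-space mutexes are easier: when the kernel is built with the mutex debug option (CONFIG_DEBUG_MUTEXES), the owner field of struct mutex identifies the task that holds it.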
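Why an owner field helps can be shown with a user-space analog: a wrapper that records its holder on every lock, much as the kernel's debug mutex does. This is a hypothetical sketch; `struct owned_mutex` and its functions are not an Android, bionic, or kernel API. A plain POSIX mutex offers no portable way to ask who holds it, which is why the native-side search had to rely on reading stacks.

```c
#include <pthread.h>

/* Hypothetical owner-tracking mutex: `lock` is the real lock, and
 * `owner`/`has_owner` record the holder so a debugger (or the helper
 * below) can answer "who holds this?". */
struct owned_mutex {
    pthread_mutex_t lock;
    pthread_mutex_t meta;   /* protects owner and has_owner */
    pthread_t owner;
    int has_owner;
};

void owned_mutex_init(struct owned_mutex *m)
{
    pthread_mutex_init(&m->lock, NULL);
    pthread_mutex_init(&m->meta, NULL);
    m->has_owner = 0;
}

void owned_mutex_lock(struct owned_mutex *m)
{
    pthread_mutex_lock(&m->lock);
    pthread_mutex_lock(&m->meta);
    m->owner = pthread_self();      /* record the holder after acquiring */
    m->has_owner = 1;
    pthread_mutex_unlock(&m->meta);
}

void owned_mutex_unlock(struct owned_mutex *m)
{
    pthread_mutex_lock(&m->meta);
    m->has_owner = 0;               /* clear the holder before releasing */
    pthread_mutex_unlock(&m->meta);
    pthread_mutex_unlock(&m->lock);
}

/* Returns 1 and stores the holder in *out if the mutex is held, else 0. */
int owned_mutex_owner(struct owned_mutex *m, pthread_t *out)
{
    pthread_mutex_lock(&m->meta);
    int held = m->has_owner;
    if (held)
        *out = m->owner;
    pthread_mutex_unlock(&m->meta);
    return held;
}
```

In a post-mortem dump you would simply read the recorded owner field, exactly as is done with struct mutex below.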

The mutex's owner is task_struct 0xeebbc000, i.e. kernel thread 114:
   PID    PPID  CPU   TASK    ST  %MEM     VSZ    RSS  COMM
    114      2   0  eebbc000  UN   0.0       0      0  [irq/325-sitar-h]
kthread 114 is in irq_thread, handling the interrupt. Its kernel call stack shows it in snd_soc_jack_report(...), trying to acquire the codec mutex at 0xeeaf4e14:
void snd_soc_jack_report(struct snd_soc_jack *jack, int status, int mask)
{
    struct snd_soc_codec *codec;
    ...

    codec = jack->codec;
    dapm = &codec->dapm;

    mutex_lock(&codec->mutex);

    oldstatus = jack->status;
    ...
    snd_jack_report(jack->jack, jack->status);

out:
    mutex_unlock(&codec->mutex);
}

The codec mutex at 0xeeaf4e14 is owned by task_struct 0xedad5880, which is pid 638! Going back to the kernel stack of thread 638 above, the codec mutex is acquired in snd_soc_dapm_stream_event(...):
int snd_soc_dapm_stream_event(struct snd_soc_pcm_runtime *rtd,
                              const char *stream, int event)
{
    struct snd_soc_codec *codec = rtd->codec;

    if (stream == NULL)
        return 0;

    mutex_lock(&codec->mutex);
    soc_dapm_stream_event(&codec->dapm, stream, event);
    mutex_unlock(&codec->mutex);

    return 0;
}

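So the root cause is an AB-BA deadlock between two kernel mutexes: pid 638, in kernel context, holds codec->mutex (0xeeaf4e14, taken in snd_soc_dapm_stream_event) and waits for the mutex at 0xeeb7054c (taken via SITAR_ACQUIRE_LOCK(sitar->codec_resource_lock)), while irq/325-sitar-h holds 0xeeb7054c and waits for codec->mutex.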
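The manual walk above (waiter, then holder, then whatever the holder waits for) is a cycle search in a wait-for graph. Because each blocked thread waits on exactly one thing, following the single outgoing edge from the first stuck thread is enough. A minimal C sketch, assuming the edges were collected by hand from the dump, binder transactions and mutex owners alike; `struct wait_edge` and `find_deadlock_cycle` are hypothetical names.

```c
#include <stddef.h>

#define MAX_CHAIN 64

/* One wait-for edge: `waiter` is blocked on something held by `holder`
 * (a mutex owner, or the binder thread serving its transaction). */
struct wait_edge { int waiter; int holder; };

/* The thread that `tid` waits for, or -1 if tid is not blocked. */
static int holder_of(const struct wait_edge *e, size_t n, int tid)
{
    for (size_t i = 0; i < n; i++)
        if (e[i].waiter == tid)
            return e[i].holder;
    return -1;
}

/* Follows the chain from `start`. If it reaches a cycle, copies the cycle
 * members (at most cap) into cycle[] and returns how many were copied;
 * returns 0 if the chain ends at a thread that is not waiting. */
size_t find_deadlock_cycle(const struct wait_edge *e, size_t n, int start,
                           int *cycle, size_t cap)
{
    int path[MAX_CHAIN];
    size_t len = 0;
    int tid = start;

    while (tid != -1 && len < MAX_CHAIN) {
        for (size_t j = 0; j < len; j++) {
            if (path[j] != tid)
                continue;
            /* tid was seen before: path[j..len-1] is the cycle */
            size_t k = 0;
            while (j + k < len && k < cap) {
                cycle[k] = path[j + k];
                k++;
            }
            return k;
        }
        path[len++] = tid;
        tid = holder_of(e, n, tid);
    }
    return 0;
}
```

For the first dump, the edges are 615->173, 173->1026, 2980->1026, 3098->1026, 1026->638, 638->114 and 114->638; starting at 615, the walk returns the cycle {638, 114}.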


[Analysis of the second memory dump]

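This one is similar to the first, although the chain of mutex ownership is different. So I won't record the code; I'll just write down the steps, for extra therapeutic effect.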

From the Android system log:
   WARN [   2212.508383] (594:614) BroadcastQueue  Timeout of broadcast BroadcastRecord{42df1ee8 com.android.internal.policy.impl.PhoneWindowManager.DELAYED_KEYGUARD} - receiver=android.app.LoadedApk$ReceiverDispatcher$InnerReceiver@42e43818, started 60004ms ago
PROBABLE CAUSE OF PROBLEM:
Timeout

Receiver: android.app.LoadedApk$ReceiverDispatcher$InnerReceiver
.... Generating Dalvik backtraces. This might take some time ....
Receiver: might be pid 613

***** Dalvik stack for pid 613 *****
#0  android.media.AudioSystem.setParameters (Native Method)
#1  android.media.AudioService$AudioServiceBroadcastReceiver.onReceive (AudioService.java:3779)
#2  android.app.LoadedApk$ReceiverDispatcher$Args.run (LoadedApk.java:765)
#3  android.os.Handler.handleCallback (Handler.java:615)
#4  android.os.Handler.dispatchMessage (Handler.java:94)
#5  android.os.Looper.loop (Looper.java:147)
#6  com.android.server.ServerThread.run (SystemServer.java:219)
-- Break frame --

This thread is probably doing a binder call as kernel backtrace includes binder_ioctl
Thread at the other end of the binder: 735

As in the first issue, thread 735 belongs to the mediaserver process:

PID: 171    TASK: ee67c700  CPU: 0   COMMAND: "mediaserver"
  PID: 499    TASK: ed98d180  CPU: 0   COMMAND: "AudioCommand"
  PID: 500    TASK: ed98e680  CPU: 1   COMMAND: "ApmCommand"
  PID: 513    TASK: edc96300  CPU: 0   COMMAND: "mediaserver"
  PID: 521    TASK: ed98ca80  CPU: 0   COMMAND: "FastMixer"
  PID: 621    TASK: eaa07800  CPU: 0   COMMAND: "AudioOut_2"
  PID: 622    TASK: eaa07b80  CPU: 0   COMMAND: "Binder_1"
  PID: 735    TASK: e10bd500  CPU: 0   COMMAND: "Binder_2"
  PID: 3183   TASK: e111bb80  CPU: 0   COMMAND: "Binder_3"
  PID: 5351   TASK: eebbdf80  CPU: 0   COMMAND: "Binder_4"

Binder_2 is in AudioFlinger::setParameters and is waiting on mutex (pthread_mutex_t *) 0x411857d0. This is the mutex taken by AutoMutex lock(mHardwareLock), and it is held by thread 500.
In user space, thread 500 ("ApmCommand") is in the ioctl(...) system call, reached from adev_set_fm_volume(...) in AudioFlinger::setFmVolume(float value), while holding mHardwareLock via AutoMutex lock(mHardwareLock). In kernel space, thread 500 is waiting on kernel mutex 0xeeb70214, held by thread 521.
The kernel call stack of thread 521 ("FastMixer") shows it waiting on kernel mutex 0xeeb7054c, held by thread 4538, a workqueue worker kthread:
   PID    PPID  CPU   TASK    ST  %MEM     VSZ    RSS  COMM
   4538      2   0  e4a2aa00  UN   0.0       0      0  [kworker/0:0]

The kworker's stack trace shows it waiting on kernel mutex 0xeeaf4e14, held by task_struct 0xed98ca80, i.e. pid 521!!

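So the root cause is again a deadlock between kernel mutexes: thread 521 holds 0xeeaf4e14 and waits for 0xeeb7054c, while kworker 4538 holds 0xeeb7054c and waits for 0xeeaf4e14. These are the same two mutex addresses as in the first dump.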

Breaking or avoiding the deadlock solves both issues.
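A common way to avoid this kind of AB-BA deadlock is to make every path take the pair of locks in one global order. The sketch below shows the idea with user-space pthreads; `codec_mutex` and `resource_lock` are hypothetical stand-ins for the kernel's codec->mutex and codec_resource_lock, and this is not the fix actually applied to the sitar driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Acquire two distinct mutexes in a fixed global order (lower address
 * first), so two threads needing the same pair can never hold one each
 * while waiting for the other. */
static void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);                       /* locking one mutex twice would self-deadlock */
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *t = a;
        a = b;
        b = t;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

static void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}

/* Hypothetical stand-ins for codec->mutex and codec_resource_lock. */
static pthread_mutex_t codec_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t resource_lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter;               /* protected by both locks */

struct worker_arg { int reversed; long iterations; };

static void *worker(void *p)
{
    const struct worker_arg *arg = p;
    for (long i = 0; i < arg->iterations; i++) {
        /* The two workers name the locks in opposite orders, like the
         * stream-close path and the jack/IRQ path in the dumps. */
        if (arg->reversed)
            lock_pair(&resource_lock, &codec_mutex);
        else
            lock_pair(&codec_mutex, &resource_lock);
        shared_counter++;
        unlock_pair(&codec_mutex, &resource_lock);
    }
    return NULL;
}

/* Runs both workers to completion and returns the final counter. */
long run_ordered_demo(long iterations)
{
    struct worker_arg a = {0, iterations};
    struct worker_arg b = {1, iterations};
    pthread_t ta, tb;

    shared_counter = 0;
    pthread_create(&ta, NULL, worker, &a);
    pthread_create(&tb, NULL, worker, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return shared_counter;
}
```

Without the address swap in lock_pair, the two workers could each end up holding one lock and waiting forever for the other, which is the pattern seen in both dumps.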
