1. Problem description
During a long-running monkey test, Android restarted.
2. Analysis
[Preliminary analysis]
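The first step was to find out why system_server restarted. The Android logs show no native or Java crash, no ANR, and no restart of critical services such as surfaceflinger. The kernel log, however, shows that system_server was killed because the system ran out of memory: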
418[01-01 21:46:38.283] <3>[49366.984650] c4 5184 Out of memory: Kill process 4288 (system_server) score 0 or sacrifice child
419[01-01 21:46:38.283] <3>[49366.985612] c4 5184 Killed process 4288 (system_server) total-vm:4572168kB, anon-rss:0kB, file-rss:0kB
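The log above establishes two things.

First, the kernel OOM killer killed system_server. Memory was therefore critically short. Something was leaking badly enough that the kernel could no longer reclaim memory for normal operation, so it started killing the user-space processes that used the most memory.

Second, the choice of system_server as the victim does not by itself prove that system_server is leaking. It only makes system_server a suspect. The OOM killer picks its victim by scoring candidate processes in oom_badness(), and only user-space processes are candidates (kernel threads are never selected). A leak inside the kernel would therefore also end with a user-space process being killed.

After checking the kernel log, meminfo and slabinfo, we tentatively ruled out a kernel leak and concluded that system_server itself was leaking. We still had to determine whether the leak was in the Java heap or the native heap. We reran the test with a script that captured showmap output for system_server (sizes in kB):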
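The victim-selection argument above can be made concrete with a simplified Java model of how oom_badness() scores one candidate. It loosely follows 4.x-era kernels:

- The raw score is resident pages, plus swap entries, plus page-table pages.
- That score is shifted by oom_score_adj, scaled to the total number of pages.
- Any eligible task gets a floor of 1.
- Kernel threads, and tasks with oom_score_adj = -1000, get 0 and are never picked.

This is a sketch, not kernel code. It omits memcg/cpuset filtering and other checks. The page counts are made up, and the oom_score_adj of -900 for system_server is our assumption. The model also hints at why the log prints "score 0". A strongly negative oom_score_adj pushes the raw points down to the floor of 1. That rounds to 0 if, as we understand it, the kill message prints points × 1000 / totalpages.

```java
/**
 * Simplified model of the kernel's oom_badness() scoring (4.x-era logic).
 * All sizes are in pages. Illustrative only; not kernel code.
 */
public final class OomBadness {
    static final int OOM_SCORE_ADJ_MIN = -1000;

    /** Returns 0 if the task can never be chosen, otherwise a score >= 1. */
    static long badness(boolean kernelThread, long rssPages, long swapPages,
                        long pageTablePages, int oomScoreAdj, long totalPages) {
        // Kernel threads have no user address space and are never candidates.
        if (kernelThread) return 0;
        // oom_score_adj == -1000 marks a task as unkillable.
        if (oomScoreAdj == OOM_SCORE_ADJ_MIN) return 0;
        long points = rssPages + swapPages + pageTablePages;
        // Normalize oom_score_adj (-1000..1000) into page units, as the kernel does.
        long adj = (long) oomScoreAdj * (totalPages / 1000);
        points += adj;
        // Any eligible task scores at least 1.
        return points > 0 ? points : 1;
    }

    /** Score as shown in "Out of memory: Kill process ... score N" (assumed 0..1000 scale). */
    static long printedScore(long points, long totalPages) {
        return points * 1000 / totalPages;
    }

    public static void main(String[] args) {
        long totalPages = 1_000_000; // hypothetical RAM + swap, in pages
        long p = badness(false, 50_000, 10_000, 500, -900, totalPages);
        System.out.println(p + " " + printedScore(p, totalPages)); // prints: 1 0
    }
}
```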
 virtual                  shared  shared private private
    size     RSS     PSS   clean   dirty   clean   dirty    swap swapPSS   # object
   36864   21016   19097       0    1980       0   19036       0       0  10 [anon:libc_malloc]
  126976   17996   17451      80     504     236   17176   13892   12632  19 [anon:libc_malloc]
  151552   30800   30239      88     504     384   29824   18696   17433  20 [anon:libc_malloc]
  159744   18908   18384     264     296     652   17696   17608   16325  20 [anon:libc_malloc]
  159744   20560   20045     500      56     192   29812   25148   23853  20 [anon:libc_malloc]
  159744   26996   26507     492      44     684   25776   29344   28053  20 [anon:libc_malloc]
  159744   10940   10458     532      16     328   10064   47444   46171  20 [anon:libc_malloc]
  167936   18946   28507      52       4      12    3680   61112   59448  20 [anon:libc_malloc]
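The data shows that over the 8-hour run, the [anon:libc_malloc] mappings of system_server kept growing. Virtual size rose from 36864 kB to 167936 kB, and swap rose from 0 to 61112 kB.

[anon:libc_malloc] is the native heap obtained through malloc. The Java heap appears as [anon:dalvik-...] mappings instead. This therefore points to a native heap leak, and the next step is to locate it.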
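The trend above can be checked mechanically. Below is a minimal Java sketch that parses showmap rows like the ones in the table and compares snapshots. It rests on three assumptions:

- Each row is one snapshot, taken in time order.
- The columns are in the order printed in the header.
- Values are in kB.

It tracks virtual size, and it also tracks RSS plus swap as a rough "memory actually held" figure. RSS alone can drop when pages are swapped out, which is visible in the later rows. The class and method names are our own.

```java
import java.util.ArrayList;
import java.util.List;

public final class ShowmapTrend {
    /** Selected columns of one showmap row; all sizes in kB. */
    static final class Row {
        final long vss, rss, swap;
        Row(long vss, long rss, long swap) { this.vss = vss; this.rss = rss; this.swap = swap; }
        /** Resident plus swapped-out memory held by the mapping. */
        long footprint() { return rss + swap; }
    }

    // Column order: vsize RSS PSS sh_clean sh_dirty priv_clean priv_dirty swap swapPSS #obj name
    static Row parse(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 11) throw new IllegalArgumentException("not a showmap row: " + line);
        return new Row(Long.parseLong(f[0]), Long.parseLong(f[1]), Long.parseLong(f[7]));
    }

    /** True if virtual size never shrinks across snapshots (oldest first). */
    static boolean vssNeverShrinks(List<Row> rows) {
        for (int i = 1; i < rows.size(); i++) {
            if (rows.get(i).vss < rows.get(i - 1).vss) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String[] snapshots = {
            "36864 21016 19097 0 1980 0 19036 0 0 10 [anon:libc_malloc]",
            "126976 17996 17451 80 504 236 17176 13892 12632 19 [anon:libc_malloc]",
            "167936 18946 28507 52 4 12 3680 61112 59448 20 [anon:libc_malloc]",
        };
        List<Row> rows = new ArrayList<>();
        for (String s : snapshots) rows.add(parse(s));
        Row first = rows.get(0), last = rows.get(rows.size() - 1);
        System.out.println("vss never shrinks: " + vssNeverShrinks(rows));        // true
        System.out.println("footprint kB: " + first.footprint() + " -> " + last.footprint()); // 21016 -> 80058
    }
}
```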
We used malloc_debug, which ships with Android, to find the leak. Other tools can also be used to analyze native memory leaks.
[Enabling malloc_debug]
$adb root
$adb remount
$adb shell
#cd system/
#busybox vi build.prop   // append the following two properties at the end of the file
libc.debug.malloc.program=system
libc.debug.malloc.options=backtrace=64
:wq   // save
#reboot   // reboot the phone
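After the reboot, the phone boots very slowly. Every program under /system/bin now loads libc_malloc_debug.so at startup to monitor malloc. Each one registers the corresponding signal handlers and initializes the debug parameters and state.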
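The reason "system" affects so many processes can be modeled as below. We assume, based on our reading of bionic's malloc setup code, that libc.debug.malloc.program is matched as a substring (strstr) of the program name. We also assume the program name is the argv[0] path for processes started by init. Under those assumptions, "system" matches every binary under /system/bin. A narrower value such as "app_process" would limit malloc_debug to zygote-spawned processes, including system_server. The sketch models only the program filter and assumes libc.debug.malloc.options is already set. "/vendor/bin/hw/some_hal" is a hypothetical name.

```java
public final class MallocDebugFilter {
    /**
     * Models the libc.debug.malloc.program check under the assumed strstr()
     * semantics: an empty filter enables malloc_debug for every program;
     * otherwise the filter must occur somewhere in the program name.
     */
    static boolean enabledFor(String programName, String filter) {
        return filter.isEmpty() || programName.contains(filter);
    }

    public static void main(String[] args) {
        String[] programs = {
            "/system/bin/app_process64",   // zygote, parent of system_server
            "/system/bin/surfaceflinger",
            "/vendor/bin/hw/some_hal",     // hypothetical vendor binary
        };
        for (String p : programs) {
            System.out.println(p + " -> " + enabledFor(p, "system"));
        }
        // prints true, true, false
    }
}
```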
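The test also runs much more slowly than usual, and ANRs and watchdog resets stop it. Either disable ANR and the watchdog, or suppress the handling of signal 3 (SIGQUIT). kill -3 makes the process dump its thread stacks, and stack dumping consumes a lot of memory, which would significantly skew the results.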
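Modify the test script to capture a native heap dump of system_server once an hour with the following command (-n selects the native heap; the output file is passed as an argument, not via shell redirection):
am dumpheap -n <system_server pid> /data/anr/dumpheapdir/dumpheap.log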
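After 8 hours of testing, we compared the captured dumps:
Right after boot, the allocateHeapBitmap-related allocations amount to only about 12 KB (12688 bytes, 0.35% of the native heap):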
12688 0.35% 30 system@[email protected] ??? ???
11656 0.32% 2 libandroid_runtime.so Bitmap_createFromParcel(_JNIEnv*, _jobject*, _jobject*) frameworks/base/core/jni/android/graphics/Bitmap.cpp:1111
11520 0.32% 1 libhwui.so android::allocateHeapBitmap(unsigned long, SkImageInfo const&, unsigned long) frameworks/base/libs/hwui/hwui/Bitmap.cpp:80
136 0.00% 1 libhwui.so android::allocateHeapBitmap(unsigned long, SkImageInfo const&, unsigned long) frameworks/base/libs/hwui/hwui/Bitmap.cpp:84
136 0.00% 1 libc++.so operator new(unsigned long) external/libcxxabi/src/cxa_new_delete.cpp:46
Three hours later, the allocateHeapBitmap-related allocations had grown to 46.62% of the native heap:
17792566 100.00% 27401 app
8294600 46.62% 3 system@[email protected] ??? ???
8294536 46.62% 2 libandroid_runtime.so Bitmap_copy(_JNIEnv*, _jobject*, long, int, unsigned char) frameworks/base/core/jni/android/graphics/Bitmap.cpp:806
8294536 46.62% 2 libandroid_runtime.so bitmapCopyTo(SkBitmap*, SkColorType, SkBitmap const&, SkBitmap::Allocator*) frameworks/base/core/jni/android/graphics/Bitmap.cpp:732
8294536 46.62% 2 libskia.so SkBitmap::tryAllocPixels(SkBitmap::Allocator*) external/skia/src/core/SkBitmap.cpp:239
8294536 46.62% 2 libandroid_runtime.so HeapAllocator::allocPixelRef(SkBitmap*) frameworks/base/core/jni/android/graphics/Graphics.cpp:625
8294400 46.62% 1 libhwui.so android::allocateHeapBitmap(unsigned long, SkImageInfo const&, unsigned long) frameworks/base/libs/hwui/hwui/Bitmap.cpp:80
136 0.00% 1 libhwui.so android::allocateHeapBitmap(unsigned long, SkImageInfo const&, unsigned long) frameworks/base/libs/hwui/hwui/Bitmap.cpp:84
64 0.00% 1 libandroid_runtime.so android::bitmap::createBitmap(_JNIEnv*, android::Bitmap*, int, _jbyteArray*, _jobject*, int) frameworks/base/core/jni/android/graphics/Bitmap.cpp:207
64 0.00% 1 libc++.so operator new(unsigned long) external/libcxxabi/src/cxa_new_delete.cpp:46
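The retained size is worth a quick sanity check. Assume the copied bitmap is a 1080x1920 ARGB_8888 image at 4 bytes per pixel. This is our assumption, since the dump does not record dimensions. Under it, the pixel buffer is exactly the 8294400 bytes allocated at Bitmap.cpp:80.

Adding the 136-byte allocation at Bitmap.cpp:84 gives 8294536 bytes. Adding the 64-byte allocation from createBitmap then gives 8294600 bytes. Both totals match the dump, so the retained memory is consistent with one full-screen bitmap kept alive.

```java
public final class BitmapBytes {
    // ARGB_8888 stores each pixel in 4 bytes.
    static final int ARGB_8888_BYTES_PER_PIXEL = 4;

    /** Pixel buffer size of a tightly packed ARGB_8888 bitmap. */
    static long pixelBytes(int width, int height) {
        return (long) width * height * ARGB_8888_BYTES_PER_PIXEL;
    }

    public static void main(String[] args) {
        long pixels = pixelBytes(1080, 1920);          // assumed full-HD screen copy
        long pixelsPlusObject = pixels + 136;          // + 136-byte allocation at Bitmap.cpp:84
        long subtreeTotal = pixelsPlusObject + 64;     // + 64-byte allocation in createBitmap
        System.out.println(pixels + " " + pixelsPlusObject + " " + subtreeTotal);
        // prints: 8294400 8294536 8294600
    }
}
```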
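After six hours the share rose to 49.53%. The retained amount is still 8294536 bytes in 2 allocations, the same as at the three-hour mark. The percentage went up because the total native heap shrank from 17792566 to 16745717 bytes: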
16745717 100.00% 27428 app
8294600 49.53% 3 system@[email protected] ??? ???
8294536 49.53% 2 libandroid_runtime.so Bitmap_copy(_JNIEnv*, _jobject*, long, int, unsigned char) frameworks/base/core/jni/android/graphics/Bitmap.cpp:806
8294536 49.53% 2 libandroid_runtime.so bitmapCopyTo(SkBitmap*, SkColorType, SkBitmap const&, SkBitmap::Allocator*) frameworks/base/core/jni/android/graphics/Bitmap.cpp:732
8294536 49.53% 2 libskia.so SkBitmap::tryAllocPixels(SkBitmap::Allocator*) external/skia/src/core/SkBitmap.cpp:239
8294536 49.53% 2 libandroid_runtime.so HeapAllocator::allocPixelRef(SkBitmap*) frameworks/base/core/jni/android/graphics/Graphics.cpp:625
8294400 49.53% 1 libhwui.so android::allocateHeapBitmap(unsigned long, SkImageInfo const&, unsigned long) frameworks/base/libs/hwui/hwui/Bitmap.cpp:80
136 0.00% 1 libhwui.so android::allocateHeapBitmap(unsigned long, SkImageInfo const&, unsigned long) frameworks/base/libs/hwui/hwui/Bitmap.cpp:84
136 0.00% 1 libc++.so operator new(unsigned long) external/libcxxabi/src/cxa_new_delete.cpp:46
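Comparing the snapshots by eye works for a few lines. For larger dumps, a small diff helps. Below is a minimal Java sketch that assumes the text layout shown above: bytes, percentage, allocation count, then the frame description. It reports per-frame growth between two snapshots, largest first.

A frame reached through several callers is summed, which is a simplification of the tree view. Frames that disappear in the later snapshot are ignored. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class HeapDumpDiff {
    /** Maps "library function source" to bytes, from lines like those in the dumps above. */
    static Map<String, Long> parse(String text) {
        Map<String, Long> sizes = new HashMap<>();
        for (String line : text.split("\n")) {
            // Expected shape: <bytes> <percent>% <allocation count> <frame description>
            String[] f = line.trim().split("\\s+", 4);
            if (f.length < 4 || !f[1].endsWith("%")) continue; // skip headers and blank lines
            long bytes;
            try {
                bytes = Long.parseLong(f[0]);
            } catch (NumberFormatException e) {
                continue;
            }
            // The same frame may appear under several callers; sum those occurrences.
            sizes.merge(f[3], bytes, Long::sum);
        }
        return sizes;
    }

    /** Bytes gained per frame present in the later snapshot, largest growth first. */
    static List<Map.Entry<String, Long>> growth(Map<String, Long> before, Map<String, Long> after) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (Map.Entry<String, Long> e : after.entrySet()) {
            long delta = e.getValue() - before.getOrDefault(e.getKey(), 0L);
            out.add(Map.entry(e.getKey(), delta));
        }
        out.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return out;
    }
}
```

Feeding it the boot-time and three-hour dumps puts the Bitmap.cpp:80 frame at the top, with a growth of 8294400 - 11520 = 8282880 bytes.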
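Results from the other test devices likewise show Java code calling Bitmap_copy through JNI and making memory grow quickly. This is the likely leak point, so the next step was to find how the Java code reaches it. We added a stack dump to the framework API to see who calls into the JNI method. The change is as follows: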
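In frameworks/base/graphics/java/android/graphics/Bitmap.java, add a call-stack dump to public Bitmap copy(Config config, boolean isMutable) (around line 636):

public Bitmap copy(Config config, boolean isMutable) {
    checkRecycled("Can't copy a recycled bitmap");
    // Added for debugging: log the caller of every Bitmap.copy() (uses android.util.Log)
    Log.d("DEBUG", "copy stack" + Log.getStackTraceString(new Throwable()));
    // ... rest of the original method unchanged
}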
The printed call stacks:
// entering the video recording screen
01-01 08:16:02.050 4268 4268 D DEBUG : copy stackjava.lang.Throwable
01-01 08:16:02.050 4268 4268 D DEBUG : at android.graphics.Bitmap.copy(Bitmap.java:638)
01-01 08:16:02.050 4268 4268 D DEBUG : at com.android.camera.app.CameraAppUI$2.getScreenShot(CameraAppUI.java:613)
01-01 08:16:02.050 4268 4268 D DEBUG : at com.android.camera.app.CameraAppUI.freezeScreen(CameraAppUI.java:986)
01-01 08:16:02.050 4268 4268 D DEBUG : at com.android.camera.app.CameraAppUI.freezeScreenUntilPreviewReady(CameraAppUI.java:933)
01-01 08:16:02.050 4268 4268 D DEBUG : at com.android.camera.CameraActivity.freezeScreenCommon(CameraActivity.java:3478)
01-01 08:16:02.050 4268 4268 D DEBUG : at com.android.camera.CameraActivity.switchMode(CameraActivity.java:4264)
01-01 08:16:02.050 4268 4268 D DEBUG : at java.lang.reflect.Method.invoke(Native Method)
01-01 08:16:02.050 4268 4268 D DEBUG : at android.view.View$DeclaredOnClickListener.onClick(View.java:5381)
01-01 08:16:02.050 4268 4268 D DEBUG : at android.view.View.performClick(View.java:6306)
01-01 08:16:02.050 4268 4268 D DEBUG : at android.view.View$PerformClick.run(View.java:24813)
01-01 08:16:02.050 4268 4268 D DEBUG : at android.os.Handler.handleCallback(Handler.java:790)
01-01 08:16:02.050 4268 4268 D DEBUG : at android.os.Handler.dispatchMessage(Handler.java:99)
01-01 08:16:02.050 4268 4268 D DEBUG : at android.os.Looper.loop(Looper.java:164)
01-01 08:16:02.050 4268 4268 D DEBUG : at android.app.ActivityThread.main(ActivityThread.java:6719)
// two more call stacks, both related to photo editing (Gallery)
1)
01-01 08:09:39.689 3354 3354 D DEBUG : copy stackjava.lang.Throwable
01-01 08:09:39.689 3354 3354 D DEBUG : at android.graphics.Bitmap.copy(Bitmap.java:638)
01-01 08:09:39.689 3354 3354 D DEBUG : at com.android.gallery3d.filtershow.imageshow.MasterImage.getTemporaryThumbnailBitmap(MasterImage.java:888)
01-01 08:09:39.689 3354 3354 D DEBUG : at com.android.gallery3d.filtershow.category.Action.setImageFrame(Action.java:118)
01-01 08:09:39.689 3354 3354 D DEBUG : at com.android.gallery3d.filtershow.FilterShowActivity.loadActions(FilterShowActivity.java:592)
01-01 08:09:39.689 3354 3354 D DEBUG : at com.android.gallery3d.filtershow.FilterShowActivity.-wrap3(Unknown Source:0)
01-01 08:09:39.689 3354 3354 D DEBUG : at com.android.gallery3d.filtershow.FilterShowActivity$LoadBitmapTask.onPostExecute(FilterShowActivity.java:1130)
01-01 08:09:39.689 3354 3354 D DEBUG : at com.android.gallery3d.filtershow.FilterShowActivity$LoadBitmapTask.onPostExecute(FilterShowActivity.java:1083)
01-01 08:09:39.689 3354 3354 D DEBUG : at android.os.AsyncTask.finish(AsyncTask.java:695)
01-01 08:09:39.689 3354 3354 D DEBUG : at android.os.AsyncTask.-wrap1(Unknown Source:0)
01-01 08:09:39.689 3354 3354 D DEBUG : at android.os.AsyncTask$InternalHandler.handleMessage(AsyncTask.java:712)
01-01 08:09:39.689 3354 3354 D DEBUG : at android.os.Handler.dispatchMessage(Handler.java:106)
01-01 08:09:39.689 3354 3354 D DEBUG : at android.os.Looper.loop(Looper.java:164)
01-01 08:09:39.689 3354 3354 D DEBUG : at android.app.ActivityThread.main(ActivityThread.java:6719)
01-01 08:09:39.689 3354 3354 D DEBUG : at java.lang.reflect.Method.invoke(Native Method)
01-01 08:09:39.689 3354 3354 D DEBUG : at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:455)
01-01 08:09:39.689 3354 3354 D DEBUG : at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:807)
2)
01-01 08:12:11.919 3354 4519 D DEBUG : copy stackjava.lang.Throwable
01-01 08:12:11.919 3354 4519 D DEBUG : at android.graphics.Bitmap.copy(Bitmap.java:638)
01-01 08:12:11.919 3354 4519 D DEBUG : at com.android.gallery3d.filtershow.filters.ImageFilterDehaze.apply(ImageFilterDehaze.java:67)
01-01 08:12:11.919 3354 4519 D DEBUG : at com.android.gallery3d.filtershow.pipeline.FilterEnvironment.applyRepresentation(FilterEnvironment.java:136)
01-01 08:12:11.919 3354 4519 D DEBUG : at com.android.gallery3d.filtershow.pipeline.ImagePreset.applyFilters(ImagePreset.java:526)
01-01 08:12:11.919 3354 4519 D DEBUG : at com.android.gallery3d.filtershow.pipeline.ImagePreset.apply(ImagePreset.java:446)
01-01 08:12:11.919 3354 4519 D DEBUG : at com.android.gallery3d.filtershow.pipeline.CachingPipeline.render(CachingPipeline.java:379)
01-01 08:12:11.919 3354 4519 D DEBUG : at com.android.gallery3d.filtershow.pipeline.RenderingRequestTask.doInBackground(RenderingRequestTask.java:72)
01-01 08:12:11.919 3354 4519 D DEBUG : at com.android.gallery3d.filtershow.pipeline.ProcessingTask.processRequest(ProcessingTask.java:61)
01-01 08:12:11.919 3354 4519 D DEBUG : at com.android.gallery3d.filtershow.pipeline.ProcessingTaskController.handleMessage(ProcessingTaskController.java:79)
01-01 08:12:11.919 3354 4519 D DEBUG : at android.os.Handler.dispatchMessage(Handler.java:102)
01-01 08:12:11.919 3354 4519 D DEBUG : at android.os.Looper.loop(Looper.java:164)
01-01 08:12:11.919 3354 4519 D DEBUG : at android.os.HandlerThread.run(HandlerThread.java:65)
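Following these call stacks into the code shows a difference between the two apps. In Gallery, the bitmaps produced by Bitmap.copy() are released with recycle(). Camera, when switching to video recording, never releases the bitmap created by copy().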
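The difference between the two call sites comes down to who owns the copy. Below is a toy Java model, not Android code. FakeBitmap is a hypothetical stand-in for a Bitmap whose pixels live on the native heap, and a counter tracks the live native bytes. The fixed pattern releases the copy in a finally block once it is no longer needed.

This model has no garbage collector. On real devices, a dropped Bitmap's native pixels are normally freed once the Java object is collected. A real leak like this one therefore also implies that a reference to the copy is kept somewhere. Calling recycle() explicitly avoids depending on that.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a Bitmap whose pixels live on the native heap. */
final class FakeBitmap {
    static final AtomicLong liveNativeBytes = new AtomicLong();
    private final long bytes;
    private boolean recycled;

    FakeBitmap(long bytes) {
        this.bytes = bytes;
        liveNativeBytes.addAndGet(bytes); // "allocate" native pixels
    }

    /** Like Bitmap.copy(): allocates a second native pixel buffer. */
    FakeBitmap copy() {
        if (recycled) throw new IllegalStateException("Can't copy a recycled bitmap");
        return new FakeBitmap(bytes);
    }

    /** Like Bitmap.recycle(): frees the native pixels, at most once. */
    void recycle() {
        if (!recycled) {
            recycled = true;
            liveNativeBytes.addAndGet(-bytes);
        }
    }
}

public final class ScreenshotOwnership {
    /** Leaky pattern: the copy is never recycled, so its native bytes stay counted. */
    static void freezeScreenLeaky(FakeBitmap screen) {
        FakeBitmap shot = screen.copy();
        // ... shot is displayed, then forgotten without recycle()
    }

    /** Fixed pattern: the code that creates the copy also releases it. */
    static void freezeScreenFixed(FakeBitmap screen) {
        FakeBitmap shot = screen.copy();
        try {
            // ... display shot until the preview is ready
        } finally {
            shot.recycle();
        }
    }
}
```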
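This established the cause of the leak in system_server: the Camera module called Bitmap_copy through JNI without releasing the resulting resources. We worked with the module owners to fix it. We then verified the fix with a Camera-only monkey test plus the memory_leak.sh monitoring script, and found no further Bitmap_copy-related leak.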
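memory_leak.sh itself is not shown here. The sketch below is a hypothetical illustration of one check such a monitor could apply; it is not a reconstruction of that script. It fits a least-squares line through equally spaced memory samples, for example hourly RSS+swap of [anon:libc_malloc] from showmap. It flags a leak when the slope exceeds a threshold chosen by the tester. A fitted slope is less sensitive than a first-versus-last comparison to swings like those visible in the showmap data.

```java
public final class LeakTrend {
    /** Least-squares slope of samples taken at equal intervals (units per interval). */
    static double slope(long[] samples) {
        int n = samples.length;
        if (n < 2) throw new IllegalArgumentException("need at least two samples");
        double meanX = (n - 1) / 2.0;
        double meanY = 0;
        for (long s : samples) meanY += s;
        meanY /= n;
        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            num += (i - meanX) * (samples[i] - meanY);
            den += (i - meanX) * (i - meanX);
        }
        return num / den;
    }

    /** Flags a probable leak when average growth per interval exceeds the threshold. */
    static boolean looksLikeLeak(long[] samples, double thresholdPerInterval) {
        return slope(samples) > thresholdPerInterval;
    }
}
```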