Android 8.1 stock SystemUI causes a global reference table overflow in the framework

I. Problem description:

Ten devices ran an automated test, with one round taking five days. One of the devices stopped before it finished the test.

II. Analysis:

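1. From the log it was quickly clear that system_server had crashed, which restarted the Android layer. The direct cause was an overflow of the global reference table. The runtime dumped the following: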

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256] JNI ERROR (app bug): global reference table overflow (max=51200)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256] global reference table dump:

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]  Last 10 entries (of 51200):

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51199: 0x12e3ba60 com.android.server.content.ContentService$ObserverNode$ObserverEntry

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51198: 0x12d93760 com.android.server.am.ServiceRecord

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51197: 0x12d8fa20 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51196: 0x12e391b8 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51195: 0x12c4db58 com.android.server.am.ServiceRecord

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51194: 0x12dc3e78 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51193: 0x12e3b560 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51192: 0x12e38718 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51191: 0x12dc3fc0 java.lang.ref.WeakReference (referent is a android.os.BinderProxy)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    51190: 0x12fc8b60 com.android.server.am.ServiceRecord

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]  Summary:

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    24615 of java.lang.ref.WeakReference (24615 unique instances)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]    23622 of com.android.server.content.ContentService$ObserverNode$ObserverEntry (23622 unique instances)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]      758 of android.os.Binder (758 unique instances)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]      485 of com.android.server.notification.NotificationManagerService$StatusBarNotificationHolder (485 unique instances)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]      468 of com.android.server.am.ServiceRecord (468 unique instances)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]      319 of java.lang.Class (239 unique instances)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]      181 of java.nio.DirectByteBuffer (170 unique instances)

08-25 18:41:14.285  955  4704 F zygote64: indirect_reference_table.cc:256]      119 of android.os.RemoteCallbackList$Callback (119 unique instances)

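The key points in the dump above are these: the runtime's global reference table is limited to 51200 entries; the last entry created was an ObserverEntry object; and the two types holding the most references are WeakReference and ObserverEntry. The WeakReference entries are suspected to be weak references to binder objects. An ObserverEntry is the framework-side instance of a content observer and is used to call back into the app, which would explain why the ObserverEntry and WeakReference counts are so close. The only lead to follow here is ObserverEntry. Given how ContentService is implemented, the suspicion is that some app repeatedly registers a content observer and never releases it. Confirming this requires a live device, but the state of the device that reproduced the problem was already lost, so the debugging had to be done on a device that had finished the test without hitting the problem and had not been rebooted since.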
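To make the suspected pattern concrete, here is a minimal, self-contained model of a bounded table that fills up when observers are registered every cycle and never removed. It is only a sketch: `RefTableModel` and `cyclesSurvived` are hypothetical names, the class is not ART's actual reference table, and it counts only the observers themselves, so its numbers are not predictions for a real device.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a bounded reference table. Hypothetical; not ART's implementation. */
public class RefTableModel {
    private final int max;
    private final List<Object> entries = new ArrayList<>();

    public RefTableModel(int max) {
        this.max = max;
    }

    /** Adds one reference and fails once the table is full. */
    public void add(Object ref) {
        if (entries.size() >= max) {
            throw new IllegalStateException("reference table overflow (max=" + max + ")");
        }
        entries.add(ref);
    }

    /** Drops a reference, as an unregister call would. */
    public void remove(Object ref) {
        entries.remove(ref);
    }

    /**
     * Simulates register/unregister cycles and returns how many cycles
     * completed before the table overflowed (or cycles if it never did).
     */
    public static int cyclesSurvived(int max, int cycles, int perCycle, boolean unregister) {
        RefTableModel table = new RefTableModel(max);
        for (int i = 0; i < cycles; i++) {
            List<Object> registered = new ArrayList<>();
            try {
                for (int j = 0; j < perCycle; j++) {
                    Object observer = new Object();
                    table.add(observer);
                    registered.add(observer);
                }
            } catch (IllegalStateException overflow) {
                return i; // this cycle could not finish registering
            }
            if (unregister) {
                for (Object observer : registered) {
                    table.remove(observer);
                }
            }
        }
        return cycles;
    }

    public static void main(String[] args) {
        System.out.println("leaking:  " + cyclesSurvived(51200, 20000, 3, false));
        System.out.println("balanced: " + cyclesSurvived(51200, 20000, 3, true));
    }
}
```

With balanced register/unregister calls the table never grows, however many cycles run; without them, a hard limit is reached after a number of cycles that depends only on the limit and the leak per cycle.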

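2. We got hold of a device that had not been rebooted after the test. Since there are so many ObserverEntry objects, we ran dumpsys content to look at ContentService's state. To understand the internals, read the registerContentObserver, unregisterContentObserver and dump methods of ContentService; they are not covered in detail here, so let's go straight to the result. There is a large number of observers, and all of them were registered by the process with pid=1266. The ps command (or the log) shows that pid 1266 is SystemUI.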

……

settings/global/always_on_display_constants:pid=1266 uid=10044 user=-1 target=17dc217

settings/global/always_on_display_constants:pid=1266 uid=10044 user=-1 target=878d604

……

pid 1266: 10532 observers

……
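With thousands of entries, it is easier to tally the dump than to read it. The following is a rough sketch that groups observer lines by pid and URI. The class name `ObserverCount` and the regular expression are assumptions based only on the two sample lines shown above; the exact dumpsys content format may differ between Android versions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ObserverCount {
    // Matches lines shaped like
    // "settings/global/always_on_display_constants:pid=1266 uid=10044 user=-1 target=17dc217".
    // Format assumed from the sample output; adjust if your dump differs.
    private static final Pattern LINE =
            Pattern.compile("^\\s*(\\S+?):\\s*pid=(\\d+)\\s+uid=\\d+.*$");

    /** Returns a count per "pid=<pid> <uri>" key, in first-seen order. */
    public static Map<String, Integer> countByPidAndUri(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                String key = "pid=" + m.group(2) + " " + m.group(1);
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        countByPidAndUri(lines).forEach((key, n) -> System.out.println(n + "  " + key));
    }
}
```

One possible way to use it is to save the dumpsys content output to a file and feed that file to the program on standard input; a single pid/URI pair with a count in the thousands stands out immediately.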

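3. At this point we are fairly sure that SystemUI repeatedly registers observers for always_on_display_constants without releasing them. Searching the SystemUI code for that setting name leads to the registration logic: starting DozeService registers three always_on_display_constants observers in total, and there is no place where they are unregistered. This should be the cause of the ObserverEntry leak.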

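4. So when does DozeService start, and when does it exit? Going back to the log, there are many records of DozeService starting and stopping, including 7609 start records. Each start/stop cycle leaks 3 observers, so the visible log alone accounts for 7609 × 3 = 22827 leaked observers. That is very close to the 23622 ObserverEntry entries in the reference table dump above.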

08-21 05:41:56.013 955 4704 I PowerManagerService: Going to sleep due to power button (uid 1000)...

......

08-21 05:41:56.713  955  974 I DreamController: Starting dream: name=ComponentInfo{com.android.systemui/com.android.systemui.doze.DozeService}, isTest=false, canDoze=true, userId=0

......

08-21 05:42:01.530  955  1520 I PowerManagerService: Waking up from dozing (uid=1000 reason=android.policy:POWER)...

......

08-21 05:42:01.829  955  974 I DreamController: Stopping dream: name=ComponentInfo{com.android.systemui/com.android.systemui.doze.DozeService}, isTest=false, canDoze=true, userId=0
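The estimate in step 4 can be reproduced from a saved logcat file. The sketch below counts DreamController start records for DozeService and multiplies by the three observers found in the source analysis. `DozeLeakEstimate` and its method names are hypothetical, and the matching relies only on the log text shown above.

```java
import java.util.List;

public class DozeLeakEstimate {
    // Three always_on_display_constants observers are registered per DozeService
    // start, according to the source analysis in step 3.
    static final int OBSERVERS_PER_START = 3;

    /** True for a DreamController record that starts DozeService. */
    static boolean isDozeStart(String line) {
        return line.contains("DreamController: Starting dream")
                && line.contains("com.android.systemui.doze.DozeService");
    }

    /** Counts DozeService start records in the given log lines. */
    static long countStarts(List<String> lines) {
        return lines.stream().filter(DozeLeakEstimate::isDozeStart).count();
    }

    /** Observers leaked if none of them is ever unregistered. */
    static long estimatedLeakedObservers(long starts) {
        return starts * OBSERVERS_PER_START;
    }

    public static void main(String[] args) {
        // 7609 is the number of start records reported in step 4.
        System.out.println(estimatedLeakedObservers(7609));
    }
}
```

The gap between this estimate and the 23622 entries in the dump is plausibly explained by start records that are no longer in the retained log, though the source does not confirm this.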

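Automated tests usually take a long time, which makes the problem hard to verify, so it is best to find explicit reproduction steps. In this case they are fairly obvious: DozeService starts when the screen turns off and exits when the screen turns on. That is framework logic and is not analyzed in detail here. We checked the earlier guess by toggling the screen and running dumpsys content: after one screen-off/screen-on cycle the observer count goes up by 3, and all 3 new entries are for always_on_display_constants.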

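5. We asked the test team whether this automated test case toggles the screen off and on, and they confirmed that it does; the case was requested by the customer. That gave us a way to reproduce the problem, which made verifying the fix much easier.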

III. Fix

diff --git a/packages/SystemUI/src/com/android/systemui/doze/AlwaysOnDisplayPolicy.java b/packages/SystemUI/src/com/android/systemui/doze/AlwaysOnDisplayPolicy.java

index debda21..9b69031 100644

--- a/packages/SystemUI/src/com/android/systemui/doze/AlwaysOnDisplayPolicy.java

+++ b/packages/SystemUI/src/com/android/systemui/doze/AlwaysOnDisplayPolicy.java

@@ -102,6 +102,14 @@

        mSettingsObserver.observe();

    }

+    public void clear() {

+        if (mSettingsObserver != null) {

+            //mSettingsObserver.clearObserve();

+            ContentResolver resolver = mContext.getContentResolver();

+            resolver.unregisterContentObserver(mSettingsObserver);

+        }

+    }

+

    private int[] parseIntArray(final String key, final int[] defaultArray) {

        final String value = mParser.getString(key, null);

        if (value != null) {

@@ -130,7 +138,12 @@

                    false, this, UserHandle.USER_ALL);

            update(null);

        }

-

+/*

+        void clearObserve() {

+            ContentResolver resolver = mContext.getContentResolver();

+            resolver.unregisterContentObserver(this);

+        }

+*/

        @Override

        public void onChange(boolean selfChange, Uri uri) {

            update(uri);

diff --git a/packages/SystemUI/src/com/android/systemui/doze/DozeMachine.java b/packages/SystemUI/src/com/android/systemui/doze/DozeMachine.java

index 8ec6afc..7d3dc3f 100644

--- a/packages/SystemUI/src/com/android/systemui/doze/DozeMachine.java

+++ b/packages/SystemUI/src/com/android/systemui/doze/DozeMachine.java

@@ -129,6 +129,13 @@

        mParts = parts;

    }

+    /** clear some reference stored in framework-system_server */

+    public void clear() {

+        for (Part p : mParts) {

+            p.clear();

+        }

+    }

+

    /**

      * Requests transitioning to {@code requestedState}.

      *

@@ -348,6 +355,8 @@

          */

        void transitionTo(State oldState, State newState);

+        default void clear() {}

+

        /** Dump current state. For debugging only. */

        default void dump(PrintWriter pw) {}

    }

diff --git a/packages/SystemUI/src/com/android/systemui/doze/DozePauser.java b/packages/SystemUI/src/com/android/systemui/doze/DozePauser.java

index 58f1448..f7f49225 100644

--- a/packages/SystemUI/src/com/android/systemui/doze/DozePauser.java

+++ b/packages/SystemUI/src/com/android/systemui/doze/DozePauser.java

@@ -50,6 +50,13 @@

        }

    }

+    @Override

+    public void clear() {

+        if (mPolicy != null) {

+            mPolicy.clear();

+        }

+    }

+

    private void onTimeout() {

        mMachine.requestState(DozeMachine.State.DOZE_AOD_PAUSED);

    }

diff --git a/packages/SystemUI/src/com/android/systemui/doze/DozeScreenBrightness.java b/packages/SystemUI/src/com/android/systemui/doze/DozeScreenBrightness.java

index 4bb4e79..f31d3c6 100644

--- a/packages/SystemUI/src/com/android/systemui/doze/DozeScreenBrightness.java

+++ b/packages/SystemUI/src/com/android/systemui/doze/DozeScreenBrightness.java

@@ -38,6 +38,7 @@

    private final Sensor mLightSensor;

    private final int[] mSensorToBrightness;

    private final int[] mSensorToScrimOpacity;

+    private final AlwaysOnDisplayPolicy mPolicy;

    private boolean mRegistered;

    private int mDefaultDozeBrightness;

@@ -58,16 +59,31 @@

        mDefaultDozeBrightness = defaultDozeBrightness;

        mSensorToBrightness = sensorToBrightness;

        mSensorToScrimOpacity = sensorToScrimOpacity;

+

+        mPolicy = null;

    }

    @VisibleForTesting

    public DozeScreenBrightness(Context context, DozeMachine.Service service,

            SensorManager sensorManager, Sensor lightSensor, DozeHost host,

            Handler handler, AlwaysOnDisplayPolicy policy) {

+/*

        this(context, service, sensorManager, lightSensor, host, handler,

                context.getResources().getInteger(

                        com.android.internal.R.integer.config_screenBrightnessDoze),

                policy.screenBrightnessArray, policy.dimmingScrimArray);

+*/

+        mContext = context;

+        mDozeService = service;

+        mSensorManager = sensorManager;

+        mLightSensor = lightSensor;

+        mDozeHost = host;

+        mHandler = handler;

+

+        mDefaultDozeBrightness = context.getResources().getInteger(com.android.internal.R.integer.config_screenBrightnessDoze);

+        mSensorToBrightness = policy.screenBrightnessArray;

+        mSensorToScrimOpacity = policy.dimmingScrimArray;

+        mPolicy = policy;

    }

    @Override

@@ -94,6 +110,13 @@

    }

    @Override

+    public void clear() {

+        if (mPolicy != null) {

+            mPolicy.clear();

+        }

+    }

+

+    @Override

    public void onSensorChanged(SensorEvent event) {

        Trace.beginSection("DozeScreenBrightness.onSensorChanged" + event.values[0]);

        try {

diff --git a/packages/SystemUI/src/com/android/systemui/doze/DozeSensors.java b/packages/SystemUI/src/com/android/systemui/doze/DozeSensors.java

index 91cde37..000e47a 100644

--- a/packages/SystemUI/src/com/android/systemui/doze/DozeSensors.java

+++ b/packages/SystemUI/src/com/android/systemui/doze/DozeSensors.java

@@ -133,6 +133,12 @@

        return null;

    }

+    public void clear() {

+        if (mProxSensor != null) {

+            mProxSensor.clear();

+        }

+    }

+

    public void setListening(boolean listen) {

        for (TriggerSensor s : mSensors) {

            s.setListening(listen);

@@ -234,6 +240,12 @@

            updateRegistered();

        }

+        public void clear() {

+            if (mPolicy != null) {

+                mPolicy.clear();

+            }

+        }

+

        private void updateRegistered() {

            setRegistered(mRequested && !mCooldownTimer.isScheduled());

        }

diff --git a/packages/SystemUI/src/com/android/systemui/doze/DozeService.java b/packages/SystemUI/src/com/android/systemui/doze/DozeService.java

index 98b1106..b147f97 100644

--- a/packages/SystemUI/src/com/android/systemui/doze/DozeService.java

+++ b/packages/SystemUI/src/com/android/systemui/doze/DozeService.java

@@ -57,6 +57,16 @@

        mDozeMachine = new DozeFactory().assembleMachine(this);

    }

+    /** {@inheritDoc} */

+    @Override

+    public void onDestroy() {

+        Log.d(TAG, "onDestroy() being called when DreamController stop this service");

+        if (mDozeMachine != null) {

+            mDozeMachine.clear();

+        }

+        super.onDestroy();

+    }

+

    @Override

    public void onPluginConnected(DozeServicePlugin plugin, Context pluginContext) {

        mDozePlugin = plugin;

diff --git a/packages/SystemUI/src/com/android/systemui/doze/DozeTriggers.java b/packages/SystemUI/src/com/android/systemui/doze/DozeTriggers.java

index f7a258a..ea6ae4d 100644

--- a/packages/SystemUI/src/com/android/systemui/doze/DozeTriggers.java

+++ b/packages/SystemUI/src/com/android/systemui/doze/DozeTriggers.java

@@ -212,6 +212,13 @@

        }

    }

+    @Override

+    public void clear() {

+        if (mDozeSensors != null) {

+            mDozeSensors.clear();

+        }

+    }

+

    private void checkTriggersAtInit() {

        if (mUiModeManager.getCurrentModeType() == Configuration.UI_MODE_TYPE_CAR

                || mDozeHost.isPowerSaveActive()

IV. Verification

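Since the problem is well understood, we verified the fix directly by comparing 20000 sleep/wake cycles. We prepared two devices, one flashed with the old build and one flashed with the fixed build. The old build crashed after about 8000 cycles, while the fixed build was still running normally after 20000.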
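The crash point of roughly 8000 cycles can be sanity-checked against the dump in section II. The sketch below is a back-of-the-envelope estimate under loud assumptions that the source does not confirm: every leaked ObserverEntry is paired with one leaked WeakReference (6 global references per cycle), and everything else in the table is a fixed baseline of 51200 − 23622 − 24615 = 2963 entries. `OverflowEstimate` is a hypothetical name.

```java
public class OverflowEstimate {
    /**
     * Number of full cycles that fit before the table is exhausted,
     * given a fixed baseline and a constant leak per cycle.
     */
    static long cyclesUntilOverflow(int tableMax, int baselineRefs, int refsPerCycle) {
        return (tableMax - baselineRefs) / refsPerCycle;
    }

    public static void main(String[] args) {
        int tableMax = 51200;                  // limit reported in the crash dump
        int baseline = 51200 - 23622 - 24615;  // assumed: entries unrelated to the leak
        int perCycle = 3 + 3;                  // assumed: 3 ObserverEntry + 3 WeakReference
        System.out.println(cyclesUntilOverflow(tableMax, baseline, perCycle));
    }
}
```

Under these assumptions the estimate comes out slightly above 8000 cycles, which is in line with the old build crashing at about 8000.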
