LeakCanary Memory Leak Detection: A Source Code Analysis

Preface

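LeakCanary is an open-source library from Square that detects memory leaks in Android apps and presents them in a very intuitive way. This article covers LeakCanary v1.6.3. The latest version at the time of writing is v2.0-alpha-2, which has been rewritten entirely in Kotlin (the networking library okhttp has been rewritten in Kotlin as well), a hint at how much momentum Kotlin has.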

Getting started

The official guide on GitHub shows that the entry point for LeakCanary's leak detection is:

LeakCanary.install(this);

Let's see what this line does:

  /**
   * Creates a {@link RefWatcher} that works out of the box, and starts watching activity
   * references (on ICS+).
   */
  public static @NonNull RefWatcher install(@NonNull Application application) {
    return refWatcher(application).listenerServiceClass(DisplayLeakService.class)
        .excludedRefs(AndroidExcludedRefs.createAppDefaults().build())
        .buildAndInstall();
  }

The code above uses the Builder pattern to construct a RefWatcher object. Let's see what buildAndInstall() does:

  public @NonNull RefWatcher buildAndInstall() {
    if (LeakCanaryInternals.installedRefWatcher != null) {
      throw new UnsupportedOperationException("buildAndInstall() should only be called once.");
    }
    RefWatcher refWatcher = build();
    if (refWatcher != DISABLED) {
      if (enableDisplayLeakActivity) {
        LeakCanaryInternals.setEnabledAsync(context, DisplayLeakActivity.class, true);
      }
      if (watchActivities) {
        ActivityRefWatcher.install(context, refWatcher);
      }
      if (watchFragments) {
        FragmentRefWatcher.Helper.install(context, refWatcher);
      }
    }
    LeakCanaryInternals.installedRefWatcher = refWatcher;
    return refWatcher;
  }

Here refWatcher is not DISABLED, and enableDisplayLeakActivity is assigned with:

    enableDisplayLeakActivity = DisplayLeakService.class.isAssignableFrom(listenerServiceClass);

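A.isAssignableFrom(B) returns true when A is the same class as B, or a superclass or superinterface of B. Since listenerServiceClass is DisplayLeakService.class itself, enableDisplayLeakActivity is true. watchActivities and watchFragments also default to true. Next, let's look at what ActivityRefWatcher.install() does: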

 public static void install(@NonNull Context context, @NonNull RefWatcher refWatcher) {
    Application application = (Application) context.getApplicationContext();
    ActivityRefWatcher activityRefWatcher = new ActivityRefWatcher(application, refWatcher);

    application.registerActivityLifecycleCallbacks(activityRefWatcher.lifecycleCallbacks);
  }

  private final Application.ActivityLifecycleCallbacks lifecycleCallbacks =
      new ActivityLifecycleCallbacksAdapter() {
        @Override public void onActivityDestroyed(Activity activity) {
          refWatcher.watch(activity);
        }
      };

As shown, this registers an activity lifecycle callback. In onActivityDestroyed(), i.e. when the activity is destroyed, it calls refWatcher.watch(). Here is watch() in RefWatcher:

  public void watch(Object watchedReference) {
    watch(watchedReference, "");
  }

  public void watch(Object watchedReference, String referenceName) {
    if (this == DISABLED) {
      return;
    }
    checkNotNull(watchedReference, "watchedReference");
    checkNotNull(referenceName, "referenceName");
    final long watchStartNanoTime = System.nanoTime();
    String key = UUID.randomUUID().toString();
    retainedKeys.add(key);
    final KeyedWeakReference reference =
        new KeyedWeakReference(watchedReference, key, referenceName, queue);

    ensureGoneAsync(watchStartNanoTime, reference);
  }

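First a random key is generated and added to the retainedKeys Set, then a weak reference bound to the reference queue named queue is created. A note on WeakReference and ReferenceQueue: under Java's memory management, once the GC determines that the object a weak reference points to is only weakly reachable, it clears the reference and adds the weak reference object itself (not the referent) to the reference queue. Conversely, if the weak reference never shows up in the queue, its referent has not been released, which means a memory leak may have occurred. WeakReference + ReferenceQueue is the foundation LeakCanary is built on.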
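The relationship between a weak reference and its queue can be sketched in plain Java. This is only an illustration, not LeakCanary code: the class name WeakRefQueueSketch is made up, and because System.gc() gives no guarantees, the sketch simulates what the collector does by calling clear() and enqueue() by hand.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical demo class, not part of LeakCanary.
final class WeakRefQueueSketch {

    /** Returns true if the queue hands back the weak reference itself once it is enqueued. */
    static boolean referenceIsWhatGetsEnqueued() {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object watched = new Object();
        WeakReference<Object> ref = new WeakReference<>(watched, queue);

        // While `watched` is strongly reachable, the queue stays empty.
        boolean emptyWhileReachable = queue.poll() == null;

        // Assumption: we imitate the GC here (clear the referent, then enqueue the reference)
        // instead of relying on a real collection, which cannot be forced deterministically.
        ref.clear();
        ref.enqueue();

        Reference<?> polled = queue.poll();
        // The queue yields the WeakReference object, and its referent is no longer accessible.
        return emptyWhileReachable && polled == ref && ref.get() == null && watched != null;
    }

    public static void main(String[] args) {
        System.out.println("queue returned the reference itself: " + referenceIsWhatGetsEnqueued());
    }
}
```

The point to take away is that polling the queue yields the reference wrapper, which is why LeakCanary tags each wrapper with a key.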

At the end of watch(), ensureGoneAsync() is called:

 private void ensureGoneAsync(final long watchStartNanoTime, final KeyedWeakReference reference) {
    watchExecutor.execute(new Retryable() {
      @Override public Retryable.Result run() {
        return ensureGone(reference, watchStartNanoTime);
      }
    });
  }

  @SuppressWarnings("ReferenceEquality") // Explicitly checking for named null.
  Retryable.Result ensureGone(final KeyedWeakReference reference, final long watchStartNanoTime) {
    long gcStartNanoTime = System.nanoTime();
    long watchDurationMs = NANOSECONDS.toMillis(gcStartNanoTime - watchStartNanoTime);

    removeWeaklyReachableReferences();

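    // "Debugger" here means a debugger is attached (true while stepping through breakpoints
    // in the IDE). Debugging can cause false leaks, so this case is excluded.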
    if (debuggerControl.isDebuggerAttached()) {
      // The debugger can create false leaks.
      return RETRY;
    }
    if (gone(reference)) {
      return DONE;
    }
    // Trigger a GC
    gcTrigger.runGc();
    removeWeaklyReachableReferences();
    if (!gone(reference)) {
      long startDumpHeap = System.nanoTime();
      long gcDurationMs = NANOSECONDS.toMillis(startDumpHeap - gcStartNanoTime);

      // Dump the heap to an hprof file
      File heapDumpFile = heapDumper.dumpHeap();
      if (heapDumpFile == RETRY_LATER) {
        // Could not dump the heap.
        return RETRY;
      }
      long heapDumpDurationMs = NANOSECONDS.toMillis(System.nanoTime() - startDumpHeap);

      HeapDump heapDump = heapDumpBuilder.heapDumpFile(heapDumpFile).referenceKey(reference.key)
          .referenceName(reference.name)
          .watchDurationMs(watchDurationMs)
          .gcDurationMs(gcDurationMs)
          .heapDumpDurationMs(heapDumpDurationMs)
          .build();

      heapdumpListener.analyze(heapDump);
    }
    return DONE;
  }

ensureGoneAsync() calls watchExecutor.execute(). The actual implementation of watchExecutor is AndroidWatchExecutor. Here is its execute() method:

@Override public void execute(@NonNull Retryable retryable) {
    if (Looper.getMainLooper().getThread() == Thread.currentThread()) {
      waitForIdle(retryable, 0);
    } else {
      postWaitForIdle(retryable, 0);
    }
  }

  private void postWaitForIdle(final Retryable retryable, final int failedAttempts) {
    mainHandler.post(new Runnable() {
      @Override public void run() {
        waitForIdle(retryable, failedAttempts);
      }
    });
  }

  private void waitForIdle(final Retryable retryable, final int failedAttempts) {
    // This needs to be called from the main thread.
    Looper.myQueue().addIdleHandler(new MessageQueue.IdleHandler() {
      @Override public boolean queueIdle() {
        postToBackgroundWithDelay(retryable, failedAttempts);
        return false;
      }
    });
  }

  private void postToBackgroundWithDelay(final Retryable retryable, final int failedAttempts) {
    long exponentialBackoffFactor = (long) Math.min(Math.pow(2, failedAttempts), maxBackoffFactor);
    long delayMillis = initialDelayMillis * exponentialBackoffFactor;
    backgroundHandler.postDelayed(new Runnable() {
      @Override public void run() {
        Retryable.Result result = retryable.run();
        if (result == RETRY) {
          postWaitForIdle(retryable, failedAttempts + 1);
        }
      }
    }, delayMillis);
  }

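execute() first checks the current thread: on the main thread it calls waitForIdle(retryable, 0) directly, otherwise it calls postWaitForIdle(retryable, 0). In postWaitForIdle(), mainHandler is a Handler for the main thread, so that method switches execution back to the main thread. As a result, waitForIdle() always runs on the main thread.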

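On the main thread, waitForIdle() registers a callback through addIdleHandler(). The IdleHandler runs when the main thread becomes idle, and at that point it calls postToBackgroundWithDelay(). This method implements LeakCanary's retry mechanism. The failedAttempts parameter is the number of retries so far, starting at 0 and incrementing, so the exponential backoff factor exponentialBackoffFactor grows exponentially from 2^0 (capped at maxBackoffFactor), and the delay delayMillis likewise grows exponentially from 5 * 2^0 seconds with the default 5-second initial delay. postToBackgroundWithDelay() then switches to a background thread, where Retryable.Result result = retryable.run(); executes the run() of the Retryable created in ensureGoneAsync(), which returns the result of ensureGone(). ensureGone() returns RETRY when a debugger is attached or when the heap could not be dumped; when the object is gone or the heap dump has been handed off for analysis, it returns DONE. On RETRY, postWaitForIdle(retryable, failedAttempts + 1) is called: the retry count goes up by 1, and the delay before the next postToBackgroundWithDelay() run grows exponentially as well.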
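The delay arithmetic can be tried out in isolation. The sketch below copies the two lines from postToBackgroundWithDelay() into a standalone method; the class name BackoffSketch, the 5000 ms initial delay, and the way the cap is chosen in main() are assumptions made for illustration.

```java
// Hypothetical demo class, not part of LeakCanary.
final class BackoffSketch {

    /** Same formula as postToBackgroundWithDelay(): initial delay times a capped power of two. */
    static long delayMillis(long initialDelayMillis, long maxBackoffFactor, int failedAttempts) {
        long exponentialBackoffFactor =
            (long) Math.min(Math.pow(2, failedAttempts), maxBackoffFactor);
        return initialDelayMillis * exponentialBackoffFactor;
    }

    public static void main(String[] args) {
        long initialDelayMillis = 5_000; // assumed default: 5 seconds
        // Assumed cap: large enough that the multiplication above cannot overflow.
        long maxBackoffFactor = Long.MAX_VALUE / initialDelayMillis;
        for (int failedAttempts = 0; failedAttempts <= 4; failedAttempts++) {
            System.out.println("attempt " + failedAttempts + " -> "
                + delayMillis(initialDelayMillis, maxBackoffFactor, failedAttempts) + " ms");
        }
    }
}
```

With these inputs the first three delays are 5000 ms, 10000 ms, and 20000 ms. Once 2^failedAttempts exceeds the cap, the delay stops growing.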

ensureGone()

This method is important enough to get its own section. Let's analyze the ensureGone() shown earlier. Here is its code again:

Retryable.Result ensureGone(final KeyedWeakReference reference, final long watchStartNanoTime) {
    long gcStartNanoTime = System.nanoTime();
    long watchDurationMs = NANOSECONDS.toMillis(gcStartNanoTime - watchStartNanoTime);

    removeWeaklyReachableReferences();

    if (debuggerControl.isDebuggerAttached()) {
      // The debugger can create false leaks.
      return RETRY;
    }
    if (gone(reference)) {
      return DONE;
    }
    gcTrigger.runGc();
    removeWeaklyReachableReferences();
    if (!gone(reference)) {
      long startDumpHeap = System.nanoTime();
      long gcDurationMs = NANOSECONDS.toMillis(startDumpHeap - gcStartNanoTime);

      File heapDumpFile = heapDumper.dumpHeap();
      if (heapDumpFile == RETRY_LATER) {
        // Could not dump the heap.
        return RETRY;
      }
      long heapDumpDurationMs = NANOSECONDS.toMillis(System.nanoTime() - startDumpHeap);

      HeapDump heapDump = heapDumpBuilder.heapDumpFile(heapDumpFile).referenceKey(reference.key)
          .referenceName(reference.name)
          .watchDurationMs(watchDurationMs)
          .gcDurationMs(gcDurationMs)
          .heapDumpDurationMs(heapDumpDurationMs)
          .build();

      heapdumpListener.analyze(heapDump);
    }
    return DONE;
  }

removeWeaklyReachableReferences(), as the name suggests, removes weakly reachable references:

private void removeWeaklyReachableReferences() {
    // WeakReferences are enqueued as soon as the object to which they point to becomes weakly
    // reachable. This is before finalization or garbage collection has actually happened.
    KeyedWeakReference ref;
    while ((ref = (KeyedWeakReference) queue.poll()) != null) {
      retainedKeys.remove(ref.key);
    }
}

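queue is the ReferenceQueue, and retainedKeys is the Set of keys associated with the weak references; watch() adds a key to this set. When the GC finds that a watched object is only weakly reachable, it clears the weak reference and adds that weak reference to queue. In the code above, a non-null queue.poll() means a weak reference has been enqueued, so the object that weak reference pointed to is being reclaimed normally by the GC. That object no longer needs to be watched, so its key is removed from the set.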

Moving on in ensureGone(), checking whether the object has been collected is now easy:

private boolean gone(KeyedWeakReference reference) {
    return !retainedKeys.contains(reference.key);
}

If the key is no longer in the set, the object has been collected.
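Put together, watch(), removeWeaklyReachableReferences(), and gone() amount to a small piece of bookkeeping that can be reproduced outside Android. The following is a rough sketch under stated assumptions: RetainedKeysSketch and KeyedRef are made-up stand-ins for RefWatcher and KeyedWeakReference, and a collection is simulated with clear() plus enqueue() rather than triggered for real.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.CopyOnWriteArraySet;

// Hypothetical stand-in for the key bookkeeping in RefWatcher.
final class RetainedKeysSketch {

    /** Hypothetical stand-in for KeyedWeakReference: a weak reference tagged with a key. */
    static final class KeyedRef extends WeakReference<Object> {
        final String key;

        KeyedRef(Object referent, String key, ReferenceQueue<Object> queue) {
            super(referent, queue);
            this.key = key;
        }
    }

    // Thread-safe set, since keys may be added and removed from different threads.
    private final Set<String> retainedKeys = new CopyOnWriteArraySet<>();
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    /** Registers a key for the object and wraps the object in a keyed weak reference. */
    KeyedRef watch(Object watched) {
        String key = UUID.randomUUID().toString();
        retainedKeys.add(key);
        return new KeyedRef(watched, key, queue);
    }

    /** Drains the queue and forgets the key of every reference found there. */
    void removeWeaklyReachableReferences() {
        KeyedRef ref;
        while ((ref = (KeyedRef) queue.poll()) != null) {
            retainedKeys.remove(ref.key);
        }
    }

    /** An object counts as gone once its key has been removed from the set. */
    boolean gone(KeyedRef ref) {
        return !retainedKeys.contains(ref.key);
    }

    public static void main(String[] args) {
        RetainedKeysSketch sketch = new RetainedKeysSketch();
        Object leaked = new Object();
        KeyedRef leakedRef = sketch.watch(leaked);
        KeyedRef collectedRef = sketch.watch(new Object());

        // Assumption: imitate the GC for the second object instead of forcing a real collection.
        collectedRef.clear();
        collectedRef.enqueue();

        sketch.removeWeaklyReachableReferences();
        System.out.println("collected object gone: " + sketch.gone(collectedRef));
        System.out.println("leaked object gone: " + sketch.gone(leakedRef));
        // Keep `leaked` strongly reachable up to this point.
        System.out.println("leaked still referenced: " + (leaked != null));
    }
}
```

Only the reference that reached the queue has its key removed; the key of the object that is still strongly held stays in the set, which is the signal LeakCanary treats as a suspected leak.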

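If the object has still not been collected after two removeWeaklyReachableReferences() passes (with a GC triggered in between), a memory leak may have occurred, and execution continues. heapDumper.dumpHeap() generates the hprof file; heapDumper is an AndroidHeapDumper instance: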

  @Override @Nullable
  public File dumpHeap() {
    // Create an empty file that will hold the hprof data
    File heapDumpFile = leakDirectoryProvider.newHeapDumpFile();

    // If the file could not be created, take the retry path
    if (heapDumpFile == RETRY_LATER) {
      return RETRY_LATER;
    }

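    // FutureResult wraps a CountDownLatch that is used to wait for the toast
    FutureResult<Toast> waitingForToast = new FutureResult<>();
    // Switch to the main thread to show the toast (the one with the canary icon)
    showToast(waitingForToast);

    // Wait up to 5 seconds to make sure the toast has been shown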
    if (!waitingForToast.wait(5, SECONDS)) {
      CanaryLog.d("Did not dump heap, too much time waiting for Toast.");
      return RETRY_LATER;
    }

    // Build a notification and show it in the notification bar
    Notification.Builder builder = new Notification.Builder(context)
        .setContentTitle(context.getString(R.string.leak_canary_notification_dumping));
    Notification notification = LeakCanaryInternals.buildNotification(context, builder);
    NotificationManager notificationManager =
        (NotificationManager) context.getSystemService(Context.NOTIFICATION_SERVICE);
    int notificationId = (int) SystemClock.uptimeMillis();
    notificationManager.notify(notificationId, notification);

    Toast toast = waitingForToast.get();
    try {
      // Use the platform Debug class to dump the hprof file to the given path
      Debug.dumpHprofData(heapDumpFile.getAbsolutePath());
      // Once the dump is done, cancel the toast
      cancelToast(toast);
      notificationManager.cancel(notificationId);
      return heapDumpFile;
    } catch (Exception e) {
      CanaryLog.d(e, "Could not dump heap");
      // Abort heap dump
      return RETRY_LATER;
    }
}

Let's see how showToast() makes sure the toast has been shown:

  private void showToast(final FutureResult<Toast> waitingForToast) {
    mainHandler.post(new Runnable() {
      @Override public void run() {
        if (resumedActivity == null) {
          waitingForToast.set(null);
          return;
        }
        final Toast toast = new Toast(resumedActivity);
        toast.setGravity(Gravity.CENTER_VERTICAL, 0, 0);
        toast.setDuration(Toast.LENGTH_LONG);
        LayoutInflater inflater = LayoutInflater.from(resumedActivity);
        toast.setView(inflater.inflate(R.layout.leak_canary_heap_dump_toast, null));
        toast.show();
        // Waiting for Idle to make sure Toast gets rendered.
        Looper.myQueue().addIdleHandler(new MessageQueue.IdleHandler() {
          @Override public boolean queueIdle() {
            waitingForToast.set(toast);
            return false;
          }
        });
      }
    });
  }

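Since ensureGone() runs on a background thread, so does dumpHeap(). showToast() first switches to the main thread. There it shows the toast and then, once the main thread is idle, calls waitingForToast.set(toast), which decrements the CountDownLatch counter by 1. The counter starts at 1, so after set() it is 0.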

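CountDownLatch's await(long timeout, TimeUnit unit) blocks the current thread until the counter reaches 0 or the timeout elapses. It returns true if the counter reached 0 in time and false if the timeout elapsed first, and waitingForToast.wait(5, SECONDS) returns that same value. So the check if (!waitingForToast.wait(5, SECONDS)) waits at most 5 s. On timeout the retry path is taken. Otherwise, if the latch was counted down to 0 within those 5 s, the flow continues normally. In that case we can also infer that the toast has been shown (when one was created at all; if resumedActivity is null, set(null) is called right away), because the idle handler that calls set() only runs after the main thread has processed the work queued before it, including toast.show().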
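The latch-based handoff can be sketched without Android. The class below is a simplified stand-in, not LeakCanary's FutureResult: the name FutureResultSketch is made up, the waiting method is called await() here to avoid confusion with Object.wait(), and interruption is handled by returning false, which is an assumption of this sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified stand-in for a latch-backed result holder.
final class FutureResultSketch<T> {

    private final AtomicReference<T> resultHolder = new AtomicReference<>();
    private final CountDownLatch latch = new CountDownLatch(1);

    /** Blocks until set() is called or the timeout elapses; true means the result arrived. */
    boolean await(long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Returns the stored result; only meaningful after await() returned true. */
    T get() {
        if (latch.getCount() > 0) {
            throw new IllegalStateException("Call await() and check its result first");
        }
        return resultHolder.get();
    }

    /** Stores the result and releases any waiting thread by counting the latch down to 0. */
    void set(T result) {
        resultHolder.set(result);
        latch.countDown();
    }

    public static void main(String[] args) {
        FutureResultSketch<String> delivered = new FutureResultSketch<>();
        new Thread(() -> delivered.set("toast shown")).start();
        System.out.println("delivered in time: " + delivered.await(5, TimeUnit.SECONDS));

        FutureResultSketch<String> neverSet = new FutureResultSketch<>();
        System.out.println("never set, in time: " + neverSet.await(50, TimeUnit.MILLISECONDS));
    }
}
```

The first await() returns true as soon as the other thread calls set(); the second returns false after the timeout, which is the case where dumpHeap() gives up and returns RETRY_LATER.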

Next comes heapdumpListener.analyze(heapDump), further down in ensureGone().

heapdumpListener is a ServiceHeapDumpListener instance, which ends up calling HeapAnalyzerService.runAnalysis():

 public static void runAnalysis(Context context, HeapDump heapDump,
      Class<? extends AbstractAnalysisResultService> listenerServiceClass) {
    setEnabledBlocking(context, HeapAnalyzerService.class, true);
    setEnabledBlocking(context, listenerServiceClass, true);
    Intent intent = new Intent(context, HeapAnalyzerService.class);
    intent.putExtra(LISTENER_CLASS_EXTRA, listenerServiceClass.getName());
    intent.putExtra(HEAPDUMP_EXTRA, heapDump);
    // This starts a foreground service.
    ContextCompat.startForegroundService(context, intent);
  }

HeapAnalyzerService extends ForegroundService, which in turn extends IntentService. An IntentService does its long-running work in the onHandleIntent() callback, which here ends up in onHandleIntentInForeground():

  @Override
  protected void onHandleIntentInForeground(@Nullable Intent intent) {
    if (intent == null) {
      CanaryLog.d("HeapAnalyzerService received a null intent, ignoring.");
      return;
    }
    String listenerClassName = intent.getStringExtra(LISTENER_CLASS_EXTRA);
    // Heap dump metadata, including the hprof file
    HeapDump heapDump = (HeapDump) intent.getSerializableExtra(HEAPDUMP_EXTRA);

    HeapAnalyzer heapAnalyzer =
        new HeapAnalyzer(heapDump.excludedRefs, this, heapDump.reachabilityInspectorClasses);

    // checkForLeak uses the haha library to analyze the hprof file
    AnalysisResult result = heapAnalyzer.checkForLeak(heapDump.heapDumpFile, heapDump.referenceKey,
        heapDump.computeRetainedHeapSize);
    AbstractAnalysisResultService.sendResultToListener(this, listenerClassName, heapDump, result);
  }

The key step here is checkForLeak(), which analyzes the hprof file.

checkForLeak()

 public @NonNull AnalysisResult checkForLeak(@NonNull File heapDumpFile,
      @NonNull String referenceKey,
      boolean computeRetainedSize) {
    long analysisStartNanoTime = System.nanoTime();

    if (!heapDumpFile.exists()) {
      Exception exception = new IllegalArgumentException("File does not exist: " + heapDumpFile);
      return failure(exception, since(analysisStartNanoTime));
    }

    try {
      // Callback that reports analysis progress
      listener.onProgressUpdate(READING_HEAP_DUMP_FILE);
      HprofBuffer buffer = new MemoryMappedFileBuffer(heapDumpFile);
      HprofParser parser = new HprofParser(buffer);
      listener.onProgressUpdate(PARSING_HEAP_DUMP);
      // Parse the hprof file into a Snapshot (done by haha, another open-source library from Square)
      Snapshot snapshot = parser.parse();
      listener.onProgressUpdate(DEDUPLICATING_GC_ROOTS);
      // Remove duplicate GC roots
      deduplicateGcRoots(snapshot);
      listener.onProgressUpdate(FINDING_LEAKING_REF);
      // Look up the leaking reference
      Instance leakingRef = findLeakingReference(referenceKey, snapshot);

      // False alarm, weak reference was cleared in between key check and heap dump.
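      // A null leakingRef means there is no leak.
      // Note: as quoted, the next statement dereferences the null leakingRef, so it would throw a
      // NullPointerException, which the catch block below turns into a failure result.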
      if (leakingRef == null) {
        String className = leakingRef.getClassObj().getClassName();
        return noLeak(className, since(analysisStartNanoTime));
      }
      // Find the reference chain that leads to the leak
      return findLeakTrace(analysisStartNanoTime, snapshot, leakingRef, computeRetainedSize);
    } catch (Throwable e) {
      return failure(e, since(analysisStartNanoTime));
    }
  }

findLeakingReference() locates the leaking reference. Here is the code:

private Instance findLeakingReference(String key, Snapshot snapshot) {
  // Find the KeyedWeakReference class among the objects recorded in the hprof file
  ClassObj refClass = snapshot.findClass(KeyedWeakReference.class.getName());
  if (refClass == null) {
    throw new IllegalStateException(
        "Could not find the " + KeyedWeakReference.class.getName() + " class in the heap dump.");
  }
  List<String> keysFound = new ArrayList<>();
  // Iterate over the KeyedWeakReference instances
  for (Instance instance : refClass.getInstancesList()) {
    // Get all the fields of this instance
    List<ClassInstance.FieldValue> values = classInstanceValues(instance);
    // Look up the value of the key field
    Object keyFieldValue = fieldValue(values, "key");
    if (keyFieldValue == null) {
      keysFound.add(null);
      continue;
    }
    // Convert keyFieldValue to a String
    String keyCandidate = asString(keyFieldValue);
    // If this instance's key matches the key we are looking for, return the object held by the weak reference
    if (keyCandidate.equals(key)) {
      return fieldValue(values, "referent");
    }
    keysFound.add(keyCandidate);
  }
  throw new IllegalStateException(
      "Could not find weak reference with key " + key + " in " + keysFound);
}

This concludes the walkthrough of LeakCanary.

Summary

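The overall flow of LeakCanary's memory leak analysis is shown in the following flow chart:
[Figure: LeakCanary memory leak analysis flow chart]
Some things learned along the way: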

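  • How to detect a memory leak (WeakReference)

  • Inheritance with the Builder pattern (implemented with generics)

  • Getting memory data with Debug.dumpHprofData()

  • Toggling components dynamically: setComponentEnabledSetting()

  • Checking the current process: isInServiceProcess()

  • Triggering a GC manually: GcTrigger.runGc()

  • Registering an idle callback on the UI thread: Looper.myQueue().addIdleHandler()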
