Android Performance Optimization in Practice: the Live Room Scenario (Kotlin Coroutine, WebSocket, SharedFlow, StateFlow)

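The scenario

We kept getting production reports of users being dropped from live rooms. QA then ran load tests and found that on low-end devices, in our streaming scenario, the app could only reach 60 (chat messages sent per second) before failures occurred;

The problems before optimization

[Figure: business flow diagram before optimization]

In the business logic layer, a single coroutine handles every ws receiver in sequence (e.g. IM, Inner Notification, live room, ...), and only after all of them finish does processing of the next ws message begin;
IM ws messages also go through the live room's ws handling; the live room only returns after it has parsed the command and found that it has nothing to do;
The live room has a special command ("special cmd" for short) that confirms the user is still connected to the room. If no special cmd arrives within n seconds, the client leaves the room. The problem is that, as described in the first point above, ws messages pile up, and if the handling logic for earlier messages is too heavy, the special cmd never gets processed in time;
In a live room, users flood the channel with low-priority messages such as comments and gifts. If a low-end device cannot keep up with them, they can be dropped or handled in a separate coroutine, so that they do not affect the coroutine that processes the live room's main-flow ws messages;
The ws JSON string is parsed repeatedly. Gson parsing is well known to be slow, and repeated parsing combined with a poor parsing approach raises CPU usage, heat, and latency, which in turn lowers the CPU capacity available on the device: a vicious cycle;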

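Optimization approach

Each scenario handles only the ws messages that belong to it;
Fix the ws backlog that keeps later messages from being processed in time;
For ws-heavy scenarios such as the live room, handle low-priority messages that put significant load on the UI thread in a separate coroutine, e.g. comments and gifts (which have animations);
Optimize the parsing of high-frequency keys and the way Gson is used;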

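Optimization in practice

[Figure: business flow diagram after optimization]

Secondary dispatch

All ws messages currently arrive in totalFlow, so a secondary dispatch is needed. Given the data our backend currently provides, the only option is to filter on the command carried in each ws message;
Low-volume, low-frequency messages (e.g. IM, InnerNotification) go into an otherFlow;
High-volume, important messages (e.g. the messages of the live room flow) go into a liveFlow;
High-volume messages that can be dropped or delayed depending on device capability (e.g. comments, gifts) go into a highFreqFlow;

otherFlow, liveFlow, and highFreqFlow are all MutableSharedFlow;
extraBufferCapacity is 5, 800, and 300 respectively (5: there is basically no backlog; 800: these are important flow messages, so the threshold is set high to keep them from being dropped, and the server's peak concurrency does not appear to exceed 800 at present; 300: comments and gifts are droppable, and the comment area stores at most 150 messages);
onBufferOverflow is BufferOverflow.DROP_OLDEST for all three;
This change also fixed the bug where the special cmd was not processed in time and the user was removed from the room.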
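The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the production code: WsMessage is a simplified stand-in for the real message type, and the command ids, routeOf, and WsDispatcher are hypothetical names. Only the three flow names, the buffer capacities, and DROP_OLDEST come from the text.

```kotlin
import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.SharedFlow
import kotlinx.coroutines.flow.asSharedFlow

// Simplified stand-in for the real ws message type (assumption).
data class WsMessage(val command: Int, val payload: String)

enum class Route { OTHER, LIVE, HIGH_FREQ }

// Hypothetical command ids; the real values come from the backend protocol.
val LIVE_COMMANDS = setOf(1001, 1002)      // e.g. special cmd and other live room flow commands
val HIGH_FREQ_COMMANDS = setOf(2001, 2002) // e.g. comment, gift

// Decide the target flow purely from the command.
fun routeOf(command: Int): Route = when (command) {
    in HIGH_FREQ_COMMANDS -> Route.HIGH_FREQ
    in LIVE_COMMANDS -> Route.LIVE
    else -> Route.OTHER
}

class WsDispatcher {
    // With DROP_OLDEST, a full buffer evicts its oldest entry instead of suspending the emitter.
    private fun newFlow(capacity: Int) = MutableSharedFlow<WsMessage>(
        replay = 0,
        extraBufferCapacity = capacity,
        onBufferOverflow = BufferOverflow.DROP_OLDEST,
    )

    private val _otherFlow = newFlow(5)
    private val _liveFlow = newFlow(800)
    private val _highFreqFlow = newFlow(300)

    val otherFlow: SharedFlow<WsMessage> = _otherFlow.asSharedFlow()
    val liveFlow: SharedFlow<WsMessage> = _liveFlow.asSharedFlow()
    val highFreqFlow: SharedFlow<WsMessage> = _highFreqFlow.asSharedFlow()

    // Called once per incoming ws message; never suspends, so one slow consumer cannot stall the others.
    fun dispatch(message: WsMessage): Route {
        val route = routeOf(message.command)
        val target = when (route) {
            Route.OTHER -> _otherFlow
            Route.LIVE -> _liveFlow
            Route.HIGH_FREQ -> _highFreqFlow
        }
        target.tryEmit(message)
        return route
    }
}
```

Each business scenario then collects only its own flow in its own coroutine, so a burst of comments fills highFreqFlow without delaying the collector of liveFlow.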

Gson parsing

For the must-have command field, parse it once and then carry it alongside the ws data. So easy.

data class Broadcast(val message: JSONObject, val command: Int)
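
As a rough sketch of how such a Broadcast might be built, the raw text can be parsed once and the command read at the same time, so that later filters compare an Int instead of parsing again. The key name "command", the -1 fallback, and the parseBroadcast helper are assumptions for illustration.

```kotlin
import org.json.JSONException
import org.json.JSONObject

data class Broadcast(val message: JSONObject, val command: Int)

// Parse the raw ws text once. Returns null when the text is not a JSON object.
// The key "command" and the -1 fallback are assumptions, not the real protocol.
fun parseBroadcast(raw: String): Broadcast? = try {
    val json = JSONObject(raw)
    Broadcast(message = json, command = json.optInt("command", -1))
} catch (e: JSONException) {
    null
}
```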

The Gson parsing code before the change looked like this:

// Each nested object is serialized back to a String and then parsed again by Gson.
val a = jsonString.optJSONObject("A")?.toString() ?: ""
val b = jsonString.optJSONObject("B")?.toString() ?: ""
val aa = try {
    GsonManager.gson().fromJson<User>(a, object : TypeToken<User>() {}.type)
} catch (e: Exception) {
    User()
}
val bb = GsonManager.gson().fromJson<User>(b, object : TypeToken<User>() {}.type)
val c = jsonString.optLong("c")
val d = jsonString.optInt("d")
var e = jsonString.optString("e", "")
val f = jsonString.optBoolean("f", false)
val g = jsonString.optJSONObject("g")?.toString() ?: ""
val extra = try {
    GsonManager.gson().fromJson<CCC>(g, object : TypeToken<CCC>() {}.type)
} catch (e: Throwable) {
    null
}

After the change:

// One Gson pass over the whole message instead of one pass per nested field.
val data = try {
    GsonManager.gson().fromJson<AB>(jsonString, object : TypeToken<AB>() {}.type)
} catch (e: Throwable) {
    return
}

data class AB(
    @SerializedName("a")
    var a: User,
    @SerializedName("b")
    var b: User,
    @SerializedName("c")
    var c: Long,
    @SerializedName("d")
    var d: Int,
    @SerializedName("e")
    var e: String,
    @SerializedName("f")
    var f: Boolean? = null,
    @SerializedName("g")
    var g: CCC? = null,
)

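As for why Gson parsing is slow, please look that up yourself; I am not very familiar with it either...

Other optimizations

The initial parsing of totalFlow above used a lot of filter calls. Every filter creates a new flow, so try to do the work in a single operator, as shown below: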

public inline fun <T> Flow<T>.filter(crossinline predicate: suspend (T) -> Boolean): Flow<T> = transform { value ->
    if (predicate(value)) return@transform emit(value)
}

// Before:
flow.filter {
//**
}.map {
//***
}.filter {
//**
}.catch {
//**
}

// After:
flow.mapNotNull {
//**
}.catch {
//**
}
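
To make the before/after shape concrete, here is a small sketch with the placeholders filled in. Raw, Comment, CMD_COMMENT, and both function names are hypothetical; the point is only that one mapNotNull can carry the same logic as a filter/map/filter chain while building a single operator.

```kotlin
import kotlinx.coroutines.flow.*

// Hypothetical types and command id, for illustration only.
data class Raw(val command: Int, val body: String)
data class Comment(val text: String)

const val CMD_COMMENT = 2001

// Before: three operators, each wrapping the previous flow in a new one.
fun Flow<Raw>.commentsChained(): Flow<Comment> =
    filter { it.command == CMD_COMMENT }
        .map { Comment(it.body.trim()) }
        .filter { it.text.isNotEmpty() }

// After: one operator; returning null drops the element.
fun Flow<Raw>.commentsFused(): Flow<Comment> = mapNotNull { raw ->
    if (raw.command != CMD_COMMENT) return@mapNotNull null
    val text = raw.body.trim()
    if (text.isEmpty()) null else Comment(text)
}
```

Both versions emit the same elements; the fused one simply has fewer intermediate flows and lambdas per message.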

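Also, when users spam comments, the comment area UI ends up refreshing constantly. You can add a queue that buffers comment messages and drain it 3-4 times per second, with the number of messages taken each time decided by the device level, to reduce UI drawing pressure;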
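
A minimal sketch of such a queue, under a few assumptions: CommentBuffer and pumpComments are hypothetical names, the queue drops its oldest entry when full, and the 300 ms interval is just one value in the 3-4 times per second range.

```kotlin
import kotlinx.coroutines.delay

// Bounded queue for comment messages; when full, the oldest entry is dropped.
class CommentBuffer<T>(private val capacity: Int) {
    private val queue = ArrayDeque<T>()

    // Called from the coroutine that collects comment messages.
    @Synchronized
    fun offer(item: T) {
        if (queue.size == capacity) queue.removeFirst()
        queue.addLast(item)
    }

    // Removes and returns up to `max` items, oldest first.
    @Synchronized
    fun drain(max: Int): List<T> {
        val count = minOf(max, queue.size)
        return List(count) { queue.removeFirst() }
    }
}

// Runs until the calling coroutine is cancelled (delay is a cancellation point).
// batchSize would be chosen from the device level; render should run on the UI dispatcher.
suspend fun <T> pumpComments(
    buffer: CommentBuffer<T>,
    batchSize: Int,
    intervalMs: Long = 300,
    render: (List<T>) -> Unit,
) {
    while (true) {
        val batch = buffer.drain(batchSize)
        if (batch.isNotEmpty()) render(batch)
        delay(intervalMs)
    }
}
```

With this shape the UI is updated at most once per interval with a whole batch, instead of once per incoming comment.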

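Results

Looking purely at live room load testing, the conclusions for low-end devices are:

[Figure: results]

Before optimization: streaming, comments, disconnected at 60;
Note: the numbers above are the number of chat messages sent per second;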

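Jank detection tools

A word on the tool that revealed the Gson cost:
Before this round there was an earlier round of optimization. The problem found at that time was that a large amount of Gson parsing was happening on the UI thread, see the figure below:

[Figure: CPU profiler]

The cause: the coroutine dispatcher used for receiving ws messages was Main.
Still, it also made the Gson cost visible.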
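
One common way to keep parsing off the UI thread is to apply flowOn to the parsing stage, sketched below. This is not necessarily how the fix was made here: parsedOffMain is a hypothetical helper, and Dispatchers.Default is an assumed choice for CPU-bound parsing.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.flow.*

// Runs `parse` on Dispatchers.Default regardless of where the flow is collected.
// flowOn only affects the operators upstream of it, so the collector's context is unchanged.
fun <T : Any> Flow<String>.parsedOffMain(parse: (String) -> T?): Flow<T> =
    mapNotNull { parse(it) }
        .flowOn(Dispatchers.Default)
```

A collector running on Main then receives already parsed objects and only does UI work.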

Going further
The tool currently used for further optimization is Tencent Matrix.
