DataStore: Proto

  • Overview

    Proto DataStore stores data as instances of a custom data type. This implementation requires you to define a schema using protocol buffers, but in exchange it guarantees type safety.

    Unlike Preferences DataStore, Proto DataStore is more cumbersome to use: you must define your data in advance with the proto syntax. The payoff is guaranteed type safety.

    The type-safety "guarantee" here means that objects are read and written strictly according to the types declared in the .proto file. Because the schema is defined up front and the entire read/write path is generated and encapsulated, no hand-written (de)serialization is left to get wrong, which is what the guarantee amounts to. In practice Preferences DataStore is also "type safe", but that safety is enforced by hand: for each property or object you typically define a set of typed read/write helpers and always call those, so type-mismatch errors are rare rather than impossible.
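
    To make the contrast concrete, here is a hedged sketch of the two styles (hypothetical names: prefsDataStore is an assumed Preferences DataStore delegate and requires the datastore-preferences artifact; demo2Proto is the Proto DataStore delegate created later in this article):

      import android.content.Context
      import androidx.datastore.preferences.core.edit
      import androidx.datastore.preferences.core.intPreferencesKey
      
      // Preferences DataStore: safety rests on everyone reusing the same typed key.
      val COUNTER = intPreferencesKey("counter")
      
      suspend fun bumpCounter(context: Context) {
          context.prefsDataStore.edit { it[COUNTER] = (it[COUNTER] ?: 0) + 1 }
      }
      
      // Proto DataStore: the generated Demo2 type makes a mismatch a compile error.
      suspend fun bumpAa(context: Context) {
          context.demo2Proto.updateData { it.toBuilder().setAa(it.aa + 1).build() }
      }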

  • Configuration

    Proto DataStore relies on the protocol buffer Gradle plugin to generate the corresponding entity classes from your .proto files. These classes exist only as build-time outputs and are not part of your own source code; generating them automatically avoids human error.

    Note: the versions in the configuration below are mutually compatible. I am using Android Studio 4.2.1; with a newer or older setup you will need to pick matching versions yourself. I assembled this from the Google Developers China documentation plus various blog posts, since I could not find a complete official configuration.

    1. In the project-level build.gradle

      buildscript {
          ext.kotlin_version = "1.4.21"
          // These repositories are used to resolve the dependencies declared in this buildscript block
          repositories {
              jcenter()
              google()
              mavenCentral()
          }
          dependencies {
              classpath "com.android.tools.build:gradle:4.0.2"
              classpath "org.jetbrains.kotlin:kotlin-gradle-plugin:$kotlin_version"
              classpath 'com.google.protobuf:protobuf-gradle-plugin:0.8.12'
      
              // NOTE: Do not place your application dependencies here; they belong
              // in the individual module build.gradle files
          }
      }
      
    2. In the module-level build.gradle

      apply plugin: 'com.google.protobuf'
      
      protobuf {
          protoc {
              artifact = "com.google.protobuf:protoc:3.10.0"
          }
      
          // Generates the java Protobuf-lite code for the Protobufs in this project. See
          // https://github.com/google/protobuf-gradle-plugin#customizing-protobuf-compilation
          // for more information.
          generateProtoTasks {
              all().each { task ->
                  task.builtins {
                      java {
                          option 'lite'
                      }
                  }
              }
          }
      }
      dependencies {
          implementation "androidx.datastore:datastore:1.0.0"
          implementation "com.google.protobuf:protobuf-javalite:3.10.0"
      }
      
    3. Sync to download the dependencies and plugin, then define the .proto file

      syntax = "proto3";
      
      option java_package = "com.mph.review.bean.plain";
      option java_multiple_files = true;
      
      message Demo2 {
        int32 aa = 1;
        string bb = 2;
      }
      

      Note that .proto files must live under the src/main/proto folder.

    4. Finally, rebuild the project. The generated classes appear under build > generated > source > proto > debug (or release) > the package given by java_package. For example, the proto file above generates:

      (screenshot of the generated classes omitted)
  • Usage

    1. Define your own androidx.datastore.core.Serializer

      import androidx.datastore.core.CorruptionException
      import androidx.datastore.core.Serializer
      import com.google.protobuf.InvalidProtocolBufferException
      import java.io.InputStream
      import java.io.OutputStream
      
      class MyDemo2Serializer(override val defaultValue : Demo2) : Serializer<Demo2> {
          override suspend fun readFrom(input : InputStream) : Demo2 {
              try {
                  return Demo2.parseFrom(input)
              } catch(e : InvalidProtocolBufferException) {
                  // Rethrow as CorruptionException so that a configured
                  // ReplaceFileCorruptionHandler can supply replacement data.
                  throw CorruptionException("Cannot read proto.", e)
              }
          }
      
          override suspend fun writeTo(t : Demo2, output : OutputStream) {
              t.writeTo(output)
          }
      }
      

      This simply calls into the generated Demo2 class: parseFrom to read an instance in, writeTo to write it out.

    2. Create the DataStore object

      val Context.demo2Proto: DataStore<Demo2> by dataStore("demo2.proto", MyDemo2Serializer(Demo2.getDefaultInstance()))
      

      We create it through Kotlin property delegation with dataStore, passing the proto file name and the Serializer defined above.
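
      The delegate also takes optional parameters. For instance (a hedged sketch; demo2ProtoSafe and its file name are hypothetical, chosen to avoid clashing with the delegate above), a corruption handler can replace unreadable data with the default instance:

      import androidx.datastore.core.handlers.ReplaceFileCorruptionHandler
      
      val Context.demo2ProtoSafe by dataStore(
          // A distinct file: only one DataStore instance may own a given file.
          fileName = "demo2_safe.proto",
          serializer = MyDemo2Serializer(Demo2.getDefaultInstance()),
          corruptionHandler = ReplaceFileCorruptionHandler { Demo2.getDefaultInstance() }
      )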

    3. Reading and writing

      suspend fun incrementCounterByProto() {
          demo2Proto.updateData { currentSettings ->
              currentSettings.toBuilder()
                      .setAa(currentSettings.aa + 1)
                      .setBb(currentSettings.bb + "New")
                      .build()
          }
      }
      
      private fun testProtoDataStore() {
          val aaFlow : Flow<Int> = demo2Proto.data.map { settings ->
              // The `aa` accessor is generated from the proto schema.
              settings.aa
          }
      }
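
      A brief sketch of consuming that Flow (assumption: called from a coroutine you own, e.g. lifecycleScope, on a Context such as an Activity):

      import android.content.Context
      import android.util.Log
      import kotlinx.coroutines.flow.collect
      import kotlinx.coroutines.flow.map
      
      suspend fun Context.observeAa() {
          demo2Proto.data
              .map { settings -> settings.aa }
              .collect { aa -> Log.d("ProtoDemo", "aa = $aa") }
      }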
      
  • Source code analysis

    First, let's look at how it is constructed:

    public fun <T> dataStore(
        fileName: String,
        serializer: Serializer<T>,
        corruptionHandler: ReplaceFileCorruptionHandler<T>? = null,
        produceMigrations: (Context) -> List<DataMigration<T>> = { listOf() },
        scope: CoroutineScope = CoroutineScope(Dispatchers.IO + SupervisorJob())
    ): ReadOnlyProperty<Context, DataStore<T>> {
        return DataStoreSingletonDelegate(
            fileName, serializer, corruptionHandler, produceMigrations, scope
        )
    }
    

    It returns a DataStoreSingletonDelegate, which implements ReadOnlyProperty:

    public fun interface ReadOnlyProperty<in T, out V> {
        /**
         * Returns the value of the property for the given object.
         * @param thisRef the object for which the value is requested.
         * @param property the metadata for the property.
         * @return the property value.
         */
        public operator fun getValue(thisRef: T, property: KProperty<*>): V
    }
    

    As you can see, an operator override of getValue is required, namely:

    @GuardedBy("lock")
    @Volatile
    private var INSTANCE: DataStore<T>? = null
    
    /**
     * Gets the instance of the DataStore.
     *
     * @param thisRef must be an instance of [Context]
     * @param property not used
     */
    override fun getValue(thisRef: Context, property: KProperty<*>): DataStore<T> {
        return INSTANCE ?: synchronized(lock) {
            if (INSTANCE == null) {
                val applicationContext = thisRef.applicationContext
                INSTANCE = DataStoreFactory.create(
                    serializer = serializer,
                    produceFile = { applicationContext.dataStoreFile(fileName) },
                    corruptionHandler = corruptionHandler,
                    migrations = produceMigrations(applicationContext),
                    scope = scope
                )
            }
            INSTANCE!!
        }
    }
    

    So using the DataStoreSingletonDelegate really means using INSTANCE, i.e. the DataStore. The DataStore interface has one property and one method, used for reading and writing respectively, with the concrete read/write logic left to its implementations:

    public interface DataStore<T> {
        /**
         * Provides efficient, cached (when possible) access to the latest durably persisted state.
         * The flow will always either emit a value or throw an exception encountered when attempting
         * to read from disk. If an exception is encountered, collecting again will attempt to read the
         * data again.
         *
         * Do not layer a cache on top of this API: it will be impossible to guarantee consistency.
         * Instead, use data.first() to access a single snapshot.
         *
         * @return a flow representing the current state of the data
         * @throws IOException when an exception is encountered when reading data
         */
        public val data: Flow<T>
    
        /**
         * Updates the data transactionally in an atomic read-modify-write operation. All operations
         * are serialized, and the transform itself is a coroutine so it can perform heavy work
         * such as RPCs.
         *
         * The coroutine completes when the data has been persisted durably to disk (after which
         * [data] will reflect the update). If the transform or write to disk fails, the
         * transaction is aborted and an exception is thrown.
         *
         * @return the snapshot returned by the transform
         * @throws IOException when an exception is encountered when writing data to disk
         * @throws Exception when thrown by the transform function
         */
        public suspend fun updateData(transform: suspend (t: T) -> T): T
    }
    

    applicationContext.dataStoreFile(fileName) creates the File instance used to store the data:

    public fun Context.dataStoreFile(fileName: String): File =
        File(applicationContext.filesDir, "datastore/$fileName")
    

    DataStoreFactory.create returns a SingleProcessDataStore instance:

    public fun <T> create(
        serializer: Serializer<T>,
        corruptionHandler: ReplaceFileCorruptionHandler<T>? = null,
        migrations: List<DataMigration<T>> = listOf(),
        scope: CoroutineScope = CoroutineScope(Dispatchers.IO + SupervisorJob()),
        produceFile: () -> File
    ): DataStore<T> =
        SingleProcessDataStore(
            produceFile = produceFile,
            serializer = serializer,
            corruptionHandler = corruptionHandler ?: NoOpCorruptionHandler(),
            initTasksList = listOf(DataMigrationInitializer.getInitializer(migrations)),
            scope = scope
        )
    

    Let's look at reading first. As shown above, a read returns a Flow; here is SingleProcessDataStore's override of the data property:

    override val data: Flow<T> = flow {
        val currentDownStreamFlowState = downstreamFlow.value
    
        if (currentDownStreamFlowState !is Data) {
            // We need to send a read request because we don't have data yet.
            actor.offer(Message.Read(currentDownStreamFlowState))
        }
    
        emitAll(
            downstreamFlow.dropWhile {
                if (currentDownStreamFlowState is Data ||
                    currentDownStreamFlowState is Final
                ) {
                    // We don't need to drop any Data or Final values.
                    false
                } else {
                    // we need to drop the last seen state since it was either an exception or
                    // wasn't yet initialized. Since we sent a message to actor, we *will* see a
                    // new value.
                    it === currentDownStreamFlowState
                }
            }.map {
                when (it) {
                    is ReadException -> throw it.readException
                    is Final -> throw it.finalException
                    is Data -> it.value
                    is UnInitialized -> error(
                        "This is a bug in DataStore. Please file a bug at: " +
                            "https://issuetracker.google.com/issues/new?" +
                            "component=907884&template=1466542"
                    )
                }
            }
        )
    }
    
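    Before going further: the Data, ReadException, Final, and UnInitialized values seen above are the states of downstreamFlow, a small sealed hierarchy inside SingleProcessDataStore (paraphrased from the library source; comments mine):

    private sealed class State<T>
    
    private object UnInitialized : State<Any>()
    
    // The latest successfully read value plus its hashCode; the hash is
    // re-checked later to detect mutation of the supposedly immutable value.
    private class Data<T>(val value: T, val hashCode: Int) : State<T>()
    
    private class ReadException<T>(val readException: Throwable) : State<T>()
    
    // Terminal state, set from the actor's onComplete when the scope ends.
    private class Final<T>(val finalException: Throwable) : State<T>()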

    downstreamFlow.value is the most recently produced state; both reads and writes update it. If it is not a Data (i.e. readable data), a read request is posted via actor.offer. Let's see what actor is:

    private val actor = SimpleActor<Message<T>>(
        scope = scope,
        onComplete = {
            it?.let {
                downstreamFlow.value = Final(it)
            }
            // We expect it to always be non-null but we will leave the alternative as a no-op
            // just in case.
    
            synchronized(activeFilesLock) {
                activeFiles.remove(file.absolutePath)
            }
        },
        onUndeliveredElement = { msg, ex ->
            if (msg is Message.Update) {
                // TODO(rohitsat): should we instead use scope.ensureActive() to get the original
                //  cancellation cause? Should we instead have something like
                //  UndeliveredElementException?
                msg.ack.completeExceptionally(
                    ex ?: CancellationException(
                        "DataStore scope was cancelled before updateData could complete"
                    )
                )
            }
        }
    ) { msg ->
        when (msg) {
            is Message.Read -> {
                handleRead(msg)
            }
            is Message.Update -> {
                handleUpdate(msg)
            }
        }
    }
    

    SimpleActor's constructor looks like this:

    internal class SimpleActor<T>(
        /**
         * The scope in which to consume messages.
         */
        private val scope: CoroutineScope,
        /**
         * Function that will be called when scope is cancelled. Should *not* throw exceptions.
         */
        onComplete: (Throwable?) -> Unit,
        /**
         * Function that will be called for each element when the scope is cancelled. Should *not*
         * throw exceptions.
         */
        onUndeliveredElement: (T, Throwable?) -> Unit,
        /**
         * Function that will be called once for each message.
         *
         * Must *not* throw an exception (other than CancellationException if scope is cancelled).
         */
        private val consumeMessage: suspend (T) -> Unit
    ) {
      ... ...
    

    So the trailing lambda (the msg -> ... part) is assigned to SimpleActor's consumeMessage property. Now look at its offer method:

    fun offer(msg: T) {
        // should never return false bc the channel capacity is unlimited
        check(
            messageQueue.trySend(msg)
                .onClosed { throw it ?: ClosedSendChannelException("Channel was closed normally") }
                .isSuccess
        )
    
        // If the number of remaining messages was 0, there is no active consumer, since it quits
        // consuming once remaining messages hits 0. We must kick off a new consumer.
        if (remainingMessages.getAndIncrement() == 0) {
            scope.launch {
                // We shouldn't have started a new consumer unless there are remaining messages...
                check(remainingMessages.get() > 0)
    
                do {
                    // We don't want to try to consume a new message unless we are still active.
                    // If ensureActive throws, the scope is no longer active, so it doesn't
                    // matter that we have remaining messages.
                    scope.ensureActive()
    
                    consumeMessage(messageQueue.receive())
                } while (remainingMessages.decrementAndGet() != 0)
            }
        }
    }
    

    Here consumeMessage is invoked with a message taken from messageQueue; the Message.Read(currentDownStreamFlowState) that actor.offer received was added to that queue by the trySend inside the earlier check. Note also that all of this runs inside scope.launch, and scope is CoroutineScope(Dispatchers.IO + SupervisorJob()), which guarantees that reading is an asynchronous operation that does not block the main thread.
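
    The mechanics are easier to see in a minimal model of this "unlimited queue plus remaining-message counter" pattern (my simplified sketch, not the real SimpleActor):

    import java.util.concurrent.atomic.AtomicInteger
    import kotlinx.coroutines.CoroutineScope
    import kotlinx.coroutines.channels.Channel
    import kotlinx.coroutines.launch
    
    class MiniActor<T>(
        private val scope: CoroutineScope,
        private val consume: suspend (T) -> Unit
    ) {
        private val queue = Channel<T>(Channel.UNLIMITED)
        private val remaining = AtomicInteger(0)
    
        fun offer(msg: T) {
            check(queue.trySend(msg).isSuccess) // unlimited capacity, never fails
            // Only the offer that bumps the count from 0 launches a consumer, so
            // at most one consumer runs at a time and messages are handled serially.
            if (remaining.getAndIncrement() == 0) {
                scope.launch {
                    do {
                        consume(queue.receive())
                    } while (remaining.decrementAndGet() != 0)
                }
            }
        }
    }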

    Back at the actor's construction site: since the message is a Message.Read, it is handled by handleRead:

    private suspend fun handleRead(read: Message.Read<T>) {
        when (val currentState = downstreamFlow.value) {
            is Data -> {
                // We already have data so just return...
            }
            is ReadException -> {
                if (currentState === read.lastState) {
                    readAndInitOrPropagateFailure()
                }
    
                // Someone else beat us but also failed. The collector has already
                // been signalled so we don't need to do anything.
            }
            UnInitialized -> {
                readAndInitOrPropagateFailure()
            }
            is Final -> error("Can't read in final state.") // won't happen
        }
    }
    

    The if check here simply ensures that nothing else produced a new failure after the state we captured, since the two references are normally the same object. So as long as the state is neither Data nor Final (Final being the terminated state), execution reaches readAndInitOrPropagateFailure:

    private suspend fun readAndInitOrPropagateFailure() {
        try {
            readAndInit()
        } catch (throwable: Throwable) {
            downstreamFlow.value = ReadException(throwable)
        }
    }
    

    which in turn calls readAndInit:

    private suspend fun readAndInit() {
        // This should only be called if we don't already have cached data.
        check(downstreamFlow.value == UnInitialized || downstreamFlow.value is ReadException)
    
        val updateLock = Mutex()
        var initData = readDataOrHandleCorruption()
    
        var initializationComplete: Boolean = false
    
        // TODO(b/151635324): Consider using Context Element to throw an error on re-entrance.
        val api = object : InitializerApi<T> {
            override suspend fun updateData(transform: suspend (t: T) -> T): T {
                return updateLock.withLock() {
                    if (initializationComplete) {
                        throw IllegalStateException(
                            "InitializerApi.updateData should not be " +
                                "called after initialization is complete."
                        )
                    }
    
                    val newData = transform(initData)
                    if (newData != initData) {
                        writeData(newData)
                        initData = newData
                    }
    
                    initData
                }
            }
        }
    
        initTasks?.forEach { it(api) }
        initTasks = null // Init tasks have run successfully, we don't need them anymore.
        updateLock.withLock {
            initializationComplete = true
        }
    
        downstreamFlow.value = Data(initData, initData.hashCode())
    }
    

    readDataOrHandleCorruption reads the data from the file and turns it into an instance of the entity class:

    private suspend fun readDataOrHandleCorruption(): T {
        try {
            return readData()
        } catch (ex: CorruptionException) {
    
            val newData: T = corruptionHandler.handleCorruption(ex)
    
            try {
                writeData(newData)
            } catch (writeEx: IOException) {
                // If we fail to write the handled data, add the new exception as a suppressed
                // exception.
                ex.addSuppressed(writeEx)
                throw ex
            }
    
            // If we reach this point, we've successfully replaced the data on disk with newData.
            return newData
        }
    }
    
    private suspend fun readData(): T {
        try {
            FileInputStream(file).use { stream ->
                return serializer.readFrom(stream)
            }
        } catch (ex: FileNotFoundException) {
            if (file.exists()) {
                throw ex
            }
            return serializer.defaultValue
        }
    }
    

    So a file input stream is built from the file and handed to the Serializer we passed in earlier, MyDemo2Serializer, whose readFrom method is called. Recall that there we invoked the generated Demo2's parseFrom:

    public static com.mph.review.bean.plain.Demo2 parseFrom(java.io.InputStream input)
        throws java.io.IOException {
      return com.google.protobuf.GeneratedMessageLite.parseFrom(
          DEFAULT_INSTANCE, input);
    }
    

    DEFAULT_INSTANCE is the Demo2 object, instantiated in a static block:

    private static final com.mph.review.bean.plain.Demo2 DEFAULT_INSTANCE;
    static {
      Demo2 defaultInstance = new Demo2();
      // New instances are implicitly immutable so no need to make
      // immutable.
      DEFAULT_INSTANCE = defaultInstance;
      com.google.protobuf.GeneratedMessageLite.registerDefaultInstance(
        Demo2.class, defaultInstance);
    }
    

    Now look at GeneratedMessageLite's parseFrom method:

    protected static <T extends GeneratedMessageLite<T, ?>> T parseFrom(
        T defaultInstance, InputStream input) throws InvalidProtocolBufferException {
      return checkMessageInitialized(
          parsePartialFrom(
              defaultInstance,
              CodedInputStream.newInstance(input),
              ExtensionRegistryLite.getEmptyRegistry()));
    }
    

    The parsePartialFrom method is as follows:

    static <T extends GeneratedMessageLite<T, ?>> T parsePartialFrom(
        T instance, CodedInputStream input, ExtensionRegistryLite extensionRegistry)
        throws InvalidProtocolBufferException {
      @SuppressWarnings("unchecked") // Guaranteed by protoc
      T result = (T) instance.dynamicMethod(MethodToInvoke.NEW_MUTABLE_INSTANCE);
      try {
        // TODO(yilunchong): Try to make input with type CodedInpuStream.ArrayDecoder use
        // fast path.
        Schema<T> schema = Protobuf.getInstance().schemaFor(result);
        schema.mergeFrom(result, CodedInputStreamReader.forCodedInput(input), extensionRegistry);
        schema.makeImmutable(result);
      } catch (IOException e) {
        if (e.getCause() instanceof InvalidProtocolBufferException) {
          throw (InvalidProtocolBufferException) e.getCause();
        }
        throw new InvalidProtocolBufferException(e.getMessage()).setUnfinishedMessage(result);
      } catch (RuntimeException e) {
        if (e.getCause() instanceof InvalidProtocolBufferException) {
          throw (InvalidProtocolBufferException) e.getCause();
        }
        throw e;
      }
      return result;
    }
    

    Since instance is a Demo2, look at Demo2's dynamicMethod:

    protected final java.lang.Object dynamicMethod(
        com.google.protobuf.GeneratedMessageLite.MethodToInvoke method,
        java.lang.Object arg0, java.lang.Object arg1) {
      switch (method) {
        case NEW_MUTABLE_INSTANCE: {
          return new com.mph.review.bean.plain.Demo2();
        }
        case NEW_BUILDER: {
          return new Builder();
        }
        ... ...
      }
    }
    

    A brand-new Demo2 is returned here even though DEFAULT_INSTANCE already exists. That looks redundant, but DEFAULT_INSTANCE is static and shared, possibly in use elsewhere, so it must not be modified directly. Once result is obtained, the rest of parsePartialFrom reads the stored values from the file input stream and assigns them to result.

    That completes the readDataOrHandleCorruption flow: the saved data has been read. Back in readAndInit, what follows is some optional processing; let's take a look.

    First an InitializerApi object named api is constructed, and every function in initTasks is run against it. Note that initTasks is used only once, on the very first read. Why? Because these tasks exist to stay compatible with legacy data: they provide a one-time entry point for upgrading old data. Read on to see why.

    initTasks is the initTasksList that DataStoreFactory passed in when constructing SingleProcessDataStore:

    initTasksList = listOf(DataMigrationInitializer.getInitializer(migrations)),
    

    migrations is passed into DataStoreFactory's create method, and we in turn can supply it via the produceMigrations parameter when calling dataStore. It is a list of DataMigration, which is an interface (an example implementation follows the definition):

    public interface DataMigration<T> {
        public suspend fun shouldMigrate(currentData: T): Boolean
    
        public suspend fun migrate(currentData: T): T
    
        public suspend fun cleanUp()
    }
    

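    For example, a custom migration might look like this (a hedged sketch with hypothetical logic: treat an empty bb as legacy data):

    val demo2Migration = object : DataMigration<Demo2> {
        override suspend fun shouldMigrate(currentData: Demo2): Boolean =
            currentData.bb.isEmpty() // pretend legacy data never set bb
    
        override suspend fun migrate(currentData: Demo2): Demo2 =
            currentData.toBuilder().setBb("migrated").build()
    
        override suspend fun cleanUp() {
            // e.g. delete the legacy storage; nothing to do in this sketch
        }
    }
    
    // Supplied through the delegate's produceMigrations parameter:
    val Context.demo2ProtoMigrated by dataStore(
        fileName = "demo2_migrated.proto", // hypothetical; one store per file
        serializer = MyDemo2Serializer(Demo2.getDefaultInstance()),
        produceMigrations = { listOf(demo2Migration) }
    )
    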
    Now look at what DataMigrationInitializer.getInitializer returns:

    fun <T> getInitializer(migrations: List<DataMigration<T>>):
        suspend (api: InitializerApi<T>) -> Unit = { api ->
            runMigrations(migrations, api)
        }
    

    It is a suspend function, so the it in the earlier initTasks.forEach is exactly this function body, and it calls runMigrations:

    private suspend fun <T> runMigrations(
        migrations: List<DataMigration<T>>,
        api: InitializerApi<T>
    ) {
        val cleanUps = mutableListOf<suspend () -> Unit>()
    
        api.updateData { startingData ->
            migrations.fold(startingData) { data, migration ->
                if (migration.shouldMigrate(data)) {
                    cleanUps.add { migration.cleanUp() }
                    migration.migrate(data)
                } else {
                    data
                }
            }
        }
    
        var cleanUpFailure: Throwable? = null
    
        cleanUps.forEach { cleanUp ->
            try {
                cleanUp()
            } catch (exception: Throwable) {
                if (cleanUpFailure == null) {
                    cleanUpFailure = exception
                } else {
                    cleanUpFailure!!.addSuppressed(exception)
                }
            }
        }
    
        // If we encountered a failure on cleanup, throw it.
        cleanUpFailure?.let { throw it }
    }
    

    The api here is the InitializerApi constructed earlier. Its updateData is called with the startingData -> ... block as the transform argument, which is then invoked with the previously read initData. Reading the code above with that in mind: startingData is initData, i.e. the Demo2 we read. migrations.fold loops over all the DataMigrations, running the block for each one: every DataMigration is asked shouldMigrate, and if true its cleanUp is queued and migrate runs, returning the processed data; if false the data passes through unchanged. Because it is a fold, each DataMigration continues from the output of the previous one. After migration, cleanUp is invoked on every queued entry; override it if you need to run extra logic there.

    Back in readAndInit: with legacy data migrated as needed, downstreamFlow.value is finally assigned a Data containing the Demo2 we read along with its hashCode.

    Now we return to where SingleProcessDataStore's data is assigned, the flow block, and continue to the emitAll call, which mainly drops stale values and surfaces errors. At this point we hold a Flow carrying the data; extracting values from it is shown in the sketch below.
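
    For example, as the interface documentation above suggests, use data.first() when you want a single snapshot rather than an ongoing stream (a small usage sketch):

    import android.content.Context
    import kotlinx.coroutines.flow.first
    
    suspend fun Context.readAaOnce(): Int = demo2Proto.data.first().aa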

    Now let's look at writing.

    Writing means calling DataStore's updateData; here is SingleProcessDataStore's implementation:

    override suspend fun updateData(transform: suspend (t: T) -> T): T {
        /**
         * The states here are the same as the states for reads. Additionally we send an ack that
         * the actor *must* respond to (even if it is cancelled).
         */
        val ack = CompletableDeferred<T>()
        val currentDownStreamFlowState = downstreamFlow.value
    
        val updateMsg =
            Message.Update(transform, ack, currentDownStreamFlowState, coroutineContext)
    
        actor.offer(updateMsg)
    
        return ack.await()
    }
    

    As you can see, it also calls actor.offer, just with a Message.Update(transform, ack, currentDownStreamFlowState, coroutineContext), where transform is the user code from earlier that sets the data:

    { currentSettings ->
        currentSettings.toBuilder()
                .setAa(currentSettings.aa + 1)
                .setBb(currentSettings.bb + "New")
                .build()
    }
    

    Following the same route as before, this one goes handleUpdate() -> transformAndWrite() -> writeData():

    internal suspend fun writeData(newData: T) {
        file.createParentDirectories()
    
        val scratchFile = File(file.absolutePath + SCRATCH_SUFFIX)
        try {
            FileOutputStream(scratchFile).use { stream ->
                serializer.writeTo(newData, UncloseableOutputStream(stream))
                stream.fd.sync()
                // TODO(b/151635324): fsync the directory, otherwise a badly timed crash could
                //  result in reverting to a previous state.
            }
    
            if (!scratchFile.renameTo(file)) {
                throw IOException(
                    "Unable to rename $scratchFile." +
                        "This likely means that there are multiple instances of DataStore " +
                        "for this file. Ensure that you are only creating a single instance of " +
                        "datastore for this file."
                )
            }
        } catch (ex: IOException) {
            if (scratchFile.exists()) {
                scratchFile.delete() // Swallow failure to delete
            }
            throw ex
        }
    }
    

    Just like reading, this ends up calling the serializer's writeTo:

    override suspend fun writeTo(t : Demo2, output : OutputStream) {
        t.writeTo(output)
    }
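
    To tie the two paths together, a hedged round-trip sketch (assumes the demo2Proto delegate defined earlier): updateData runs the transform and persists via the scratch file, then data.first() reads the value back through MyDemo2Serializer.readFrom.

    import android.content.Context
    import kotlinx.coroutines.flow.first
    
    suspend fun Context.roundTrip() {
        demo2Proto.updateData { it.toBuilder().setAa(42).build() }
        check(demo2Proto.data.first().aa == 42)
    }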
    
