While implementing a business requirement, I was working on a project whose data layer consists of multiple PostgreSQL databases. The requirement: read Table A (User) from DB-1, process the data, bulk-insert the results into Table B (UserInfo) in DB-1 and Table C (Customer) in DB-2, then delete the processed rows from Table A (User).
The DB-1 connection is configured in db.php:
return [
    'class' => 'yii\db\Connection',
    'dsn' => 'pgsql:host=127.0.0.1;dbname=mydb1',
    'username' => 'postgres',
    'password' => '123456',
    'charset' => 'utf8',
    // Schema cache options (for production environment)
    'enableSchemaCache' => true,
];
The DB-2 connection is configured in params.php:
return [
    'backend_db' => [
        'dsn' => 'pgsql:host=127.0.0.1;dbname=mydb2',
        'username' => 'postgres',
        'password' => '123456',
        'charset' => 'utf8',
        'enableSchemaCache' => true,
    ]
];
The rough implementation flow was as follows:
$datas = User::find()->where(['type' => $type])->with('user_type')->all();
if ($datas) {
    ...
    $transaction = User::getDb()->beginTransaction();
    $transaction->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
    try {
        $transaction2 = Customer::getDb()->beginTransaction();
        $transaction2->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
        try {
            foreach ($datas as $key => $user) {
                ...
                $data = $data2 = $user->getAttributes();
                ...
                $model = new UserInfo();
                $model2 = new Customer();
                $model->load($data, "");
                $model2->load($data2, "");
                if ($model->save() && $model2->save() && $user->delete()) {
                    $transaction2->commit();
                    $transaction->commit();
                }
            }
        } catch (\Throwable $e) {
            $transaction2->rollBack();
            throw $e;
        }
    } catch (\Throwable $e) {
        $transaction->rollBack();
        throw $e;
    }
}
With this flow, small data volumes were fine, but once the volume grew, execution time became extremely long.
So the hunt for the cause began:
First suspect: the transactions. After removing them, performance barely improved. Ruled out.
Next I added Yii::warning() calls at every important step and checked the timestamps in the log: each model save took nearly a second, so the problem had to be in entity saving.
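The timing instrumentation mentioned above can be as simple as bracketing each suspect call with a log line (a sketch; the message format is arbitrary):

```php
// Log a timestamp before and after the suspect call, then compare
// the two entries in the application log or the debug panel.
Yii::warning('before save: ' . microtime(true), __METHOD__);
$ok = $model->save();
Yii::warning('after save: ' . microtime(true), __METHOD__);
```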
Then I enabled Yii's debug module and looked at the Database panel: it was full of queries like:
SELECT d.nspname AS table_schema,
c.relname AS table_name,
a.attname AS column_name,
COALESCE(td.typname, tb.typname, t.typname) AS data_type,
COALESCE(td.typtype, tb.typtype, t.typtype) AS type_type,
a.attlen AS character_maximum_length,
pg_catalog.col_description(c.oid, a.attnum) AS column_comment,
a.atttypmod AS modifier,
a.attnotnull = false AS is_nullable,
CAST(pg_get_expr(ad.adbin, ad.adrelid) AS varchar) AS column_default,
coalesce(pg_get_expr(ad.adbin, ad.adrelid) ~ 'nextval',false) AS is_autoinc,
CASE WHEN COALESCE(td.typtype, tb.typtype, t.typtype) = 'e'::char THEN array_to_string((SELECT array_agg(enumlabel) FROM pg_enum WHERE enumtypid = COALESCE(td.oid, tb.oid, a.atttypid))::varchar[], ',')
ELSE NULL END AS enum_values,
CASE atttypid
WHEN 21 /*int2*/ THEN 16 WHEN 23 /*int4*/ THEN 32 WHEN 20 /*int8*/ THEN 64 WHEN 1700 /*numeric*/ THEN CASE WHEN atttypmod = -1 THEN null ELSE ((atttypmod - 4) >> 16) & 65535 END WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/ WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/ ELSE null END AS numeric_precision,
CASE WHEN atttypid IN (21, 23, 20) THEN 0 WHEN atttypid IN (1700) THEN CASE WHEN atttypmod = -1 THEN null ELSE (atttypmod - 4) & 65535 END ELSE null END AS numeric_scale,
CAST(
information_schema._pg_char_max_length(information_schema._pg_truetypid(a, t), information_schema._pg_truetypmod(a, t))
AS numeric ) AS size,
a.attnum = any (ct.conkey) as is_pkey,
COALESCE(NULLIF(a.attndims, 0), NULLIF(t.typndims, 0), (t.typcategory='A')::int) AS dimensionFROM pg_class c LEFT JOIN pg_attribute a ON a.attrelid = c.oid LEFT JOIN pg_attrdef ad ON a.attrelid = ad.adrelid AND a.attnum = ad.adnum
LEFT JOIN pg_type t ON a.atttypid = t.oid LEFT JOIN pg_type tb ON (a.attndims > 0 OR t.typcategory='A') AND t.typelem > 0 AND t.typelem = tb.oid OR t.typbasetype > 0 AND t.typbasetype = tb.oid LEFT JOIN pg_type td ON t.typndims > 0 AND t.typbasetype > 0 AND tb.typelem = td.oid LEFT JOIN pg_namespace d ON d.oid = c.relnamespace
LEFT JOIN pg_constraint ct ON ct.conrelid = c.oid AND ct.contype = 'p'WHERE a.attnum > 0 AND t.typname != '' AND c.relname = 'current_user' AND d.nspname = 'public'ORDER BY a.attnum;
as well as:
select ct.conname as constraint_name,
a.attname as column_name,
fc.relname as foreign_table_name,
fns.nspname as foreign_table_schema,
fa.attname as foreign_column_name
from (SELECT ct.conname, ct.conrelid, ct.confrelid, ct.conkey, ct.contype, ct.confkey, generate_subscripts(ct.conkey, 1) AS s
FROM pg_constraint ct
) AS ct
inner join pg_class c on c.oid=ct.conrelid
inner join pg_namespace ns on c.relnamespace=ns.oid inner join pg_attribute a on a.attrelid=ct.conrelid and a.attnum = ct.conkey[ct.s]
left join pg_class fc on fc.oid=ct.confrelid
left join pg_namespace fns on fc.relnamespace=fns.oid
left join pg_attribute fa on fa.attrelid=ct.confrelid and fa.attnum = ct.confkey[ct.s]
where ct.contype='f' and c.relname='current_user' and ns.nspname='public'
order by fns.nspname, fc.relname, a.attnum
My first guess was that the schema cache was to blame, so I set enableSchemaCache to false; the queries above were still there in Debug.
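One detail worth checking at this point (an assumption on my part, not something verified in the original investigation): in Yii2, enableSchemaCache only has an effect if the cache component named by the connection's schemaCache property (the cache application component by default) actually exists. A minimal config sketch:

```php
return [
    'components' => [
        // Without a working cache component, enableSchemaCache is effectively ignored.
        'cache' => [
            'class' => 'yii\caching\FileCache',
        ],
        'db' => [
            'class' => 'yii\db\Connection',
            'dsn' => 'pgsql:host=127.0.0.1;dbname=mydb1',
            'username' => 'postgres',
            'password' => '123456',
            'enableSchemaCache' => true,
            'schemaCacheDuration' => 3600,
            'schemaCache' => 'cache', // name of the cache component above
        ],
    ],
];
```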
I copied a fragment of the query and searched the whole project for it: the statements come from yii2/db/pgsql/Schema.php. So the suspicion became that the Model has to query table structure, constraints and the like when inserting and validating data. First I skipped validation on save with $model->save(false) and ran again: a tiny improvement, but Debug still showed hundreds of queries...
So I abandoned Models altogether and switched to plain SQL commands:
Customer::getDb()->createCommand()->insert('customer', $data)->execute();
Ran it again: still only a tiny improvement, and Debug still showed hundreds of queries...
Why not batch insert? Because the insert into Table B needs the id generated by the insert into Table A as a foreign key, and at first I could not think of a good way around that. Once I did, I switched to batch inserts:
Customer::getDb()->createCommand()->batchInsert('customer', $columns, $data)->execute();
At this point, 50 rows completed in about 2 seconds.
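For reference, batchInsert takes the table name, the column names, and the rows as an array of value arrays, and generates a single multi-row INSERT. The columns and values below are invented for illustration:

```php
// Hypothetical columns and rows; batchInsert produces one
// INSERT ... VALUES (...), (...) statement covering all of them.
$columns = ['name', 'email', 'type'];
$rows = [
    ['alice', 'alice@example.com', 1],
    ['bob', 'bob@example.com', 2],
];
Customer::getDb()->createCommand()
    ->batchInsert('customer', $columns, $rows)
    ->execute();
```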
What about 1000 rows?
Tested it right away. Sad result: 10+ seconds... not acceptable to the client.
With batch inserts the query count was already far lower than before. Studying the remaining queries in Debug, it suddenly clicked: since two DB connections are involved, Yii2 apparently only keeps the table-schema cache for the first connection; after switching databases for the insert, it has to fetch the structure of the current table all over again. So I rearranged the flow, roughly like this:
$datas = (new \yii\db\Query())->select('*')->from('user')->where('...')->all();
if ($datas) {
    ...
    $transaction = User::getDb()->beginTransaction();
    $transaction->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
    try {
        foreach ($datas as $key => $user) {
            ...
            // process $data here
            ...
        }
        if (User::getDb()->createCommand()->batchInsert(/* ... */)->execute()
            && User::getDb()->createCommand()->delete(/* ... */)->execute()) {
            $transaction2 = Customer::getDb()->beginTransaction();
            $transaction2->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
            try {
                Customer::getDb()->createCommand()->batchInsert(/* ... */)->execute();
                $transaction2->commit();
                $transaction->commit();
            } catch (\Throwable $e) {
                $transaction2->rollBack();
                throw $e;
            }
        }
    } catch (\Throwable $e) {
        $transaction->rollBack();
        throw $e;
    }
}
In other words, finish all of DB-1's work first, then handle DB-2's.
Now there were no redundant queries left: the schema is looked up only once per insert, so the two inserts cost two queries in total, and Debug put the DB time for 1200 records at a bit over 200 milliseconds.
I thought that was the end of it, until a test run hit an error somewhere and $transaction2 was not rolled back. My first thought was that Yii2 doesn't support nested transactions (strictly speaking this isn't nesting, just two databases each running a transaction at the same time); a web search turned up nothing useful.
Digging further into Debug, I noticed that after $transaction2 had begun, another DB open operation was executed. That suggested the connection transaction 2 was started on had later been replaced by a newly opened one. Back to the code, and finally:
$transaction2 = Customer::getDb()->beginTransaction();
$transaction2->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
try {
    // getDb() is called again here
    Customer::getDb()->createCommand()->batchInsert(/* ... */)->execute();
    $transaction2->commit();
    $transaction->commit();
} catch (\Throwable $e) {
    $transaction2->rollBack();
    throw $e;
}
And Customer's getDb():
public static function getDb()
{
    return new \yii\db\Connection(Yii::$app->params['backend_db']);
}
Written like this, there are obviously two connections: every call to getDb() creates a brand-new database connection, so the batchInsert runs on a different connection than the one the transaction was started on, and the transaction has no effect. Why doesn't User::getDb() break transaction 1 in the same way? Because transaction 1 runs on the connection configured in db.php, Yii's default connection; Yii reuses that single connection for every DB operation, so the problem never surfaces there.
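One way to repair getDb() itself (a sketch) is to memoize the connection in a static variable, so every call returns the same instance:

```php
public static function getDb()
{
    static $db = null;
    if ($db === null) {
        $db = new \yii\db\Connection(Yii::$app->params['backend_db']);
    }
    return $db; // same connection object on every call
}
```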
Final version:
$datas = (new \yii\db\Query())->select('*')->from('user')->where('...')->all();
if ($datas) {
    ...
    $transaction = User::getDb()->beginTransaction();
    $transaction->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
    try {
        foreach ($datas as $key => $user) {
            ...
            // process $data here
            ...
        }
        if (User::getDb()->createCommand()->batchInsert(/* ... */)->execute()
            && User::getDb()->createCommand()->delete(/* ... */)->execute()) {
            // The same connection must be used throughout, otherwise the transaction has no effect.
            $backend_conn = new \yii\db\Connection(Yii::$app->params['backend_db']);
            $transaction2 = $backend_conn->beginTransaction();
            $transaction2->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
            try {
                $backend_conn->createCommand()->batchInsert(/* ... */)->execute();
                $transaction2->commit();
                $transaction->commit();
            } catch (\Throwable $e) {
                $transaction2->rollBack();
                throw $e;
            }
        }
    } catch (\Throwable $e) {
        $transaction->rollBack();
        throw $e;
    }
}
This got a bit long-winded; it records every pothole along the way, in the hope of giving a hint to anyone who runs into the same kind of problem.
总结:
- When you hit a problem in day-to-day development, first think through whether the logic and flow of the code itself are sound.
- Printing variable values at key points while debugging is a great help in locating errors.
- Yii2's debug module is genuinely powerful; its logs reveal a lot of problems.
- Use batchInsert for bulk inserts; avoid going through Models for this kind of operation.
- For transactions spanning multiple databases, try not to interleave operations on both databases; finish database 1's work before starting database 2's, so that Yii does not have to keep switching connections and re-querying table schemas.
- For non-default connections (anything other than db.php), check that a single shared connection object is being reused.
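For the last point, an arguably cleaner alternative to building the connection from params by hand is to register DB-2 as an application component; Yii then manages a single shared instance itself. A sketch (the component name backendDb is my choice, not from the original setup):

```php
// In the application config:
'components' => [
    'backendDb' => [
        'class' => 'yii\db\Connection',
        'dsn' => 'pgsql:host=127.0.0.1;dbname=mydb2',
        'username' => 'postgres',
        'password' => '123456',
        'enableSchemaCache' => true,
    ],
],

// In the Customer model:
public static function getDb()
{
    return \Yii::$app->get('backendDb'); // always the same instance
}
```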