Problems encountered using nested transactions for cross-database batch operations in Yii2

While developing a business requirement, I had a project whose data lived in multiple (PostgreSQL) databases. The task: read rows from Table A (User) in DB-1, compute on them, batch-insert the results into Table B (UserInfo) in DB-1 and Table C (Customer) in DB-2, then delete the processed rows from Table A (User).

The DB-1 connection is configured in db.php:

return [
    'class' => 'yii\db\Connection',
    'dsn' => 'pgsql:host=127.0.0.1;dbname=mydb1',
    'username' => 'postgres',
    'password' => '123456',
    'charset' => 'utf8',

    // Schema cache options (for production environment)
    'enableSchemaCache' => true,
];

The DB-2 connection is configured in params.php:

return [
    'backend_db' => [
        'dsn' => 'pgsql:host=127.0.0.1;dbname=mydb2',
        'username' => 'postgres',
        'password' => '123456',
        'charset' => 'utf8',
        'enableSchemaCache' => true,
    ]
];
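As an aside, a cleaner alternative (a sketch, assuming a standard Yii2 application config; `db2` is a hypothetical component ID) is to register the second connection as an application component, so the framework manages one shared connection for it just as it does for the default `db`:

```php
// config/web.php or config/console.php -- hypothetical 'db2' component
'components' => [
    'db2' => [
        'class' => 'yii\db\Connection',
        'dsn' => 'pgsql:host=127.0.0.1;dbname=mydb2',
        'username' => 'postgres',
        'password' => '123456',
        'charset' => 'utf8',
        'enableSchemaCache' => true,
    ],
],
```

`Yii::$app->db2` then always returns the same Connection instance.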

The rough initial implementation:


$datas = User::find()->where(['type' => $type])->with('user_type')->all();
if ($datas) {

    ...

    $transaction = User::getDb()->beginTransaction();
    $transaction->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
    try {
        $transaction2 = Customer::getDb()->beginTransaction();
        $transaction2->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
        try {
            foreach ($datas as $key => $user) {

                ...

                $data = $data2 = $user->getAttributes();

                ...

                $model = new UserInfo();
                $model2 = new Customer();
                $model->load($data, "");
                $model2->load($data2, "");
                if ($model->save() && $model2->save() && $user->delete()) {
                    $transaction2->commit();
                    $transaction->commit();
                }
            }
        } catch (\Throwable $e) {
            $transaction2->rollBack();
            throw $e;
        }
    } catch (\Throwable $e) {
        $transaction->rollBack();
        throw $e;
    }
}

Following this flow, small data sets were fine, but with larger volumes the execution time grew extremely long.

So the hunt for the cause began:

At first I suspected the transactions, but removing them barely improved execution speed. Pass.

Next I added Yii::warning() calls at every important step and checked the log timestamps: each model save took nearly a second, so the problem had to be in saving the entities.

Enabling Yii's debug module and opening the Database panel revealed a large number of queries like this:


SELECT d.nspname AS table_schema,
    c.relname AS table_name,
    a.attname AS column_name,
    COALESCE(td.typname, tb.typname, t.typname) AS data_type,
    COALESCE(td.typtype, tb.typtype, t.typtype) AS type_type,
    a.attlen AS character_maximum_length,
    pg_catalog.col_description(c.oid, a.attnum) AS column_comment,
    a.atttypmod AS modifier,
    a.attnotnull = false AS is_nullable,
    CAST(pg_get_expr(ad.adbin, ad.adrelid) AS varchar) AS column_default,
    coalesce(pg_get_expr(ad.adbin, ad.adrelid) ~ 'nextval', false) AS is_autoinc,
    CASE WHEN COALESCE(td.typtype, tb.typtype, t.typtype) = 'e'::char
        THEN array_to_string((SELECT array_agg(enumlabel) FROM pg_enum WHERE enumtypid = COALESCE(td.oid, tb.oid, a.atttypid))::varchar[], ',')
        ELSE NULL
    END AS enum_values,
    CASE atttypid
        WHEN 21 /*int2*/ THEN 16
        WHEN 23 /*int4*/ THEN 32
        WHEN 20 /*int8*/ THEN 64
        WHEN 1700 /*numeric*/ THEN
            CASE WHEN atttypmod = -1
                THEN null
                ELSE ((atttypmod - 4) >> 16) & 65535
            END
        WHEN 700 /*float4*/ THEN 24 /*FLT_MANT_DIG*/
        WHEN 701 /*float8*/ THEN 53 /*DBL_MANT_DIG*/
        ELSE null
    END AS numeric_precision,
    CASE
        WHEN atttypid IN (21, 23, 20) THEN 0
        WHEN atttypid IN (1700) THEN
            CASE
                WHEN atttypmod = -1 THEN null
                ELSE (atttypmod - 4) & 65535
            END
        ELSE null
    END AS numeric_scale,
    CAST(
        information_schema._pg_char_max_length(information_schema._pg_truetypid(a, t), information_schema._pg_truetypmod(a, t))
        AS numeric
    ) AS size,
    a.attnum = any (ct.conkey) as is_pkey,
    COALESCE(NULLIF(a.attndims, 0), NULLIF(t.typndims, 0), (t.typcategory='A')::int) AS dimension
FROM
    pg_class c
    LEFT JOIN pg_attribute a ON a.attrelid = c.oid
    LEFT JOIN pg_attrdef ad ON a.attrelid = ad.adrelid AND a.attnum = ad.adnum
    LEFT JOIN pg_type t ON a.atttypid = t.oid
    LEFT JOIN pg_type tb ON (a.attndims > 0 OR t.typcategory='A') AND t.typelem > 0 AND t.typelem = tb.oid OR t.typbasetype > 0 AND t.typbasetype = tb.oid
    LEFT JOIN pg_type td ON t.typndims > 0 AND t.typbasetype > 0 AND tb.typelem = td.oid
    LEFT JOIN pg_namespace d ON d.oid = c.relnamespace
    LEFT JOIN pg_constraint ct ON ct.conrelid = c.oid AND ct.contype = 'p'
WHERE
    a.attnum > 0 AND t.typname != ''
    AND c.relname = 'current_user'
    AND d.nspname = 'public'
ORDER BY
    a.attnum;

as well as:


select ct.conname as constraint_name,
    a.attname as column_name,
    fc.relname as foreign_table_name,
    fns.nspname as foreign_table_schema,
    fa.attname as foreign_column_name
from
    (SELECT ct.conname, ct.conrelid, ct.confrelid, ct.conkey, ct.contype, ct.confkey, generate_subscripts(ct.conkey, 1) AS s
       FROM pg_constraint ct
    ) AS ct
    inner join pg_class c on c.oid=ct.conrelid
    inner join pg_namespace ns on c.relnamespace=ns.oid
    inner join pg_attribute a on a.attrelid=ct.conrelid and a.attnum = ct.conkey[ct.s]
    left join pg_class fc on fc.oid=ct.confrelid
    left join pg_namespace fns on fc.relnamespace=fns.oid
    left join pg_attribute fa on fa.attrelid=ct.confrelid and fa.attnum = ct.confkey[ct.s]
where
    ct.contype='f'
    and c.relname='current_user'
    and ns.nspname='public'
order by
    fns.nspname, fc.relname, a.attnum

My first guess was a caching issue, so I set enableSchemaCache to false; the queries above were still there in the debug panel.

I copied a fragment of the query and searched the whole project for it: these statements come from yii2/db/pgsql/Schema.php. So the Model apparently queries the table structure and constraints when inserting and validating data. First I dropped validation on save with $model->save(false) and ran again: efficiency improved a tiny bit, but the debug panel still showed hundreds of queries...
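One thing worth checking here: in Yii2, `enableSchemaCache = true` alone is not enough, because the schema cache is stored through a cache application component (the `schemaCache` property, which defaults to the component ID `cache`). A sketch of a config where the setting actually takes effect, assuming a standard Yii2 application:

```php
'components' => [
    'cache' => [
        'class' => 'yii\caching\FileCache', // any cache backend works
    ],
    'db' => [
        'class' => 'yii\db\Connection',
        'dsn' => 'pgsql:host=127.0.0.1;dbname=mydb1',
        'enableSchemaCache' => true,
        'schemaCacheDuration' => 3600, // cache table metadata for an hour
        'schemaCache' => 'cache',      // component ID above (this is the default)
    ],
],
```

Without a configured cache component, the schema queries are re-issued regardless of the enableSchemaCache flag.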

So I dropped Models altogether and switched to plain SQL commands:
Customer::getDb()->createCommand()->insert('customer', $data)->execute()
Running it again, efficiency still only improved a tiny bit, and the debug panel still showed hundreds of queries...

Why not use batch insert from the start? Because the insert into Table B needs the id generated by the insert into Table A as a foreign key, and at first I couldn't think of a good way around that. Once I did, I switched to batch insert:
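For the record, one PostgreSQL-specific way around the "need the generated ids" problem (a sketch of an alternative technique, not what this post ended up doing; table and column names are illustrative) is a multi-row INSERT with RETURNING, which hands back all generated keys in a single round trip:

```sql
-- Insert several rows at once and get their generated ids back
INSERT INTO "user" (name, type)
VALUES ('alice', 1), ('bob', 1)
RETURNING id;
```

The returned ids can then be used as foreign keys for the dependent batch insert.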

Customer::getDb()->createCommand()->batchInsert('customer', $columns, $data)->execute()

At this point, 50 rows completed in about 2 seconds.
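For reference, a minimal sketch of what the batchInsert call looks like with arguments filled in (table and column names here are illustrative):

```php
$columns = ['user_id', 'name', 'email'];
$rows = [
    [1, 'alice', 'alice@example.com'],
    [2, 'bob', 'bob@example.com'],
];
// batchInsert() builds a single multi-row INSERT, so the schema is
// looked up once and only one statement travels to the server.
Customer::getDb()->createCommand()
    ->batchInsert('customer', $columns, $rows)
    ->execute();
```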

What about 1000 rows?
I tested it right away, and the result was disheartening: more than 10 seconds... not acceptable to the client.

With batch insert, the query count was already far lower than before. Studying the debug output, it suddenly occurred to me: because two DB connections are involved, Yii2 may only keep the table-schema cache for the first connection, and after switching databases for an insert it has to fetch the schema of the current table all over again. So I restructured the code, roughly like this:

$datas = (new \yii\db\Query())->select('*')->from('user')->where('...')->all();

if ($datas) {
    ...
    $transaction = User::getDb()->beginTransaction();
    $transaction->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
    try {
        foreach ($datas as $key => $user) {
              ...
              // process $data
              ...
        }
        if (User::getDb()->createCommand()->batchInsert()->execute() && User::getDb()->createCommand()->delete()->execute()) {
              $transaction2 = Customer::getDb()->beginTransaction();
              $transaction2->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
              try {
                    Customer::getDb()->createCommand()->batchInsert()->execute();
                    $transaction2->commit();
                    $transaction->commit();
              } catch (\Throwable $e) {
                    $transaction2->rollBack();
                    throw $e;
              }
        }
    } catch (\Throwable $e) {
        $transaction->rollBack();
        throw $e;
    }
}

That is, finish all of DB-1's work first, then handle DB-2's.
Now there were no superfluous queries: the schema is looked up once per insert, two queries in total for the two inserts, and the debug panel put the DB operations for 1200 rows at a bit over 200 milliseconds.

I thought that was the end of it, until a test run hit an error somewhere and $transaction2 failed to roll back. I assumed Yii2 didn't support nested transactions (strictly speaking these aren't nested, just transactions used across two databases at the same time), searched for answers, and found nothing useful.
Back in the debug panel, I noticed that after $transaction2 had begun, another DB open was executed. That suggested the connection behind transaction 2 was being replaced by a newly opened one after the transaction started. Checking the code again, I finally found it:

              $transaction2 = Customer::getDb()->beginTransaction();
              $transaction2->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
              try {
                    // getDb() is called again here
                    Customer::getDb()->createCommand()->batchInsert()->execute();
                    $transaction2->commit();
                    $transaction->commit();
              } catch (\Throwable $e) {
                    $transaction2->rollBack();
                    throw $e;
              }

And Customer's getDb():

    public static function getDb()
    {
        return new \yii\db\Connection(Yii::$app->params['backend_db']);
    }

Written like this, there are inevitably two connections: every call to getDb() news up a fresh database connection, so the connection holding the transaction gets replaced and the transaction has no effect. Why didn't the earlier User::getDb() break transaction 1 the same way? Because transaction 1 runs on the connection from db.php, Yii's default connection, and Yii reuses that single connection for every DB operation, so the problem never surfaced there.
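A common fix for this (a sketch) is to memoize the connection inside getDb(), so every call returns the same instance, mirroring what Yii does for the default `db` component:

```php
class Customer extends \yii\db\ActiveRecord
{
    /** @var \yii\db\Connection|null cached backend connection */
    private static $_db;

    public static function getDb()
    {
        if (self::$_db === null) {
            self::$_db = new \yii\db\Connection(Yii::$app->params['backend_db']);
        }
        return self::$_db;
    }
}
```

With this, beginTransaction() and the subsequent commands run on the same underlying connection, so the transaction actually covers them.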

The final version:

$datas = (new \yii\db\Query())->select('*')->from('user')->where('...')->all();

if ($datas) {
    ...
    $transaction = User::getDb()->beginTransaction();
    $transaction->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
    try {
        foreach ($datas as $key => $user) {
              ...
              // process $data
              ...
        }
        if (User::getDb()->createCommand()->batchInsert()->execute() && User::getDb()->createCommand()->delete()->execute()) {
              // the same connection must be reused here, otherwise the transaction has no effect
              $backend_conn = new \yii\db\Connection(Yii::$app->params['backend_db']);
              $transaction2 = $backend_conn->beginTransaction();
              $transaction2->setIsolationLevel(\yii\db\Transaction::SERIALIZABLE);
              try {
                    $backend_conn->createCommand()->batchInsert()->execute();
                    $transaction2->commit();
                    $transaction->commit();
              } catch (\Throwable $e) {
                    $transaction2->rollBack();
                    throw $e;
              }
        }
    } catch (\Throwable $e) {
        $transaction->rollBack();
        throw $e;
    }
}

This write-up is a bit rambling; it records the journey pit by pit, in the hope of giving a pointer to anyone else who runs into this class of problem.

Summary:

  1. When you hit a problem in day-to-day development, first make sure the logical flow of the code itself is sound.
  2. Printing variable values at key points while debugging is a great aid for tracking down errors.
  3. Yii2's debug module is genuinely powerful; its logs reveal a lot of problems.
  4. Use batchInsert for bulk inserts; avoid going through Models for them.
  5. When running transactions across multiple databases, try not to interleave operations on the two databases; finish database 1's work before starting database 2's, so Yii doesn't constantly switch DB connections and re-query table schemas.
  6. For non-default database connections (anything not in db.php), check that a single shared connection is being reused.
