The Strange sqlite3 malformed Error (Part 1)

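Symptoms

  • A field device generates and inserts data at large scale. After the device misbehaved, the database was pulled off it for inspection and reported a malformed error
  • sqlite3 version 3.8.6
  • Database file size: 210 MB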

Locating the problem

  • Data size

    select count(*) from DataSheet;
    /* result */
    950131
    
  • Primary key

    primary key(ctype,id,DataTime)
    
  • Reproduction (SQLite Expert)

    select * from DataSheet where id='000537719140' order by DataTime;  /*malformed*/
    select * from DataSheet where ctype=5002 and id='000537719140' order by DataTime; /*OK*/
    

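    As shown above, the only difference between the two SQL statements is whether the where clause contains the ctype condition. Why does that matter? Run the following SQL for comparison:

    select * from DataSheet where id='000537719140';  /*OK*/
    select * from DataSheet order by DataTime;     /*malformed, full table scan*/
    
    • When the where clause constrains the leading primary-key columns (ctype, id), the query hits the primary-key index, and order by DataTime is applied to the matching index entries, so the problem does not reproduce
    • When the where clause does not match the primary-key columns (ctype, id, DataTime), the primary-key index is not hit; order by DataTime then sorts the result set, which in practice means a full table scan, and the problem reproduces
  • Test code: locating the row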

    /* db is assumed to be an already opened sqlite3 handle. */
    const char *malformed = "select rowid, ctype, id, StartTime, DataTime from DataSheet;"; /*malformed*/
    const char *okayQuery = "select rowid, ctype, id, DataTime from DataSheet;"; /*OK*/
    const char *sql = okayQuery;
    sqlite3_stmt *stmt = NULL;
    int rc, ncols;
    sqlite3_prepare_v2(db, sql, -1, &stmt, NULL);   /* -1: read sql up to its NUL terminator */
    ncols = sqlite3_column_count(stmt);
    int rowno = 0;   /* 0-based index of the row being fetched */
    do {
        rc = sqlite3_step(stmt);
        switch (rc) {
        case SQLITE_ROW:    break;
        case SQLITE_DONE:   rowno = -100;   break;   /* end of the result set: leave the loop */
        default:            fprintf(stderr, "rowno = %d, sqlite3 error %d\n", rowno, rc);  break;   /* log the error and keep stepping */
        }
        if (rowno >= 0) { rowno++; }
    }
    while (rowno >= 0);
    sqlite3_finalize(stmt);
    
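    • sql = okayQuery: 950131 records are read in total (last rowno: 950130), then the scan ends with SQLITE_DONE
    • sql = malformed: error 11 (SQLITE_CORRUPT) is reported when fetching the record at rowno 950300
      • The normal query can only traverse 950131 records (last rowno: 950130)
      • The failing query, surprisingly, gets as far as rowno 950300 before the error occurs!
  • Test code: locating the time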

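    /* db is assumed to be an already opened sqlite3 handle. */
    const char *malformed = "select rowid, ctype, id, DataTime, StartTime from DataSheet;";
    const char *okayQuery = "select rowid, ctype, id, DataTime from DataSheet;";
    const char *sql = malformed;
    const char *snull = "(null)";   /* placeholder for NULL columns; its definition is not shown in the original, so this value is an assumption */
    const char *ctype, *id, *colltime;
    sqlite3_stmt *stmt = NULL;
    int rc, ncols, rowid;
    sqlite3_prepare_v2(db, sql, -1, &stmt, NULL);
    ncols = sqlite3_column_count(stmt);
    int rowno = 0;
    do {
      rc = sqlite3_step(stmt);
      switch (rc) {
      case SQLITE_ROW:
        /* Only print the rows around the point where the error shows up. */
        if (rowno >= 950129 && rowno <= 950301/* 950300 */) {
          rowid = sqlite3_column_int(stmt, 0);
          ctype = (const char *)sqlite3_column_text(stmt, 1);
          id = (const char *)sqlite3_column_text(stmt, 2);
          colltime = (const char *)sqlite3_column_text(stmt, 3);
          fprintf(stderr, "rowno: %d, rowid: %i, ctype: %s, id:%s, time:%s\n", rowno, rowid, (ctype?ctype:snull),(id?id:snull), (colltime?colltime:snull));
        }
        break;
      case SQLITE_DONE:   rowno = -100;   break;   /* end of the result set: leave the loop */
      default:            fprintf(stderr, "rowno = %d, sqlite3 error %d\n", rowno, rc);   break;   /* log the error and keep stepping */
      }
      if (rowno >= 0) { rowno++; }
    }
    while (rowno >= 0);
    sqlite3_finalize(stmt);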
    

    Output:

    rowno: 950129, rowid: 4478623, ctype: 5002, id:001548532328, time:20191220105300
    rowno: 950130, rowid: 4478624, ctype: 5002, id:001548532328, time:20191220105400
    ...
    rowno: 950297, rowid: 4478791, ctype: 5002, id:001548532342, time:20191220105800
    rowno: 950298, rowid: 4478792, ctype: 5002, id:001548532342, time:20191220105900
    rowno: 950299, rowid: 4478793, ctype: 5002, id:001548532342, time:20191220110100
    rowno = 950300, sqlite3 error 11
    rowno: 950301, rowid: 3526517, ctype: 5002, id:005410401490, time:20191214202200
    

Logs

  • 20191220.log
    #12-20 11:21:30.098: write[38] 20191220-111700  ...
    #12-20 11:21:30.433: write[39] 20191220-111800  ...
    #12-20 11:21:30.434: write[40] 20191220-111900  ...
    

Checking the next day's database

  • The 2019/12/21 database
  • Database file size: 210 MB
    • Same size as the 2019/12/20 database
  • Check menu
    • OK

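Analysis

  • The 2019/12/20 database has the malformed problem, while the 2019/12/21 database does not. Possible explanations:
    • Possible cause 1: the database recovered on its own
      • The data retention period had not been reached, so it was not repaired automatically by a cleanup
      • The device has no repair function
    • Possible cause 2: a problem with how the database was copied
      • Timestamp of the last log entry: 12-20 11:24:21.057# ...
      • If the database was pulled over ftp first and the shell log was saved afterwards, the database may have been in the middle of a transaction while it was being pulled, which would cause this problem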
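
One cheap way to probe the copy hypothesis is to compare the page count recorded in the 100-byte database header with the actual size of the copied file. This is only a sketch under assumptions: the type and function names are made up for illustration, and the result is a hint rather than proof, since a file copied mid-transaction can be inconsistent even when the sizes agree.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SQLITE_HEADER_SIZE 100

/* Header fields used by this check (all stored big-endian in the file). */
typedef struct {
    uint32_t page_size;          /* offset 16, 2 bytes; the value 1 means 65536 */
    uint32_t change_counter;     /* offset 24 */
    uint32_t page_count;         /* offset 28: in-header database size in pages */
    uint32_t version_valid_for;  /* offset 92 */
} db_header;

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parses the first SQLITE_HEADER_SIZE bytes of a database file.
 * Returns 0 on success, -1 if the magic string does not match. */
int parse_db_header(const unsigned char *buf, db_header *out)
{
    assert(buf != NULL && out != NULL);
    /* The magic string is 15 characters plus the terminating NUL. */
    if (memcmp(buf, "SQLite format 3", 16) != 0)
        return -1;
    uint32_t raw_page_size = ((uint32_t)buf[16] << 8) | buf[17];
    out->page_size = (raw_page_size == 1) ? 65536u : raw_page_size;
    out->change_counter = be32(buf + 24);
    out->page_count = be32(buf + 28);
    out->version_valid_for = be32(buf + 92);
    return 0;
}

/* Returns 1 if page_count * page_size equals file_size, 0 if it does not,
 * and -1 if the in-header size cannot be trusted (it is zero, or the
 * change counter differs from the version-valid-for number). */
int header_matches_file_size(const db_header *h, uint64_t file_size)
{
    assert(h != NULL);
    if (h->page_count == 0 || h->change_counter != h->version_valid_for)
        return -1;
    return ((uint64_t)h->page_count * h->page_size == file_size) ? 1 : 0;
}
```

In practice the buffer would be filled by reading the first 100 bytes of the pulled file, and file_size would come from the file system. It is also worth checking whether a -journal file existed next to the database on the device at the time of the copy.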

Conclusion

  • The database was executing a data-write transaction while it was being pulled over ftp, which produced the malformed problem
