Fixing an error when upgrading GitLab from 14.4.2 to 15.11.2


Error message: PG::CheckViolation: ERROR: check constraint "check_70f294ef54" is violated by some row

Before the upgrade, the GitLab CE version (GitLab Omnibus package) was 14.4.2.

Upgrade path

The official Upgrade Path tool suggested the following path: 14.9.5 => 14.10.5 => 15.0.5 => 15.4.6 => 15.5.9 => 15.11.2

I downloaded the rpm package for each version in advance and then started upgrading, running gitlab-ctl restart after each rpm -Uvh.

rpm -Uvh gitlab-ce-14.9.5-ce.0.el7.x86_64.rpm
rpm -Uvh gitlab-ce-14.10.5-ce.0.el7.x86_64.rpm
rpm -Uvh gitlab-ce-15.0.5-ce.0.el7.x86_64.rpm
rpm -Uvh gitlab-ce-15.4.6-ce.0.el7.x86_64.rpm
rpm -Uvh gitlab-ce-15.5.9-ce.0.el7.x86_64.rpm
rpm -Uvh gitlab-ce-15.11.2-ce.0.el7.x86_64.rpm

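GitLab automatically backs up the database before an upgrade. To skip the backup, create an empty marker file (delete it manually once the upgrade is finished).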
touch /etc/gitlab/skip-auto-backup

After all rpm packages were installed and GitLab was restarted, the version showed 15.11.2, but some problems appeared.

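A likely cause is that I moved on to the next rpm package while the database upgrade of the previous version (schema and column changes) was still in progress.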

Problems after the upgrade

  1. Reloading the configuration with gitlab-ctl reconfigure fails.
  2. Some settings in the GitLab web admin UI return a 500 error on save, so they cannot be saved.

Error output of gitlab-ctl reconfigure (partial):

STDOUT: rake aborted!
StandardError: An error has occurred, all later migrations canceled:

PG::CheckViolation: ERROR:  check constraint "check_70f294ef54" is violated by some row
……
Caused by:
ActiveRecord::StatementInvalid: PG::CheckViolation: ERROR:  check constraint "check_70f294ef54" is violated by some row
……

Caused by:
PG::CheckViolation: ERROR:  check constraint "check_70f294ef54" is violated by some row
……
Tasks: TOP => db:migrate
(See full trace by running task with --trace)
Running db:migrate rake task
main: == [advisory_lock_connection] object_id: 224680, pg_backend_pid: 37008
main: == 20230223014251 ValidateNotNullConstraintOnOauthAccessTokensExpiresIn: migrating 
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0004s
main: -- execute("ALTER TABLE oauth_access_tokens VALIDATE CONSTRAINT check_70f294ef54;")
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
STDERR: 
---- End output of "bash"  ----
Ran "bash"  returned 1

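In other words, the error occurred during the db:migrate database migration, while executing the SQL statement ALTER TABLE oauth_access_tokens VALIDATE CONSTRAINT check_70f294ef54. Judging by the migration name (ValidateNotNullConstraintOnOauthAccessTokensExpiresIn), the constraint requires expires_in to be non-NULL, so validation fails as long as any row in oauth_access_tokens still has a NULL expires_in.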

Check the database migration status:

Command: gitlab-rake db:migrate:status

   up     20230222102421  Remove fk to ci build ci running build on build
   up     20230222153048  Add registry size estimated to namespace root storage statistics
   up     20230222161226  Add custom jira regex to jira tracker data
   ……
  down    20230227123950  Remove fk to ci builds ci sources pipelines on source job
  down    20230227151608  Validate fk on ci build trace metadata partition id and build
  down    20230227151609  Remove fk to ci builds ci build trace metadata on build

Any down entry means part of the database schema has not been migrated yet; the migration is complete only when every entry is up.
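
As a rough sketch of automating that check, the status listing can be filtered for down rows. The function name count_down_migrations is my own invention, not a GitLab tool, and it assumes the status is the first column of each row, as in the excerpt above.

```shell
# Hypothetical helper (not part of GitLab): counts migrations reported as "down".
# Reads `gitlab-rake db:migrate:status` output on stdin and assumes the status
# is the first whitespace-separated field of each row.
count_down_migrations() {
  awk '$1 == "down" { n++ } END { print n + 0 }'
}

# Intended use on an Omnibus install (commented out so the snippet has no side effects):
#   gitlab-rake db:migrate:status | count_down_migrations
```

A result of 0 would mean every listed migration is up.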

Check the background migration queue

Method 1: open the GitLab web UI: Admin Area -> Monitoring -> Background Migrations
Method 2: use the command line, for example:

# sudo gitlab-rails runner -e production 'puts Gitlab::BackgroundMigration.remaining'
0

The following shows 27 batched migrations still queued:
# sudo gitlab-rails runner -e production 'puts Gitlab::Database::BackgroundMigration::BatchedMigration.queued.count'
27

# sudo gitlab-rails runner -e production 'puts Gitlab::Database::BackgroundMigration::BatchedMigration.with_status(:failed).count'
0
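
Rather than re-running these commands by hand, one could poll until the queue drains. The following is only a sketch: wait_until_zero and queued_batched_migrations are names I made up, and the retry count and delay are arbitrary defaults. The runner expression is the one shown above.

```shell
# Hypothetical helper: polls a counting command until it prints 0.
#   $1 = command (or shell function) that prints a number
#   $2 = maximum number of polls (default 60)
#   $3 = seconds to sleep between polls (default 30)
wait_until_zero() {
  wuz_cmd=$1
  wuz_tries=${2:-60}
  wuz_delay=${3:-30}
  wuz_i=0
  while [ "$wuz_i" -lt "$wuz_tries" ]; do
    wuz_n=$("$wuz_cmd") || return 2     # the counting command itself failed
    [ "$wuz_n" -eq 0 ] && return 0      # nothing left in the queue
    echo "still pending: $wuz_n" >&2
    wuz_i=$((wuz_i + 1))
    sleep "$wuz_delay"
  done
  return 1                              # gave up: the count never reached 0
}

# Wraps the counting command quoted in the text above.
queued_batched_migrations() {
  gitlab-rails runner -e production \
    'puts Gitlab::Database::BackgroundMigration::BatchedMigration.queued.count'
}

# Example (commented out): wait_until_zero queued_batched_migrations 120 30
```

A non-zero return means the queue did not empty within the allotted polls, which is a signal to investigate rather than to proceed.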

Check the status of the GitLab installation

Run: gitlab-rake gitlab:check SANITIZE=true

The output reports "All migrations up? ... no", meaning the schema migration to the new GitLab version is not complete.

# gitlab-rake gitlab:check

Checking LDAP ... Finished

Checking GitLab App ...

Database config exists? ... yes
All migrations up? ... no
  Try fixing it:
  sudo -u git -H bundle exec rake db:migrate RAILS_ENV=production
  Please fix the error above and rerun the checks.
Database contains orphaned GroupMembers? ... no

The command suggested here applies to installations from source:
sudo -u git -H bundle exec rake db:migrate RAILS_ENV=production
The equivalent command for a GitLab Omnibus package installation is:
gitlab-rake db:migrate

Run the database migration manually:

Command: gitlab-rake db:migrate

It fails with the same message as gitlab-ctl reconfigure, pointing to PG::CheckViolation: ERROR: check constraint "check_70f294ef54" is violated by some row

Full output:

# gitlab-rake db:migrate
main: == [advisory_lock_connection] object_id: 223220, pg_backend_pid: 56272
main: == 20230223014251 ValidateNotNullConstraintOnOauthAccessTokensExpiresIn: migrating 
main: -- execute("SET statement_timeout TO 0")
main:    -> 0.0005s
main: -- execute("ALTER TABLE oauth_access_tokens VALIDATE CONSTRAINT check_70f294ef54;")
main: -- execute("RESET statement_timeout")
main:    -> 0.0003s
rake aborted!
StandardError: An error has occurred, all later migrations canceled:

PG::CheckViolation: ERROR:  check constraint "check_70f294ef54" is violated by some row
……
/opt/gitlab/embedded/bin/bundle:23:in `load'
/opt/gitlab/embedded/bin/bundle:23:in `<main>'
Tasks: TOP => db:migrate
(See full trace by running task with --trace)

Solution:

See https://gitlab.com/gitlab-org/gitlab/-/issues/406885

Chris Nightingale @cnightingale · 5 days ago Developer

@sudee4255 were you able to fix this?

We just ran into this problem for another customer and came across your issue while researching. We had good results when we ran an update query on the database and updated all that were NULL to an integer value so that this constraint can be successfully applied by the migration.

e.g: UPDATE oauth_access_tokens SET expires_in = '7200' WHERE expires_in IS NULL;

After this, you should be able to re-run the migrations with:
sudo gitlab-rake db:migrate

Care to give it a try?

Log in to the database console and run the SQL:

# gitlab-rails dbconsole --database main

gitlabhq_production=> UPDATE oauth_access_tokens SET expires_in = '7200' WHERE expires_in IS NULL;
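
Before running an UPDATE like this, it can help to see how many rows it will touch. The sketch below only prints SQL; print_fix_sql is a hypothetical helper name, and wrapping the statements in a transaction with before and after counts is my own addition, not something the issue comment prescribes. The default of 7200 is the value from the quoted comment.

```shell
# Hypothetical helper: prints the fix as a single transaction, with a count of
# NULL rows before and after the UPDATE so the effect can be reviewed.
#   $1 = value to store in expires_in (defaults to 7200, as in the issue comment)
print_fix_sql() {
  pfs_ttl=${1:-7200}
  case $pfs_ttl in
    ''|*[!0-9]*) echo "expires_in must be a non-negative integer" >&2; return 1 ;;
  esac
  cat <<SQL
BEGIN;
SELECT count(*) AS null_rows_before FROM oauth_access_tokens WHERE expires_in IS NULL;
UPDATE oauth_access_tokens SET expires_in = ${pfs_ttl} WHERE expires_in IS NULL;
SELECT count(*) AS null_rows_after FROM oauth_access_tokens WHERE expires_in IS NULL;
COMMIT;
SQL
}
```

The printed statements can be pasted into the dbconsole session shown above; null_rows_after should come back as 0 before the COMMIT is reached.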

Then run the database migration:
gitlab-rake db:migrate

This time it finished without errors. Then reload the configuration:
gitlab-ctl reconfigure

With that, the problem was fixed.

Summary

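After upgrading to each version, check whether any schema migrations or background migration jobs are still pending.
If so, wait for them to finish before upgrading to the next version.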
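
That rule can be written down as a small gate to run between rpm -Uvh steps. This is a minimal sketch under my own assumptions: ready_for_next_upgrade is a hypothetical name, and the three counts are expected to come from the commands used in this post (down rows in db:migrate:status, queued batched migrations, failed batched migrations).

```shell
# Hypothetical gate: succeeds only when nothing is pending or failed.
#   $1 = schema migrations still "down"
#   $2 = queued batched background migrations
#   $3 = failed batched background migrations
ready_for_next_upgrade() {
  for rfu_n in "$1" "$2" "$3"; do
    case $rfu_n in
      ''|*[!0-9]*) echo "not a count: '$rfu_n'" >&2; return 2 ;;
    esac
  done
  if [ "$3" -gt 0 ]; then
    echo "failed background migrations need attention first" >&2
    return 1
  fi
  # Ready only when no schema migration is down and the queue is empty.
  [ "$1" -eq 0 ] && [ "$2" -eq 0 ]
}
```

With the 27 queued migrations seen in this post, ready_for_next_upgrade 0 27 0 would return non-zero, so the next package would not be installed yet.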

Related commands

gitlab-rake db:migrate:status
gitlab-rake gitlab:check SANITIZE=true
gitlab-rails runner -e production 'puts Gitlab::BackgroundMigration.remaining'
