案例环境:
数据库版本: Microsoft SQL Server 2005 (Microsoft SQL Server 2005 - 9.00.5000.00 (X64) )
案例介绍:
对一个数据库实例做清理工作时,发现有一个很久之前禁用的数据库维护作业,于是遂删除该作业,但是删除该作业时,遇到如下错误:
脚本删除操作:
USE [msdb]
GO
EXEC msdb.dbo.sp_delete_job @job_id=N'876ab683-6d81-47c4-bba2-0dfa58156110',
@delete_unused_schedule=1
GO
消息 547,级别 16,状态 0,过程 sp_delete_job,第 178 行
The DELETE statement conflicted with the REFERENCE constraint "FK_subplan_job_id". The conflict occurred in database "msdb",
table "dbo.sysmaintplan_subplans", column 'job_id'.The statement has been terminated.
图形界面操作:
案例分析:
从错误信息我们可以看出是删除某个系统表中记录时,由于外键约束关系,导致删除失败。最后导致存储过程msdb.dbo.sp_delete_job执行失败。我想彻底弄清楚删除失败的具体原因,于是可以从提示信息的系统表dbo.sysmaintplan_subplans开始,如下所示,
可以看到系统表dbo.sysmaintplan_subplans中的job_id字段引用了msdb.dbo.sysjobs中的job_id字段,那么可以肯定是在删除msdb.dbo.sysjobs表中对应记录时,没有先删除dbo.sysmaintplan_subplans中的记录。这样推测也跟报错信息吻合。
那么接下来我们研究一下msdb数据库的存储过程[dbo].[sp_delete_job] 如下所示:
USE [msdb]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
ALTER PROCEDURE [dbo].[sp_delete_job]
@job_id UNIQUEIDENTIFIER = NULL, -- If provided should NOT also provide job_name
@job_name sysname = NULL, -- If provided should NOT also provide job_id
@originating_server sysname = NULL, -- Reserved (used by SQLAgent)
@delete_history BIT = 1, -- Reserved (used by SQLAgent)
@delete_unused_schedule BIT = 1 -- For backward compatibility schedules are deleted by default if they are not
-- being used by another job. With the introduction of reusable schedules in V9
-- callers should set this to 0 so the schedule will be preserved for reuse.
AS
BEGIN
DECLARE @current_msx_server sysname
DECLARE @bMSX_job BIT
DECLARE @retval INT
DECLARE @local_machine_name sysname
DECLARE @category_id INT
DECLARE @job_owner_sid VARBINARY(85)
SET NOCOUNT ON
-- Remove any leading/trailing spaces from parameters
SELECT @originating_server = UPPER(LTRIM(RTRIM(@originating_server)))
-- Turn [nullable] empty string parameters into NULLs
IF (@originating_server = N'') SELECT @originating_server = NULL
-- Change server name to always reflect real servername or servername\instancename
IF (@originating_server IS NOT NULL AND @originating_server = '(LOCAL)')
SELECT @originating_server = UPPER(CONVERT(sysname, SERVERPROPERTY('ServerName')))
IF ((@job_id IS NOT NULL) OR (@job_name IS NOT NULL))
BEGIN
EXECUTE @retval = sp_verify_job_identifiers '@job_name',
'@job_id',
@job_name OUTPUT,
@job_id OUTPUT,
@owner_sid = @job_owner_sid OUTPUT
IF (@retval <> 0)
RETURN(1) -- Failure
END
-- We need either a job name or a server name, not both
IF ((@job_name IS NULL) AND (@originating_server IS NULL)) OR
((@job_name IS NOT NULL) AND (@originating_server IS NOT NULL))
BEGIN
RAISERROR(14279, -1, -1)
RETURN(1) -- Failure
END
-- Get category to see if it is a misc. replication agent. @category_id will be
-- NULL if there is no @job_id.
select @category_id = category_id from msdb.dbo.sysjobs where job_id = @job_id
-- If job name was given, determine if the job is from an MSX
IF (@job_id IS NOT NULL)
BEGIN
SELECT @bMSX_job = CASE UPPER(originating_server)
WHEN UPPER(CONVERT(sysname, SERVERPROPERTY('ServerName'))) THEN 0
ELSE 1
END
FROM msdb.dbo.sysjobs_view
WHERE (job_id = @job_id)
END
-- If server name was given, warn user if different from current MSX
IF (@originating_server IS NOT NULL)
BEGIN
EXECUTE @retval = master.dbo.xp_getnetname @local_machine_name OUTPUT
IF (@retval <> 0)
RETURN(1) -- Failure
IF ((@originating_server = UPPER(CONVERT(sysname, SERVERPROPERTY('ServerName')))) OR (@originating_server = UPPER(@local_machine_name)))
SELECT @originating_server = UPPER(CONVERT(sysname, SERVERPROPERTY('ServerName')))
EXECUTE master.dbo.xp_instance_regread N'HKEY_LOCAL_MACHINE',
N'SOFTWARE\Microsoft\MSSQLServer\SQLServerAgent',
N'MSXServerName',
@current_msx_server OUTPUT,
N'no_output'
SELECT @current_msx_server = UPPER(@current_msx_server)
-- If server name was given but it's not the current MSX, print a warning
SELECT @current_msx_server = LTRIM(RTRIM(@current_msx_server))
IF ((@current_msx_server IS NOT NULL) AND (@current_msx_server <> N'') AND (@originating_server <> @current_msx_server))
RAISERROR(14224, 0, 1, @current_msx_server)
END
-- Check authority (only SQLServerAgent can delete a non-local job)
IF (((@originating_server IS NOT NULL) AND (@originating_server <> UPPER(CONVERT(sysname, SERVERPROPERTY('ServerName'))))) OR (@bMSX_job = 1)) AND
(PROGRAM_NAME() NOT LIKE N'SQLAgent%')
BEGIN
RAISERROR(14274, -1, -1)
RETURN(1) -- Failure
END
-- Check permissions beyond what's checked by the sysjobs_view
-- SQLAgentReader and SQLAgentOperator roles that can see all jobs
-- cannot delete jobs they do not own
IF (@job_id IS NOT NULL)
BEGIN
IF (@job_owner_sid <> SUSER_SID() -- does not own the job
AND (ISNULL(IS_SRVROLEMEMBER(N'sysadmin'), 0) <> 1)) -- is not sysadmin
BEGIN
RAISERROR(14525, -1, -1);
RETURN(1) -- Failure
END
END
-- Do the delete (for a specific job)
IF (@job_id IS NOT NULL)
BEGIN
-- Note: This temp table is referenced by msdb.dbo.sp_delete_job_references
CREATE TABLE #temp_jobs_to_delete (job_id UNIQUEIDENTIFIER NOT NULL, job_is_cached INT NOT NULL)
DECLARE @temp_schedules_to_delete TABLE (schedule_id INT NOT NULL)
INSERT INTO #temp_jobs_to_delete
SELECT job_id, (SELECT COUNT(*)
FROM msdb.dbo.sysjobservers
WHERE (job_id = @job_id)
AND (server_id = 0))
FROM msdb.dbo.sysjobs_view
WHERE (job_id = @job_id)
-- Check if we have any work to do
IF (NOT EXISTS (SELECT *
FROM #temp_jobs_to_delete))
BEGIN
DROP TABLE #temp_jobs_to_delete
RETURN(0) -- Success
END
-- Post the delete to any target servers (need to do this BEFORE
-- deleting the job itself, but AFTER clearing all all pending
-- download instructions). Note that if the job is NOT a
-- multi-server job then sp_post_msx_operation will catch this and
-- will do nothing. Since it will do nothing that is why we need
-- to NOT delete any pending delete requests, because that delete
-- request might have been for the last target server and thus
-- this job isn't a multi-server job anymore so posting the global
-- delete would do nothing.
DELETE FROM msdb.dbo.sysdownloadlist
WHERE (object_id = @job_id)
and (operation_code != 3) -- Delete
EXECUTE msdb.dbo.sp_post_msx_operation 'DELETE', 'JOB', @job_id
-- Must do this before deleting the job itself since sp_sqlagent_notify does a lookup on sysjobs_view
-- Note: Don't notify agent in this call. It is done after the transaction is committed
-- just in case this job is in the process of deleting itself
EXECUTE msdb.dbo.sp_delete_job_references @notify_sqlagent = 0
-- Delete all traces of the job
BEGIN TRANSACTION
--Get the schedules to delete before deleting records from sysjobschedules
IF(@delete_unused_schedule = 1)
BEGIN
--Get the list of schedules to delete
INSERT INTO @temp_schedules_to_delete
SELECT DISTINCT schedule_id
FROM msdb.dbo.sysschedules
WHERE (schedule_id IN
(SELECT schedule_id
FROM msdb.dbo.sysjobschedules
WHERE (job_id = @job_id)))
END
DELETE FROM msdb.dbo.sysjobschedules
WHERE job_id IN (SELECT job_id FROM #temp_jobs_to_delete)
DELETE FROM msdb.dbo.sysjobservers
WHERE job_id IN (SELECT job_id FROM #temp_jobs_to_delete)
DELETE FROM msdb.dbo.sysjobsteps
WHERE job_id IN (SELECT job_id FROM #temp_jobs_to_delete)
DELETE FROM msdb.dbo.sysjobs
WHERE job_id IN (SELECT job_id FROM #temp_jobs_to_delete)
--Delete the schedule(s) if requested to and it isn't being used by other jobs
IF(@delete_unused_schedule = 1)
BEGIN
--Now OK to delete the schedule
DELETE FROM msdb.dbo.sysschedules
WHERE schedule_id IN
(SELECT schedule_id
FROM @temp_schedules_to_delete as sdel
WHERE NOT EXISTS(SELECT *
FROM msdb.dbo.sysjobschedules AS js
WHERE (js.schedule_id = sdel.schedule_id)))
END
-- Delete the job history if requested
IF (@delete_history = 1)
BEGIN
DELETE FROM msdb.dbo.sysjobhistory
WHERE job_id IN (SELECT job_id FROM #temp_jobs_to_delete)
END
-- All done
COMMIT TRANSACTION
-- Now notify agent to delete the job.
IF(EXISTS(SELECT * FROM #temp_jobs_to_delete WHERE job_is_cached > 0))
BEGIN
DECLARE @nt_user_name NVARCHAR(100)
SELECT @nt_user_name = ISNULL(NT_CLIENT(), ISNULL(SUSER_SNAME(), FORMATMESSAGE(14205)))
--Call the xp directly. sp_sqlagent_notify checks sysjobs_view and the record has already been deleted
EXEC master.dbo.xp_sqlagent_notify N'J', @job_id, 0, 0, N'D', @nt_user_name, 1, @@trancount, NULL, NULL
END
END
ELSE
-- Do the delete (for all jobs originating from the specific server)
IF (@originating_server IS NOT NULL)
BEGIN
EXECUTE msdb.dbo.sp_delete_all_msx_jobs @msx_server = @originating_server
-- NOTE: In this case there is no need to propagate the delete via sp_post_msx_operation
-- since this type of delete is only ever performed on a TSX.
END
IF (OBJECT_ID(N'tempdb.dbo.#temp_jobs_to_delete', 'U') IS NOT NULL)
DROP TABLE #temp_jobs_to_delete
RETURN(0) -- 0 means success
END
从上面SQL脚本中可以看到在删除msdb.dbo.sysjobsteps之前,该存储过程执行了msdb.dbo.sp_delete_job_references
USE [msdb]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER OFF
GO
ALTER PROCEDURE [dbo].[sp_delete_job_references]
@notify_sqlagent BIT = 1
AS
BEGIN
DECLARE @deleted_job_id UNIQUEIDENTIFIER
DECLARE @task_id_as_char VARCHAR(10)
DECLARE @job_is_cached INT
DECLARE @alert_name sysname
DECLARE @maintplan_plan_id UNIQUEIDENTIFIER
DECLARE @maintplan_subplan_id UNIQUEIDENTIFIER
-- Keep SQLServerAgent's cache in-sync and cleanup any 'webtask' cross-references to the deleted job(s)
-- NOTE: The caller must have created a table called #temp_jobs_to_delete of the format
-- (job_id UNIQUEIDENTIFIER NOT NULL, job_is_cached INT NOT NULL).
DECLARE sqlagent_notify CURSOR LOCAL
FOR
SELECT job_id, job_is_cached
FROM #temp_jobs_to_delete
OPEN sqlagent_notify
FETCH NEXT FROM sqlagent_notify INTO @deleted_job_id, @job_is_cached
WHILE (@@fetch_status = 0)
BEGIN
-- NOTE: We only notify SQLServerAgent if we know the job has been cached
IF(@job_is_cached = 1 AND @notify_sqlagent = 1)
EXECUTE msdb.dbo.sp_sqlagent_notify @op_type = N'J',
@job_id = @deleted_job_id,
@action_type = N'D'
IF (EXISTS (SELECT *
FROM master.dbo.sysobjects
WHERE (name = N'sp_cleanupwebtask')
AND (type = 'P')))
BEGIN
SELECT @task_id_as_char = CONVERT(VARCHAR(10), task_id)
FROM msdb.dbo.systaskids
WHERE (job_id = @deleted_job_id)
IF (@task_id_as_char IS NOT NULL)
EXECUTE ('master.dbo.sp_cleanupwebtask @taskid = ' + @task_id_as_char)
END
-- Maintenance plan cleanup for SQL 2005.
-- If this job came from another server and it runs a subplan of a
-- maintenance plan, then delete the subplan record. If that was
-- the last subplan still referencing that plan, delete the plan.
-- This removes a distributed maintenance plan from a target server
-- once all of jobs from the master server that used that maintenance
-- plan are deleted.
SELECT @maintplan_plan_id = plans.plan_id, @maintplan_subplan_id = plans.subplan_id
FROM sysmaintplan_subplans plans, sysjobs_view sjv
WHERE plans.job_id = @deleted_job_id
AND plans.job_id = sjv.job_id
AND sjv.master_server = 1 -- This means the job came from the master
IF (@maintplan_subplan_id is not NULL)
BEGIN
EXECUTE sp_maintplan_delete_subplan @subplan_id = @maintplan_subplan_id, @delete_jobs = 0
IF (NOT EXISTS (SELECT *
FROM sysmaintplan_subplans
where plan_id = @maintplan_plan_id))
BEGIN
DECLARE @plan_name sysname
SELECT @plan_name = name
FROM sysmaintplan_plans
WHERE id = @maintplan_plan_id
EXECUTE sp_dts_deletepackage @name = @plan_name, @folderid = '08aa12d5-8f98-4dab-a4fc-980b150a5dc8' -- this is the guid for 'Maintenance Plans'
END
END
FETCH NEXT FROM sqlagent_notify INTO @deleted_job_id, @job_is_cached
END
DEALLOCATE sqlagent_notify
-- Remove systaskid references (must do this AFTER sp_cleanupwebtask stuff)
DELETE FROM msdb.dbo.systaskids
WHERE job_id IN (SELECT job_id FROM #temp_jobs_to_delete)
-- Remove sysdbmaintplan_jobs references (legacy maintenance plans prior to SQL 2005)
DELETE FROM msdb.dbo.sysdbmaintplan_jobs
WHERE job_id IN (SELECT job_id FROM #temp_jobs_to_delete)
-- Finally, clean up any dangling references in sysalerts to the deleted job(s)
DECLARE sysalerts_cleanup CURSOR LOCAL
FOR
SELECT name
FROM msdb.dbo.sysalerts
WHERE (job_id IN (SELECT job_id FROM #temp_jobs_to_delete))
OPEN sysalerts_cleanup
FETCH NEXT FROM sysalerts_cleanup INTO @alert_name
WHILE (@@fetch_status = 0)
BEGIN
EXECUTE msdb.dbo.sp_update_alert @name = @alert_name,
@job_id = 0x00
FETCH NEXT FROM sysalerts_cleanup INTO @alert_name
END
DEALLOCATE sysalerts_cleanup
END
而msdb.dbo.sp_delete_job_references这个存储过程又接着调用了存储过程sp_maintplan_delete_subplan,
USE [msdb]
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER OFF
GO
ALTER PROCEDURE [dbo].[sp_maintplan_delete_subplan]
@subplan_id UNIQUEIDENTIFIER,
@delete_jobs BIT = 1
AS
BEGIN
DECLARE @retval INT
DECLARE @job UNIQUEIDENTIFIER
DECLARE @jobMsx UNIQUEIDENTIFIER
SET NOCOUNT ON
SET @retval = 0
-- Raise an error if the @subplan_id doesn't exist
IF( NOT EXISTS(SELECT * FROM sysmaintplan_subplans WHERE subplan_id = @subplan_id))
BEGIN
DECLARE @subplan_id_as_char VARCHAR(36)
SELECT @subplan_id_as_char = CONVERT(VARCHAR(36), @subplan_id)
RAISERROR(14262, -1, -1, '@subplan_id', @subplan_id_as_char)
RETURN(1)
END
BEGIN TRAN
--Is there an Agent Job/Schedule associated with this subplan?
SELECT @job = job_id, @jobMsx = msx_job_id
FROM msdb.dbo.sysmaintplan_subplans
WHERE subplan_id = @subplan_id
EXEC @retval = msdb.dbo.sp_maintplan_delete_log @subplan_id = @subplan_id
IF (@retval <> 0)
BEGIN
ROLLBACK TRAN
RETURN @retval
END
-- Delete the subplans table entry first since it has a foreign
-- key constraint on its job_id existing in sysjobs.
DELETE msdb.dbo.sysmaintplan_subplans
WHERE (subplan_id = @subplan_id)
IF (@delete_jobs = 1)
BEGIN
--delete the local job associated with this subplan
IF (@job IS NOT NULL)
BEGIN
EXEC @retval = msdb.dbo.sp_delete_job @job_id = @job, @delete_unused_schedule = 1
IF (@retval <> 0)
BEGIN
ROLLBACK TRAN
RETURN @retval
END
END
--delete the multi-server job associated with this subplan.
IF (@jobMsx IS NOT NULL)
BEGIN
EXEC @retval = msdb.dbo.sp_delete_job @job_id = @jobMsx, @delete_unused_schedule = 1
IF (@retval <> 0)
BEGIN
ROLLBACK TRAN
RETURN @retval
END
END
END
COMMIT TRAN
RETURN (0)
END
也就是说最终在此存储过程sp_maintplan_delete_subplan中删除msdb.dbo.sysmaintplan_subplans 表中的记录。 过程梳理清楚了,那么逆向推导看看具体原因
如下所示,删除msdb.dbo.sysmaintplan_subplans中对应记录语句如下
此时要看参数@subplan_id的取值,它从msdb.dbo.sp_delete_job_references中传入,如下所示
ALTER PROCEDURE [dbo].[sp_maintplan_delete_subplan]
@subplan_id UNIQUEIDENTIFIER,
@delete_jobs BIT = 1
AS
…………………………………………………………………
在[dbo].[sp_delete_job_references]中,它的值来自于 @maintplan_subplan_id变量,最终来自于sysmaintplan_subplans系统表
SELECT @maintplan_plan_id = plans.plan_id,
@maintplan_subplan_id = plans.subplan_id
FROM sysmaintplan_subplans plans, sysjobs_view sjv
WHERE plans.job_id = @deleted_job_id
AND plans.job_id = sjv.job_id
AND sjv.master_server = 1 -- This means the job came from the master
我通过DAC登录数据库(sysmaintplan_subplans是内部对象,此对象在DAC下才可以访问),查询如下所示,你会发现无记录,也就是说@maintplan_subplan_id为NULL值,导致后面执行删除msdb.dbo.sysmaintplan_subplans表中记录时,没有真正的删除记录。
最后发现导致查询无记录的原因在于查询条件sjv.master_server = 1
sysjob_view视图代码如下所示:
CREATE VIEW sysjobs_view
AS
SELECT jobs.job_id,
svr.originating_server,
jobs.name,
jobs.enabled,
jobs.description,
jobs.start_step_id,
jobs.category_id,
jobs.owner_sid,
jobs.notify_level_eventlog,
jobs.notify_level_email,
jobs.notify_level_netsend,
jobs.notify_level_page,
jobs.notify_email_operator_id,
jobs.notify_netsend_operator_id,
jobs.notify_page_operator_id,
jobs.delete_level,
jobs.date_created,
jobs.date_modified,
jobs.version_number,
jobs.originating_server_id,
svr.master_server
FROM msdb.dbo.sysjobs as jobs
JOIN msdb.dbo.sysoriginatingservers_view as svr
ON jobs.originating_server_id = svr.originating_server_id
--LEFT JOIN msdb.dbo.sysjobservers js ON jobs.job_id = js.job_id
WHERE (owner_sid = SUSER_SID())
OR (ISNULL(IS_SRVROLEMEMBER(N'sysadmin'), 0) = 1)
OR (ISNULL(IS_MEMBER(N'SQLAgentReaderRole'), 0) = 1)
OR ( (ISNULL(IS_MEMBER(N'TargetServersRole'), 0) = 1) AND
(EXISTS(SELECT * FROM msdb.dbo.sysjobservers js
WHERE js.server_id <> 0 AND js.job_id = jobs.job_id))) -- filter out local jobs
继续往下扒,视图dbo.sysoriginatingservers_view代码如下所示,
CREATE VIEW dbo.sysoriginatingservers_view(originating_server_id, originating_server, master_server)
AS
SELECT
0 AS originating_server_id,
UPPER(CONVERT(sysname, SERVERPROPERTY('ServerName'))) AS originating_server,
0 AS master_server
UNION
SELECT
originating_server_id,
originating_server,
master_server
FROM
dbo.sysoriginatingservers
原来master_server的值是默认的。因为表dbo.sysoriginatingservers无记录。至此,可以看出,这应该是SQL Server 2005的一个BUG来的。
解决方案:
手工删除系统表msdb.dbo.sysmaintplan_subplans中的记录,然后删除该作业。问题搞定。
USE [msdb]
GO
DELETE FROM msdb.dbo.sysmaintplan_subplans WHERE SUBPLAN_ID='B9A639EB-955D-4AE6-B69E-860145C133E7';
USE [msdb]
GO
EXEC msdb.dbo.sp_delete_job @job_id=N'ce8cb4ad-c91f-45bc-9e21-b50947063fba', @delete_unused_schedule=1
GO