Take control of Oracle's queue with a step by step approach to getting rid of those pesky DBMS_JOBs.
Let's face it, Oracle's job scheduling facility is a wonderful tool for scheduling Oracle related jobs without having to maintain a cron job on Unix or an AT job in windows. It is also very robust and reliable. It is that very reliability and robustness that gives many of us our problems.
If you have any form of jobs running on your system, you will at one time or another come across the issue of a run-away job that just doesn't seem to want to end. Or maybe you will try and shutdown the database only to find out that it is waiting to complete a job. I would like to offer some help in the management of those job queues when they just don't seem to want to end or go away.
A while back I needed to find information on how to clear the job queue for jobs running with no apparent end in sight. Some had hung, while others just were taking a bad access path to data. I needed to bring down these jobs, do a bit of tuning and then restart the jobs. Well, to my amazement, there wasn't very much information out on the web that gave good insight into this process. Basically the method suggested was to first break the job and then issue an ALTER SYTEM KILL SESSION command. This method does not always work and unfortunately--never on my system, for the jobs I had. I then called Oracle support and basically got the same answer as I found out on the web. They did give me one added piece of information. They said, if the ALTER SYSTEM KILL SESSION didn't work, I was supposed to bounce my database in order to bring down the job queue processes. First of all, this wasn't an option and when I did get the opportunity to bounce the database box, many of the jobs seemed to come right back as strong as ever.
Before writing this article I did another quick search on the topic of killing dbms_jobs and to my amazement there still wasn't much good information out there. This is why I want to share my method, so that you won't be stuck up against the wall with this problem and nowhere to turn, as I was.
Lets first go through a few different methods of viewing the information about job queues.
Viewing scheduled dbms_jobs
When looking at what jobs have been scheduled, there is really only one view that you need to go to. The dba_jobs view contains all of the information you need, to see what has been scheduled, when they were last run, and if they are currently running. Use the following simple script to take a look. Bear with me on the sub-select, I will build on this query as we go on in the presentation.
scheduled_dbms_jobs.sql
set linesize 250 col log_user for a10 col job for 9999999 head 'Job' col broken for a1 head 'B' col failures for 99 head "fail" col last_date for a18 head 'Last|Date' col this_date for a18 head 'This|Date' col next_date for a18 head 'Next|Date' col interval for 9999.000 head 'Run|Interval' col what for a60 select j.log_user, j.job, j.broken, j.failures, j.last_date||':'||j.last_sec last_date, j.this_date||':'||j.this_sec this_date, j.next_date||':'||j.next_sec next_date, j.next_date - j.last_date interval, j.what from (select dj.LOG_USER, dj.JOB, dj.BROKEN, dj.FAILURES, dj.LAST_DATE, dj.LAST_SEC, dj.THIS_DATE, dj.THIS_SEC, dj.NEXT_DATE, dj.NEXT_SEC, dj.INTERVAL, dj.WHAT from dba_jobs dj) j;
What Jobs are Actually Running
A simple join to the dba_jobs_running view will give us a good handle on the scheduled jobs that are actually running at this time. This is done by a simple join through the job number. The new column of interest returned here is the sid which is the identifier of the process that is currently executing the job.
running_jobs.sql
set linesize 250 col sid for 9999 head 'Session|ID' col log_user for a10 col job for 9999999 head 'Job' col broken for a1 head 'B' col failures for 99 head "fail" col last_date for a18 head 'Last|Date' col this_date for a18 head 'This|Date' col next_date for a18 head 'Next|Date' col interval for 9999.000 head 'Run|Interval' col what for a60 select j.sid, j.log_user, j.job, j.broken, j.failures, j.last_date||':'||j.last_sec last_date, j.this_date||':'||j.this_sec this_date, j.next_date||':'||j.next_sec next_date, j.next_date - j.last_date interval, j.what from (select djr.SID, dj.LOG_USER, dj.JOB, dj.BROKEN, dj.FAILURES, dj.LAST_DATE, dj.LAST_SEC, dj.THIS_DATE, dj.THIS_SEC, dj.NEXT_DATE, dj.NEXT_SEC, dj.INTERVAL, dj.WHAT from dba_jobs dj, dba_jobs_running djr where dj.job = djr.job ) j;
What Sessions are Running the Jobs
Now that we have determined which jobs are currently running, we need to find which Oracle session and operating system process is accessing them. This is done through first joining v$process to v$session by way of paddr and addr which is the address of the processs that owns the sessions, and then joining the results back to the jobs running through the sid value. The new columns returned in our query are spid which is the operating system process identifier and serial# which is the session serial number.
session_jobs.sql
set linesize 250 col sid for 9999 head 'Session|ID' col spid head 'O/S|Process|ID' col serial# for 9999999 head 'Session|Serial#' col log_user for a10 col job for 9999999 head 'Job' col broken for a1 head 'B' col failures for 99 head "fail" col last_date for a18 head 'Last|Date' col this_date for a18 head 'This|Date' col next_date for a18 head 'Next|Date' col interval for 9999.000 head 'Run|Interval' col what for a60 select j.sid, s.spid, s.serial#, j.log_user, j.job, j.broken, j.failures, j.last_date||':'||j.last_sec last_date, j.this_date||':'||j.this_sec this_date, j.next_date||':'||j.next_sec next_date, j.next_date - j.last_date interval, j.what from (select djr.SID, dj.LOG_USER, dj.JOB, dj.BROKEN, dj.FAILURES, dj.LAST_DATE, dj.LAST_SEC, dj.THIS_DATE, dj.THIS_SEC, dj.NEXT_DATE, dj.NEXT_SEC, dj.INTERVAL, dj.WHAT from dba_jobs dj, dba_jobs_running djr where dj.job = djr.job ) j, (select p.spid, s.sid, s.serial# from v$process p, v$session s where p.addr = s.paddr ) s where j.sid = s.sid;
Now that we have a good handle on how we can look at the jobs and the key columns involved, let's go through the steps needed to bring down a job. The following is a 5 to 11 step process that should solve all of your problems.
Bringing Down a DBMS_JOB
1. Find the Job You Want to Bring Down
In order to do anything you first need to find the job that is giving you a headache. Go ahead and run the running_jobs.sql. This will give you the prime information, job, sid, serial#, and spid, for the following actions in bringing down the job.
2. Mark the DBMS_JOB as Broken
Use the following command for the job that you have to deal with.
SQL> EXEC DBMS_JOB.BROKEN(job#,TRUE);
All this command does is mark the job so that if we get it to stop, it won't start again. Let's make one thing perfectly clear, after executing this command the job is still running.
As a side note, if you are trying to shut down a database with jobs that run throughout the day, they may hinder your attempts to bring down the database cleanly. This is a wonderful command to make sure no jobs are executing during the shutdown process. Just be aware that you will need to mark the jobs as unbroken when the database comes back up, more on that later.
3. Kill the Oracle Session
Since the job is still running and it isn't going to end soon, you will need to kill the Oracle session that is executing the job. Use the following command for to kill the job.
ALTER SYSTEM KILL SESSION 'sid,serial#';
4. Kill the O/S Process
More often than not the previous step will still leave the job attached to the database and still running. When this happens you will need to go out to the operating system level and get rid of the process that has spawned from the running job. In order to do this you must login to the database box and issue the following command, depending on the type of operating system you have.
For Windows, at the DOS Prompt: orakill sid spid
For UNIX at the command line> kill '9 spid
The orakill is an Oracle command, while kill is a Unix command.
5. Check if the Job is Still Running
Re-run the session_jobs.sql script to see if you have gotten rid of the job. If you have there is no reason to go further. Usually steps 1 through 4 will be sufficient to get rid of a job but when the job is running wild you will have to continue with steps 6 through 11 which describes a process for bouncing the job queue process.
6. Determine the Current Number of Job Queue Processes
SQL> col value for a10
SQL> select name,value from v$parameter where name = 'job_queue_processes';
7. Alter the Job Queue to Zero
SQL> ALTER SYSTEM SET job_queue_processes = 0;
This will bring down the entire job queue processes.
8. Validate that No Processes are Using the Job Queue
Re-run the session_jobs.sql script to see if any jobs are still running. Since we have given a hard stop to the job queue and issued the kill commands, you can now wait until no more jobs are running. After all the jobs have quit running, you can do whatever maintenance or tuning you need to do before proceeding.
9. Mark the DBMS_JOB as Not Broken
You can now reset the broken job to not broken so they can run again. Just issue the command.
SQL>EXEC DBMS_JOB.BROKEN(job#,FALSE):
10. Alter the Job Queue to Original Value
Set the job queue to its' original value so that the jobs can run again.
ALTER SYSTEM SET job_queue_processes = original_value;
11. Validate that DBMS_JOB Is Running
To make sure everything is back to normal, re-run the above scripts to validate that jobs are scheduled, not broken, and are executing with the next and last dates columns changing.
Oracle have given us a great tool for scheduling activities within the database. As with many things inside the database, not everything goes as planned, nor are we given adequate tools to fix some of the problems we encounter. With the eleven steps outlined here, hopefully you will have increased your arsenal to handle those run away jobs that have given the best of us a few tense moments.