写PL/SQL procedure的时候,一定要写的够健壮、够完善;如果该procedure会有job来run它,更需要有完善的异常处理机制(捕获所有异常,包括WHEN OTHERS, 把出错信息写入日志。在异常处理中不要继续RAISE。这样运行该procedure的job就保证不会失败了)。要不然,任何未捕获异常的发生都会引起job的failure;在failure 16次后,则其就broken了。
!!!待解决!!!
procedure中加入完善的异常处理机制无法解决因job所用资源被其他线程锁定引起的job的failure问题。具体见itpub ....
最重要的三个链接:
http://docstore.mik.ua/orelly/oracle/bipack/ch13_01.htm
http://download.oracle.com/docs/cd/B10501_01/server.920/a96521/jobq.htm
http://www.itpub.net/viewthread.php?tid=296656&extra=&highlight=job&page=1
Oracle9i Database Administrator's Guide Release 2 - 10 Managing Job Queues:
http://download.oracle.com/docs/cd/B10501_01/server.920/a96521/jobq.htm
引用
Submitting a Job to the Job Queue
To submit a new job to the job queue, use the SUBMIT procedure in the DBMS_JOB package. You specify the following parameters with the SUBMIT procedure.
For the submitted job to run, you must issue a COMMIT statement immediately after the DBMS_JOB.SUBMIT statement.
Description of Procedure DBMS_JOB.RUN's Parameters:
引用
JOB An output parameter, this is the identifier assigned to the job you created. You must use this job number whenever you want to alter or remove the job.
WHAT This is the PL/SQL code you want to have executed.
The WHAT parameter must end with a semi-colon.
NEXT_DATE
The next date when the job will be run. The default value is SYSDATE.(如果使用的是默认的sysdate,则该job在submit后马上执行)
INTERVAL The date function that calculates the next time to execute the job. The default value is NULL. INTERVAL must evaluate to a future point in time or NULL.
This parameter is a VARCHAR2 and must be enclosed in single quotes.(如果使用的是默认的NULL,则该job第一次运行成功后就会停掉(deleted from the queue))
NO_PARSE
This is a flag. If NO_PARSE is set to FALSE (the default), Oracle parses the procedure associated with the job. If NO_PARSE is set to TRUE, Oracle parses the procedure associated with the job the first time that the job is executed. If, for example, you want to submit a job before you have created the tables associated with the job, set NO_PARSE to TRUE.
How a Job Becomes Broken
There are two ways a job can break:
1
Oracle has failed to successfully execute the job after 16 attempts.
2
You have marked the job as broken, using the procedure DBMS_JOB.BROKEN:
BEGIN
DBMS_JOB.BROKEN(14144, TRUE);
END;
/
Once a job has been marked as broken, Oracle will not attempt to execute the job
until you either mark the job as not broken, or force the job to be executed by calling the procedure DBMS_JOB.RUN.
Running Broken Jobs
If a problem has caused a job to fail 16 times, Oracle marks the job as broken. Once you have fixed this problem, you can run the job by either:
1 Forcing the job to run by calling DBMS_JOB.RUN
2 Marking the job as not broken by calling DBMS_JOB.BROKEN
and waiting for Oracle to execute the job(第三个参数next_date在broken为false时的作用!)
The following example marks job 14144 as not broken and sets its next execution date to the following Monday:
BEGIN
DBMS_JOB.BROKEN(14144, FALSE, NEXT_DAY(SYSDATE, 'MONDAY'));
END;
/
If you force the job to run by calling the procedure DBMS_JOB.RUN,
Oracle runs the job immediately. If the job succeeds, then Oracle labels the job as not broken and resets its count of the number of failed executions for the job to zero.
Once you reset a job's broken flag (by calling either RUN or BROKEN), job execution resumes according to the scheduled execution intervals set for the job.
The procedure RUN contains an implicit commit. Once you execute a job using RUN, you cannot roll back.(
除DBMS_JOB.RUN之外的其他PROCEDURE,在执行完之后都must issue a COMMIT statement immediately after the statement!!!)
Job Execution Errors
When a job fails,
information about the failure is recorded in a trace file and the alert log. Oracle writes message number ORA-12012 and includes the job number of the failed job.
The following can prevent the successful execution of queued jobs:
1
A network or instance failure
2
An exception when executing the job
If a job returns an error while Oracle is attempting to execute it, Oracle tries to execute it again. The first attempt is made after one minute, the second attempt after two minutes, the third after four minutes, and so on, with the interval doubling between each attempt. If the job fails 16 times, Oracle automatically marks the job as broken and no longer tries to execute it. However, between attempts, you have the opportunity to correct the problem that is preventing the job from running. This will not disturb the retry cycle, and Oracle will eventually attempt to run the job again.
BROKEN有三个参数:
引用
DBMS_JOB.BROKEN (
job IN BINARY_INTEGER,
broken IN BOOLEAN,
next_date IN DATE DEFAULT SYSDATE);
经试验得知:第三个参数next_date只有在第二参数为false时才有效;
当第二个参数为true时,job broken掉,并且broken掉后的该job的next_date恒为4000年1月1日(即使指定了第三个参数也没有用);
当第二个参数为false时,第三个参数用来指定该变正常job的下次执行时间(默认值是sysdate,即马上执行)。
由此得出通过run方法和通过置broken为false的方法来启动一个broken掉的job的区别:
run是马上执行该job;置broken的方法可以指定job的执行启动时间。
Oracle® Database PL/SQL Packages and Types Reference:
http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/toc.htm
DBMS_SCHEDULER package:
http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_sched.htm#CIHHBGGI
DBMS_JOB package:
http://download.oracle.com/docs/cd/B19306_01/appdev.102/b14258/d_job.htm#BABHCBFD
引用
Note about procedure DBMS_JOB.BROKEN:
If you set job as broken while it is running, Oracle resets the job's status to normal after the job completes. Therefore, only execute this procedure for jobs that are not running.
REMOVE Procedure:
This procedure removes an existing job from the job queue.
This currently does not stop a running job.
You must issue a COMMIT statement immediately after execute the remove procedure statement.
about DBA_JOBS and USER_JOBS(都是sys用户的view):
http://download.oracle.com/docs/cd/B28359_01/server.111/b28320/statviews_4093.htm
引用
DBA_JOBS describes all jobs in the database.
USER_JOBS describes the jobs owned by the current user. Its columns are the same as those in DBA_JOBS.
其字段详解(重要字段标记为蓝色^_^):
Oracle DBMS_JOB:
http://psoug.org/reference/dbms_job.html
Moving from DBMS_JOB to DBMS_SCHEDULER:
http://download.oracle.com/docs/cd/B19306_01/server.102/b14231/jobtosched.htm
DBMS_SCHEDULER as a new alternative for DBMS_JOB:
http://technology.amis.nl/blog/1158/dbms_scheduler-as-a-new-alternative-for-dbms_job
Use DBMS_SCHEDULER to replace DBMS_JOB:
http://pandazen.wordpress.com/2008/09/19/use-dbms_scheduler-to-replace-dbms_job/
Using the power of DBMS_JOB.SUBMIT:
http://www.orafaq.com/node/871
DBMS JOB FAQ:
http://www.orafaq.com/wiki/DBMS_JOB
如何彻底停止job:
http://www.itpub.net/viewthread.php?tid=100926&highlight=job
停止删除oracle中正在执行的job:
http://pengfeng.iteye.com/blog/422549
引用
Note that, once a job is started and running, there is no easy way to stop the job.(也就是说,没有给你提供直接停掉的方法;想停掉,参照上链接,麻烦点整呗)
定制可定期结束运行的JOB:
http://www.itpub.net/viewthread.php?tid=151427&highlight=job
关于NEXT_DATE:
本次run成功之后,才会去计算并更新下次执行的NEXT_DATE;
本次执行失败:不会去计算NEXT_DATE,也就不存在“下次执行”这个概念了,转入"16次尝试重新执行"处理;
本次执行成功:NEXT_DATE的计算方式为
NEXT_DATE := greatest(this_date+interval,sysdate)(注:this_date为本次执行的开始时间)。故:以本次成功执行的结束时间点作为参考,本次job run耗时小于时间间隔interval,则下次执行在"interval-<job_run_耗时>"后开始;大于时间间隔interval,则下次执行马上开始。
Oracle作业(JOB)更新next_date的探讨:
http://blog.csdn.net/kamus/archive/2004/12/02/201377.aspx
http://www.itpub.net/viewthread.php?tid=296656&extra=&highlight=job&page=1
引用
Q:
按照斑竹所说JOB是根据开始执行的时间+INTERVAL来决定下一次运行时间的,并且是在上一次运行完才更新NEXT_DAte的。那么假如我的JOB开始执行时间是6点,INTERVAL是30分钟,而我的JOB程序运行需要1个小时,即7点运行完毕,此时更新后的NEXT_DATE为6:30,这是不是说我的JOB从此以后就再也不会运行了呀?
A:
根据trace文件中更新next_date的语句
update sys.job$ set failures=0, this_date=null, flag=:1, last_date=:2, next_date = greatest(:3, sysdate), total=total+(sysdate-nvl(this_date,sysdate)) where job=:4
可以看到有greatest(:3, sysdate),此处的:3绑定的是job的this_date+interval。所以实际上应该是有一个跟当前时间的比较机制,如果现在的时间更晚一些,那么next_date就用现在的时间。
可以理解成在生成next_date的时候采用了类似next_date := decode(sign(sysdate - next_date), 1, sysdate, next_date)的机制
所以,问题中job的下次执行时间应该在7:00。
基于此,可以认为:如果一个job的执行时间间隔INTERVAL过短,每次执行用时都会超过这个INTERVAL,则该job就会不停的run!
如果作业失败了,ORACLE将自动尝试再次运行它,该作业将在失败后的一分钟以后再次运行。如果这次尝试也失败了,那么下一次尝试是在两分钟以后,每次尝试失败了,间隔时间为前一次的2倍,如超过了作业的间隔时间,将会使用执行间隔时间,总次数为16次,16次还失败标记为失效(broken),失效作业不再自动运行。
一本经典书《Oracle Built-in Packages》中关于job的经典论述节选:
http://docstore.mik.ua/orelly/oracle/bipack/ch13_01.htm
When job execution fails, the SNP processes attempts to rerun the job one minute later. If this run fails, another attempt is made in two minutes and another in four minutes. The job queue doubles the retry interval until it exceeds the normal execution interval, which is then used. After 16 consecutive failures, the job is flagged as broken and will not be re-executed by the job queue without user intervention.
The next_date parameter can be set to a time in the past. Jobs are chosen for execution in order of their next execution dates, so setting a job's next_date back can effectively move the job ahead in the queue. This can be useful in systems where the job queue processes are not keeping up with jobs to be executed, and a specific job needs to be executed as soon as possible.
The interval parameter is a character string representing a valid Oracle date expression. This date expression is evaluated each time the job begins execution. When a job completes successfully, this date becomes the next execution date for the job. It is important to remember that interval evaluation and updating the job's next execution date happen at different times. For instance, a job that takes one hour to complete and has interval set to SYSDATE+1/48 (every 30 minutes) will constantly execute, because each time it completes, it will already be 30 minutes late to execute again.
The interval expression must evaluate to either a NULL value or a time in the future. When interval evaluates to a NULL value, the job will not be re-executed after the next execution and will be automatically removed from the job queue. Thus, to execute a job one time only, pass a NULL value for the interval parameter.
job next_date的计算:
http://www.oracleblog.cn/working-case/how-to-calculate-job-next_date/
引用
dba_jobs 几个主要字段的具体意思:
(a)LAST_DATE:Date on which this job last successfully executed——job执行成功的时间,即1.job执行成功;2.是job执行完毕的时间
(这里说LAST_DATE取用的是上次成功执行的结束时间点,下个链接中说取用的是开始时间点,到底是开始时间点还是结束时间点?经测试,LAST_DATE取用的是上次成功执行的开始时间点!)
(b)LAST_SEC:同LAST_DATE
(c)THIS_DATE:Date that this job started executing (usually null if not executing)——没有job在执行的时候,该字段为空。若有job正在运行,这个时间是job的开始执行时间。
(d)THIS_SEC:同THIS_DATE
(e)NEXT_DATE:Date that this job will next be executed——job下次执行时间点。
(f)NEXT_SEC:同NEXT_DATE
(g)INTERVAL:A date function, evaluated at the start of execution, becomes next NEXT_DATE——重点的说明一下这个参数,这个参数涉及到next date的计算,next就是根据job的开始执行时间点,结合interval中的时间函数,算出来的。
比如,某个job开始执行时间8月1日23:55,那么如果interval是trunc(sysdate)+1435/1440,因此,next date就是把job的开始时间23:55,代入到interval中的sysdate,得到的结果就仍然是8月1日的23:55。而next date的更新,是一次job完成时更新的,比如job在8月2日的0:15完成,于是在更新next date的时候,就发现next date的时间晚于当前的时间,于是就报错may not schedule automatic refresh for times in the past。
last_date 和 next_date ,以及interval之间是什么关系:
http://blog.csdn.net/renren000/archive/2009/02/04/3863176.aspx(转载人论坛帖子咋不标明出处那?原始帖子愣是没找到...)
引用
Q:
job里面的 last_date 和 next_date ,以及interval之间是什么关系,last_date到底是记的 job 的开始执行时间,还是执行结束的时间。next_date到底是 job 的开始时间还是结束时间加上 interval 得出的呢?
A:
last_date=开始时间
last_date+interval=next_date
job的next_date是在job开始执行的时候就算好了
不过是在job成功执行之后才写上去
当job的执行时间长过interval时候 也就是说在job的执行过程中next_date就已经过了
那么next_date就变为job执行完时的时间
但要清楚,next_date通常不是真正的开始时间,真正的开始时间往往比next_date晚几秒,
也就是延时。
Q:
如果一个Job的执行时间比较长(比如超过了interval),会不会出现多个同样的 job 同时执行呢?
A:
不会出现这种情况
Oracle job 9.2.0.6之前版本的一个bug:
Oracle诊断案例-Job任务停止执行:
http://www.itpub.net/viewthread.php?tid=291269&extra=&highlight=job&page=1
Job相关CRUD:
查看是否有失败的数据库任务
select JOB, LAST_DATE, THIS_DATE, NEXT_DATE,BROKEN,FAILURES,WHAT from user_jobs;
查看正在运行的任务:
select * from dba_jobs_running;
submit一个(每30秒执行一次的)job并马上执行之(不为该job的next_date赋值,即采用其默认值sysdate,则该job在submit后会马上执行):
DECLARE jobNum number;
begin
sys.dbms_job.submit(job => :jobNum,
what => '<PL/SQL code>;',
interval => 'SYSDATE + 1/2880');
commit;
end;
DECLARE jobNum number;
begin
sys.dbms_job.submit(job => :jobNum,
what => 'insert into t_job_test values(seq_job_test.nextval, sysdate);',
interval => 'SYSDATE + 1/2880');
commit;
end;
说明:what里的<PL/SQL code>可以是个procedure,也可以是个简单的insert语句...等等。
JOB执行失败后,查看jobNum,使用下面方式可以启动该job:
1 通过调用DBMS_JOB.RUN来启动:
exec sys.dbms_job.run(jobNum); --The procedure RUN contains an implicit commit
2 通过调用DBMS_JOB.BROKEN置job的broken为false:
begin
sys.dbms_job.broken(jobNum, false);
commit;
end;
/
ORACLE JOB不能按预期执行的常见原因:
http://skygodblue.spaces.live.com/blog/cns!D18CEE1DB3D21A07!398.entry?wa=wsignin1.0&sa=66751260
http://www.developpez.net/forums/d240405/bases-donnees/oracle/administration/resolu-pb-refresh-auto-vue-materialisee/
引用
TIP: Click help for a detailed explanation of this page.
Bookmark Go to End
Subject: Jobs Not Executing Automatically
Doc ID: Note:313102.1 Type: PROBLEM
Last Revision Date: 14-APR-2006 Status: MODERATED
In this Document
Symptoms
Cause
Solution
References
This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) Rapid Visibility (RaV) process, and therefore has not been subject to an independent technical review.
Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.4.0
This problem can occur on any platform.
Symptoms
Jobs are no longer executing automatically.
If forced (exec dbms_job.run(<enter here job number>), these execute fine.
Cause
Trying the most common reasons why jobs don't execute automatically and as scheduled:
1) Instance in RESTRICTED SESSIONS mode?
Check if the instance is in restricted sessions mode:
select instance_name,logins from v$instance;
If logins=RESTRICTED, then:
alter system disable restricted session;
^-- Checked!
2) JOB_QUEUE_PROCESSES=0
Make sure that job_queue_processes is > 0
show parameter job_queue_processes
^-- Checked!
SQL> show parameter job;
the value of job_queue_processes parameter must be greater than 0 (at least 1).
3) _SYSTEM_TRIG_ENABLED=FALSE
Check if _system_enabled_trigger=false
col parameter format a25
col value format a15
select a.ksppinm parameter,b.ksppstvl value from x$ksppi a,x$ksppcv b
Where a.indx=b.indx and ksppinm=’_system_trig_enabled’;
^-- Checked!
4) Is the job BROKEN?
select job,broken from dba_jobs where job=<job_number>;
If broken, then check the alert log and trace files to diagnose the issue.
^-- Checked! The job is not broken.
5) Is the job COMMITted?
Make sure a commit is issued after submitting the job:
DECLARE X NUMBER;
BEGIN
SYS.DBMS_JOB.SUBMIT
(
job => X
,what => 'dbms_utility.analyze_schema
(''SCOTT'',''COMPUTE'',NULL,NULL,NULL);'
,next_date => to_date('08/06/2005 09:35:00','dd/mm/yyyy hh24:mi:ss')
,no_parse => FALSE
);
COMMIT;
END;
/
If the job executes fine if forced (i.e., exec dbms_jobs.run(<job_no>), then likely a commit
is missing.
^-- Checked! The job is committed after submission.
6) UPTIME > 497 days
Check if the server (machine) has been up for more than 497 days:
For SUN , use 'uptime' OS command.
If uptime>497 and the jobs do not execute automatically, then you are hitting bug 3427424
(Jobs may stop running after 497 days uptime) which is fixed in 9206 and A102
^-- Checked! The server in this case has been up 126 days only
7) DBA_JOBS_RUNNING
Check dba_jobs_running to see if the job is still running:
select * from dba_jobs_running;
^-- Checked! The job is not running.
8) LAST_DATE and NEXT_DATE
Check if the last_date and next_date for the job are proper:
select Job,Next_date,Last_date from dba_jobs where job=<job_number>;
^-- NEXT_DATE is porper, however LAST_DATE is null since the job never executes automatically.
9) NEXT_DATE and INTERVAL
Check if the Next_date is changing properly as per the interval set in dba_jobs:
select Job,Interval,Next_date,Last_date from dba_jobs where job=<job_number>;
^-- This is not possible since the job never gets executed automatically.
10) Toggle value for JOB_QUEUE_PROCESSES
Set job_queue_processes=0, wait some time and then and set it back to the original value:
alter system set job_queue_processes=0 ;
--<Wait for some time>
alter system set job_queue_processes=4 ;
Ref: Bug 2649244 (fixed by: 9015,9203,A000)
^-- Done but did not help
11) DBMS_IJOB(Non-documented):
Last ditch effort.
Either restart the database or try the following:
exec dbms_ijob.set_enabled(true);
Ref: Bug 3505718 (Closed, Not a Bug)
@Note 90989.1 (Using DBMS_IJOB) INTERNAL NOTE
Done but did not help
These are the most common causes for this behavior.
Solution
The solution ended up to be the server (machine) uptime.
Even though it was up for only 126 days, after the server was rebooted all jobs were able to execute automatically.
To implement the solution, please execute the following steps:
1. Shutdown all applications, including databases.
2. Shutdown the server (machine)
3. Restart all applications, including databases.
4. Check that jobs are executing automatically.
.
References
Bug 2649244 - Jobs Don'T Run In Scheduled Intervals
Bug 3505718 - Jobs Not Running When Scheduled
Help us improve our service. Please email us your comments for this document. .
常用 INTERVAL 参数值:
引用
1:每分钟执行
Interval => TRUNC(sysdate,'mi') + 1/ (24*60)
或
Interval => sysdate+1/1440
2:每天定时执行
例如:每天的凌晨1点执行
Interval => TRUNC(sysdate) + 1 +1/ (24)
3:每周定时执行
例如:每周一凌晨1点执行
Interval => TRUNC(next_day(sysdate,'星期一'))+1/24
4:每月定时执行
例如:每月1日凌晨1点执行
Interval =>TRUNC(LAST_DAY(SYSDATE))+1+1/24
5:每季度定时执行
例如每季度的第一天凌晨1点执行
Interval => TRUNC(ADD_MONTHS(SYSDATE,3),'Q') + 1/24
6:每半年定时执行
例如:每年7月1日和1月1日凌晨1点
Interval => ADD_MONTHS(trunc(sysdate,'yyyy'),6)+1/24
7:每年定时执行
例如:每年1月1日凌晨1点执行
Interval =>ADD_MONTHS(trunc(sysdate,'yyyy'),12)+1/24