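Who Stole an Hour?
[Keywords]: open-source Quartz, scheduled tasks, daylight saving time, Calendar
In a lab environment, testers were running a dedicated daylight saving time (DST) test pass on the product. They found that, after changing the system time so that the clock was within the last hour before DST ends, every scheduled task in the system stopped running.
The logs, however, showed the modules running normally, and the printed timestamps looked fine. The system time zone and time were adjusted as follows: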
Linux189:~ # ln -s /usr/share/zoneinfo/Europe/Bucharest /etc/localtime
Linux189:~ # zdump -v /etc/localtime | grep 2013
/etc/localtime Sun Mar 31 00:59:59 2013 UTC = Sun Mar 31 02:59:59 2013 EET isdst=0
/etc/localtime Sun Mar 31 01:00:00 2013 UTC = Sun Mar 31 04:00:00 2013 EEST isdst=1
/etc/localtime Sun Oct 27 00:59:59 2013 UTC = Sun Oct 27 03:59:59 2013 EEST isdst=1
/etc/localtime Sun Oct 27 01:00:00 2013 UTC = Sun Oct 27 03:00:00 2013 EET isdst=0
Linux189:~ # date -u 102700502013
Sun Oct 27 00:50:00 UTC 2013
Linux189:~ # date
Sun Oct 27 03:50:01 EEST 2013
Linux189:~ #
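After these changes, all services were restarted. The plan was to watch whether the scheduled tasks kept running when the clock switched to standard time 10 minutes later. Instead, none of the scheduled tasks ran at all after the restart.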
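The transition instants reported by zdump can be cross-checked from Java. The following is a minimal sketch, assuming Java 8 or later and the JDK's bundled time zone data; the class name ListTransitions and the helper transitionsIn are illustrative names, not part of the product or of Quartz.

```java
import java.time.Instant;
import java.time.Year;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneOffsetTransition;
import java.time.zone.ZoneRules;
import java.util.ArrayList;
import java.util.List;

public class ListTransitions {
    /** Returns the offset transitions of the zone whose instant falls inside the given year (UTC). */
    static List<ZoneOffsetTransition> transitionsIn(String zoneId, int year) {
        ZoneRules rules = ZoneId.of(zoneId).getRules();
        Instant start = Year.of(year).atDay(1).atStartOfDay(ZoneOffset.UTC).toInstant();
        Instant end = Year.of(year + 1).atDay(1).atStartOfDay(ZoneOffset.UTC).toInstant();
        List<ZoneOffsetTransition> result = new ArrayList<>();
        // nextTransition returns the first transition strictly after the given instant, or null.
        ZoneOffsetTransition next = rules.nextTransition(start);
        while (next != null && next.getInstant().isBefore(end)) {
            result.add(next);
            next = rules.nextTransition(next.getInstant());
        }
        return result;
    }

    public static void main(String[] args) {
        for (ZoneOffsetTransition t : transitionsIn("Europe/Bucharest", 2013)) {
            System.out.println(t.getInstant() + " UTC: "
                    + t.getOffsetBefore() + " -> " + t.getOffsetAfter());
        }
    }
}
```

For Europe/Bucharest in 2013 this should list the same two instants as the zdump output: 01:00 UTC on March 31 and 01:00 UTC on October 27.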
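The XXX system uses the open-source Quartz component for its scheduled tasks, so it made sense to check what the official documentation says about daylight saving time. The FAQ does contain a dedicated entry on the subject: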
http://quartz-scheduler.org/documentation/faq#FAQ-daylightSavings
CronTrigger allows you to schedule jobs to fire at certain moments with respect to a "Gregorian calendar". Hence, if you create a trigger to fire every day at 10:00 am, before and after daylight savings time switches it will continue to do so. However, depending on whether it was the Spring or Autumn daylight savings event, for that particular Sunday, the actual time interval between the firing of the trigger on Sunday morning at 10:00 am since its firing on Saturday morning at 10:00 am will not be 24 hours, but will instead be 23 or 25 hours respectively.
There is one additional point users must understand about CronTrigger with respect to daylight savings. This is that you should take careful thought about creating schedules that fire between midnight and 3:00 am (the critical window of time depends on your trigger's locale, as explained above). The reason is that depending on your trigger's schedule, and the particular daylight event, the trigger may be skipped or may appear to not fire for an hour or two. As examples, say you are in the United States, where daylight savings events occur at 2:00 am. If you have a CronTrigger that fires every day at 2:15 am, then on the day of the beginning of daylight savings time the trigger will be skipped, since, 2:15 am never occurs that day. If you have a CronTrigger that fires every 15 minutes of every hour of every day, then on the day daylight savings time ends you will have an hour of time for which no triggerings occur, because when 2:00 am arrives, it will become 1:00 am again, however all of the firings during the one o'clock hour have already occurred, and the trigger's next fire time was set to 2:00 am - hence for the next hour no triggerings will occur.
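VMS uses CronTrigger everywhere. The part of the passage above that was highlighted in red in the original (the example about the day daylight saving time ends) means that when wall-clock times overlap, the Job fires only the first time 02:15 occurs; the second 02:15, after the DST change, does not fire the Job again.
As shown in the figure above, when the clock moves from DST back to standard time, a stretch of wall-clock time occurs twice, labelled T1 and T2. According to the official description, a task scheduled every 15 minutes should fire during T1 (still in DST) and should not fire again during T2 (after DST has ended). The trigger records the time of each firing and fires only when the current time is later than the last fire time.
In the actual test environment, however, the scheduled tasks did not run during T1 but did run during T2. Why? Answering that requires opening the Quartz source code.
1) First, the place where scheduled tasks are triggered (org.quartz.core.QuartzSchedulerThread):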
public class QuartzSchedulerThread extends Thread {
    /**
     * The main processing loop of the QuartzSchedulerThread.
     */
    public void run() {
        boolean lastAcquireFailed = false;
        while (!halted) {
            signaled = false;
            Trigger trigger = null;
            long now = System.currentTimeMillis();
            try {
                trigger = qsRsrcs.getJobStore().acquireNextTrigger(
                        ctxt, now + idleWaitTime);
                lastAcquireFailed = false;
            } catch (JobPersistenceException jpe) {
                if (!lastAcquireFailed)
                    qs.notifySchedulerListenersError(
                            "An error occured while scanning for the next trigger to fire.",
                            jpe);
                lastAcquireFailed = true;
            } catch (RuntimeException e) {
                if (!lastAcquireFailed)
                    getLog().error("quartzSchedulerThreadLoop: RuntimeException "
                            + e.getMessage(), e);
                lastAcquireFailed = true;
            }
            // ... (the rest of the loop, which executes the Job, is omitted)
        }
    }
}
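Roughly, Quartz triggers Jobs as follows: a single thread sleeps in a loop and, each time round, asks the JobStore for the trigger that is due to fire within the next 30 seconds (idleWaitTime = 30s). Note that the value passed in is a cutoff computed relative to the current time: System.currentTimeMillis() + idleWaitTime. Next, here is how the JobStore performs the lookup: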
org.quartz.simpl.RAMJobStore
/**
 * Get a handle to the next trigger to be fired, and mark it as 'reserved'
 * by the calling scheduler.
 *
 * @see #releaseAcquiredTrigger(SchedulingContext, Trigger)
 */
public Trigger acquireNextTrigger(SchedulingContext ctxt, long noLaterThan) {
    TriggerWrapper tw = null;
    synchronized (triggerLock) {
        while (tw == null) {
            try {
                tw = (TriggerWrapper) timeTriggers.first();
            } catch (java.util.NoSuchElementException nsee) {
                return null;
            }
            if (tw == null) return null;
            if (tw.trigger.getNextFireTime() == null) {
                timeTriggers.remove(tw);
                tw = null;
                continue;
            }
            timeTriggers.remove(tw);
            if (applyMisfire(tw)) {
                if (tw.trigger.getNextFireTime() != null)
                    timeTriggers.add(tw);
                tw = null;
                continue;
            }
            if (tw.trigger.getNextFireTime().getTime() > noLaterThan) {
                timeTriggers.add(tw);
                return null;
            }
            tw.state = TriggerWrapper.STATE_ACQUIRED;
            tw.trigger.setFireInstanceId(getFiredTriggerRecordId());
            Trigger trig = (Trigger) tw.trigger.clone();
            return trig;
        }
    }
    return null;
}
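Stepping through the code in a debugger showed that every call took the branch that compares getNextFireTime() with noLaterThan (highlighted in red in the original): the next fire time was always more than 30 seconds in the future, so no trigger was ever acquired.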
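The effect of that branch can be reproduced with a much smaller model. The sketch below is not Quartz code: AcquireSketch, schedule and acquireNext are hypothetical names, and the store is reduced to a sorted set of next fire times. It only illustrates that a fire time one hour away is never returned when the cutoff is now + 30 seconds.

```java
import java.util.TreeSet;

public class AcquireSketch {
    /** Hypothetical stand-in for the job store: next fire times in epoch millis, soonest first. */
    private final TreeSet<Long> nextFireTimes = new TreeSet<>();

    void schedule(long nextFireTime) {
        nextFireTimes.add(nextFireTime);
    }

    /** Removes and returns the earliest fire time if it is not later than noLaterThan, else null. */
    Long acquireNext(long noLaterThan) {
        if (nextFireTimes.isEmpty()) {
            return null;
        }
        long first = nextFireTimes.first();
        if (first > noLaterThan) {
            return null; // too far in the future: leave it in the set and report nothing
        }
        nextFireTimes.remove(first);
        return first;
    }

    public static void main(String[] args) {
        long now = 1382834090248L;   // the timestamp used in the article's test
        long idleWaitTime = 30_000L; // 30 seconds, as in the scheduler loop
        AcquireSketch store = new AcquireSketch();

        store.schedule(now + 3_600_000L); // next fire time wrongly computed one hour ahead
        System.out.println(store.acquireNext(now + idleWaitTime)); // prints null

        store.schedule(now + 10_000L);    // what a 10-second schedule should have produced
        System.out.println(store.acquireNext(now + idleWaitTime)); // prints 1382834100248
    }
}
```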
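In theory, a task configured to run every 10 seconds should have at least 3 firings within the next 30 seconds, so why is the time retrieved here more than 30 seconds away?
The next fire time of a scheduled task is computed from its cron expression (CronExpression), and that computation works as follows: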
org.quartz.CronExpression
protected Date getTimeAfter(Date afterTime) {
    Calendar cl = Calendar.getInstance(getTimeZone());
    // move ahead one second, since we're computing the time *after* the
    // given time
    afterTime = new Date(afterTime.getTime() + 1000);
    // CronTrigger does not deal with milliseconds
    cl.setTime(afterTime);
    cl.set(Calendar.MILLISECOND, 0);
    boolean gotOne = false;
    // loop until we've computed the next time, or we've past the endTime
    while (!gotOne) {
        // ... (the lengthy computation of the next fire time from the expression is omitted)
        gotOne = true;
    } // while( !done )
    return cl.getTime();
}
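The main idea of the code above is: create a Calendar object and set its time to afterTime + 1s (afterTime is the new Date() passed in earlier), compute the next fire time from the expression, write the resulting year, month, day, hour, minute and second into cl, and finally return the next fire time as a Date via cl.getTime().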
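To make that flow concrete, here is a heavily simplified sketch of the same pattern for one fixed schedule, "every 10 seconds". It is not the Quartz implementation: NextFireSketch and timeAfter are illustrative names, and the real CronExpression handles every cron field. What it shares with the original is the sequence setTime, set(MILLISECOND, 0), field updates, getTime.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class NextFireSketch {
    /** Returns the first time strictly after afterTime whose seconds field is a multiple of 10. */
    static Date timeAfter(Date afterTime, TimeZone zone) {
        Calendar cl = Calendar.getInstance(zone);
        // Move ahead one second, since we want the time *after* the given time.
        cl.setTime(new Date(afterTime.getTime() + 1000));
        // Drop the milliseconds; from here on the time is rebuilt from calendar fields.
        cl.set(Calendar.MILLISECOND, 0);
        int sec = cl.get(Calendar.SECOND);
        int rounded = ((sec + 9) / 10) * 10; // round up to 0, 10, ..., 60
        // The calendar is lenient by default, so 60 seconds rolls over into the next minute.
        cl.set(Calendar.SECOND, rounded);
        // Fields are converted back to epoch milliseconds here.
        return cl.getTime();
    }

    public static void main(String[] args) {
        Date after = new Date(1382834090248L); // 00:34:50.248 UTC
        Date next = timeAfter(after, TimeZone.getTimeZone("UTC"));
        System.out.println(next.getTime());    // prints 1382834100000 (00:35:00 UTC)
    }
}
```

The example uses UTC on purpose so that the result does not depend on any DST rule; the final getTime() call is the step that matters in a zone that observes DST.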
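Debugging showed that the time computed from the expression was always about one hour later than expected, and suspicion fell on Calendar. The test code below was written to check that suspicion.
On a Windows PC, the system time zone was switched to an Eastern European zone (the output below shows EEST and EET).
Running the following test code then produced a surprising result: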
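import java.util.Calendar;
import java.util.Date;
import java.util.TimeZone;

public class DstCalendarTest {
    public static void main(String[] args) {
        Date t = new Date();
        t.setTime(1382834090248L);
        System.out.println(t.getTime()); // raw epoch milliseconds (first line of the output)
        System.out.println("Date1=" + t);
        Calendar cl = Calendar.getInstance();
        cl.setTime(t);
        cl.set(Calendar.MILLISECOND, 0); // the only field that is modified
        cl.setTimeZone(TimeZone.getDefault());
        Date t2 = cl.getTime();
        System.out.println("Date2=" + t2);
        System.out.println("Date3=" + cl.getTimeInMillis());
        System.out.println("Diff=" + (t2.getTime() - t.getTime()));
    }
}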
The output is:
1382834090248
Date1=Sun Oct 27 03:34:50 EEST 2013
Date2=Sun Oct 27 03:34:50 EET 2013
Date3=1382837690000
Diff=3599752
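Strange: the code only sets the millisecond field of cl to 0, yet Date1 and Date2 differ by almost a full hour! One hour has been swallowed. A tiny change at the input, a huge error at the output.
Interestingly, if you do not set any time field of cl (year, month, day, hour, minute, second, millisecond) and only call setTime(), the error does not appear.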
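The two cases can be put side by side. The sketch below is an illustration under assumptions: FieldSetDemo and its two methods are made-up names, and the zone is fixed to Europe/Bucharest instead of relying on the system default. The first method is guaranteed to return the stored value. What the second returns depends on how the running JDK resolves the repeated hour, so no expected value is given for it; on the JDK used in this article it came out one hour late.

```java
import java.util.Calendar;
import java.util.TimeZone;

public class FieldSetDemo {
    /** 03:34:50.248 EEST on 27 Oct 2013, inside the hour that occurs twice. */
    static final long T = 1382834090248L;

    private static Calendar newCalendar() {
        return Calendar.getInstance(TimeZone.getTimeZone("Europe/Bucharest"));
    }

    /** Only the time value is set: the calendar hands back the stored milliseconds unchanged. */
    static long roundTripWithoutFieldSet() {
        Calendar cl = newCalendar();
        cl.setTimeInMillis(T);
        return cl.getTimeInMillis();
    }

    /** One field is written afterwards: the time value is recomputed from the calendar fields. */
    static long roundTripWithFieldSet() {
        Calendar cl = newCalendar();
        cl.setTimeInMillis(T);
        cl.set(Calendar.MILLISECOND, 0);
        return cl.getTimeInMillis();
    }

    public static void main(String[] args) {
        System.out.println("without field set: diff=" + (roundTripWithoutFieldSet() - T));
        System.out.println("with field set:    diff=" + (roundTripWithFieldSet() - T));
    }
}
```

A difference of -248 in the second line means the JDK kept the DST reading of the wall-clock time; a difference of 3599752 reproduces the behaviour described here.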
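This is why Quartz never finds any task due within the next 30 seconds during the last hour of DST: the time that CronExpression computes with java.util.Calendar is always about an hour away, well beyond the 30-second window, so no trigger is ever acquired.
The question is not settled yet: why does such simple test code produce such a large error? For that we have to look at the JDK source.
Looking at how java.util.Calendar is instantiated, the instance returned by getInstance() is actually the subclass java.util.GregorianCalendar. In that class, once any time field (year, month, day, hour, minute, second, millisecond, and so on) has been set, the time value is recomputed before it is next read. The computation looks like this: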
/**
 * Converts calendar field values to the time value (millisecond
 * offset from the Epoch).
 *
 * @exception IllegalArgumentException if any calendar fields are invalid.
 */
protected void computeTime() {
    // In non-lenient mode, perform brief checking of calendar
    // fields which have been set externally. Through this
    // checking, the field values are stored in originalFields[]
    // to see if any of them are normalized later.
    // ... (the conversion of year, month and day is omitted here)
    // millis represents local wall-clock time in milliseconds.
    long millis = (fixedDate - EPOCH_OFFSET) * ONE_DAY + timeOfDay;
    // Compute the time zone offset and DST offset. There are two potential
    // ambiguities here. We'll assume a 2:00 am (wall time) switchover time
    // for discussion purposes here.
    // 1. The transition into DST. Here, a designated time of 2:00 am - 2:59 am
    //    can be in standard or in DST depending. However, 2:00 am is an invalid
    //    representation (the representation jumps from 1:59:59 am Std to 3:00:00 am DST).
    //    We assume standard time.
    // 2. The transition out of DST. Here, a designated time of 1:00 am - 1:59 am
    //    can be in standard or DST. Both are valid representations (the rep
    //    jumps from 1:59:59 DST to 1:00:00 Std).
    //    Again, we assume standard time.
    // We use the TimeZone object, unless the user has explicitly set the ZONE_OFFSET
    // or DST_OFFSET fields; then we use those fields.
    TimeZone zone = getZone();
    if (zoneOffsets == null) {
        zoneOffsets = new int[2];
    }
    int tzMask = fieldMask & (ZONE_OFFSET_MASK|DST_OFFSET_MASK);
    if (tzMask != (ZONE_OFFSET_MASK|DST_OFFSET_MASK)) {
        if (zone instanceof ZoneInfo) {
            ((ZoneInfo)zone).getOffsetsByWall(millis, zoneOffsets);
        } else {
            int gmtOffset = isFieldSet(fieldMask, ZONE_OFFSET) ?
                    internalGet(ZONE_OFFSET) : zone.getRawOffset();
            zone.getOffsets(millis - gmtOffset, zoneOffsets);
        }
    }
    if (tzMask != 0) {
        if (isFieldSet(tzMask, ZONE_OFFSET)) {
            zoneOffsets[0] = internalGet(ZONE_OFFSET);
        }
        if (isFieldSet(tzMask, DST_OFFSET)) {
            zoneOffsets[1] = internalGet(DST_OFFSET);
        }
    }
    // Adjust the time zone offset values to get the UTC time.
    millis -= zoneOffsets[0] + zoneOffsets[1];
    // Set this calendar's time in milliseconds
    time = millis;
}
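Roughly, the code above takes the Calendar fields (year, month, day, hour, minute, second, millisecond), expresses each as an offset from the base time January 1, 1970 00:00:00 GMT, converts them to milliseconds and adds them up. The sum is a local wall-clock value, from which the zone offset and the DST offset are then subtracted to obtain the final millisecond offset in UTC.
The hour is where things go wrong: because of DST, the same hour field can correspond to two different instants. In the time diagram earlier, an hour field of 2 could be 2:00 in the DST period T1 or 2:00 in the standard-time period T2. Which one is meant? The JDK has no way of knowing.
The comment in the code above (highlighted in red in the original) describes exactly this situation: whether 2:00 is meant as the DST 2:00 or the standard-time 2:00, it is always treated as standard time. DST is therefore "ended early", and the missing hour is swallowed inside java.util.GregorianCalendar.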
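The ambiguity can be made visible with the java.time API (Java 8 and later), which reports every offset that is valid for a given wall-clock time. This is a sketch for illustration only; OverlapDemo and instantsFor are made-up names, and the wall-clock time is the one from the test above.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.List;

public class OverlapDemo {
    /** Returns the epoch seconds of every instant at which the zone's wall clock shows wallTime. */
    static long[] instantsFor(LocalDateTime wallTime, String zoneId) {
        // Two offsets during an overlap, one in normal operation, none inside a gap.
        List<ZoneOffset> offsets = ZoneId.of(zoneId).getRules().getValidOffsets(wallTime);
        long[] result = new long[offsets.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = wallTime.toEpochSecond(offsets.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDateTime wall = LocalDateTime.of(2013, 10, 27, 3, 34, 50);
        for (long epochSecond : instantsFor(wall, "Europe/Bucharest")) {
            System.out.println(epochSecond);
        }
        // prints 1382834090 (the EEST reading) and 1382837690 (the EET reading)
    }
}
```

The two values are exactly the seconds parts of the input timestamp and of Date3 in the test output: the wall-clock time 03:34:50 is valid twice, and the Calendar recomputation picked the later one.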
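Now back to the test code and the interesting observation made earlier: if the millisecond field is not set, the output is completely correct. How does that fit the root cause?
From the code analysis we know that the error arises when the year, month, day, hour, minute, second and millisecond fields are converted back into a millisecond offset, which is where the hour is skipped. If we only call setTime() to set the time value directly, then when the time is read back the JDK sees that none of the time fields has changed and simply returns the value that was set. As soon as one or more fields have been set, the time value has to be recomputed from the fields, and that is when the error is introduced.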
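It follows that a value can be stripped of its milliseconds without going through calendar fields at all, which sidesteps the ambiguous conversion. The sketch below shows two ways to do that; it is offered as an illustration of the reasoning above, not as what Quartz does, and TruncateMillis and its method names are assumptions of this example.

```java
import java.time.temporal.ChronoUnit;
import java.util.Date;

public class TruncateMillis {
    /** Truncates to whole seconds with plain arithmetic on the epoch value. */
    static Date truncateToSecond(Date t) {
        long millis = t.getTime();
        // floorMod keeps the result correct for times before 1970 as well.
        return new Date(millis - Math.floorMod(millis, 1000L));
    }

    /** Same result via java.time.Instant, which has no time zone and therefore no DST. */
    static Date truncateWithInstant(Date t) {
        return Date.from(t.toInstant().truncatedTo(ChronoUnit.SECONDS));
    }

    public static void main(String[] args) {
        Date t = new Date(1382834090248L);
        System.out.println(truncateToSecond(t).getTime());    // prints 1382834090000
        System.out.println(truncateWithInstant(t).getTime()); // prints 1382834090000
    }
}
```

Both results differ from the input by 248 ms, not by an hour, because no wall-clock field is ever interpreted.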
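At last the truth is out! Under certain conditions the JDK swallows an hour. Perhaps this does not count as a JDK bug, but it is certainly a deeply hidden trap.
The technical root cause has been found, but there is a human cause as well: we did not know Quartz well enough and were not familiar with its best practices.
According to the official documentation, there are actually two ways to define scheduled tasks. One is CronTrigger, which we have used to death (it is popular mainly because its syntax looks like a Linux crontab entry). The other is the simpler SimpleTrigger. Each is good at different things: the former is flashy and can do almost anything, but tends to trap people who only half understand it; the latter is plain, does exactly what it says, and is meant for tasks that run at a fixed frequency.
Because SimpleTrigger works at a fixed frequency and does not care what year, month or hour it currently is, it has no DST problem. We therefore need to treat our use cases differently. Wherever SimpleTrigger can do the job, prefer it, for example for periodic heartbeats and periodic in-memory cache refreshes. Fall back to CronTrigger only when SimpleTrigger cannot express the schedule; in those cases it is acceptable for the task to run at only one of the two 2:00s. An aging-user Job, for example, can run once across the two 2:00s without harm. A heartbeat Job that stops for an hour, however, is a serious matter: the state of the network elements may become abnormal, with severe consequences.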
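The difference between the two styles can be shown without Quartz on the classpath. The following sketch models only the arithmetic, using java.time; IntervalVsWallClock and its methods are illustrative names, and the real trigger classes do considerably more.

```java
import java.time.Duration;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class IntervalVsWallClock {
    static final ZoneId ZONE = ZoneId.of("Europe/Bucharest");

    /** Fixed-frequency style: the next fire time is exactly intervalMillis after the last one. */
    static long nextFixedInterval(long lastFireMillis, long intervalMillis) {
        return lastFireMillis + intervalMillis;
    }

    /** Calendar style: the same wall-clock time on the following day. */
    static ZonedDateTime nextSameWallClock(ZonedDateTime last) {
        return last.plusDays(1);
    }

    /** Real elapsed hours between a daily 10:00 firing and the next one. */
    static long hoursUntilNextDaily(int year, int month, int day) {
        ZonedDateTime last = ZonedDateTime.of(year, month, day, 10, 0, 0, 0, ZONE);
        return Duration.between(last, nextSameWallClock(last)).toHours();
    }

    public static void main(String[] args) {
        System.out.println(hoursUntilNextDaily(2013, 10, 26)); // prints 25 (DST ends overnight)
        System.out.println(hoursUntilNextDaily(2013, 3, 30));  // prints 23 (DST starts overnight)
        // A 10-second heartbeat is unaffected: always exactly 10000 ms later.
        System.out.println(nextFixedInterval(1382834090248L, 10_000L)); // prints 1382834100248
    }
}
```

The 23 and 25 hour gaps match the figures quoted from the Quartz FAQ, while the fixed-interval computation never looks at a calendar field.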
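CronTrigger carries one more hidden risk. Suppose a scheduled task is set to run during the quiet hours between 2:00 and 3:00 in the morning, and suppose that this happens to be when the local clock switches from standard time to DST, jumping straight from 2:00 to 3:00. On that day the task never gets a chance to run!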
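Whether a given wall-clock time falls into such a gap can be checked ahead of time. This is a small sketch using java.time; GapDemo and isSkipped are hypothetical names, and the dates are the 2013 spring transitions for the United States (Eastern time) and Romania.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class GapDemo {
    /** True if the wall-clock time never occurs in the zone because the clock jumps over it. */
    static boolean isSkipped(LocalDateTime wallTime, String zoneId) {
        // Inside a spring-forward gap no offset is valid for the local time.
        return ZoneId.of(zoneId).getRules().getValidOffsets(wallTime).isEmpty();
    }

    public static void main(String[] args) {
        LocalDateTime us = LocalDateTime.of(2013, 3, 10, 2, 15);
        System.out.println(isSkipped(us, "America/New_York")); // prints true

        LocalDateTime ro = LocalDateTime.of(2013, 3, 31, 3, 30);
        System.out.println(isSkipped(ro, "Europe/Bucharest")); // prints true

        // java.time resolves a skipped time by moving it forward by the length of the gap.
        System.out.println(ZonedDateTime.of(us, ZoneId.of("America/New_York")));
    }
}
```

A check like this could be run against the fire times of a cron schedule before deployment to find jobs that would be skipped on the transition day.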
When using Quartz as the scheduling framework, prefer SimpleTrigger and use CronTrigger with caution!
Daylight Savings Time
Avoid Scheduling Jobs Near the Transition Hours of Daylight Savings Time
NOTE: Specifics of the transition hour and the amount of time the clock moves forward or back varies by locale see: https://secure.wikimedia.org/wikipedia/en/wiki/Daylight_saving_time_around_the_world.
SimpleTriggers are not affected by Daylight Savings Time as they always fire at an exact millisecond in time, and repeat an exact number of milliseconds apart.
Because CronTriggers fire at given hours/minutes/seconds, they are subject to some oddities when DST transitions occur.
As an example of possible issues, scheduling in the United States within TimeZones/locations that observe Daylight Savings time, the following problems may occur if using CronTrigger and scheduling fire times during the hours of 1:00 AM and 2:00 AM:
· 1:05 AM may occur twice! - duplicate firings on CronTrigger possible
· 2:05 AM may never occur! - missed firings on CronTrigger possible
Again, specifics of time and amount of adjustment varies by locale.
Other trigger types that are based on sliding along a calendar (rather than exact amounts of time), such as CalendarIntervalTrigger, will be similarly affected - but rather than missing a firing, or firing twice, may end up having its fire time shifted by an hour.