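One of our legacy cluster systems has had a long-standing problem: a task that is meant to run on a single node runs on several nodes at once. Running the task more than once does little harm and only costs some extra CPU time, but we still wanted to fix it. Briefly, this is how the task currently works: it enters a waiting state when the system starts, and it executes each time it receives an external message.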
*Task execution thread*
public class Task {
    // take() is only available on BlockingQueue, not on plain Queue
    private BlockingQueue<Message> queue;
    private long success;
    private long failure;
    private long lastExecutedTime;

    public void start() throws InterruptedException {
        while (true) {
            // block until an external message arrives
            Message msg = queue.take();
            try {
                process(msg);
                success++;
                lastExecutedTime = System.currentTimeMillis();
            } catch (Exception e) {
                LOG.error("fail to process message", e);
                failure++;
            }
        }
    }
}
*Message provider class*
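public class Provider {
    private BlockingQueue<Message> queue;
    private Connection con;
    private long success;
    private long failure;

    // the queue and connection are passed in so that con is not null when the callback is registered
    public Provider(BlockingQueue<Message> queue, Connection con) {
        this.queue = queue;
        this.con = con;
        con.registerCallback(this);
    }

    public void conMsgCallback(Callback data) {
        Message msg = process(data);
        if (!queue.offer(msg)) {
            // the queue is full, so the message is dropped
            failure++;
            LOG.error("message discard!");
        } else {
            success++;
        }
    }
}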
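At first we wanted to move the task and the message listener into a separate system running in Active-Standby mode: simple, crude, and effective. There were no resources for that expansion, so we had to solve the problem inside the existing system. To keep things simple, the connection listener continues to run on both nodes at the same time, and only the task-processing thread is made single-node. Oracle has a paper on a cluster timer along with a related implementation, but it is fairly complex, so we came up with a simple approach of our own.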
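As I see it, single-node execution with failover only requires the nodes to query shared data and to use the node information and a timestamp to decide which node currently executes the task. The design is described below.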
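The decision rule just described can be sketched as a pure function. This is only an illustration under assumptions: the class `OwnerDecision`, the `Action` values, and the `timeoutMillis` parameter are hypothetical names that do not come from the original system.

```java
import java.util.Objects;

/** Hypothetical sketch: decide what the local node should do from the shared record. */
final class OwnerDecision {
    enum Action { OWN, TAKE_OVER, STAND_BY }

    /**
     * @param ownerIp       IP stored in the shared record (null if there is no owner yet)
     * @param lastUpdated   time in ms at which the owner last refreshed the record
     * @param localIp       IP of the node making the decision
     * @param now           current time in ms
     * @param timeoutMillis how stale the record may become before failover (assumed value)
     */
    static Action decide(String ownerIp, long lastUpdated, String localIp,
                         long now, long timeoutMillis) {
        if (ownerIp == null || Objects.equals(ownerIp, localIp)) {
            return Action.OWN;        // this node already owns the task, or nobody does
        }
        if (now - lastUpdated > timeoutMillis) {
            return Action.TAKE_OVER;  // the owner has not refreshed in time: assume it is down
        }
        return Action.STAND_BY;       // the owner looks alive, so do nothing
    }
}
```

Keeping the rule free of I/O makes it easy to unit test; the database access and the local health check would wrap around it.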
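There were two ideas for storing the cluster's shared data: one relies on Cluster Cache & Lock for synchronization, the other relies on the database. The legacy system's database is nowhere near its performance bottleneck, so we chose database pessimistic locking to keep the data consistent. The database-based implementation: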
-- column types and sizes are illustrative
create table syn_ActiveNode (
    id bigint primary key,
    task varchar(64) not null,
    ip varchar(64),
    lastUpdatedTime bigint not null
);
insert into syn_ActiveNode values (1, 'task', '192.168.1.1', 0);
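As a rough sketch of what pessimistic locking on this row could look like with plain JDBC: the class `ActiveNodeDao` and the method `refreshOrClaim` are hypothetical, the `FOR UPDATE` clause assumes a database that supports row locks in that syntax (for example Oracle, MySQL/InnoDB, or PostgreSQL), and the local health check is left out for brevity.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Hypothetical DAO: lock the row, decide, update, and release the lock by ending the transaction. */
final class ActiveNodeDao {
    static final String SELECT_FOR_UPDATE =
        "SELECT ip, lastUpdatedTime FROM syn_ActiveNode WHERE task = ? FOR UPDATE";
    static final String UPDATE_OWNER =
        "UPDATE syn_ActiveNode SET ip = ?, lastUpdatedTime = ? WHERE task = ?";

    /** Returns true if the local node owns the task after this call. */
    static boolean refreshOrClaim(Connection con, String task, String localIp,
                                  long now, long timeoutMillis) throws SQLException {
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false); // the row lock lives as long as the transaction
        try {
            String ownerIp;
            long lastUpdated;
            try (PreparedStatement ps = con.prepareStatement(SELECT_FOR_UPDATE)) {
                ps.setString(1, task);
                try (ResultSet rs = ps.executeQuery()) {
                    if (!rs.next()) {
                        con.rollback(); // no row for this task
                        return false;
                    }
                    ownerIp = rs.getString(1);
                    lastUpdated = rs.getLong(2);
                }
            }
            boolean mine = ownerIp == null || ownerIp.equals(localIp);
            boolean stale = now - lastUpdated > timeoutMillis;
            if (!mine && !stale) {
                con.rollback(); // another node is alive: release the lock without changes
                return false;
            }
            try (PreparedStatement ps = con.prepareStatement(UPDATE_OWNER)) {
                ps.setString(1, localIp);
                ps.setLong(2, now);
                ps.setString(3, task);
                ps.executeUpdate();
            }
            con.commit(); // commit releases the row lock
            return true;
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
    }
}
```

Every path ends the transaction, so the row lock is never held after the method returns, including when the local node stays on standby.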
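While designing the cluster synchronization we dropped the idea of restarting the thread after a task failure. Instead, the node raises an alarm for manual intervention, and continued task execution is left to failover. In the first design, the task thread queried the database directly whenever it received an external message, to check whether the current node was the active node and whether it needed to execute the task and update the shared data. That design was quickly rejected for performance reasons: querying the active node and updating the node record on every message takes time, so the task thread ran for too long. We then moved the code that queries and decides the executing node out of the task thread and into a scheduled job. When the task receives an external message it only reads the "may I run" flag stored by the scheduled job, which also reduces the chance that the check and the task execution happen at the same time.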
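The idea of a flag that the scheduled job writes and the task thread reads can be sketched on its own. The class `ActiveFlag`, its method names, and the injected clock are assumptions made for illustration; the clock is injected only so the staleness rule can be exercised without waiting.

```java
import java.util.function.LongSupplier;

/** Hypothetical holder for the "may I run" flag shared by the scheduled job and the task thread. */
final class ActiveFlag {
    private final LongSupplier clock;     // returns the current time in ms
    private final long staleAfterMillis;  // how long a published value stays trustworthy
    private boolean active;
    private long lastUpdated;

    ActiveFlag(LongSupplier clock, long staleAfterMillis) {
        this.clock = clock;
        this.staleAfterMillis = staleAfterMillis;
    }

    /** Called by the scheduled job after it has consulted the database. */
    synchronized void publish(boolean nowActive) {
        active = nowActive;
        lastUpdated = clock.getAsLong();
    }

    /** Called by the task thread for each message; it never touches the database. */
    synchronized boolean mayRun() {
        if (clock.getAsLong() - lastUpdated > staleAfterMillis) {
            return false; // the scheduled job has gone quiet, so do not trust the flag
        }
        return active;
    }
}
```

With a 5-minute schedule, a `staleAfterMillis` of 360,000 ms matches the 6-minute limit used in this post.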
public class Task {
    // blocking queue used to receive external messages
    private BlockingQueue<Message> queue;
    // collaborators assumed to be injected; their types are not shown in the original
    private Connection con;
    private PersistService persistService;
    private String localIP;
    // period of the scheduled check: 5 minutes, in milliseconds
    private final long checkRate = 300_000L;
    // written by the task thread, read by the scheduled check
    private volatile long lastExecutedTime;

    // start() blocks forever, so run it on a dedicated thread rather than from the constructor
    public void start() throws InterruptedException {
        while (true) {
            // no guarantee when a message comes; the timeout lets the loop prove it is still alive
            Message msg = queue.poll(300, TimeUnit.SECONDS);
            // record the tick on every iteration, including timeouts
            lastExecutedTime = System.currentTimeMillis();
            if (null == msg || !isActiveNode()) {
                continue;
            }
            try {
                process(msg);
            } catch (Exception e) {
                LOG.error("fail to process message", e);
            }
        }
    }

    private final Lock lock = new ReentrantLock();
    private boolean active = false;
    private long lastUpdatedTime;

    public boolean isActiveNode() {
        lock.lock();
        try {
            if ((System.currentTimeMillis() - lastUpdatedTime) > 360_000L) {
                // assume the scheduled check is down
                active = false;
            }
            return active;
        } finally {
            lock.unlock();
        }
    }

    private boolean validStatus() {
        // healthy when the connection is open and the loop has ticked within two poll timeouts
        return con.isOpen() && ((System.currentTimeMillis() - lastExecutedTime) < 300_000L * 2);
    }

    // Spring cron fields are second, minute, hour, ...: this fires every 5 minutes
    @Scheduled(cron = "0 */5 * * * ?")
    public void run() {
        lock.lock();
        try {
            ActiveNode info = persistService.loadActiveNodeForUpdate("task");
            // NOTE: branches that do not save must still release the row lock, e.g. by ending the transaction
            if (info.getIP().equals(localIP)) {
                // local node is the active node
                if (validStatus()) {
                    // task is running well: refresh the shared timestamp and release the row lock
                    lastUpdatedTime = System.currentTimeMillis();
                    info.setLastUpdateTime(lastUpdatedTime);
                    persistService.saveActiveNodeAndReleaseLock(info);
                    active = true;
                } else {
                    // assume the current node is in a failure state, report an alarm
                    Alarm.report(lastExecutedTime, con.isOpen());
                    active = false;
                    lastUpdatedTime = System.currentTimeMillis();
                }
            } else {
                if (System.currentTimeMillis() - info.getLastUpdateTime() > 3 * checkRate) {
                    // assume the active node is down, take ownership here
                    if (validStatus()) {
                        info.setIP(localIP);
                        lastUpdatedTime = System.currentTimeMillis();
                        info.setLastUpdateTime(lastUpdatedTime);
                        persistService.saveActiveNodeAndReleaseLock(info);
                        active = true;
                    } else {
                        active = false;
                        lastUpdatedTime = System.currentTimeMillis();
                    }
                } else {
                    // assume the active node is still alive, nothing to do
                    active = false;
                    lastUpdatedTime = System.currentTimeMillis();
                }
            }
        } finally {
            lock.unlock();
        }
    }
}
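This implementation has to take care of concurrency between the scheduled status check and the business logic; the lock reduces the chance of concurrent access. When the business logic reads the status provided by the scheduled job, it must also look at when that status was last updated, in case the scheduled job itself has died. The scheduled check also has to verify the running state of every business task. The problem with this design is that the responsibilities of the self-check and the business task are blurred, so we split them into two tasks: a self-check task and a business task.
The self-check task determines which node is currently running the task (activeNode) and whether the task is able to run normally (lastUpdatedTime & connection status). To guard against the self-check task itself dying, the query function also checks the self-check task's own last run time whenever it is called. If more than twice the self-check period has elapsed, it ignores the self-check thread's current state and reports an exception. The business task queries the state of the self-check task on every run, and reports its own state back to the self-check task after it finishes. Clearly this is a bidirectional dependency. To resolve it, the business task depends only on the self-check interface, and the self-check task exposes a single set of interface functions that receive both the preconditions of the business task and the business task's own running state. As soon as either state is abnormal, failover takes place.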
Self-check task interface:
public boolean isCurrentActive();
public TaskStatus registerTask(String taskName, long checkRate);
public boolean reportStatus(String taskName, boolean active);
Self-check task:
public class Check {
    public static class TaskStatus {
        public volatile boolean isActive;
        public volatile long lastCheckTime;
        public final long checkRate;

        TaskStatus(boolean isActive, long lastCheckTime, long checkRate) {
            this.isActive = isActive;
            this.lastCheckTime = lastCheckTime;
            this.checkRate = checkRate;
        }
    }

    private static final ConcurrentHashMap<String, TaskStatus> map = new ConcurrentHashMap<>();
    // volatile because these are written by the scheduled thread and read by task threads without a lock
    private volatile long lastRunTime = 0;
    private volatile boolean isCurrentActive = false;
    // only touched by the scheduled thread; avoids repeating the same alarm
    private boolean reported = false;

    public boolean isCurrentActive() {
        if (0 != lastRunTime && ((System.currentTimeMillis() - lastRunTime) > 360_000L)) {
            // assume the scheduled check is down: it should run every 5 minutes but has not for 6
            isCurrentActive = false;
            return false;
        }
        return isCurrentActive;
    }

    private boolean isTasksActive() {
        // check the last report time and state of every registered task
        long now = System.currentTimeMillis();
        for (TaskStatus status : map.values()) {
            if (now - status.lastCheckTime > status.checkRate) {
                return false;
            }
            if (!status.isActive) {
                return false;
            }
        }
        return true;
    }

    public TaskStatus registerTask(String taskName, long checkRate) {
        TaskStatus status = new TaskStatus(false, 0, checkRate);
        map.put(taskName, status);
        return status;
    }

    public boolean reportStatus(String taskName, boolean active) {
        TaskStatus status = map.get(taskName);
        if (null == status) {
            return false; // the task was never registered
        }
        status.lastCheckTime = System.currentTimeMillis();
        status.isActive = active;
        return true;
    }

    /** node active check task */
    private final String localIP = resolveLocalIP();

    private static String resolveLocalIP() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalStateException("cannot resolve local IP", e);
        }
    }

    // Spring cron fields are second, minute, hour, ...: this fires every 5 minutes
    @Scheduled(cron = "0 */5 * * * ?")
    public void run() {
        ActiveNode status = persistService.loadActiveNodeForUpdate("task");
        // NOTE: paths that do not save must still release the row lock, e.g. by ending the transaction
        boolean localIsOwner = null == status.getIP() || status.getIP().equals(localIP);
        // not updated in 15 minutes (max 10 minutes gap for a 5-minute schedule): assume the owner is down
        boolean ownerDown = System.currentTimeMillis() - status.getLastUpdateTime() > 900_000L;
        if (!localIsOwner && !ownerDown) {
            // assume the active node is still alive, nothing to do
            isCurrentActive = false;
            lastRunTime = System.currentTimeMillis();
            return;
        }
        if (isTasksActive()) {
            // tasks are running well: keep or take ownership, refresh the timestamp, release the row lock
            status.setIP(localIP);
            lastRunTime = System.currentTimeMillis();
            status.setLastUpdateTime(lastRunTime);
            persistService.saveActiveNodeAndReleaseLock(status);
            isCurrentActive = true;
            reported = false;
        } else {
            // local node is, or could become, the owner but is unhealthy: report an alarm once and let other nodes take over
            if (!reported) {
                Alarm.report(displayTasks()); // displayTasks() is a helper that is not shown
            }
            reported = true;
            isCurrentActive = false;
            lastRunTime = System.currentTimeMillis();
        }
    }
}
Business task:
check.registerTask("task", 360_000L);
while (true) {
    Message msg = queue.poll(300, TimeUnit.SECONDS);
    // heartbeat on every iteration, including timeouts, so the self-check can see the loop is alive
    check.reportStatus("task", true);
    if (null == msg || !check.isCurrentActive()) {
        continue;
    }
    try {
        process(msg);
        success++;
    } catch (Exception e) {
        LOG.error("fail to process message", e);
        failure++;
    }
}
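Here the lock around updates to isCurrentActive and lastRunTime has been removed. The reason is that our system is not sensitive to a task being repeated once or twice while failover is in progress, but it is sensitive to the time spent on database access.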
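The post does not say how the two fields stay visible across threads once the lock is gone. One possible lock-free approach, offered purely as a sketch and not as what the original system does, is to publish both values together in an immutable snapshot behind a single volatile reference. The names `LockFreeStatus` and `Snapshot` are hypothetical.

```java
/** Hypothetical lock-free holder for the active flag and the time it was published. */
final class LockFreeStatus {
    /** Immutable pair, so a reader always sees a flag together with its own timestamp. */
    private static final class Snapshot {
        final boolean active;
        final long lastRunTime;

        Snapshot(boolean active, long lastRunTime) {
            this.active = active;
            this.lastRunTime = lastRunTime;
        }
    }

    // a volatile write makes the new snapshot visible to all reader threads
    private volatile Snapshot snapshot = new Snapshot(false, 0L);

    /** Called by the scheduled check after each run. */
    void publish(boolean active, long now) {
        snapshot = new Snapshot(active, now);
    }

    /** Called by the business task; one volatile read, no lock, no database access. */
    boolean isCurrentActive(long now, long staleAfterMillis) {
        Snapshot s = snapshot;
        return s.active && now - s.lastRunTime <= staleAfterMillis;
    }
}
```

This does not prevent a message from being processed on two nodes during the brief window of a failover, which matches the tolerance described above; it only avoids readers seeing a flag paired with the wrong timestamp.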