Java Parallel Programming Notes 6.2

Limitations of shutdownNow

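  shutdownNow shuts down an ExecutorService abruptly: it attempts to cancel the tasks currently executing and returns all tasks that were submitted but never started. However, there is no general way to find out which tasks had started but not yet finished. This means we cannot learn the state of in-progress tasks at shutdown time unless the tasks themselves perform some kind of checkpointing. To address this, the TrackingExecutor class keeps track of the tasks that were cancelled after shutdown, and getCancelledTasks returns the list of those tasks. For this technique to work, a task must preserve the thread's interrupted status when it returns. The code is as follows: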

  

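import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.AbstractExecutorService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class TrackingExecutor extends AbstractExecutorService {
	private final ExecutorService exec;
	private final Set<Runnable> tasksCancelledAtShutdown = Collections.synchronizedSet(new HashSet<Runnable>());

	// The wrapped executor is supplied by the caller, as WebCrawler.start() does.
	public TrackingExecutor(ExecutorService exec) {
		this.exec = exec;
	}

	// Only meaningful once the wrapped executor has fully terminated.
	public List<Runnable> getCancelledTasks(){
		if(!exec.isTerminated()){
			throw new IllegalStateException("executor has not terminated yet");
		}
		return new ArrayList<Runnable>(tasksCancelledAtShutdown);
	}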

	@Override
	public void execute(final Runnable runnable) {
		exec.execute(new Runnable() {
			@Override
			public void run() {
				try{
					runnable.run();
				}finally{
					if(isShutdown() && Thread.currentThread().isInterrupted()){
						tasksCancelledAtShutdown.add(runnable);
					}
				}
			}
		});
	}
	
	
	/* ---------- Delegate the remaining ExecutorService methods to exec ---------- */
	@Override
	public void shutdown() {
		exec.shutdown();
	}

	@Override
	public List<Runnable> shutdownNow() {
		return exec.shutdownNow();
	}

	@Override
	public boolean isShutdown() {
		return exec.isShutdown();
	}

	@Override
	public boolean isTerminated() {
		return exec.isTerminated();
	}

	@Override
	public boolean awaitTermination(long timeout, TimeUnit unit)
			throws InterruptedException {
		return exec.awaitTermination(timeout, unit);
	}


}
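
To see the limitation that motivates TrackingExecutor in isolation, here is a minimal sketch using a plain single-thread executor. The class and method names (ShutdownNowDemo, countReported) are made up for illustration. One task is running and two are queued when shutdownNow is called; only the queued ones appear in the returned list, so the running task goes unreported.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ShutdownNowDemo {
    /** Returns how many tasks shutdownNow reports when one task is running and two are queued. */
    static int countReported() {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);

        // Task 1 starts and then blocks until it is interrupted.
        exec.execute(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupted status
            }
        });
        // Tasks 2 and 3 wait in the queue behind task 1.
        exec.execute(() -> { });
        exec.execute(() -> { });

        try {
            started.await(); // make sure task 1 is really running
            List<Runnable> neverStarted = exec.shutdownNow();
            exec.awaitTermination(5, TimeUnit.SECONDS);
            return neverStarted.size();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("tasks reported by shutdownNow: " + countReported());
    }
}
```

The interrupted task 1 is missing from the returned list, which is exactly the gap that getCancelledTasks is meant to fill.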

 Using TrackingExecutor to save unfinished tasks for later execution; the code is as follows:

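import java.net.URL;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import net.jcip.annotations.GuardedBy; // from the jcip-annotations library

public abstract class WebCrawler {
	private static final long TIMEOUT = 1;
	private static final TimeUnit UNIT = TimeUnit.MILLISECONDS;
	private volatile TrackingExecutor exec;
	@GuardedBy("this")
	private final Set<URL> urlsToCrawl = new HashSet<URL>();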
	
	
	public synchronized void start(){
		exec = new TrackingExecutor(Executors.newCachedThreadPool());
		for(URL url : urlsToCrawl){
			submitCrawlTask(url);
		}
		urlsToCrawl.clear();
	}
	
	public synchronized void stop() throws InterruptedException{
		try{
			saveUncrawled(exec.shutdownNow());
			if(exec.awaitTermination(TIMEOUT, UNIT)){
					saveUncrawled(exec.getCancelledTasks());
			}
		}finally{
			exec = null;
		}
	}

	private void saveUncrawled(List<Runnable> uncrawled){
		for(Runnable task : uncrawled){
			urlsToCrawl.add(((CrawlTask) task).getPage());
		}
	}

	private void submitCrawlTask(URL url) {
		exec.execute(new CrawlTask(url));
	}
	
	
	
	protected abstract List<URL> processPage(URL url);
	
	
	private class CrawlTask implements Runnable{
		private final URL url;

		public CrawlTask(URL url) {
			this.url = url;
		}

		public void run(){
			for(URL link : processPage(url)){
				// Stop early on interruption and leave the interrupted status set,
				// so that TrackingExecutor records this task as cancelled.
				if(Thread.currentThread().isInterrupted()){
					return;
				}
				submitCrawlTask(link);
			}
		}

		public URL getPage(){return url;}
	}
	
}
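
CrawlTask returns with the interrupted flag still set, which is what allows TrackingExecutor to notice the cancellation. The following sketch, under the assumption of a task that blocks in Thread.sleep, compares a task that restores the flag after catching InterruptedException with one that swallows it. InterruptStatusDemo and flagVisibleAfterReturn are hypothetical names, and the wrapper simply repeats the check made in TrackingExecutor.execute.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class InterruptStatusDemo {
    /**
     * Runs a blocking task, cancels it with shutdownNow, and reports whether
     * the wrapper still saw the interrupted flag when the task returned.
     */
    static boolean flagVisibleAfterReturn(boolean restoreFlag) {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean seen = new AtomicBoolean(false);

        Runnable task = () -> {
            started.countDown();
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Thread.sleep clears the flag when it throws; put it back if asked to.
                if (restoreFlag) {
                    Thread.currentThread().interrupt();
                }
            }
        };

        exec.execute(() -> {
            try {
                task.run();
            } finally {
                // Same condition that TrackingExecutor.execute evaluates.
                seen.set(exec.isShutdown() && Thread.currentThread().isInterrupted());
            }
        });

        try {
            started.await();            // the task is running before we shut down
            exec.shutdownNow();         // interrupts the worker thread
            exec.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("flag restored:  " + flagVisibleAfterReturn(true));
        System.out.println("flag swallowed: " + flagVisibleAfterReturn(false));
    }
}
```

A task that swallows the interrupt would be silently dropped from getCancelledTasks, so its URL would never be saved for the next run.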

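   TrackingExecutor has an unavoidable race condition that can produce false positives: tasks that are identified as cancelled but have actually completed. This happens because the thread pool may be shut down between the moment a task executes its last instruction and the moment the pool records the task as complete. If tasks are idempotent (performing them twice has the same effect as performing them once), as they are in the web crawler, this is not a problem. Otherwise, the application must take this risk into account and be prepared to deal with false positives.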
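
As a rough illustration of why idempotency makes a false positive harmless, the sketch below runs the same task twice, as would happen if a completed task were wrongly reported as cancelled and then resubmitted. The class name, the example URL, and the two tasks are hypothetical and are not part of the crawler above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class IdempotencyDemo {
    /** Idempotent: storing a page under its URL twice leaves a single entry. */
    static int pagesAfterRunningTwice() {
        Map<String, String> pages = new ConcurrentHashMap<>();
        Runnable storePage = () -> pages.put("http://example.com/", "page body");
        storePage.run();
        storePage.run(); // re-execution after a false positive
        return pages.size();
    }

    /** Not idempotent: incrementing a counter twice changes the result. */
    static int counterAfterRunningTwice() {
        AtomicInteger fetches = new AtomicInteger();
        Runnable countFetch = () -> fetches.incrementAndGet();
        countFetch.run();
        countFetch.run(); // re-execution after a false positive
        return fetches.get();
    }

    public static void main(String[] args) {
        System.out.println("pages stored: " + pagesAfterRunningTwice());
        System.out.println("fetch count:  " + counterAfterRunningTwice());
    }
}
```

Tasks of the second kind are the ones for which the application has to detect or tolerate duplicate execution.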

 
