用java做抓取的时候免不了要用到多线程的了,因为要同时抓取多个网站或一条线程抓取一个网站的话实在太慢,而且有时一条线程抓取同一个网站的话也比较浪费CPU资源。要用到多线程的等方面,也就免不了对线程的控制或用到线程池。 我在做我们现在的那一个抓取框架的时候,就曾经用过java.util.concurrent.ExecutorService作为线程池,关于ExecutorService的使用代码大概如下:
java.util.concurrent.Executors类的API提供大量创建连接池的静态方法:1.固定大小的线程池:
package BackStage;
import java.util.concurrent.Executors;
import java.util.concurrent.ExecutorService;
public class JavaThreadPool {
public static void main(String[] args) {
// 创建一个可重用固定线程数的线程池
ExecutorService pool = Executors.newFixedThreadPool(2);
// 创建实现了Runnable接口对象,Thread对象当然也实现了Runnable接口
Thread t1 = new MyThread();
Thread t2 = new MyThread();
Thread t3 = new MyThread();
Thread t4 = new MyThread();
Thread t5 = new MyThread();
// 将线程放入池中进行执行
pool.execute(t1);
pool.execute(t2);
pool.execute(t3);
pool.execute(t4);
pool.execute(t5);
// 关闭线程池
pool.shutdown();
}
}
class MyThread extends Thread {
@Override
public void run() {
System.out.println(Thread.currentThread().getName() + "正在执行。。。");
}
}
后来发现ExecutorService的功能没有想像中的那么好,而且最多只是提供一个线程的容器而然,所以后来我用改用了java.lang.ThreadGroup,ThreadGroup有很多优势,最重要的一点就是它可以对线程进行遍历,知道那些线程已经运行完毕,还有那些线程在运行。关于ThreadGroup的使用代码如下:
class MyThread extends Thread {
boolean stopped;
MyThread(ThreadGroup tg, String name) {
super(tg, name);
stopped = false;
}
public void run() {
System.out.println(Thread.currentThread().getName() + " starting.");
try {
for (int i = 1; i < 1000; i++) {
System.out.print(".");
Thread.sleep(250);
synchronized (this) {
if (stopped)
break;
}
}
} catch (Exception exc) {
System.out.println(Thread.currentThread().getName() + " interrupted.");
}
System.out.println(Thread.currentThread().getName() + " exiting.");
}
synchronized void myStop() {
stopped = true;
}
}
public class Main {
public static void main(String args[]) throws Exception {
ThreadGroup tg = new ThreadGroup("My Group");
MyThread thrd = new MyThread(tg, "MyThread #1");
MyThread thrd2 = new MyThread(tg, "MyThread #2");
MyThread thrd3 = new MyThread(tg, "MyThread #3");
thrd.start();
thrd2.start();
thrd3.start();
Thread.sleep(1000);
System.out.println(tg.activeCount() + " threads in thread group.");
Thread thrds[] = new Thread[tg.activeCount()];
tg.enumerate(thrds);
for (Thread t : thrds)
System.out.println(t.getName());
thrd.myStop();
Thread.sleep(1000);
System.out.println(tg.activeCount() + " threads in tg.");
tg.interrupt();
}
}
由以上的代码可以看出:ThreadGroup比ExecutorService多以下几个优势
1.ThreadGroup可以遍历线程,知道那些线程已经运行完毕,那些还在运行
2.可以通过ThreadGroup.activeCount知道有多少线程从而可以控制插入的线程数