java多线程抓取新闻

任务描述:
<wbr><wbr><wbr><wbr>有个信息发布网站,按照条件会查询出数据,按页显示,每页15条,每条信息为一个url链接。点击链接再打开一个页面,显示这条的详细信息。</wbr></wbr></wbr></wbr>

<wbr><wbr><wbr><wbr>我们需要做的是把每条的详细信息都抓取下来,保存到数据库中。</wbr></wbr></wbr></wbr>

开始我做好了抓取的所有程序,保存到数据库中。
做完运行,发现速度很慢,因为数据量比较大,大约有30多万条详细信息需要抓取。

就想到这个用多线程来实现真是再好不过了,开15条线程,一次就把一页的详细信息都抓取下来,不是很快吗?

后来加上多线程后,一次开16线程,比原来快了很多,呵呵~!

因为这个工程是用java实现的,下面是java实现这个多线程的主要代码,用c#应该也是一样的!有兴趣的可以翻译成c#的,厚厚~
import java.util.ArrayList;

public class Catcher
{
<wbr><wbr><wbr>private static ArrayList threads= new ArrayList();//存储未处理URL<br><wbr><wbr><wbr>public static boolean isFinished=false;//是否已经把所有的链接存到threads了<br><wbr><wbr><wbr><br><wbr><wbr><wbr>public synchronized String <wbr>getUrl()<br><wbr><wbr><wbr>{<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>if(threads.size()&gt;0)<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>{<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>String tmp = String.valueOf(threads.get(0));<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>threads.remove(0);<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>return tmp;<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>}else<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>return null;<br><wbr><wbr><wbr>}<br><wbr><wbr><wbr>public void process(){<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>//处理预处理<br><wbr><wbr><wbr><wbr><wbr><wbr>//下面开10个线程等待处理<br><wbr><wbr><wbr><wbr>new Thread(new Processer(this)).start();<br><wbr><wbr><wbr><wbr>new Thread(new Processer(this)).start();<br><wbr><wbr><wbr><wbr>new Thread(new Processer(this)).start();<br><wbr><wbr><wbr><wbr>new Thread(new Processer(this)).start();<br><wbr><wbr><wbr><wbr>new Thread(new Processer(this)).start();<br><wbr><wbr><wbr><wbr>new Thread(new Processer(this)).start();<br><wbr><wbr><wbr><wbr>new Thread(new Processer(this)).start();<br><wbr><wbr><wbr><wbr>new Thread(new Processer(this)).start();<br><wbr><wbr><wbr><wbr>new Thread(new Processer(this)).start();<br><wbr><wbr><wbr><wbr>new Thread(new Processer(this)).start();</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>

<wbr><wbr><wbr><wbr><wbr><wbr><wbr>//进入翻页的处理<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>for(int j=0;j&lt;10;j++)//从第一页翻到最后一页<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>{<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>for(int i = 0;i&lt;15;i++)<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>{<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>threads.add(sUrl);//把URL存进去<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>}<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>}<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>isFinished=true; <wbr><wbr><wbr><wbr><wbr><wbr><wbr><br><wbr><wbr><wbr>}<br><wbr><br><wbr><wbr><wbr><br> }</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>

class Processer implements Runnable
{
<wbr><wbr><wbr>Catcher c;<br><wbr><wbr><wbr>public Processer(Catcher c)<br><wbr><wbr><wbr>{<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>this.c = c;<br><wbr><wbr><wbr>}<br><wbr><wbr><wbr>public void run()<br><wbr><wbr><wbr>{<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>String tmp = null;<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>while((tmp=c.getUrl())!=null || !c.isFinished)<wbr>//当还有记录时就处理 <wbr><wbr><wbr><wbr><wbr><wbr><wbr><br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>{<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>if(tmp!=null)<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>{<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>//处理将一条信息保存到数据库<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>}else//如果没标志处理则休眠一秒再重新开始处理<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>{<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>try<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>{<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>Thread.sleep(1000);<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>} catch (InterruptedException e)<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>{<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>e.printStackTrace();<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>}<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr><wbr>}<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr>}<br><wbr><wbr><wbr><wbr><wbr><wbr><wbr><br><wbr><wbr><wbr>}<br> }</wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr></wbr>

你可能感兴趣的:(java多线程)