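There are plenty of open-course video tutorials and lecture slides online, and downloading them by hand is too slow, so I wrote a Python script to do it. I tried to make it as general as possible so I can reuse it directly later. The code is below; treat it as a starting point, and suggestions and comments are welcome: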
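import urllib.request
import re
import queue
import threading
import subprocess

class download(threading.Thread):
    """Worker thread: take links off the shared queue and fetch each one with wget."""
    def __init__(self, que):
        threading.Thread.__init__(self)
        self.que = que

    def run(self):
        while True:
            try:
                # get_nowait() avoids the race between empty() and get():
                # another thread could drain the queue between the two calls.
                link = self.que.get_nowait()
            except queue.Empty:
                break
            print('-----%s------' % (self.name))
            # Pass the arguments as a list so the URL never goes through a shell.
            subprocess.run(['wget', link])

def startDown(url, rule, num, start, end, decoding=None):
    if not decoding:
        decoding = 'utf8'
    req = urllib.request.urlopen(url)
    body = req.read().decode(decoding)
    rule = re.compile(rule)
    link = rule.findall(body)
    que = queue.Queue()
    for l in link:
        # Slice away the markup surrounding the URL in each match.
        que.put(l[start:end])
    for i in range(num):
        d = download(que)
        d.start()

if __name__ == '__main__':
    url = 'https://class.coursera.org/algo-004/lecture/index'
    # Non-greedy .*? stops at the first closing quote, not the last one on the line.
    rule = '<a target="_new" href=".*?"'
    startDown(url, rule, 10, 23, -1)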
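A quick explanation: the download class inherits from threading.Thread and overrides run(). As long as the queue is not empty, it keeps taking a resource's real link from the queue and calling wget to download it; once the queue is empty, the thread exits. The startDown function is the entry point for multithreaded downloading. Its parameters are: url: the page that lists the resources; rule: the regular expression used for matching; num: the number of threads to start; start: the position within each regex match where the real link begins; end: the position within each regex match where the real link ends; decoding: the encoding of the resource page, utf8 by default.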
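To make the start and end parameters concrete, here is a minimal sketch of how a whole regex match gets sliced down to the bare link. The HTML fragment and the example.com URLs are made up for illustration, and the real lecture page may be laid out differently. The sketch also shows an alternative that is not in the original script: a capture group, which returns only the URL and needs no offsets.

```python
import re

# Hypothetical page fragment (assumption: one link per anchor tag).
sample = (
    '<a target="_new" href="https://example.com/a.mp4">video</a>\n'
    '<a target="_new" href="https://example.com/b.pdf">slides</a>\n'
)

# Non-greedy .*? so each match ends at the first closing quote.
rule = re.compile(r'<a target="_new" href=".*?"')

# Length of the markup before the URL; this is the value passed as start.
prefix_len = len('<a target="_new" href="')

# Slicing approach: drop the leading markup and the trailing quote (end = -1).
sliced = [m[prefix_len:-1] for m in rule.findall(sample)]

# Alternative: a capture group makes findall return only the URL itself.
grouped = re.findall(r'<a target="_new" href="(.*?)"', sample)

print(prefix_len)
print(sliced)
print(grouped)
```

The slicing version works with any rule but needs the offsets recounted whenever the pattern changes; the capture-group version keeps the pattern and the extraction in one place.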
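Here is what it looks like when I run it:
That's it. Next time I need to download something, I can just import this file and be done. I welcome any criticism, since I want to improve faster.
When reposting, please credit the source: http://blog.csdn.net/littlethunder/article/details/9396059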