1. Creating threads
Python provides the threading module for creating threads. There are two ways to create one.
1) Create a thread directly with threading.Thread
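import threading

def singing():
    # current_thread() replaces the deprecated currentThread()
    print(threading.current_thread().name + " - is singing ")

def dancing():
    print(threading.current_thread().name + " - is dancing ")

if __name__ == '__main__':
    # Pass the function to run as target; start() launches the thread.
    t1 = threading.Thread(target=singing)
    t2 = threading.Thread(target=dancing)
    t1.start()
    t2.start()
Output (on Python 3.10 and later the default name also includes the target, e.g. "Thread-1 (singing)"):
Thread-1 - is singing
Thread-2 - is dancing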
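The example above starts threads whose functions take no arguments. As a minimal sketch of passing arguments and waiting for the threads to finish, the snippet below uses the args, kwargs and name parameters of threading.Thread together with join(). The names perform and run_performers are made up for illustration and do not come from the original example.

```python
import threading

def perform(action, times, log):
    """Hypothetical worker: append `action` to the shared list `times` times."""
    for _ in range(times):
        log.append(action)  # list.append is safe to call from several threads in CPython

def run_performers():
    log = []
    # Positional arguments go in args, keyword arguments in kwargs.
    t1 = threading.Thread(target=perform, args=("singing", 2, log), name="singer")
    t2 = threading.Thread(
        target=perform,
        kwargs={"action": "dancing", "times": 3, "log": log},
        name="dancer",
    )
    t1.start()
    t2.start()
    # join() blocks until the thread has finished.
    t1.join()
    t2.join()
    return log

if __name__ == "__main__":
    print(sorted(run_performers()))
```

Without the join() calls, run_performers could return before the workers have appended anything.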
2) Create a thread by subclassing threading.Thread
import threading

class Sing(threading.Thread):
    # run() holds the code that executes in the new thread
    def run(self):
        print(threading.current_thread().name + " - is singing ")

class Dance(threading.Thread):
    def run(self):
        print(threading.current_thread().name + " - is dancing ")

if __name__ == '__main__':
    t1 = Sing()
    t2 = Dance()
    # start() creates the thread and calls run() inside it
    t1.start()
    t2.start()
Output:
Thread-1 - is singing
Thread-2 - is dancing
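A subclass can also take its own constructor arguments and keep a result on the instance. The following is a small sketch under that assumption; Performer and run_all are hypothetical names chosen for this illustration.

```python
import threading

class Performer(threading.Thread):
    """Hypothetical Thread subclass that stores its outcome on the instance."""

    def __init__(self, action):
        # Always call the base constructor before using the thread.
        super().__init__(name=f"{action}-thread")
        self.action = action
        self.result = None

    def run(self):
        # Executed in the new thread once start() is called.
        self.result = f"{self.name} finished {self.action}"

def run_all(actions):
    threads = [Performer(a) for a in actions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait so that every result is set before reading it
    return [t.result for t in threads]

if __name__ == "__main__":
    for line in run_all(["singing", "dancing"]):
        print(line)
```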
2. Using a thread pool
ThreadPoolExecutor, from the concurrent.futures module, provides a thread pool.
from concurrent.futures import ThreadPoolExecutor
import time

def download(url):
    print("url = %s , downloading..." % url)
    time.sleep(3)  # simulate a slow download
    return 'success'

# max_workers sets the maximum number of threads the pool runs at the same time
executor = ThreadPoolExecutor(max_workers=2)
task1 = executor.submit(download, 'www.xxx.com')
task2 = executor.submit(download, 'www.yyy.com')
# result() blocks until the task has finished
print(task1.result())
print(task2.result())
Output:
url = www.xxx.com , downloading...
url = www.yyy.com , downloading...
success
success
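executor.submit() hands the task to the pool and returns immediately with a Future object. Calling result() on that Future blocks until the task has finished and its return value is available. A Future also has a done() method that reports whether the task has finished, but polling it in a loop is wasteful. The concurrent.futures module therefore provides as_completed(), which takes a collection of futures and yields each one as soon as it completes.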
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def download(url):
    print("url = %s , downloading..." % url)
    time.sleep(3)  # simulate a slow download
    return 'success'

executor = ThreadPoolExecutor(max_workers=2)
all_task = [executor.submit(download, 'www.xxx.com-' + str(i)) for i in range(5)]
# as_completed() yields each future as soon as it finishes
for future in as_completed(all_task):
    data = future.result()
    print(data)
Output (the exact interleaving may vary between runs):
url = www.xxx.com-0 , downloading...
url = www.xxx.com-1 , downloading...
url = www.xxx.com-2 , downloading...
url = www.xxx.com-3 , downloading...
success
success
url = www.xxx.com-4 , downloading...
success
success
success
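Two other common patterns are worth a quick sketch: using the executor as a context manager so the pool is shut down automatically, and executor.map(), which returns results in the order of the inputs rather than in completion order. The function fetch_all and the shortened sleep are assumptions made for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def download(url):
    time.sleep(0.1)  # shortened stand-in for a slow download
    return f"{url}: success"

def fetch_all(urls):
    # Leaving the with block calls shutdown(wait=True) on the pool.
    with ThreadPoolExecutor(max_workers=2) as executor:
        first = executor.submit(download, urls[0])
        # map() yields results in the same order as the input iterable.
        results = list(executor.map(download, urls))
    # All submitted work has finished by now, so done() does not need polling.
    return results, first.done()

if __name__ == "__main__":
    results, finished = fetch_all(["www.xxx.com", "www.yyy.com"])
    print(results, finished)
```

map() is convenient when the order of results matters; as_completed() is the better fit when each result should be handled as soon as it is ready.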
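Tip: daemon threads
The daemon flag marks a thread as a daemon thread. Setting thread.daemon = True before starting the thread (the older thread.setDaemon(True) is deprecated since Python 3.10) tells Python that the thread is not essential: the process does not wait for it when exiting. This keeps a child thread stuck in an infinite loop from preventing the program from exiting.
With daemon set to True, the thread is stopped abruptly once the main thread and all other non-daemon threads have finished. With daemon set to False, the process keeps running until the thread has finished.
Notes on setDaemon()
setDaemon() sets whether the thread is a daemon thread. For threads created from the main thread the default is False, and the flag must be set before start() is called.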
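As a rough sketch of the points above, the snippet below starts a daemon thread that loops forever and shows that changing the flag after start() fails with a RuntimeError. The names heartbeat, start_background and set_daemon_after_start are hypothetical.

```python
import threading
import time

def heartbeat():
    # Loops forever; without daemon=True this would keep the program alive.
    while True:
        time.sleep(0.05)

def start_background():
    # daemon=True must be set before start(), here via the constructor.
    t = threading.Thread(target=heartbeat, daemon=True)
    t.start()
    return t

def set_daemon_after_start(thread):
    # Changing the flag on a running thread raises RuntimeError.
    try:
        thread.daemon = False
    except RuntimeError:
        return "too late"
    return "changed"

if __name__ == "__main__":
    t = start_background()
    print(t.daemon, t.is_alive())
    print(set_daemon_after_start(t))
    # The program exits here even though heartbeat() never returns.
```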
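3. Writing a custom thread pool
Internally, a thread pool simply runs the submitted tasks on a set of worker threads. Based on that idea, a custom thread pool can be built on top of a queue.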
# queue.Queue is the thread-safe queue for threads and also offers
# task_done()/join(); multiprocessing.JoinableQueue targets processes.
from queue import Queue
import threading
import time

queue = Queue()

class MyTask():
    def __init__(self, name):
        self.name = name

    def do(self):
        print('this is %s' % self.name)

def do_job():
    # Worker loop: take a task, run it, then mark it as done.
    while True:
        task = queue.get()
        time.sleep(1)
        task.do()
        queue.task_done()

if __name__ == '__main__':
    for i in range(3):
        t = threading.Thread(target=do_job)
        t.daemon = True  # daemon workers do not block program exit
        t.start()
    time.sleep(3)
    for i in range(10):
        queue.put(MyTask('myTask - ' + str(i)))
    # Block until task_done() has been called for every queued task.
    queue.join()
Output (the order may vary between runs):
this is myTask - 0
this is myTask - 2
this is myTask - 1
this is myTask - 5
this is myTask - 3
this is myTask - 4
this is myTask - 6
this is myTask - 7
this is myTask - 8
this is myTask - 9
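The program starts three threads that loop forever, taking tasks from the queue and executing them. After each task, queue.task_done() marks it as finished. queue.join() blocks the main thread until every task in the queue has been processed. The main thread then ends, and because the three worker threads are daemon threads, they end with it.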
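The same idea can be wrapped in a small class. The sketch below is one possible design, not the only one: instead of daemon threads it stops the workers with a sentinel object, so shutdown() can wait for them to exit cleanly. SimpleThreadPool and its methods are hypothetical names invented for this illustration.

```python
import threading
from queue import Queue

class SimpleThreadPool:
    """Hypothetical minimal pool: worker threads pull (func, args) pairs from a queue."""

    _STOP = object()  # sentinel that tells a worker to exit

    def __init__(self, num_workers):
        self._tasks = Queue()
        self._workers = [
            threading.Thread(target=self._work) for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def _work(self):
        while True:
            item = self._tasks.get()
            if item is self._STOP:
                self._tasks.task_done()
                break
            func, args = item
            try:
                func(*args)
            except Exception as exc:
                # Keep the worker alive if a task fails.
                print(f"task failed: {exc!r}")
            finally:
                self._tasks.task_done()

    def submit(self, func, *args):
        self._tasks.put((func, args))

    def shutdown(self):
        self._tasks.join()  # wait for all queued tasks to finish
        for _ in self._workers:
            self._tasks.put(self._STOP)  # one sentinel per worker
        for worker in self._workers:
            worker.join()

if __name__ == "__main__":
    done = []
    pool = SimpleThreadPool(3)
    for i in range(10):
        pool.submit(done.append, i)
    pool.shutdown()
    print(sorted(done))
```

Unlike ThreadPoolExecutor, this sketch returns no Future objects, so a caller that needs return values would have to collect them itself, as the usage example does with a shared list.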