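Overview: in CPython each Python thread maps to a native (C-level) OS thread, but the GIL allows only one thread at a time to execute bytecode, so multiple threads cannot run Python code on multiple CPUs in parallel.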
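GIL release: the GIL is released periodically, and a thread also releases it voluntarily when it performs a blocking IO operation. In Python 2 the periodic release was triggered after a fixed number of bytecode instructions ("ticks"); since Python 3.2 it is triggered by a time slice (5 ms by default).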
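In CPython 3 the time slice mentioned above is exposed through the `sys` module. A minimal sketch, assuming CPython 3.2 or later (where `sys.getswitchinterval` and `sys.setswitchinterval` exist), that reads the interval, changes it and restores it:

```python
import sys

# Seconds a thread may hold the GIL before the interpreter asks it to hand the GIL over
default_interval = sys.getswitchinterval()
print("default switch interval:", default_interval)  # 5 ms by default

# A larger interval means fewer forced GIL hand-offs between CPU-bound threads
sys.setswitchinterval(0.01)
print("new switch interval:", sys.getswitchinterval())

# Restore the previous value so the rest of the program is unaffected
sys.setswitchinterval(default_interval)
```

Changing the interval only affects how often CPU-bound threads take turns; it does not let them run in parallel.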
Overview: for IO-bound work, the performance difference between multithreading and multiprocessing is small.
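thread1.setDaemon(True)  # make thread1 a daemon thread: when the main thread exits, daemon threads are killed with it (since Python 3.10 the preferred form is thread1.daemon = True)
thread1.join()  # block the calling thread until thread1 has finished running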
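The effect of a daemon flag combined with join() can be checked with a short script. A minimal sketch; `worker` and `results` are illustrative names, and sleep stands in for real work. Without the join() call, the main thread could exit first and the daemon thread would be killed before appending its result.

```python
import threading
import time

results = []

def worker(n):
    time.sleep(0.1)  # simulate some work
    results.append(n * 2)

# daemon=True is the modern spelling of setDaemon(True), which is deprecated since Python 3.10
t = threading.Thread(target=worker, args=(21,), daemon=True)
t.start()
t.join()  # block the main thread until worker() returns
print(results)  # [42]
print(t.is_alive())  # False
```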
import threading
import time

class GetDetailHtml(threading.Thread):
    def __init__(self, name):
        super().__init__(name=name)

    def run(self):
        print("start")
        time.sleep(2)
        print("end")

if __name__ == "__main__":
    thread1 = GetDetailHtml("test1")
    thread1.start()
    thread1.join()
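The return value of run() is discarded, so a Thread subclass usually keeps its result on the instance. A minimal sketch; the `FetchThread` class and its `result` attribute are hypothetical, and sleep stands in for a real download:

```python
import threading
import time

class FetchThread(threading.Thread):
    """Hypothetical subclass: simulates a download and stores the result on self."""

    def __init__(self, name, delay):
        super().__init__(name=name)
        self.delay = delay
        self.result = None  # filled in by run()

    def run(self):
        time.sleep(self.delay)  # stand-in for network IO
        self.result = "{} done after {}s".format(self.name, self.delay)

threads = [FetchThread("t{}".format(i), 0.1) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # only read result after join() has returned
print([t.result for t in threads])
```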
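import threading
import time
from queue import Queue

def get_detail_html(queue):  # consumer: crawls article detail pages
    while True:
        url = queue.get()  # take a URL from the queue (blocks while the queue is empty)
        print("{} start".format(url))
        time.sleep(2)
        print("{} end".format(url))
        queue.task_done()  # mark this item as processed; called once per get()

def get_detail_url(queue):  # producer: crawls the article list page
    print("html start")
    for i in range(20):
        queue.put("http://projectdu.com/{id}".format(id=i))  # add each crawled URL to the queue
    time.sleep(4)
    print("html end")

if __name__ == "__main__":
    detail_url_queue = Queue(maxsize=1000)  # maxsize caps the queue length, which bounds memory use
    thread_detail_url = threading.Thread(target=get_detail_url, args=(detail_url_queue,))
    thread_detail_url.start()
    for i in range(10):
        html_thread = threading.Thread(target=get_detail_html, args=(detail_url_queue,), daemon=True)
        html_thread.start()
    thread_detail_url.join()  # wait until the producer has queued every URL
    detail_url_queue.join()  # block until task_done() has been called for every queued item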
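The pairing between task_done() and join() can also be shown with consumers that shut down cleanly. A minimal sketch, assuming a `None` sentinel to tell each consumer to stop (a common convention, not something Queue requires); sleep stands in for the HTTP request:

```python
import threading
import time
from queue import Queue

NUM_WORKERS = 3
url_queue = Queue(maxsize=100)
done_urls = []
done_lock = threading.Lock()

def consumer(q):
    while True:
        url = q.get()
        if url is None:  # sentinel: no more work for this thread
            q.task_done()
            break
        time.sleep(0.01)  # stand-in for downloading the page
        with done_lock:
            done_urls.append(url)
        q.task_done()  # exactly one task_done() per get()

workers = [threading.Thread(target=consumer, args=(url_queue,)) for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()

for i in range(10):
    url_queue.put("http://projectdu.com/{}".format(i))
for _ in range(NUM_WORKERS):
    url_queue.put(None)  # one sentinel per consumer

url_queue.join()  # returns once every put() item has been marked done
for w in workers:
    w.join()
print(len(done_urls))  # 10
```

Because the sentinels are queued after all URLs, every URL is processed before any consumer exits.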
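(1) Locking adds overhead and therefore costs some performance.
(2) Locks can cause deadlock, for example when a thread tries to acquire a Lock it already holds, or when two threads acquire two locks in opposite order.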
import threading
from threading import Lock

total = 0
lock = Lock()

def add():
    global total
    for i in range(10000):
        lock.acquire()  # acquire the lock first
        total += 1  # competes with the decrement in desc(), so the lock keeps the updates from interleaving
        lock.release()  # always release: otherwise no other thread can acquire the lock and they all stall

def desc():
    global total
    for i in range(10000):
        lock.acquire()  # acquire the lock first
        total -= 1
        lock.release()  # always release the lock

thread1 = threading.Thread(target=add)
thread2 = threading.Thread(target=desc)
thread1.start()
thread2.start()
thread1.join()
thread2.join()
print(total)  # 0
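Forgetting release(), or an exception raised between acquire() and release(), leaves the lock held forever. The same counter is therefore usually written with the lock as a context manager. A minimal sketch of that idiom:

```python
import threading

total = 0
lock = threading.Lock()

def add(n):
    global total
    for _ in range(n):
        with lock:  # acquire() on entry, release() on exit, even if an exception is raised
            total += 1

def desc(n):
    global total
    for _ in range(n):
        with lock:
            total -= 1

t1 = threading.Thread(target=add, args=(100000,))
t2 = threading.Thread(target=desc, args=(100000,))
t1.start()
t2.start()
t1.join()
t2.join()
print(total)  # 0
```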
RLock (reentrant lock): within the same thread, acquire() may be called several times, but release() must then be called the same number of times.
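from threading import RLock

total = 0
lock = RLock()

def add():
    global total
    for i in range(10000):
        lock.acquire()  # acquire the lock
        lock.acquire()  # the same thread acquires it again, which is allowed for an RLock
        total += 1  # competes with the decrement in desc(), so the lock keeps the updates from interleaving
        lock.release()
        lock.release()  # one release() per acquire()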
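The difference between Lock and RLock shows up as soon as the owning thread tries to acquire again. A minimal sketch using non-blocking acquire() so the plain Lock case returns False instead of deadlocking; `outer` and `inner` are illustrative helpers:

```python
import threading

plain = threading.Lock()
reentrant = threading.RLock()

# A plain Lock cannot be re-acquired by the thread that already holds it
plain.acquire()
second = plain.acquire(blocking=False)  # returns False instead of blocking forever
plain.release()
print(second)  # False

# An RLock counts nested acquisitions by its owner thread
reentrant.acquire()
nested = reentrant.acquire(blocking=False)  # succeeds: same owner
reentrant.release()
reentrant.release()  # must release as many times as it was acquired
print(nested)  # True

def outer():
    with reentrant:
        return inner()  # inner() takes the same RLock again; with a Lock this would deadlock

def inner():
    with reentrant:
        return "ok"

print(outer())  # ok
```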
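(1) xiaoai.start() before tianmao.start(): the start order matters. Start the thread that waits first, not the one that sends the first notification, because a thread blocked in condition.wait() can only be woken by condition.notify(). If the notifying thread runs first, its notify() reaches nobody and both threads end up waiting forever.
(2) wait() and notify() may only be called after entering with condition (that is, while holding the condition's lock); otherwise they raise RuntimeError.
(3) A Condition uses two levels of locks. The underlying lock is released when a thread calls wait(); in addition, every wait() call allocates a new lock, puts it into the condition's waiter queue, and blocks on it until notify() wakes it.
(4) For a detailed analysis, see: https://www.cnblogs.com/yoyoketang/p/8337118.html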
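Rule (2) is enforced at runtime. A minimal sketch showing the RuntimeError raised when notify() is called outside `with condition`:

```python
import threading

cond = threading.Condition()
error = None
try:
    cond.notify()  # called without holding the condition's lock
except RuntimeError as exc:
    error = str(exc)
print(error)  # cannot notify on un-acquired lock

with cond:
    cond.notify()  # allowed: the lock is held (nobody is waiting, so nothing happens)
```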
import threading

class Xiaoai(threading.Thread):
    def __init__(self, condition):
        super().__init__(name="XiaoAi")
        self.condition = condition

    def run(self):
        with self.condition:
            self.condition.wait()  # wait for Tmall Genie's notification before speaking
            print("{}: I'm here".format(self.name))
            self.condition.notify()  # XiaoAi has answered; notify Tmall Genie
            self.condition.wait()
            print("{}: Sure".format(self.name))
            self.condition.notify()

class Tianmao(threading.Thread):
    def __init__(self, condition):
        super().__init__(name="Tmall Genie")
        self.condition = condition

    def run(self):
        with self.condition:
            print("{}: Hey XiaoAi".format(self.name))
            self.condition.notify()  # Tmall Genie has spoken; notify XiaoAi
            self.condition.wait()  # wait for XiaoAi's notification before answering
            print("{}: Let's trade lines of poetry".format(self.name))
            self.condition.notify()
            self.condition.wait()

if __name__ == "__main__":
    condition = threading.Condition()
    xiaoai = Xiaoai(condition)
    tianmao = Tianmao(condition)
    xiaoai.start()
    tianmao.start()
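The conversation continues the same way: Tmall Genie finishes speaking and notifies XiaoAi, which is waiting; on receiving the notification XiaoAi speaks and then notifies Tmall Genie, which is now waiting; on receiving that notification Tmall Genie speaks, and so on.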
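Starting xiaoai first narrows the race but does not remove it: the OS may still let tianmao take the lock first, in which case its first notify() reaches nobody. A minimal sketch of an order-independent version, assuming a shared `turn` variable (hypothetical, not in the original) that records whose turn it is, and `Condition.wait_for()` to wait on that predicate:

```python
import threading

condition = threading.Condition()
turn = "tianmao"  # hypothetical shared state: whose turn it is to speak
transcript = []

def speak(me, other, lines):
    global turn
    for line in lines:
        with condition:
            # wait_for re-checks the predicate, so a notify() sent before we waited is never "lost"
            condition.wait_for(lambda: turn == me)
            transcript.append("{}: {}".format(me, line))
            turn = other
            condition.notify_all()

xiaoai = threading.Thread(target=speak, args=("xiaoai", "tianmao", ["I'm here", "Sure"]))
tianmao = threading.Thread(target=speak, args=("tianmao", "xiaoai", ["Hey XiaoAi", "Let's trade lines of poetry"]))
# The start order no longer matters
tianmao.start()
xiaoai.start()
tianmao.join()
xiaoai.join()
print("\n".join(transcript))
```

The design choice is to wait on state rather than on a bare notification: whichever thread runs first simply checks `turn` and either speaks or waits.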
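Overview: a Semaphore is a lock that limits how many threads may enter a section at the same time.
Example: when reading and writing a file, only one writer is allowed at a time, but several readers may read concurrently; a semaphore can cap how many do so.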
import threading
import time

class HtmlSpider(threading.Thread):
    def __init__(self, url, sem):
        super().__init__()
        self.url = url
        self.sem = sem

    def run(self):
        time.sleep(2)  # simulate downloading self.url
        print("got html text success")
        self.sem.release()  # give the slot back so UrlProducer can start another spider

class UrlProducer(threading.Thread):
    def __init__(self, sem):
        super().__init__()
        self.sem = sem

    def run(self):
        for i in range(20):
            self.sem.acquire()  # paired with release() in HtmlSpider: acquire decrements the counter (blocking at 0), release increments it
            html_thread = HtmlSpider("http://baidu.com/{}".format(i), self.sem)
            html_thread.start()

if __name__ == "__main__":
    sem = threading.Semaphore(3)  # at most 3 spiders run at once
    url_producer = UrlProducer(sem)
    url_producer.start()
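The code above acquires in one thread and releases in another, which works because a Semaphore, unlike an RLock, has no owner. When each worker can manage its own slot, `with sem:` is simpler. A minimal sketch that also measures the peak number of concurrent workers; `fetch`, `active` and `peak` are illustrative names and sleep stands in for the download:

```python
import threading
import time

sem = threading.Semaphore(3)  # at most 3 holders at once
active = 0
peak = 0
counter_lock = threading.Lock()

def fetch(url):
    global active, peak
    with sem:  # acquire() on entry (blocks while 3 threads are inside), release() on exit
        with counter_lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # stand-in for downloading url
        with counter_lock:
            active -= 1

threads = [threading.Thread(target=fetch, args=("http://baidu.com/{}".format(i),)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(peak)  # never more than 3
```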
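(1) Important imports
from concurrent.futures import ThreadPoolExecutor, as_completed, wait
(2) From the main thread we can query the state of a thread or task and get its return value, and the main thread learns immediately when a task finishes. futures also gives multithreaded and multiprocess code the same programming interface.
(3) Detailed explanation: https://www.jianshu.com/p/b9b3d66aa0be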
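import time
from concurrent.futures import ThreadPoolExecutor, as_completed, wait

def get_html(times):
    time.sleep(times)
    print("get page {} success".format(times))
    return times

executor = ThreadPoolExecutor(max_workers=2)
# submit() schedules the function on the pool and returns a Future immediately, without blocking
task1 = executor.submit(get_html, 3)  # first argument: the callable; remaining arguments are passed to it
task2 = executor.submit(get_html, 2)

# (1) Query a single task
print(task1.done())  # done() reports whether the task has finished (False here: it needs 3 s)
print(task1.result())  # result() blocks until the task finishes, then returns its return value

# (2) Get the return values of tasks that have finished
urls = [3, 2, 4]
all_task = [executor.submit(get_html, url) for url in urls]
wait(all_task)  # wait() blocks until the tasks are done; return_when sets the condition (default ALL_COMPLETED)
print("all_task finished, starting other work")
for future in as_completed(all_task):  # as_completed yields each future as soon as it has finished
    data = future.result()
    print("get {} page".format(data))

# (3) Get the values of finished tasks through executor.map
for data in executor.map(get_html, urls):  # map calls get_html on each element of urls; results come back in input order
    print("get {} page".format(data))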
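wait() also accepts a return_when argument. A minimal sketch using FIRST_COMPLETED, with shorter sleeps than the original so it finishes quickly:

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

def get_html(times):
    time.sleep(times)  # stand-in for a download that takes `times` seconds
    return times

with ThreadPoolExecutor(max_workers=3) as executor:
    tasks = [executor.submit(get_html, t) for t in [0.3, 0.1, 0.5]]
    # Returns as soon as any task finishes; both results are sets of Futures
    done, not_done = wait(tasks, return_when=FIRST_COMPLETED)
    print([f.result() for f in done])  # most likely [0.1]
    print(len(not_done))  # most likely 2
```

The other accepted values are ALL_COMPLETED (the default) and FIRST_EXCEPTION.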
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor, as_completed

def fib(n):
    if n <= 2:
        return 1
    return fib(n - 1) + fib(n - 2)

"""
Note: A. Multiprocess code must run under if __name__ == "__main__":. On platforms that use the
spawn start method (Windows, and macOS by default since Python 3.8) each child process re-imports
this module, so unguarded pool creation raises an error; Linux's traditional fork start method
does not re-import the module, but the guard is still good practice.
"""
if __name__ == "__main__":
    # Run with multiple threads (kept under the guard so child processes do not re-run it on import)
    with ThreadPoolExecutor(3) as executor:
        start_time = time.time()
        all_task = [executor.submit(fib, num) for num in range(25, 35)]
        for future in as_completed(all_task):
            data = future.result()
            print("exe result: {} ".format(data))
        print("elapsed time: {:.2f}s".format(time.time() - start_time))

    # Run with multiple processes
    with ProcessPoolExecutor(3) as executor:
        start_time = time.time()
        all_task = [executor.submit(fib, num) for num in range(25, 35)]
        for future in as_completed(all_task):
            data = future.result()
            print("exe result: {} ".format(data))
        print("elapsed time: {:.2f}s".format(time.time() - start_time))
3. For IO-bound operations, multithreading outperforms multiprocessing (time.sleep is used here to simulate IO).
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def random_sleep(n):
    time.sleep(n)
    return n

if __name__ == "__main__":
    with ThreadPoolExecutor(3) as executor:
        start_time = time.time()
        all_task = [executor.submit(random_sleep, num) for num in [2] * 30]
        for future in as_completed(all_task):
            data = future.result()
            print("exe result: {} ".format(data))
        print("elapsed time: {:.2f}s".format(time.time() - start_time))
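import time
import multiprocessing

def get_html(n):
    time.sleep(n)
    print("process run success")
    return n

if __name__ == "__main__":
    process = multiprocessing.Process(target=get_html, args=(2,))
    process.start()
    process.join()
    print("process run end")

    pool = multiprocessing.Pool(multiprocessing.cpu_count())  # cpu_count() returns the number of CPUs, used here as the number of worker processes
    result = pool.apply_async(get_html, args=(3,))  # submit get_html(3) without blocking; returns an AsyncResult
    pool.close()  # no more tasks will be submitted
    pool.join()  # wait for the worker processes to finish
    print(result.get())  # get() returns get_html's return value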
To be continued...