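Network programming has a few spots that are awkward to handle, especially when the system has demanding performance requirements:
(1) Operations such as gethostbyname and getaddrinfo must not block.
(2) connect must not block either.
Any mature server framework has to deal with these problems properly.
gevent is arguably one of them; at the very least it handles these problems fairly well.
By default, gevent uses a thread pool for gethostbyname and related operations. This tends to confuse people who are new to gevent: if there is already a reactor, why is a thread pool needed as well?
The main purpose is to handle blocking operations that epoll itself cannot help with, such as gethostbyname.
Let's look at how this Resolver is defined: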
    class Resolver(object):

        expected_errors = Exception

        def __init__(self, hub=None):
            if hub is None:
                hub = get_hub()
            self.pool = hub.threadpool

        def __repr__(self):
            return '<gevent.resolver_thread.Resolver at 0x%x pool=%r>' % (id(self), self.pool)

        def close(self):
            pass

        # from briefly reading socketmodule.c, it seems that all of the functions
        # below are thread-safe in Python, even if they are not thread-safe in C.

        def gethostbyname(self, *args):
            return self.pool.apply_e(self.expected_errors, _socket.gethostbyname, args)

        def gethostbyname_ex(self, *args):
            return self.pool.apply_e(self.expected_errors, _socket.gethostbyname_ex, args)

        def getaddrinfo(self, *args, **kwargs):
            return self.pool.apply_e(self.expected_errors, _socket.getaddrinfo, args, kwargs)

        def gethostbyaddr(self, *args, **kwargs):
            return self.pool.apply_e(self.expected_errors, _socket.gethostbyaddr, args, kwargs)

        def getnameinfo(self, *args, **kwargs):
            return self.pool.apply_e(self.expected_errors, _socket.getnameinfo, args, kwargs)
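There is very little code: each of these operations is simply dispatched to the thread pool. As a result, only the coroutine that made the call is blocked, not the whole system.
That brings up the next question: how does gevent make coroutines and Python threads cooperate? The core problem is:
How does a business coroutine submit a task to the thread pool, suspend itself, and then resume once the task has finished running in the pool?
The thread pool code lives in threadpool.py. Its structure is the same as any standard thread pool: one task queue plus several worker threads. Let's look at how a task is normally scheduled onto the pool: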
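The same idea can be sketched with the standard library alone. This is not gevent code: asyncio and its default executor are used here only as a stand-in, and the names resolve, ticker and run_demo are made up for the illustration. The blocking getaddrinfo call runs in a worker thread, so only the coroutine that awaits it has to wait, while the other coroutine keeps running.

```python
import asyncio
import socket


async def resolve(host, port):
    """Run the blocking lookup in the loop's default thread pool."""
    loop = asyncio.get_running_loop()
    # Only this coroutine is suspended while the worker thread does the lookup.
    return await loop.run_in_executor(None, socket.getaddrinfo, host, port)


async def ticker(log, count):
    """A second coroutine that keeps making progress in the meantime."""
    for i in range(count):
        log.append(i)
        await asyncio.sleep(0)  # yield control back to the event loop


async def main():
    log = []
    # A numeric address is used so that no real DNS query is needed.
    infos, _ = await asyncio.gather(resolve("127.0.0.1", 80), ticker(log, 3))
    return infos, log


def run_demo():
    return asyncio.run(main())
```

Calling run_demo() returns the getaddrinfo result together with the list filled in by ticker, which shows that both coroutines ran to completion on the same loop.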
    # XXX apply() should re-raise error by default
    # XXX because that's what builtin apply does
    # XXX check gevent.pool.Pool.apply and multiprocessing.Pool.apply
    def apply_e(self, expected_errors, function, args=None, kwargs=None):
        """
        Running the task suspends the current coroutine; the coroutine is
        rescheduled only after the task has finished.
        """
        if args is None:
            args = ()
        if kwargs is None:
            kwargs = {}
        # Put the task on the task queue, then wait for the result and return it
        success, result = self.spawn(wrap_errors, expected_errors, function, args, kwargs).get()
        if success:
            return result
        raise result

    def apply(self, func, args=None, kwds=None):
        """Equivalent of the apply() builtin function. It blocks till the result is ready."""
        if args is None:
            args = ()
        if kwds is None:
            kwds = {}
        return self.spawn(func, *args, **kwds).get()
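The excerpt does not show wrap_errors, so here is a minimal sketch of what it presumably does, inferred only from the way apply_e unpacks (success, result). It uses concurrent.futures.ThreadPoolExecutor instead of gevent's pool, and the standalone apply_e below is a hypothetical free-function version, not the library method.

```python
from concurrent.futures import ThreadPoolExecutor


def wrap_errors(errors, function, args, kwargs):
    """Call function; turn expected errors into a (False, exception) value.

    Assumed behavior, guessed from how apply_e consumes the result.
    """
    try:
        return True, function(*args, **kwargs)
    except errors as ex:
        return False, ex


def apply_e(pool, expected_errors, function, args=None, kwargs=None):
    """Run function in the pool and re-raise expected errors in the caller."""
    if args is None:
        args = ()
    if kwargs is None:
        kwargs = {}
    future = pool.submit(wrap_errors, expected_errors, function, args, kwargs)
    success, result = future.result()  # blocks the calling thread in this sketch
    if success:
        return result
    raise result
```

The point of the pattern is that an expected error crosses the thread boundary as an ordinary return value and is raised again in the caller, instead of being treated as an unexpected failure inside the worker.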
The key part is the implementation of the spawn method:
    def spawn(self, func, *args, **kwargs):
        """
        Wrap this call as a task, put it on the task queue, and return
        the AsyncResult for it.
        """
        while True:
            semaphore = self._semaphore
            semaphore.acquire()  # acquire the current semaphore
            if semaphore is self._semaphore:
                break
        try:
            task_queue = self.task_queue  # get the task queue
            result = AsyncResult()  # create the async result
            thread_result = ThreadResult(result, hub=self.hub)  # wrap the async result in a thread result
            task_queue.put((func, args, kwargs, thread_result))  # put the task on the task queue
            self.adjust()  # adjust the size of the thread pool
            # rawlink() must be the last call
            result.rawlink(lambda *args: self._semaphore.release())  # register a callback on the async result
            # XXX this _semaphore.release() is competing for order with get()
            # XXX this is not good, just make ThreadResult release the semaphore before doing anything else
        except:
            semaphore.release()  # release the semaphore
            raise
        return result
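The semaphore handling in spawn can be looked at in isolation. Below is a rough sketch using threading and concurrent.futures rather than gevent's own primitives; BoundedSubmitter and its close method are hypothetical names. A slot is acquired before the task is queued and is released by a completion callback, which is registered last so that nothing after it can fail and leak the slot. One difference to keep in mind: here acquire blocks an OS thread, whereas in gevent it would only suspend the calling coroutine.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedSubmitter:
    """Hypothetical helper: at most maxsize tasks are in flight at once."""

    def __init__(self, maxsize):
        self._pool = ThreadPoolExecutor(max_workers=maxsize)
        self._semaphore = threading.BoundedSemaphore(maxsize)

    def spawn(self, func, *args, **kwargs):
        # Wait for a free slot before queuing the task.
        self._semaphore.acquire()
        try:
            future = self._pool.submit(func, *args, **kwargs)
            # Registered last: the slot is handed back when the task completes.
            future.add_done_callback(lambda _f: self._semaphore.release())
        except BaseException:
            # Queuing failed, so give the slot back ourselves.
            self._semaphore.release()
            raise
        return future

    def close(self):
        self._pool.shutdown(wait=True)
```

With maxsize=2, a third call to spawn waits until one of the first two tasks has finished.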
    class ThreadResult(object):

        def __init__(self, receiver, hub=None):
            if hub is None:
                hub = get_hub()
            self.receiver = receiver  # the AsyncResult
            self.hub = hub
            self.value = None
            self.context = None
            self.exc_info = None
            # Note: `async` became a reserved word in Python 3.7, so this excerpt only parses on older versions
            self.async = hub.loop.async()  # create an async watcher on the loop
            self.async.start(self._on_async)  # start the watcher

        def _on_async(self):
            """
            After the task finishes in another thread, that thread calls
            send() on the async watcher as a notification, which causes
            this callback to run.
            """
            self.async.stop()
            try:
                if self.exc_info is not None:
                    try:
                        self.hub.handle_error(self.context, *self.exc_info)
                    finally:
                        self.exc_info = None
                self.context = None
                self.async = None
                self.hub = None
                if self.receiver is not None:
                    # XXX exception!!!?
                    self.receiver(self)  # effectively notifies the AsyncResult
            finally:
                self.receiver = None
                self.value = None

        def set(self, value):
            """
            Called from the other thread once the task has finished, in
            order to notify the async watcher.
            """
            self.value = value
            self.async.send()

        def handle_error(self, context, exc_info):
            self.context = context
            self.exc_info = exc_info
            self.async.send()

        # link protocol:
        def successful(self):
            return True
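At this point everything is basically clear. The key is the async object, which is in fact libev's ev_async watcher; this type of watcher is what lets Python threads and coroutines cooperate.
That covers the implementation logic of gevent's thread pool.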
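To make the wakeup mechanism concrete, here is a small model of the pattern built only on the standard library. It is an illustration of the idea, not libev's implementation: a socket pair stands in for the async watcher, a selectors loop stands in for the hub, and AsyncWatcher and run_in_thread_and_wait are hypothetical names. The worker thread stores its result and calls send(), which wakes the loop so that the callback runs in the loop's own thread.

```python
import selectors
import socket
import threading


class AsyncWatcher:
    """Hypothetical stand-in for an async watcher, built on a socket pair."""

    def __init__(self, selector, callback):
        self._recv, self._send = socket.socketpair()
        self._recv.setblocking(False)
        self._selector = selector
        self._callback = callback
        selector.register(self._recv, selectors.EVENT_READ, self._on_readable)

    def send(self):
        # May be called from any thread: one byte is enough to wake the loop.
        self._send.send(b"\0")

    def _on_readable(self):
        self._recv.recv(1024)  # drain the wakeup bytes
        self._callback()       # runs in the loop thread

    def close(self):
        self._selector.unregister(self._recv)
        self._recv.close()
        self._send.close()


def run_in_thread_and_wait(func, *args):
    """Run func in a worker thread; the loop sleeps until it is notified."""
    selector = selectors.DefaultSelector()
    box = {}
    done = []
    watcher = AsyncWatcher(selector, lambda: done.append(True))

    def worker():
        box["value"] = func(*args)  # store the result first
        watcher.send()              # then notify the loop

    thread = threading.Thread(target=worker)
    thread.start()
    while not done:
        for key, _events in selector.select():
            key.data()  # key.data is the watcher's _on_readable method
    thread.join()
    watcher.close()
    selector.close()
    return box["value"]
```

In this model the worker plays the role of ThreadResult.set, and the loop callback plays the role of _on_async handing the value to the waiting coroutine.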