gevent's resolver and thread pool

In network programming, a few things can be awkward to handle, especially when the system has demanding performance requirements:

(1) Operations such as gethostbyname and getaddrinfo must not block.

(2) connect must not block either.
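For item (2), the usual technique is a non-blocking connect: start the handshake, go back to the event loop, and check the outcome once the socket becomes writable. As far as I know, gevent's socket wrapper does something along these lines internally, but the sketch below is plain standard library, and `nonblocking_connect` is a hypothetical helper name. Note that it needs a numeric address: name resolution, item (1), has no equivalent fd-based interface.

```python
import errno
import selectors
import socket


def nonblocking_connect(addr, timeout=2.0):
    """Start a TCP connect without blocking on the handshake.

    `addr` must be a numeric (ip, port) pair: name resolution is exactly
    the part this technique cannot make non-blocking.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    # connect_ex returns an errno instead of raising; EINPROGRESS means
    # the handshake has started and the socket turns writable when done.
    err = sock.connect_ex(addr)
    if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
        sock.close()
        raise OSError(err, "connect failed")
    if err != 0:
        with selectors.DefaultSelector() as sel:
            # A real event loop would register the socket and run other
            # tasks meanwhile; this sketch just waits for the one event.
            sel.register(sock, selectors.EVENT_WRITE)
            if not sel.select(timeout):
                sock.close()
                raise TimeoutError("connect timed out")
        # Writable only means "finished"; SO_ERROR says whether it worked.
        err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
        if err != 0:
            sock.close()
            raise OSError(err, "connect failed")
    return sock


# Usage: connect to a listener on the loopback interface.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
addr = listener.getsockname()
client = nonblocking_connect(addr)
peer = client.getpeername()
client.close()
listener.close()
```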


Any good, mature server system has to solve these problems properly...

gevent arguably qualifies; at the very least, it handles both problems reasonably well.


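By default, gevent uses a thread pool for gethostbyname and related operations. This tends to confuse people who are new to gevent: if everything already runs on a reactor, why is a thread pool needed?

The main reason is to handle blocking operations that epoll by itself cannot deal with, such as gethostbyname. The C library resolver does its network I/O internally and exposes no file descriptor that epoll could watch, so the only way to keep it from blocking the event loop is to run it on another thread.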
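The same idea shows up outside gevent: Python's asyncio, for example, runs `getaddrinfo` in a thread pool executor for exactly this reason. The sketch below uses asyncio rather than gevent so that it runs with only the standard library; `blocking_lookup` is a hypothetical stand-in that sleeps instead of doing real DNS, and `ticker` is a hypothetical task that only makes progress while the loop is free.

```python
import asyncio
import time


def blocking_lookup(name):
    # Hypothetical stand-in for a blocking C call such as gethostbyname:
    # it holds its thread for 0.2 s and offers no fd a loop could watch.
    time.sleep(0.2)
    return "result-for-" + name


async def ticker(ticks):
    # Appends a timestamp every 20 ms; it only advances while the loop is free.
    for _ in range(10):
        ticks.append(time.monotonic())
        await asyncio.sleep(0.02)


async def main():
    loop = asyncio.get_running_loop()
    ticks = []
    tick_task = asyncio.create_task(ticker(ticks))
    # Run the blocking call on the default thread pool. Only main() waits
    # for it; the loop keeps running ticker() in the meantime.
    result = await loop.run_in_executor(None, blocking_lookup, "example")
    ticks_during_lookup = len(ticks)
    await tick_task
    return result, ticks_during_lookup


result, ticks_during_lookup = asyncio.run(main())
print(result, ticks_during_lookup)
```

If `main()` called `blocking_lookup("example")` directly instead, the loop would be stuck for the whole 0.2 s and `ticker()` would not run at all until the call returned.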


Let's look at how this Resolver is defined:

import _socket

from gevent.hub import get_hub


class Resolver(object):

    expected_errors = Exception

    def __init__(self, hub=None):
        if hub is None:
            hub = get_hub()
        self.pool = hub.threadpool

    def __repr__(self):
        return '<gevent.resolver_thread.Resolver at 0x%x pool=%r>' % (id(self), self.pool)

    def close(self):
        pass

    # from briefly reading socketmodule.c, it seems that all of the functions
    # below are thread-safe in Python, even if they are not thread-safe in C.

    def gethostbyname(self, *args):
        return self.pool.apply_e(self.expected_errors, _socket.gethostbyname, args)

    def gethostbyname_ex(self, *args):
        return self.pool.apply_e(self.expected_errors, _socket.gethostbyname_ex, args)

    def getaddrinfo(self, *args, **kwargs):
        return self.pool.apply_e(self.expected_errors, _socket.getaddrinfo, args, kwargs)

    def gethostbyaddr(self, *args, **kwargs):
        return self.pool.apply_e(self.expected_errors, _socket.gethostbyaddr, args, kwargs)

    def getnameinfo(self, *args, **kwargs):
        return self.pool.apply_e(self.expected_errors, _socket.getnameinfo, args, kwargs)
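The shape of this class is easy to reproduce elsewhere. As a rough analogue only, here is a hypothetical `ThreadedResolver` written against asyncio and `concurrent.futures` instead of gevent's hub and threadpool. It forwards each call unchanged to a worker thread in the same way; the demo uses numeric addresses so that no real DNS query is made.

```python
import asyncio
import functools
import socket
from concurrent.futures import ThreadPoolExecutor


class ThreadedResolver:
    """Hypothetical asyncio analogue of gevent's thread-based Resolver:
    each lookup is forwarded unchanged to a worker thread."""

    def __init__(self, max_workers=4):
        self.pool = ThreadPoolExecutor(max_workers=max_workers)

    async def _apply(self, func, *args, **kwargs):
        loop = asyncio.get_running_loop()
        # run_in_executor takes no kwargs, so bind them with partial.
        call = functools.partial(func, *args, **kwargs)
        # Only the awaiting coroutine is suspended; the loop keeps going.
        return await loop.run_in_executor(self.pool, call)

    async def gethostbyname(self, *args):
        return await self._apply(socket.gethostbyname, *args)

    async def getaddrinfo(self, *args, **kwargs):
        return await self._apply(socket.getaddrinfo, *args, **kwargs)

    def close(self):
        self.pool.shutdown(wait=True)


async def demo():
    resolver = ThreadedResolver()
    try:
        # Numeric addresses, so no real DNS query is needed.
        ip = await resolver.gethostbyname("127.0.0.1")
        infos = await resolver.getaddrinfo(
            "127.0.0.1", 80, type=socket.SOCK_STREAM)
    finally:
        resolver.close()
    return ip, infos


ip, infos = asyncio.run(demo())
```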

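The code is short: each of these operations is simply dispatched to the thread pool as-is, so only the greenlet that made the call is blocked (it is suspended while it waits for the result), not the whole system.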


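That raises the next question: how do gevent's greenlets cooperate with Python threads? The core issue is:

How does a greenlet submit a task to the thread pool, suspend itself, and resume once the task has finished running in the pool?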


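The thread pool code lives in threadpool.py. Structurally it is just like a standard thread pool: a task queue plus several worker threads. Let's look at how a task is normally dispatched to the pool: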

    # XXX apply() should re-raise error by default
    # XXX because that's what builtin apply does
    # XXX check gevent.pool.Pool.apply and multiprocessing.Pool.apply
    def apply_e(self, expected_errors, function, args=None, kwargs=None):
        """
        Running the task suspends the current greenlet; the greenlet is
        scheduled again only after the task has finished.
        """
        if args is None:
            args = ()
        if kwargs is None:
            kwargs = {}

        # Put the task on the task queue, then wait for and return the result
        success, result = self.spawn(wrap_errors, expected_errors, function, args, kwargs).get()
        if success:
            return result
        raise result

    def apply(self, func, args=None, kwds=None):
        """Equivalent of the apply() builtin function. It blocks till the result is ready."""
        if args is None:
            args = ()
        if kwds is None:
            kwds = {}
        return self.spawn(func, *args, **kwds).get()
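The `(success, result)` pair returned through `wrap_errors` is what lets the caller re-raise the worker's exception in its own context. The real `wrap_errors` lives in gevent's threadpool module; the version below is a hypothetical stand-alone re-creation on top of `concurrent.futures`, where `.result()` blocks the calling thread rather than just one greenlet.

```python
from concurrent.futures import ThreadPoolExecutor


def wrap_errors(errors, function, args, kwargs):
    # Runs in the worker thread. An expected error is returned as
    # (False, exception) instead of escaping from the thread.
    try:
        return True, function(*args, **kwargs)
    except errors as ex:
        return False, ex


def apply_e(pool, expected_errors, function, args=None, kwargs=None):
    # Runs in the caller. .result() blocks this whole thread, whereas
    # gevent's AsyncResult.get() suspends only the calling greenlet.
    if args is None:
        args = ()
    if kwargs is None:
        kwargs = {}
    success, result = pool.submit(
        wrap_errors, expected_errors, function, args, kwargs).result()
    if success:
        return result
    raise result  # re-raise the worker's exception in the caller


with ThreadPoolExecutor(max_workers=2) as pool:
    parsed = apply_e(pool, ValueError, int, ("ff",), {"base": 16})
    try:
        apply_e(pool, ValueError, int, ("not a number",))
        raised = False
    except ValueError:
        raised = True
```

In this sketch an unexpected error simply propagates through `.result()`; in the gevent code it is routed through `ThreadResult.handle_error` instead.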

The heart of it is the spawn method:

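    def spawn(self, func, *args, **kwargs):
        """
        Wrap the call into a task, put it on the task queue, and return
        the AsyncResult the caller will wait on
        """
        while True:
            semaphore = self._semaphore
            semaphore.acquire()                                      # acquire the semaphore (limits outstanding tasks)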
            if semaphore is self._semaphore:
                break
        try:
            task_queue = self.task_queue                             # the task queue
            result = AsyncResult()                                   # the AsyncResult the caller waits on
            thread_result = ThreadResult(result, hub=self.hub)       # wrap it in a ThreadResult for the worker thread
            task_queue.put((func, args, kwargs, thread_result))      # put the task on the task queue
            self.adjust()                                            # adjust the pool size (start workers if needed)
            # rawlink() must be the last call
            result.rawlink(lambda *args: self._semaphore.release())  # link a callback: release the semaphore when done
            # XXX this _semaphore.release() is competing for order with get()
            # XXX this is not good, just make ThreadResult release the semaphore before doing anything else
        except:
            semaphore.release()                                      # release the semaphore
            raise
        return result
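The semaphore is what bounds the number of outstanding tasks: `spawn` blocks while the limit is reached, and each finished task frees its slot from a completion callback. Below is a hypothetical `BoundedSpawner` that mimics that acquire/release pattern with `threading.Semaphore` and `Future.add_done_callback` (standing in for `rawlink`); it leaves out the `is self._semaphore` retry loop, which in gevent handles the semaphore being swapped when the pool is resized.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class BoundedSpawner:
    """Hypothetical helper: at most `maxsize` tasks may be outstanding.

    spawn() blocks the caller while the limit is reached; each task frees
    its slot from a completion callback, the way the rawlink() callback
    in spawn releases self._semaphore.
    """

    def __init__(self, maxsize):
        self._semaphore = threading.Semaphore(maxsize)
        # The executor may have more workers; the semaphore is the limit.
        self._executor = ThreadPoolExecutor()

    def spawn(self, func, *args, **kwargs):
        self._semaphore.acquire()
        try:
            future = self._executor.submit(func, *args, **kwargs)
        except BaseException:
            # Same as the bare except in spawn: give the slot back, re-raise.
            self._semaphore.release()
            raise
        # Registered last; fires once the task has finished.
        future.add_done_callback(lambda _f: self._semaphore.release())
        return future

    def shutdown(self):
        self._executor.shutdown(wait=True)


# Usage: record how many tasks ever run at the same time.
lock = threading.Lock()
state = {"running": 0, "peak": 0}


def work(x):
    with lock:
        state["running"] += 1
        state["peak"] = max(state["peak"], state["running"])
    time.sleep(0.01)
    with lock:
        state["running"] -= 1
    return x * x


spawner = BoundedSpawner(2)
futures = [spawner.spawn(work, i) for i in range(8)]
results = [f.result() for f in futures]
spawner.shutdown()
```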

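Here we can see that a ThreadResult is built to coordinate between the current greenlet and the worker thread, while an AsyncResult is used to schedule the current greenlet. So the key to the whole mechanism is ThreadResult: how does it coordinate across threads?
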
class ThreadResult(object):

    def __init__(self, receiver, hub=None):
        if hub is None:
            hub = get_hub()
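        self.receiver = receiver               # the AsyncResult
        self.hub = hub
        self.value = None
        self.context = None
        self.exc_info = None
        # NOTE: 'async' is a reserved word since Python 3.7; later gevent releases spell this hub.loop.async_()
        self.async = hub.loop.async()          # create an async watcher on the loop
        self.async.start(self._on_async)       # start the watcher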

    def _on_async(self):
        """
        When the task finishes in the other thread, that thread calls
        async.send() as a notification, which makes the loop run this callback
        """
        self.async.stop()
        try:
            if self.exc_info is not None:
                try:
                    self.hub.handle_error(self.context, *self.exc_info)
                finally:
                    self.exc_info = None
            self.context = None
            self.async = None
            self.hub = None
            if self.receiver is not None:
                # XXX exception!!!?
                self.receiver(self)                 # effectively notifies the AsyncResult
        finally:
            self.receiver = None
            self.value = None

    def set(self, value):
        """
        Called from the worker thread once the task has finished, to notify the async watcher
        """
        self.value = value
        self.async.send()

    def handle_error(self, context, exc_info):
        self.context = context
        self.exc_info = exc_info
        self.async.send()

    # link protocol:
    def successful(self):
        return True
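The pattern in `set()` and `_on_async()`, storing the value on the worker thread and then poking a watcher so that the callback runs on the hub's thread, has a close asyncio counterpart: `loop.call_soon_threadsafe()`, which may be called from any thread and wakes the loop. Below is a hypothetical `ThreadResultSketch` that keeps only the success path of ThreadResult, with an `asyncio.Future` standing in for the AsyncResult.

```python
import asyncio
import threading


class ThreadResultSketch:
    """Hypothetical asyncio analogue of ThreadResult (success path only).

    set() runs on a worker thread; delivery to the waiting coroutine
    happens in _on_async(), on the event-loop thread.
    """

    def __init__(self, receiver, loop):
        self.receiver = receiver      # an asyncio.Future, playing AsyncResult
        self.loop = loop
        self.value = None
        self.delivered_in = None      # thread name, recorded for the demo

    def _on_async(self):
        # Loop thread: the counterpart of the async watcher's callback.
        self.delivered_in = threading.current_thread().name
        if not self.receiver.done():
            self.receiver.set_result(self.value)

    def set(self, value):
        # Worker thread: store the value, then wake the loop thread.
        self.value = value
        self.loop.call_soon_threadsafe(self._on_async)


async def main():
    loop = asyncio.get_running_loop()
    loop_thread = threading.current_thread().name
    receiver = loop.create_future()
    thread_result = ThreadResultSketch(receiver, loop)
    worker = threading.Thread(target=thread_result.set, args=(42,),
                              name="worker")
    worker.start()
    value = await receiver        # suspends only this coroutine
    worker.join()
    return value, thread_result.delivered_in, loop_thread


value, delivered_in, loop_thread = asyncio.run(main())
```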

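At this point everything is basically clear. The key is the async object, which is simply libev's ev_async watcher. Its send() may be called from any thread: it wakes up the event loop, which then runs the watcher's callback on the hub's own thread. This type of watcher is how gevent coordinates Python threads with greenlets.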
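Under the hood, libev implements ev_async by writing to a pipe (or an eventfd, where available) that the loop is already watching, so a send() from another thread wakes the loop out of epoll; multiple sends before the loop reacts are coalesced into one callback. The following is a hypothetical pure-Python re-creation of that self-pipe idea with `selectors`, simplified for illustration and not libev's actual code; `AsyncWatcher` and `run_once` are made-up names.

```python
import os
import selectors
import threading


class AsyncWatcher:
    """Hypothetical, simplified ev_async-style watcher built on the
    self-pipe trick; an illustration, not libev's implementation."""

    def __init__(self, selector, callback):
        self._selector = selector
        self._rfd, self._wfd = os.pipe()
        os.set_blocking(self._rfd, False)
        self._pending = threading.Event()
        self.callback = callback
        selector.register(self._rfd, selectors.EVENT_READ, self)

    def send(self):
        # Safe from any thread: mark pending and wake the selector.
        # Sends that arrive before the loop reacts collapse into one call.
        if not self._pending.is_set():
            self._pending.set()
            os.write(self._wfd, b"x")

    def fire(self):
        # Loop thread: drain the pipe, then run the callback once.
        try:
            while os.read(self._rfd, 4096):
                pass
        except BlockingIOError:
            pass
        if self._pending.is_set():
            self._pending.clear()
            self.callback()

    def close(self):
        self._selector.unregister(self._rfd)
        os.close(self._rfd)
        os.close(self._wfd)


def run_once():
    selector = selectors.DefaultSelector()
    calls = []
    watcher = AsyncWatcher(
        selector, lambda: calls.append(threading.current_thread().name))
    worker = threading.Thread(target=watcher.send, name="worker")
    worker.start()
    # One iteration of a tiny event loop: wait until some fd is ready.
    for key, _events in selector.select(timeout=2):
        key.data.fire()
    worker.join()
    watcher.close()
    selector.close()
    return calls, threading.current_thread().name


calls, loop_thread = run_once()
```

The callback runs on the thread that owns the selector, never on the thread that called `send()`, which is the property gevent relies on to resume a greenlet safely.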


And that sorts out the implementation logic of this part of gevent's thread pool.
