A brief analysis of the nova MQ (RPC) receiving side (server) in OpenStack Ocata

First let us look at how the server side starts, taking the compute node as the example:
When nova is installed, the pbr module is invoked to generate the console_scripts that launch the services, based on the entries in setup.cfg; see the pbr documentation for details: https://docs.openstack.org/developer/pbr/
With the nova package installed, running the nova-compute startup script executes the main() function in nova.cmd.compute:

def main():
    config.parse_args(sys.argv)
    logging.setup(CONF, 'nova')
    priv_context.init(root_helper=shlex.split(utils.get_root_helper()))
    utils.monkey_patch()
    objects.register_all()
    # Ensure os-vif objects are registered and plugins loaded
    os_vif.initialize()

    gmr.TextGuruMeditation.setup_autorun(version)

    cmd_common.block_db_access('nova-compute')
    objects_base.NovaObject.indirection_api = conductor_rpcapi.ConductorAPI()

    server = service.Service.create(binary='nova-compute',
                                    topic=CONF.compute_topic)
    service.serve(server)
    service.wait()

First, note that config.parse_args(sys.argv) includes an rpc.init(CONF) call, which points at the nova.rpc.init() method. This method fills in a module-level global named TRANSPORT by chaining two other helpers: TRANSPORT = create_transport(get_transport_url()).

The code itself is clear, so rather than quoting it in full, here is the gist:

get_transport_url() returns an instance of oslo_messaging.transport.TransportURL whose transport attribute is the string 'rabbit'. create_transport() takes that instance as its initialization argument and ultimately produces an oslo_messaging.transport.Transport instance whose driver is an oslo_messaging._drivers.impl_rabbit.RabbitDriver instance, initialized with:
conf=nova.rpc.CONF, url=the TransportURL instance obtained above, default_exchange=CONF.control_exchange (which nova sets to 'nova' through rpc.set_defaults in config.parse_args; oslo.messaging's own fallback is 'openstack'), allowed_remote_exmods=[nova.exception.__name__, ]
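
For reference, here is a condensed sketch of nova.rpc.init() and its two helpers (simplified: the real module also wires up the notification transport, serializers and NOTIFIER):

import oslo_messaging as messaging
import nova.conf
from nova import exception

CONF = nova.conf.CONF
TRANSPORT = None
ALLOWED_EXMODS = [exception.__name__]

def get_transport_url(url_str=None):
    # parse [DEFAULT]/transport_url (or the rabbit_* options) into a
    # TransportURL; its transport attribute ends up as 'rabbit'
    return messaging.TransportURL.parse(CONF, url_str)

def create_transport(url):
    # loads the 'rabbit' driver entry point, i.e.
    # oslo_messaging._drivers.impl_rabbit.RabbitDriver, and wraps it
    # in an oslo_messaging.transport.Transport instance
    return messaging.get_transport(CONF, url=url,
                                   allowed_remote_exmods=ALLOWED_EXMODS)

def init(conf):
    global TRANSPORT
    TRANSPORT = create_transport(get_transport_url())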

Back in nova.cmd.compute.main, the next line of interest is
server = service.Service.create(binary='nova-compute', topic=CONF.compute_topic)
This classmethod constructs the nova.service.Service instance and initializes its parameters. The service.serve(server) call that follows loads the server obtained above onto a new thread and invokes its start method; a condensed sketch of serve()/wait() is given below.
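
service.serve() and service.wait() are thin wrappers over the oslo_service launcher; a condensed sketch, simplified from nova/service.py:

from oslo_service import service as os_service

import nova.conf

CONF = nova.conf.CONF
_launcher = None

def serve(server, workers=None):
    global _launcher
    if _launcher:
        raise RuntimeError('serve() can only be called once')
    # launch() hands the server to a launcher, which spawns a thread of
    # control and calls server.start()
    _launcher = os_service.launch(CONF, server, workers=workers)

def wait():
    _launcher.wait()

Now for the rpc-related part of the start method (nova.service.Service.start):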

        target = messaging.Target(topic=self.topic, server=self.host)

        endpoints = [
            self.manager,
            baserpc.BaseRPCAPI(self.manager.service_name, self.backdoor_port)
        ]
        endpoints.extend(self.manager.additional_endpoints)

        serializer = objects_base.NovaObjectSerializer()

        self.rpcserver = rpc.get_server(target, endpoints, serializer)
        self.rpcserver.start()

Before analyzing the logic, pin down the values of a few variables:

self.topic = 'compute',
self.host = the host name configured for this machine,
self.manager = a nova.compute.manager.ComputeManager instance,
self.manager.service_name = 'compute',
self.manager.additional_endpoints = []

target = messaging.Target(topic=self.topic, server=self.host) returns an instance of the oslo_messaging.target.Target class; its precise role is analyzed at the point of use.
Of endpoints = [self.manager, baserpc.BaseRPCAPI(self.manager.service_name, self.backdoor_port)], we only analyze the case where self.manager serves as the endpoint.
serializer = objects_base.NovaObjectSerializer() returns a nova.objects.base.NovaObjectSerializer instance; its role, too, is analyzed at the point of use.
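
To make the roles of target, endpoints and serializer concrete, here is a minimal standalone oslo.messaging RPC server mirroring what Service.start() assembles. The broker URL, host name and EchoEndpoint are invented for illustration; note that nova monkey-patches eventlet in its own entry points, which the 'eventlet' executor expects:

from oslo_config import cfg
import oslo_messaging as messaging

class EchoEndpoint(object):
    # reachable from a client as rpc method 'echo'
    def echo(self, ctxt, msg):
        return msg

transport = messaging.get_transport(
    cfg.CONF, 'rabbit://guest:guest@localhost:5672/')
target = messaging.Target(topic='compute', server='compute1')
server = messaging.get_rpc_server(transport, target, [EchoEndpoint()],
                                  executor='eventlet')
server.start()   # start polling the broker and dispatching to endpoints
server.wait()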

Now let us look at self.rpcserver = rpc.get_server(target, endpoints, serializer):

def get_server(target, endpoints, serializer=None):
    assert TRANSPORT is not None

    if profiler:
        serializer = ProfilerRequestContextSerializer(serializer)
    else:
        serializer = RequestContextSerializer(serializer)

    return messaging.get_rpc_server(TRANSPORT,
                                    target,
                                    endpoints,
                                    executor='eventlet',
                                    serializer=serializer)

Since the TRANSPORT global was already assigned in rpc.init(), the assert passes.
We consider the case where the profiler is not enabled, so the serializer from before gets wrapped in a RequestContextSerializer; a sketch of that wrapper follows the get_rpc_server listing below. The function then calls oslo_messaging.get_rpc_server(), passing in the parameters defined earlier:

def get_rpc_server(transport, target, endpoints,
                   executor='blocking', serializer=None, access_policy=None):
    """Construct an RPC server.

    :param transport: the messaging transport
    :type transport: Transport
    :param target: the exchange, topic and server to listen on
    :type target: Target
    :param endpoints: a list of endpoint objects
    :type endpoints: list
    :param executor: name of a message executor - for example
                     'eventlet', 'blocking'
    :type executor: str
    :param serializer: an optional entity serializer
    :type serializer: Serializer
    :param access_policy: an optional access policy.
           Defaults to LegacyRPCAccessPolicy
    :type access_policy: RPCAccessPolicyBase
    """
    dispatcher = rpc_dispatcher.RPCDispatcher(endpoints, serializer,
                                              access_policy)
    return RPCServer(transport, target, dispatcher, executor)

The dispatcher is unpacked later; for now, note that this function ultimately returns an oslo_messaging.rpc.server.RPCServer instance.
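
Before moving on, this is the wrapper applied to the serializer above, condensed from nova.rpc.RequestContextSerializer: entity (de)serialization is delegated to the wrapped NovaObjectSerializer, while contexts are converted to and from plain dicts so they can travel inside the message:

import oslo_messaging as messaging
import nova.context

class RequestContextSerializer(messaging.Serializer):

    def __init__(self, base):
        self._base = base

    def serialize_entity(self, context, entity):
        if not self._base:
            return entity
        return self._base.serialize_entity(context, entity)

    def deserialize_entity(self, context, entity):
        if not self._base:
            return entity
        return self._base.deserialize_entity(context, entity)

    def serialize_context(self, context):
        # nova.context.RequestContext -> plain dict
        return context.to_dict()

    def deserialize_context(self, context):
        # plain dict -> nova.context.RequestContext
        return nova.context.RequestContext.from_dict(context)
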
Back in nova.service.Service.start(): immediately after self.rpcserver is defined, its start method is called, i.e. the start of the RPCServer instance. That method is defined in RPCServer's parent class, oslo_messaging.server.MessageHandlingServer:

    @ordered(reset_after='stop')
    def start(self, override_pool_size=None):
        """Start handling incoming messages.

        This method causes the server to begin polling the transport for
        incoming messages and passing them to the dispatcher. Message
        processing will continue until the stop() method is called.

        The executor controls how the server integrates with the applications
        I/O handling strategy - it may choose to poll for messages in a new
        process, thread or co-operatively scheduled coroutine or simply by
        registering a callback with an event loop. Similarly, the executor may
        choose to dispatch messages in a new thread, coroutine or simply the
        current thread.
        """
        # Warn that restarting will be deprecated
        if self._started:
            LOG.warning(_LW('Restarting a MessageHandlingServer is inherently '
                            'racy. It is deprecated, and will become a noop '
                            'in a future release of oslo.messaging. If you '
                            'need to restart MessageHandlingServer you should '
                            'instantiate a new object.'))
        self._started = True

        executor_opts = {}

        if self.executor_type == "threading":
            executor_opts["max_workers"] = (
                override_pool_size or self.conf.executor_thread_pool_size
            )
        elif self.executor_type == "eventlet":
            eventletutils.warn_eventlet_not_patched(
                expected_patched_modules=['thread'],
                what="the 'oslo.messaging eventlet executor'")
            executor_opts["max_workers"] = (
                override_pool_size or self.conf.executor_thread_pool_size
            )

        self._work_executor = self._executor_cls(**executor_opts)

        try:
            self.listener = self._create_listener()
        except driver_base.TransportDriverError as ex:
            raise ServerListenError(self.target, ex)

        self.listener.start(self._on_incoming)

As before, first pin down a few instance attributes and variables:

self.transport = transport
self._target = target
self.executor_type = executor = 'eventlet'

self._executor_cls = mgr.driver, where mgr is a stevedore.driver.DriverManager (the usual entry-point plugin loading; a sketch follows below), i.e. the futurist.GreenThreadPoolExecutor class

executor_opts['max_workers'] = 64 (the default executor_thread_pool_size)

self._work_executor = self._executor_cls(**executor_opts) = futurist.GreenThreadPoolExecutor(**executor_opts)
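
The executor class is resolved through the standard entry-point lookup; roughly what MessageHandlingServer.__init__ does (the namespace and names below are those registered in oslo.messaging's setup.cfg):

from stevedore import driver

mgr = driver.DriverManager('oslo.messaging.executors',  # entry-point namespace
                           'eventlet')                  # self.executor_type
executor_cls = mgr.driver
# -> futurist.GreenThreadPoolExecutor; 'blocking' and 'threading' map to
#    futurist.SynchronousExecutor and futurist.ThreadPoolExecutor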

Next, _create_listener(), which is overridden in the subclass RPCServer:

    def _create_listener(self):
        return self.transport._listen(self._target, 1, None)

It is easy to see that this _listen method lives on the TRANSPORT global defined back in rpc.init(), i.e. it is oslo_messaging.transport.Transport._listen():

    def _listen(self, target, batch_size, batch_timeout):
        if not (target.topic and target.server):
            raise exceptions.InvalidTarget('A server\'s target must have '
                                           'topic and server names specified',
                                           target)
        return self._driver.listen(target, batch_size,
                                   batch_timeout)

This in turn returns self._driver.listen, which is readily identified as the listen method defined in oslo_messaging._drivers.amqpdriver (on AMQPDriverBase):

    def listen(self, target, batch_size, batch_timeout):
        conn = self._get_connection(rpc_common.PURPOSE_LISTEN)

        listener = AMQPListener(self, conn)

        conn.declare_topic_consumer(exchange_name=self._get_exchange(target),
                                    topic=target.topic,
                                    callback=listener)
        conn.declare_topic_consumer(exchange_name=self._get_exchange(target),
                                    topic='%s.%s' % (target.topic,
                                                     target.server),
                                    callback=listener)
        conn.declare_fanout_consumer(target.topic, listener)

        return base.PollStyleListenerAdapter(listener, batch_size,
                                             batch_timeout)

With that, we finally arrive at the logic that actually sets up the message queues.
First, recall what happened when the RabbitDriver instance was initialized:

    connection_pool = pool.ConnectionPool(
        conf, max_size, min_size, ttl,
        url, Connection)

    super(RabbitDriver, self).__init__(
        conf, url,
        connection_pool,
        default_exchange,
        allowed_remote_exmods)
This enters the __init__ of the parent class AMQPDriverBase, which is also the level where the _get_connection method is defined:

    def __init__(self, conf, url, connection_pool,
                 default_exchange=None, allowed_remote_exmods=None):
        super(AMQPDriverBase, self).__init__(conf, url, default_exchange,
                                             allowed_remote_exmods)

        self._default_exchange = default_exchange

        self._connection_pool = connection_pool

        self._reply_q_lock = threading.Lock()
        self._reply_q = None
        self._reply_q_conn = None
        self._waiter = None

    def _get_connection(self, purpose=rpc_common.PURPOSE_SEND):
        return rpc_common.ConnectionContext(self._connection_pool,
                                            purpose=purpose)

Unpacking conn = self._get_connection(rpc_common.PURPOSE_LISTEN): purpose = rpc_common.PURPOSE_LISTEN = 'listen', and the call returns an instance of the oslo_messaging._drivers.common.ConnectionContext class, constructed with connection_pool=self._connection_pool and purpose='listen'.
Now look at ConnectionContext's __init__():

    def __init__(self, connection_pool, purpose):
        """Create a new connection, or get one from the pool."""
        self.connection = None
        self.connection_pool = connection_pool
        pooled = purpose == PURPOSE_SEND
        if pooled:
            self.connection = connection_pool.get()
        else:
            # a non-pooled connection is requested, so create a new connection
            self.connection = connection_pool.create(purpose)
        self.pooled = pooled
        self.connection.pooled = pooled

Here we care about the self.connection = connection_pool.create(purpose) branch, which calls the create method of oslo_messaging._drivers.pool.ConnectionPool:

    def create(self, purpose=common.PURPOSE_SEND):
        LOG.debug('Pool creating new connection')
        return self.connection_cls(self.conf, self.url, purpose)

From the logic it is clear that this returns an instance of oslo_messaging._drivers.impl_rabbit.Connection (the connection_cls that RabbitDriver passed in when building the pool).

Putting it together:

conn is an instance of oslo_messaging._drivers.common.ConnectionContext, whose attribute
self.connection is an instance of oslo_messaging._drivers.impl_rabbit.Connection.

And because ConnectionContext overrides attribute lookup:

    def __getattr__(self, key):
        """Proxy all other calls to the Connection instance."""
        if self.connection:
            return getattr(self.connection, key)
        else:
            raise InvalidRPCConnectionReuse()

all other attribute access is proxied straight through to the wrapped Connection instance.

Now the __init__ method of the Connection class:

    def __init__(self, conf, url, purpose):
        # NOTE(viktors): Parse config options
        driver_conf = conf.oslo_messaging_rabbit

        self.max_retries = driver_conf.rabbit_max_retries
        self.interval_start = driver_conf.rabbit_retry_interval
        self.interval_stepping = driver_conf.rabbit_retry_backoff
        self.interval_max = driver_conf.rabbit_interval_max

        self.login_method = driver_conf.rabbit_login_method
        self.fake_rabbit = driver_conf.fake_rabbit
        self.virtual_host = driver_conf.rabbit_virtual_host
        self.rabbit_hosts = driver_conf.rabbit_hosts
        self.rabbit_port = driver_conf.rabbit_port
        self.rabbit_userid = driver_conf.rabbit_userid
        self.rabbit_password = driver_conf.rabbit_password
        self.rabbit_ha_queues = driver_conf.rabbit_ha_queues
        self.rabbit_transient_queues_ttl = \
            driver_conf.rabbit_transient_queues_ttl
        self.rabbit_qos_prefetch_count = driver_conf.rabbit_qos_prefetch_count
        self.heartbeat_timeout_threshold = \
            driver_conf.heartbeat_timeout_threshold
        self.heartbeat_rate = driver_conf.heartbeat_rate
        self.kombu_reconnect_delay = driver_conf.kombu_reconnect_delay
        self.amqp_durable_queues = driver_conf.amqp_durable_queues
        self.amqp_auto_delete = driver_conf.amqp_auto_delete
        self.rabbit_use_ssl = driver_conf.rabbit_use_ssl
        self.kombu_missing_consumer_retry_timeout = \
            driver_conf.kombu_missing_consumer_retry_timeout
        self.kombu_failover_strategy = driver_conf.kombu_failover_strategy
        self.kombu_compression = driver_conf.kombu_compression

        if self.rabbit_use_ssl:
            self.kombu_ssl_version = driver_conf.kombu_ssl_version
            self.kombu_ssl_keyfile = driver_conf.kombu_ssl_keyfile
            self.kombu_ssl_certfile = driver_conf.kombu_ssl_certfile
            self.kombu_ssl_ca_certs = driver_conf.kombu_ssl_ca_certs

        # Try forever?
        if self.max_retries <= 0:
            self.max_retries = None

        if url.virtual_host is not None:
            virtual_host = url.virtual_host
        else:
            virtual_host = self.virtual_host

        self._url = ''
        if self.fake_rabbit:
            LOG.warning(_LW("Deprecated: fake_rabbit option is deprecated, "
                            "set rpc_backend to kombu+memory or use the fake "
                            "driver instead."))
            self._url = 'memory://%s/' % virtual_host
        elif url.hosts:
            if url.transport.startswith('kombu+'):
                LOG.warning(_LW('Selecting the kombu transport through the '
                                'transport url (%s) is a experimental feature '
                                'and this is not yet supported.'),
                            url.transport)
            if len(url.hosts) > 1:
                random.shuffle(url.hosts)
            for host in url.hosts:
                transport = url.transport.replace('kombu+', '')
                transport = transport.replace('rabbit', 'amqp')
                self._url += '%s%s://%s:%s@%s:%s/%s' % (
                    ";" if self._url else '',
                    transport,
                    parse.quote(host.username or ''),
                    parse.quote(host.password or ''),
                    self._parse_url_hostname(host.hostname) or '',
                    str(host.port or 5672),
                    virtual_host)
        elif url.transport.startswith('kombu+'):
            # NOTE(sileht): url have a + but no hosts
            # (like kombu+memory:///), pass it to kombu as-is
            transport = url.transport.replace('kombu+', '')
            self._url = "%s://%s" % (transport, virtual_host)
        else:
            if len(self.rabbit_hosts) > 1:
                random.shuffle(self.rabbit_hosts)
            for adr in self.rabbit_hosts:
                hostname, port = netutils.parse_host_port(
                    adr, default_port=self.rabbit_port)
                self._url += '%samqp://%s:%s@%s:%s/%s' % (
                    ";" if self._url else '',
                    parse.quote(self.rabbit_userid, ''),
                    parse.quote(self.rabbit_password, ''),
                    self._parse_url_hostname(hostname), port,
                    virtual_host)

        self._initial_pid = os.getpid()

        self._consumers = {}
        self._producer = None
        self._new_tags = set()
        self._active_tags = {}
        self._tags = itertools.count(1)

        # Set of exchanges and queues declared on the channel to avoid
        # unnecessary redeclaration. This set is resetted each time
        # the connection is resetted in Connection._set_current_channel
        self._declared_exchanges = set()
        self._declared_queues = set()

        self._consume_loop_stopped = False
        self.channel = None
        self.purpose = purpose

        # NOTE(sileht): if purpose is PURPOSE_LISTEN
        # we don't need the lock because we don't
        # have a heartbeat thread
        if purpose == rpc_common.PURPOSE_SEND:
            self._connection_lock = ConnectionLock()
        else:
            self._connection_lock = DummyConnectionLock()

        self.connection_id = str(uuid.uuid4())
        self.name = "%s:%d:%s" % (os.path.basename(sys.argv[0]),
                                  os.getpid(),
                                  self.connection_id)
        self.connection = kombu.connection.Connection(
            self._url, ssl=self._fetch_ssl_params(),
            login_method=self.login_method,
            heartbeat=self.heartbeat_timeout_threshold,
            failover_strategy=self.kombu_failover_strategy,
            transport_options={
                'confirm_publish': True,
                'client_properties': {
                    'capabilities': {
                        'authentication_failure_close': True,
                        'connection.blocked': True,
                        'consumer_cancel_notify': True
                    },
                    'connection_name': self.name},
                'on_blocked': self._on_connection_blocked,
                'on_unblocked': self._on_connection_unblocked,
            },
        )

        LOG.debug('[%(connection_id)s] Connecting to AMQP server on'
                  ' %(hostname)s:%(port)s',
                  self._get_connection_info())

        # NOTE(sileht): kombu recommend to run heartbeat_check every
        # seconds, but we use a lock around the kombu connection
        # so, to not lock to much this lock to most of the time do nothing
        # expected waiting the events drain, we start heartbeat_check and
        # retrieve the server heartbeat packet only two times more than
        # the minimum required for the heartbeat works
        # (heatbeat_timeout/heartbeat_rate/2.0, default kombu
        # heartbeat_rate is 2)
        self._heartbeat_wait_timeout = (
            float(self.heartbeat_timeout_threshold) /
            float(self.heartbeat_rate) / 2.0)
        self._heartbeat_support_log_emitted = False

        # NOTE(sileht): just ensure the connection is setuped at startup
        with self._connection_lock:
            self.ensure_connection()

        # NOTE(sileht): if purpose is PURPOSE_LISTEN
        # the consume code does the heartbeat stuff
        # we don't need a thread
        self._heartbeat_thread = None
        if purpose == rpc_common.PURPOSE_SEND:
            self._heartbeat_start()

        LOG.debug('[%(connection_id)s] Connected to AMQP server on '
                  '%(hostname)s:%(port)s via [%(transport)s] client with'
                  ' port %(client_port)s.',
                  self._get_connection_info())

        # NOTE(sileht): value chosen according the best practice from kombu
        # http://kombu.readthedocs.org/en/latest/reference/kombu.common.html#kombu.common.eventloop
        # For heatbeat, we can set a bigger timeout, and check we receive the
        # heartbeat packets regulary
        if self._heartbeat_supported_and_enabled():
            self._poll_timeout = self._heartbeat_wait_timeout
        else:
            self._poll_timeout = 1

        if self._url.startswith('memory://'):
            # Kludge to speed up tests.
            self.connection.transport.polling_interval = 0.0
            # Fixup logging
            self.connection.hostname = "memory_driver"
            self.connection.port = 1234
            self._poll_timeout = 0.05

    # FIXME(markmc): use oslo sslutils when it is available as a library
    _SSL_PROTOCOLS = {
        "tlsv1": ssl.PROTOCOL_TLSv1,
        "sslv23": ssl.PROTOCOL_SSLv23
    }

    _OPTIONAL_PROTOCOLS = {
        'sslv2': 'PROTOCOL_SSLv2',
        'sslv3': 'PROTOCOL_SSLv3',
        'tlsv1_1': 'PROTOCOL_TLSv1_1',
        'tlsv1_2': 'PROTOCOL_TLSv1_2',
    }
    for protocol in _OPTIONAL_PROTOCOLS:
        try:
            _SSL_PROTOCOLS[protocol] = getattr(ssl,
                                               _OPTIONAL_PROTOCOLS[protocol])
        except AttributeError:
            pass

Walking through the logic (all the options involved keep their default values):

  • The options defined on driver_conf become instance attributes;
  • The configured rabbit_hosts are shuffled and joined into a standard AMQP URL string, e.g. 'amqp://(rabbit_userid):(rabbit_password)@(rabbit_host):(rabbit_port)/(virtual_host)'; when several rabbit_hosts are given, the per-host URLs are separated by semicolons;
  • self.connection is set to a kombu.connection.Connection instance. The kombu transport class is resolved from the URL scheme; here it is kombu.transport.librabbitmq:Transport, since librabbitmq is installed (without it kombu falls back to kombu.transport.pyamqp:Transport), as the establish_connection code below confirms;
  • self.ensure_connection() calls self.ensure(), passing the lazy self.connection.connection property as the method to run. The chain is long, so step through it piece by piece:
    method = self.connection.connection
                   ↓  kombu.connection.Connection.connection (property)

    @property
    def connection(self):
        """The underlying connection object.

        .. warning::
            This instance is transport specific, so do not
            depend on the interface of this object.

        """
        if not self._closed:
            if not self.connected:
                self.declared_entities.clear()
                self._default_channel = None
                self._connection = self._establish_connection()
                self._closed = False
            return self._connection
                   ↓  kombu.connection.Connection._establish_connection()

    def _establish_connection(self):
        self._debug('establishing connection...')
        conn = self.transport.establish_connection()
        self._debug('connection established: %r', conn)
        return conn
                   ↓  kombu.transport.librabbitmq.Transport.establish_connection()

    def establish_connection(self):
        """Establish connection to the AMQP broker."""
        conninfo = self.client
        for name, default_value in items(self.default_connection_params):
            if not getattr(conninfo, name, None):
                setattr(conninfo, name, default_value)
        if conninfo.ssl:
            raise NotImplementedError(NO_SSL_ERROR)
        opts = dict({
            'host': conninfo.host,
            'userid': conninfo.userid,
            'password': conninfo.password,
            'virtual_host': conninfo.virtual_host,
            'login_method': conninfo.login_method,
            'insist': conninfo.insist,
            'ssl': conninfo.ssl,
            'connect_timeout': conninfo.connect_timeout,
        }, **conninfo.transport_options or {})
        conn = self.Connection(**opts)
        conn.client = self.client
        self.client.drain_events = conn.drain_events
        return conn

Back in oslo_messaging._drivers.impl_rabbit.Connection: the ensure() machinery (invoked here through ensure_connection) installs the channel obtained from the kombu connection via _set_current_channel:

    def _set_current_channel(self, new_channel):
        """Change the channel to use.

        NOTE(sileht): Must be called within the connection lock
        """
        if new_channel == self.channel:
            return

        if self.channel is not None:
            self._declared_queues.clear()
            self._declared_exchanges.clear()
            self.connection.maybe_close_channel(self.channel)

        self.channel = new_channel

        if new_channel is not None:
            if self.purpose == rpc_common.PURPOSE_LISTEN:
                self._set_qos(new_channel)
            self._producer = kombu.messaging.Producer(new_channel)
            for consumer in self._consumers:
                consumer.declare(self)

From the code listed above, it is not hard to see what happens: this creates and returns the kombu-level connection between the compute node and the rabbit node. The work is wrapped with kombu.connection.Connection's autoretry machinery, and the channel is established through the librabbitmq library. If a connection- or channel-related error occurs, the channel and self._producer are rebuilt and the connection is retried automatically.
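
As a standalone illustration of the kombu autoretry pattern that ensure() builds on (the broker URL is a placeholder and a reachable broker is assumed):

import kombu

conn = kombu.Connection('amqp://guest:guest@localhost:5672//')

def declare_demo_queue(channel=None):
    # autoretry re-invokes this with a freshly revived channel if the
    # connection or channel breaks mid-way
    exchange = kombu.Exchange('demo_x', type='direct')
    kombu.Queue('demo', exchange=exchange, routing_key='demo',
                channel=channel).declare()

wrapped = conn.autoretry(declare_demo_queue, max_retries=3)
wrapped()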

Back at the top level of listen(): listener = AMQPListener(self, conn) will be unpacked when it is used; next, continue with

conn.declare_topic_consumer(exchange_name=self._get_exchange(target),
                                    topic=target.topic,
                                    callback=listener)
                                 ↓

    def declare_topic_consumer(self, exchange_name, topic, callback=None,
                               queue_name=None):
        """Create a 'topic' consumer."""
        consumer = Consumer(exchange_name=exchange_name,
                            queue_name=queue_name or topic,
                            routing_key=topic,
                            type='topic',
                            durable=self.amqp_durable_queues,
                            exchange_auto_delete=self.amqp_auto_delete,
                            queue_auto_delete=self.amqp_auto_delete,
                            callback=callback,
                            rabbit_ha_queues=self.rabbit_ha_queues)

        self.declare_consumer(consumer)
                                ↓

    def declare_consumer(self, consumer):
        """Create a Consumer using the class that was passed in and
        add it to our list of consumers
        """

        def _connect_error(exc):
            log_info = {'topic': consumer.routing_key, 'err_str': exc}
            LOG.error(_LE("Failed to declare consumer for topic '%(topic)s': "
                          "%(err_str)s"), log_info)

        def _declare_consumer():
            consumer.declare(self)
            tag = self._active_tags.get(consumer.queue_name)
            if tag is None:
                tag = next(self._tags)
                self._active_tags[consumer.queue_name] = tag
                self._new_tags.add(tag)

            self._consumers[consumer] = tag
            return consumer

        with self._connection_lock:
            return self.ensure(_declare_consumer,
                               error_callback=_connect_error)

So a Consumer instance is created and, via the ensure() retry wrapper (with _connect_error as the error callback), its declare method is invoked. Each newly declared consumer is paired with a fresh tag: the tag is added to self._new_tags, and the consumer-to-tag mapping is recorded in self._consumers. Now the declare method:

    def declare(self, conn):
        """Re-declare the queue after a rabbit (re)connect."""

        self.queue = kombu.entity.Queue(
            name=self.queue_name,
            channel=conn.channel,
            exchange=self.exchange,
            durable=self.durable,
            auto_delete=self.queue_auto_delete,
            routing_key=self.routing_key,
            queue_arguments=self.queue_arguments)

        try:
            LOG.debug('[%s] Queue.declare: %s',
                      conn.connection_id, self.queue_name)
            self.queue.declare()
        except conn.connection.channel_errors as exc:
            # NOTE(jrosenboom): This exception may be triggered by a race
            # condition. Simply retrying will solve the error most of the time
            # and should work well enough as a workaround until the race
            # condition itself can be fixed.
            # See https://bugs.launchpad.net/neutron/+bug/1318721 for details.
            if exc.code == 404:
                self.queue.declare()
            else:
                raise
        self._declared_on = conn.channel

This instantiates kombu.entity.Queue, recording the name, channel, exchange, routing key and remaining parameters of the queue about to be declared. queue.declare() then declares the exchange and the queue on the bound channel and binds them together.

Back at the top level, we can see that two topic consumers are declared, listening on the topics 'compute' and 'compute.%s' % host respectively, and one fanout consumer is declared as well; a sketch of the resulting wiring follows.
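
A standalone kombu sketch of the resulting wiring for a host named 'compute1', assuming nova's usual control_exchange of 'nova'. The fanout naming ('<topic>_fanout' exchange, uuid-suffixed queue) is taken from declare_fanout_consumer, which is not quoted above:

import uuid
import kombu

conn = kombu.Connection('amqp://guest:guest@localhost:5672//')
channel = conn.channel()

# the two topic consumers: queue name == routing key
topic_exchange = kombu.Exchange('nova', type='topic', durable=False)
for rk in ('compute', 'compute.compute1'):
    kombu.Queue(name=rk, exchange=topic_exchange, routing_key=rk,
                channel=channel, durable=False).declare()

# the fanout consumer: its own exchange plus a per-process queue
fanout_exchange = kombu.Exchange('compute_fanout', type='fanout',
                                 durable=False, auto_delete=True)
kombu.Queue(name='compute_fanout_%s' % uuid.uuid4().hex,
            exchange=fanout_exchange, channel=channel,
            durable=False, auto_delete=True).declare()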

Finally, the return value of listen() is

return base.PollStyleListenerAdapter(listener, batch_size,
                                             batch_timeout)

class PollStyleListenerAdapter(Listener):
    """A Listener that uses a PollStyleListener for message transfer. A
    dedicated thread is created to do message polling.
    """
    def __init__(self, poll_style_listener, batch_size, batch_timeout):
        super(PollStyleListenerAdapter, self).__init__(
            batch_size, batch_timeout, poll_style_listener.prefetch_size
        )
        self._poll_style_listener = poll_style_listener
        self._listen_thread = threading.Thread(target=self._runner)
        self._listen_thread.daemon = True
        self._started = False

    def start(self, on_incoming_callback):
        super(PollStyleListenerAdapter, self).start(on_incoming_callback)
        self._started = True
        self._listen_thread.start()

    @excutils.forever_retry_uncaught_exceptions
    def _runner(self):
        while self._started:
            incoming = self._poll_style_listener.poll(
                batch_size=self.batch_size, batch_timeout=self.batch_timeout)

            if incoming:
                self.on_incoming_callback(incoming)

        # listener is stopped but we need to process all already consumed
        # messages
        while True:
            incoming = self._poll_style_listener.poll(
                batch_size=self.batch_size, batch_timeout=self.batch_timeout)

            if not incoming:
                return
            self.on_incoming_callback(incoming)

In oslo_messaging.server.MessageHandlingServer.start, the final step, self.listener.start(self._on_incoming), invokes the start method above; the polling thread it spawns (_runner) then repeatedly calls the poll method of the listener = AMQPListener(self, conn) created earlier:

    @base.batch_poll_helper
    def poll(self, timeout=None):
        while not self._stopped.is_set():
            if self.incoming:
                return self.incoming.pop(0)
            try:
                self.conn.consume(timeout=timeout)
            except rpc_common.Timeout:
                return None

def batch_poll_helper(func):
    """Decorator to poll messages in batch

    This decorator is used to add message batching support to a
    :py:meth:`PollStyleListener.poll` implementation that only polls for a
    single message per call.
    """
    def wrapper(in_self, timeout=None, batch_size=1, batch_timeout=None):
        incomings = []
        driver_prefetch = in_self.prefetch_size
        if driver_prefetch > 0:
            batch_size = min(batch_size, driver_prefetch)

        with timeutils.StopWatch(timeout) as timeout_watch:
            # poll first message
            msg = func(in_self, timeout=timeout_watch.leftover(True))
            if msg is not None:
                incomings.append(msg)
            if batch_size == 1 or msg is None:
                return incomings

            # update batch_timeout according to timeout for whole operation
            timeout_left = timeout_watch.leftover(True)
            if timeout_left is not None and (
                    batch_timeout is None or timeout_left < batch_timeout):
                batch_timeout = timeout_left

        with timeutils.StopWatch(batch_timeout) as batch_timeout_watch:
            # poll remained batch messages
            while len(incomings) < batch_size and msg is not None:
                msg = func(in_self, timeout=batch_timeout_watch.leftover(True))
                if msg is not None:
                    incomings.append(msg)

        return incomings
    return wrapper

When no message is already buffered, poll calls self.conn.consume, i.e. oslo_messaging._drivers.impl_rabbit.Connection.consume:

    def consume(self, timeout=None):
        """Consume from all queues/consumers."""

        timer = rpc_common.DecayingTimer(duration=timeout)
        timer.start()

        def _raise_timeout(exc):
            LOG.debug('Timed out waiting for RPC response: %s', exc)
            raise rpc_common.Timeout()

        def _recoverable_error_callback(exc):
            if not isinstance(exc, rpc_common.Timeout):
                self._new_tags = set(self._consumers.values())
            timer.check_return(_raise_timeout, exc)

        def _error_callback(exc):
            _recoverable_error_callback(exc)
            LOG.error(_LE('Failed to consume message from queue: %s'),
                      exc)

        def _consume():
            # NOTE(sileht): in case the acknowledgment or requeue of a
            # message fail, the kombu transport can be disconnected
            # In this case, we must redeclare our consumers, so raise
            # a recoverable error to trigger the reconnection code.
            if not self.connection.connected:
                raise self.connection.recoverable_connection_errors[0]

            while self._new_tags:
                for consumer, tag in self._consumers.items():
                    if tag in self._new_tags:
                        consumer.consume(self, tag=tag)
                        self._new_tags.remove(tag)

            poll_timeout = (self._poll_timeout if timeout is None
                            else min(timeout, self._poll_timeout))
            while True:
                if self._consume_loop_stopped:
                    return

                if self._heartbeat_supported_and_enabled():
                    self._heartbeat_check()

                try:
                    self.connection.drain_events(timeout=poll_timeout)
                    return
                except socket.timeout as exc:
                    poll_timeout = timer.check_return(
                        _raise_timeout, exc, maximum=self._poll_timeout)

        with self._connection_lock:
            self.ensure(_consume,
                        recoverable_error_callback=_recoverable_error_callback,
                        error_callback=_error_callback)

A quick pass over the logic:

  • Take out the newly registered consumers and run their consume method, which issues basic_consume on the underlying AMQP channel;
  • Check the heartbeat to detect loss of contact with the MQ host;
  • Receive messages through the connection's drain_events and hand the incoming message back;
  • Wrap everything in ensure(), so that connection- and channel-related errors trigger a retry.
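
One small bridge before the processing step: the on_incoming_callback that PollStyleListenerAdapter._runner invokes is MessageHandlingServer._on_incoming, which simply submits the polled batch to the GreenThreadPoolExecutor created in start():

    def _on_incoming(self, incoming):
        """Handles on_incoming event

        :param incoming: incoming request.
        """
        self._work_executor.submit(self._process_incoming, incoming)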

Once an incoming message is obtained, oslo_messaging.rpc.server.RPCServer._process_incoming takes over and processes it:

    def _process_incoming(self, incoming):
        message = incoming[0]
        try:
            message.acknowledge()
        except Exception:
            LOG.exception(_LE("Can not acknowledge message. Skip processing"))
            return

        failure = None
        try:
            res = self.dispatcher.dispatch(message)
        except rpc_dispatcher.ExpectedException as e:
            failure = e.exc_info
            LOG.debug(u'Expected exception during message handling (%s)', e)
        except Exception:
            # current sys.exc_info() content can be overridden
            # by another exception raised by a log handler during
            # LOG.exception(). So keep a copy and delete it later.
            failure = sys.exc_info()
            LOG.exception(_LE('Exception during message handling'))

        try:
            if failure is None:
                message.reply(res)
            else:
                message.reply(failure=failure)
        except Exception:
            LOG.exception(_LE("Can not send reply for message"))
        finally:
            # NOTE(dhellmann): Remove circular object reference
            # between the current stack frame and the traceback in
            # exc_info.
            del failure

First the incoming message is acknowledged; after that comes res = self.dispatcher.dispatch(message):

    def dispatch(self, incoming):
        """Dispatch an RPC message to the appropriate endpoint method.

        :param incoming: incoming message
        :type incoming: IncomingMessage
        :raises: NoSuchMethod, UnsupportedVersion
        """
        message = incoming.message
        ctxt = incoming.ctxt

        method = message.get('method')
        args = message.get('args', {})
        namespace = message.get('namespace')
        version = message.get('version', '1.0')

        found_compatible = False
        for endpoint in self.endpoints:
            target = getattr(endpoint, 'target', None)
            if not target:
                target = self._default_target

            if not (self._is_namespace(target, namespace) and
                    self._is_compatible(target, version)):
                continue

            if hasattr(endpoint, method):
                if self.access_policy.is_allowed(endpoint, method):
                    return self._do_dispatch(endpoint, method, ctxt, args)

            found_compatible = True

        if found_compatible:
            raise NoSuchMethod(method)
        else:
            raise UnsupportedVersion(version, method=method)

    def _do_dispatch(self, endpoint, method, ctxt, args):
        ctxt = self.serializer.deserialize_context(ctxt)
        new_args = dict()
        for argname, arg in args.items():
            new_args[argname] = self.serializer.deserialize_entity(ctxt, arg)
        func = getattr(endpoint, method)
        result = func(ctxt, **new_args)
        return self.serializer.serialize_entity(ctxt, result)

The logic is simple: pull the relevant fields out of the message, deserialize and convert them (this is where the serializer from earlier does its work), locate the matching method on an endpoint (here self.manager), run it, and serialize the result.
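
For intuition, the decoded message body that dispatch() inspects looks roughly like this (the field values are illustrative; build_and_run_instance is one of ComputeManager's RPC methods):

message = {
    'method': 'build_and_run_instance',
    'args': {'instance': ..., 'request_spec': ...},  # serialized nova objects
    'namespace': None,
    'version': '4.13',   # the RPC API version the caller pinned
}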

Once execution finishes and the result is in hand, _process_incoming returns it to the publisher through message.reply.

This concludes our brief walkthrough of the server-side service flow on the nova compute node.
