ros源码分析(三)ros::spin( )背后发生的事

接收器subscriber的demo大家也耳熟能详,它和发布器一样,在nodehandle中调用subscribe函数,实现话题名与回调函数的注册,而在进程的main函数中,又在调用ros::spin函数,那么就从这两个部分依次分析接收器的原理。

一、subscribe的声明

 在node_handle.cpp中,subscribe函数的声明如下,它声明了接收器选项,并且在话题管理中实现了注册。

template//M是rosmsg,T是拥有该接收器的类,fp是注册的回调函数
  Subscriber subscribe(const std::string& topic, uint32_t queue_size, void(T::*fp)(M), T* obj, 
                       const TransportHints& transport_hints = TransportHints())
  {
    SubscribeOptions ops;
    ops.template initByFullCallbackType(topic, queue_size, boost::bind(fp, obj, _1));
    ops.transport_hints = transport_hints;//传输方式的管理,TCP与UDP等
    return subscribe(ops);
  }

 这段函数是接收器选项的完整初始化过程,我们可以看到它其实是初始化了一系列在API中传入的话题名、消息缓冲区长度以及回调函数,尤其是helper,成为了持有该回调函数的共享指针。

  template
  void initByFullCallbackType(const std::string& _topic, uint32_t _queue_size,
       const boost::function& _callback,
       const boost::function::Message>(void)>& factory_fn = DefaultMessageCreator::Message>())
  {
    typedef typename ParameterAdapter

::Message MessageType; topic = _topic; queue_size = _queue_size; md5sum = message_traits::md5sum(); datatype = message_traits::datatype(); helper = boost::make_shared >(_callback, factory_fn); }

之后在话题管理的注册中,该函数调用的是topic_manager 中的subscribe函数,调查接收器与发布器是否属于同一进程的语句其实占了大量篇幅,我们的重点主要在接收器是怎么了解到有哪些进程在发布这个话题的。

bool TopicManager::subscribe(const SubscribeOptions& ops)
{
  boost::mutex::scoped_lock lock(subs_mutex_);

  if (addSubCallback(ops))//检查订阅同一话题的接收器,是否有重复的
  {
    return true;
  }

  if (isShuttingDown())
  {
    return false;
  }

  if (ops.md5sum.empty())//没有校验码
  {
    throw InvalidParameterException("Subscribing to topic [" + ops.topic + "] with an empty md5sum");
  }

  if (ops.datatype.empty())//没有message_trait
  {
    throw InvalidParameterException("Subscribing to topic [" + ops.topic + "] with an empty datatype");
  }

  if (!ops.helper)//没有回调函数
  {
    throw InvalidParameterException("Subscribing to topic [" + ops.topic + "] without a callback");
  }

  const std::string& md5sum = ops.md5sum;
  std::string datatype = ops.datatype;

  SubscriptionPtr s(boost::make_shared(ops.topic, md5sum, datatype, ops.transport_hints));
  s->addCallback(ops.helper, ops.md5sum, ops.callback_queue, ops.queue_size, ops.tracked_object, ops.allow_concurrent_callbacks);

  if (!registerSubscriber(s, ops.datatype))
  {
    ROS_WARN("couldn't register subscriber on topic [%s]", ops.topic.c_str());
    s->shutdown();
    return false;
  }

  subscriptions_.push_back(s);

  return true;
}

 那么再转到第一次注册该接收器的函数,它与rosmaster通信之后,master便将话题与节点消息存入map中,并且反馈发布该话题的进程信息放在payload中,在多个场合都会使master调用execute函数,它们都会与rosmaster进行一次通信,但返回的response不同。

bool TopicManager::registerSubscriber(const SubscriptionPtr& s, const string &datatype)
{
  XmlRpcValue args, result, payload;
  args[0] = this_node::getName();
  args[1] = s->getName();
  args[2] = datatype;
  args[3] = xmlrpc_manager_->getServerURI();

  //master::execute函数在发布器部分已分析过,是通过XMLRPC的客户端与rosmaster的服务器相通信,把接收的话题名与节点名交给rosmaster,并且,回应的有发布该话题的发布器进程的有关信息,所谓payload就是负载的意思,真是让人费解。。。
  if (!master::execute("registerSubscriber", args, result, payload, true))
  {
    return false;
  }

  vector pub_uris;
  for (int i = 0; i < payload.size(); i++)
  {
    if (payload[i] != xmlrpc_manager_->getServerURI())
    {
      pub_uris.push_back(string(payload[i]));
    }
  }

  bool self_subscribed = false;
  PublicationPtr pub;
  const std::string& sub_md5sum = s->md5sum();
  // Figure out if we have a local publisher
  {
    boost::recursive_mutex::scoped_lock lock(advertised_topics_mutex_);
    V_Publication::const_iterator it = advertised_topics_.begin();
    V_Publication::const_iterator end = advertised_topics_.end();
    for (; it != end; ++it)
    {
      pub = *it;
      const std::string& pub_md5sum = pub->getMD5Sum();

      if (pub->getName() == s->getName() && !pub->isDropped())
	{
	  if (!md5sumsMatch(pub_md5sum, sub_md5sum))
	    {
	      ROS_ERROR("md5sum mismatch making local subscription to topic %s.",
			s->getName().c_str());
	      ROS_ERROR("Subscriber expects type %s, md5sum %s",
			s->datatype().c_str(), s->md5sum().c_str());
	      ROS_ERROR("Publisher provides type %s, md5sum %s",
			pub->getDataType().c_str(), pub->getMD5Sum().c_str());
	      return false;
	    }

	  self_subscribed = true;
	  break;
	}
    }
  }
  //处理相应发布器消息!
  s->pubUpdate(pub_uris);
  if (self_subscribed)//如果找到了进程内的相同话题发布器则直接加入
  {
    s->addLocalConnection(pub);
  }

  return true;
}

 在一次通信后,如果具备同样话题的发布器,则这一系列的发布器信息将会依次与接收器建立联系,并且在建立联系的过程中也要再次通知rosmaster。

bool Subscription::pubUpdate(const V_string& new_pubs)
{
  boost::mutex::scoped_lock lock(shutdown_mutex_);

  if (shutting_down_ || dropped_)
  {
    return false;
  }

  bool retval = true;
  
  {
    std::stringstream ss;

    for (V_string::const_iterator up_i = new_pubs.begin();
         up_i != new_pubs.end(); ++up_i)
    {
      ss << *up_i << ", ";
    }

    ss << " already have these connections: ";
    {
      boost::mutex::scoped_lock lock(publisher_links_mutex_);
      for (V_PublisherLink::iterator spc = publisher_links_.begin();
           spc!= publisher_links_.end(); ++spc)
      {
        ss << (*spc)->getPublisherXMLRPCURI() << ", ";
      }
    }
    //挂起的连接,其实是一个个连接在等待处理
    boost::mutex::scoped_lock lock(pending_connections_mutex_);
    S_PendingConnection::iterator it = pending_connections_.begin();
    S_PendingConnection::iterator end = pending_connections_.end();
    for (; it != end; ++it)
    {
      ss << (*it)->getRemoteURI() << ", ";
    }

    ROSCPP_LOG_DEBUG("Publisher update for [%s]: %s", name_.c_str(), ss.str().c_str());
  }
  
  V_string additions;//非本进程的连接
  V_PublisherLink subtractions;
  V_PublisherLink to_add;
  // could use the STL set operations... but these sets are so small
  // it doesn't really matter.
  {
    boost::mutex::scoped_lock lock(publisher_links_mutex_);

    for (V_PublisherLink::iterator spc = publisher_links_.begin();
         spc!= publisher_links_.end(); ++spc)
    {
      bool found = false;
      for (V_string::const_iterator up_i = new_pubs.begin();
           !found && up_i != new_pubs.end(); ++up_i)
      {
        if (urisEqual((*spc)->getPublisherXMLRPCURI(), *up_i))
        {
          found = true;
          break;
        }
      }

      if (!found)
      {
        subtractions.push_back(*spc);
      }
    }

    for (V_string::const_iterator up_i  = new_pubs.begin(); up_i != new_pubs.end(); ++up_i)
    {
      bool found = false;
      for (V_PublisherLink::iterator spc = publisher_links_.begin();
           !found && spc != publisher_links_.end(); ++spc)
      {
        if (urisEqual(*up_i, (*spc)->getPublisherXMLRPCURI()))
        {
          found = true;
          break;
        }
      }

      if (!found)
      {
        boost::mutex::scoped_lock lock(pending_connections_mutex_);
        S_PendingConnection::iterator it = pending_connections_.begin();
        S_PendingConnection::iterator end = pending_connections_.end();
        for (; it != end; ++it)
        {
          if (urisEqual(*up_i, (*it)->getRemoteURI()))
          {
            found = true;
            break;
          }
        }
      }

      if (!found)
      {
        additions.push_back(*up_i);//我们要进行negotiate,也就是新增的链接对的发布器进程
      }
    }
  }

  for (V_PublisherLink::iterator i = subtractions.begin(); i != subtractions.end(); ++i)
  {
	const PublisherLinkPtr& link = *i;
    if (link->getPublisherXMLRPCURI() != XMLRPCManager::instance()->getServerURI())
    {
      ROSCPP_LOG_DEBUG("Disconnecting from publisher [%s] of topic [%s] at [%s]",
                        link->getCallerID().c_str(), name_.c_str(), link->getPublisherXMLRPCURI().c_str());
		  link->drop();
	  }
	  else
	  {
		  ROSCPP_LOG_DEBUG("Disconnect: skipping myself for topic [%s]", name_.c_str());
	  }
	}

  for (V_string::iterator i = additions.begin();
            i != additions.end(); ++i)
  {
    // this function should never negotiate a self-subscription
    if (XMLRPCManager::instance()->getServerURI() != *i)
    {
      retval &= negotiateConnection(*i);//根据TCP协议,依次与发布器进程的socket建立链接对,存入pending中待处理,至于pending的连接转化为publisher_links_在check函数中
    }
    else
    {
      ROSCPP_LOG_DEBUG("Skipping myself (%s, %s)", name_.c_str(), XMLRPCManager::instance()->getServerURI().c_str());
    }
  }

  return retval;
}
void Subscription::pendingConnectionDone(const PendingConnectionPtr& conn, XmlRpcValue& result)
{
  boost::mutex::scoped_lock lock(shutdown_mutex_);
  if (shutting_down_ || dropped_)
  {
    return;
  }

  {
    boost::mutex::scoped_lock pending_connections_lock(pending_connections_mutex_);
    pending_connections_.erase(conn);
  }

  TransportUDPPtr udp_transport;

  std::string peer_host = conn->getClient()->getHost();
  uint32_t peer_port = conn->getClient()->getPort();
  std::stringstream ss;
  ss << "http://" << peer_host << ":" << peer_port << "/";
  std::string xmlrpc_uri = ss.str();
  udp_transport = conn->getUDPTransport();

  XmlRpc::XmlRpcValue proto;
  if(!XMLRPCManager::instance()->validateXmlrpcResponse("requestTopic", result, proto))
  {
  	ROSCPP_LOG_DEBUG("Failed to contact publisher [%s:%d] for topic [%s]",
              peer_host.c_str(), peer_port, name_.c_str());
  	closeTransport(udp_transport);
  	return;
  }

  if (proto.size() == 0)
  {
  	ROSCPP_LOG_DEBUG("Couldn't agree on any common protocols with [%s] for topic [%s]", xmlrpc_uri.c_str(), name_.c_str());
  	closeTransport(udp_transport);
  	return;
  }

  if (proto.getType() != XmlRpcValue::TypeArray)
  {
  	ROSCPP_LOG_DEBUG("Available protocol info returned from %s is not a list.", xmlrpc_uri.c_str());
  	closeTransport(udp_transport);
  	return;
  }
  if (proto[0].getType() != XmlRpcValue::TypeString)
  {
  	ROSCPP_LOG_DEBUG("Available protocol info list doesn't have a string as its first element.");
  	closeTransport(udp_transport);
  	return;
  }

  std::string proto_name = proto[0];
  if (proto_name == "TCPROS")
  {
    if (proto.size() != 3 ||
        proto[1].getType() != XmlRpcValue::TypeString ||
        proto[2].getType() != XmlRpcValue::TypeInt)
    {
    	ROSCPP_LOG_DEBUG("publisher implements TCPROS, but the " \
                "parameters aren't string,int");
      return;
    }
    std::string pub_host = proto[1];
    int pub_port = proto[2];
    ROSCPP_CONN_LOG_DEBUG("Connecting via tcpros to topic [%s] at host [%s:%d]", name_.c_str(), pub_host.c_str(), pub_port);

    TransportTCPPtr transport(boost::make_shared(&PollManager::instance()->getPollSet()));
    if (transport->connect(pub_host, pub_port))
    {
      ConnectionPtr connection(boost::make_shared());
      TransportPublisherLinkPtr pub_link(boost::make_shared(shared_from_this(), xmlrpc_uri, transport_hints_));

      connection->initialize(transport, false, HeaderReceivedFunc());
      pub_link->initialize(connection);

      ConnectionManager::instance()->addConnection(connection);

      boost::mutex::scoped_lock lock(publisher_links_mutex_);
      addPublisherLink(pub_link);//将新建立好的连接纳入publisher_link中

      ROSCPP_CONN_LOG_DEBUG("Connected to publisher of topic [%s] at [%s:%d]", name_.c_str(), pub_host.c_str(), pub_port);
    }
    else
    {
      ROSCPP_CONN_LOG_DEBUG("Failed to connect to publisher of topic [%s] at [%s:%d]", name_.c_str(), pub_host.c_str(), pub_port);
    }
  }
.....

  通过以上的步骤,在创建一个接收器的同时,可以轻易地得知发布该话题的其他进程,并与它们建立联系。正如publisher部分所看到的,publisher在一个socket对中写缓冲区,而接收器正在读缓冲区。

二、ros::spin函数

读缓冲区的行为正是ros::spin函数所驱动的。每一个企图使用ros接收器收获信息的进程,都在return 0之前具有ros::spin()或者spinOnce(),否则将无法从回调函数中获取信息。大致地说,spin函数是一个单线程的函数,它将本个进程中所subscribe的话题与回调函数一个个依次轮循调用。

void spin()
{
  SingleThreadedSpinner s;
  spin(s);
}

进一步地,转到spinner.cpp,这个spin函数在不断调用callAvailable函数,这个spin函数在声明时便将queue形参进行了默认声明,也就是说,在每个进程使用ros::spin时,都要重新生成一个callbackqueue。

void SingleThreadedSpinner::spin(CallbackQueue* queue)
{
  if (!queue)
  {
    queue = getGlobalCallbackQueue();
  }

  if (!spinner_monitor.add(queue, true))//SpinnerMonitor这个结构体主要是检查线程id,防止紊乱的作用
  {
    ROS_ERROR_STREAM("SingleThreadedSpinner: " << DEFAULT_ERROR_MESSAGE + " You might want to use a MultiThreadedSpinner instead.");
    return;
  }

  ros::WallDuration timeout(0.1f);
  ros::NodeHandle n;
  while (n.ok())
  {
    queue->callAvailable(timeout);//它其实是不断地调用SubscriptionQueue::call这个函数
  }
  spinner_monitor.remove(queue);
}

 从这一层级,我们明确了不断调用了callAvailable函数的动作,接下来是callAvailable函数:

void CallbackQueue::callAvailable(ros::WallDuration timeout)
{
  setupTLS();
  TLS* tls = tls_.get();//tls包含了一个callbackinfo的双端队列,同时还具备了当前执行的线程编号

  {
    boost::mutex::scoped_lock lock(mutex_);

    if (!enabled_)
    {
      return;
    }

    if (callbacks_.empty())//这也是一个callbackinfo的双端队列
    {
      if (!timeout.isZero())
      {
        condition_.wait_for(lock, boost::chrono::nanoseconds(timeout.toNSec()));
      }

      if (callbacks_.empty() || !enabled_)
      {
        return;
      }
    }

    bool was_empty = tls->callbacks.empty();
    //将callbacks_里的内容全部转移到tls中
    tls->callbacks.insert(tls->callbacks.end(), callbacks_.begin(), callbacks_.end());
    callbacks_.clear();

    calling_ += tls->callbacks.size();

    if (was_empty)
    {
      tls->cb_it = tls->callbacks.begin();
    }
  }

  size_t called = 0;
  //把里面的callbackinfo处理完,再将新的补充到callbacks_中
  while (!tls->callbacks.empty())
  {
    if (callOneCB(tls) != Empty)//返回调用call函数的结果,有成功、重新调用和完毕的选项
    {
      ++called;
    }
  }

  {
    boost::mutex::scoped_lock lock(mutex_);
    calling_ -= called;
  }
}

上面所调用的callOneCB所返回的结果都是call函数所决定的,接下来看看 SubscriptionQueue::call函数,它从接收器队列中收获一个helper,也就是包含接收器以及回调函数地址的类,并执行一次读缓冲区与反序列化,形成ros msg之后执行回调函数。

CallbackInterface::CallResult SubscriptionQueue::call()
{
  // The callback may result in our own destruction.  Therefore, we may need to keep a reference to ourselves
  // that outlasts the scoped_try_lock
  boost::shared_ptr self;
  boost::recursive_mutex::scoped_try_lock lock(callback_mutex_, boost::defer_lock);
  //被阻塞,塞回队列重来
  if (!allow_concurrent_callbacks_)
  {
    lock.try_lock();
    if (!lock.owns_lock())
    {
      return CallbackInterface::TryAgain;
    }
  }

  VoidConstPtr tracker;
  Item i;

  {
    boost::mutex::scoped_lock lock(queue_mutex_);
    //item队列基本不会是空的
    if (queue_.empty())//这个队列是在handleMessage函数中,依次加入item,其中包括了helper(回调函数的共享指针)
    {
      return CallbackInterface::Invalid;
    }

    i = queue_.front();

    if (queue_.empty())
    {
      return CallbackInterface::Invalid;
    }

    if (i.has_tracked_object)
    {
      tracker = i.tracked_object.lock();

      if (!tracker)
      {
        return CallbackInterface::Invalid;
      }
    }

    queue_.pop_front();
    --queue_size_;
  }

  VoidConstPtr msg = i.deserializer->deserialize();//反序列化,顾名思义是将消息从socket的buffer中重新构造成rosmsg

  // msg can be null here if deserialization failed
  if (msg)
  {
    try
    {
      self = shared_from_this();
    }
    catch (boost::bad_weak_ptr&) // For the tests, where we don't create a shared_ptr
    {}

    SubscriptionCallbackHelperCallParams params;//这个结构体里只有event这个成员
    params.event = MessageEvent(msg, i.deserializer->getConnectionHeader(), i.receipt_time, i.nonconst_need_copy, MessageEvent::CreateFunction());
    i.helper->call(params);//执行本helper的callback函数!
  }

  return CallbackInterface::Success;
}

这是看起来费解的两个typedef~

//这俩的定义参考std::function即可,一通操作后得到的是一个行为像函数的对象
typedef boost::function Callback;
typedef boost::function CreateFunction;

 在上述的call函数中,调用了起关键的执行作用的deserialize与call函数,接下来是它们的详细实现。

VoidConstPtr MessageDeserializer::deserialize()
{
  boost::mutex::scoped_lock lock(mutex_);

  if (msg_)
  {
    return msg_;
  }

  if (serialized_message_.message)
  {
    msg_ = serialized_message_.message;
    return msg_;
  }

  if (!serialized_message_.buf && serialized_message_.num_bytes > 0)
  {
    // If the buffer has been reset it means we tried to deserialize and failed
    return VoidConstPtr();
  }
  //获取了缓冲区地址,读一次缓冲区并将信息流转换为msg
  try
  {
    SubscriptionCallbackHelperDeserializeParams params;
    params.buffer = serialized_message_.message_start;
    params.length = serialized_message_.num_bytes - (serialized_message_.message_start - serialized_message_.buf.get());
    params.connection_header = connection_header_;
    msg_ = helper_->deserialize(params);
  }
  catch (std::exception& e)
  {
    ROS_ERROR("Exception thrown when deserializing message of length [%d] from [%s]: %s", (uint32_t)serialized_message_.num_bytes, (*connection_header_)["callerid"].c_str(), e.what());
  }

  serialized_message_.buf.reset();

  return msg_;
}

 callback_其实就是一个boost::function,通过msg创造一个event之后,再将event放入adapter中提取msg的指针,得到了一个回调函数的执行。

virtual void call(SubscriptionCallbackHelperCallParams& params)
  {
    Event event(params.event, create_);
    callback_(ParameterAdapter

::getParameter(event)); }

 通过上述流程,在spin函数中,创造了依次执行回调函数的机制。它是个单线程函数,也就是说,回调函数执行的顺序取决于注册的顺序。
三、多线程的spin

除了单线程的spin(),作者还提供了多线程选项,也就是MultiThreadedSpinner,它需要被用户显式地调用,并输入所设定的线程数。在这里使用了boost::thread_group创建线程池对多个线程轮流输出,而线程池中的任务队列正是各个离散的callone函数,哪个线程先输出完毕,再从任务队列中获取一个积压的任务,再深入的调用方式与spin()一致。

void AsyncSpinnerImpl::start()
{
  boost::mutex::scoped_lock lock(mutex_);

  if (continue_)
    return; // already spinning

  if (!spinner_monitor.add(callback_queue_, false))
  {
    ROS_ERROR_STREAM("AsyncSpinnerImpl: " << DEFAULT_ERROR_MESSAGE);
    return;
  }

  continue_ = true;

  for (uint32_t i = 0; i < thread_count_; ++i)
  {
    threads_.create_thread(boost::bind(&AsyncSpinnerImpl::threadFunc, this));//
  }
}

 

在话题通信的最后我们做一个阶段性的总结,ros的话题通信作为一种“全双工”的进程通信方式,具备简单便捷的巨大优势,但仅在本机的进程通信,从传输速率来说,不如单纯的管道与socket;另外,内部的实现也比较繁琐;并且,rosmaster的存在一度也十分令人费解,而master会偶尔出现卡死的现象导致无法收到信息,这个问题的起因至今我还没找出来(如果有大神愿意告诉我就更好了。。。),当然从作者的角度出发,rosmaster的存在主要是为了多机分布式通信,再另外可能就是软件设计风格吧,它确实对我影响非常大。

在话题通信之后,我计划再学习一下ros service、pluginlib、dyanmic reconfigure,与大家共勉~

你可能感兴趣的:(ros)