Thoughts on Reading "Smart Pointer Parameters"

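Recently I've been adding some interfaces to xgboost that let you build a DMatrix incrementally from an array of sparse vectors (in real production code this avoids having to concatenate arrays first). The DMatrix source contains a factory function that looks like this: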

DMatrix* DMatrix::Create(std::unique_ptr<DataSource>&& source,
                         const std::string& cache_prefix) {
  if (cache_prefix.length() == 0) {
    return new data::SimpleDMatrix(std::move(source));
  } else {
#if DMLC_ENABLE_STD_THREAD
    return new data::SparsePageDMatrix(std::move(source), cache_prefix);
#else
    LOG(FATAL) << "External memory is not enabled in mingw";
    return nullptr;
#endif
  }
}

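The caller creates a DataSource, wraps it in a unique_ptr, and moves it into Create. From then on the caller's source is empty and no longer usable (the equivalent of transferring ownership in Rust). Create receives it as an rvalue reference and moves it once more to construct a SimpleDMatrix. The calling code looks like this: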

std::unique_ptr<data::SimpleCSRSource> source(new data::SimpleCSRSource());
DMatrix::Create(std::move(source));

After this chain of moves, source finally comes to rest in SimpleDMatrix's private member source_:

  // source data pointer.
  std::unique_ptr<DataSource> source_;

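So the whole creation process is really two ownership transfers, each one passing ownership further down; after every transfer, the caller's pointer is left empty.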
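To make the two transfers concrete, here is a minimal, self-contained sketch. DataSource, SimpleCSRSource and SimpleDMatrix below are stripped-down hypothetical stand-ins, not xgboost's real classes, and this Create drops the cache_prefix branch and returns a unique_ptr instead of a raw DMatrix* so the sketch does not leak:

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-ins for xgboost's types; the real classes hold data.
struct DataSource {
  virtual ~DataSource() = default;
};
struct SimpleCSRSource : DataSource {};

class SimpleDMatrix {
 public:
  // Second transfer: move the rvalue-reference argument into the member.
  explicit SimpleDMatrix(std::unique_ptr<DataSource>&& source)
      : source_(std::move(source)) {}
  const DataSource* source() const { return source_.get(); }

 private:
  std::unique_ptr<DataSource> source_;  // sole owner from here on
};

// Simplified Create: no cache_prefix / external-memory branch.
std::unique_ptr<SimpleDMatrix> Create(std::unique_ptr<DataSource>&& source) {
  return std::unique_ptr<SimpleDMatrix>(new SimpleDMatrix(std::move(source)));
}

// True if the matrix now owns the object and the caller's pointer is empty.
bool OwnershipSinksIntoMatrix() {
  std::unique_ptr<DataSource> source(new SimpleCSRSource());
  const DataSource* raw = source.get();
  // First transfer: caller -> Create.
  std::unique_ptr<SimpleDMatrix> m = Create(std::move(source));
  return source == nullptr && m->source() == raw;
}
```

After the call, the caller's source is null and the matrix holds the very same object.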

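While learning about smart pointers I came across "Smart Pointer Parameters" (Herb Sutter's GotW #91), which describes exactly this scenario, although it passes the parameter a little differently: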

// Passing unique_ptr by value means "sink."
void f( unique_ptr<widget> );   // (c)

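Passing by value forces the caller to use move semantics. It is a vivid illustration of a "minimal and complete interface": the wrong usage is nipped in the bud, at compile time.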

unique_ptr<widget> pw = ... ;
good_sink( pw );             // error: good!
good_sink( move(pw) );       // compiles: crystal clear what's going on

The author even gives this behavior an apt name, sink: ownership sinks from the caller down into the callee's scope.
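A small sketch of the sink idiom, assuming a trivial widget type and a hypothetical global g_sunk that stands in for wherever the callee keeps what it takes ownership of. The commented-out call is the one the compiler rejects:

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct widget {
  int value = 0;
};

// Hypothetical storage so we can observe where ownership ended up.
std::unique_ptr<widget> g_sunk;

// Sink: a by-value unique_ptr parameter always takes ownership.
void good_sink(std::unique_ptr<widget> p) {
  g_sunk = std::move(p);  // keep the object alive beyond the call
}

bool SinkTransfersOwnership() {
  std::unique_ptr<widget> pw(new widget());
  pw->value = 42;
  // good_sink(pw);          // error: unique_ptr is not copyable
  good_sink(std::move(pw));  // the transfer is explicit at the call site
  return pw == nullptr && g_sunk && g_sunk->value == 42;
}
```

Note that pw is already empty before good_sink's body even runs, because the by-value parameter is move-constructed from it at the call site.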

The article also analyzes passing by reference:

//Passing unique_ptr by reference is for in/out unique_ptr parameters.
void f( unique_ptr<widget>& );

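According to the article, this form is best reserved for functions that modify the unique_ptr itself, not the object it points to. A function that takes a smart pointer parameter should limit itself to operating on the pointer (i.e., lifetime management) and leave the pointee alone. If it needs to work with the object, it is better to take a plain pointer or a reference directly: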

//Prefer passing parameters by * or &.
void f( widget* ); 
void f( widget& );
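A sketch of both guidelines side by side, using hypothetical functions replace_widget and bump: the first takes unique_ptr<widget>& because it reseats the pointer, the second only uses the object and therefore takes widget&, which also lets it accept objects that are not owned by a unique_ptr at all:

```cpp
#include <cassert>
#include <memory>

struct widget {
  int value = 0;
};

// In/out unique_ptr parameter: the function manages the pointer itself,
// here by replacing the owned object with a new one.
void replace_widget(std::unique_ptr<widget>& p, int v) {
  p.reset(new widget());
  p->value = v;
}

// Only uses the object, so it takes widget& and does not care how
// (or whether) the object is owned.
void bump(widget& w) { ++w.value; }

int OwnershipAgnosticUse() {
  std::unique_ptr<widget> owned(new widget());
  replace_widget(owned, 10);  // changes what `owned` points to
  bump(*owned);               // works on a unique_ptr-owned object...
  widget on_stack;
  bump(on_stack);             // ...and just as well on a stack object
  return owned->value + on_stack.value;  // 11 + 1
}
```

In the caller, owned outlives both calls, which is the structured-lifetime guarantee the article relies on.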

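But this raises a question: what if the object is changed, or even destroyed, while the called function is still running (say, in a multithreaded environment)? The article says not to worry: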

Thanks to structured lifetimes, by default arguments passed to f in the caller outlive f's function call lifetime, which is extremely useful (not to mention efficient) and makes non-owning * and & appropriate for parameters.

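In other words, the argument's lifetime in the caller encloses the whole call, so it is the caller that guarantees the object stays valid while the callee runs.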

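Back to xgboost. Since I don't fully understand whether an rvalue-reference parameter has the issues discussed above, I can't really judge it; but in terms of readability, declaring Create with pass-by-value seems more appropriate. One more thought: once source has been moved into SimpleDMatrix it is owned exclusively and never exposed to the outside world, so could the unique_ptr wrapper be dropped altogether? That would express the semantics more precisely.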
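One concrete difference between the two signatures, shown in a minimal sketch with hypothetical functions TakeByRvalueRef and TakeByValue: std::move is only a cast, so with a unique_ptr&& parameter the caller's pointer is emptied only if the callee actually moves from it, whereas a by-value parameter is move-constructed at the call site, so the transfer happens no matter what the body does. In xgboost's Create, the branches that build a matrix do move source; the point is that the && signature by itself does not promise it.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct DataSource {
  virtual ~DataSource() = default;
};

// Rvalue-reference parameter: the callee merely *may* move from it.
// This hypothetical callee rejects the input and never moves it.
bool TakeByRvalueRef(std::unique_ptr<DataSource>&& src, bool accept) {
  if (!accept) return false;
  std::unique_ptr<DataSource> owned = std::move(src);
  return true;
}

// By-value parameter: the move happens before the body runs.
bool TakeByValue(std::unique_ptr<DataSource> src, bool accept) {
  if (!accept) return false;  // src is destroyed here; caller gave it up
  std::unique_ptr<DataSource> owned = std::move(src);
  return true;
}

// Does the caller still own the object after a rejected call?
bool CallerStillOwnsAfterRvalueRef() {
  std::unique_ptr<DataSource> p(new DataSource());
  TakeByRvalueRef(std::move(p), /*accept=*/false);
  return p != nullptr;  // true: nothing was moved out of p
}

bool CallerStillOwnsAfterByValue() {
  std::unique_ptr<DataSource> p(new DataSource());
  TakeByValue(std::move(p), /*accept=*/false);
  return p != nullptr;  // false: p was moved into the parameter
}
```

With the by-value version, reading the call site alone is enough to know the caller has given up ownership, which is the clarity argument made above.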

The beginning of the article also discusses the performance cost of passing a shared_ptr by value:

void f( shared_ptr<widget> );
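The cost in question is reference counting: copying a shared_ptr into a by-value parameter increments the shared reference count (a thread-safe, typically atomic, operation) on entry and decrements it on exit, while passing by const reference copies nothing. A minimal sketch that makes the count visible through use_count(), with hypothetical helper names:

```cpp
#include <cassert>
#include <memory>

struct widget {};

// By value: the parameter is a copy, so the count goes up for the
// duration of the call and back down when p is destroyed.
long count_in_by_value(std::shared_ptr<widget> p) { return p.use_count(); }

// By const reference: no copy is made, the count is untouched.
long count_in_by_ref(const std::shared_ptr<widget>& p) { return p.use_count(); }
```

This fits the guideline quoted earlier: take a shared_ptr parameter only when the function actually takes part in ownership; otherwise widget* or widget& is enough.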

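Reading it through was eye-opening again and again, and the notion of sink semantics struck me most. Programming is about expressing semantics precisely (minimal and complete); if you collect best practices like these while learning, then when a suitable scenario comes up you know what to reach for (pattern matching).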

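Interesting as these points are, having to weigh all of them constantly during development is exhausting. C++ suits situations where the logic has already been analyzed thoroughly; for agile development and rapid iteration it is just too tiring.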

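This is where Java comes in. Its GC says: "Write whatever you like, I've got your back." And then there's Rust, whose compiler says: "Write whatever you like; if it compiles, I lose."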

Food for thought: could Java implement move semantics?
