Lambda vs. Bind: A Performance Comparison
Please credit the source when reposting: http://blog.csdn.net/cywosp/article/details/9379403
Let's start with the following function:
    template <typename Function>
    void do_test_loop(Function func, const uint64_t upper_limit = 1000000000ULL)
    {
        for (uint64_t i = 0; i < upper_limit; ++i)
            func(i);
    }
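This function simply calls func in a loop upper_limit times (one billion by default). There are many ways to supply the function that gets called over and over; here we discuss only two:
1. Use std::bind to create a polymorphic std::function<void (uint64_t)>.
2. Use a lambda expression.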
    void test_accumulate_bind_function(uint64_t& x, uint64_t i)
    {
        x += i;
    }

    uint64_t test_accumulate_bind()
    {
        namespace arg = std::placeholders;

        uint64_t x = 0;
        std::function<void (uint64_t)> accumulator =
            std::bind(&test_accumulate_bind_function, std::ref(x), arg::_1);
        do_test_loop(accumulator);
        return x;
    }
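This is a simple function. The biggest problem I have with boost::bind is that it forces you to separate the function from the logic that uses it, which makes the code harder to follow. For larger functions this is not much of an issue, but for a small function like the one above the context switch is annoying: the reader has to jump to a separate definition to see what the call actually does.
The equivalent lambda expression looks like this: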
    uint64_t test_accumulate_lambda()
    {
        uint64_t x = 0;
        auto accumulator = [&x] (uint64_t i) { x += i; };
        do_test_loop(accumulator);
        return x;
    }
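The lambda needs no such context switch: the logic sits right where it is used. Of course, we also lose the polymorphism that std::function provides. A lambda has an unnamed type that is known only to the compiler, which is why the variable holding it must be declared with auto. The variable accumulator holds the object produced by the lambda expression, and no other lambda expression produces an object of that type. Even two lambda expressions with identical text do not have the same type. If do_test_loop were an ordinary function implemented in a .cpp file rather than a template, it would have no way to name the type of the lambda passed to it.

Fortunately, some clever people thought of this problem in advance: assigning a lambda expression to a std::function is not only possible, it is extremely easy: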
    uint64_t test_accumulate_bound_lambda()
    {
        uint64_t x = 0;
        std::function<void (uint64_t)> accumulator = [&x] (uint64_t i) { x += i; };
        do_test_loop(accumulator);
        return x;
    }
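By using lambda syntax in place of std::bind, we get the full polymorphic power of std::function together with the convenience of C++ lambda expressions. That sounds like a win-win, provided the performance holds up.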
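The two points above (every lambda has its own type, and std::function can hold any of them) can be sketched as follows. This is a minimal illustration, not code from the original article: call_n_times, add_to, sum_via_lambda, sum_via_bind and identical_lambdas_share_a_type are hypothetical names chosen for this example.

```cpp
#include <cstdint>
#include <functional>
#include <type_traits>

// Hypothetical non-template counterpart of do_test_loop. Because the parameter
// type can be named, this function could be implemented in a separate .cpp file.
void call_n_times(const std::function<void (uint64_t)>& func, uint64_t upper_limit)
{
    for (uint64_t i = 0; i < upper_limit; ++i)
        func(i);
}

// Free function used for the std::bind variant.
void add_to(uint64_t& x, uint64_t i)
{
    x += i;
}

// The lambda is converted to std::function at the call site.
uint64_t sum_via_lambda(uint64_t upper_limit)
{
    uint64_t x = 0;
    call_n_times([&x] (uint64_t i) { x += i; }, upper_limit);
    return x;
}

// The result of std::bind is converted to the same std::function type.
uint64_t sum_via_bind(uint64_t upper_limit)
{
    namespace arg = std::placeholders;
    uint64_t x = 0;
    call_n_times(std::bind(&add_to, std::ref(x), arg::_1), upper_limit);
    return x;
}

// Two textually identical lambdas still have distinct closure types,
// so this returns false.
bool identical_lambdas_share_a_type()
{
    auto a = [] (uint64_t i) { return i + 1; };
    auto b = [] (uint64_t i) { return i + 1; };
    return std::is_same<decltype(a), decltype(b)>::value;
}
```

Both sum_via_lambda and sum_via_bind go through the same non-template function, which is exactly what a bare lambda type cannot do on its own.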
We can run a simple comparison of these three functions (using a timer class):
    template <typename Function>
    void run_test(const std::string& name, Function func)
    {
        std::cout << name;
        timer t;
        volatile_write(func());
        timer::duration duration = t.elapsed();
        std::cout << '\t' << duration.count() << std::endl;
    }

    int main()
    {
        run_test("Accumulate (lambda) ", &test_accumulate_lambda);
        run_test("Accumulate (bind) ", &test_accumulate_bind);
        run_test("Accumulate (bound lambda)", &test_accumulate_bound_lambda);
    }
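Without further ado, here are the results of compiling with g++ 4.6.2 -O3 and running on an Intel Core i7 Q740. Each number is the value of duration.count(), that is, ticks of std::chrono::high_resolution_clock: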
    Accumulate (lambda)             7
    Accumulate (bind)               4401849
    Accumulate (bound lambda)       4379315
Whenever a performance test shows such a drastic difference in run time, I disassemble the program to see what the compiler actually did.
    (gdb) disassemble test_accumulate_lambda
    Dump of assembler code for function _Z22test_accumulate_lambdav:
       0x0000000000400e70 <+0>:     movabs $0x6f05b59b5e49b00,%rax
       0x0000000000400e75 <+5>:     retq
    End of assembler dump.
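After optimization, the entire function just moves 0x6f05b59b5e49b00 (decimal 499999999500000000) into the rax register and returns. The compiler figured out that we are merely summing the integers from 0 through 999,999,999 and replaced the code with the result. I am impressed that the compiler can do this, yet it is entirely reasonable: the body of the function is statically known to this instantiation of do_test_loop, so the compiler effectively transforms the original code into the following: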
    uint64_t test_accumulate_lambda()
    {
        uint64_t x = 0;
        // do_test_loop:
        for (uint64_t i = 0; i < 1000000000; ++i)
            x += i;
        return x;
    }
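The constant in the disassembly can be checked by hand: the sum of 0, 1, ..., n - 1 is n(n - 1)/2. The sketch below verifies that arithmetic at compile time. The names sum_below and sum_below_loop are hypothetical, and nothing here claims to show how the compiler arrived at the value (it may have used a closed form or simply evaluated the loop); it only confirms that the constant is the right answer.

```cpp
#include <cstdint>

// Sum of 0, 1, ..., n - 1 in closed form: n * (n - 1) / 2.
// The even factor is halved first so the intermediate product stays small.
constexpr uint64_t sum_below(uint64_t n)
{
    return n % 2 == 0 ? (n / 2) * (n - 1)
                      : n * ((n - 1) / 2);
}

// The same sum computed with an explicit loop, for cross-checking small n.
uint64_t sum_below_loop(uint64_t n)
{
    uint64_t x = 0;
    for (uint64_t i = 0; i < n; ++i)
        x += i;
    return x;
}

// The value gdb shows being loaded into rax, in hex and in decimal.
static_assert(sum_below(1000000000ULL) == 0x6f05b59b5e49b00ULL,
              "closed form matches the constant in the disassembly");
static_assert(sum_below(1000000000ULL) == 499999999500000000ULL,
              "hex and decimal forms agree");
```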
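Any good compiler will optimize the loop in this expanded form of test_accumulate_lambda. I think the most important thing to take away from this simple example is that the compiler knows a lambda's behavior statically, so when a lambda is passed by its own type you can use it without worrying about its performance.

So what happens when we call through std::function? Here the polymorphism makes the code hard to analyze. When do_test_loop is instantiated with std::function<void (uint64_t)>, the compiler knows nothing about the behavior of func; it could do anything, because all the compiler sees is the entry point into std::function.

The difference between std::bind and the lambda expression is extremely small. If you run the test repeatedly, the lambda is always slightly faster than std::bind on my machine, but these numbers are not statistically significant. The difference may well change on other machines; if I had to guess, I would attribute it to the std::reference_wrapper. Let's look at the call stacks of the two functions.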
std::bind

    #0  test_accumulate_bind_function (x=@0x7fffffffe5d0, i=0) at lambda_vs_bind.cpp:106
    #1  0x0000000000401111 in operator() (__args#0=0, this=<optimized out>) at /usr/local/include/gcc-4.6.2/functional:2161
    #2  do_test_loop<std::function<void(long unsigned int)> > (func=<optimized out>, upper_limit=<optimized out>) at lambda_vs_bind.cpp:93
    #3  test_accumulate_bind () at lambda_vs_bind.cpp:115
    #4  0x0000000000401304 in run_test<unsigned long (*)()> (name=<optimized out>, func=0x401080 <test_accumulate_bind()>) at lambda_vs_bind.cpp:84
    #5  0x0000000000401411 in main () at lambda_vs_bind.cpp:136
Lambda Expression

    #0  std::_Function_handler<void(long unsigned int), test_accumulate_bound_lambda()::<lambda(uint64_t)> >::_M_invoke(const std::_Any_data &, unsigned long) (__functor=..., __args#0=0) at /usr/local/include/gcc-4.6.2/functional:1778
    #1  0x0000000000400fa9 in operator() (__args#0=0, this=<optimized out>) at /usr/local/include/gcc-4.6.2/functional:2161
    #2  do_test_loop<std::function<void(long unsigned int)> > (func=<optimized out>, upper_limit=<optimized out>) at lambda_vs_bind.cpp:93
    #3  test_accumulate_bound_lambda () at lambda_vs_bind.cpp:126
    #4  0x0000000000401304 in run_test<unsigned long (*)()> (name=<optimized out>, func=0x400f20 <test_accumulate_bound_lambda()>) at lambda_vs_bind.cpp:84
    #5  0x000000000040143e in main () at lambda_vs_bind.cpp:140
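The only difference between them lies in what std::function's operator() calls. To understand what is really going on, let's take a quick look at how std::function is implemented in g++ 4.6.2: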
    template<typename _Res, typename... _ArgTypes>
    class function<_Res(_ArgTypes...)>
        : public _Maybe_unary_or_binary_function<_Res, _ArgTypes...>,
          private _Function_base
    {
        // a whole bunch of implementation details

    private:
        typedef _Res (*_Invoker_type)(const _Any_data&, _ArgTypes...);
        _Invoker_type _M_invoker;
    };
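What interests me most is that std::function does not use virtual functions; it stores a plain function pointer instead. This has some advantages: you can use std::function as an ordinary value without dealing with pointers or references yourself, because that complexity stays inside the object.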
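To make the function-pointer approach concrete, here is a heavily simplified sketch of the same idea. It is not the libstdc++ implementation: tiny_function and sum_with_tiny_function are hypothetical names, the signature is fixed to void (uint64_t), and the wrapper only refers to the callable rather than storing a copy as std::function does.

```cpp
#include <cstdint>

// Hypothetical, non-owning wrapper around any callable taking a uint64_t.
// The concrete type is erased: only a void* and a function pointer remain.
class tiny_function
{
private:
    typedef void (*invoker_type)(void*, uint64_t);

    // One instantiation per Functor; it restores the real type and calls it.
    template <typename Functor>
    static void invoke(void* object, uint64_t arg)
    {
        (*static_cast<Functor*>(object))(arg);
    }

    void*        _object;   // the callable, with its type erased
    invoker_type _invoker;  // knows how to call _object

public:
    // The caller must keep f alive for as long as this wrapper is used.
    template <typename Functor>
    explicit tiny_function(Functor& f)
        : _object(&f), _invoker(&invoke<Functor>)
    { }

    // An indirect call through the stored pointer; no virtual function involved.
    void operator()(uint64_t arg) const
    {
        _invoker(_object, arg);
    }
};

// Usage: sums 0 .. upper_limit - 1 through the type-erased wrapper.
uint64_t sum_with_tiny_function(uint64_t upper_limit)
{
    uint64_t x = 0;
    auto accumulator = [&x] (uint64_t i) { x += i; };
    tiny_function func(accumulator);
    for (uint64_t i = 0; i < upper_limit; ++i)
        func(i);
    return x;
}
```

Because the loop only ever sees a call through _invoker, an optimizer generally cannot assume anything about what the call does, which is consistent with the timings reported for the std::function variants.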
boost::bind
So how does the old approach, boost::bind, fare? To keep things simple, we just replace std with boost in the test cases above.
    Accumulate (boost bind)             3223174
    Accumulate (boost bound lambda)     4255098
Strangely, boost::bind is roughly 25% faster than std::bind, even though its call stack looks very similar to that of std::bind:
    #0  test_accumulate_bind_function (x=@0x7fffffffe600, i=0) at lambda_vs_bind.cpp:114
    #1  0x00000000004018a3 in operator() (a0=0, this=<optimized out>) at /usr/local/include/boost/function/function_template.hpp:1013
    #2  do_test_loop<boost::function<void(long unsigned int)> > (upper_limit=<optimized out>, func=<optimized out>) at lambda_vs_bind.cpp:101
    #3  test_accumulate_boost_bind () at lambda_vs_bind.cpp:144
    #4  0x0000000000401f44 in run_test<unsigned long (*)()> (name=<optimized out>, func=0x401800 <test_accumulate_boost_bind()>) at lambda_vs_bind.cpp:92
    #5  0x000000000040207e in main () at lambda_vs_bind.cpp:161
(I could probably write an entire article on why boost::bind is faster than std::bind...)
functional

    template<typename _Functor, typename... _ArgTypes>
    inline typename _Bind_helper<_Functor, _ArgTypes...>::type
    bind(_Functor&& __f, _ArgTypes&&... __args)
    {
        typedef _Bind_helper<_Functor, _ArgTypes...> __helper_type;
        typedef typename __helper_type::__maybe_type __maybe_type;
        typedef typename __helper_type::type __result_type;
        return __result_type(__maybe_type::__do_wrap(std::forward<_Functor>(__f)),
                             std::forward<_ArgTypes>(__args)...);
    }

boost/bind/bind.hpp (with the macros expanded)

    template<class F, class A1, class A2>
    _bi::bind_t<_bi::unspecified, F, typename _bi::list_av_2<A1, A2>::type>
    bind(F f, A1 a1, A2 a2)
    {
        typedef typename _bi::list_av_2<A1, A2>::type list_type;
        return _bi::bind_t<_bi::unspecified, F, list_type> (f, list_type(a1, a2));
    }
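More information
1. Source code
You can get the source code for this program at http://www.gockelhut.com/c++/files/lambda_vs_bind.cpp. It compiles and runs with g++ 4.6.2, and compilers with better C++11 support should handle it even better. My Boost version is 1.47; both earlier and later versions should work fine, since the boost::bind syntax is unlikely to change much for some time (though there is no guarantee for the future). If you want to build and run without Boost, just set USE_BOOST to 0.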
2. volatile_write
volatile_write is a simple function I wrote to force a write to memory. This prevents the optimizer from eliminating code in run_test whose result would otherwise never be used.
    template <typename T>
    void volatile_write(const T& x)
    {
        volatile T* p = new T;
        *p = x;
        delete p;
    }
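For comparison, the same effect can be had without a heap allocation by storing into a volatile object with static storage duration. This is only a sketch of an alternative and not what the article's program does; volatile_sink and volatile_store are hypothetical names, and the sketch assumes T is a simple arithmetic type such as uint64_t.

```cpp
#include <cstdint>

// Hypothetical holder for one volatile object per type T.
template <typename T>
struct volatile_sink
{
    static volatile T value;
};

// Definition of the static member; it is zero-initialized at program start.
template <typename T>
volatile T volatile_sink<T>::value;

// A store to a volatile object is an observable side effect, so the compiler
// must perform it and therefore must also compute the value being stored.
template <typename T>
void volatile_store(const T& x)
{
    volatile_sink<T>::value = x;
}
```

In run_test the write happens only once per test, so the cost of new and delete in the original volatile_write is negligible next to a loop of a billion iterations; the difference is one of style rather than of measured results.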
Original article: http://www.gockelhut.com/c++/articles/lambda_vs_bind
lambda_vs_bind.cpp
    /**
     *  Copyright 2011 Travis Gockel
     *
     *  Licensed under the Apache License, Version 2.0 (the "License");
     *  you may not use this file except in compliance with the License.
     *  You may obtain a copy of the License at
     *
     *      http://www.apache.org/licenses/LICENSE-2.0
     *
     *  Unless required by applicable law or agreed to in writing, software
     *  distributed under the License is distributed on an "AS IS" BASIS,
     *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     *  See the License for the specific language governing permissions and
     *  limitations under the License.
    **/

    // Turn building and testing boost::bind on or off with this macro
    #define USE_BOOST 1

    // workaround for varieties of g++-4.6 with --std=gnu++0x
    #ifndef _GLIBCXX_USE_NANOSLEEP
    #   define _GLIBCXX_USE_NANOSLEEP
    #endif

    #include <cstdint>
    #include <chrono>
    #include <functional>   // std::function, std::bind, std::ref
    #include <iostream>
    #include <string>
    #include <thread>

    #if USE_BOOST
    #include <boost/function.hpp>
    #include <boost/bind.hpp>
    #endif

    class timer
    {
    public:
        typedef std::chrono::high_resolution_clock clock;
        typedef clock::time_point                  time_point;
        typedef clock::duration                    duration;

    public:
        timer()
        {
            reset();
        }

        void reset()
        {
            _starttime = clock::now();
        }

        duration elapsed() const
        {
            return clock::now() - _starttime;
        }

    protected:
        time_point _starttime;
    };

    bool test_timer()
    {
        using std::chrono::milliseconds;
        typedef timer::duration duration;

        const milliseconds sleep_time(500);

        timer t;
        std::this_thread::sleep_for(sleep_time);
        duration recorded = t.elapsed();

        // make sure the clock and this_thread::sleep_for is precise within one millisecond (or at least in agreement as to
        // how inaccurate they are)
        return (recorded - milliseconds(1) < sleep_time)
            && (recorded + milliseconds(1) > sleep_time);
    }

    template <typename T>
    void volatile_write(const T& x)
    {
        volatile T* p = new T;
        *p = x;
        delete p;
    }

    template <typename Function>
    void run_test(const std::string& name, Function func)
    {
        std::cout << name;
        timer t;
        volatile_write(func());
        timer::duration duration = t.elapsed();
        std::cout << '\t' << duration.count() << std::endl;
    }

    template <typename Function>
    void do_test_loop(Function func, const uint64_t upper_limit = 1000000000ULL)
    {
        for (uint64_t i = 0; i < upper_limit; ++i)
            func(i);
    }

    uint64_t test_accumulate_lambda()
    {
        uint64_t x = 0;
        auto accumulator = [&x] (uint64_t i) { x += i; };
        do_test_loop(accumulator);
        return x;
    }

    void test_accumulate_bind_function(uint64_t& x, uint64_t i)
    {
        x += i;
    }

    uint64_t test_accumulate_bind()
    {
        namespace arg = std::placeholders;

        uint64_t x = 0;
        std::function<void (uint64_t)> accumulator =
            std::bind(&test_accumulate_bind_function, std::ref(x), arg::_1);
        do_test_loop(accumulator);
        return x;
    }

    uint64_t test_accumulate_bound_lambda()
    {
        uint64_t x = 0;
        std::function<void (uint64_t)> accumulator = [&x] (uint64_t i) { x += i; };
        do_test_loop(accumulator);
        return x;
    }

    #if USE_BOOST
    uint64_t test_accumulate_boost_bind()
    {
        uint64_t x = 0;
        boost::function<void (uint64_t)> accumulator =
            boost::bind(&test_accumulate_bind_function, boost::ref(x), _1);
        do_test_loop(accumulator);
        return x;
    }

    uint64_t test_accumulate_boost_bound_lambda()
    {
        uint64_t x = 0;
        boost::function<void (uint64_t)> accumulator = [&x] (uint64_t i) { x += i; };
        do_test_loop(accumulator);
        return x;
    }
    #endif

    int main()
    {
        if (!test_timer())
        {
            std::cout << "Failed timer test." << std::endl;
            return -1;
        }

        run_test("Accumulate (lambda) ", &test_accumulate_lambda);
        run_test("Accumulate (bind) ", &test_accumulate_bind);
        run_test("Accumulate (bound lambda) ", &test_accumulate_bound_lambda);

    #if USE_BOOST
        run_test("Accumulate (boost bind) ", &test_accumulate_boost_bind);
        // time the boost::function variant, not the std::function one
        run_test("Accumulate (boost bound lambda)", &test_accumulate_boost_bound_lambda);
    #endif
    }