C++ Pass by value vs Pass by reference vs Pass by pointers

Rule of thumb: "Use references when you can and pointers when you have to".
When a function is called, the arguments in a function can be passed by value or passed by reference.

Callee is a function called by another and the caller is a function that calls another function (the callee).

The values that are passed in the function call are called the actual parameters.

The values received by the function (when it is called) are called the formal parameters.
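To make the vocabulary concrete, here is a minimal sketch. The names Add and CallAdd are made up for this illustration and are not part of any library.

```cpp
#include <cassert>

// Callee: `a` and `b` are the formal parameters.
int Add(int a, int b) {
  return a + b;
}

// Caller: `x` and `y` are the actual parameters of the call to Add.
int CallAdd() {
  int x = 2;
  int y = 3;
  int sum = Add(x, y);  // the values of x and y are copied into a and b
  assert(sum == 5);
  return sum;
}
```

Calling CallAdd() from main() makes CallAdd the caller and Add the callee.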


Passing by Value

Pass by value means that a copy of the actual parameter’s value is made in memory, i.e. the caller and callee have two independent variables with the same value. If the callee modifies the parameter value, the effect is not visible to the caller.

Passes an argument by value.

  • Callee does not have any access to the underlying element in the calling code.
  • A copy of the data is sent to the callee.
  • Changes made to the passed variable do not affect the actual value.

Passing by value is the most straightforward way to pass parameters. When a function is invoked, the arguments are copied to the local scope of the function. For example,

// class declaration
class Foo {};
void PassByValue(Foo f, int n){
  // Do something with f and n.
}
int main(){
  Foo foo;
  int i = 1;
  PassByValue(foo, i);
}

When PassByValue is invoked in main()’s body, a copy of foo and i is given to PassByValue (copied into PassByValue’s stack frame). This works well for primitive types like ints, floats, pointers, and small classes, but it doesn’t work well for big types. Think of std::string from the standard library, or imagine that Foo is a really big class with a lot of member variables. Copying these objects is inefficient if the function doesn’t do a lot of work on them, because the majority of the CPU time for the function call would be spent on copying the arguments.
Another issue is that in the function call, the program is operating on a copy of the Foo object, not the actual foo object from main(). Any mutations on the object will not be visible in the main() function after PassByValue() completes because all of the work will be discarded once the PassByValue function ends.
These issues can be addressed by passing by pointer or passing by reference.
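The second issue, mutations being lost, can be seen in a small sketch. Counter, IncrementCopy and ValueAfterIncrementCopy are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical type used only for this illustration.
struct Counter {
  int value = 0;
};

// Takes its argument by value, so `c` is a private copy of the caller's object.
void IncrementCopy(Counter c) {
  c.value++;  // modifies the copy only; the copy is discarded on return
}

// Returns the caller's value after the call, which is still 0.
int ValueAfterIncrementCopy() {
  Counter counter;
  IncrementCopy(counter);
  assert(counter.value == 0);  // the caller's object was never touched
  return counter.value;
}
```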

Pass by reference

Pass by reference means that the formal parameter of the called function is bound to the argument in the calling function instead of to a copy of it, i.e. the caller and the callee use the same variable for the parameter. Under the hood this is typically implemented by passing the address of the actual parameter, which is why it is sometimes loosely called pass by address (in C++ that term more often describes passing a pointer). If the callee modifies the parameter variable, the effect is visible in the caller’s variable.


Overview:

Passes an argument by reference.

  • Callee gets a direct reference to the programming element in the calling code.
  • The memory address of the stored data is passed.
  • Changes to the value have an effect on the original data.
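As a small illustration of these points, a swap function is a common way to see that changes made through a reference reach the caller. SwapByReference and SwapDemo are made-up names for this sketch.

```cpp
#include <cassert>

// Both parameters are references, so `a` and `b` are the caller's variables.
void SwapByReference(int& a, int& b) {
  int tmp = a;
  a = b;
  b = tmp;
}

// Returns true if the caller's variables were actually exchanged.
bool SwapDemo() {
  int x = 1;
  int y = 2;
  SwapByReference(x, y);
  assert(x == 2);
  assert(y == 1);
  return x == 2 && y == 1;
}
```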

Think of a reference variable as an alias for an object. It is a second variable name for the same object. Conceptually, no new object is created, and all operations on the reference variable affect the real underlying object being referred to. For example,

int count = 0;
int& count_ref = count;  // count_ref is another name for count
count++;
count_ref++;
// count is now 2

This is powerful because it allows us to pass large objects into functions by giving another name to the object and not copying the large object, while also allowing us to avoid the hairy complications of passing by pointer (covered in the next section). A valid reference can never be null, so there doesn’t need to be a null check in every function, and since a reference does not own the object it refers to, the function does not have to figure out who is responsible for deleting the object.

class Foo {
 public:
  int data[100];
};
void PassByReference(Foo& f, int n){
  // Do something with f and n.
}
int main(){
  Foo foo;
  int i = 0;
  PassByReference(foo, i);
  return 0;
}

Now anything that PassByReference() does to foo will still be visible in main() after the function ends, but no large objects were copied so the overhead is negligible. Only a reference variable f was created to refer to foo. Note here that the ampersand does not mean the address-of operator used to retrieve a pointer to an object. Here, it specifies that the variable is a reference variable that refers to an object that already exists. For you C++ language lawyers out there, the address-of operator (&) is an operator that is applied to an lvalue in an expression, whereas the reference ampersand is part of the declarator in a declaration.
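The two meanings of the ampersand can be put side by side in a short sketch. ReferenceSharesAddress is a hypothetical name chosen for this illustration.

```cpp
#include <cassert>

// Returns true if the reference and the pointer designate the same object.
bool ReferenceSharesAddress() {
  int x = 10;
  int& ref = x;   // '&' in a declaration: ref is another name for x
  int* ptr = &x;  // '&' in an expression: address-of, yields a pointer to x
  ref = 20;       // writes to x through the reference
  *ptr += 1;      // writes to x through the pointer
  assert(x == 21);
  return &ref == ptr;  // taking the address of ref gives the address of x
}
```

Because ref is only an alias, &ref is the address of x itself, not of some separate reference object.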

Pass by Pointer

A pointer is a special type of object that holds the memory address of some object. The object can be accessed by dereferencing (*) the pointer, which knows the type of object it is pointing to. Since pointers are memory addresses, on typical platforms they are only 32 or 64 bits, so they take up at most 8 bytes. Let’s revisit the previous example but pass Foo by pointer instead.

class Foo {
 public:
  int data[100];
};
void PassByPointer(Foo* f, int n){
  // Do something with *f and n.
}
int main(){
  Foo foo;
  int i = 0;
  PassByPointer(&foo, i);
  return 0;
}

Here, we are passing &foo as the first argument to PassByPointer(). The & operator means “address of” and it is called the address-of operator. Simple, right? It gets more confusing because the & can also denote a reference variable; which one it is depends on the context in which the & appears.

So now, foo is not really being copied into the function. The pointer to foo is being copied, and as stated above, this is only 8 bytes on most modern computing architectures. This is not a lot of data and typically fits in a single register, so copying it is cheap. Now PassByPointer() can access foo’s member variables by using the dereference operator.
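A minimal sketch of member access through the pointer, reusing the Foo class from above. SetFirst and FirstAfterSetFirst are hypothetical helpers written for this illustration.

```cpp
#include <cassert>

class Foo {
 public:
  int data[100];
};

// Writes into the caller's Foo through the pointer.
void SetFirst(Foo* f, int n) {
  f->data[0] = n;        // f->data is shorthand for (*f).data
  (*f).data[1] = n + 1;  // the explicit dereference form does the same thing
}

// Returns the first element of the caller's object after the call.
int FirstAfterSetFirst() {
  Foo foo{};  // zero-initialize the array
  SetFirst(&foo, 7);
  assert(foo.data[1] == 8);
  return foo.data[0];
}
```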

All modifications to the object will also be visible in the main() function after PassByPointer() ends. This may seem like the perfect solution, but it does have its drawbacks.

One drawback with passing by pointer is the lifetime of the pointed-to data. The object at that memory location can be modified or destroyed while the pointer to it still exists. A common scenario is when the object is updated or deleted by another thread. For example, thread A might delete foo and thread B might try to read foo. Depending on the order of execution of threads A and B, there could be a read-after-delete situation, or everything could be fine if the read happens before the delete. This is called a race condition. When multiple threads are accessing the same resource, concurrency primitives such as mutexes must be designed into the program.

There is also a special pointer value called nullptr which, as the name suggests, points to nothing (on most platforms it is represented as address 0x0). Dereferencing it is undefined behavior, and in practice it usually leads to a dreaded segmentation fault. Since nullptr is allowed wherever a function accepts a pointer type, PassByPointer() could be called like PassByPointer(nullptr, 0);. Now PassByPointer(), as well as every function that has a pointer parameter, needs to do a nullptr check to make sure that it received a valid pointer before dereferencing it.

void PassByPointer(Foo* f, int n){
  if(f == nullptr) return;
  // Do something with *f and n.
}

As you can see, this can be very tedious and it’s very easy to miss a function in a large codebase.
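One possible way to make a skipped call visible to the caller is to report it through the return value, and a reference parameter removes the need for the check altogether. The sketch below contrasts the two; IncrementIfValid, Increment and NullCheckDemo are made-up names, not an established API.

```cpp
#include <cassert>

// Pointer version: must check for nullptr and tell the caller what happened.
bool IncrementIfValid(int* p) {
  if (p == nullptr) return false;  // nothing to increment
  ++*p;
  return true;
}

// Reference version: a valid reference always refers to an object, so no check.
void Increment(int& r) {
  ++r;
}

// Returns the final value after one pointer call and one reference call.
int NullCheckDemo() {
  int v = 1;
  bool ok = IncrementIfValid(&v);
  assert(ok);
  assert(!IncrementIfValid(nullptr));  // safely rejected
  Increment(v);
  return v;
}
```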
Another drawback is the concept of ownership. This mostly applies to objects allocated on the heap (i.e. dynamic memory or free store). For example,

int main(){
  Foo* foo = new Foo();
  int n = 0;
  PassByPointer(foo, n);
  // Do I delete foo here? Or will PassByPointer do it?
  // delete foo;
  return 0;
}

As you may already know, objects allocated on the heap need to be freed before the program ends. In this case, who is responsible for deleting foo? Is it main()? Or is it PassByPointer()? A programmer would have to look at the implementation of PassByPointer(), or maybe the documentation of the function if it has been kept up to date (that is a very big if). If PassByPointer() is taken from a library, the user of the library shouldn’t have to jump around the implementation details to figure out who is responsible for deleting the heap-allocated object. Even worse, if the library writers decide to switch the function’s responsibility from not deleting the pointer to deleting the pointer or vice versa, the library user’s code will have a double delete error or a memory leak.

Needless to say, pointers are tricky, but there is hope. In C++11’s standard library, there is something called a unique pointer (std::unique_ptr) which encapsulates the concept of ownership and avoids these problems with ‘raw’ pointers. Unique pointers require knowledge of move semantics and rvalues, which I will not cover here. For more information about unique pointers, check out
https://en.cppreference.com/w/cpp/memory/unique_ptr
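Without going into move semantics, here is a minimal sketch of how a unique pointer makes ownership visible in a function signature. Consume and OwnershipDemo are hypothetical names for this illustration, and std::make_unique assumes a C++14 or later compiler.

```cpp
#include <cassert>
#include <memory>
#include <utility>

class Foo {
 public:
  int data[100];
};

// Taking std::unique_ptr by value says "this function takes ownership".
// The Foo is deleted automatically when `f` goes out of scope.
int Consume(std::unique_ptr<Foo> f) {
  return f->data[0];
}

// Returns true if main-side ownership was given up after the call.
bool OwnershipDemo() {
  auto foo = std::make_unique<Foo>();  // heap allocation, no raw new
  foo->data[0] = 42;
  int seen = Consume(std::move(foo));  // ownership is handed over explicitly
  assert(seen == 42);
  return foo == nullptr;  // the moved-from unique_ptr is empty, no delete needed
}
```

Here the question of who deletes the object is answered by the types: whoever holds the std::unique_ptr owns it, and no explicit delete appears anywhere.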

Good reading materials:
C++ Pass by Value, Pointer*, &Reference
