Let’s briefly consider the memory layout for objects and values. It should help to illustrate some of the fundamental differences. Consider if we had a class and a struct, both containing two int
fields:
class SampleClass
{
public int x;
public int y;
}
struct SampleStruct
{
public int x;
public int y;
}
They both appear similar, but instances of them are quite different. This can be seen graphically in Figure 1, and is described further below.
You’ll immediately notice that the size of a value is smaller than that of an object.
An object is completely self-describing. Areference to it is the size of a machine pointer—that is, 32 bits on a 32-bit machine, 64 bits on a 64-bit—that points into the GC heap. The target of the pointer is actually another pointer, which refers to an internal CLR data structure called a method table. The method table facilitates method calls and is also used to obtain an object’s type dynamically. The double word before that (fancy name for 4 bytes, or 32 bits) makes up the so-called sync-block, which is used to store such miscellaneous things as locking, COM interoperability, and hash code caching (among others). After these come the actual values that make up instance state of the object.
The sum of this is that there is roughly a quad word (8-byte, 64-bit) overhead per object. This is, of course, on a 32-bit machine; for 64-bit machines, the size would be slightly larger. The exact number is an implementation detail and can actually grow once you start using certain parts of the runtime. For example, the sync-block points at other internal per-object runtime data structures that can collect dust over time as you use an object.
Values are not self-describing at all. Rather they are just a glob of bytes that compose its state. Notice above that the pointer just refers to the first byte of our value, with no sync-block or method table involved. You might wonder how type checking is performed in the absence of any type information tagged to an instance. A method table does of course exist for each value type. The solution is that the location in which a value is stored may only store values of a certain type. This is guaranteed by the verifier.
For example, a method body can have a number of local slots in which values may be stored, each of which stores only values of a precise type; similarly, fields of a type have a precise type. The size of the storage location for values is always known statically. For example, the SampleStruct
above consumes 64 bits of space, because it consists of two 32-bit integers. Notice that there is no overhead—what you see is what you get. This is quite different from reference types, which need extra space to carry runtime type information around. In cases where structs aren’t properly aligned, the CLR will pad them; this occurs for structs that don’t align correctly on word boundaries.
Lastly, because values are really just a collection of bytes representing the data stored inside an instance, a value cannot take on the special value of null
. In other words, 0
is a meaningful value for all value types. The Nullable<T>
type adds support for nullable value types. We discuss this shortly.
The size of a value type can be discovered in C# by using the sizeof(T)
operator, which returns the size of a target type T
. It uses the sizeof
instruction in the IL:
Console.WriteLine(sizeof(SampleStruct));
For primitive types, this simply embeds the constant number in the source file instead of using the sizeof
instruction, since these do not vary across implementations. For all other types, this requires unsafe code permissions to execute.
As we’ve seen, objects and values are treated differently by the runtime. They are represented in different manners, with objects having some overhead for virtual method dispatch and runtime type identity, and values being simple raw sequences of bytes. There are some cases where this difference can cause a mismatch between physical representation and what you would like to do. For example:
Object
, either a local, field, or argument, for example, will not work correctly. A reference expects that the first double word it points to will be a pointer to the method table for an object. this
pointer compatible with the original method definition be defined. The value of a derived value type will not suffice. To solve all four of these problems, we need a way to bridge the gap between objects and values.
This is where boxing and unboxing come into the picture. Boxing a value transforms it into an object by copying it to the managed GC heap into an object-like structure. This structure has a method table and generally looks just like an object such that Object
compatibility and virtual and interface method dispatch work correctly. Unboxing a boxed value type provides access to the raw value, which most of the time is copied to the caller’s stack, and which is necessary to store it back into a slot typed as holding the underlying value type.
Some languages perform boxing and unboxing automatically. Both C# and VB do. As an example, the C# compiler will notice assignment from int
to object
in the following program:
int x = 10;
object y = x;
int z = (int)y;
It responds by inserting a box
instruction in the IL automatically when y
is assigned the value of x
, and an unbox
instruction when z
is assigned the value of y
:
ldc.i4.s 10
stloc.0
ldloc.0
box [mscorlib]System.Int32
stloc.1
ldloc.1
unbox.any [mscorlib]System.Int32
stloc.2
The code loads the constant 10
and stores it in the local slot 0; it then loads the value 10
onto the stack and boxes it, storing it in local slot 1; lastly, it loads the boxed 10
back onto the stack, unboxes it into an int
, and stores the value in local slot 2. You might have noticed the IL uses the unbox.any
instruction. The difference between unbox
and unbox.any
is clearly distinguished in Chapter 3, although the details are entirely an implementation detail.
A new type, System.Nullable<T>
, has been added to the BCL in 2.0 for the purpose of providing null
semantics for value types. It has deep support right in the runtime itself. (Nullable<T>
is a generic type. If you are unfamiliar with the syntax and capabilities of generics, I recommend that you first read about this at the end of this chapter. The syntax should be more approachable after that. Be sure to return, though; Nullable<T>
is a subtly powerful new feature.)
The T
parameter for Nullable<T>
is constrained to struct arguments. The type itself offers two properties:
namespace System
{
struct Nullable<T> where T : struct
{
public Nullable(T value);
public bool HasValue { get; }
public T Value { get; }
}
}
The semantics of this type are such that, if HasValue
is false
, the instance represents the semantic value null
. Otherwise, the value represents its underlying Value. C# provides syntax for this. For example, the first two and second two lines are equivalent in this program:
Nullable<int> x1 = null;
Nullable<int> x2 = new Nullable<int>();
Nullable<int> y1 = 55;
Nullable<int> y2 = new Nullable<int>(55);
Furthermore, C# aliases the type name T?
to Nullable<T>
; so, for example, the above example could have been written as:
int? x1 = null;
int? x2 = new int?();
int? y1 = 55;
int? y2 = new int?(55);
This is pure syntactic sugar. C# compiles it into the proper Nullable<T>
construction and property accesses in the IL.
C# also overloads nullability checks for Nullable<T>
types to implement the intuitive semantics. That is, x == null
, when x
is a Nullable<T>
where HasValue == false
, evaluates to true
. To maintain this same behavior when a Nullable<T>
is boxed—transforming it into its GC heap representation— the runtime will transform Nullable<T>
values where HasValue == false
into real null
references. Notice that the former is purely a language feature, while the latter is an intrinsic property of the type’s treatment in the runtime.
To illustrate this, consider the following code:
int? x = null;
Console.WriteLine(x == null);
object y = x; // boxes ‘x’, turning it into a null
Console.WriteLine(y == null);
As you might have expected, both WriteLines
print out "True
." But this only occurs because the language and runtime know intimately about the Nullable<T>
type.
Note also that when a Nullable<T>
is boxed, yet HasValue == true
, the box operation extracts the Value
property, boxes that, and leaves that on the stack instead. Consider this in the following example:
int? x = 10;
Console.WriteLine(x.GetType());
This snippet prints out the string "System.Int32
", not "System.Nullable`1<System.Int32>
", as you might have expected. The reason is that, to call GetType
on the instance, the value must be boxed. This has to do with how method calls are performed, namely that to call an inherited instance method on a value type, the instance must first be boxed. The reason is that the code inherited does not know how to work precisely with the derived type (it was written before it even existed!), and thus it must be converted into an object. Boxing the Nullable<T>
with HasValue == true
results in a boxed int
on the stack, not a boxed Nullable<T>
. We then make the method invocation against that.