The Object base type and implementation of interfaces. Boxing


It seems we came through hell and high water and can nail any interview, even the one for .NET CLR team. However, let's not rush to microsoft.com and search for vacancies. Now, we need to understand how value types inherit an object if they contain neither a reference to SyncBlockIndex, not a pointer to a virtual methods table. This will completely explain our system of types and all pieces of a puzzle will find their places. However, we will need more than one sentence.


Now, let's remember again how value types are allocated in memory. They get the place in memory right where they are. Reference types get allocation on the heap of small and large objects. They always give a reference to the place on the heap where the object is. Each value type has such methods as ToString, Equals and GetHashCode. They are virtual and overridable, but don’t allow to inherit a value type by overriding methods. If value types used overridable methods, they would need a virtual methods table to route calls. This would lead to the problems of passing structures to unmanaged world: extra fields would go there. As a result, there are descriptions of value type methods somewhere, but you cannot access them directly via a virtual methods table.


This may bring the idea that the lack of inheritance is artificial


This chapter was translated from Russian jointly by author and by professional translators. You can help us with translation from Russian or English into any other language, primarily into Chinese or German.

Also, if you want thank us, the best way you can do that is to give us a star on github or to fork repository github/sidristij/dotnetbook.

This may bring the idea that the lack of inheritance is artificial:


  • there is inheritance from an object, but not direct;
  • there are ToString, Equals and GetHashCode inside a base type. In value types these methods have their own behavior. This means, that methods are overridden in relation to an object;
  • moreover, if you cast a type to an object, you have the full right to call ToString, Equals and GetHashCode;
  • when calling an instance method for a value type, the method gets another structure that is a copy of an original. That means calling an instance method is like calling a static method: Method(ref structInstance, newInternalFieldValue). Indeed, this call passes this, with one exception, however. A JIT should compile the body of a method, so it would be unnecessary to offset structure fields, jumping over the pointer to a virtual methods table, which doesn’t exist in the structure. It exists for value types in another place.

Types are different in behavior, but this difference is not so big on the level of implementation in the CLR. We will talk about it a little later.


Let's write the following line in our program:


var obj = (object)10;

It will allow us to deal with number 10 using a base class. This is called boxing. That means we have a VMT to call such virtual methods as ToString(), Equals and GetHashCode. In reality boxing creates a copy of a value type, but not a pointer to an original. This is because we can store the original value everywhere: on the stack or as a field of a class. If we cast it to an object type, we can store a reference to this value as long as we want. When boxing happens:


  • the CLR allocates space on the heap for a structure + SyncBlockIndex + VMT of a value type (to call ToString, GetHashCode, Equals);
  • it copies an instance of a value type there.

Now, we’ve got a reference variant of a value type. A structure has got absolutely the same set of system fields as a reference type,
becoming a fully-fledged reference type after boxing. The structure became a class. Let’s call it a .NET somersault. This is a fair name.


Just look at what happens if you use a structure which implements an interface using the same interface.



struct Foo : IBoo
{
    int x;
    void Boo()
    {
        x = 666;
    }
}

IBoo boo = new Foo();

boo.Boo();

When we create the Foo instance, its value goes to the stack in fact. Then we put this variable into an interface type variable and the structure into a reference type variable. Next, there is boxing and we have the object type as an output. But it is an interface type variable. That means we need type conversion. So, the call happens in a way like this:


IBoo boo = (IBoo)(box_to_object)new Foo();
boo.Boo();

Writing such code is not effective. You will have to change a copy instead of an original:


void Main()
{
    var foo = new Foo();
    foo.a = 1;
    Console.WriteLite(foo.a);   // -> 1

    IBoo boo = foo;
    boo.Boo();                  // looks like changing foo.a to 10
    Console.WriteLite(foo.a);   // -> 1
}

struct Foo: IBoo
{
    public int a;
    public void Boo()
    {
        a = 10;
    }
}

interface IBoo
{
    void Boo();
}

The first time we look at the code, we don’t have to know what we deal with in the code other than our own and see a cast to IBoo interface. This makes us think Foo is a class and not a structure. Then there is no visual division in structures and classes, which makes us think the
interface modification results must get into foo, which doesn’t happen as boo is a copy of foo. That is misleading. In my opinion, this code should get comments, so other developers could deal with it.


The second thing relates to the previous thoughts that we can cast a type from an object to IBoo. This is another proof that a boxed value type is a reference variant of a value type. Or, all types in a system of types are reference types. We can just work with structures as with value types, passing their value entirely. Dereferencing a pointer to an object as you would say in the world of C++.


You can object that if it was true, it would look like this:


var referenceToInteger = (IInt32)10;

We would get not just an object, but a typed reference for a boxed value type. It would destroy the whole idea of value types (i.e. integrity of their value) allowing for great optimization, based on their properties. Let’s take down this idea!


public sealed class Boxed<T>
{
    public T Value;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
     public override bool Equals(object obj)
     {
         return Value.Equals(obj);
     }

     [MethodImpl(MethodImplOptions.AggressiveInlining)]
     public override string ToString()

     {
         return Value.ToString();
     }

     [MethodImpl(MethodImplOptions.AggressiveInlining)]
     public override int GetHashCode()
     {
         return Value.GetHashCode();
     }
}

We’ve got a complete analog of boxing. However, we can change its contents by calling instance methods. These changes will affect all parts with a reference to this data structure.


var typedBoxing = new Boxed<int> { Value = 10 };
var pureBoxing = (object)10;

The first variant isn’t very attractive. Instead of casting a type we create nonsense. The second line is much better, but the two lines are almost identical. The only difference is that there is no memory cleaning with zeros during the usual boxing after allocating memory on the heap. The necessary structure takes the memory straight away whereas the first variant needs cleaning. This makes it work longer than the usual boxing by 10%.


Instead, we can call some methods for our boxed value.


struct Foo
{
    public int x;

    public void ChangeTo(int newx)
    {
        x = newx;
    }
}
var boxed = new Boxed<Foo> { Value = new Foo { x = 5 } };
boxed.Value.ChangeTo(10);
var unboxed = boxed.Value;

We’ve got a new instrument. Let's think what we can do with it.


  • Our Boxed<T> type does the same as the usual type: allocates memory on the heap, passes a value there and allows to get it, by doing a kind of unbox;
  • If you lose a reference to a boxed structure, the GC will collect it;
  • However, we can now work with a boxed type, i.e. calling its methods;
  • Also, we can replace an instance of a value type in the SOH/LOH for another one. We couldn’t do it before, as we would have to do unboxing, change structure to another one and do boxing back, giving a new reference to customers.

The main problem of boxing is creating traffic in memory. The traffic of unknown number of objects, the part of which can survive up to generation one, where we get problems with garbage collection. There will be a lot of garbage and we could have avoided it. But when we have the traffic of short-lived objects, the first solution is pooling. This is an ideal end of .NET somersault.


var pool = new Pool<Boxed<Foo>>(maxCount:1000);
var boxed = pool.Box(10);
boxed.Value=70;

// use boxed value here

pool.Free(boxed);

Now boxing can work using a pool, which eliminates memory traffic while boxing. We can even make objects go back to life in finalization method and put themselves back into the pool. This might be useful when a boxed structure goes to asynchronous code other than yours and you cannot understand when it became unnecessary. In this case, it will return itself back to pool during GC.


Let’s conclude:


  • If boxing is accidental and shouldn’t happen, don’t make it happen. It can lead to problems with performance.
  • If boxing is necessary for the architecture of a system, there may be variants. If the traffic of boxed structures is small and almost invisible, you can use boxing. If the traffic is visible, you might want to do the pooling of boxing, using one of the solutions stated above. It spends some resources, but makes GC work without overload;

Ultimately let’s look at a totally impractical code:


static unsafe void Main()
{
    // here we create boxed int
    object boxed = 10;

    // here we get the address of a pointer to a VMT
    var address = (void**)EntityPtr.ToPointerWithOffset(boxed);

    unsafe
    {
        // here we get a Virtual Methods Table address
        var structVmt = typeof(SimpleIntHolder).TypeHandle.Value.ToPointer();

       // change the VMT address of the integer passed to Heap into a VMT SimpleIntHolder, turning Int into a structure
       *address = structVmt;
    }

    var structure = (IGetterByInterface)boxed;

    Console.WriteLine(structure.GetByInterface());
}

interface IGetterByInterface
{
    int GetByInterface();
}

struct SimpleIntHolder : IGetterByInterface
{
    public int value;

    int IGetterByInterface.GetByInterface()
    {
        return value;
    }
}

The code uses a small function, which can get a pointer from a reference to an object. The library is available at github address. This example shows that usual boxing turns int into a typed reference type. Let’s
look at the steps in the process:


  1. Do boxing for an integer.
  2. Get the address of an obtained object (the address of Int32 VMT)
  3. Get the VMT of a SimpleIntHolder
  4. Replace the VMT of a boxed integer to the VMT of a structure.
  5. Make unboxing into a structure type
  6. Display the field value on screen, getting the Int32, that was
    boxed.

I do it via the interface on purpose as I want to show that it will work
that way.


Nullable\<T>


It is worth mentioning about the behavior of boxing with Nullable value types. This feature of Nullable value types is very attractive as the boxing of a value type which is a sort of null returns null.


int? x = 5;
int? y = null;

var boxedX = (object)x; // -> 5
var boxedY = (object)y; // -> null

This leads us to a peculiar conclusion: as null doesn’t have a type, the
only way to get a type, different from the boxed one is the following:


int? x = null;
var pseudoBoxed = (object)x;
double? y = (double?)pseudoBoxed;

The code works just because you can cast a type to anything you like
with null.


Going deeper in boxing


As a final bit, I would like to tell you about System.Enum type. Logically this should be a value type as it’s a usual enumeration: aliasing numbers to names in a programming language. However, System.Enum is a reference type. All the enum data types, defined in your field as well as in .NET Framework are inherited from System.Enum. It’s a class data type. Moreover, it’s an abstract class, inherited from System.ValueType.


    [Serializable]
    [System.Runtime.InteropServices.ComVisible(true)]
    public abstract class Enum : ValueType, IComparable, IFormattable, IConvertible
    {
        // ...
    }

Does it mean that all enumerations are allocated on the SOH and when we use them, we overload the heap and GC? Actually no, as we just use them. Then, we suppose that there is a pool of enumerations somewhere and we just get their instances. No, again. You can use enumerations in structures while marshaling. Enumerations are usual numbers.


The truth is that CLR hacks data type structure when forming it if there is enum turning a class into a value type:


// Check to see if the class is a valuetype; but we don't want to mark System.Enum
// as a ValueType. To accomplish this, the check takes advantage of the fact
// that System.ValueType and System.Enum are loaded one immediately after the
// other in that order, and so if the parent MethodTable is System.ValueType and
// the System.Enum MethodTable is unset, then we must be building System.Enum and
// so we don't mark it as a ValueType.
if(HasParent() &&
    ((g_pEnumClass != NULL && GetParentMethodTable() == g_pValueTypeClass) ||
    GetParentMethodTable() == g_pEnumClass))
{
    bmtProp->fIsValueClass = true;
    HRESULT hr = GetMDImport()->GetCustomAttributeByName(bmtInternal->pType->GetTypeDefToken(),
                                                            g_CompilerServicesUnsafeValueTypeAttribute,
                                                            NULL, NULL);

    IfFailThrow(hr);
    if (hr == S_OK)
    {
        SetUnsafeValueClass();
    }
}

Why doing this? In particular, because the idea of inheritance — to do a customized enum, you, for example, need to specify the names of possible values. However, it is impossible to inherit value types. So, developers designed it to be a reference type that can turn it into a value type when compiled.


What if you want to see boxing personally?


Fortunately, you don’t have to use a disassembler and get into the code jungle. We have the texts of the whole .NET platform core and many of them are identical in terms of .NET Framework CLR and CoreCLR. You can click the links below and see the implementation of boxing right away:



Here, the only method is used for unboxing:
JIT_Unbox(..), which is a wrapper around JIT_Unbox_Helper(..).


Also, it is interesting that (https://stackoverflow.com/questions/3743762/unboxing-does-not-create-a-copy-of-the-value-is-this-right), unboxing doesn’t mean copying data to the heap. Boxing means passing a pointer to the heap while testing the compatibility of types. The IL opcode following unboxing will define the actions with this address. The data might be copied to a local variable or the stack for calling a method. Otherwise, we would have a double copying; first when copying from the heap to somewhere, and then copying to the destination place.


Questions


Why .NET CLR can’t do pooling for boxing itself?


If we talk to any Java developer, we will know two things:


  • All value types in Java are boxed, meaning they are not essentially value types. Integers are also boxed.
  • For the reason of optimization all integers from -128 to 127 are taken from the pool of objects.

So, why this doesn’t happen in .NET CLR during boxing? It is simple. Because we can change the content of a boxed value type, that is we can do the following:


object x = 1;
x.GetType().GetField("m_value", BindingFlags.Instance | BindingFlags.NonPublic).SetValue(x, 138);
Console.WriteLine(x); // -> 138

Or like this (С++/CLI):


void ChangeValue(Object^ obj)
{
    Int32^ i = (Int32^)obj;
    *i = 138;
}

If we dealt with pooling, then we would change all ones in application to 138, which is not good.


The next is the essence of value types in .NET. They deal with value, meaning they work faster. Boxing is rare and addition of boxed numbers belongs to the world of fantasy and bad architecture. This is not useful at all.


Why it is not possible to do boxing on stack instead of the heap, when you call a method that takes an object type, which is a value type in fact?


If the value type boxing is done on the stack and the reference will go to the heap, the reference inside the method can go somewhere else, for example a method can put the reference in the field of a class. The method will then stop, and the method that made boxing will also stop. As a result, the reference will point to a dead space on the stack.


Why it is not possible to use Value Type as a field?


Sometimes we want to use a structure as a field of another structure which uses the first one. Or simpler: use structure as a structure field. Don't ask me why this can be useful. It cannot. If you use a structure as its field or through dependence with another structure, you create recursion, which means infinite size structure. However, .NET Framework has some places where you can do it. An example is System.Char, which contains itself:


public struct Char : IComparable, IConvertible
{

    // Member Variables

    internal char m_value;

    //...
}

All CLR primitive types are designed this way. We, mere mortals, cannot implement this behavior. Moreover, we don't need this: it is done to give primitive types a spirit of OOP in CLR.


This charper translated from Russian as from language of author by professional translators. You can help us with creating translated version of this text to any other language including Chinese or German using Russian and English versions of text as source.

Also, if you want to say «thank you», the best way you can choose is giving us a star on github or forking repository https://github.com/sidristij/dotnetbook

Комментарии (0)