Why does a compiler-generated IEnumerator<T> hold a reference to the instance that created it?

While working on a project, I wrote an iterator block similar to the following:

public class Sequence<T> : IEnumerable<T>
{
    public T Head{get; private set;}
    public Sequence<T> Tail {get; private set;}

    public bool IsEmpty {get; private set;}

    public IEnumerator<T> GetEnumerator()
    {
        Sequence<T> collection = this;

        while (!collection.IsEmpty)
        {
            yield return collection.Head;
            collection = collection.Tail;
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

I expected that after the second call to MoveNext, the GC would be able to collect the original collection, since by then the iterator block's local variable no longer refers to it, only to its tail (because of collection = collection.Tail).

However, this did not happen. I discovered the compiler-generated IEnumerator<T> will always hold a reference to the instance of Sequence<T> that created it.

To prove this, I wrote the following iterator block and inspected the generated IL:

public IEnumerator<T> GetEnumerator()
{
    yield return default(T);
}

To my surprise, the IL was equivalent to this:

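public IEnumerator<T> GetEnumerator()
{
    // Pseudocode: the real generated type is named <GetEnumerator>d__3
    // and its constructor takes the initial state (0).
    var enumerator = new CompilerGeneratedEnumerator(0);
    enumerator.this_field = this;
    return enumerator;
}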

Verbatim:

.maxstack 2
.locals init (
    [0] class Sequences.Sequence`1/'<GetEnumerator>d__3'<!T>
)

IL_0000: ldc.i4.0
IL_0001: newobj instance void class Sequences.Sequence`1/'<GetEnumerator>d__3'<!T>::.ctor(int32)
IL_0006: stloc.0
IL_0007: ldloc.0
IL_0008: ldarg.0
IL_0009: stfld class Sequences.Sequence`1<!0> class Sequences.Sequence`1/'<GetEnumerator>d__3'<!T>::'<>4__this'
IL_000e: ldloc.0
IL_000f: ret

By looking at the IL for <GetEnumerator>d__3, it seems the <>4__this field is never accessed. So why is it generated anyway? Why does the enumerator need to point to the instance of Sequence<T> that created it?

I was able to get around this problem by writing my own IEnumerator<T>, but I'm still wondering why this happens in the first place.
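
For illustration, a hand-written enumerator along these lines might look like the sketch below. This is not the project's actual code: the `SequenceEnumerator<T>` name, the `Cons` and `Empty` helpers, and the simplified `Sequence<T>` stand-in are all assumptions made so the example is self-contained. The point is only that the enumerator stores the node it will visit next, never the instance that created it.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for the Sequence<T> class in the question.
public class Sequence<T>
{
    public T Head { get; private set; }
    public Sequence<T> Tail { get; private set; }
    public bool IsEmpty { get; private set; }

    // Assumed helpers for building sequences; not part of the original code.
    public static readonly Sequence<T> Empty = new Sequence<T> { IsEmpty = true };

    public static Sequence<T> Cons(T head, Sequence<T> tail)
    {
        return new Sequence<T> { Head = head, Tail = tail };
    }
}

// Hand-written enumerator: its only reference into the sequence is _next.
public sealed class SequenceEnumerator<T> : IEnumerator<T>
{
    private Sequence<T> _next;   // node that the next MoveNext call will visit
    private T _current;

    public SequenceEnumerator(Sequence<T> start)
    {
        _next = start;
    }

    public T Current { get { return _current; } }

    object IEnumerator.Current { get { return _current; } }

    public bool MoveNext()
    {
        if (_next == null || _next.IsEmpty)
            return false;

        _current = _next.Head;
        _next = _next.Tail;      // the node just visited is no longer referenced here
        return true;
    }

    public void Reset()
    {
        throw new NotSupportedException();
    }

    public void Dispose() { }
}
```

With this shape, once MoveNext has advanced past a node, the enumerator itself keeps nothing alive except the remaining tail, so earlier nodes become collectible as long as no other code references them.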


If you want to compile this yourself, you can grab the project's source from here: https://github.com/dcastro/Sequences

And here's the original iterator block:

ISequence<T> sequence = this;

while (!sequence.IsEmpty)
{
    yield return sequence.Head;
    sequence = sequence.Tail;
}
Jon Skeet

Logically your first method should capture this. This line:

Sequence<T> collection = this;

... will only execute on the first call to MoveNext(), so it really does need to capture it, and it can only capture it in an instance variable in the generated code. The compiler could explicitly null it out after its final use, but usually that would just be wasteful.
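
The deferred execution described above can be observed directly. The following is a small sketch with hypothetical names (`DeferredDemo`, `Run`, and the `_log` field are made up for illustration): the iterator body touches an instance field, yet nothing in the body runs when GetEnumerator() is called, so the generated enumerator has to carry `this` along until MoveNext() is invoked.

```csharp
using System;
using System.Collections.Generic;

public class DeferredDemo
{
    // Instance state that the iterator body reads through `this`.
    private readonly List<string> _log = new List<string>();

    public IEnumerator<int> GetEnumerator()
    {
        // This line does not run when GetEnumerator() is called;
        // it runs on the first MoveNext(), and it needs `this` to reach _log.
        _log.Add("body started");
        yield return 1;
    }

    // Returns the log size before and after the first MoveNext() call.
    public static Tuple<int, int> Run()
    {
        var demo = new DeferredDemo();
        IEnumerator<int> enumerator = demo.GetEnumerator();

        int before = demo._log.Count;   // body has not started yet
        enumerator.MoveNext();
        int after = demo._log.Count;    // body ran up to the first yield return

        return Tuple.Create(before, after);
    }
}
```

Calling `DeferredDemo.Run()` returns (0, 1): the log is empty right after GetEnumerator() and gains its entry only once MoveNext() has been called.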

Now your second case is more interesting. Yes, in order to complete the method it doesn't need a reference to this - but if you were in a debugger, and you had a breakpoint on the yield return statement, you would expect to be able to inspect this, as you're in an instance method. So at least in a build with debug information and no optimization, I think it's reasonable to include this as an instance variable. In an optimized build it would make sense not to capture this (and accept that if you're debugging a build not meant for debugging, there are some limitations) but I guess this is just an optimization the compiler authors didn't consider important.
