LINQ Deferred Execution (Delayed Query)

Keywords: C# Lambda

LINQ defines a set of standard query operators that can be used through query syntax or method syntax to query data sources. For an operator such as Where, LINQ does not query the data source when the query is defined. The query runs only when its result is enumerated, for example by a foreach loop. This behavior is called deferred execution (delayed query). For example:

// Deferred query (needs: using System; using System.Linq;)
int[] numbers = new int[] { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
int i = 0;

var q = numbers.Where(x => { i++; return x > 2; });

foreach (var v in q)
{
    Console.WriteLine("v = {0}, i = {1}", v, i);
}

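The output of this code is as follows:

v = 5, i = 1
v = 4, i = 2
v = 3, i = 4
v = 9, i = 5
v = 8, i = 6
v = 6, i = 7
v = 7, i = 8

The value of i grows as the loop runs, which shows that the predicate is not evaluated when q is defined. It is evaluated one element at a time, inside the foreach loop. Now modify the program slightly so that the query result is converted to a List: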

// Immediate execution: ToList() runs the query right away
int[] numbers = new int[] { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
int i = 0;

var q = numbers.Where(x => { i++; return x > 2; }).ToList();

foreach (var v in q)
{
    Console.WriteLine("v = {0}, i = {1}", v, i);
}

The output is now:

v = 5, i = 10
v = 4, i = 10
v = 3, i = 10
v = 9, i = 10
v = 8, i = 10
v = 6, i = 10
v = 7, i = 10

The query has already run by the time ToList returns, so every line prints i = 10. Why does LINQ defer execution in the first case? Let's look at the source code of the Where extension method in the static Enumerable class in the System.Linq namespace (http://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,577032c8811e20d3):

public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
    if (source == null) throw Error.ArgumentNull("source");
    if (predicate == null) throw Error.ArgumentNull("predicate");
    if (source is Iterator<TSource>) return ((Iterator<TSource>)source).Where(predicate);
    if (source is TSource[]) return new WhereArrayIterator<TSource>((TSource[])source, predicate);
    if (source is List<TSource>) return new WhereListIterator<TSource>((List<TSource>)source, predicate);
    return new WhereEnumerableIterator<TSource>(source, predicate);
}

That is, when the statement var q = numbers.Where(x => { i++; return x > 2; }); executes, the program simply returns a WhereArrayIterator<TSource> object that holds references to the source array and the lambda expression. No element is tested yet. The class is defined as follows:

class WhereArrayIterator<TSource> : Iterator<TSource>
{
    TSource[] source;
    Func<TSource, bool> predicate;
    int index;

    public WhereArrayIterator(TSource[] source, Func<TSource, bool> predicate) {
        this.source = source;
        this.predicate = predicate;
    }

    public override Iterator<TSource> Clone() {
        return new WhereArrayIterator<TSource>(source, predicate);
    }

    public override bool MoveNext() {
        if (state == 1) {
            while (index < source.Length) {
                TSource item = source[index];
                index++;
                if (predicate(item)) {
                    current = item;
                    return true;
                }
            }
            Dispose();
        }
        return false;
    }

    public override IEnumerable<TResult> Select<TResult>(Func<TSource, TResult> selector) {
        return new WhereSelectArrayIterator<TSource, TResult>(source, predicate, selector);
    }

    public override IEnumerable<TSource> Where(Func<TSource, bool> predicate) {
        return new WhereArrayIterator<TSource>(source, CombinePredicates(this.predicate, predicate));
    }
}

Note the if (predicate(item)) statement in the MoveNext() method: this is where the predicate is applied to each element of the source data. When the returned WhereArrayIterator<TSource> object is enumerated with foreach, the code the compiler generates for the loop calls GetEnumerator() and then MoveNext() repeatedly, and those MoveNext() calls are what execute the query.
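To make the role of MoveNext() concrete, here is a minimal sketch that enumerates the query by hand, roughly the way a foreach loop does after compilation. The class name ForeachExpansionDemo and the Run helper are made up for this illustration, and the log strings simply mirror the Console.WriteLine format used earlier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of LINQ or the reference source.
public static class ForeachExpansionDemo
{
    // Returns one log line per step so the behavior can be inspected.
    public static List<string> Run()
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
        int i = 0;
        var log = new List<string>();

        // Defining the query does not call the predicate.
        IEnumerable<int> q = numbers.Where(x => { i++; return x > 2; });
        log.Add("after Where: i = " + i);

        // Roughly what "foreach (var v in q) { ... }" expands to.
        using (IEnumerator<int> en = q.GetEnumerator())
        {
            // Each MoveNext() call runs the predicate until it finds a match.
            while (en.MoveNext())
            {
                int v = en.Current;
                log.Add("v = " + v + ", i = " + i);
            }
        }
        return log;
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

The first log line reports i = 0, because no predicate call happens until the first MoveNext().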

Let's look at the ToList extension method under the Enumerable class:

public static List<TSource> ToList<TSource>(this IEnumerable<TSource> source) {
    if (source == null) throw Error.ArgumentNull("source");
    return new List<TSource>(source);
}

The ToList method returns a new List<TSource> object built from the source sequence. Let's look at the List<T> constructor that takes an IEnumerable<T>:

public List(IEnumerable<T> collection) 
{
    if (collection==null)
        ThrowHelper.ThrowArgumentNullException(ExceptionArgument.collection);
    Contract.EndContractBlock();

    ICollection<T> c = collection as ICollection<T>;
    if( c != null) {
        int count = c.Count;
        if (count == 0)
        {
            _items = _emptyArray;
        }
        else {
            _items = new T[count];
            c.CopyTo(_items, 0);
            _size = count;
        }
    }    
    else {                
        _size = 0;
        _items = _emptyArray;
        // This enumerable could be empty.  Let Add allocate a new array, if needed.
        // Note it will also go to _defaultCapacity first, not 1, then 2, etc.
        
        using(IEnumerator<T> en = collection.GetEnumerator()) {
            while(en.MoveNext()) {
                Add(en.Current);                                    
            }
        }
    }
}

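The constructor first tries to cast the source sequence to ICollection<T>. The WhereArrayIterator<TSource> object returned by Where does not implement ICollection<T>, so the as cast yields null and the else branch runs. There, the while (en.MoveNext()) loop enumerates the iterator, and that is what executes the query against the data source. This is why the query has already run by the time ToList returns.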
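One practical consequence can be sketched as follows: a list produced by ToList is a snapshot that can be looped over without touching the predicate again, while enumerating the deferred query a second time runs the predicate over the whole source again. The class name ToListDemo and the Run helper are made up for this illustration. The ICollection<int> check is included only to show how the constructor's cast could be observed, and its result depends on the LINQ implementation in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo class, not part of LINQ or the reference source.
public static class ToListDemo
{
    public static List<string> Run()
    {
        int[] numbers = { 5, 4, 1, 3, 9, 8, 6, 7, 2, 0 };
        int i = 0;
        var log = new List<string>();

        IEnumerable<int> q = numbers.Where(x => { i++; return x > 2; });

        // The same kind of check the List<T> constructor performs with "as".
        log.Add("q is ICollection<int>: " + (q is ICollection<int>));

        // ToList enumerates q completely, so the predicate runs for all 10 elements here.
        List<int> list = q.ToList();
        log.Add("after ToList: i = " + i);

        // Looping over the list reads stored values and never calls the predicate.
        int sum = 0;
        foreach (int v in list) sum += v;
        log.Add("after looping over list: i = " + i);

        // Enumerating the deferred query again re-runs the predicate on every element.
        foreach (int v in q) sum += v;
        log.Add("after looping over q: i = " + i);

        return log;
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

If a query is expensive or its predicate has side effects, calling ToList once and reusing the list avoids this repeated work.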

Posted by Kaboom on Thu, 27 Jun 2019 15:06:32 -0700