LINQ performance analysis



LINQ's advantage is not that it provides new functionality, but that it lets us achieve existing functionality in a simpler, more elegant way. Generally, though, this kind of convenience comes at a cost in performance, and LINQ is no exception. The main purpose of this article is to help you understand the impact of LINQ queries on performance. We will introduce basic LINQ performance analysis methods, provide some measurements, and point out some common pitfalls that are easy to avoid once you understand them.

In general, there are always several ways to do the same thing in the .NET Framework. Sometimes the differences come down to personal preference or code-style consistency. In other cases, however, making the right choice has a decisive effect on the overall program. This is also true in LINQ: some practices are well suited to LINQ queries, while others should be avoided as much as possible.

We again start from the LINQ to Text Files sample program (link: https://www.vinanysoft.com/c-sharp/linq-to-text-files/ ). It shows how important it is to choose the right way to read a text file in a LINQ query.

Choose an appropriate streaming approach

There is a potential problem with the LINQ to Text Files sample program: it uses the ReadAllLines method, which returns the entire content of the CSV file at once. This is not a problem for small files, but if the file is very large, the program will consume a startling amount of memory!

That is not the only problem. Such a call also defeats the deferred query execution we expect from LINQ. Normally, a query only starts executing when it is needed, that is, when we begin traversing its results (with a foreach loop, for example). Here, however, ReadAllLines executes immediately and loads the entire file into memory, even though the program very likely does not need all of that data.

LINQ to Objects is designed to execute queries in a deferred manner. This stream-like processing saves resources (memory, CPU, etc.), so we should try to write our own code in a similar style.
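Deferred execution is easy to observe with a small, self-contained sketch (the names here are made up for illustration; the console logging exists only to make the evaluation order visible):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredDemo
{
    // Yields numbers one at a time, logging each read so we can
    // see exactly when the source is actually consumed.
    static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 3; i++)
        {
            Console.WriteLine($"producing {i}");
            yield return i;
        }
    }

    static void Main()
    {
        // Defining the query executes nothing: no "producing" line is printed yet.
        var evens = Numbers().Where(n => n % 2 == 0);
        Console.WriteLine("query defined");

        // Only now, while iterating, are elements pulled from the source.
        foreach (var n in evens)
            Console.WriteLine($"consumed {n}");
    }
}
```

Running this prints "query defined" before any "producing" line, confirming that the Where clause did not touch the source when the query was defined.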

The .NET Framework provides many ways to read text files; File.ReadAllLines is one of the simplest. A better solution is to load the file as a stream with a StreamReader object, which greatly reduces resource usage and makes the program run more smoothly. There are many ways to integrate a StreamReader into a query; one of the more elegant ones is to create a custom query operator.

The Lines query operator returns lines of text one by one from a StreamReader object:

public static class StreamReaderEnumerable
{
    public static IEnumerable<string> Lines(this StreamReader source)
    {
        if (source == null)
            throw new ArgumentNullException("source");

        string line;
        while ((line = source.ReadLine()) != null)
            yield return line;
    }
}

The Lines query operator is provided as an extension method on the StreamReader class. It returns each line of the source file supplied by the StreamReader in turn, but it does not load any data into memory until the query actually starts executing.

Using the Lines query operator to stream a CSV file:

using (StreamReader reader = new StreamReader("books.csv"))
{
    var books = from line in reader.Lines()
                where !line.StartsWith("#")
                let parts = line.Split(',')
                select new
                {
                    Title = parts[1],
                    Publisher = parts[3],
                    Isbn = parts[0]
                };
}

The advantage of this approach is that we can operate on large files while maintaining a small memory footprint. This matters greatly for query efficiency: without careful design, query statements often consume a lot of memory.

Let's review what changed in the current version of LINQ to Text Files. The key is deferred evaluation: objects are created only when they are needed, that is, when we start traversing the results, rather than all up front when the query is defined.

If we use foreach to traverse the result of the query:

foreach (var book in books)
{
    Console.WriteLine(book.Isbn);
}

The book object in the foreach loop exists only for the current iteration; not all objects in the collection have to exist in memory at the same time. Each iteration reads one line from the file, splits it into a string array, and creates an object from the split result. Once the current object has been processed, the program reads the next line, until every line in the file has been handled.

Thanks to deferred execution, the program uses fewer resources and memory consumption drops sharply.

Beware of immediate execution

Most standard query operators implement deferred execution through iterators. As mentioned earlier, this helps reduce the resources a program consumes. However, some query operators break this elegant deferred-execution behavior: by their very nature, they must traverse every element of the sequence at once.

In general, operators that return scalar values rather than sequences must execute immediately, such as Aggregate, Average, Count, LongCount, Max, Min, and Sum. This is no surprise: aggregation means computing a single value from a set of data, and to compute that result the operator must visit every element of the collection.

In addition, some operators that return sequences rather than scalar values must also traverse the source sequence completely before returning anything, for example OrderBy, OrderByDescending, and Reverse. These operators change the positions of elements in the source sequence, and to compute an element's position correctly they must read the whole source first.
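This buffering behavior can be made visible with a short sketch (hypothetical names; the logging exists only to show when the source is read):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class BufferingDemo
{
    // A source that logs every element it yields.
    static IEnumerable<int> Source()
    {
        foreach (var i in new[] { 3, 1, 2 })
        {
            Console.WriteLine($"reading {i}");
            yield return i;
        }
    }

    static void Main()
    {
        // Where streams: "reading 3" is printed immediately before "got 3".
        foreach (var n in Source().Where(n => n > 0))
            Console.WriteLine($"got {n}");

        // OrderBy buffers: all three "reading" lines are printed
        // before the first "got" line appears.
        foreach (var n in Source().OrderBy(n => n))
            Console.WriteLine($"got {n}");
    }
}
```

The first loop interleaves reads and outputs; the second reads the entire source before producing anything, which is exactly the cost discussed above.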

Let's continue with the LINQ to Text Files example to look at this in detail. In the previous section we loaded the source file line by line as a stream instead of loading it all at once, as shown in the following code:

using (StreamReader reader = new StreamReader("books.csv"))
{
    var books = from line in reader.Lines()
                where !line.StartsWith("#")
                let parts = line.Split(',')
                select new
                {
                    Title = parts[1],
                    Publisher = parts[3],
                    Isbn = parts[0]
                };
}

foreach (var book in books)
{
    Console.WriteLine(book.Isbn);
}

The above code is executed in this order.

  • (1) At the beginning of a loop iteration, use the Lines operator to read a line from the file.
    • a. If the entire file has been processed, the process terminates.
  • (2) Apply the Where operator to this line.
    • a. If the line is a comment (it starts with #), go back to step 1.
    • b. Otherwise, continue processing.
  • (3) Split the line into parts.
  • (4) Create an object with the Select operator.
  • (5) Operate on the book object according to the body of the foreach loop.
  • (6) Go back to step 1.

You can watch each of these steps by single-stepping through the code in the Visual Studio debugger. I recommend trying this once so that you understand the execution flow of a LINQ query more clearly.

If you decide to process the lines of the file in a different order (for example, with an orderby clause or a call to the Reverse operator), the process changes. Suppose we add the Reverse operator to the query:

...
from line in reader.Lines().Reverse()
...

At this point, the execution order of the query becomes the following.

  • (1) Execute the Reverse operator.
    • a. It immediately calls the Lines operator to read all the lines and reverse them.
  • (2) At the beginning of a loop iteration, the Reverse operator returns one line of the sequence.
    • a. If the entire file has been processed, the process terminates.
  • (3) Apply the Where operator to this line.
    • a. If the line is a comment (it starts with #), go back to step 2.
    • b. Otherwise, continue processing.
  • (4) Split the line into parts.
  • (5) Create an object with the Select operator.
  • (6) Operate on the book object according to the body of the foreach loop.
  • (7) Go back to step 2.

As you can see, the Reverse operator completely destroys the previous graceful pipeline, because it loads every line of the text file into memory at the very start. Therefore, unless you genuinely need such behavior, do not use these operators lightly; with large data sources they significantly reduce execution efficiency and occupy large amounts of memory.

Some conversion operators also break the deferred execution of queries, such as ToArray, ToDictionary, ToList, and ToLookup. Although these operators return sequences, they produce them all at once as a collection containing every element of the source sequence. To build the collection they return, these operators must traverse the entire source sequence.
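A minimal sketch of this effect (hypothetical names; the console output only marks when enumeration happens):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ConversionDemo
{
    static IEnumerable<int> Source()
    {
        Console.WriteLine("enumeration started");
        yield return 1;
        yield return 2;
    }

    static void Main()
    {
        // Deferred: defining the query prints nothing.
        var deferred = Source().Select(n => n * 10);
        Console.WriteLine("deferred query defined");

        // Immediate: ToList() enumerates the whole source right away,
        // so "enumeration started" is printed on this line.
        var list = Source().Select(n => n * 10).ToList();
        Console.WriteLine($"list holds {list.Count} elements");
    }
}
```

Here "deferred query defined" appears before "enumeration started": the ToList call, not the query definition, is what triggers the full traversal.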

Now that you have seen the potentially expensive behavior of some query operators, let's look at a common scenario that shows why LINQ and its standard query operators must be used with care.

Will LINQ to Objects degrade code performance?

In many cases, LINQ to Objects does not directly provide the result we want. Suppose we want to find the element of a collection whose value for a given property is the largest among all elements. It is like finding the cookie with the most chocolate in a box of cookies: the box of cookies is the collection, and the amount of chocolate is the property to compare.

At first you might think of using Max from the standard query operators. But Max only returns the maximum value, not the object that contains it. Max can tell you the largest amount of chocolate, but it cannot tell you which cookie has it.

For this common scenario we have many options, including using LINQ in different ways or writing traditional code directly. Let's look at several alternatives that make up for Max's shortcoming.

SampleData reference link: https://www.vinanysoft.com/c-sharp/linq-in-action-test-data/

Different methods

The first is to use the foreach loop:

var books = SampleData.Books;
Book maxBook = null;
foreach (var book in books)
{
    if (maxBook == null || book.PageCount > maxBook.PageCount)
    {
        maxBook = book;
    }
}

This solution is very easy to understand: we keep a reference to the book with "the most pages so far". It traverses the collection only once, so its time complexity is O(n). Unless we know more about the collection, this is in theory the fastest method.

The second method is to sort the book objects in the collection according to the number of pages, and then obtain the first element:

var books = SampleData.Books;
var sortedList = from book in books
                 orderby book.PageCount descending
                 select book;
var maxBook = sortedList.First();

In this approach, we first use a LINQ query to sort the book collection by page count in descending order, and then take the first element. The drawback is that we must sort the entire collection before getting a result; the time complexity is O(n log n).

The third method is to use subqueries:

var books = SampleData.Books;
var maxList = from book in books
              where book.PageCount == books.Max(b => b.PageCount)
              select book;
var maxBook = maxList.First();

In this method, we find every book in the collection whose page count equals the maximum page count, and then take the first one. However, written this way the maximum page count is recomputed for every element compared, and the time complexity rises to O(n²).

The fourth method is to use two queries:

var books = SampleData.Books;
var maxPageCount = books.Max(book => book.PageCount);
var maxList = from book in books
              where book.PageCount == maxPageCount
              select book;
var maxBook = maxList.First();

This approach is similar to the third, but it does not recompute the maximum page count for each element; the maximum is computed first. That brings the time complexity back down to O(n), but we still traverse the collection twice.

The significance of the last method is that it integrates best with LINQ: a custom query operator. The following code shows the implementation of the MaxElement operator.

public static TElement MaxElement<TElement, TData>(
    this IEnumerable<TElement> source,
    Func<TElement, TData> selector)
    where TData : IComparable<TData>
{
    if (source == null)
        throw new ArgumentNullException("source");
    if (selector == null)
        throw new ArgumentNullException("selector");

    Boolean firstElement = true;
    TElement result = default(TElement);
    TData maxValue = default(TData);
    foreach (TElement element in source)
    {
        var candidate = selector(element);
        if (firstElement || (candidate.CompareTo(maxValue) > 0))
        {
            firstElement = false;
            maxValue = candidate;
            result = element;
        }
    }
    return result;
}

The query operator is very simple to use:

var maxBook = books.MaxElement(book => book.PageCount);

The following table shows the running time of the above five methods, each of which has been executed 20 times:

Method                  Average time (ms)   Minimum time (ms)   Maximum time (ms)
foreach                 4.15                4                   5
OrderBy + First         360.6               316                 439
Subquery                4432.5              4364                4558
Two queries             7.7                 7                   10
Custom query operator   7.7                 7                   12

The test environment is Windows 10 Professional, an AMD Ryzen 5 2400G with Radeon Vega Graphics at 3.60 GHz, and 32 GB of RAM; the programs were compiled in Release mode.

The test code is as follows:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using LinqInAction.LinqBooks.Common;

static class Demo
{
    public static void Main()
    {
        BooksForPerformance();

        Console.WriteLine("{0,-20}{1,-20}{2,-20}{3,-20}", "method", "Average time (MS)", "Minimum time (MS)", "Maximum time (MS)");

        var time = 20;
        var result = Test(Foreach, time);
        Console.WriteLine($"{"foreach",-22}{result.avg,-28}{result.min,-28}{result.max,-28}");

        result = Test(OrderByAndFirst, time);
        Console.WriteLine($"{"OrderBy + First",-22}{result.avg,-28}{result.min,-28}{result.max,-28}");

        result = Test(Subquery, time);
        Console.WriteLine($"{"Subquery",-19}{result.avg,-28}{result.min,-28}{result.max,-28}");

        result = Test(TwoQueries, time);
        Console.WriteLine($"{"Two queries",-18}{result.avg,-28}{result.min,-28}{result.max,-28}");

        result = Test(Custom, time);
        Console.WriteLine($"{"Custom query operators",-14}{result.avg,-28}{result.min,-28}{result.max,-28}");

        Console.ReadKey();
    }

    private static void BooksForPerformance()
    {
        var rndBooks = new Random(123);
        var rndPublishers = new Random(123);
        var publisherCount = SampleData.Publishers.Count();

        var result = new List<Book>();
        for (int i = 0; i < 1000000; i++)
        {
            var publisher = SampleData.Publishers.Skip(rndPublishers.Next(publisherCount)).First();
            var pageCount = rndBooks.Next(1000);
            result.Add(new Book
            {
                Title = pageCount.ToString(),
                PageCount = pageCount,
                Publisher = publisher
            });
        }

        SampleData.Books = result.ToArray();
    }

    /// <summary>
    /// Method 1: foreach loop
    /// </summary>
    static void Foreach()
    {
        var books = SampleData.Books;
        Book maxBook = null;
        foreach (var book in books)
        {
            if (maxBook == null || book.PageCount > maxBook.PageCount)
            {
                maxBook = book;
            }
        }
    }

    /// <summary>
    /// Method 2: OrderBy + First
    /// </summary>
    static void OrderByAndFirst()
    {
        var books = SampleData.Books;
        var sortedList = from book in books
                         orderby book.PageCount descending
                         select book;
        var maxBook = sortedList.First();
    }

    /// <summary>
    /// Method 3: subquery
    /// </summary>
    static void Subquery()
    {
        var books = SampleData.Books;
        var maxList = from book in books
                      where book.PageCount == books.Max(b => b.PageCount)
                      select book;
        var maxBook = maxList.First();
    }

    /// <summary>
    /// Method 4: two queries
    /// </summary>
    static void TwoQueries()
    {
        var books = SampleData.Books;
        var maxPageCount = books.Max(book => book.PageCount);
        var maxList = from book in books
                      where book.PageCount == maxPageCount
                      select book;
        var maxBook = maxList.First();
    }

    /// <summary>
    /// Method 5: custom query operator
    /// </summary>
    static void Custom()
    {
        var books = SampleData.Books;
        var maxBook = books.MaxElement(book => book.PageCount);
    }

    /// <summary>
    /// Testing harness
    /// </summary>
    /// <param name="action">The method under test</param>
    /// <param name="time">Number of times to run it</param>
    /// <returns>Average, maximum, and minimum elapsed times in milliseconds</returns>
    static (double avg, long max, long min) Test(Action action, int time)
    {
        List<long> times = new List<long>();
        Stopwatch stopwatch = new Stopwatch();

        for (int i = 0; i < time; i++)
        {
            stopwatch.Start();
            action();
            stopwatch.Stop();
            times.Add(stopwatch.ElapsedMilliseconds);
            stopwatch.Reset();
        }

        return (times.Average(), times.Max(), times.Min());
    }

    public static TElement MaxElement<TElement, TData>(
        this IEnumerable<TElement> source,
        Func<TElement, TData> selector)
        where TData : IComparable<TData>
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (selector == null)
            throw new ArgumentNullException("selector");

        Boolean firstElement = true;
        TElement result = default(TElement);
        TData maxValue = default(TData);
        foreach (TElement element in source)
        {
            var candidate = selector(element);
            if (firstElement || (candidate.CompareTo(maxValue) > 0))
            {
                firstElement = false;
                maxValue = candidate;
                result = element;
            }
        }
        return result;
    }
}

From the statistics above we can see that the performance differences between approaches are very large, so we must think carefully before writing a LINQ query. In general, traversing the collection only once is far more efficient than the alternatives. Although the custom query operator is not quite as fast as the traditional, non-LINQ approach, it is still far ahead of the other methods. So you can choose between the custom query operator and the traditional foreach solution according to your preference. In my opinion, although the custom query operator carries some performance overhead, it is clearly the more elegant solution in a LINQ context.

What we have learned

The first thing to watch is the complexity of LINQ to Objects queries. Because most of the work consists of time-consuming loops over collections, we should optimize them as much as possible to save CPU resources. Try not to traverse the same collection multiple times; that is obviously inefficient. In other words, nobody wants to count the chocolate chips on a cookie over and over again. Your goal is to find the right cookie as soon as possible, so you can move on to the next step.

We also need to consider the context in which a query executes. For example, the same query can perform very differently under LINQ to Objects and LINQ to SQL, because LINQ to SQL is constrained by the SQL language itself and must interpret query statements in its own way.

The conclusion is that LINQ to Objects must be used wisely, and that it is not the final answer to every problem. In some cases a traditional method may be better, such as a plain foreach loop. In other cases, LINQ can still be used, but you may need to create custom query operators to improve execution efficiency. There is a philosophy in the Python world that Python code should be simple, readable, and maintainable, while performance optimization is implemented in C++. The corresponding LINQ philosophy is to write everything in LINQ style and encapsulate the optimized parts in custom query operators.

The cost of using LINQ to Objects

LINQ to Objects brings remarkable code brevity and readability; by comparison, traditional collection-handling code is long-winded and complicated. Here are some reasons not to use LINQ. Of course, the point is not really to avoid LINQ, but to make you aware of its performance overhead.

One of the simplest queries provided by LINQ is filtering, as shown in the following code:

var query = from book in SampleData.Books
            where book.PageCount > 500
            select book;

The above operation can also be implemented with traditional methods. The following code shows a foreach implementation:

var books = new List<Book>();
foreach (var book in SampleData.Books)
{
    if (book.PageCount > 500)
    {
        books.Add(book);
    }
}

The following code uses a for loop:

var books = new List<Book>();
for (int i = 0; i < SampleData.Books.Length; i++)
{
    var book = SampleData.Books[i];
    if (book.PageCount > 500)
    {
        books.Add(book);
    }
}

The following code uses the List<T>.FindAll method:

var books = SampleData.Books.ToList().FindAll(book => book.PageCount > 500);

Although there are other ways to implement this, the goal here is not to list them all. To compare the performance of each approach, we randomly created a collection of one million objects. The following table shows the statistics over 20 runs in Release mode:

Method                Average time (ms)           Minimum time (ms)           Maximum time (ms)
foreach               18.45                       13                          55
for                   15.2                        9                           63
List<T>.FindAll       14.15                       11                          63
LINQ                  27.05                       20                          77

Surprised? Or a little disappointed? LINQ to Objects seems to be a lot slower than the other methods! But don't give up on LINQ just yet; look at the following tests before deciding.

First, all the results above are based on the same query. What happens if the query is modified slightly? For example, change the condition in the where clause from comparing the integer field PageCount to comparing the string field Title:

var result = (from book in books
              where book.Title.StartsWith("l")
              select book).ToList();

Modify the other test code in the same way and run everything 20 more times. The results are as follows:

Method                Average time (ms)           Minimum time (ms)           Maximum time (ms)
foreach               144.3                       136                         177
for                   134.55                      125                         156
List<T>.FindAll       136.45                      131                         161
LINQ                  148.4                       136                         193

The test code is as follows:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using LinqInAction.LinqBooks.Common;

static class Demo
{
    public static void Main()
    {
        var books = BooksForPerformance();

        Console.WriteLine("{0,-20}{1,-20}{2,-20}{3,-20}", "method", "Average time (MS)", "Minimum time (MS)", "Maximum time (MS)");

        var time = 20;
        var result = Test(Foreach, books, time);
        Console.WriteLine($"{"foreach",-22}{result.avg,-28}{result.min,-28}{result.max,-28}");

        result = Test(For, books, time);
        Console.WriteLine($"{"for",-22}{result.avg,-28}{result.min,-28}{result.max,-28}");

        result = Test(FindAll, books, time);
        Console.WriteLine($"{"List<T>.FindAll",-22}{result.avg,-28}{result.min,-28}{result.max,-28}");

        result = Test(Linq, books, time);
        Console.WriteLine($"{"LINQ",-22}{result.avg,-28}{result.min,-28}{result.max,-28}");



        Console.ReadKey();
    }

    private static List<Book> BooksForPerformance()
    {
        var rndBooks = new Random(123);
        var rndPublishers = new Random(123);
        var publisherCount = SampleData.Publishers.Count();

        var result = new List<Book>();
        for (int i = 0; i < 1000000; i++)
        {
            var publisher = SampleData.Publishers.Skip(rndPublishers.Next(publisherCount)).First();
            var pageCount = rndBooks.Next(1000);
            result.Add(new Book
            {
                Title = pageCount.ToString(),
                PageCount = pageCount,
                Publisher = publisher
            });
        }

        return result;
    }

    /// <summary>
    /// Method 1: foreach loop
    /// </summary>
    static void Foreach(List<Book> books)
    {
        var result = new List<Book>();
        foreach (var book in books)
        {
            if (book.Title.StartsWith("l"))
            {
                result.Add(book);
            }
        }
    }

    /// <summary>
    /// Method 2: for loop
    /// </summary>
    static void For(List<Book> books)
    {
        var result = new List<Book>();
        for (int i = 0; i < books.Count; i++)
        {
            var book = books[i];
            if (book.Title.StartsWith("l"))
            {
                result.Add(book);
            }
        }
    }

    /// <summary>
    /// Method 3: List<T>.FindAll
    /// </summary>
    static void FindAll(List<Book> books)
    {
        var result = books.FindAll(book => book.Title.StartsWith("l"));
    }

    /// <summary>
    /// Method 4: LINQ query
    /// </summary>
    static void Linq(List<Book> books)
    {
        var result = (from book in books
                      where book.Title.StartsWith("l")
                      select book).ToList();
    }


    /// <summary>
    /// Testing harness
    /// </summary>
    /// <param name="action">The method under test</param>
    /// <param name="books">The test data</param>
    /// <param name="time">Number of times to run it</param>
    /// <returns>Average, maximum, and minimum elapsed times in milliseconds</returns>
    static (double avg, long max, long min) Test(Action<List<Book>> action, List<Book> books, int time)
    {
        List<long> times = new List<long>();
        Stopwatch stopwatch = new Stopwatch();

        for (int i = 0; i < time; i++)
        {
            stopwatch.Start();
            action(books);
            stopwatch.Stop();
            times.Add(stopwatch.ElapsedMilliseconds);
            stopwatch.Reset();
        }

        return (times.Average(), times.Max(), times.Min());
    }
}

Compared with the previous integer test, every method now takes roughly five times longer, because string operations are more expensive than numeric ones. But the most interesting point is that this time LINQ is only slightly slower than the fastest method. Comparing the two sets of results makes it clear that the extra overhead of LINQ does not necessarily become the program's bottleneck.

But why do the two tests differ so much? When we changed the comparison in the where clause from an integer to a string, we increased the execution time of every piece of code accordingly. That extra time applies to all of the test code, while the overhead of LINQ itself stays roughly constant. We can therefore conclude that the less work a query performs, the larger the relative overhead of LINQ.

It's not surprising: everything has its trade-offs, and LINQ does not only bring benefits. LINQ requires some extra work, such as creating objects and relying more heavily on the garbage collector. This extra work means LINQ's execution efficiency depends greatly on the query being executed. Sometimes the slowdown is only 5%; sometimes it is 500%.

The conclusion: don't be afraid to use LINQ, but use it with care. For some simple and frequent operations, a traditional method may be a better fit. For simple filtering or searching, we can still use the built-in support of List<T> and arrays, such as FindAll, ForEach, Find, ConvertAll, and TrueForAll. Where LINQ would have a huge performance impact, we can fall back to traditional foreach or for loops. For queries that do not run very often, you can use LINQ to Objects with confidence; for operations that are not time-sensitive, whether they take 60 ms or 10 ms makes no meaningful difference to the program. And don't forget how much readability and maintainability LINQ brings at the source level!
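For reference, the List<T> helpers mentioned above look like this in use (a small sketch with made-up data):

```csharp
using System;
using System.Collections.Generic;

static class ListHelpersDemo
{
    static void Main()
    {
        var pages = new List<int> { 120, 560, 840, 95 };

        // FindAll: every element matching the predicate.
        List<int> longBooks = pages.FindAll(p => p > 500);   // 560, 840

        // Find: the first element matching the predicate (or default(T)).
        int firstLong = pages.Find(p => p > 500);            // 560

        // TrueForAll: does the predicate hold for every element?
        bool allPositive = pages.TrueForAll(p => p > 0);     // true

        // ConvertAll: project every element to a new form.
        List<string> labels = pages.ConvertAll(p => $"{p} pages");

        Console.WriteLine($"{longBooks.Count} long books, first: {firstLong}, all positive: {allPositive}");
    }
}
```

These methods run eagerly inside a single loop over the list, which is why they tend to beat an equivalent deferred LINQ pipeline for simple one-shot filters.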

Performance and simplicity: can we have both?

As we have just seen, LINQ seems to force us to balance the performance of our code against its simplicity and clarity. Let's look at an example program that tests this theory with grouping. The LINQ query in the following code groups books by publisher and sorts the grouped results by publisher name.

var result = from book in books
             group book by book.Publisher.Name into publisherBooks
             orderby publisherBooks.Key
             select publisherBooks;

If LINQ is not used, the same function can be achieved with traditional methods:

var result = new SortedDictionary<string, List<Book>>();
foreach (var book in books)
{
    if (!result.TryGetValue(book.Publisher.Name, out var publisherBooks))
    {
        publisherBooks = new List<Book>();
        result[book.Publisher.Name] = publisherBooks;
    }
    publisherBooks.Add(book);
}

Results of 20 runs:

Method                Average time (ms)           Minimum time (ms)           Maximum time (ms)
LINQ                  61.85                       46                          124
Foreach               421.45                      391                         505

Test code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using LinqInAction.LinqBooks.Common;

static class Demo
{
    public static void Main()
    {
        var books = BooksForPerformance();

        Console.WriteLine("{0,-20}{1,-20}{2,-20}{3,-20}", "method", "Average time (MS)", "Minimum time (MS)", "Maximum time (MS)");

        var time = 20;
        var result = Test(Linq, books, time);
        Console.WriteLine($"{"LINQ",-22}{result.avg,-28}{result.min,-28}{result.max,-28}");

        result = Test(Foreach, books, time);
        Console.WriteLine($"{"Foreach",-22}{result.avg,-28}{result.min,-28}{result.max,-28}");

        Console.ReadKey();
    }

    private static List<Book> BooksForPerformance()
    {
        var rndBooks = new Random(123);
        var rndPublishers = new Random(123);
        var publisherCount = SampleData.Publishers.Count();

        var result = new List<Book>();
        for (int i = 0; i < 1000000; i++)
        {
            var publisher = SampleData.Publishers.Skip(rndPublishers.Next(publisherCount)).First();
            var pageCount = rndBooks.Next(1000);
            result.Add(new Book
            {
                Title = pageCount.ToString(),
                PageCount = pageCount,
                Publisher = publisher
            });
        }

        return result;
    }

    /// <summary>
    /// The first method: LINQ grouping.
    /// </summary>
    static void Linq(List<Book> books)
    {
        var result = (from book in books
                      group book by book.Publisher.Name into publisherBooks
                      orderby publisherBooks.Key
                      select publisherBooks).ToList();
    }

    /// <summary>
    /// The second method: traditional foreach grouping.
    /// </summary>
    static void Foreach(List<Book> books)
    {
        var result = new SortedDictionary<string, List<Book>>();
        foreach (var book in books)
        {
            if (!result.TryGetValue(book.Publisher.Name, out var publisherBooks))
            {
                publisherBooks = new List<Book>();
                result[book.Publisher.Name] = publisherBooks;
            }
            publisherBooks.Add(book);
        }
    }

    /// <summary>
    /// Runs <paramref name="action"/> against <paramref name="books"/>
    /// <paramref name="time"/> times and reports timing statistics.
    /// </summary>
    static (double avg, long max, long min) Test(Action<List<Book>> action, List<Book> books, int time)
    {
        List<long> times = new List<long>();
        Stopwatch stopwatch = new Stopwatch();

        for (int i = 0; i < time; i++)
        {
            stopwatch.Start();
            action(books);
            stopwatch.Stop();
            times.Add(stopwatch.ElapsedMilliseconds);
            stopwatch.Reset();
        }

        return (times.Average(), times.Max(), times.Min());
    }
}

There is no doubt that the traditional code is longer and more complex. Although it is not hard to understand, if the functional requirements grow further, you can imagine this code becoming longer and more complex still, while the LINQ version stays simple.

The main difference between the two pieces of code is that they follow two completely different approaches. The LINQ version is declarative, while the traditional version is implemented as a series of imperative commands. Before LINQ, C# code was imperative, because the language itself was. Imperative code spells out, step by step, exactly how an operation is carried out. LINQ's declarative style describes only the expected result and does not care how it is produced: rather than detailing the implementation steps, LINQ code reads like a direct definition of the result. This is the core difference between the two!

By now you should be convinced of the conveniences LINQ brings. So what does this new example prove? The answer: if you measure the execution efficiency of the two methods, you will see that the LINQ version is faster than the traditional code!

Of course, you may doubt this result, and we leave the verification to you. The point here is: to match LINQ's execution efficiency with traditional code, you would likely have to write even more complex code.

In terms of memory usage and insertion time, SortedDictionary is a relatively inefficient data structure, and on top of that we call TryGetValue in every iteration, whereas the LINQ operators handle this scenario more efficiently. The non-LINQ version certainly has room for performance improvement, but that would also add complexity.
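One plausible optimization of the hand-written version, sketched here with (publisher, title) string pairs instead of the article's Book type so the snippet is self-contained: group into a plain Dictionary, which has O(1) inserts, and sort the distinct publisher names only once at the end, instead of paying for tree rebalancing on every insert as SortedDictionary does. The sample data is invented for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class GroupDemo
{
    static void Main()
    {
        // Stand-in data: (publisher, title) pairs instead of full Book objects.
        var books = new[]
        {
            ("O'Reilly", "Book A"), ("Manning", "Book B"),
            ("O'Reilly", "Book C"), ("Apress", "Book D"),
        };

        // Group with a plain Dictionary: O(1) inserts, no rebalancing cost.
        var groups = new Dictionary<string, List<string>>();
        foreach (var (publisher, title) in books)
        {
            if (!groups.TryGetValue(publisher, out var list))
            {
                groups[publisher] = list = new List<string>();
            }
            list.Add(title);
        }

        // Sort the (few) distinct publisher names once, at the end.
        var names = groups.Keys.ToList();
        names.Sort(StringComparer.Ordinal);

        foreach (var name in names)
            Console.WriteLine($"{name}: {string.Join(", ", groups[name])}");
        // Apress: Book D
        // Manning: Book B
        // O'Reilly: Book A, Book C
    }
}
```

The trade-off is typical: the number of distinct keys is usually far smaller than the number of items, so one final sort over the keys is much cheaper than keeping a sorted tree balanced across a million inserts.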

Original link: https://www.vinanysoft.com/c-sharp/linq-performance-analysis/

Posted by TreColl on Sun, 31 May 2020 21:06:09 -0700