Why is LINQ better than SQL

Keywords: SQL Database xml Programming

If you haven't been addicted to LINQ, you'll wonder what a fuss it is. SQL is not broken. Why do we need to fix it? Why do we need another query language?

The popular saying is that LINQ is integrated with C_# (or VB), thus eliminating the gap between programming language and database, and providing a single query interface for the combination of multiple data sources. Although these are facts, they are only part of the story. More importantly, LINQ is more effective than SQL in most cases when querying a database.

Compared with SQL, LINQ is simpler, cleaner and more advanced. It's more like comparing C# with C++. Really, although sometimes using C++ is still the best choice (for example, using SQL scenarios), in most scenarios, using modern clean language without having to operate on the underlying details is a big victory.

SQL is a very old language - it was invented in 1974. Despite numerous extensions, they have never been redesigned. This makes it a bit confusing - unlike VB6 or Visual FoxPro. You may have slowly become accustomed to it so that you don't see any mistakes or omissions!

Let's look at an example. You want to write a simple query to get customer data, as follows:

SELECT UPPER(Name)
FROM Customer
WHERE Name LIKE 'A%'
ORDER BY Name

Now let's assume that we want to provide the data in the result set to a web page, and we want to get rows 21 to 30. So we need a subquery:

SELECT UPPER(Name) FROM
(
   SELECT *, RN = row_number()
   OVER (ORDER BY Name)
   FROM Customer
   WHERE Name LIKE 'A%'
) A
WHERE RN BETWEEN 21 AND 30
ORDER BY Name

And if you need older databases that support versions (prior to SQL Server 2005), things will get worse:

SELECT TOP 10 UPPER (c1.Name)
FROM Customer c1
WHERE
   c1.Name LIKE 'A%'
   AND c1.ID NOT IN
   (
      SELECT TOP 20 c2.ID
      FROM Customer c2
      WHERE c2.Name LIKE 'A%'
      ORDER BY c2.Name
   ) 
ORDER BY c1.Name

This is not only complicated and confusing, but also against the DRY principle. The following is to use LINQ to implement the same query function. Obviously, simplicity is better:

var query =
   from c in db.Customers
   where c.Name.StartsWith ("A")
   orderby c.Name
   select c.Name.ToUpper();

var thirdPage = query.Skip(20).Take(10);

Only when we enumerate thirdPage will the query actually execute. In a scenario from LINQ to SQL or Entity Framework, the translation engine converts the query (which we combined in two steps) into an SQL statement optimized for the database server to which it connects.

Composability

You may have noticed another more subtle (subtle but significant) benefit of LINQ. We chose two query steps in the combination:

IQueryable<T> Paginate<T> (this IQueryable<T> query, int skip, int take)
{
   return query.Skip(skip).Take(take);
}

We can do this:

var query = ...
var thirdPage = query.Paginate (20, 10);

More importantly, we can make arbitrary paging queries here. In other words, through LINQ you can break queries into parts and reuse them in your application.

union

Another advantage of LINQ is that you can query relationships without JOIN. For example, we want to list all customers who shop for $1000 or more and live in Washington. We will assume that purchasing is project-oriented (that is, the classic purchasing/project purchasing scenario) and that cash sales (without customer records) are included. This requires queries between four tables (Purchase, Customer, Address and PurchaseItem). With LINQ, such queries are effortless:

from p in db.Purchases
where p.Customer.Address.State == "WA" || p.Customer == null
where p.PurchaseItems.Sum (pi => pi.SaleAmount) > 1000
select p

Compare this with the same function of SQL:

SELECT p.*
FROM Purchase p
    LEFT OUTER JOIN 
        Customer c INNER JOIN Address a ON c.AddressID = a.ID
    ON p.CustomerID = c.ID	
WHERE
   (a.State = 'WA' || p.CustomerID IS NULL)
    AND p.ID in
    (
        SELECT PurchaseID FROM PurchaseItem
        GROUP BY PurchaseID HAVING SUM (SaleAmount) > 1000
    )

To expand this example further, suppose we want to reverse the result set by price and display the salesperson's name and the number of items purchased in the final projection. We can naturally express the query conditions of these attachments without repeating them:

from p in db.Purchases
where p.Customer.Address.State == "WA" || p.Customer == null
let purchaseValue = p.PurchaseItems.Sum (pi => pi.SaleAmount)
where purchaseValue > 1000
orderby purchaseValue descending
select new
{
   p.Description,
   p.Customer.SalesPerson.Name,
   PurchaseItemCount = p.PurchaseItems.Count()
}

Here's how to use SQL

SELECT 
    p.Description,
    s.Name,
    (SELECT COUNT(*) FROM PurchaseItem pi WHERE p.ID = pi.PurchaseID) PurchaseItemCount	
FROM Purchase p
    LEFT OUTER JOIN 
        Customer c 
            INNER JOIN Address a ON c.AddressID = a.ID
            LEFT OUTER JOIN SalesPerson s ON c.SalesPersonID = s.ID
    ON p.CustomerID = c.ID	
WHERE
    (a.State = 'WA' OR p.CustomerID IS NULL)
    AND p.ID in
    (
        SELECT PurchaseID FROM PurchaseItem
        GROUP BY PurchaseID HAVING SUM (SaleAmount) > 1000
    )
ORDER BY
    (SELECT SUM (SaleAmount) FROM PurchaseItem pi WHERE p.ID = pi.PurchaseID) DESC

The interesting thing is that the above SQL queries can be converted back to LINQ, and every query generated will be duplicated foolishly. Queries like this (usually non-working versions) are often posted in forums - the result of thinking in SQL rather than LINQ. It's like complaining about GOTO's clumsy grammar when converting Fortran programs into C# 6.

Data trimming

Selecting data from multiple tables in query Federation - the end result will be a flat tuple in units of behavior. If you've been using SQL for years, you probably don't think it's going to happen to you -- it causes data duplication, which makes the result set unusable on the client side. So when it happens, it's often hard to accept. In contrast, LINQ allows you to retrieve restored hierarchical data. This avoids duplication, makes the result set easy to process, and in most cases eliminates the need for joint operations. For example, suppose we want to extract a group of customers, and each record carries their high-value transactions. With LINQ, you can do this:

from c in db.Customers
where c.Address.State == "WA"
select new
{
   c.Name,
   c.CustomerNumber,
   HighValuePurchases = c.Purchases.Where (p => p.Price > 1000)
}

High Value Purchases, here is a collection. Since we are querying a related attribute, we do not need to join. So the details of whether this is an internal or an external Union are well abstracted. In this case, when translated into SQL, it may be an external union: LINQ does not exclude rows because a subset returns zero elements. If we want something that can be translated into an internal union, we can do this:

from c in db.Customers
where c.Address.State == "WA"
let HighValuePurchases = c.Purchases.Where (p => p.Price > 1000)where HighValuePurchases.Any()select new
{
   c.Name,
   c.CustomerNumber,
   HighValuePurchases
}

LINQ also supports out-of-plane union, self-union, group query and other different types of queries through a rich set of operators.

Parameterization

If we want to parameterize the previous example, does the "WA" state come from a variable? In fact, we just need to do the following:

string state = "WA";

var query =
   from c in db.Customers
   where c.Address.State == state
   ...

It does not confuse the parameters above the DbCommand object or worry about SQL injection attacks. LINQ parameterization is inline, type-safe and highly readable. It not only solves the problem - it also solves it very well.

Because LINQ queries can be combined, we can conditionally add predicates. For example, we write a method as follows:

IQueryable<Customer> GetCustomers (string state, decimal? minPurchase)
{
    var query = Customers.AsQueryable();
    
    if (state != null)
        query = query.Where (c => c.Address.State == state);
    
    if (minPurchase != null)
        query = query.Where (c => c.Purchases.Any (p => p.Price > minPurchase.Value));
    
    return query;
}

If we call this method with empty state and minPurchase values, the following SQL will be generated when we enumerate the result set:

SELECT [t0].[ID], [t0].[Name], [t0].[AddressID]
FROM [Customer] AS [t0]

However, if we specify the values of state and minPurchase, LINQ to SQL not only adds predicates to queries, but also necessary joint statements:

SELECT [t0].[ID], [t0].[Name], [t0].[AddressID]
FROM [Customer] AS [t0]
LEFT OUTER JOIN [Address] AS [t1] ON [t1].[ID] = [t0].[AddressID]
WHERE (EXISTS(
    SELECT NULL AS [EMPTY]
    FROM [Purchase] AS [t2]
    WHERE ([t2].[Price] > @p0) AND ([t2].[CustomerID] = [t0].[ID])
    )) AND ([t1].[State] = @p1)

Because our method returns an IQueryable, queries are not actually translated into SQL and executed before enumeration. This gives the call the opportunity to add further predicates, pagination, custom projections, and so on.

Static type security

In previous queries, if we declared the state variable as an integer rather than a string, then the query might report an error at compile time rather than at run time. This also applies to the mistake of table or column names. This has a real benefit in refactoring: if you don't complete the work at hand, the compiler will prompt you.

Client Processing

LINQ allows you to easily transfer parts of the query to the client for processing. For database servers with heavy load, this can actually improve performance. As long as you don't get more data than you need (in other words, you still have to filter on the server), you can often transfer the pressure of reordering, transforming, and reorganizing the result set to less-loaded application servers. With LINQ, all you need to do is move AsEnumerable() to the query, and everything since that point can be executed locally.

When to query the database without LINQ

Although LINQ is powerful, it cannot replace SQL. It can meet more than 95% of the requirements, but sometimes you still need SQL:

  • Queries that need to be manually adjusted (especially when optimization and locking hints are required);

  • Some involve queries that need to select temporary tables and then query those tables.

  • Predicted updates and batch insertion operations.

And when you use triggers, you still need SQL. Although such things are not often needed when using LINQ, SQL is essential when using stored procedures and functions. You can combine SQL with LINQ by writing table-valued functions in SQL, and then call these functions in more complex LINQ queries.

Understanding the two query languages is not a problem, because anyway you want to learn LINQ - LINQ is very useful for querying local collections and XML DOM. If you still use the old Xml Document-based DOM, LINQ to XML DOM operations can be a dramatic improvement.

Also, LINQ is easier to master than SQL, so if you want to write a good query, using LINQ will be better than SQL.

Posted by kimbeejo on Mon, 08 Apr 2019 00:06:32 -0700