LINQ learning note 4: working with files and directories using LINQ to Objects

Keywords: C# snapshot Windows

This note is excerpted from https://www.cnblogs.com/liqingwen/p/5816051.html and records my learning process for future reference.

Many file system operations are essentially queries, so LINQ is a good fit for them.

I. Query files with a specified attribute or name

This example shows how to find all files with a given extension (for example, .txt) in a specified directory tree, and how to return the newest or oldest file in the tree based on creation time.

    using System;
    using System.IO;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            #region LINQ Query for a file with the specified property or name
            //File path
            const string path = @"C:\Program Files (x86)\Microsoft Visual Studio\2017\";
            //Take a snapshot of the file system
            var dir = new DirectoryInfo(path);
            //This method assumes that the application has search permission for all folders under the specified path
            var files = dir.GetFiles("*.*", SearchOption.AllDirectories);

            //Create the query
            var query = from file in files
                        where file.Extension == ".txt"
                        orderby file.Name
                        select file;

            //Execute the query
            foreach (var file in query)
            {
                Console.WriteLine(file.FullName);
            }

            //Create and execute a second query that uses the first query as its data source.
            //The files are sorted by creation time in ascending order, so Last() returns the newest one.
            //Note: Last() throws InvalidOperationException if no .txt file was found.
            var newestFile = (from file in query
                              orderby file.CreationTime
                              select new { file.FullName, file.CreationTime }).Last();

            Console.WriteLine($"\r\nThe newest .txt file is {newestFile.FullName}. Creation time: {newestFile.CreationTime}");
            Console.Read();
            #endregion
        }
    }

The output is as follows: [screenshot omitted]
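
The same lookup can also be written in method syntax. The following is only a minimal sketch: the `FileQueries` class and its `NewestByExtension` helper are names made up for illustration, and the folder it searches is a throwaway tree created under the temp directory rather than the Visual Studio path used above. It uses `EnumerateFiles`, which streams results instead of building the whole array first, and `LastOrDefault`, which returns null instead of throwing when nothing matches.

```csharp
using System;
using System.IO;
using System.Linq;

public static class FileQueries
{
    // Hypothetical helper: returns the most recently created file with the
    // given extension under root, or null if there is no such file.
    public static FileInfo NewestByExtension(string root, string extension)
    {
        return new DirectoryInfo(root)
            .EnumerateFiles("*", SearchOption.AllDirectories)
            .Where(f => string.Equals(f.Extension, extension, StringComparison.OrdinalIgnoreCase))
            .OrderBy(f => f.CreationTime)
            .LastOrDefault();
    }

    public static void Main()
    {
        // Build a small throwaway tree so the sketch does not depend on a real installation path.
        var root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(Path.Combine(root, "sub"));
        File.WriteAllText(Path.Combine(root, "a.txt"), "a");
        File.WriteAllText(Path.Combine(root, "sub", "b.TXT"), "b");
        File.WriteAllText(Path.Combine(root, "c.log"), "c");

        var newest = NewestByExtension(root, ".txt");
        Console.WriteLine(newest == null ? "No match" : newest.FullName);

        Directory.Delete(root, true);
    }
}
```

Comparing the extension with OrdinalIgnoreCase is a design choice of this sketch; the query above matches ".txt" case-sensitively.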

II. Group files by extension

This example demonstrates how to use LINQ to perform advanced grouping and sorting operations on a list of files or folders. It also demonstrates how to page the output in the console window with the Skip<TSource> and Take<TSource> methods.

The following query shows how to group the contents of a specified directory tree by file extension.

    using System;
    using System.IO;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            #region LINQ Group files by extension
            const string path = @"C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\Common7\";
            //Length of "path", used later to strip this prefix from the output.
            var trimLength = path.Length;
            //Take a snapshot of the file system
            var dir = new DirectoryInfo(path);
            //This method assumes that the application has search permission for all folders under the specified path.
            var files = dir.GetFiles("*.*", SearchOption.AllDirectories);

            //Create query
            var query = from file in files
                        group file by file.Extension.ToLower() into fileGroup
                        orderby fileGroup.Key
                        select fileGroup;

            //Show one group at a time. If a group has more rows than fit in the console window, its output is paged.
            PageOutput(trimLength, query);
            #endregion
        }

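        /// <summary>
        /// Writes each group to the console, one page at a time.
        /// </summary>
        /// <param name="rootLength">Length of the root path prefix to strip from each file name.</param>
        /// <param name="query">Files grouped by extension.</param>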
        private static void PageOutput(int rootLength, IOrderedEnumerable<IGrouping<string, FileInfo>> query)
        {
            //Flag to jump out of paging loop
            var isAgain = true;
            //Number of lines per page
            var numLines = Console.WindowHeight - 3;

            //Iterate over the groups
            foreach (var g in query)
            {
                var currentLine = 0;

                do
                {
                    Console.Clear();
                    Console.WriteLine(string.IsNullOrEmpty(g.Key) ? "[None]" : g.Key);

                    //Show "numLines" entries, starting from "currentLine"
                    var resultPage = g.Skip(currentLine).Take(numLines);

                    //Execution query
                    foreach (var info in resultPage)
                    {
                        Console.WriteLine("\t{0}", info.FullName.Substring(rootLength));
                    }

                    //Track how many lines have been shown
                    currentLine += numLines;
                    Console.WriteLine("Press any key to continue, or press the End key to exit");

                    //Let the user choose whether to stop
                    var key = Console.ReadKey().Key;
                    if (key != ConsoleKey.End) continue;

                    isAgain = false;
                    break;
                } while (currentLine < g.Count());

                if (!isAgain)
                {
                    break;
                }
            }
        }
    }

The output is as follows: [screenshot omitted]
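
The paging idea can be isolated from the console and file system code. Here is a minimal sketch, assuming a hypothetical `Paging.Page` helper and a few made-up file names: Skip jumps over the items of earlier pages and Take limits the result to one page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Hypothetical helper: returns the page with the given zero-based index.
    public static IEnumerable<T> Page<T>(IEnumerable<T> source, int pageIndex, int pageSize)
    {
        return source.Skip(pageIndex * pageSize).Take(pageSize);
    }

    public static void Main()
    {
        // Made-up names standing in for the files of one group.
        var names = new[] { "a.cs", "b.cs", "c.cs", "d.cs", "e.cs" };
        const int pageSize = 2;
        var pageCount = (names.Length + pageSize - 1) / pageSize; // ceiling division

        for (var i = 0; i < pageCount; i++)
        {
            Console.WriteLine("Page {0}: {1}", i + 1, string.Join(", ", Page(names, i, pageSize)));
        }
        // Page 1: a.cs, b.cs
        // Page 2: c.cs, d.cs
        // Page 3: e.cs
    }
}
```

Asking for a page past the end simply yields an empty sequence, which is why the loop in PageOutput only needs to compare currentLine with the group's count.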

III. Query the total number of bytes in a set of folders

This example shows how to retrieve the total number of bytes used by all files in a specified folder and all its subfolders.

The Sum method adds up the values of all the items selected in the select clause. You can easily modify this query to retrieve the largest or smallest file in the specified directory tree by calling Min<TSource> or Max<TSource> instead of Sum.

    using System;
    using System.IO;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            #region LINQ Queries the total number of bytes in a set of folders
            const string path = @"C:\Program Files (x86)\Microsoft Visual Studio\2017\";
            var dir = new DirectoryInfo(path);
            var files = dir.GetFiles("*.*", SearchOption.AllDirectories);
            var query = from file in files
                        select file.Length;

            //Cache the results to avoid reading the file system more than once
            var fileLengths = query.ToArray();
            //Get the size of the largest file
            var largestLength = fileLengths.Max();
            //Get the total number of bytes in all files under the specified folder
            var totalBytes = fileLengths.Sum();
            Console.WriteLine();

            Console.WriteLine("There are {0} bytes in {1} files under {2}", totalBytes, files.Length, path);
            Console.WriteLine("The largest file is {0} bytes.", largestLength);
            Console.Read();
            #endregion
        }
    }

The output is as follows: [screenshot omitted]
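
One thing to watch when swapping Sum for Min or Max: on an empty sequence of long values Sum returns 0, but Min and Max throw InvalidOperationException. The following is a minimal sketch of one way to guard against that, assuming a hypothetical `SizeStats.Summarize` helper and made-up lengths in place of real files.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SizeStats
{
    // Hypothetical helper: returns (total, smallest, largest) for a list of file lengths.
    // DefaultIfEmpty(0L) keeps Min and Max from throwing when the folder is empty.
    public static Tuple<long, long, long> Summarize(IEnumerable<long> lengths)
    {
        // Enumerate once and reuse the cached array for all three aggregates.
        var cached = lengths.ToArray();
        return Tuple.Create(
            cached.Sum(),
            cached.DefaultIfEmpty(0L).Min(),
            cached.DefaultIfEmpty(0L).Max());
    }

    public static void Main()
    {
        // Made-up file lengths in bytes.
        var stats = Summarize(new long[] { 120, 4096, 17 });
        Console.WriteLine("Total: {0}, smallest: {1}, largest: {2}", stats.Item1, stats.Item2, stats.Item3);
        // Total: 4233, smallest: 17, largest: 4096
    }
}
```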

IV. Compare the contents of two folders

This example demonstrates three ways to compare two file lists:

1. Query for a Boolean value that indicates whether the two file lists are identical.

2. Query for the intersection, to retrieve the files that are in both folders.

3. Query for the set difference, to retrieve the files that are in one folder but not in the other.

    using System;
    using System.Collections.Generic;
    using System.IO;

    /// <summary>
    /// Compares files by name and length in bytes
    /// </summary>
    public class FileComparer : IEqualityComparer<FileInfo>
    {
        public bool Equals(FileInfo x, FileInfo y)
        {
            return string.Equals(x.Name, y.Name, StringComparison.CurrentCultureIgnoreCase) && x.Length == y.Length;
        }

        //Returns a hash value. Per the IEqualityComparer rules, objects that compare equal must produce the same hash value,
        //so the name is hashed case-insensitively to stay consistent with Equals.
        //Equality here is simple value equality, not reference identity, so two or more distinct objects may produce the same hash value.
        public int GetHashCode(FileInfo obj)
        {
            return StringComparer.CurrentCultureIgnoreCase.GetHashCode(obj.Name) ^ obj.Length.GetHashCode();
        }
    }

The output is as follows: [screenshot omitted]

The FileComparer class shown here demonstrates how to use a custom comparer class with the standard query operators. It is not intended for real-world use: it relies only on the name and length in bytes of each file to decide whether the contents of the folders are the same. In practice, the comparer should be modified to perform a stricter equality check.
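
The three comparisons listed at the start of this section map onto the standard operators SequenceEqual, Intersect, and Except, each of which accepts a custom comparer. The following is a minimal sketch under stated assumptions: the two folders are throwaway directories created under the temp path, and `NameAndLengthComparer`, `FolderCompare`, and `Snapshot` are names made up for illustration (the comparer follows the same idea as FileComparer). Because SequenceEqual is order-sensitive, the sketch sorts each list by name first.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical comparer: two files are equal when name and length match.
public class NameAndLengthComparer : IEqualityComparer<FileInfo>
{
    public bool Equals(FileInfo x, FileInfo y)
    {
        return string.Equals(x.Name, y.Name, StringComparison.OrdinalIgnoreCase) && x.Length == y.Length;
    }

    public int GetHashCode(FileInfo obj)
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(obj.Name) ^ obj.Length.GetHashCode();
    }
}

public static class FolderCompare
{
    // Hypothetical helper: all files under path, sorted by name so that
    // SequenceEqual does not depend on the order the file system returns.
    public static FileInfo[] Snapshot(string path)
    {
        return new DirectoryInfo(path)
            .GetFiles("*.*", SearchOption.AllDirectories)
            .OrderBy(f => f.Name, StringComparer.OrdinalIgnoreCase)
            .ToArray();
    }

    public static void Main()
    {
        // Throwaway folders; replace with real paths to compare actual directories.
        var root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        var dirA = Directory.CreateDirectory(Path.Combine(root, "A")).FullName;
        var dirB = Directory.CreateDirectory(Path.Combine(root, "B")).FullName;
        File.WriteAllText(Path.Combine(dirA, "common.txt"), "same");
        File.WriteAllText(Path.Combine(dirB, "common.txt"), "same");
        File.WriteAllText(Path.Combine(dirA, "onlyA.txt"), "a");

        var listA = Snapshot(dirA);
        var listB = Snapshot(dirB);
        var comparer = new NameAndLengthComparer();

        // 1. Are the two lists identical?
        Console.WriteLine("Identical: {0}", listA.SequenceEqual(listB, comparer));

        // 2. Files present in both folders.
        foreach (var file in listA.Intersect(listB, comparer))
            Console.WriteLine("In both: {0}", file.Name);

        // 3. Files present in A but not in B.
        foreach (var file in listA.Except(listB, comparer))
            Console.WriteLine("Only in A: {0}", file.Name);

        Directory.Delete(root, true);
    }
}
```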

V. Query the largest file in a directory tree

This example demonstrates several queries related to file size in bytes:

1. How to retrieve the size (in bytes) of the largest file.

2. How to retrieve the size (in bytes) of the smallest file.

3. How to retrieve the FileInfo object of the largest or smallest file from one or more folders under a specified root folder.

4. How to retrieve a sequence, such as the 10 largest files.

The following example contains separate queries that demonstrate how to query files based on their size in bytes. You can easily modify these examples to base the query on some other property of the FileInfo object.

    using System;
    using System.IO;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            #region LINQ Query the largest file in the directory tree
            const string path = @"C:\Program Files (x86)\Microsoft Visual Studio\2017\";
            var dir = new DirectoryInfo(path);
            var files = dir.GetFiles("*.*", SearchOption.AllDirectories);
            var query1 = from file in files
                         select file.Length;

            //Returns the size of the largest file
            var maxSize = query1.Max();
            Console.WriteLine("The length of the largest file under {0} is {1}", path, maxSize);
            Console.WriteLine();

            //Sort by length in descending order, skipping empty files
            var query2 = from file in files
                         let len = file.Length
                         where len > 0
                         orderby len descending
                         select file;

            var fileInfos = query2.ToArray();
            //The first element in descending order is the largest file
            var longestFile = fileInfos.First();
            //The last element in descending order is the smallest non-empty file
            var smallestFile = fileInfos.Last();

            Console.WriteLine("The largest file under {0} is {1} with a length of {2} bytes", 
                path, longestFile.FullName, longestFile.Length);
            Console.WriteLine();

            Console.WriteLine("The smallest file under {0} is {1} with a length of {2} bytes", 
                path, smallestFile.FullName, smallestFile.Length);
            Console.WriteLine();

            Console.WriteLine("===== The 10 largest files under {0} are: =====", path);

            //Take the 10 largest files
            var queryTenLargest = fileInfos.Take(10);
            foreach (var file in queryTenLargest)
            {
                Console.WriteLine("{0}: {1} bytes", file.FullName, file.Length);
            }
            Console.Read();
            #endregion
        }
    }

The output is as follows: [screenshot omitted]

To return one or more complete FileInfo objects, the query must first examine each object in the data source and then sort the objects by the value of their Length property, so that it can return the single object or the sequence with the greatest lengths. Use First<TSource> to return the first element in the list, and Take<TSource> to return the first n elements.
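
Sorting the whole list only to read its first element does more work than strictly needed. As an alternative that is not part of the original article, a single pass with Aggregate can pick the largest item. This is a minimal sketch: `Largest.By` is a hypothetical helper, and the (name, length) pairs are made-up data standing in for FileInfo objects. Like First, it throws InvalidOperationException on an empty sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Largest
{
    // Hypothetical helper: single-pass alternative to OrderByDescending(...).First().
    public static T By<T>(IEnumerable<T> source, Func<T, long> size)
    {
        return source.Aggregate((best, next) => size(next) > size(best) ? next : best);
    }

    public static void Main()
    {
        // Made-up (name, length) pairs standing in for FileInfo objects.
        var files = new[]
        {
            new KeyValuePair<string, long>("small.txt", 10),
            new KeyValuePair<string, long>("big.iso", 5000),
            new KeyValuePair<string, long>("medium.dll", 300)
        };

        var largest = By(files, x => x.Value);
        Console.WriteLine("{0}: {1} bytes", largest.Key, largest.Value);
        // big.iso: 5000 bytes

        // A descending sort followed by Take(n) is still the simplest way to get the n largest.
        foreach (var file in files.OrderByDescending(x => x.Value).Take(2))
        {
            Console.WriteLine("{0}: {1} bytes", file.Key, file.Value);
        }
        // big.iso: 5000 bytes
        // medium.dll: 300 bytes
    }
}
```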

VI. Query duplicate files in a directory tree

Sometimes, files with the same name may exist in multiple folders. For example, in the Visual Studio installation folder, there are multiple folders that contain the readme.htm file.

This example shows how to query for duplicate file names in the specified root folder.

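    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Threading;

    class Program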
    {
        static void Main(string[] args)
        {
            #region LINQ Example 1 of querying duplicate files in the directory tree
            const string path = @"C:\Program Files (x86)\Microsoft Visual Studio\2017\";
            var dir = new DirectoryInfo(path);
            var files = dir.GetFiles("*.*", SearchOption.AllDirectories);
            var charsToSkip = path.Length;

            var queryDupNames = (from file in files
                                 group file.FullName.Substring(charsToSkip) by file.Name into fileGroup
                                 where fileGroup.Count() > 1
                                 select fileGroup).Distinct();

            PageOutput(queryDupNames);
            #endregion
        }

        /// <summary>
        /// Writes each group to the console, one page at a time.
        /// </summary>
        /// <typeparam name="TK">Type of the group key.</typeparam>
        /// <typeparam name="TV">Type of the group elements.</typeparam>
        /// <param name="queryDupNames">Groups of duplicate files.</param>
        private static void PageOutput<TK, TV>(IEnumerable<IGrouping<TK, TV>> queryDupNames)
        {
            //Number of lines per page
            var numLines = Console.WindowHeight - 3;

            var dupNames = queryDupNames as IGrouping<TK, TV>[] ?? queryDupNames.ToArray();
            foreach (var queryDupName in dupNames)
            {
                //Start of paging for this group
                var currentLine = 0;

                do
                {
                    Console.Clear();
                    Console.WriteLine("Filename = {0}", queryDupName.Key.ToString() == string.Empty ? "[none]" : queryDupName.Key.ToString());

                    //Skip currentLine rows and take the next numLines rows.
                    var resultPage = queryDupName.Skip(currentLine).Take(numLines);

                    foreach (var fileName in resultPage)
                    {
                        Console.WriteLine("\t{0}", fileName);
                    }

                    //Track how many lines have been shown
                    currentLine += numLines;

                    //Pressing a key for every page gets tedious, so move on to the next page automatically.
                    Thread.Sleep(100);

                } while (currentLine < queryDupName.Count());
            }
        }
    }

The output is as follows: [screenshot omitted]

The next example shows how to query for files whose name, size, and creation time all match.

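    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Threading;

    /// <summary>
    /// Composite grouping key. Equals and GetHashCode are overridden so that two keys
    /// with the same values fall into the same group; without the overrides, keys are
    /// compared by reference and no duplicates are ever found.
    /// </summary>
    public class PortableKey
    {
        public string Name { get; set; }
        public DateTime CreationTime { get; set; }
        public long Length { get; set; }

        public override bool Equals(object obj)
        {
            var other = obj as PortableKey;
            return other != null && other.Name == Name && other.CreationTime == CreationTime && other.Length == Length;
        }

        public override int GetHashCode()
        {
            return string.Format("{0}{1}{2}", Name, CreationTime.Ticks, Length).GetHashCode();
        }

        //Used by PageOutput when it prints the group key
        public override string ToString()
        {
            return string.Format("{0} {1} {2}", Name, Length, CreationTime);
        }
    }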

    class Program
    {
        static void Main(string[] args)
        {
            #region LINQ Example 2 of querying duplicate files in the directory tree
            const string path = @"C:\Program Files (x86)\Microsoft Visual Studio\2017\";
            var dir = new DirectoryInfo(path);
            var files = dir.GetFiles("*.*", SearchOption.AllDirectories);
            var charsToSkip = path.Length;

            //Note the use of a composite key. Files whose three properties all match belong to the same group.
            //An anonymous type can also be used as a composite key, but it cannot be passed across method boundaries.
            var queryDupFiles = from file in files
                                group file.FullName.Substring(charsToSkip) by
                                    new PortableKey() { Name = file.Name, CreationTime = file.CreationTime, Length = file.Length } into fileGroup
                                where fileGroup.Count() > 1
                                select fileGroup;

            //Cache the results so the query runs only once
            var queryDupNames = queryDupFiles.ToArray();

            //Paging output
            PageOutput(queryDupNames);
            Console.Read();
            #endregion
        }

        /// <summary>
        /// Writes each group to the console, one page at a time.
        /// </summary>
        /// <typeparam name="TK">Type of the group key.</typeparam>
        /// <typeparam name="TV">Type of the group elements.</typeparam>
        /// <param name="queryDupNames">Groups of duplicate files.</param>
        private static void PageOutput<TK, TV>(IEnumerable<IGrouping<TK, TV>> queryDupNames)
        {
            //Number of lines per page
            var numLines = Console.WindowHeight - 3;

            var dupNames = queryDupNames as IGrouping<TK, TV>[] ?? queryDupNames.ToArray();
            foreach (var queryDupName in dupNames)
            {
                //Start of paging for this group
                var currentLine = 0;

                do
                {
                    Console.Clear();
                    Console.WriteLine("Filename = {0}", queryDupName.Key.ToString() == string.Empty ? "[none]" : queryDupName.Key.ToString());

                    //Skip currentLine rows and take the next numLines rows.
                    var resultPage = queryDupName.Skip(currentLine).Take(numLines);

                    foreach (var fileName in resultPage)
                    {
                        Console.WriteLine("\t{0}", fileName);
                    }

                    //Track how many lines have been shown
                    currentLine += numLines;

                    //Pressing a key for every page gets tedious, so move on to the next page automatically.
                    Thread.Sleep(100);

                } while (currentLine < queryDupName.Count());
            }
        }
    }
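
As the code comment notes, an anonymous type can serve as the composite key as long as the grouping stays inside one method. The following is a minimal sketch of that variant: `DuplicateFinder.FindDuplicates` is a hypothetical helper, and the tuples are made-up records standing in for FileInfo objects. The compiler generates value-based Equals and GetHashCode for anonymous types, so no comparer code is needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DuplicateFinder
{
    // Hypothetical helper. Each tuple stands in for a FileInfo:
    // Item1 = relative path, Item2 = file name, Item3 = length in bytes.
    public static List<string[]> FindDuplicates(IEnumerable<Tuple<string, string, long>> files)
    {
        // The anonymous type is the composite key; it never leaves this method,
        // only the grouped paths are returned.
        var query = from f in files
                    group f.Item1 by new { Name = f.Item2, Length = f.Item3 } into g
                    where g.Count() > 1
                    select g.ToArray();
        return query.ToList();
    }

    public static void Main()
    {
        // Made-up records: two readme.htm files share name and length, one differs in length.
        var files = new[]
        {
            Tuple.Create(@"a\readme.htm", "readme.htm", 100L),
            Tuple.Create(@"b\readme.htm", "readme.htm", 100L),
            Tuple.Create(@"c\readme.htm", "readme.htm", 250L),
            Tuple.Create(@"a\setup.exe", "setup.exe", 900L)
        };

        foreach (var paths in FindDuplicates(files))
        {
            Console.WriteLine(string.Join(", ", paths));
        }
        // a\readme.htm, b\readme.htm
    }
}
```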

VII. Query the contents of files in folders

This example shows how to query all the files in a specified directory tree, open each file, and inspect its contents. This technique can be used to build an index or inverted index of the contents of a directory tree. The example performs a simple string search, but more complex pattern matching can be done with regular expressions.

    using System;
    using System.IO;
    using System.Linq;

    class Program
    {
        static void Main(string[] args)
        {
            #region LINQ Querying the contents of a file in a folder
            const string path = @"C:\Program Files (x86)\Microsoft Visual Studio\2017\";
            var dir = new DirectoryInfo(path);
            var files = dir.GetFiles("*.*", SearchOption.AllDirectories);

            //String to search for
            const string searchTerm = @"Visual Studio";
            //Search the contents of each file.
            //A regular expression could be used in place of the Contains method.
            var queryMatchingFiles = from file in files
                                     where file.Extension == ".html"
                                     let content = GetFileContent(file.FullName)
                                     where content.Contains(searchTerm)
                                     select file.FullName;

            //Execute the query
            Console.WriteLine("The term \"{0}\" was found in:", searchTerm);
            foreach (var filename in queryMatchingFiles)
            {
                Console.WriteLine(filename);
            }
            Console.Read();
            #endregion
        }

        /// <summary>
        /// Reads the entire contents of a file
        /// </summary>
        /// <param name="fileName">Full path of the file to read.</param>
        /// <returns>The file contents, or an empty string if the file no longer exists.</returns>
        static string GetFileContent(string fileName)
        {
            //If the file was deleted after the snapshot was taken, ignore it and return an empty string.
            return File.Exists(fileName) ? File.ReadAllText(fileName) : "";
        }
    }

The output is as follows: [screenshot omitted]
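
For the regular expression variant mentioned above, the Contains call can be swapped for Regex.IsMatch. The following is a minimal sketch, assuming a hypothetical `ContentSearch.FindMatches` helper and two small HTML files written to a throwaway temp folder; the pattern shown (case-insensitive, any amount of whitespace between the two words) is only an example.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class ContentSearch
{
    // Hypothetical helper: full paths of the files under root that match
    // searchPattern and whose text matches the regular expression.
    public static string[] FindMatches(string root, string searchPattern, Regex pattern)
    {
        return (from file in new DirectoryInfo(root).EnumerateFiles(searchPattern, SearchOption.AllDirectories)
                let content = File.ReadAllText(file.FullName)
                where pattern.IsMatch(content)
                orderby file.FullName
                select file.FullName).ToArray();
    }

    public static void Main()
    {
        // Throwaway folder with two sample files.
        var root = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(root);
        File.WriteAllText(Path.Combine(root, "one.html"), "<p>Visual  Studio 2017</p>");
        File.WriteAllText(Path.Combine(root, "two.html"), "<p>Nothing here</p>");

        // Matches "Visual Studio" with any amount of whitespace, ignoring case.
        var pattern = new Regex(@"visual\s+studio", RegexOptions.IgnoreCase);
        foreach (var match in FindMatches(root, "*.html", pattern))
        {
            Console.WriteLine(match);
        }

        Directory.Delete(root, true);
    }
}
```

Only one.html is reported here, since a plain Contains("Visual Studio") would miss the double space while the pattern tolerates it.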

Posted by Chotor on Fri, 27 Dec 2019 05:52:12 -0800