File directory operation instance 1

Keywords: Python

File directory operation example 1 - traversal

This example is useful in practice for building file and directory management programs, managing disk usage, and batch copying, moving, or compressing files and directories.
The basic implementation uses recursion and os.walk; the extended implementation uses os.stat to obtain the file status.

Basic implementation: traverse a directory and record which directory every file and subdirectory belongs to.

Example:
Operation object: folder a contains folder b and file c, and folder b contains folder d and file e.
Operation result: {'files': [], 'a': {'files': ['c'], 'b': {'files': ['e'], 'd': {'files': []}}}}.
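To follow along, the example layout can be recreated in a temporary directory. This is only a sketch for experimenting; make_example_tree is a hypothetical helper name that does not appear in the original code.

```python
import os
import tempfile


def make_example_tree(root: str) -> str:
    """Create a/{b/{d/, e}, c} under root and return the path of folder a."""
    a = os.path.join(root, 'a')
    # makedirs creates a, a/b and a/b/d in one call
    os.makedirs(os.path.join(a, 'b', 'd'))
    # Empty placeholder files c and e
    for file_path in (os.path.join(a, 'c'), os.path.join(a, 'b', 'e')):
        with open(file_path, 'w'):
            pass
    return a


with tempfile.TemporaryDirectory() as tmp:
    a_path = make_example_tree(tmp)
    print(sorted(os.listdir(a_path)))  # ['b', 'c']
```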

  • Idea 1: get all files and all directories, split their paths, and add directories and files by walking down each split path. (four-step operation)
    1. Get all subdirectory paths
    2. Get all file paths
    3. Create all subdirectories based on the subdirectory paths
    4. Put every file into its subdirectory according to the file path
  • Idea 2: recursively traverse the directory and check whether each entry is a directory or a file. If it is a directory, recurse into it. (recursive operation)

Extended implementation: obtain attributes such as the size of files and directories.

Example:
Operation object: folder a contains folder b and file c, and folder b contains folder d and file e.
Operation result: {'inode protection mode': "[owner] has the following permissions: ['read', 'write', 'execute']; [user group] has the following permissions: ['read', 'write', 'execute']; [other users] has the following permissions: ['read', 'write', 'execute'];", 'inode node number': 1111111111111111, 'inode resident device': 1111111111, 'inode number of links': 1, 'owner user ID': 0, 'owner group ID': 0, 'allocation unit size': '0.000 b', 'last access time': '2021-11-11 11:11:11', 'last modified time': '2021-11-11 11:11:11', 'creation time': '2021-11-11 11:11:11', 'number of folders': 3, 'number of files': 2, 'total directory size': '0.000 b'}.

  • Idea: reuse the basic implementation to collect the size and number of files recursively and store them as attributes of the directory. First write the function that describes a single file, then the function that describes a directory.

Basic implementation - idea 1

Basic preparation

Import related libraries first:

import os  # One of the system libraries used here to handle file, directory, and path operations
import json  # json library, used here to format the print dictionary
import time  # Time library, used here to process time objects
  • os.walk(top, topdown=True, onerror=None, followlinks=False)
    • Traverse the specified directory, including all subdirectories. The required parameter is top (directory path to traverse).
    • topdown=True traverses from the root directory, and topdown=False traverses from the deepest subdirectory to the root directory.
    • onerror: an optional callable. When listing a directory raises an OSError, it is called with that exception; by default such errors are ignored.
    • followlinks=True also descends into directories reached through symbolic links (soft links on Linux), and followlinks=False (the default) does not descend into them.
    • Return structure: (current subdirectory path, list of folders contained in the current directory, list of files contained in the current directory).
    • Supplementary note: when the specified directory is an absolute path, the returned current directory is also an absolute path; When the specified directory is a relative path, the current directory returned is also a relative path.
    • Relative directory traversal, using the example above:
    • >>> import os
      >>> for a in os.walk('test'):
      ...     print(a)
      ('test', ['a'], [])
      ('test\\a', ['b'], ['c'])
      ('test\\a\\b', ['d'], ['e'])
      ('test\\a\\b\\d', [], [])
      
    • Example absolute directory traversal:
    • >>> import os
      >>> path = os.path.abspath('test')  # E:\Python\DEMO\Study\test
      >>> for a in os.walk(path):
      ...     print(a)
      ('E:\\Python\\DEMO\\Study\\test', ['a'], [])
      ('E:\\Python\\DEMO\\Study\\test\\a', ['b'], ['c'])
      ('E:\\Python\\DEMO\\Study\\test\\a\\b', ['d'], ['e'])
      ('E:\\Python\\DEMO\\Study\\test\\a\\b\\d', [], [])
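
The topdown and onerror parameters described above can be tried out with a small sketch. It builds a throwaway a/b/d chain in a temporary directory; remember_error is a hypothetical callback name chosen for illustration.

```python
import os
import tempfile

errors = []  # OSError instances reported by os.walk


def remember_error(error: OSError) -> None:
    # os.walk calls onerror with the OSError raised while listing a directory
    errors.append(error)


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'a', 'b', 'd'))
    # topdown=False yields the deepest directories first and the root last
    order = [os.path.relpath(current, root)
             for current, _dirs, _files in os.walk(root, topdown=False, onerror=remember_error)]
    print(order)

# The temporary directory is gone now, so this path does not exist.
# Walking it yields nothing; the error is passed to onerror instead.
missing = list(os.walk(os.path.join(root, 'missing'), onerror=remember_error))
print(missing, len(errors))  # [] 1
```

Without onerror, the second walk would simply produce an empty result and the error would go unnoticed.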
      

Actual operation

First, define the containers that store the data; otherwise each result would be lost as soon as it is obtained.

path = os.path.abspath('practise')  # Full path of the folder to traverse; E:\Python\DEMO\Study\practise on this computer
dirs = []  # Stores all directory paths
files = []  # Stores all file paths
paths = {'files': []}  # Directory tree dictionary

The first and second steps are relatively simple, so they are given directly:

# Get all subdirectory paths
for a in os.walk(path):
    dirs.append(a[0])
# Get all file paths
for a in os.walk(path):
    if a[2]:
        for f in a[2]:
            files.append(f'{a[0]}\\{f}')
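
The two loops above can also be merged into a single os.walk pass. The sketch below is a variant, not the original code: collect_paths is a hypothetical name, and it joins paths with os.path.join instead of a hard-coded backslash, so it also runs outside Windows.

```python
import os
import tempfile


def collect_paths(root: str):
    """Return (dirs, files) with full paths, gathered in one os.walk pass."""
    dirs, files = [], []
    for current, _subdirs, filenames in os.walk(root):
        dirs.append(current)
        # os.path.join uses the separator of the running platform
        files.extend(os.path.join(current, name) for name in filenames)
    return dirs, files


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'a', 'b'))
    open(os.path.join(root, 'a', 'c'), 'w').close()
    dirs, files = collect_paths(root)
    print(len(dirs), len(files))  # 3 1
```

Note that the later steps in this article split paths on '\\', so they would need os.sep instead if this variant were used on another platform.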

Next, create all subdirectories according to the subdirectory paths. Files can only be placed reliably once every directory exists.
Every collected path starts with the root path (absolute or relative), so strip that prefix first; the remainder can then be split into dictionary keys.

for d in dirs[1:]:
    temp = d.replace(f'{path}\\', '').split('\\')

Because os.walk() defaults to topdown=True and starts from the root directory, the parent of each subdirectory has already been created by an earlier iteration. For each level of the path, walk down the dictionary to reach the parent of that level.

    for p in range(len(temp)):
        temp2 = paths
        for t in temp[:p]:
            temp2 = temp2[t]

If the subdirectory is not in the dictionary, add a new subdirectory dictionary.

        if temp[p] not in temp2:
            temp2.update({temp[p]: {'files': []}})
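
Step 3 can be condensed with dict.setdefault, which returns the existing sub-dictionary or creates it when missing. This is a sketch under assumptions: add_directories is a hypothetical name, and os.path.relpath with os.sep replaces the string-based prefix removal so that the code is not tied to backslashes.

```python
import os


def add_directories(tree: dict, root: str, dirs: list) -> dict:
    """Insert every directory in dirs (full paths under root) into tree."""
    for directory in dirs:
        relative = os.path.relpath(directory, root)
        if relative == os.curdir:
            continue  # the root itself is already represented by tree
        node = tree
        for part in relative.split(os.sep):
            # setdefault creates the sub-dictionary only when it is missing
            node = node.setdefault(part, {'files': []})
    return tree


root = os.path.join('demo', 'practise')
dirs = [root, os.path.join(root, 'a'), os.path.join(root, 'a', 'b')]
print(add_directories({'files': []}, root, dirs))
# {'files': [], 'a': {'files': [], 'b': {'files': []}}}
```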

Subdirectories can also be added with the eval function: eval(f'paths{temp[:p] or ""}'.replace(', ', '][')) turns the list of path parts into a chain of dictionary lookups such as paths['a']['b']. The version above uses dict.update() rather than temp2[temp[p]] = {'files': []} only because the IDE highlights that assignment with a yellow warning. Note that the eval version breaks if a directory name contains ', ', so treat it as a shortcut for trusted paths only:

        if temp[p] not in eval(f'paths{temp[:p] or ""}'.replace(', ', '][')):  # If the subdirectory is not in the dictionary, add
            eval(f'paths{temp[:p] or ""}'.replace(', ', ']['))[temp[p]] = {'files': []}
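
The same chained lookup can be done without eval by folding the path parts with functools.reduce. This is an alternative sketch, not the author's method; descend is a hypothetical helper name.

```python
from functools import reduce


def descend(tree: dict, parts: list) -> dict:
    """Follow parts key by key, like tree[parts[0]][parts[1]]..."""
    return reduce(lambda node, key: node[key], parts, tree)


paths = {'files': [], 'a': {'files': [], 'b': {'files': []}}}
temp = ['a', 'b', 'd']
parent = descend(paths, temp[:-1])
if temp[-1] not in parent:
    parent[temp[-1]] = {'files': []}
print(paths)
# {'files': [], 'a': {'files': [], 'b': {'files': [], 'd': {'files': []}}}}
```

Because the keys are used as plain strings, names containing commas or quotes need no special handling.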

The next step is to put every file into its directory according to the file path.
As before, remove the root path prefix first.

for f in files:
    temp = f.replace(f'{path}\\', '').split('\\')

If the list contains only one element after splitting, the file is in the root directory, so it is appended directly to the root's files list.

    if len(temp) == 1:
        paths['files'].append(temp[0])

Otherwise, walk down to the deepest dictionary as in the previous step and append the file name there.

    else:
        temp2 = paths
        for t in temp[:-1]:
            temp2 = temp2[t]
        temp2['files'].append(temp[-1])
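
Step 4 can be written as one small function in which files in the root need no special case, because the list of parent parts is simply empty for them. A sketch under assumptions: add_files is a hypothetical name, and the tree is expected to contain every directory already (step 3).

```python
import os


def add_files(tree: dict, root: str, files: list) -> dict:
    """Append each file name to the 'files' list of its parent directory."""
    for file_path in files:
        relative = os.path.relpath(file_path, root)
        *parents, name = relative.split(os.sep)
        node = tree
        for part in parents:  # empty for files directly under root
            node = node[part]
        node['files'].append(name)
    return tree


root = 'practise'
tree = {'files': [], 'a': {'files': [], 'b': {'files': []}}}
files = [os.path.join(root, 'a', 'c'), os.path.join(root, 'a', 'b', 'e')]
print(add_files(tree, root, files))
# {'files': [], 'a': {'files': ['c'], 'b': {'files': ['e']}}}
```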

Similarly, adding files can also be done quickly using the eval function:

    else:
        eval(f'paths{temp[:-1] or ""}'.replace(', ', ']['))['files'].append(temp[-1])

That is idea 1. Each step has a clear job, and the whole tree is built from the (path, directories, files) tuples that os.walk() returns.
Format and print the entire directory tree:

print(json.dumps(paths, indent=4, ensure_ascii=False))

Basic implementation - idea 2

Basic preparation

  • os.listdir(path)
    • List the specified directory to obtain the names of all files and directories directly inside it (subdirectories are not entered).
    • Return structure: a list of file and directory names. Whether an entry is a file or a folder can be checked with the methods below.
    • Supplementary note: whether or not the argument is an absolute path, only the names of the files and folders are returned, not their paths.
    • Listing a directory from the example:
    • >>> import os
      >>> path = os.path.abspath('test/a')
      >>> for a in os.listdir(path):
      ...     print(a)
      b
      c
      
  • os.path.exists()
    • Judge whether the target exists. If it does not exist, return False; otherwise, return True.
  • os.path.isfile()
    • Judge whether the target is a file. If the file does not exist or is not a file, return False; otherwise, return True.
  • os.path.isdir()
    • Judge whether the target is a folder. If the folder does not exist or is not a folder, return False, otherwise return True.
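
The three checks above can be combined into one small classifier. This is only an illustration; classify is a hypothetical name, and the temporary folder stands in for a real directory.

```python
import os
import tempfile


def classify(target: str) -> str:
    """Return 'missing', 'file', 'directory' or 'other' for a path."""
    if not os.path.exists(target):
        return 'missing'
    if os.path.isfile(target):
        return 'file'
    if os.path.isdir(target):
        return 'directory'
    return 'other'  # for example a device or socket on Unix-like systems


with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, 'c')
    open(file_path, 'w').close()
    for target in (folder, file_path, os.path.join(folder, 'missing')):
        print(classify(target))
# directory
# file
# missing
```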

Actual operation

As before, first define the dictionary that stores the data. Because the tree is built recursively, the only other variable needed is the root path, so no intermediate lists are kept in memory.

path = os.path.abspath('practise')  # Full path of the folder to traverse; E:\Python\DEMO\Study\practise on this computer
paths = {'files': []}  # Directory tree dictionary

Next, decide what the recursive function receives. zp is short for "sub paths" and is the dictionary of the current subdirectory; lj is the pinyin abbreviation of "path" and is the path of that subdirectory. Without the recursion, the function lists one directory, puts its files into the dictionary, and creates an empty dictionary for each subdirectory. The default value of lj is path, which means starting from the root directory.

def loop(zp: dict, lj: str = path):
    for i in os.listdir(lj):
        if os.path.isfile(f'{lj}\\{i}'):
            zp['files'].append(i)
        elif os.path.isdir(f'{lj}\\{i}'):
            zp[i] = {'files': []}

With adding files for a single path settled, add the line loop(zp[i], f'{lj}\\{i}') at the end of the elif branch that handles folders, passing in the subdirectory dictionary and the subdirectory path so that the function recurses into the subdirectory. The recursive function is as follows:

def loop(zp: dict, lj: str = path):
    for i in os.listdir(lj):
        if os.path.isfile(f'{lj}\\{i}'):
            zp['files'].append(i)
        elif os.path.isdir(f'{lj}\\{i}'):
            zp[i] = {'files': []}
            loop(zp[i], f'{lj}\\{i}')
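
A variant of the same recursion can be written with os.scandir, which yields entries that already know whether they are files or directories. This is a sketch rather than the original function: build_tree is a hypothetical name, it returns a new dictionary instead of filling one passed in, and it deliberately skips symbolic links to directories.

```python
import os
import tempfile


def build_tree(directory: str) -> dict:
    """Return {'files': [...], '<subdir>': {...}} for directory, recursively."""
    tree = {'files': []}
    with os.scandir(directory) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if entry.is_dir(follow_symlinks=False):
                # Links to directories are not followed, which avoids endless loops
                tree[entry.name] = build_tree(entry.path)
            elif entry.is_file():
                tree['files'].append(entry.name)
    return tree


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'a', 'b', 'd'))
    for parts in (('a', 'c'), ('a', 'b', 'e')):
        open(os.path.join(root, *parts), 'w').close()
    print(build_tree(root))
# {'files': [], 'a': {'files': ['c'], 'b': {'files': ['e'], 'd': {'files': []}}}}
```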

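For brevity, the loop function can also be squeezed into my favorite form, a lambda. A conditional expression is used rather than and/or chaining, because loop() returns an empty (falsy) list for an empty directory, and with and/or chaining the name of that directory would also be appended to files. dict.setdefault() adds the subdirectory, so exec() is not needed:

loop = lambda _, __=path: [loop(_.setdefault(___, {'files': []}), f'{__}\\{___}') if os.path.isdir(f'{__}\\{___}') else _['files'].append(___) for ___ in os.listdir(__)]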

Next, call the function to fill in the paths variable. The current path defaults to the path variable, so you don't need to add it.

loop(paths)

This is the second idea. A recursive function is used to get files and add them to subdirectories.
Format and print the entire directory tree:

print(json.dumps(paths, indent=4, ensure_ascii=False))

Extended implementation - idea

Basic preparation

  • os.stat(path, *, dir_fd=None, follow_symlinks=True)
    • Essentially calls the operating system's stat() and returns information about the file or folder itself.
    • follow_symlinks=True (the default) returns the information of the target that a symbolic link points to; follow_symlinks=False returns the information of the link itself.
    • Return structure: an os.stat_result object with the following attributes:
      • stat.st_mode: inode protection mode
      • stat.st_ino: inode node number
      • stat.st_dev: device where inode resides
      • stat.st_nlink: number of links to inode
      • stat.st_uid: user ID of the owner
      • stat.st_gid: group ID of the owner
      • stat.st_size: file size in bytes (for a directory, the size of the directory entry itself, not of the subdirectories and files it contains)
      • stat.st_atime: time of last access
      • stat.st_mtime: time of last modification
      • stat.st_ctime: creation time on Windows; on Unix-like systems, the time of the last metadata change
  • os.path.getsize(filename)
    • Essentially returns os.stat(filename).st_size, the file size in bytes (for a directory, the size without its subdirectories and files).
  • time.strftime(format[, t])
    • Formats a time.struct_time (such as the one returned by time.localtime()) as a string according to format.
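
The relation between os.stat and os.path.getsize described above can be checked on a throwaway file. A minimal sketch; the temporary folder and the file name c are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, 'c')
    with open(file_path, 'wb') as handle:
        handle.write(b'hello')  # exactly 5 bytes

    info = os.stat(file_path)
    same_size = info.st_size == os.path.getsize(file_path)
    print(info.st_size, same_size)  # 5 True
    # os.lstat(path) is equivalent to os.stat(path, follow_symlinks=False)
    link_info = os.lstat(file_path)
    print(link_info.st_ino == info.st_ino)  # True, because c is not a link
```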

Actual operation

Obtaining the raw information is easy. Now let's convert it into something more readable.
First, convert the inode protection mode (permissions) into something people can understand. stat.st_mode is an integer: its lowest 9 bits hold the read, write, and execute flags for the owner, the group, and other users, and the higher bits hold the file type.

Therefore, use the following function to extract the permission part:

    def permission_detail(permission: int):
        # Keep only the lowest 9 bits (rwx for owner, group, others) as a 9-character bit string
        bits = format(permission & 0o777, '09b')
        result = ''
        for index, name in enumerate(['owner', 'user group', 'other users']):
            per = bits[index * 3: index * 3 + 3]  # Three flags for this class of user
            per_ = [n_ for n_, p__ in zip(['read', 'write', 'execute'], per) if p__ == '1']
            result += f'[{name}] has the following permissions: {per_}; '
        return result
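
The standard library's stat module offers ready-made helpers for the same job. The sketch below is an alternative to the hand-written bit slicing, not the original approach; describe_mode is a hypothetical name.

```python
import stat


def describe_mode(mode: int) -> str:
    """Return e.g. '-rw-r--r-- (0o644)' for an st_mode value."""
    # stat.filemode renders the type and permission bits like `ls -l`;
    # stat.S_IMODE keeps only the permission bits
    return f'{stat.filemode(mode)} ({oct(stat.S_IMODE(mode))})'


print(describe_mode(0o100644))  # -rw-r--r-- (0o644)
print(describe_mode(0o040755))  # drwxr-xr-x (0o755)
```

In get_info it would be called as describe_mode(stat_result.st_mode); note that get_info names its local variable stat, which would shadow the module inside that function.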

The file size is returned in bytes, which is not very readable. The function below formats it either in the unit we ask for or, with 'auto', in the largest unit that fits. size_unit is a local variable of the enclosing get_info function:

    def unit_size(size: int):
        if size_unit == 'auto':
            units = ['b', 'kb', 'mb', 'gb', 'tb']
            index = 0
            # Divide by 1024 until the value fits the unit or the largest unit is reached
            while size >= 1024 and index < len(units) - 1:
                size /= 1024
                index += 1
            return f'{size:.3f} {units[index]}'
        elif size_unit == 'bit':
            return f'{size * 8} Bit'
        elif size_unit == 'b':
            return f'{size} Bytes'
        elif size_unit == 'kb':
            return f'{size / 1024:.3f} KB'
        elif size_unit == 'mb':
            return f'{size / 1024 / 1024:.3f} MB'
        elif size_unit == 'gb':
            return f'{size / 1024 / 1024 / 1024:.3f} GB'
        elif size_unit == 'tb':
            return f'{size / 1024 / 1024 / 1024 / 1024:.3f} TB'
        return size
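
The chain of elif branches can also be driven by a lookup table. This is a standalone sketch that takes the unit as a parameter instead of reading it from get_info; format_size and UNIT_FACTORS are hypothetical names, and the 'bit' unit is left out for brevity.

```python
# Bytes per unit, ordered from smallest to largest
UNIT_FACTORS = {'b': 1, 'kb': 1024, 'mb': 1024 ** 2, 'gb': 1024 ** 3, 'tb': 1024 ** 4}


def format_size(size: int, unit: str = 'auto') -> str:
    """Format a byte count in the given unit, or in the largest fitting unit."""
    if unit == 'auto':
        unit = 'b'
        for name, factor in UNIT_FACTORS.items():
            if size >= factor:
                unit = name  # keep the largest unit that does not exceed size
    return f'{size / UNIT_FACTORS[unit]:.3f} {unit}'


print(format_size(0))               # 0.000 b
print(format_size(1536))            # 1.500 kb
print(format_size(1048576, 'kb'))   # 1024.000 kb
```

An unknown unit raises KeyError here, whereas the function above returns the size unchanged.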

Everyone has seen a timestamp, but the raw number of seconds tells a reader nothing, so we use time.strftime to format it. str_time is likewise a local variable of the enclosing get_info function:

    def time_str(time_object: float):
        if str_time:
            return time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(time_object))
        return time_object
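
The conversion can be tried on its own with fixed timestamps. In this sketch format_timestamp and its utc flag are hypothetical additions; UTC is used in the examples so that the output does not depend on the time zone of the machine.

```python
import time


def format_timestamp(timestamp: float, utc: bool = False) -> str:
    """Format seconds since the epoch as 'YYYY-MM-DD HH:MM:SS'."""
    # gmtime gives UTC, localtime uses the time zone of the machine
    converted = time.gmtime(timestamp) if utc else time.localtime(timestamp)
    return time.strftime('%Y-%m-%d %H:%M:%S', converted)


print(format_timestamp(0, utc=True))           # 1970-01-01 00:00:00
print(format_timestamp(1637580571, utc=True))  # 2021-11-22 11:29:31
```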

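Finally, assemble the skeleton that gathers the information. lj is the path, size_unit is the unit, and str_time controls whether timestamps are formatted as strings. The three helper functions above are meant to be nested inside get_info, directly after its first line (they are indented for that reason and omitted from the listing below), which is how they can read size_unit and str_time: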

def get_info(lj: str, size_unit: str = 'auto', str_time: bool = True):
    size_unit = size_unit.strip().lower()
    if os.path.exists(lj):
        stat = os.stat(lj)
        if os.path.isfile(lj):
            return {
                'inode Protection mode': permission_detail(stat.st_mode),
                'inode Node number': stat.st_ino,
                'inode Resident device': stat.st_dev,
                'inode Number of links': stat.st_nlink,
                "Owner's user ID": stat.st_uid,
                "Owner's group ID": stat.st_gid,
                'file size': unit_size(stat.st_size),
                'Last visited': time_str(stat.st_atime),
                'Time of last modification': time_str(stat.st_mtime),
                'Creation time': time_str(stat.st_ctime)
            }
        elif os.path.isdir(lj):
            directories = [di[0] for di in os.walk(lj) if di[0] != lj]
            file_total = [os.path.join(di[0], fi) for di in os.walk(lj) for fi in di[2]]  # Full path of every file below lj
            stat = os.stat(lj)
            return {
                'inode Protection mode': permission_detail(stat.st_mode),
                'inode Node number': stat.st_ino,
                'inode Resident device': stat.st_dev,
                'inode Number of links': stat.st_nlink,
                "Owner's user ID": stat.st_uid,
                "Owner's group ID": stat.st_gid,
                'Allocation unit size': unit_size(stat.st_size),
                'Last visited': time_str(stat.st_atime),
                'Time of last modification': time_str(stat.st_mtime),
                'Creation time': time_str(stat.st_ctime),
                'Number of folders': len(directories),
                'Number of files': len(file_total),
                'Total directory size': unit_size(sum([os.path.getsize(fi) for fi in file_total]))
            }
    return 'Object does not exist!'
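
The three directory totals can also be gathered in a single os.walk pass instead of two. This is a sketch under assumptions: directory_totals is a hypothetical name, and it does not handle files that disappear or broken links during the walk.

```python
import os
import tempfile


def directory_totals(root: str):
    """Return (folder_count, file_count, total_bytes) below root."""
    folder_count = file_count = total_bytes = 0
    for current, subdirs, filenames in os.walk(root):
        folder_count += len(subdirs)
        file_count += len(filenames)
        for name in filenames:
            total_bytes += os.path.getsize(os.path.join(current, name))
    return folder_count, file_count, total_bytes


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, 'a', 'b', 'd'))
    with open(os.path.join(root, 'a', 'c'), 'wb') as handle:
        handle.write(b'abc')
    with open(os.path.join(root, 'a', 'b', 'e'), 'wb') as handle:
        handle.write(b'defg')
    print(directory_totals(root))  # (3, 2, 7)
```

The counts match the example above: three folders (a, b, d) and two files (c, e).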

Conclusion

This is the first time I have spent time writing up a worked example in detail. Writing is not my strong point; the original plan was to publish something every week. When I am in a hurry, the material may turn out more casual. I hope you like it~

Posted by zenabi on Mon, 22 Nov 2021 03:29:31 -0800