Abandon os.path, embrace pathlib

Keywords: Python Windows Unix Django

For file, directory, and path operations in Python, we usually use the os.path module.

pathlib is its replacement. It wraps the functionality of os.path and represents paths as objects, which gives a friendlier API, more convenient operations, and code that is closer to how programmers think about paths.

The pathlib module provides classes that represent file system paths with semantics appropriate for different operating systems. The path classes are divided into pure paths (which provide purely computational operations without I/O) and concrete paths (which inherit from pure paths but also provide I/O operations).

First, let's look at how the pathlib module is organized. Its core is six classes. PurePath is the base class, and the other five classes are derived from it.

In the pathlib class diagram, an arrow connects two classes that have an inheritance relationship. Take PurePosixPath and PurePath for example: PurePosixPath inherits from PurePath, that is, the former is a subclass of the latter.

  • PurePath class: treats the path as an ordinary string. It can join several strings into a path in the format of the current operating system, and it can also determine whether two paths are equal. As the name suggests, a pure path only cares about path manipulation; it does not care whether the path is valid in the real file system, or whether the file or directory actually exists.
  • PurePosixPath and PureWindowsPath are subclasses of PurePath. The former handles UNIX-style paths (including macOS), and the latter handles Windows paths. The two styles differ in their path separators, among other things.
  • The Path class differs from the three classes above. Besides manipulating the path, it can also operate on files and directories and interact with the real file system, for example to check whether the path actually exists.
  • PosixPath and WindowsPath are subclasses of Path, used for UNIX-style (including macOS) paths and Windows-style paths respectively.
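
The inheritance relationships described in this list can be checked with issubclass. This is a small sketch for illustration only; the variable name hierarchy is not part of pathlib. Note that the concrete flavors also inherit from the matching pure flavor.

```python
from pathlib import (
    PurePath, PurePosixPath, PureWindowsPath,
    Path, PosixPath, WindowsPath,
)

# Each pair is (subclass, base class).
hierarchy = [
    (PurePosixPath, PurePath),
    (PureWindowsPath, PurePath),
    (Path, PurePath),
    (PosixPath, Path),
    (WindowsPath, Path),
    (PosixPath, PurePosixPath),
    (WindowsPath, PureWindowsPath),
]

for sub, base in hierarchy:
    print(f"{sub.__name__} -> {base.__name__}: {issubclass(sub, base)}")
```

Every line printed should end in True. Importing WindowsPath on a non-Windows system is fine; only instantiating it there is not supported.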

PurePath, PurePosixPath, and PureWindowsPath are the three pure path classes. They are useful in some special situations, such as:

  • You need to manipulate Windows paths on a UNIX machine, or UNIX paths on a Windows machine. A concrete Windows path cannot be instantiated on UNIX, but a pure Windows path can, so we can work as if we were on Windows.

  • You want to make sure that your code only manipulates paths and does not interact with the operating system.
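
As a rough sketch of the first situation, the following runs the same way on any operating system because it uses PureWindowsPath explicitly. The path itself is hypothetical and nothing is read from disk.

```python
from pathlib import PureWindowsPath

# A pure Windows path can be built and inspected on any operating system.
win = PureWindowsPath(r'C:\Users\demo\report.txt')  # hypothetical path

print(win.drive)                 # C:
print(win.name)                  # report.txt
print(win.parent)                # C:\Users\demo
print(win.with_suffix('.bak'))   # C:\Users\demo\report.bak
```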

Background: path formats differ considerably between UNIX-like systems and Windows. The main differences are the root and the path separator. On UNIX the root is a slash (/), while on Windows a path is rooted at a drive (such as C:); UNIX paths use the forward slash (/) as the separator, while Windows uses the backslash (\).

1. PurePath class

The PurePath class (as well as PurePosixPath and PureWindowsPath) provides a large number of constructors, instance methods, and instance properties.

Instantiating PurePath automatically adapts to the operating system. On UNIX or macOS the constructor actually returns a PurePosixPath object; on Windows it returns a PureWindowsPath object.

For example, in a Windows system, execute the following statement:

from pathlib import PurePath

path = PurePath('file.txt')
print(type(path))

# <class 'pathlib.PureWindowsPath'>
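
To check this behavior without knowing the platform in advance, a sketch like the following can be used. It relies on os.name, which is 'nt' on Windows and 'posix' on Linux and macOS; the variable name expected is only for illustration.

```python
import os
from pathlib import PurePath, PurePosixPath, PureWindowsPath

path = PurePath('file.txt')

# Pick the class we expect PurePath to have chosen for this platform.
expected = PureWindowsPath if os.name == 'nt' else PurePosixPath
print(type(path) is expected)  # True
```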

PurePath also accepts multiple path strings when creating an object and joins them into one path. For example:

from pathlib import PurePath

path = PurePath('https:','www.liujiangblog.com','django')
print(path)

# https:\www.liujiangblog.com\django

As you can see, since the code runs on Windows, the output uses the Windows path format.

If you want to create UNIX style paths in Windows, you need to specify the use of the PurePosixPath class, and vice versa. For example:

from pathlib import PurePosixPath
path = PurePosixPath('https:','www.liujiangblog.com','django')
print(path)

# https:/www.liujiangblog.com/django

Important: pure path operations only manipulate strings. They have no connection to the local file system and perform no disk I/O. A path built with PurePath behaves much like a string and can be converted to one with str().
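
A minimal sketch of this point: the directory used here is assumed not to exist, yet building the path and converting it to a string works, because nothing touches the disk.

```python
import os
from pathlib import PurePosixPath

# This directory does not need to exist: no I/O is performed.
p = PurePosixPath('/no/such/dir') / 'file.txt'

as_text = str(p)
print(as_text)                   # /no/such/dir/file.txt
print(os.fspath(p) == as_text)   # True
print(as_text.upper())           # ordinary string methods work on the result
```

os.fspath() is the standard way for library code to accept either a string or a path object.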

In addition, if no argument is passed to the PurePath constructor, the result is the same as passing '.' (the current path):

from pathlib import PurePath

path1 = PurePath()

path2 = PurePath('.')

print(path1 == path2)

# True

If several of the arguments passed to the PurePath constructor contain a root, only the last root and the segments after it take effect. For example:

from pathlib import PurePath

path = PurePath('C:/', 'D:/', 'file.txt')
print(path)

# D:\file.txt
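
The same rule can be tried on any platform by naming the flavor explicitly. The sketch below also shows a Windows-specific detail: a rooted segment without a drive letter keeps the drive of the earlier segment. The variable names are only for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# POSIX: a later absolute segment replaces everything before it.
posix_reset = PurePosixPath('/etc', '/usr', 'lib64')
print(posix_reset)        # /usr/lib64

# Windows: a segment with a drive also discards what came before.
win_drive_reset = PureWindowsPath('c:/Windows', 'd:bar')
print(win_drive_reset)    # d:bar

# Windows: a rooted segment without a drive keeps the previous drive.
win_root_only = PureWindowsPath('c:/Windows', '/Program Files')
print(win_root_only)      # c:\Program Files
```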

As an additional reminder, when writing path strings in Python, pay attention to the difference between forward slashes and backslashes, to whether a backslash starts an escape sequence, and to whether a raw string (r'...') is used. These are easy to get wrong.

Extra slashes and single dots (.) in the arguments passed to the PurePath constructor are dropped, but double dots (..) are kept:

from pathlib import PurePath
path = PurePath('C:/./../file.txt')
print(path)

# C:\..\file.txt
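
A platform-independent sketch of the same normalization, using PurePosixPath. Double dots are kept because, when symbolic links are involved, collapsing them could change which file the path refers to. The variable names are only for illustration.

```python
from pathlib import PurePosixPath

doubled_slash = PurePosixPath('foo//bar')
single_dot = PurePosixPath('foo/./bar')
double_dot = PurePosixPath('foo/../bar')

print(doubled_slash)   # foo/bar
print(single_dot)      # foo/bar
print(double_dot)      # foo/../bar
```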

PurePath instances support comparison operators. Paths of the same style can be tested for equality and ordered (the ordering is essentially a string comparison). Paths of different styles can only be tested for equality (and are never equal); they cannot be ordered:

from pathlib import PurePosixPath, PureWindowsPath

# Unix-style paths are case sensitive
print(PurePosixPath('/D/file.txt') == PurePosixPath('/d/file.txt'))
# False

# Windows-style paths are case insensitive
print(PureWindowsPath('D://file.txt') == PureWindowsPath('d://file.txt'))
# True
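
The ordering rules can be sketched as follows. The flag ordering_error is a name made up for this example; it records whether comparing two different flavors with < raised TypeError.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Same flavor: ordering and sorting work.
same_flavor_less = PurePosixPath('/a') < PurePosixPath('/b')
ordered = sorted([PurePosixPath('b'), PurePosixPath('a')])
print(same_flavor_less)   # True
print(ordered)            # [PurePosixPath('a'), PurePosixPath('b')]

# Different flavors: equality is allowed but is False.
cross_equal = PureWindowsPath('foo') == PurePosixPath('foo')
print(cross_equal)        # False

# Different flavors: ordering is not supported.
try:
    PureWindowsPath('foo') < PurePosixPath('foo')
    ordering_error = False
except TypeError:
    ordering_error = True
print(ordering_error)     # True
```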

The common properties and methods of PurePath instances are listed below:

Property or method              Description
PurePath.parts                  Returns a tuple of the components of the path.
PurePath.drive                  Returns the drive letter of the path, if any.
PurePath.root                   Returns the root of the path, if any.
PurePath.anchor                 Returns the drive and root combined.
PurePath.parents                Returns a sequence of all ancestor paths of the current path.
PurePath.parent                 Returns the parent of the current path, equivalent to parents[0].
PurePath.name                   Returns the final component (the file name) of the path.
PurePath.suffixes               Returns a list of all suffixes of the file name.
PurePath.suffix                 Returns the file suffix, that is, the last element of suffixes (or an empty string if there is none).
PurePath.stem                   Returns the file name without its last suffix.
PurePath.as_posix()             Returns the path as a string with forward slashes (/).
PurePath.as_uri()               Converts the path to a file URI. Only absolute paths can be converted; otherwise ValueError is raised.
PurePath.is_absolute()          Determines whether the path is absolute.
PurePath.joinpath(*other)       Joins the path with the given segments, like the slash (/) operator.
PurePath.match(pattern)         Determines whether the path matches the given wildcard pattern.
PurePath.relative_to(*other)    Returns the path relative to the given reference path.
PurePath.with_name(name)        Replaces the file name with a new one. ValueError is raised if the path has no file name.
PurePath.with_suffix(suffix)    Replaces the file suffix with a new one. If the path has no suffix, the new suffix is appended.
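
To make the list concrete, here is a short sketch that applies most of these properties and methods to one hypothetical path. PurePosixPath is used so that the output is the same on every platform.

```python
from pathlib import PurePosixPath

p = PurePosixPath('/home/demo/archive.tar.gz')  # hypothetical path

print(p.parts)                    # ('/', 'home', 'demo', 'archive.tar.gz')
print(p.root)                     # /
print(p.parent)                   # /home/demo
print([str(x) for x in p.parents])  # ['/home/demo', '/home', '/']
print(p.name)                     # archive.tar.gz
print(p.suffixes)                 # ['.tar', '.gz']
print(p.suffix)                   # .gz
print(p.stem)                     # archive.tar
print(p.is_absolute())            # True
print(p.relative_to('/home'))     # demo/archive.tar.gz
print(p.with_name('notes.txt'))   # /home/demo/notes.txt
print(p.with_suffix('.zip'))      # /home/demo/archive.tar.zip
print(p.match('*.gz'))            # True
```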

2. Path class

More often, we use the Path class directly rather than PurePath.

Path is a subclass of PurePath. In addition to the constructors, properties, and methods provided by PurePath, it offers methods that check whether a path exists and whether it refers to a file or a directory. If it is a file, Path also supports reading and writing it.

Path has two subclasses, PosixPath and WindowsPath. Their purpose is evident from their names, so they are not discussed in detail.

Basic use

from pathlib import Path

# Create instances
p = Path('a', 'b', 'c/d')
p = Path('/etc')

# With no arguments, Path() is the current directory
p = Path()
p
# WindowsPath('.')
p.resolve()                     # Make the path absolute; the path does not have to exist
# WindowsPath('C:/Users/liujiangblog')

# cwd() returns the current working directory and home() the user's home directory.
# Both are class methods, so they can be called on an instance or on the class.
p.cwd()
# WindowsPath('D:/work/2020/django3')
Path.cwd()
# WindowsPath('D:/work/2020/django3')
p.home()
# WindowsPath('C:/Users/liujiangblog')
Path.home()
# WindowsPath('C:/Users/liujiangblog')

Directory operation

p = Path(r'd:\test\11\22')
p.mkdir(exist_ok=True)          # Create the directory; the parent d:\test\11 must already exist, otherwise FileNotFoundError is raised
# In general, I use the following form
p.mkdir(exist_ok=True, parents=True) # Create the directory together with any missing parents
p.rmdir()                       # Delete the directory; it must be empty

p
# WindowsPath('d:/test/11/22')  The Path object p still exists after the directory is deleted
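
The effect of exist_ok and the "must be empty" rule of rmdir() can be tried safely in a temporary directory. This is a sketch under that assumption; the names second_mkdir, rmdir_nonempty, and still_exists are made up for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / '11' / '22'
    target.mkdir(parents=True)            # creates both '11' and '22'

    try:
        target.mkdir()                    # exist_ok defaults to False
        second_mkdir = 'ok'
    except FileExistsError:
        second_mkdir = 'FileExistsError'

    target.mkdir(exist_ok=True)           # accepted silently

    (target / 'note.txt').touch()
    try:
        target.rmdir()                    # fails: directory is not empty
        rmdir_nonempty = 'ok'
    except OSError:
        rmdir_nonempty = 'OSError'

    (target / 'note.txt').unlink()
    target.rmdir()                        # succeeds once the directory is empty
    still_exists = target.exists()

print(second_mkdir)     # FileExistsError
print(rmdir_nonempty)   # OSError
print(still_exists)     # False
```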

Traverse directory

p = Path(r'd:\test')
# WindowsPath('d:/test')
p.iterdir()                     # Like os.listdir, but yields Path objects
p.glob('*')                     # Like os.listdir, but accepts a wildcard pattern
p.rglob('*')                    # Like os.walk (recursive), and also accepts a wildcard pattern
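
The difference between the three methods is easiest to see on a small tree. The sketch below builds one in a temporary directory (the file names are invented) and sorts the results, since the iteration order is not guaranteed.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub').mkdir()
    (root / 'a.py').touch()
    (root / 'b.txt').touch()
    (root / 'sub' / 'c.py').touch()

    # iterdir() yields Path objects for the direct children only.
    children = sorted(p.name for p in root.iterdir())
    # glob() filters the direct children with a pattern.
    top_py = sorted(p.name for p in root.glob('*.py'))
    # rglob() also descends into subdirectories.
    all_py = sorted(p.relative_to(root).as_posix() for p in root.rglob('*.py'))

print(children)   # ['a.py', 'b.txt', 'sub']
print(top_py)     # ['a.py']
print(all_py)     # ['a.py', 'sub/c.py']
```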

Create a file

file = Path(r'd:\test\11\22\test.py')
file.touch()                    # touch() creates an empty file. The parent directory must exist, otherwise it fails:
# Traceback (most recent call last):
#   File "<input>", line 1, in <module>
# .....
# FileNotFoundError: [Errno 2] No such file or directory: 'd:\\test\\11\\22\\test.py'

p = Path(r'd:\test\11\22')
p.mkdir(exist_ok=True, parents=True)
file.touch()                    # Succeeds now that the directory exists
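
A common way to avoid this error is to create the parent directory from the file path itself with file.parent.mkdir(...). The following sketch tries that in a temporary directory; first_attempt_ok, created, and size are names chosen for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    file = Path(tmp) / '11' / '22' / 'test.py'

    try:
        file.touch()                 # parent directories do not exist yet
        first_attempt_ok = True
    except FileNotFoundError:
        first_attempt_ok = False

    # Create the parent directories of the file, then try again.
    file.parent.mkdir(parents=True, exist_ok=True)
    file.touch()
    created = file.is_file()
    size = file.stat().st_size

print(first_attempt_ok)   # False
print(created)            # True
print(size)               # 0
```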

File operation

p = Path(r'd:\test\tt.txt.bk')
p.name                          # File name
# tt.txt.bk
p.stem                          # File name without its last suffix
# tt.txt
p.suffix                        # File extension (the last suffix)
# .bk
p.suffixes                      # All suffixes
# ['.txt', '.bk']
p.parent                        # Equivalent to os.path.dirname
# WindowsPath('d:/test')
p.parents                       # A sequence of all ancestor directories
# <WindowsPath.parents>
for i in p.parents:
    print(i)
# d:\test
# d:\
p.parts                         # Split the path into a tuple of its components
# ('d:\\', 'test', 'tt.txt.bk')

p = Path('C:/Users/Administrator/Desktop/')
p.parent
# WindowsPath('C:/Users/Administrator')

p.parent.parent
# WindowsPath('C:/Users')


# Index 0 is the direct parent directory. The larger the index, the closer it is to the root directory
for x in p.parents: print(x)
# C:\Users\Administrator
# C:\Users
# C:\

# with_name(name) replaces the last component of the path and returns a new path
Path("/home/liujiangblog/test.py").with_name('python.txt')
# WindowsPath('/home/liujiangblog/python.txt')

# with_suffix(suffix) replaces the extension and returns a new path. If the path has no extension, the suffix is appended
Path("/home/liujiangblog/test.py").with_suffix('.txt')
# WindowsPath('/home/liujiangblog/test.txt')

File information

p = Path(r'd:\test\tt.txt')
p.stat()                        # Get details
# os.stat_result(st_mode=33206, st_ino=562949953579011, st_dev=3870140380, st_nlink=1, st_uid=0, st_gid=0, st_size=0, st_atime=1525254557, st_mtime=1525254557, st_ctime=1525254557)
p.stat().st_size                # File size in bytes
# 0
p.stat().st_ctime               # Creation time on Windows (on Unix, the time of the last metadata change)
# 1525254557.2090347
# Other fields can be read in the same way
p.stat().st_mtime               # Last modification time
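
The timestamps returned by stat() are seconds since the epoch, so they usually need converting before display. A minimal sketch, assuming a throwaway file in a temporary directory:

```python
import tempfile
from datetime import datetime
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / 'tt.txt'
    p.write_bytes(b'hello')

    info = p.stat()                  # one call, reuse the result
    size = info.st_size
    modified = datetime.fromtimestamp(info.st_mtime)

print(size)       # 5
print(modified)   # local date and time of the last modification
```

Calling stat() once and reusing the result avoids repeated system calls when several fields are needed.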

File reading and writing

open(mode='r', buffering=-1, encoding=None, errors=None, newline=None)

This method works like Python's built-in open function and returns a file object.

p = Path('C:/Users/Administrator/Desktop/text.txt')
with p.open(encoding='utf-8') as f:
    print(f.readline())

read_bytes(): reads the file in 'rb' mode and returns its contents as bytes

write_bytes(data): writes data to the file in 'wb' mode

p = Path('C:/Users/Administrator/Desktop/text.txt')
p.write_bytes(b'Binary file contents')
# 20
p.read_bytes()
# b'Binary file contents'

read_text(encoding=None, errors=None): reads the file at the path in 'r' mode and returns its contents as a string

write_text(data, encoding=None, errors=None): writes the string to the file at the path in 'w' mode

p = Path('C:/Users/Administrator/Desktop/text.txt')
p.write_text('Text file contents')
# 18
p.read_text()
# 'Text file contents'
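
When encoding is left as None, a platform-dependent default is used, so passing it explicitly tends to be safer for non-ASCII text. The sketch below is an illustration in a temporary directory; the sample string is written with escape sequences only to keep the source ASCII.

```python
import tempfile
from pathlib import Path

# 'h\u00e9llo, \u4e16\u754c' is "hello" with an accented e, followed by two CJK characters.
original = 'h\u00e9llo, \u4e16\u754c\n'

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / 'text.txt'
    written = p.write_text(original, encoding='utf-8')
    text = p.read_text(encoding='utf-8')
    raw = p.read_bytes()

print(written)              # 10 (characters written, not bytes)
print(text == original)     # True
print(len(raw) > written)   # True: some characters take several bytes in UTF-8
```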

Path checks

These methods return a Boolean:

  • is_dir(): whether it is a directory
  • is_file(): whether it is a regular file
  • is_symlink(): whether it is a symbolic link
  • is_socket(): whether it is a socket file
  • is_block_device(): whether it is a block device
  • is_char_device(): whether it is a character device
  • is_absolute(): whether it is an absolute path

p = Path(r'd:\test')
p = Path(p, 'test.txt')         # Join path segments
p.exists()                      # Check whether the path exists
p.is_file()                     # Check whether it is a file
p.is_dir()                      # Check whether it is a directory
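
One detail worth knowing is that is_file() and is_dir() return False for a path that does not exist instead of raising an error. A small sketch in a temporary directory, with invented file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    file = folder / 'test.txt'
    missing = folder / 'missing.txt'
    file.touch()

    # (exists, is_file, is_dir) for each path
    checks = {
        'folder': (folder.exists(), folder.is_file(), folder.is_dir()),
        'file': (file.exists(), file.is_file(), file.is_dir()),
        'missing': (missing.exists(), missing.is_file(), missing.is_dir()),
    }

for name, result in checks.items():
    print(name, result)
# folder (True, False, True)
# file (True, True, False)
# missing (False, False, False)
```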

Path joining and decomposition

In pathlib, paths are joined with the / operator, in three main forms:

  • Path object / Path object
  • Path object / string
  • String / Path object

A path is decomposed mainly through the parts attribute.

p = Path()
p
# WindowsPath('.')
p = p / 'a'
p
# WindowsPath('a')
p = 'b' / p
p
# WindowsPath('b/a')
p2 = Path('c')
p = p2 / p
p
# WindowsPath('c/b/a')
p.parts
# ('c', 'b', 'a')
p.joinpath("c:", "liujiangblog.com", "jack")    # A segment with a drive discards everything before it
# WindowsPath('c:liujiangblog.com/jack')
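
The same joining rules can be reproduced on any platform with PurePosixPath. This is only a sketch with made-up paths; on POSIX the part that resets the path is an absolute segment rather than a drive.

```python
from pathlib import PurePosixPath

base = PurePosixPath('/etc')

chained = base / 'init.d' / 'reboot'          # Path / str / str
from_string = '/usr' / PurePosixPath('bin')   # str / Path
reset = base / '/var/log'                     # absolute segment discards '/etc'
joined = base.joinpath('ssh', 'sshd_config')  # same result as using /

print(chained)        # /etc/init.d/reboot
print(from_string)    # /usr/bin
print(reset)          # /var/log
print(joined)         # /etc/ssh/sshd_config
print(joined.parts)   # ('/', 'etc', 'ssh', 'sshd_config')
```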

Wildcards

  • glob(pattern): yields the entries in the directory that match the wildcard pattern
  • rglob(pattern): like glob, but searches the directory recursively

Return value: a generator

p = Path(r'd:\vue_learn')
p.glob('*.html')   # Match all HTML files; returns a generator
# <generator object Path.glob at 0x000002ECA2199F90>
list(p.glob('*.html'))
# [WindowsPath('d:/vue_learn/base.html'), WindowsPath('d:/vue_learn/components.html'), WindowsPath('d:/vue_learn/demo.html'), ...
g = p.rglob('*.html')   # Recursive matching
next(g)
# WindowsPath('d:/vue_learn/base.html')
next(g)
# WindowsPath('d:/vue_learn/components.html')

Pattern matching

Use the match method to test the path against a wildcard (glob-style) pattern, not a regular expression; it returns True if the path matches

p = Path('C:/Users/Administrator/Desktop/text.txt')
p.match('*.txt')
# True
Path('C:/Users/Administrator/Desktop/text.txt').match('**/*.txt')
# True
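
A few more cases may help to show how match() behaves: a relative pattern is matched from the right-hand end of the path, an absolute pattern requires an absolute path, and case sensitivity follows the path flavor. The results are collected in a dictionary named results, which is just a convenience for this sketch.

```python
from pathlib import PurePosixPath, PureWindowsPath

results = {
    # Relative patterns are matched from the right.
    'suffix only': PurePosixPath('a/b.py').match('*.py'),
    'last two parts': PurePosixPath('/a/b/c.py').match('b/*.py'),
    'wrong parent': PurePosixPath('/a/b/c.py').match('a/*.py'),
    # Absolute patterns must match the whole absolute path.
    'absolute ok': PurePosixPath('/a.py').match('/*.py'),
    'absolute vs relative': PurePosixPath('a/b.py').match('/*.py'),
    # Case sensitivity follows the flavor.
    'windows case': PureWindowsPath('b.py').match('*.PY'),
    'posix case': PurePosixPath('b.py').match('*.PY'),
}

for label, value in results.items():
    print(f'{label}: {value}')
# suffix only: True
# last two parts: True
# wrong parent: False
# absolute ok: True
# absolute vs relative: False
# windows case: True
# posix case: False
```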

More examples

from pathlib import Path

p1 = Path(__file__)  # Path of the current file
# D:\liujiangblog\test1.py

p2 = Path.cwd()  # Current working directory
# D:\test

p3 = Path.cwd().parent  # Parent of the current working directory
# D:\

p = Path.cwd().joinpath('a')  # Path joining
# D:\test\a

st = Path(__file__).stat()  # Information about the current file
#os.stat_result(st_mode=33206, st_ino=6473924464701313, st_dev=1559383105, st_nlink=1, st_uid=0, st_gid=0, st_size=300, st_atime=1578661629, st_mtime=1578661629, st_ctime=1576891792)
a = st.st_size  # File size in bytes

p = p1.parent  # Parent path of p1
z = p1.parents  # All ancestor paths of p1, returned as a sequence object
# for i in z:
#     print(i)
pp = Path('D:/python')  # Create a path object
a = pp.is_file()  # Check whether pp is a file
a = pp.is_dir()  # Check whether pp is a directory
a = p2.is_absolute()  # Check whether p2 is an absolute path
a = p2.match(r'd:\*')  # Check whether p2 matches a pattern (raw string, so the backslash is not an escape)
a = p2.glob('*.py')  # Search p2 for files matching the pattern; only the p2 directory itself is searched
a = p3.glob('**/*.py')  # Search p3 for matching files, including all subdirectories
# a = p3.rglob('*.py')  # The same recursive search written with rglob
# for i in a:
#     print(i)
# pp.mkdir()  # Create the directory; raises an exception if it already exists
a = p1.name  # File name
# test1.py
a = p1.suffix  # Suffix
# .py
a = pp.stem  # Last component of the path without its suffix
a = pp.with_name('vocab.txt')  # Replace the last component and return a new path
a = p1.with_suffix('.lm')  # Replace the extension and return a new path; if there is no extension, the suffix is appended
# D:\ss\test1.lm
root_dir = Path('d:/')  # Named root_dir to avoid shadowing the built-in dir()
a = root_dir.iterdir()  # Iterator over the files and folders directly inside this directory (subdirectories are not descended into)
# for i in a:
#     print(i)

file = Path('D:/ss.lm')
# file.rename('d:/cc.txt')  # Rename or move; works for both files and folders
# Raises an exception if the source does not exist
# The source and target must be on the same drive
# On Windows, raises an exception if the target already exists

file.replace('d:/cc.txt')  # Rename or move; works for both files and folders
# Like rename, but overwrites the target if it already exists
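
The overwrite behavior of replace() can be tried without touching real files. The sketch below reuses the file names from the example but places them in a temporary directory; source_exists and content are names chosen for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'ss.lm'
    dst = Path(tmp) / 'cc.txt'
    src.write_text('new contents', encoding='utf-8')
    dst.write_text('old contents', encoding='utf-8')

    src.replace(dst)                 # the existing cc.txt is overwritten
    source_exists = src.exists()
    content = dst.read_text(encoding='utf-8')

print(source_exists)   # False
print(content)         # new contents
```

The Path object src is not updated by the move; it still names the old location, which no longer exists.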

For more information visit: https://www.liujiangblog.com

For more video tutorials visit: https://www.liujiangblog.com/video/

Posted by SuNcO on Tue, 19 May 2020 20:44:08 -0700