1. filecmp: comparing files
The filecmp module provides functions and a class to compare files and directories on the file system.
1.1 sample data
Use the following code to create a set of test files.
    import os


    def mkfile(filename, body=None):
        with open(filename, 'w') as f:
            f.write(body or filename)
        return


    def make_example_dir(top):
        if not os.path.exists(top):
            os.mkdir(top)
        curdir = os.getcwd()
        os.chdir(top)

        os.mkdir('dir1')
        os.mkdir('dir2')

        mkfile('dir1/file_only_in_dir1')
        mkfile('dir2/file_only_in_dir2')

        os.mkdir('dir1/dir_only_in_dir1')
        os.mkdir('dir2/dir_only_in_dir2')

        os.mkdir('dir1/common_dir')
        os.mkdir('dir2/common_dir')

        mkfile('dir1/common_file', 'this file is the same')
        os.link('dir1/common_file', 'dir2/common_file')

        mkfile('dir1/contents_differ')
        mkfile('dir2/contents_differ')
        # Update the access and modification times so most of the stat
        # results will match.
        st = os.stat('dir1/contents_differ')
        os.utime('dir2/contents_differ', (st.st_atime, st.st_mtime))

        mkfile('dir1/file_in_dir1', 'This is a file in dir1')
        os.mkdir('dir2/file_in_dir1')

        os.chdir(curdir)
        return


    if __name__ == '__main__':
        os.chdir(os.path.dirname(__file__) or os.getcwd())
        make_example_dir('example')
        make_example_dir('example/dir1/common_dir')
        make_example_dir('example/dir2/common_dir')
Running this script generates a tree of files under the example directory.
The same directory structure is repeated inside each common_dir directory, which gives the recursive comparison options something interesting to work with.
1.2 comparing files
cmp() is used to compare two files on the file system.
    import filecmp

    print('common_file :', end=' ')
    print(filecmp.cmp('example/dir1/common_file',
                      'example/dir2/common_file',
                      shallow=True),
          end=' ')
    print(filecmp.cmp('example/dir1/common_file',
                      'example/dir2/common_file',
                      shallow=False))

    print('contents_differ:', end=' ')
    print(filecmp.cmp('example/dir1/contents_differ',
                      'example/dir2/contents_differ',
                      shallow=True),
          end=' ')
    print(filecmp.cmp('example/dir1/contents_differ',
                      'example/dir2/contents_differ',
                      shallow=False))

    print('identical :', end=' ')
    print(filecmp.cmp('example/dir1/file_only_in_dir1',
                      'example/dir1/file_only_in_dir1',
                      shallow=True),
          end=' ')
    print(filecmp.cmp('example/dir1/file_only_in_dir1',
                      'example/dir1/file_only_in_dir1',
                      shallow=False))
The shallow argument tells cmp() whether to look at the contents of the files in addition to their metadata. By default, a shallow comparison is done using the information obtained by os.stat() (file type, size, and modification time). If those values match, the files are considered the same. Therefore, files of the same size that were last modified at the same time are reported as the same even if their contents differ. When shallow is False, the contents of the files are always compared.
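The difference between the two modes can be reproduced without the sample tree. The following is a minimal sketch, not part of the original examples: it assumes a temporary directory, two made-up file names (a.txt and b.txt), and an arbitrary fixed timestamp, and forces both files to have the same size and modification time.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names, used only for this illustration.
    a = os.path.join(tmp, 'a.txt')
    b = os.path.join(tmp, 'b.txt')

    # Same length, different contents.
    with open(a, 'w') as f:
        f.write('AAAA')
    with open(b, 'w') as f:
        f.write('BBBB')

    # Give both files identical access and modification times
    # (an arbitrary fixed timestamp, expressed in nanoseconds).
    stamp = 1_000_000_000 * 10**9
    os.utime(a, ns=(stamp, stamp))
    os.utime(b, ns=(stamp, stamp))

    # The os.stat() signatures match, so the shallow comparison
    # never opens the files.
    shallow_result = filecmp.cmp(a, b, shallow=True)
    # The deep comparison reads both files and notices the difference.
    deep_result = filecmp.cmp(a, b, shallow=False)

print('shallow:', shallow_result)
print('deep   :', deep_result)
```

Passing integer nanosecond values through the ns argument of os.utime() avoids the rounding that can occur when floating point timestamps are copied from one file to another.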
To compare a set of files in two directories without recursing, use cmpfiles(). The arguments are the names of the two directories and a list of the files to check in both locations. The list of common files passed in should contain only file names (directories always result in a mismatch), and the files must be present in both locations. The next example shows a simple way to build the common list. Like cmp(), the comparison takes a shallow flag.
    import filecmp
    import os

    # Determine the items that exist in both directories
    d1_contents = set(os.listdir('example/dir1'))
    d2_contents = set(os.listdir('example/dir2'))
    common = list(d1_contents & d2_contents)
    common_files = [
        f
        for f in common
        if os.path.isfile(os.path.join('example/dir1', f))
    ]
    print('Common files:', common_files)

    # Compare the directories
    match, mismatch, errors = filecmp.cmpfiles(
        'example/dir1',
        'example/dir2',
        common_files,
    )
    print('Match :', match)
    print('Mismatch :', mismatch)
    print('Errors :', errors)
cmpfiles() returns three lists of file names: files that match, files that do not match, and files that could not be compared (because of permission problems or for any other reason).
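To see all three result lists populated at once, here is a small self-contained sketch. The directory layout, the file names, and the write() helper are invented for the illustration and are not part of the sample data. A file that exists on only one side cannot be compared, so it should land in the errors list.

```python
import filecmp
import os
import tempfile


def write(path, text):
    """Hypothetical helper: create a small text file."""
    with open(path, 'w') as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, 'left')
    right = os.path.join(tmp, 'right')
    os.mkdir(left)
    os.mkdir(right)

    # Identical contents on both sides.
    write(os.path.join(left, 'same.txt'), 'identical')
    write(os.path.join(right, 'same.txt'), 'identical')

    # Different contents (and different sizes).
    write(os.path.join(left, 'diff.txt'), 'short')
    write(os.path.join(right, 'diff.txt'), 'much longer text')

    # Present only on the left, so os.stat() fails on the right.
    write(os.path.join(left, 'missing.txt'), 'only on the left')

    match, mismatch, errors = filecmp.cmpfiles(
        left,
        right,
        ['same.txt', 'diff.txt', 'missing.txt'],
        shallow=False,
    )

print('Match    :', match)
print('Mismatch :', mismatch)
print('Errors   :', errors)
```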
1.3 comparing directories
The functions described earlier are suitable for relatively simple comparisons. For recursive comparison or more complete analysis of large directory trees, the dircmp class is more useful. In the simplest case, report() prints a report comparing two directories.
    import filecmp

    dc = filecmp.dircmp('example/dir1', 'example/dir2')
    dc.report()
The output is a plain text report that covers only the contents of the given directories and does not recurse into their subdirectories. In this case, the file contents_differ is reported as identical if its os.stat() signature matches on both sides, because the contents are not compared. Before Python 3.13 there is no way to have dircmp compare the contents of files the way cmp() does.
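One possible workaround, sketched here under stated assumptions rather than taken from the original examples, is to pass the common_files list of a dircmp instance to cmpfiles() with shallow=False. The helper name deep_diff_files, the directory names, and the fixed timestamp are all hypothetical.

```python
import filecmp
import os
import tempfile


def deep_diff_files(dc):
    """Hypothetical helper: common files whose contents differ."""
    _, mismatch, _ = filecmp.cmpfiles(
        dc.left, dc.right, dc.common_files, shallow=False)
    return sorted(mismatch)


with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, 'left')
    right = os.path.join(tmp, 'right')
    # Arbitrary fixed timestamp in nanoseconds, so that both files
    # end up with the same os.stat() signature.
    stamp = 1_000_000_000 * 10**9

    for directory, body in ((left, 'AAAA'), (right, 'BBBB')):
        os.mkdir(directory)
        path = os.path.join(directory, 'data.txt')
        with open(path, 'w') as f:
            f.write(body)
        os.utime(path, ns=(stamp, stamp))

    dc = filecmp.dircmp(left, right)
    shallow_same = list(dc.same_files)  # based on os.stat() only
    deep_diff = deep_diff_files(dc)     # based on file contents

print('Same (shallow)  :', shallow_same)
print('Different (deep):', deep_diff)
```

The helper reads every common file, so it trades speed for accuracy on large directories.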
For more detail, and for a recursive comparison, use report_full_closure().
    import filecmp

    dc = filecmp.dircmp('example/dir1', 'example/dir2')
    dc.report_full_closure()
The output includes comparisons of all parallel subdirectories.
1.4 using differences in a program
Besides producing printed reports, dircmp calculates lists of files that can be used directly in a program. Each of the following attributes is computed only when requested, so creating a dircmp instance incurs no overhead for unused data.
    import filecmp
    import pprint

    dc = filecmp.dircmp('example/dir1', 'example/dir2')
    print('Left:')
    pprint.pprint(dc.left_list)

    print('\nRight:')
    pprint.pprint(dc.right_list)
The files and subdirectories contained in the directories being compared are listed in left_list and right_list, respectively.
The inputs can be filtered by passing a list of names to ignore to the constructor. By default, names such as RCS, CVS, and tags are ignored.
    import filecmp
    import pprint

    dc = filecmp.dircmp('example/dir1', 'example/dir2',
                        ignore=['common_file'])

    print('Left:')
    pprint.pprint(dc.left_list)

    print('\nRight:')
    pprint.pprint(dc.right_list)
In this case, common_file is left out of the list of files to be compared.
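One detail worth checking is that a custom ignore list replaces the default names rather than adding to them. The sketch below illustrates this with a throwaway directory layout (the names keep.txt and skip.txt are made up) and uses filecmp.DEFAULT_IGNORES to keep the defaults while adding one more name.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, 'left')
    right = os.path.join(tmp, 'right')

    # Build two identical directories, each holding a CVS
    # subdirectory and two hypothetical files.
    for directory in (left, right):
        os.mkdir(directory)
        os.mkdir(os.path.join(directory, 'CVS'))
        for name in ('keep.txt', 'skip.txt'):
            with open(os.path.join(directory, name), 'w') as f:
                f.write(name)

    # Default ignore list: CVS is filtered out.
    default = filecmp.dircmp(left, right).left_list

    # Custom ignore list: the defaults no longer apply.
    replaced = filecmp.dircmp(
        left, right, ignore=['skip.txt']).left_list

    # Defaults plus one extra name.
    extended = filecmp.dircmp(
        left, right,
        ignore=filecmp.DEFAULT_IGNORES + ['skip.txt'],
    ).left_list

print('default :', default)
print('replaced:', replaced)
print('extended:', extended)
```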
The names of files common to both input directories are saved in common, and the files unique to each directory are listed in left_only and right_only.
    import filecmp
    import pprint

    dc = filecmp.dircmp('example/dir1', 'example/dir2')
    print('Common:')
    pprint.pprint(dc.common)

    print('\nLeft:')
    pprint.pprint(dc.left_only)

    print('\nRight:')
    pprint.pprint(dc.right_only)
The "left" directory is the first parameter of dircmp(), and the "right" directory is the second parameter.
The common members can be further broken down into files, directories, and "funny" items (anything that has a different type in the two directories, or for which os.stat() reports an error).
    import filecmp
    import pprint

    dc = filecmp.dircmp('example/dir1', 'example/dir2')
    print('Common:')
    pprint.pprint(dc.common)

    print('\nDirectories:')
    pprint.pprint(dc.common_dirs)

    print('\nFiles:')
    pprint.pprint(dc.common_files)

    print('\nFunny:')
    pprint.pprint(dc.common_funny)
In the sample data, the item named file_in_dir1 is a file in one directory and a subdirectory in the other, so it shows up in the "funny" list.
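The same classification can be checked in isolation. This sketch builds a throwaway tree in a temporary directory (the names item, shared.txt, and sub are invented for the illustration) in which one name is a regular file on the left and a directory on the right.

```python
import filecmp
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, 'left')
    right = os.path.join(tmp, 'right')
    os.mkdir(left)
    os.mkdir(right)

    # 'item' is a regular file on the left and a directory on the right.
    with open(os.path.join(left, 'item'), 'w') as f:
        f.write('a file')
    os.mkdir(os.path.join(right, 'item'))

    # 'shared.txt' is a regular file on both sides.
    for directory in (left, right):
        with open(os.path.join(directory, 'shared.txt'), 'w') as f:
            f.write('shared')

    # 'sub' is a directory on both sides.
    os.mkdir(os.path.join(left, 'sub'))
    os.mkdir(os.path.join(right, 'sub'))

    dc = filecmp.dircmp(left, right)
    common = list(dc.common)
    common_dirs = list(dc.common_dirs)
    common_files = list(dc.common_files)
    common_funny = list(dc.common_funny)

print('Common     :', common)
print('Directories:', common_dirs)
print('Files      :', common_files)
print('Funny      :', common_funny)
```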
The differences between files can be similarly divided.
    import filecmp

    dc = filecmp.dircmp('example/dir1', 'example/dir2')
    print('Same :', dc.same_files)
    print('Different :', dc.diff_files)
    print('Funny :', dc.funny_files)
The file contents_differ is compared using only os.stat(), without examining its contents, so it can end up in the same_files list.
Finally, subdirectories are saved to facilitate recursive comparisons.
    import filecmp

    dc = filecmp.dircmp('example/dir1', 'example/dir2')
    print('Subdirectories:')
    print(dc.subdirs)
The attribute subdirs is a dictionary that maps the directory name to the new dircmp object.
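Because each value in subdirs is itself a dircmp object, a short recursive function can gather differences from an entire tree. The following is a sketch under stated assumptions: the generator name walk_differences, the labels it yields, and the temporary directory layout are all hypothetical.

```python
import filecmp
import os
import tempfile


def walk_differences(dc, prefix=''):
    """Hypothetical helper: yield (kind, relative_path) pairs."""
    for name in dc.left_only:
        yield 'left_only', os.path.join(prefix, name)
    for name in dc.right_only:
        yield 'right_only', os.path.join(prefix, name)
    for name in dc.diff_files:
        yield 'different', os.path.join(prefix, name)
    # Recurse into every subdirectory common to both sides.
    for name, sub_dc in dc.subdirs.items():
        yield from walk_differences(sub_dc, os.path.join(prefix, name))


def write(path, text):
    """Hypothetical helper: create a small text file."""
    with open(path, 'w') as f:
        f.write(text)


with tempfile.TemporaryDirectory() as tmp:
    left = os.path.join(tmp, 'left')
    right = os.path.join(tmp, 'right')
    for directory in (left, right):
        os.makedirs(os.path.join(directory, 'sub'))
        write(os.path.join(directory, 'top.txt'), 'same everywhere')

    # Differences exist only inside the common subdirectory.
    write(os.path.join(left, 'sub', 'a.txt'), 'short')
    write(os.path.join(right, 'sub', 'a.txt'), 'a longer body')
    write(os.path.join(left, 'sub', 'only_left.txt'), 'left')
    write(os.path.join(right, 'sub', 'only_right.txt'), 'right')

    # Consume the generator before the temporary tree is removed.
    results = sorted(walk_differences(filecmp.dircmp(left, right)))

for kind, path in results:
    print(kind, path)
```

Yielding tuples instead of printing keeps the traversal reusable: the caller decides whether to report, copy, or delete the paths.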