Performance comparison of Python methods for obtaining file names
A simple way to get the file names in a folder from Python is to shell out, for example os.system('ll /data/'), but this becomes completely impractical when the folder contains a very large number of files.
Nearly 6 million files were generated in the /dd directory to compare the performance of the different methods. The following shell script generates files quickly:
for i in $(seq 1 1000000);do echo text >>$i.txt;done
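For readers who would rather create the test files from Python, a minimal sketch follows. It assumes a scratch directory from tempfile in place of /dd, and make_files is a hypothetical helper name; the count is kept small here and would need to be raised to approach the test above.

```python
import os
import tempfile


def make_files(directory, count):
    """Create small text files named 1.txt .. count.txt (hypothetical helper)."""
    for i in range(1, count + 1):
        # Append mode mirrors the shell's `>>` redirection.
        with open(os.path.join(directory, f"{i}.txt"), "a") as fh:
            fh.write("text\n")


# A temporary directory stands in for /dd so the sketch is safe to run anywhere.
with tempfile.TemporaryDirectory() as tmp:
    make_files(tmp, 1000)
    created = len(os.listdir(tmp))
    print(created)
```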
1. System command ls -l
# System command ls -l
import time
import subprocess

start = time.time()
result = subprocess.Popen('ls -l /dd/', stdout=subprocess.PIPE, shell=True)
for file in result.stdout:
    pass
print(time.time() - start)
# Hangs outright; no timing was obtained
2. glob module
# glob module
import glob
import time

start = time.time()
result = glob.glob("/dd/*")
for file in result:
    pass
print(time.time() - start)
# 49.60481119155884
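glob.glob matches every name against the pattern and returns the complete list of joined paths before the loop can start. As a hedged illustration, glob.iglob hands back an iterator instead; whether that is measurably faster on a directory this large was not measured here. The temporary directory and the 100-file count are assumptions made for the sketch.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    for i in range(100):
        open(os.path.join(tmp, f"{i}.txt"), "w").close()

    # glob.escape guards against special characters in the directory path.
    pattern = os.path.join(glob.escape(tmp), "*")

    # glob.iglob returns an iterator rather than a fully built list.
    count = 0
    for path in glob.iglob(pattern):
        count += 1
    print(count)
```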
3. os.walk function
# os.walk function
import os
import time

start = time.time()
for root, dirs, files in os.walk("/dd/", topdown=False):
    pass
print(time.time() - start)
# 8.906772375106812
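os.walk descends into every subdirectory and separates directory names from file names, which is more than a flat listing needs. When only the top level matters, one option is to take just the first tuple the generator yields. The sketch below assumes a small temporary directory layout invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for name in ("a.txt", "b.txt"):
        open(os.path.join(tmp, name), "w").close()
    open(os.path.join(tmp, "sub", "c.txt"), "w").close()

    # With the default topdown=True, the first tuple describes the top directory.
    root, dirs, files = next(os.walk(tmp))
    top_files = sorted(files)
    print(top_files)
```

Note that this relies on the default topdown=True; with topdown=False the top directory is yielded last.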
4. os.scandir function
# os.scandir function
import os
import time

start = time.time()
path = os.scandir("/dd/")
for i in path:
    pass
print(time.time() - start)
# 4.118424415588379
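os.scandir yields os.DirEntry objects rather than plain strings, so each entry carries its name and, on most platforms, its file type without a separate stat call per file. A minimal sketch of using that to keep only regular files, assuming a temporary directory created for the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for name in ("a.txt", "b.txt"):
        open(os.path.join(tmp, name), "w").close()

    # The context manager closes the underlying directory handle promptly.
    with os.scandir(tmp) as entries:
        file_names = sorted(e.name for e in entries if e.is_file())
    print(file_names)
```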
5. shell find command
# shell find command
import time
import subprocess

start = time.time()
result = subprocess.Popen('find /dd/', stdout=subprocess.PIPE, shell=True)
for file in result.stdout:
    pass
print(time.time() - start)
# 6.205533027648926
6. Shell ls -1 -f command (no sorting)
# shell ls -1 -f command
import time
import subprocess

start = time.time()
result = subprocess.Popen('ls -1 -f /dd/', stdout=subprocess.PIPE, shell=True)
for file in result.stdout:
    pass
print(time.time() - start)
# 3.3476643562316895
7. os.listdir
# os.listdir
import os
import time

start = time.time()
result = os.listdir('/dd')
for file in result:
    pass
print(time.time() - start)
# 2.6720399856567383
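To repeat the pure-Python part of this comparison on another machine, something like the following harness could be used. It is only a sketch: time_listing is a hypothetical helper, the temporary directory and the 10,000-file count are assumptions, and its timings will not match the figures quoted above.

```python
import glob
import os
import tempfile
import time


def time_listing(label, func):
    """Run func once; return (label, entry count, elapsed seconds). Hypothetical helper."""
    start = time.perf_counter()
    count = sum(1 for _ in func())
    return label, count, time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmp:
    for i in range(10_000):
        open(os.path.join(tmp, f"{i}.txt"), "w").close()

    pattern = os.path.join(glob.escape(tmp), "*")
    results = [
        time_listing("glob.glob", lambda: glob.glob(pattern)),
        time_listing("os.scandir", lambda: os.scandir(tmp)),
        time_listing("os.listdir", lambda: os.listdir(tmp)),
    ]

for label, count, elapsed in results:
    print(f"{label:<12} {count:>6} entries  {elapsed:.4f}s")
```

time.perf_counter is used instead of time.time because it is intended for measuring short intervals; on a directory this small the differences between methods may be lost in noise.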