Python development [notes]: getting file names from a directory that holds a massive number of files (method performance comparison)

Keywords: Python, shell

Performance comparison of Python methods for obtaining file names

 

Usually, the simplest way to get the file names in a folder from Python is to shell out, for example os.system('ll /data/'). When the folder contains a very large number of files, though, this approach is completely impractical.

 

Nearly 6 million files have been generated in the /dd directory. Let's compare how the different methods perform on it. The following shell script, run inside the target directory, quickly creates one million small files per run:

for i in $(seq 1 1000000);do echo text >>$i.txt;done

  

 

1. System command ls -l

# System command ls -l

import time
import subprocess

start = time.time()
result = subprocess.Popen('ls -l /dd/', stdout=subprocess.PIPE,shell=True)

for file in result.stdout:
    pass
print(time.time()-start)

# Hung outright; no timing was obtained

  

2. glob module

# glob module

import glob
import time


start = time.time()
result = glob.glob("/dd/*")
for file in result:
    pass
print(time.time()-start)

# 49.60481119155884
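
One likely contributor to this time is that glob.glob builds the complete list of matched paths, each one joined with the directory prefix, before the loop even starts. When the results only need to be iterated, glob.iglob from the same module returns an iterator instead. The sketch below was not part of the measurements above; the helper name iter_matches and the throwaway directory are made up for illustration.

```python
import glob
import os
import tempfile


def iter_matches(directory, pattern="*"):
    """Return a lazy iterator over paths in directory matching pattern.

    Hypothetical helper for illustration only.
    """
    # glob.escape protects directory names that contain glob metacharacters.
    full_pattern = os.path.join(glob.escape(directory), pattern)
    # iglob returns an iterator, so no full result list is built up front.
    return glob.iglob(full_pattern)


# Small demonstration on a throwaway directory (not the /dd directory above).
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.txt", "b.txt", "c.log"):
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write("text\n")
    # iglob yields full paths; keep only the file-name part.
    txt_names = sorted(
        os.path.basename(path) for path in iter_matches(demo_dir, "*.txt")
    )
    print(txt_names)
```

Note that the "*" pattern skips names that start with a dot, so glob-based listings can be shorter than what os.listdir reports.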

  

3. os.walk function

# os.walk function

import os
import time

start = time.time()
for root, dirs, files in os.walk("/dd/", topdown=False):
    pass
print(time.time()-start)

# 8.906772375106812
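
The loop above discards what os.walk yields. Each item is a (root, dirs, files) tuple, and os.walk also descends into every subdirectory. When only the names directly inside one directory are wanted, the first tuple is enough. A minimal sketch, assuming the default topdown=True order; the helper name top_level_files is hypothetical:

```python
import os
import tempfile


def top_level_files(directory):
    """Return names of non-directory entries directly inside directory."""
    # With the default topdown=True, the first tuple describes the directory
    # itself, so returning immediately avoids walking any subdirectories.
    for _root, _dirs, files in os.walk(directory):
        return files
    # os.walk yields nothing when the directory cannot be read.
    return []


# Demonstration on a throwaway directory that contains one subdirectory.
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, "sub"))
    for relative in ("a.txt", "b.txt", os.path.join("sub", "c.txt")):
        with open(os.path.join(demo_dir, relative), "w") as handle:
            handle.write("text\n")
    top_names = sorted(top_level_files(demo_dir))
    print(top_names)
```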

  

4. os.scandir function

# os.scandir function

import os
import time

start = time.time()
path = os.scandir("/dd/")
for i in path:
    pass
print(time.time()-start)

# 4.118424415588379
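
os.scandir yields os.DirEntry objects rather than plain strings. The file name is available as entry.name, and entry.is_file() can usually answer from information gathered during the directory scan, without an extra system call per file. The sketch below shows one way to turn the timing loop into something that actually returns names; iter_file_names is a hypothetical helper, not code from the measurements above.

```python
import os
import tempfile


def iter_file_names(directory):
    """Yield the names of regular files directly inside directory."""
    # The context manager closes the underlying directory handle promptly.
    with os.scandir(directory) as entries:
        for entry in entries:
            # follow_symlinks=False: symbolic links are not reported as files.
            if entry.is_file(follow_symlinks=False):
                yield entry.name


# Demonstration on a throwaway directory with two files and one subdirectory.
with tempfile.TemporaryDirectory() as demo_dir:
    os.mkdir(os.path.join(demo_dir, "sub"))
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write("text\n")
    file_names = sorted(iter_file_names(demo_dir))
    print(file_names)
```

Because this is a generator, names are produced one at a time, which keeps memory use flat even for directories with millions of entries.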

  

5. shell find command

# shell find command

import time
import subprocess

start = time.time()
result = subprocess.Popen('find /dd/', stdout=subprocess.PIPE,shell=True)

for file in result.stdout:
    pass
print(time.time()-start)

# 6.205533027648926

  

6. shell ls -1 -f command (no sorting)

# shell ls -1 -f command

import time
import subprocess

start = time.time()
result = subprocess.Popen('ls -1 -f /dd/', stdout=subprocess.PIPE,shell=True)

for file in result.stdout:
    pass
print(time.time()-start)

# 3.3476643562316895
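
Two details matter when the output of ls -1 -f is actually used rather than discarded: the -f flag also turns on -a, so the listing includes the "." and ".." entries, and each line read from the pipe ends with a newline. The sketch below handles both, and passes the command as a list so that no shell is involved. It assumes an ls binary is on the PATH and that no file name contains a newline; ls_unsorted is a hypothetical helper name.

```python
import os
import subprocess
import tempfile


def ls_unsorted(directory):
    """Yield entry names from `ls -1 -f`, skipping "." and ".."."""
    # A list of arguments avoids the shell, so odd directory names need no
    # quoting; "--" stops ls from reading the directory name as an option.
    command = ["ls", "-1", "-f", "--", directory]
    with subprocess.Popen(command, stdout=subprocess.PIPE, text=True) as proc:
        for line in proc.stdout:
            # Assumption: names contain no newline, so one line is one name.
            name = line.rstrip("\n")
            if name not in (".", ".."):
                yield name


# Demonstration on a throwaway directory (not the /dd directory above).
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("a.txt", "b.txt"):
        with open(os.path.join(demo_dir, name), "w") as handle:
            handle.write("text\n")
    ls_names = sorted(ls_unsorted(demo_dir))
    print(ls_names)
```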

  

7. os.listdir

# os.listdir

import os
import time


start = time.time()
result = os.listdir('/dd')
for file in result:
    pass
print(time.time()-start)

# 2.6720399856567383
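
The figures above come from separate scripts, each run once, so operating system caching can influence the ranking. To repeat the comparison on another machine, the pure-Python methods can be put behind one small harness. This is only a sketch under stated assumptions: it uses time.perf_counter instead of time.time, it runs on a throwaway directory with 1000 files rather than /dd, and all function names are made up for illustration.

```python
import glob
import os
import tempfile
import time


def names_listdir(directory):
    return os.listdir(directory)


def names_scandir(directory):
    with os.scandir(directory) as entries:
        return [entry.name for entry in entries]


def names_glob(directory):
    # The "*" pattern skips dot-files, unlike the other methods.
    pattern = os.path.join(glob.escape(directory), "*")
    return [os.path.basename(path) for path in glob.glob(pattern)]


def names_walk(directory):
    # Only the first tuple is needed for a single directory level.
    for _root, dirs, files in os.walk(directory):
        return dirs + files
    return []


def time_methods(directory, methods):
    """Return {function name: (seconds, entry count)} for one run each."""
    results = {}
    for method in methods:
        start = time.perf_counter()
        count = len(method(directory))
        results[method.__name__] = (time.perf_counter() - start, count)
    return results


METHODS = [names_listdir, names_scandir, names_glob, names_walk]

with tempfile.TemporaryDirectory() as demo_dir:
    for i in range(1000):
        with open(os.path.join(demo_dir, f"{i}.txt"), "w") as handle:
            handle.write("text\n")
    timings = time_methods(demo_dir, METHODS)

# Print the fastest method first.
for name, (seconds, count) in sorted(timings.items(), key=lambda item: item[1][0]):
    print(f"{name}: {seconds:.4f}s for {count} entries")
```

To compare against the numbers in this post, replace the temporary directory with the real target directory and run each method several times.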

Posted by pumaf1 on Wed, 04 Dec 2019 21:11:38 -0800