Python 3 standard library: linecache efficiently reads text files

Keywords: Python

1. linecache efficiently reads text files

When processing Python source files, the linecache module is used in other parts of Python standard library. The cache implementation saves the contents of the file in memory (resolved to a separate line). This API indexes a list to return the requested line, which saves time compared with repeatedly reading the file and parsing the text to find the required line. This module is particularly useful for finding multiple lines in the same file, such as generating a traceback for an error report.

1.1 test data

The public data are as follows:

import os
import tempfile
import linecache

lorem = '''Lorem ipsum dolor sit amet, consectetuer
adipiscing elit.  Vivamus eget elit. In posuere mi non
risus. Mauris id quam posuere lectus sollicitudin
varius. Praesent at mi. Nunc eu velit. Sed augue massa,
fermentum id, nonummy a, nonummy sit amet, ligula. Curabitur
eros pede, egestas at, ultricies ac, apellentesque eu,
tellus.

Sed sed odio sed mi luctus mollis. Integer et nulla ac augue
convallis accumsan. Ut felis. Donec lectus sapien, elementum
nec, condimentum ac, interdum non, tellus. Aenean viverra,
mauris vehicula semper porttitor, ipsum odio consectetuer
lorem, ac imperdiet eros odio a sapien. Nulla mauris tellus,
aliquam non, egestas a, nonummy et, erat. Vivamus sagittis
porttitor eros.'''

def make_tempfile():
    fd, temp_file_name = tempfile.mkstemp()
    os.close(fd)
    with open(temp_file_name, 'wt') as f:
        f.write(lorem)
    return temp_file_name

def cleanup(filename):
    os.unlink(filename)

filename = make_tempfile()

1.2 read specific lines

The file line read by the linecache module starts from 1, but usually the array index of the list starts from 0.

# Pick out the same line from source and cache.
# (Notice that linecache counts from 1)
print('SOURCE:')
print('{!r}'.format(lorem.split('\n')[4]))
print()
print('CACHE:')
print('{!r}'.format(linecache.getline(filename, 5)))

cleanup(filename)

The lines returned include a line break at the end.

1.3 handling blank lines

The return value always contains a line break at the end of the line, so if the text behavior is empty, the return value is a line break.

# Blank lines include the newline
print('BLANK : {!r}'.format(linecache.getline(filename, 8)))

cleanup(filename)

Line 8 of the input file does not contain any text.

1.4 error handling

If the requested line number exceeds the range of legal line numbers in the file, getline() returns an empty string.

# The cache always returns a string, and uses
# an empty string to indicate a line which does
# not exist.
not_there = linecache.getline(filename, 500)
print('NOT THERE: {!r} includes {} characters'.format(
    not_there, len(not_there)))

The input file has only 15 lines, so requesting line 500 is similar to trying to read beyond the end of the file.

When reading a non-existent file, the same way is used.  

# Errors are even hidden if linecache cannot find the file
no_such_file = linecache.getline(
    'this_file_does_not_exist.txt', 1,
)
print('NO FILE: {!r}'.format(no_such_file))

When the caller attempts to read data, the module does not generate an exception.

1.5 reading Python source files

Because of the frequent use of linecache when generating traceback trace records, one of its key features is to be able to specify the base name of the module to find the Python source module in the import path.

# Look for the linecache module, using
# the built in sys.path search.
module_line = linecache.getline('linecache.py', 3)
print('MODULE:')
print(repr(module_line))

# Look at the linecache module source directly.
file_src = linecache.__file__
if file_src.endswith('.pyc'):
    file_src = file_src[:-1]
print('\nFILE:')
with open(file_src, 'r') as f:
    file_line = f.readlines()[2]
print(repr(file_line))

If the cache population code in linecache cannot find the file with the specified name in the current directory, it searches for the module with the specified name in sys.path. For this example, look for linecache.py. Since there is no copy of this file in the current directory, the corresponding file in the standard library will be found.

Posted by webhamster on Sat, 14 Mar 2020 21:20:06 -0700