File Processing in Python

Keywords: encoding Python Windows

Basic operation of files

Pass open() either the absolute path of the file or a path relative to the current working directory; if the file is in the directory the code is run from, the file name alone is enough.

About file encoding: a file must be opened with the same encoding it was written with, otherwise the text may come out garbled or fail to decode.
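To make the encoding rule concrete, here is a minimal sketch. It assumes a scratch file in a temporary directory (the name encoding_demo.txt is made up for illustration), writes Chinese text as GBK, then reads it back once with the matching encoding and once with UTF-8.

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), 'encoding_demo.txt')

# Write the text using GBK.
with open(path, 'w', encoding='gbk') as f:
    f.write('警察\n')

# Same encoding as the writer: the text comes back unchanged.
with open(path, 'r', encoding='gbk') as f:
    same = f.read()

# Different encoding: these GBK bytes are not valid UTF-8,
# so decoding fails instead of returning text.
try:
    with open(path, 'r', encoding='utf-8') as f:
        f.read()
    mismatch_failed = False
except UnicodeDecodeError:
    mismatch_failed = True

print(repr(same), mismatch_failed)
```

Depending on the bytes involved, a mismatch can also decode without an error and silently produce wrong characters, which is harder to notice.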

# Create a file test.txt with the contents of:

11111111
22222222
33333333
  1. open() function: opens the file by requesting it from the operating system through a system call, which occupies a file handle; the return value is a file object

You can specify the mode (read, write, append) and the encoding when opening a file

file = open('test.txt','r',encoding='utf8')
print(file)

# <_io.TextIOWrapper name='test.txt' mode='r' encoding='utf8'>
  2. close() function: closes the open file and releases the occupied file handle
file = open('test.txt','r',encoding='utf8')
file.close()

3. with func() as VAR:

Opening a file inside a with statement eliminates the need to call close() explicitly; the occupied handle is released automatically when the block ends

with open('a.txt','w',encoding='gbk') as file:
    file.write('Police\n')
with open('a.txt','r',encoding='gbk') as file:
    data = file.readlines()
    print(data)

# Creates the file a.txt
# ['Police\n']
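The automatic close can be checked through the closed attribute. The sketch below assumes a scratch file in a temporary directory (with_demo.txt is a made-up name) and also shows that the file is closed when the block is left through an exception.

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

with open(path, 'w', encoding='utf-8') as f:
    f.write('Police\n')
    closed_inside = f.closed      # still open inside the block

closed_after = f.closed           # closed as soon as the block ends

# The file is also closed when the block exits because of an error.
try:
    with open(path, 'r', encoding='utf-8') as g:
        raise RuntimeError('simulated failure')
except RuntimeError:
    closed_after_error = g.closed

print(closed_inside, closed_after, closed_after_error)
```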

Reading operation of files

  1. read() function: reads the contents of the file; the return value is the content that was read

After reading, the cursor is at the end of the file.
Call close() after reading to close the file and release the occupied file handle.

file = open('test.txt','r',encoding='utf8')
data = file.read()
print(data)
file.close()

# 11111111
# 22222222
# 33333333
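read() also accepts an optional size, and once the cursor is at the end of the file it returns an empty string. A small sketch, assuming a scratch copy of the sample file in a temporary directory (read_demo.txt is a made-up name):

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), 'read_demo.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('11111111\n22222222\n33333333\n')

with open(path, 'r', encoding='utf-8') as f:
    first_four = f.read(4)   # read at most 4 characters
    rest = f.read()          # read everything after the cursor
    at_end = f.read()        # cursor is at the end: nothing left

print(repr(first_four), repr(rest), repr(at_end))
```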
  2. readable() function: determines whether the file is readable; the return value is True or False
file = open('test.txt','r',encoding='utf-8')
data1 = file.readable()
print(data1)

# True
  3. readline() function: reads the file one line at a time

Each line returned ends with its newline character, and print() adds another one by default; pass end='' to print() to avoid the extra blank line

file = open('test.txt','r',encoding='utf-8')
print('The first line:',file.readline())
print('The second line:',file.readline(),end='')
print('The third line:',file.readline(),end='')
print('The fourth line:',file.readline(),end='')

# The first line: 11111111
# 
# The second line: 22222222
# The third line: 33333333
# The fourth line: 
  4. readlines() function: reads all remaining lines and returns them as a list of strings
file = open('test.txt','r',encoding='utf-8')
print(file.readlines())

# ['11111111\n', '22222222\n', '33333333\n']
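readlines() loads every line into memory at once. For larger files, a common alternative is to iterate over the file object, which yields one line at a time. A minimal sketch, assuming a scratch copy of the sample file (lines_demo.txt is a made-up name):

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), 'lines_demo.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('11111111\n22222222\n33333333\n')

# Iterating over the file object yields one line per step,
# each still ending with its newline character.
with open(path, 'r', encoding='utf-8') as f:
    stripped = [line.rstrip('\n') for line in f]

print(stripped)
```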

Writing of files

When a file is opened in 'w' mode:
If the file exists, it is emptied and is ready to be written to
If the file does not exist, an empty file is created, ready to be written to

file = open('test.txt','w',encoding='utf8')
print(file)
print(file.readable())

# <_io.TextIOWrapper name='test.txt' mode='w' encoding='utf8'>
# False
  1. write() function: writes content to a file

In text mode, only str data can be written to a file

write() does not add a line break; include the newline character "\n" wherever a new line is needed

file = open('test.txt','w',encoding='utf8')
file.write('11111')
file.write('\n22222\n')
file.close()
file = open('test.txt','r',encoding='utf8')
print(file.read())

# Automatically create test.txt file
# 11111
# 22222
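The str-only rule can be seen by trying to write a number. In this sketch, which assumes a scratch file in a temporary directory (write_demo.txt is a made-up name), the integer is rejected and then written after an explicit str() conversion.

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), 'write_demo.txt')

with open(path, 'w', encoding='utf-8') as f:
    try:
        f.write(12345)            # not a str: rejected in text mode
        rejected = False
    except TypeError:
        rejected = True
    f.write(str(12345) + '\n')    # convert first, add the newline yourself

with open(path, 'r', encoding='utf-8') as f:
    content = f.read()

print(rejected, repr(content))
```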
  2. writable() function: tests whether the file is writable
file = open('test.txt','w',encoding='utf8')
print(file.writable())

# True
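  3. writelines() function: writes each string in a list (or other iterable) to the file, without adding newlines

Because the file is opened in 'w' mode, the contents of an existing file are overwritten.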

file = open('test.txt','w',encoding='utf8')
file.writelines(['33333\n','44444'])
file.close()
file = open('test.txt','r',encoding='utf8')
print(file.read())

# Overwrites the original content of test.txt
# 33333
# 44444
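Despite its name, writelines() does not insert line breaks between items. The sketch below, which assumes a scratch file in a temporary directory (writelines_demo.txt is a made-up name), writes the same list once as-is and once with a newline appended to each item.

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), 'writelines_demo.txt')
items = ['33333', '44444', '55555']

# No newlines in the items: everything ends up on one line.
with open(path, 'w', encoding='utf-8') as f:
    f.writelines(items)
with open(path, 'r', encoding='utf-8') as f:
    joined = f.read()

# Append '\n' to each item to get one item per line.
with open(path, 'w', encoding='utf-8') as f:
    f.writelines(item + '\n' for item in items)
with open(path, 'r', encoding='utf-8') as f:
    separate = f.read().splitlines()

print(repr(joined), separate)
```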

Appending to files

If the file exists, content is added to the end of the file
If the file does not exist, an empty file is created and the content is appended to it

file = open('test1.txt','a',encoding='utf-8')
file.write('11111\n')
file.write('11111\n')
file.close()
file =  open('test1.txt','r',encoding='utf-8')
print(file.read())

# Create the test1.txt file
# 11111
# 11111

Modification of files

The essence of file modification is: first read the contents of the file (if the file exists) into memory, modify them in memory, then overwrite the original file. The final result looks like a direct modification of the file.

with open('a.txt','r',encoding='gbk') as file:
    data = file.readlines()
    print(data)
with open('a.txt','w',encoding='gbk') as file:
    file.writelines([data[0],'Catch thief'])

# The content of a.txt file:
# Police
# Catch thief
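The read, modify, overwrite pattern can be wrapped in a small helper. The function replace_in_file below is a hypothetical name introduced only for this sketch, and it assumes the whole file fits comfortably in memory.

```python
import os
import tempfile

def replace_in_file(path, old, new, encoding='utf-8'):
    """Hypothetical helper: replace text in a file; return how many matches were replaced."""
    # Step 1: read the whole file into memory.
    with open(path, 'r', encoding=encoding) as f:
        text = f.read()
    # Step 2: modify the content in memory.
    updated = text.replace(old, new)
    # Step 3: overwrite the original file with the result.
    with open(path, 'w', encoding=encoding) as f:
        f.write(updated)
    return text.count(old)

# Usage on a scratch file (the name is illustrative only).
path = os.path.join(tempfile.mkdtemp(), 'modify_demo.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('Police\nCatch thief\n')

replaced = replace_in_file(path, 'thief', 'robber')
with open(path, 'r', encoding='utf-8') as f:
    result = f.read()

print(replaced, repr(result))
```

If the program stops between emptying the file and finishing the write, the original content is lost, so for important data it may be safer to write to a second file and swap it in afterwards.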

b (binary) mode of files

In binary mode, do not pass the encoding argument; doing so raises an error (ValueError)
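A quick way to see the rule above in action is to pass encoding together with a binary mode and catch the resulting error. This sketch assumes a scratch file in a temporary directory (binary_demo.txt is a made-up name).

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), 'binary_demo.txt')
with open(path, 'wb') as f:
    f.write(b'11111\n')

# Combining a binary mode with an encoding is rejected by open().
try:
    open(path, 'rb', encoding='utf-8')
    raised = False
except ValueError as exc:
    raised = True
    print('open() refused:', exc)
```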

The strings we see and the bytes stored on disk are converted into each other through an encoding: a string is encoded into bytes when it is written, and bytes are decoded into a string when they are read.

  • There are two ways to convert a string to bytes. The encode() method is the recommended one here.
(1). bytes() function

    //Format:bytes('str',encoding='FORMAT')
    //Usage:
        a = bytes('abc Police',encoding='utf-8')
        print(type(a),'\n',a)

        # <class 'bytes'> 
        # b'abc\xe8\xad\xa6\xe5\xaf\x9f'


(2). encode() function

    //Format:'str'.encode('FORMAT')
    //Usage:
        b = 'abc Thief'.encode('utf-8')
        print(type(b),'\n',b)

        # <class 'bytes'> 
        # b'abc\xe5\xb0\x8f\xe5\x81\xb7'
  1. rb mode of file

Opens the file for reading raw bytes; do not pass encoding, otherwise an error is reported

Note: On Windows, a line break is stored as "\r\n"

with open('test.txt','rb') as f:
    data = f.read()
    print(data)

# b'11111\r\n22222\r\n33333\r\n44444\r\n\xe8\xad\xa6\xe5\xaf\x9f'
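The "\r\n" in the bytes comes from newline translation in text mode. If the exact line ending matters, the newline parameter of open() can pin it down. This sketch assumes a scratch file in a temporary directory (newline_demo.txt is a made-up name) and should behave the same on any platform.

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), 'newline_demo.txt')

# newline='\n': write '\n' exactly as given, with no translation.
with open(path, 'w', encoding='utf-8', newline='\n') as f:
    f.write('11111\n22222\n')
with open(path, 'rb') as f:
    unix_bytes = f.read()

# newline='\r\n': every '\n' written is stored as '\r\n'.
with open(path, 'w', encoding='utf-8', newline='\r\n') as f:
    f.write('11111\n22222\n')
with open(path, 'rb') as f:
    windows_bytes = f.read()

print(unix_bytes)
print(windows_bytes)
```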

decode() function: a built-in method of bytes objects that converts bytes back into a string

# The encoding format of test.txt file is utf-8, and UTF-8 must also be used in decoding.
with open('test.txt','rb') as f:
    data = f.read()
    print(data.decode('utf-8'))

# 11111
# 22222
# 33333
# 44444
# 警察
  2. wb mode of file

No encoding argument is passed to open(); the string is encoded into bytes before it is written

with open('aaa.txt','wb') as f:
    data = f.write('policeman and thief'.encode('utf-8'))

# Test Writing Results
with open('aaa.txt','r',encoding='utf-8') as f1:
    print(f1.read())

# policeman and thief
  3. ab mode of file

No encoding argument is passed to open(); the encoded bytes are appended to the end of the existing file

with open('aaa.txt','ab') as f:
    data = f.write('policeman and thief'.encode('utf-8'))

# Test Writing Results
with open('aaa.txt','r',encoding='utf-8') as f1:
    print(f1.read())

# policeman and thiefpoliceman and thief

Other file methods and attributes

file.name: the file name (an attribute, not a method)

file.closed: whether the file is closed (an attribute, not a method)

file.flush(): flushes buffered content from memory to disk

file.tell(): returns the current cursor position

file.seek(): moves the cursor by a number of bytes (not characters)

From the start of the file (absolute position): file.seek(int,0)
Relative to the current cursor position: file.seek(int,1)
Relative to the end of the file: file.seek(int,2)

file.read() moves the cursor to the end of the file, so a second read() returns nothing unless the cursor is moved back with seek().
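The following sketch puts several of these together: it reads a file, shows that a second read() returns nothing, and rewinds with seek(0). It assumes a scratch file written in binary mode (cursor_demo.txt is a made-up name) so that the byte positions are the same on every platform.

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), 'cursor_demo.txt')
with open(path, 'wb') as f:
    f.write(b'11111\n22222\n')        # 12 bytes

with open(path, 'rb') as f:
    name_matches = (f.name == path)   # name is an attribute
    first = f.read()                  # cursor moves to the end
    pos_after_read = f.tell()
    second = f.read()                 # nothing left to read
    f.seek(0)                         # back to the start
    third = f.read()

closed = f.closed                     # closed is an attribute too

print(pos_after_read, second, third, closed)
```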

    1. file.seek(int,0) function: moves to an absolute position counted from the start of the file
with open('aaa.txt','w',encoding='utf-8') as f:
    f.write('11111\n22222\n')
with open('aaa.txt','r',encoding='utf-8') as f:
    print('Current cursor position:',f.tell())
    print(f.read())
    f.seek(3)   # Equivalent to f.seek(3,0)
    print('Current cursor position:',f.tell())
    print(f.read())
    f.seek(10)  # Equivalent to f.seek(10,0)
    print('Current cursor position:',f.tell())
    print(f.read())

# Output on Windows, where each "\n" is stored as "\r\n":
# Current cursor position: 0
# 11111
# 22222

# Current cursor position: 3
# 11
# 22222

# Current cursor position: 10
# 22
    2. file.seek(int,1) function: moves relative to the current cursor position

Note: The file must be opened in a binary mode such as "rb"; in text mode only an offset of 0 is allowed here

with open('aaa.txt','w',encoding='utf-8') as f:
    f.write('11111\n22222\n')
with open('aaa.txt','rb') as f:
    print('Current cursor position:',f.tell())
    f.seek(3,1)
    print('Current cursor position:',f.tell())
    f.seek(10,1)
    print('Current cursor position:',f.tell())

# Current cursor position: 0
# Current cursor position: 3
# Current cursor position: 13
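The binary-mode requirement can be checked directly. In the sketch below, which assumes a scratch file in a temporary directory (seek_cur_demo.txt is a made-up name), a nonzero relative seek fails in text mode and succeeds in binary mode.

```python
import io
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), 'seek_cur_demo.txt')
with open(path, 'wb') as f:
    f.write(b'11111\n22222\n')

# Text mode: a nonzero offset relative to the cursor is refused.
with open(path, 'r', encoding='utf-8') as f:
    try:
        f.seek(3, 1)
        text_mode_ok = True
    except io.UnsupportedOperation:
        text_mode_ok = False

# Binary mode: relative moves accumulate from the current position.
with open(path, 'rb') as f:
    f.seek(3, 1)
    f.seek(4, 1)
    binary_pos = f.tell()

print(text_mode_ok, binary_pos)
```

The constants os.SEEK_SET, os.SEEK_CUR and os.SEEK_END have the values 0, 1 and 2 and can be used in place of the bare numbers for readability.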
    3. file.seek(int,2) function: moves relative to the end of the file

Note: The file must be opened in a binary mode such as "rb"
Note: The offset is counted from the end of the file, so use a negative number of bytes to move backward from the end

with open('aaa.txt','w',encoding='utf-8') as f:
    f.write('11111\n22222\n')
with open('aaa.txt','rb') as f:
    print('Current cursor position:',f.tell())
    f.seek(-6,2)
    print('Current cursor position:',f.tell())
    print(f.read())

# Current cursor position: 0
# Current cursor position: 8
# b'2222\r\n'
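Seeking from the end is handy for reading only the tail of a file. The function read_tail below is a hypothetical helper written for this sketch; it clamps the offset so that it never seeks before the start of the file.

```python
import os
import tempfile

def read_tail(path, n):
    """Hypothetical helper: return the last n bytes of a file (or all of it if shorter)."""
    with open(path, 'rb') as f:
        f.seek(0, 2)                 # jump to the end to learn the size
        size = f.tell()
        f.seek(-min(n, size), 2)     # step back from the end, but not past the start
        return f.read()

# Usage on a scratch file (the name is illustrative only).
path = os.path.join(tempfile.mkdtemp(), 'tail_demo.txt')
with open(path, 'wb') as f:
    f.write(b'11111\n22222\n')

last_six = read_tail(path, 6)
whole = read_tail(path, 100)
print(last_six, whole)
```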

file.truncate(size) keeps the first size bytes (not characters) of the file and discards the rest

The file must be opened in a writable mode, but not w, w+ or wb: those modes empty the file as soon as it is opened, so nothing is left to keep. Use r+, rb+ or a instead.

with open('aaa.txt','w',encoding='utf-8') as f:
    f.write('11111\n22222\n')
with open('aaa.txt','rb+') as f:     # r+ mode also works
    print('Current cursor position:',f.tell())
    print(f.truncate(10))

# Current cursor position: 0
# 10
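To confirm what truncate() leaves behind, the sketch below truncates a scratch file and reads it back. It assumes a file written in binary mode (truncate_demo.txt is a made-up name) so that the byte count does not depend on the platform's line endings.

```python
import os
import tempfile

# Hypothetical scratch file; the name is illustrative only.
path = os.path.join(tempfile.mkdtemp(), 'truncate_demo.txt')
with open(path, 'wb') as f:
    f.write(b'11111\n22222\n')       # 12 bytes

# rb+ opens for reading and writing without emptying the file.
with open(path, 'rb+') as f:
    new_size = f.truncate(8)         # keep only the first 8 bytes

with open(path, 'rb') as f:
    remaining = f.read()

size_on_disk = os.path.getsize(path)
print(new_size, remaining, size_on_disk)
```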

Posted by BigDaddy13 on Sat, 18 May 2019 00:50:45 -0700