Python Road [Part 5]: File Processing Based on Python



I. File Operation

1. Introduction

A computer system consists of three parts: the hardware, the operating system and application programs.

An application written in Python or any other language that wants to keep data permanently must store it on the hard disk, which means the application has to operate the hardware. Applications cannot operate hardware directly, however; they go through the operating system, which wraps complex hardware operations in simple interfaces for users and applications. A file is the virtual concept the operating system offers applications for working with the hard disk, and users or applications save their data permanently by operating on files.

With the concept of a file, we no longer need to think about the details of operating the hard disk; we only need to focus on the process of operating on the file:

# 1. Open the file, get a file handle and assign it to a variable
# 2. Operate on the file through the handle
# 3. Close the file

The code is as follows:

f=open('Zhu Rui',encoding='utf-8') #Without encoding=, open() uses the platform's default encoding (locale dependent, e.g. GBK on Chinese Windows), so pass the encoding the file was saved in
data=f.read()
print(data)
f.close()

2. In Python

#1. Open the file, get the file handle and assign it to a variable
f=open('a.txt','r',encoding='utf-8') #The default open mode is r

#2. Manipulating files through handles
data=f.read()

#3. Close files
f.close()

 

3. What happens during f = open('file.txt','r')

# 1. The application issues an open(...) system call to the operating system.
# 2. The operating system opens the file and returns a file handle to the application.
# 3. The application assigns the file handle to the variable f.
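To make the system-call step more concrete, here is a minimal sketch that uses the lower-level os.open/os.read/os.close functions, which hand back and accept a plain integer file descriptor, and compares that with fileno() on the object returned by the built-in open(). The temporary file and its contents are assumptions made only for the demo.

```python
import os
import tempfile

# Create a small throwaway file to work with (an assumption for this demo).
fd_tmp, path = tempfile.mkstemp()
os.close(fd_tmp)
with open(path, 'wb') as f:
    f.write(b'hello\n')

# Low level: os.open performs the open system call and returns an int descriptor.
fd = os.open(path, os.O_RDONLY)
raw = os.read(fd, 100)  # read up to 100 bytes through the descriptor
os.close(fd)            # hand the descriptor back to the OS

# High level: open() wraps a descriptor in a file object; fileno() exposes it.
with open(path, encoding='utf-8') as f:
    descriptor = f.fileno()
    text = f.read()

os.remove(path)
print(raw, type(descriptor).__name__, text == raw.decode('utf-8'))
```

The file object returned by open() adds buffering and text decoding on top of the same integer descriptor that the operating system hands out.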

II. File Open Modes

File handle = open('file path','mode')
# 1. Text modes (text mode is the default)
r, read-only mode [default mode; the file must exist, otherwise an exception is raised]
w, write-only mode [not readable; the file is created if it does not exist, and its contents are emptied if it does]
a, append mode [not readable; the file is created if it does not exist; if it exists, new content is only appended to the end]
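A minimal sketch of the three text modes, assuming a throwaway file in a temporary directory (the file name is hypothetical):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')  # hypothetical file name

# 'r' on a missing file raises FileNotFoundError.
try:
    open(path, 'r', encoding='utf-8')
    r_refused = False
except FileNotFoundError:
    r_refused = True

# 'w' creates the file (or empties an existing one) and writes from the start.
with open(path, 'w', encoding='utf-8') as f:
    f.write('first\n')
with open(path, 'w', encoding='utf-8') as f:
    f.write('second\n')  # 'first' is gone: w emptied the file on open

# 'a' keeps the existing content and writes at the end.
with open(path, 'a', encoding='utf-8') as f:
    f.write('appended\n')

with open(path, 'r', encoding='utf-8') as f:
    content = f.read()
print(r_refused, repr(content))  # True 'second\nappended\n'
```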

# 2. For non-text files we can only use b mode. 'b' means operating on bytes (all files are ultimately stored as bytes, so this mode ignores the character encoding of text files, the jpg format of picture files or the avi format of video files)
rb
wb
ab
Note: in b mode, read returns bytes and write must be given bytes, so an encoding cannot be specified.
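A minimal sketch of b mode under the same assumptions (throwaway file, hypothetical name): text has to be encoded before writing, reading returns raw bytes, and combining b mode with an encoding argument is refused.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.bin')  # hypothetical file name

# wb: write() needs bytes, so the text is encoded explicitly.
with open(path, 'wb') as f:
    f.write('你好'.encode('utf-8'))

# rb: read() returns the raw bytes stored on disk.
with open(path, 'rb') as f:
    data = f.read()

# Asking for an encoding in b mode is rejected with ValueError.
try:
    open(path, 'rb', encoding='utf-8')
    encoding_refused = False
except ValueError:
    encoding_refused = True

print(len(data), 'bytes,', len(data.decode('utf-8')), 'characters,', encoding_refused)
# 6 bytes, 2 characters, True
```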

# 3. For reference
"+" means the file can be read and written at the same time
r+: read and write [readable, writable; the file must exist]
w+: write and read [readable, writable; the file is emptied first]
a+: append and read [readable, writable; writes always go to the end]

x, exclusive-create write-only mode [not readable; the file is created if it does not exist, FileExistsError is raised if it exists]
x+, write and read [readable, writable]
xb
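The "+" and x modes are easiest to tell apart by running them on one file. This is a minimal sketch on a throwaway file (the name is hypothetical):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')  # hypothetical file name

# x: create a new file; a second x-open of the same path fails.
with open(path, 'x', encoding='utf-8') as f:
    f.write('abcdef')
try:
    open(path, 'x', encoding='utf-8')
    x_refused = False
except FileExistsError:
    x_refused = True

# r+: the cursor starts at 0, so writing overwrites in place.
with open(path, 'r+', encoding='utf-8') as f:
    f.write('XY')
    f.seek(0)
    r_plus = f.read()

# a+: writes always go to the end; seek(0) before reading.
with open(path, 'a+', encoding='utf-8') as f:
    f.write('!')
    f.seek(0)
    a_plus = f.read()

# w+: the file is emptied on open, then can be read back.
with open(path, 'w+', encoding='utf-8') as f:
    f.write('new')
    f.seek(0)
    w_plus = f.read()

print(x_refused, r_plus, a_plus, w_plus)  # True XYcdef XYcdef! new
```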

 

III. Methods for Operating on Files

Read operations commonly used in file processing:

# f=open('Zhu Rui', encoding='utf-8')
# data=f.read()
# print(data)
# f.close()

#Open file modes (r, w, a)
# f=open('Zhu Rui','r',encoding='utf-8')
# # data=f.read()
# # print(data)
# print(f.readable()) #Check whether the file is readable
#
# f=open('Zhu Rui','w',encoding='utf-8') #Careful: 'w' empties the file
# # data=f.read()
# # print(data)
# print(f.writable()) #Check whether the file is writable
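readable() and writable() simply report what the chosen mode allows. A minimal sketch that checks several modes against one existing throwaway file (the name is hypothetical):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')  # hypothetical file name
with open(path, 'w', encoding='utf-8') as f:
    f.write('data\n')

# Record readable()/writable() for each mode on the same existing file.
results = {}
for mode in ('r', 'a', 'r+', 'a+'):
    with open(path, mode, encoding='utf-8') as f:
        results[mode] = (f.readable(), f.writable())
print(results)
# {'r': (True, False), 'a': (False, True), 'r+': (True, True), 'a+': (True, True)}
```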

file=open('readline',encoding='utf-8')
print('Line 1',file.readline(),end='') #end='' stops print from adding its own newline; the line read already ends with '\n'
print('Line 2',file.readline())
print('Line 3',file.readline())
print('Line 4',file.readline())
print('Line 5',file.readline())
print('Line 6',file.readline())
print('Line 7',file.readline())
print('Line 8',file.readline()) #readline() returns '' once the end of the file is reached
print('Line 9',file.readline())
file.close()
Output:
C:\Python35\python3.exe G:/python_s3/day16/File processing.py
Line 1 111111111111111111111111111
Line 2 2222222222222222222222222

Line 3 33333333333333333333333

Line 4 4444444444444444

Line 5 54545

Line 6 4545641111111111111111

Line 7 3333333333333333343532236236
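Calling readline() by hand for each line gets tedious. A minimal sketch of the usual alternatives, assuming a small throwaway file: a file object can be iterated line by line, and readlines() returns every line as a list.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')  # hypothetical file name
with open(path, 'w', encoding='utf-8') as f:
    f.write('111\n222\n333\n')

with open(path, encoding='utf-8') as f:
    # Iterating a file object yields one line at a time, newline included.
    for number, line in enumerate(f, start=1):
        print('Line', number, line, end='')
    # Once the end is reached, readline() keeps returning ''.
    at_eof = f.readline()

with open(path, encoding='utf-8') as f:
    all_lines = f.readlines()  # the whole file as a list of lines
print(repr(at_eof), all_lines)
```

Iteration reads one line at a time, so it suits large files better than readlines(), which loads everything into memory.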

File processing write operations:

f=open('Zhu Rui','w',encoding='utf8')
f.write('23456789\n')
f.write('1233489087766\n')
f.write('33334444333\n')
f.write('1233\n')
f.writelines(['555\n','6666\n']) #writelines writes each string in the list and adds no newlines itself
# f.write(3) #TypeError: only strings can be written in text mode; use f.write(str(3))
f.close()
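A minimal sketch of the write rules above, on a throwaway file: in text mode write() returns the number of characters written, and non-string values must be converted first.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'out.txt')  # hypothetical file name
with open(path, 'w', encoding='utf-8') as f:
    count = f.write('1233\n')          # text mode: returns characters written
    f.writelines(['555\n', '6666\n'])  # writelines adds no newlines itself
    f.write(str(3) + '\n')             # convert non-strings explicitly
    try:
        f.write(3)                     # an int is rejected
        int_refused = False
    except TypeError:
        int_refused = True

with open(path, encoding='utf-8') as f:
    content = f.read()
print(count, int_refused, repr(content))  # 5 True '1233\n555\n6666\n3\n'
```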

File processing append operation:

f=open('Zhu Rui','a',encoding='utf-8')
f.write('Write to the end of the file') #a mode always writes after the existing content
f.close()

Other modes for file processing (r+):
f=open('xxx','r+',encoding='gbk')
# data=f.read()
# print(data)
# f.write('123sb')

f.write('sb') #In r+ mode the cursor starts at 0, so this overwrites the start of the file instead of inserting
f.close()


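#File modification: content cannot be inserted into or removed from the middle of a file on disk, so read it into memory, change it, then write it back
src_f=open('xxx','r',encoding='gbk')
data=src_f.readlines() #List of lines
src_f.close()

# for i in data:
#     print(i)
print(data)
dst_f=open('xxx','w',encoding='gbk') #w empties the file before writing
# dst_f.writelines(data)
dst_f.write(data[0]) #Write back only the first line
dst_f.close()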
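Writing straight back into the file being read loses data if something fails midway. A common alternative, sketched here under stated assumptions, is to write the modified content to a separate file and then swap it in with os.replace (atomic on POSIX when both paths are on the same filesystem). The function name replace_in_file, the ".tmp" suffix and the sample text are hypothetical.

```python
import os
import tempfile

def replace_in_file(path, old, new, encoding='utf-8'):
    """Hypothetical helper: replace text line by line via a temporary file."""
    tmp_path = path + '.tmp'  # hypothetical naming scheme for the temp file
    with open(path, 'r', encoding=encoding) as src, \
            open(tmp_path, 'w', encoding=encoding) as dst:
        for line in src:
            dst.write(line.replace(old, new))
    os.replace(tmp_path, path)  # swap the new file in for the original

# Usage on a throwaway file.
path = os.path.join(tempfile.mkdtemp(), 'xxx')
with open(path, 'w', encoding='utf-8') as f:
    f.write('apple pie\napple juice\n')
replace_in_file(path, 'apple', 'orange')
with open(path, encoding='utf-8') as f:
    result = f.read()
print(repr(result))  # 'orange pie\norange juice\n'
```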

with open('a.txt','w') as f: #with closes the file automatically when the block ends, so no close() is needed
    f.write('1111\n')


#Copy xxx to xxx_new; one with statement can manage several files
with open('xxx','r',encoding='gbk') as src_f,\
        open('xxx_new','w',encoding='gbk') as dst_f:
    data=src_f.read()
    dst_f.write(data)
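A minimal sketch showing that with really does close the file, even when the block raises an exception (throwaway file, hypothetical name):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'a.txt')  # hypothetical file name
with open(path, 'w', encoding='utf-8') as f:
    f.write('1111\n')
print(f.closed)  # True: the with block closed the file on exit

try:
    with open(path, encoding='utf-8') as g:
        raise RuntimeError('simulated failure')
except RuntimeError:
    pass
print(g.closed)  # True: closed even though the block raised
```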

f=open('a.txt')
print(f.encoding) #Encoding used to decode this file; without encoding= it is the platform default
f.close()
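Why the encoding matters: the same bytes decode correctly only with the codec they were written in. A minimal sketch, assuming a throwaway file written in GBK:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'gbk.txt')  # hypothetical file name
with open(path, 'w', encoding='gbk') as f:
    f.write('中文')
    written_as = f.encoding  # the encoding name passed to open()

# Decoding GBK bytes as UTF-8 fails.
try:
    with open(path, encoding='utf-8') as f:
        f.read()
    wrong_ok = True
except UnicodeDecodeError:
    wrong_ok = False

# Decoding with the matching codec succeeds.
with open(path, encoding='gbk') as f:
    text = f.read()
print(written_as, wrong_ok, text == '中文')  # gbk False True
```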

 

IV. Moving the Cursor in Files

1. read(3):

1. When the file is opened in text mode, it reads three characters.
2. When the file is opened in b mode, it reads three bytes.

2. All other cursor movement in a file, such as seek, tell and truncate, is measured in bytes.

Notes:
1. seek has three modes (its whence argument): 0 (from the start), 1 (from the current position) and 2 (from the end). Modes 1 and 2 with a non-zero offset only work in b mode, and in either mode the offset is counted in bytes.

2. truncate cuts the file short, so the file must be opened in a writable mode. Do not use w or w+, because they empty the file as soon as it is opened; test truncate in r+ or a+ mode instead.
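A minimal sketch of the three seek modes and of truncate, on a throwaway 10-byte file:

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'seek.txt')  # hypothetical file name
with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    f.seek(2)      # whence=0: 2 bytes from the start
    f.seek(3, 1)   # whence=1: 3 more bytes from the current position
    at_5 = f.read(1)
    f.seek(-2, 2)  # whence=2: 2 bytes before the end
    tail = f.read()

# In text mode, non-zero end-relative seeks are refused.
with open(path, encoding='utf-8') as f:
    try:
        f.seek(-2, 2)
        text_seek_ok = True
    except io.UnsupportedOperation:
        text_seek_ok = False

# truncate(n) in r+ mode keeps the first n bytes and drops the rest.
with open(path, 'r+b') as f:
    f.truncate(4)
with open(path, 'rb') as f:
    remaining = f.read()

print(at_5, tail, text_seek_ok, remaining)  # b'5' b'89' False b'0123'
```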
#Implement tail -f with seek
import time
with open('test','rb') as f:
    f.seek(0,2) #Start at the end of the file
    while True:
        line=f.readline()
        if line:
            print(line.decode('utf-8'),end='')
        else:
            time.sleep(0.2) #No new data yet: wait and poll again
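The same polling idea can be packaged as a generator so that the caller decides what to do with each new line. This is a minimal sketch, not a full tail replacement: the name follow and the max_idle parameter are hypothetical, and max_idle (stop after that many consecutive empty polls) exists only so the loop can end in a demo. A line that is still being written may be returned in pieces.

```python
import os
import tempfile
import time

def follow(f, poll_interval=0.2, max_idle=None):
    """Yield decoded lines appended to the binary file f, in the spirit of tail -f.

    max_idle is a hypothetical knob for this sketch: stop after that many
    consecutive empty polls (None means poll forever).
    """
    idle = 0
    while True:
        line = f.readline()
        if line:
            idle = 0
            yield line.decode('utf-8')
        elif max_idle is not None and idle >= max_idle:
            return
        else:
            idle += 1
            time.sleep(poll_interval)

# Usage: start at the end of the file, then append two lines from another handle.
path = os.path.join(tempfile.mkdtemp(), 'test')
with open(path, 'wb') as w:
    w.write(b'old line\n')
with open(path, 'rb') as f:
    f.seek(0, 2)  # skip the existing content
    with open(path, 'ab') as w:
        w.write(b'new 1\nnew 2\n')
    new_lines = list(follow(f, poll_interval=0.01, max_idle=2))
print(new_lines)  # ['new 1\n', 'new 2\n']
```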
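#Print the last line of a file by seeking backwards from the end
with open('log file','rb') as f:
    size=f.seek(0,2) #seek() returns the new position, i.e. the file size in bytes
    offs=-10
    while True:
        f.seek(max(offs,-size),2) #Never seek before the start of the file
        data=f.readlines()
        if len(data) > 1 or -offs >= size: #A complete last line was found, or the whole file was read
            if data:
                print('The last line of the file is %s' %(data[-1].decode('utf-8')))
            break
        offs*=2 #Not enough data yet: double the distance and retry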

Posted by cbrknight on Tue, 14 May 2019 12:58:36 -0700