This article teaches you to batch operate Excel files with Python

When you need to process Excel files in batches at work, are you still processing them manually one by one? Learn the following automatic batch processing methods and bid farewell to the mechanical and inefficient work!

01

OS library introduction

OS (Operation System) refers to the operating system. In Python, the OS library mainly provides some functions to interact with the operating system, that is, the computer system. Many automation operations rely on the functions of the library.

02

Basic operations of OS Library

1 get current working path

In Chapter 2 of the book compare Excel and easily learn Python report automation, we introduced how to install Anaconda and how to write code using Jupyter Notebook.

But do you know where the code written in the Jupiter notebook is stored on the computer? Do many readers not know? It's easy to know. Just enter the following code in Jupyter Notebook and run it.

import os
os.getcwd()

Running the above code will get the following results.

'C:\\Users\\zhangjunhong\\python library\\Python Report automation'

The above file path is the path where the Notebook code file is located. Under which file path your code is stored, you will get the corresponding results.

2 get all file names under a folder

We often import local computer files into Python for processing. We need to know the storage path and file name of the file before importing. If there are only one or two files, you can manually enter the file name and file path, but sometimes there are many files to import. Manual input efficiency will be relatively low, and code is needed to improve efficiency.

There are four Excel files in the folder shown in Figure 1.

Figure 1

We can use os.listdir(path) to get all the file names under the path. The specific implementation code is as follows.

import os
os.listdir('D:/Data-Science/share/data/test')

Running the above code will get the following results.

['3 Monthly performance-Ming Ming Zhang.xlsx', 'Li Dan's performance in March.xlsx', 'Wang Yueyue-3 Monthly performance.xlsx', 'Chen Kai's performance in March.xlsx']

3 rename the file

Renaming files is a high-frequency requirement. We can use os.rename('old_name','new_name ') to rename files. old_ Name is the old file name, new_ Name is the new file name.

Let's first create a new folder called test under the test folder_ Old file, and then use the following code to put test_ Change the old file name to test_new.

os.rename('D:/Data-Science/share/data/test/test_old.xlsx'
          ,'D:/Data-Science/share/data/test/test_new.xlsx')

After running the above code, go to the test folder to see test_ The old file no longer exists, only test_new.

4 create a folder

When we want to create a new folder under the specified path, we can choose to create a new folder manually or by using os.mkdir(path). We only need to specify the specific path.

When the following code is run, it means that a new folder named test11 is created under the D: / Data Science / share / data path. The effect is shown in Figure 2.

os.mkdir('D:/Data-Science/share/data/test11')

Figure 2

5 delete a folder

Deleting a folder corresponds to creating a folder. Of course, we can also choose to delete a folder manually, or use OS. Removediers (path) to delete, indicating the path to delete.

When the following code is run, it means that the test11 folder just created is deleted.

os.removedirs('D:/Data-Science/share/data/test11')

6 delete a file

Deleting a file is to delete a specific file, while deleting a folder is to delete the entire folder, including all files in the folder. os.remove(path) is used to delete the file, indicating the path where the file is located.

When we run the following code, it means that we will test in the test folder_ The new file was deleted.

os.remove('D:/Data-Science/share/data/test/test_new.xlsx')

03

Batch operation

1. Batch read multiple files under a folder

Sometimes a folder contains multiple similar files, such as the performance files of different people in a department. We need to read these files into Python in batch and then process them.

We learned earlier how to read a file with load_work(), or read_excel(), no matter which method is adopted, you only need to indicate the path of the file to be read.

How to batch read? First get all the file names under the folder, and then traverse and read each file.

The specific implementation code is as follows.

import pandas as pd
#Get all file names under the folder
name_list = os.listdir('D:/Data-Science/share/data/test')
#for loop traversal read
for i in name_list:
df = pd.read_excel(r'D:/Data-Science/share/data/test/' + i)
print('{}Read complete!'.format(i))

If you want to operate on the data of the read file, you only need to place the specific operation implementation code after the read code. For example, we need to delete duplicate values for each read in file. The implementation code is as follows.

import pandas as pd
#Get all file names under the folder
name_list = os.listdir('D:/Data-Science/share/data/test')
#for loop traversal read
for i in name_list:
df = pd.read_excel(r'D:/Data-Science/share/data/test/' + i)
df = df.drop_duplicates() #Delete duplicate value processing
print('{}Read complete!'.format(i))

2 create folders in batch

Sometimes we need to create specific folders according to specific topics, such as 12 folders according to months. We described how to create a single folder earlier. If you want to create multiple folders in batch, you only need to traverse and execute the statements of a single folder. The specific implementation code is as follows.

month_num = ['1 month','2 month','3 month','4 month','5 month','6 month','7 month','8 month','9 month','10 month','11
 month','12 month']
for i in month_num:
os.mkdir('D:/Data-Science/share/data/' + i)
print('{}Creation completed!'.format(i))

After running the above code, 12 folders will be created under the file path, as shown in Figure 3.

Figure 3

3 batch rename files

Sometimes we have many files with the same subject, but the file names of these files are confused. For example, the file shown in Figure 4 is the performance of each employee in March, but the naming format is different. We should unify it into the format of "name + performance in March". To achieve this effect, you can rename files through the operations you learned earlier. Previously, we only introduced the operations on a single file. How can you batch operate multiple files at the same time?

Figure 4

The specific implementation code is as follows.

import os
#Gets all file names under the specified folder
old_name = os.listdir('D:/Data-Science/share/data/test')
name = ["Ming Ming Zhang","Li Dan","
Yue
 Wang Yue
","Kai Chen"]
#Traverse each name
for n in name:
#Traverse each old file name
for o in old_name:
#Determine whether the old file name contains a specific name
#Rename if included
if n in o:
os.rename('D:/Data-Science/share/data/test/' + o, 'D:/Data-Science/
share/data/test/' + n +"3 Monthly performance.xlsx")

After running the above code, you can see that all the original file names under the folder have been renamed, as shown in Figure 5.

Figure 5

04

Other batch operations

1 batch merge multiple files

Under the folder shown in Figure 6, there are monthly sales daily reports from January to June. It is known that the structure of these daily reports is the same, with only two columns of "date" and "sales volume". Now we want to combine these daily reports of different months into one.

Figure 6

The specific implementation code of merging monthly sales daily reports into one document is as follows.

import os
import pandas as pd
#Gets all file names under the specified file
name_list = os.listdir('D:/Data-Science/share/data/sale_data')
#Create an empty DataFrame with the same structure
df_o = pd.DataFrame({'date':[],'sales volume':[]})
#Traverse and read each file
for i in name_list:
df = pd.read_excel(r'D:/Data-Science/share/data/sale_data/' + i)
#Longitudinal splicing
df_v = pd.concat([df_o,df])
#Assign the spliced result to df_o
df_o = df_v
df_o

Run the above code and you will get the merged file df_o. As shown in Figure 7.

Figure 7

2 split a document into multiple documents according to the specified column

The above describes how to merge multiple files in batch. We also have the inverse requirement of merging multiple files, that is, splitting a file into multiple files according to the specified column.

As for the above data set, let's assume that we have obtained a document from January to June. In addition to the columns of "date" and "sales volume", this document also has a column of "month". What you need to do now is to split this file into multiple files according to the "month" column, and store each month as a separate file.

The specific implementation code is as follows.

#Generate a new month column
df_o['month'] = df_o['date'].apply(lambda x:x.month)
#Traverse each month value
for m in df_o['month'].unique():
#Filter the data of specific month values
df_month = df_o[df_o['month'] == m]
#Save the filtered data
df_month.to_csv(r'D:/Data-Science/share/data/split_data/' + str (m) + 'Monthly sales daily_After splitting.csv')

By running the above code, you can see multiple split files in the target path, as shown in Figure 8.

Figure 8

*This article is excerpted from the book "compare Excel and easily learn Python report automation". For more information about report automation using python, welcome to read this book!

Posted by ten31studios on Wed, 17 Nov 2021 00:33:09 -0800