Python playing with Excel: automatic function filling and data sorting

Keywords: Programming Excel Python Lambda JSON

In the work, we often use excel to deal with data and display, but for some work we can use the help of programs to achieve efficient problem-solving effect.

For example, the json returned from the interface is parsed and saved in Excel, the data in excel is processed according to certain rules and then written into a new EXCEL file, and the content from multiple files is summarized into an excel, etc.

Usually in the work, I need to write the complicated data into excel. Unless the built-in formula of Excel can be processed, I will think of using python for processing at the first time.

When processing sea (large) data, it may not be so convenient to use Excel, especially when there are many columns and one screen display is not complete. If we can master Python to operate excel skillfully, we can greatly improve our work efficiency.

Easy to use the automatic filling function of explosion

Next, we imitate Excel and use Pandas to realize the function auto fill function and calculate columns.

Read the document and calculate the total price

import pandas as pd
#Read Excel file
books = pd.read_excel('book1.xlsx',index_col='ID')
NAME1 Total price of unit price and quantity
1 Book1 10.5 10 NaN
2 Book2 11.0 10 NaN
3 Book3 11.5 10 NaN
4 Book4 12.0 10 NaN

In Excel, we can write functions and fill them automatically, which is also convenient.

However, in Python, only one line of code is needed. Although it seems that it is not as fast as Excel and easy to operate, when the data volume is large and the calculation is complicated, the advantages of pandas data operation are highlighted.

1. books['Total price']=books['Unit Price']*books['number']

You can also use the following statement. Obviously, the above method is relatively simple, but the following method is very suitable for calculating from a certain segment.

2. for i in books.index: books['Total price'].at[i]=books['Unit Price'].at[i]*books['number'].at[i]

Operation result:

NAME1 Total price of unit price and quantity
1 Book1 10.5 10 105.0
2 Book2 11.0 10 110.0
3 Book3 11.5 10 115.0
4 Book4 12.0 10 120.0

Now, let's give each book a 2 yuan increase

The first method:

books['Unit Price']= books['Unit Price']+2

The second method: using lambda expressions

books['Unit Price']=books['Unit Price'].apply(lambda x:x+2)
     NAME1    Unit price quantity     Total price
1    Book1  12.5  10  105.0
2    Book2  13.0  10  110.0
3    Book3  13.5  10  115.0
4    Book4  14.0  10  120.0

Whatever you like

Data sorting, function sort_values usage:

DataFrame.sort_values(by='##',axis=0,ascending=True, inplace=False, na_position='last')

Parameter Description:

Parameter description
 by specifies the column name (axis=0 or 'index') or index value (axis=1 or 'columns')
Axis if axis=0 or 'index', it will be sorted according to the data size in the specified column; if axis=1 or 'columns', it will be sorted according to the data size in the specified index. The default axis=0
 Whether ascending is arranged in ascending order according to the array of the specified column. The default value is True, that is, ascending
 Does inplace replace the original data with the sorted data set? The default value is False, that is, do not replace
 na_position {'first','last'}, set the display position of the missing value

Let's rank the prices in descending order

import  pandas as pdproducts = pd.read_excel('book2.xlsx',index_col='ID')

Highlight: if you need to sort the two conditions, by=[list] list is OK, isn't it very simple

products.sort_values(by=['Is it worth buying','Price'],inplace=True,ascending= [True,False])

Operation result:

ID  Is the price of goods worth buying
10 product10 201 no
5 product5 65 no
7 product7 45 no
9 product9 199 yes
8 product8 156 yes
3 product3 123 yes
4 product4 111 yes
1 product1 100 yes
2 product2 88 yes
6 product6 76 yes

Preparation before drawing, data filtering and filtering (loc function)

The apply function is the function with the highest degree of freedom among all functions in pandas.

The function is as follows:

DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds) the most useful parameter of this function is the first parameter, which is the function, equivalent to the function pointer of C/C + +.

This function needs to be implemented by itself. The input parameters of the function are determined according to axis. For example, if axis = 1, a row of data will be used as Series data The structure is passed to the function we implement. In the function, we implement the calculation between different attributes of Series, and return a result. Then the apply function will automatically traverse the data of each row of DataFrame, and finally combine all the results into a Series data structure and return.

Objective: to read the student information table and screen out the students who meet the age range (18-20 years old) and the grade range (85-100 points).

import  pandas as pd
def age18_20(age):
   #18 < = age < = 20 Python specific writing
    return   18<=age<=20
def  level_s(score):
    return 85<=score<=100
stu = pd.read_excel('students.xlsx',index_col='ID')

Operation result:

    Name age score
 5 Li 18 88
 8 Wang 3 19 86

You can optimize the above code, use lambda expressions, try not to use functions, which makes the code concise, efficient and tall.

Data visualization on high

import  pandas as pd
import matplotlib.pyplot as plt
#Read data
stu = pd.read_excel('students.xlsx',index_col='ID')
#Generate histogram'full name',y='achievement',color="orange",title='Student transcript')
#Or ['name'], Stu ['score'], color="orange")
#Tight display
#display picture

Don't panic. I have a set of learning materials, including 40 + E-books, 800 + teaching videos, involving Python foundation, reptile, framework, data analysis, machine learning, etc. I'm not afraid you won't learn! Python learning materials

Pay attention to the official account [Python circle].


Posted by 3r0ss on Sun, 24 May 2020 23:33:32 -0700