In the work, we often use excel to deal with data and display, but for some work we can use the help of programs to achieve efficient problem-solving effect.
For example, the json returned from the interface is parsed and saved in Excel, the data in excel is processed according to certain rules and then written into a new EXCEL file, and the content from multiple files is summarized into an excel, etc.
Usually in the work, I need to write the complicated data into excel. Unless the built-in formula of Excel can be processed, I will think of using python for processing at the first time.
When processing sea (large) data, it may not be so convenient to use Excel, especially when there are many columns and one screen display is not complete. If we can master Python to operate excel skillfully, we can greatly improve our work efficiency.
Easy to use the automatic filling function of explosion
Next, we imitate Excel and use Pandas to realize the function auto fill function and calculate columns.
Read the document and calculate the total price
import pandas as pd #Read Excel file books = pd.read_excel('book1.xlsx',index_col='ID') print(books) ************************ NAME1 Total price of unit price and quantity ID 1 Book1 10.5 10 NaN 2 Book2 11.0 10 NaN 3 Book3 11.5 10 NaN 4 Book4 12.0 10 NaN
In Excel, we can write functions and fill them automatically, which is also convenient.
However, in Python, only one line of code is needed. Although it seems that it is not as fast as Excel and easy to operate, when the data volume is large and the calculation is complicated, the advantages of pandas data operation are highlighted.
1. books['Total price']=books['Unit Price']*books['number']
You can also use the following statement. Obviously, the above method is relatively simple, but the following method is very suitable for calculating from a certain segment.
2. for i in books.index: books['Total price'].at[i]=books['Unit Price'].at[i]*books['number'].at[i]
Operation result:
NAME1 Total price of unit price and quantity ID 1 Book1 10.5 10 105.0 2 Book2 11.0 10 110.0 3 Book3 11.5 10 115.0 4 Book4 12.0 10 120.0
Now, let's give each book a 2 yuan increase
The first method:
books['Unit Price']= books['Unit Price']+2
The second method: using lambda expressions
books['Unit Price']=books['Unit Price'].apply(lambda x:x+2)
NAME1 Unit price quantity Total price ID 1 Book1 12.5 10 105.0 2 Book2 13.0 10 110.0 3 Book3 13.5 10 115.0 4 Book4 14.0 10 120.0
Whatever you like
Data sorting, function sort_values usage:
DataFrame.sort_values(by='##',axis=0,ascending=True, inplace=False, na_position='last')
Parameter Description:
Parameter description by specifies the column name (axis=0 or 'index') or index value (axis=1 or 'columns') Axis if axis=0 or 'index', it will be sorted according to the data size in the specified column; if axis=1 or 'columns', it will be sorted according to the data size in the specified index. The default axis=0 Whether ascending is arranged in ascending order according to the array of the specified column. The default value is True, that is, ascending Does inplace replace the original data with the sorted data set? The default value is False, that is, do not replace na_position {'first','last'}, set the display position of the missing value
Let's rank the prices in descending order
import pandas as pdproducts = pd.read_excel('book2.xlsx',index_col='ID') products.sort_values(by='Price',inplace=True,ascending=False) print(products)
Highlight: if you need to sort the two conditions, by=[list] list is OK, isn't it very simple
products.sort_values(by=['Is it worth buying','Price'],inplace=True,ascending= [True,False])
Operation result:
ID Is the price of goods worth buying 10 product10 201 no 5 product5 65 no 7 product7 45 no 9 product9 199 yes 8 product8 156 yes 3 product3 123 yes 4 product4 111 yes 1 product1 100 yes 2 product2 88 yes 6 product6 76 yes
Preparation before drawing, data filtering and filtering (loc function)
The apply function is the function with the highest degree of freedom among all functions in pandas.
The function is as follows:
DataFrame.apply(func, axis=0, broadcast=False, raw=False, reduce=None, args=(), **kwds) the most useful parameter of this function is the first parameter, which is the function, equivalent to the function pointer of C/C + +.
This function needs to be implemented by itself. The input parameters of the function are determined according to axis. For example, if axis = 1, a row of data will be used as Series data The structure is passed to the function we implement. In the function, we implement the calculation between different attributes of Series, and return a result. Then the apply function will automatically traverse the data of each row of DataFrame, and finally combine all the results into a Series data structure and return.
Objective: to read the student information table and screen out the students who meet the age range (18-20 years old) and the grade range (85-100 points).
import pandas as pd def age18_20(age): #18 < = age < = 20 Python specific writing return 18<=age<=20 def level_s(score): return 85<=score<=100 stu = pd.read_excel('students.xlsx',index_col='ID') stu=stu.loc[stu['Age'].apply(age18_20)].loc[stu['achievement'].apply(level_s)] print(stu)
Operation result:
Name age score ID 5 Li 18 88 8 Wang 3 19 86
You can optimize the above code, use lambda expressions, try not to use functions, which makes the code concise, efficient and tall.
Data visualization on high
import pandas as pd import matplotlib.pyplot as plt #Read data stu = pd.read_excel('students.xlsx',index_col='ID') #sort stu.sort_values(by='achievement',inplace=True) #Generate histogram stu.plot.bar(x='full name',y='achievement',color="orange",title='Student transcript') #Or plt.bar(stu ['name'], Stu ['score'], color="orange") #Tight display plt.tight_layout() #display picture plt.show()
Don't panic. I have a set of learning materials, including 40 + E-books, 800 + teaching videos, involving Python foundation, reptile, framework, data analysis, machine learning, etc. I'm not afraid you won't learn! https://shimo.im/docs/JWCghr8prjCVCxxK/ Python learning materials
Pay attention to the official account [Python circle].