Fundamentals of Machine Learning - Basic Use of pandas
Keywords:
Python
Excel
Introduction to pandas
pandas (Python Data Analysis Library) is a tool built on top of NumPy, created to solve data analysis tasks. It incorporates a large number of libraries and some standard data models, providing the tools needed to operate on large data sets efficiently, and it offers a large number of functions and methods that let us process data quickly and conveniently.
Data structures in pandas:
Series: A one-dimensional array, similar to a one-dimensional array in NumPy and also to Python's basic List structure. The difference is that the elements of a List can have different data types, while an ndarray or Series only stores elements of the same data type, so memory can be used more effectively and operations run more efficiently.
Time-Series: Series indexed by time.
DataFrame: A two-dimensional tabular data structure. Many of its functions are similar to data.frame in R. A DataFrame can be understood as a container for Series. The DataFrame is the main topic below.
Panel: A three-dimensional array, which can be understood as a container for DataFrames. (A short construction example follows this list.)
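To make these structures concrete, here is a minimal sketch that builds a Series and a DataFrame by hand (the column names and values are invented for illustration):
import pandas as pd
s = pd.Series([0.25, 0.5, 0.75], index=['a', 'b', 'c']) # one-dimensional, homogeneous dtype, with an index
df = pd.DataFrame({'name': ['BUTTER WITH SALT', 'CHEESE BRIE'], 'Energ_Kcal': [717, 334]}) # two-dimensional: a container of Series sharing one index
print(s)
print(df)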
This article mainly introduces DataFrame and Series, including how to load data into a DataFrame.
The data files used in this article: Basic use of pandas.zip
This article only introduces the basic use of pandas through examples. For further study, please refer to the official pandas documentation.
DataFrame in pandas
Using pandas, we can easily perform routine operations on two-dimensional tabular data.
1. Using pandas to read CSV (or Excel, etc.) files
import pandas
food_info = pandas.read_csv("food_info.csv") # read a csv file
# to read an Excel file, use pandas.read_excel() instead
print(type(food_info)) # food_info is a DataFrame object
print(food_info.dtypes) # the data type of each column
<class 'pandas.core.frame.DataFrame'>
NDB_No int64
Shrt_Desc object
Water_(g) float64
Energ_Kcal int64
Protein_(g) float64
Lipid_Tot_(g) float64
Ash_(g) float64
Carbohydrt_(g) float64
Fiber_TD_(g) float64
Sugar_Tot_(g) float64
Calcium_(mg) float64
Iron_(mg) float64
Magnesium_(mg) float64
Phosphorus_(mg) float64
Potassium_(mg) float64
Sodium_(mg) float64
Zinc_(mg) float64
Copper_(mg) float64
Manganese_(mg) float64
Selenium_(mcg) float64
Vit_C_(mg) float64
Thiamin_(mg) float64
Riboflavin_(mg) float64
Niacin_(mg) float64
Vit_B6_(mg) float64
Vit_B12_(mcg) float64
Vit_A_IU float64
Vit_A_RAE float64
Vit_E_(mg) float64
Vit_D_mcg float64
Vit_D_IU float64
Vit_K_(mcg) float64
FA_Sat_(g) float64
FA_Mono_(g) float64
FA_Poly_(g) float64
Cholestrl_(mg) float64
dtype: object
Output
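As the comment above notes, Excel files are read the same way with pandas.read_excel(); a minimal sketch, assuming a hypothetical food_info.xlsx exists (reading .xlsx files also requires an engine such as openpyxl to be installed):
import pandas
food_info_xlsx = pandas.read_excel("food_info.xlsx") # returns a DataFrame, just like read_csv
print(food_info_xlsx.dtypes)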
2. Accessing data
food_info.head(10) # get the first 10 rows of data; the default is 5 rows
# first_rows = food_info.head()
# first_rows
# food_info.tail(8) # get the last 8 rows of data; the default is 5 rows
# print(food_info.tail())
print(food_info.columns) # get the field names (i.e. the table header) of food_info
# print(food_info.shape) # the structure of this file is 8618 rows × 36 columns
Index(['NDB_No', 'Shrt_Desc', 'Water_(g)', 'Energ_Kcal', 'Protein_(g)',
'Lipid_Tot_(g)', 'Ash_(g)', 'Carbohydrt_(g)', 'Fiber_TD_(g)',
'Sugar_Tot_(g)', 'Calcium_(mg)', 'Iron_(mg)', 'Magnesium_(mg)',
'Phosphorus_(mg)', 'Potassium_(mg)', 'Sodium_(mg)', 'Zinc_(mg)',
'Copper_(mg)', 'Manganese_(mg)', 'Selenium_(mcg)', 'Vit_C_(mg)',
'Thiamin_(mg)', 'Riboflavin_(mg)', 'Niacin_(mg)', 'Vit_B6_(mg)',
'Vit_B12_(mcg)', 'Vit_A_IU', 'Vit_A_RAE', 'Vit_E_(mg)', 'Vit_D_mcg',
'Vit_D_IU', 'Vit_K_(mcg)', 'FA_Sat_(g)', 'FA_Mono_(g)', 'FA_Poly_(g)',
'Cholestrl_(mg)'],
dtype='object')
Output 1
# print(food_info.loc[0]) # get the data in row 0
print(food_info.loc[6000]) # get the data in row 6000
# food_info.loc[10000] would try to get row 10000; since that exceeds the length of the data file, it raises KeyError: 'the label [10000] is not in the [index]'
NDB_No 18995
Shrt_Desc KELLOGG'S EGGO BISCUIT SCRAMBLERS BACON EGG & CHS
Water_(g) 42.9
Energ_Kcal 258
Protein_(g) 8.8
Lipid_Tot_(g) 7.9
Ash_(g) NaN
Carbohydrt_(g) 38.3
Fiber_TD_(g) 2.1
Sugar_Tot_(g) 4.7
Calcium_(mg) 124
Iron_(mg) 2.7
Magnesium_(mg) 14
Phosphorus_(mg) 215
Potassium_(mg) 225
Sodium_(mg) 610
Zinc_(mg) 0.5
Copper_(mg) NaN
Manganese_(mg) NaN
Selenium_(mcg) NaN
Vit_C_(mg) NaN
Thiamin_(mg) 0.3
Riboflavin_(mg) 0.26
Niacin_(mg) 2.4
Vit_B6_(mg) 0.02
Vit_B12_(mcg) 0.1
Vit_A_IU NaN
Vit_A_RAE NaN
Vit_E_(mg) 0
Vit_D_mcg 0
Vit_D_IU 0
Vit_K_(mcg) NaN
FA_Sat_(g) 4.1
FA_Mono_(g) 1.5
FA_Poly_(g) 1.1
Cholestrl_(mg) 27
Name: 6000, dtype: object
Output 2
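.loc selects rows by label; pandas also provides .iloc, which selects by integer position. The two coincide here only because the default index is 0, 1, 2, …; note also that label slices include both endpoints while position slices exclude the end. A short sketch:
print(food_info.iloc[6000]) # position-based; same row as loc[6000] under the default integer index
print(food_info.loc[3:6].shape) # label slice: both endpoints included, 4 rows
print(food_info.iloc[3:6].shape) # position slice: end excluded, 3 rows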
# food_info.loc[3:6] # gets rows 3 to 6
two_five_ten = [2, 5, 10]
print(food_info.loc[two_five_ten]) # get rows 2, 5 and 10
NDB_No Shrt_Desc Water_(g) Energ_Kcal Protein_(g) \
2 1003 BUTTER OIL ANHYDROUS 0.24 876 0.28
5 1006 CHEESE BRIE 48.42 334 20.75
10 1011 CHEESE COLBY 38.20 394 23.76
Lipid_Tot_(g) Ash_(g) Carbohydrt_(g) Fiber_TD_(g) Sugar_Tot_(g) \
2 99.48 0.00 0.00 0.0 0.00
5 27.68 2.70 0.45 0.0 0.45
10 32.11 3.36 2.57 0.0 0.52
... Vit_A_IU Vit_A_RAE Vit_E_(mg) Vit_D_mcg Vit_D_IU \
2 ... 3069.0 840.0 2.80 1.8 73.0
5 ... 592.0 174.0 0.24 0.5 20.0
10 ... 994.0 264.0 0.28 0.6 24.0
Vit_K_(mcg) FA_Sat_(g) FA_Mono_(g) FA_Poly_(g) Cholestrl_(mg)
2 8.6 61.924 28.732 3.694 256.0
5 2.3 17.410 8.013 0.826 100.0
10 2.7 20.218 9.280 0.953 95.0
Output 3
# food_info['Shrt_Desc'] # gets the column whose field name is 'Shrt_Desc'
ndb_col = food_info['NDB_No'] # gets the column whose field name is 'NDB_No'
# print(ndb_col)
col_name = 'Shrt_Desc'
print(food_info[col_name])
0 BUTTER WITH SALT
1 BUTTER WHIPPED WITH SALT
2 BUTTER OIL ANHYDROUS
3 CHEESE BLUE
4 CHEESE BRICK
5 CHEESE BRIE
6 CHEESE CAMEMBERT
7 CHEESE CARAWAY
8 CHEESE CHEDDAR
9 CHEESE CHESHIRE
10 CHEESE COLBY
11 CHEESE COTTAGE CRMD LRG OR SML CURD
12 CHEESE COTTAGE CRMD W/FRUIT
13 CHEESE COTTAGE NONFAT UNCRMD DRY LRG OR SML CURD
14 CHEESE COTTAGE LOWFAT 2% MILKFAT
15 CHEESE COTTAGE LOWFAT 1% MILKFAT
16 CHEESE CREAM
17 CHEESE EDAM
18 CHEESE FETA
19 CHEESE FONTINA
20 CHEESE GJETOST
21 CHEESE GOUDA
22 CHEESE GRUYERE
23 CHEESE LIMBURGER
24 CHEESE MONTEREY
25 CHEESE MOZZARELLA WHL MILK
26 CHEESE MOZZARELLA WHL MILK LO MOIST
27 CHEESE MOZZARELLA PART SKIM MILK
28 CHEESE MOZZARELLA LO MOIST PART-SKIM
29 CHEESE MUENSTER
...
8588 BABYFOOD CRL RICE W/ PEARS & APPL DRY INST
8589 BABYFOOD BANANA NO TAPIOCA STR
8590 BABYFOOD BANANA APPL DSSRT STR
8591 SNACKS TORTILLA CHIPS LT (BAKED W/ LESS OIL)
8592 CEREALS RTE POST HONEY BUNCHES OF OATS HONEY RSTD
8593 POPCORN MICROWAVE LOFAT&NA
8594 BABYFOOD FRUIT SUPREME DSSRT
8595 CHEESE SWISS LOW FAT
8596 BREAKFAST BAR CORN FLAKE CRUST W/FRUIT
8597 CHEESE MOZZARELLA LO NA
8598 MAYONNAISE DRSNG NO CHOL
8599 OIL CORN PEANUT AND OLIVE
8600 SWEETENERS TABLETOP FRUCTOSE LIQ
8601 CHEESE FOOD IMITATION
8602 CELERY FLAKES DRIED
8603 PUDDINGS CHOC FLAVOR LO CAL INST DRY MIX
8604 BABYFOOD GRAPE JUC NO SUGAR CND
8605 JELLIES RED SUGAR HOME PRESERVED
8606 PIE FILLINGS BLUEBERRY CND
8607 COCKTAIL MIX NON-ALCOHOLIC CONCD FRZ
8608 PUDDINGS CHOC FLAVOR LO CAL REG DRY MIX
8609 PUDDINGS ALL FLAVORS XCPT CHOC LO CAL REG DRY MIX
8610 PUDDINGS ALL FLAVORS XCPT CHOC LO CAL INST DRY...
8611 VITAL WHEAT GLUTEN
8612 FROG LEGS RAW
8613 MACKEREL SALTED
8614 SCALLOP (BAY&SEA) CKD STMD
8615 SYRUP CANE
8616 SNAIL RAW
8617 TURTLE GREEN RAW
Name: Shrt_Desc, Length: 8618, dtype: object
Output 4
columns = ['Water_(g)', 'Shrt_Desc']
zinc_copper = food_info[columns] # get the columns named 'Water_(g)' and 'Shrt_Desc'
print(zinc_copper)
# get the columns whose names end with "(mg)"
col_names = food_info.columns.tolist()
# print(col_names)
milligram_columns = []
for items in col_names:
    if items.endswith("(mg)"):
        milligram_columns.append(items)
milligram_df = food_info[milligram_columns]
print(milligram_df)
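The same column filter can be written more compactly with a list comprehension; a sketch equivalent to the loop above:
milligram_columns = [c for c in food_info.columns if c.endswith("(mg)")]
milligram_df = food_info[milligram_columns]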
3. Simple data processing
import pandas
food_info = pandas.read_csv('food_info.csv')
# food_info.head(3)
# print(food_info.shape)
# print(food_info['Iron_(mg)'])
# the Iron_(mg) column is in milligrams; to convert it to grams, divide its values by 1000
div_1000 = food_info['Iron_(mg)'] / 1000
# print(div_1000)
# combine two columns arithmetically, row by row
water_energy = food_info['Water_(g)'] * food_info['Energ_Kcal']
# print(food_info.shape)
# insert a column with the field name 'water_energy' into the DataFrame; its values are the water_energy data
food_info['water_energy'] = water_energy
# print(food_info[['Water_(g)', 'Energ_Kcal', 'water_energy']])
# print(food_info.shape)
# Finding the Maximum of a Column
max_calories = food_info['Energ_Kcal'].max()
# print(max_calories)
# sort by the specified field; inplace=False returns a new sorted DataFrame, inplace=True sorts the original in place; ascending sort is the default
# food_info.sort_values('Sodium_(mg)', inplace=True)
# print(food_info['Sodium_(mg)'])
a = food_info.sort_values('Sodium_(mg)', inplace=False, ascending=False) # ascending=False sorts in descending order
# print(food_info['Sodium_(mg)'])
# print(a['Sodium_(mg)'])
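sort_values can also sort by several fields at once; a minimal sketch, with an arbitrary choice of tie-breaker column:
b = food_info.sort_values(['Sodium_(mg)', 'Energ_Kcal'], ascending=[False, True]) # sodium descending, ties broken by calories ascending
print(b[['Sodium_(mg)', 'Energ_Kcal']].head())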
4. Routine operations on data
import pandas as pd
import numpy as np
titanic_survival = pd.read_csv('titanic_train.csv')
# titanic_survival.head()
age = titanic_survival['Age']
# print(age.loc[0:10])
age_is_null = pd.isnull(age) # element-wise test for null values; the resulting boolean Series can be used as an index
# print(age_is_null)
age_null_true = age[age_is_null] # get the entries whose value is null
# print(age_null_true)
print(len(age_null_true)) # count how many null values there are in total
# compute the mean using only the entries that are not null
good_ages = age[age_is_null == False] # get the entries whose values are not null
# print(good_ages)
correct_mean_age = sum(good_ages) / len(good_ages) # compute the mean
print(correct_mean_age)
# or use pandas' built-in mean function, which automatically skips null values
correct_mean_age = age.mean() # compute the mean, discarding null values
print(correct_mean_age)
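# instead of discarding nulls, they can also be filled in; a minimal sketch of
# simple mean imputation:
age_filled = age.fillna(correct_mean_age) # replace NaN entries with the mean age
print(pd.isnull(age_filled).sum()) # prints 0: no nulls remain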
# pivot_table averages by default, so when the mean is what we want, the aggfunc parameter can be omitted
# index tells the method which column to group by
# values is the column that we want to apply the calculation to
# aggfunc specifies the calculation we want to perform
passenger_survival = titanic_survival.pivot_table(index='Pclass', values='Survived', aggfunc=np.mean) # compute the mean of Survived separately for each Pclass value
print(passenger_survival)
# group-wise summation over several columns
# port_stats = titanic_survival.pivot_table(index="Embarked", values=['Fare', 'Survived'], aggfunc=np.sum) # sum the fares and the survivor counts for each port
# print(port_stats)
# discard null data
drop_na_columns = titanic_survival.dropna(axis=1, inplace=False) # axis=1 drops every column that contains a null value; inplace=False returns a new DataFrame instead of modifying the current one
# print(drop_na_columns)
new_titanic_survival = titanic_survival.dropna(axis=0, subset=['Age', 'Sex'], inplace=False) # axis=0 drops every row whose 'Age' or 'Sex' field is null; subset specifies the columns to check
# print(new_titanic_survival)
# locate a value by row label and column name
row_index_83_age = titanic_survival.loc[83, 'Age']
row_index_766_pclass = titanic_survival.loc[766, 'Pclass']
print(row_index_83_age)
print(row_index_766_pclass)
new_titanic_survival = titanic_survival.sort_values("Age", ascending=False) # sort rows by Age in descending order
print(new_titanic_survival[0:10])
print('------------------------>')
titanic_reindexed = new_titanic_survival.reset_index(drop=True) # reset the row index; drop=True discards the old index instead of keeping it as a column
print(titanic_reindexed[0:20])
# applying custom functions to each row or column
def null_count(column):
    column_null = pd.isnull(column)
    null = column[column_null]
    return len(null)
column_null_count = titanic_survival.apply(null_count, axis=0) # apply the custom function column by column (axis=0) to count the nulls in each column
print(column_null_count)
def which_class(row):
    pclass = row['Pclass']
    if pclass == 1:
        return 'First Class'
    elif pclass == 2:
        return 'Second Class'
    elif pclass == 3:
        return 'Third Class'
    else:
        return 'Unknown'
classes = titanic_survival.apply(which_class, axis=1) # apply the custom function row by row, mapping each Pclass value to a label; note axis=1
print(classes)
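For a simple lookup like this, the same labels can be produced without apply; a sketch using Series.map (any Pclass value missing from the dictionary would come out as NaN rather than 'Unknown'):
class_labels = {1: 'First Class', 2: 'Second Class', 3: 'Third Class'}
print(titanic_survival['Pclass'].map(class_labels)) # vectorized lookup, typically faster than apply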
5. Preprocessing loaded data with NumPy
import pandas as pd
import numpy as np
fandango = pd.read_csv('fandango_score_comparison.csv')
# print(type(fandango))
# set_index returns a new DataFrame indexed by the values of the given column; drop=True discards that column from the data, drop=False keeps it
fandango_films = fandango.set_index('FILM', drop=False)
# fandango_films
# print(fandango_films.index)
# Obtain data by index
fandango_films["Avengers: Age of Ultron (2015)" : "Hot Tub Time Machine 2 (2015)"]
fandango_films.loc["Avengers: Age of Ultron (2015)" : "Hot Tub Time Machine 2 (2015)"]
fandango_films.loc['Southpaw (2015)']
movies = ['Kumiko, The Treasure Hunter (2015)', 'Do You Believe? (2015)', 'Ant-Man (2015)']
fandango_films.loc[movies]
# def func(coloumn):
# return np.std(coloumn)
types = fandango_films.dtypes
# print(types)
float_columns = types[types.values == 'float64'].index # get the index (column names) of the float64 columns
# print(float_columns)
float_df = fandango_films[float_columns] # get only the float64 columns
# print(float_df.dtypes)
# float_df
# print(float_df)
deviations = float_df.apply(lambda x: np.std(x)) # Calculate standard deviation per column
print(deviations)
# print('----------------------->')
# print(float_df.apply(func))
# help(np.std)
rt_mt_user = float_df[['RT_user_norm', 'Metacritic_user_nom']]
print(rt_mt_user.apply(np.std, axis=1)) # Calculate standard deviation per row of data
# rt_mt_user.apply(np.std, axis=0)
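pandas DataFrames also have a built-in std method; note that it defaults to the sample standard deviation (ddof=1), while np.std defaults to the population version (ddof=0), so the two only match when the ddof argument is aligned. A short sketch:
print(float_df.std(ddof=0)) # per column, matching the apply(np.std) result above
print(rt_mt_user.std(axis=1, ddof=0)) # per row, matching apply(np.std, axis=1)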
Series in DataFrame
A Series is the data structure of a single row or column of a DataFrame.
1. Getting a Series object
import pandas as pd
from pandas import Series
fandango = pd.read_csv('fandango_score_comparison.csv')
series_film = fandango['FILM'] # get the FILM column of fandango
# print(type(series_film))
print(series_film[0:5])
series_rt = fandango['RottenTomatoes'] # get the RottenTomatoes column of fandango
print(series_rt[0:5])
2. Routine operations on Series objects
file_names = series_film.values # get all values of series_film; the return value is a <class 'numpy.ndarray'>
# print(type(file_names))
# print(file_names)
rt_scores = series_rt.values
# print(rt_scores)
series_custom = Series(rt_scores, index=file_names) # build a new Series indexed by file_names, with rt_scores as the values
# help(Series)
print(series_custom[['Top Five (2014)', 'Night at the Museum: Secret of the Tomb (2014)']]) # get data by index
# print(type(series_custom))
print('--------------------------------->')
print(series_custom[5:10]) # Slicing operation
# print(series_custom[["'71 (2015)"]])
original_index = series_custom.index.tolist() # get all index values and convert them to a list
# print(original_index)
sorted_index = sorted(original_index) # sort the list
# print(sorted_index)
sorted_by_index = series_custom.reindex(sorted_index) # reindex series_custom with the sorted list
print(sorted_by_index)
sc2 = series_custom.sort_index() # sort the whole of series_custom by index, in ascending order
# print(sc2)
sc3 = series_custom.sort_values(ascending=False) # sort the whole of series_custom by values, in descending order
print(sc3)
import numpy as np
# print(np.add(series_custom, series_custom)) # use numpy to operate on series_custom element-wise
print(np.sin(series_custom))
print(np.max(series_custom))
# series_custom > 50
series_greater_than_50 = series_custom[series_custom > 50] # get the entries of series_custom whose value is greater than 50
# series_greater_than_50
criteria_one = series_custom > 50
criteria_two = series_custom < 75
both_criteria = series_custom[criteria_one & criteria_two] # get the entries of series_custom whose value is greater than 50 and less than 75
print(both_criteria)
rt_critics = Series(fandango['RottenTomatoes'].values, index=fandango['FILM'])
rt_users = Series(fandango['RottenTomatoes_User'].values, index=fandango['FILM'])
rt_mean = (rt_critics + rt_users) / 2 # add rt_critics and rt_users value by value and divide by 2
print(rt_mean)
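The addition above works because arithmetic between Series aligns values by index rather than by position; an index label present in only one operand produces NaN. A tiny sketch with made-up values:
s1 = Series([1, 2], index=['a', 'b'])
s2 = Series([10, 20], index=['b', 'c'])
print(s1 + s2) # a: NaN, b: 12.0, c: NaN -- only 'b' exists in both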