Learning pandas

Keywords: Python

Well... It's coming again... It's not over after all.

The previous article covered Series and DataFrame, so this one moves on to some table operations.

1. Joining tables

1.1. join operation

Parameters:

other: the second DataFrame, a Series, or a list of them
on: column name(s) in the first DataFrame to match against the index of the second; the default None joins index on index
how: optional, one of 'left', 'right', 'outer', 'inner'; the default is 'left'. 'left' uses the index of the left table as the index of the result, and 'right' likewise uses the index of the right table. 'inner' takes the intersection: only index labels present in both tables are kept. 'outer' takes the union: the result shows the index labels of both the left table and the right table
lsuffix: suffix added to duplicate column names from the first DataFrame
rsuffix: suffix added to duplicate column names from the second DataFrame
sort: sort the result by the join key. The default value is False
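To see what how actually changes, here is a minimal sketch. The tables left and right are hypothetical toy data (not the album tables used in this article), built so that their indexes share only the label 'b':

```python
import pandas as pd

# Hypothetical toy tables: only the index label 'b' appears in both.
left = pd.DataFrame({'x': [1, 2]}, index=['a', 'b'])
right = pd.DataFrame({'y': [10, 20]}, index=['b', 'c'])

# join aligns rows on the index; `how` decides which index labels survive.
left_join = left.join(right, how='left')    # keeps the left index: a, b
right_join = left.join(right, how='right')  # keeps the right index: b, c
inner_join = left.join(right, how='inner')  # intersection: b
outer_join = left.join(right, how='outer')  # union: a, b, c

for name, df in [('left', left_join), ('right', right_join),
                 ('inner', inner_join), ('outer', outer_join)]:
    print(name, list(df.index))
```

Rows that have no partner in the other table are filled with NaN, for example the y value of row 'a' in the left join.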

Code:

import pandas as pd
 
# "Xiao Wang's Diary" contains an apostrophe, so it needs double quotes
album_dict = {'An ordinary day': ['An ordinary day', 'Here you are. Give it to me', 'One meat and one vegetable'], "Xiao Wang's Diary": ['Water town', 'Xiao Wang', 'Babble'], 'Chick Guide': ['City evening', 'Sea diary', 'Chick Guide']}
album = pd.DataFrame(album_dict)
album1 = pd.DataFrame({'An ordinary day': ['get rid of-blues', 'midsummer', 'No questions'], "Xiao Wang's Diary": ['Northeast ballad', 'two zero three', 'Hutong'], 'Chick Guide': ['Limit seventeen', 'Xiang Yu and Yu Ji', 'enter the sea']})
album.join(album1, on=None, how='left', lsuffix='_left', rsuffix='_right', sort=False)

Code that raises an error:

album.join(album1, on=None)

This raises ValueError!!! Both tables have the same column names and no suffix was given. There are two fixes: either rename the columns, or pass the lsuffix and rsuffix parameters.
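Both fixes can be sketched on a smaller pair of tables. The names a, b and the column title are made up for illustration:

```python
import pandas as pd

# Hypothetical tables that share the column name 'title'.
a = pd.DataFrame({'title': ['t1', 't2']})
b = pd.DataFrame({'title': ['t3', 't4']})

# Without suffixes the overlapping column name makes join fail.
try:
    a.join(b)
except ValueError as exc:
    print('join failed:', exc)

# Fix 1: let join disambiguate the duplicate names with suffixes.
fixed_suffix = a.join(b, lsuffix='_left', rsuffix='_right')

# Fix 2: rename the clashing column before joining.
fixed_rename = a.join(b.rename(columns={'title': 'title_b'}))

print(list(fixed_suffix.columns))
print(list(fixed_rename.columns))
```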

1.2. merge operation (connection, automatic matching)

Parameter Description:

left: the left DataFrame participating in the merge
right: the right DataFrame participating in the merge
on: the column name(s) to join on; they must exist in both the left and right DataFrames
how: one of 'inner', 'outer', 'left', 'right'; the default is 'inner'
left_on: the column(s) in the left DataFrame used as the join key
right_on: the column(s) in the right DataFrame used as the join key
left_index: if True, use the row index of the left DataFrame as its join key
right_index: if True, use the row index of the right DataFrame as its join key
indicator: if True, a Categorical column named _merge is added to the output, with the value left_only, right_only, or both for each row
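The indicator parameter is easiest to understand on an outer merge. A minimal sketch, using hypothetical tables left and right with a shared column key (none of these names come from the article):

```python
import pandas as pd

# Hypothetical tables: keys 2 and 3 appear in both, 1 only left, 4 only right.
left = pd.DataFrame({'key': [1, 2, 3], 'a': ['x', 'y', 'z']})
right = pd.DataFrame({'key': [2, 3, 4], 'b': ['p', 'q', 'r']})

# indicator=True adds a '_merge' column saying where each row came from.
merged = pd.merge(left, right, on='key', how='outer', indicator=True)

# Map each key to its origin for a quick look.
origin = dict(zip(merged['key'], merged['_merge'].astype(str)))
print(origin)
```

This is handy for checking which rows failed to match before deciding between 'inner', 'left', and 'outer'.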

With how='left', every row of the first DataFrame is kept, and rows of the second DataFrame are matched on the columns listed in on:

import pandas as pd
 
# "Xiao Wang's Diary" contains an apostrophe, so it needs double quotes
album_dict = {'An ordinary day': ['An ordinary day', 'Here you are. Give it to me', 'One meat and one vegetable'], "Xiao Wang's Diary": ['Water town', 'Xiao Wang', 'Babble'], 'Chick Guide': ['City evening', 'Sea diary', 'Chick Guide']}
album = pd.DataFrame(album_dict)
album1 = pd.DataFrame({'An ordinary day': ['get rid of-blues', 'midsummer', 'No questions'], "Xiao Wang's Diary": ['Northeast ballad', 'two zero three', 'Hutong'], 'Chick Guide': ['Limit seventeen', 'Xiang Yu and Yu Ji', 'enter the sea']})
album2 = pd.merge(album, album1, how='left', on=['An ordinary day', "Xiao Wang's Diary"], indicator=True)
album2

With how='right', every row of the second DataFrame is kept instead, and rows of the first DataFrame are matched on the column given in on:

album2 = pd.merge(album, album1, how='right', on='An ordinary day')
album2

Use left_on and right_on to connect:

import pandas as pd

result = pd.DataFrame({'key':[0,1,2],'key1':[1,2,3],'ab':[4,5,6]})
result1 = pd.DataFrame({'key':[3,2,1],'key1':[2,1,0],'cd':[6,5,4]})

final = pd.merge(result,result1,left_on='ab',right_on='cd')
final
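In this example the values of ab and cd match on 4, 5 and 6, so all three rows survive. Both tables also have columns named key and key1, which merge tells apart with the default suffixes _x and _y. A small sketch of the same merge with the suffixes parameter (the suffix strings '_left' and '_right' are my own choice):

```python
import pandas as pd

result = pd.DataFrame({'key': [0, 1, 2], 'key1': [1, 2, 3], 'ab': [4, 5, 6]})
result1 = pd.DataFrame({'key': [3, 2, 1], 'key1': [2, 1, 0], 'cd': [6, 5, 4]})

# Join rows where result['ab'] equals result1['cd'];
# suffixes replaces the default '_x' / '_y' on the clashing column names.
final = pd.merge(result, result1, left_on='ab', right_on='cd',
                 suffixes=('_left', '_right'))

print(list(final.columns))
print(len(final))
```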

Use left_on and right_index=True to connect:

import pandas as pd

result = pd.DataFrame({'key':[0,1,2],'key1':[1,2,3],'ab':[4,5,6]})
result1 = pd.DataFrame({'key':[3,2,1],'key1':[2,1,0],'cd':[6,5,4]})

final = pd.merge(result,result1,left_on='ab',right_index=True)
final

The result looks like this because right_index=True is used. It means the index of the table on the right is matched against a column of the table on the left (in the example, the column named ab). Just look at the index of the table on the right: it is 0, 1 and 2, while ab holds 4, 5 and 6. Since they have no elements in common, the result is an empty table that shows only the column names.
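For contrast, here is a sketch where the left column does share values with the right index. The tables left and right and their column names are hypothetical, chosen only to make the matching visible:

```python
import pandas as pd

# Tables from the example above: 'ab' (4, 5, 6) never meets the index (0, 1, 2).
result = pd.DataFrame({'key': [0, 1, 2], 'key1': [1, 2, 3], 'ab': [4, 5, 6]})
result1 = pd.DataFrame({'key': [3, 2, 1], 'key1': [2, 1, 0], 'cd': [6, 5, 4]})
empty = pd.merge(result, result1, left_on='ab', right_index=True)
print(empty.empty)

# Hypothetical tables: 'pos' holds 0 and 2, which exist in the right index.
left = pd.DataFrame({'pos': [0, 2, 5], 'name': ['a', 'b', 'c']})
right = pd.DataFrame({'value': [10, 20, 30]})  # default index 0, 1, 2

# Each 'pos' value is looked up in the index of the right table.
matched = pd.merge(left, right, left_on='pos', right_index=True)
print(matched)
```

The row with pos equal to 5 is dropped because the default how='inner' keeps only matches.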

1.3. concat (concatenation)

Parameter Description:

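objs: a list of Series or DataFrame objects to concatenate
axis: the axis to concatenate along. 0 stacks the tables as rows and 1 places them side by side as columns; the default is 0
keys: if given, a hierarchical index is built with the keys as the outermost level
join: one of 'inner' and 'outer'; the default is 'outer'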

final = pd.concat([result,result1])
final

Use the keys parameter:

final = pd.concat([result,result1],keys = ['x','y'])
final
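The point of keys is that each source table can be picked out again afterwards. A minimal sketch using the same result and result1 tables:

```python
import pandas as pd

result = pd.DataFrame({'key': [0, 1, 2], 'key1': [1, 2, 3], 'ab': [4, 5, 6]})
result1 = pd.DataFrame({'key': [3, 2, 1], 'key1': [2, 1, 0], 'cd': [6, 5, 4]})

final = pd.concat([result, result1], keys=['x', 'y'])

# The outer index level is the key, the inner level is the original index.
block_x = final.loc['x']       # all rows that came from result
row_y0 = final.loc[('y', 0)]   # row 0 of result1

print(final.index.nlevels)
print(block_x)
print(row_y0['cd'])
```

Columns that exist in only one table (ab and cd here) are filled with NaN for the rows of the other table.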

Use the axis parameter (set it to 1 to concatenate along the columns):

final = pd.concat([result,result1], axis=1)
final
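One thing to watch with axis=1 is that the column names key and key1 now appear twice. A sketch of one way around that, combining axis=1 with keys (the labels 'left' and 'right' are my own choice):

```python
import pandas as pd

result = pd.DataFrame({'key': [0, 1, 2], 'key1': [1, 2, 3], 'ab': [4, 5, 6]})
result1 = pd.DataFrame({'key': [3, 2, 1], 'key1': [2, 1, 0], 'cd': [6, 5, 4]})

# Side by side: rows are aligned on the index, column names are kept as-is.
wide = pd.concat([result, result1], axis=1)
print(list(wide.columns))

# With keys, the columns get an outer level naming the source table.
keyed = pd.concat([result, result1], axis=1, keys=['left', 'right'])
print(keyed['right']['cd'].tolist())
```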

Use the join parameter ('inner' takes the intersection, so only the columns that both tables share are kept):

final = pd.concat([result,result1], join='inner')
final
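A quick side-by-side sketch of the two join modes on the same tables, with ignore_index=True added to renumber the rows:

```python
import pandas as pd

result = pd.DataFrame({'key': [0, 1, 2], 'key1': [1, 2, 3], 'ab': [4, 5, 6]})
result1 = pd.DataFrame({'key': [3, 2, 1], 'key1': [2, 1, 0], 'cd': [6, 5, 4]})

outer = pd.concat([result, result1])                # default join='outer'
inner = pd.concat([result, result1], join='inner')  # shared columns only

# The original index labels repeat (0, 1, 2, 0, 1, 2) unless renumbered.
renumbered = pd.concat([result, result1], join='inner', ignore_index=True)

print(list(outer.columns))
print(list(inner.columns))
print(list(renumbered.index))
```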

2. Simple table operations

2.1. Adding to a table

Append column:

result['key3'] = result.apply(lambda x:x.sum(),axis =1)
result

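Append a row (ignore_index: if True, the index labels of the source objects are ignored and the rows are renumbered from 0):

ser = pd.Series([9, 8, 7], index=['key', 'key1', 'ab'])
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead;
# to_frame().T turns the Series into a one-row DataFrame
df = pd.concat([result, ser.to_frame().T], ignore_index=True)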
df

2.2. Table row and column summation

Column summation:

result.loc['key2'] = result.apply(lambda x:x.sum(),axis =0)
result

Row summation:

result['key3'] = result.apply(lambda x:x.sum(),axis =1)
result
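The apply calls above can also be written with the built-in DataFrame.sum, which should give the same totals. A minimal sketch on a fresh copy of the result table, so the totals are not mixed into the data:

```python
import pandas as pd

result = pd.DataFrame({'key': [0, 1, 2], 'key1': [1, 2, 3], 'ab': [4, 5, 6]})

col_totals = result.sum(axis=0)  # one total per column
row_totals = result.sum(axis=1)  # one total per row

# Same numbers as the apply version used in this article.
same_cols = col_totals.equals(result.apply(lambda x: x.sum(), axis=0))
same_rows = row_totals.equals(result.apply(lambda x: x.sum(), axis=1))

print(col_totals.tolist())
print(row_totals.tolist())
print(same_cols, same_rows)
```

Computing both totals before writing either back avoids the column total row being counted in the row totals, or the other way round.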

Posted by reinaldo on Thu, 30 Sep 2021 18:25:42 -0700

Programmer Group