Pandas's way of learning

Keywords: Python

Well... It's coming again... It's not over after all.

Following the previous article, I finished writing Series and DataFrame, and then I should start some table operations.

1. Connection of tables

1.1. join operation

Parameters:

  • other: the second Dataframe, Series, or column in the Dataframe
  • on: duplicate column names in the second DataFrame with the first
  • how: ('parameter optional' left ',' right ',' outer ',' inner '), the default is' left', which means to use the index of the table on the left as the index after connection. Similarly, right, inner means to take the intersection. If there is an index with the same name in two tables, the final result is the table composed of indexes with the same name, and outer is the union set, This is the table that displays the indexes of the left table and the right table
  • lsuffix: suffix of duplicate column in the first DataFrame
  • rsuffix: suffix of duplicate column in the second DataFrame
  • Sort: sort by connection key. The default value is False

Upper Code:

import pandas as pd
 
album_dict = {'An ordinary day':['An ordinary day','Here you are. Give it to me','One meat and one vegetable'],'Xiao Wang's Diary':['Water town','Xiao Wang','Babble'],'Chick Guide':['City evening','Sea diary','Chick Guide']}
album = pd.DataFrame(album_dict)
album1 = pd.DataFrame({'An ordinary day':['get rid of-blues','midsummer','No questions'],'Xiao Wang's Diary':['Northeast ballad','two zero three','Hutong'],'Chick Guide':['Limit seventeen','Xiang Yu and Yu Ji','enter the sea']})
album.join(album1,on = None,how = 'left',lsuffix = '_left',rsuffix = '_right',sort = False)

 

Error code:

album.join(album1,on = None)

  Will cause ValueError!!! Because the column name is repeated at this time, there are two methods: either you change the column name or add the parameters lsuffix and rsuffix.

  1.2. merge operation (connection, automatic matching)

  Parameter Description:

  • Left: the left DataFrame participating in the merge
  • Right: the right DataFrame participating in the merge
  • on: the column name used for connection must exist in the left and right dataframes
  • how: the optional parameters are "inner", "outer", "left", "right", and the default is "inner"
  • left_on: the column used as the join key in the DataFrame on the left
  • right_on: the column used as the join key in the right DataFrame
  • left_index: use the line reference on the left as its connection key
  • right_index: use the line reference on the right as its connection key
  • indicator: if True, the name will be_ The category type column of merge is added to the output object with value

how='left ', the connection method is the column with duplicate column names in the first DataFrame and the second DataFrame after merging:

import pandas as pd
 
album_dict = {'An ordinary day':['An ordinary day','Here you are. Give it to me','One meat and one vegetable'],'Xiao Wang's Diary':['Water town','Xiao Wang','Babble'],'Chick Guide':['City evening','Sea diary','Chick Guide']}
album = pd.DataFrame(album_dict)
album1 = pd.DataFrame({'An ordinary day':['get rid of-blues','midsummer','No questions'],'Xiao Wang's Diary':['Northeast ballad','two zero three','Hutong'],'Chick Guide':['Limit seventeen','Xiang Yu and Yu Ji','enter the sea']})
album2 = pd.merge(album, album1, how='left',on=['An ordinary day','Xiao Wang's Diary'],indicator=True)
album2

  how='left ', the connection method is the column with duplicate column names in the second DataFrame and the first DataFrame after merging:

album2 = pd.merge(album, album1, how='right',on='An ordinary day')
album2

Use left_on and right_ Connect on:

import pandas as pd

result = pd.DataFrame({'key':[0,1,2],'key1':[1,2,3],'ab':[4,5,6]})
result1 = pd.DataFrame({'key':[3,2,1],'key1':[2,1,0],'cd':[6,5,4]})

final = pd.merge(result,result1,left_on='ab',right_on='cd')
final

  Use left_on and right_index=True to connect:

import pandas as pd

result = pd.DataFrame({'key':[0,1,2],'key1':[1,2,3],'ab':[4,5,6]})
result1 = pd.DataFrame({'key':[3,2,1],'key1':[2,1,0],'cd':[6,5,4]})

final = pd.merge(result,result1,left_on='ab',right_index=True)
final

Will become this sub son because it is used at this time   right_index=True. This means that the index of the table on the right is associated with the column of the table on the left (in the example, the column named ab). Since they do not have the same elements, they display almost an empty table. Just look at the index of the table on the right. The index is 0, 1 and 2.

  1.3 concat (splicing)

  Parameter Description:

  • object: Series,DataFrame
  • Axis: axis to be merged and connected. 0 indicates row and 1 indicates column
  • keys: after selection, multiple indexes can be created
  • join: optional parameters are 'inner' and 'outer'
final = pd.concat([result,result1])
final

  Use the keys parameter:

final = pd.concat([result,result1],keys = ['x','y'])
final

Use the axis parameter: (set to 1 to splice along the column)

final = pd.concat([result,result1], axis=1)
final

  Use the join parameter (which means taking intersection):

final = pd.concat([result,result1], join='inner')
final

  

2. Simple operation of table

  2.1. Simple addition of tables

Append column:

result['key3'] = result.apply(lambda x:x.sum(),axis =1)
result

  Append row(   ignore_index: if True, the index in the source DataFrame object will be ignored):

ser = pd.Series([9,8,7], index=['key', 'key1', 'ab'])
df = result.append(ser, ignore_index=True)
df

 

2.2. Table row and column summation

Column summation:

result.loc['key2'] = result.apply(lambda x:x.sum(),axis =0)
result

Line summation:

result['key3'] = result.apply(lambda x:x.sum(),axis =1)
result

Posted by reinaldo on Thu, 30 Sep 2021 18:25:42 -0700