Python MySQL extracts the first N data of each group in the table

Keywords: MySQL SQL Python

In the process of large-scale data processing, sometimes it is necessary to take out a part of data by groups as a reference for the whole data.
For example, there is a table [1], which can be divided into 200 groups by column [1]. Now we want to take 10 records from each group of these 200 groups as a representative.
We can operate MySQL through the following Python statements to achieve this function:

import mysql.connector
conn = mysql.connector.connect(host = '127.0.0.1', port = 3306, user = 'root', password = 'xxx', db = 'xxx' )
cursor = conn.cursor()
for i in range(0,200): #Note that i is indexed from 0 at limit i,j
    sql = "INSERT INTO Sample_table_1 SELECT t.column_1, t.column_2 FROM table_1 t, (SELECT * FROM table_1 group by column1 limit %s,1) p where t.column1=p.column1 limit 10;"%i
    cursor.execute(sql)
    conn.commit()
conn.close()

Of course, because this is an insert operation, we can also use excuteany to implement:

import mysql.connector
conn = mysql.connector.connect(host = '127.0.0.1', port = 3306, user = 'root', password = 'xxx', db = 'xxx' )
cursor = conn.cursor()
cursor.executemany("INSERT INTO Sample_table_1 SELECT t.column_1, t.column_2 FROM table_1 t, (SELECT * FROM table_1 group by column1 limit %s,1) p where t.column1=p.column1 limit 10;", [(i,) for i in range(0,200)] ) #Note that "," in (i,) must be reserved, otherwise an error will be reported
conn.commit()
conn.close()

Alternatively, you can use MySQL stored procedures.

import mysql.connector
conn = mysql.connector.connect(host = '127.0.0.1', port = 3306, user = 'root', password = 'xxx', db = 'xxx' )
cursor = conn.cursor()
sql1="drop procedure if exists test;"
sql2="create procedure test() \
    begin\
    declare i bigint default 0; \
    DECLARE var BIGINT DEFAULT 0; \
    while i < 200 do \
    INSERT INTO Sample_table_1 SELECT t.column_1, t.column_2 FROM table_1 t, (SELECT * FROM table_1 group by column1 limit i,1) p where t.column1=p.column1 limit 10;\
    set i = i + 1;\
    end while;\
    end;"
sql3="call test;"
cursor.execute(sql1)
cursor.execute(sql2)
cursor.execute(sql3)
conn.commit()
conn.close()

In terms of execution efficiency, excuteany is slightly higher than excute, and the efficiency of stored procedures is the lowest.

Posted by techiefreak05 on Thu, 05 Dec 2019 03:10:24 -0800