On-line mysql database contains a table with daily statistical results, more than 10 million items per day, which is unexpected to us. How many statistical results are there? Operations and maintenance found that the disk occupied 200 G, and finally asked the operation, you can only retain the last three days, the previous data can only be deleted. Delete, how to delete?
Because this is an online database, there are many other tables in it. If you delete the data of this table directly, it will certainly not work, and it may have an impact on other tables. Trying to delete data for only one day at a time is still a big problem for Carton. There's no way to do that. Write a Python script and delete it in batches.
The specific ideas are as follows:
- Delete data for only one day at a time.
- Delete the data for one day and delete 50,000 items at a time.
- After one day's data deletion, begin to delete the next day's data;
# -*-coding:utf-8 -*- import sys # This is our Python Module encapsulated internally. sys.path.append('/var/lib/hadoop-hdfs/scripts/python_module2') import keguang.commons as commons import keguang.timedef as timedef import keguang.sql.mysqlclient as mysql def run(starttime, endtime, regx): tb_name = 'statistic_ad_image_final_count' days = timedef.getDays(starttime,endtime,regx) # Traverse to delete data for all days for day in days: print '%s Data deletion begins'%(day) mclient = getConn() sql = ''' select 1 from %s where date = '%s' limit 1 '''%(tb_name, day) print sql result = mclient.query(sql) # If you query the data for this day, continue deleting while result is not (): sql = 'delete from %s where date = "%s" limit 50000'%(tb_name, day) print sql mclient.execute(sql) sql = ''' select 1 from %s where date = '%s' limit 1 '''%(tb_name, day) print sql result = mclient.query(sql) print '%s Data deletion completed'%(day) mclient.close() # Return mysql connection def getConn(): return mysql.MysqlClient(host = '0.0.0.0', user = 'test', passwd = 'test', db= 'statistic') if __name__ == '__main__': regx = '%Y-%m-%d' yesday = timedef.getYes(regx, -1) starttime = '2019-08-17' endtime ='2019-08-30' run(starttime, endtime, regx)
Cyclic judgment data, if any, continue to delete 50,000 data on that day; otherwise, start deleting data on the next day. It took half an hour to delete it.
Welcome to my public address.