A comparison of batch insert performance in MyBatis
There are three main methods for batch data processing:
- Execute a single insert statement repeatedly
- foreach SQL splicing
- Batch processing
1, Preliminary preparation
The project is based on Spring Boot + MySQL, and Lombok is used to generate getters/setters. See pom.xml for details.
1.1 table structure
The `id` column uses the database's auto-increment.
```sql
DROP TABLE IF EXISTS `user_info_batch`;
CREATE TABLE `user_info_batch` (
    `id`         bigint(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key id',
    `user_name`  varchar(100) NOT NULL COMMENT 'Account name',
    `pass_word`  varchar(100) NOT NULL COMMENT 'Login password',
    `nick_name`  varchar(30)  NOT NULL COMMENT 'Nickname',
    `mobile`     varchar(30)  NOT NULL COMMENT 'Mobile phone number',
    `email`      varchar(100) DEFAULT NULL COMMENT 'E-mail address',
    `gmt_create` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
    `gmt_update` timestamp NULL DEFAULT NULL COMMENT 'Update time',
    PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT 'Mybatis Batch';
```
1.2 project configuration file
You may have noticed that the database URL is followed by `rewriteBatchedStatements=true`. What is it for? Don't worry, I'll introduce it later.
```yaml
# Database configuration
spring:
  datasource:
    url: jdbc:mysql://47.111.118.152:3306/mybatis?rewriteBatchedStatements=true
    username: mybatis
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver

# mybatis
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: cn.van.mybatis.batch.entity
```
1.3 entity class
```java
@Data
@Accessors(chain = true)
public class UserInfoBatchDO implements Serializable {
    private Long id;
    private String userName;
    private String passWord;
    private String nickName;
    private String mobile;
    private String email;
    private LocalDateTime gmtCreate;
    private LocalDateTime gmtUpdate;
}
```
1.4 UserInfoBatchMapper
```java
public interface UserInfoBatchMapper {
    /**
     * Single insert
     * @param info
     * @return
     */
    int insert(UserInfoBatchDO info);

    /**
     * foreach insert
     * @param list
     * @return
     */
    int batchInsert(List<UserInfoBatchDO> list);
}
```
1.5 UserInfoBatchMapper.xml
```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN"
        "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<!-- namespace inferred from the project's package layout -->
<mapper namespace="cn.van.mybatis.batch.mapper.UserInfoBatchMapper">

    <insert id="insert" parameterType="cn.van.mybatis.batch.entity.UserInfoBatchDO">
        insert into user_info_batch (user_name, pass_word, nick_name, mobile, email, gmt_create, gmt_update)
        values (#{userName,jdbcType=VARCHAR}, #{passWord,jdbcType=VARCHAR}, #{nickName,jdbcType=VARCHAR},
                #{mobile,jdbcType=VARCHAR}, #{email,jdbcType=VARCHAR},
                #{gmtCreate,jdbcType=TIMESTAMP}, #{gmtUpdate,jdbcType=TIMESTAMP})
    </insert>

    <insert id="batchInsert" parameterType="java.util.List">
        insert into user_info_batch (user_name, pass_word, nick_name, mobile, email, gmt_create, gmt_update)
        values
        <foreach collection="list" item="item" separator=",">
            (#{item.userName,jdbcType=VARCHAR}, #{item.passWord,jdbcType=VARCHAR}, #{item.nickName,jdbcType=VARCHAR},
             #{item.mobile,jdbcType=VARCHAR}, #{item.email,jdbcType=VARCHAR},
             #{item.gmtCreate,jdbcType=TIMESTAMP}, #{item.gmtUpdate,jdbcType=TIMESTAMP})
        </foreach>
    </insert>
</mapper>
```
1.6 preliminary data
To make the tests easier to run, several lists are declared as fields and populated in advance.
```java
private List<UserInfoBatchDO> list = new ArrayList<>();
private List<UserInfoBatchDO> lessList = new ArrayList<>();
private List<UserInfoBatchDO> lageList = new ArrayList<>();
private List<UserInfoBatchDO> warmList = new ArrayList<>();
// Timing tool
private StopWatch sw = new StopWatch();
```
- A common method is extracted to assemble the test data.
```java
private List<UserInfoBatchDO> assemblyData(int count) {
    List<UserInfoBatchDO> list = new ArrayList<>();
    UserInfoBatchDO userInfoDO;
    for (int i = 0; i < count; i++) {
        userInfoDO = new UserInfoBatchDO()
                .setUserName("Van")
                .setNickName("Dust blog")
                .setMobile("17098705205")
                .setPassWord("password")
                .setGmtUpdate(LocalDateTime.now());
        list.add(userInfoDO);
    }
    return list;
}
```
- Warm-up data
```java
@Before
public void assemblyData() {
    list = assemblyData(200000);
    lessList = assemblyData(2000);
    lageList = assemblyData(1000000);
    warmList = assemblyData(5);
}
```
2, Execute a single insert statement repeatedly
A lazy programmer might do it this way: simply wrap the original single insert statement in a for loop.
2.1 corresponding mapper interface
```java
int insert(UserInfoBatchDO info);
```
2.2 test method
Because this method is too slow, the data set is reduced to 2,000 records.
```java
@Test
public void insert() {
    log.info("[program warm-up]");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info("[end of warm-up]");
    sw.start("Execute a single insert statement repeatedly");
    // Inserting 200k this way is too slow, so only 2,000 are inserted
    for (UserInfoBatchDO userInfoBatchDO : lessList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    sw.stop();
    log.info("all cost info:{}", sw.prettyPrint());
}
```
2.3 execution time
- First run

```
-----------------------------------------
ms     %     Task name
-----------------------------------------
59887  100%  Execute a single insert statement repeatedly
```

- Second run

```
-----------------------------------------
ms     %     Task name
-----------------------------------------
64853  100%  Execute a single insert statement repeatedly
```

- Third run

```
-----------------------------------------
ms     %     Task name
-----------------------------------------
58235  100%  Execute a single insert statement repeatedly
```
Inserting 2,000 records this way takes an average of 60,991 ms over three runs.
3, foreach SQL splicing
3.1 corresponding mapper interface
```java
int batchInsert(List<UserInfoBatchDO> list);
```
3.2 test method
Both this method and the next one are tested with 200,000 (20w) records.
```java
@Test
public void batchInsert() {
    log.info("[program warm-up]");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info("[end of warm-up]");
    sw.start("foreach SQL splicing");
    userInfoBatchMapper.batchInsert(list);
    sw.stop();
    log.info("all cost info:{}", sw.prettyPrint());
}
```
3.3 execution time
- First run

```
-----------------------------------------
ms     %     Task name
-----------------------------------------
18835  100%  foreach SQL splicing
```

- Second run

```
-----------------------------------------
ms     %     Task name
-----------------------------------------
17895  100%  foreach SQL splicing
```

- Third run

```
-----------------------------------------
ms     %     Task name
-----------------------------------------
19827  100%  foreach SQL splicing
```
Inserting 200,000 records this way takes an average of 18,852 ms over three runs.
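For intuition, the `foreach` block above collapses every row into one giant multi-row INSERT statement. The sketch below (the class and method names are my own, for illustration only) builds the shape of the statement MyBatis ends up sending, with one placeholder group per row:

```java
import java.util.StringJoiner;

public class ForeachSqlSketch {
    // Illustrative only: reproduces the multi-row INSERT shape that the
    // <foreach> element in UserInfoBatchMapper.xml generates, with one
    // "(?, ?, ?, ?, ?, ?, ?)" group per row.
    static String multiRowInsert(int rows) {
        StringJoiner values = new StringJoiner(", ");
        for (int i = 0; i < rows; i++) {
            values.add("(?, ?, ?, ?, ?, ?, ?)");
        }
        return "insert into user_info_batch (user_name, pass_word, nick_name, "
                + "mobile, email, gmt_create, gmt_update) values " + values;
    }

    public static void main(String[] args) {
        System.out.println(multiRowInsert(2));
    }
}
```

With 200,000 rows this single statement becomes very large and can run into MySQL's `max_allowed_packet` limit, which helps explain why this approach breaks down completely at 1,000,000 rows later in the article.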
4, Batch processing
This approach reuses the mapper and XML from section 2.1.
4.1 The rewriteBatchedStatements parameter
At the beginning of the test, I found that switching to MyBatis batch submission did not work: rows were still being inserted one by one, and the speed was even far below the foreach SQL splicing method, which made no sense.
Later, I found that for true batch execution a parameter must be added to the connection URL: `rewriteBatchedStatements=true`.
- Introduction to the rewriteBatchedStatements parameter
The `rewriteBatchedStatements` parameter must be added to the MySQL JDBC connection URL, and the driver must be version 5.1.13 or above, to achieve high-performance batch inserts. By default, the MySQL JDBC driver ignores `executeBatch()`: it breaks up the group of SQL statements we expect to execute as a batch and sends them to the database one by one, so the "batch" insert is really a series of single inserts, which directly leads to poor performance. Only when `rewriteBatchedStatements` is set to true will the driver execute the SQL in true batches. This option applies to INSERT, UPDATE and DELETE.
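Normally you just set the parameter in `application.yml` as shown in section 1.2, but if the URL is assembled in code, appending it looks like this (the helper name is made up for illustration; the only real thing here is the parameter itself):

```java
public class JdbcUrlDemo {
    // Hypothetical helper: appends rewriteBatchedStatements=true to a
    // MySQL JDBC URL, using '?' or '&' depending on whether the URL
    // already carries query parameters.
    static String withBatchRewrite(String baseUrl) {
        String sep = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + sep + "rewriteBatchedStatements=true";
    }

    public static void main(String[] args) {
        System.out.println(withBatchRewrite("jdbc:mysql://localhost:3306/mybatis"));
    }
}
```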
4.2 batch preparation
- Manually inject the SqlSessionFactory
```java
@Resource
private SqlSessionFactory sqlSessionFactory;
```
- Test code
```java
@Test
public void processInsert() {
    log.info("[program warm-up]");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info("[end of warm-up]");
    sw.start("Batch execution insert");
    // Open a batch-mode session
    SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH);
    UserInfoBatchMapper mapper = session.getMapper(UserInfoBatchMapper.class);
    for (int i = 0, length = list.size(); i < length; i++) {
        mapper.insert(list.get(i));
        // Commit every 20,000 rows to prevent memory overflow
        if (i % 20000 == 19999) {
            session.commit();
            session.clearCache();
        }
    }
    session.commit();
    session.clearCache();
    sw.stop();
    log.info("all cost info:{}", sw.prettyPrint());
}
```
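The commit-every-20,000 logic above is ordinary index-based chunking. A standalone sketch of the same idea (class and method names are my own, for illustration only):

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkDemo {
    // Splits a list into consecutive chunks of at most chunkSize elements,
    // mirroring the "commit every 20,000 rows" pattern of the batch test.
    static <T> List<List<T>> chunk(List<T> source, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < source.size(); from += chunkSize) {
            int to = Math.min(from + chunkSize, source.size());
            chunks.add(new ArrayList<>(source.subList(from, to)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 45; i++) data.add(i);
        // 45 elements in chunks of 20 -> 3 chunks (20 + 20 + 5)
        System.out.println(chunk(data, 20).size());
    }
}
```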
4.3 execution time
- First run

```
-----------------------------------------
ms     %     Task name
-----------------------------------------
9346   100%  Batch execution insert
```

- Second run

```
-----------------------------------------
ms     %     Task name
-----------------------------------------
8890   100%  Batch execution insert
```

- Third run

```
-----------------------------------------
ms     %     Task name
-----------------------------------------
9042   100%  Batch execution insert
```
Inserting 200,000 records this way takes an average of 9,092 ms over three runs.
4.4 With larger data
When I expanded the data to 1,000,000 (100w) records, the foreach SQL splicing method could not complete the insert at all, so only the batch method could be timed.
For this test, just change `list` to `lageList` in the test code.
- First run

```
-----------------------------------------
ms     %     Task name
-----------------------------------------
32419  100%  Batch execution insert
```

- Second run

```
-----------------------------------------
ms     %     Task name
-----------------------------------------
31935  100%  Batch execution insert
```

- Third run

```
-----------------------------------------
ms     %     Task name
-----------------------------------------
33048  100%  Batch execution insert
```
Inserting 1,000,000 records this way takes an average of 32,467 ms over three runs.
5, Summary
| Batch insert method | Data volume | Average time over three runs |
|---|---|---|
| Looped single-row insert | 2,000 | 60,991 ms |
| foreach SQL splicing | 200,000 | 18,852 ms |
| Batch processing | 200,000 | 9,092 ms |
| Batch processing | 1,000,000 | 32,467 ms |
- Looping over single inserts is very inefficient, but the code is minimal; it is acceptable when the data volume is small, and should be avoided entirely for large volumes;
- foreach SQL splicing requires a long block of XML and SQL that is easy to get wrong; its performance is fair, but it still cannot handle very large data sets, so it is not recommended;
- Batch execution is recommended when inserting large amounts of data, and it is also convenient to use.