Comparison of several batch insertion efficiencies of [MyBatis]

Keywords: Database MySQL

Comparison of several batch insertion efficiencies of [MyBatis]

There are three main methods for batch data processing:

  1. Execute a single insert statement repeatedly
  2. foreach   Splicing   sql
  3. Batch processing

1, Preliminary preparation

Based on Spring Boot  +  Mysql, and lombok is used to omit get/set. See pom.xml for details.

1.1 table structure

id   Use database self augmentation.

DROP TABLE IF EXISTS `user_info_batch`;
CREATE TABLE `user_info_batch` (
                           `id` bigint(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key id',
                           `user_name` varchar(100) NOT NULL COMMENT 'title of account',
                           `pass_word` varchar(100) NOT NULL COMMENT 'Login password',
                           `nick_name` varchar(30) NOT NULL COMMENT 'nickname',
                           `mobile` varchar(30) NOT NULL COMMENT 'cell-phone number',
                           `email` varchar(100) DEFAULT NULL COMMENT 'e-mail address',
                           `gmt_create` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
                           `gmt_update` timestamp NULL DEFAULT NULL COMMENT 'Update time',
                           PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT 'Mybatis Batch';

1.2 project configuration file

Careful, you may have found that the database url   Followed by a paragraph   Rewritebackedstatements = true, what's the use? Don't worry, I'll introduce it later.

# Database configuration
spring:
  datasource:
    url: jdbc:mysql://47.111.118.152:3306/mybatis?rewriteBatchedStatements=true
    username: mybatis
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
# mybatis
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: cn.van.mybatis.batch.entity

1.3 entity class

@Data
@Accessors(chain = true)
public class UserInfoBatchDO implements Serializable {
    private Long id;

    private String userName;

    private String passWord;

    private String nickName;

    private String mobile;

    private String email;

    private LocalDateTime gmtCreate;

    private LocalDateTime gmtUpdate;
}

1.4 UserInfoBatchMapper

public interface UserInfoBatchMapper {

    /** Single insert
     * @param info
     * @return
     */
    int insert(UserInfoBatchDO info);

    /**
     * foreach insert
     * @param list
     * @return
     */
    int batchInsert(List list);
}

1.5 UserInfoBatchMapper.xml

<?xml version="1.0" encoding="UTF-8"?>



  
    insert into user_info_batch (user_name, pass_word, nick_name, mobile, email, gmt_create, gmt_update)
    values (#{userName,jdbcType=VARCHAR}, #{passWord,jdbcType=VARCHAR},#{nickName,jdbcType=VARCHAR}, #{mobile,jdbcType=VARCHAR}, #{email,jdbcType=VARCHAR}, #{gmtCreate,jdbcType=TIMESTAMP}, #{gmtUpdate,jdbcType=TIMESTAMP})
  

  
    insert into user_info_batch (user_name, pass_word, nick_name, mobile, email, gmt_create, gmt_update)
    values
    
      (#{item.userName,jdbcType=VARCHAR}, #{item.passWord,jdbcType=VARCHAR}, #{item.nickName,jdbcType=VARCHAR}, #{item.mobile,jdbcType=VARCHAR}, #{item.email,jdbcType=VARCHAR}, #{item.gmtCreate,jdbcType=TIMESTAMP}, #{item.gmtUpdate,jdbcType=TIMESTAMP})
    
  

1.6 preliminary data

In order to facilitate the test, several variables are extracted and loaded in advance.

    private List list = new ArrayList<>();
    private List lessList = new ArrayList<>();
    private List lageList = new ArrayList<>();
    private List warmList = new ArrayList<>();
    // Counting tool
    private StopWatch sw = new StopWatch();
  • In order to facilitate the assembly of data, a common method is extracted.
    private List assemblyData(int count){
        List list = new ArrayList<>();
        UserInfoBatchDO userInfoDO;
        for (int i = 0;i < count;i++){
            userInfoDO = new UserInfoBatchDO()
                    .setUserName("Van")
                    .setNickName("Dust blog")
                    .setMobile("17098705205")
                    .setPassWord("password")
                    .setGmtUpdate(LocalDateTime.now());
            list.add(userInfoDO);
        }
        return list;
    }
  • Preheating data
    @Before
    public void assemblyData() {
        list = assemblyData(200000);
        lessList = assemblyData(2000);
        lageList = assemblyData(1000000);
        warmList = assemblyData(5);
    }

2, Execute a single insert statement repeatedly

Maybe lazy programmers will do this. It is very simple to directly nest a for loop on the original single insert statement.

2.1 corresponding mapper interface

int insert(UserInfoBatchDO info);

2.2 test method

Because this method is too slow, the data is reduced to   two thousand   strip

@Test
public void insert() {
    log.info("[[program warm-up]");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info("[[end of warm-up]");
    sw.start("Execute a single insert statement repeatedly");
    // It's too slow to insert 20w here, so I only inserted 2000
    for (UserInfoBatchDO userInfoBatchDO : lessList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    sw.stop();
    log.info("all cost info:{}",sw.prettyPrint());
}

2.3 execution time

  • for the first time
-----------------------------------------
ms     %     Task name
-----------------------------------------
59887  100%  Execute a single insert statement repeatedly
  • The second time
-----------------------------------------
ms     %     Task name
-----------------------------------------
64853  100%  Execute a single insert statement repeatedly
  • third time
-----------------------------------------
ms     %     Task name
-----------------------------------------
58235  100%  Execute a single insert statement repeatedly

Insert 2000 in this way   Data, average time of execution three times: 60991 ms.

3, foreach   Splicing SQL

3.1 corresponding mapper interface

int batchInsert(List list);

3.2 test method

Both this method and the next method are tested with 20w data.

@Test
public void batchInsert() {
    log.info("[[program warm-up]");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info("[[end of warm-up]");
    sw.start("foreach Splicing sql");
    userInfoBatchMapper.batchInsert(list);
    sw.stop();
    log.info("all cost info:{}",sw.prettyPrint());
}

3.3 execution time

  • for the first time
-----------------------------------------
ms     %     Task name
-----------------------------------------
18835  100%  foreach Splicing sql
  • The second time
-----------------------------------------
ms     %     Task name
-----------------------------------------
17895  100%  foreach Splicing sql
  • third time
-----------------------------------------
ms     %     Task name
-----------------------------------------
19827  100%  foreach Splicing sql

Insert 20w in this way   Data, average time of execution three times: 18852 ms.

4, Batch processing

This way   mapper   And xml   Reused   2.1.

four point one   rewriteBatchedStatements   parameter

At the beginning of the test, I found that it was changed to   The methods submitted by Mybatis Batch do not work. In fact, when inserting, it is still inserting records one by one, and the speed is far lower than the original   foreach   The method of splicing SQL is very unscientific.

Later, it was found that for batch execution, a new parameter needs to be added to the connection URL string: rewritebackedstatements = true

  • Introduction to rewritebackedstatements parameter

The rewritebackedstatements parameter should be added to the url of the JDBC connection of MySql, and the driver of version 5.1.13 or above should be guaranteed to realize high-performance batch insertion. By default, the MySql JDBC driver ignores the executeBatch() statement and breaks up a group of SQL statements that we expect to execute in batches and sends them to the MySql database one by one. In fact, batch insertion is a single insertion, which directly leads to low performance. Only when the rewritebackedstatements parameter is set to true, the driver will help you execute SQL in batches. This option is valid for both INSERT/UPDATE/DELETE.

4.2 batch preparation

  • Manual injection   SqlSessionFactory
    @Resource
    private SqlSessionFactory sqlSessionFactory;
  • Test code
@Test
public void processInsert() {
    log.info("[[program warm-up]");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info("[[end of warm-up]");
    sw.start("Batch execution insert");
    // Open batch
    SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH);
    UserInfoBatchMapper mapper = session.getMapper(UserInfoBatchMapper.class);
    for (int i = 0,length = list.size(); i < length; i++) {
        mapper.insert(list.get(i));
        //Commit every 20000 to prevent memory overflow
        if(i%20000==19999){
            session.commit();
            session.clearCache();
        }
    }
    session.commit();
    session.clearCache();
    sw.stop();
    log.info("all cost info:{}",sw.prettyPrint());
}

4.3 execution time

  • for the first time
-----------------------------------------
ms     %     Task name
-----------------------------------------
09346  100%  Batch execution insert
  • The second time
-----------------------------------------
ms     %     Task name
-----------------------------------------
08890  100%  Batch execution insert
  • third time
-----------------------------------------
ms     %     Task name
-----------------------------------------
09042  100%  Batch execution insert

Insert 20w in this way   Data, average time of execution for three times: 9092 ms.

4.4 if the data is larger

When I expand the data to   100w   foreach   Splicing   sql   I can't complete the insertion in this way, so I can only test the insertion time of the batch.

During the test, only the   list   Cut   lageList   Just test.

  • for the first time
-----------------------------------------
ms     %     Task name
-----------------------------------------
32419  100%  Batch execution insert
  • The second time
-----------------------------------------
ms     %     Task name
-----------------------------------------
31935  100%  Batch execution insert
  • third time
-----------------------------------------
ms     %     Task name
-----------------------------------------
33048  100%  Batch execution insert

Insert 100w in this way   Data, average time of execution three times: 32467 ms.

5, Summary

Batch insert methodData volumeAverage time to execute three times
Circular insertion of single piece of data200060991 ms
foreach   Splicing sql20w18852 ms
Batch processing20w9092 ms
Batch processing100w32467 ms
  1. Although the efficiency of circular inserting a single piece of data is very low, the amount of code is very small, and it can be used when the amount of data is small, but it is forbidden to use because of the large amount of data, which is too inefficient;
  2. foreach   The method of splicing sql has a large section of xml and sql statements to write, which is easy to make mistakes. Although the efficiency is fair, it still can not be used when dealing with a large amount of data, so it is not recommended;
  3. Batch execution is recommended when there is a large amount of data insertion, and it is also convenient to use.
    www.somanba.cn
    www.sobd.cc
    www.awaedu.com

Posted by kaspari22 on Sat, 18 Sep 2021 21:56:29 -0700