Comparison of several batch insertion efficiencies of [MyBatis]

There are three main methods for batch data processing:

Execute a single insert statement repeatedly
foreach Splicing sql
Batch processing

1, Preliminary preparation

Based on Spring Boot + Mysql, and lombok is used to omit get/set. See pom.xml for details.

1.1 table structure

id Use database self augmentation.

DROP TABLE IF EXISTS `user_info_batch`;
CREATE TABLE `user_info_batch` (
                           `id` bigint(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'Primary key id',
                           `user_name` varchar(100) NOT NULL COMMENT 'title of account',
                           `pass_word` varchar(100) NOT NULL COMMENT 'Login password',
                           `nick_name` varchar(30) NOT NULL COMMENT 'nickname',
                           `mobile` varchar(30) NOT NULL COMMENT 'cell-phone number',
                           `email` varchar(100) DEFAULT NULL COMMENT 'e-mail address',
                           `gmt_create` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Creation time',
                           `gmt_update` timestamp NULL DEFAULT NULL COMMENT 'Update time',
                           PRIMARY KEY (`id`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC COMMENT 'Mybatis Batch';

1.2 project configuration file

Careful, you may have found that the database url Followed by a paragraph Rewritebackedstatements = true, what's the use? Don't worry, I'll introduce it later.

# Database configuration
spring:
  datasource:
    url: jdbc:mysql://47.111.118.152:3306/mybatis?rewriteBatchedStatements=true
    username: mybatis
    password: password
    driver-class-name: com.mysql.cj.jdbc.Driver
# mybatis
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: cn.van.mybatis.batch.entity

1.3 entity class

@Data
@Accessors(chain = true)
public class UserInfoBatchDO implements Serializable {
    private Long id;

    private String userName;

    private String passWord;

    private String nickName;

    private String mobile;

    private String email;

    private LocalDateTime gmtCreate;

    private LocalDateTime gmtUpdate;
}

1.4 UserInfoBatchMapper

public interface UserInfoBatchMapper {

    /** Single insert
     * @param info
     * @return
     */
    int insert(UserInfoBatchDO info);

    /**
     * foreach insert
     * @param list
     * @return
     */
    int batchInsert(List list);
}

1.5 UserInfoBatchMapper.xml

<?xml version="1.0" encoding="UTF-8"?>



  
    insert into user_info_batch (user_name, pass_word, nick_name, mobile, email, gmt_create, gmt_update)
    values (#{userName,jdbcType=VARCHAR}, #{passWord,jdbcType=VARCHAR},#{nickName,jdbcType=VARCHAR}, #{mobile,jdbcType=VARCHAR}, #{email,jdbcType=VARCHAR}, #{gmtCreate,jdbcType=TIMESTAMP}, #{gmtUpdate,jdbcType=TIMESTAMP})
  

  
    insert into user_info_batch (user_name, pass_word, nick_name, mobile, email, gmt_create, gmt_update)
    values
    
      (#{item.userName,jdbcType=VARCHAR}, #{item.passWord,jdbcType=VARCHAR}, #{item.nickName,jdbcType=VARCHAR}, #{item.mobile,jdbcType=VARCHAR}, #{item.email,jdbcType=VARCHAR}, #{item.gmtCreate,jdbcType=TIMESTAMP}, #{item.gmtUpdate,jdbcType=TIMESTAMP})

1.6 preliminary data

In order to facilitate the test, several variables are extracted and loaded in advance.

    private List list = new ArrayList<>();
    private List lessList = new ArrayList<>();
    private List lageList = new ArrayList<>();
    private List warmList = new ArrayList<>();
    // Counting tool
    private StopWatch sw = new StopWatch();

In order to facilitate the assembly of data, a common method is extracted.

    private List assemblyData(int count){
        List list = new ArrayList<>();
        UserInfoBatchDO userInfoDO;
        for (int i = 0;i < count;i++){
            userInfoDO = new UserInfoBatchDO()
                    .setUserName("Van")
                    .setNickName("Dust blog")
                    .setMobile("17098705205")
                    .setPassWord("password")
                    .setGmtUpdate(LocalDateTime.now());
            list.add(userInfoDO);
        }
        return list;
    }

Preheating data

    @Before
    public void assemblyData() {
        list = assemblyData(200000);
        lessList = assemblyData(2000);
        lageList = assemblyData(1000000);
        warmList = assemblyData(5);
    }

2, Execute a single insert statement repeatedly

Maybe lazy programmers will do this. It is very simple to directly nest a for loop on the original single insert statement.

2.1 corresponding mapper interface

int insert(UserInfoBatchDO info);

2.2 test method

Because this method is too slow, the data is reduced to two thousand strip

@Test
public void insert() {
    log.info("[[program warm-up]");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info("[[end of warm-up]");
    sw.start("Execute a single insert statement repeatedly");
    // It's too slow to insert 20w here, so I only inserted 2000
    for (UserInfoBatchDO userInfoBatchDO : lessList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    sw.stop();
    log.info("all cost info:{}",sw.prettyPrint());
}

2.3 execution time

for the first time

-----------------------------------------
ms     %     Task name
-----------------------------------------
59887  100%  Execute a single insert statement repeatedly

The second time

-----------------------------------------
ms     %     Task name
-----------------------------------------
64853  100%  Execute a single insert statement repeatedly

third time

-----------------------------------------
ms     %     Task name
-----------------------------------------
58235  100%  Execute a single insert statement repeatedly

Insert 2000 in this way Data, average time of execution three times: 60991 ms.

3, foreach Splicing SQL

3.1 corresponding mapper interface

int batchInsert(List list);

3.2 test method

Both this method and the next method are tested with 20w data.

@Test
public void batchInsert() {
    log.info("[[program warm-up]");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info("[[end of warm-up]");
    sw.start("foreach Splicing sql");
    userInfoBatchMapper.batchInsert(list);
    sw.stop();
    log.info("all cost info:{}",sw.prettyPrint());
}

3.3 execution time

for the first time

-----------------------------------------
ms     %     Task name
-----------------------------------------
18835  100%  foreach Splicing sql

The second time

-----------------------------------------
ms     %     Task name
-----------------------------------------
17895  100%  foreach Splicing sql

third time

-----------------------------------------
ms     %     Task name
-----------------------------------------
19827  100%  foreach Splicing sql

Insert 20w in this way Data, average time of execution three times: 18852 ms.

4, Batch processing

This way mapper And xml Reused 2.1.

four point one rewriteBatchedStatements parameter

At the beginning of the test, I found that it was changed to The methods submitted by Mybatis Batch do not work. In fact, when inserting, it is still inserting records one by one, and the speed is far lower than the original foreach The method of splicing SQL is very unscientific.

Later, it was found that for batch execution, a new parameter needs to be added to the connection URL string: rewritebackedstatements = true

Introduction to rewritebackedstatements parameter

The rewritebackedstatements parameter should be added to the url of the JDBC connection of MySql, and the driver of version 5.1.13 or above should be guaranteed to realize high-performance batch insertion. By default, the MySql JDBC driver ignores the executeBatch() statement and breaks up a group of SQL statements that we expect to execute in batches and sends them to the MySql database one by one. In fact, batch insertion is a single insertion, which directly leads to low performance. Only when the rewritebackedstatements parameter is set to true, the driver will help you execute SQL in batches. This option is valid for both INSERT/UPDATE/DELETE.

4.2 batch preparation

Manual injection SqlSessionFactory

    @Resource
    private SqlSessionFactory sqlSessionFactory;

Test code

@Test
public void processInsert() {
    log.info("[[program warm-up]");
    for (UserInfoBatchDO userInfoBatchDO : warmList) {
        userInfoBatchMapper.insert(userInfoBatchDO);
    }
    log.info("[[end of warm-up]");
    sw.start("Batch execution insert");
    // Open batch
    SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH);
    UserInfoBatchMapper mapper = session.getMapper(UserInfoBatchMapper.class);
    for (int i = 0,length = list.size(); i < length; i++) {
        mapper.insert(list.get(i));
        //Commit every 20000 to prevent memory overflow
        if(i%20000==19999){
            session.commit();
            session.clearCache();
        }
    }
    session.commit();
    session.clearCache();
    sw.stop();
    log.info("all cost info:{}",sw.prettyPrint());
}

4.3 execution time

for the first time

-----------------------------------------
ms     %     Task name
-----------------------------------------
09346  100%  Batch execution insert

The second time

-----------------------------------------
ms     %     Task name
-----------------------------------------
08890  100%  Batch execution insert

third time

-----------------------------------------
ms     %     Task name
-----------------------------------------
09042  100%  Batch execution insert

Insert 20w in this way Data, average time of execution for three times: 9092 ms.

4.4 if the data is larger

When I expand the data to 100w foreach Splicing sql I can't complete the insertion in this way, so I can only test the insertion time of the batch.

During the test, only the list Cut lageList Just test.

for the first time

-----------------------------------------
ms     %     Task name
-----------------------------------------
32419  100%  Batch execution insert

The second time

-----------------------------------------
ms     %     Task name
-----------------------------------------
31935  100%  Batch execution insert

third time

-----------------------------------------
ms     %     Task name
-----------------------------------------
33048  100%  Batch execution insert

Insert 100w in this way Data, average time of execution three times: 32467 ms.

5, Summary

Batch insert method	Data volume	Average time to execute three times
Circular insertion of single piece of data	2000	60991 ms
foreach Splicing sql	20w	18852 ms
Batch processing	20w	9092 ms
Batch processing	100w	32467 ms

Although the efficiency of circular inserting a single piece of data is very low, the amount of code is very small, and it can be used when the amount of data is small, but it is forbidden to use because of the large amount of data, which is too inefficient;
foreach The method of splicing sql has a large section of xml and sql statements to write, which is easy to make mistakes. Although the efficiency is fair, it still can not be used when dealing with a large amount of data, so it is not recommended;
Batch execution is recommended when there is a large amount of data insertion, and it is also convenient to use.
www.somanba.cn

www.sobd.cc

www.awaedu.com

Posted by kaspari22 on Sat, 18 Sep 2021 21:56:29 -0700

Programmer Group

Comparison of several batch insertion efficiencies of [MyBatis]

Comparison of several batch insertion efficiencies of [MyBatis]

1, Preliminary preparation

2, Execute a single insert statement repeatedly

3, foreach Splicing SQL

4, Batch processing

5, Summary

Hot Keywords