A comparison of three MyBatis batch insertion methods, and I recommend the third!

Keywords: Java Database SQL

This article is reproduced from: https://mp.weixin.qq.com/s/K-wCUUztBxjGSlv2M5UA1Q

This article records a comparison of three batch insertion methods, to make it easier to choose among them in future work according to the situation at hand.

The database is SQL Server, the JDK version is 1.8, and everything runs in a Spring Boot environment. Three approaches are compared:

  • Executing a single insert statement repeatedly in a loop

  • Splicing SQL in XML

  • Batch execution

Let's start with the conclusion: for a small amount of data, insert records one by one in a loop, which is convenient. For a large amount, use batch execution. (For around 20 rows, in my test and database environment, either way takes on the order of 100 milliseconds, so convenience matters most.)

Never use XML to splice SQL.

Code

XML for splicing SQL

newId() is SQL Server's function for generating a UUID; it is irrelevant to the comparison itself.

<insert id="insertByBatch" parameterType="java.util.List">
    INSERT INTO tb_item VALUES
    <foreach collection="list" item="item" index="index" separator=",">
        (newId(),#{item.uniqueCode},#{item.projectId},#{item.name},#{item.type},#{item.packageUnique},
        #{item.isPackage},#{item.factoryId},#{item.projectName},#{item.spec},#{item.length},#{item.weight},
        #{item.material},#{item.setupPosition},#{item.areaPosition},#{item.bottomHeight},#{item.topHeight},
        #{item.serialNumber},#{item.createTime})</foreach>
</insert>
The Mapper interface. Mapper here is the base interface from the tk.Mapper MyBatis plugin, which has little to do with the content of this article.

public interface ItemMapper extends Mapper<Item> {
    int insertByBatch(List<Item> itemList);
}

Service class

@Service
public class ItemService {
    @Autowired
    private ItemMapper itemMapper;
    @Autowired
    private SqlSessionFactory sqlSessionFactory;
    //Batch execution
    @Transactional
    public void add(List<Item> itemList) {
        // Open a BATCH session with auto-commit disabled
        SqlSession session = sqlSessionFactory.openSession(ExecutorType.BATCH, false);
        try {
            ItemMapper mapper = session.getMapper(ItemMapper.class);
            for (int i = 0; i < itemList.size(); i++) {
                mapper.insertSelective(itemList.get(i));
                if (i % 1000 == 999) { // Commit every 1000 rows to prevent memory overflow
                    session.commit();
                    session.clearCache();
                }
            }
            session.commit();
            session.clearCache();
        } finally {
            session.close(); // Always release the session
        }
    }
    //Splicing SQL
    @Transactional
    public void add1(List<Item> itemList) {
        itemMapper.insertByBatch(itemList);
    }
    //Loop insertion
    @Transactional
    public void add2(List<Item> itemList) {
        itemList.forEach(itemMapper::insertSelective);
    }
}

Test class

@RunWith(SpringRunner.class)
@SpringBootTest(webEnvironment = SpringBootTest.WebEnvironment.RANDOM_PORT, classes = ApplicationBoot.class)
public class ItemServiceTest {
    @Autowired
    ItemService itemService;

    private List<Item> itemList = new ArrayList<>();
    //Generate test List
    @Before 
    public void createList(){
        String json ="{\n" +
                "        \"areaPosition\": \"TEST\",\n" +
                "        \"bottomHeight\": 5,\n" +
                "        \"factoryId\": \"0\",\n" +
                "        \"length\": 233.233,\n" +
                "        \"material\": \"Q345B\",\n" +
                "        \"name\": \"TEST\",\n" +
                "        \"package\": false,\n" +
                "        \"packageUnique\": \"45f8a0ba0bf048839df85f32ebe5bb81\",\n" +
                "        \"projectId\": \"094b5eb5e0384bb1aaa822880a428b6d\",\n" +
                "        \"projectName\": \"project_TEST1\",\n" +
                "        \"serialNumber\": \"1/2\",\n" +
                "        \"setupPosition\": \"1B column\",\n" +
                "        \"spec\": \"200X200X200\",\n" +
                "        \"topHeight\": 10,\n" +
                "        \"type\": \"Steel\",\n" +
                "        \"uniqueCode\": \"12344312\",\n" +
                "        \"weight\": 100\n" +
                "    }";
        Item test1 = JSON.parseObject(json,Item.class);
        test1.setCreateTime(new Date());
        for (int i = 0; i < 1000; i++) {//This row count is varied between test runs
            itemList.add(test1);
        }
    }
     //Batch processing
    @Test
    @Transactional
    public void tesInsert() {
        itemService.add(itemList);
    }
    //Splicing SQL
    @Test
    @Transactional
    public void testInsert1(){
        itemService.add1(itemList);
    }
    //Loop insertion
    @Test
    @Transactional
    public void testInsert2(){
        itemService.add2(itemList);
    }
}

Test results:

Inserting 10 and 25 rows was tested many times; the results fluctuate a lot, but both are basically on the order of 100 milliseconds.
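The figures above were collected by hand. As a rough sketch (the class and method names below are illustrative assumptions, not from the original article), a simple stopwatch around each service call can reproduce such measurements:

```java
// Sketch of a stopwatch helper for timing one insert strategy.
// InsertTimer and elapsedMillis are hypothetical names for illustration.
public class InsertTimer {
    public static long elapsedMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000; // convert ns to ms
    }

    public static void main(String[] args) {
        // In the real test this would wrap e.g. itemService.add(itemList)
        long ms = elapsedMillis(() -> { /* insert call goes here */ });
        System.out.println("took " + ms + " ms");
    }
}
```

Because the timings fluctuate, averaging several runs of such a helper gives a more trustworthy number than a single measurement.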

The SQL-splicing method throws an error when inserting 500 or 1000 rows (apparently the SQL statement becomes too long; this is specific to the database type and was not tested on other databases): com.microsoft.sqlserver.jdbc.SQLServerException: The incoming tabular data stream (TDS) remote procedure call (RPC) protocol stream is incorrect. Too many parameters were provided in this RPC request. The maximum is 2100.
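One way to work around this limit (a sketch, not from the original article; the class name is hypothetical, and the 18-column count is an assumption read off the placeholders in the XML above) is to split the list so that each spliced INSERT stays under 2100 parameters:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: partition a list into chunks small enough that one spliced
// INSERT per chunk stays under SQL Server's 2100-parameter RPC limit.
public class BatchChunker {
    static final int MAX_PARAMS = 2100;
    // One #{...} placeholder per column in the foreach above (assumed 18)
    static final int COLUMNS_PER_ROW = 18;
    static final int ROWS_PER_CHUNK = MAX_PARAMS / COLUMNS_PER_ROW; // 116

    public static <T> List<List<T>> chunk(List<T> items) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += ROWS_PER_CHUNK) {
            chunks.add(items.subList(i, Math.min(i + ROWS_PER_CHUNK, items.size())));
        }
        return chunks;
    }
}
```

Each chunk would then be passed to insertByBatch separately. Of course, at that point the batch-execution approach is simpler and has none of these limits.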

From the results we can see:

  • Loop insertion takes O(n) time with a very large constant factor

  • SQL-splicing insertion appears to take roughly O(log n) time, but it completes successfully only for small batches and is unreliable

  • Batch execution also takes roughly O(log n) time, with a comparatively small constant factor

Conclusion

Although inserting rows one by one in a loop is very inefficient, it requires very little code. With the tk.Mapper plugin, all that is needed is:

@Transactional
public void add1(List<Item> itemList) {
    itemList.forEach(itemMapper::insertSelective);
}

Therefore, it is the method of choice when the amount of data to insert is small.

XML SQL splicing is the least recommended method. Using it means writing long stretches of XML and SQL, which is error-prone and slow to develop. More importantly, although its runtime efficiency is acceptable, it falls over exactly when efficiency really matters (large inserts), which defeats the purpose.

Batch execution is recommended when a large amount of data needs to be inserted, and it is convenient to use as well.

Posted by pjoshi on Fri, 10 Sep 2021 00:44:40 -0700