Spring Batch: lightweight batch processing in practice

Keywords: Database, Spring Batch

1 Theoretical basics before hands-on practice

1.1 What is Spring Batch

Spring Batch is a lightweight, comprehensive batch processing framework designed to enable the development of robust batch applications that are vital to the daily operations of enterprise systems. At the same time, it makes it easy for developers to access and leverage more advanced enterprise services when necessary. Spring Batch is not a scheduling framework: it is designed to work together with a scheduler, not to replace it.

1.2 What can Spring Batch do

  • Automated, complex processing of very large volumes of information that is handled most efficiently without user interaction. These operations typically include time-based events (such as month-end calculations, notices, or correspondence).
  • Complex business rules (e.g., insurance benefit determination or rate adjustment) that are applied periodically and repeatedly on very large data sets.
  • Integration of information received from internal and external systems into the system of record; this information usually needs to be formatted, validated and processed transactionally. Batch processing is used to process billions of transactions for enterprises every day.

Business scenarios:

  • Committing batch processes periodically
  • Concurrent batch processing: parallel processing of jobs
  • Staged, enterprise message-driven processing
  • Massively parallel batch processing
  • Manual or scheduled restart after failure
  • Sequential processing of dependent steps (with extensions to workflow-driven batches)
  • Partial processing: skipping records (for example, on rollback)
  • Whole-batch transactions, for cases with a small batch size or existing stored procedures/scripts

In short, Spring Batch can:

  • Read a large number of records from a database, file, or queue.
  • Process the data in some way.
  • Write the data back in modified form.

1.3 infrastructure

1.4 core concepts and abstractions


Core concepts: a Job has one or more Steps, and each Step has exactly one ItemReader, one ItemProcessor and one ItemWriter (the ItemProcessor is optional in a chunk-oriented step). A Job is started with a JobLauncher, and metadata about the currently running process is stored in a JobRepository.
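To make the read-process-write relationship concrete, here is a plain-Java model of one chunk-oriented Step. It is a simplified sketch, not Spring Batch's implementation: `Supplier`, `Function` and `Consumer` stand in for ItemReader, ItemProcessor and ItemWriter, and the class name `ChunkLoopModel` is hypothetical. Each chunk reads up to `chunkSize` items (stopping when the reader returns null), processes them (dropping items for which the processor returns null), and hands the surviving items to the writer as one list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical plain-Java model of a chunk-oriented Step (not Spring Batch code). */
final class ChunkLoopModel {

    /** Runs read -> process -> write in chunks and returns the number of items written. */
    static <I, O> int runStep(Supplier<I> reader,
                              Function<I, O> processor,
                              Consumer<List<O>> writer,
                              int chunkSize) {
        int written = 0;
        boolean exhausted = false;
        while (!exhausted) {
            // 1. Read up to chunkSize items; null from the reader means "no more input".
            List<I> inputs = new ArrayList<>();
            while (inputs.size() < chunkSize) {
                I item = reader.get();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                inputs.add(item);
            }
            // 2. Process each item; null from the processor filters the item out.
            List<O> outputs = new ArrayList<>();
            for (I item : inputs) {
                O out = processor.apply(item);
                if (out != null) {
                    outputs.add(out);
                }
            }
            // 3. Write the whole chunk in one call (one transaction in the real framework).
            if (!outputs.isEmpty()) {
                writer.accept(outputs);
                written += outputs.size();
            }
        }
        return written;
    }
}
```

With a reader over 1 to 5, a processor that drops even numbers and multiplies the rest by 10, and a chunk size of 2, the writer is called three times with [10], [30] and [50]. The chunk size counts items read, not items written, so filtered items simply make a chunk smaller.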

2 Introduction to each component

2.1 Job

A Job is an entity that encapsulates an entire batch process. As with other Spring projects, a Job is wired together with either an XML configuration file or Java-based configuration. This configuration may be referred to as the "job configuration".

Configurable items:

  • Simple name of the job.
  • Definition and ordering of Step instances.
  • Whether the job can be restarted.

2.2 Step

A Step is a domain object that encapsulates an independent, sequential phase of a batch Job. Therefore, every Job is composed entirely of one or more Steps. A Step contains all the information needed to define and control the actual batch processing.

A StepExecution represents a single attempt to execute a Step. A new StepExecution is created each time a Step runs, similar to JobExecution.

2.3 ExecutionContext

An ExecutionContext is a collection of key/value pairs that is persisted and controlled by the framework, giving developers a place to store persistent state scoped to a StepExecution object or a JobExecution object.
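As a sketch of why this persisted state matters for restarts, the class below keeps its read position in a `Map<String, Object>` that stands in for the ExecutionContext. Its `open`/`update` methods loosely mirror the `open(ExecutionContext)`/`update(ExecutionContext)` methods of Spring Batch's `ItemStream` interface, but `ResumableListReader` and the key `reader.position` are hypothetical, and this is plain Java, not framework code.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical restartable reader; the Map plays the role of an ExecutionContext. */
final class ResumableListReader<T> {
    // Hypothetical key under which the read position is saved.
    static final String POSITION_KEY = "reader.position";

    private final List<T> items;
    private int position;

    ResumableListReader(List<T> items) {
        this.items = items;
    }

    /** Restores the position saved by a previous (failed) execution, or starts at 0. */
    void open(Map<String, Object> context) {
        Object saved = context.get(POSITION_KEY);
        position = saved == null ? 0 : (Integer) saved;
    }

    /** Returns the next item, or null when the input is exhausted. */
    T read() {
        return position < items.size() ? items.get(position++) : null;
    }

    /** Saves the current position; the framework would persist this at each commit. */
    void update(Map<String, Object> context) {
        context.put(POSITION_KEY, position);
    }
}
```

If a run commits after two items and then fails, a new reader opened with the same context continues at the third item instead of starting over.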

2.4 JobRepository

JobRepository is the persistence mechanism for all of the stereotypes mentioned above. It provides CRUD operations for JobLauncher, Job and Step implementations. When a Job is first launched, a JobExecution is obtained from the repository, and during execution, StepExecution and JobExecution instances are persisted by passing them to the repository.

When using Java configuration, the @EnableBatchProcessing annotation provides a JobRepository as one of the components configured automatically out of the box.
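The relationship the repository tracks between job instances and job executions can be modeled in a few lines of plain Java. This is an illustrative sketch under simplified assumptions (`JobRepositoryModel` and its two statuses are hypothetical, all parameters are treated as identifying, and Spring's real repository stores much more): a job instance is identified by the job name plus its parameters, every run attempt adds a new execution, and running an already completed instance is rejected, which Spring Batch reports with `JobInstanceAlreadyCompleteException`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of JobInstance / JobExecution bookkeeping (not Spring code). */
final class JobRepositoryModel {
    enum Status { COMPLETED, FAILED }

    // instance key -> status of every execution (run attempt) of that instance
    private final Map<String, List<Status>> executions = new HashMap<>();

    /** An instance is identified by the job name plus its parameters. */
    static String instanceKey(String jobName, Map<String, Object> params) {
        return jobName + new TreeMap<>(params);   // TreeMap gives a stable key order
    }

    /** Records one run attempt; each attempt creates a new execution. */
    void run(String jobName, Map<String, Object> params, boolean succeeds) {
        String key = instanceKey(jobName, params);
        List<Status> history = executions.computeIfAbsent(key, k -> new ArrayList<>());
        if (history.contains(Status.COMPLETED)) {
            // Spring Batch throws JobInstanceAlreadyCompleteException in this situation
            throw new IllegalStateException("Instance already complete: " + key);
        }
        history.add(succeeds ? Status.COMPLETED : Status.FAILED);
    }

    int instanceCount() {
        return executions.size();
    }

    int executionCount(String jobName, Map<String, Object> params) {
        return executions.getOrDefault(instanceKey(jobName, params), new ArrayList<>()).size();
    }
}
```

This is also why the scheduled task in section 3.2 adds a fresh `time` parameter to every launch: each run then belongs to a new instance instead of repeating a completed one.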

2.5 JobLauncher

JobLauncher is a simple interface for launching a Job with a given set of JobParameters, as shown in the following example:

public interface JobLauncher {

    public JobExecution run(Job job, JobParameters jobParameters)
            throws JobExecutionAlreadyRunningException, JobRestartException,
                   JobInstanceAlreadyCompleteException, JobParametersInvalidException;
}

Implementations are expected to obtain a valid JobExecution from the JobRepository and execute the Job.

2.6 Item Reader

ItemReader is an abstraction that represents the retrieval of input for a Step, one item at a time. When the ItemReader has run out of items to provide, it indicates this by returning null.

2.7 Item Writer

ItemWriter is an abstraction that represents the output of a Step, one batch (chunk) of items at a time. Generally, an ItemWriter has no knowledge of the input it will receive next and knows only the items passed in its current invocation.

2.8 Item Processor

ItemProcessor is an abstraction that represents the business processing of an item. While the ItemReader reads one item and the ItemWriter writes items, the ItemProcessor provides an access point to transform the item or apply other business processing. If, while processing an item, it is determined that the item is invalid, returning null indicates that the item should not be written out.
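A processor that both validates and transforms can be sketched in plain Java as a `Function` that returns null for invalid items. `StudentRow`, `ValidatingProcessor` and the validation rules (non-blank name, non-negative age) are hypothetical; Spring's `ItemProcessor<I, O>` has the same shape (`O process(I item)`), so the same body could be used there.

```java
import java.util.Locale;
import java.util.function.Function;

/** Hypothetical row type used only for this sketch. */
final class StudentRow {
    final String name;
    final int age;

    StudentRow(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

/** Returns a transformed item, or null to filter an invalid item out of the chunk. */
final class ValidatingProcessor implements Function<StudentRow, StudentRow> {
    @Override
    public StudentRow apply(StudentRow in) {
        // Hypothetical validation rules: a non-blank name and a non-negative age.
        if (in.name == null || in.name.trim().isEmpty() || in.age < 0) {
            return null;                      // invalid: the writer never sees this item
        }
        // Transformation: trim and upper-case the name.
        return new StudentRow(in.name.trim().toUpperCase(Locale.ROOT), in.age);
    }
}
```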

3 Spring Batch practice

Next, we will apply the theory above to implement a minimal Spring Batch project.

3.1 Dependencies, project structure and configuration file

Dependencies

<!--Spring batch-->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-batch</artifactId>
</dependency>
<!-- web -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
</dependency>
<!-- lombok-->
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <version>1.18.20</version>
</dependency>
<!--  mysql-->
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.47</version>
</dependency>
<!--  mybatis-->
<dependency>
    <groupId>com.baomidou</groupId>
    <artifactId>mybatis-plus-boot-starter</artifactId>
    <version>3.2.0</version>
</dependency>

Project structure

Configuration file (application.properties)

server.port=9000
spring.datasource.url=jdbc:mysql://localhost:3306/test
spring.datasource.username=root
spring.datasource.password=12345
spring.datasource.driver-class-name=com.mysql.jdbc.Driver

3.2 Code and data table

Data table

CREATE TABLE `student` (
    `id` int(100) NOT NULL AUTO_INCREMENT,
    `name` varchar(45) DEFAULT NULL,
    `age` int(2) DEFAULT NULL,
    `address` varchar(45) DEFAULT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `id_UNIQUE` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=203579 DEFAULT CHARSET=utf8 ROW_FORMAT=REDUNDANT;

Student entity class

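import com.baomidou.mybatisplus.annotation.IdType;
import com.baomidou.mybatisplus.annotation.TableField;
import com.baomidou.mybatisplus.annotation.TableId;
import com.baomidou.mybatisplus.annotation.TableName;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import lombok.ToString;
import lombok.experimental.Accessors;

/**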
 * @desc: Student Entity class
 * @author: YanMingXin
 * @create: 2021/10/15-12:17
 **/
@Data
@Accessors(chain = true)
@NoArgsConstructor
@AllArgsConstructor
@ToString
@TableName("student")
public class Student {

    @TableId(value = "id", type = IdType.AUTO)
    private Long sId;

    @TableField("name")
    private String sName;

    @TableField("age")
    private Integer sAge;

    @TableField("address")
    private String sAddress;

}

Mapper layer

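import com.baomidou.mybatisplus.core.mapper.BaseMapper;
import org.apache.ibatis.annotations.Mapper;
import org.springframework.stereotype.Repository;

/**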
 * @desc: Mapper layer
 * @author: YanMingXin
 * @create: 2021/10/15-12:17
 **/
@Mapper
@Repository
public interface StudentDao extends BaseMapper<Student> {
}

Simulated database read class (StudentVirtualDao)

import java.util.ArrayList;
import java.util.List;

/**
 * @desc: Simulates reading from a database
 * @author: YanMingXin
 * @create: 2021/10/16-10:13
 **/
public class StudentVirtualDao {

    /**
     * Simulate reading from the database
     *
     * @return a fixed list of ten students
     */
    public List<Student> getStudents() {
        ArrayList<Student> students = new ArrayList<>();
        students.add(new Student(1L, "zs", 23, "Beijing"));
        students.add(new Student(2L, "ls", 23, "Beijing"));
        students.add(new Student(3L, "ww", 23, "Beijing"));
        students.add(new Student(4L, "zl", 23, "Beijing"));
        students.add(new Student(5L, "mq", 23, "Beijing"));
        students.add(new Student(6L, "gb", 23, "Beijing"));
        students.add(new Student(7L, "lj", 23, "Beijing"));
        students.add(new Student(8L, "ss", 23, "Beijing"));
        students.add(new Student(9L, "zsdd", 23, "Beijing"));
        students.add(new Student(10L, "zss", 23, "Beijing"));
        return students;
    }
}

Service layer interface

import java.util.List;

/**
 * @desc: Student service interface
 * @author: YanMingXin
 * @create: 2021/10/15-12:16
 **/
public interface StudentService {

    List<Student> selectStudentsFromDB();

    void insertStudent(Student student);
}

Service layer implementation class

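import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

/**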
 * @desc: Service Layer implementation class
 * @author: YanMingXin
 * @create: 2021/10/15-12:16
 **/
@Service
public class StudentServiceImpl implements StudentService {

    @Autowired
    private StudentDao studentDao;

    @Override
    public List<Student> selectStudentsFromDB() {
        return studentDao.selectList(null);
    }

    @Override
    public void insertStudent(Student student) {
        studentDao.insert(student);
    }
}

The core configuration class is BatchConfiguration

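import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.support.ListItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;
// Student, StudentService and StudentVirtualDao are assumed to be in the same package

/**
 * @desc: BatchConfiguration
 * @author: YanMingXin
 * @create: 2021/10/15-12:25
 **/
@Configuration
@EnableBatchProcessing
@EnableScheduling // @Scheduled only fires when scheduling is enabled; harmless if the application class also declares it
public class BatchConfiguration {

    /**
     * Inject JobBuilderFactory
     */
    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    /**
     * Inject StepBuilderFactory
     */
    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    /**
     * Inject JobLauncher
     */
    @Autowired
    private JobLauncher jobLauncher;

    /**
     * Inject custom StudentService
     */
    @Autowired
    private StudentService studentService;

    /**
     * Reader bean: each read() returns ONE Student and returns null once the list is
     * exhausted, which ends the step. (A reader that returns the whole list on every
     * call never returns null, so the step never finishes.)
     * @StepScope creates a fresh reader for every step execution, so each scheduled
     * run starts again from the first record.
     */
    @Bean
    @StepScope
    public ListItemReader<Student> reader() {
        // Simulated data source
        return new ListItemReader<>(new StudentVirtualDao().getStudents());
    }

    /**
     * Processor bean: receives a single Student. Returning null would filter the
     * item out; here every item is passed through unchanged.
     */
    @Bean
    public ItemProcessor<Student, Student> processor() {
        return student -> student;
    }

    /**
     * Writer bean: receives one chunk of processed Students per call
     * (a List in Spring Batch 4.x), so no nested list has to be unpacked.
     */
    @Bean
    public ItemWriter<Student> writer() {
        return students -> students.forEach(studentService::insertStudent);
    }

    /**
     * Encapsulate custom step
     */
    @Bean
    public Step studentStepOne() {
        return stepBuilderFactory.get("studentStepOne")
            .<Student, Student>chunk(1)   // commit interval: items written per transaction
            .reader(reader())             // Join reader
            .processor(processor())       // Join processor
            .writer(writer())             // Join writer
            .build();
    }

    /**
     * Encapsulate custom job
     */
    @Bean
    public Job studentJob() {
        return jobBuilderFactory.get("studentJob")
            .flow(studentStepOne())       // Join step
            .end()
            .build();
    }

    /**
     * Scheduled task execution using Spring: launches the job every 5 seconds.
     * The fresh "time" parameter makes each run a new JobInstance; a completed
     * instance cannot be run again with identical parameters.
     * Because this is a @Configuration class, studentJob() returns the singleton bean,
     * so the Job does not need to be @Autowired into the class that defines it.
     */
    @Scheduled(fixedRate = 5000)
    public void printMessage() {
        try {
            JobParameters jobParameters = new JobParametersBuilder()
                .addLong("time", System.currentTimeMillis())
                .toJobParameters();
            jobLauncher.run(studentJob(), jobParameters);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}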

3.3 testing


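Result 1 s after the project starts:

Looking at the database, there are now many tables besides the one defined by our entity class. These are Spring Batch's metadata tables, which the JobRepository uses to record job instances, job and step executions, job parameters and execution contexts, including the status and exit messages of failed runs. The exact meaning of each column deserves further study.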

4 Summary after hands-on practice

Spring Batch reads and writes very quickly, but this comes at the cost of memory and database connection pool resources, and if it is used carelessly, exceptions can occur. It therefore needs to be configured correctly. Next, let's explore the source code:

4.1 JobBuilderFactory

Jobs are obtained through a combination of the simple factory pattern and the builder pattern: JobBuilderFactory hands out a JobBuilder, and once configured, the JobBuilder returns a Job instance. The Job is the top-level component in Spring Batch and contains one or more Steps.

public class JobBuilderFactory {

   private JobRepository jobRepository;

   public JobBuilderFactory(JobRepository jobRepository) {
      this.jobRepository = jobRepository;
   }
   // Returns a new JobBuilder preconfigured with the shared JobRepository
   public JobBuilder get(String name) {
      JobBuilder builder = new JobBuilder(name).repository(jobRepository);
      return builder;
   }
}

JobBuilder class

public class JobBuilder extends JobBuilderHelper<JobBuilder> {

   /**
    * Create a new builder for the job with the specified name
    */
   public JobBuilder(String name) {
      super(name);
   }

   /**
    * Create a new job builder that will execute a step or sequence of steps.
    */
   public SimpleJobBuilder start(Step step) {
      return new SimpleJobBuilder(this).start(step);
   }

   /**
    * Create a new job builder that will execute the flow.
    */
   public JobFlowBuilder start(Flow flow) {
      return new FlowJobBuilder(this).start(flow);
   }

   /**
    * Create a new job builder that will execute a step or sequence of steps
    */
   public JobFlowBuilder flow(Step step) {
      return new FlowJobBuilder(this).start(step);
   }
}
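The simple-factory-plus-builder structure described above can be reduced to a plain-Java sketch. All class names here (`MiniJobBuilderFactory`, `MiniJobBuilder`, `MiniJob`, `RepoStub`) and the step name `cleanupStep` in the usage are hypothetical: the factory owns the shared dependency (playing the role of the JobRepository) and injects it into every builder it creates, the builder accumulates configuration through chained calls, and `build()` returns an immutable product.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Shared dependency handed to every builder (stands in for the JobRepository). */
final class RepoStub {
    final String id;
    RepoStub(String id) { this.id = id; }
}

/** The product: an immutable job description. */
final class MiniJob {
    final String name;
    final RepoStub repository;
    final List<String> steps;

    MiniJob(String name, RepoStub repository, List<String> steps) {
        this.name = name;
        this.repository = repository;
        this.steps = Collections.unmodifiableList(new ArrayList<>(steps));
    }
}

/** Builder: collects configuration through chained calls, then builds the product. */
final class MiniJobBuilder {
    private final String name;
    private RepoStub repository;
    private final List<String> steps = new ArrayList<>();

    MiniJobBuilder(String name) { this.name = name; }

    MiniJobBuilder repository(RepoStub repository) { this.repository = repository; return this; }

    MiniJobBuilder start(String step) { steps.clear(); steps.add(step); return this; }

    MiniJobBuilder next(String step) { steps.add(step); return this; }

    MiniJob build() {
        if (steps.isEmpty()) {
            throw new IllegalStateException("A job needs at least one step");
        }
        return new MiniJob(name, repository, steps);
    }
}

/** Simple factory: injects the shared dependency, like JobBuilderFactory.get(name). */
final class MiniJobBuilderFactory {
    private final RepoStub repository;

    MiniJobBuilderFactory(RepoStub repository) { this.repository = repository; }

    MiniJobBuilder get(String name) { return new MiniJobBuilder(name).repository(repository); }
}
```

The point of the factory is that callers never pass the repository around by hand; `factory.get("studentJob").start("studentStepOne").next("cleanupStep").build()` yields a job already wired to the shared repository.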

4.2 StepBuilderFactory

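StepBuilderFactory works like JobBuilderFactory: its get(name) method returns a StepBuilder preconfigured with the JobRepository and a PlatformTransactionManager. Let's look directly at the StepBuilder class: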

public class StepBuilder extends StepBuilderHelper<StepBuilder> {

   public StepBuilder(String name) {
      super(name);
   }

   /**
    * Build a step with a custom tasklet, not necessarily item processing.
    */
   public TaskletStepBuilder tasklet(Tasklet tasklet) {
      return new TaskletStepBuilder(this).tasklet(tasklet);
   }

   /**
    * Build a step that processes items in chunks of the size provided. To make the step fault tolerant,
    * call the faultTolerant() method of SimpleStepBuilder on the builder.
    * @param <I> the type of the input items
    * @param <O> the type of the output items
    */
   public <I, O> SimpleStepBuilder<I, O> chunk(int chunkSize) {
      return new SimpleStepBuilder<I, O>(this).chunk(chunkSize);
   }

   public <I, O> SimpleStepBuilder<I, O> chunk(CompletionPolicy completionPolicy) {
      return new SimpleStepBuilder<I, O>(this).chunk(completionPolicy);
   }

   public PartitionStepBuilder partitioner(String stepName, Partitioner partitioner) {
      return new PartitionStepBuilder(this).partitioner(stepName, partitioner);
   }

   public PartitionStepBuilder partitioner(Step step) {
      return new PartitionStepBuilder(this).step(step);
   }

   public JobStepBuilder job(Job job) {
      return new JobStepBuilder(this).job(job);
   }

   /**
    * Create a new step builder that will execute the flow.
    */
   public FlowStepBuilder flow(Flow flow) {
      return new FlowStepBuilder(this).flow(flow);
   }
}
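The difference between `tasklet(...)` and `chunk(...)` is that a tasklet step simply calls one piece of code repeatedly until it reports that it is finished, with no reader or writer involved. The sketch below models that loop in plain Java; `TaskletModel`, `RepeatStatusModel` and `TaskletStepModel` are hypothetical stand-ins for Spring Batch's `Tasklet` and `RepeatStatus` (whose values are `CONTINUABLE` and `FINISHED`), and transactions are only mentioned in a comment, not implemented.

```java
/** Local stand-in for the two values of Spring Batch's RepeatStatus. */
enum RepeatStatusModel { CONTINUABLE, FINISHED }

/** Hypothetical stand-in for Tasklet: one unit of work per call, not tied to items. */
interface TaskletModel {
    RepeatStatusModel execute();
}

final class TaskletStepModel {
    /** Calls the tasklet until it reports FINISHED; returns the number of calls made. */
    static int run(TaskletModel tasklet) {
        int calls = 0;
        RepeatStatusModel status;
        do {
            status = tasklet.execute();   // in Spring Batch each call runs in its own transaction
            calls++;
        } while (status == RepeatStatusModel.CONTINUABLE);
        return calls;
    }
}
```

A tasklet that returns `CONTINUABLE` twice and then `FINISHED` is therefore invoked three times; a typical one-shot tasklet (for example, deleting a temporary file) returns `FINISHED` on its first call.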


Posted by theblacksheep on Fri, 15 Oct 2021 22:35:32 -0700