Thinking about distributed jobs

Keywords: Java, Spring, GitHub, Maven

Introduction

While cleaning up my GitHub account, I found ClawHub/task-distribution, a simple distributed task-distribution system I wrote some time ago. It uses ZooKeeper for master election and as a queue, Spring's ThreadPoolTaskScheduler as the scheduler, and supports cron expressions for tasks.

This reminded me that I had also wrapped Quartz before. The project at the time was a very large monolithic application that needed an internal scheduling system.

At work I have since used Elastic-Job, the combination of Spring's own scheduling with ZooKeeper, and the configurable task scheduling provided by the company's Boss system.

This article briefly revisits Quartz, Spring's ThreadPoolTaskScheduler, and Elastic-Job, and then records my own thoughts on distributed jobs.

1. Quartz

Official website address: quartz-scheduler.org

1.1 Simple use

The Maven dependency:

<dependency>
    <groupId>org.quartz-scheduler</groupId>
    <artifactId>quartz</artifactId>
    <version>2.3.2</version>
</dependency>

The concrete job:

import org.quartz.Job;
import org.quartz.JobExecutionContext;

import java.time.LocalDateTime;
import java.util.Random;

public class HelloJob implements Job {
    @Override
    public void execute(JobExecutionContext jobExecutionContext) {
        System.out.println(jobExecutionContext.getJobDetail().getJobDataMap().get("jobDetailJobData1"));
        System.out.println(jobExecutionContext.getTrigger().getJobDataMap().get("triggerJobData1"));
        System.out.println("HelloJob start at:" + LocalDateTime.now() + ", prints: Hello Job-" + new Random().nextInt(100));
    }
}

The main class that registers and starts the job:


import org.quartz.*;
import org.quartz.impl.StdSchedulerFactory;

import java.util.concurrent.TimeUnit;

public class MainScheduler {
    public static void main(String[] args) throws SchedulerException, InterruptedException {
        // 1. Create a Scheduler
        SchedulerFactory schedulerFactory = new StdSchedulerFactory();
        Scheduler scheduler = schedulerFactory.getScheduler();
        // 2. Create a JobDetail instance and bind it to the HelloJob class (the content the job executes)
        JobDetail jobDetail = JobBuilder.newJob(HelloJob.class).usingJobData("jobDetailJobData1", "test JobDetail context")
                .withIdentity("job1", "group1").build();
        // 3. Build Trigger instance and execute it every 1s
        Trigger trigger = TriggerBuilder.newTrigger().usingJobData("triggerJobData1", "test Trigger context")
                .withIdentity("trigger1", "triggerGroup1")
                .startNow()//Immediate effect
                .withSchedule(SimpleScheduleBuilder.simpleSchedule()
                        .withIntervalInSeconds(1)//Every 1s
                        .repeatForever()).build();//Always execute

        // 4. Register the job with the trigger and start the scheduler
        scheduler.scheduleJob(jobDetail, trigger);
        System.out.println("--------MainScheduler start ! ------------");
        scheduler.start();

        //sleep
        TimeUnit.MINUTES.sleep(1);
        scheduler.shutdown();
        System.out.println("--------MainScheduler shutdown ! ------------");
    }
}

After the above steps, run the main method and you have a simple scheduled task. The roles involved are described below.

1.2 Role introduction

1.2.1 Job and JobDetail

Job is the task template in Quartz, and JobDetail is the description of a job. When the Scheduler fires a task, it creates a new Job instance from the JobDetail and discards it after the execution.

1.2.2 Trigger

A Trigger describes when a task fires. The common implementations are SimpleTrigger and CronTrigger; since cron expressions are very expressive, most task scheduling is based on CronTrigger.

1.2.3 JobDataMap and JobExecutionContext

JobExecutionContext is the context in which a task executes; JobDataMap holds the data passed through that context.

1.3 Architecture principle


The figure in the original post (omitted here) sketches the relationship between the Quartz core objects: the Scheduler, JobDetail, Trigger, and Job.
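The relationship can also be sketched in plain Java. This is a deliberately simplified, hypothetical model, not Quartz's real API: real Quartz stores the job class in the JobDetail and instantiates it through a JobFactory, which a Supplier stands in for here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simplified model: the Scheduler pairs a Trigger (when) with a JobDetail
// (what), and creates a fresh Job instance for every execution.
public class QuartzModelSketch {

    public interface Job {
        void execute(Map<String, Object> jobDataMap);
    }

    // JobDetail: describes the job and carries its JobDataMap.
    // (A Supplier stands in for Quartz's job class + JobFactory mechanism.)
    public static class JobDetail {
        final Supplier<Job> jobFactory;
        final Map<String, Object> jobDataMap = new HashMap<>();
        public JobDetail(Supplier<Job> jobFactory) { this.jobFactory = jobFactory; }
    }

    public static class MiniScheduler {
        // On each firing, build a brand-new Job instance, run it, discard it.
        public static void fire(JobDetail detail) {
            Job job = detail.jobFactory.get();
            job.execute(detail.jobDataMap);
        }
    }

    // A job that reads from and writes back to the JobDataMap.
    public static class HelloJob implements Job {
        @Override
        public void execute(Map<String, Object> jobDataMap) {
            jobDataMap.put("greeting", "hello " + jobDataMap.get("name"));
        }
    }

    public static void main(String[] args) {
        JobDetail detail = new JobDetail(HelloJob::new);
        detail.jobDataMap.put("name", "quartz");
        MiniScheduler.fire(detail);
        System.out.println(detail.jobDataMap.get("greeting")); // prints "hello quartz"
    }
}
```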

2. Spring's ThreadPoolTaskScheduler

Under the hood, it relies on java.util.concurrent.ScheduledExecutorService from JUC.
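Since ThreadPoolTaskScheduler is a thin wrapper over that primitive, the same periodic behavior can be reproduced with JUC directly. A minimal sketch (the class name is my own):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class JucSchedulerDemo {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);

        // Run a task every second, measuring the delay from the end of one
        // run to the start of the next; scheduleAtFixedRate is the
        // fixed-period alternative.
        executor.scheduleWithFixedDelay(
                () -> System.out.println("tick at " + System.nanoTime()),
                0, 1, TimeUnit.SECONDS);

        TimeUnit.SECONDS.sleep(3);
        executor.shutdown();
    }
}
```

Note that neither ScheduledExecutorService nor ThreadPoolTaskScheduler understands cron by itself; Spring's CronTrigger computes the next execution time and reschedules one-shot tasks on top of this executor.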

2.1 Simple use


import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;
import org.springframework.scheduling.support.CronTrigger;

import java.time.LocalDateTime;
import java.util.Random;

public class Main {
    public static void main(String[] args) {
        ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();
        scheduler.setPoolSize(10);
        scheduler.initialize();
        // Fire every second; the pool's non-daemon threads keep the JVM alive
        // after main returns.
        scheduler.schedule(() -> {
            System.out.println("HelloJob start at:" + LocalDateTime.now() + ", prints: Hello Job-" + new Random().nextInt(100));
        }, new CronTrigger("0/1 * * * * ?"));
    }
}

3. Elastic-Job

Elastic-Job is a distributed scheduling solution consisting of two independent subprojects: Elastic-Job-Lite and Elastic-Job-Cloud.
Official website: elasticjob.io.

Here we use the lightweight solution: elasticjob/elastic-job-lite.

The official architecture diagram and the job execution flow chart (figures from the original post are omitted here) can be found in the official documentation.

I only have hands-on experience with Elastic-Job and have not studied its internals in depth; please refer to the official documentation for details.

4. Thinking about distributed jobs

There is now also a clustered version of Quartz, which I have not used. What I have encountered so far is mostly built on top of Quartz, such as Elastic-Job, XXL-JOB, and Azkaban.

Since current projects are basically distributed systems, a single-machine scheduler is no longer a good fit.

A distributed scheduling system has many requirements. Here is my understanding:

  • Scheduled execution of tasks

This is the basic requirement of any scheduling system.

  • Task sharding

For scheduled tasks over large data sets, a single machine may not be able to finish in time, so the system should be able to split a large task and execute the pieces on multiple nodes.

  • Elastic scaling

When nodes are added or removed, tasks are rebalanced automatically.

  • Task configurability

Task information can be configured dynamically.

  • Dynamic task operations

Tasks can be started, paused, terminated, and deleted at runtime.

  • Decentralization

Each node decides for itself whether to execute a task, but a task must not be executed more than once.

  • Failover

When a node fails, its tasks are transferred to other nodes automatically.

  • Task execution records

The execution history of tasks needs to be recorded.
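The sharding requirement above can be sketched in a few lines. This mirrors the common usage pattern in Elastic-Job (each node receives a sharding item and only processes records whose id maps to that item), but it is my own illustrative code, not Elastic-Job's actual API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of task sharding: every record belongs to exactly one sharding item
// (id % totalShards), so each node processes a disjoint slice of the data.
public class ShardingSketch {

    // Decide whether this node (holding the given sharding item) owns a record.
    public static boolean ownedByThisNode(long recordId, int shardingItem, int totalShards) {
        return recordId % totalShards == shardingItem;
    }

    // Filter the full id list down to the slice this node should process.
    public static List<Long> selectRecords(List<Long> allIds, int shardingItem, int totalShards) {
        List<Long> mine = new ArrayList<>();
        for (long id : allIds) {
            if (ownedByThisNode(id, shardingItem, totalShards)) {
                mine.add(id);
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        List<Long> ids = List.of(0L, 1L, 2L, 3L, 4L, 5L);
        // With 3 shards, the node holding item 1 processes ids 1 and 4.
        System.out.println(selectRecords(ids, 1, 3)); // prints [1, 4]
    }
}
```

The coordination part that this sketch leaves out (assigning sharding items to live nodes and rebalancing them on failure) is exactly what Elastic-Job delegates to ZooKeeper.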

That is all I can think of for now. Overall, I find Elastic-Job very useful.

Posted by renegade33 on Sat, 07 Dec 2019 06:10:21 -0800