Spring Cloud upgrade 2020.0.x - 32. Improved load balancing algorithm

Keywords: Spring Cloud

Source code for this series: https://github.com/JoJoTec/spring-cloud-parent

In the previous section, we walked through the ideas behind implementing circuit breaking and thread isolation for Feign. Before looking at that source code (which includes the improved load balancing algorithm), this section first discusses how to optimize the current load balancing algorithm.

Previous load balancing algorithm

  1. Get the service instance list and sort it by IP and port. Without sorting, the list order can differ between calls, so the instance at the next position may still be one that has already been called.
  2. Using the traceId of the request as the key, get an atomic variable position from the local cache. Its initial value is a random number, which prevents every request from starting at the first instance and then moving on to the second and third.
  3. Atomically increment position, take the result modulo the number of instances, and return the instance at that index for the call.

The traceId in the request comes from Spring Cloud Sleuth distributed tracing. With this mechanism, a retried request is not sent to an instance it has already called (as long as the number of retries is smaller than the number of instances). The source code is:

// This class must implement ReactorServiceInstanceLoadBalancer
// rather than ReactorLoadBalancer<ServiceInstance>,
// because ReactorServiceInstanceLoadBalancer is the type that gets registered
@Log4j2
public class RoundRobinWithRequestSeparatedPositionLoadBalancer implements ReactorServiceInstanceLoadBalancer {
    private final ServiceInstanceListSupplier serviceInstanceListSupplier;
    // A request, including its retries, will not take longer than 1 minute.
    // A request that takes longer than 1 minute is a heavy request and should not be retried.
    private final LoadingCache<Long, AtomicInteger> positionCache = Caffeine.newBuilder().expireAfterWrite(1, TimeUnit.MINUTES)
            //Random initial value to prevent calling from the first one every time
            .build(k -> new AtomicInteger(ThreadLocalRandom.current().nextInt(0, 1000)));
    private final String serviceId;
    private final Tracer tracer;


    public RoundRobinWithRequestSeparatedPositionLoadBalancer(ServiceInstanceListSupplier serviceInstanceListSupplier, String serviceId, Tracer tracer) {
        this.serviceInstanceListSupplier = serviceInstanceListSupplier;
        this.serviceId = serviceId;
        this.tracer = tracer;
    }
    
    // Every retry also calls choose() to pick an instance again
    @Override
    public Mono<Response<ServiceInstance>> choose(Request request) {
        return serviceInstanceListSupplier.get().next().map(serviceInstances -> getInstanceResponse(serviceInstances));
    }

    private Response<ServiceInstance> getInstanceResponse(List<ServiceInstance> serviceInstances) {
        if (serviceInstances.isEmpty()) {
            log.warn("No servers available for service: " + this.serviceId);
            return new EmptyResponse();
        }
        return getInstanceResponseByRoundRobin(serviceInstances);
    }

    private Response<ServiceInstance> getInstanceResponseByRoundRobin(List<ServiceInstance> serviceInstances) {
        if (serviceInstances.isEmpty()) {
            log.warn("No servers available for service: " + this.serviceId);
            return new EmptyResponse();
        }
        // With the original round-robin algorithm, concurrent calls could make a retry land on the same instance;
        // to avoid that, keep a separate position for each request.
        // Get the context of the current request from Sleuth's Tracer
        Span currentSpan = tracer.currentSpan();
        // If there is no context, the request was probably not triggered by a front-end user but by some other mechanism, so create a new context
        if (currentSpan == null) {
            currentSpan = tracer.newTrace();
        }
        // Get the traceId from the request context; it uniquely identifies a request
        long l = currentSpan.context().traceId();
        AtomicInteger seed = positionCache.get(l);
        int s = seed.getAndIncrement();
        int pos = s % serviceInstances.size();
        log.info("position {}, seed: {}, instances count: {}", pos, s, serviceInstances.size());
        return new DefaultResponse(serviceInstances.stream()
                // The returned instance list may be in a different order each time; sort first so that positions are consistent
                .sorted(Comparator.comparing(ServiceInstance::getInstanceId))
                .collect(Collectors.toList()).get(pos));
    }
}

However, this load balancing algorithm still caused us problems when the number of requests rose sharply.

First, we did not scale out ahead of this sudden increase, so performance became very sensitive to how evenly the load was distributed. For example, suppose microservice A has nine instances. When the business peak arrives, ideally the load on the nine instances is balanced at every moment. However, because the position atomic variable starts from a random number, the load is balanced over a whole day but not necessarily over a short period. In a short period, the load may well concentrate on a few instances, which are overwhelmed and circuit-broken; the load then shifts to other instances, which are overwhelmed and circuit-broken in turn, creating a vicious circle.

Second, we deploy on k8s, and pods of many microservices may run on the same virtual machine. In some cases, several pods of the same microservice land on the same virtual machine Node. This can be seen from the IP network segment of the pods. For example, suppose a microservice has the following seven instances: 10.238.13.12:8181, 10.238.13.24:8181, 10.238.15.12:8181, 10.238.17.12:8181, 10.238.20.220:8181, 10.238.21.31:8181, 10.238.21.121:8181. Then 10.238.13.12:8181 and 10.238.13.24:8181 are likely on the same Node, as are 10.238.21.31:8181 and 10.238.21.121:8181. When retrying, we should prefer instances that are not on the same Node as the instances already tried, because when one instance on a Node has problems or is overloaded, the other instances on that Node usually are too.

Finally, if calls to an instance keep failing, that instance should be ranked behind the healthy instances. This reduces the impact on users in three situations: rapid rolling releases (many new instances are started at once and then several old instances are stopped, with more instances stopped than the configured retry count); several instances going offline because of a sudden failure in an availability zone; and the shutdown and migration of a large number of instances that are no longer needed once a business peak has passed.
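The network-segment heuristic above can be sketched as follows. This is only a minimal illustration, assuming that pods on the same Node share the first three octets of their IP address (which holds for the example addresses here but depends on how the cluster allocates pod IP ranges). The class and method names are hypothetical and do not come from the project.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of the original project.
class NodePrefixGrouping {

    // Returns the host without its last octet, e.g. "10.238.13.12" -> "10.238.13".
    // Assumption: instances sharing this prefix are likely to be on the same Node.
    static String prefixOf(String host) {
        int lastDot = host.lastIndexOf('.');
        return lastDot < 0 ? host : host.substring(0, lastDot);
    }

    // Groups hosts by prefix, keeping the first-seen order of the prefixes.
    static Map<String, List<String>> groupByPrefix(List<String> hosts) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String host : hosts) {
            groups.computeIfAbsent(prefixOf(host), k -> new ArrayList<>()).add(host);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> hosts = List.of(
                "10.238.13.12", "10.238.13.24", "10.238.15.12", "10.238.17.12",
                "10.238.20.220", "10.238.21.31", "10.238.21.121");
        // Prints one line per network segment with the hosts that fall into it.
        groupByPrefix(hosts).forEach((prefix, members) ->
                System.out.println(prefix + " -> " + members));
    }
}
```

Hosts that end up in the same group are the ones a retry should try to avoid once any member of that group has been called.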

Optimization scheme for the above problems

We propose an optimized solution to the above three problems:

  1. For each request, record:
     - which instances have already been called in this request -> per-request called-instance cache
     - how many requests each instance is currently processing -> instance running-request count
     - the recent request error rate of each instance -> instance request error rate
  2. Randomly shuffle the instance list, so that requests are not always sent to the same instance when the three indicators above are equal.
  3. Sort the list: instances not yet called in the current request come first, then lower error rate first, then fewer running requests first.
  4. Take the first instance in the sorted list as the result of this load balancing.
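The shuffle, sort and pick-first steps described above can be illustrated with a simplified, self-contained sketch. The `Candidate` class and its fields are hypothetical stand-ins for a snapshot of the three indicators; the real implementation reads them from Sleuth and a metrics registry rather than taking them as plain values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Random;

// Hypothetical, simplified model of the selection steps; not the project's code.
class InstanceOrderingSketch {

    // Snapshot of the three indicators for one instance (all names are illustrative).
    static final class Candidate {
        final String host;
        final boolean alreadyCalled; // was it called earlier in this request?
        final double errorRate;      // recent error rate
        final long running;          // requests currently in flight

        Candidate(String host, boolean alreadyCalled, double errorRate, long running) {
            this.host = host;
            this.alreadyCalled = alreadyCalled;
            this.errorRate = errorRate;
            this.running = running;
        }
    }

    static Optional<Candidate> choose(List<Candidate> candidates, Random random) {
        if (candidates.isEmpty()) {
            return Optional.empty();
        }
        // Work on a copy so the caller's list is not modified.
        List<Candidate> copy = new ArrayList<>(candidates);
        // Shuffle first; the sort below is stable, so ties keep this random order.
        Collections.shuffle(copy, random);
        copy.sort(Comparator
                .comparing((Candidate c) -> c.alreadyCalled) // false sorts before true
                .thenComparingDouble(c -> c.errorRate)       // lower error rate first
                .thenComparingLong(c -> c.running));         // fewer running requests first
        return Optional.of(copy.get(0));
    }
}
```

Taking a snapshot of the indicators before sorting matters here: a comparator that reads live, concurrently changing counters could give inconsistent results within a single sort.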

The implementation is as follows. The code below comes from: https://github.com/JoJoTec/spring-cloud-parent

We use the following dependency:

<dependency>
    <groupId>io.dropwizard.metrics</groupId>
    <artifactId>metrics-core</artifactId>
</dependency>

Cache class for recording instance data:

@Log4j2
public class ServiceInstanceMetrics {
	private static final String CALLING = "-Calling";
	private static final String FAILED = "-Failed";

	private MetricRegistry metricRegistry;

	ServiceInstanceMetrics() {
	}

	public ServiceInstanceMetrics(MetricRegistry metricRegistry) {
		this.metricRegistry = metricRegistry;
	}

	/**
	 * Records the start of a call to an instance
	 * @param serviceInstance
	 */
	public void recordServiceInstanceCall(ServiceInstance serviceInstance) {
		String key = serviceInstance.getHost() + ":" + serviceInstance.getPort();
		metricRegistry.counter(key + CALLING).inc();
	}
	/**
	 * Records the end of a call to an instance
	 * @param serviceInstance
	 * @param isSuccess whether the call succeeded
	 */
	public void recordServiceInstanceCalled(ServiceInstance serviceInstance, boolean isSuccess) {
		String key = serviceInstance.getHost() + ":" + serviceInstance.getPort();
		metricRegistry.counter(key + CALLING).dec();
		if (!isSuccess) {
			// If the call did not succeed, record the failure
			metricRegistry.meter(key + FAILED).mark();
		}
	}

	/**
	 * Gets the number of running calls
	 * @param serviceInstance
	 * @return
	 */
	public long getCalling(ServiceInstance serviceInstance) {
		String key = serviceInstance.getHost() + ":" + serviceInstance.getPort();
		long count = metricRegistry.counter(key + CALLING).getCount();
		log.debug("ServiceInstanceMetrics-getCalling: {} -> {}", key, count);
		return count;
	}

	/**
	 * Gets the rate of failed calls over the last minute; this is an exponentially weighted moving average, not an exact count
	 * @param serviceInstance
	 * @return
	 */
	public double getFailedInRecentOneMin(ServiceInstance serviceInstance) {
		String key = serviceInstance.getHost() + ":" + serviceInstance.getPort();
		double rate = metricRegistry.meter(key + FAILED).getOneMinuteRate();
		log.debug("ServiceInstanceMetrics-getFailedInRecentOneMin: {} -> {}", key, rate);
		return rate;
	}
}

Load balancing core code:

private final LoadingCache<Long, Set<String>> calledIpPrefixes = Caffeine.newBuilder()
        .expireAfterAccess(3, TimeUnit.MINUTES)
        .build(k -> Sets.newConcurrentHashSet());
private final String serviceId;
private final Tracer tracer;
private final ServiceInstanceMetrics serviceInstanceMetrics;

// Every retry also calls choose() to pick an instance again
@Override
public Mono<Response<ServiceInstance>> choose(Request request) {
    Span span = tracer.currentSpan();
    return serviceInstanceListSupplier.get().next()
            .map(serviceInstances -> {
                // Keep the span in scope the same as the span that called choose()
                try (Tracer.SpanInScope cleared = tracer.withSpanInScope(span)) {
                    return getInstanceResponse(serviceInstances);
                }
            });
}


private Response<ServiceInstance> getInstanceResponse(List<ServiceInstance> serviceInstances) {
    if (serviceInstances.isEmpty()) {
        log.warn("No servers available for service: " + this.serviceId);
        return new EmptyResponse();
    }
    // Read the Spring Cloud Sleuth tracing context of the current request and get its traceId
    Span currentSpan = tracer.currentSpan();
    if (currentSpan == null) {
        currentSpan = tracer.newTrace();
    }
    long l = currentSpan.context().traceId();
    return getInstanceResponseByRoundRobin(l, serviceInstances);
}

@VisibleForTesting
public Response<ServiceInstance> getInstanceResponseByRoundRobin(long traceId, List<ServiceInstance> serviceInstances) {
    // First, randomly shuffle the instances in the list
    Collections.shuffle(serviceInstances);
    // Cache every value first: the comparator is called many times, and the values could otherwise change during sorting (the per-instance request statistics are updated concurrently all the time)
    Map<ServiceInstance, Integer> used = Maps.newHashMap();
    Map<ServiceInstance, Long> callings = Maps.newHashMap();
    Map<ServiceInstance, Double> failedInRecentOneMin = Maps.newHashMap();
    serviceInstances = serviceInstances.stream().sorted(
            Comparator
                    // Instances in a network segment that was already called in this request go to the back
                    .<ServiceInstance>comparingInt(serviceInstance -> {
                        return used.computeIfAbsent(serviceInstance, k -> {
                            return calledIpPrefixes.get(traceId).stream().anyMatch(prefix -> {
                                return serviceInstance.getHost().contains(prefix);
                            }) ? 1 : 0;
                        });
                    })
                    // Then the lowest current error rate first
                    .thenComparingDouble(serviceInstance -> {
                        return failedInRecentOneMin.computeIfAbsent(serviceInstance, k -> {
                            double value = serviceInstanceMetrics.getFailedInRecentOneMin(serviceInstance);
                            // The value is an exponential moving average (EMA), so tiny differences must be ignored (truncate to two decimal places; do not round)
                            return ((int) (value * 100)) / 100.0;
                        });
                    })
                    // Then the fewest requests currently in flight first
                    .thenComparingLong(serviceInstance -> {
                        return callings.computeIfAbsent(serviceInstance, k ->
                                serviceInstanceMetrics.getCalling(serviceInstance)
                        );
                    })
    ).collect(Collectors.toList());
    if (serviceInstances.isEmpty()) {
        log.warn("No servers available for service: " + this.serviceId);
        return new EmptyResponse();
    }
    ServiceInstance serviceInstance = serviceInstances.get(0);
    // Record the network segment of the returned instance
    calledIpPrefixes.get(traceId).add(serviceInstance.getHost().substring(0, serviceInstance.getHost().lastIndexOf(".")));
    // This is currently recorded only for compatibility with earlier unit tests (the call-count tests)
    positionCache.get(traceId).getAndIncrement();
    return new DefaultResponse(serviceInstance);
}

The code that updates the instance-data cache lives in the FeignClient code for retry, circuit breaking, and thread isolation, which we will look at in the next section.

Q&A on the design of this scheme from some discussion groups

1. Why not use a cache shared by all microservices to store the call data, so that the data is more accurate?

Options for a shared cache include storing these records in Redis or in an in-memory data grid such as Apache Ignite. But there are two problems:

  1. If the records are put into additional storage such as Redis, load balancing cannot work at all when Redis is unavailable. If they are put into Apache Ignite, the affected load balancing cannot work when the corresponding node goes offline. Neither is acceptable.
  2. Suppose microservice A calls microservice B. One instance of A may have problems calling a particular instance of B while other instances of A call that same instance of B without problems, for example when the network between one availability zone and another is congested. If a single cache key records the calls from all instances of A to that instance of B, the data is clearly inaccurate.

Each microservice instance uses a local cache to record its own calls to other instances. In our opinion, this is not only easier to implement but also more accurate.

2. Why use EMA instead of a request window to compute the recent error rate?

Counting with a request window is certainly the most accurate. For example, to compute the error rate over the last minute, we cache every request from the last minute and, when reading, sum the cached records and take the average. However, when requests surge, this approach may need a lot of memory to cache them, and computing the error rate costs more CPU as the number of cached requests grows. It is not worth it.
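To make the cost of the request-window approach concrete, here is a minimal sketch of such a window. It is an illustration only, not code from the project: the class name is hypothetical, and timestamps are passed in explicitly so the behaviour is easy to follow.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical request-window error rate, shown only to illustrate its memory and CPU cost.
class WindowErrorRateSketch {

    private static final class Sample {
        final long timeMillis;
        final boolean failed;

        Sample(long timeMillis, boolean failed) {
            this.timeMillis = timeMillis;
            this.failed = failed;
        }
    }

    private final long windowMillis;
    // One entry per request inside the window: memory grows with the request rate.
    private final Deque<Sample> samples = new ArrayDeque<>();

    WindowErrorRateSketch(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    synchronized void record(long nowMillis, boolean failed) {
        samples.addLast(new Sample(nowMillis, failed));
        evict(nowMillis);
    }

    // Walks every cached sample, so the cost of a read grows with the window content.
    synchronized double errorRate(long nowMillis) {
        evict(nowMillis);
        if (samples.isEmpty()) {
            return 0.0;
        }
        long failures = 0;
        for (Sample s : samples) {
            if (s.failed) {
                failures++;
            }
        }
        return (double) failures / samples.size();
    }

    synchronized int storedSamples() {
        return samples.size();
    }

    // Drops samples that are older than the window.
    private void evict(long nowMillis) {
        while (!samples.isEmpty() && nowMillis - samples.peekFirst().timeMillis >= windowMillis) {
            samples.removeFirst();
        }
    }
}
```

With a 60-second window and thousands of requests per second per target instance, `samples` holds every one of them, which is the memory and CPU cost described above.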

EMA (exponential moving average) is a sliding-average method that is common in performance monitoring and statistics, for example the dynamic calculation of TLAB size in the JVM, the scaling of G1 GC Region size, and many other places where the JVM needs to derive a suitable value dynamically. Instead of caching requests, it multiplies the latest value by a ratio and adds the old value multiplied by (1 - ratio). The larger the ratio, the more closely the EMA follows the latest value; a ratio above 0.5 means the latest value weighs more than the accumulated history.
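The update rule can be written in a few lines. The following is a minimal sketch of a generic EMA, not the implementation inside the metrics library; the class name and the ratio `alpha = 0.5` are assumptions chosen for illustration.

```java
// Minimal EMA sketch; the class name and the alpha value are illustrative assumptions.
class EmaSketch {

    private final double alpha; // weight of the latest value, 0 < alpha <= 1
    private double value;
    private boolean initialized;

    EmaSketch(double alpha) {
        if (alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
    }

    // new = alpha * latest + (1 - alpha) * old; the first sample seeds the average.
    double update(double latest) {
        if (!initialized) {
            value = latest;
            initialized = true;
        } else {
            value = alpha * latest + (1 - alpha) * value;
        }
        return value;
    }

    double get() {
        return value;
    }

    public static void main(String[] args) {
        EmaSketch ema = new EmaSketch(0.5);
        // One failure followed by three successes: prints 1.0, 0.5, 0.25, 0.125
        for (double sample : new double[] {1.0, 0.0, 0.0, 0.0}) {
            System.out.println(ema.update(sample));
        }
    }
}
```

Only one number is stored per instance, however many requests arrive, and an old failure fades out gradually instead of dropping out of a window all at once.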

However, EMA brings another problem: as the program runs, the values accumulate many decimal places, for example 0.00000000123, 0.120000001, and 0.120000003. To ignore such tiny differences (which in fact come from failed requests long ago), we keep only two decimal places when sorting.
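The truncation uses the same expression as the comparator in the load balancer. The small sketch below wraps it in a hypothetical helper method to show its effect on the example values.

```java
// Hypothetical helper wrapping the truncation expression used when sorting.
class TruncateSketch {

    // Keeps two decimal places by discarding the remaining digits (no rounding).
    static double truncateToTwoDecimals(double value) {
        return ((int) (value * 100)) / 100.0;
    }

    public static void main(String[] args) {
        // All three values print as multiples of 0.01; the last two become equal.
        System.out.println(truncateToTwoDecimals(0.00000000123));
        System.out.println(truncateToTwoDecimals(0.120000001));
        System.out.println(truncateToTwoDecimals(0.120000003));
    }
}
```

After truncation, instances whose error rates differ only by such residue compare as equal, so the next criterion (the number of running requests) decides the order.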


Posted by liljim on Thu, 11 Nov 2021 12:49:31 -0800