Circuit breaker principle and implementation (Go version)

Keywords: Go

Dependencies between services are very common in microservices. For example, the comment service depends on the audit service, and the audit service depends on the anti-spam service. When the comment service calls the audit service, the audit service in turn calls the anti-spam service. If the anti-spam service times out, the audit service, which depends on it, is left waiting, while the comment service keeps calling the audit service. Requests then pile up in the audit service, and the backlog may eventually bring it down.

 

 

As this shows, a failure in one link in the middle of a call chain causes problems for the upstream callers and can even take down every service in the chain. A service that calls another service therefore needs to protect itself, so that problems in the callee do not spread to the caller. The most common means of self-protection is the circuit breaker.

Circuit breaker principle
The circuit breaker mechanism is modeled on the fuse in an everyday electrical circuit. When the circuit is overloaded, the fuse blows and disconnects it, so that the appliances on the circuit are not damaged. In service governance, the circuit breaker mechanism means that once the error rate returned by the callee exceeds a certain threshold, subsequent requests are no longer actually sent; instead, an error is returned to the caller immediately.

In this pattern, the caller maintains a state machine for each service it calls (each call path). The state machine has three states:

Closed: the normal state, in which requests pass through. A counter records the number of failed calls and the total number of requests. If the failure rate reaches the preset threshold within a given time window, the breaker switches to the open state and starts a timeout; when the timeout expires, it switches to the half-open state. This timeout gives the system a chance to fix the error that caused the calls to fail, so that it can return to normal operation. In the closed state the failure count is time-based and is reset at a fixed interval, which prevents occasional errors from tripping the breaker.
Open: every request fails immediately with an error. A timeout timer is normally started on entering this state, and when it fires the breaker switches to the half-open state. Alternatively, a timer can periodically probe whether the service has recovered.
Half open: the application is allowed to send a limited number of requests to the callee. If these calls succeed, the callee is considered to have recovered, and the breaker switches to the closed state and resets its counters. If any of them fails, the callee is considered not yet recovered, and the breaker switches back to the open state and resets its counters. The half-open state prevents a recovering service from being knocked over again by a sudden flood of requests.

 

 

Introducing circuit breakers into service governance makes the system more stable and resilient: it gives the system stability while it recovers from errors, reduces the impact of errors on performance, and rejects calls that are likely to fail right away instead of waiting for the real error to come back.

Introducing the circuit breaker
With the principle understood, the next question is where to put the circuit breaker. One option is to add it to the business logic, but that is neither elegant nor general enough. The circuit breaker belongs in the framework, and the zRPC framework has one built in.

The circuit breaker mainly protects the calling side: every request the caller sends has to pass through it, and a client interceptor is exactly the place where that can happen. In the zRPC framework the circuit breaker is therefore implemented in a client interceptor. How the interceptor works is shown in the following figure:

 

 


The corresponding code is:

func BreakerInterceptor(ctx context.Context, method string, req, reply interface{},
    cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
    // One breaker per target and request method
    breakerName := path.Join(cc.Target(), method)
    // codes.Acceptable decides which errors are added to the breaker's failure count
    return breaker.DoWithAcceptable(breakerName, func() error {
        // Actually initiate the call
        return invoker(ctx, method, req, reply, cc, opts...)
    }, codes.Acceptable)
}

 


Circuit breaker implementation
The circuit breaker in zRPC is based on the Google SRE overload protection algorithm. The algorithm works as follows:

requests: the total number of requests initiated by the caller
accepts: the number of requests processed normally by the callee
Under normal circumstances the two values are equal. When the callee runs into trouble and starts rejecting requests, accepts gradually falls below requests. The caller may keep sending requests until requests = K * accepts. Once this limit is exceeded, the breaker opens, and new requests are dropped locally with a certain probability and fail immediately with an error. The probability is calculated as follows:

$$\max\left(0,\ \frac{\text{requests} - K \times \text{accepts}}{\text{requests} + 1}\right)$$


The sensitivity of the breaker can be tuned by changing K (the multiplier) in the algorithm. Lowering K makes the adaptive algorithm more sensitive; raising K makes it less sensitive. For example, if the caller's limit is changed from requests = 2 * accepts to requests = 1.1 * accepts, the breaker starts dropping requests once about one request fails for every ten that are accepted.

The code lives in go-zero/core/breaker:

type googleBreaker struct {
    k     float64                   // Multiplier K, defaults to 1.5
    stat  *collection.RollingWindow // Sliding time window that counts accepted and total requests
    proba *mathx.Proba              // Dynamic probability used to decide whether to drop a request
}

 


Implementation of the adaptive circuit breaking algorithm:

func (b *googleBreaker) accept() error {
    // Number of accepted requests and total number of requests in the window
    accepts, total := b.history()
    weightedAccepts := b.k * float64(accepts)
    // Probability of dropping the request. protection is a constant defined
    // elsewhere in the package; subtracting it keeps a handful of failures
    // from tripping the breaker.
    dropRatio := math.Max(0, (float64(total-protection)-weightedAccepts)/float64(total+1))
    if dropRatio <= 0 {
        return nil
    }

    // Drop the request with probability dropRatio
    if b.proba.TrueOnProba(dropRatio) {
        return ErrServiceUnavailable
    }

    return nil
}

 

 

Every request goes through the doReq method, which first calls accept to check whether the breaker has been triggered. An Acceptable function decides which errors are counted as failures. It is defined as follows:

func Acceptable(err error) bool {
    switch status.Code(err) {
    case codes.DeadlineExceeded, codes.Internal, codes.Unavailable, codes.DataLoss: // Errors that count as failures
        return false
    default:
        return true
    }
}

 


If the request succeeds, markSuccess increments both the request count and the accepted count. If the request fails, markFailure increments only the request count.

 

func (b *googleBreaker) doReq(req func() error, fallback func(err error) error, acceptable Acceptable) error {
    // Check whether the breaker is triggered
    if err := b.accept(); err != nil {
        if fallback != nil {
            return fallback(err)
        }

        return err
    }

    // A panic in the call counts as a failure and is re-raised
    defer func() {
        if e := recover(); e != nil {
            b.markFailure()
            panic(e)
        }
    }()

    // Perform the real call
    err := req()
    if acceptable(err) {
        // Normal request: count it as a success
        b.markSuccess()
    } else {
        // Failed request: count it as a failure
        b.markFailure()
    }

    return err
}

 


Summary
With a circuit breaker, the caller protects itself so that errors or excessive latency in downstream services do not disrupt its own business logic. Many full-featured microservice frameworks ship with a built-in circuit breaker. Circuit breakers are not limited to calls between microservices; they can also guard calls to other dependencies, such as MySQL and Redis.
--------
Copyright notice: This is the original article of CSDN blogger "kevwan", which follows the CC 4.0 BY-SA copyright agreement. Please attach the original source link and this notice for reprint.
Original link: https://blog.csdn.net/jfwan/article/details/109328874



Posted by rekha on Fri, 05 Nov 2021 20:48:57 -0700