A global retry gives every service routed through the gateway a failure-retry mechanism. "Global" here simply means the Retry filter is declared as a default filter:
# Enable the global retry filter:
spring.cloud.gateway.default-filters[0].name=Retry
# With the default arguments only GET requests are retried; add POST so it is covered too:
spring.cloud.gateway.default-filters[0].args.methods[0]=GET
spring.cloud.gateway.default-filters[0].args.methods[1]=POST
Retry in Spring Cloud Gateway is not something you get automatically by adding spring-retry to the classpath, nor by tweaking a few timeout parameters; it is its own gateway filter.
Let's take a look at the Spring Cloud Gateway documentation:
6.26. The Retry GatewayFilter Factory
The Retry GatewayFilter factory supports the following parameters:
- retries: The number of retries that should be attempted.
- statuses: The HTTP status codes that should be retried, represented by using org.springframework.http.HttpStatus.
- methods: The HTTP methods that should be retried, represented by using org.springframework.http.HttpMethod.
- series: The series of status codes to be retried, represented by using org.springframework.http.HttpStatus.Series.
- exceptions: A list of thrown exceptions that should be retried.
- backoff: The configured exponential backoff for the retries. Retries are performed after a backoff interval of firstBackoff * (factor ^ n), where n is the iteration. If maxBackoff is configured, the maximum backoff applied is limited to maxBackoff. If basedOnPreviousValue is true, the backoff is calculated by using prevBackoff * factor.

The following defaults are configured for the Retry filter, if enabled:

- retries: Three times
- series: 5XX series
- methods: GET method
- exceptions: IOException and TimeoutException
- backoff: disabled
The following listing configures a Retry GatewayFilter:
Example 55. application.yml
spring:
  cloud:
    gateway:
      routes:
      - id: retry_test
        uri: http://localhost:8080/flakey
        predicates:
        - Host=*.retry.com
        filters:
        - name: Retry
          args:
            retries: 3
            statuses: BAD_GATEWAY
            methods: GET,POST
            backoff:
              firstBackoff: 10ms
              maxBackoff: 50ms
              factor: 2
              basedOnPreviousValue: false
The configuration above is the example from the official documentation; it applies the Retry filter to one specific route.
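To make the backoff concrete: with the values in this example (firstBackoff of 10ms, factor of 2, maxBackoff of 50ms), and assuming the iteration counter n starts at 0, the waits before successive retries work out to firstBackoff * factor^n = 10 ms, 20 ms and 40 ms, with any later interval capped at the 50 ms maxBackoff.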
Route configuration parameters are defined in org.springframework.cloud.gateway.config.GatewayProperties, and routes are assembled from those definitions by RouteDefinitionRouteLocator (which implements RouteLocator, BeanFactoryAware and ApplicationEventPublisherAware). As getFilters() below shows, the default filters from GatewayProperties are merged with each route's own filters and then sorted by order, which is how a default Retry filter ends up applied to every route:
private Route convertToRoute(RouteDefinition routeDefinition) {
    AsyncPredicate<ServerWebExchange> predicate = combinePredicates(routeDefinition);
    List<GatewayFilter> gatewayFilters = getFilters(routeDefinition);

    return Route.async(routeDefinition).asyncPredicate(predicate)
            .replaceFilters(gatewayFilters).build();
}

private List<GatewayFilter> getFilters(RouteDefinition routeDefinition) {
    List<GatewayFilter> filters = new ArrayList<>();

    // TODO: support option to apply defaults after route specific filters?
    if (!this.gatewayProperties.getDefaultFilters().isEmpty()) {
        filters.addAll(loadGatewayFilters(routeDefinition.getId(),
                new ArrayList<>(this.gatewayProperties.getDefaultFilters())));
    }

    if (!routeDefinition.getFilters().isEmpty()) {
        filters.addAll(loadGatewayFilters(routeDefinition.getId(),
                new ArrayList<>(routeDefinition.getFilters())));
    }

    AnnotationAwareOrderComparator.sort(filters);
    return filters;
}
Whether a request gets retried is decided by the predicate in RetryGatewayFilterFactory; the relevant code:
ServerWebExchange exchange = context.applicationContext();
if (exceedsMaxIterations(exchange, retryConfig)) {
    return false;
}

// Check the status code first; an exact match in statuses takes priority over the series check
HttpStatus statusCode = exchange.getResponse().getStatusCode();
boolean retryableStatusCode = retryConfig.getStatuses().contains(statusCode);

// null status code might mean a network exception?
// If the status code is not in the configured statuses, fall back to checking the series
if (!retryableStatusCode && statusCode != null) {
    // try the series
    retryableStatusCode = false;
    for (int i = 0; i < retryConfig.getSeries().size(); i++) {
        if (statusCode.series().equals(retryConfig.getSeries().get(i))) {
            retryableStatusCode = true;
            break;
        }
    }
}

final boolean finalRetryableStatusCode = retryableStatusCode;
trace("retryableStatusCode: %b, statusCode %s, configured statuses %s, configured series %s",
        () -> finalRetryableStatusCode, () -> statusCode,
        retryConfig::getStatuses, retryConfig::getSeries);

// Check whether the HTTP method is one of the configured retryable methods
HttpMethod httpMethod = exchange.getRequest().getMethod();
boolean retryableMethod = retryConfig.getMethods().contains(httpMethod);

trace("retryableMethod: %b, httpMethod %s, configured methods %s",
        () -> retryableMethod, () -> httpMethod, retryConfig::getMethods);

// Both the status code and the request method must match before a retry happens
return retryableMethod && finalRetryableStatusCode;
The default series is SERVER_ERROR, so by default any 5xx response from the backend is retried (a 502 Bad Gateway, for example), while a 404 is not. To retry an individual 4xx status as well, add it through the statuses parameter.
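As a sketch, following the same default-filters property pattern as at the top of this post (NOT_FOUND is only an illustrative choice; use whichever status you actually need), one extra line is enough:
# Also retry 404 responses; 5xx responses are still covered by the default SERVER_ERROR series
spring.cloud.gateway.default-filters[0].args.statuses[0]=NOT_FOUND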
Addendum: why I bothered with retry in the first place
With Eureka, which we used before, there was no failure retry configured, and the series of propagation delays when a service instance came online or went offline meant that, for a while, calls to that service through the gateway simply failed.
Later, with Nacos, instances come online and go offline faster. By tuning a series of timeout and refresh-interval parameters, the window of failed API calls while an instance goes up or down can be narrowed to roughly 1-10 seconds:
# Shorten the refresh interval so the gateway's server list is updated quickly and stays current.
# The trade-off: a smaller interval means more frequent thread sleep/wake-up, which is hardly efficient.
ribbon.ServerListRefreshInterval=1000
I also looked into other Ribbon parameters and tried to add a retry mechanism there, but, well... maybe the parameters I set were wrong; in any case, no retry ever happened.
hystrix.command.default.execution.timeout.enabled=true
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds=25000
ribbon.ReadTimeout=20000
ribbon.ConnectTimeout=5000
ribbon.MaxAutoRetries=1
ribbon.MaxAutoRetriesNextServer=1
However, for a period after an instance goes offline, requests fail with neither a connect timeout nor a read timeout but with an outright connection refused; and if the service is not shut down gracefully (for example killed with kill -9), the gateway hits errors like the following:
2021-09-07 11:52:51,446 [reactor-http-epoll-1] TRACE o.s.c.g.f.LoadBalancerClientFilter - LoadBalancerClientFilter url before: lb://xxx/xx-api/ext/wanyee/msg/list?pageNo=30&pageSize=2&beginTime=2020-09-07%2000:00:00&endTime=2021-09-07%2023:59:59&keyword=&suid=&uid=
2021-09-07 11:52:51,446 [reactor-http-epoll-1] TRACE o.s.c.g.f.LoadBalancerClientFilter - LoadBalancerClientFilter url chosen: http://192.168.2.1:9898/xx-api/list?pageNo=30&pageSize=2&beginTime=2020-09-07%2000:00:00&endTime=2021-09-07%2023:59:59&keyword=&suid=&uid=
2021-09-07 11:52:51,450 [reactor-http-epoll-1] ERROR o.s.b.a.w.r.e.AbstractErrorWebExceptionHandler - [dc72bc7f-127] 500 Server Error for HTTP GET "/xx-api/list?pageNo=30&pageSize=2&beginTime=2020-09-07%2000:00:00&endTime=2021-09-07%2023:59:59&keyword=&suid=&uid="
io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: connection denied: /192.168.2.56:9198
    Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
Error has been observed at the following site(s):
    |_ checkpoint ⇢ org.springframework.cloud.gateway.filter.WeightCalculatorWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
    |_ checkpoint ⇢ HTTP GET "/ams-api/ext/wanyee/msg/list?pageNo=30&pageSize=2&beginTime=2020-09-07%2000:00:00&endTime=2021-09-07%2023:59:59&keyword=&suid=&uid=" [ExceptionHandlingWebHandler]
Stack trace:
Caused by: java.net.ConnectException: finishConnect(..) failed: connection denied
    at io.netty.channel.unix.Errors.throwConnectException(Errors.java:124)
    at io.netty.channel.unix.Socket.finishConnect(Socket.java:251)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:673)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:650)
    at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:530)
    at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:465)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378)
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
2021-09-07 11:52:51,452 [reactor-http-epoll-1] TRACE o.s.c.g.f.GatewayMetricsFilter - gateway.requests tags: [tag(httpMethod=GET),tag(httpStatusCode=500),tag(outcome=SERVER_ERROR),tag(routeId=ams-api),tag(routeUri=lb://ams),tag(status=INTERNAL_SERVER_ERROR)]
Searching online for the error keywords turned up nothing useful.
Still, Spring surely has an answer for this, and it does: the retry mechanism described above.
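For reference, a minimal global-retry sketch that pulls the pieces above together (the property names follow the default-filters pattern from the beginning of the post; the [0] index assumes Retry is your first default filter). Since java.net.ConnectException is a subclass of IOException, the "connection denied" failure in the log above should already fall under the filter's default exception list and be retried:
# Global retry sketch: 3 attempts for GET and POST requests, relying on the default
# 5XX series and the default IOException/TimeoutException exception list
spring.cloud.gateway.default-filters[0].name=Retry
spring.cloud.gateway.default-filters[0].args.retries=3
spring.cloud.gateway.default-filters[0].args.methods[0]=GET
spring.cloud.gateway.default-filters[0].args.methods[1]=POST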