Dubbo: Graceful Shutdown

Keywords: Dubbo Zookeeper Java Netty

1. Overview

This article walks through Dubbo's graceful shutdown, corresponding to the "Graceful Shutdown" chapter of the Dubbo User Guide.

The user guide defines it as follows:

Dubbo implements graceful shutdown through the JDK's ShutdownHook, so graceful shutdown is not executed when the process is killed forcibly with a command such as kill -9 PID; it is executed only when the process is stopped with kill PID.

The principle is as follows:

Service provider

  • When stopping, the provider is first marked as not accepting new requests. Any new request that arrives gets an immediate error, so that the client can retry on another machine. // <1>
  • It then checks whether threads in the thread pool are still running. If so, it waits for them to finish, and forces shutdown only if the wait times out. // <2>

Service consumer

  • When stopping, the consumer issues no new calls; every new call fails on the client side. // <3>
  • It then checks whether any request is still awaiting a response. If so, it waits for the responses to return, and forces shutdown only if the wait times out. // <4>
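
The provider-side steps <1> and <2> can be sketched with plain JDK primitives. This is only a minimal illustration of the idea, not Dubbo's implementation: the class DrainingEndpoint and its methods tryBegin, end and close are hypothetical names invented for this example.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not a Dubbo class): reject new work, then drain in-flight work.
public class DrainingEndpoint {
    private final AtomicBoolean closing = new AtomicBoolean(false);
    private final AtomicInteger inFlight = new AtomicInteger(0);

    /** Step <1>: returns false, i.e. rejects the request, once close() has started. */
    public boolean tryBegin() {
        if (closing.get()) {
            return false;
        }
        inFlight.incrementAndGet();
        // Re-check: close() may have started between the check and the increment.
        if (closing.get()) {
            inFlight.decrementAndGet();
            return false;
        }
        return true;
    }

    /** Marks one accepted request as finished. */
    public void end() {
        inFlight.decrementAndGet();
    }

    /** Step <2>: waits for in-flight requests; returns false if the wait timed out. */
    public boolean close(long timeoutMillis) {
        closing.set(true);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (inFlight.get() > 0) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out: the caller would now force the shutdown
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DrainingEndpoint endpoint = new DrainingEndpoint();
        System.out.println("accepted before close: " + endpoint.tryBegin());
        // Finish the in-flight request on another thread after a short delay.
        new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            endpoint.end();
        }).start();
        System.out.println("drained in time: " + endpoint.close(1000));
        System.out.println("accepted after close: " + endpoint.tryBegin());
    }
}
```

The consumer side follows the same shape, with "requests awaiting a response" taking the place of the in-flight counter.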

2. ShutdownHook

The ShutdownHook behind Dubbo's graceful shutdown is registered in the static initializer block of AbstractConfig, with the following code:

static {
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
        public void run() {
            if (logger.isInfoEnabled()) {
                logger.info("Run shutdown hook now.");
            }
            ProtocolConfig.destroyAll();
        }
    }, "DubboShutdownHook"));
}
  • Judged purely by code organization, this is not a good place for it. It is a pragmatic one, though, because it guarantees that the ShutdownHook gets registered. Judging by the official TODO, it may be moved in the future.
  • The ProtocolConfig#destroyAll() method is as follows:

     1: public static void destroyAll() {
     2:     // Ignore if destroyed
     3:     if (!destroyed.compareAndSet(false, true)) {
     4:         return;
     5:     }
     6:     // Destroy everything Registry-related
     7:     AbstractRegistryFactory.destroyAll();
     8: 
     9:     // Wait for service consumers to receive the registry's notification that this provider is offline, which raises the success rate of graceful shutdown without retries.
    10:     // Wait for registry notification
    11:     try {
    12:         Thread.sleep(ConfigUtils.getServerShutdownTimeout());
    13:     } catch (InterruptedException e) {
    14:         logger.warn("Interrupted unexpectedly when waiting for registry notification during shutdown process!");
    15:     }
    16: 
    17:     // Destroy everything Protocol-related
    18:     ExtensionLoader<Protocol> loader = ExtensionLoader.getExtensionLoader(Protocol.class);
    19:     for (String protocolName : loader.getLoadedExtensions()) {
    20:         try {
    21:             Protocol protocol = loader.getLoadedExtension(protocolName);
    22:             if (protocol != null) {
    23:                 protocol.destroy();
    24:             }
    25:         } catch (Throwable t) {
    26:             logger.warn(t.getMessage(), t);
    27:         }
    28:     }
    29: }
    
    • Lines 2 to 5: Return immediately if already destroyed.
    • Line 7: Call the AbstractRegistryFactory#destroyAll() method to destroy all Registries, unregistering and unsubscribing the service providers and consumers of this application. For a detailed analysis, see 「2.1 AbstractRegistryFactory」.
    • Lines 9 to 15: Sleep for a while so that service consumers in other applications can receive the registry's notification that this application's service providers are offline, which raises the success rate of graceful shutdown without retries.

      • The wait is not fixed: the developer sets it, in milliseconds, through the "dubbo.service.shutdown.wait" parameter. The ConfigUtils#getServerShutdownTimeout() method is as follows:

        public static int getServerShutdownTimeout() {
            // Default, 10 * 1000ms
            int timeout = Constants.DEFAULT_SERVER_SHUTDOWN_TIMEOUT;
            // Get the "dubbo.service.shutdown.wait" configuration item in milliseconds
            String value = ConfigUtils.getProperty(Constants.SHUTDOWN_WAIT_KEY);
            if (value != null && value.length() > 0) {
                try {
                    timeout = Integer.parseInt(value);
                } catch (Exception e) {
                }
            // If it is empty, fall back to the "dubbo.service.shutdown.wait.seconds" configuration item, in seconds.
            // ps: This parameter is deprecated; "dubbo.service.shutdown.wait" is recommended
            } else {
                value = ConfigUtils.getProperty(Constants.SHUTDOWN_WAIT_SECONDS_KEY);
                if (value != null && value.length() > 0) {
                    try {
                        timeout = Integer.parseInt(value) * 1000;
                    } catch (Exception e) {
                    }
                }
            }
            // Return
            return timeout;
        }
        
        • The default is 10 * 1000 ms.
      • ISSUE#1021 "Enhancement for graceful shutdown" is a very interesting discussion and well worth reading. It describes the change as follows:

        Whether you use 2.5.3, the most widely used version, or 2.5.7, the latest version, graceful shutdown does not work unless a retry mechanism is configured. This change modifies a small amount of code and adds a configurable wait time, as a simple way to achieve "graceful shutdown without enabling retries".

        The main mechanism is to add a configurable wait time in two phases: [after the provider disconnects from the registry, before it stops responding] and [after the consumer removes the invoker, before it closes the client]. In hands-on testing so far, graceful shutdown works even without retries configured.

        Since most companies using Dubbo now turn off the retry mechanism to avoid avalanches and traffic storms in extreme cases, most interfaces cannot shut down gracefully under Dubbo's current graceful shutdown design. This change therefore raises, in a fairly simple way, the success rate of graceful shutdown without retries.

    • Lines 17 to 28: Destroy all Protocols. Protocols currently fall into two layers:

      • RegistryProtocol, the Protocol implementation integrated with the Registry, focuses on service registration. For its destruction logic, see 「2.3 RegistryProtocol」.
      • Protocol implementation classes for specific protocols, such as DubboProtocol for dubbo:// and HessianProtocol for hessian://, focus on service export and reference. Because DubboProtocol is the most commonly used, it serves as the example in 「2.2 DubboProtocol」.
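
The lookup order in ConfigUtils#getServerShutdownTimeout() is easy to misread, so here is a standalone sketch of just that order. It assumes a plain java.util.Properties as the property source, which is a simplification for illustration; the class ShutdownWaitResolver and its methods are hypothetical names, and only the two keys and the 10 * 1000 ms default come from the code above.

```java
import java.util.Properties;

// Sketch of the lookup order only; Dubbo's ConfigUtils reads its own property sources.
public class ShutdownWaitResolver {
    static final String WAIT_KEY = "dubbo.service.shutdown.wait";                 // milliseconds
    static final String WAIT_SECONDS_KEY = "dubbo.service.shutdown.wait.seconds"; // seconds, deprecated
    static final int DEFAULT_TIMEOUT_MS = 10 * 1000;

    public static int resolve(Properties props) {
        String value = props.getProperty(WAIT_KEY);
        if (value != null && value.length() > 0) {
            // The millisecond key wins; the seconds key is not consulted at all.
            return parseOrDefault(value, 1);
        }
        value = props.getProperty(WAIT_SECONDS_KEY);
        if (value != null && value.length() > 0) {
            return parseOrDefault(value, 1000);
        }
        return DEFAULT_TIMEOUT_MS;
    }

    private static int parseOrDefault(String value, int factor) {
        try {
            return Integer.parseInt(value) * factor;
        } catch (NumberFormatException e) {
            return DEFAULT_TIMEOUT_MS; // unparsable values silently fall back to the default
        }
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(resolve(props));      // nothing configured
        props.setProperty(WAIT_SECONDS_KEY, "5");
        System.out.println(resolve(props));      // deprecated seconds key only
        props.setProperty(WAIT_KEY, "3000");
        System.out.println(resolve(props));      // millisecond key takes precedence
    }
}
```

One consequence worth noting: when the millisecond key is present but unparsable, the result is the default, not the value of the seconds key.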

2.1 AbstractRegistryFactory

The #destroyAll() method destroys all Registries. The code is as follows:

private static final Map<String, Registry> REGISTRIES = new ConcurrentHashMap<String, Registry>();

public static void destroyAll() {
    if (LOGGER.isInfoEnabled()) {
        LOGGER.info("Close all registries " + getRegistries());
    }
    // Acquire locks
    LOCK.lock();
    try {
        // Destroy
        for (Registry registry : getRegistries()) {
            try {
                registry.destroy();
            } catch (Throwable e) {
                LOGGER.error(e.getMessage(), e);
            }
        }
        // wipe cache
        REGISTRIES.clear();
    } finally {
        // Release lock
        LOCK.unlock();
    }
}
  • Call the Registry#destroy() method to destroy each Registry.
  • AbstractRegistry implements the common destruction logic: unregistering and unsubscribing. The code is as follows:

    @Override
    public void destroy() {
        // Skip if already destroyed
        if (!destroyed.compareAndSet(false, true)) {
            return;
        }
        if (logger.isInfoEnabled()) {
            logger.info("Destroy registry:" + getUrl());
        }
        // Unregister
        Set<URL> destroyRegistered = new HashSet<URL>(getRegistered());
        if (!destroyRegistered.isEmpty()) {
            for (URL url : new HashSet<URL>(getRegistered())) {
                if (url.getParameter(Constants.DYNAMIC_KEY, true)) {
                    try {
                        unregister(url); // Unregister
                        if (logger.isInfoEnabled()) {
                            logger.info("Destroy unregister url " + url);
                        }
                    } catch (Throwable t) {
                        logger.warn("Failed to unregister url " + url + " to registry " + getUrl() + " on destroy, cause: " + t.getMessage(), t);
                    }
                }
            }
        }
        // unsubscribe
        Map<URL, Set<NotifyListener>> destroySubscribed = new HashMap<URL, Set<NotifyListener>>(getSubscribed());
        if (!destroySubscribed.isEmpty()) {
            for (Map.Entry<URL, Set<NotifyListener>> entry : destroySubscribed.entrySet()) {
                URL url = entry.getKey();
                for (NotifyListener listener : entry.getValue()) {
                    try {
                        unsubscribe(url, listener); // unsubscribe
                        if (logger.isInfoEnabled()) {
                            logger.info("Destroy unsubscribe url " + url);
                        }
                    } catch (Throwable t) {
                        logger.warn("Failed to unsubscribe url " + url + " to registry " + getUrl() + " on destroy, cause: " + t.getMessage(), t);
                    }
                }
            }
        }
    }
    
    • Both service providers and service consumers register with and subscribe to the Registry, so both the registrations and the subscriptions must be cancelled.
  • FailbackRegistry, a subclass of AbstractRegistry, adds the shared logic of destroying the retry task. The code is as follows:

    @Override
    public void destroy() {
        // Ignore if destroyed
        if (!canDestroy()) {
            return;
        }
        // Call the parent method to unregister and unsubscribe
        super.destroy();
        // Destroy Retry Task
        try {
            retryFuture.cancel(true);
        } catch (Throwable t) {
            logger.warn(t.getMessage(), t);
        }
    }
    
    protected boolean canDestroy(){
        return destroyed.compareAndSet(false, true);
    }
    
  • FailbackRegistry has multiple implementation classes, each with logic to destroy its own client connection. Take ZookeeperRegistry as an example. The code is as follows:

    @Override
    public void destroy() {
        // Call the parent method to unregister and unsubscribe
        super.destroy();
        try {
            // Close Zookeeper Client Connection
            zkClient.close();
        } catch (Exception e) {
            logger.warn("Failed to close zookeeper client " + getUrl() + ", cause: " + e.getMessage(), e);
        }
    }
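
The `destroyed.compareAndSet(false, true)` guard that recurs in these #destroy() methods is what makes them safe to call more than once and from more than one thread. A minimal standalone sketch of the guard follows; DestroyOnce and its counter are hypothetical and only stand in for the real cleanup work.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical resource (not a Dubbo class) whose cleanup must run exactly once.
public class DestroyOnce {
    private final AtomicBoolean destroyed = new AtomicBoolean(false);
    private final AtomicInteger cleanups = new AtomicInteger(0);

    public void destroy() {
        // Only the caller that flips false -> true proceeds; every other caller returns.
        if (!destroyed.compareAndSet(false, true)) {
            return;
        }
        cleanups.incrementAndGet(); // stands in for unregister / unsubscribe / close
    }

    public int cleanupCount() {
        return cleanups.get();
    }

    /** Calls destroy() from several threads at once and reports how often cleanup ran. */
    public static int destroyConcurrently(int threadCount) {
        DestroyOnce resource = new DestroyOnce();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(resource::destroy);
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return resource.cleanupCount();
    }

    public static void main(String[] args) {
        System.out.println("cleanups: " + destroyConcurrently(8));
    }
}
```

A plain `if (destroyed) return; destroyed = true;` check would leave a window in which two threads both pass the test; the atomic compare-and-set closes that window without a lock.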
    

2.2 DubboProtocol

The #destroy() method destroys all ExchangeClients and ExchangeServers used for communication. The code is as follows:

 1: @SuppressWarnings("Duplicates")
 2: @Override
 3: public void destroy() {
 4:     // Destroy all ExchangeServers
 5:     for (String key : new ArrayList<String>(serverMap.keySet())) {
 6:         ExchangeServer server = serverMap.remove(key);
 7:         if (server != null) {
 8:             try {
 9:                 if (logger.isInfoEnabled()) {
10:                     logger.info("Close dubbo server: " + server.getLocalAddress());
11:                 }
12:                 server.close(ConfigUtils.getServerShutdownTimeout());
13:             } catch (Throwable t) {
14:                 logger.warn(t.getMessage(), t);
15:             }
16:         }
17:     }
18: 
19:     // Destroy all ExchangeClients
20:     for (String key : new ArrayList<String>(referenceClientMap.keySet())) {
21:         ExchangeClient client = referenceClientMap.remove(key);
22:         if (client != null) {
23:             try {
24:                 if (logger.isInfoEnabled()) {
25:                     logger.info("Close dubbo connect: " + client.getLocalAddress() + "-->" + client.getRemoteAddress());
26:                 }
27:                 client.close(ConfigUtils.getServerShutdownTimeout()); // Destroy
28:             } catch (Throwable t) {
29:                 logger.warn(t.getMessage(), t);
30:             }
31:         }
32:     }
33:     // Destroy all ghost ExchangeClients
34:     for (String key : new ArrayList<String>(ghostClientMap.keySet())) {
35:         ExchangeClient client = ghostClientMap.remove(key);
36:         if (client != null) {
37:             try {
38:                 if (logger.isInfoEnabled()) {
39:                     logger.info("Close dubbo connect: " + client.getLocalAddress() + "-->" + client.getRemoteAddress());
40:                 }
41:                 client.close(ConfigUtils.getServerShutdownTimeout()); // Destroy
42:             } catch (Throwable t) {
43:                 logger.warn(t.getMessage(), t);
44:             }
45:         }
46:     }
47:     // [TODO 8033] parameter callback
48:     stubServiceMethodsMap.clear();
49:     super.destroy();
50: }
  • In practice, an application can be both a service provider and a service consumer, so both the ExchangeClients and the ExchangeServers have to be closed.
  • Lines 4 to 17: Loop over the servers, calling HeaderExchangeServer#close(timeout) to destroy every ExchangeServer. For a detailed analysis, see 「2.2.1 HeaderExchangeServer」.
  • Lines 19 to 32: Loop over the clients, calling ReferenceCountExchangeClient#close(timeout) to destroy every ReferenceCountExchangeClient. Inside that method, HeaderExchangeClient#close(timeout) is called to close the HeaderExchangeClient object. For a detailed analysis, see 「2.2.2 HeaderExchangeClient」.
  • Lines 33 to 46: Loop over the ghost clients, calling LazyConnectExchangeClient#close(timeout) to close each one. For more about LazyConnectExchangeClient, see 「5.2 LazyConnectExchangeClient」 in Perfect Dubbo Source Analysis - Remote Reference to Service Reference (Dubbo).
  • Line 48: [TODO 8033] Parameter callback
  • Line 49: Call the parent AbstractProtocol#destroy() method to destroy all Invokers and unexport all Exporters of this protocol. The code is as follows:

     1: @Override
     2: public void destroy() {
     3:     // Destroy all service-consumer Invokers of this protocol
     4:     for (Invoker<?> invoker : invokers) {
     5:         if (invoker != null) {
     6:             invokers.remove(invoker);
     7:             try {
     8:                 if (logger.isInfoEnabled()) {
     9:                     logger.info("Destroy reference: " + invoker.getUrl());
    10:                 }
    11:                 invoker.destroy();
    12:             } catch (Throwable t) {
    13:                 logger.warn(t.getMessage(), t);
    14:             }
    15:         }
    16:     }
    17:     // Destroy all service-provider Exporters of this protocol
    18:     for (String key : new ArrayList<String>(exporterMap.keySet())) {
    19:         Exporter<?> exporter = exporterMap.remove(key);
    20:         if (exporter != null) {
    21:             try {
    22:                 if (logger.isInfoEnabled()) {
    23:                     logger.info("Unexport service: " + exporter.getInvoker().getUrl());
    24:                 }
    25:                 exporter.unexport();
    26:             } catch (Throwable t) {
    27:                 logger.warn(t.getMessage(), t);
    28:             }
    29:         }
    30:     }
    31: }
    
    • Lines 3 to 16: Loop over and destroy all service-consumer Invokers (here, DubboInvoker) of this protocol (DubboProtocol). For a detailed analysis, see 「2.2.3 DubboInvoker」.
    • Lines 17 to 30: Loop over and destroy all service-provider Exporters (here, DubboExporter) of this protocol (DubboProtocol). For a detailed analysis, see 「2.2.4 DubboExporter」.
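
The map-based loops in the two #destroy() methods above share one pattern: take a snapshot of the keys, remove each entry, close it, and log failures instead of rethrowing them. A minimal standalone sketch of that pattern follows. SnapshotClose and closeAll are hypothetical names, plain Runnables stand in for servers, clients and exporters, and the sketch catches RuntimeException where the Dubbo code catches Throwable.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper (not a Dubbo class) illustrating the snapshot-remove-close loop.
public class SnapshotClose {
    /** Removes and runs every closer; returns how many completed without throwing. */
    public static int closeAll(Map<String, Runnable> closers) {
        int closed = 0;
        // Iterate over a snapshot of the keys so that removing entries is safe.
        for (String key : new ArrayList<String>(closers.keySet())) {
            Runnable closer = closers.remove(key);
            if (closer == null) {
                continue; // another caller already removed this entry
            }
            try {
                closer.run();
                closed++;
            } catch (RuntimeException e) {
                // Log and continue: one failure must not block the remaining closes.
                System.err.println("close failed for " + key + ": " + e.getMessage());
            }
        }
        return closed;
    }

    public static void main(String[] args) {
        Map<String, Runnable> servers = new ConcurrentHashMap<String, Runnable>();
        servers.put("127.0.0.1:20880", () -> System.out.println("closed 20880"));
        servers.put("127.0.0.1:20881", () -> { throw new IllegalStateException("boom"); });
        servers.put("127.0.0.1:20882", () -> System.out.println("closed 20882"));
        System.out.println("closed=" + closeAll(servers) + ", remaining=" + servers.size());
    }
}
```

Removing the entry before closing it means that a second caller running the same loop concurrently gets null from remove() and skips the entry, so each resource is closed by at most one caller.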

2.2.1 HeaderExchangeServer

The overall flow of the #close(timeout) method is as follows:

[Figure: overall flow of HeaderExchangeServer#close(timeout)]

  • The red box section: Because ProtocolListenerWrapper and ProtocolFilterWrapper are Dubbo SPI Wrapper implementations of Protocol, they are called first when DubboProtocol#destroy() is invoked. At present each is only a pass-through layer with no logic of its own. The code is as follows:

    // ProtocolListenerWrapper.java
    @Override
    public void destroy() {
        protocol.destroy();
    }
    
    // ProtocolFilterWrapper.java
    @Override
    public void destroy() {
        protocol.destroy();
    }
    

2.2.2 HeaderExchangeClient

The overall flow of the #close(timeout) method is as follows:

[Figure: overall flow of HeaderExchangeClient#close(timeout)]

2.2.3 DubboInvoker

The #destroy() method destroys the ExchangeClients. The code is as follows:

 1: @Override
 2: public void destroy() {
 3:     // Ignore if destroyed
 4:     if (super.isDestroyed()) {
 5:         return;
 6:     } else {
 7:         // double check to avoid dup close
 8:         // (double-checked locking, so the clients are not closed twice)
 9:         destroyLock.lock();
10:         try {
11:             if (super.isDestroyed()) {
12:                 return;
13:             }
14:             // Mark as destroyed
15:             super.destroy();
16:             // Remove from `invokers`
17:             if (invokers != null) {
18:                 invokers.remove(this);
19:             }
20:             // Close the ExchangeClients
21:             for (ExchangeClient client : clients) {
22:                 try {
23:                     client.close(ConfigUtils.getServerShutdownTimeout());
24:                 } catch (Throwable t) {
25:                     logger.warn(t.getMessage(), t);
26:                 }
27:             }
28:         } finally {
29:             // Release lock
30:             destroyLock.unlock();
31:         }
32:     }
33: }
  • The code is easy to follow with its comments, so only a few methods are covered here.
  • The parent AbstractInvoker#isDestroyed() method determines whether the Invoker has been destroyed. The code is as follows:

    /**
     * Is it destroyed
     */
    private AtomicBoolean destroyed = new AtomicBoolean(false);
    
    public boolean isDestroyed() {
        return destroyed.get();
    }
    
  • The parent AbstractInvoker#destroy() method marks the Invoker as destroyed. The code is as follows:

    /**
     * Is Available
     */
    private volatile boolean available = true;
    
    @Override
    public void destroy() {
        if (!destroyed.compareAndSet(false, true)) {
            return;
        }
        setAvailable(false);
    }
    
    protected void setAvailable(boolean available) {
        this.available = available;
    }
    
    • It also marks the DubboInvoker as no longer available.
    • Calling the #invoke(Invocation) method after the Invoker has been marked as destroyed throws an RpcException. The code is as follows:

      @Override
      public Result invoke(Invocation inv) throws RpcException {
          if (destroyed.get()) {
              throw new RpcException("Rpc invoker for service " + this + " on consumer " + NetUtils.getLocalHost()
                      + " use dubbo version " + Version.getVersion()
                      + " is DESTROYED, can not be invoked any more!");
          }
          
          // ...omit other code
      }
      
  • Lines 20 to 27: Loop over the clients, calling ReferenceCountExchangeClient#close(timeout) to close each one. These clients have in fact already been closed in DubboProtocol#destroy(). This looks like duplication, but it is not: a DubboInvoker must also be destroyed when its remote service provider goes offline, and at that point the client's connection has to be closed. So DubboInvoker needs this logic of its own.
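
Why a second close call is harmless can be illustrated with a reference counter. The code of ReferenceCountExchangeClient is not shown in this article, so the following is only a guess at the idea its name suggests, not its actual implementation: RefCountedClient, retain and the counters are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a shared, reference-counted client; not Dubbo's class.
public class RefCountedClient {
    private final AtomicInteger references = new AtomicInteger(0);
    private final AtomicBoolean connectionClosed = new AtomicBoolean(false);
    private final AtomicInteger physicalCloses = new AtomicInteger(0);

    /** Called by each invoker that starts sharing this connection. */
    public RefCountedClient retain() {
        references.incrementAndGet();
        return this;
    }

    /** Called by each owner on destroy; only the last release closes the connection. */
    public void close() {
        if (references.decrementAndGet() <= 0 && connectionClosed.compareAndSet(false, true)) {
            physicalCloses.incrementAndGet(); // stands in for closing the real connection
        }
    }

    public boolean isConnectionClosed() {
        return connectionClosed.get();
    }

    public int physicalCloseCount() {
        return physicalCloses.get();
    }

    public static void main(String[] args) {
        RefCountedClient shared = new RefCountedClient();
        shared.retain(); // first invoker
        shared.retain(); // second invoker
        shared.close();  // first invoker destroyed: connection stays open
        System.out.println("closed after first release: " + shared.isConnectionClosed());
        shared.close();  // second invoker destroyed: connection closes
        shared.close();  // a later close from the protocol level changes nothing
        System.out.println("physical closes: " + shared.physicalCloseCount());
    }
}
```

Under this assumption, an invoker that is destroyed on its own releases only its share of the connection, and a redundant close during full shutdown has no further effect.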

2.2.4 DubboExporter

The #unexport() method cancels the export. The code is as follows:

/**
 * Service Key
 */
private final String key;
/**
 * Exporter collection
 *
 * key: Service Key
 *
 * The value is actually {@link com.alibaba.dubbo.rpc.protocol.AbstractProtocol#exporterMap}
 */
private final Map<String, Exporter<?>> exporterMap;

@Override
public void unexport() {
    // Unexport
    super.unexport();
    // Remove itself from the map
    exporterMap.remove(key);
}
  • Call the parent AbstractExporter#unexport() method to unexport. The code is as follows:

    /**
     * Invoker object
     */
    private final Invoker<T> invoker;
    /**
     * Whether the service has been unexported
     */
    private volatile boolean unexported = false;
    
    @Override
    public void unexport() {
        // Mark as unexported (return if already done)
        if (unexported) {
            return;
        }
        unexported = true;
        // Destroy
        getInvoker().destroy();
    }
    
    • The invoker is shown in the following figure:

      • This Invoker is created by JavassistProxyFactory and is in fact an implementation of the abstract class AbstractProxyInvoker, so its #destroy() method is as follows:

        @Override
        public void destroy() {
        }
        
        • It is empty.

2.3 RegistryProtocol

The #destroy() method unexports all Exporters. The code is as follows:

/**
 * A collection of bound relationships.
 *
 * key: Service Dubbo URL
 */
private final Map<String, ExporterChangeableWrapper<?>> bounds = new ConcurrentHashMap<String, ExporterChangeableWrapper<?>>();

@Override
public void destroy() {
    // Get the list of Exporters
    List<Exporter<?>> exporters = new ArrayList<Exporter<?>>(bounds.values());
    // Unexport all Exporters
    for (Exporter<?> exporter : exporters) {
        exporter.unexport();
    }
    // Clear the map
    bounds.clear();
}
  • Loop over the Exporters, calling ExporterChangeableWrapper#unexport() to cancel each service export. The code is as follows:

    /**
     * The exported Exporter object
     */
    private Exporter<T> exporter;
    
    @Override
    public void unexport() {
        String key = getCacheKey(this.originInvoker);
        // Remove from `bounds`
        bounds.remove(key);
        // Unexport
        exporter.unexport();
    }
    
    • Because the service provider integrates the Configurator for configuration rules, ExporterChangeableWrapper is needed to keep the original Invoker object.
      • As a result, none of the unexport logic above can remove the ExporterChangeableWrapper's mapping from bounds; that has to be done through RegistryProtocol's #destroy() method.
      • By the time exporter.unexport() is called here, the exported Exporter object has already been unexported through AbstractExporter#unexport(). Even so, the call cannot be removed, because ExporterChangeableWrapper#unexport() may also be called from elsewhere.

3. ExecutorUtil

3.1 gracefulShutdown

The #gracefulShutdown(executor, timeout) method shuts the executor down gracefully: it rejects newly submitted tasks and lets existing tasks finish. The code is as follows:

public static void gracefulShutdown(Executor executor, int timeout) {
    // Ignore if it is not an ExecutorService, or if it is already shut down
    if (!(executor instanceof ExecutorService) || isShutdown(executor)) {
        return;
    }
    // Shut down: reject new tasks and let existing tasks finish
    final ExecutorService es = (ExecutorService) executor;
    try {
        es.shutdown(); // Reject newly submitted tasks <1>
    } catch (SecurityException ex2) {
        return;
    } catch (NullPointerException ex2) {
        return;
    }
    // Wait for the existing tasks to finish. If the wait times out, force all tasks to end
    try {
        if (!es.awaitTermination(timeout, TimeUnit.MILLISECONDS)) {
            es.shutdownNow();
        }
    } catch (InterruptedException ex) {
        // On InterruptedException, also force all tasks to end
        es.shutdownNow();
        Thread.currentThread().interrupt();
    }
    // If the shutdown still has not succeeded, start a new thread to keep closing
    if (!isShutdown(es)) {
        newThreadToCloseExecutor(es);
    }
}
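
To see the two outcomes of this pattern, here is a small runnable sketch that uses only the JDK. It is a simplified illustration, not the ExecutorUtil code: GracefulShutdownDemo and shutdownGracefully are hypothetical names, and the follow-up retry thread is left out.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Simplified illustration of shutdown() + awaitTermination() + shutdownNow().
public class GracefulShutdownDemo {
    /** Returns true if all tasks finished within the timeout, false if a forced stop was needed. */
    public static boolean shutdownGracefully(ExecutorService pool, long timeoutMillis) {
        pool.shutdown(); // stop accepting new tasks; queued and running tasks continue
        try {
            if (pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
            pool.shutdownNow(); // timed out: interrupt whatever is still running
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        // A pool whose only task finishes quickly drains within the timeout.
        ExecutorService quick = Executors.newFixedThreadPool(2);
        quick.execute(() -> System.out.println("quick task done"));
        System.out.println("quick pool drained: " + shutdownGracefully(quick, 1000));

        // A pool with a long-running task hits the timeout and is interrupted.
        ExecutorService slow = Executors.newSingleThreadExecutor();
        slow.execute(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                System.out.println("slow task interrupted");
            }
        });
        System.out.println("slow pool drained: " + shutdownGracefully(slow, 100));
    }
}
```

Note that shutdownNow() can only interrupt: a task that ignores interruption keeps running, which is the case the extra closing thread in ExecutorUtil is meant to handle.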

3.2 shutdownNow

The #shutdownNow(executor, timeout) method forces shutdown, interrupting tasks that are already executing. The code is as follows:

public static void shutdownNow(Executor executor, final int timeout) {
    // Ignore if it is not an ExecutorService, or if it is already shut down
    if (!(executor instanceof ExecutorService) || isShutdown(executor)) {
        return;
    }
    // Shut down immediately, interrupting running tasks
    final ExecutorService es = (ExecutorService) executor;
    try {
        es.shutdownNow(); // <1>
    } catch (SecurityException ex2) {
        return;
    } catch (NullPointerException ex2) {
        return;
    }
    // Wait for the interrupted tasks to end
    try {
        es.awaitTermination(timeout, TimeUnit.MILLISECONDS);
    } catch (InterruptedException ex) {
        Thread.currentThread().interrupt();
    }
    // If the shutdown still has not succeeded, start a new thread to keep closing
    if (!isShutdown(es)) {
        newThreadToCloseExecutor(es);
    }
}
  • Unlike #gracefulShutdown(executor, timeout), this method calls ExecutorService#shutdownNow() at <1> instead of ExecutorService#shutdown().
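
The practical difference between the two JDK calls is what happens to queued work: shutdown() still runs tasks that are waiting in the queue, while shutdownNow() interrupts running tasks and hands the never-started ones back to the caller. A minimal sketch follows; ShutdownNowDemo and its method are hypothetical names for this example.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Illustration of ExecutorService#shutdownNow() returning the tasks that never started.
public class ShutdownNowDemo {
    /** Queues tasks behind one blocked task, then returns how many shutdownNow() hands back. */
    public static int neverStartedAfterShutdownNow(int queued) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        pool.execute(() -> {
            started.countDown();
            try {
                release.await(); // blocks until shutdownNow() interrupts this thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            started.await(); // make sure the single worker thread is occupied
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        for (int i = 0; i < queued; i++) {
            pool.execute(() -> { }); // these wait in the queue behind the blocked task
        }
        List<Runnable> neverStarted = pool.shutdownNow();
        return neverStarted.size();
    }

    public static void main(String[] args) {
        System.out.println("tasks never started: " + neverStartedAfterShutdownNow(3));
    }
}
```

With one worker thread busy, all three queued tasks come back unexecuted, which is why a forced shutdown can lose work that a graceful one would have completed.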

3.3 newThreadToCloseExecutor

The #newThreadToCloseExecutor(ExecutorService) method starts a new thread that repeatedly forces shutdown. The code is as follows:

private static void newThreadToCloseExecutor(final ExecutorService es) {
    if (!isShutdown(es)) {
        shutdownExecutor.execute(new Runnable() {
            public void run() {
                try {
                    // Loop up to 1000 times, forcing the thread pool to shut down
                    for (int i = 0; i < 1000; i++) {
                        // Shut down immediately, interrupting running tasks
                        es.shutdownNow();
                        // Wait for the interrupted tasks to end
                        if (es.awaitTermination(10, TimeUnit.MILLISECONDS)) {
                            break;
                        }
                    }
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                } catch (Throwable e) {
                    logger.warn(e.getMessage(), e);
                }
            }
        });
    }
}

666. Easter Egg

 

In theory, if a service provider is to be shut down, the general process is as follows:

provider => registry: remove me
provider => consumer: I'm about to close, don't call me
all consumers => provider: OK, got it
provider => consumer: finish processing all outstanding requests
provider shutdown

In reality, though, relying on every consumer to answer and confirm would be very complex. So Dubbo's choice is:

  • The provider removes itself from the registry, then sleeps for a developer-configured period so that consumers can be notified. Of course, this is not guaranteed to succeed. For example, a consumer may be unable to reach the registry while still being connected to the provider.
  • The provider notifies consumers that it is about to close and that they should stop sending it requests. Once all notifications have been sent, it waits for the outstanding requests to be processed. When they are done, it closes the local server and thread pool.

Of course, the consumer also shuts down gracefully, waiting for all the requests it has sent to finish. That part is relatively simple.

Posted by Hexen on Wed, 04 Sep 2019 18:11:08 -0700