1. What is Sentinel?
As microservices become more popular, the stability of calls between services matters more and more. Sentinel takes traffic as its entry point and protects service stability along multiple dimensions, such as flow control, circuit breaking and degradation, and system load protection.
Sentinel has the following characteristics:
- Rich application scenarios: Sentinel has handled the core scenarios of Alibaba's "Double 11" shopping festival traffic for the past 10 years, such as flash sales (seckill), message peak shaving and valley filling, cluster flow control, and real-time circuit breaking of unavailable downstream applications.
- Complete real-time monitoring: Sentinel also provides real-time monitoring. In the console you can see second-level data for a single machine connected to the application, and even the aggregated running status of clusters of fewer than 500 machines.
- Broad open source ecosystem: Sentinel provides out-of-the-box integration modules for other open source frameworks and libraries, such as Spring Cloud, Dubbo and gRPC. You only need to add the corresponding dependency and a little configuration to start using Sentinel.
- Well-designed SPI extension points: Sentinel provides simple, easy-to-use SPI extension interfaces. You can quickly customize logic by implementing an extension interface, for example custom rule management or adapting a dynamic data source.
[Figure: Sentinel's main features]
[Figure: Sentinel's open source ecosystem]
Sentinel is divided into two parts:
- The core library (Java client) does not depend on any framework or library and can run in any Java runtime environment. It also has good support for frameworks such as Dubbo and Spring Cloud.
- The Dashboard is built on Spring Boot and can be run directly once packaged, with no need for an additional application container such as Tomcat.
2. Sentinel quick start
First, add the Sentinel dependency:
<dependency>
    <groupId>com.alibaba.csp</groupId>
    <artifactId>sentinel-core</artifactId>
    <version>1.7.1</version>
</dependency>
Next, define resources
A resource is one of the core concepts in Sentinel. The most common resource is a Java method in your code, but you can also define resources more flexibly: surround the code whose traffic you want to control with the Sentinel API calls SphU.entry("HelloWorld") and entry.exit(). In the following example, System.out.println("hello world") is wrapped as a resource (the protected logic):
try (Entry entry = SphU.entry("HelloWorld")) {
    // Your business logic here.
    System.out.println("hello world");
} catch (BlockException e) {
    // Handle rejected request.
    e.printStackTrace();
}
// try-with-resources exits the entry automatically
You can also define resources using annotations: https://github.com/alibaba/Sentinel/wiki/%E6%B3%A8%E8%A7%A3%E6%94%AF%E6%8C%81
For example:
@SentinelResource("HelloWorld")
public void helloWorld() {
    // Logic inside the resource
    System.out.println("hello world");
}
Finally, define the rules
Flow control rules specify how many requests are allowed to pass through a resource. For example, the following code specifies that the resource HelloWorld lets at most 20 requests through per second.
private static void initFlowRules() {
    List<FlowRule> rules = new ArrayList<>();
    FlowRule rule = new FlowRule();
    rule.setResource("HelloWorld");
    rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
    // Set limit QPS to 20.
    rule.setCount(20);
    rules.add(rule);
    FlowRuleManager.loadRules(rules);
}
Finish!
The complete code is as follows:
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.2.2.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
    </parent>
    <groupId>com.cjs.example</groupId>
    <artifactId>sentinel-example</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>sentinel-example</name>

    <properties>
        <java.version>1.8</java.version>
        <spring-cloud.version>Greenwich.SR4</spring-cloud.version>
        <spring-cloud-alibaba.version>2.1.0.RELEASE</spring-cloud-alibaba.version>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-actuator</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>com.alibaba.cloud</groupId>
            <artifactId>spring-cloud-starter-alibaba-sentinel</artifactId>
        </dependency>
    </dependencies>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.cloud</groupId>
                <artifactId>spring-cloud-dependencies</artifactId>
                <version>${spring-cloud.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>

            <dependency>
                <groupId>com.alibaba.cloud</groupId>
                <artifactId>spring-cloud-alibaba-dependencies</artifactId>
                <version>${spring-cloud-alibaba.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>

</project>
application.properties
server.port=8084
spring.application.name=sentinel-example
spring.cloud.sentinel.transport.dashboard=127.0.0.1:8080
SentinelExampleApplication.java
package com.cjs.example.sentinel;

import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

import java.util.ArrayList;
import java.util.List;

@SpringBootApplication
public class SentinelExampleApplication {

    public static void main(String[] args) {
        SpringApplication.run(SentinelExampleApplication.class, args);

        // Configure rules
        initFlowRules();

        while (true) {
            // Since version 1.5.0, try-with-resources can be used to exit the entry automatically
            try (Entry entry = SphU.entry("HelloWorld")) {
                // Protected logic
                System.out.println("hello world");
            } catch (BlockException ex) {
                // Logic for handling a flow-controlled request
                System.out.println("blocked!");
            }
        }
    }

    private static void initFlowRules() {
        List<FlowRule> rules = new ArrayList<>();

        FlowRule rule = new FlowRule();
        rule.setResource("HelloWorld");
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        // Set limit QPS to 20.
        rule.setCount(20);
        rules.add(rule);

        FlowRuleManager.loadRules(rules);
    }
}
TestController.java
package com.cjs.example.sentinel;

import com.alibaba.csp.sentinel.annotation.SentinelResource;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class TestController {

    @GetMapping("/hello")
    @SentinelResource("hello")
    public String hello() {
        return "hello";
    }

}
3. Sentinel console
The Sentinel console provides at least the following functions:
- Machine list and health status: it collects the heartbeat packets sent by Sentinel clients to determine whether each machine is online.
- Monitoring (single machine and cluster aggregation): through the monitoring API exposed by Sentinel clients, it periodically pulls and aggregates application monitoring information, which gives second-level real-time monitoring.
- Rule management and push: rules are managed and pushed from one place.
- Authentication: authentication is very important in a production environment.
https://github.com/alibaba/Sentinel/wiki/%E6%8E%A7%E5%88%B6%E5%8F%B0
Get the console:
Mode 1: download a prebuilt release package
https://github.com/alibaba/Sentinel/releases
wget https://github.com/alibaba/Sentinel/releases/download/1.7.1/sentinel-dashboard-1.7.1.jar
Mode 2: build from source
mvn clean package
Start the console:
java -Dserver.port=8080 -Dcsp.sentinel.dashboard.server=localhost:8080 -Dproject.name=sentinel-dashboard -jar sentinel-dashboard-1.7.1.jar
The default username and password are both sentinel.
4. Sentinel annotation support
@SentinelResource is used to define resources and provides optional exception handling and fallback configuration items.
Properties of the @SentinelResource annotation:
- value: the resource name
- entryType: the entry type (inbound or outbound), optional, defaults to EntryType.OUT
- blockHandler / blockHandlerClass: blockHandler is the name of the function that handles BlockException. The blockHandler function must be public, its return type must match the original method, and its parameter list must match the original method with one extra parameter of type BlockException at the end. By default the blockHandler function must be in the same class as the original method. To use a function from another class, set blockHandlerClass to the Class object of that class. Note that the function must then be static, otherwise it cannot be resolved.
- fallback: the name of the fallback function, optional, used to provide fallback logic when an exception is thrown. The fallback function can handle all types of exceptions except those excluded in exceptionsToIgnore. Signature and location requirements for the fallback function:
  - The return type must be the same as that of the original function;
  - The parameter list must be the same as that of the original function, optionally with one extra parameter of type Throwable to receive the exception;
  - By default the fallback function must be in the same class as the original method. To use a function from another class, set fallbackClass to the Class object of that class. Note that the function must then be static, otherwise it cannot be resolved.
- defaultFallback: the name of the default fallback function, optional, usually used for general-purpose fallback logic
- exceptionsToIgnore: specifies which exceptions are excluded. Excluded exceptions are not counted in the exception statistics and do not enter the fallback logic; they are thrown as is.
Note that if both blockHandler and fallback are configured, only the blockHandler logic is entered when a BlockException is thrown because of flow control or degradation. If none of blockHandler, fallback and defaultFallback is configured, the BlockException is thrown directly when the call is flow-controlled or degraded (if the method itself does not declare throws BlockException, the JVM wraps it in an UndeclaredThrowableException).
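The precedence described above can be modeled in plain Java. This is only a sketch of the dispatch order, not Sentinel's aspect code: SketchBlockException is a hypothetical stand-in for BlockException, IllegalArgumentException plays the role of an exception listed in exceptionsToIgnore, and the returned strings simply name the handler that would run.

```java
// Hypothetical stand-in for Sentinel's BlockException, for illustration only.
class SketchBlockException extends Exception {
    SketchBlockException(String message) {
        super(message);
    }
}

// Models which handler runs for which kind of failure.
final class HandlerDispatchSketch {

    // The "original method": fails in a way chosen by the caller.
    static String original(String mode) throws Exception {
        if ("blocked".equals(mode)) {
            throw new SketchBlockException("flow limited");
        }
        if ("error".equals(mode)) {
            throw new IllegalStateException("business failure");
        }
        if ("ignored".equals(mode)) {
            throw new IllegalArgumentException("bad input");
        }
        return "ok";
    }

    static String invoke(String mode) {
        try {
            return original(mode);
        } catch (SketchBlockException ex) {
            // Blocked calls go to blockHandler only, even though a fallback exists.
            return "blockHandler";
        } catch (IllegalArgumentException ex) {
            // Acts like an entry in exceptionsToIgnore: rethrown as is.
            throw ex;
        } catch (Exception ex) {
            // Every other exception goes to the fallback.
            return "fallback";
        }
    }

    public static void main(String[] args) {
        System.out.println(invoke("ok"));      // prints "ok"
        System.out.println(invoke("blocked")); // prints "blockHandler"
        System.out.println(invoke("error"));   // prints "fallback"
    }
}
```

Calling invoke("ignored") in this model throws the IllegalArgumentException to the caller without touching either handler.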
Example:
import com.alibaba.csp.sentinel.annotation.SentinelResource;
import com.alibaba.csp.sentinel.slots.block.BlockException;

public class TestService {

    // The 'handleException' function must be in the 'ExceptionUtil' class and must be static
    @SentinelResource(value = "test", blockHandler = "handleException", blockHandlerClass = {ExceptionUtil.class})
    public void test() {
        System.out.println("Test");
    }

    // Original function
    @SentinelResource(value = "hello", blockHandler = "exceptionHandler", fallback = "helloFallback")
    public String hello(long s) {
        return String.format("Hello at %d", s);
    }

    // Fallback function. Its signature is the same as the original function, optionally with an extra Throwable parameter
    public String helloFallback(long s) {
        return String.format("Halooooo %d", s);
    }

    // Block handler function: one extra BlockException parameter at the end, otherwise the same as the original function
    public String exceptionHandler(long s, BlockException ex) {
        // Do some log here.
        ex.printStackTrace();
        return "Oops, error occurred at " + s;
    }
}
As you can see, by default the blockHandler and fallback functions must be in the same class as the original method. If you don't want to write them in the same class, use blockHandlerClass (or fallbackClass) to specify the class, and blockHandler (or fallback) to specify the method name.
If both blockHandler and fallback are configured, a BlockException only enters the blockHandler logic.
5. Basic concepts of Sentinel
Resources
Any code defined through the Sentinel API is a resource and can be protected by Sentinel. In most cases you can use method signatures, URLs, or even service names as resource names to identify resources.
Rules
Rules are set around the real-time state of resources and include flow control rules, circuit breaking (degradation) rules and system protection rules. All rules can be adjusted dynamically and in real time.
Flow control
Flow control is a common concept in network transmission, where it is used to adjust the rate at which network packets are sent. From the perspective of system stability, the rate at which requests are processed also needs attention. Requests arrive at random and cannot be controlled, while the processing capacity of the system is limited, so we need to control the flow according to that capacity. Sentinel acts as a regulator that shapes random requests into a suitable form as required, as shown in the following figure:
Flow control can be approached from the following perspectives:
- The call relationships of resources, such as the call chain of a resource and the relationships between resources;
- Runtime metrics, such as QPS, thread pool and system load;
- Control effects, such as direct rejection, cold start (warm up) and queuing.
Sentinel's design philosophy is to let you freely choose the perspective of control and combine them flexibly to achieve the desired effect.
Circuit breaking and degradation
Sentinel and Hystrix follow the same principle: when a resource in the call chain is detected to be unstable, for example when requests take a long time to respond or the proportion of exceptions rises, calls to that resource are restricted so that requests fail fast, which avoids cascading failures caused by the impact on other resources.
Sentinel and Hystrix take completely different approaches to the means of restriction.
Hystrix isolates dependencies (resources, in Sentinel's terms) through thread pool isolation. The advantage is that resources are isolated from each other as thoroughly as possible. The disadvantage is that it increases the cost of thread switching (too many thread pools lead to too many threads), and the size of each resource's thread pool has to be allocated in advance. As shown in the figure below:
Sentinel takes two approaches to this problem:
- Limiting by the number of concurrent threads
Unlike resource pool isolation, Sentinel reduces the impact of unstable resources on other resources by limiting the number of concurrent threads per resource. This avoids the cost of thread switching, and you do not need to allocate thread pool sizes in advance. When a resource becomes unstable, for example when its response time grows, the direct effect is that threads gradually pile up on it. Once the number of threads on a specific resource reaches a certain threshold, new requests to that resource are rejected. The resource starts accepting requests again only after the piled-up threads have finished their tasks.
- Degrading resources by response time
In addition to controlling the number of concurrent threads, Sentinel can quickly degrade unstable resources based on response time. When the response time of a dependent resource is too long, all access to that resource is rejected directly and is not restored until the specified time window has passed.
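The response-time approach can be sketched as a small state holder. This is a simplified model and not Sentinel's DegradeSlot: the class name, the constructor parameters and the single-slow-call trigger are all assumptions made for brevity (a real rule would normally look at statistics over several calls), and the clock is passed in explicitly so the behavior is easy to follow.

```java
// Hypothetical sketch: degrade a resource for a fixed time window after a slow call.
final class ResponseTimeBreakerSketch {
    private final long maxRtMillis;   // response-time threshold
    private final long windowMillis;  // how long the resource stays degraded
    private long degradedUntilMillis = Long.MIN_VALUE; // not degraded initially

    ResponseTimeBreakerSketch(long maxRtMillis, long windowMillis) {
        this.maxRtMillis = maxRtMillis;
        this.windowMillis = windowMillis;
    }

    // Returns true if a call may proceed at time nowMillis.
    synchronized boolean allow(long nowMillis) {
        return nowMillis >= degradedUntilMillis;
    }

    // Reports the response time of a call that finished at time nowMillis.
    synchronized void record(long nowMillis, long rtMillis) {
        if (rtMillis > maxRtMillis) {
            // Too slow: reject all access until the time window has passed.
            degradedUntilMillis = nowMillis + windowMillis;
        }
    }

    public static void main(String[] args) {
        ResponseTimeBreakerSketch breaker = new ResponseTimeBreakerSketch(100, 1000);
        System.out.println(breaker.allow(0));    // true: nothing recorded yet
        breaker.record(10, 500);                 // a 500 ms call exceeds the 100 ms threshold
        System.out.println(breaker.allow(20));   // false: inside the degrade window
        System.out.println(breaker.allow(1010)); // true: the window has passed
    }
}
```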
System load protection
Sentinel also provides adaptive protection at the system level. Preventing avalanches is an important part of system protection. When the system load is high and requests keep coming in, the system may crash and stop responding. In a cluster, the load balancer then forwards the traffic that this machine should have carried to other machines. If those machines are also close to their limit, the extra traffic makes them crash too, and eventually the whole cluster becomes unavailable.
To address this, Sentinel provides a protection mechanism that balances the system's inbound traffic against its load, so that the system handles as many requests as it can within its capacity.
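As a rough illustration of gating inbound traffic on system load, the sketch below rejects requests while the load average is above a threshold. It is not Sentinel's adaptive algorithm: the class name, the fixed threshold and the injected load supplier are assumptions for the example. The only real API used is OperatingSystemMXBean.getSystemLoadAverage() from the JDK, which returns a negative value when the load average is unavailable.

```java
import java.lang.management.ManagementFactory;
import java.util.function.DoubleSupplier;

// Hypothetical sketch: admit inbound requests only while the system load is acceptable.
final class SystemLoadGuardSketch {
    private final double maxLoad;
    private final DoubleSupplier loadSupplier;

    SystemLoadGuardSketch(double maxLoad, DoubleSupplier loadSupplier) {
        this.maxLoad = maxLoad;
        this.loadSupplier = loadSupplier;
    }

    // Builds a guard that reads the load average of the local machine.
    static SystemLoadGuardSketch forLocalMachine(double maxLoad) {
        return new SystemLoadGuardSketch(maxLoad,
                () -> ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
    }

    boolean allowInbound() {
        double load = loadSupplier.getAsDouble();
        // A negative value means the load average is not available; let traffic through.
        return load < 0 || load <= maxLoad;
    }

    public static void main(String[] args) {
        SystemLoadGuardSketch guard = forLocalMachine(4.0);
        System.out.println("inbound allowed: " + guard.allowInbound());
    }
}
```

Injecting the load supplier keeps the check deterministic in tests, while forLocalMachine wires in the real reading.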
6. How to use Sentinel
Sentinel can be divided into the Sentinel core library and the Dashboard. The core library does not depend on the Dashboard, but the two work best together.
A resource can be anything: a service, a method in a service, or even a piece of code. Protecting resources with Sentinel mainly involves these steps:
- Define resources
- Define rules
- Verify that the rules take effect
When coding, you only need to consider whether the code needs protection, and if so, define it as a resource.
Common ways to define resources
Mode 1: define resources with the exception-throwing API
SphU provides a try-catch style API. With this approach, a BlockException is thrown when a resource is flow-controlled, and you can catch it to run your handling logic. The sample code is as follows:
// try-with-resources is supported since version 1.5.0
// The resource name can be any string with business meaning, such as a method name, an interface name, or any other uniquely identifying string.
try (Entry entry = SphU.entry("resourceName")) {
    // Protected business logic
    // do something here...
} catch (BlockException ex) {
    // Resource access blocked: flow-controlled or degraded
    // Do the corresponding handling here
}
Note in particular: if hotspot parameters are passed in at entry, the same parameters must be passed at exit (exit(count, args)), otherwise the statistics may be wrong. try-with-resources cannot be used in that case. In addition, because of the order in which catch blocks run in try-with-resources (the entry has already been closed by the time the catch block executes), calling Tracer.trace(ex) there does not count the exception correctly. So when you need exception statistics, do not call Tracer.trace(ex) in the catch block of a try-with-resources statement.
Manual exit example:
Entry entry = null;
// Make sure that finally is executed
try {
    // The resource name can be any string with business meaning. Note that there must not be too many
    // distinct names (no more than about 1K); beyond that, pass the value in as a parameter
    // rather than using it directly as the resource name
    // EntryType is the traffic type (inbound/outbound); system rules only take effect for entries of type IN
    entry = SphU.entry("Custom resource name");
    // Protected business logic
    // do something...
} catch (BlockException ex) {
    // Resource access blocked: flow-controlled or degraded
    // Do the corresponding handling here
} catch (Exception ex) {
    // If you need degradation rules, business exceptions must be recorded this way
    Tracer.traceEntry(ex, entry);
} finally {
    // Make sure to exit, and make sure every entry is paired with an exit
    if (entry != null) {
        entry.exit();
    }
}
Example of an entry with hotspot parameters:
Entry entry = null;
try {
    // If you need to configure exception items, only basic types are supported for the parameters passed in.
    // EntryType is the traffic type; system rules only take effect for entries of type IN
    // count is 1 in most cases, meaning the entry is counted as one call.
    entry = SphU.entry(resourceName, EntryType.IN, 1, paramA, paramB);
    // Your logic here.
} catch (BlockException ex) {
    // Handle request rejection.
} finally {
    // Note: the same parameters must be passed at exit, otherwise the statistics may be wrong.
    if (entry != null) {
        entry.exit(1, paramA, paramB);
    }
}
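Parameter description of SphU.entry():
- resourceName: the resource name
- EntryType: the traffic type (IN or OUT); system rules only take effect for entries of type IN
- count: the number of calls this entry is counted as, 1 in most cases
- args: the parameters passed in for hotspot parameter flow control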
Mode 2: define resources by annotation
Sentinel supports defining resources through the @SentinelResource annotation and configuring blockHandler and fallback functions to handle flow-controlled calls. Example:
// The original business method
@SentinelResource(blockHandler = "blockHandlerForGetUser")
public User getUserById(String id) {
    throw new RuntimeException("getUserById command failed");
}

// blockHandler function, called when the original method is flow-controlled, degraded or blocked by system protection
public User blockHandlerForGetUser(String id, BlockException ex) {
    return new User("admin");
}
Note that the blockHandler function is called when the original method is flow-controlled, degraded or blocked by system protection, while the fallback function handles all types of exceptions. Also note the signature requirements for blockHandler and fallback functions.
Mode 3: asynchronous call support
Sentinel supports statistics for asynchronous call chains. In asynchronous calls, define the resource through the SphU.asyncEntry(xxx) method; the exit method is usually called in the asynchronous callback. Example:
try {
    AsyncEntry entry = SphU.asyncEntry(resourceName);

    // Asynchronous call
    doAsync(userId, result -> {
        try {
            // Handle the result of the asynchronous call here
        } finally {
            // exit after the callback
            entry.exit();
        }
    });
} catch (BlockException ex) {
    // Request blocked.
    // Handle the exception (e.g. retry or fallback).
}
7. Sentinel's main workflow
In Sentinel, every resource has a resource name, and an Entry object is created for each call to a resource. An Entry can be created automatically through adapters for mainstream frameworks, or explicitly through annotations or the SphU API. When an Entry is created, a series of function slots (a slot chain) is created as well. These slots have different responsibilities, for example:
- NodeSelectorSlot: collects the call paths of resources and stores them in a tree structure, which is used for flow control and degradation based on call path;
- ClusterBuilderSlot: stores the statistics of the resource and information about its callers, such as RT, QPS and thread count, which serve as the basis for multi-dimensional flow control and degradation;
- StatisticSlot: records and aggregates runtime metrics along different dimensions;
- FlowSlot: controls the flow according to the preset flow control rules and the statistics gathered by the previous slots;
- AuthoritySlot: applies blacklist and whitelist control according to the configured lists and the call origin;
- DegradeSlot: performs circuit breaking and degradation based on statistics and preset rules;
- SystemSlot: controls the total inbound traffic based on the system status, such as load1.
The overall architecture is as follows:
Sentinel exposes SlotChainBuilder as an SPI interface, which makes the slot chain extensible. You can add custom slots and arrange the order of the slots yourself, and in this way add custom functionality to Sentinel.
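The idea of an ordered, extensible chain of slots can be illustrated with a few lines of plain Java. This is a generic chain-of-responsibility sketch and not Sentinel's ProcessorSlot or SlotChainBuilder API; the Slot interface, the slot labels and the "internal" naming rule of the custom slot are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an ordered chain of slots; each slot can reject the request.
final class SlotChainSketch {

    // One processing step; returns false to reject the request.
    interface Slot {
        boolean entry(String resource, List<String> trace);
    }

    private final List<Slot> slots = new ArrayList<>();

    // Slots run in the order in which they are added.
    SlotChainSketch add(Slot slot) {
        slots.add(slot);
        return this;
    }

    boolean entry(String resource, List<String> trace) {
        for (Slot slot : slots) {
            if (!slot.entry(resource, trace)) {
                return false; // later slots are skipped once one slot rejects
            }
        }
        return true;
    }

    // A demo chain with a custom slot inserted between two built-in style slots.
    static SlotChainSketch demoChain() {
        return new SlotChainSketch()
                .add((res, t) -> { t.add("statistic"); return true; })
                .add((res, t) -> { t.add("guard"); return !res.startsWith("internal"); })
                .add((res, t) -> { t.add("flow"); return true; });
    }

    public static void main(String[] args) {
        List<String> trace = new ArrayList<>();
        boolean passed = demoChain().entry("HelloWorld", trace);
        System.out.println(passed + " " + trace); // prints "true [statistic, guard, flow]"
    }
}
```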
8. Sentinel flow control
Flow control works by monitoring metrics such as the QPS of application traffic or the number of concurrent threads. When a specified threshold is reached, the flow is controlled so that the application is not overwhelmed by an instantaneous traffic peak, which keeps it highly available.
FlowSlot controls the traffic according to the preset rules and the real-time information collected by NodeSelectorSlot, ClusterBuilderSlot and StatisticSlot.
Flow control shows up directly as a FlowException thrown when executing Entry nodeA = SphU.entry(resourceName). FlowException is a subclass of BlockException, so you can catch BlockException to customize what happens after a request is flow-controlled.
Multiple flow control rules can be created for the same resource. FlowSlot goes through all flow control rules of the resource in turn until one of them triggers flow control or all rules have been checked.
A flow control rule mainly consists of the following elements, which can be combined to achieve different flow control effects:
- resource: the resource name, that is, the object the flow control rule applies to
- count: the flow control threshold
- grade: the type of flow control threshold (QPS or number of concurrent threads)
- limitApp: the call origin the flow control applies to. If it is default, call origins are not distinguished
- strategy: the flow control strategy based on call relationship
- controlBehavior: the flow control effect (direct rejection, Warm Up, uniform queuing)
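Checking several rules for one resource in turn, and stopping at the first rule that triggers, can be sketched as follows. This is a simplified model for a single resource, not Sentinel's FlowSlot: the Rule class keeps only a label and a QPS threshold, the counter is a plain one-second fixed window, and the clock is passed in by the caller. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: traverse the rules of ONE resource until one of them blocks.
final class RuleTraversalSketch {

    // Trimmed-down rule: a label plus a QPS threshold.
    static final class Rule {
        final String label;
        final int count;

        Rule(String label, int count) {
            this.label = label;
            this.count = count;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    private long currentSecond = Long.MIN_VALUE;
    private int passedThisSecond = 0;

    RuleTraversalSketch addRule(String label, int count) {
        rules.add(new Rule(label, count));
        return this;
    }

    // Returns the label of the first rule that blocks the request, or null if it passes.
    synchronized String check(long nowMillis) {
        long second = nowMillis / 1000;
        if (second != currentSecond) {
            // A new one-second window starts: reset the counter.
            currentSecond = second;
            passedThisSecond = 0;
        }
        for (Rule rule : rules) {
            if (passedThisSecond + 1 > rule.count) {
                return rule.label; // stop at the first rule that triggers
            }
        }
        passedThisSecond++;
        return null;
    }

    public static void main(String[] args) {
        RuleTraversalSketch checker = new RuleTraversalSketch()
                .addRule("loose", 5)
                .addRule("strict", 2);
        for (long t = 0; t < 3; t++) {
            System.out.println("t=" + t + " blockedBy=" + checker.check(t));
        }
        // The first two requests pass; the third is blocked by the "strict" rule.
    }
}
```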
8.1. Flow control based on QPS / concurrency
There are two main statistic types for flow control: the number of concurrent threads and QPS. The type is defined by the grade field of FlowRule: 0 (RuleConstant.FLOW_GRADE_THREAD) limits the flow by the number of concurrent threads, and 1 (RuleConstant.FLOW_GRADE_QPS) limits it by QPS. The thread count and QPS values are collected by StatisticSlot in real time.
You can view the real-time statistics with the following command:
curl http://localhost:8719/cnode?id=resourceName
Flow control of concurrent threads
Limiting the number of concurrent threads protects the business threads from being exhausted. For example, when a downstream application that this application depends on becomes unstable and its response latency increases, the caller sees lower throughput and more occupied threads, and in extreme cases its thread pool is exhausted. To cope with too many occupied threads, the industry uses isolation schemes, for example giving different business logic different thread pools to avoid resource contention between businesses (thread pool isolation). This isolates well, but at the cost of too many threads and significant context-switching overhead, especially for low-latency calls.
Sentinel's concurrent thread limiting does not create or manage thread pools. It simply counts the number of threads in the current request context and rejects new requests immediately once the threshold is exceeded, with an effect similar to semaphore isolation.
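Counting in-flight calls instead of managing a thread pool can be sketched with a single atomic counter. This is a minimal model of the idea, not Sentinel's implementation; the class and method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: reject new calls once too many are already in flight.
final class ThreadCountLimiterSketch {
    private final int maxConcurrent;
    private final AtomicInteger inFlight = new AtomicInteger();

    ThreadCountLimiterSketch(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
    }

    // Returns true and counts the caller in, or false if the threshold is reached.
    boolean tryEnter() {
        while (true) {
            int current = inFlight.get();
            if (current >= maxConcurrent) {
                return false; // rejected immediately, no queuing
            }
            if (inFlight.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Must be called exactly once for every successful tryEnter().
    void exit() {
        inFlight.decrementAndGet();
    }

    public static void main(String[] args) {
        ThreadCountLimiterSketch limiter = new ThreadCountLimiterSketch(2);
        System.out.println(limiter.tryEnter()); // true
        System.out.println(limiter.tryEnter()); // true
        System.out.println(limiter.tryEnter()); // false: two calls are still in flight
        limiter.exit();
        System.out.println(limiter.tryEnter()); // true: one slot was released
    }
}
```

In real code the tryEnter/exit pair would wrap the protected call in try/finally, in the same way every Sentinel entry is paired with an exit.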
QPS flow control
When the QPS exceeds a certain threshold, flow control measures are taken. The flow control effects are direct rejection, Warm Up and uniform queuing, which correspond to the controlBehavior field in FlowRule.
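Of the three effects, uniform queuing is the least obvious, so here is a small sketch of the idea: requests are spaced at a fixed interval, and a request that would have to wait longer than a maximum queueing time is rejected. The class name, parameters and the explicit clock argument are assumptions for illustration; this is not Sentinel's controller code, and instead of sleeping it only returns how long the caller should wait.

```java
// Hypothetical sketch of uniform queuing: requests pass at a fixed interval.
final class UniformPacerSketch {
    private final long intervalMillis;  // spacing between two passed requests
    private final long maxQueueMillis;  // longest wait a request may be given
    private long latestPassedMillis = Long.MIN_VALUE;

    UniformPacerSketch(int qps, long maxQueueMillis) {
        this.intervalMillis = 1000L / qps;
        this.maxQueueMillis = maxQueueMillis;
    }

    // Returns the wait time in milliseconds before the request may proceed,
    // or -1 if the request is rejected because the queue is too long.
    synchronized long acquire(long nowMillis) {
        long expected = latestPassedMillis + intervalMillis;
        if (expected <= nowMillis) {
            latestPassedMillis = nowMillis;
            return 0; // enough time has passed since the previous request
        }
        long wait = expected - nowMillis;
        if (wait > maxQueueMillis) {
            return -1;
        }
        latestPassedMillis = expected;
        return wait;
    }

    public static void main(String[] args) {
        UniformPacerSketch pacer = new UniformPacerSketch(10, 150); // one request per 100 ms
        System.out.println(pacer.acquire(0));   // 0: passes immediately
        System.out.println(pacer.acquire(0));   // 100: must wait for the next slot
        System.out.println(pacer.acquire(0));   // -1: would wait 200 ms, above the 150 ms limit
        System.out.println(pacer.acquire(250)); // 0: the reserved slots are in the past
    }
}
```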
8.2. Flow control based on call relationship
A call relationship involves a caller and a callee. One method may call other methods, forming a hierarchy of call chains. Sentinel builds the call relationships between resources through NodeSelectorSlot and records the real-time statistics of each resource through ClusterBuilderSlot.
With the statistics of the call chain, a variety of flow control measures can be derived.
Flow control by caller
The origin parameter of the ContextUtil.enter(resourceName, origin) method identifies the caller. This information is recorded in ClusterBuilderSlot. The call data of different callers for the same resource can be displayed with the following command:
Flow control by call chain entry: chain flow control
NodeSelectorSlot records the call chains between resources, which form a call tree through their call relationships. The root of this tree is a virtual node named machine-root, and the entries of the call chains are the children of this virtual node.
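The shape of such a tree can be sketched in a few lines. The node class below is hypothetical and far simpler than Sentinel's node types, and the entry names Entrance1 and Entrance2 and the resource name nodeA are made up for the example. The point it shows is that the same resource reached through two entries gets two separate tree nodes, which is what makes per-entry limiting possible.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a call tree rooted at a virtual "machine-root" node.
final class CallTreeSketch {

    static final class Node {
        final String name;
        final Map<String, Node> children = new LinkedHashMap<>();

        Node(String name) {
            this.name = name;
        }

        // Returns the child with the given name, creating it on first use.
        Node child(String childName) {
            return children.computeIfAbsent(childName, Node::new);
        }
    }

    // Renders the tree with two spaces of indentation per level.
    static String render(Node node, String indent) {
        StringBuilder sb = new StringBuilder(indent).append(node.name).append('\n');
        for (Node child : node.children.values()) {
            sb.append(render(child, indent + "  "));
        }
        return sb.toString();
    }

    static Node demoTree() {
        Node root = new Node("machine-root");
        root.child("Entrance1").child("nodeA");
        root.child("Entrance2").child("nodeA");
        return root;
    }

    public static void main(String[] args) {
        System.out.print(render(demoTree(), ""));
    }
}
```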
A typical call tree is shown below:
Flow control for related resources: associated flow control
Two resources are related when they compete with or depend on each other. For example, read and write operations on the same field in a database compete: reading too fast slows down writing, and writing too fast slows down reading. If reads and writes are left to compete freely, the cost of the contention itself reduces overall throughput. For example, let read_db and write_db represent database reads and writes. To give writes priority, set a flow control rule on read_db with FlowRule.strategy set to RuleConstant.STRATEGY_RELATE and FlowRule.refResource set to write_db. Then, when writes become too frequent, requests to read data are limited.
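The effect of such a rule can be modeled in plain Java: the guarded resource is checked against the traffic of the related resource rather than its own. This is a sketch of the idea only, not Sentinel's implementation; the class, the one-second fixed window and the explicit clock are assumptions, and the resource names read_db and write_db are used purely as example labels.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: limit one resource based on how busy a related resource is.
final class RelatedFlowSketch {
    private final Map<String, Integer> passedThisSecond = new HashMap<>();
    private long currentSecond = Long.MIN_VALUE;

    // Starts a fresh one-second window when the clock moves to a new second.
    private void roll(long nowMillis) {
        long second = nowMillis / 1000;
        if (second != currentSecond) {
            currentSecond = second;
            passedThisSecond.clear();
        }
    }

    // Unrestricted resource: the call is only counted.
    synchronized void pass(String resource, long nowMillis) {
        roll(nowMillis);
        passedThisSecond.merge(resource, 1, Integer::sum);
    }

    // Guarded resource: blocked while the related resource has reached relatedLimit calls
    // in the current second.
    synchronized boolean tryPass(String resource, String relatedResource,
                                 int relatedLimit, long nowMillis) {
        roll(nowMillis);
        if (passedThisSecond.getOrDefault(relatedResource, 0) >= relatedLimit) {
            return false;
        }
        passedThisSecond.merge(resource, 1, Integer::sum);
        return true;
    }

    public static void main(String[] args) {
        RelatedFlowSketch flow = new RelatedFlowSketch();
        System.out.println(flow.tryPass("read_db", "write_db", 2, 0));    // true: no writes yet
        flow.pass("write_db", 1);
        flow.pass("write_db", 2);
        System.out.println(flow.tryPass("read_db", "write_db", 2, 3));    // false: writes are busy
        System.out.println(flow.tryPass("read_db", "write_db", 2, 1000)); // true: new window
    }
}
```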