Alibaba Chaos Engineering: A First Look at the ChaosBlade Tool
ChaosBlade is Alibaba's open-source fault injection tool. It follows chaos engineering principles and the chaos experiment model, and it helps enterprises improve the fault tolerance of distributed systems and maintain business continuity while moving to the cloud or migrating to cloud-native systems.
ChaosBlade is the open-source version of Alibaba's internal MonkeyKing project. It builds on nearly ten years of fault testing and drill practice at Alibaba and combines the best ideas and practices of the group's business units.
ChaosBlade is simple to use and supports a rich set of experiment scenarios, including:
- Basic resources: experiment scenarios for CPU, memory, network, disk, processes, and so on;
- Java applications: scenarios for databases, caches, messaging, the JVM itself, microservices, and so on; any class method can also be targeted to inject more complex scenarios;
- C++ applications: for example, injecting a delay at any method or line of code, or tampering with variables and return values;
- Docker containers: for example, killing a container, plus CPU, memory, network, disk, and process scenarios inside the container;
- Cloud-native platforms: for example, CPU, memory, network, disk, and process scenarios on Kubernetes nodes; Pod network scenarios and scenarios on Pods themselves, such as killing a Pod; and container scenarios like the Docker container scenarios above.
Scenarios are packaged into separate projects by domain. This standardizes how the scenarios in each domain are implemented and makes it easier to extend them horizontally and vertically. Because every project follows the chaos experiment model, all of them can be invoked uniformly through the chaosblade CLI. The projects currently included are:
- chaosblade: the chaos experiment management tool and the execution tool for experiments. It includes commands for creating, destroying, and querying experiments, and for preparing and revoking the experiment environment. Experiments can be run through the CLI or over HTTP. It provides complete help for commands, experiment scenarios, and scenario parameters, so operation is simple and clear.
- chaosblade-spec-go: the chaos experiment model defined in Go. Scenarios that are convenient to implement in Go build on this specification.
- chaosblade-exec-os: implementation of the basic resource experiment scenarios.
- chaosblade-exec-docker: implementation of the Docker container experiment scenarios, standardized by calling the Docker API.
- chaosblade-operator: implementation of the Kubernetes platform experiment scenarios. Experiments are defined as standard Kubernetes CRDs, so they can be created, updated, and deleted with the usual Kubernetes resource tooling, including kubectl and client-go. They can also be run with the chaosblade CLI tool described above.
- chaosblade-exec-jvm: implementation of the Java application experiment scenarios. It uses Java Agent technology to attach dynamically, with no integration work and at zero cost. It also supports unloading, which fully reclaims the resources created by the agent.
- chaosblade-exec-cplus: implementation of the C++ application experiment scenarios, using GDB to inject experiments at the method and code-line level.
wget -c https://chaosblade.oss-cn-hangzhou.aliyuncs.com/agent/github/1.3.0/chaosblade-1.3.0-linux-amd64.tar.gz
tar -zxvf chaosblade-1.3.0-linux-amd64.tar.gz
- Environment variable configuration
Assuming the installation directory is /root/chaosblade/chaosblade-1.3.0:
cat >> ~/.bash_profile << \EOF
PATH=$PATH:/root/chaosblade/chaosblade-1.3.0
export PATH
EOF
source ~/.bash_profile
The official documentation describes every command in detail. A selection of commands is tested below.
[root@node1 ~]# blade create -h
Create a chaos engineering experiment

Usage:
  blade create [command]

Aliases:
  create, c

Examples:
blade create cpu load --cpu-percent 60

Available Commands:
  cplus       C++ chaos experiments
  cpu         Cpu experiment
  disk        Disk experiment
  docker      Docker experiment
  druid       Experiment with the Druid
  dubbo       Experiment with the Dubbo
  es          ElasticSearch experiment!
  file        File experiment
  gateway     gateway experiment!
  hbase       hbase experiment!
  http        http experiment
  jedis       jedis experiment
  jvm         Experiment with the JVM
  k8s         Kubernetes experiment
  kafka       kafka experiment
  lettuce     redis client lettuce experiment
  log         log experiment
  mem         Mem experiment
  mongodb     MongoDB experiment
  mysql       mysql experiment
  network     Network experiment
  process     Process experiment
  psql        Postgrelsql experiment
  rabbitmq    rabbitmq experiment
  redisson    redisson experiment
  rocketmq    Rocketmq experiment,can make message send or pull delay and exception
  script      Script chaos experiment
  servlet     java servlet experiment
  strace      strace experiment
  systemd     Systemd experiment
  tars        tars experiment

Flags:
  -a, --async             whether to create asynchronously, default is false
  -e, --endpoint string   the create result reporting address. It takes effect only when the async value is true and the value is not empty
  -h, --help              help for create
  -n, --nohup             used to internal async create, no need to config
      --uid string        Set Uid for the experiment, adapt to docker

Global Flags:
  -d, --debug   Set client to DEBUG mode

Use "blade create [command] --help" for more information about a command.
1.1 Destroying an experiment
blade status --type create
blade destroy 6fa04946baf42920
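When experiments are scripted, it helps to capture the experiment ID printed by blade create so that the matching blade destroy can run without copying the ID by hand. The following is a minimal sketch, assuming blade prints a single-line JSON response of the form {"code":200,"success":true,"result":"<uid>"}. The helper name blade_uid is hypothetical and not part of ChaosBlade, and the sed-based parsing is a simplification; a JSON-aware tool would be more robust.

```shell
# blade_uid: hypothetical helper, not part of ChaosBlade.
# Reads a blade JSON response on stdin and prints the value of its
# "result" field (the experiment UID). Prints nothing if the field is
# missing. Assumes a response such as:
#   {"code":200,"success":true,"result":"6fa04946baf42920"}
blade_uid() {
  sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example usage (requires blade on PATH):
#   uid=$(blade create cpu fullload --timeout 60 | blade_uid)
#   blade destroy "$uid"
```

If the response carries no string "result" field, the helper prints nothing, so the caller can test for an empty value before calling blade destroy.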
1.2.2 CPU full-load test
The goal is to verify service quality, monitoring and alerting, traffic scheduling, and elastic scaling when the CPU is under a specific load.
blade create cpu fullload --timeout 60
Verify the impact of high disk I/O load on system services, for example on monitoring and alerting and on service stability.
blade create disk burn --read --path /home
blade create disk burn --write --path /home
1.4. Disk fill test
Verify the impact of a full disk on system services, for example on monitoring and alerting and on service stability.
fallocate is used here to preallocate space. On most file systems (such as ext4 and xfs) it fills the space almost instantly, because no data is actually written.
df -h /home
blade c disk fill --path /home --percent 80 --retain-handle
df -h /home
df -h /home
blade c disk fill --path /home --reserve 0 --timeout 30
df -h /home
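The before and after df calls above can also be compared numerically. The following is a small sketch, assuming POSIX-format df output (df -P), where the fifth column of the first data row is the used percentage. The helper name used_percent is hypothetical.

```shell
# used_percent: hypothetical helper.
# Reads `df -P <path>` output on stdin and prints the Capacity column
# of the first data row with the trailing % sign removed.
used_percent() {
  awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# Example usage (requires blade on PATH):
#   before=$(df -P /home | used_percent)
#   blade c disk fill --path /home --percent 80 --retain-handle
#   after=$(df -P /home | used_percent)
#   echo "usage went from ${before}% to ${after}%"
```

Using df -P rather than df -h keeps each file system on a single line, which makes the column position predictable.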
Start an HTTP service so that experiments can be invoked remotely:
blade server start -p 9526
Open firewall port 9526/tcp
firewall-cmd --zone=public --add-port=9526/tcp --permanent
firewall-cmd --reload
In the URLs below, %20 encodes a space. The URLs are quoted so that the shell does not interpret the ? character.
curl "http://192.168.198.131:9526/chaosblade?cmd=create%20cpu%20load%20--cpu-percent%2050"
curl "http://192.168.198.131:9526/chaosblade?cmd=destroy%204b454fae5486e53e"
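Writing the %20 escapes by hand is error-prone for longer commands. The following sketch builds the request URL from a plain blade command line. The helper name blade_url is hypothetical, and only spaces are encoded, as in the examples above; other reserved characters would need full percent-encoding.

```shell
# blade_url: hypothetical helper.
# $1 = host, $2 = port, $3 = blade command line without the leading "blade".
# Prints the URL for the chaosblade HTTP service with spaces encoded as %20.
blade_url() {
  encoded=$(printf '%s' "$3" | sed 's/ /%20/g')
  printf 'http://%s:%s/chaosblade?cmd=%s\n' "$1" "$2" "$encoded"
}

# Example usage:
#   curl "$(blade_url 192.168.198.131 9526 'create cpu load --cpu-percent 50')"
```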
2.1. nginx network delay test
Network latency can be measured from another client machine by running time curl http://192.168.198.131/ and the request may need to be repeated several times.
docker pull nginx:latest
docker run --name nginx-test -p 80:80 -d nginx
time curl http://192.168.198.131/
blade create docker network delay --time 3000 --interface eth0 --local-port 80 --container-id 70b82b112fac
time curl http://192.168.198.131/
blade destroy 0de302011f872a49
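To check that the injected delay is close to the configured 3000 ms, the two timings can be compared numerically instead of by eye. The following sketch uses curl's -w '%{time_total}' write-out variable, which prints the total request time in seconds. The helper name added_delay_ms is hypothetical.

```shell
# added_delay_ms: hypothetical helper.
# $1 = request time before the experiment, $2 = request time during it,
# both in seconds. Prints the difference rounded to whole milliseconds.
added_delay_ms() {
  awk -v before="$1" -v after="$2" \
    'BEGIN { printf "%.0f\n", (after - before) * 1000 }'
}

# Example usage, run from the client machine:
#   before=$(curl -o /dev/null -s -w '%{time_total}' http://192.168.198.131/)
#   (create the network delay experiment on the server)
#   after=$(curl -o /dev/null -s -w '%{time_total}' http://192.168.198.131/)
#   added_delay_ms "$before" "$after"
```

A single request is noisy, so averaging several measurements on each side gives a more reliable comparison.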
- blade prepare jvm (which takes several seconds) is a prerequisite for running blade create jvm;
- Running blade prepare jvm several times against the same process attaches the Java agent only once: there is a single instance, and only its update time changes.
- In newer versions, blade create jvm implicitly creates a blade prepare jvm instance if none is running (older versions may report an error). It is still recommended to create the blade prepare jvm instance explicitly.
Note: fb5a73661026c4f0 is the ID of the blade create jvm experiment; d54ae242c4538e97 is the ID of the implicitly created blade prepare jvm instance.
blade status --type prepare --target jvm
blade create jvm delay --time 3000 --classname=com.example.blade.HelloController --methodname=hello --pid 6428
blade status --type prepare --target jvm
blade destroy fb5a73661026c4f0
blade status --type prepare --target jvm
blade revoke d54ae242c4538e97
Note: 4d5adfcbc7d46956 is the ID of the blade create jvm experiment; c05f768526eaf4db is the ID of the explicitly created blade prepare jvm instance.
blade prepare jvm --pid 6428
blade create jvm delay --time 3000 --classname=com.example.blade.HelloController --methodname=hello --pid 6428
blade status --type prepare --target jvm
blade destroy 4d5adfcbc7d46956
blade revoke c05f768526eaf4db
blade status --type prepare --target jvm
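The explicit flow (prepare, create, destroy, revoke) can be wrapped so that the agent is always revoked, even when the experiment could not be created. This is a sketch under assumptions, not an official ChaosBlade script: it assumes that blade prepare and blade create both print a single-line JSON response whose "result" field holds the generated ID, and the function names blade_uid and jvm_delay_drill are hypothetical.

```shell
# blade_uid: hypothetical helper; prints the "result" field of a
# single-line blade JSON response read from stdin.
blade_uid() {
  sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# jvm_delay_drill: hypothetical wrapper around the explicit flow.
# $1 = target PID, $2 = class name, $3 = method name,
# $4 = delay in milliseconds, $5 = seconds to keep the experiment running.
jvm_delay_drill() {
  # Attach the Java agent explicitly and remember the prepare ID.
  prepare_uid=$(blade prepare jvm --pid "$1" | blade_uid)
  [ -n "$prepare_uid" ] || { echo "prepare failed" >&2; return 1; }

  # Inject the delay and remember the experiment ID.
  create_uid=$(blade create jvm delay --time "$4" \
    --classname="$2" --methodname="$3" --pid "$1" | blade_uid)

  if [ -n "$create_uid" ]; then
    sleep "$5"                    # observation window
    blade destroy "$create_uid"   # stop the fault injection
  fi

  # Detach the agent whether or not the experiment was created.
  blade revoke "$prepare_uid"
}

# Example usage (requires blade on PATH and a running Java process):
#   jvm_delay_drill 6428 com.example.blade.HelloController hello 3000 60
```

Destroying the experiment before revoking the agent mirrors the order of the manual commands above.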