Gremlin provides a library of possible failure modes to test. You can impact system resources, delay or drop network traffic to your dependencies, shut down your hosts, and much more!
Each attack, or “gremlin”, tests your resilience in a different way:
Resource gremlins are a great starting point – simple to run and understand. They reveal how your service degrades when starved of CPU, memory, IO, or disk.
| Gremlin | Description |
|---|---|
| CPU | Generates high load for one or more CPU cores. |
| Memory | Allocates a specific amount of RAM. |
| IO | Puts read/write pressure on I/O devices such as hard disks. |
| Disk | Writes files to disk to fill it to a specific percentage. |
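
To make "fill it to a specific percentage" concrete, here is a minimal sketch of the arithmetic a disk-filling attack implies. It is an illustration only, not Gremlin's implementation: `bytes_to_fill` is a hypothetical helper, and the only real API used is Python's standard `shutil.disk_usage`.

```python
import shutil


def bytes_to_fill(total: int, used: int, target_percent: float) -> int:
    """Return how many bytes must be written so that used/total reaches target_percent.

    Hypothetical helper for illustration. Returns 0 when the volume is
    already at or above the target.
    """
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    target_used = int(total * target_percent / 100)
    return max(0, target_used - used)


if __name__ == "__main__":
    # Inspect the root volume and report what a 90% fill would require.
    usage = shutil.disk_usage("/")
    print(bytes_to_fill(usage.total, usage.used, 90))
```

Running a calculation like this before an experiment is a cheap way to confirm how much headroom the host has and whether the target percentage is actually above current usage.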
Network gremlins allow you to see the impact of lost or delayed traffic to your application. Test how your service behaves when you can’t reach one of your dependencies, internal or external. Limit the impact to only the traffic you want to test by specifying ports, hostnames, and IP addresses.
| Gremlin | Description |
|---|---|
| Blackhole | Drops all matching network traffic. |
| Latency | Injects latency into all matching egress network traffic. |
| Packet loss | Induces packet loss in all matching egress network traffic. |
| DNS | Blocks access to DNS servers. |
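
The idea of "matching" traffic by port, hostname, and IP address can be sketched as a simple predicate. This is a conceptual model only: `TrafficFilter` is a hypothetical class, and the rule that an empty criterion matches everything while all non-empty criteria must agree is an assumption for illustration, not a statement of how Gremlin combines its filters.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class TrafficFilter:
    """Hypothetical matcher: an empty criterion means 'match anything'."""

    ports: set = field(default_factory=set)
    hostnames: set = field(default_factory=set)
    networks: list = field(default_factory=list)  # CIDR strings, e.g. "10.0.0.0/8"

    def matches(self, host: str, ip: str, port: int) -> bool:
        # Assumption: every non-empty criterion must match (logical AND).
        if self.ports and port not in self.ports:
            return False
        if self.hostnames and host not in self.hostnames:
            return False
        if self.networks:
            addr = ipaddress.ip_address(ip)
            if not any(addr in ipaddress.ip_network(n) for n in self.networks):
                return False
        return True


# Example: only traffic to a database port inside a private range is affected.
db_only = TrafficFilter(ports={5432}, networks=["10.0.0.0/8"])
```

Narrowing the filter this way keeps the blast radius small: traffic to the database is affected while everything else on the host flows normally.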
State gremlins change the state of the host itself: they shut it down, shift its system clock, or kill processes running on it.
| Gremlin | Description |
|---|---|
| Shutdown | Reboots or halts the host operating system, allowing you to test, for example, how your system behaves when losing one or more cluster machines. |
| Time travel | Changes the host’s system time, which can be used to simulate adjusting to daylight saving time and other time-related events. |
| Process killer | Kills the specified process, which can be used to simulate application or dependency crashes. |
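
As an example of what a clock change can expose, consider code that compares a timestamp with the current time. The sketch below is hypothetical (`is_expired` and the values are invented for illustration) and simulates the shifted clock by passing in `now` rather than changing the system time.

```python
from datetime import datetime, timedelta, timezone


def is_expired(issued_at: datetime, ttl: timedelta, now: datetime = None) -> bool:
    """Return True once at least `ttl` has elapsed since `issued_at`.

    `now` is injectable so a shifted clock can be simulated in a test.
    """
    now = now or datetime.now(timezone.utc)
    return now - issued_at >= ttl


issued = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
ttl = timedelta(hours=1)

real_now = issued + timedelta(minutes=10)     # ten minutes after issue
shifted_now = real_now + timedelta(hours=2)   # same moment, clock moved forward 2h

print(is_expired(issued, ttl, real_now))
print(is_expired(issued, ttl, shifted_now))
```

A token that is still valid by the real clock looks expired once the clock jumps forward. Certificates, caches, and scheduled jobs can be affected by the same kind of comparison.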
Attacks can be run ad hoc, programmatically, or on a schedule. A schedule restricts attacks to certain days and to a specified time window on those days. You can also cap the number of attacks a schedule can spawn.
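
The three scheduling constraints (allowed days, a time window, and a cap on spawned attacks) can be modeled as follows. This is a minimal sketch with hypothetical names (`Schedule`, `may_spawn`, `spawn`); it does not reflect Gremlin's API or its scheduling internals.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Schedule:
    """Hypothetical model of a schedule's constraints."""

    days: set              # allowed weekdays, Monday = 0 (as in datetime.weekday())
    window_start: time     # inclusive
    window_end: time       # exclusive
    max_attacks: int       # cap on attacks this schedule may spawn
    spawned: int = 0

    def may_spawn(self, now: datetime) -> bool:
        if self.spawned >= self.max_attacks:
            return False
        if now.weekday() not in self.days:
            return False
        return self.window_start <= now.time() < self.window_end

    def spawn(self, now: datetime) -> bool:
        """Record an attack if all constraints allow it; return whether it ran."""
        if not self.may_spawn(now):
            return False
        self.spawned += 1
        return True


# Weekdays, 09:00 to 17:00, at most two attacks.
business_hours = Schedule(days={0, 1, 2, 3, 4}, window_start=time(9),
                          window_end=time(17), max_attacks=2)
```

Keeping the cap separate from the window means a schedule that spans a whole week can still be limited to a handful of experiments.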
When creating an attack, you can save frequently used pieces of configuration as templates. Give a template a name, and anyone within your organization will be able to use it. We like to define business hours for random attacks and to store commonly used network targets for reuse.
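
Conceptually, a template is a named set of defaults that individual attacks can reuse and override. The sketch below illustrates that idea with plain dictionaries. All names, keys, and values are hypothetical and do not correspond to Gremlin's template format.

```python
# In-memory registry standing in for an organization's shared templates.
templates = {}


def save_template(name: str, config: dict) -> None:
    """Store a copy of the configuration under the given name."""
    templates[name] = dict(config)


def attack_from_template(name: str, **overrides) -> dict:
    """Build an attack configuration from a template, applying any overrides."""
    return {**templates[name], **overrides}


# Hypothetical template: a commonly used network target.
save_template("payments-db", {"hostnames": ["db.payments.internal"], "ports": [5432]})

latency_attack = attack_from_template("payments-db", type="latency", delay_ms=200)
```

Because the template is copied rather than mutated, an override in one attack does not leak into the next one built from the same template.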