Gremlin Infrastructure Attacks

Intro to Attacks

Gremlin is a simple, safe and secure way to use Chaos Engineering to improve system resilience. The Gremlin Platform provides a range of attacks which you can run against your infrastructure. These include Resource Gremlins, Network Gremlins, and State Gremlins. You can also schedule regular attacks, create attack templates, and view attack reports.

How to run attacks with Gremlin

Gremlin provides a library of possible failure modes to test. You can impact system resources, delay or drop network traffic to your dependencies, shut down your hosts, and much more!

Visit the attack creation page to start testing your infrastructure today. Go to the active attacks page to monitor ongoing attacks.

Each attack, or “gremlin”, tests your resilience in a different way:

How to use Resource Gremlins

Resource gremlins are a great starting point – simple to run and understand. They reveal how your service degrades when starved of CPU, memory, IO, or disk.

Gremlin   Impact
CPU       Generates high load for one or more CPU cores.
Memory    Allocates a specific amount of RAM.
IO        Puts read/write pressure on I/O devices such as hard disks.
Disk      Writes files to disk to fill it to a specific percentage.
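
To make "fill it to a specific percentage" concrete, the arithmetic can be sketched as below. This is only an illustration of the calculation, not Gremlin's implementation; the function name bytes_to_write and the sample numbers are assumptions made for this example.

```python
import shutil


def bytes_to_write(total_bytes, used_bytes, target_percent):
    """Bytes that must be added so usage reaches target_percent of the disk.

    Hypothetical helper for illustration; not part of Gremlin.
    Returns 0 when the disk is already at or above the target.
    """
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    target_bytes = total_bytes * target_percent // 100
    return max(0, target_bytes - used_bytes)


if __name__ == "__main__":
    # Read the real usage of the root volume and compute the gap to 90% full.
    usage = shutil.disk_usage("/")
    print(bytes_to_write(usage.total, usage.used, 90))
```

For example, a 1000-byte volume with 400 bytes used needs 500 more bytes to reach 90%, and a volume already above the target needs none.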

How to use Network Gremlins

Network gremlins allow you to see the impact of lost or delayed traffic to your application. Test how your service behaves when you can’t reach one of your dependencies, internal or external. Limit the impact to only the traffic you want to test by specifying ports, hostnames, and IP addresses.

Gremlin       Impact
Blackhole     Drops all matching network traffic.
Latency       Injects latency into all matching egress network traffic.
Packet loss   Induces packet loss into all matching egress network traffic.
DNS           Blocks access to DNS servers.

Warning: Important considerations for targeting Kubernetes Pods with Network Attacks

How to use State Gremlins

State gremlins change the state of a host: they shut it down or reboot it, shift its system clock, or kill a process running on it.

Gremlin          Impact
Shutdown         Reboots or halts the host operating system, allowing you to test, for example, how your system behaves when losing one or more cluster machines.
Time travel      Changes the host’s system time, which can be used to simulate adjusting to daylight saving time and other time-related events.
Process killer   Kills the specified process, which can be used to simulate application or dependency crashes.

How to schedule attacks with Gremlin

Attacks can be run ad hoc, programmatically, or on a schedule. You can schedule attacks to execute on certain days and within a specified time window. You can also set the maximum number of attacks a schedule can spawn.
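
The three scheduling constraints (days, time window, maximum number of attacks) can be modelled as a simple check. The sketch below is hypothetical and does not use Gremlin's API: the Schedule class, its field names, and may_spawn are invented for illustration. It also assumes the window does not cross midnight and that the maximum applies to the schedule as a whole.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass(frozen=True)
class Schedule:
    """Hypothetical schedule model; the field names are assumptions."""
    days: frozenset      # allowed weekdays, Monday = 0 ... Sunday = 6
    window_start: time   # inclusive start of the daily window
    window_end: time     # exclusive end of the daily window
    max_attacks: int     # cap on attacks this schedule may spawn


def may_spawn(schedule, now, spawned_so_far):
    """Return True if a new attack is allowed at the moment `now`."""
    on_allowed_day = now.weekday() in schedule.days
    in_window = schedule.window_start <= now.time() < schedule.window_end
    under_cap = spawned_so_far < schedule.max_attacks
    return on_allowed_day and in_window and under_cap


# Weekdays, 09:00 to 17:00, at most three attacks.
business_hours = Schedule(frozenset({0, 1, 2, 3, 4}), time(9, 0), time(17, 0), 3)

if __name__ == "__main__":
    # 8 January 2024 is a Monday, so a mid-morning attack is allowed.
    print(may_spawn(business_hours, datetime(2024, 1, 8, 10, 30), 0))
```

Keeping the three conditions as separate named booleans makes it easy to see which constraint blocked an attack.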

How to create attack templates with Gremlin

When creating an attack, you can save frequently used configuration as a template. Give the template a name, and anyone within your organization will be able to use it. We like having business hours defined for random attacks, and commonly used network targets stored for reuse.

Attack Stage Progression

Every Attack on Gremlin is composed of one or more Executions, where each Execution is an instance of the attack running on a specific target.

The Stage of an Attack is derived from the Stages of all of its Executions. Gremlin ranks Stages by importance and marks the Attack with the most important Stage found among its Executions.

Example

An Attack with three Executions derives its Stage by picking the most important Stage among them. If the three Execution Stages are TargetNotFound, TargetNotFound, and Running, the resulting Stage for the Attack is Running.

You can see Stages ordered by their importance below.

Stages

Stages are listed in descending order of importance, so Running is the most important Stage and TargetNotFound the least.

Stage                  Description
Running                Attack running on the host
Halt                   Attack told to halt
RollbackStarted        Code to rollback has started
RollbackTriggered      Daemon started a rollback of client
InterruptTriggered     Daemon issued an interrupt to the client
HaltDistributed        Distributed to the host but not yet halted
Initializing           Attack is creating the desired impact
Distributed            Distributed to the host but not yet running
Pending                Created but not yet distributed
Failed                 Client reported unexpected failure
HaltFailed             Halt on client did not complete
InitializationFailed   Creating the impact failed
LostCommunication      Client never reported finishing/receiving execution
ClientAborted          Something on the client/daemon side stopped the Gremlin and it was aborted without user intervention
UserHalted             User issued a halt, and that is now complete
Successful             Completed running on the Host
TargetNotFound         Attack not scoped to any current targets
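
The derivation rule can be sketched in a few lines. This is a minimal illustration built only from the ordering in the table above, not Gremlin's own code; the names STAGES_BY_IMPORTANCE and derive_attack_stage are assumptions made for this example.

```python
# Stages in descending order of importance, copied from the table above.
STAGES_BY_IMPORTANCE = [
    "Running", "Halt", "RollbackStarted", "RollbackTriggered",
    "InterruptTriggered", "HaltDistributed", "Initializing", "Distributed",
    "Pending", "Failed", "HaltFailed", "InitializationFailed",
    "LostCommunication", "ClientAborted", "UserHalted", "Successful",
    "TargetNotFound",
]

# Map each stage to its position in the list; a lower rank is more important.
_RANK = {stage: rank for rank, stage in enumerate(STAGES_BY_IMPORTANCE)}


def derive_attack_stage(execution_stages):
    """Return the most important stage among an attack's executions.

    Hypothetical helper for illustration. Raises ValueError for an empty
    list and KeyError for a stage name that is not in the table.
    """
    if not execution_stages:
        raise ValueError("an attack needs at least one execution")
    return min(execution_stages, key=_RANK.__getitem__)


if __name__ == "__main__":
    # The example from the text: one running execution outweighs two misses.
    print(derive_attack_stage(["TargetNotFound", "TargetNotFound", "Running"]))
```

Run as a script, this prints Running, matching the example above. An Attack whose Executions ended as Successful and Failed would be marked Failed, because Failed ranks higher in the table.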

Conclusion

You now have the tools to create, schedule, and template attacks with Gremlin, and to interpret the Stage an attack reports. You can also explore the Gremlin Blog for more information on how to use Chaos Engineering with your application infrastructure.