Gremlin Application Attacks

Gremlin is a simple, safe and secure way to use Chaos Engineering to improve system resilience. Application attacks, known as Application-Level Fault Injection (ALFI), let you precisely target parts of your application so you can understand how your system and its operators react to that stress. See Why ALFI for more discussion.

Integrate the library

To use ALFI, you must integrate the Gremlin libraries into your application and redeploy it. See the JVM Installation Guide for the full details. Once the library is integrated successfully, you should see a log line like:

INFO com.gremlin.GremlinServiceFactory - Gremlin enabled for Team abcdefgh-1234-9876-3333-nopqrstuvwxy

Create attacks via the Web UI

Once the ALFI library is integrated, you can start creating attacks from the Web UI, which also shows a history of the ALFI attacks your team has run.

Clicking New ALFI Attack opens a form with three sections: Application Type, Traffic Type, and Impact.

Application Type

This section controls which applications are eligible for the ALFI attack. On startup, the ALFI code running in each application creates an ApplicationCoordinates instance and passes it to the Gremlin API. Every application that has registered its ApplicationCoordinates this way is eligible to pick up an ALFI attack. See Application Coordinates Setup for details on how to populate ApplicationCoordinates.

The ALFI library comes with two Application Types out of the box: AWS Lambda and AWS EC2. You can also create custom Application Types from your application and then use them in the Web UI with the Add Custom Field button. The most effective chaos experiments start small, so keep your custom Application Types as specific as possible.
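To make the eligibility rule concrete, here is a minimal sketch, assuming an application is eligible when its type matches the attack's Application Type and every field the attack specifies has the same value in the coordinates the application reported. The classes `AppCoordinates` and `AppCriteria` and the method `isEligible` are hypothetical illustrations, not the Gremlin ALFI API.

```java
import java.util.Map;

/**
 * Hypothetical sketch of Application Type targeting. These classes are NOT the
 * Gremlin ALFI API; they only model "every attack criterion must match the
 * coordinates an application reported at startup".
 */
public class AppTargetingSketch {

    /** What an application reports about itself (hypothetical model). */
    static final class AppCoordinates {
        final String type;
        final Map<String, String> fields;

        AppCoordinates(String type, Map<String, String> fields) {
            this.type = type;
            this.fields = fields;
        }
    }

    /** What the attack targets: a type plus required field values (hypothetical). */
    static final class AppCriteria {
        final String type;
        final Map<String, String> required;

        AppCriteria(String type, Map<String, String> required) {
            this.type = type;
            this.required = required;
        }
    }

    /** Eligible only if the type matches and every required field has the same value. */
    static boolean isEligible(AppCoordinates app, AppCriteria criteria) {
        if (!criteria.type.equals(app.type)) {
            return false;
        }
        for (Map.Entry<String, String> e : criteria.required.entrySet()) {
            // A missing field yields null, and equals(null) is false, so it does not match.
            if (!e.getValue().equals(app.fields.get(e.getKey()))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AppCoordinates app = new AppCoordinates("my-service",
                Map.of("region", "us-west-2", "stage", "canary"));
        // Narrow criteria: only canary instances in us-west-2.
        AppCriteria narrow = new AppCriteria("my-service",
                Map.of("region", "us-west-2", "stage", "canary"));
        AppCriteria otherRegion = new AppCriteria("my-service", Map.of("region", "eu-west-1"));
        System.out.println(isEligible(app, narrow));      // true
        System.out.println(isEligible(app, otherRegion)); // false
    }
}
```

Adding fields to the criteria can only shrink the set of eligible applications, which is why specific custom Application Types keep the blast radius small.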

Traffic Type

This section lets you pick out individual requests within your application and impact only that set. Any attribute you’ve supplied in a TrafficCoordinates can be used in constructing the attack. See Traffic Coordinates Setup and Attaching Request Context data to all TrafficCoordinates for details on how to control the data placed into a TrafficCoordinates instance.

The ALFI library includes integrations for the Apache HTTP client and the DynamoDB client (with more to come!); however, you’re free to create any Traffic Type you like and use its custom fields as attributes of the attack.
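The idea of attaching request context to traffic can be sketched as follows, assuming request-scoped attributes (such as a customer ID) live in a `ThreadLocal` and are merged into the attributes of each outbound call, so an attack can target one customer's requests. Everything here (`TrafficCoordinatesSketch`, `putRequestContext`, `trafficAttributes`, `matches`, and the attribute names) is hypothetical; the real mechanism is described in the guides referenced above.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the Gremlin ALFI API): build the attribute map for one
 * outbound call, merging in request-scoped context so an attack can target, for
 * example, a single customer's requests.
 */
public class TrafficCoordinatesSketch {

    // Request-scoped attributes for the current thread (e.g. set by an inbound filter).
    private static final ThreadLocal<Map<String, String>> REQUEST_CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    static void putRequestContext(String key, String value) {
        REQUEST_CONTEXT.get().put(key, value);
    }

    static void clearRequestContext() {
        REQUEST_CONTEXT.remove();
    }

    /** Combines call-specific attributes with the current request context. */
    static Map<String, String> trafficAttributes(String type, Map<String, String> callFields) {
        Map<String, String> attrs = new HashMap<>(REQUEST_CONTEXT.get());
        attrs.putAll(callFields); // call-specific fields win on key collisions
        attrs.put("type", type);
        return attrs;
    }

    /** A request is targeted if every attribute named by the attack has the same value. */
    static boolean matches(Map<String, String> attackCriteria, Map<String, String> attrs) {
        return attackCriteria.entrySet().stream()
                .allMatch(e -> e.getValue().equals(attrs.get(e.getKey())));
    }

    public static void main(String[] args) {
        putRequestContext("customerId", "42");
        Map<String, String> attrs = trafficAttributes("http-client",
                Map.of("method", "GET", "host", "payments.internal"));
        Map<String, String> attack = Map.of("type", "http-client", "customerId", "42");
        System.out.println(matches(attack, attrs)); // true
        clearRequestContext();
    }
}
```

Clearing the context at the end of each request matters in thread-pooled servers; otherwise one request's attributes would leak into the next request handled by the same thread.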

For Traffic Type, you may also supply a Percentage of Traffic value. Because this percentage is targeted probabilistically, the share of traffic actually impacted may not exactly match the value specified.
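To see why the impacted share only approximates the Percentage of Traffic, consider a minimal sketch, assuming each matching request is selected independently with probability percentage/100. The class and method names are hypothetical and not taken from the Gremlin library.

```java
import java.util.Random;

/**
 * Hypothetical sketch: independent per-request selection for a "Percentage of
 * Traffic" value. Not taken from the Gremlin library.
 */
public class PercentageSketch {

    /** Returns true with probability percentage/100, independently for each request. */
    static boolean selectRequest(Random random, double percentage) {
        // nextDouble() is uniform in [0, 1), so 0% never selects and 100% always selects.
        return random.nextDouble() * 100.0 < percentage;
    }

    /** Counts how many of n requests would be impacted. */
    static int countImpacted(Random random, double percentage, int n) {
        int impacted = 0;
        for (int i = 0; i < n; i++) {
            if (selectRequest(random, percentage)) {
                impacted++;
            }
        }
        return impacted;
    }

    public static void main(String[] args) {
        Random random = new Random(1234); // fixed seed so runs are repeatable
        int n = 1000;
        int impacted = countImpacted(random, 10.0, n);
        // Close to 100 (10% of 1000), but usually not exactly 100.
        System.out.println("impacted " + impacted + " of " + n);
    }
}
```

Under this model the impacted count is binomially distributed with standard deviation sqrt(n·p·(1−p)); for 1,000 requests at 10% that is about 9.5 requests, so the relative error shrinks as traffic volume grows.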

Impact

This section declares the impact you’d like to inject. You can choose an amount of latency to inject, as well as a yes/no switch for whether the call should fail. Combine the two to simulate a slow call that eventually fails. The impact is applied to all traffic that matches the Traffic Type above, on applications that match the Application Type above.
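As a rough illustration of how latency and failure combine, here is a minimal sketch, assuming the latency is added before the call and an injected exception replaces the real call when failure is enabled. `Impact`, `InjectedFailure`, and `applyImpact` are hypothetical names, not part of the Gremlin library, and the real library may apply impact differently.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch of applying an ALFI-style impact around a call. The Impact
 * class and applyImpact method are illustrative, not part of the Gremlin library.
 */
public class ImpactSketch {

    static final class Impact {
        final long latencyMs;
        final boolean fail;

        Impact(long latencyMs, boolean fail) {
            this.latencyMs = latencyMs;
            this.fail = fail;
        }
    }

    /** Thrown in place of the real call when the impact says the call should fail. */
    static final class InjectedFailure extends RuntimeException {
        InjectedFailure(String message) {
            super(message);
        }
    }

    /** Adds latency first, then either fails or runs the real call. */
    static <T> T applyImpact(Impact impact, Callable<T> call) throws Exception {
        if (impact.latencyMs > 0) {
            Thread.sleep(impact.latencyMs);
        }
        if (impact.fail) {
            // Latency plus failure models a slow call that eventually fails.
            throw new InjectedFailure("injected failure after " + impact.latencyMs + " ms");
        }
        return call.call();
    }

    public static void main(String[] args) throws Exception {
        String result = applyImpact(new Impact(50, false), () -> "ok"); // slow but succeeds
        System.out.println(result);
        try {
            applyImpact(new Impact(50, true), () -> "never returned");
        } catch (InjectedFailure e) {
            System.out.println(e.getMessage()); // slow, then fails
        }
    }
}
```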

In this section, you must also set the duration of the attack. For this duration, the attack is active and matching ALFI-enabled applications are impacted. As soon as the duration elapses, the applications no longer know about the attack and are no longer impacted.
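The duration rule can be modeled as a half-open time window, assuming an application treats an attack as active from its start time up to, but not including, start plus duration. `AttackWindowSketch` and `isActive` are hypothetical names used only for illustration.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch: an attack is active only inside its [start, start + duration) window. */
public class AttackWindowSketch {

    static boolean isActive(Instant start, Duration duration, Instant now) {
        Instant end = start.plus(duration);
        // Active from the start instant up to, but not including, the end instant.
        return !now.isBefore(start) && now.isBefore(end);
    }

    public static void main(String[] args) {
        Instant start = Instant.parse("2024-01-01T00:00:00Z");
        Duration duration = Duration.ofMinutes(5);
        System.out.println(isActive(start, duration, start.plusSeconds(60)));  // true
        System.out.println(isActive(start, duration, start.plusSeconds(300))); // false: duration elapsed
    }
}
```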

Observe attack results

Once you press the Create ALFI Attack button, the attack becomes active and applications start picking it up. You can then see all of the attributes used to scope the attack, as well as the impact and the duration of the attack. The attack then progresses through the stages of its lifecycle described below:

Stage                  Description
Pending                Created, but no application has picked up the attack
Distributed            At least one application has picked up the attack, but none has been impacted
Impacted               At least one application has picked up the attack and been impacted
Successful             Impact was applied and the duration elapsed
ApplicationNotFound    No application ever picked up the attack, and the duration elapsed
TrafficNotFound        No application ever applied impact, and the duration elapsed
Halted                 The attack was halted (via the UI or API) before the duration elapsed
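One way to read the stage table is as a function of four facts about an attack. The sketch below assumes that reading: a halt takes precedence, and ApplicationNotFound wins over TrafficNotFound when no application picked the attack up at all. The enum and `stageOf` are hypothetical and reflect our interpretation of the table, not Gremlin's implementation.

```java
/**
 * Hypothetical sketch of how the lifecycle stages relate to each other. The enum
 * and derivation are one reading of the stage table, not Gremlin's implementation.
 */
public class AttackStageSketch {

    // Constants mirror the table: Pending, Distributed, Impacted, Successful,
    // ApplicationNotFound, TrafficNotFound, Halted.
    enum Stage {
        PENDING, DISTRIBUTED, IMPACTED, SUCCESSFUL,
        APPLICATION_NOT_FOUND, TRAFFIC_NOT_FOUND, HALTED
    }

    /**
     * @param pickedUp        at least one application picked up the attack
     * @param impacted        at least one application applied the impact
     * @param halted          halted via the UI or API before the duration elapsed
     * @param durationElapsed the attack's duration has run out
     */
    static Stage stageOf(boolean pickedUp, boolean impacted, boolean halted, boolean durationElapsed) {
        if (halted) {
            return Stage.HALTED;
        }
        if (durationElapsed) {
            if (impacted) {
                return Stage.SUCCESSFUL;
            }
            return pickedUp ? Stage.TRAFFIC_NOT_FOUND : Stage.APPLICATION_NOT_FOUND;
        }
        if (impacted) {
            return Stage.IMPACTED;
        }
        return pickedUp ? Stage.DISTRIBUTED : Stage.PENDING;
    }

    public static void main(String[] args) {
        System.out.println(stageOf(true, false, false, false)); // DISTRIBUTED
        System.out.println(stageOf(true, false, false, true));  // TRAFFIC_NOT_FOUND
        System.out.println(stageOf(false, false, false, true)); // APPLICATION_NOT_FOUND
    }
}
```

Under this reading, TrafficNotFound usually means the Application Type matched but the Traffic Type (or Percentage of Traffic) selected no requests during the attack window.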