How to Install and Use Gremlin on Amazon Web Services

This tutorial shows how to install and use Gremlin on an Amazon Web Services (AWS) EC2 instance. If you already have an AWS account and a Gremlin account, this should take no more than 15 minutes.

Step 0 - Getting an EC2 Instance

If you don’t already have an Amazon Linux EC2 instance, follow Amazon’s documentation to launch one. The commands in this tutorial use yum and an RPM repository, so they assume Amazon Linux.

Step 1 - Installing the Gremlin daemon and CLI on your EC2 instance

First, ssh into your EC2 instance and add the Gremlin RPM repository:

$ sudo curl https://rpm.gremlin.com/gremlin.repo -o /etc/yum.repos.d/gremlin.repo

Then install the Gremlin daemon and CLI:

$ sudo yum install -y gremlin gremlind

Step 2 - Validating the Install

Run the following command to confirm that Gremlin has everything it needs to function. Note: DO NOT run this command on production hosts, because it runs test attacks on the host.

$ gremlin syscheck

The CLI will walk through its library of attack types and run some mock attacks:

Checking resource gremlins ...
Checking CPU gremlin ...
Attack on cpu_1 completed successfully
CPU gremlin OK
...

The full syscheck may take a few minutes, so please be patient!

Step 3 - Registering with Gremlin

The Gremlin daemon (gremlind) connects to the Gremlin backend and waits for attack orders from you. When it receives attack orders, it uses the CLI (gremlin) to run the attack.

To connect gremlind to the Gremlin backend, you need your client credentials. (This is NOT the same as the email/password credentials you use to access the Gremlin Web App.) Read the Client Auth docs to see how to find your client credentials in the Web App.

With the credentials in hand, it’s time to configure the daemon. As with most daemons, you can configure gremlind either by configuration file or environment variables. Let’s use the configuration file.

Add these configuration options to the daemon’s configuration file:

$ echo 'GREMLIN_TEAM_ID="<INSERT_YOUR_TEAM_ID>"' | sudo tee -a /etc/default/gremlind
$ echo 'GREMLIN_TEAM_CERTIFICATE_OR_FILE="file:///var/lib/gremlin/gremlin.cert"' | sudo tee -a /etc/default/gremlind
$ echo 'GREMLIN_TEAM_PRIVATE_KEY_OR_FILE="file:///var/lib/gremlin/gremlin.key"' | sudo tee -a /etc/default/gremlind

Then save your PEM-encoded certificate and private key to two new files, /var/lib/gremlin/gremlin.cert and /var/lib/gremlin/gremlin.key respectively, and set their ownership and permissions so that only gremlind can access them:

$ sudo chown gremlin:gremlin /var/lib/gremlin/gremlin.*
$ sudo chmod 600 /var/lib/gremlin/gremlin.*
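Before restarting the daemon, it can save time to confirm that both files really contain PEM data and ended up with mode 600. The function below is a minimal sketch under that assumption; `check_gremlin_creds` is a hypothetical helper, not a Gremlin command, it only inspects PEM header lines (it does not validate the certificate itself), and it assumes GNU coreutils `stat`, as shipped with Amazon Linux.

```shell
# check_gremlin_creds is a hypothetical helper, not a Gremlin command.
# It checks that a certificate and key file look PEM-encoded and that
# both have mode 600. Assumes GNU coreutils stat (as on Amazon Linux).
check_gremlin_creds() {
  local cert="$1" key="$2" f mode
  # The certificate file should contain a PEM certificate header.
  if ! grep -q -- '-----BEGIN CERTIFICATE-----' "$cert"; then
    echo "no PEM certificate header in $cert" >&2
    return 1
  fi
  # PEM private keys use headers such as "BEGIN PRIVATE KEY" or
  # "BEGIN EC PRIVATE KEY"; accept an optional one-word key type.
  if ! grep -Eq -- '-----BEGIN ([A-Z]+ )?PRIVATE KEY-----' "$key"; then
    echo "no PEM private key header in $key" >&2
    return 1
  fi
  # Both files should be readable and writable by their owner only.
  for f in "$cert" "$key"; do
    mode="$(stat -c '%a' "$f")" || return 1
    if [ "$mode" != "600" ]; then
      echo "expected mode 600 on $f, found $mode" >&2
      return 1
    fi
  done
  echo "credential files look OK"
}
```

Because the files are now readable only by their owner, run the check as root, for example with `sudo bash -c "$(declare -f check_gremlin_creds); check_gremlin_creds /var/lib/gremlin/gremlin.cert /var/lib/gremlin/gremlin.key"`.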

Optionally, give the Gremlin daemon a custom ID so it’s easy to find in the Web App later:

$ echo 'GREMLIN_IDENTIFIER="my-ec2-gremlin-host"' | sudo tee -a /etc/default/gremlind
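Because these commands append to /etc/default/gremlind, running them a second time leaves duplicate assignments in the file, which makes it harder to tell which value is in effect. If you expect to re-run the setup, a small helper that replaces an existing `KEY=` line instead of appending another one avoids that. The sketch below assumes the file uses the same `KEY="VALUE"` form shown above; `set_gremlind_var` is a hypothetical name, not part of Gremlin.

```shell
# set_gremlind_var is a hypothetical helper, not part of Gremlin. It sets
# KEY="VALUE" in an env-style file, replacing any existing KEY= line
# instead of appending a duplicate. VALUE must not contain double quotes.
set_gremlind_var() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  # Keep every existing line except a previous assignment to KEY.
  if [ -f "$file" ]; then
    grep -v "^${key}=" "$file" > "$tmp"
  fi
  # Append the new assignment in the same KEY="VALUE" form.
  printf '%s="%s"\n' "$key" "$value" >> "$tmp"
  # Overwrite the file in place, which keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Writing to /etc/default/gremlind requires root, so one way to call it is `sudo bash -c "$(declare -f set_gremlind_var); set_gremlind_var /etc/default/gremlind GREMLIN_IDENTIFIER my-ec2-gremlin-host"`.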

That’s enough configuration for this tutorial, but feel free to read about other configuration options in the Gremlin Docs.

Restart the daemon to apply the configuration changes:

$ sudo systemctl restart gremlind
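A restart can return before the daemon is fully up, so it may help to poll for a short while before switching to the Web App. This is a minimal sketch; `wait_until` is a hypothetical helper, and the 30-second timeout in the usage line is an arbitrary choice.

```shell
# wait_until is a hypothetical helper: it re-runs a command once per
# second until the command succeeds or TIMEOUT seconds have passed.
# Returns 0 on success and 1 on timeout.
wait_until() {
  local timeout="$1" elapsed=0
  shift
  until "$@"; do
    if [ "$elapsed" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

For example, `wait_until 30 systemctl is-active --quiet gremlind && echo "gremlind is active"` waits up to about 30 seconds for the service to report active. If it times out, `sudo journalctl -u gremlind` shows the daemon's recent log output.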

Now you’re ready to run attacks using the Gremlin Web App.

Step 4 - Creating Attacks

Using your Gremlin login credentials (which were emailed to you when you created your account), log in to the Gremlin Web App. Then click Create Attack.

Example: Network Latency attack

In a cloud environment, the network is prone to jitters and occasional blips. The Network Gremlin lets you simulate these behaviors so you can see how your application behaves in the face of an unreliable network.

You can create a Latency Attack to inject a delay into certain kinds of traffic outbound from your EC2 instance. To create a Latency Attack, first click the Attack Category dropdown and select Network. Then click the Gremlin Attack dropdown and select Latency.

Next, you can choose how much delay to add, and for how long. Best practice is to start with a small delay and grow it in successive attacks. Let’s start with a 100ms delay for 60s.
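To plan the "start small and grow it" approach ahead of time, you can write down a ramp of delays before launching the first attack. The sketch below assumes a doubling ramp, which is only one possible choice; `latency_schedule` is a hypothetical helper that just prints numbers and does not talk to Gremlin.

```shell
# latency_schedule is a hypothetical planning helper: it prints a series of
# delays in milliseconds that doubles each time, from START up to MAX.
# Doubling is just one possible ramp; any gradual increase works.
latency_schedule() {
  local delay="$1" max="$2"
  # Refuse non-positive starting values, which would never grow.
  if [ "$delay" -le 0 ]; then
    return 1
  fi
  while [ "$delay" -le "$max" ]; do
    echo "$delay"
    delay=$((delay * 2))
  done
}
```

For example, `latency_schedule 100 1000` prints 100, 200, 400 and 800, one per line: one Latency attack per value, checking how your application behaves between runs.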

Finally, it’s time to target your EC2 instance. If you have many hosts running the Gremlin daemon, you can filter them here and run the attack on only a subset. Since you’re attacking a single host for now, just tick the checkbox next to it. (If you don’t see your host in the list, type the GREMLIN_IDENTIFIER value you configured in Step 3 into the search bar.)

Before you click Create New Attack, switch to your EC2 instance and start pinging google.com so you can get an idea of baseline latency:

[ec2-user@ip-10-0-8-81 gremlin]$ ping google.com
PING google.com (216.58.217.142) 56(84) bytes of data.
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=1 ttl=48 time=1.27 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=2 ttl=48 time=1.33 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=3 ttl=48 time=1.33 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=4 ttl=48 time=1.29 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=5 ttl=48 time=1.30 ms

Now go back to the Gremlin Web App and click Create New Attack to kick off the attack. To make sure it’s running, check its progress on the Attacks page.

Switch back to your EC2 instance and observe that the round trip time to google.com has indeed increased by 100ms:

...
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=5 ttl=48 time=1.30 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=6 ttl=48 time=1.30 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=7 ttl=48 time=1.36 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=8 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=9 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=10 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=11 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=12 ttl=48 time=101 ms
64 bytes from iad23s43-in-f14.1e100.net (216.58.217.142): icmp_seq=13 ttl=48 time=101 ms
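To put a number on the before/after comparison instead of reading it off line by line, you can average the `time=` values that ping prints. This is a minimal sketch assuming the Linux ping output format shown above; `avg_ping_ms` is a hypothetical helper name.

```shell
# avg_ping_ms is a hypothetical helper: it reads ping output on stdin and
# prints the mean of the "time=... ms" values with two decimals.
# Lines without a time= field (headers, summaries) are ignored.
avg_ping_ms() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^time=/) {
          sum += substr($i, 6)   # text after "time="
          n++
        }
      }
    }
    END {
      if (n == 0) exit 1         # no samples found
      printf "%.2f\n", sum / n
    }'
}
```

Run `ping -c 10 google.com | avg_ping_ms` once before the attack and once while it is running; the difference between the two averages should be close to the delay you configured.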

Step 5 - Halting the Attack using the Gremlin Web App

Safety is paramount. You can stop a Gremlin attack at any time from the Gremlin Web App: navigate to the Attacks page and click the red Halt button.

Conclusion

You’ve installed Gremlin on an Amazon Linux EC2 instance on Amazon Web Services and validated that it works by running a Latency attack. You’re now ready to explore other Gremlin attacks, including additional State and Network attacks.

Gremlin’s Developer Guide is a great resource and reference for using Gremlin to do Chaos Engineering. You can also explore the Gremlin Blog for more information on how to use Chaos Engineering with your application infrastructure.