How to Install and Use Gremlin on Azure

This tutorial walks through installing Gremlin on an Ubuntu 16.04 server in Microsoft Azure and running a CPU attack.

Prerequisites

Before you begin, you’ll need:

  • A Microsoft Azure account
  • An Ubuntu 16.04 server
  • A Gremlin account
  • The apt-transport-https package
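The last prerequisite can be checked from the shell before you start. The following is a minimal sketch, assuming a Debian-based system where dpkg-query is available; pkg_installed is a hypothetical helper name introduced here and is not part of Gremlin:

```shell
# Hypothetical helper: succeeds only if the named Debian package is installed.
# dpkg-query prints a status such as "install ok installed" for installed packages.
pkg_installed() {
    dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'ok installed'
}

# Example usage (uncomment to run; installing requires sudo):
# pkg_installed apt-transport-https || sudo apt-get install -y apt-transport-https
```

The usage line installs the package only when the check fails, so it is safe to run more than once.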

Step 1 - Installing the Gremlin Daemon and CLI

First, SSH into your server and add the Gremlin Debian repository:

$ echo "deb https://deb.gremlin.com/ release non-free" | sudo tee /etc/apt/sources.list.d/gremlin.list

Import the repo’s GPG key:

$ sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C81FC2F43A48B25808F9583BDFF170F324D41134 9CDB294B29A5B1E2E00C24C022E8EF3461A50EF6

Then install the Gremlin daemon and CLI:

$ sudo apt-get update && sudo apt-get install -y gremlind gremlin

Step 2 - Validating the Install

Run the following command to confirm that Gremlin has everything it needs to function.

Note: DO NOT run this command on production hosts.

$ gremlin syscheck

The CLI will walk through its library of attack types and run some mock attacks:

Checking resource gremlins ...
Checking CPU gremlin ...
Attack on cpu_1 completed successfully
CPU gremlin OK
...

The full syscheck may take a few minutes, so please be patient!

Step 3 - Configuring the Gremlin Daemon

The Gremlin daemon (gremlind) connects to the Gremlin backend and waits for attack orders from you. When it receives attack orders, it uses the CLI (gremlin) to run the attack.

To connect gremlind to the Gremlin backend, you need your client credentials. (This is NOT the same as the email/password credentials you use to access the Gremlin Web App.) Read the Client Auth docs to see how to find your client credentials in the Web App.

With the credentials in hand, it’s time to configure the daemon. As with most daemons, you can configure gremlind either by configuration file or environment variables. Let’s use the configuration file.

Add these configuration options to the daemon’s configuration file:

$ echo 'GREMLIN_TEAM_ID="<INSERT_YOUR_TEAM_ID>"' | sudo tee -a /etc/default/gremlind
$ echo 'GREMLIN_TEAM_CERTIFICATE_OR_FILE="file:///var/lib/gremlin/gremlin.cert"' | sudo tee -a /etc/default/gremlind
$ echo 'GREMLIN_TEAM_PRIVATE_KEY_OR_FILE="file:///var/lib/gremlin/gremlin.key"' | sudo tee -a /etc/default/gremlind
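Before moving on, it can help to confirm that all three settings actually landed in the file. The following is a minimal sketch; missing_settings is a hypothetical helper introduced for illustration, and it assumes each setting sits on its own line in KEY=value form, as written by the commands above:

```shell
# Hypothetical helper: print each named setting that has no assignment in FILE.
# Usage: missing_settings FILE KEY [KEY...]
missing_settings() {
    file=$1
    shift
    for key in "$@"; do
        # A setting counts as present if some line starts with KEY=
        grep -q "^${key}=" "$file" 2>/dev/null || echo "$key"
    done
}

# Example usage (prefix with sudo if the file is not readable by your user):
# missing_settings /etc/default/gremlind GREMLIN_TEAM_ID \
#     GREMLIN_TEAM_CERTIFICATE_OR_FILE GREMLIN_TEAM_PRIVATE_KEY_OR_FILE
```

No output means every setting was found. Any name that is printed still needs to be added.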

Then add your PEM-encoded certificate and key to two new files, /var/lib/gremlin/gremlin.cert and /var/lib/gremlin/gremlin.key respectively, and set the ownership and permissions on the files so that only the gremlin user can access them:

$ sudo chown gremlin:gremlin /var/lib/gremlin/gremlin.*
$ sudo chmod 600 /var/lib/gremlin/gremlin.*
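To double-check the result, you can compare each file's mode against the expected value. The following is a minimal sketch, assuming GNU stat (the default on Ubuntu); has_mode is a hypothetical helper introduced for illustration:

```shell
# Hypothetical helper: succeeds only if FILE has exactly the octal mode MODE.
# Usage: has_mode FILE MODE
has_mode() {
    [ "$(stat -c '%a' "$1" 2>/dev/null)" = "$2" ]
}

# Example usage (prefix with sudo if your user cannot see the files):
# has_mode /var/lib/gremlin/gremlin.cert 600 && echo "cert permissions OK"
# has_mode /var/lib/gremlin/gremlin.key 600 && echo "key permissions OK"
```

A mode of 600 grants read and write access to the file's owner and nothing to anyone else.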

Optionally, give the Gremlin daemon a custom ID so it’s easy to find in the Web App later:

$ echo 'GREMLIN_IDENTIFIER="my-first-gremlin-host"' | sudo tee -a /etc/default/gremlind

That’s enough configuration for this tutorial, but feel free to read about other configuration options in the Gremlin Docs.

Restart the daemon to apply the configuration changes:

$ sudo systemctl restart gremlind
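A restart can return before the service has finished starting, so it may be worth confirming that the daemon is active before heading to the Web App. The following is a minimal sketch built on systemctl is-active; wait_for_unit is a hypothetical helper introduced for illustration, and the retry count and one-second delay are arbitrary choices:

```shell
# Hypothetical helper: poll a systemd unit until it reports active.
# Usage: wait_for_unit UNIT [TRIES]   (TRIES defaults to 10, one second apart)
wait_for_unit() {
    unit=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        # is-active exits 0 only when the unit is active; --quiet hides the state text
        systemctl is-active --quiet "$unit" 2>/dev/null && return 0
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example usage:
# wait_for_unit gremlind && echo "gremlind is running"
```

If the check fails, sudo systemctl status gremlind shows the unit's current state and recent log lines.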

Now you’re ready to run attacks using the Gremlin Web App.

Step 4 - Creating Attacks

Using your Gremlin login credentials (which were emailed to you when you created your account), log in to the Gremlin Web App. Then click Create Attack.

The “Hello World” of Chaos Engineering is the CPU Resource Attack. To create one, first click the Attack Category dropdown and select Resource. Then, in the Gremlin Attack dropdown, select CPU.

Next, you can choose how many CPU cores the attack should consume, and for how long. The default is to hog a single core for 60 seconds.

Finally, it’s time to target the host you just configured. If you have many hosts running the Gremlin daemon, you can filter through them here, choosing to run the attack only on some subset of hosts. Since you’re only attacking a single host for now, just tick the checkbox next to the host. (If you don’t see your host in the list, search for its $GREMLIN_IDENTIFIER in the search bar.)

As soon as you click Create New Attack, your host’s gremlind will pick up the attack order and start to chew up your CPU. You can see the attack’s progress on the Attacks page.

On your host, run top to check the impact of the Gremlin Attack:

$ top

top - 06:26:47 up 7 days,  7:00,  1 user,  load average: 0.28, 0.07, 0.02
Tasks: 105 total,   1 running, 104 sleeping,   0 stopped,   0 zombie
%Cpu(s): 79.7 us, 20.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  1016120 total,   127140 free,    93956 used,   795024 buff/cache
KiB Swap:        0 total,        0 free,        0 used.   712192 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND     
23768 gremlin   20   0   13268  11136   3576 S 99.3  1.1   0:14.05 gremlin     
23766 root      20   0   40388   3600   3072 R  0.3  0.4   0:00.03 top         
    1 root      20   0   37760   5760   3940 S  0.0  0.6   0:13.74 systemd     
    2 root      20   0       0      0      0 S  0.0  0.0   0:00.00 kthreadd    
    3 root      20   0       0      0      0 S  0.0  0.0   0:01.28 ksoftirqd/0 
    5 root       0 -20       0      0      0 S  0.0  0.0   0:00.00 kworker/0:0H
    7 root      20   0       0      0      0 S  0.0  0.0   0:06.14 rcu_sched   
    8 root      20   0       0      0      0 S  0.0  0.0   0:00.00 rcu_bh      
    9 root      rt   0       0      0      0 S  0.0  0.0   0:00.00 migration/0 
   10 root      rt   0       0      0      0 S  0.0  0.0   0:04.09 watchdog/0  
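If you prefer a non-interactive check, top can also run in batch mode (top -b -n 1) and its output can be filtered. The following is a minimal sketch that assumes top's default column layout as shown above, with %CPU in the ninth column and COMMAND in the twelfth; busy_processes is a hypothetical helper introduced for illustration:

```shell
# Hypothetical helper: read top's batch output on stdin and print
# "PID COMMAND %CPU" for every process at or above the given CPU threshold.
# Usage: top -b -n 1 | busy_processes THRESHOLD
busy_processes() {
    awk -v min="$1" '
        $1 == "PID" { in_table = 1; next }   # skip the summary block and the header row
        in_table && $9 + 0 >= min + 0 { print $1, $12, $9 }
    '
}

# Example usage:
# top -b -n 1 | busy_processes 50
```

Applied to the sample output above with a threshold of 50, this prints a single line: 23768 gremlin 99.3.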

When your attack is complete, it will move to Completed Attacks on the Attacks page.

Step 5 - Halting a CPU Resource Attack from the Gremlin Web App

You can halt any attack at any time from the Attacks page. Just find your attack and click the red Halt button next to it.

Conclusion

You’ve installed Gremlin on a Microsoft Azure server running Ubuntu 16.04 and tested it by running the “Hello World” of Chaos Engineering, the CPU Resource Attack. You now have the tools to explore additional Gremlin Attacks, including attacks that impact State and Network.

You can also explore the Gremlin Blog for more ideas on how to do Chaos Engineering on your application infrastructure.