Common Errors with solutions¶
Gremlin Agent Errors¶
There are several reasons an agent can lose communication with the control plane. Common examples include:
- A network-based attack affected the agent's own traffic. Ensure both api.gremlin.com and DNS are whitelisted.
- A CPU attack starved Gremlin of the CPU time it needs to compute API encryption. This is rare, but it does happen.
In the event of a LostCommunication error, the Gremlin agent will trigger its dead-man switch and cease all attacks.
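To confirm that a host can still reach the control plane, a quick check of DNS and HTTPS can help. The following is a minimal sketch, not an official Gremlin tool: the function name `check_control_plane` is hypothetical, and it assumes `getent` and `curl` are installed on the host.

```shell
#!/bin/sh
# Sketch: check that DNS and HTTPS to the control plane work from this host.
# api.gremlin.com comes from the text above; the function name is hypothetical.
check_control_plane() {
    host="${1:-api.gremlin.com}"

    # Step 1: DNS resolution through the system resolver.
    if ! getent hosts "$host" >/dev/null; then
        echo "DNS lookup failed for $host"
        return 1
    fi

    # Step 2: complete an HTTPS request within 10 seconds. Any HTTP
    # status counts as reachable; only connection failures are errors.
    if ! curl --silent --output /dev/null --max-time 10 "https://$host"; then
        echo "HTTPS connection to $host failed"
        return 2
    fi

    echo "$host is reachable"
}
```

Calling `check_control_plane` with no arguments tests api.gremlin.com. A return code of 1 points at DNS, and 2 points at the network path or a firewall rule.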
Tc Error: RTNETLINK answers: File exists¶
This can occur on a host when running a network attack if a previous network attack was run and the
agent was killed mid-attack by the user, the system, or another tool, which prevented
Gremlin from running garbage collection.
To solve, please run
Failed to parse execution attribute ‘pid’ for execution < HASH_STRING >¶
There are two non-exclusive modes of failure that can occur with this error message:
- The running version of Gremlin is several versions out of date
  - Update the Gremlin agent or Docker image
- /var/lib/gremlin/executions has become corrupt
  - Delete the file
Docker: non-zero exit code (137)¶
Docker has killed the container via kill -9. This is usually caused by an out-of-memory (OOM) condition and is most often seen when
running a memory attack. Allocating more RAM to Docker usually solves the issue.
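Exit code 137 is 128 plus signal number 9 (SIGKILL), which is how the code maps back to kill -9. The sketch below decodes such exit codes and asks Docker whether the OOM killer was responsible. The function names are hypothetical; `docker inspect` and its `.State.OOMKilled` field are standard Docker.

```shell
#!/bin/sh
# Exit codes above 128 mean the process died from signal (code - 128).
signal_from_exit_code() {
    code="$1"
    if [ "$code" -gt 128 ]; then
        kill -l "$((code - 128))"   # prints the signal name, e.g. KILL
    else
        echo "none"
    fi
}

# Ask Docker whether the kernel OOM killer ended the container.
# Prints "true" or "false"; the argument is a container name or ID.
was_oom_killed() {
    docker inspect --format '{{.State.OOMKilled}}' "$1"
}
```

If `was_oom_killed` prints "true" for the exited container, the memory limit is the likely cause. If it prints "false", something else sent the SIGKILL.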
Docker: non-zero exit code (1)¶
Unable to find local credentials file¶
Gremlin is not configured to point to the correct credentials file, usually located in
/var/lib/gremlin. Ensure the credentials file(s), either certificates or API keys, exist and
Gremlin has read+write access.
Permission denied (os error 13)¶
The Gremlin container does not have proper filesystem permissions.
Gremlin requires write access to
/var/lib/gremlin, including the ability to create new files. Check permissions on the host, and ensure write access is being passed through via Docker when running the
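One way to test the write-access requirement described in this entry is to create and delete a probe file in the directory. This is a minimal sketch; the function name `check_dir_writable` and the probe file name are hypothetical.

```shell
#!/bin/sh
# Sketch: verify that a directory allows creating and deleting files.
# Defaults to /var/lib/gremlin from the text above; the function name
# is hypothetical.
check_dir_writable() {
    dir="${1:-/var/lib/gremlin}"

    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi

    # Create and remove a probe file; this exercises the same permission
    # (creating new files) that the agent needs.
    probe="$dir/.write-probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
    else
        echo "not writable: $dir"
        return 2
    fi
}
```

Running the check on the host and then again inside the container shows whether the problem lies in the host permissions or in how the directory is passed through to the container.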
Docker: OS Error 1¶
This is often observed in the context of
Capabilities: Unable to inherit one or more required capabilities: cap_net_admin, cap_net_raw
Solution: Add the required capabilities to the Docker container (full list here: https://help.gremlin.com/security/#linux-capabilities)
docker run -it --cap-add=NET_ADMIN --cap-add=KILL --cap-add=SYS_TIME gremlin/gremlin syscheck
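To see which capabilities a process actually holds, the effective capability mask in /proc can be decoded bit by bit. The following sketch assumes a Linux host; the function names are hypothetical, and the bit numbers are the ones defined in the kernel's linux/capability.h.

```shell
#!/bin/sh
# Sketch: test whether a capability bit is set in a hex capability mask,
# such as the CapEff value in /proc/<pid>/status.
has_cap() {
    mask="$1"   # hex string without 0x prefix, e.g. 00000000a80425fb
    bit="$2"    # capability number from linux/capability.h
    [ $(( (0x$mask >> $bit) & 1 )) -eq 1 ]
}

# Capability numbers for the names mentioned in this section.
CAP_KILL=5
CAP_NET_ADMIN=12
CAP_NET_RAW=13
CAP_SYS_TIME=25

# Print the effective capability mask of the current process
# (awk inherits it from the calling shell).
current_cap_mask() {
    awk '/^CapEff:/ { print $2 }' /proc/self/status
}
```

Inside the container, `has_cap "$(current_cap_mask)" "$CAP_NET_ADMIN" && echo present` reports whether cap_net_admin made it through after adding the `--cap-add` flags.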
API Return codes¶
API: Error Code 401¶
The Gremlin agent is unable to authenticate against the API. This error is usually caused by bad or missing
credentials files or certificates, or by a revocation issued against the client.
401 Unauthorized - Authorization header is missing or malformed
Client has been revoked (401 Unauthorized)
AUTH_RENEW: 401 Unauthorized
1. Ensure you have valid credentials (certificates or API keys) placed in a location that
Gremlin can read from.
2. Ensure Gremlin has proper read+write access to
3. Remove the file
/var/lib/gremlin/.credentials if it exists
This error can also be the result of a race condition when the
Gremlin daemon is started before the environment
variables are exported.
In some specific cases, this error can also occur when multiple hosts or agents are configured with the same
Common places this can occur:
- Improperly configured ECS/Kubernetes/Mesosphere where multiple
Gremlin agents are assigned the same virtual IP
- Missing host metadata on AWS/GCP/Azure, which causes
Gremlin to revert to the default localhost identifier
API: Error Code 402¶
The client limit for your company or team has been reached, so
Gremlin does not have a license to apply to the client.
You may terminate or revoke existing clients, or contact sales to increase the client limit.
API: Error Code 403¶
The account, most likely a trial account, has expired. Please contact sales to extend the trial.
API: Error Code 408¶
This is most often caused by a host with an incorrect system time. Verify the system clock of the host and try again. If the problem persists after you have validated your host's system clock, please reach out to support as soon as possible.
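One way to verify the system clock is to compare it against the Date header of an HTTPS response. The following is a sketch under stated assumptions: the function names are hypothetical, the 30-second tolerance is an arbitrary illustration rather than a documented Gremlin limit, and `reference_epoch_from` needs curl and GNU date.

```shell
#!/bin/sh
# Sketch: compare the local clock with a reference time in epoch seconds.
# The 30-second default tolerance is an assumption, not a documented
# Gremlin limit.
clock_skew_ok() {
    local_epoch="$1"
    reference_epoch="$2"
    tolerance="${3:-30}"

    # Absolute difference between the two clocks.
    skew=$(( $local_epoch - $reference_epoch ))
    if [ "$skew" -lt 0 ]; then
        skew=$(( 0 - $skew ))
    fi

    echo "skew: ${skew}s"
    [ "$skew" -le "$tolerance" ]
}

# Fetch a reference time from the Date header of an HTTPS response.
# Requires curl and GNU date (for the -d option).
reference_epoch_from() {
    header=$(curl --silent --head --max-time 10 "$1" |
        tr -d '\r' | sed -n 's/^[Dd]ate: //p' | head -n 1)
    date -d "$header" +%s
}
```

For example, `clock_skew_ok "$(date +%s)" "$(reference_epoch_from https://api.gremlin.com)"` prints the measured skew and returns non-zero when it exceeds the tolerance. A large skew usually means the host's time synchronization service needs attention.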
API: Error Code 409¶
An error code of
409 indicates there is a conflicting attack running on the host. This is most often seen when
one network attack is running (e.g., a blackhole attack) and a second network attack is launched. However,
it can also be seen when trying to run two concurrent network or state attacks against the same target.