# Past Alarms and Causes

This page details the causes and possible courses of action for CloudWatch alarms in Aperture. Alarms exist only for production environments.
Use this page in conjunction with the runbook, which will point you to the AWS resources or Terraform for the affected infrastructure.

Alarm configuration details can be found in the project's [Terraform CloudWatch module](https://git.xarth.tv/businessviewcount/aperture/tree/master/terraform/modules/cloudwatch).
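
When triaging, it can help to see which of the alarms on this page are currently firing. The following is a minimal sketch, assuming a boto3 CloudWatch client with read access to the production account. The function name `firing_alarms` and the `cb-aperture` name filter are illustrative choices, not part of the project; the filter relies only on the fact that every alarm name listed on this page contains that fragment.

```python
def firing_alarms(cloudwatch, name_fragment="cb-aperture"):
    """Return sorted names of alarms in ALARM state whose name contains name_fragment.

    `cloudwatch` is expected to behave like boto3.client("cloudwatch").
    """
    names = []
    paginator = cloudwatch.get_paginator("describe_alarms")
    # Ask CloudWatch for alarms in ALARM state only, then filter by name
    # client-side because the alarms on this page do not share one prefix.
    for page in paginator.paginate(StateValue="ALARM"):
        for alarm in page.get("MetricAlarms", []):
            if name_fragment in alarm["AlarmName"]:
                names.append(alarm["AlarmName"])
    return sorted(names)
```

With boto3 installed and credentials configured, `firing_alarms(boto3.client("cloudwatch"))` should return the names to look up in the sections below.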

## Table of Contents
1. [Elastic Beanstalk](#elastic-beanstalk)
2. [Kinesis Stream](#kinesis-stream)
3. [Kinesis Analytics](#kinesis-analytics)
4. [Elasticache](#elasticache)
5. [Lambda](#lambda)

## Elastic Beanstalk

#### cb-aperture-production-api-health

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Beanstalk application health is `degraded` or `severe` for 2 minutes |
| **Causes**    | Instances are down or returning a high rate of 5xx responses |
| **Resolution**| Check the Beanstalk health page, Rollbar logs, and instance logs. Consider adding more instances or rebooting instances |

#### cb-aperture-production-api-elb-latency

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Average latency from ELB is longer than 500ms for 20 minutes |
| **Causes**    | High request load or not enough instances |
| **Resolution**| Determine cause of request load. Add more instances if needed |

#### cb-aperture-production-api-elb-5xx

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | More than 500 5XX errors are returned from the ELB in 2 minutes |
| **Causes**    | ELB is down or cannot handle request load |
| **Resolution**| Determine cause of request load. Potentially reboot ELB instance or add more instances |

#### cb-aperture-production-api-spillover

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | ELB spillover queue has length over 2500 items for 2 minutes |
| **Causes**    | Request rate has spiked and ELB cannot route incoming requests to instances |
| **Resolution**| Add more instances until the request rate has lowered |

#### cb-aperture-production-api-backend-5XX

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | The ELB is reporting that instances are returning more than 500 5XX errors in 2 minutes |
| **Causes**    | Instances are down or returning a high rate of 5xx responses |
| **Resolution**| Check the Beanstalk health page, Rollbar logs, and instance logs. Consider adding more instances or rebooting instances |

#### cb-aperture-production-api-avg-latency

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Average latency from instances is higher than 500ms for 20 minutes |
| **Causes**    | Instances are down, overworked, or bottlenecked |
| **Resolution**| Check beanstalk health page, rollbar logs, and instance logs. Check grafana for endpoint timings |

#### cb-aperture-production-api-avg-cpu

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Average CPU usage is higher than 80% for 20 minutes |
| **Causes**    | Instances are overworked |
| **Resolution**| Check instance health, add more instances or size up current instances |

## Kinesis Stream

#### spade-downstream-prod-cb-aperture-prod-mw-too-many-records

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | The minute-watched stream has more than 13,056,000 records for 2 minutes |
| **Causes**    | Stream is too small to handle the number of records coming in |
| **Resolution**| Increase the size of the stream or work with data-infrastructure to recover |

#### spade-downstream-prod-cb-aperture-prod-mw-too-many-bytes

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | The minute-watched stream has more than 13 billion bytes for 2 minutes |
| **Causes**    | Stream is too small to handle the number of bytes coming in |
| **Resolution**| Increase the size of the stream or work with data-infrastructure to recover |
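
As a sanity check when resizing the stream, the arithmetic below relates the two thresholds to the Kinesis per-shard write limits (1,000 records per second and 1 MB per second per shard). The 256-shard stream size, the 85% utilization fraction, and the 60-second period are assumptions chosen because they reproduce the 13,056,000 figure; they are not read from the Terraform, so confirm the real values there before relying on them.

```python
# Kinesis Data Streams per-shard write limits.
RECORDS_PER_SHARD_PER_SEC = 1_000
BYTES_PER_SHARD_PER_SEC = 1_000_000  # assumption: 1 MB counted as 10^6 bytes


def threshold(shards, per_shard_per_sec, period_sec=60, utilization_pct=85):
    """Alarm threshold as a percentage of the stream's write capacity per period."""
    capacity = shards * per_shard_per_sec * period_sec
    return capacity * utilization_pct // 100


def shards_needed(observed_per_period, per_shard_per_sec, period_sec=60, utilization_pct=85):
    """Smallest shard count whose threshold is at or above the observed load."""
    per_shard = per_shard_per_sec * period_sec * utilization_pct // 100
    return -(-observed_per_period // per_shard)  # ceiling division


# Hypothetical stream size: 256 shards (an assumption, not taken from Terraform).
print(threshold(256, RECORDS_PER_SHARD_PER_SEC))  # 13056000 records
print(threshold(256, BYTES_PER_SHARD_PER_SEC))    # 13056000000 bytes
# Hypothetical observed peak of 15,000,000 records in one period.
print(shards_needed(15_000_000, RECORDS_PER_SHARD_PER_SEC))  # 295 shards
```

Under these assumptions, `shards_needed` gives a rough lower bound for the new stream size; leaving extra headroom above it is usually sensible.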


## Kinesis Analytics

#### cb-aperture-production-minute-watched-ratio-application-millis-behind

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | The analytics application is more than 1 minute behind the latest incoming record for 10 minutes |
| **Causes**    | Analytics cannot process the incoming stream in a timely manner |
| **Resolution**| Check the amount of traffic coming in. Potentially increase the KPUs or the shards, or restart the application |


## Elasticache

#### cb-aperture-prod-cache-evictions

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | The ElastiCache cluster has more than 100 evictions for 2 minutes |
| **Causes**    | The cache is evicting too many items |
| **Resolution**| Check the amount of cache traffic and the load of each node in the cluster |

## Lambda

#### cb-aperture-production-pubsub-sender-lambda-lambda-duration

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | The pubsub lambda function duration is greater than 60 seconds in a 10 minute period |
| **Causes**    | Lambda is running too slowly or hanging |
| **Resolution**| Check grafana for lambda times and dependency timings. Possibly increase parallelism in lambda |

#### cb-aperture-production-pubsub-sender-lambda-lambda-errors

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Pubsub lambda function has returned errors 5 times within a 2 minute period |
| **Causes**    | Lambda is erroring out too frequently |
| **Resolution**| Check beanstalk app for health. Check lambda logs for errors. Might require configuration or IAM change |

#### cb-aperture-production-pubsub-sender-lambda-lambda-invocations

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Pubsub lambda has not been invoked in 10 minutes |
| **Causes**    | The cloudwatch rule is disabled or the lambda is not correctly hooked up to it |
| **Resolution**| Find the cloudwatch rule and ensure it is enabled |

#### cb-aperture-production-spade-logger-lambda-lambda-duration

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | The spade lambda function duration is greater than 30 seconds in a 10 minute period |
| **Causes**    | Lambda is running too slowly or hanging |
| **Resolution**| Check grafana for lambda times and dependency timings. Possibly increase parallelism in lambda |

#### cb-aperture-production-spade-logger-lambda-lambda-errors

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Spade lambda function has returned errors 5 times within a 2 minute period |
| **Causes**    | Lambda is erroring out too frequently |
| **Resolution**| Check beanstalk app for health. Check lambda logs for errors. Might require configuration or IAM change |

#### cb-aperture-production-spade-logger-lambda-lambda-invocations

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Spade lambda has not been invoked in 10 minutes |
| **Causes**    | The cloudwatch rule is disabled or the lambda is not correctly hooked up to it |
| **Resolution**| Find the cloudwatch rule and ensure it is enabled |
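
For either `lambda-invocations` alarm, the check described above can be sketched as follows. This assumes the schedule is a CloudWatch Events (EventBridge) rule that targets the Lambda directly, and that `events` behaves like `boto3.client("events")`. The function name `rule_states_for_target` is hypothetical, and the Lambda ARN must be looked up separately.

```python
def rule_states_for_target(events, target_arn):
    """Map each rule that targets target_arn to its state (ENABLED or DISABLED).

    An empty result suggests no rule is hooked up to the Lambda at all.
    """
    states = {}
    kwargs = {"TargetArn": target_arn}
    while True:
        response = events.list_rule_names_by_target(**kwargs)
        for rule_name in response.get("RuleNames", []):
            # describe_rule reports whether the rule is currently enabled.
            states[rule_name] = events.describe_rule(Name=rule_name)["State"]
        token = response.get("NextToken")
        if not token:
            return states
        kwargs["NextToken"] = token  # fetch the next page of rule names
```

A rule reported as `DISABLED` can be re-enabled with the client's `enable_rule(Name=...)` call; if the rule is managed by Terraform, the change likely needs to be reflected there as well.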

#### cb-aperture-production-error-stream-logger-lambda-lambda-errors

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Error stream lambda function has returned errors 5 times within a 2 minute period |
| **Causes**    | Lambda is erroring out too frequently |
| **Resolution**| Check lambda logs for errors. Might require configuration or IAM change |

#### cb-aperture-production-output-mw-ratio-lambda-lambda-errors

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | The mw output lambda function has returned errors 5 times within a 2 minute period |
| **Causes**    | Lambda is erroring out too frequently |
| **Resolution**| Check lambda logs for errors. Might require configuration or IAM change |
