# Alarms and Possible Causes

Alarm configurations are defined in dropship's Terraform [here](git-aws.internal.justin.tv/cb/dropship/tree/master/terraform/modules/cloudwatch).

All alarms are listed in CloudWatch [here](https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#alarmsV2:?search=cb-dropship&alarmFilter=ALL).

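To pull the same list from a script instead of the console, a minimal sketch using a boto3 CloudWatch client might look like the following. It filters on the substring `dropship` client-side because the DynamoDB alarms in this runbook are named `dropship-production-...` rather than `cb-dropship-...`, so a plain `AlarmNamePrefix` filter would miss them. The helper name is hypothetical; the `describe_alarms` paginator, the `AlarmTypes`/`StateValue` parameters and the `MetricAlarms`/`CompositeAlarms` response keys come from the boto3 CloudWatch API.

```python
from typing import Any, Dict, Iterator, Optional


def iter_dropship_alarm_names(
    cloudwatch: Any, needle: str = "dropship", state: Optional[str] = None
) -> Iterator[str]:
    """Yield CloudWatch alarm names that contain `needle` (hypothetical helper).

    `cloudwatch` is expected to be a boto3 CloudWatch client. `state` may be
    "OK", "ALARM" or "INSUFFICIENT_DATA" to list only alarms in that state.
    """
    params: Dict[str, Any] = {"AlarmTypes": ["MetricAlarm", "CompositeAlarm"]}
    if state is not None:
        params["StateValue"] = state
    paginator = cloudwatch.get_paginator("describe_alarms")
    for page in paginator.paginate(**params):
        # Metric alarms and composite alarms come back under separate keys.
        for key in ("MetricAlarms", "CompositeAlarms"):
            for alarm in page.get(key, []):
                # Substring match catches both cb-dropship-* and dropship-* names.
                if needle in alarm["AlarmName"]:
                    yield alarm["AlarmName"]
```

Usage, assuming credentials for the account are configured: `list(iter_dropship_alarm_names(boto3.client("cloudwatch", region_name="us-west-2"), state="ALARM"))` returns only the dropship alarms that are currently firing.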
## Table of Contents
1. [Elastic Beanstalk](#elastic-beanstalk)
2. [Load Balancer](#load-balancer)
3. [DynamoDB](#dynamodb)

## Elastic Beanstalk

#### cb-dropship-production-api-health

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Beanstalk application health is `degraded` or `severe` for 2 minutes |
| **Causes**    | Instances are down or returning a high rate of 5XX responses |
| **Resolution**| Check the Beanstalk health page, Rollbar logs, and instance logs. If needed, add more instances or reboot the affected instances |

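The exact metric behind this alarm is not stated here. Assuming it watches the Elastic Beanstalk enhanced health `EnvironmentHealth` metric, which encodes statuses as numbers (Degraded = 20, Severe = 25), over one-minute periods, the trigger reduces to "every datapoint in the last two minutes is at least 20". A minimal sketch of that check; the helper name, the one-minute period and the all-datapoints-must-breach rule are assumptions to verify against the Terraform module.

```python
from typing import Sequence

# Numeric values that Elastic Beanstalk enhanced health publishes for EnvironmentHealth.
ENVIRONMENT_HEALTH = {
    "ok": 0,
    "info": 1,
    "unknown": 5,
    "no data": 10,
    "warning": 15,
    "degraded": 20,
    "severe": 25,
}


def health_alarm_breaching(statuses: Sequence[str], periods: int = 2) -> bool:
    """Hypothetical check: True if each of the last `periods` datapoints is Degraded or Severe.

    `statuses` holds one health status per (assumed one-minute) period, oldest first.
    """
    if periods < 1:
        raise ValueError("periods must be at least 1")
    if len(statuses) < periods:
        return False  # not enough data to fill the evaluation window
    window = statuses[-periods:]
    return all(ENVIRONMENT_HEALTH[s.lower()] >= ENVIRONMENT_HEALTH["degraded"] for s in window)
```

A single Degraded minute followed by an Ok minute does not trip this check, which matches the "for 2 minutes" wording in the table.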
#### cb-dropship-production-api-avg-latency

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Average latency from instances is higher than 1s for 20 minutes |
| **Causes**    | Instances are down, overworked, or bottlenecked |
| **Resolution**| Check the Beanstalk health page, Rollbar logs, and instance logs. Check Grafana for endpoint timings |

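When the average latency alarm fires, the average is often dominated by a few endpoints. Because the overall average is total request time divided by request count, ranking endpoints by their share of total time shows which ones pull it up. A minimal sketch, assuming (hypothetically) that per-request `(endpoint, latency_seconds)` samples can be exported from Grafana or the request logs; the helper name and input format are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def latency_contributors(samples: Iterable[Tuple[str, float]]) -> List[Tuple[str, float, float]]:
    """Hypothetical helper: rank endpoints by their share of total request time.

    Returns (endpoint, mean_latency_s, share_of_total_time) tuples, largest share first.
    """
    total_time: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for endpoint, latency in samples:
        total_time[endpoint] += latency
        count[endpoint] += 1
    grand_total = sum(total_time.values())
    ranked = [
        (ep, total_time[ep] / count[ep], total_time[ep] / grand_total if grand_total else 0.0)
        for ep in total_time
    ]
    # Share of total time, not mean latency, determines impact on the overall average.
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked
```

An endpoint with a modest mean but very high traffic can still top this list, which is why it ranks by share of total time rather than by mean latency alone.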
#### cb-dropship-production-api-avg-cpu

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Average CPU usage is higher than 80% for 20 minutes |
| **Causes**    | Instances are overworked or undersized |
| **Resolution**| Check instance health, then add more instances or move to a larger instance type |

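For the "add more instances" option, a rough estimate of how many to add comes from treating total CPU demand as fixed: when load spreads evenly, `n × average CPU` stays constant, so the new count is `ceil(n × avg / target)`. A minimal sketch under that linear-scaling assumption; the helper name and the 70% default target (leaving headroom below the 80% alarm) are hypothetical choices, and the estimate does not cover moving to a larger instance type.

```python
import math


def instances_for_cpu_target(
    current_instances: int, avg_cpu_percent: float, target_cpu_percent: float = 70.0
) -> int:
    """Hypothetical helper: instance count that brings average CPU down to the target.

    Assumes evenly spread load and constant total CPU demand.
    """
    if current_instances < 1:
        raise ValueError("current_instances must be at least 1")
    if not 0 < target_cpu_percent <= 100:
        raise ValueError("target_cpu_percent must be in (0, 100]")
    # Total demand in units of "percent of one instance"; assumed constant.
    total_demand = current_instances * avg_cpu_percent
    needed = math.ceil(total_demand / target_cpu_percent)
    # This alarm is about scaling out, so never suggest fewer instances.
    return max(current_instances, needed)
```

For example, `instances_for_cpu_target(4, 95, 70)` returns 6: 4 × 95 = 380, and 380 / 70 ≈ 5.4 rounds up to 6.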
## Load Balancer

#### cb-dropship-production-api-elb-latency

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | Average latency from the ELB is longer than 500ms for 20 minutes |
| **Causes**    | High request load or not enough instances |
| **Resolution**| Determine the cause of the request load. Add more instances if needed |

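To check whether "not enough instances" explains high ELB latency, Little's law relates request rate and latency to the number of requests in flight: L = λ × W. Dividing that by how many requests one instance can serve concurrently gives a lower bound on the instance count. A minimal sketch; the helper names and `concurrency_per_instance` (for example, app server workers per instance) are hypothetical, not values from this runbook.

```python
import math


def in_flight_requests(requests_per_second: float, latency_seconds: float) -> float:
    """Little's law: average concurrent requests L = arrival rate (lambda) * time in system (W)."""
    return requests_per_second * latency_seconds


def instances_for_load(
    requests_per_second: float, latency_seconds: float, concurrency_per_instance: int
) -> int:
    """Hypothetical helper: minimum instances whose combined concurrency covers the in-flight load."""
    if concurrency_per_instance <= 0:
        raise ValueError("concurrency_per_instance must be positive")
    return math.ceil(in_flight_requests(requests_per_second, latency_seconds) / concurrency_per_instance)
```

At 400 requests/s and 0.5 s latency, about 200 requests are in flight; with 16 concurrent requests per instance that needs at least 13 instances, with no headroom. Use latency from a healthy period, since latency measured during an incident is already inflated by queueing.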
#### cb-dropship-production-api-elb-5xx

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | More than 500 5XX errors are returned from the ELB in 2 minutes |
| **Causes**    | The ELB is down or cannot handle the request load |
| **Resolution**| Determine the cause of the request load. If needed, reboot the ELB instance or add more instances |

#### cb-dropship-production-api-spillover

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | The ELB spillover queue has more than 2500 items for 2 minutes |
| **Causes**    | The request rate has spiked and the ELB cannot route incoming requests to instances |
| **Resolution**| Add more instances until the request rate drops |

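Adding instances only shrinks the queue once total capacity exceeds the incoming request rate; the backlog then drains at the difference between the two. A minimal sketch assuming steady rates (real traffic is burstier); the helper name is hypothetical, and `capacity_rps`, the combined throughput of all instances, is an input you would have to estimate from per-instance throughput.

```python
import math


def queue_drain_seconds(queue_length: float, arrival_rps: float, capacity_rps: float) -> float:
    """Hypothetical helper: seconds to drain a request backlog at steady arrival and service rates.

    The backlog shrinks by (capacity - arrival) requests per second; if capacity
    does not exceed arrival, the queue never drains and this returns math.inf.
    """
    net_drain = capacity_rps - arrival_rps
    if net_drain <= 0:
        return math.inf
    return queue_length / net_drain
```

With 2500 queued requests, 1000 requests/s arriving and 1250 requests/s of capacity, the backlog clears in about 10 seconds; with 1000 requests/s of capacity it never clears.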
#### cb-dropship-production-api-backend-5XX

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | The ELB reports that instances returned more than 500 5XX errors in 2 minutes |
| **Causes**    | Instances are down or returning a high rate of 5XX responses |
| **Resolution**| Check the Beanstalk health page, Rollbar logs, and instance logs. If needed, add more instances or reboot the affected instances |

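The two 5XX alarms point at different places. Assuming this is a Classic Load Balancer (the spillover metric suggests so), `HTTPCode_ELB_5XX` counts errors the load balancer generates itself, while `HTTPCode_Backend_5XX` counts errors returned by the instances. A minimal sketch that applies the "more than 500 in 2 minutes" threshold from the tables in this section to per-minute counts; the helper name and the one-minute period are assumptions.

```python
from typing import List, Sequence

THRESHOLD = 500  # "more than 500 5XX errors in 2 minutes"
WINDOW_MINUTES = 2  # assumed: one Sum datapoint per minute


def firing_5xx_alarms(elb_5xx: Sequence[int], backend_5xx: Sequence[int]) -> List[str]:
    """Hypothetical helper: which 5XX alarms would fire, given per-minute counts (oldest first).

    elb_5xx: HTTPCode_ELB_5XX counts (errors generated by the load balancer).
    backend_5xx: HTTPCode_Backend_5XX counts (errors returned by the instances).
    """
    firing = []
    if sum(elb_5xx[-WINDOW_MINUTES:]) > THRESHOLD:
        firing.append("elb-5xx")  # investigate the ELB and the request load
    if sum(backend_5xx[-WINDOW_MINUTES:]) > THRESHOLD:
        firing.append("backend-5xx")  # investigate Beanstalk health, Rollbar, instance logs
    return firing
```

If only `backend-5xx` fires, the load balancer is passing errors through from the application, so the Beanstalk health page and Rollbar are the first places to look.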
## DynamoDB

#### dropship-production-quick-actions-layout-write-capacity

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | The quick actions layout table write capacity is greater than 500 for 5 minutes |
| **Causes**    | Increased usage of the Stream Manager page, or possible request spam |
| **Resolution**| Check DynamoDB metrics and application request logging. If the request rate (r/s) is sustained, increase write capacity and tune the autoscaling parameters |

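To decide how much write capacity a sustained request rate needs: in provisioned mode, one write capacity unit (WCU) covers one standard write per second of an item up to 1 KB, with larger items rounded up to the next 1 KB. DynamoDB auto scaling targets a utilization percentage, so provisioned capacity should be roughly consumed capacity divided by that target. A minimal sketch; the helper names are hypothetical, as are the item size, write rate and 70% target in the example.

```python
import math

KB = 1024


def write_capacity_units(writes_per_second: float, item_size_bytes: int) -> int:
    """Hypothetical helper: WCUs consumed by standard (non-transactional) writes.

    Each write uses one WCU per started 1 KB of item size.
    """
    units_per_write = max(1, math.ceil(item_size_bytes / KB))
    return math.ceil(writes_per_second * units_per_write)


def provisioned_for_target(consumed_units: float, target_utilization: float) -> int:
    """Hypothetical helper: provisioned capacity that keeps consumption at the autoscaling target."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    return math.ceil(consumed_units / target_utilization)
```

For example, 300 writes/s of 1.5 KB items consume 600 WCUs (each write rounds up to 2 KB), and at a 70% target that calls for about 858 provisioned WCUs. Transactional writes cost twice as much and are not modeled here.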
#### dropship-production-quick-actions-layout-read-capacity

|               |   |
|---------------|----------------------------------------------------------------------|
| **Trigger**   | The quick actions layout table read capacity is greater than 500 for 5 minutes |
| **Causes**    | Increased usage of the Stream Manager page, or possible request spam |
| **Resolution**| Check DynamoDB metrics and application request logging. If the request rate (r/s) is sustained, increase read capacity and tune the autoscaling parameters |

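Read capacity is measured in read capacity units (RCUs): in provisioned mode, one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads, with item sizes rounded up to the next 4 KB. A minimal sketch for single-item reads; the helper name is hypothetical, as are the rate and item size in the example.

```python
import math

FOUR_KB = 4 * 1024


def read_capacity_units(
    reads_per_second: float, item_size_bytes: int, strongly_consistent: bool = False
) -> int:
    """Hypothetical helper: RCUs consumed by single-item (GetItem-style) reads."""
    # One unit per started 4 KB of item size.
    units_per_read = max(1, math.ceil(item_size_bytes / FOUR_KB))
    if strongly_consistent:
        return math.ceil(reads_per_second * units_per_read)
    # Eventually consistent reads cost half as much.
    return math.ceil(reads_per_second * units_per_read / 2)
```

For example, 1000 eventually consistent reads/s of 6 KB items (rounded up to 8 KB) consume 1000 RCUs; the same reads done strongly consistent consume 2000. For `Query` and `Scan`, DynamoDB sums the sizes of the returned items before rounding, so this per-item formula overestimates those.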