# Alerts

For every PagerDuty alert, double-check which account and which resource the error is coming from. For example, is the account twitch-web-aws, twitch-web-dev, twitch-users-service-dev, or twitch-users-service-prod? Is the alert for the canary or the fleet?

## High CPU

**If a high CPU alert is triggered, our ASG may be at its max size. A temporary fix is to raise the ASG max size to add CPU capacity. As this may not fix the issue permanently, continue to investigate the root cause.**

A high CPU alert can have several causes. To narrow down the root cause, look at historical CPU for the past few hours to see whether it was a gradual rise or a spike. If it was a spike, did other metrics spike at the same time? If so, work on resolving those spikes. If it was a gradual rise, an ASG size increase may be enough to outlast the load.
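If the dashboards are slow or unavailable, the same CPU history can be pulled with the AWS CLI. The sketch below only prints the command so it can be reviewed before running. It assumes the fleet's instances report `CPUUtilization` under the `AWS/EC2` namespace aggregated by ASG name; `cpu_history_command` and the ASG name in the usage line are hypothetical placeholders.

```shell
# Print (do not run) an AWS CLI command that fetches 5-minute CPU datapoints
# for one ASG. Times are ISO 8601 UTC strings supplied by the caller.
# Usage: cpu_history_command <asg-name> <start-time> <end-time>
cpu_history_command() {
  asg_name="$1"
  start_time="$2"
  end_time="$3"
  if [ -z "$asg_name" ] || [ -z "$start_time" ] || [ -z "$end_time" ]; then
    echo "usage: cpu_history_command <asg-name> <start-time> <end-time>" >&2
    return 1
  fi
  printf 'aws cloudwatch get-metric-statistics --namespace AWS/EC2 --metric-name CPUUtilization --dimensions Name=AutoScalingGroupName,Value=%s --start-time %s --end-time %s --period 300 --statistics Average Maximum\n' \
    "$asg_name" "$start_time" "$end_time"
}
```

For example, `cpu_history_command my-asg 2024-01-01T00:00:00Z 2024-01-01T03:00:00Z` prints a command covering a three-hour window. A steadily climbing `Average` suggests a gradual rise, while a sudden jump in `Maximum` suggests a spike.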

## RootFilesystemUtil

A RootFilesystemUtil alert is caused by the disk filling up too quickly. As a temporary solution, remove hosts whose disks are close to full or raise the ASG size. To fix the issue, identify and remove the files that are growing too large.

- `df -h`: shows how much disk space is left on each mounted filesystem.
- `du -ch [path]`: shows how much space is used under `[path]`, with a grand total, to help identify the path that is using up space.
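To find the offending directory faster, it can help to rank the entries under a path by size. The helper below is a minimal sketch, not part of our tooling: `largest_entries` is a hypothetical name, and it skips hidden entries because it relies on the shell's `*` glob.

```shell
# List the largest entries directly under a directory, biggest first.
# Output is "<size in KiB><TAB><path>" per line.
# Usage: largest_entries [path] [count]
largest_entries() {
  search_path="${1:-/}"
  count="${2:-10}"
  # -x: stay on one filesystem, -s: one total per entry, -k: sizes in KiB
  du -xsk -- "$search_path"/* 2>/dev/null | sort -rn | head -n "$count"
}
```

Run `sudo sh -c` around it if unreadable directories need to be included, then repeat on the largest entry (for example `largest_entries /var 5`) until the growing files are found.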

## Procedures

These are common procedures to use when navigating an outage.

### Manually adding more instances to Beanstalk ASG

**DO NOT DO THIS DURING A DEPLOY**

Use the Beanstalk configuration scaling page to:

- Increase the ASG max size to your desired amount.
- Set the ASG min size equal to the max size so that Beanstalk scales up to meet it.

After the outage, revert to original values.
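If the console is unavailable, the same change can be made through the Elastic Beanstalk API by setting `MinSize` and `MaxSize` in the `aws:autoscaling:asg` namespace. The sketch below only prints the command for review; `scale_command` and the environment name in the usage line are hypothetical placeholders. Note the current values first so they can be restored after the outage.

```shell
# Print (do not run) an AWS CLI command that sets both MinSize and MaxSize
# of a Beanstalk environment's ASG to the same value.
# Usage: scale_command <environment-name> <size>
scale_command() {
  env_name="$1"
  size="$2"
  # Reject a missing name or a size that is not a positive integer.
  case "$size" in
    ''|*[!0-9]*|0) echo "size must be a positive integer" >&2; return 1 ;;
  esac
  if [ -z "$env_name" ]; then
    echo "usage: scale_command <environment-name> <size>" >&2
    return 1
  fi
  printf 'aws elasticbeanstalk update-environment --environment-name %s --option-settings Namespace=aws:autoscaling:asg,OptionName=MinSize,Value=%s Namespace=aws:autoscaling:asg,OptionName=MaxSize,Value=%s\n' \
    "$env_name" "$size" "$size"
}
```

For example, `scale_command my-env 12` prints a command that pins the environment at 12 instances. The deploy warning above applies to this path as well.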

### Terminating bad hosts

**DO NOT DO THIS DURING A DEPLOY**

Use the Beanstalk health panel to select and terminate any instances that are misbehaving. The ASG will spin up replacement instances to match.
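As a fallback when the health panel is unreachable, an instance can also be terminated through the Auto Scaling API without lowering the desired capacity, so the ASG still launches a replacement. This is a sketch under that assumption, not our standard procedure; `terminate_command` is a hypothetical helper that only prints the command.

```shell
# Print (do not run) an AWS CLI command that terminates one instance and
# keeps the ASG's desired capacity unchanged so a replacement is launched.
# Usage: terminate_command <instance-id>
terminate_command() {
  instance_id="$1"
  # EC2 instance IDs start with "i-"; reject anything else.
  case "$instance_id" in
    i-?*) ;;
    *) echo "expected an instance ID like i-0123456789abcdef0" >&2; return 1 ;;
  esac
  printf 'aws autoscaling terminate-instance-in-auto-scaling-group --instance-id %s --no-should-decrement-desired-capacity\n' \
    "$instance_id"
}
```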
