

### Runbook for AWS CloudWatch alarms

#### Where do I find the video CloudWatch console?
[https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#](https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#)

#### How do I access the CloudWatch status of Elastic Beanstalk apps?
[https://us-west-2.console.aws.amazon.com/elasticbeanstalk/home?region=us-west-2#/applications](https://us-west-2.console.aws.amazon.com/elasticbeanstalk/home?region=us-west-2#/applications)

#### Checking Alerts
* Log in to the twitch-video AWS console via twitch-video-aws.signin.aws.amazon.com.
* From there, go to **Elastic Beanstalk**.
* Select the application that has the alert. Its environment shows yellow for a warning and red if it is offline. This status is driven by the number of instances behind the service that are failing health checks.

![pop](assets/cw1.png)
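The same colors can be checked from a script. The following is a minimal sketch, assuming boto3 is installed and credentials for the twitch-video account are configured; the helper names are hypothetical. It uses the Elastic Beanstalk `describe_environments` call, whose `Health` field is the console color (Green, Yellow, Red or Grey) and whose `HealthStatus` field carries the detailed status when available.

```python
"""Sketch: list Elastic Beanstalk environments that are not Green.

Hypothetical helper names; assumes a boto3 Elastic Beanstalk client with
credentials for the account is passed in.
"""
from typing import Any, Dict, List, Tuple


def unhealthy_from_response(response: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (name, color, status) for every environment whose color is not Green."""
    result = []
    for env in response.get("Environments", []):
        color = env.get("Health", "Grey")
        if color != "Green":
            # HealthStatus (Warning, Degraded, Severe, ...) may be missing,
            # so fall back to a placeholder instead of raising KeyError.
            result.append((env["EnvironmentName"], color, env.get("HealthStatus", "n/a")))
    return result


def unhealthy_environments(eb_client: Any, application_name: str) -> List[Tuple[str, str, str]]:
    """Query one application's environments and keep the non-Green ones."""
    response = eb_client.describe_environments(ApplicationName=application_name)
    return unhealthy_from_response(response)
```

Usage would look like `unhealthy_environments(boto3.client("elasticbeanstalk", region_name="us-west-2"), "my-app")`, where `"my-app"` is a placeholder for the application name. Keeping the response parsing in its own function makes it testable without AWS access.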

* The main page for the environment lists what has happened lately under **Recent events**. Look for recent code pushes, and for errors showing instances restarting or crashing.

![pop](assets/cw2.png)
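Recent events can also be pulled from a script. A minimal sketch, assuming boto3 and configured credentials; the helper names and the six-hour default window are hypothetical. The Elastic Beanstalk `describe_events` call with `Severity="WARN"` returns only events of that severity or higher, and `StartTime` limits them to a recent window. Only the first page of results is read here.

```python
"""Sketch: fetch WARN-or-worse Elastic Beanstalk events for one environment.

Hypothetical helper names; assumes a boto3 Elastic Beanstalk client is passed in.
"""
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List


def format_events(response: Dict[str, Any]) -> List[str]:
    """Turn a describe_events response into 'time SEVERITY message' lines, oldest first."""
    events = sorted(response.get("Events", []), key=lambda e: e["EventDate"])
    return [f"{e['EventDate']:%Y-%m-%d %H:%M} {e['Severity']} {e['Message']}" for e in events]


def recent_problem_events(eb_client: Any, environment_name: str, hours: int = 6) -> List[str]:
    """Fetch WARN, ERROR and FATAL events from the last `hours` hours (first page only)."""
    start = datetime.now(timezone.utc) - timedelta(hours=hours)
    response = eb_client.describe_events(
        EnvironmentName=environment_name,
        Severity="WARN",  # this severity or higher
        StartTime=start,
    )
    return format_events(response)
```

Sorting oldest first makes it easier to line up a deploy event with the errors that followed it.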

* The **Alarms** section of the environment shows the alarms defined for the service and which metric is alerting. The blue line is the environment's current value, and the red line is the threshold at which the alarm fires.

![pop](assets/cw3.png)
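A minimal sketch for listing firing alarms from a script, assuming boto3 and configured credentials; the helper names are hypothetical. CloudWatch's `describe_alarms(StateValue="ALARM")` returns only alarms currently in the ALARM state, and each metric alarm carries its metric name, comparison operator and threshold, the same information as the red line in the console. Only the first page is read; `get_paginator("describe_alarms")` would be needed for more.

```python
"""Sketch: summarize CloudWatch metric alarms currently firing.

Hypothetical helper names; assumes a boto3 CloudWatch client is passed in.
"""
from typing import Any, Dict, List

# Map CloudWatch comparison operators to short symbols for readability.
OPERATORS = {
    "GreaterThanThreshold": ">",
    "GreaterThanOrEqualToThreshold": ">=",
    "LessThanThreshold": "<",
    "LessThanOrEqualToThreshold": "<=",
}


def summarize_alarms(response: Dict[str, Any]) -> List[str]:
    """Describe each metric alarm as 'name: metric <op> threshold'."""
    lines = []
    for alarm in response.get("MetricAlarms", []):
        op = OPERATORS.get(alarm.get("ComparisonOperator", ""), "?")
        lines.append(f"{alarm['AlarmName']}: {alarm.get('MetricName', '?')} {op} {alarm.get('Threshold')}")
    return lines


def firing_alarms(cw_client: Any) -> List[str]:
    """Return summaries of alarms in the ALARM state (first page only)."""
    return summarize_alarms(cw_client.describe_alarms(StateValue="ALARM"))
```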

* The **Monitoring** section, backed by CloudWatch, is the AWS equivalent of Ganglia. Here, look for outliers such as too many environments reporting health as Severe or Degraded, increased latency, or spikes in requests. Any of these can cause trouble.

![pop](assets/cw4.png)
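The graphs under **Monitoring** are CloudWatch metrics, so they can also be fetched directly. A minimal sketch, assuming boto3 and configured credentials, and assuming the environment publishes the `EnvironmentHealth` metric in the `AWS/ElasticBeanstalk` namespace with an `EnvironmentName` dimension (as it does with enhanced health reporting). The helper names and the three-hour, five-minute defaults are hypothetical.

```python
"""Sketch: fetch one Elastic Beanstalk environment metric as a time series.

Hypothetical helper names; assumes a boto3 CloudWatch client is passed in
and that the metric exists in the AWS/ElasticBeanstalk namespace.
"""
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Tuple


def series_from_response(response: Dict[str, Any], statistic: str) -> List[Tuple[datetime, float]]:
    """Return (timestamp, value) pairs sorted oldest first."""
    # Datapoints are not guaranteed to come back in time order.
    return sorted((p["Timestamp"], p[statistic]) for p in response.get("Datapoints", []))


def environment_metric(cw_client: Any, environment_name: str,
                       metric_name: str = "EnvironmentHealth",
                       hours: int = 3, period: int = 300,
                       statistic: str = "Maximum") -> List[Tuple[datetime, float]]:
    """Fetch `metric_name` for one environment over the last `hours` hours."""
    end = datetime.now(timezone.utc)
    response = cw_client.get_metric_statistics(
        Namespace="AWS/ElasticBeanstalk",
        MetricName=metric_name,
        Dimensions=[{"Name": "EnvironmentName", "Value": environment_name}],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
        Period=period,  # seconds per datapoint
        Statistics=[statistic],
    )
    return series_from_response(response, statistic)
```

`Maximum` is used by default so a short dip into a bad state within a period is not averaged away.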

### Environment health definitions
These are the health levels of a Beanstalk environment, as reported by the `EnvironmentHealth` CloudWatch metric. Higher values mean things are worse:
* 0 (Ok) - everything is fine.
* 1 (Info) - something is happening, such as a deployment.
* 5 (Unknown) - the status could not be retrieved.
* 10 (No data) - there is no data in CloudWatch to measure the system state.
* 15 (Warning) - some instances are failing or in a bad state; these are intermittent errors or a single bad instance.
* 20 (Degraded) - not all EC2 instances behind the Beanstalk app are online. Possibly some instances are crashing and not being restored.
* 25 (Severe) - the app is in a bad state: health checks are failing, requests are failing, etc.
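The numeric levels above map directly to labels, which helps when reading raw datapoints. A minimal sketch; the function names are hypothetical, and rounding non-integer values (for example an `Average` statistic) down to the nearest defined level is an assumption made for illustration.

```python
"""Sketch: translate environment health values into the labels listed above.

Hypothetical helper names; the level table mirrors this runbook.
"""
from typing import Iterable

HEALTH_LEVELS = {
    0: "Ok",
    1: "Info",
    5: "Unknown",
    10: "No data",
    15: "Warning",
    20: "Degraded",
    25: "Severe",
}


def health_label(value: float) -> str:
    """Map a health value to its label, rounding down to the nearest defined level."""
    level = max((lvl for lvl in HEALTH_LEVELS if lvl <= value), default=0)
    return HEALTH_LEVELS[level]


def worst_health(values: Iterable[float]) -> str:
    """Label of the worst (highest) value in a series; 'No data' if the series is empty."""
    values = list(values)
    if not values:
        return HEALTH_LEVELS[10]
    return health_label(max(values))
```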


#### Common troubleshooting
* A Beanstalk environment is an app running on several autoscaled instances behind a load balancer. The application runs in a Docker container, which is restarted when something is deployed or goes wrong. Sometimes things just don't work, and in those cases redeploying is the solution: it restarts the containers and instances.
* Errors in requests or health checks are very likely caused by bad code. In that case, it's easy to revert via clean-deploy.internal.justin.tv.
* A spike in latency could be caused by load or misconfiguration. It requires further investigation and checking more CloudWatch metrics.
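A first check for a latency spike can be scripted by comparing the newest datapoint against a recent baseline. A minimal sketch: the median baseline, the 2x factor and the minimum of three earlier samples are assumptions for illustration, not tuned thresholds, and the function name is hypothetical. The samples could come from a load balancer latency metric, for example `Latency` in `AWS/ELB` or `TargetResponseTime` in `AWS/ApplicationELB`, depending on the load balancer type.

```python
"""Sketch: flag a latency spike relative to recent history.

Hypothetical helper; the median baseline and default factor are
illustrative assumptions, not an established alerting rule.
"""
from statistics import median
from typing import Sequence


def is_latency_spike(samples: Sequence[float], factor: float = 2.0, min_baseline: int = 3) -> bool:
    """True if the newest sample exceeds `factor` times the median of the earlier ones.

    `samples` are latency values ordered oldest first. With fewer than
    `min_baseline` earlier samples there is no usable baseline, so return False.
    """
    if len(samples) <= min_baseline:
        return False
    *history, latest = samples
    return latest > factor * median(history)
```

A median is used rather than a mean so that one earlier outlier does not inflate the baseline and hide a real spike.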
