# Beanstalk

The Users service runs in two beanstalk environments (server and worker) in each region. It uses the [single container docker platform](http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/concepts.platforms.html#concepts.platforms.docker). Each instance runs our docker container with an NGINX instance in front of it. Vanilla beanstalk is used, so all public documentation is applicable.

## Configuration

Beanstalk is configured through terraform and ebextensions. Configuration made through ebextensions will be applied at deploy time.

Application configuration happens in code. If an application configuration change is required, a deployment is needed.

## Deployments

Deployments can be made to an environment (prod us-west-2 and prod us-east-1), an environment and region (prod us-west-2), or an environment, region, and type (prod us-west-2 server). When deploying to an environment or to an environment and region, if any beanstalk deploy fails in the process, the deployment stops and the deploys that already succeeded are not rolled back.

Deploying to `prod` would include the following beanstalk deploys:

1. prod us-west-2 server
1. prod us-west-2 worker
1. prod us-east-1 server
1. prod us-east-1 worker

If 2 fails, 1 is not rolled back and 3 and 4 do not happen. In this scenario, the worker would have to be deployed manually via `prod-us-west-2-worker`, and us-east-1 deployed manually via `prod-us-east-1`. Alternatively, the server (api) can be rolled back by manually deploying the previous version using `prod-us-west-2-server`.

Deploying to `prod-us-west-2` would result in deployments to:

1. prod us-west-2 server
1. prod us-west-2 worker
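
The fan-out and stop-on-failure behaviour described above can be sketched as follows. This is a minimal illustration, not the actual deploy script: `expand_target`, `run_deploys`, and the `deploy_one` callback are hypothetical names, and the region and type ordering is simply copied from the lists above.

```python
from typing import Callable, List, Optional, Tuple

# Order mirrors the lists in this document; the real script may differ.
REGIONS = ["us-west-2", "us-east-1"]
TYPES = ["server", "worker"]

Deploy = Tuple[str, str, str]  # (environment, region, type)


def expand_target(target: str) -> List[Deploy]:
    """Expand 'prod', 'prod-us-west-2', or 'prod-us-west-2-server'
    into the individual beanstalk deploys it implies."""
    env, _, rest = target.partition("-")
    if not rest:
        return [(env, region, kind) for region in REGIONS for kind in TYPES]
    for region in REGIONS:
        if rest == region:
            return [(env, region, kind) for kind in TYPES]
        if rest.startswith(region + "-"):
            kind = rest[len(region) + 1:]
            if kind in TYPES:
                return [(env, region, kind)]
    raise ValueError(f"unrecognised deploy target: {target!r}")


def run_deploys(
    target: str, deploy_one: Callable[[Deploy], bool]
) -> Tuple[List[Deploy], Optional[Deploy], List[Deploy]]:
    """Run deploys in order and stop at the first failure.

    Deploys that already succeeded are NOT rolled back.
    Returns (succeeded, failed, skipped).
    """
    pending = expand_target(target)
    succeeded: List[Deploy] = []
    for index, deploy in enumerate(pending):
        if not deploy_one(deploy):
            return succeeded, deploy, pending[index + 1:]
        succeeded.append(deploy)
    return succeeded, None, []
```

With a `deploy_one` that fails only for the us-west-2 worker, `run_deploys("prod", ...)` reports the us-west-2 server as succeeded, the us-west-2 worker as failed, and both us-east-1 deploys as skipped, which is the "2 fails" scenario described earlier.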

## Platform updates

Periodically, Beanstalk publishes [platform updates](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/concepts.platforms.html#concepts.platforms.docker). [Rolling updates are enabled](https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.rollingupdates.html?icmpid=docs_elasticbeanstalk_console) via [.ebextensions](./deploy/.ebextensions). To perform a platform update, apply the update through the console. When it succeeds, update terraform to reflect the same version.

At the time of writing, Beanstalk upgrades 1 instance at a time: taking it out of the load balancer, replacing the instance, deploying to it, and verifying its health. If the health check fails, we believe (but have not verified) that the update stops, the old instance does not come back, and the ASG adds more instances if necessary.

## Server Deployments

~~Production environments are composed of 2 beanstalk environments: canary and regular. Canary consists of 2 instances. Regular consists of the fleet. A small percentage of traffic (at the time of writing: 1 percent) is sent to canary.~~

Production is currently one environment. Canary has been disabled until a safer routing mechanism is put into place. Weighted DNS proved to be too unpredictable.

Staging environments have one environment which receives all traffic.

Deployments happen in batches and instance health is checked before proceeding. If a health check fails, the deployment fails and the previous version is deployed. If the canary deployment fails, it is rolled back and does not proceed to the rest of the production fleet.

## Worker Deployments

Workers run as a separate environment in each region. If one of the enabled workers fails to start up, the deployment is rolled back.

### Debugging deployments

A deploy could fail for a few reasons:

- Issue during a deployment. The deployment will be rolled back. This can happen because [beanstalk checks health after each batch](http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/health-enhanced.html#health-enhanced-effects).
- Issue with the script. If the script failed, the deployment likely didn't happen. It is likely a bash scripting error that needs to be corrected, or an issue with jenkins.

If the rollback failed, read below.

### Rollback failed

1. Try redeploying the last commit to master.
2. Use the console and deploy the previous version to the environment. You can find the previous version by reading `previous.txt` from the jenkins workspace of the deploy.

During partial or total application failure scenarios, it may be necessary to change our [deployment policy](http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.deploy-existing-version.html) to get a deployment out more quickly. It is recommended to change the deployment policy using the console.

Deployment policy changes for failure scenarios:

- Partial (some instances are failing): 50%+ rolling deployment
- Total (all instances are failing): All At Once
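
For reference, the policy changes above correspond to option settings in the `aws:elasticbeanstalk:command` namespace. The sketch below only builds those settings as plain dictionaries, in the shape of the `OptionSettings` list accepted by the Elastic Beanstalk `UpdateEnvironment` API; it does not call AWS, and the console remains the recommended way to apply the change. `policy_for_failure` and its `scenario` argument are hypothetical names introduced for illustration.

```python
from typing import Dict, List

NAMESPACE = "aws:elasticbeanstalk:command"


def _option(name: str, value: str) -> Dict[str, str]:
    """Build one option setting entry (values are always strings)."""
    return {"Namespace": NAMESPACE, "OptionName": name, "Value": value}


def policy_for_failure(scenario: str, batch_percent: int = 50) -> List[Dict[str, str]]:
    """Return the deployment policy settings for a failure scenario.

    scenario: 'partial' (some instances failing) or 'total' (all failing).
    batch_percent: rolling batch size for the partial case, 50 or higher.
    """
    if scenario == "total":
        # Every instance is already failing, so deploy everywhere at once.
        return [_option("DeploymentPolicy", "AllAtOnce")]
    if scenario == "partial":
        if not 50 <= batch_percent <= 100:
            raise ValueError("batch_percent must be between 50 and 100")
        return [
            _option("DeploymentPolicy", "Rolling"),
            _option("BatchSizeType", "Percentage"),
            _option("BatchSize", str(batch_percent)),
        ]
    raise ValueError(f"unknown scenario: {scenario!r}")
```

Remember to restore the normal policy once the incident is over, since the next regular deploy would otherwise use the faster and riskier setting.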

## Metrics

Beanstalk hosts have their IP as their hostname so application metrics contain IPs. Otherwise, application metrics have not changed. A query to use for all beanstalk hosts is `ip-*`.
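
When a metric points at a specific host, the hostname can be turned back into an address. The helper below is a small sketch that assumes hostnames follow the usual EC2 private DNS form `ip-A-B-C-D` (optionally followed by a domain suffix); `ip_from_hostname` is a hypothetical name and the sample hostnames are made up.

```python
import ipaddress
from typing import Optional


def ip_from_hostname(hostname: str) -> Optional[str]:
    """Return the IPv4 address encoded in an 'ip-A-B-C-D' hostname,
    or None if the hostname does not follow that form."""
    short = hostname.split(".", 1)[0]  # drop any domain suffix
    if not short.startswith("ip-"):
        return None
    candidate = short[len("ip-"):].replace("-", ".")
    try:
        return str(ipaddress.IPv4Address(candidate))
    except ipaddress.AddressValueError:
        return None
```

For example, `ip_from_hostname("ip-10-0-1-23")` yields `"10.0.1.23"`, while a hostname that does not start with `ip-` yields `None`.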

Beanstalk provides metrics per environment so we have metrics for the production fleet and canary environments as well as staging. [Here are the metrics beanstalk collects per environment](http://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/aeb-metricscollected.html).

## Instances

Beanstalk refers to instances by their instance ID. It can be useful to find the IP of a host so you can ssh to it. You can use the EC2 dashboard to look it up or use a tool called [jack](https://git-aws.internal.justin.tv/identity/toolbelt/tree/master/cmd/jack).

To SSH into production instances, you need to be within AWS. You can SSH to staging first and then SSH to production from there.

## Alarms

Currently, we have alarms on:

- 5xx
- Instances in Warning, Degraded, or Severe state.

You can read more about how to handle alarms [here](./alerts.md). [You can read about health states here](http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/health-enhanced-status.html).

## Logging

Our beanstalk environments are configured to stream logs to cloudwatch. Log group format: `/aws/elasticbeanstalk/{ENVIRONMENT_NAME}/{PATH_TO_FILE}`.

Log group examples:

- /aws/elasticbeanstalk/prod-web-users-service-canary-server/var/log/eb-activity.log
- /aws/elasticbeanstalk/prod-web-users-service-server/var/log/eb-docker/containers/eb-current-app/stdouterr.log

Log group meanings:

- /var/log/eb-activity.log: logs of beanstalk daemon
- /var/log/nginx/access.log: nginx access logs
- /var/log/nginx/error.log: nginx error logs
- /var/log/docker-events.log: activity of docker commands run
- /var/log/docker: docker logs
- /var/log/eb-docker/containers/eb-current-app/stdouterr.log: application logs

Each log group has log streams. Each log stream is named after the instance ID.
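
The log group format can be expressed as a pair of small helpers, which is handy when scripting against cloudwatch. This is a sketch only: `log_group_name` and `split_log_group` are hypothetical names, and the single-slash join is inferred from the examples above.

```python
from typing import Tuple

LOG_GROUP_PREFIX = "/aws/elasticbeanstalk"


def log_group_name(environment_name: str, path_to_file: str) -> str:
    """Build the log group for a file on a beanstalk environment."""
    # The file path is absolute, so strip its leading slash before joining.
    return f"{LOG_GROUP_PREFIX}/{environment_name}/{path_to_file.lstrip('/')}"


def split_log_group(log_group: str) -> Tuple[str, str]:
    """Inverse of log_group_name: return (environment_name, path_to_file)."""
    prefix = LOG_GROUP_PREFIX + "/"
    if not log_group.startswith(prefix):
        raise ValueError(f"not a beanstalk log group: {log_group!r}")
    environment_name, sep, path = log_group[len(prefix):].partition("/")
    if not environment_name or not sep or not path:
        raise ValueError(f"malformed beanstalk log group: {log_group!r}")
    return environment_name, "/" + path
```

For instance, `log_group_name("prod-web-users-service-server", "/var/log/nginx/error.log")` gives `/aws/elasticbeanstalk/prod-web-users-service-server/var/log/nginx/error.log`.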

For more details, see logging links below.

## Environments

### Production

You can only curl the production URLs from within AWS. If the production ELBs are changed by beanstalk, we must remove the default security group, as it allows all traffic to ports 80 and 443.

Environment Name: prod-web-users-service

Regions:

- us-west-2
  - [environment](https://us-west-2.console.aws.amazon.com/elasticbeanstalk/home?region=us-west-2#/application/overview?applicationName=prod-web-users-service)
  - url: https://web-users-service.prod.us-west2.justin.tv, https://users-service.prod.us-west2.twitch.tv
  - [kibana logs](http://kibana.internal.justin.tv/app/kibana#/discover?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:now-4h,mode:quick,to:now))&_a=(columns:!(message,loggroup),filters:!(),index:'core-user-syslog-prod-*',interval:auto,query:(query_string:(analyze_wildcard:!t,query:'loggroup:*log%20AND%20region:%22us-west-2%22')),sort:!('@timestamp',desc),vis:(aggs:!((params:(field:logstream,orderBy:'2',size:20),schema:segment,type:terms),(id:'2',schema:metric,type:count)),type:histogram))&indexPattern=core-user-syslog-prod-*&type=histogram), [cloudwatch logs](https://console.aws.amazon.com/cloudwatch/home?region=us-west-2#logs:)

- us-east-1
  - [environment](https://us-east-1.console.aws.amazon.com/elasticbeanstalk/home?region=us-east-1#/application/overview?applicationName=prod-web-users-service)
  - url: https://web-users-service.prod.us-east-1.justin.tv
  - [kibana logs](http://kibana.internal.justin.tv/app/kibana#/discover?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:now-4h,mode:quick,to:now))&_a=(columns:!(message,loggroup),filters:!(),index:'core-user-syslog-prod-*',interval:auto,query:(query_string:(analyze_wildcard:!t,query:'loggroup:*log%20AND%20region:%22us-east-1%22')),sort:!('@timestamp',desc),vis:(aggs:!((params:(field:logstream,orderBy:'2',size:20),schema:segment,type:terms),(id:'2',schema:metric,type:count)),type:histogram))&indexPattern=core-user-syslog-prod-*&type=histogram), [cloudwatch logs](https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logs:)

### Staging

Environment Name: staging-web-users-service

- us-west-2
  - [environment](https://us-west-2.console.aws.amazon.com/elasticbeanstalk/home?region=us-west-2#/application/overview?applicationName=staging-web-users-service)
  - url: https://web-users-service.dev.us-west2.justin.tv
  - [kibana logs](http://kibana-dev.internal.justin.tv/app/kibana#/discover?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:now-4h,mode:quick,to:now))&_a=(columns:!(message),index:'core-user-syslog-dev-*',interval:auto,query:(query_string:(analyze_wildcard:!t,query:'*')),sort:!('@timestamp',desc))), [cloudwatch logs](https://console.aws.amazon.com/cloudwatch/home?region=us-west-2#logs:)

- us-east-1
  - [environment](https://us-east-1.console.aws.amazon.com/elasticbeanstalk/home?region=us-east-1#/application/overview?applicationName=staging-web-users-service)
  - url: https://web-users-service.dev.us-east-1.justin.tv
  - [kibana logs](http://kibana-dev.internal.justin.tv/app/kibana#/discover?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:now-4h,mode:quick,to:now))&_a=(columns:!(message),index:'core-user-syslog-dev-*',interval:auto,query:(query_string:(analyze_wildcard:!t,query:'*')),sort:!('@timestamp',desc))), [cloudwatch logs](https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#logs:)

## Logging

[Beanstalk log streaming is enabled](http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features.managing.cw.html) through terraform. The logs are stored in [Cloudwatch Logs](http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html). They are also available in kibana ([prod](http://kibana.internal.justin.tv/app/kibana#/discover?_g=()&_a=(columns:!(_source),index:'core-user-syslog-prod-*',interval:auto,query:(query_string:(analyze_wildcard:!t,query:'*')),sort:!('@timestamp',desc))), [dev](http://kibana-dev.internal.justin.tv/app/kibana#/discover?_g=()&_a=(columns:!(_source),index:'core-user-syslog-dev-*',interval:auto,query:(query_string:(analyze_wildcard:!t,query:'*')),sort:!('@timestamp',desc)))). The kibana log architecture is:

```
beanstalk -> cloudwatch logs -> lambda -> core-user logstash instances -> core-user s3 -> systems owned services -> kibana
```

Notes:
- Our logstash instance and s3 bucket are configured [here](https://git-aws.internal.justin.tv/identity/core-user-syslog).
- The lambda was inspired by [this guide](http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SubscriptionFilters.html#LambdaFunctionExample).
- You can read more about the [systems owned services here](https://wiki.twitch.com/display/SYS/Central+logging+stack).
