## Troubleshooting and Recovery

### Error Logs

When investigating an issue, **error logs** can be viewed on Rollbar:

- [`staging`](https://rollbar.com/Twitch/CB_Roster/?environment=staging)
- [`production`](https://rollbar.com/Twitch/CB_Roster/?environment=production)

### Rollback

To **roll back** to a previous commit:

- [ ] Navigate to the affected environment on Clean-Deploy, either [`staging`](https://clean-deploy.internal.justin.tv/#/cb/roster/history?env=staging) or [`production`](https://clean-deploy.internal.justin.tv/#/cb/roster/history?env=production)
- [ ] From the UI, find the commit you wish to revert to
- [ ] Click `Redeploy` on the commit

### Database Recovery

**Note:** This recovery plan will cause all writes to the database during the recovery time to be lost. 

In the event that the database needs to be **recovered** from a historic snapshot or restore point, the following steps can be taken to restore it. 

> Snapshots are created once a day at 2:00pm UTC and kept for **1 day**. Restore points are created every **5 minutes** between the most recent snapshot and now.
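
To make the restore window concrete, here is a minimal sketch of the arithmetic in the note above. It assumes timezone-aware datetimes and that restore points sit on a 5-minute grid starting at the snapshot time; that alignment is an assumption, and the function names are hypothetical. The RDS console remains the source of truth for the latest restorable time.

```python
from datetime import datetime, timedelta, timezone

SNAPSHOT_HOUR_UTC = 14  # daily snapshot at 2:00pm UTC, per the note above
RESTORE_POINT_INTERVAL = timedelta(minutes=5)


def last_snapshot_time(now: datetime) -> datetime:
    """Return the most recent daily snapshot time at or before `now` (UTC)."""
    now = now.astimezone(timezone.utc)
    candidate = now.replace(hour=SNAPSHOT_HOUR_UTC, minute=0, second=0, microsecond=0)
    if candidate > now:
        # Today's snapshot has not been taken yet; fall back to yesterday's.
        candidate -= timedelta(days=1)
    return candidate


def nearest_restore_point(target: datetime, now: datetime) -> datetime:
    """Return the latest 5-minute restore point at or before `target`.

    Raises ValueError if `target` is outside the window between the
    last snapshot and `now`.
    """
    start = last_snapshot_time(now)
    target = target.astimezone(timezone.utc)
    if not (start <= target <= now):
        raise ValueError("target is outside the restorable window")
    # Assumption: restore points are aligned to the snapshot time.
    steps = (target - start) // RESTORE_POINT_INTERVAL
    return start + steps * RESTORE_POINT_INTERVAL
```

Any write made after the chosen restore point is lost, so pick the latest point that still precedes the incident.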

1. Create a temporary database and redirect traffic

- [ ] Navigate to the Roster RDS AWS console
  - Sign in with the `twitch-cb-aws` account for [`staging`](https://us-west-2.console.aws.amazon.com/rds/home?region=us-west-2#database:id=cb-roster-staging;is-cluster=false) and [`production`](https://us-west-2.console.aws.amazon.com/rds/home?region=us-west-2#database:id=cb-roster-production;is-cluster=false).
- [ ] Follow one of the AWS guides linked below, starting from **step 4**, to create a recovery database. This database serves as a temporary database to which traffic is redirected.

  > When filling out the field `DB Instance Identifier`, make sure to add `-<environment>` as the suffix. For example, `my-new-instance-staging`.

  - From a [snapshot](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RestoreFromSnapshot.html)
  - From a [restore point](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html)
- [ ] Configure your new database with the following field; the rest can be left as is:
  - [ ] `Multi-AZ Deployment: Yes`
- [ ] Navigate to your new database in the AWS console. Wait until its status is "Available". Once it is available, select the new database and select `Modify`. Remove `default` and add `twitch_subnets` to the list of security groups. Change the DB parameter group to `pg-stat-statements-postgres-9-6`. **Apply immediately.**
- [ ] Copy your new database instance's **`Endpoint`**
- [ ] Update the file `config/staging.yaml` for staging or `config/production.yaml` for production with the new temporary `<Endpoint>`. For example, if your new database has the endpoint `my-new-instance-staging.cqyyxr1hxrhn.us-west-2.rds.amazonaws.com`, use it as the host:

  ```diff
   db:
     master:
  -    host: cb-roster-staging.cqyyxr1hxrhn.us-west-2.rds.amazonaws.com
  +    host: my-new-instance-staging.cqyyxr1hxrhn.us-west-2.rds.amazonaws.com
  ```
- [ ] Create a PR with the updated config file.
- [ ] Once your PR is approved and tests have passed, merge it to master.
- [ ] Navigate to [Clean-Deploy](https://clean-deploy.internal.justin.tv/#/cb/roster) to deploy Roster.
  - [ ] Click `Deploy to` next to the master branch, and deploy to the environment you are recovering.
- [ ] Wait for the deploy to complete. Verify that teams pages are retrieving the correct information.
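
The console steps for the temporary database could also be scripted. The following is a minimal sketch, not an established procedure: it assumes `boto3` is installed and that `rds` is a client created with `boto3.client("rds")`. The helper names are hypothetical, and the security group IDs are an assumption you must look up yourself, because the RDS API takes group IDs rather than the `twitch_subnets` name.

```python
from datetime import datetime


def restore_request(source_id: str, target_name: str, environment: str,
                    restore_time: datetime) -> dict:
    """Arguments for rds.restore_db_instance_to_point_in_time."""
    return {
        "SourceDBInstanceIdentifier": source_id,
        # The runbook asks for "-<environment>" as the identifier suffix.
        "TargetDBInstanceIdentifier": f"{target_name}-{environment}",
        "RestoreTime": restore_time,
        "MultiAZ": True,
    }


def modify_request(instance_id: str, security_group_ids: list[str]) -> dict:
    """Arguments for rds.modify_db_instance once the instance is available."""
    return {
        "DBInstanceIdentifier": instance_id,
        # This REPLACES the whole security group list, which also drops `default`.
        "VpcSecurityGroupIds": security_group_ids,
        "DBParameterGroupName": "pg-stat-statements-postgres-9-6",
        "ApplyImmediately": True,
    }


def create_temporary_database(rds, source_id, target_name, environment,
                              restore_time, security_group_ids) -> str:
    """Restore, wait for "Available", reconfigure, and return the endpoint."""
    restore = restore_request(source_id, target_name, environment, restore_time)
    rds.restore_db_instance_to_point_in_time(**restore)
    new_id = restore["TargetDBInstanceIdentifier"]
    rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier=new_id)
    rds.modify_db_instance(**modify_request(new_id, security_group_ids))
    described = rds.describe_db_instances(DBInstanceIdentifier=new_id)
    return described["DBInstances"][0]["Endpoint"]["Address"]
```

The returned address is the value to place in the `host` field of the config file. Restoring from a snapshot instead would use `restore_db_instance_from_db_snapshot` with a snapshot identifier in place of the source instance and restore time.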

2. Create the new database

- [ ] Navigate to the Roster RDS AWS console
- [ ] Navigate to the old database `cb-roster-staging` or `cb-roster-production`. Verify under the `Monitoring` tab that there are 0 DB connections and there have been no reads or writes to this database since your deploy was completed.
- [ ] Select `Modify` for the old database. Rename the `DB instance identifier` to something that indicates it is no longer the active database (e.g. `cb-roster-staging-old`). **Apply immediately.**
- [ ] Navigate to the old database in the AWS console again under the new name.
- [ ] Follow one of the AWS guides linked below, starting from **step 4**, to create a recovery database. This database becomes the recovered database going forward.

  > When filling out the field `DB Instance Identifier`, use the old name for the database (`cb-roster-staging` or `cb-roster-production`).

  - From a [snapshot](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_RestoreFromSnapshot.html)
  - From a [restore point](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PIT.html)
- [ ] Configure your new database with the following field; the rest can be left as is:
  - [ ] `Multi-AZ Deployment: Yes`
- [ ] Navigate to your new database in the AWS console. Wait until its status is "Available". Once it is available, select the new database and select `Modify`. Add `twitch_subnets` to the list of security groups. Change the DB parameter group to `pg-stat-statements-postgres-9-6`. **Apply immediately.**
- [ ] Copy your new database instance's **`Endpoint`**
- [ ] Update the file `config/staging.yaml` for staging or `config/production.yaml` for production with the new `<Endpoint>`. For example, if your new database has the endpoint `cb-roster-staging.cqyyxr1hxrhn.us-west-2.rds.amazonaws.com`, use it as the host:

  ```diff
   db:
     master:
  -    host: my-new-instance-staging.cqyyxr1hxrhn.us-west-2.rds.amazonaws.com
  +    host: cb-roster-staging.cqyyxr1hxrhn.us-west-2.rds.amazonaws.com
  ```
  > You are effectively reverting the changes you made to these files for the temporary database. 
- [ ] Create a PR with the updated config file.
- [ ] Once your PR is approved and tests have passed, merge it to master.
- [ ] Navigate to [Clean-Deploy](https://clean-deploy.internal.justin.tv/#/cb/roster) to deploy Roster.
  - [ ] Click `Deploy to` next to the master branch, and deploy to the environment you are recovering.
- [ ] Wait for the deploy to complete. Verify that teams pages are retrieving the correct information.
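
The idle check and rename at the start of step 2 can be expressed in code as well. This is a hedged sketch with hypothetical helper names: `is_idle` expects the `Datapoints` list returned by CloudWatch `get_metric_statistics` for the `AWS/RDS` `DatabaseConnections` metric requested with `Statistics=["Maximum"]`, and `rename_request` builds arguments for `modify_db_instance`. It covers connections only; read and write activity still needs to be confirmed under the `Monitoring` tab.

```python
def is_idle(connection_datapoints: list[dict]) -> bool:
    """True when every datapoint reports zero DB connections.

    An empty list is treated as "unknown" and therefore not idle,
    so missing metrics never green-light the rename.
    """
    return bool(connection_datapoints) and all(
        point["Maximum"] == 0 for point in connection_datapoints
    )


def rename_request(old_id: str, suffix: str = "-old") -> dict:
    """Arguments for rds.modify_db_instance that rename the old instance."""
    return {
        "DBInstanceIdentifier": old_id,
        # e.g. cb-roster-staging -> cb-roster-staging-old
        "NewDBInstanceIdentifier": f"{old_id}{suffix}",
        "ApplyImmediately": True,
    }
```

Renaming only after `is_idle` returns `True` frees the original identifier, so the recovered database can reuse the old name and endpoint.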
