# Grid Reboot
Grid Reboot is a tool, written in Go, that reboots Grid Nodes.

It's designed to:
1) Pause TeamCity Projects
2) Pause Healthchecks
3) Reboot the Grid Nodes
4) Re-Enable Tests
5) Re-Enable Healthchecks

## Environment Variables
| Name | Required | Description |
|------|----------|-------------|
| GRID_ROUTER_HOST | Yes | The Grid Router API server used to fetch and drain hubs. Example: `api-dev.browsergrid.xarth.tv` |
| GRID_HUB_ENVIRONMENT | No | The EC2 environment tag for the hub |
| GRID_ENVIRONMENT | No | The EC2 environment tag for the nodes |
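A minimal sketch of how these variables might be loaded and validated at startup. `Config` and `loadConfig` are hypothetical names. The lookup function is injected (`os.LookupEnv` in `main`) so the loader can be exercised without touching the real environment. The sketch leaves the optional tags empty when unset; how the real tool treats an empty tag is not documented here.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// Config holds the environment variables listed in the table.
// The struct and field names are hypothetical.
type Config struct {
	GridRouterHost     string // GRID_ROUTER_HOST (required)
	GridHubEnvironment string // GRID_HUB_ENVIRONMENT (optional, empty if unset)
	GridEnvironment    string // GRID_ENVIRONMENT (optional, empty if unset)
}

// loadConfig reads the variables through lookup and fails fast when the
// required GRID_ROUTER_HOST is missing or empty.
func loadConfig(lookup func(string) (string, bool)) (Config, error) {
	host, ok := lookup("GRID_ROUTER_HOST")
	if !ok || host == "" {
		return Config{}, errors.New("GRID_ROUTER_HOST is required")
	}
	hubEnv, _ := lookup("GRID_HUB_ENVIRONMENT")
	nodeEnv, _ := lookup("GRID_ENVIRONMENT")
	return Config{
		GridRouterHost:     host,
		GridHubEnvironment: hubEnv,
		GridEnvironment:    nodeEnv,
	}, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```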

## Setting Up Your Environment
1) Install Homebrew: https://brew.sh/
2) Install Go: http://brewformulas.org/Go
3) Set Up Your GOPATH: https://github.com/golang/go/wiki/SettingGOPATH
4) Install Terraform: https://www.terraform.io/

## Infrastructure
Grid Reboot runs on Amazon ECS (Fargate).

It was originally planned to run on AWS Lambda, but a run could exceed Lambda's 15-minute maximum execution time.

### Diagram
(because who doesn't like pictures)

![Grid Reboot Diagram, as written below](docs/Grid%20Reboot%20Diagram.png)

### AWS Resource Management
Terraform manages all of the AWS resources.
Whenever you change the AWS infrastructure, make the change in Terraform, commit it, and then apply it from your machine.

### Docker Tagging Convention
No versioned tagging convention is used in this project.
Whenever a new version of master is built, it replaces the `latest` image.

This keeps things simple: the ECS task definition does not have to be updated to point at each new version.

### Build
When a branch is pushed to GitHub, Jenkins runs the unit tests and reports the results in JUnit format.

If the branch is master, Jenkins then builds a Docker image with the application installed in it.
The job then pushes the image to Amazon ECR, where ECS can pull it.

### Deploying
On a merge to master, Jenkins builds the image and pushes it to ECR.
Because the task uses the `latest` image, Grid Reboot picks up the update automatically on its next run.

### Running
The ECS task runs the Docker image built in the previous step.

A CloudWatch Events rule triggers the task on a cron schedule.

## Monitoring

### Results
Execution results are reported to CloudWatch under "Grid Reboot/Execution Result":
- 1 = Pass
- 0 = Fail
- Missing data = The job hasn't run.

### Alerting
Grid Reboot is designed to alert when the task fails.
Both the container and CloudWatch Events send alerts to an SNS topic, which triggers PagerDuty.

The PagerDuty Service can be found here:
https://twitchoncall.pagerduty.com/services/PT3C6K5

## Additional Resources

- [Product Spec](https://docs.google.com/document/d/12xhkSObudGlIUG8OYYWOCGxBaeo95z2LIo19hpxrwXI/edit)