# Consul Ansible Runbooks

## History

Kevin Mullin originally wrote these playbooks to try to snipe corrupted entries off the Consul memberlist in a DC. They were later adapted to handle safe Consul server upgrades.

## Requirements

Set up your ansible env:

```
python3 -m venv .env && .env/bin/pip install -r requirements.txt
source .env/bin/activate
```

## Playbooks

Ansible groups collections of 'tasks' into 'roles'. Roles are applied using 'playbooks', which live in the root directory.
Below is a list of the repeatable 'playbooks'.

### consul-upgrade

The `consul-upgrade` playbook specifically targets consul servers (not clients).

It visits each Follower server in the peer set one at a time and upgrades Consul to the version specified in the variable `consul_version`, running sanity checks on quorum health before and after, as exposed by [Consul Autopilot](https://www.consul.io/api/operator/autopilot.html#read-health). It waits up to a configurable time (`delay` * `retries` seconds) for Autopilot to report `FailureTolerance > 0` and `Healthy == true`. If the time elapses first, Ansible aborts all subsequent steps and does not move on to the next Follower, so an operator can take a look at what's going on. Once it has completed these tasks on all the Follower servers, it does the same on the Leader.
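The wait-and-abort behaviour described above can be sketched in plain shell. This is not the playbook's actual implementation, only an illustration of the same idea. The names `wait_until` and `autopilot_ok` are hypothetical, and the check assumes `curl` and `jq` are installed and that a Consul agent answers on `127.0.0.1:8500`.

```shell
# Run a command until it succeeds, up to RETRIES attempts,
# sleeping DELAY seconds between attempts.
# Usage: wait_until RETRIES DELAY command [args...]
wait_until() {
  local retries
  local delay
  local attempt
  retries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@"; then
      return 0
    fi
    # No point sleeping after the final failed attempt.
    if [ "$attempt" -lt "$retries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Hypothetical health check (assumes curl, jq and a local agent on port 8500).
# Succeeds only when Autopilot reports Healthy == true and FailureTolerance > 0.
autopilot_ok() {
  curl -sf http://127.0.0.1:8500/v1/operator/autopilot/health |
    jq -e '.Healthy == true and .FailureTolerance > 0' >/dev/null
}
```

With these definitions, `wait_until 10 30 autopilot_ok || exit 1` would poll for up to roughly five minutes and stop the surrounding script if the quorum never becomes healthy, which mirrors how the playbook halts before touching the next Follower.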

When the Leader of the Consul servers 'steps down' gracefully, Raft takes a non-configurable amount of time to elect a new Leader from the remaining Followers. Clients that make *consistent* reads or any writes to the datacenter during this window will receive an HTTP `500` error with the response `No Cluster Leader`.

### puppet-enable / puppet-disable

This playbook is simple: it disables or enables Puppet for you, with a message.
Variables:

  * `disable_msg`: the message to use; can be overridden.
  * `puppet_disable` (bool): true to disable puppet, false to enable


## Vagrant

In order to help with testing the playbooks, there is an included `Vagrantfile` to help bring up server nodes.

This configuration creates 6 Virtual Machines:
  * 3 Server nodes in `testdc`
  * 3 Server nodes in `testdc2`

### Directions

#### Requires

  * `vagrant` installed
  * `VirtualBox` installed
  * `ansible` installed

#### Steps

  1. `vagrant up` - all nodes will be brought up, and the `consul` role will be applied using ansible. This installs Consul version `1.0.7`, and configures the 2 datacenters `testdc` and `testdc2` to join each other.
  1. You can now play around with the 6 servers at version `1.0.7`. When you're done, you can run the `consul-upgrade` playbook to upgrade them to `1.5.3`.
  1. `ansible-playbook -i .vagrant/provisioners/ansible/inventory/vagrant_ansible_inventory consul-upgrade.yml`

## ansible-playbook-bulk.sh script

This is a wrapper script to run playbooks on several datacenters. You can use it as follows:
```
./ansible-playbook-bulk.sh --datacenters /tmp/datacenters.list -i hosts.py puppet-enable.yml
```
The wrapper script strips `--datacenters /tmp/datacenters.list` (or `--datacenters=/tmp/datacenters.list`) from the arguments and loads the datacenter names from `/tmp/datacenters.list`. Lines starting with `#` are ignored.
All other CLI arguments are passed to `ansible-playbook`, which is called in a loop with the `DATACENTER` environment variable cycling through the datacenter names.
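For readers who want to see how such a wrapper can work, here is a minimal sketch of the behaviour described above. It is not the contents of the real script. The function name `run_bulk` and the `ANSIBLE_PLAYBOOK_CMD` override are hypothetical, and skipping blank lines and stopping at the first failing datacenter are assumptions made for this example.

```shell
# Sketch only: strip --datacenters from the arguments, then run the
# playbook command once per datacenter listed in the file.
run_bulk() {
  local dc_file
  local arg
  local dc
  local n
  dc_file=""
  n=$#
  # Walk the original arguments once, re-appending everything we keep.
  while [ "$n" -gt 0 ]; do
    arg=$1
    shift
    n=$((n - 1))
    case $arg in
      --datacenters=*)
        dc_file=${arg#--datacenters=}
        ;;
      --datacenters)
        if [ "$n" -eq 0 ]; then
          echo "--datacenters needs a file argument" >&2
          return 2
        fi
        dc_file=$1
        shift
        n=$((n - 1))
        ;;
      *)
        set -- "$@" "$arg"
        ;;
    esac
  done
  if [ -z "$dc_file" ]; then
    echo "missing --datacenters" >&2
    return 2
  fi
  while IFS= read -r dc || [ -n "$dc" ]; do
    # Ignore comment lines (and, by assumption, blank lines).
    case $dc in
      '' | '#'*) continue ;;
    esac
    # stdin is redirected so the command cannot swallow the rest of the list.
    DATACENTER=$dc "${ANSIBLE_PLAYBOOK_CMD:-ansible-playbook}" "$@" </dev/null || return 1
  done <"$dc_file"
}
```

Calling `run_bulk --datacenters /tmp/datacenters.list -i hosts.py puppet-enable.yml` would then run `ansible-playbook -i hosts.py puppet-enable.yml` once per listed datacenter, with `DATACENTER` set accordingly.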

## ssh and strict key checking

If you need to update hosts you have never SSHed to, you may need to use the `--ssh-common-args` option, like this:
```
DATACENTER=qro01 ansible-playbook -i hosts.py --ssh-common-args="-o StrictHostKeyChecking=no" puppet-disable.yml
```
