# tf-io-consul-chaos
Spin up and test various consul migration, load and failure scenarios

# build a test consul cluster in AWS region us-west-2

## what's the plan
* create a vpc
* create 1 public network
* create 4 private networks
* create 1 instance (the bastion) in the public network with an external IP
* create an internet gateway for the public network
* route external traffic from the internal networks through the internet gateway
* build consul servers and consul clients in the private networks
* no restrictions on traffic between the private networks
* no restrictions on ingress/egress traffic in the public or private networks (beyond the fact that the network layout prevents the public internet from reaching the private networks)
* Create 3 consul clusters joined in a WAN pool

## build consul ami in us-west-2

* this Packer template bakes in the team's SSH keys, using a script located in `bootstrap/`
* the Packer template `consul.json` builds in us-west-2
* set AWS_PROFILE
`export AWS_PROFILE="twitch-vidcs-dev"`
* change directory
`cd examples/consul-ami`
* build ami
`packer build consul.json`
* note the AMI ID from the build output
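
If you would rather capture the AMI ID than copy it by hand, Packer's `-machine-readable` flag emits comma-separated `timestamp,target,type,data...` lines, and the built AMI appears on an `artifact` line whose `id` field looks like `us-west-2:ami-...`. The helper below is a minimal sketch that assumes a single-region build; the name `ami_ids_from_packer_log` is hypothetical and not part of this repo.

```sh
# ami_ids_from_packer_log (hypothetical helper): read `packer build -machine-readable`
# output on stdin and print "<builder> <ami-id>" for each artifact id line.
# Assumes a single-region build, so the id field looks like "us-west-2:ami-0123...".
ami_ids_from_packer_log() {
  awk -F, '$3 == "artifact" && $5 == "id" {
    id = $6
    sub(/^[^:]*:/, "", id)   # strip the "us-west-2:" region prefix
    print $2, id             # builder name, then AMI ID
  }'
}

# Example:
#   packer build -machine-readable consul.json | tee packer.log | ami_ids_from_packer_log
```

Keeping the full log with `tee` means you can still read the normal build output afterwards.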

# Build cluster

* Create terraform.tfvars and populate the following:

```hcl
# Required
vpc_name = "<username>"

# Optional
# Change the following values to modify the number of instances in each
# individual cluster.
num_servers = <number of Consul server nodes to deploy>
num_clients = <number of Consul client nodes to deploy>

# Instance type of the servers. The more clients you have, the bigger you
# want this to be. Defaults to t2.micro.
# server_instance_type = "c4.8xlarge"

# This AMI has been built and made available in twitch-vidcs-dev
#
# The ID of the AMI to run in the cluster. This should be an AMI built from the
# Packer template under examples/consul-ami/consul.json. To keep this example
# simple, we run the same AMI on both server and client nodes, but in real-world
# usage, your client nodes would also run your apps.
# ami_id = "ami-061305ee76e8efcf5"

# Set this to false if you want to allow public access from the world to the
# bastion host
# enable_corp_only_prefix = "false"
```
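
For repeated runs, a small shell function can write the same file non-interactively. This is a hypothetical helper (`tfvars` is not part of the repo); it only covers `vpc_name`, `num_servers`, `num_clients` and an optional `ami_id`, and it rejects non-numeric counts so the `<...>` placeholders above cannot slip through.

```sh
# tfvars (hypothetical helper): print a minimal terraform.tfvars to stdout.
# $1 = vpc_name, $2 = num_servers, $3 = num_clients, $4 = ami_id (optional)
tfvars() {
  # Reject empty or non-numeric counts so placeholder text never reaches Terraform.
  for n in "$2" "$3"; do
    case $n in
      ''|*[!0-9]*) echo "tfvars: counts must be integers, got '$n'" >&2; return 1 ;;
    esac
  done
  printf 'vpc_name    = "%s"\n' "$1"
  printf 'num_servers = %s\n' "$2"
  printf 'num_clients = %s\n' "$3"
  # Only pin the AMI when one is given; otherwise the configured default applies.
  if [ -n "${4:-}" ]; then
    printf 'ami_id      = "%s"\n' "$4"
  fi
}

# Example:
#   tfvars "$USER" 3 5 ami-061305ee76e8efcf5 > terraform.tfvars
```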

* Copy `files/test.json.sample` to `files/test.json`. Two important changes:
    * Set `run` to `true` to run load generation.
    * Set `cmds` to any arbitrary shell commands. They run in a loop, once every `run_time` interval.

    E.g.

    ```
        ...
        "run": true,
        "cmds": [
           "tc qdisc del dev $(awk '$2 == 00000000 {print $1}' /proc/net/route) root",
           "tc qdisc add dev $(awk '$2 == 00000000 {print $1}' /proc/net/route) root netem delay 25ms 10ms distribution normal loss 1% 25%"
        ],
        ...
    ```
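
The `awk` expression in those commands finds the instance's primary interface so the `tc` rules do not hard-code `eth0`. Pulled out into a function, the lookup looks roughly like this; `default_iface` is a hypothetical name, and this sketch stops at the first default route it finds.

```sh
# default_iface (hypothetical helper): print the interface that holds the IPv4 default route.
# In /proc/net/route, column 1 is the interface and column 2 the destination in hex;
# "00000000" (0.0.0.0) marks the default route. The quoted string keeps the
# comparison textual. An optional file argument makes the function easy to test.
default_iface() {
  awk '$2 == "00000000" { print $1; exit }' "${1:-/proc/net/route}"
}

# Example:
#   tc qdisc show dev "$(default_iface)"
```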

* mwinit
~~~
mwinit
~~~
* copy your Midway credentials into the AWS credentials file (`tempAWSCreds` is described here: https://wiki.twitch.com/display/~ccmolik/Isengard-ified+AWS+CLI+on+Mac). Note that `>` overwrites any existing `~/.aws/credentials`.
~~~
export AWS_PROFILE="twitch-vidcs-dev" && tempAWSCreds "$AWS_PROFILE" > ~/.aws/credentials
~~~
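
Before running Terraform, a quick check can catch an unset profile or a missing CLI early. A minimal sketch; `preflight` is a hypothetical helper, and `aws sts get-caller-identity` in the example is simply a cheap call that confirms the copied credentials work.

```sh
# preflight (hypothetical helper): fail early if AWS_PROFILE is unset or a required
# command is missing from PATH. Pass the commands to check as arguments.
preflight() {
  if [ -z "${AWS_PROFILE:-}" ]; then
    echo "preflight: AWS_PROFILE is not set" >&2
    return 1
  fi
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "preflight: '$cmd' not found in PATH" >&2
      return 1
    fi
  done
}

# Example:
#   preflight aws terraform && aws sts get-caller-identity
```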
* terraform init
~~~
terraform init
~~~
* terraform plan
~~~
terraform plan
~~~
* terraform apply
~~~
terraform apply
~~~
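
The init, plan and apply steps can also be scripted so that `apply` runs exactly the plan you reviewed, by saving it with `-out` and passing that file to `apply`. A minimal sketch; `run`, `deploy`, `DRY_RUN` and the `tfplan` file name are hypothetical choices, not part of this repo.

```sh
# run: execute a command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# deploy (hypothetical helper): init, save the plan to a file, ask for confirmation,
# then apply exactly that saved plan. Terraform does not prompt again when applying
# a saved plan file, so the prompt here is the only checkpoint.
deploy() {
  run terraform init -input=false || return 1
  run terraform plan -input=false -out=tfplan || return 1
  printf 'Apply this plan? [y/N] ' >&2
  read -r answer || answer=
  case $answer in
    y|Y|yes) run terraform apply -input=false tfplan ;;
    *) echo "deploy: plan not applied" >&2; return 1 ;;
  esac
}
```

Compared with a plain `terraform apply`, the saved plan guarantees nothing changes between what you read and what gets applied.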
* the bastion's public IP is printed in the Terraform output; you can also find it on the instance tagged 'bastion' in the AWS console. Look up the IP addresses of the internal consul hosts in the console as well.
~~~
bastion Public IP = 54.214.220.72

ssh -o "StrictHostKeyChecking no" -A -F /dev/null ubuntu@54.214.220.72
~~~
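
Instead of forwarding your agent and hopping manually, OpenSSH's `ProxyJump` option (OpenSSH 7.3 and later) can route connections straight through the bastion. The sketch below prints a config stanza; the function name, the `consul-bastion` alias and the `10.0.*` private address pattern are assumptions, so substitute a pattern matching your private subnets.

```sh
# ssh_config_for (hypothetical helper): print an ~/.ssh/config stanza that reaches
# the private consul hosts through the bastion.
# $1 = bastion public IP, $2 = host pattern matching the private subnets (assumed, e.g. "10.0.*")
ssh_config_for() {
  cat <<EOF
Host consul-bastion
    HostName $1
    User ubuntu
    StrictHostKeyChecking no

Host $2
    User ubuntu
    ProxyJump consul-bastion
    StrictHostKeyChecking no
EOF
}

# Example (10.0.1.23 is a placeholder private IP):
#   ssh_config_for 54.214.220.72 '10.0.*' >> ~/.ssh/config
#   ssh 10.0.1.23 consul members -wan   # list the servers in the WAN pool
```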

* Copy the test file to S3 to start or stop load generation in AWS (the same command is also printed in the Terraform output). To change behavior later, update the file in the S3 bucket; the servers and clients fetch it roughly every `run_time` interval.

```
aws s3 cp files/test.json s3://tf-io-consul-chaos-${vpc_name}/test.json
```

# Notes

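To induce latency or packet loss, put `tc` commands in `cmds`. Always start with `tc qdisc del dev eth0 root` to clear any previous settings (substitute your interface for `eth0`; the `cmds` example above looks it up from `/proc/net/route`).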
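
When experimenting by hand on a single host over SSH, the same clear-then-add pattern can be wrapped in a small function. `impair` and its `TC` override are hypothetical; `tc qdisc replace dev <iface> root netem ...` is an alternative that swaps the root qdisc in a single step.

```sh
# impair (hypothetical helper): clear any existing root qdisc on an interface,
# then add a netem qdisc.
# $1 = interface, remaining arguments = netem options (e.g. delay 100ms 10ms loss 1%).
# The delete can fail when no custom root qdisc is installed, so its status is ignored.
# Requires root. Set TC=echo to print the tc invocations instead of running them.
impair() {
  iface=$1
  shift
  ${TC:-tc} qdisc del dev "$iface" root 2>/dev/null || true
  ${TC:-tc} qdisc add dev "$iface" root netem "$@"
}

# Example:
#   impair eth0 delay 100ms 10ms distribution normal loss 1%
#   tc qdisc show dev eth0    # confirm the netem settings
```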

Copy-pasta Reference: [Use Linux Traffic Control as impairment node in a test environment](https://www.excentis.com/blog/use-linux-traffic-control-impairment-node-test-environment-part-2)

```
Latency and jitter
Using the netem qdisc we can emulate network latency and jitter on all outgoing packets. Some examples:

$ tc qdisc add dev eth0 root netem delay 100ms
<delay packets for 100ms>

$ tc qdisc add dev eth0 root netem delay 100ms 10ms
<delay packets with value from uniform [90ms-110ms] distribution>

$ tc qdisc add dev eth0 root netem delay 100ms 10ms 25%
<delay packets with value from uniform [90ms-110ms] distribution and 25% \
    correlated with value of previous packet>

$ tc qdisc add dev eth0 root netem delay 100ms 10ms distribution normal
<delay packets with value from normal distribution (mean 100ms, jitter 10ms)>

$ tc qdisc add dev eth0 root netem delay 100ms 10ms 25% distribution normal
<delay packets with value from normal distribution (mean 100ms, jitter 10ms) \
    and 25% correlated with value of previous packet>
Packet loss
Using the netem qdisc packet loss can be emulated as well. Some simple examples:

$ tc qdisc add dev eth0 root netem loss 0.1%
<drop packets randomly with probability of 0.1%>

$ tc qdisc add dev eth0 root netem loss 0.3% 25%
<drop packets randomly with probability of 0.3% and 25% correlated with drop \
    decision for previous packet>
But netem can even emulate more complex loss mechanism, such as the Gilbert-Elliot scheme. This scheme defines 2 states Good (or drop Gap) and Bad (or drop Burst). The drop chances of both states and the chances of switching between states are all provided. See section 3 of this paper for more info.

$ tc qdisc add dev eth0 root netem loss gemodel 1% 10% 70% 0.1%
<drop packets using Gilbert-Elliot scheme with probabilities \
    move-to-burstmode (p) of 1%, move-to-gapmode (r) of 10%, \
    drop-in-burstmode (1-h) of 70% and drop-in-gapmode (1-k) of 0.1%>
```
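
To sanity-check a `gemodel` setting before applying it, the long-run loss rate can be estimated from the two-state Gilbert-Elliott Markov chain: in steady state it spends a fraction r/(p+r) of packets in the Good state and p/(p+r) in the Bad state. Plugging in the example above (p = 1%, r = 10%, 1-h = 70%, 1-k = 0.1%) gives a rough worked estimate:

```math
\begin{aligned}
\pi_B &= \frac{p}{p + r} = \frac{0.01}{0.11} \approx 0.091, \qquad
\pi_G = \frac{r}{p + r} = \frac{0.10}{0.11} \approx 0.909 \\
P_{\text{loss}} &= \pi_G\,(1 - k) + \pi_B\,(1 - h) \approx 0.909 \times 0.001 + 0.091 \times 0.70 \approx 0.065
\end{aligned}
```

So roughly 6.5% of packets are dropped on average, concentrated in bursts: the Bad state lasts 1/r = 10 packets on average, and the Good state 1/p = 100 packets.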

If you change the SSH public keys in `bootstrap/add-keys.sh`, rebuild the consul AMI with Packer and update `ami_id` in variables_common.tf to the new AMI ID.

# Tear down cluster

Note that Terraform will not destroy the S3 bucket while it still contains objects. Empty and delete the bucket manually when you are done with it.

* terraform destroy
~~~~
terraform destroy
~~~~
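
Because a non-empty bucket blocks the destroy, the two steps can be combined: empty the bucket first, then destroy. The bucket name pattern comes from the `aws s3 cp` command above; the `teardown` function and its `RUN` override are a hypothetical sketch.

```sh
# teardown (hypothetical helper): empty the chaos S3 bucket so Terraform can delete it,
# then run terraform destroy. $1 = the vpc_name used in terraform.tfvars.
# Assumes the bucket is not versioned: in a versioned bucket `aws s3 rm --recursive`
# only adds delete markers, and the remaining object versions still block deletion.
# Set RUN=echo to print the commands instead of running them.
teardown() {
  bucket="tf-io-consul-chaos-$1"
  ${RUN:-} aws s3 rm "s3://$bucket" --recursive &&
    ${RUN:-} terraform destroy
}
```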

# TODO

* Centralized logging
