PostgreSQL
==========
This page documents the handful of clusters built on the [twitch_postgresql module](https://git.xarth.tv/systems/puppet/tree/master/modules/twitch_postgresql); these clusters are the d8a team's focus. Each cluster usually has a hiera configuration file named after the FQDN prefix of its hosts. For example, the rails-postgres cluster (SiteDB) is configured in [hiera/cluster/rails-postgres.yaml](https://git.xarth.tv/systems/puppet/blob/master/hiera/cluster/rails-postgres.yaml). That file, together with the [known data hosts](https://docs.google.com/a/justin.tv/spreadsheets/d/1DEBj2luk73MZfGcQblEejkKvUrT-iaXGtbaEFDixcsc/edit?usp=sharing) spreadsheet, is a good starting point for understanding how a cluster works.

Most of the clusters have a single master.
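The hiera naming convention can be sketched as a small lookup. This assumes, hypothetically, that a host's first DNS label is `<cluster>-<host id>` and that an explicit `$::cluster` fact (mentioned under Adding Replicas) overrides the name; `cluster_for`, `hiera_cluster_file` and the example host names are illustrative, not part of the puppet code.

```python
from typing import Optional


def cluster_for(fqdn: str, cluster_fact: Optional[str] = None) -> str:
    """Return the cluster name for a host (hypothetical naming scheme)."""
    if cluster_fact:
        # Assumption: an explicit $::cluster fact wins over the host name.
        return cluster_fact
    first_label = fqdn.split(".", 1)[0]
    # Assumption: the first label is "<cluster>-<host id>"; drop the host id.
    prefix, sep, _host_id = first_label.rpartition("-")
    if not sep or not prefix:
        raise ValueError(f"cannot derive a cluster prefix from {fqdn!r}")
    return prefix


def hiera_cluster_file(fqdn: str, cluster_fact: Optional[str] = None) -> str:
    """Map a host to its hiera/cluster/<prefix>.yaml file."""
    return f"hiera/cluster/{cluster_for(fqdn, cluster_fact)}.yaml"


if __name__ == "__main__":
    print(hiera_cluster_file("rails-postgres-0a1b2c.sfo01.example"))
```

Check the real hiera hierarchy before relying on this; some nodes are also customized per FQDN.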

Traffic Routing
---------------
Clients usually reach a database through an HAProxy instance listening on the client's localhost, so that if a back-end fails, HAProxy can route to a healthy node. You can find routing information in the HAProxy configuration of a client node or in the [HAProxy backend files](https://git.xarth.tv/systems/puppet/tree/master/modules/twitch/manifests/haproxy/backends). Most of the time HAProxy routes to a PGBouncer process rather than directly to a database: PostgreSQL listens on port 5432, while PGBouncer listens on one of ports 6542, 6543 or 6544. Each PGBouncer is configured to talk to a single back-end.
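When reading a client's HAProxy configuration, the port on each `server` line tells you whether the client goes to PGBouncer or straight to PostgreSQL. A minimal sketch that applies the port conventions above to HAProxy `server <name> <address>:<port> [options]` lines; the sample `listen` section, names and addresses are hypothetical.

```python
from typing import List, Tuple

POSTGRES_PORTS = {5432}
PGBOUNCER_PORTS = {6542, 6543, 6544}


def classify_port(port: int) -> str:
    if port in POSTGRES_PORTS:
        return "postgresql"
    if port in PGBOUNCER_PORTS:
        return "pgbouncer"
    return "unknown"


def server_targets(haproxy_cfg: str) -> List[Tuple[str, str, int, str]]:
    """Return (server name, host, port, kind) for each 'server' line."""
    targets = []
    for raw in haproxy_cfg.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop HAProxy comments
        fields = line.split()
        if len(fields) < 3 or fields[0] != "server":
            continue
        name, address = fields[1], fields[2]
        host, _, port_text = address.rpartition(":")
        if not host or not port_text.isdigit():
            continue  # no explicit port; skip rather than guess
        port = int(port_text)
        targets.append((name, host, port, classify_port(port)))
    return targets


# Hypothetical client-side config.
SAMPLE = """
listen sitedb_master
    bind 127.0.0.1:6543
    server dbproxy-a 10.0.0.11:6543 check
    server dbproxy-b 10.0.0.12:6543 check backup
"""

if __name__ == "__main__":
    for target in server_targets(SAMPLE):
        print(target)
```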

PGBouncer
---------
PGBouncer is a PostgreSQL-protocol-aware proxy. It lets us map many client connections onto a few server connections: each client holds a lightweight connection to PGBouncer, and the expensive PostgreSQL back-ends are reserved for connections that are currently running queries.
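The many-to-few mapping can be illustrated with a toy model (not PGBouncer's implementation): many client threads share a small pool of "back-ends", and each client holds one only while its query runs. This is closest to PGBouncer's transaction pooling; which `pool_mode` each of our instances actually uses is set in its configuration and is not assumed here. `POOL_SIZE` and `CLIENTS` are illustrative numbers.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 5   # expensive PostgreSQL back-ends (illustrative)
CLIENTS = 50    # cheap client connections held open by the proxy (illustrative)


class BackendPool:
    """Lend one of `size` back-ends to a client only while its query runs."""

    def __init__(self, size: int) -> None:
        self._slots = threading.BoundedSemaphore(size)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0

    def run_query(self, client_id: int) -> int:
        with self._slots:  # block until a back-end is free
            with self._lock:
                self.in_use += 1
                self.peak = max(self.peak, self.in_use)
            try:
                time.sleep(0.001)  # stand-in for executing the query
                return client_id
            finally:
                with self._lock:
                    self.in_use -= 1  # back-end goes back to the pool


def simulate(pool_size: int = POOL_SIZE, clients: int = CLIENTS) -> BackendPool:
    pool = BackendPool(pool_size)
    # One thread per client connection; all of them issue a query at once.
    with ThreadPoolExecutor(max_workers=clients) as executor:
        list(executor.map(pool.run_query, range(clients)))
    return pool


if __name__ == "__main__":
    pool = simulate()
    print(f"{CLIENTS} clients were served by at most {pool.peak} back-ends")
```

All 50 clients get answers, but no more than five back-ends are ever busy at once.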

PGBouncer usually runs on the same host as PostgreSQL and listens on one of ports 6542, 6543 or 6544. Some clusters (e.g., SiteDB and UsherDB) also have a separate cluster of PGBouncer hosts pointing at the current master.

Each PGBouncer is specialized for its database by one of the [instance](https://git.xarth.tv/systems/puppet/tree/master/modules/twitch_pgbouncer/manifests/instance) classes in the [twitch_pgbouncer module](https://git.xarth.tv/systems/puppet/tree/master/modules/twitch_pgb).
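As a rough picture of what such a specialization produces, here is a sketch that renders a `pgbouncer.ini` for one database with Python's `configparser`. The setting names (`listen_addr`, `listen_port`, `pool_mode`, `default_pool_size`, `max_client_conn`) are real PGBouncer options, but the database name, host, sizes and `pool_mode = transaction` are assumptions for illustration; the real instance classes render their own templates and likely set more options (authentication, for example).

```python
import configparser
import io


def render_pgbouncer_ini(db_name: str, backend_host: str, listen_port: int,
                         pool_size: int = 20, max_clients: int = 1000) -> str:
    """Render a hypothetical single-back-end pgbouncer.ini."""
    config = configparser.ConfigParser()
    # One [databases] entry: each PGBouncer talks to a single back-end.
    config["databases"] = {
        db_name: f"host={backend_host} port=5432 dbname={db_name}",
    }
    config["pgbouncer"] = {
        "listen_addr": "*",
        "listen_port": str(listen_port),    # one of 6542, 6543, 6544
        "pool_mode": "transaction",         # assumption, not from puppet
        "default_pool_size": str(pool_size),
        "max_client_conn": str(max_clients),
    }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_pgbouncer_ini("sitedb", "10.0.0.10", 6543))
```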

PGBouncer Fail-Over
-------------------
If a master fails, or you want to do a manual fail-over, on a cluster where a PGBouncer cluster proxies to the master:
* choose a new master and make sure it is in the cluster, replicating from the current master. It can be:
  * a hot spare or replica already in the cluster *or*
  * a new replica added via the Adding Replicas process below
* set the twitch_postgresql master_db variable in hiera/cluster to point at the new master
* set the twitch_postgresql master_db variable in hiera/fqdn for both the old and the new master to point at the old master
* make sure any special config_entries like `archive_mode` and `archive_command` are set on the new master as needed
* deploy this puppet configuration across the PostgreSQL cluster
* restart the read replicas so they switch replication master
* ensure that the read replicas point at the new master, so that the cluster looks similar to [this diagram](https://git.xarth.tv/twitch/docs/blob/master/d8a/postgresql/failover.png)
* on the new master, verify that it is replicating to the read replicas with `select client_addr, application_name, backend_start, state, round((pg_xlog_location_diff(sent_location, replay_location))/1024) as kb_behind from pg_stat_replication order by application_name`
* set the twitch_pgbouncer master_host variable in hiera/cluster to point at the new master
* deploy this puppet configuration across the PGBouncer cluster
* **From this point, writes fail: PGBouncer now sends them to the new master, which is still a read-only replica**
* promote the new master with `sudo -u postgres /usr/lib/postgresql/${postgresql_version}/bin/pg_ctl promote -D /etc/postgresql/${postgresql_version}/main`
* **Once the promotion completes, writes succeed again**
* consider adding a new replica if you cannibalized the cluster for the new master
* unset the twitch_postgresql master_db variable in hiera/fqdn for the new master
* deploy the new puppet configuration to the new master
* decommission the old master
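The replication check before repointing PGBouncer can be made explicit. A sketch, assuming the rows of the `pg_stat_replication` query have already been fetched (for example with psycopg2) and that replicas are identified by `client_addr`; the 1024 kB threshold and the addresses are hypothetical. It also builds the query for PostgreSQL 10 and later, where `pg_xlog_location_diff`, `sent_location` and `replay_location` were renamed to `pg_wal_lsn_diff`, `sent_lsn` and `replay_lsn`.

```python
from typing import Iterable, List, NamedTuple, Optional, Set


class ReplicationRow(NamedTuple):
    client_addr: str
    application_name: str
    state: str
    kb_behind: Optional[int]  # may be NULL before the replica reports replay


def replication_query(server_version_num: int) -> str:
    """The check query above, using the names for the server's version."""
    if server_version_num >= 100000:
        diff = "pg_wal_lsn_diff(sent_lsn, replay_lsn)"
    else:
        diff = "pg_xlog_location_diff(sent_location, replay_location)"
    return ("select client_addr, application_name, backend_start, state, "
            f"round(({diff})/1024) as kb_behind "
            "from pg_stat_replication order by application_name")


def replication_problems(rows: Iterable[ReplicationRow], expected_addrs: Set[str],
                         max_kb_behind: int = 1024) -> List[str]:
    """List problems blocking the switch; an empty list means every expected
    replica is streaming from this master within max_kb_behind kB."""
    problems = []
    seen = set()
    for row in rows:
        seen.add(row.client_addr)
        if row.state != "streaming":
            problems.append(f"{row.client_addr}: state is {row.state}")
        elif row.kb_behind is None:
            problems.append(f"{row.client_addr}: lag unknown")
        elif row.kb_behind > max_kb_behind:
            problems.append(f"{row.client_addr}: {row.kb_behind} kB behind")
    for addr in sorted(expected_addrs - seen):
        problems.append(f"{addr}: not replicating from this master")
    return problems
```

`SHOW server_version_num` on the new master gives the value for `replication_query`. Only move on to the PGBouncer change once `replication_problems` returns an empty list.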

HAProxy Fail-Over
-----------------
If a master fails, or you want to do a manual fail-over, on a cluster with no PGBouncer in front of the PostgreSQL master:
* choose a new master and make sure it is in the cluster, replicating from the current master. It can be:
  * a hot spare or replica already in the cluster *or*
  * a new replica added via the Adding Replicas process below
* set the twitch_postgresql master_db variable in hiera/cluster to point to the new master
* set the twitch_postgresql master_db variable in hiera/fqdn for both the old and the new master to point at the old master
* make sure any special config_entries like `archive_mode` and `archive_command` are set on the new master as needed
* deploy this puppet configuration across the PostgreSQL cluster
* restart the read replicas so they switch replication master
* ensure that the read replicas point at the new master, so that the cluster looks similar to [this diagram](https://git.xarth.tv/twitch/docs/blob/master/d8a/postgresql/failover.png)
* on the new master, verify that it is replicating to the read replicas with `select client_addr, application_name, backend_start, state, round((pg_xlog_location_diff(sent_location, replay_location))/1024) as kb_behind from pg_stat_replication order by application_name`
* adjust the appropriate HAProxy [back-end](https://git.xarth.tv/systems/puppet/tree/master/modules/twitch/manifests/haproxy/listeners) to point at the new master
* deploy this puppet configuration to the clients
* **From this point, writes fail: HAProxy now sends them to the new master, which is still a read-only replica**
* promote the new master with `sudo -u postgres /usr/lib/postgresql/${postgresql_version}/bin/pg_ctl promote -D /etc/postgresql/${postgresql_version}/main`
* **Once the promotion completes, writes succeed again**
* consider adding a new replica if you cannibalized the cluster for the new master
* unset the twitch_postgresql master_db variable in hiera/fqdn for the new master
* deploy the new puppet configuration to the new master
* decommission the old master
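The "adjust the HAProxy back-end" step amounts to changing the address on the relevant `server` lines while keeping their ports and options. A sketch of that edit on raw HAProxy text, for illustration only: in practice the change is made in the puppet-managed listener files, and the section name and addresses below are hypothetical.

```python
from typing import Tuple


def repoint_backend(cfg: str, old_host: str, new_host: str) -> Tuple[str, int]:
    """Return (rewritten config, number of server lines changed)."""
    out_lines = []
    changed = 0
    for line in cfg.splitlines():
        fields = line.split()
        # HAProxy syntax: server <name> <address>[:port] [options...]
        if len(fields) >= 3 and fields[0] == "server":
            host, sep, port = fields[2].rpartition(":")
            if not sep:
                host, port = fields[2], ""  # address without an explicit port
            if host == old_host:
                fields[2] = f"{new_host}{sep}{port}"
                indent = line[: len(line) - len(line.lstrip())]
                line = indent + " ".join(fields)
                changed += 1
        out_lines.append(line)
    text = "\n".join(out_lines)
    return (text + "\n" if cfg.endswith("\n") else text), changed


if __name__ == "__main__":
    cfg = "listen sitedb_master\n    bind 127.0.0.1:5433\n    server pg-master 10.0.0.1:5432 check\n"
    print(repoint_backend(cfg, "10.0.0.1", "10.0.0.2")[0])
```

A changed-line count of zero is a useful signal that the old master's address was not where you expected it.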

Adding Replicas
---------------
To add a replica:
* provision a host for the job that is either named for the cluster or has the right `$::cluster` fact
* add the node to the twitch_postgresql replication_hosts list in hiera/cluster
* add the node to hiera/clean/true.yaml
* run puppet on the host to get basic configuration
* ensure that /var/lib/postgresql is on a mount with enough space for the data
* run puppet on master so it knows about the new node
* as the postgres user on the new replica
  * `service postgresql stop`
  * `bin/provision-replica ${master}`
* once that completes, run puppet again on the replica
* verify that the new replica is connected by running this query on the master (its `state` should reach `streaming`): `select client_addr, application_name, backend_start, state, round((pg_xlog_location_diff(sent_location, replay_location))/1024) as kb_behind from pg_stat_replication order by backend_start`
* run puppet on the cluster hosts so they know about the replica
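The "enough space" check on /var/lib/postgresql can be done before running `bin/provision-replica`. A sketch using `shutil.disk_usage`, assuming you size the master with `pg_database_size()` summed over all databases. That sum excludes WAL and other files, so the 1.5 headroom factor is an arbitrary, hypothetical margin rather than a team standard.

```python
import shutil

# Run on the master (as postgres) to estimate how much data the replica copies.
MASTER_SIZE_QUERY = "select sum(pg_database_size(datname)) from pg_database"

HEADROOM = 1.5  # hypothetical safety factor for WAL and growth


def has_room_for_replica(master_bytes: int, path: str = "/var/lib/postgresql",
                         headroom: float = HEADROOM) -> bool:
    """True if the filesystem holding `path` can take a copy of the master."""
    free = shutil.disk_usage(path).free
    return free >= master_bytes * headroom


if __name__ == "__main__":
    master_bytes = 500 * 1024 ** 3  # example value; use MASTER_SIZE_QUERY's result
    print(has_room_for_replica(master_bytes))
```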

Clusters
--------
### SiteDB
* runs on the rails-postgres nodes
* is implemented in the [twitch_postgresql::instance::sitedb class](https://git.xarth.tv/systems/puppet/blob/master/modules/twitch_postgresql/manifests/instance/sitedb.pp)
* is primarily configured as the [rails-postgres cluster](https://git.xarth.tv/systems/puppet/blob/master/hiera/cluster/rails-postgres.yaml) through hiera. **NOTE:** Many of the nodes are customized by FQDN in hiera.
* has standby nodes ready for promotion
* uses the [twitch_pgbouncer::instance::site PGBouncer](https://git.xarth.tv/systems/puppet/blob/master/modules/twitch_pgbouncer/manifests/instance/site.pp) on replicas
* has remote PGBouncers proxying to master which are
  * implemented in the [twitch_pgbouncer::instance::proxy class](https://git.xarth.tv/systems/puppet/blob/master/modules/twitch_pgbouncer/manifests/instance/proxy.pp)
  * configured as part of the [dbproxy cluster](https://git.xarth.tv/systems/puppet/blob/master/hiera/cluster/dbproxy.yaml)

### UsherDB
* runs on the usher-postgres nodes
* is implemented by the [twitch_postgresql::instance::servicedb class](https://git.xarth.tv/systems/puppet/blob/master/modules/twitch_postgresql/manifests/instance/servicedb.pp)
* is primarily configured as the [usher-postgres cluster](https://git.xarth.tv/systems/puppet/blob/master/hiera/cluster/usher-postgres.yaml) through hiera
* uses the [twitch_pgbouncer::instance::usher PGBouncer](https://git.xarth.tv/systems/puppet/blob/master/modules/twitch_pgbouncer/manifests/instance/usher.pp)
* has remote PGBouncers proxying to master which are
  * implemented in the [twitch_pgbouncer::instance::usher class](https://git.xarth.tv/systems/puppet/blob/master/modules/twitch_pgbouncer/manifests/instance/usher.pp)
  * configured as part of the [usher-pgbouncer cluster](https://git.xarth.tv/systems/puppet/blob/master/hiera/cluster/usher-pgbouncer.yaml)

### TMIDB
* runs on the tmi-postgres nodes
* is implemented by the [twitch_postgresql::instance::servicedb class](https://git.xarth.tv/systems/puppet/blob/master/modules/twitch_postgresql/manifests/instance/servicedb.pp)
* is primarily configured as the [tmi-postgres cluster](https://git.xarth.tv/systems/puppet/blob/master/hiera/cluster/tmi-postgres.yaml) through hiera
* uses the [twitch_pgbouncer::instance::groupchat PGBouncer](https://git.xarth.tv/systems/puppet/blob/master/modules/twitch_pgbouncer/manifests/instance/groupchat.pp)

### DiscoveryDB
* is implemented by the [twitch_postgresql::instance::servicedb class](https://git.xarth.tv/systems/puppet/blob/master/modules/twitch_postgresql/manifests/instance/servicedb.pp)
* is primarily configured as the [discovery-postgres cluster](https://git.xarth.tv/systems/puppet/blob/master/hiera/cluster/discovery-postgres.yaml) through hiera
