# SiteDB
The Twitch central database currently runs in two regions, with the
master in AWS us-west2. This document covers the configuration and
operation of the cluster.

The cluster is known as rails-postgres in Ganglia and Puppet.

Most of the functionality on each node, such as the cron jobs, is
owned by the postgres user. The postgres role is a superuser in the
database and can connect only from the local host, as the postgres
system user.

## Topology
SiteDB is a traditional single-master PostgreSQL cluster.

<a name="master"></a>
**Master:** `rails-postgres-9d73c245.production.twitch-web-aws.us-west2.justin.tv`

To list the replicas streaming from the master, run this query on the
master:

`select client_addr, application_name, backend_start, state, round((pg_xlog_location_diff(sent_location, replay_location))/1024) as kb_behind from pg_stat_replication order by application_name`
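For reference, `kb_behind` is the byte distance between two WAL locations, divided by 1024 and rounded. In 9.3 a location is printed as two hexadecimal halves, `hi/lo`, which `pg_xlog_location_diff` treats as the 64-bit value `(hi << 32) | lo`. The Python sketch below reproduces that arithmetic. It can be used, for example, to sanity-check numbers copied out of `pg_stat_replication`. The function names are hypothetical helpers, not tools we run.

```python
def xlog_to_bytes(location: str) -> int:
    """Convert a 9.3-style WAL location such as '16/B374D848' to a byte offset."""
    hi, lo = location.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def kb_behind(sent_location: str, replay_location: str) -> int:
    """Mirror round(pg_xlog_location_diff(sent, replay) / 1024) from the query above."""
    diff = xlog_to_bytes(sent_location) - xlog_to_bytes(replay_location)
    # PostgreSQL's round() on numeric rounds halves away from zero,
    # so do the same with integer arithmetic instead of Python's round().
    kb, remainder = divmod(abs(diff), 1024)
    if remainder >= 512:
        kb += 1
    return kb if diff >= 0 else -kb


# Illustrative values: the replica has replayed 2 MiB less WAL than it was sent.
print(kb_behind("16/B3800000", "16/B3600000"))  # 2048
```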

### Hot Spares
Most of the streaming replicas are in us-west2 and serve traffic to
the website. Two hot spares can be promoted to master if needed.

<a name="hot_spares"></a>
* **Hot Spare:** `rails-postgres-714628b6.production.twitch-web-aws.us-west2.justin.tv`
* **Hot Spare:** `rails-postgres-8a58e952.production.twitch-web-aws.us-west2.justin.tv`

### "Region Master"
We use the term "Region Master" for a box that is not actually the
master but is the root of a replication tree in another region. Two
replicas in sfo01 stream from the [master](#master), do not serve
traffic, and can act as a region master. One of them is the live root
in sfo01, and the other is a spare in case we lose the live region
master.

<a name="region_masters"></a>
* **sfo01 Region Master:** `rails-postgres-781339.sfo01.justin.tv`
* **sfo01 Spare Region Master:** `rails-postgres-7e9c10.sfo01.justin.tv`
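The hierarchy is easiest to see as a map from each replica to the host it streams from. The sketch below uses hypothetical short names in place of the real hostnames. It walks a replica's chain up to the master and shows that recovering from a failed region master amounts to re-pointing that node's children at the spare.

```python
# Hypothetical short names standing in for the real hostnames:
# each key streams WAL from its value; the master has no entry.
UPSTREAM = {
    "hot-spare-1": "master",
    "us-west2-replica-1": "master",
    "sfo01-region-master": "master",
    "sfo01-spare-region-master": "master",
    "sfo01-replica-1": "sfo01-region-master",
    "sfo01-replica-2": "sfo01-region-master",
}


def replication_chain(host: str, upstream: dict) -> list:
    """Return the chain from `host` up to the root it ultimately streams from."""
    chain = [host]
    while chain[-1] in upstream:
        parent = upstream[chain[-1]]
        if parent in chain:
            raise ValueError(f"replication cycle through {parent}")
        chain.append(parent)
    return chain


def repoint(upstream: dict, failed: str, replacement: str) -> dict:
    """Drop `failed` and attach its former children to `replacement`."""
    return {
        host: (replacement if parent == failed else parent)
        for host, parent in upstream.items()
        if host != failed
    }


print(replication_chain("sfo01-replica-1", UPSTREAM))
# ['sfo01-replica-1', 'sfo01-region-master', 'master']
```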

### Read Replicas
The read replicas are not partitioned by application. Different
applications run in sfo01 and us-west2, so the workload varies by
region, but within a region every read replica sees essentially the
same workload.

Current Replicas:
<a name="read_replicas"></a>
* us-west2
  * rails-postgres-08a9fed2.production.twitch-web-aws.us-west2.justin.tv
  * rails-postgres-09a9fed3.production.twitch-web-aws.us-west2.justin.tv
  * rails-postgres-0c442acb.production.twitch-web-aws.us-west2.justin.tv
  * rails-postgres-13bbcfd4.production.twitch-web-aws.us-west2.justin.tv
  * rails-postgres-72442ab5.production.twitch-web-aws.us-west2.justin.tv
  * rails-postgres-73442ab4.production.twitch-web-aws.us-west2.justin.tv
  * rails-postgres-a8f8af72.production.twitch-web-aws.us-west2.justin.tv
  * rails-postgres-b2da646a.production.twitch-web-aws.us-west2.justin.tv
  * rails-postgres-dd58e905.production.twitch-web-aws.us-west2.justin.tv
  * rails-postgres-de58e906.production.twitch-web-aws.us-west2.justin.tv
  * rails-postgres-df58e907.production.twitch-web-aws.us-west2.justin.tv
  * rails-postgres-f7f8af2d.production.twitch-web-aws.us-west2.justin.tv
* sfo01
  * rails-postgres-755910.sfo01.justin.tv
  * rails-postgres-780e1d.sfo01.justin.tv
  * rails-postgres-7c2299.sfo01.justin.tv
  * rails-postgres-7c22a5.sfo01.justin.tv
  * rails-postgres-7e88e0.sfo01.justin.tv
  * rails-postgres-7e9c18.sfo01.justin.tv
  * rails-postgres-7e9cc4.sfo01.justin.tv

## PGBouncer
SiteDB uses PGBouncer in transaction mode for connection pooling. Two
separate PGBouncer processes listen on ports `6543` and `6542` and
route connections to PostgreSQL on port `5432`.

PGBouncer instances run on the read replicas and on dedicated proxy
nodes that route queries to the [master](#master).

### On Replicas
On a read replica, PGBouncer runs on the local node and sends every
query to the local PostgreSQL instance. The bouncers can be
controlled through the init scripts
`/etc/init.d/pgbouncer_transaction1` and
`/etc/init.d/pgbouncer_transaction2`. Configuration lives in the
`/etc/pgbouncer/transaction*` files.
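For orientation, a minimal transaction-pooling configuration for one of these local bouncers could be rendered as below. This is a sketch, not a copy of our files. The port-to-process mapping (`transaction1` on `6543`, `transaction2` on `6542`) is an assumption, and the `*` fallback entry is used instead of guessing database names. Real configs also need authentication, pool sizing, and logging settings, which are omitted here.

```python
import configparser
import io


def render_replica_bouncer(listen_port: int) -> str:
    """Render a minimal pgbouncer.ini that pools in transaction mode in front of local PostgreSQL."""
    cfg = configparser.ConfigParser()
    cfg.optionxform = str  # keep keys exactly as written
    # "*" is PGBouncer's fallback entry: any requested database goes to local PostgreSQL on 5432.
    cfg["databases"] = {"*": "host=127.0.0.1 port=5432"}
    cfg["pgbouncer"] = {
        "listen_addr": "*",
        "listen_port": str(listen_port),
        "pool_mode": "transaction",
        # auth_type, auth_file, pool sizes, logfile and pidfile are omitted in this sketch.
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


# Assumed mapping of init-script names to ports.
for name, port in (("transaction1", 6543), ("transaction2", 6542)):
    print(f"# {name}")
    print(render_replica_bouncer(port))
```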

### us-west2 rails-pgbouncer cluster
On the us-west2 rails-pgbouncer cluster, PGBouncer sends every query
to the currently configured master server. Configuration lives in the
`/etc/pgbouncer/proxy*` files. The bouncers can be controlled through
the init scripts `/etc/init.d/pgbouncer_proxy1` and
`/etc/init.d/pgbouncer_proxy2`.
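During a master failover, the `[databases]` entries in these proxy files have to be pointed at the new master's IP address before the bouncers are reloaded. The same applies to the same-named files on the sfo01 dbproxy cluster. Below is a minimal sketch of that edit, assuming each entry carries a `host=` keyword. It rewrites only the `[databases]` section and leaves comments and other settings untouched, which a round trip through `configparser` would not. The function name and the sample IP addresses are hypothetical.

```python
import re

HOST_RE = re.compile(r"\bhost=\S+")


def repoint_databases(ini_text: str, new_host: str) -> str:
    """Rewrite host= in every [databases] entry; leave all other lines as they are."""
    out = []
    in_databases = False
    for line in ini_text.splitlines(keepends=True):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_databases = stripped.lower() == "[databases]"
        elif in_databases and stripped and not stripped.startswith((";", "#")):
            line = HOST_RE.sub(lambda _m: "host=" + new_host, line)
        out.append(line)
    return "".join(out)


sample = (
    "[databases]\n"
    "; previous master was host=10.0.0.5\n"
    "* = host=10.0.0.5 port=5432\n"
    "\n"
    "[pgbouncer]\n"
    "listen_port = 6543\n"
)
print(repoint_databases(sample, "10.0.0.9"))
```

After writing the files, the bouncers still need the `reload` described in the failover procedure.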

Each node in the rails-pgbouncer cluster also runs HAProxy on port
`12005`, which does protocol-agnostic (TCP-level) proxying to the
local PGBouncer. No traffic should be going through it at this time.
HAProxy can be controlled with the init script
`/etc/init.d/haproxy_backend` and is configured in
`/etc/haproxy/haproxy-backend.conf`.

#### Current membership:
<a name="rails-pgbouncers"></a>
* rails-pgbouncer-0bb548d3.production.twitch-web-aws.us-west2.justin.tv
* rails-pgbouncer-18c9b8df.production.twitch-web-aws.us-west2.justin.tv
* rails-pgbouncer-3c74c0fb.production.twitch-web-aws.us-west2.justin.tv
* rails-pgbouncer-98f2a542.production.twitch-web-aws.us-west2.justin.tv
* rails-pgbouncer-a810a170.production.twitch-web-aws.us-west2.justin.tv
* rails-pgbouncer-db1b7401.production.twitch-web-aws.us-west2.justin.tv

### sfo01 dbproxy cluster
On the sfo01 dbproxy cluster, PGBouncer sends every query to the
currently configured master server. Configuration lives in the
`/etc/pgbouncer/proxy*` files. The bouncers can be controlled through
the init scripts `/etc/init.d/pgbouncer_proxy1` and
`/etc/init.d/pgbouncer_proxy2`.

Each node in the dbproxy cluster also runs HAProxy on port `12005`,
which does protocol-agnostic (TCP-level) proxying to a
rails-pgbouncer instance in us-west2. HAProxy can be controlled with
the init script `/etc/init.d/haproxy_backend` and is configured in
`/etc/haproxy/haproxy-backend.conf`.
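A quick way to confirm that HAProxy (port `12005`) or a PGBouncer process (ports `6543`/`6542`) is accepting connections is a plain TCP connect. The sketch below uses only the Python standard library; the helper name is hypothetical. A successful connect to `12005` only shows that HAProxy is listening. In TCP mode it may still fail to reach the us-west2 rails-pgbouncer backend afterwards.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port is accepted within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False
```

For example, `port_open("dbproxy-7e8f88.sfo01.justin.tv", 12005)` checks one dbproxy node's HAProxy listener.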

#### Current membership:
<a name="dbproxies"></a>
* dbproxy-7e8f88.sfo01.justin.tv
* dbproxy-a000ad.sfo01.justin.tv
* dbproxy-a0c35c.sfo01.justin.tv

## Hardware
### sfo01
The sfo01 machines have 32 cores, 128GB of memory, and a hardware
RAID 10 array mounted at `/mnt/media` and formatted with XFS,
providing 3.5T of usable space.

### us-west2
All SiteDB machines in us-west2 are i2.8xlarge instances with the
instance SSDs in a software RAID 10 array mounted at `/mnt/media` and
formatted with XFS, providing 3.0T of usable space.
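As a rough check on the us-west2 figure, assume the standard i2.8xlarge instance storage of eight 800 GB SSDs. RAID 10 mirrors every block, so half of the raw capacity is usable:

```math
\text{usable} = \frac{8 \times 800\ \text{GB}}{2} = 3200\ \text{GB} = \frac{3.2 \times 10^{12}\ \text{B}}{2^{40}\ \text{B/TiB}} \approx 2.9\ \text{TiB}
```

This is close to the 3.0T figure above.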

## Recover From Failed Master
If the [master](#master) fails, we need to recover the cluster. Check
that the hot spares are reasonably caught up (they most likely are).
Then choose the hot spare with the latest WAL position on the
timeline as the future master.

**TODO:** Add instructions on this. It looks like we can adapt the approach in [this post](http://www.databasesoup.com/2012/10/determining-furthest-ahead-replica.html).
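Until the TODO is filled in, one possible approach is to run `select pg_last_xlog_replay_location();` on each hot spare and pick the largest result. This is an assumption, not yet our documented procedure. The locations must be compared as 64-bit positions, not as strings: `F/0` sorts after `10/0` as text but is behind it. Below is a Python sketch of the comparison, with hypothetical short host names and made-up locations.

```python
def xlog_to_bytes(location: str) -> int:
    """Convert a 9.3-style WAL location such as '16/B374D848' to a byte offset."""
    hi, lo = location.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def furthest_ahead(positions: dict) -> str:
    """Return the host whose WAL location is furthest ahead."""
    if not positions:
        raise ValueError("no hot spares to compare")
    return max(positions, key=lambda host: xlog_to_bytes(positions[host]))


# Made-up replay locations for the two hot spares.
print(furthest_ahead({"spare-1": "F/FFFF0000", "spare-2": "10/00001000"}))  # spare-2
```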

1. Attach the [region masters](#region_masters), the us-west2
   [read replicas](#read_replicas), and the other hot spare to the
   future master. To do this, edit
   `/var/lib/postgresql/9.3/main/recovery.conf` so that
   `primary_conninfo` points at the future master by hostname.
2. Restart PostgreSQL with `service postgresql restart` as the
   postgres user on those hosts to pick up the new configuration.
3. Reconfigure the [dbproxy](#dbproxies) and
   [rails-pgbouncer](#rails-pgbouncers) clusters to point at the
   future master. To do this, edit the `[databases]` section of
   `/etc/pgbouncer/proxy{1,2}.ini` to point at the future master's IP
   address.
4. Reload all the PGBouncer instances on the [dbproxy](#dbproxies) and
   [rails-pgbouncer](#rails-pgbouncers) clusters with `sudo
   /etc/init.d/pgbouncer_proxy1 reload && sudo
   /etc/init.d/pgbouncer_proxy2 reload` to pick up the new
   configuration.
5. Promote the future master to master with
   `/usr/lib/postgresql/9.3/bin/pg_ctl promote -D
   /etc/postgresql/9.3/main` as the postgres user.
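6. On the new master, as the postgres user in its home directory
   (`/var/lib/postgresql`), copy the new timeline history files to the
   backup host with `scp 9.3/main/pg_xlog/*.history
   barman@postgres-backup-4cf1dc.sfo01.justin.tv:~/backups/sitedb/history`.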
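Step 1 is a one-line edit on each host. Below is a minimal Python sketch of it, assuming `primary_conninfo` holds a single-quoted conninfo string with a `host=` keyword. The sample file contents and hostnames are hypothetical. The sketch swaps only the host and keeps the other keywords, such as the replication user, as they were.

```python
import re

CONNINFO_RE = re.compile(r"^([ \t]*primary_conninfo[ \t]*=[ \t]*)'([^']*)'", re.MULTILINE)
HOST_RE = re.compile(r"\bhost=\S+")


def set_upstream(recovery_conf: str, new_host: str) -> str:
    """Point primary_conninfo at new_host, keeping its other conninfo keywords."""

    def _replace(match):
        conninfo, n = HOST_RE.subn(lambda _m: "host=" + new_host, match.group(2))
        if n == 0:
            raise ValueError("primary_conninfo has no host= keyword")
        return f"{match.group(1)}'{conninfo}'"

    new_text, count = CONNINFO_RE.subn(_replace, recovery_conf)
    if count != 1:
        raise ValueError(f"expected one primary_conninfo line, found {count}")
    return new_text


old = (
    "standby_mode = 'on'\n"
    "primary_conninfo = 'host=old-master.example port=5432 user=replicator'\n"
)
print(set_upstream(old, "new-master.example"), end="")
```

It is also worth confirming that the existing files set `recovery_target_timeline = 'latest'`. Without it, a 9.3 standby does not switch to the new timeline that the future master creates when it is promoted in step 5.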

## Recover From Failed Region Master
1. Attach the sfo01 [read replicas](#read_replicas) to the other
   [region master](#region_masters). To do this, edit
   `/var/lib/postgresql/9.3/main/recovery.conf` so that
   `primary_conninfo` points at the other region master by hostname.
2. Restart PostgreSQL with `service postgresql restart` as the
   postgres user on those hosts to pick up the new configuration.

## Data Provisioning
We use an in-house script to provision data on our PostgreSQL nodes.
To connect a box to a cluster, as the postgres user, stop PostgreSQL
on the box and run

`~$ bin/provision-replica ${upstream_host}`

where `upstream_host` is the host that will provide the base backup.
Once the base backup completes, `provision-replica` configures the
local PostgreSQL instance to stream from `upstream_host` and restarts
it.
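This document does not describe what `provision-replica` does internally. For comparison, a manual clone with the stock 9.3 tooling would be a `pg_basebackup` run against `upstream_host`. The Python sketch below only builds that command line. It is an assumption about an equivalent procedure, not the script's actual implementation, and the `replicator` role name is hypothetical.

```python
import shlex


def basebackup_command(upstream_host: str,
                       data_dir: str = "/var/lib/postgresql/9.3/main",
                       user: str = "replicator") -> list:
    """Build a pg_basebackup command that clones upstream_host into data_dir."""
    return [
        "/usr/lib/postgresql/9.3/bin/pg_basebackup",
        "-h", upstream_host,
        "-U", user,        # hypothetical replication role
        "-D", data_dir,    # must be empty (and PostgreSQL stopped) first
        "-X", "stream",    # stream the WAL needed by the backup alongside it
        "-R",              # write a recovery.conf that streams from upstream_host
        "-P",              # report progress
    ]


print(shlex.join(basebackup_command("rails-postgres-781339.sfo01.justin.tv")))
```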
