# Puppet CI

Puppet-CI is a rewritten [Puppet](https://puppetlabs.com/) repository that:

* is tested under continuous integration via Jenkins, which runs the agent at
  regular intervals, and completes in a single run.
* has no `jtv` user.
* has no clean/dirty distinction: all hosts built from Puppet-CI are "clean".
* supports [auto-generated Nagios](#auto-generated-nagios) configuration.
* will soon include remote logging support.


## Contents

1. [Getting Started](#getting-started)
2. [Testing with Vagrant](#testing-with-vagrant)
3. [Adding Your Service](#adding-your-service)
4. [Auto-generated Nagios](#auto-generated-nagios)


## Getting Started

### Testing With Vagrant

There is a bit of magic involved in setting up Puppet-CI facts, so for now you must use the following script to set up a Vagrant environment properly:

```bash
$ ./tests/functional <twitch_role> <twitch_environment>
```

By default, the `Vagrantfile` uses the settings specified in `settings.yaml.example`. To customize your Vagrant settings, copy the example file:

```bash
$ cp settings.yaml.example settings.yaml
```

Then make your changes in `settings.yaml`. To change the base image, edit the `vagrant_vm_box` entry.

### AMI

* Use `ami-1de7477d` in us-west-2 for precise.
* Use `ami-31ea4a51` in us-west-2 for xenial.

### Terraform

See the [puppet-ci Terraform module](https://git-aws.internal.justin.tv/systems/terraform/tree/master/modules/puppet-ci).

### Building new AMIs

    bash packer.sh

Based on `release/packer-templates`.

## Adding Your Service

Follow our [Puppet & Consul](https://git-aws.internal.justin.tv/twitch/docs/blob/master/release/puppet-consul.md) guide to get started.

## Auto-generated Nagios

Leveraging team-owned Nagios servers, this repo supports automatically creating
and managing your hosts and their services in Nagios, as long as the following
requirements are met:

1. You include `twitch_core` (which happens automatically).
2. The `twitch_team` fact is set to the name of your team's Nagios server
   (for example, `twitch_team=systems` for systems-nagios).
3. The `twitch_environment` fact is set to `production`.

The only thing that normally doesn't happen out of the box for new deployments
is setting `twitch_team`, which can be done as follows:

    echo "twitch_team=systems" > /etc/facter/facts.d/twitch_team.txt

Once the requirements are met and Puppet has run at least once, your Nagios
server will pick up the new host within 30 minutes.
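The requirements above can be expressed as a quick sanity check. The following is only a sketch, not part of this repo: the helper names `parse_external_facts` and `missing_requirements` are hypothetical, and it assumes the facts are available as `key=value` text, as in the `twitch_team.txt` file shown earlier. It cannot verify that `twitch_core` is included; it only mirrors the two fact-based rules.

```python
def parse_external_facts(text):
    """Parse key=value lines (the format used in facts.d .txt files) into a dict."""
    facts = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and anything that is not a key=value pair.
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        facts[key.strip()] = value.strip()
    return facts


def missing_requirements(facts):
    """Return a list of unmet fact requirements (empty when all are met)."""
    problems = []
    if not facts.get("twitch_team"):
        problems.append("twitch_team is not set")
    if facts.get("twitch_environment") != "production":
        problems.append("twitch_environment is not 'production'")
    return problems
```

For example, a host with only `twitch_team=systems` set would still be reported as missing the `production` environment requirement.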

### Services

Out of the box, you get some service checks for free:

* CPU load
* Disk usage
* Memory utilization
* Puppet status (failures, stale runs)

Define custom services with the `twitch_nagios::service` resource. Every custom
service module should include its own Nagios check, so that any server that
includes your service gets a functioning Nagios check automatically, with no
manual intervention.

All services, including the free ones above and custom ones, are automatically
placed into two service groups: "servicename" and "twitch_role servicename".
In other words, there is one overarching service group for the service across
the entire Nagios installation (e.g. "CPU load") and one group for the service
across just your cluster (e.g. "syslog-elasticsearch CPU load").
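As a worked illustration of that naming rule, here is a small sketch. The function `service_groups` is hypothetical and is not how this repo generates the groups; it only reproduces the two names described above.

```python
def service_groups(twitch_role, service_name):
    """Return the two automatic service group names for a service."""
    return [
        service_name,                      # group across the whole Nagios installation
        f"{twitch_role} {service_name}",   # group across just this role's cluster
    ]
```

Calling `service_groups("syslog-elasticsearch", "CPU load")` yields the two group names used in the example above.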

Usage example:

```puppet
twitch_nagios::service { 'something':
  is_nrpe     => true,
  command     => 'check_http -H localhost -p 80',
  description => 'Human readable description',
}
```

Parameters:

- `command` - REQUIRED. The command to run to check your service. This follows
  Nagios standards: return 0 for success, 1 for warning, 2 for critical. You can
  also use any of the built-in Nagios checks, such as `check_http`.
- `description` - REQUIRED. The human-readable description of your service. This
  is what shows up in the Nagios UI for your service.
- `is_nrpe` - NRPE checks are commands that run on the node itself. Defaults to
  false.
  - `is_nrpe => true` means the check command runs on the host itself.
  - `is_nrpe => false` means the check command runs on the Nagios server.
- `nrpe_sudo` - If this is an NRPE check, whether to run your command as root.
  Defaults to false.
- `use` - Which service template to use. Rarely needs to be changed. Defaults
  to `generic-service`.
- `notes_url` - A URL that on-call engineers can visit to see how to troubleshoot
  alerts for your service. If you don't provide one, a wiki link is created
  automatically.
- `extra_servicegroups` - If you want your service to belong to any service groups
  in addition to the two automatic ones described above, supply them here.
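The exit-code convention described for `command` can be sketched in Python as follows. This is a minimal illustration, not a script shipped with this repo: the function names, the thresholds (80 and 95), and the idea of checking a single numeric reading are all assumptions. Exit code 3 (UNKNOWN) is the standard Nagios plugin code for a check that could not run properly.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a numeric reading to an exit code; higher readings are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def run_check(raw_value, warn=80.0, crit=95.0):
    """Return (exit_code, message) for a raw reading such as '42.5'.

    The thresholds are hypothetical defaults chosen for illustration.
    """
    try:
        value = float(raw_value)
    except (TypeError, ValueError):
        return UNKNOWN, f"UNKNOWN - could not parse reading {raw_value!r}"
    status = evaluate(value, warn, crit)
    return status, f"{LABELS[status]} - reading is {value}"
```

A real check script would print the message on standard output and pass the code to `sys.exit`, so that Nagios (or NRPE) sees the status as the process exit code.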

#### Recipes

Here are some examples of custom services and how the monitoring definition
might look inside your modules.

- Do a local HTTP check, from your node, on your custom service:

```puppet
twitch_nagios::service { 'myservice1':
  is_nrpe     => true,
  command     => 'check_http -H localhost -p 8080',
  description => 'My HTTP Service',
}
```

- Do an HTTP check from the Nagios server to your custom service:

```puppet
twitch_nagios::service { 'myservice2':
  command     => "check_http -H ${::fqdn} -p 80",
  description => 'My Remote HTTP Service',
}
```

- Run a custom script that requires root privileges on your node via NRPE.
  Please note the permissions on the custom script: `root:nagios` ownership is
  required so that NRPE can read your script.

```puppet
file {'/opt/nagios_checks/check_mything.py':
  ensure => 'present',
  owner  => 'root',
  group  => 'nagios',
  mode   => '0750',
  source => 'puppet:///modules/my_module/check_mything.py',
}

twitch_nagios::service {'custom':
  is_nrpe     => true,
  nrpe_sudo   => true,
  command     => '/opt/nagios_checks/check_mything.py',
  description => 'My Thing',
}
```

We don't yet provide a simple way to put a script on the Nagios server to run
remote checks, so for the moment checks with `is_nrpe => false` can only use
built-in Nagios commands such as `check_http`.
