# `team-syslog`

## Team Stack Overview

This module creates an ingestion service that receives logs via syslog,
[JSONL][2] over TCP, or HTTP and sends them to central logging.  To do this, it
creates supporting infrastructure in your account (EC2 instances, an ELB, an S3
bucket, and an SQS queue).

It will also create an S3 bucket, which you will be responsible for.  This is
your bucket because it contains your logs.  Whenever an object is written to
this S3 bucket, S3 notifies an SQS queue so that the object can be processed
further.  Each object *MUST* be [JSONL][2] (line-oriented JSON): each line must
be one complete JSON object.  Each JSON object *SHOULD* contain the following fields:

* `@timestamp`
* `message`

It *MAY* contain any other fields to be indexed.

All fields are translated into strings.
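
As a minimal sketch of the format described above, the following Python builds
one JSONL line per record and checks that every line of a payload is a single
JSON object.  The function names `to_jsonl_line` and `is_valid_jsonl` and the
extra `host` field are hypothetical, chosen only for illustration; they are not
part of this module.

```python
import json
from datetime import datetime, timezone


def to_jsonl_line(message, **fields):
    """Serialize one log record as a single line of JSON."""
    record = {
        # RFC 3339 UTC timestamp; the exact format expected downstream
        # is an assumption of this sketch.
        "@timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "message": message,
    }
    record.update(fields)  # any other fields MAY be included
    # json.dumps escapes newlines inside strings, so the result is one line.
    return json.dumps(record)


def is_valid_jsonl(text):
    """Return True if every non-empty line of text parses as a JSON object."""
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            parsed = json.loads(line)
        except ValueError:
            return False
        if not isinstance(parsed, dict):
            return False
    return True
```

For example, `to_jsonl_line("hi", host="web-1")` produces one line containing
`@timestamp`, `message`, and `host`, and a payload made of such lines joined by
newlines passes `is_valid_jsonl`.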

## Standalone Use

If logs should be housed only in S3 and not indexed, remove the SQS trigger on
the S3 bucket as well as the SQS queue itself.

If the SQS queue is left as is, the messages will simply be deleted after 14
days.

## Central Logging Overview

This is a component in a larger system made up of a few generic terraform
modules and any number of implementation modules which take advantage of the
generic modules.

This module is the team-specific portion of the stack.  In the diagram in the
[`central-elk README.md`][1] it appears as `Team A's ownership`, the entire
upper half.

## Data flow

1. Application sends to local machine syslog
2. Syslog forwards to central syslog aggregator (`team's logging endpoint`)
3. Aggregator uploads batched logs in [JSONL][2] to S3
4. S3 trigger adds item to SQS queue for receipt by central logging on Systems side

## Usage

Some understanding of Terraform in general is recommended to follow this.  It
uses no Twitch-internal Terraform modules and doesn't share state with any other
systems, so running this should be similar to running any other Terraform.

To use this module, you will need permissions to make EC2 instances, ELBs, S3 buckets,
SQS queues, and to set permissions for these objects.

There are two accounts involved:

* twitch-science-aws (colloquially known as twitch-aws)
* the account which will house the stack

You will need a profile in `~/.aws/credentials` for each of these.

The sample usage below uses an AWS profile named `my-team-dev`.  Our usual
convention is to store Terraform tfstate files in S3, in a bucket named the same
as the account; in this case,
`s3://my-team-dev/development/my-team-syslog/terraform.tfstate`.  This is only
a convention.  Follow your own team's policies or the policies for the account
where the stack is being placed.

You will also need `internal_dns_modify` permissions in `twitch-science-aws`.
Ask in #systems for this if you see errors about Route53.  The profile name is
currently expected to be `twitch-aws`, so if you only have a
`twitch-science-aws` profile, copy its credentials to a new `twitch-aws` section
in your `~/.aws/credentials` file.

You will also need to decide on a name for the stack.  This will be the name of
the S3 bucket and the name of the indexes in Elasticsearch.  It will also be
part of the DNS name of the service.  If this is for your entire team, it would
be sensible to name it after your team as is seen in the example below.  Use
your judgement to pick a name.  The stack name is `my-team-syslog` in the
example below.

```
# This is where your tfstate file will be stored.  This should meet the
# requirements of your team.
terraform {
  backend "s3" {
    profile = "my-team-dev"
    bucket = "my-team-dev"
    key    = "development/my-team-syslog/terraform.tfstate"
    region = "us-west-2"
  }
}

# Rename the profile to match the account where you want to stand up
# this service.
provider "aws" {
    profile = "my-team-dev"
    region  = "us-west-2"
}

# Customize these variables.
variable "service"     {default = "my-team-syslog"}
variable "environment" {default = "dev"}
variable "owner"       {default = "my-team"}
variable "owner_email" {default = "my-team@justin.tv"}

# Maybe customize these, too
data "aws_security_group" "sg" {
    name = "twitch_subnets"
    # This will need to be set if you have more than one VPC with a
    # twitch_subnets security group in it
    # vpc_id = "vpc-abcdef1234"
}

# And possibly this if you have differently named subnets.  These work for most
# accounts at Twitch.
data "aws_subnet" "subnet_a" { tags { "Name" = "Private - A" } }
data "aws_subnet" "subnet_b" { tags { "Name" = "Private - B" } }
data "aws_subnet" "subnet_c" { tags { "Name" = "Private - C" } }

# You shouldn't need to customize these unless you need functionality from `Advanced Usage`
# To get started, you can leave these alone and come back to it in the future.
module "syslog" {
    source = "git::git+ssh://git@git-aws.internal.justin.tv/terraform-modules/team-syslog.git?ref=v1.0.0"

    environment = "${var.environment}"
    security_groups = ["${data.aws_security_group.sg.id}"]
    subnet_ids = ["${data.aws_subnet.subnet_a.id}", "${data.aws_subnet.subnet_b.id}", "${data.aws_subnet.subnet_c.id}"]
    service = "${var.service}"
    owner = "${var.owner}"
    owner_email = "${var.owner_email}"
}

output "elb_fqdn"             {value = "${module.syslog.elb_fqdn}"}
output "log_bucket"           {value = "${module.syslog.log_bucket}"}
output "queue_account_number" {value = "${module.syslog.queue_account_number}"}
output "s3_bucket_name"       {value = "${module.syslog.s3_bucket_name}"}
```

Deploy this via standard terraform using terraform 0.10 or higher.  If you're not
familiar with terraform, it might be a good time to read the basics but here's a
helping of copypasta.

```
# Run the next line first if you haven't initialized this directory yet
# terraform init
terraform get
terraform plan
# Make sure nothing dumb is happening in the plan, then uncomment the next line
# terraform apply
```

Once you've deployed this configuration, send the output of terraform to the
systems team so that the central stack can be configured to pull from the SQS
queue for your team's syslog stack.  The output will look something like the
following:

```
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

Outputs:

elb_fqdn = my-team-syslog.dev.us-west2.justin.tv
log_bucket = my-team-syslog-dev
queue_account_number = 012345678901
s3_bucket_name = my-team-syslog-dev
```

Test your stack by sending a message to it!  Remember to replace the
placeholders below with the actual output from your terraform.

```
echo '{"message": "hi", "@timestamp": "'"$(date -u +"%Y-%m-%dT%H:%M:%SZ")"'"}' | nc my-team-syslog.dev.us-west2.justin.tv 7777
aws s3 ls --recursive s3://my-team-syslog-dev/
```

Grab one of those files and see what's in it!  There will be some health checks
from the ELB and your greeting in one of them.
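
The same test can be sketched in Python for cases where `nc` is unavailable.
This is a minimal sketch, assuming the endpoint accepts newline-terminated JSON
on TCP port 7777 as in the command above; `encode_jsonl` and `send_jsonl` are
hypothetical helper names, not part of this module.

```python
import json
import socket


def encode_jsonl(records):
    """Encode a list of dicts as newline-terminated JSON lines (UTF-8 bytes)."""
    return "".join(json.dumps(record) + "\n" for record in records).encode("utf-8")


def send_jsonl(host, port, records, timeout=5.0):
    """Open a TCP connection, send all records as JSONL, and return bytes sent."""
    payload = encode_jsonl(records)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
    return len(payload)
```

Calling `send_jsonl("my-team-syslog.dev.us-west2.justin.tv", 7777, [{"@timestamp": "2018-01-01T00:00:00Z", "message": "hi"}])`
would send a greeting like the one above; replace the hostname with the
`elb_fqdn` output from your own stack and use a current timestamp.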

## Shipping logs

Logs may be submitted via syslog.  A file added to `/etc/rsyslog.d` with these contents will
enable logs to be sent to the above sample cluster:

```
$SystemLogUsePIDFromSystem on

$template jsonl,"{%timestamp:::date-rfc3339,jsonf:@timestamp%,\"@message\":\"%msg:::json%\",%fromhost:::jsonf:host%,%syslogfacility-text:::jsonf:syslog_facility%,%syslogfacility:::jsonf:syslog_facility_code%,%syslogseverity-text:::jsonf:syslog_severity%,%syslogseverity:::jsonf:syslog_severity_code%,%app-name:::jsonf:program%,%procid:::jsonf:pid%}"

*.warn @@my-team-syslog.dev.us-west2.justin.tv:7777;jsonl
```

This can be simplified via the supported [puppet module][4].

## Redeploying/Upgrading

If it is necessary to redeploy this module, follow these manual steps to control the deployment process:

* Make changes in Terraform
* Run `terraform apply`
  * Ensure that the only resource that will be destroyed is the launch configuration
* Increase desired capacity of ingress ASG to double its current value (increase max if necessary)
* Wait for capacity in the ASG to increase to the new double capacity
* Gracefully shut down older ingress nodes (AWS terminate is fine)
  * You may allow the ASG to replace the old hosts with new hosts
* Change desired capacity to the original value
  * Allow the ASG to terminate whatever set of hosts it chooses
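
The capacity arithmetic in the steps above can be sketched as follows.  This is
only an illustration of the numbers involved, not a deployment tool;
`plan_rollout` is a hypothetical name and the function makes no AWS calls.

```python
def plan_rollout(desired, max_size):
    """Compute the temporary ASG sizes used while old ingress nodes drain."""
    if desired < 1:
        raise ValueError("desired capacity must be at least 1")
    doubled = desired * 2
    return {
        # Double the desired capacity so new nodes come up alongside old ones.
        "temporary_desired": doubled,
        # Raise max only if the doubled capacity would not otherwise fit.
        "temporary_max": max(max_size, doubled),
        # Once the old nodes are gone, return to the original desired capacity.
        "final_desired": desired,
    }
```

For example, an ASG with a desired capacity of 3 and a max of 4 would be raised
to a desired capacity of 6 with a max of 6, then returned to a desired capacity
of 3.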

#### Caveat

This deployment method relies on logstash's ability to flush its logs before shutting down.

If your logging stack is currently buffering logs because of degraded performance, give the ingress hosts
time to shut down -- DO NOT log into the box and `kill -9` logstash.

If your logging stack is currently unable to save logs at all (e.g. because the network is broken), logs that
are currently in the internal buffer of logstash will be lost.  If this is an unacceptable loss, connectivity
will need to be restored before following the plan detailed above.

## Advanced Usage

#### Please contact the systems team for assistance in using this functionality.

You may also set up filters, inputs, and outputs in logstash by setting the
`logstash_filters`, `logstash_inputs`, and `logstash_outputs` parameters.

For inputs and outputs, there's also a `_pre` version of the parameter if
you require your configuration to go before the default configuration items.

All of these parameters are placed verbatim into the logstash configuration
in the `filter {}`, `input {}`, and `output {}` sections.


[1]: https://git-aws.internal.justin.tv/terraform-modules/central-elk/blob/master/README.md
[2]: http://jsonlines.org/
[3]: https://git-aws.internal.justin.tv/terraform-modules/central-elk-lambda-bridge/blob/master/README.md
[4]: https://git-aws.internal.justin.tv/systems/puppet/blob/master/modules/twitch_syslog/manifests/central_syslog.pp
