# Fluent Bit / Syslog

This template repo is a quick way to spin up a syslog ingest server that pipes
incoming logs directly into a CloudWatch log group. It supports syslog->CloudWatch
out of the box. Follow the directions below to deploy this application.

But it's slightly more than that. This repo builds a custom Docker image in Jenkins from
[aws-for-fluent-bit](https://github.com/aws/aws-for-fluent-bit) and a companion side-car
image for Nginx. You can change the default fluent-bit config and add custom parsers.
That means you can pretty easily set up any type of log pipeline with this application.
You can even incorporate this custom image into
[FireLens](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_firelens.html)
(but you probably shouldn't need to).

The included [Jenkins pipeline](Jenkinsfile) builds and deploys the included [Dockerfiles](docker/)
to [ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html).

# VIDCS Specific Deploy and Gotchas

Bootstrapping was done in us-west-2 to start. For the instructions below, ignore the "gotchas" that are related to Secure TCP/TLS/SSL things. We've stripped down the setup to do unencrypted syslog for now to reduce operational complexity.

The staging infra in the twitch-video-ops-stg account was not built/managed using this repository and is a playground for secure TCP syslog with infoblox made by captain.

# Testing

You can set up a local `docker-compose` and copy paste some temporary access creds to an account (like the `vidcs-dev` account) to test end to end functionality.

1. Modify the `AWS*` env vars in `docker-compose.yaml`.
1. Build the containers: `docker-compose build --no-cache`
1. Run the compose: `docker-compose up`
1. `nc` some content. To validate fluentbit functionality, make sure the line you netcat parses into a record with a `host` and `ident` key. Ex:
   ```
   echo '<some syslog-rfc3164 compliant logline>' | nc 127.0.0.1 514
   ```
1. There might be a few seconds of delay, but you should get a couple of `info` messages in your docker-compose shell.
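
For step 4, a line in the classic `<PRI>TIMESTAMP HOSTNAME TAG: MESSAGE` layout is a reasonable starting point. The sketch below builds one. The helper names `syslog_pri` and `rfc3164_line`, the host `testhost`, and the tag `testapp` are made up for illustration, and whether the line is accepted depends on the parser configured in this repo.

```shell
# Compute the syslog priority value: facility * 8 + severity.
syslog_pri() {
    echo $(( $1 * 8 + $2 ))
}

# Build an RFC 3164 style line: <PRI>TIMESTAMP HOSTNAME TAG: MESSAGE
# Arguments: facility severity hostname tag message
rfc3164_line() {
    # LC_ALL=C keeps the month abbreviation in English; %e pads the day with a space.
    printf '<%s>%s %s %s: %s\n' \
        "$(syslog_pri "$1" "$2")" \
        "$(LC_ALL=C date '+%b %e %H:%M:%S')" \
        "$3" "$4" "$5"
}

# Example (facility 1 = user, severity 6 = info, so the priority is 14):
#   rfc3164_line 1 6 testhost testapp 'hello from netcat' | nc 127.0.0.1 514
```

The hostname and tag positions are the ones a syslog parser would typically expose as `host` and `ident`.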


# Deploy

## 0. Preliminary

-   This requires that you have a delegated authoritative public DNS zone in your account.
-   Deploy this template repo from GitHub into your own repo; then clone the new repo.
    -   __Do not fork the repo.__ *Deploy the template-repo to a new repo.*

## 1. Create ECR Repo and Jenkins Role

Create an ECR repository to hold the images Jenkins builds from this repo. Also create
a role that Jenkins assumes to upload new images and update/reload ECS tasks.
That's accomplished by deploying the [cfn-jenkins.yml](cloudformation/cfn-jenkins.yml)
CloudFormation template, like this:

```
aws --profile=<profile> --region=us-west-2 cloudformation create-stack --capabilities CAPABILITY_NAMED_IAM \
    --stack-name fluentbit-jenkins-role --template-body file://cloudformation/cfn-jenkins.yml
```

The region doesn't matter. This currently just creates roles that are global IAM resources.
Jenkins creates the ECR repos before uploading the built images. After deploying the
[cfn-jenkins.yml](cloudformation/cfn-jenkins.yml) template, trigger a Jenkins rebuild.
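
Before triggering the Jenkins rebuild, it can help to confirm the stack finished and the role exists. This is a minimal sketch, assuming the AWS CLI picks up credentials from the environment (for example `AWS_PROFILE`); the function names `wait_for_stack` and `role_arn` are made up for illustration.

```shell
# Block until CloudFormation reports the stack as fully created.
# Arguments: region stack-name
wait_for_stack() {
    aws --region "$1" cloudformation wait stack-create-complete --stack-name "$2"
}

# Print the ARN of an IAM role; the command fails if the role does not exist.
# Arguments: role-name
role_arn() {
    aws iam get-role --role-name "$1" --query 'Role.Arn' --output text
}

# Example, using the stack name from the command above:
#   wait_for_stack us-west-2 fluentbit-jenkins-role
#   role_arn twitch-fluentbit-syslog-jenkins-ecs-ecr
```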

### What's that do?

-   Creates an ECR repo named `twitch-fluentbit-syslog`
-   Creates a role named `twitch-fluentbit-syslog-jenkins-ecs-ecr`
    -   The image must be deployed with Jenkins before tasks that use the "missing" ECR image can be created.
    -   Jenkins must have access to deploy the image before running the CloudFormation that creates the tasks.
    -   This all means the role cannot be created by the same CloudFormation template. :(
-   The role gives Jenkins access to push to ECR and reload the ECS tasks deployed by this CloudFormation.

## 2. Configuration

Edit [Jenkinsfile](Jenkinsfile) and correct the account IDs in the `parameters` section.
You can add the service (usually `syslog`) as the default to the `RELOAD_SERVICE` parameter.
Doing this will trigger ECS service reloads after Jenkins deploys new images. Not doing
this means you will have to manually trigger a service reload after Jenkins uploads new images.
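
If you leave `RELOAD_SERVICE` empty, a manual reload could look like the sketch below. The function names are hypothetical, the cluster name is whatever your stack created (it is not stated here), and this assumes the task definition references an image tag that Jenkins overwrites on each build.

```shell
# Force the ECS service to start fresh tasks, which pull the image again.
# Arguments: region cluster service
reload_service() {
    aws --region "$1" ecs update-service \
        --cluster "$2" --service "$3" --force-new-deployment
}

# Block until the service has reached a steady state after the reload.
# Arguments: region cluster service
wait_for_service() {
    aws --region "$1" ecs wait services-stable --cluster "$2" --services "$3"
}

# Example (replace my-cluster with the real cluster name):
#   reload_service us-west-2 my-cluster syslog
#   wait_for_service us-west-2 my-cluster syslog
```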

The default configuration sets up fluent-bit to ingest logs through an NLB using TLS
in `syslog-rfc5424` format. The logs are output directly to CloudWatch, which routes them into a
dedicated log group. You can change this behavior by updating [fluent-bit.conf](configs/fluent-bit.conf)
in the new repo. You may also add custom parsers to the [configs/](configs/) folder.

Push these changes to the new repo in the `master` branch. Jenkins should pick
up the job and run the pipeline to build and deploy the docker images to ECR.

Now you may use the new images.

## 3. Bootstrap Deploy of Fluent Bit

Deploy the [cfn-template.yml](cloudformation/cfn-template.yml) CloudFormation template.
The `ServiceName` parameter is prefixed to the `HostedZoneName` and also used in AWS resource names.
Here's a working CLI example (specifically for Infoblox usage, see Special Considerations below):

```
# us-west-2 (PROD)
aws --region=us-west-2 cloudformation create-stack \
  --stack-name fluentbit-syslog \
  --template-body file://cloudformation/cfn-template.yml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameters ParameterKey=AttachSGs,ParameterValue=sg-c60157bf \
  ParameterKey=AttachSubnets,ParameterValue='subnet-893c07ff' \
  ParameterKey=VPC,ParameterValue=vpc-23fa3f44 \
  ParameterKey=HostedZoneName,ParameterValue=prod.video-systems-logging.live-video.a2z.com
```
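
Stack creation takes a while, so a status check is handy. The helpers below are a sketch with made-up function names; they only wrap standard `describe-stacks` and `describe-stack-events` calls.

```shell
# Print the current status of a stack, for example CREATE_IN_PROGRESS.
# Arguments: region stack-name
stack_status() {
    aws --region "$1" cloudformation describe-stacks --stack-name "$2" \
        --query 'Stacks[0].StackStatus' --output text
}

# List resources that failed to create, with the reason CloudFormation recorded.
# Arguments: region stack-name
stack_failures() {
    aws --region "$1" cloudformation describe-stack-events --stack-name "$2" \
        --query "StackEvents[?ResourceStatus=='CREATE_FAILED'].[LogicalResourceId,ResourceStatusReason]" \
        --output table
}

# Example:
#   stack_status us-west-2 fluentbit-syslog
#   stack_failures us-west-2 fluentbit-syslog
```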

### What's that do?

-   Creates a DNS entry in your provided public hosted zone: `ServiceName.HostedZoneName.`
-   Creates 3 IAM roles; 1 for the Task Role, 1 for ExecutionTask and 1 for AutoScalingTarget.
-   Creates a serverless ECS Fargate cluster and service running fluent-bit.
-   Sets up auto scaling for the ECS service.
-   Puts an NLB w/ the ACM cert in front of the ECS cluster.
-   Attaches a port 514 listener to the NLB.
-   Configures an initial log group.

# Updates

If you want to change the CloudFormation, both templates are safe to re-deploy or update.

The rest of this section generally applies when you want to change the
[fluent-bit config](configs/fluent-bit.conf), or if you want to add custom parsers, filters, or routers.

## Update Prod Deploy

__To deploy to production__, rebuild a `master` branch build and check the `DEPLOY_PROD`
checkbox.

## Update Cfn

Create a change set:

```
# us-west-2 (PROD)
aws --region=us-west-2 cloudformation create-change-set \
  --change-set-name <ticket-num> \
  --stack-name fluentbit-syslog \
  --template-body file://cloudformation/cfn-template.yml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameters ParameterKey=AttachSGs,ParameterValue=sg-c60157bf \
  ParameterKey=AttachSubnets,ParameterValue='subnet-893c07ff' \
  ParameterKey=VPC,ParameterValue=vpc-23fa3f44
```

The change set name must be unique, so use a VIDCS ticket number to identify your change set.

View the diff:

```
# us-west-2 (PROD)
aws --region=us-west-2 cloudformation describe-change-set \
  --change-set-name <ticket-num> \
  --stack-name fluentbit-syslog
```
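
The full `describe-change-set` output is verbose. As a sketch, the helpers below wait for the change set to finish computing and then print only the per-resource actions. The function names are made up for illustration.

```shell
# Block until CloudFormation has finished computing the change set.
# Arguments: region stack-name change-set-name
wait_for_change_set() {
    aws --region "$1" cloudformation wait change-set-create-complete \
        --stack-name "$2" --change-set-name "$3"
}

# Print one row per affected resource: action, logical ID, type, replacement.
# Arguments: region stack-name change-set-name
summarize_change_set() {
    aws --region "$1" cloudformation describe-change-set \
        --stack-name "$2" --change-set-name "$3" \
        --query 'Changes[].ResourceChange.{Action:Action,Resource:LogicalResourceId,Type:ResourceType,Replacement:Replacement}' \
        --output table
}

# Example (VIDCS-1234 stands in for your ticket number):
#   wait_for_change_set us-west-2 fluentbit-syslog VIDCS-1234
#   summarize_change_set us-west-2 fluentbit-syslog VIDCS-1234
```

Rows with `Replacement` set to `True` deserve a closer look before executing, since the resource will be recreated.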

Execute the change set:

```
aws --region=us-west-2 cloudformation execute-change-set \
  --change-set-name <ticket-num> \
  --stack-name fluentbit-syslog
```

# Security

Nginx creates a self-signed cert using the CA and CA key in this repo. It uses
them for SSL termination *for appliances that do not support using a hostname*.

The command below was used to create the CA, and you can run it again to create a new one.
You can use this cert in an external appliance that supports mutual auth, or an appliance
that only supports IP ingestion and expects an IP in the SSL certificate name (Infoblox).

```
# Make CA key and cert.
openssl req -x509 -nodes -new -days 3650 -config ssl/ca.cnf -newkey rsa:4096 -keyout ssl/CA.key -out ssl/CA.crt \
            -subj "/C=US/ST=California/L=San Francisco/O=Systems/OU=Systems/CN=FluentBit-CA"
```
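
To sanity-check the result, the sketch below prints the key fields of a certificate and verifies that another certificate chains to the CA. The function names are hypothetical, and the server certificate path in the example is a placeholder, not a file known to exist in this repo.

```shell
# Print the subject, issuer, and expiry date of a PEM certificate.
# Arguments: cert-file
show_cert() {
    openssl x509 -in "$1" -noout -subject -issuer -enddate
}

# Check that a certificate chains to the given CA certificate.
# Arguments: ca-cert-file cert-file
verify_cert() {
    openssl verify -CAfile "$1" "$2"
}

# Example:
#   show_cert ssl/CA.crt
#   verify_cert ssl/CA.crt /path/to/server.crt
```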

The security of this configuration should be considered before deploying it to production.
What are the consequences if the CA key is stolen? Considering how this is used, the
security consequences are probably negligible, but still worth consideration.

## Special Considerations

Any device that can send logs to syslog in RFC5424 format using TLS will work
out of the box, and there is nothing special needed besides deploying this
stack of software. You can send logs to tls://your.hostname:20514. In this
configuration it is recommended to deploy to at least 2 subnets with at least 2
tasks for redundancy. 3 or more tasks is OK too.

If your device, such as Infoblox, can only connect to an IP address to send logs
(as opposed to a hostname), then there are some special configurations to consider.
You must edit [server.cnf](ssl/server.cnf) after deploying this stack and update
it with the NLB IPs. This will cause the Nginx certificate to regenerate with these
IPs added to the alt name. This is required for Infoblox and perhaps other network
appliances. It is additionally recommended to choose only one subnet when deploying
the [CloudFormation stack](cloudformation/cfn-template.yml). With one subnet the NLB
has a single IP, and logs sent to that IP are still routed to multiple tasks. If your logs require redundancy,
then it is further recommended to create two separate stacks in two separate
regions and point your appliance logs to two IPs, one in each stack.

# Usage

So you've installed this stack. Now how do you send logs to the thing?

## Rsyslog

Here's an example rsyslog snippet that sends logs securely. Just make sure you picked
a __security group__ you can send logs through on port *20514*.
This sends logs in RFC5424 format using TLS with cert name and CA validation.

```
global(DefaultNetstreamDriverCAFile="/etc/ssl/certs/ca-certificates.crt")

*.* action(type="omfwd" protocol="tcp" StreamDriver="gtls" StreamDriverMode="1"
           StreamDriverAuthMode="x509/name"
           target="syslog.nlb.yourservice.twitch.a2z.com"
           port="20514" template="RSYSLOG_SyslogProtocol23Format")
```
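
If rsyslog cannot connect, checking the TLS handshake by hand narrows things down. This is a sketch with a made-up function name; it assumes an OpenSSL version that supports `-verify_hostname`, and it defaults to the same CA bundle as the snippet above.

```shell
# Attempt a TLS handshake and fail if the chain or hostname does not verify.
# Arguments: host port [ca-file]
check_tls_endpoint() {
    openssl s_client -connect "$1:$2" -servername "$1" \
        -CAfile "${3:-/etc/ssl/certs/ca-certificates.crt}" \
        -verify_hostname "$1" -verify_return_error </dev/null
}

# Example:
#   check_tls_endpoint syslog.nlb.yourservice.twitch.a2z.com 20514
```

A timeout here usually points at the security group rather than at the certificate.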

## Infoblox

Infoblox is weird, and it took a while to make it work. It requires that you
upload a valid CA cert, and it requires that your SSL-terminated syslog endpoint
is signed by that CA cert. Furthermore the SSL cert on the endpoint must present
an IP address as an alternate name. This is a non-standard configuration and
requires some tinkering.

You should deploy this stack to two regions for high availability. Choose only
one subnet per region and run at minimum 2 tasks (this is the default). Once
the stack comes up, find the load balancer IPs in each region. Each LB will have
only 1 IP, so you'll have two total. The easiest way to find them is to do a DNS
lookup against `<ServiceName>.<region>.<HostedZoneName>`. Once you have the two IPs,
replace the two IPs found in [server.cnf](ssl/server.cnf#L13-L14) with them and
push your changes into the repo. The images Jenkins builds from that push will
regenerate the Nginx certificate with the new IPs.
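
The lookup and a later check of the served certificate can be sketched as follows. The function names are made up, `dig` is assumed to be installed, and the port in the example is the Infoblox port mentioned below.

```shell
# Resolve a hostname to its A records, one IP per line.
# Arguments: hostname
nlb_ips() {
    dig +short A "$1"
}

# Print the Subject Alternative Name section of the certificate served at ip:port.
# Arguments: ip port
served_alt_names() {
    openssl s_client -connect "$1:$2" </dev/null 2>/dev/null \
        | openssl x509 -noout -text \
        | grep -A1 'Subject Alternative Name'
}

# Example (substitute your own service name, region, and zone):
#   nlb_ips '<ServiceName>.<region>.<HostedZoneName>'
#   served_alt_names 10.0.0.5 6514
```

After the new images are running, each NLB IP should show up as an `IP Address:` entry in the second command's output.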

Once the new images are deployed and running, you can use those two IPs as syslog
endpoints in Infoblox. You'll send logs to port `6514`. Choose `Secure TCP` as
the transport method. The NLB IPs will never change unless you destroy the NLB.
