# Firehose Logger

## Introduction
Firehose Logger is a request logger for s2s (Malachai). It writes log data to local disk at the configured location (default: `/var/log/malachai`).
Log files are rotated once a file reaches `100MB`; rotated log files are kept on disk for up to `7 days`.

The logs are also sent to AWS Firehose in the background every 60 seconds. Logs are never deleted from Firehose.

Firehose logger uses [Bolt DB](https://github.com/boltdb/bolt) to record which log lines have been successfully sent to and stored in Firehose.

Keys stored in Bolt:
- **lastProcessedFileName**
- **fileOffset**

The logger uses these two pointers to determine the file and offset at which processing resumes. Log lines starting at that offset are sent to Firehose.
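
As a rough illustration of resuming from a stored offset, the sketch below takes the bytes of a log file and returns the complete lines found at or after a given offset, along with the offset to store next. The function name `completeLinesFrom` and the choice to hold back a trailing line that has no newline yet are assumptions made for this example, not a description of the actual implementation.

```go
package main

import "bytes"

// completeLinesFrom returns every newline-terminated line in data that starts
// at or after offset, plus the offset just past the last complete line.
// A trailing line without '\n' is treated as still being written and is left
// for the next run (an assumption of this sketch).
func completeLinesFrom(data []byte, offset int) (lines []string, next int) {
	next = offset
	for next < len(data) {
		i := bytes.IndexByte(data[next:], '\n')
		if i < 0 {
			break // partial line: do not advance past it
		}
		lines = append(lines, string(data[next:next+i]))
		next += i + 1
	}
	return lines, next
}
```

The returned `next` value is what would be saved as **fileOffset** once the returned lines have been stored in Firehose.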

## Log Rotation
[Log Rotator](https://github.com/natefinch/lumberjack) is used for log rotation; log files are rotated when they reach the maximum allowed size.
All rotated files are suffixed with a timestamp in the format `YYYY-MM-DDTHH24-MI-SS.xxx`, where `xxx` is the millisecond component.
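
The timestamp suffix can be read back from a rotated file name. The sketch below assumes lumberjack's documented naming scheme, `name-timestamp.ext`, with the Go time layout `2006-01-02T15-04-05.000`. The function name `rotationTime` and the file name `malachai.log` used in the usage note are hypothetical.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
	"time"
)

// backupTimeFormat is the Go layout matching YYYY-MM-DDTHH24-MI-SS.xxx.
const backupTimeFormat = "2006-01-02T15-04-05.000"

// rotationTime extracts the rotation timestamp from a rotated file name,
// given the name of the active log file it was rotated from.
func rotationTime(activeName, rotatedName string) (time.Time, error) {
	ext := filepath.Ext(activeName)
	prefix := strings.TrimSuffix(activeName, ext) + "-"
	if !strings.HasPrefix(rotatedName, prefix) || !strings.HasSuffix(rotatedName, ext) {
		return time.Time{}, fmt.Errorf("%q is not a rotation of %q", rotatedName, activeName)
	}
	stamp := strings.TrimSuffix(strings.TrimPrefix(rotatedName, prefix), ext)
	return time.Parse(backupTimeFormat, stamp)
}
```

For example, `rotationTime("malachai.log", "malachai-2016-11-04T18-30-00.123.log")` would yield 4 November 2016, 18:30:00.123. Because the timestamp is zero-padded and ordered from year down to millisecond, sorting rotated file names as plain strings also sorts them chronologically.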

## Implementation
Logs are written to local disk at the configured location before they are processed and sent to AWS Firehose. Log lines are sent to Firehose in batches of at most 500 log lines or 4MB per request, whichever limit is reached first. Once a batch is successfully stored in Firehose, the **fileOffset** and **lastProcessedFileName** pointers are updated to reflect the progress. Once boltdb is updated, log lines up to that offset are considered fully processed.
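
The batching rule can be sketched as follows. This is a minimal illustration, not the actual implementation: the names `batchLines`, `maxBatchRecords`, and `maxBatchBytes` are made up for the example, 4MB is assumed to mean 4 MiB, and a single line larger than the byte limit is simply placed alone in its own batch.

```go
package main

const (
	maxBatchRecords = 500             // log lines per request
	maxBatchBytes   = 4 * 1024 * 1024 // bytes per request (4MB read as 4 MiB)
)

// batchLines groups lines into batches that respect both limits.
// A new batch is started when adding the next line would exceed either one.
func batchLines(lines []string, maxRecords, maxBytes int) [][]string {
	var batches [][]string
	var current []string
	size := 0
	for _, line := range lines {
		full := len(current) == maxRecords || size+len(line) > maxBytes
		if len(current) > 0 && full {
			batches = append(batches, current)
			current, size = nil, 0
		}
		current = append(current, line)
		size += len(line)
	}
	if len(current) > 0 {
		batches = append(batches, current)
	}
	return batches
}
```

Each returned batch corresponds to one request; the pointers in boltdb would be advanced only after the request for that batch succeeds.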

On the next processing run, **fileOffset** and **lastProcessedFileName** are read from boltdb, and log lines written after that marker are processed in the same manner.

If the file was rotated one or more times between two processing runs, the files are processed in the order they were written to disk.
Since each rotated file name contains the timestamp at which it was rotated, rotated files are sorted by file name, and the offset stored in boltdb is assumed to belong to the oldest file that has not yet been fully processed.

Here is an example of how the files will be processed:

| Timestamp | fileOffset | lastProcessedFileName | list of files | description | 
| --------- | ------ | --------------------- | ------------- | ----------- |
| t0 | 0 | "" | c.log| start state|
| t1 | 123 | "" | c.log| c.log was partially processed; logs up to offset 123 were sent to Firehose|
| t2 | 123 | "" | c.log, t2.log | file was rotated |
| t3 | 123 | "" | c.log, t2.log, t3.log | file was rotated again |
| t4 | 45 | "t3.log" | c.log, t2.log, t3.log | t2.log and t3.log were fully processed; c.log was partially processed up to offset 45 |
| t5 | 45 | "t3.log" | c.log, t3.log, t5.log | file was rotated, and t2.log was deleted because it was more than 7 days old. |

**Note:** On the next run, *t5.log* will be read from offset 45, not *c.log*, because the stored offset is assumed to belong to the oldest unprocessed file.
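
The ordering used in the table can be sketched as a small helper that lists the files still to be processed, oldest first. The function name `pendingFiles` and its parameters are hypothetical; the sketch only assumes that rotated file names sort chronologically and that the active file (`c.log` in the table) is always the newest.

```go
package main

import "sort"

// pendingFiles returns the files still to be processed, oldest first.
// rotated holds the rotated file names, active is the file currently being
// written, and lastProcessed is the lastProcessedFileName checkpoint
// ("" when no rotated file has been fully processed yet).
func pendingFiles(rotated []string, active, lastProcessed string) []string {
	sorted := append([]string(nil), rotated...)
	sort.Strings(sorted) // timestamped names sort chronologically
	var pending []string
	for _, name := range sorted {
		if name > lastProcessed {
			pending = append(pending, name)
		}
	}
	// The active file is the newest, so it goes last.
	return append(pending, active)
}
```

With the t5 state from the table, `pendingFiles([]string{"t3.log", "t5.log"}, "c.log", "t3.log")` returns `t5.log` followed by `c.log`. The stored **fileOffset** would apply to the first returned file, and every later file would be read from offset 0.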

## Querying Logs
[AWS Athena](https://aws.amazon.com/athena) can be used to query logs stored in Firehose.
More information can be found [here](https://wiki.twitch.com/display/SSE/S2S+Auth+Logging)


## Caveats
1. If, for any reason, logs are not sent to Firehose before they are deleted from disk, those logs are lost.
2. If logs are successfully sent to Firehose but the offset and last processed file name are not updated because of an error, those logs will appear twice in Firehose.
