# Backfilling

Backfilling stats without planning can exhaust the capacity of the Redshift cluster and the DynamoDB tables.

This is because the **cron** environment already queries Redshift at a regular interval so that users get the freshest stats possible. A backfill running at the same time competes with those queries for the same cluster.
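
The steps below pause this polling through the `BACKFILL_IN_PROGRESS` environment variable on the **cron** environment. The following is a minimal sketch of how a polling job could honor that flag. It assumes the cron job is a Go process driven by a `time.Ticker`. `isBackfillFlag`, `backfillInProgress`, `runCron`, and the stand-in query function are hypothetical names, not the project's actual code.

```go
package main

import (
	"log"
	"os"
	"time"
)

// isBackfillFlag interprets the value of BACKFILL_IN_PROGRESS (hypothetical
// helper). Only the exact string "true" pauses the cron job; an unset
// variable or any other value does not.
func isBackfillFlag(value string) bool {
	return value == "true"
}

// backfillInProgress reads the flag from the process environment.
func backfillInProgress() bool {
	return isBackfillFlag(os.Getenv("BACKFILL_IN_PROGRESS"))
}

// runCron waits for `ticks` ticks of a ticker that fires every `interval`.
// On each tick it calls query (the Redshift refresh) unless paused() reports
// that a backfill is running. It returns how many queries were issued.
func runCron(interval time.Duration, ticks int, paused func() bool, query func()) int {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	issued := 0
	for i := 0; i < ticks; i++ {
		<-ticker.C
		if paused() {
			log.Println("backfill in progress; skipping Redshift query")
			continue
		}
		query()
		issued++
	}
	return issued
}

func main() {
	// Hypothetical stand-in for the cron environment's Redshift query.
	refreshStats := func() { log.Println("querying Redshift for fresh stats") }
	n := runCron(100*time.Millisecond, 3, backfillInProgress, refreshStats)
	log.Printf("issued %d Redshift queries", n)
}
```

The flag is checked on every tick rather than once at startup. As a result, a long-running process would pick up a change to the variable without needing a restart.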

## Precautions

Processing time series stats depends on sessions data being present in DynamoDB.

When backfilling a given time range, the sessions data **must** be fully processed (that is, queried from Redshift and inserted into DynamoDB) before processing of any time series stats begins.
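
One way to enforce this ordering in a driver program is to run the phases strictly in sequence and stop at the first failure. The sketch below is illustrative only. `Phase`, `runInOrder`, and `errSessionsIncomplete` are hypothetical names, and each `Run` function stands in for work the real backfill does.

```go
package main

import (
	"errors"
	"fmt"
	"log"
)

// Phase is one stage of a backfill (hypothetical type).
type Phase struct {
	Name string
	Run  func() error
}

// errSessionsIncomplete is a hypothetical sentinel that a sessions phase could
// return when not every session in the range has reached DynamoDB.
var errSessionsIncomplete = errors.New("sessions not fully processed")

// runInOrder runs phases one after another and stops at the first failure.
// A later phase (time series) therefore never starts unless every earlier
// phase (sessions) returned successfully. It returns the completed phase names.
func runInOrder(phases []Phase) ([]string, error) {
	var completed []string
	for _, p := range phases {
		if err := p.Run(); err != nil {
			return completed, fmt.Errorf("phase %q failed, later phases skipped: %w", p.Name, err)
		}
		completed = append(completed, p.Name)
	}
	return completed, nil
}

func main() {
	phases := []Phase{
		{Name: "sessions", Run: func() error {
			log.Println("sessions: query Redshift, insert into DynamoDB")
			return nil
		}},
		{Name: "time series", Run: func() error {
			log.Println("time series: build stats from sessions in DynamoDB")
			return nil
		}},
	}
	done, err := runInOrder(phases)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("completed phases: %v", done)
}
```

Sessions may be processed asynchronously, for example by consumers of the SQS ingest queue. In that case, the sessions phase should only return once those consumers have finished writing to DynamoDB. Returning as soon as messages are enqueued would not satisfy the requirement above.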

## Steps

1. Connect to the VPN.

2. From the Elastic Beanstalk console, set the environment variable `BACKFILL_IN_PROGRESS` to `true` for the **cron** environment.

    - This pauses the **cron** environment's Redshift queries, freeing up the Redshift cluster for the backfill.

3. In the backfill script ([`cmd/backfill/main.go`](../cmd/backfill/main.go)), set the following parameters:

    - SQS queue URL for ingests ("production" or "staging")
    - Backfill time range
    - The specific stats to backfill (sessions must be fully processed before any time series stats; see [Precautions](#precautions))

4. Run the backfill script:

    ```sh
    go run cmd/backfill/main.go
    ```

5. Monitor the backfill's progress on [Grafana](https://grafana.internal.justin.tv/dashboard/db/cb-semki?orgId=1&from=now-30m&to=now).

6. Once the backfill has finished, set `BACKFILL_IN_PROGRESS` back to `false` for the **cron** environment so that it resumes querying Redshift.
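
Before running the script, it can help to sanity-check the parameters edited in `cmd/backfill/main.go`. The following is a minimal sketch. It assumes the parameters are gathered into a struct like the hypothetical `BackfillParams` below, although the real script may organize them differently. It also assumes a half-open time range, `[Start, End)`. The stat name `"sessions"` is illustrative.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"time"
)

// BackfillParams mirrors the three settings edited before a backfill.
// The type and field names are hypothetical.
type BackfillParams struct {
	QueueEnv string    // "production" or "staging": selects the ingest SQS queue URL
	Start    time.Time // inclusive start of the backfill range (assumed)
	End      time.Time // exclusive end of the backfill range (assumed)
	Stats    []string  // which stats to backfill
}

// Validate rejects an unknown queue environment, an empty or inverted time
// range, and an empty stats selection.
func (p BackfillParams) Validate() error {
	if p.QueueEnv != "production" && p.QueueEnv != "staging" {
		return fmt.Errorf("queue env %q: want \"production\" or \"staging\"", p.QueueEnv)
	}
	if !p.Start.Before(p.End) {
		return errors.New("backfill start must be before end")
	}
	if len(p.Stats) == 0 {
		return errors.New("no stats selected to backfill")
	}
	return nil
}

func main() {
	// Example: backfill the last seven whole UTC days into staging.
	end := time.Now().UTC().Truncate(24 * time.Hour)
	params := BackfillParams{
		QueueEnv: "staging",
		Start:    end.AddDate(0, 0, -7),
		End:      end,
		Stats:    []string{"sessions"},
	}
	if err := params.Validate(); err != nil {
		log.Fatalf("refusing to start backfill: %v", err)
	}
	log.Printf("backfilling %v from %s to %s via the %s queue",
		params.Stats, params.Start.Format(time.RFC3339), params.End.Format(time.RFC3339), params.QueueEnv)
}
```

Failing fast on a mistyped environment or an inverted range avoids sending a backfill to the wrong queue or putting load on Redshift for nothing.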
