# Adding a new SNS-based Activity
If your activity uses SNS without EventBus, as most activities did in the past, then follow this section. If your activity uses EventBus, see the "Adding an EventBus-based Activity" section further down this page.

Connecting a new activity type to Sauron is a somewhat complicated process. It currently involves 5 main steps:

1. [Create config file](#create-config)
1. [Run Code Gen](#run-code-gen)
1. [Create infrastructure](#create-infrastructure)
1. [Set up build/deploy jobs](#build-and-deploy)
1. [Implement handler](#implement-handler)

Ultimately you will be submitting 2 PRs:

1. A simple PR with the jenkins.groovy changes and an empty handler, as described in [this](#build-and-deploy) section. This will be merged to master first.
2. A larger PR that contains your json config changes and resulting code-generated files, which includes go files and terraform files. It also contains any changes you had to manually make to handlers.

We will apply the code and terraform changes for you, once the PR is approved and merged. 

## Create Config
Sauron includes a code generator to translate JSON files into code. You can find the config files in
 [handler_definitions/activities](../handler_definitions/activities).
 Details of the JSON format can be found there.

## Run Code Gen
Use `make` to run the code generator:
```
make codegen
```
The code generator will report any changes it makes. It is smart enough to not write files if nothing has changed.

## Create Infrastructure
In order to process events, Sauron needs to subscribe to an SNS topic for that event, hook it up to an SQS queue, and invoke a lambda
to handle messages from that queue. Terraform is used to configure and create these pieces.
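To make the SNS → SQS → lambda path concrete, here is a minimal sketch of what a message looks like by the time it reaches the lambda. It assumes the SQS subscription does not use raw message delivery, so the SQS body is the standard SNS notification envelope and the publisher's payload sits in its `Message` field. The `snsEnvelope` type, the `unwrapSNS` helper, the topic ARN and the `channel_id` payload are illustrative only and are not taken from Sauron's code; the generated handlers may do this differently.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// snsEnvelope mirrors the subset of the SNS notification JSON that SQS
// delivers as the message body when raw message delivery is disabled.
type snsEnvelope struct {
	Type     string `json:"Type"`
	TopicArn string `json:"TopicArn"`
	Message  string `json:"Message"`
}

// unwrapSNS (hypothetical helper) extracts the publisher's payload from
// an SQS message body that wraps an SNS notification.
func unwrapSNS(sqsBody string) (string, error) {
	var env snsEnvelope
	if err := json.Unmarshal([]byte(sqsBody), &env); err != nil {
		return "", fmt.Errorf("decode SNS envelope: %w", err)
	}
	if env.Type != "Notification" {
		return "", errors.New("not an SNS notification: " + env.Type)
	}
	return env.Message, nil
}

func main() {
	// Example body; the topic ARN and payload are made up for illustration.
	body := `{"Type":"Notification","TopicArn":"arn:aws:sns:us-west-2:123456789012:example-topic","Message":"{\"channel_id\":\"123456\"}"}`
	payload, err := unwrapSNS(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(payload)
}
```

The inner payload is itself JSON in most cases, so a real handler would decode it a second time into the activity's message type.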

The [sns_handler](https://git.xarth.tv/cb/sauron/tree/master/terraform/modules/app/lambda/sns_handler) terraform module
defines what goes into the event lambda configuration. The code generator will automatically create a new instance of this module in
the [main app file](https://git.xarth.tv/cb/sauron/blob/master/terraform/modules/app/main.tf). For handlers that
emit alerts, it will also create code in
[event_lambda_alarms.tf](https://git.xarth.tv/cb/sauron/blob/master/terraform/modules/cloudwatch/event_lambda_alarms.tf)
and [variables.tf](https://git.xarth.tv/cb/sauron/blob/master/terraform/modules/cloudwatch/variables.tf).

Ensure that the sns topic provided has sufficient IAM permissions to allow the `twitch-cb-aws` account to subscribe to it.

An example PR for this step can be found [here](https://git.xarth.tv/cb/sauron/pull/110). This PR was done
manually before the code generator existed.

Create a PR for the terraform changes, and ask in #dashboard-feedback for a review. The CB team will review and apply the terraform.

## Build and Deploy
Lambdas in sauron are built and deployed using Jenkins. The code generator will automatically add a definition in
[jenkins.groovy](https://git.xarth.tv/cb/sauron/blob/master/jenkins.groovy),
and then define the build and deploy steps in the same file. The job definition and steps will be named after the activity
in the JSON file.

IMPORTANT: Before deploying the bulk of your changes, we must master-merge a separate PR that ONLY contains the following:
1. The `jenkins.groovy` changes.
2. An EMPTY main function for your lambda.

An example PR for this step can be found [here](https://git.xarth.tv/cb/sauron/pull/119). This PR was done
manually before the code generator existed. You may use codegen to generate the jenkins.groovy file, but these changes must still be submitted separately.

You must create a separate PR for these build changes, which we will merge to master before your other changes. We need to do this because new activity handlers will not build until the `jenkins.groovy` file is in `master`, and those will not work without the empty handler.

## Implement Handler
Now we are ready to actually write the code to implement the event handling that will be run by your lambda. Fortunately,
the code generator should have written most of it for you.

1. It will create the activity type [here](https://git.xarth.tv/cb/sauron/blob/master/activity/type.go)
2. Add the required data to the `Activity` type [here](https://git.xarth.tv/cb/sauron/blob/master/activity/api/get.go)
3. Add the required data to the service proto definition [here](https://git.xarth.tv/cb/sauron/blob/master/rpc/sauron/service.proto)
4. Add the required data to the pubsub schema [here](https://git.xarth.tv/cb/sauron/blob/master/activity/pubsub/schema.go)
5. Add the required data to the api model [here](https://git.xarth.tv/cb/sauron/blob/master/internal/activity/get_activity.go)
6. Add the required data to the `Activity` dynamodb struct, which contains all activity types, [here](https://git.xarth.tv/cb/sauron/blob/master/internal/clients/dynamodb/models.go#L23)
7. Add new models for your activity type in dynamodb [here](https://git.xarth.tv/cb/sauron/blob/master/internal/clients/dynamodb/models.go). Both this and the previous step need to be completed, although this struct should contain the same fields as the previous step. 
8. Add a model for pubsub [here](https://git.xarth.tv/cb/sauron/blob/master/internal/clients/pubsub/models.go)
9. Add a method [here](https://git.xarth.tv/cb/sauron/blob/master/internal/clients/pubsub/client.go) to publish your data to pubsub
10. Create a handler package for your new activity under the `internal/event` directory.
11. Create `generated.go`, `handler.go`, `message.go`, and `validate.go` files in this package that implement your handler logic.
 It will also create the beginnings of a `handler_test.go` file to unit test it.
12. Create a new package in the `cmd/event` directory for your new handler.
13. Create a `main.go` file in your new package, and create a `main` function that invokes your handler.
14. If your activity needs to be converted into a Spotlight alert, it will use the `AlertManager` to get the current alert status, and call the `PublishAlert` method on the `Pubsub` client to send an alert. Example usage is in the [bits handler](https://git.xarth.tv/cb/sauron/blob/master/internal/event/bitsusage/handler.go#L106). If you're unsure whether or not your activity should be an alert, ask in #dashboard-feedback for assistance.

If your activity handler is simple, then you may only need to add code to `handler_test.go`. If the logic is a bit more
complex, you will probably need to modify `handler.go` as well, and you may need to modify `validate.go` to do custom
validation tests. If you modify any of these files, *be sure to* edit the first line, replacing "DO NOT EDIT" with
"EDITED". Once you have done this, your file is protected from being overwritten by the code generator.

An example PR for this step can be found [here](https://git.xarth.tv/cb/sauron/pull/118). Again, this PR
comes from a time before the code generator so all of the changes were made by hand.

Create another PR for these changes, then follow the [build/deploy](./deployment.md) steps.

Feel free to ask #dashboard-feedback on Slack for questions or to ping us for a PR review.

## Required Changes For GQL, Twilight And Pubsub
After adding (or modifying) an activity, you may be wondering what changes are necessary on the GQL edge or in PubSub in order to use the new data on the frontend (such as in Twilight). The following changes are needed as followup:

### PubSub
No services need to be modified for using updates to PubSub. The activity feed PubSub uses a topic in the format `dashboard-activity-feed.[channelID]` such as `dashboard-activity-feed.123456`. Any changes to your data should automatically be reflected in the JSON response payload for that topic. See [this](https://git.xarth.tv/cb/sauron/blob/master/docs/pubsub.md) page for more info on PubSub, including details on how to test for it.

Note that, in staging, all PubSub topics are prefixed with `pubsubtest.`. For example, the Activity Feed topic in staging for channel 123456 will be `pubsubtest.dashboard-activity-feed.123456`.
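The naming rule above can be sketched as a small helper. This is only an illustration of the topic format described here; the function name `activityFeedTopic` and the `staging` flag are hypothetical and do not refer to anything in Sauron's PubSub client.

```go
package main

import "fmt"

const (
	topicPrefix   = "dashboard-activity-feed" // topic prefix from the format above
	stagingPrefix = "pubsubtest"              // prepended to every topic in staging
)

// activityFeedTopic (hypothetical helper) builds the activity feed topic
// name for a channel, adding the staging prefix when requested.
func activityFeedTopic(channelID string, staging bool) string {
	topic := fmt.Sprintf("%s.%s", topicPrefix, channelID)
	if staging {
		return stagingPrefix + "." + topic
	}
	return topic
}

func main() {
	fmt.Println(activityFeedTopic("123456", false)) // dashboard-activity-feed.123456
	fmt.Println(activityFeedTopic("123456", true))  // pubsubtest.dashboard-activity-feed.123456
}
```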

### graphQL
New activities added to the activity feed must make changes in `edge/graphQL` in order to access activity data loaded from DynamoDB. Whether you add new activities or add/remove/modify fields in existing activities, you must make changes to the following places:

1. The resolver (https://git.xarth.tv/edge/graphql/blob/master/resolvers/queries/dashboard_activity_feed.go)
2. The schema (https://git.xarth.tv/edge/graphql/blob/master/schema/types/dashboard_activity_feed.graphql)

This PR contains some examples of adding the necessary parts to graphQL after corresponding changes to Sauron (though it also includes additional changes): https://git.xarth.tv/edge/graphql/pull/2923

### Twilight
Of course, Twilight changes are strictly frontend client changes, but it should be mentioned that adding new activities does require making Twilight changes to use the new activity, even if PubSub is automatically updated. Namely, you will need to have Twilight look for the new activity type in PubSub and show the proper UI to populate the Activity Feed with it. Similar is true for Alerts.

This [PR](https://git.xarth.tv/twilight/twilight/pull/22772/files#diff-c3daf62e4d6f42a760d19a283c19b346R473) shows an example of the changes required in Twilight after the HypeTrain message changes are added. Note that this PR contains more than just these changes. Specifically, you need to update the `dashboard-activity-feed-node-fragment.gql`, `schema.ts` and `pubsub.ts` files with your new fields. You then need to add the activity type to various places (such as [this](https://git.xarth.tv/twilight/twilight/pull/22772/files#diff-544c8bc0aa1756f1698b24f7f35c2572R31) one) and then make changes in the various other Activity-Feed files shown in the PR to ensure it reads the new items and displays the proper localized text.

# Modifying an Existing Activity
The process for modifying an existing activity, such as to add, remove or modify fields, or to change handler behavior, is simpler than creating a new activity, but involves some of the same steps.

To change fields or configuration behavior (such as timeouts), you need to find the json file for the activity's configuration and modify the relevant parts. These can be found under [handler_definitions/activities](../handler_definitions/activities), which also includes detailed information about the json file configurations. After making the relevant changes to the json file, you must [run the codegen](#run-code-gen). 

The following two PRs show examples of the right changes that need to be made to add new fields:
  - https://git.xarth.tv/cb/sauron/pull/274
  - https://git.xarth.tv/cb/sauron/pull/279
  
Notice that the only changes made that are not from code generation are in the json file. However, if the handlers were modified originally, you may need to make modifications to the corresponding handler as well.

If you need to just make changes to a handler, simply change the relevant `handler.go` for your activity; no codegen changes are necessary. Then make a PR for this and flag us, and we'll deploy it for you using clean-deploy.

Important: Do not manually modify files that were generated using codegen; these can easily be identified by the header:
```
// Code generated by sauron/cmd/codegen; DO NOT EDIT.
```

If you modify a file with this header, your changes will be overwritten by the code generator. If you must change one of these files, remove the header. However, keep in mind that codegen will no longer modify this file, even if new fields are added or removed.
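As a rough mental model of this protection, the sketch below checks whether the first line of a file still contains the "DO NOT EDIT" marker. It is an assumption about how such a check could work, not the code generator's actual implementation; `isOverwritable` and `generatedMarker` are hypothetical names.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// generatedMarker is the phrase from the generated-file header shown above.
const generatedMarker = "DO NOT EDIT"

// isOverwritable (hypothetical) reports whether the first line of src still
// carries the marker, i.e. whether a generator following this convention
// would consider the file safe to rewrite.
func isOverwritable(src string) bool {
	scanner := bufio.NewScanner(strings.NewReader(src))
	if !scanner.Scan() {
		return false // no first line, so no marker
	}
	return strings.Contains(scanner.Text(), generatedMarker)
}

func main() {
	generated := "// Code generated by sauron/cmd/codegen; DO NOT EDIT.\npackage raiding\n"
	edited := "// Code generated by sauron/cmd/codegen; EDITED.\npackage raiding\n"

	fmt.Println(isOverwritable(generated)) // true
	fmt.Println(isOverwritable(edited))    // false
}
```

Under this model, both removing the header and changing "DO NOT EDIT" to "EDITED" have the same effect: the marker is gone from the first line, so the file is left alone.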


# Adding an EventBus-based Activity
If your activity uses EventBus to trigger lambdas, there is some manual work involved beyond the codegen. This section will outline these steps, but you may wish to ping us at #dashboard-feedback in Slack if you need extra help.

For the most part, the steps for EventBus are similar to the steps in "Adding a new SNS-based Activity" above, with some differences:

1) When creating the json file for your new activity definition, you must add the following property at the root level of the json object:
```
"is_event_bus": true,
```
In addition, you must leave out the entirety of the
```
  "sns": {
```
property. Everything else about the setup for the json file should be the same as described in the "Adding a new SNS-based Activity" section above.
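The rule in step 1 (set `is_event_bus` and omit the `sns` property) can be checked mechanically before running codegen. The following is a minimal sketch, assuming only the two root-level keys named above; the `definition` type and `checkEventBusDefinition` function are hypothetical, and the real codegen may validate the file differently or not at all.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// definition captures only the two root-level keys discussed above.
// All other keys in the activity JSON are ignored by this sketch.
type definition struct {
	IsEventBus bool            `json:"is_event_bus"`
	SNS        json.RawMessage `json:"sns"`
}

// checkEventBusDefinition (hypothetical) returns an error when a definition
// sets is_event_bus to true but still contains an sns property.
func checkEventBusDefinition(raw []byte) error {
	var def definition
	if err := json.Unmarshal(raw, &def); err != nil {
		return fmt.Errorf("parse definition: %w", err)
	}
	hasSNS := len(def.SNS) > 0 && string(def.SNS) != "null"
	if def.IsEventBus && hasSNS {
		return errors.New("is_event_bus is true but the sns property is still present")
	}
	return nil
}

func main() {
	good := []byte(`{"is_event_bus": true}`)
	bad := []byte(`{"is_event_bus": true, "sns": {}}`)

	fmt.Println(checkEventBusDefinition(good)) // <nil>
	fmt.Println(checkEventBusDefinition(bad))
}
```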

2) After running the codegen script, you must add your handler to `cmd/eventbus/main.go`, as the existing activities do [here](https://git.xarth.tv/cb/sauron/blob/master/cmd/eventbus/main.go#L106).
The client is a bit different per activity, but you should be able to use the EventBus helpers here.
3) The codegen will not generate changes to the `jenkins.groovy` file. This is because your activity will use the same infrastructure that EventBus already uses.
4) You may want to increase the concurrency for EventBus [here](https://git.xarth.tv/cb/sauron/blob/master/terraform/production/main.tf#L55) in prod to match what your service is expected to use.

Overall, the process should be simpler than it was for other services, and the only terraform change required is the increase in reserved concurrency for EventBus, described above.

# Changing an Activity from SNS to EventBus
This process is actually relatively tricky and can cause [issues](https://docs.google.com/document/d/1qNTSbKVyEFcjM6TC95SU2t5OaYYuEwSBEiiYxl4nCp4/edit) if not done properly. The general idea is that some teams may be using Sauron with SNS already but wish to change their feed to use EventBus instead. There are many benefits to using EventBus, so this may be a desirable change, though migrating to it is a non-trivial process.

[This](https://git.xarth.tv/cb/sauron/pull/286) PR does MOST of the work (though notice there is a bug in the json file where the field changed names, causing a regression; more on this later). These are the steps you must take to change an activity to use EventBus from SNS. Let's use Raids as an example, since that's what the PR above covers.

### Steps to Change an Activity to EventBus:
Using Raids as an example

1) Go into the json configuration for your activity (eg `handler_definitions/activities/raiding.json`).
  1a) [Add](https://git.xarth.tv/cb/sauron/pull/286/files#diff-e070c0dc423181f196ee93fcf821caceR2) `"is_event_bus": true,` to the top
  1b) [Remove](https://git.xarth.tv/cb/sauron/pull/286/files#diff-e070c0dc423181f196ee93fcf821caceL22) the entire `sns` property and everything in it.
  1c) [Change](https://git.xarth.tv/cb/sauron/pull/286/files#diff-e070c0dc423181f196ee93fcf821caceR17) the `sns_name` fields in each event to be `event_bus_name` instead; this is the name of each property from the EventBus object. 
  1d) IMPORTANT: DO NOT change the `name` property of existing events. Doing so WILL cause a regression in GQL and will break stuff.
  1e) IMPORTANT: Make sure to add any new fields AFTER existing ones; otherwise the Protobuf will generate out of order.
  
2) Run the codegen. To do this, call `make codegen` from the project root. This should generate all the new needed changes from your json changes above, and should remove unneeded stuff.

3) Replace the previous `main.go` handler for your event with a new empty one, as shown in [this](https://git.xarth.tv/cb/sauron/pull/286/files#diff-d867cb77ad9e4c1b3df1f1c768c15f91R15) example. You can copy the entire file as it is:
```go
package main

import (
	"github.com/aws/aws-lambda-go/lambda"
)

func HandleRequest() error {
	return nil
}

func main() {
	// TODO: This will no-op on requests from the sns queue,
	// but should be handled in the eventbus handler.
	// remove this once the eventbus handler is working
	lambda.Start(HandleRequest)
}
```
We will be deleting this later.

4) Undo the codegen changes made to the `jenkins.groovy` file. We actually still need these so we can deploy the empty file change above. We'll delete them later. This file should be unchanged in your first PR.

5) Run `make proto` to regenerate your protobuf code if the definitions changed (for example, if new fields were added).

6) This one is a bit interesting but we'll want to undo the changes made by codegen to the terraform file `terraform/modules/app/main.tf`. We'll make these changes manually later; we need to do this because terraform often is a pain in the butt, such as in this case.

Once you have made all these changes, create a PR for them. You'll then want to contact #dashboard-feedback in Slack; we'll need to help with the rest of the changes.
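Step 1e above (new fields must come after existing ones) is easy to get wrong when editing the json by hand. A quick way to reason about it is as a prefix check: the old field list must be an unchanged prefix of the new one. The sketch below illustrates that idea only; `isAppendOnly` and the field names are hypothetical, and it does not read Sauron's actual json format.

```go
package main

import "fmt"

// isAppendOnly (hypothetical) reports whether newFields keeps every entry of
// oldFields in the same position and only adds entries at the end.
func isAppendOnly(oldFields, newFields []string) bool {
	if len(newFields) < len(oldFields) {
		return false
	}
	for i, name := range oldFields {
		if newFields[i] != name {
			return false
		}
	}
	return true
}

func main() {
	// Field names are placeholders, not taken from any real activity.
	before := []string{"field_a", "field_b"}

	fmt.Println(isAppendOnly(before, []string{"field_a", "field_b", "field_c"})) // true
	fmt.Println(isAppendOnly(before, []string{"field_c", "field_a", "field_b"})) // false
}
```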

### Steps to Deploy an activity that was changed to EventBus

This process is fairly complex, since we want to avoid both losing and duplicating data in production. Note that we only need to do this when migrating to EventBus, not when creating new EventBus-based activities. Keep in mind that this process is basically a migration and should be treated as such. Also, do not do any of these steps without the help of a dev from Sauron! We'll work with you to help push the needed changes.

Here are the Steps for Staging:

1) Create a branch for our Staging deploy with all the changes from the previous section.
2) Do a `terraform plan` on Staging. There should be NO changes at all.
3) Go to the [EventBus Dashboard](https://eventbus.xarth.tv/services/22) for Sauron and Edit Subscriptions for Staging. Set the proper event for your service (such as `RaidUpdate` for Raids). It's up to you whether to use Production or Staging for the Staging EventBus. Production events will give you more data, and should not cause duplication issues, but Staging might be more convenient for you to generate events. These will be the events and the data for the events that will be processed by Sauron's EventBus handler.
4) Deploy the staging code from your branch using [clean-deploy](https://deploy.xarth.tv/#/cb/sauron).
5) Monitor VERY carefully. Check the following:
 5a) [GQL Fields](https://grafana.xarth.tv/d/000001303/graphql-field-details?orgId=1&from=now-2d&to=now-1m&var-region=us-west-2&var-env=production&var-type=DashboardActivityFeedActivityRaiding&var-field=All). For example, for raiding check `DashboardActivityFeedActivityRaiding` under Staging. You may need to push events to GQL's staging environment to test this one.
 5b) Check [Sauron's Grafana](https://grafana.xarth.tv/d/JkVhfvRZk/sauron?orgId=1) under staging to ensure there is no escalation in "All Event Errors", "Lambda Errors", "Queue Depth" and "Message Age".
 5c) Look closely at the All Events Throughput graph to see a transition from `yourevent` to `eventbus.yourevent`. See [this](https://grafana.xarth.tv/d/JkVhfvRZk/sauron?orgId=1&from=1615471850167&to=1615555634465) time range to see an example for `raiding` transitioning to `eventbus.raiding`.
 5d) Let it sit for a bit to ensure we don't get issues down the line (10 minutes should be ok).
6) By now you should be migrated, but we need to delete the old infra. [This PR](https://git.xarth.tv/cb/sauron/pull/289/files#) shows how to do this. It also [changes](https://git.xarth.tv/cb/sauron/pull/289/files#diff-bbff1556dec4b517b8af98947d7c7607R55) the concurrency for the new activity (in this case, it took 20 from the [previous](https://git.xarth.tv/cb/sauron/pull/289/files#diff-63c9c2ed7aed3f3d915b921f82735177L56) allocation).
   6a) Before you make the change to [remove the module](https://git.xarth.tv/cb/sauron/pull/289/files#diff-23f54dcec838dbd458e88e8b2148503cL387), you need to delete it directly (because Terraform). If you don't, you'll get the error "Provider configuration not present". To delete the module named `my_module`, you'll need to do:
   ```
   terraform destroy -target "module.app.module.my_module" 
   ```
   For raids this would be 
   ```
   terraform destroy -target "module.app.module.autohost_raid_notifications" 
   ```
   You can do a plan for this as well, if you want to verify first, to be safe:
   ```
   terraform plan -destroy -target "module.app.module.autohost_raid_notifications" 
   ```
   6b) After step 6a you can fully [remove the module](https://git.xarth.tv/cb/sauron/pull/289/files#diff-23f54dcec838dbd458e88e8b2148503cL387).
7) Monitor logs again. Same stuff as step 5.

Once you are happy with this, we can move on to prod. Prod is mostly the same as staging, with a few key differences.

Here are the Steps for Production:

1) Once we're happy with our changes, we merge to `master`. MAKE SURE that the changes to `jenkins.groovy` are NOT included here. That is, you should have undone the codegen changes to `jenkins.groovy` for your old activity. VERY important!!!!!!!
1b) Ensure that the `jenkins.groovy` changes from codegen are not in `master` yet. Very important!
2) Do a `terraform plan` in production. The changes should be to add reserved concurrency to eventbus and to remove some alarms on your old service implementation. THAT'S IT. Any other changes should be reversed!
3) Run `terraform apply` to apply the changes above. We'll lose alarms for a few minutes but that's ok.
4) Go to the [EventBus Dashboard](https://eventbus.xarth.tv/services/22) for Sauron and Edit Subscriptions for Production. Add the production endpoints to match the ones you did earlier in staging.
5) Monitor. Same links above. Everything should be no-op right now; that is, should be the same behavior as without our changes. No errors. If there are errors at this point, revert the terraform changes.
6) If all the monitors are clear, deploy `master` in [clean-deploy](https://deploy.xarth.tv/#/cb/sauron) to `production`. This is the big one!
7a) Monitor, monitor, monitor. Check all the same graphs as before.
7b) Manually test to ensure that you are seeing the proper results. Test both a new event (while streaming) and also try refreshing the dashboard in Stream Manager to ensure you get your activity feed results.
7c) If anything fails, roll back the clean-deploy change to the previous change. We then start working from step 6.
8) After we let this sit for a while, if we're happy with results, we can delete the old infra. Same steps as staging.
9) The VERY last change is to run codegen again to generate the `jenkins.groovy` [changes](https://git.xarth.tv/cb/sauron/pull/286/files#diff-704ad7ad2be69c96108eb9e541db071fL20) on a separate branch, which we merge to master.
10) Commit an additional [change](https://git.xarth.tv/cb/sauron/pull/289/files#diff-d867cb77ad9e4c1b3df1f1c768c15f91L1) that removes the old empty handler. This is optional, but we should do it. We need to do this separately because Jenkins sucks.
11) Monitor again.

That's it! You should be migrated without having lost or duplicated data. Easy, right? :kappa.
