# Setting up App Monitoring

## Overview

1. PagerDuty
   1. Configure PagerDuty
   1. Create SNS Topic
1. CloudWatch
   1. Metrics
   1. Alarms
   1. Dashboards
1. Pingdom
1. Grafana
1. Sentry
1. Runbooks

## PagerDuty

### Configure PagerDuty

1. Go to
   [EP's Service Directory](https://twitchoncall.pagerduty.com/service-directory/?direction=asc&query=&sort_by=name&team_ids=PBDL32I)
1. Click 'New Service'
   - Fill in name and description
   - Choose 'Amazon Cloudwatch' as the Integration Type, and give the
     integration a name
     - Example: 'Higgs Boson-based Starshot LG Prod'
   - Choose 'Mobile-Web Alert' as the Escalation Policy
   - Click 'Add Service'
   - [Starshot Example](https://twitchoncall.pagerduty.com/service-directory/PWXDOF5)
1. Under the 'Integrations' tab for the new service:
   1. Click 'Add or manage extensions'
   1. Choose 'Slack' as the 'Extension Type'
   1. Enter 'mobile-web-channel' as the 'Name'
   1. Click 'Authorize'
   1. In the Slack screen that appears, enter 'tmw-alerts' as the channel and
      click 'Allow'
1. Click on the integration (Example: 'Higgs Boson-based Starshot LG QA')
1. Copy the integration URL (Example:
   https://events.pagerduty.com/integration/3e17dd0f0098463a80cc47d32fd607d0/enqueue).
   This will be used by the SNS Topic.

### Create SNS Topic

1. Go to
   [EP's SNS Topics](https://us-west-2.console.aws.amazon.com/sns/v3/home?region=us-west-2#/topics)
1. Click 'Create topic'.
   1. For "Type" select "Standard".
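   1. For "Name" use "TachyonReplaceWithAppName_alarm_to_PagerDuty", replacing
      "ReplaceWithAppName" with the name of the app (Example:
      "TachyonValence_alarm_to_PagerDuty").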
   1. Use the default settings for everything else. Then press "Create topic".
1. Once created, click 'Create subscription'
1. Select 'HTTPS' as the 'Protocol'
1. Use the integration URL copied from the PagerDuty integration (Example:
   https://events.pagerduty.com/integration/3e17dd0f0098463a80cc47d32fd607d0/enqueue)
   as the 'Endpoint'
   - [Valence Example](https://us-west-2.console.aws.amazon.com/sns/v3/home?region=us-west-2#/topic/arn:aws:sns:us-west-2:015957721237:TachyonValence_alarm_to_PagerDuty)
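
The console steps above can also be scripted. The following is a minimal sketch, assuming `boto3` is installed and AWS credentials for the account are configured. The helper names `topic_name` and `create_alarm_topic` are illustrative and are not part of any existing tooling.

```python
def topic_name(app_name: str) -> str:
    """Build the topic name using the Tachyon{AppName}_alarm_to_PagerDuty convention."""
    return f"Tachyon{app_name}_alarm_to_PagerDuty"


def create_alarm_topic(app_name: str, integration_url: str,
                       region: str = "us-west-2") -> str:
    """Create a standard SNS topic and subscribe the PagerDuty integration URL."""
    if not integration_url.startswith("https://"):
        raise ValueError("The PagerDuty integration URL must use HTTPS")

    # Imported lazily so topic_name() can be used without boto3 installed.
    import boto3

    sns = boto3.client("sns", region_name=region)
    # create_topic returns the existing topic if one with this name already exists.
    topic_arn = sns.create_topic(Name=topic_name(app_name))["TopicArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="https", Endpoint=integration_url)
    return topic_arn
```

The returned topic ARN is the value to use as the alarm action when creating CloudWatch alarms.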

## CloudWatch

[EP's CloudWatch Overview](https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2)

### Metrics

1. Go to the
   [Log groups](https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logsV2:log-groups)
   view in CloudWatch
1. Select the log group for your app
   - [Valence Example](https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logsV2:log-groups/log-group/valence_wxytehjpbfegnjhpvtrn)
1. Create metric filters
1. Use existing metric filters as a reference
   - [Starshot Example](https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logsV2:log-groups/log-group/starshot_lkzs92r230i7zisfd421y$23metric-filters)
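
For reference, a metric filter can also be created through the API. This is a rough sketch, assuming `boto3` and configured credentials. The filter pattern, metric name, and namespace below are hypothetical placeholders, so copy the real values from an existing filter such as the Starshot one.

```python
def metric_filter_params(log_group: str, app_name: str) -> dict:
    """Build put_metric_filter arguments for an unhandled server error count."""
    return {
        "logGroupName": log_group,
        "filterName": f"{app_name} Unhandled Server Error",
        # Hypothetical pattern: match log events containing this phrase.
        "filterPattern": '"unhandled server error"',
        "metricTransformations": [
            {
                # Hypothetical metric name and namespace.
                "metricName": f"{app_name}UnhandledServerError",
                "metricNamespace": "LogMetrics",
                # Emit a value of 1 for every matching log event.
                "metricValue": "1",
            }
        ],
    }


def create_metric_filter(params: dict, region: str = "us-west-2") -> None:
    """Create or update the metric filter on the log group."""
    import boto3  # lazy import; only needed when calling AWS

    boto3.client("logs", region_name=region).put_metric_filter(**params)
```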

### Alarms

1. Go to the
   [Alarms](https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#alarmsV2:)
   tab
1. Create new alarms
   - Naming convention is '{App name} {alarm description}'
1. Use an existing App alarm set as a reference.
   - [LG Prod Example](<https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#alarmsV2:?~(search~'LG*20Prod)>)
   - Note: metric filters are only selectable once a metric has been reported at
     least once. For the unhandled server error alarm, you will need to manually
     trigger a server error. You can do so by visiting
     `/not-found?fire-drill=true` for the application.
1. After setting up the unhandled server error alarm, trigger another
   unexpected error to test the entire PagerDuty <-> CloudWatch alarm
   integration by visiting `/not-found?fire-drill=true` for the new app that is
   being monitored.
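
The alarm setup and fire drill can be sketched in code as follows. This assumes `boto3` with configured credentials. The threshold, period, and statistic are illustrative defaults rather than the values used by existing alarms, and `alarm_params`, `fire_drill_url`, and `trigger_fire_drill` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def alarm_params(app_name: str, description: str, metric_name: str,
                 namespace: str, topic_arn: str) -> dict:
    """Build put_metric_alarm arguments using the '{App name} {alarm description}' name."""
    return {
        "AlarmName": f"{app_name} {description}",
        "Namespace": namespace,
        "MetricName": metric_name,
        # Illustrative settings: alarm on one or more errors within one minute.
        "Statistic": "Sum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",
        # The SNS topic that forwards to PagerDuty.
        "AlarmActions": [topic_arn],
    }


def create_alarm(params: dict, region: str = "us-west-2") -> None:
    """Create or update the alarm in CloudWatch."""
    import boto3  # lazy import; only needed when calling AWS

    boto3.client("cloudwatch", region_name=region).put_metric_alarm(**params)


def fire_drill_url(base_url: str) -> str:
    """Return the fire-drill URL for an app's base URL."""
    return urljoin(base_url, "/not-found?fire-drill=true")


def trigger_fire_drill(base_url: str) -> int:
    """Request the fire-drill URL and return the HTTP status code."""
    try:
        with urllib.request.urlopen(fire_drill_url(base_url), timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # An error status is the expected outcome of a fire drill.
        return err.code
```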

### Dashboards

1. Go to the
   [Dashboards](https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#dashboards:)
   tab
1. Create new dashboard
   - Click 'Create dashboard'
1. Use an existing dashboard as a reference
   - [Starshot Example](https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#dashboards:name=Starshot)
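
A dashboard can also be created from a JSON body instead of through the console. The sketch below assumes `boto3` with configured credentials and lays out one metric widget per metric. The namespace and metric names passed in are placeholders for whatever the app actually reports.

```python
import json


def dashboard_body(app_name: str, namespace: str, metric_names: list,
                   region: str = "us-west-2") -> str:
    """Build a CloudWatch dashboard body with one stacked widget per metric."""
    widgets = [
        {
            "type": "metric",
            "x": 0,
            "y": 6 * index,  # stack widgets vertically
            "width": 12,
            "height": 6,
            "properties": {
                "title": f"{app_name} {name}",
                "metrics": [[namespace, name]],
                "stat": "Sum",
                "period": 60,
                "region": region,
            },
        }
        for index, name in enumerate(metric_names)
    ]
    return json.dumps({"widgets": widgets})


def create_dashboard(app_name: str, body: str, region: str = "us-west-2") -> None:
    """Create or replace the dashboard named after the app."""
    import boto3  # lazy import; only needed when calling AWS

    boto3.client("cloudwatch", region_name=region).put_dashboard(
        DashboardName=app_name, DashboardBody=body
    )
```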

#### Flume & CloudWatch Insights

Flume dashboards are generated using CloudWatch Insights queries. Insights does
not support "group by" style visualizations. To visualize on more than one
dimension, you need to manually define the conditions to "sum" on.

For example, to view minutes watched by location:

```
filter @message like /logFlume/
| filter message = 'minute-watched'
| stats sum(context.location = 'channel'), sum(context.location = 'vod') by bin(1m)
```
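
Because each dimension value needs its own `sum(...)` term, the query grows with the number of values. A small helper can generate it. This is a sketch only: `minutes_watched_query` is a hypothetical name, and it places the filters before `stats` on the assumption that Insights applies commands in pipeline order.

```python
def minutes_watched_query(locations: list, bin_size: str = "1m") -> str:
    """Build an Insights query with one sum() term per location value."""
    for location in locations:
        if "'" in location:
            raise ValueError(f"Location must not contain a quote: {location!r}")
    sums = ", ".join(f"sum(context.location = '{loc}')" for loc in locations)
    return "\n".join([
        "filter @message like /logFlume/",
        "| filter message = 'minute-watched'",
        f"| stats {sums} by bin({bin_size})",
    ])


print(minutes_watched_query(["channel", "vod"]))
```

Adding another location to the list adds another series to the visualization.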

## Pingdom

This step must be completed by Matt (@mfollem).

## Grafana

1. Create a new ECS Dashboard like
   [this one](https://grafana.xarth.tv/d/wf5_4-KGk/starshot).
1. Ensure that the app is selected in the
   [Fastly](https://grafana.xarth.tv/d/oAzqN1dMz/fastly-realtime-cdn?orgId=1&refresh=30s&var-Service=LG%20TV%20service%20production)
   dashboard.

## Sentry

1. Create a new
   [Sentry project](https://sentry.io/organizations/twitch/projects/) and
   configure the app to send logs to that project.

## Runbooks

1. Add a new directory under `internal-docs/apps` if one does not already exist.
1. In that directory, add a "logs-and-metrics.md" file if one does not already
   exist.
1. Add or update that doc with links to the various metrics / log sources:
   CloudWatch, Grafana, Sentry, etc.
   - [Starshot Example](https://git.xarth.tv/pages/emerging-platforms/tachyon/d/internal-docs/apps/starshot/logs-and-metrics/)
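
The first two steps can be scripted. Below is a minimal sketch using only the standard library. The template contents and the `scaffold_runbook` name are illustrative assumptions, and the function leaves an existing file untouched.

```python
from pathlib import Path

# Hypothetical starter content; replace each TODO with a real link.
TEMPLATE = """# {app} Logs and Metrics

- CloudWatch: TODO
- Grafana: TODO
- Sentry: TODO
"""


def scaffold_runbook(repo_root: str, app: str) -> Path:
    """Create internal-docs/apps/<app>/logs-and-metrics.md if it is missing."""
    doc = Path(repo_root) / "internal-docs" / "apps" / app / "logs-and-metrics.md"
    doc.parent.mkdir(parents=True, exist_ok=True)
    if not doc.exists():
        doc.write_text(TEMPLATE.format(app=app))
    return doc
```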
