# Aperture Design and Internals

## Table of Contents
1. [Motivation](#motivation)
2. [How Squad Streaming Works](#how-squad-streaming-works)
3. [How Picture-By-Picture Ad Works](#how-picture-by-picture-ad-works)
4. [How Twitch Prime Watch Party Works](#how-twitch-prime-watch-party-works)
5. [How Aperture Works](#how-aperture-works)
6. [Architecture](#architecture)
7. [Project Organization](#project-organization)
8. [Patterns](#patterns)

## Motivation

[Product doc](https://docs.google.com/document/d/1tlzVO_V45ZTZFBNqgZQVaTqppkZvh2KkM6C935Smoy0) | [Aperture Tech spec](https://docs.google.com/document/d/1afHBDIC4egmGoW53e0rJciH6P9oMIlW8xrtir_dBxVs)

With the release of squad streams, a viewer can watch up to four streams at the same time.
This isn't inherently bad, but it inflates the numbers visible to broadcasters, viewers, and internal reports.
In particular, hours-watched metrics across the site would rise without a corresponding increase in viewership,
simply because of more squad streaming. Similarly, the browse directory may show larger numbers for certain games, which,
again, may reflect more squads rather than more viewers. Our goal was to ensure that these numbers do not
appear inflated because of squads.

Since Aperture's inception, Twitch has had an increasing need to apply business logic to viewcounts. As a result, Aperture
also implements business logic for services including picture-by-picture ads and Twitch Prime watch parties.

## How Squad Streaming Works

When broadcasters form a squad, viewers can switch the channel page to a "squad view", which shows a video player for
each squad member. Squad view has the concept of a "primary" player: the largest player in the squad view. This player
appears at the top of the view and takes up more screen space, and viewers always chat in the primary squad member's chat.
The players for the other squad members appear smaller, at the bottom of the view: these are called "secondary" players.

## How Picture-By-Picture Ad Works

When a broadcaster runs a picture-by-picture ad session, viewers see two players on the channel page - one for the broadcaster's stream and 
one for the ad. As a result, each viewer will count as two views for the video system during the session. To avoid the inflated 
viewcount during a picture-by-picture ad session, the ad service [Saul](https://git.xarth.tv/ads/saul) sends Aperture a 
request to freeze the viewcount during the ad session.

## How Twitch Prime Watch Party Works

When a broadcaster initiates a Twitch Prime Watch Party, their viewers see an Amazon Prime Video player, which does not go through
Twitch's video system. The viewcounts during a Watch Party need to be sent to Spade for internal reports. A Watch Party's viewcounts
can be fetched from the [Blender service](https://git.xarth.tv/vod/blender).

## How Aperture Works

To get the final viewcounts, Aperture does the following: 

1. In order to reduce viewcount inflation from squad streaming, we filter out viewers of secondary squad players from final
viewcount numbers. Viewers can enter squad view and have all squad members' broadcasts visible, but they will only count as a
viewer for their primary broadcaster.

Primary and secondary viewership has been added as a field on the minute-watched event, which the video player sends to Spade.
We listen to a stream of these events, calculate the ratio of secondary minutes watched to total minutes watched for each
channel, and then apply that ratio to the original viewcount numbers. This has the effect of removing secondary viewership
from reported viewcounts.

2. In order to reduce viewcount inflation from picture-by-picture ad sessions, we accept requests to freeze the viewcount during
an ad session. Each request contains the freeze duration and the duration of the ramp-up period that follows the session. We store this
information, along with the initial viewcount when the freeze is requested, in the cache. When viewcounts are requested, if the channel
is frozen, the initial viewcount is returned. If the channel is in the ramp-up period, we compare the current viewcount with the
initial viewcount and return an adjusted viewcount based on how far the channel is into the ramp-up period.

3. When we request viewcount data for Spade, we also request the viewcounts for Twitch Prime Watch Parties and send
them to `channel_concurrents`.
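
The freeze and ramp-up behaviour in step 2 can be sketched as follows. This is a minimal illustration, not the frozone client's actual code: the `Freeze` type, its field names, and the `Adjust` method are hypothetical, and the linear interpolation during ramp-up is an assumption, since the text above only says the adjusted viewcount depends on how far the channel is into the ramp-up period.

```go
package main

import (
	"math"
	"time"
)

// Freeze is a hypothetical record of a picture-by-picture ad freeze,
// mirroring the information described as being stored in the cache.
type Freeze struct {
	InitialCount int           // viewcount captured when the freeze was requested
	Start        time.Time     // when the freeze began
	FreezeFor    time.Duration // how long the viewcount stays frozen
	RampUpFor    time.Duration // how long the ramp-up period lasts afterwards
}

// Adjust returns the viewcount to report at time now, given the live viewcount.
func (f Freeze) Adjust(live int, now time.Time) int {
	elapsed := now.Sub(f.Start)
	switch {
	case elapsed < 0:
		// The freeze has not started yet.
		return live
	case elapsed < f.FreezeFor:
		// Frozen: report the viewcount captured at freeze time.
		return f.InitialCount
	case elapsed < f.FreezeFor+f.RampUpFor:
		// Ramp-up: move from the initial viewcount toward the live viewcount.
		// Assumption: the blend is linear in the time spent ramping up.
		progress := float64(elapsed-f.FreezeFor) / float64(f.RampUpFor)
		return f.InitialCount + int(math.Round(progress*float64(live-f.InitialCount)))
	default:
		// The freeze and ramp-up are both over.
		return live
	}
}
```

With a 60-second freeze and a 60-second ramp-up, an initial viewcount of 100 and a live viewcount of 200, this sketch reports 100 while frozen, 150 halfway through the ramp-up, and 200 once the ramp-up ends.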

## Architecture

![Aperture architecture](/docs/images/architecture.png)

Aperture is responsible for calculating the ratio of secondary to total minutes watched for each channel and making it available
through an API. It also takes responsibility for writing the viewcount numbers to `channel_concurrents` and `global_concurrents`
in Spade, as well as publishing them to pubsub (see [pubsub docs](/docs/pubsub.md)).

The ratio is calculated by a Kinesis Analytics application, which sits on top of a Kinesis stream of minute-watched events. The ratio
is secondary minutes watched over total minutes watched, summed over a 5-minute period. This is then output to a lambda, which
writes the value to a cache.
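
The arithmetic behind the ratio can be illustrated in Go. The real calculation lives in the Kinesis Analytics SQL, so this is only a sketch of the same idea: the `MinuteWatched` type and its fields are hypothetical simplifications of the event, and the input slice stands in for one 5-minute window.

```go
package main

// MinuteWatched is a hypothetical, simplified minute-watched event.
type MinuteWatched struct {
	ChannelID string
	Secondary bool // true when the event came from a secondary squad player
}

// SecondaryRatios returns, per channel, secondary minutes watched divided by
// total minutes watched across the given events (for example, one window).
func SecondaryRatios(events []MinuteWatched) map[string]float64 {
	secondary := make(map[string]int)
	total := make(map[string]int)
	for _, e := range events {
		total[e.ChannelID]++
		if e.Secondary {
			secondary[e.ChannelID]++
		}
	}

	ratios := make(map[string]float64, len(total))
	for channelID, n := range total {
		// n is at least 1 here, so the division is always defined.
		ratios[channelID] = float64(secondary[channelID]) / float64(n)
	}
	return ratios
}
```

A channel with three primary events and one secondary event in the window gets a ratio of 0.25; a channel with no secondary events gets 0.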

The API gets viewcount numbers from viewcount-api, and then applies the cached ratio to them.
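
A minimal sketch of applying a cached ratio to raw viewcounts is shown below. It assumes that "applying" the ratio means multiplying the raw viewcount by one minus the secondary ratio and rounding to the nearest whole viewer; the function names and the rounding and clamping choices are illustrative assumptions rather than Aperture's actual code.

```go
package main

import "math"

// ApplyRatio removes the secondary share from a raw viewcount.
// Assumption: adjusted = raw * (1 - ratio), rounded to the nearest viewer.
func ApplyRatio(raw int, ratio float64) int {
	// Clamp defensively so a bad cached value cannot produce a negative
	// count or a count larger than the raw one.
	ratio = math.Max(0, math.Min(1, ratio))
	return int(math.Round(float64(raw) * (1 - ratio)))
}

// ApplyRatios adjusts every channel's viewcount. Channels without a cached
// ratio are left unchanged, because a missing map key reads as 0.
func ApplyRatios(raw map[string]int, ratios map[string]float64) map[string]int {
	adjusted := make(map[string]int, len(raw))
	for channelID, count := range raw {
		adjusted[channelID] = ApplyRatio(count, ratios[channelID])
	}
	return adjusted
}
```

Under these assumptions, a channel with 1000 raw viewers and a secondary ratio of 0.25 is reported with 750 viewers.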

Spade logger fetches viewcounts from viewcount-api and Watch Party viewcounts from Blender, and joins those numbers with additional live stream data from usher.
It applies the cached ratios to the viewcount numbers, and then sends them to the appropriate Spade tables.
Spade logger is a lambda that is called every minute.

Pubsub publisher fetches viewcounts from viewcount-api, applies the ratio to them, and then publishes those numbers to the
appropriate pubsub topics. This is a lambda that is called every minute; **however**, the lambda publishes to pubsub twice
per invocation, so each pubsub topic is updated every 30 seconds.
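
The twice-per-invocation behaviour can be sketched as a small helper. `PublishTwice` is a hypothetical name, and the choice to attempt the second publish even when the first one fails is an assumption made for illustration.

```go
package main

import "time"

// PublishTwice calls publish, waits for interval, and calls publish again.
// A lambda invoked once a minute with a 30-second interval would therefore
// refresh its topics roughly every 30 seconds.
func PublishTwice(publish func() error, interval time.Duration) error {
	first := publish()
	time.Sleep(interval)
	// Assumption: the second publish is attempted even if the first failed.
	second := publish()
	if first != nil {
		return first
	}
	return second
}
```

A handler following this sketch would call something like `PublishTwice(publishViewcounts, 30*time.Second)`, where `publishViewcounts` is a placeholder for the fetch, apply, and publish step.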

## Project Organization

`cmd`: Contains the `main` function and the entry point for starting the API server.

`config`: Contains the configuration loading logic and the configuration files for starting the API server.

`internal/apertureserver`: The business logic for the API. This is where most of Aperture's code lives.

`internal/clients`: Clients and wrappers for dependencies. The following clients are used:

- **experiments**: Used to get experiment values for determining whether to write to certain tables or pubsub topics. See [discovery/experiments](https://git.xarth.tv/discovery/experiments).
- **memcached**: Used for getting/setting ratios from a cache. Uses [foundation/gomemcache](https://git.xarth.tv/foundation/gomemcache).
- **pubsub**: Sends viewcount data to pubsub. Uses [chat/pubsub-go-pubclient](https://git.xarth.tv/chat/pubsub-go-pubclient).
- **secrets**: For setting/getting secrets from AWS Secrets Manager.
- **spade**: Sends channel concurrents data to spade. Uses [common/spade-client-go](https://git.xarth.tv/common/spade-client-go).
- **stats**: Graphite stats.
- **multiplex**: Calls multiplex for live broadcaster stats. Needed for additional information when sending channel concurrents to spade.
- **viewcount**: Gets live viewcounts from viewcount-api. See [viewcount-api](https://git.xarth.tv/video/viewcount-api).
- **blender**: Gets live viewcounts for Twitch Prime Watch Parties. See [vod/blender](https://git.xarth.tv/vod/blender).
- **frozone**: Stores, fetches, and applies viewcount freezes from picture-by-picture ad sessions.

`internal/fetcher`: Responsible for getting viewcount numbers and applying cached ratios and freezes to them.

`internal/mocks`: Unit testing mocks for clients. Running `make mocks` will put mocks in this directory.

`kinesis_analytics`: SQL/logic for the kinesis analytics application.

`lambda`: Main entry points for the various lambdas used by Aperture.

- **output_mw_ratio_to_elasticache**: This lambda is set as the output from a kinesis analytics application on a minute_watched stream. It writes minute_watched data to our cache.
- **spade_logger**: Called every minute by a CloudWatch event, this sends channel concurrents to Spade.
- **pubsub_sender**: Called every minute by a CloudWatch event, this sends channel concurrents to pubsub.
- **error_stream_logger**: Handles logging errors that are streamed from Kinesis Analytics.

`rpc/aperture`: Service definition and twirp generated files.

`scripts`: Build and deploy scripts.

`terraform`: All infrastructure configuration lives here.

## Patterns

### Worker pools

We often use weighted semaphores to simulate worker pools. We create a semaphore of a given size, and acquire a "worker" from the
pool at the creation of each goroutine. Once the goroutine is finished, we release that worker so it becomes available to another
goroutine. The acquire call on a weighted semaphore blocks until capacity is available, which limits the number of
goroutines that can run at one time.

The size of these weighted semaphores is based on constants in code. These constants have been tuned to maximize performance
while avoiding hammering our infrastructure with too many concurrent requests. **These constants should not be changed without
sufficient testing on a staging environment**. If you do need to change them, ensure that the code has been run against a test
environment and no errors have occurred.

Examples of this pattern can be found [here](https://git.xarth.tv/businessviewcount/aperture/blob/master/internal/clients/spade/spade.go)
and [here](https://git.xarth.tv/businessviewcount/aperture/blob/master/internal/clients/pubsub/pubsub.go).
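
The shape of this pattern is sketched below. To stay within the standard library, the sketch uses a buffered channel as a counting semaphore in place of a weighted semaphore (such as the one in `golang.org/x/sync/semaphore`); the acquire-before-spawn and release-on-finish structure is the same. `ForEachLimited` and its parameters are hypothetical names, not code from the linked files.

```go
package main

import "sync"

// ForEachLimited runs work(item) for every item, with at most limit
// goroutines in flight at any time.
func ForEachLimited(items []string, limit int, work func(string)) {
	if limit < 1 {
		limit = 1 // an unbuffered channel would block the first acquire forever
	}
	sem := make(chan struct{}, limit) // buffered channel used as a semaphore
	var wg sync.WaitGroup

	for _, item := range items {
		// Acquire a "worker" before creating the goroutine. This blocks
		// while limit goroutines are already running.
		sem <- struct{}{}
		wg.Add(1)
		go func(item string) {
			defer wg.Done()
			defer func() { <-sem }() // release the worker when finished
			work(item)
		}(item)
	}

	// Wait for the goroutines that are still running.
	wg.Wait()
}
```

Because the acquire happens in the loop rather than inside the goroutine, the number of goroutines created is bounded as well, not just the number doing work.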

### Locking Maps

A [locking map](https://git.xarth.tv/businessviewcount/aperture/blob/master/internal/util/locking_ratio_map.go)
data structure is used to perform concurrent writes to a map. Go provides a `sync.Map` type, but there are several reasons not to
use it. These are documented inline, but are worth repeating here:

1. We lose all type safety when using `interface{}` as the type for keys and values.
2. We can range over the `InnerMap` more easily after the keys have been written.
3. `sync.Map` may or may not perform worse than a simple lock. At least this struct is predictable.
4. From the golang docs: "The Map type is specialized. Most code should use a plain Go map instead, with separate locking or coordination,
for better type safety and to make it easier to maintain other invariants along with the map content."

The locking map is simply a wrapper around a mutex and a normal Go map. We lock the map before each write and unlock it afterwards, to
ensure that there are no race conditions. We use a `map[string]float64` type here because we store channel IDs as keys and ratios as values.
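
A minimal sketch of such a wrapper is shown below. The `InnerMap` field name comes from the list above; the type name `LockingRatioMap`, the constructor, and the `Set` method are assumptions made for illustration and may not match the linked file.

```go
package main

import "sync"

// LockingRatioMap wraps a plain Go map with a mutex so that multiple
// goroutines can write to it safely.
type LockingRatioMap struct {
	mu       sync.Mutex
	InnerMap map[string]float64 // channel ID -> ratio
}

// NewLockingRatioMap returns an empty, ready-to-use map.
func NewLockingRatioMap() *LockingRatioMap {
	return &LockingRatioMap{InnerMap: make(map[string]float64)}
}

// Set stores the ratio for a channel, holding the lock for the write.
func (m *LockingRatioMap) Set(channelID string, ratio float64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.InnerMap[channelID] = ratio
}
```

Once every writer has finished, `InnerMap` can be ranged over directly like any other `map[string]float64`, which is the convenience that point 2 refers to.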

### Ratio Buffer and Cache Fetching

Although ratios are stored in a cache, retrieving a large number of them (for example, when we get viewcounts for all live channels) can have high latency. This latency may
not be acceptable to users of Aperture, particularly those that expect the same latency as viewcount-api. Thus, we use an [in-memory buffer](https://git.xarth.tv/businessviewcount/aperture/blob/master/internal/util/ratio_buffer.go) to store ratios, and periodically fetch them from the cache to update the buffer. The ratio buffer is an `atomic.Value` type that stores a pointer to a map.

This buffer allows many calls to the Aperture API to skip fetching numbers from the cache, saving both network and cache latency. The buffer has an [expiration time](https://git.xarth.tv/businessviewcount/aperture/blob/master/internal/fetcher/fetcher.go#L26), which limits how stale the numbers in the buffer can become. After this expiration time has been reached, the next API call that needs ratios will fetch them from the cache and update the buffer. The buffer is also [initialized and filled](https://git.xarth.tv/businessviewcount/aperture/blob/master/internal/fetcher/fetcher.go#L70) on server start. If we fail to initialize on start, we set the buffer to a nil map and continue normally.
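
The buffer-with-expiration idea can be sketched as follows. This is an illustration under stated assumptions rather than the code in `ratio_buffer.go` or `fetcher.go`: the `RatioBuffer` and `snapshot` types, the `fetch` callback (standing in for the cache read), and the choice to keep the previous ratios when a later refresh fails are all hypothetical. For simplicity the sketch stores a pointer to a small struct holding the map and its fetch time, rather than a pointer to the map itself.

```go
package main

import (
	"sync/atomic"
	"time"
)

// snapshot pairs a set of ratios with the time they were fetched.
type snapshot struct {
	ratios    map[string]float64
	fetchedAt time.Time
}

// RatioBuffer keeps the most recently fetched ratios in memory.
type RatioBuffer struct {
	value atomic.Value // always holds a *snapshot after construction
	ttl   time.Duration
	fetch func() (map[string]float64, error) // stand-in for the cache read
}

// NewRatioBuffer creates the buffer and fills it once, as on server start.
func NewRatioBuffer(ttl time.Duration, fetch func() (map[string]float64, error)) *RatioBuffer {
	b := &RatioBuffer{ttl: ttl, fetch: fetch}
	b.refresh()
	return b
}

// refresh fetches ratios and atomically swaps them into the buffer.
func (b *RatioBuffer) refresh() *snapshot {
	ratios, err := b.fetch()
	if err != nil {
		// On the first fill this leaves a nil map, which is safe to read.
		// Assumption: on later failures the previous ratios are kept.
		ratios = nil
		if old, ok := b.value.Load().(*snapshot); ok {
			ratios = old.ratios
		}
	}
	s := &snapshot{ratios: ratios, fetchedAt: time.Now()}
	b.value.Store(s)
	return s
}

// Ratios returns the buffered ratios, refetching first if they have expired.
func (b *RatioBuffer) Ratios() map[string]float64 {
	s := b.value.Load().(*snapshot)
	if time.Since(s.fetchedAt) >= b.ttl {
		s = b.refresh()
	}
	return s.ratios
}
```

Readers never take a lock: they load the current snapshot atomically and treat its map as read-only. In this sketch several callers that hit an expired buffer at the same moment may each trigger a refetch, which is harmless but redundant.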

