# Video Replication quick overview

## HLS
HTTP Live Streaming (HLS) is Apple's live streaming protocol. There are three different types of files in the HLS protocol: master playlists, media playlists, and media segments. The master playlists contain references to media playlists, which then contain references to media segments. All of these files are served over HTTP by various parts of our infrastructure.

![HLS overview](images/hls.png)

The player starts by fetching the master playlist for a particular channel. This playlist contains a list of media playlist urls and metadata for each. Each quality option is a separate media playlist and it’s up to the player to select one for playback. Here is an example of a master playlist for a live stream (April 2016):

	#EXTM3U
	#EXT-X-TWITCH-INFO:NODE="video-edge-2ca3c4.sfo01",MANIFEST-NODE="video-edge-2ca3c4.sfo01",SERVER-TIME="1459976734.42",USER-IP="192.168.46.255",CLUSTER="sfo01",MANIFEST-CLUSTER="sfo01"
	#EXT-X-MEDIA:TYPE=VIDEO,GROUP-ID="chunked",NAME="Source",AUTOSELECT=YES,DEFAULT=YES
	#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=3820437,RESOLUTION=1280x720,VIDEO="chunked"
	http://video-edge-2ca3c4.sfo01.hls.ttvnw.net/hls-8c6e48/deadmau5_20641517344_430752810/chunked/index-live.m3u8?token=id=6856951794298168033,bid=20641517344,exp=1459899563,node=video-edge-2ca3c4-1.sfo01.hls.justin.tv,nname=video-edge-2ca3c4.sfo01,fmt=chunked&sig=a387c95cff8bd3e0c2cb85f9c2e2b1bf1d90b0ae
	#EXT-X-MEDIA:TYPE=VIDEO,GROUP-ID="high",NAME="High",AUTOSELECT=YES,DEFAULT=YES
	#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1760000,RESOLUTION=1280x720,VIDEO="high"
	http://video-edge-2ca3c4.sfo01.hls.ttvnw.net/hls-8c6e48/deadmau5_20641517344_430752810/high/index-live.m3u8?token=id=6856951794298168033,bid=20641517344,exp=1459899563,node=video-edge-2ca3c4-1.sfo01.hls.justin.tv,nname=video-edge-2ca3c4.sfo01,fmt=high&sig=ad0a94a5cce53e817ae4dffc12d3fffa0bc95f80
	.... repeated for medium, low, and mobile ....
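
As a rough illustration of how a player might read this file, here is a minimal Python sketch that pairs each `#EXT-X-STREAM-INF` line with the url that follows it and then picks a quality. The function names, the `edge.example.invalid` host, and the "highest bandwidth that fits" policy are assumptions made for this example, not a description of any real player.

```python
import re
from typing import Dict, List

# Matches KEY=value and KEY="quoted value" pairs in an HLS attribute list.
ATTR_RE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')

# Shortened sample; the host name is a placeholder, not a real edge.
SAMPLE = """#EXTM3U
#EXT-X-MEDIA:TYPE=VIDEO,GROUP-ID="chunked",NAME="Source",AUTOSELECT=YES,DEFAULT=YES
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=3820437,RESOLUTION=1280x720,VIDEO="chunked"
http://edge.example.invalid/chunked/index-live.m3u8
#EXT-X-MEDIA:TYPE=VIDEO,GROUP-ID="high",NAME="High",AUTOSELECT=YES,DEFAULT=YES
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1760000,RESOLUTION=1280x720,VIDEO="high"
http://edge.example.invalid/high/index-live.m3u8
"""


def parse_master_playlist(text: str) -> List[Dict[str, str]]:
    """Return one dict per variant: its STREAM-INF attributes plus a URI key."""
    variants: List[Dict[str, str]] = []
    pending = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-STREAM-INF:"):
            attrs = line.split(":", 1)[1]
            pending = {k: v.strip('"') for k, v in ATTR_RE.findall(attrs)}
        elif line and not line.startswith("#") and pending is not None:
            # The first non-comment line after STREAM-INF is the playlist url.
            pending["URI"] = line
            variants.append(pending)
            pending = None
    return variants


def pick_variant(variants: List[Dict[str, str]], max_bps: int) -> Dict[str, str]:
    """Hypothetical policy: highest BANDWIDTH that fits, else the lowest one."""
    affordable = [v for v in variants if int(v["BANDWIDTH"]) <= max_bps]
    if affordable:
        return max(affordable, key=lambda v: int(v["BANDWIDTH"]))
    return min(variants, key=lambda v: int(v["BANDWIDTH"]))


variants = parse_master_playlist(SAMPLE)
print(pick_variant(variants, max_bps=2_000_000)["URI"])
```

Real players use more elaborate selection logic (buffer health, screen size, and so on); the point here is only the STREAM-INF/url pairing.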


Media playlists are similar but contain a list of segments instead. These segments are separate TS files that contain 2-4 seconds of video. The player will download each segment and stitch them together into a continuous stream. Here is an example of a media playlist (April 2016):

	#EXTM3U
	#EXT-X-VERSION:3
	#EXT-X-TARGETDURATION:5
	#ID3-EQUIV-TDTG:2016-04-05T00:11:06
	#EXT-X-MEDIA-SEQUENCE:1610
	#EXT-X-TWITCH-ELAPSED-SECS:6442.425
	#EXT-X-TWITCH-TOTAL-SECS:6466.425
	#EXTINF:4.000,
	index-0000001611-w4hI.ts
	#EXTINF:4.000,
	index-0000001612-GzCN.ts
	#EXTINF:4.000,
	index-0000001613-r2y2.ts
	#EXTINF:4.000,
	index-0000001614-DoPO.ts
	#EXTINF:4.000,
	index-0000001615-Q7U8.ts
	#EXTINF:4.000,
	index-0000001616-TRVb.ts
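
To make the structure concrete, the following Python sketch parses a media playlist into a list of segments. It is a simplified reading of the format that only handles the tags shown above; the `Segment` type and function name are invented for this example. Sequence numbers are derived from `#EXT-X-MEDIA-SEQUENCE`, not from the file names, so they do not necessarily match the number embedded in each `.ts` name.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    sequence: int    # media sequence number, counted from EXT-X-MEDIA-SEQUENCE
    duration: float  # seconds, from the preceding #EXTINF tag
    uri: str         # relative to the playlist url


SAMPLE = """#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:5
#EXT-X-MEDIA-SEQUENCE:1610
#EXTINF:4.000,
index-0000001611-w4hI.ts
#EXTINF:4.000,
index-0000001612-GzCN.ts
#EXTINF:4.000,
index-0000001613-r2y2.ts
"""


def parse_media_playlist(text: str) -> List[Segment]:
    segments: List[Segment] = []
    next_sequence = 0  # HLS default when the tag is absent
    duration = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            next_sequence = int(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            # "#EXTINF:<duration>,<optional title>"
            duration = float(line.split(":", 1)[1].split(",", 1)[0])
        elif line and not line.startswith("#") and duration is not None:
            segments.append(Segment(next_sequence, duration, line))
            next_sequence += 1
            duration = None
    return segments


segments = parse_media_playlist(SAMPLE)
print(len(segments), "segments,", sum(s.duration for s in segments), "seconds")
```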


Live video works by constantly refreshing the media playlist and waiting for new segments to appear at the end. Players can also switch seamlessly between quality options by picking and choosing segments from multiple media playlists. For example, a player could download `medium-55.ts` and then switch to a higher quality by downloading `high-56.ts` afterward.
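
The refresh loop and the quality switch can be sketched in a few lines of Python. This is a minimal sketch that assumes each playlist has already been parsed into `(sequence, uri)` pairs and that sequence numbers line up across qualities, as the `medium-55.ts` / `high-56.ts` example suggests. The function names are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[int, str]  # (media sequence number, segment uri)


def new_segments(last_seen: int, playlist: List[Entry]) -> List[Entry]:
    """Segments that appeared since the last refresh of one playlist."""
    return [(seq, uri) for seq, uri in playlist if seq > last_seen]


def next_segment(
    playlists: Dict[str, List[Entry]], quality: str, last_seen: int
) -> Optional[Entry]:
    """Next segment to play, taken from whichever quality is currently wanted.

    Returns None when that playlist does not list the segment yet, in which
    case the player would refresh the playlist and try again.
    """
    for seq, uri in playlists[quality]:
        if seq == last_seen + 1:
            return (seq, uri)
    return None


playlists = {
    "medium": [(54, "medium-54.ts"), (55, "medium-55.ts"), (56, "medium-56.ts")],
    "high": [(55, "high-55.ts"), (56, "high-56.ts")],
}

# The player has just finished medium-55.ts and decides to switch up.
print(next_segment(playlists, "high", last_seen=55))
```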

Below, we outline how our system actually implements this protocol. The master playlist is generated and served by Find. The media playlists and media segments are generated by the transcoder but served by Edge.

## Find
Find generates and serves the master playlist.

## Edge and PR Diagram

![Diagram](images/replication-edge.png)

## Video Edge
HAProxy runs locally and terminates the SSL connection for HTTPS requests. Sandstorm manages the SSL certificates and private keys. HAProxy then hands the request off to Varnish.

Varnish is an open source HTTP reverse proxy. The idea behind Varnish is that it coalesces multiple requests for the same url into one request to some backend service and caches the result. We use Varnish at each step of the replication tree to dramatically reduce latency and backbone utilization between PoPs.
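
Request coalescing is easier to see in code. The Python sketch below is not how Varnish is implemented; it is a toy cache, with invented names, that shows the idea: the first caller for a url performs the backend fetch while concurrent callers for the same url wait for that result instead of fetching again. Expiry (TTL) is deliberately left out.

```python
import threading
from typing import Callable, Dict


class CoalescingCache:
    """Toy single-flight cache: one backend fetch per url, shared by all callers."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._lock = threading.Lock()
        self._cache: Dict[str, bytes] = {}
        self._inflight: Dict[str, threading.Event] = {}

    def get(self, url: str) -> bytes:
        while True:
            with self._lock:
                if url in self._cache:
                    return self._cache[url]
                event = self._inflight.get(url)
                leader = event is None
                if leader:
                    # Nobody is fetching this url yet; this caller will.
                    event = threading.Event()
                    self._inflight[url] = event
            if leader:
                try:
                    body = self._fetch(url)
                    with self._lock:
                        self._cache[url] = body
                    return body
                finally:
                    with self._lock:
                        del self._inflight[url]
                    event.set()  # wake the waiters, even if the fetch failed
            else:
                # Wait for the leader, then loop: either the result is cached
                # now, or the fetch failed and one waiter becomes the new leader.
                event.wait()


calls = []


def backend(url: str) -> bytes:
    calls.append(url)
    return b"segment bytes"


cache = CoalescingCache(backend)
threads = [threading.Thread(target=cache.get, args=("/chunked/1.ts",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("backend fetches:", len(calls))
```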

Varnish receives a segment or playlist request and passes it along to a local service called Gatekeeper. Gatekeeper’s job is to verify the access token contained in the url. The access token contains information about the edge and stream, which must match the request or the request will be denied. The url also carries a cryptographic signature, generated by Find using a shared secret key, which prevents forgeries.
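
A check of this kind could look roughly like the Python sketch below. The actual signing scheme is not described here, so everything specific is an assumption: HMAC-SHA1 over the raw token string is chosen only because the example signatures are 40 hex characters long, and the field checks (`exp`, `nname`, `fmt`) are a guess at what "must match" means. The function names and the secret are hypothetical.

```python
import hashlib
import hmac
from typing import Dict


def parse_token(token: str) -> Dict[str, str]:
    """Split "id=...,bid=...,exp=..." into a dict of fields."""
    fields: Dict[str, str] = {}
    for part in token.split(","):
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


def sign(token: str, secret: bytes) -> str:
    # ASSUMPTION: HMAC-SHA1 over the token text; the real scheme may differ.
    return hmac.new(secret, token.encode("utf-8"), hashlib.sha1).hexdigest()


def verify(token: str, sig: str, secret: bytes,
           edge_name: str, quality: str, now: int) -> bool:
    # Constant-time comparison so the check does not leak timing information.
    expected = sign(token, secret).encode("ascii")
    if not hmac.compare_digest(expected, sig.encode("utf-8")):
        return False
    fields = parse_token(token)
    try:
        expires = int(fields.get("exp", ""))
    except ValueError:
        return False
    if expires < now:
        return False
    # ASSUMPTION: the token must name this edge and the requested quality.
    return fields.get("nname") == edge_name and fields.get("fmt") == quality


SECRET = b"not-the-real-secret"  # placeholder shared secret
TOKEN = "id=1,bid=2,exp=1459899563,nname=video-edge-2ca3c4.sfo01,fmt=chunked"
SIG = sign(TOKEN, SECRET)
print(verify(TOKEN, SIG, SECRET, "video-edge-2ca3c4.sfo01", "chunked", now=1459899000))
```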

Assuming the token is valid, Gatekeeper will then return the upstream url. For example (April 2016), given the media playlist request:

	http://video-edge-2ca3c4.sfo01.hls.ttvnw.net/hls-8c6e48/deadmau5_20641517344_430752810/chunked/index-live.m3u8?token=id=6856951794298168033,bid=20641517344,exp=1459899563,node=video-edge-2ca3c4-1.sfo01.hls.justin.tv,nname=video-edge-2ca3c4.sfo01,fmt=chunked&sig=7fc73830c0157a0b8712ae690436680a9088be04

Gatekeeper returns:

	http://localhost/upstream/hls-8c6e48/deadmau5_20641517344_430752810/chunked/index-live.m3u8

Varnish reads this response and attempts to fetch the file from this url instead. Players requesting the same channel and quality resolve to the same upstream url, so Varnish can serve all of them from a single cached object. If the url is not in the cache, the Edge requests the data from an upstream PR.
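
The mapping in the example above amounts to dropping the per-viewer query string and re-rooting the path. A minimal Python sketch follows; the function name is invented, and the `http://localhost/upstream` prefix is taken from the example rather than from Gatekeeper's code.

```python
from urllib.parse import urlsplit

UPSTREAM_PREFIX = "http://localhost/upstream"  # as shown in the example above


def upstream_url(request_url: str) -> str:
    """Drop host and query (token, sig) so all viewers share one cache key."""
    return UPSTREAM_PREFIX + urlsplit(request_url).path


BASE = ("http://video-edge-2ca3c4.sfo01.hls.ttvnw.net/hls-8c6e48/"
        "deadmau5_20641517344_430752810/chunked/index-live.m3u8")

# Two viewers with different (made-up) tokens map to the same upstream url.
viewer_a = BASE + "?token=id=1,fmt=chunked&sig=aaaa"
viewer_b = BASE + "?token=id=2,fmt=chunked&sig=bbbb"
print(upstream_url(viewer_a) == upstream_url(viewer_b))
```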

## Protected Replication (PR)
PR hosts are basically Edge hosts except that they do not handle external traffic. When an Edge does not have a url cached and needs the data, it makes a fetch from one of the PRs. We determine which PR to use based on the hash of the url, ensuring all Edges within a cluster will fetch from the same PR for a given url.
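
The hashing step can be illustrated with a short Python sketch. The hash function, the modulo scheme, and the host names below are all assumptions for the sake of the example; the source only states that the choice is based on a hash of the url.

```python
import hashlib
from typing import Sequence


def pick_pr(url: str, prs: Sequence[str]) -> str:
    """Map a url to one PR so every Edge in the cluster picks the same host."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(prs)
    return prs[index]


# Hypothetical PR host names for one cluster.
PRS = ["video-pr-a.sfo01", "video-pr-b.sfo01", "video-pr-c.sfo01"]

url = "/hls-8c6e48/deadmau5_20641517344_430752810/chunked/index-live.m3u8"
print(pick_pr(url, PRS))
```

One caveat of plain modulo hashing is that adding or removing a PR remaps most urls at once. A consistent-hashing scheme would limit that churn.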

The PRs also run Varnish and serve cached data if available. Otherwise, the PR consults the replication tree and fetches from another PR in another PoP. This continues until one of the hosts has the data cached or we reach SFO and fetch directly from the origin.

The purpose of PRs is to maximize cache utilization and avoid using the backbone unless absolutely needed. PRs are not externally reachable, so they cannot become an easy target whose failure takes down the entire video system. As a result, a PR is basically just Varnish, because it no longer needs many of the Edge components.

## Replication Tree
If a file is not already cached, the replication host will use the replication tree to determine the next upstream. The replication tree works by defining a primary and secondary for each PoP. This configuration is stored in Consul and manually updated by operations. When Consul is updated, the Varnish configuration on each PR is regenerated to point to the new hosts.

Here’s what the live replication tree looked like at some point (April 2016):

![Replication tree](images/replication-tree.png)

Let’s follow an example request. A German viewer has been assigned by Find to an Edge in fra01 (Frankfurt). When they request a media playlist or segment, the Edge checks its cache and, on a miss, forwards the request to a replication host in fra01. This replication host also checks its cache and then forwards the request to ams01 (Amsterdam). The same process continues through iad02 (Washington, D.C.) and eventually to sfo01 (San Francisco), where the origin lives. If any of these clusters is down, we use the dotted line as a backup instead.
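
The primary/secondary lookup can be modeled as a small table. In the Python sketch below, the primaries follow the example path just described, while the secondaries are placeholders invented for illustration (the real ones are the dotted lines in the diagram). The table layout and function names are likewise assumptions, not the actual Consul schema.

```python
from typing import AbstractSet, Dict, List, Optional, Tuple

# pop -> (primary upstream, secondary upstream); None means "no upstream".
TREE: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "fra01": ("ams01", "iad02"),  # secondary is hypothetical
    "ams01": ("iad02", "sfo01"),  # secondary is hypothetical
    "iad02": ("sfo01", None),
    "sfo01": (None, None),        # the origin lives here
}


def next_upstream(pop: str, down: AbstractSet[str]) -> Optional[str]:
    """Primary if it is up, else the secondary, else None."""
    for candidate in TREE[pop]:
        if candidate is not None and candidate not in down:
            return candidate
    return None


def route(pop: str, down: AbstractSet[str] = frozenset()) -> List[str]:
    """PoPs a cache miss travels through, starting at the viewer's PoP.

    If the returned path does not end at sfo01, no healthy upstream was left.
    """
    path = [pop]
    while True:
        upstream = next_upstream(path[-1], down)
        if upstream is None:
            return path
        path.append(upstream)


print(route("fra01"))
print(route("fra01", down={"ams01"}))
```

In practice the walk stops early at the first host that has the file cached; the sketch only shows how the next hop is chosen.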

## Other Components
### Edge View Counter
Notified by Gatekeeper for each playlist request. Uses information such as the channel, token, and client IP address to determine if the request should be counted as a viewer. This data is then aggregated and sent to Video API where it is stored.
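
The actual counting rules are not described here. As a purely illustrative Python sketch, one simple policy would be to count each distinct (token id, client IP) pair once per channel per reporting interval. The class and method names are invented.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class EdgeViewCounter:
    """Toy counter: one viewer per distinct (token id, client IP) per channel."""

    def __init__(self) -> None:
        self._viewers: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def on_playlist_request(self, channel: str, token_id: str, client_ip: str) -> None:
        # Repeated playlist refreshes from the same viewer collapse into one entry.
        self._viewers[channel].add((token_id, client_ip))

    def flush(self) -> Dict[str, int]:
        """Return per-channel counts for this interval and start a new one."""
        counts = {channel: len(v) for channel, v in self._viewers.items()}
        self._viewers.clear()
        return counts


counter = EdgeViewCounter()
counter.on_playlist_request("deadmau5", "6856951794298168033", "192.168.46.255")
counter.on_playlist_request("deadmau5", "6856951794298168033", "192.168.46.255")
counter.on_playlist_request("deadmau5", "1111", "192.168.46.1")
print(counter.flush())
```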

### Global View Counter
Similar to the Edge View Counter, but the raw data is shipped to this service and the aggregation happens there. This gives us more information to determine if a viewer is legitimate. This data is stored in Video API alongside the edge view counts. Depending on the channel, Video API picks one of these two view counts as the final view count.

### Prefetcher
Notified by Gatekeeper for each incoming playlist and segment request. Continuously fetches the playlist for each active channel, looking for new segments. If a segment is found that will most likely be requested by a user, the Prefetcher warms the cache by requesting the segment itself. This can dramatically reduce latency but wastes some fetches on segments nobody ends up requesting, so it is only enabled in certain PoPs where latency is a problem (Sydney, Hong Kong, etc.).
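
A minimal Python sketch of the bookkeeping such a service needs: which channels are currently active, and which segments have not been warmed yet. The idle timeout, class name, and method names are assumptions; the real Prefetcher's heuristics for "most likely to be requested" are not covered.

```python
from typing import Dict, Iterable, List, Set


class Prefetcher:
    def __init__(self, idle_timeout: float = 30.0) -> None:
        self._idle_timeout = idle_timeout  # hypothetical value, in seconds
        self._last_request: Dict[str, float] = {}
        self._warmed: Dict[str, Set[str]] = {}

    def on_request(self, channel: str, now: float) -> None:
        """Called for each viewer request; keeps the channel marked active."""
        self._last_request[channel] = now

    def active_channels(self, now: float) -> List[str]:
        return sorted(
            channel
            for channel, seen in self._last_request.items()
            if now - seen <= self._idle_timeout
        )

    def segments_to_warm(self, channel: str, playlist_uris: Iterable[str]) -> List[str]:
        """Segments in the latest playlist that have not been fetched yet.

        A real implementation would also forget old segments; this set only grows.
        """
        warmed = self._warmed.setdefault(channel, set())
        fresh = [uri for uri in playlist_uris if uri not in warmed]
        warmed.update(fresh)
        return fresh


p = Prefetcher()
p.on_request("deadmau5", now=100.0)
print(p.active_channels(now=110.0))
print(p.segments_to_warm("deadmau5", ["index-1.ts", "index-2.ts"]))
print(p.segments_to_warm("deadmau5", ["index-2.ts", "index-3.ts"]))
```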

### Statmon
Collects metrics about the utilization and health of Varnish. Deregisters the Edge in Video API if it’s having problems so it will no longer be assigned to new viewers by Find.

### Video API
Stores information about individual streams. Part of the Usher codebase at the moment.
